D as a Better C

D was designed from the ground up to interface directly and easily to C, and to a lesser extent C++. This provides access to endless C libraries, the Standard C runtime library, and of course the operating system APIs, which are usually C APIs.

But there’s much more to C than that. There are large and immensely useful programs written in C, such as the Linux operating system and a very large chunk of the programs written for it. While D programs can interface with C libraries, the reverse isn’t true: C programs cannot interface with D ones. It’s not possible (at least not without considerable effort) to compile a couple of D files and link them into a C program. The trouble is that compiled D files refer to things that only exist in the D runtime library, and linking that in (it’s a bit large) tends to be impractical.

D code also can’t exist in a program unless D controls the main() function, which is how the startup code in the D runtime library is managed. Hence D libraries remain inaccessible to C programs, and chimera programs (a mix of C and D) are not practical. One cannot pragmatically “try out” D by adding D modules to an existing C program.

That is, until Better C came along.

It’s been done before; it’s an old idea. Bjarne Stroustrup wrote a paper in 1988 entitled “A Better C”. His early C++ compiler was able to compile C code pretty much unchanged, and then one could start using C++ features here and there as they made sense, all without disturbing the existing investment in C. This was a brilliant strategy, and drove the early success of C++.

A more modern example is Kotlin, which uses a different method. Kotlin syntax is not compatible with Java, but it is fully interoperable with Java, relies on the existing Java libraries, and allows a gradual migration of Java code to Kotlin. Kotlin is indeed a “Better Java”, and this shows in its success.

D as Better C

D takes a radically different approach to making a better C. It is not an extension of C, it is not a superset of C, and it does not bring along C’s longstanding issues (such as the preprocessor, array overflows, etc.). D’s solution is to subset the D language, removing or altering features that require the D startup code and runtime library. This is, simply, the charter of the -betterC compiler switch.

Doesn’t removing things from D make it no longer D? That’s a hard question to answer, and it’s really a matter of individual preference. The vast bulk of the core language remains. Certainly the D characteristics that are analogous to C remain. The result is a language somewhere in between C and D, but that is fully upward compatible with D.
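To make this concrete, here is a minimal sketch of a D module meant to be linked into an existing C program. The file name maxof.d, the function max_of, and the build commands in the comments are illustrative assumptions, not something taken from a real project.

```d
// maxof.d: a hypothetical D module intended to be called from C.
// Assumed build steps (file names are illustrative):
//   dmd -betterC -c maxof.d
//   then link the resulting object file into the C program as usual.

// extern (C) gives the function C linkage and an unmangled name,
// so the C side can declare it as:
//   int max_of(const int *values, size_t count);
extern (C) int max_of(const(int)* values, size_t count)
{
    if (count == 0)
        return 0;   // arbitrary choice for an empty input

    int best = values[0];
    // Slicing the raw pointer yields a D array with a known length.
    foreach (v; values[1 .. count])
        if (v > best)
            best = v;
    return best;
}
```

The C program keeps its own main(); the D object file is just one more translation unit in the link.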

Removed Things

Most obviously, the garbage collector is removed, along with the features that depend on the garbage collector. Memory can still be allocated the same way as in C – using malloc() or some custom allocator.
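As a rough sketch of what that looks like in practice, the following allocates a buffer with malloc() and then slices the pointer so the rest of the code works with a length-carrying D array. The helper name sumOfSquares is made up for this example.

```d
import core.stdc.stdlib : malloc, free;

// Sums 0*0 + 1*1 + ... + (n-1)*(n-1) using a heap buffer.
// Returns -1 if the allocation fails. (Hypothetical helper for illustration.)
extern (C) int sumOfSquares(size_t n)
{
    // Allocate exactly as in C; malloc returns void*, so cast it.
    auto p = cast(int*) malloc(n * int.sizeof);
    if (p is null)
        return -1;

    // Slicing the pointer attaches a length, so indexing can be bounds checked.
    int[] values = p[0 .. n];
    foreach (i, ref v; values)
        v = cast(int) (i * i);

    int total = 0;
    foreach (v; values)
        total += v;

    free(p);   // no garbage collector: release the memory by hand
    return total;
}

extern (C) int main()
{
    // 0 + 1 + 4 + 9 = 14
    return sumOfSquares(4) == 14 ? 0 : 1;
}
```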

Although C++ classes and COM classes will still work, D polymorphic classes will not, as they rely on the garbage collector.

Exceptions, typeid, static construction/destruction, RAII, and unittests are removed. But it is possible we can find ways to add them back in.

Asserts are altered to call the C runtime library assertion failure functions rather than the D runtime library ones.

(This isn’t a complete list, for that see http://dlang.org/dmd-windows.html#switch-betterC.)

Retained Things

More importantly, what remains?

What may be initially most important to C programmers is memory safety in the form of array overflow checking, no more stray pointers into expired stack frames, and guaranteed initialization of locals. This is followed by what is expected in a modern language — modules, function overloading, constructors, member functions, Unicode, nested functions, dynamic closures, Compile Time Function Execution, automated documentation generation, highly advanced metaprogramming, and Design by Introspection.
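One item on that list, Compile Time Function Execution, is easy to show in a few lines. This is only a sketch; the function factorial and the constant fact10 are names invented for the example.

```d
import core.stdc.stdio;

// An ordinary function; nothing marks it as compile-time only.
int factorial(int n)
{
    return n <= 1 ? 1 : n * factorial(n - 1);
}

// Initializing an enum forces the call to be evaluated during compilation.
enum fact10 = factorial(10);

// Checked by the compiler, at no run-time cost.
static assert(fact10 == 3_628_800);

extern (C) int main()
{
    printf("10! = %d\n", fact10);
    return 0;
}
```

The same function remains callable at run time, so there is no separate macro or constant-expression dialect to learn.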

Footprint

Consider a C program:

#include <stdio.h>

int main(int argc, char** argv) {
    printf("hello world\n");
    return 0;
}

It compiles to:

_main:
push EAX
mov [ESP],offset FLAT:_DATA
call near ptr _printf
xor EAX,EAX
pop ECX
ret

The executable size is 23,068 bytes.

Translate it to D:

import core.stdc.stdio;

extern (C) int main(int argc, char** argv) {
    printf("hello world\n");
    return 0;
}

The executable size is the same, 23,068 bytes. This is unsurprising because the C compiler and D compiler generate the same code, as they share the same code generator. (The equivalent full D program would clock in at 194 KB.) In other words, nothing extra is paid for using D rather than C for the same code.

The Hello World program is a little too trivial. Let’s step up in complexity to the infamous sieve benchmark program:

#include <stdio.h>

/* Eratosthenes Sieve prime number calculation. */

#define true    1
#define false   0
#define size    8190
#define sizepl  8191

char flags[sizepl];

int main() {
    int i, prime, k, count, iter;

    printf ("10 iterations\n");
    for (iter = 1; iter <= 10; iter++) {
        count = 0;
        for (i = 0; i <= size; i++)
            flags[i] = true;
        for (i = 0; i <= size; i++) {
            if (flags[i]) {
                prime = i + i + 3;
                k = i + prime;
                while (k <= size) {
                    flags[k] = false;
                    k += prime;
                }
                count += 1;
            }
        }
    }
    printf ("\n%d primes", count);
    return 0;
}

Rewriting it in Better C:

import core.stdc.stdio;

extern (C):

__gshared bool[8191] flags;

int main() {
    int count;

    printf("10 iterations\n");
    foreach (iter; 1 .. 11) {
        count = 0;
        flags[] = true;
        foreach (i; 0 .. flags.length) {
            if (flags[i]) {
                const prime = i + i + 3;
                auto k = i + prime;
                while (k < flags.length) {
                    flags[k] = false;
                    k += prime;
                }
                count += 1;
            }
        }
    }
    printf("%d primes\n", count);
    return 0;
}

It looks much the same, but some things are worthy of note:

  • extern (C): means use the C calling convention.
  • D normally puts static data into thread-local storage, whereas C puts it in global storage. __gshared gives the C behavior.
  • foreach is a simpler way of doing for loops over known endpoints.
  • flags[] = true; sets all the elements in flags to true in one go.
  • Using const tells the reader that prime never changes once it is initialized.
  • The types of iter, i, prime and k are inferred, preventing inadvertent type coercion errors.
  • The number of elements in flags is given by flags.length, not some independent variable.

And the last item leads to a very important hidden advantage: accesses to the flags array are bounds checked. No more overflow errors! We didn’t have to do anything in particular to get that, either.
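A small sketch of how this plays out, assuming bounds checking has not been switched off with compiler options. The helper countSet is a made-up name; the two commented-out lines show accesses that are expected to be rejected.

```d
// Counts the entries that are still marked true.
// The slice carries its own length, so no separate size parameter is needed.
size_t countSet(const bool[] flags)
{
    size_t n = 0;
    foreach (f; flags)
        if (f)
            ++n;
    return n;
}

extern (C) int main()
{
    bool[8] flags;          // locals are default-initialized: all false
    flags[1] = true;
    flags[7] = true;

    // flags[8] = true;     // constant index past the end: a compile-time error

    size_t i = 8;
    // flags[i] = true;     // compiles, but is expected to fail the run-time check

    return countSet(flags[]) == 2 ? 0 : 1;
}
```

In C the equivalent function would take a pointer and a separate count, and nothing would tie the two together.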

This is only the beginning of how D as Better C can improve the expressivity, readability, and safety of your existing C programs. For example, D has nested functions, which in my experience work very well at prying goto’s from my cold, dead fingers.
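Here is a sketch of the nested-function idea applied to the classic "goto cleanup" pattern. The function copyFile, its status codes, and the file names in main are assumptions made up for the illustration.

```d
import core.stdc.stdio;
import core.stdc.stdlib : malloc, free;

// Copies src to dst. Returns 0 on success, nonzero on failure.
extern (C) int copyFile(const(char)* src, const(char)* dst)
{
    FILE* fin, fout;    // default-initialized to null
    void* buf;          // likewise null

    // Nested function: it sees the enclosing locals, so a single call
    // replaces a `goto cleanup;` at every exit point.
    int finish(int status)
    {
        if (fin) fclose(fin);
        if (fout) fclose(fout);
        free(buf);      // free(null) is a no-op
        return status;
    }

    fin = fopen(src, "rb");
    if (!fin) return finish(1);
    fout = fopen(dst, "wb");
    if (!fout) return finish(2);

    enum size = 4096;
    buf = malloc(size);
    if (!buf) return finish(3);

    size_t n;
    while ((n = fread(buf, 1, size, fin)) > 0)
        if (fwrite(buf, 1, n, fout) != n)
            return finish(4);
    return finish(0);
}

extern (C) int main()
{
    // A missing input file should take the first failure path.
    return copyFile("no-such-input.bin", "out.bin") == 1 ? 0 : 1;
}
```

Because finish() never outlives copyFile(), no closure has to be allocated, which is why this style fits a build without the garbage collector.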

On a more personal note, ever since -betterC started working, I’ve been converting many of my old C programs still in use into D, one function at a time. Doing it one function at a time, and running the test suite after each change, keeps the program in a correctly working state at all times. If the program doesn’t work, I only have one function to look at to see where it went wrong. I don’t particularly care to maintain C programs anymore, and with -betterC there’s no longer any reason to.

The Better C ability of D is available in the 2.076.0 beta: download it and read the changelog.

27 thoughts on “D as a Better C”

  1. “Exceptions, typeid, static construction/destruction, RAII, and unittests are removed. ”
    RAII?
    how about structs with “~this” ?

    1. Oh and ” static construction/destruction”
      this means no static this() ?
      I will definitely miss this if I go into betterC land.

      1. Yes, that is what it means. The D runtime startup executes the static ctors on program and thread startup (and static dtors on shutdown).

        However, you can still have static initialization of globals, so it’s possible to use lazy initialization to achieve a similar result. Andrei has been migrating many of the static ctors in druntime/phobos to this mechanism to allow better usage of the standard libraries with betterC.

  2. I really like that the D project is focusing on the usability of the toolchain as much as the language itself. It makes a very refreshing change from Rust, where the language is nice but the toolchain is a joke.

    1. Dear Jeff S.,

      I agree about D’s toolchain being usable. It is the most usable I have ever used.

      But I do not agree with your assessment of Rust. It is not a “nice” language.

      Rust is a complicated and convoluted language. It makes tradeoffs that harm productivity in the name of improving productivity!

      D as a Better C is what Rust should have been.

      Even better, D as a Better C gives you an upgrade path to normal D. That is exactly what we need when our small project is successful, and needs things like RAII and proper OO and all of the other goodies that D gives us.

      D as a Better C means that Rust is no longer necessary. If you need what Rust pretends to offer, then you can now start with D as a Better C instead.

      Instead of rewriting Rust code in C++ when it outgrows Rust, we could just take our D as a Better C code and start using more D idioms.

      Yours sincerely,
      Jeff

    2. What do you mean? `cargo` is a pretty awesome part of the Rust toolchain…I’m honestly curious to hear your concerns. 🙂

      1. For starters, how about the fact that Rust is billed as a systems programming language but its standard library calls abort() on malloc failure…

        Whenever this is mentioned to the core team or any Rust religious fanatic, at best they “umm” and “ahh” and then tell you that’s probably what you want or worse, start explaining to you that “on most systems, malloc never fails” (which is a laughable statement).

        It seems like the Californian airheads working on Rust think that “systems programming” means “anything with less than 32GB of RAM”.

  3. I dislike the way Better C is done. So large existing programs can’t really interface with C without removing all those D goodies, huh?

    1. Swoorup Joshi

      Simply disliking it is beside the point. Those things are removed because they’re either impossible or impractical to retain in an interoperable way.

      If you dislike betterC then just don’t use it.

  4. Out of curiosity: Given C++ interoperates with C on pretty much every platform, and C++ can provide many of these features (particularly RAII and calling static initializers), and Objective-C offers ARC as a slightly less convenient alternative to Garbage Collection, are there any plans to restore these missing D features on top of these (or similar) mechanisms?

  5. Dear Walter,

    It is things like this that always bring me back to D for any new code I write.

    D isn’t there to try to force an obscure, inconsistent, quasi-religious ideology on me, like Go so often seems to do.

    D doesn’t lose sight of productivity and ease-of-use, like Rust so often seems to do.

    D doesn’t keep growing larger and larger, to the point where I just can’t remember it all, like has happened to C++, Java and C#.

    Here we can see the true power of D. Taking features away from it makes it even more powerful in certain ways!

    I, too, will start converting my C programs to D using this method. Not only is D now the perfect language for my new code, it is also the perfect language for my old code, too!

    My mind is currently blown, but in the best way possible. I have seen enlightenment, and it is D.

    Yours in admiration,
    Jeff

  6. Stupid q – shouldn’t it be 0 .. flags.length - 1 ? (If your aim is ‘how to do D’? Unless doing it that way is the whole point.)

  7. Well..

    The main problem with D is the same as the problem with C++. They keep pushing features that don’t work, and more and more new features, which make the code really cryptic. If you go check the Linux kernel, you can see how beautiful C can be. It can be really easy to read. When writing a D or C++ program, people tend to use a huge amount of stuff in the code, and reading speed drops.

    Reading speed is the most important factor in productivity. If you need to read a lot, there is a problem. C can be written that way too, but usually, when you don’t have these fancy features and the code is well documented, the code is productive. It is not friendly to new users, but that can be tackled with training and leadership. We use C because of this. When everyone writes the same C without any fancy stuff, we produce code fast, with fewer bugs and memory errors. If we changed away from C, it would make training harder and the ways of writing code more varied. That would make the code more cryptic (the programmer has too many ways to do the same thing), so there would be more bugs, and we would need more documentation, comments and reviews, and we would still have more bugs.

    Code doesn’t get more secure with more features; it gets more secure with fewer features and more reviews (and tests and documentation). Code can be written fast with good planning and by enforcing certain coding rules. This works in all situations except web frontend and mobile.

    We will stick with C. We won’t go to Golang (they mess up the language by adding nonsense features), or change to Rust (oh god, the productivity drops), or C++ (#1 enemy of programming), or anything class based (bug hell). We use C, plus Python 3 for internal tooling (in the backend), and it will stay that way until no programming is needed anymore (unless there is a better solution than Python). If C code feels like crap, then it is a management/software productivity problem. You can code pointers but don’t understand metaprogramming.

    – From Automotive Industry Software specialist / manager

    1. I understand that important factors for you are ease of reading, fast development, fewer bugs, and better documentation. I agree with you on all points.

      D can help with all these. For example, code is much easier to read if it is clear what a function’s inputs and outputs are. D offers annotations, like `pure`, which help with this. `pure` functions cannot modify global state, so there’s no need to manually check that – the inputs and outputs will all be in the function signature.

      Being easier to read helps with fast development, of course.

      There are a number of aspects of C that are known to be the source of lots of errors. Beating them back in C requires manual checking and review of the code. D makes many of them impossible, saving your team time and money. More information: https://www.slideshare.net/dcacm/patterns-of-human-error

      Better documentation – D has a builtin documentation generator, Ddoc. For us, Ddoc was completely transformative in improving the quantity and quality of the code documentation. Clearly, management of teams is important to you, and Ddoc makes it easier to enforce minimal documentation standards.

  8. After reading this post and thinking it over, I came up with the following (killer?) use case: simply securing __existing__ C programs. It would be all about converting existing C programs into D programs to take advantage of, for example, buffers with bounds checks.

    While thinking about it, and not knowing enough about D, let alone the “Better C” subset, I wonder about the following points:

    * Are C programs valid D programs too?
    * If not, what are the minimal modification steps needed to make an existing C program a D program?

    * To benefit from some D feature XXX (interesting from a security point of view), what transformation step must be applied to a D program (converted from C)? <=== I mean for each feature XXX worth using.

    Well, is there already an existing document presenting those points? If not, I think (IMHO) it may be worthwhile to have such a doc to increase interest in D and its visibility.

        1. Replace the right arrow -> in C programs with periods. As for the rest, the easiest way is to simply start compiling the code with the D compiler, and fix the errors as the compiler spits them out.

  9. Is there an article on BetterC mode somewhere that details which issues necessitated the removal of features? Given the existence of things like functions called when a binary is loaded etc. on most platforms at the image level, it seems impossible that there would be no way to create a C-callable D that retains all its features.

    Certainly, this is a promising first step, but as long as you don’t pass an object out to C (a point which could be clearly marked), the garbage collector should still be usable.

  10. I think it would be a good idea for a future blog entry to show a tutorial or examples of binding C/C++ libs with D (betterC). (But RAII must be working by then!)

  11. I quite like this article, but a couple of the points in the sieve example seem a little contrived/disingenuous:

    flags[] = true; sets all the elements in flags to true in one go.

    Idiomatic C would probably do:

    memset(flags, 0x1, sizeof(flags))

    I guess it would be cool if betterC could do array initialisation where the type is aggregate (maybe it already can)?

    Using const tells the reader that prime never changes once it is initialized.

    C can use const in the same place, I’m not sure why the C code declares all the variables up top?
