D as a Better C

Posted on

D was designed from the ground up to interface directly and easily to C, and to a lesser extent C++. This provides access to endless C libraries, the Standard C runtime library, and of course the operating system APIs, which are usually C APIs.

But there’s much more to C than that. There are large and immensely useful programs written in C, such as the Linux operating system and a very large chunk of the programs written for it. While D programs can interface with C libraries, the reverse isn’t true. C programs cannot interface with D ones. It’s not possible (at least not without considerable effort) to compile a couple of D files and link them in to a C program. The trouble is that compiled D files refer to things that only exist in the D runtime library, and linking that in (it’s a bit large) tends to be impractical.

D code also can’t exist in a program unless D controls the main() function, which is how the startup code in the D runtime library is managed. Hence D libraries remain inaccessible to C programs, and chimera programs (a mix of C and D) are not practical. One cannot pragmatically “try out” D by add D modules to an existing C program.

That is, until Better C came along.

It’s been done before, it’s an old idea. Bjarne Stroustrup wrote a paper in 1988 entitled “A Better C“. His early C++ compiler was able to compile C code pretty much unchanged, and then one could start using C++ features here and there as they made sense, all without disturbing the existing investment in C. This was a brilliant strategy, and drove the early success of C++.

A more modern example is Kotlin, which uses a different method. Kotlin syntax is not compatible with Java, but it is fully interoperable with Java, relies on the existing Java libraries, and allows a gradual migration of Java code to Kotlin. Kotlin is indeed a “Better Java”, and this shows in its success.

D as Better C

D takes a radically different approach to making a better C. It is not an extension of C, it is not a superset of C, and does not bring along C’s longstanding issues (such as the preprocessor, array overflows, etc.). D’s solution is to subset the D language, removing or altering features that require the D startup code and runtime library. This is, simply, the charter of the -betterC compiler switch.

Doesn’t removing things from D make it no longer D? That’s a hard question to answer, and it’s really a matter of individual preference. The vast bulk of the core language remains. Certainly the D characteristics that are analogous to C remain. The result is a language somewhere in between C and D, but that is fully upward compatible with D.

Removed Things

Most obviously, the garbage collector is removed, along with the features that depend on the garbage collector. Memory can still be allocated the same way as in C – using malloc() or some custom allocator.

Although C++ classes and COM classes will still work, D polymorphic classes will not, as they rely on the garbage collector.

Exceptions, typeid, static construction/destruction, RAII, and unittests are removed. But it is possible we can find ways to add them back in.

Asserts are altered to call the C runtime library assert fail functions rather than the D runtime library ones.

(This isn’t a complete list, for that see http://dlang.org/dmd-windows.html#switch-betterC.)

Retained Things

More importantly, what remains?

What may be initially most important to C programmers is memory safety in the form of array overflow checking, no more stray pointers into expired stack frames, and guaranteed initialization of locals. This is followed by what is expected in a modern language — modules, function overloading, constructors, member functions, Unicode, nested functions, dynamic closures, Compile Time Function Execution, automated documentation generation, highly advanced metaprogramming, and Design by Introspection.

Footprint

Consider a C program:

#include <stdio.h>

int main(int argc, char** argv) {
    printf("hello world\n");
    return 0;
}

It compiles to:

_main:
push EAX
mov [ESP],offset FLAT:_DATA
call near ptr _printf
xor EAX,EAX
pop ECX
ret

The executable size is 23,068 bytes.

Translate it to D:

import core.stdc.stdio;

extern (C) int main(int argc, char** argv) {
    printf("hello world\n");
    return 0;
}

The executable size is the same, 23,068 bytes. This is unsurprising because the C compiler and D compiler generate the same code, as they share the same code generator. (The equivalent full D program would clock in at 194Kb.) In other words, nothing extra is paid for using D rather than C for the same code.

The Hello World program is a little too trivial. Let’s step up in complexity to the infamous sieve benchmark program:

#include <stdio.h>

/* Eratosthenes Sieve prime number calculation. */

#define true    1
#define false   0
#define size    8190
#define sizepl  8191

char flags[sizepl];

int main() {
    int i, prime, k, count, iter;

    printf ("10 iterations\n");
    for (iter = 1; iter <= 10; iter++) {
        count = 0;
        for (i = 0; i <= size; i++)
            flags[i] = true;
        for (i = 0; i <= size; i++) {
            if (flags[i]) {
                prime = i + i + 3;
                k = i + prime;
                while (k <= size) {
                    flags[k] = false;
                    k += prime;
                }
                count += 1;
            }
        }
    }
    printf ("\n%d primes", count);
    return 0;
}

Rewriting it in Better C:

import core.stdc.stdio;

extern (C):

__gshared bool[8191] flags;

int main() {
    int count;

    printf("10 iterations\n");
    foreach (iter; 1 .. 11) {
        count = 0;
        flags[] = true;
        foreach (i; 0 .. flags.length) {
            if (flags[i]) {
                const prime = i + i + 3;
                auto k = i + prime;
                while (k < flags.length) {
                    flags[k] = false;
                    k += prime;
                }
                count += 1;
            }
        }
    }
    printf("%d primes\n", count);
    return 0;
}

It looks much the same, but some things are worthy of note:

  • extern (C): means use the C calling convention.
  • D normally puts static data into thread local storage. C sticks them in global storage. __gshared accomplishes that.
  • foreach is a simpler way of doing for loops over known endpoints.
  • flags[] = true; sets all the elements in flags to true in one go.
  • Using const tells the reader that prime never changes once it is initialized.
  • The types of iter, i, prime and k are inferred, preventing inadvertent type coercion errors.
  • The number of elements in flags is given by flags.length, not some independent variable.

And the last item leads to a very important hidden advantage: accesses to the flags array are bounds checked. No more overflow errors! We didn’t have to do anything
in particular to get that, either.

This is only the beginning of how D as Better C can improve the expressivity, readability, and safety of your existing C programs. For example, D has nested functions, which in my experience work very well at prying goto’s from my cold, dead fingers.

On a more personal note, ever since -betterC started working, I’ve been converting many of my old C programs still in use into D, one function at a time. Doing it one function at a time, and running the test suite after each change, keeps the program in a correctly working state at all times. If the program doesn’t work, I only have one function to look at to see where it went wrong. I don’t particularly care to maintain C programs anymore, and with -betterC there’s no longer any reason to.

The Better C ability of D is available in the 2.076.0 beta: download it and read the changelog.

New D Compiler Release: DMD 2.075.0

Posted on

DMD 2.075.0 was released a few days back. As with every release, the changelog is available so you can browse the list of fixed bugs and new features. 2.075.0 can be fetched from the dlang.org download page, which always makes available the latest DMD release alongside a nightly build.

Notable Changes

Every DMD release brings with it a number of bug fixes, changes, and enhancements. Here are some of the more noteworthy changes in this release.

Two array properties removed

Anyone who does a lot of work with D’s ranges will likely have encountered this little annoyance that arises from the built-in .sort property of arrays.

void main()
{
    import std.algorithm : remove, sort;
    import std.array : array;
    int[] nums = [5, 3, 1, 2, 4];
    nums = nums.sort.remove(2).array;
}

The .sort property has been deprecated for ages, so the above would result in the following error:

sorted.d(6): Deprecation: use std.algorithm.sort instead of .sort property

The workaround would be to add an empty set of parentheses to the sort call. With DMD 2.075.0, this is no longer necessary and the above will compile. Both the .sort and .reverse array properties have finally been removed from the language.

For the uninitiated, D has two features that have proven convenient in the functional pipeline programming style typically used with ranges. One is that parentheses on a function call are optional when there are no parameters. The other is Universal Function Call Syntax (UFCS), which allows a function call to be made using the dot notation on the first argument, so that a function int add(int a, int b) can be called as: 10.add(5).

Each of D’s built-in types comes with a set of built-in properties. Given that the built-in properties are not functions, no parentheses are used to access them. The .sort array property has been around since the early days of D1. At the time, it was rather useful and convenient for anyone who was happy with the default implementation. When D2 came along with the range paradigm, the standard library was given a set of functions that can treat arrays as ranges, opening them up to use with the many range based functions in the std.algorithm package and elsewhere.

With optional parentheses, UFCS, and a range-based function in std.algorithm called sort, conflict was inevitable. Now range-based programmers can put that behind them and take one more pair of parentheses out of their pipelines.

The breaking up of std.datetime

The std.datetime module has had a reputation as the largest module in D’s standard library. Some developers have been known to use it a stress test for their tooling. It was added to the library long before D got the special package module feature, which allows multiple modules in a package to be imported as a single module.

Once package modules were added, Jonathan M. Davis, the original std.datetime developer, found it challenging to split the monolith into multiple modules. Then, at DConf 2017, he could be seen toiling away on his laptop in the conference hall and the hotel lobby. On the final day of the conference, the day of the DConf Hackathon, he announced that std.datetime was now a package. DMD 2.075.0 is the first release where the new module structure is available.

Any existing code using the old module should still compile. However, any static libraries or object files lying around with the old symbols stuffed inside may need to be recompiled.

Colorized compiler messages

This one is missing from the changelog. DMD now has the ability to output colorized messages. The implementation required going through the existing error messages and properly annotating them where appropriate, so there may well be some messages for which the colors are missing. Also, given that this is a brand new feature and people can be picky about their terminal colors, more work will likely be done on this in the future. Perhaps that might include support for customization.

 

Compiler Ddoc documentation online

DMD, though originally written in C++, was converted to D some time ago. Now that more D programmers are able to contribute to the compiler, work has gone into documenting its source using D’s built-in Ddoc syntax. The result is now online, accessible from the sidebar of the existing library reference. A good starting point is the ddmd.mars module.

And more…

The above is a small part of the bigger picture. The bugfix list shows 89 bugs, regressions, and enhancements across the compiler, runtime, standard library, and web site. See the full changelog for the details.

Thanks to everyone who contributed to this release, whether it was by reporting issues, submitting or reviewing pull requests, testing out the beta, or carrying out any of the numerous small tasks that help a new release see the light of day.

Introspection, Introspection Everywhere

Posted on

Prelude: Orem, UT, May 29 2015

Just finished delivering my keynote. Exiting the character, I’m half dead. People say it needs to look easy. Yeah, just get up there and start saying things. Like it’s natural. Spontaneous. Not for me it’s not. Weeks before any public talk, I can only think of how to handle it. What to say. What angles come up. The questions. The little jokes made in real time. Being consistently spontaneous requires so much rehearsal. The irony.

So all I want now is sneak into my hotel room. Replenish the inner introvert after the ultimate act of extroversion. Lay on the bed thinking “What the heck was that?” for the rest of the day. Bit of slalom to get to the door. Almost there. In the hallway, an animated gentleman talks to a few others. He sports a shaved head and a pair of eyebrows that just won’t quit. Stands just by the door, notices me in the corner of his eye, and it’s instantly clear to both of us he’s waiting for me. Still, he delicately gives me the chance to pretend I didn’t notice and walk around him. “Hi, I’m Andrei.” “Liran, co-founder and CTO of Weka.IO. I’m leaving a bit early and before that I wanted to tell you—you should come visit us in Tel Aviv. We’ve been using D for a year now in a large system. I think you’ll like what you see. I might also arrange you a Google tech talk and visits at a couple of universities. Think it over.”

Tel Aviv, May 8 2017

Coming out of the hotel, heat hits like a punch. We’re talking 41 Celsius (before you pull that calculator: 106 Fahrenheit), if you’re lucky to be in the shade. Zohar, software engineer and factotum extraordinaire at Weka, is driving us on the busy streets of Tel Aviv to his employer’s headquarters. A fascinating exotic place, so far away from my neck of the woods.

First, Liran gives me an overview of their system—a large-scale distributed storage based on flash memory. Not my specialty, so I’m desperately searching my mind for trick questions—Information Theory—wait! peeps a lonely neuron who hasn’t fired since 1993—Reed-Solomon and friends, used to know about it. (Loved that class. The professor was quite a character. Wrote anticommunist samizdat poetry before it was cool. Or even legal. True story.) “How do you deal with data corruption when you have such a low write amplification?” “Glad you asked!” (It’s actually me who’s glad. Yay, I didn’t ask a stupid question.) “We use a redundant encoding with error correction properties; you get to choose the trade-off between redundancy and failure tolerance. At any rate, blind data duplication is mathematically a gross thing to do.” I ask a bunch more questions, and clearly these guys did their homework. The numbers look impressive, too—they beat specialized hardware at virtually all metrics. “What’s the trick?” I finally ask. Liran smiles. “I get that all the time. There’s not one trick. There’s a thousand carefully motivated, principled things we do, from the high level math down to the machine code in the drivers. The trick is to do everything right.”

We now need to hop to my first stop of the tour—Tel Aviv University. Liran accompanies me, partly to see his alma mater after twenty years. Small, intimate audience; it’s the regular meeting of their programming languages research group. A few graduate students, a postdoc, and a couple of professors. The talk goes over smoothly. We get to spend five whole minutes on an oddball—what’s the role of the two semicolons in here?

mixin(op ~ "payload;");

It’s tricky. mixin is a statement so it needs a terminator, hence the semicolon at the very end. In turn, mixin takes a string (the concatenation of variable op, which in this case happens to be either "++" or "--", and "payload;") and passes it to the compiler as a statement. Mix this string in the code, so to say. It follows that the string ultimately compiled is "++payload;" or "--payload;". The trick is the generated statement needs its own semicolon. However, within an expression context, mixin is an expression so no more need for the additional semicolon:

auto x = mixin(op ~ "payload"); // mixin is an expression here, no semicolon!

This seems to leave one researcher a bit unhappy, but I point out that all macro systems have their oddities. He agrees and the meeting ends on a cordial note.

The evening ends with dinner and beers with engineers at Weka. The place is called “Truck Deluxe” and it features an actual food truck parked inside the restaurant. We discuss a million things. I am so galvanized I can only sleep for four hours.

May 9 2017

Omg omg OMG. The alarm clock rings at 6:50 AM, and then again at 7 and 7:10, in ever more annoying tones. Just as I’d set it up anticipating the cunning ways of my consciousness to become alert just enough to carefully press the stop—careful, not the snooze—button, before slumbering again. Somewhat to my surprise I make it in time for meeting Liran to depart to Haifa. Technion University is next.

Small meeting again; we start with a handful of folks, but word goes on the grapevine and a few more join during the act. Nice! I pity Liran—he knows the talk by heart by now, including my jokes. I try to make a couple new ones to entertain him a bit, too. It all goes over nicely, and I get great questions. One that keeps on coming is: how do you debug all that compile-time code? To which I quip: “Ever heard of printf-based debugging? Yeah, I thought so. That’s pretty much what you get to do now, in the following form:”

pragma(msg, string_expression);

The expression is evaluated and printed if and only if the compiler actually “goes through” that line, i.e. you can use it to tell which branch in a static if was taken. It is becoming clear, however, that to scale up Design by Introspection more tooling support would be needed.

Back at Weka, as soon as I get parked by a visitor desk, folks simply start showing up to talk with me, one by one or in small groups. Their simple phonetic names—Eyal, Orem, Tomer, Shachar, Maory, Or, …—are a welcome cognitive offload for yours socially inept truly. They suggest improvements to the language, ask me about better ways to do this or that, and show me code that doesn’t work the way it should.

In this faraway place it’s only now, as soon as I see their code, that I feel at home. They get it! These people don’t use the D language like “whatevs”. They don’t even use it as “let’s make this more interesting.” They’re using it as strategic advantage to beat hardware storage companies at their own game, whilst moving unfairly faster than their software storage competitors. The code is as I’d envisioned the D language would be used at its best for high leverage: introspection, introspection everywhere. Compile time everything that can be done at compile time. It’s difficult to find ten lines of code without a static if in there—the magic fork in design space. I run wc -l and comment on the relatively compact code base. They nod approvingly. “We’ve added a bunch of features since last year, yet the code size has stayed within 5%.” I, too, had noticed that highly introspective code has an odd way of feeding upon itself. Ever more behaviors flow through the same lines.

Their questions and comments are intent, focused, so much unlike the stereotypical sterile forum debate. They have no interest to show off, prove me wrong, or win a theoretical argument; all they need is to do good work. It is obvious to all involved that a better language would help them in the way better materials can help architects and builders. Some template-based idioms are awfully slower to compile than others, how can we develop some tooling to inform us about that? Ideas get bounced. Plans emerge. I put a smudgy finger on the screen: “Hmm, I wonder how that’s done in other languages.” The folks around me chuckle. “We couldn’t have done that in any other language.” I decide to take it as a compliment.

A few of us dine at a fancy restaurant (got to sample the local beer—delicious!), where technical discussions go on unabated. These folks are sharp, full of ideas, and intensely enthusiastic. They fully realize my visit there offers the opportunity to shed collective months of toil from their lives and to propel their work faster. I literally haven’t had ten minutes with myself through the day. The only time I get to check email on my phone is literally when I lock myself into (pardon) a restroom stall. Tomorrow my plan is to sleep in, but there’s so much going on, and so much more yet to come, that again I can only sleep a couple of hours.

May 10 2017

Today’s the big day: the Google Campus Tel Aviv talk. We’re looking at over 160 attendees this evening. But before that there’s more talking to people at Weka. Attribute calculus comes on the table. So for example we have @safe, @trusted, and @system with a little algebra: @safe functions can call only @safe and @trusted functions. Any other function may call any function, and inference works on top of everything. That’s how you encapsulate unsafe code in a large system—all nice and dandy. But how about letting users define their own attributes and calculi? For example, a “may switch fibers” attribute such that functions that yield cannot be called naively. Or a “has acquired a lock” attribute. From here to symbolic computation and execution cost estimation the road is short! To be sure, that would complicate the language. But looking at their code it’s clear how it would be mightily helped by such stuff.

I got a lunchtime talk scheduled with all Weka employees in attendance, but I’m relaxed—by this point it’s just a family reunion. Better yet, one in which I get to be the offbeat uncle. My only concern is novelty—everything I preach, these folks have lived for two years already. Shortly before the talk Liran comes to me and asks “Do you think you have a bit more advanced material for us?” Sorry mate, I’m at capacity over here—can’t produce revolutionary material in the next five minutes. Fortunately the talk goes well in the sense that Design by Introspection formalizes and internalizes the many things they’ve already been doing. They appreciate that, and I get a warm reception and a great Q&A session.

As I get to the Google campus in Tel Aviv, late afternoon, the crowds start to gather. This is big! And I feel like I’d kill myself right now! I mentioned I take weeks to prepare a single public appearance, by the end of which I will definitely have burned a few neurons. And I’m fine with that—it’s the way it’s supposed to be. Problem is, this one week packs seven appearances. The hosts offer me coffee before, during, and after the talk. Coffee in Israel is delicious, but I’m getting past the point where caffeine may help.

Before the talk I get to chit chat incognito with a few attendees. One asks whether a celebrity will be speaking. “No, it’s just me.” He’s nice enough to not display disappointment.

There’s something in the air that tells you immediately whether an audience will be welcoming or not so much. Like a smell. And the scent in this room is fabulously festive. These folks are here to have a good time, and all I need to do to keep the party going is, in the words of John Lakos, “show up, babble, and drool.” The talk goes over so well, they aren’t bored even two hours later. Person in the front row seems continuously confused, asks a bunch of questions. Fortunately we get to an understanding before resorting to the dreaded “let’s take this offline.” Best questions come of course from soft-spoken dudes in the last row. As I poke fun at Mozilla’s CheckedInt, I realize Rust is also Mozilla’s and I fear malice by proxy will be alleged. Too late. (Late at night, I double checked. Mozilla’s CheckedInt is just as bad as I remembered. They do a division to test for multiplication overflow. Come on, put a line of assembler in there! Portability is worth a price, just not any price.) My talk ends just in time for me to not die. I’m happy. If you’re exhausted it means it went well.

May 11 2017

Again with the early wake-up, this time to catch a train to Beersheba. Zohar rides with me. The train is busy with folks from all walks of life. I trip over the barrel of a Galil. Its owner—a girl in uniform—exchanges smiles with me.

Two talks are on the roster today, college followed by Ben Gurion University. The first goes well except one detail. It’s so hot and so many people in the room that the AC decides—hey, y’know, I don’t care much for all that Design by Introspection. It’s so hot, folks don’t even bother to protest my increasingly controversial jokes about various languages. I take the risk to give them a coffee break in the middle thus giving them the opportunity to leave discreetly. To my pleasant surprise, they all return to the sauna for part deux.

The AC works great at Ben Gurion University, but here I’m facing a different problem: it’s the dreaded right-after-lunch spot and some people have difficulty staying awake. Somebody gives up and leaves after 20 minutes. Fortunately a handful of enthused (and probably hungry) students and one professor get into it. The professor really likes the possibilities opened by Design by Introspection and loves the whole macro expansion idea. Asks me a bazillion questions after the talk that make it clear he’s been hacking code generation engines for years. Hope to hear back from him.

Just when I think I’m done, there’s one more small event back at Weka. A few entrepreneurs and executives, friends of Weka’s founders, got wind of their successful use of the D language and were curious to have a sit down with me. One researcher, too. I’m well-prepared for a technical discussion: yes, we know how to do safety. We have a solution in the works for applications that don’t want the garbage collector. “So far so good,” told himself the lamb walking into the slaughterhouse.

To my surprise, the concerns these leaders have are entirely nontechnical. It’s all about community building, leadership, availability of libraries and expertise, website, the “first five minutes” experience, and such. These people have clearly have done their homework; they know the main contributors by name along with their roles, and are familiar with the trendy topics and challenges within the D language community.

“Your package distribution system needs ranking,” mentions a CTO to approving nods from the others. “Downloads, stars, activity—all criteria should be available for sorting and filtering. People say github is better than sourceforge because the latter has a bunch of crap. In fact I’m sure github has even more crap than sourceforge, it’s just that you don’t see it because of ranking.”

“Leadership could be better,” mentions a successful serial entrepreneur. “There’s no shortage of great ideas in the community, and engagement should be toward getting work done on those great ideas. Problem is, great ideas are easy to debate against because almost by definition what makes them great is also what takes them off the beaten path. They’re controversial. There’s risk to them. They may even seem impossible. So those get pecked to death. What gets implemented is good ideas, the less controversial ones. But you don’t want to go with the good ideas. You want the great ideas.”

And so it goes for over three hours. My head is buzzing with excitement as Zohar drives us back to the hotel. The ideas. The opportunities. The responsibility. These people at Weka have a lot riding on the D language. It’s not only money—”not that there’s anything wrong with that!”—but also their hopes, dreams, pride of workmanship. The prime of their careers. We’ve all got our work cut out for us.

I tell Zohar no need to wake up early again tomorrow, I’ll just get a cab to the airport. I reckon a pile of backlogged work is waiting for him. He’s relieved. “Man, I’m so glad you said that. I’m totally pooped after this week. I don’t know how you do it.”

Funny thing is, I don’t know either.

Serialization in D

Posted on

Vladimir Panteleev has spent over a decade using and contributing to D. He is the creator and maintainer of DFeed, the software powering the D forums, has made numerous contributions to Phobos, DRuntime, DMD, and the D website, and has created several tools useful for maintaining D software (like Digger and Dustmite).


A few days ago, I saw this blog post by Justin Turpin on the front page of Hacker News:

The Grass is Always Greener – My Struggles with Rust

This was an interesting coincidence in that it occurred during DConf, where I had mentioned serialization in D a few times during my talk. Naturally, I was curious to see how D stands up to this challenge.

The Task

Justin’s blog starts off with the following Python code:

import configparser
config = ConfigParser()
config.read("config.conf")

This is actually very similar to a pattern I use in many of my D programs. For example, DFeed (the software behind forum.dlang.org), has this code for configuring its built-in web server:

struct ListenConfig
{
    string addr;
    ushort port = 80;
}

struct Config
{
    ListenConfig listen;
    string staticDomain = null;
    bool indexable = false;
}
const Config config;

import ae.utils.sini;
shared static this() { config = loadIni!Config("config/web.ini"); }

This is certainly more code than the Python example, but that’s only the case because I declare the configuration as a D type. The loadIni function then accepts the type as a template parameter and returns an instance of it. The strong typing makes it easier to catch typos and other mistakes in the configuration – an unknown field or a non-numeric value where a number is expected will immediately result in an error.

On the last line, the configuration is saved to a global by a static constructor (shared indicates it runs once during program initialization, instead of once per thread). Even though loadIni‘s return type is mutable, D allows the implicit conversion to const because, as it occurs in a static constructor, it is treated as an initialization.

Traits

The Rust code from Justin’s blog is as follows:

#[macro_use]
extern crate serde_derive;
extern crate toml;

#[derive(Deserialize)]
struct MyConfiguration {
  jenkins_host: String,
  jenkins_username: String,
  jenkins_token: String
}

fn gimme_config(some_filename: &str) -> MyConfiguration {
  let mut file = File::open(some_filename).unwrap();
  let mut s = String::new();
  file.read_to_string(&mut s).unwrap();
  let my_config: MyConfiguration = toml::from_str(s).unwrap();
  my_config
}

The first thing that jumps out to me is that the MyConfiguration struct is annotated with #[derive(Deserialize)]. It doesn’t seem optional, either – quoting Justin:

This was something that actually really discouraged me upon learning, but you cannot implement a trait for an object that you did not also create. That’s a significant limitation, and I thought that one of the main reason Rust decided to go with Traits and Structs instead of standard classes and inheritance was for this very reason. This limitation is also relevant when you’re trying to serialize and deserialize objects for external crates, like a MySQL row.

D allows introspecting the fields and methods of any type at compile-time, so serializing third-party types is not an issue. For example (and I’ll borrow a slide from my DConf talk), deserializing one struct field from JSON looks something like this:

string jsonField = parseJsonString(s);
enforce(s.skipOver(":"), ": expected");

bool found;
foreach (i, ref field; v.tupleof)
{
    enum name = __traits(identifier, v.tupleof[i]);
    if (name == jsonField)
    {
        field = jsonParse!(typeof(field))(s);
        found = true;
        break;
    }
}
enforce(found, "Unknown field " ~ jsonField);

Because the foreach aggregate is a tuple (v.tupleof is a tuple of v‘s fields), the loop will be unrolled at compile time. Then, all that’s left to do is compare each struct field with the field name we got from the JSON stream and, if it matches, read it in. This is a minimal example that can be improved e.g. by replacing the if statements with a switch, which allows the compiler to optimize the string comparisons to hash lookups.

That’s not to say D lacks means for adding functionality to existing types. Although D does not have struct inheritance like C++ or struct traits like Rust, it does have:

  • alias this, which makes wrapping types trivial;
  • opDispatch, allowing flexible customization of forwarding;
  • template mixins, which allow easily injecting functionality into your types;
  • finally, there is of course classic OOP inheritance if you use classes.

Ad-lib and Error Handling

It doesn’t always make sense to deserialize to a concrete type, such as when we only know or care about a small part of the schema. D’s standard JSON module, std.json, currently only allows deserializing to a tree of variant-like types (essentially a DOM). For example:

auto config = readText("config.json").parseJSON;
string jenkinsServer = config["jenkins_server"].str;

The code above is the D equivalent of the code erickt posted on Hacker News:

let config: Value = serde::from_reader(file)
    .expect("config has invalid json");

let jenkins_server = config.get("jenkins_server")
    .expect("jenkins_server key not in config")
    .as_str()
    .expect("jenkins_server key is not a string");

As D generally uses exceptions for error handling, the checks that must be done explicitly in the Rust example are taken care of by the JSON library.

Final thoughts

In the discussion thread for Justin’s post, Reddit user SilverWingedSeraph writes:

You’re comparing a systems language to a scripting language. Things are harder in systems programming because you have more control over, in this case, the memory representation of data. This means there is more friction because you have to specify that information.

This struck me as a false dichotomy. There is no reason why a programming language which has the necessary traits to be classifiable as a system programming language can not also provide the convenience of scripting languages to the extent that it makes sense to do so. For example, D provides type inference and variant types for when you don’t care about strong typing, and garbage collection for when you don’t care about object lifetime, but also provides the tools to get down to the bare metal in the parts of the code where performance matters.

For my personal projects, I’ve greatly enjoyed D’s capability of allowing rapidly prototyping a design, then optimizing the performance-critical parts as needed without having to use a different language to do so.

See also

The New CTFE Engine

Posted on

Stefan Koch is the maintainer of sqlite-d, a native D sqlite reader, and has contributed to projects like SDC (the Stupid D Compiler) and vibe.d. He was also responsible for a 10% performance improvement in D’s current CTFE implementation and is currently writing a new CTFE engine, the subject of this post.


For the past nine months, I’ve been working on a project called NewCTFE, a reimplementation of the Compile-Time Function Evaluation (CTFE) feature of the D compiler front-end. CTFE is considered one of the game-changing features of D.

As the name implies, CTFE allows certain functions to be evaluated by the compiler while it is compiling the source code in which the functions are implemented. As long as all arguments to a function are available at compile time and the function is pure (has no side effects), then the function qualifies for CTFE. The compiler will replace the function call with the result.

Since this is an integral part of the language, pure functions can be evaluated anywhere a compile-time constant may go. A simple example can be found in the standard library module, std.uri, where CTFE is used to compute a lookup table. It looks like this:

private immutable ubyte[128] uri_flags = // indexed by character
({

    ubyte[128] uflags;

    // Compile time initialize
    uflags['#'] |= URI_Hash;

    foreach (c; 'A' .. 'Z' + 1)
    {
        uflags[c] |= URI_Alpha;
        uflags[c + 0x20] |= URI_Alpha; // lowercase letters

    }

    foreach (c; '0' .. '9' + 1) uflags[c] |= URI_Digit;

    foreach (c; ";/?:@&=+$,") uflags[c] |= URI_Reserved;

    foreach (c; "-_.!~*'()") uflags[c] |= URI_Mark;

    return uflags;

})();

Instead of populating the table with magic values, a simple expressive function literal is used. This is much easier to understand and debug than some opaque static array. The ({ starts a function-literal, the }) closes it. The () at the end tells the compiler to immediately evoke that literal such that uri_flags becomes the result of the literal.

Functions are only evaluated at compile time if they need to be. uri_flags in the snippet above is declared in module scope. When a module-scope variable is initialized in this manner, the initializer must be available at compile time. In this case, since the initializer is a function literal, an attempt will be made to perform CTFE. This particular literal has no arguments and is pure, so the attempt succeeds.

For a more in-depth discussion of CTFE, see this article by H. S. Teoh at the D Wiki.

Of course, the same technique can be applied to more complicated problems as well; std.regex, for example, can build a specialized automaton for a regex at compile time using CTFE. However, as soon as std.regex is used with CTFE for non-trivial patterns, compile times can become extremely high (in D everything that takes longer than a second to compile is bloat-ware :)). Eventually, as patterns get more complex, the compiler will run out of memory and probably take the whole system down with it.

The blame for this can be laid at the feet of the current CTFE interpreter’s architecture. It’s an AST interpreter, which means that it interprets the AST while traversing it. To represent the result of interpreted expressions, it uses DMD’s AST node classes. This means that every expression encountered will allocate one or more AST nodes. Within a tight loop, the interpreter can easily generate over 100_000_000 nodes and eat a few gigabytes of RAM. That can exhaust memory quite quickly.

Issue 12844 complains about std.regex taking more than 16GB of RAM. For one pattern. Then there’s issue 6498, which executes a simple 0 to 10_000_000 while loop via CTFE and runs out of memory.

Simply freeing nodes doesn’t fix the problem; we don’t know which nodes to free and enabling the garbage collector makes the whole compiler brutally slow. Luckily there is another approach which doesn’t allocate for every expression encountered. It involves compiling the function to a virtual ISA (Instruction Set Architecture). This virtual ISA, also known as bytecode, is then given to a dedicated interpreter for that ISA (in the case in which a virtual ISA happens to be the same as the ISA of the host, we call it a JIT (Just in Time) interpreter).

The NewCTFE project concerns itself with implementing such a bytecode interpreter. Writing the actual interpreter (a CPU emulator for a virtual CPU/ISA) is reasonably simple. However, compiling code to a virtual ISA is exactly as much work as compiling it to a real ISA (though, a virtual ISA has the added benefit that it can be extended for customized needs, but that makes it harder to do JIT later). That’s why it took a month just to get the first simple examples running on the new CTFE engine, and why slightly more complicated ones still aren’t running even after 9 months of development. At the end of the post, you’ll find an approximate timeline of the work done so far.

I’ll be giving a presentation at DConf 2017, where I’ll discuss my experience implementing the engine and explain some of the technical details, particularly regarding the trade-offs and design choices I’ve made. The current estimation is that the 1.0 goals will not be met by then, but I’ll keep coding away until it’s done.

Those interested in keeping up with my progress can follow my status updates in the D forums. At some point in the future, I will write another article on some of the technical details of the implementation. In the meantime, I hope the following listing does shed some light on how much work it is to implement NewCTFE 🙂

  • May 9th 2016
    Announcement of the plan to improve CTFE.
  • May 27th 2016
    Announcement that work on the new engine has begun.
  • May 28th 2016
    Simple memory management change failed.
  • June 3rd 2016
    Decision to implement a bytecode interpreter.
  • June 30th 2016
    First code (taken from issue 6498) consisting of simple integer arithmetic runs.
  • July 14th 2016
    ASCII string indexing works.
  • July 15th 2016
    Initial struct support
  • Sometime between July and August
    First switches work.
  • August 17th 2016
    Support for the special cases if(__ctfe) and if(!__ctfe)
  • Sometime between August and September
    Ternary expressions are supported
  • September 08th 2016
    First Phobos unit tests pass.
  • September 25th 2016
    Support for returning strings and ternary expressions.
  • October 16th 2016
    First (almost working) version of the LLVM backend.
  • October 30th 2016
    First failed attempts to support function calls.
  • November 01st
    DRuntime unit tests pass for the first time.
  • November 10th 2016
    Failed attempt to implement string concatenation.
  • November 14th 2016
    Array expansion, e.g. assignment to the length property, is supported.
  • November 14th 2016
    Assignment of array indexes is supported.
  • November 18th 2016
    Support for arrays as function parameters.
  • November 19th 2016
    Performance fixes.
  • November 20th 2016
    Fixing the broken while(true) / for (;;) loops; they can now be broken out of 🙂
  • November 25th 2016
    Fixes to goto and switch handling.
  • November 29th 2016
    Fixes to continue and break handling.
  • November 30th 2016
    Initial support for assert
  • December 02nd 2016
    Bailout on void-initialized values (since they can lead to undefined behavior).
  • December 03rd 2016
    Initial support for returning struct literals.
  • December 05th 2016
    Performance fix to the bytecode generator.
  • December 07th 2016
    Fixes to continue and break in for statements (continue must not skip the increment step)
  • December 08th 2016
    Array literals with variables inside are now supported: [1, n, 3]
  • December 08th 2016
    Fixed a bug in switch statements.
  • December 10th 2016
    Fixed a nasty evaluation order bug.
  • December 13th 2016
    Some progress on function calls.
  • December 14th 2016
    Initial support for strings in switches.
  • December 15th 2016
    Assignment of static arrays is now supported.
  • December 17th 2016
    Fixing goto statements (we were ignoring the last goto to any label :)).
  • December 17th 2016
    De-macrofied string-equals.
  • December 20th 2016
    Implement check to guard against dereferencing null pointers (yes… that one was oh so fun).
  • December 22ed 2016
    Initial support for pointers.
  • December 25th 2016
    static immutable variables can now be accessed (yes the result is recomputed … who cares).
  • January 02nd 2017
    First Function calls are supported !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  • January 17th 2017
    Recursive function calls work now 🙂
  • January 23rd 2017
    The interpret3.d unit-test passes.
  • January 24th 2017
    We are green on 64bit!
  • January 25th 2017
    Green on all platforms !!!!! (with blacklisting though)
  • January 26th 2017
    Fixed special case cast(void*) size_t.max (this one cannot go through the normal pointer support, which assumes that you have something valid to dereference).
  • January 26th 2017
    Member function calls are supported!
  • January 31st 2017
    Fixed a bug in switch handling.
  • February 03rd 2017
    Initial function pointer support.
  • Rest of Feburary 2017
    Wild goose chase for issue #17220
  • March 11th 2017
    Initial support for slices.
  • March 15th 2017
    String slicing works.
  • March 18th 2017
    $ in slice expressions is now supported.
  • March 19th 2017
    The concatenation operator (c = a ~ b) works.
  • March 22ed 2017
    Fixed a switch fallthrough bug.