Life in the Fast Lane

The first post I wrote in the GC series introduced the D garbage collector and the language features that use it. Two key points that I tried to get across in the article were:

  1. The GC can only run when memory allocations are requested. Contrary to popular misconception, the D GC isn’t generally going to decide to pause your Minecraft clone in the middle of the hot path. It will only run when memory from the GC heap is requested, and then only if it needs to.
  2. Simple C and C++ allocation strategies can mitigate GC pressure. Don’t allocate memory in inner loops – preallocate as much as possible, or fetch it from the stack instead. Minimize the total number of heap allocations. These strategies work because of point #1 above. The programmer can dictate when it is possible for a collection to occur simply by being smart about when GC heap allocations are made.

The strategies in point #2 are fine for code that a programmer writes herself, but they aren’t going to help at all with third-party libraries. For those situations, D provides built-in mechanisms to guarantee that no GC allocations can occur, both in the language and the runtime. There are also command-line options that can help make sure the GC stays out of the way.

Let’s imagine a hypothetical programmer named J.P. who, for reasons he considers valid, has decided he would like to avoid garbage collection completely in his D program. He has two immediate options.

The GC chill pill

One option is to make a call to GC.disable when the program is starting up. This doesn’t stop allocations, but puts a hold on collections. That means all collections, including any that may result from allocations in other threads.

void main() {
    import core.memory;
    import std.stdio;
    GC.disable;
    writeln("Goodbye, GC!");
}

Output:

Goodbye, GC!

This has the benefit that all language features making use of the GC heap will still work as expected. But, considering that allocations are still going without any cleanup, when you do the math you’ll realize this might be problematic. If allocations start to get out of hand, something’s gotta give. From the documentation:

Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition.

Depending on J.P.’s perspective, this might not be a good thing. But if this constraint is acceptable, there are some additional steps that can help keep things under control. J.P. can make calls to GC.enable or GC.collect as necessary. This provides greater control over collection cycles than the simple C and C++ allocation strategies.
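
For illustration, here's a minimal sketch of what that control might look like (the loop and the throwaway allocation are placeholders standing in for real work):

void main() {
    import core.memory : GC;
    import std.stdio : writeln;

    GC.disable;

    foreach (level; 0 .. 3)
    {
        // Performance-critical work goes here. Allocations still land on the
        // GC heap, but no collection will interrupt this loop.
        auto scratch = new int[](1024);
        scratch[] = level;

        // Between "levels" is a convenient point to clean up manually.
        GC.collect;
    }

    GC.enable;
    writeln("Done");
}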

The GC wall

When the GC is simply intolerable, J.P. can turn to the @nogc attribute. Slap it at the front of the main function and thou shalt suffer no collections.

@nogc
void main() { ... }

This is the ultimate GC mitigation strategy. @nogc applied to main will guarantee that the garbage collector will never run anywhere further along the callstack. No more caveats about collecting “where the implementation deems necessary”.

At first blush, this may appear to be a much better option than GC.disable. Let’s try it out.

@nogc
void main() {
    import std.stdio;
    writeln("GC be gone!");
}

This time, we aren’t going to get past compilation:

Error: @nogc function 'D main' cannot call non-@nogc function 'std.stdio.writeln!string.writeln'

What makes @nogc tick is the compiler’s ability to enforce it. It’s a very blunt approach. If a function is annotated with @nogc, then any function called from inside it must also be annotated with @nogc. As may be obvious, writeln is not.

That’s not all:

@nogc 
void main() {
    auto ints = new int[](100);
}

The compiler isn’t going to let you get away with that one either.

Error: cannot use 'new' in @nogc function 'D main'

Any language feature that allocates from the GC heap is out of reach inside a function marked @nogc (refer to the first post in this series for an overview of those features). It’s turtles all the way down. The big benefit here is the guarantee that third-party code can’t use those features either, so it can’t be allocating GC memory behind your back. The downside is that any third-party library that is not @nogc aware is not going to be available in your program.

Using this approach requires a number of workarounds to make up for non-@nogc language features and library functions, including several in the standard library. Some are trivial, some are not, and others can’t be worked around at all (we’ll dive into the details in a future post). One example that might not be obvious is throwing an exception. The idiomatic way is:

throw new Exception("Blah");

Because of the new in that line, this isn’t possible in @nogc functions. Getting around this requires preallocating any exceptions that will be thrown, which in turn runs into the issue that any exception memory allocated from the regular heap still needs to be deallocated, which leads to ideas of reference counting or stack allocation… In other words, it’s a big can of worms. There’s currently a D Improvement Proposal from Walter Bright intended to stuff all the worms back into the can by making throw new Exception work without the GC when it needs to.
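
To make the shape of the workaround concrete, here's one minimal sketch of preallocating an exception for use in @nogc code (the class name and the static constructor are assumptions for illustration; preallocated exceptions come with their own caveats, such as sharing one mutable instance across threads):

class ConfigError : Exception {
    this(string msg) { super(msg); }
}

// Allocated once by code that is allowed to use the GC (here, a static
// constructor; it could just as well happen early in a non-@nogc main).
ConfigError configError;
static this() { configError = new ConfigError("Blah"); }

@nogc
void checkConfig(bool ok) {
    // Throwing an already-allocated object requires no GC allocation.
    if (!ok) throw configError;
}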

It’s not an insurmountable task to get around the limitations of @nogc main; it just requires a good bit of motivation and dedication.

One more thing to note about @nogc main is that it doesn’t banish the GC from the program completely. D has support for static constructors and destructors. The former are executed by the runtime before entering main and the latter upon exiting. If any of these exist in the program and are not annotated with @nogc, then GC allocations and collections can technically be present in the program. Still, @nogc applied to main means there won’t be any collections running once main is entered, so it’s effectively the same as having no GC at all.

Working it out

Here’s where I’m going to offer an opinion. There’s a wide range of programs that can be written in D without disabling or cutting the GC off completely. The simple strategies of minimizing GC allocations and keeping them out of the hot path will get a lot of mileage and should be preferred. It can’t be repeated enough given how often it’s misunderstood: D’s GC will only have a chance to run when the programmer allocates GC memory and it will only run if it needs to. Use that knowledge to your advantage by keeping the allocations small, infrequent, and isolated outside your inner loops.

For those programs where more control is actually needed, it probably isn’t going to be necessary to avoid the GC entirely. Judicious use of @nogc and/or the core.memory.GC API can often serve to avoid any performance issues that may arise. Don’t put @nogc on main, put it on the functions where you really want to disallow GC allocations. Don’t call GC.disable at the beginning of the program. Call it instead before entering a critical path, then call GC.enable when leaving that path. Force collections at strategic points, such as between game levels, with GC.collect.
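
A rough sketch of that pattern (updateWorld and the frame count are stand-ins for real work):

void updateWorld() { /* the hot path; any GC allocations here go uncollected */ }

void runCriticalPath() {
    import core.memory : GC;

    GC.disable;
    scope(exit) GC.enable;   // re-enable even if something throws

    foreach (frame; 0 .. 1000)
        updateWorld();
}

void main() {
    import core.memory : GC;

    runCriticalPath();
    GC.collect;              // a strategic point to force a collection
}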

As with any performance tuning strategy in software development, it pays to understand as fully as possible what’s actually happening under the hood. Adding calls to the core.memory.GC API in places where you think they make sense could potentially make the GC do needless work, or have no impact at all. Better understanding can be achieved with a little help from the toolchain.

The DRuntime GC option --DRT-gcopt=profile:1 can be passed to a compiled program (not to the compiler!) for some tune-up assistance. This will report some useful GC profiling data, such as the total number of collections and the total collection time.

To demonstrate, gcstat.d appends twenty values to a dynamic array of integers.

void main() {
    import std.stdio;
    int[] ints;
    foreach(i; 0 .. 20) {
        ints ~= i;
    }
    writeln(ints);
}

Compiling and running with the GC profile switch:

dmd gcstat.d
gcstat --DRT-gcopt=profile:1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
        Number of collections:  1
        Total GC prep time:  0 milliseconds
        Total mark time:  0 milliseconds
        Total sweep time:  0 milliseconds
        Total page recovery time:  0 milliseconds
        Max Pause Time:  0 milliseconds
        Grand total GC time:  0 milliseconds
GC summary:    1 MB,    1 GC    0 ms, Pauses    0 ms <    0 ms

This reports one collection, which almost certainly happened as the program was shutting down. The runtime terminates the GC as it exits which, in the current implementation, will generally trigger a collection. This is done primarily to run destructors on collected objects, even though D does not require destructors of GC-allocated objects to ever be run (a topic for a future post).

DMD supports a command-line option, -vgc, that will display every GC allocation in a program, including those that are hidden behind language features like the array append operator.

To demonstrate, take a look at inner.d:

void printInts(int[] delegate() dg)
{
    import std.stdio;
    foreach(i; dg()) writeln(i);
} 

void main() {
    int[] ints;
    auto makeInts() {
        foreach(i; 0 .. 20) {
            ints ~= i;
        }
        return ints;
    }

    printInts(&makeInts);
}

Here, makeInts is an inner function. A pointer to a non-static inner function is not a function pointer, but a delegate (a context pointer/function pointer pair; if an inner function is static, a pointer of type function is produced instead). In this particular case, the delegate makes use of a variable in its parent scope. Here’s the output of compiling with -vgc:

dmd -vgc inner.d
inner.d(11): vgc: operator ~= may cause GC allocation
inner.d(7): vgc: using closure causes GC allocation

What we’re seeing here is that memory needs to be allocated so that the delegate can carry the state of ints, making it a closure (which is not itself a type – the type is still delegate). Move the declaration of ints inside the scope of makeInts and recompile. You’ll find that the closure allocation goes away. A better option is to change the declaration of printInts to look like this:

void printInts(scope int[] delegate() dg)

Adding scope to any function parameter ensures that any references in the parameter cannot be escaped. In other words, it now becomes impossible to do something like assign dg to a global variable, or return it from the function. The effect is that there is no longer a need to create a closure, so there will be no allocation. See the documentation for more on function pointers, delegates and closures, and function parameter storage classes.

The gist

Given that the D GC is very different from those in languages like Java and C#, it’s certain to have different performance characteristics. Moreover, D programs tend to produce far less garbage than those written in a language like Java, where almost everything is a reference type. It helps to understand this when embarking on a D project for the first time. The strategies an experienced Java programmer uses to mitigate the impact of collections aren’t likely to apply here.

While there is certainly a class of software in which no GC pauses are ever acceptable, that is an arguably small set. Most D projects can, and should, start out with the simple mitigation strategies from point #2 at the top of this article, then adapt the code to use @nogc or core.memory.GC as and when performance dictates. The command-line options demonstrated here can help ferret out the areas where that may be necessary.

As time goes by, it’s going to become easier to micromanage garbage collection in D programs. There’s a concerted effort underway to make Phobos, D’s standard library, as @nogc-friendly as possible. Language improvements such as Walter’s proposal to modify how exceptions are allocated should speed that work considerably.

Future posts in this series will look at how to allocate memory outside of the GC heap and use it alongside GC allocations in the same program, how to compensate for disabled language features in @nogc code, strategies for handling the interaction of the GC with object destructors, and more.

Thanks to Vladimir Panteleev, Guillaume Piolat, and Steven Schveighoffer for their valuable feedback on drafts of this article.

The article has been amended to remove a misleading line about Java and C#, and to add some information about multiple threads.

Compile-Time Sort in D

Björn Fahller recently wrote a blog post showing how to implement a compile-time quicksort in C++17. It’s a skillful demonstration that employs the evolving C++ feature set to write code that, while not quite concise, is more streamlined than previous iterations. He concludes with, “…the usefulness of this is very limited, but it is kind of cool, isn’t it?”

There’s quite a bit of usefulness to be found in evaluating code during compilation. The coolness (of which there is much) arises from the possibilities that come along with it. Starting from Björn’s example, this post sets out to teach a few interesting aspects of compile-time evaluation in the D programming language.

The article came to my attention from Russel Winder’s provocative query in the D forums, “Surely D can do better”, which was quickly resolved with a “No Story”-style answer by Andrei Alexandrescu. “There is nothing to do really. Just use standard library sort,” he quipped, and followed with code:

Example 1

void main() {
    import std.algorithm, std.stdio;
    enum a = [ 3, 1, 2, 4, 0 ];
    static b = sort(a);
    writeln(b); // [0, 1, 2, 3, 4]
}

Though it probably won’t be obvious to those unfamiliar with D, the call to sort really is happening at compile time. Let’s see why.

Compile-time code is runtime code

It’s true. There are no hurdles to jump over to get things running at compile time in D. Any compile-time function is also a runtime function and can be executed in either context. However, not all runtime functions qualify for CTFE (Compile-Time Function Evaluation).

The fundamental requirements for CTFE eligibility are that a function must be portable, free of side effects, contain no inline assembly, and the source code must be available. Beyond that, the only thing deciding whether a function is evaluated during compilation vs. at run time is the context in which it’s called.

The CTFE Documentation includes the following statement:

In order to be executed at compile time, the function must appear in a context where it must be so executed…

It then lists a few examples of where that is true. What it boils down to is this: if a function can be executed in a compile-time context where it must be, then it will be. When it can’t be executed (it doesn’t meet the CTFE requirements, for example), the compiler will emit an error.

Breaking down the compile-time sort

Take a look once more at Example 1.

void main() {
    import std.algorithm, std.stdio;
    enum a = [ 3, 1, 2, 4, 0 ];
    static b = sort(a);
    writeln(b);
}

The points of interest that enable the CTFE magic here are lines 3 and 4.

The enum in line 3 is a manifest constant. It differs from other constants in D (those marked immutable or const) in that it exists only at compile time. Any attempt to take its address is an error. If it’s never used, then its value will never appear in the code.

When an enum is used, the compiler essentially pastes its value in place of the symbol name.

enum xinit = 10;
int x = xinit;

immutable yinit = 11;
int y = yinit;

Here, x is initialized to the literal 10. It’s identical to writing int x = 10. The constant yinit is initialized with an int literal, but y is initialized with the value of yinit, which, though known at compile time, is not a literal itself. yinit will exist at run time, but xinit will not.

In Example 1, the static variable b is initialized with the manifest constant a. In the CTFE documentation, this is listed as an example scenario in which a function must be evaluated during compilation. A static variable declared in a function can only be initialized with a compile-time value. Trying to compile this:

Example 2

void main() {
    int x = 10;
    static y = x;
}

Will result in this:

Error: variable x cannot be read at compile time

Using a function call to initialize a static variable means the function must be executed at compile time and, therefore, it will be if it qualifies.

Those two pieces of the puzzle, the manifest constant and the static initializer, explain why the call to sort in Example 1 happens at compile time without any metaprogramming contortions. In fact, the example could be made one line shorter:

Example 3

void main() {
    import std.algorithm, std.stdio;
    static b = sort([ 3, 1, 2, 4, 0 ]);
    writeln(b);
}

And if there’s no need for b to stick around at run time, it could be made an enum instead of a static variable:

Example 4

void main() {
    import std.algorithm, std.stdio;
    enum b = sort([ 3, 1, 2, 4, 0 ]);
    writeln(b);
}

In both cases, the call to sort will happen at compile time, but they handle the result differently. Consider that, due to the nature of enums, the change will produce an equivalent of this: writeln([ 0, 1, 2, 3, 4 ]). Because the call to writeln happens at run time, the array literal might trigger a GC allocation (though it could be, and sometimes will be, optimized away). With the static initializer, there is no runtime allocation, as the result of the function call is used at compile time to initialize the variable.

It’s worth noting that sort isn’t directly returning a value of type int[]. Take a peek at the documentation and you’ll discover that what it’s giving back is a SortedRange. Specifically in our usage, it’s a SortedRange!(int[], "a < b"). This type, like arrays in D, exposes all of the primitives of a random-access range, but additionally provides functions that only work on sorted ranges and can take advantage of their ordering (e.g. trisect). The array is still in there, but wrapped in an enhanced API.
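
Here's a small sketch of what that wrapped API buys you (contains and trisect are SortedRange operations that rely on the ordering guarantee; the exact output formatting is incidental):

void main() {
    import std.algorithm : sort;
    import std.stdio : writeln;

    static b = sort([ 3, 1, 2, 4, 0 ]);

    // The random-access range primitives still work...
    writeln(b[0], " ", b.length);   // 0 5

    // ...plus operations that take advantage of the sorted property.
    writeln(b.contains(3));         // true, found via binary search
    writeln(b.trisect(2)[1]);       // the slice of elements equal to 2
}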

To CTFE or not to CTFE

I mentioned above that all compile-time functions are also runtime functions. Sometimes, it's useful to distinguish between the two inside the function itself. D allows you to do that with the __ctfe variable. Here's an example from my book, 'Learning D'.

Example 5

string genDebugMsg(string msg) {
    if(__ctfe)
        return "CTFE_" ~ msg;
    else
        return "DBG_" ~ msg;
}

pragma(msg, genDebugMsg("Running at compile-time."));
void main() {
    import std.stdio : writeln;
    writeln(genDebugMsg("Running at runtime."));
}

The msg pragma prints a message to stderr at compile time. When genDebugMsg is called as its second argument here, then inside that function the variable __ctfe will be true. When the function is then called as an argument to writeln, which happens in a runtime context, __ctfe is false.

It's important to note that __ctfe is not a compile-time constant, so it can't be used in constructs like static if. A function has no static knowledge of whether it's being executed at compile time or at run time; during CTFE it's simply being evaluated by an interpreter that runs inside the compiler, and __ctfe is what lets it distinguish the two contexts at evaluation time. The result of the function, however, will be a compile-time value when it's executed at compile time.

Complex compile-time validation

Now let's look at something that doesn't use an out-of-the-box function from the standard library.

A few years back, Andrei published 'The D Programming Language'. In the section describing CTFE, he implemented three functions that could be used to validate the parameters passed to a hypothetical linear congruential generator. The idea is that the parameters must meet a set of criteria, which he lays out in the book (buy it for the commentary -- it's well worth it), for generating the largest period possible. Here they are, minus the unit tests:

Example 6

// Implementation of Euclid’s algorithm
ulong gcd(ulong a, ulong b) { 
    while (b) {
        auto t = b; b = a % b; a = t;
    }
    return a; 
}

ulong primeFactorsOnly(ulong n) {
    ulong accum = 1;
    ulong iter = 2;
    for (; n >= iter * iter; iter += 2 - (iter == 2)) {
        if (n % iter) continue;
        accum *= iter;
        do n /= iter; while (n % iter == 0);
    }
    return accum * n;
}

bool properLinearCongruentialParameters(ulong m, ulong a, ulong c) { 
    // Bounds checking
    if (m == 0 || a == 0 || a >= m || c == 0 || c >= m) return false; 
    // c and m are relatively prime
    if (gcd(c, m) != 1) return false;
    // a - 1 is divisible by all prime factors of m
    if ((a - 1) % primeFactorsOnly(m)) return false;
    // If a - 1 is multiple of 4, then m is a multiple of 4, too. 
    if ((a - 1) % 4 == 0 && m % 4) return false;
    // Passed all tests
    return true;
}

The key point this code was intended to make is the same one I made earlier in this post: properLinearCongruentialParameters is a function that can be used in both a compile-time context and a runtime context. There's no special syntax required to make it work, no need to create two distinct versions.

Want to implement a linear congruential generator as a templated struct with the RNG parameters passed as template arguments? Use properLinearCongruentialParameters to validate the parameters. Want to implement a version that accepts the arguments at run time? properLinearCongruentialParameters has got you covered. Want to implement an RNG that can be used at both compile time and run time? You get the picture.
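
As a bare-bones sketch of the template-argument case (the struct itself is hypothetical, not from the book; the point is the template constraint, which forces CTFE of the validator):

struct LinearCongruentialGenerator(ulong m, ulong a, ulong c)
    if (properLinearCongruentialParameters(m, a, c))   // checked at compile time
{
    ulong front;
    enum bool empty = false;   // an infinite range

    this(ulong seed) { front = seed % m; popFront(); }
    void popFront() { front = (a * front + c) % m; }
}

unittest {
    // Compiles only because the parameters pass the compile-time validation.
    auto rng = LinearCongruentialGenerator!(1UL << 32, 1664525, 1013904223)(42);
}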

For completeness, here's an example of validating parameters in both contexts.

Example 7

void main() {
    import std.stdio : writeln;

    enum ulong m = 1UL << 32, a = 1664525, c = 1013904223;
    static ctVal = properLinearCongruentialParameters(m, a, c);
    writeln(properLinearCongruentialParameters(m, a, c));
}

If you've been paying attention, you'll know that ctVal must be initialized at compile time, so it forces CTFE on the call to the function. And the call to the same function as an argument to writeln happens at run time. You can have your cake and eat it, too.

Conclusion

Compile-Time Function Evaluation in D is both convenient and painless. It can be combined with other features such as templates (it's particularly useful with template parameters and constraints), string mixins and import expressions to simplify what might otherwise be extremely complex code, some of which wouldn't even be possible in many languages without a preprocessor. As a bonus, Stefan Koch is currently working on a more performant CTFE engine for the D frontend to make it even more convenient. Stay tuned here for more news on that front.

Thanks to the multiple members of the D community who reviewed this article.

Project Highlight: excel-d

Ever had the need to write an Excel plugin? Check this out.

Atila Neves opened his lightning talk at DConf 2017 like this:

I’m going to talk about how you can write Excel add-ins in D. Don’t ask me why. It’s just because people need it.

From there he goes into a quick intro on how to write plugins for Excel and gives a taste of what it looks like to register a single function in a C++ implementation:

Excel12f(
    xlfRegister, NULL, 11,          // 11: Number of args
    &xDLL,                          // name of the DLL
    TempStr12(L"Fibonacci"),        // procedure name
    TempStr12(L"UU"),               // type signature
    TempStr12(L"Compute to..."),    // argument text
    TempStr12(L"1"),                // macro type
    TempStr12(L"Generic Add-In"),   // category
    TempStr12(L""),                 // shortcut text
    TempStr12(L""),                 // help topic
    TempStr12(L"Number to compute to"), // function help
    TempStr12(L"Computes the nth Fibonacci number") // arg help 
);

Two things to note about this. First, Excel12f is a C++ function (wrapping a C API) that must be called in an add-in DLL’s entry point (xlAutoOpen) for each function that needs to be registered when the add-in is loaded by Excel. For a small plugin, the implementations of any registered functions might be in the same source file, but in a larger one they might be located somewhere else entirely, which can make keeping registrations and implementations in sync a maintenance headache. Also take note of all the comments used to document the function arguments, a common sight in C and C++ code bases.

The example D code Atila showed using excel-d is a world of difference:

@Register(
    ArgumentText("Array"),
    HelpTopic("Length of Array"),
    FunctionHelp("Length of an Array"),
    ArgumentHelp(["array"])
)
double DoublesLength(double[] arg) {
    return arg.length;
}

Here, the boilerplate for the registration is being generated at compile-time via a User Defined Attribute, which is used to annotate the function. Implementation and registration are happening in the same place. Another key difference is that the UDA has fields with descriptive names, eliminating the need to comment each argument. Finally, the UDA only requires four arguments, nine less than the C++ function. This is because it makes use of D’s compile-time introspection features to figure out as much as it possibly can and, at the same time, optional arguments (like the shortcut text) can just be omitted.

Since this is a Project Highlight on the D Blog, we’re going to ignore Atila’s opening request and ask, “Why?” There are actually two parts to that. First, why Excel?

Our customers are traders, and they work with Excel as one of their main tools. They need/want to, amongst other things, receive live stock updates in a cell and have their formulae automatically update. There’s other functionality they’d like to have and that means adding this to Excel somehow.

Of all the possible languages that could be used for this purpose, the business chose D. That brings us to the second part of the question: why D?

This is possible in Visual Basic, Python or C#, and possibly other languages. But none of them match D’s performance. C++ does, but it’s tedious and requires a lot of boilerplate to get going. D combines the speed and power of C++ with the reflection capabilities of those other languages. No boilerplate, just code, runs fast.

There’s more to the story, of course. The company is heavily invested in D.

We use D for nearly everything, even some “scripts”. The bulk of it is calculations for market indicators. Lots of data in -> munge -> new data out that needs to look pretty for traders. So integrating with existing code was an important factor.

Even though excel-d is targeting Windows, much of it was actually developed on Linux.

We use a Linux container as our reference development machine, but people use what they want. I do nearly all of my work on Linux and only boot into Windows when I have to. For the Excel work, that’s a necessity. But, as usual for me, I wrote all the tests to be platform agnostic, so I do the Excel development on Linux and test there. Every now and again that means a particular quirk of Excel wasn’t captured well in my mocking code, but it’s usually a quick fix after that.

He says they use both DMD and LDC for development, and both are running in continuous integration.

Although DMD doesn’t technically require Visual Studio to be installed (out of the box, it generates 32-bit OMF objects, and uses the OPTLINK linker, rather than the VS-compatible COFF), anyone doing serious work on Windows is going to need VS (or the Visual Studio Build Tools and the Windows SDK) for 64-bit and 32-bit COFF support. The latest LDC binary releases currently require the MS tools (support for MinGW was dropped, but according to the D Wiki, could be picked up again if there’s a champion for it).

Atila already had VS on his Windows partition. For this project, he got a bit of help from the VS plugin for D, Visual D.

I had to install VisualD because our reference project for Excel was in a Visual Studio solution, but afterwards I reverse engineered the build and didn’t open Visual Studio ever again.

Currently, excel-d has no support for custom dialogs or menus. Both items are on his TODO list.

If you’re working with D and need to write an Excel add-in, or want to try something cleaner than C++ to do so, excel-d is available in the DUB package registry. If not, the sponsors of the project, Kaleidic Associates and Symmetry Investments, have made several other open source projects available. They are interested in hiring talented hackers with a moral compass who aspire towards excellence and would like to work in D.

excel-d was developed by Stefan Koch and Atila Neves.

Faster Command Line Tools in D

Jon Degenhardt is a member of eBay’s search team focusing on recall, ranking, and search engine design. He is also the author of eBay’s TSV utilities, an open source data mining toolkit for delimited files. The toolkit is written in D and is quite fast. Much of its performance derives from approaches like those described here.


This post will show how a few simple D programming constructs can turn an already fast command line tool into one that really screams, and in ways that retain the inherent simplicity of the original program. The techniques used are applicable to many programming problems, not just command line tools. This post describes how these methods work and why they are effective. A simple programming exercise is used to illustrate these optimizations. Applying the optimizations cuts the run-time by more than half.

Task: Aggregate values in a delimited file based on a key

It’s a common programming task: Take a data file with fields separated by a delimiter (comma, tab, etc), and run a mathematical calculation involving several of the fields. Often these programs are one-time use scripts, other times they have longer shelf life. Speed is of course appreciated when the program is used more than a few times on large files.

The specific exercise we’ll explore starts with files having keys in one field, integer values in another. The task is to sum the values for each key and print the key with the largest sum. For example:

A   4
B   5
B   8
C   9
A   6

With the first field as key, second field as value, the key with the max sum is B, with a total of 13.

Fields are delimited by a TAB, and there may be any number of fields on a line. The file name and field numbers of the key and value are passed as command line arguments. Below is a Python program to do this:

max_column_sum_by_key.py

#!/usr/bin/env python

import argparse
import fileinput
import collections

def main():
    parser = argparse.ArgumentParser(description='Sum a column.')
    parser.add_argument('file', type=open)
    parser.add_argument('key_field_index', type=int)
    parser.add_argument('value_field_index', type=int)

    args = parser.parse_args()
    delim = '\t'

    max_field_index = max(args.key_field_index, args.value_field_index)
    sum_by_key = collections.Counter()

    for line in args.file:
        fields = line.rstrip('\n').split(delim)
        if max_field_index < len(fields):
            sum_by_key[fields[args.key_field_index]] += int(fields[args.value_field_index])

    max_entry = sum_by_key.most_common(1)
    if len(max_entry) == 0:
        print 'No entries'
    else:
        print 'max_key:', max_entry[0][0], 'sum:', max_entry[0][1]

if __name__ == '__main__':
    main()

(Note: For brevity, error handling is largely omitted from programs shown.)

The program follows a familiar paradigm. A dictionary (collections.Counter) holds the cumulative sum for each key. The file is read one line at a time, splitting each line into an array of fields. The key and value are extracted. The value field is converted to an integer and added to the cumulative sum for the key. After the program processes all of the lines, it extracts the entry with the largest value from the dictionary.

The D program, first try

It’s a common way to explore a new programming language: write one of these simple programs and see what happens. Here’s a D version of the program, using perhaps the most obvious approach:

max_column_sum_by_key_v1.d

int main(string[] args)
{
    import std.algorithm : max, maxElement;
    import std.array : split;
    import std.conv : to;
    import std.stdio;

    if (args.length < 4)
    {
        writefln ("synopsis: %s filename keyfield valuefield", args[0]);
        return 1;
    }

    string filename = args[1];
    size_t keyFieldIndex = args[2].to!size_t;
    size_t valueFieldIndex = args[3].to!size_t;
    size_t maxFieldIndex = max(keyFieldIndex, valueFieldIndex);
    char delim = '\t';

    long[string] sumByKey;

    foreach(line; filename.File.byLine)
    {
        auto fields = line.split(delim);
        if (maxFieldIndex < fields.length)
        {
            string key = fields[keyFieldIndex].to!string;
            sumByKey[key] += fields[valueFieldIndex].to!long;
        }
    }

    if (sumByKey.length == 0) writeln("No entries");
    else
    {
        auto maxEntry = sumByKey.byKeyValue.maxElement!"a.value";
        writeln("max_key: ", maxEntry.key, " sum: ", maxEntry.value);
    }
    return 0;
}

Processing is basically the same as the Python program. An associative array (long[string] sumByKey) holds the cumulative sum for each key. Like the Python program, it splits each line into an array of fields, extracts the key and value fields, and updates the cumulative sum. Finally, it retrieves and prints the entry with the maximum value.

We will measure performance using an ngram file from the Google Books project: googlebooks-eng-all-1gram-20120701-0 (ngrams.tsv in these runs). This file is 10.5 million lines, 183 MB. Each line has four fields: the ngram, year, total occurrence count, and the number of books the ngram appeared in. Visit the ngram viewer dataset page for more information. The file chosen is for unigrams starting with the digit zero. Here are a few lines from the file:

0       1898    114067  6140
0       1906    208805  7933
0       1922    204995  9042
0.5     1986    143398  13938
0.5     1999    191449  19262

The year (second column) is used as the key, and the total occurrence count (third column) as the value. There are 414 distinct years in the data file.

The LDC compiler is used to build the D programs, as it generates fast code:

$ ldc2 -release -O max_column_sum_by_key_v1.d

Here are the commands to perform the task:

$ max_column_sum_by_key.py ngrams.tsv 1 2   # Python program
max_key: 2006 sum: 22569013

$ max_column_sum_by_key_v1 ngrams.tsv 1 2   # D program
max_key: 2006 sum: 22569013

(Note: These programs use field numbers starting at zero.)

The time command was used to measure performance. e.g. $ time max_column_sum_by_key.py ngrams.tsv 1 2. On the author’s MacBook Pro, the Python version takes 12.6 seconds, the D program takes 3.2 seconds. This makes sense as the D program is compiled to native code. But suppose we run the Python program with PyPy, a just-in-time Python compiler? This gives a result of 2.4 seconds, actually beating the D program, with no changes to the Python code. Kudos to PyPy, this is an impressive result. But we can still do better with our D program.

Second version: Using splitter

The first key to improved performance is to switch from using split to splitter. The split function is “eager”, in that it constructs and returns the entire array of fields. Eventually the storage for these fields needs to be deallocated. splitter is “lazy”. It operates by returning an input range that iterates over the fields one-at-a-time. We can take advantage of that by avoiding constructing the entire array, and instead keeping a single field at a time in a reused local variable. Here is an augmented program that does this, the main change being the introduction of an inner loop iterating over each field:

max_column_sum_by_key_v2.d

int main(string[] args)
{
    import std.algorithm : max, maxElement, splitter;
    import std.conv : to;
    import std.range : enumerate;
    import std.stdio;

    if (args.length < 4)
    {
        writefln ("synopsis: %s filename keyfield valuefield", args[0]);
        return 1;
    }

    string filename = args[1];
    size_t keyFieldIndex = args[2].to!size_t;
    size_t valueFieldIndex = args[3].to!size_t;
    size_t maxFieldIndex = max(keyFieldIndex, valueFieldIndex);
    string delim = "\t";

    long[string] sumByKey;

    foreach(line; filename.File.byLine)
    {
        string key;
        long value;
        bool allFound = false;

        foreach (i, field; line.splitter(delim).enumerate)
        {
            if (i == keyFieldIndex) key = field.to!string;
            if (i == valueFieldIndex) value = field.to!long;
            if (i == maxFieldIndex) allFound = true;
        }

        if (allFound) sumByKey[key] += value;
    }

    if (sumByKey.length == 0) writeln("No entries");
    else
    {
        auto maxEntry = sumByKey.byKeyValue.maxElement!"a.value";
        writeln("max_key: ", maxEntry.key, " sum: ", maxEntry.value);
    }
    return 0;
}

The modified program is quite a bit faster, running in 1.8 seconds, a 44% improvement. Insight into what changed can be seen by using the --DRT-gcopt=profile:1 command line option. This turns on garbage collection profiling, shown below (output edited for brevity):

$ max_column_sum_by_key_v1 --DRT-gcopt=profile:1 ngrams.tsv 1 2
max_key: 2006 sum: 22569013
        Number of collections:  132
        Grand total GC time:  246 milliseconds
GC summary:   35 MB,  132 GC  246 ms

$ max_column_sum_by_key_v2 --DRT-gcopt=profile:1 ngrams.tsv 1 2
max_key: 2006 sum: 22569013
      Number of collections:  167
      Grand total GC time:  101 milliseconds
GC summary:    5 MB,  167 GC  101 ms

(Note: The --DRT-gcopt=profile:1 parameter is invisible to normal option processing.)

The reports show two key differences. One is the ‘max pool memory’, the first value shown on the “GC summary line”. The significantly lower value indicates less memory is being allocated. The other is the total time spent in collections. The improvement, 145ms, only accounts for a small portion of the 1.4 seconds that were shaved off by the second version. However, there are other costs associated with storage allocation. Note that allocating and reclaiming storage has a cost in any memory management system. This is not limited to systems using garbage collection.

Also worth mentioning is the role D’s slices play. When splitter returns the next field, it is not returning a copy of characters in the line. Instead, it is returning a “slice”. The data type is a char[], which is effectively a pointer to a location in the input line and a length. No characters have been copied. When the next field is fetched, the variable holding the slice is updated (pointer and length), a faster operation than copying a variable-length array of characters. This is a remarkably good fit for processing delimited files, as identifying the individual fields can be done without copying the input characters.
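
A small standalone illustration of this (an assumed example, not from the article): the fields produced by splitter are slices pointing back into the original line's buffer.

void main() {
    import std.algorithm : splitter;
    import std.stdio : writeln;

    char[] line = "2006\t22569013\t414".dup;

    foreach (field; line.splitter("\t"))
    {
        // Each field is a slice: a pointer into line plus a length.
        // No characters were copied.
        assert(field.ptr >= line.ptr &&
               field.ptr + field.length <= line.ptr + line.length);
        writeln(field);
    }
}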

Third version: The splitter / Appender combo

Switching to splitter was a big speed win, but came with a less convenient programming model. Extracting specific fields while iterating over them is cumbersome, more so as additional fields are needed. Fortunately, the simplicity of random access arrays can be reclaimed by using an Appender. Here is a revised program:

max_column_sum_by_key_v3.d

int main(string[] args)
{
    import std.algorithm : max, maxElement, splitter;
    import std.array : appender;
    import std.conv : to;
    import std.stdio;

    if (args.length < 4)
    {
        writefln ("synopsis: %s filename keyfield valuefield", args[0]);
        return 1;
    }

    string filename = args[1];
    size_t keyFieldIndex = args[2].to!size_t;
    size_t valueFieldIndex = args[3].to!size_t;
    size_t maxFieldIndex = max(keyFieldIndex, valueFieldIndex);
    string delim = "\t";

    long[string] sumByKey;
    auto fields = appender!(char[][])();

    foreach(line; filename.File.byLine)
    {
        fields.clear;
        fields.put(line.splitter(delim));
        if (maxFieldIndex < fields.data.length)
        {
            string key = fields.data[keyFieldIndex].to!string;
            sumByKey[key] += fields.data[valueFieldIndex].to!long;
        }
    }

    if (sumByKey.length == 0) writeln("No entries");
    else
    {
        auto maxEntry = sumByKey.byKeyValue.maxElement!"a.value";
        writeln("max_key: ", maxEntry.key, " sum: ", maxEntry.value);
    }
    return 0;
}

The Appender instance in this program works by keeping a growable array of char[] slices. The lines:

    fields.clear;
    fields.put(line.splitter(delim));

at the top of the foreach loop do the work. The statement fields.put(line.splitter(delim)) iterates over each field, one at a time, appending each slice to the array. This will allocate storage on the first input line. On subsequent lines, the fields.clear statement comes into play. It clears data from the underlying data store, but does not deallocate it. Appending starts again at position zero, but reusing the storage allocated on the first input line. This regains the simplicity of indexing a materialized array. GC profiling shows no change from the previous version of the program.

Copying additional slices does incur a performance penalty. The resulting program takes 2.0 seconds, versus 1.8 for the previous version. This is still quite a significant improvement over the original program (down from 3.2 seconds, 37% faster), and it represents a good compromise for many programs.

Fourth version: Associative Array (AA) lookup optimization

The splitter / Appender combo gave significant performance improvement while retaining the simplicity of the original code. However, the program can still be faster. GC profiling indicates storage is still being allocated and reclaimed. The source of the allocations is the following two lines in the inner loop:

    string key = fields.data[keyFieldIndex].to!string;
    sumByKey[key] += fields.data[valueFieldIndex].to!long;

The first line converts fields.data[keyFieldIndex], a char[], to a string. The string type is immutable, char[] is not, so the conversion must make a copy. That copy is necessary: the characters in the fields.data buffer are valid only while the current line is processed, and they will be overwritten when the next line is read. Therefore, the characters forming the key need to be copied when added to the associative array, and the associative array enforces this by requiring immutable keys.

While it is necessary to store the key as an immutable value, it is not necessary to use immutable values to retrieve existing entries. This creates the opportunity for an improvement: only copy the key when creating the initial entry. Here’s a change to the same lines that does this:

    char[] key = fields.data[keyFieldIndex];
    long fieldValue = fields.data[valueFieldIndex].to!long;

    if (auto sumValuePtr = key in sumByKey) *sumValuePtr += fieldValue;
    else sumByKey[key.to!string] = fieldValue;

The expression key in sumByKey returns a pointer to the value in the hash table, or null if the key was not found. If an entry was found, it is updated directly, without copying the key. Updating via the returned pointer avoids a second associative array lookup. A new string is allocated for a key only the first time it is seen.

The updated program runs in 1.4 seconds, an improvement of 0.6 seconds (30%). GC profiling reflects the change:

$ ./max_column_sum_by_key_v4 --DRT-gcopt=profile:1 ngrams.tsv 1 2
max_key: 2006 sum: 22569013
        Number of collections:  2
        Grand total GC time:  0 milliseconds
GC summary:    5 MB,    2 GC    0 ms

This indicates that unnecessary storage allocation has been eliminated from the main loop.

Note: The program will still allocate and reclaim storage as part of rehashing the associative array. This shows up on GC reports when the number of unique keys is larger.

Early termination of the field iteration loop

The optimizations described so far work by reducing unnecessary storage allocation. These are by far the most beneficial optimizations discussed in this document. Another small but obvious enhancement would be to break out of the field iteration loops after all needed fields have been processed. In version 2, using splitter, the inner loop becomes:

    foreach (i, field; line.splitter(delim).enumerate)
    {
        if (i == keyFieldIndex) key = field.to!string;
        if (i == valueFieldIndex) value = field.to!long;
        if (i == maxFieldIndex)
        {
            allFound = true;
            break;
        }
    }

This produced a 0.1 second improvement. A small gain, but one that will be larger in use cases where a greater number of trailing fields can be skipped.

The same optimization can be applied to the splitter / Appender combo. The D standard library provides a convenient way to do this: the take method. It returns an input range with at most N elements, effectively short circuiting traversal. The change is to the fields.put(line.splitter(delim)) line:

    import std.range : take;
    ...
    fields.put(line.splitter(delim).take(maxFieldIndex + 1));

Putting it all together

The final version of our program is below, adding take for early field iteration termination to version 4 (splitter, Appender, associative array optimization). For a bit more speed, drop Appender and use the manual field iteration loop shown in version two (version 5 in the results table at the end of this article).

max_column_sum_by_key_v4b.d

int main(string[] args)
{
    import std.algorithm : max, maxElement, splitter;
    import std.array : appender;
    import std.conv : to;
    import std.range : take;
    import std.stdio;

    if (args.length < 4)
    {
        writefln ("synopsis: %s filename keyfield valuefield", args[0]);
        return 1;
    }

    string filename = args[1];
    size_t keyFieldIndex = args[2].to!size_t;
    size_t valueFieldIndex = args[3].to!size_t;
    size_t maxFieldIndex = max(keyFieldIndex, valueFieldIndex);
    string delim = "\t";

    long[string] sumByKey;
    auto fields = appender!(char[][])();

    foreach(line; filename.File.byLine)
    {
        fields.clear;
        fields.put(line.splitter(delim).take(maxFieldIndex + 1));
        if (maxFieldIndex < fields.data.length)
        {
            char[] key = fields.data[keyFieldIndex];
            long fieldValue = fields.data[valueFieldIndex].to!long;

            if (auto sumValuePtr = key in sumByKey) *sumValuePtr += fieldValue;
            else sumByKey[key.to!string] = fieldValue;
        }
    }

    if (sumByKey.length == 0) writeln("No entries");
    else
    {
        auto maxEntry = sumByKey.byKeyValue.maxElement!"a.value";
        writeln("max_key: ", maxEntry.key, " sum: ", maxEntry.value);
    }
    return 0;
}

Summary

This exercise demonstrates several straightforward ways to speed up command line programs. The common theme: avoid unnecessary storage allocation and data copies. The results are dramatic, more than doubling the speed of an already quick program. They are also a reminder of the crucial role memory plays in high performance applications.

Of course, these themes apply to many applications, not just command line tools. They are hardly specific to the D programming language. However, several of D’s features proved especially well suited to minimizing both storage allocation and data copies. This includes ranges, dynamic arrays, and slices, which are related concepts, and lazy algorithms, which operate on them. All were used in the programming exercise.

The table below compares the running times of each of the programs tested:

Program           What                                    Time (sec)
Python program    Run with Python 2                       12.6
Python program    Run with PyPy                            2.4
D version 1       Using split                              3.2
D version 2       Replace split with splitter              1.8
D version 3       splitter/Appender combo                  2.0
D version 4       splitter/Appender, AA optimization       1.4
D version 4b      Version 4 plus take                      1.3
D version 5       splitter, AA optimization, loop exit     1.1

The author thanks Ali Çehreli, Steven Schveighoffer, and Steve Schneider for providing valuable input to this article.

Introspection, Introspection Everywhere

Prelude: Orem, UT, May 29 2015

Just finished delivering my keynote. Exiting the character, I’m half dead. People say it needs to look easy. Yeah, just get up there and start saying things. Like it’s natural. Spontaneous. Not for me it’s not. Weeks before any public talk, I can only think of how to handle it. What to say. What angles come up. The questions. The little jokes made in real time. Being consistently spontaneous requires so much rehearsal. The irony.

So all I want now is sneak into my hotel room. Replenish the inner introvert after the ultimate act of extroversion. Lay on the bed thinking “What the heck was that?” for the rest of the day. Bit of slalom to get to the door. Almost there. In the hallway, an animated gentleman talks to a few others. He sports a shaved head and a pair of eyebrows that just won’t quit. Stands just by the door, notices me in the corner of his eye, and it’s instantly clear to both of us he’s waiting for me. Still, he delicately gives me the chance to pretend I didn’t notice and walk around him. “Hi, I’m Andrei.” “Liran, co-founder and CTO of Weka.IO. I’m leaving a bit early and before that I wanted to tell you—you should come visit us in Tel Aviv. We’ve been using D for a year now in a large system. I think you’ll like what you see. I might also arrange you a Google tech talk and visits at a couple of universities. Think it over.”

Tel Aviv, May 8 2017

Coming out of the hotel, heat hits like a punch. We’re talking 41 Celsius (before you pull that calculator: 106 Fahrenheit), if you’re lucky to be in the shade. Zohar, software engineer and factotum extraordinaire at Weka, is driving us on the busy streets of Tel Aviv to his employer’s headquarters. A fascinating exotic place, so far away from my neck of the woods.

First, Liran gives me an overview of their system—a large-scale distributed storage based on flash memory. Not my specialty, so I’m desperately searching my mind for trick questions—Information Theory—wait! peeps a lonely neuron who hasn’t fired since 1993—Reed-Solomon and friends, used to know about it. (Loved that class. The professor was quite a character. Wrote anticommunist samizdat poetry before it was cool. Or even legal. True story.) “How do you deal with data corruption when you have such a low write amplification?” “Glad you asked!” (It’s actually me who’s glad. Yay, I didn’t ask a stupid question.) “We use a redundant encoding with error correction properties; you get to choose the trade-off between redundancy and failure tolerance. At any rate, blind data duplication is mathematically a gross thing to do.” I ask a bunch more questions, and clearly these guys did their homework. The numbers look impressive, too—they beat specialized hardware at virtually all metrics. “What’s the trick?” I finally ask. Liran smiles. “I get that all the time. There’s not one trick. There’s a thousand carefully motivated, principled things we do, from the high level math down to the machine code in the drivers. The trick is to do everything right.”

We now need to hop to my first stop of the tour—Tel Aviv University. Liran accompanies me, partly to see his alma mater after twenty years. Small, intimate audience; it’s the regular meeting of their programming languages research group. A few graduate students, a postdoc, and a couple of professors. The talk goes over smoothly. We get to spend five whole minutes on an oddball—what’s the role of the two semicolons in here?

mixin(op ~ "payload;");

It’s tricky. mixin is a statement so it needs a terminator, hence the semicolon at the very end. In turn, mixin takes a string (the concatenation of variable op, which in this case happens to be either "++" or "--", and "payload;") and passes it to the compiler as a statement. Mix this string in the code, so to say. It follows that the string ultimately compiled is "++payload;" or "--payload;". The trick is the generated statement needs its own semicolon. However, within an expression context, mixin is an expression so no more need for the additional semicolon:

auto x = mixin(op ~ "payload"); // mixin is an expression here, no semicolon!

This seems to leave one researcher a bit unhappy, but I point out that all macro systems have their oddities. He agrees and the meeting ends on a cordial note.

The evening ends with dinner and beers with engineers at Weka. The place is called “Truck Deluxe” and it features an actual food truck parked inside the restaurant. We discuss a million things. I am so galvanized I can only sleep for four hours.

May 9 2017

Omg omg OMG. The alarm clock rings at 6:50 AM, and then again at 7 and 7:10, in ever more annoying tones. Just as I’d set it up anticipating the cunning ways of my consciousness to become alert just enough to carefully press the stop—careful, not the snooze—button, before slumbering again. Somewhat to my surprise I make it in time for meeting Liran to depart to Haifa. Technion University is next.

Small meeting again; we start with a handful of folks, but word goes on the grapevine and a few more join during the act. Nice! I pity Liran—he knows the talk by heart by now, including my jokes. I try to make a couple new ones to entertain him a bit, too. It all goes over nicely, and I get great questions. One that keeps on coming is: how do you debug all that compile-time code? To which I quip: “Ever heard of printf-based debugging? Yeah, I thought so. That’s pretty much what you get to do now, in the following form:”

pragma(msg, string_expression);

The expression is evaluated and printed if and only if the compiler actually “goes through” that line, i.e. you can use it to tell which branch in a static if was taken. It is becoming clear, however, that to scale up Design by Introspection more tooling support would be needed.
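
A tiny sketch of that style of tracing (an assumed example): only the branch the compiler actually takes produces output.

enum useFastPath = true;   // imagine this being computed by introspection

void process()
{
    static if (useFastPath)
    {
        pragma(msg, "process: compiling the fast path");
        // ... fast-path implementation ...
    }
    else
    {
        pragma(msg, "process: compiling the fallback path");
        // ... fallback implementation ...
    }
}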

Back at Weka, as soon as I get parked by a visitor desk, folks simply start showing up to talk with me, one by one or in small groups. Their simple phonetic names—Eyal, Orem, Tomer, Shachar, Maory, Or, …—are a welcome cognitive offload for yours socially inept truly. They suggest improvements to the language, ask me about better ways to do this or that, and show me code that doesn’t work the way it should.

In this faraway place it’s only now, as soon as I see their code, that I feel at home. They get it! These people don’t use the D language like “whatevs”. They don’t even use it as “let’s make this more interesting.” They’re using it as strategic advantage to beat hardware storage companies at their own game, whilst moving unfairly faster than their software storage competitors. The code is as I’d envisioned the D language would be used at its best for high leverage: introspection, introspection everywhere. Compile time everything that can be done at compile time. It’s difficult to find ten lines of code without a static if in there—the magic fork in design space. I run wc -l and comment on the relatively compact code base. They nod approvingly. “We’ve added a bunch of features since last year, yet the code size has stayed within 5%.” I, too, had noticed that highly introspective code has an odd way of feeding upon itself. Ever more behaviors flow through the same lines.

Their questions and comments are intent, focused, so much unlike the stereotypical sterile forum debate. They have no interest to show off, prove me wrong, or win a theoretical argument; all they need is to do good work. It is obvious to all involved that a better language would help them in the way better materials can help architects and builders. Some template-based idioms are awfully slower to compile than others, how can we develop some tooling to inform us about that? Ideas get bounced. Plans emerge. I put a smudgy finger on the screen: “Hmm, I wonder how that’s done in other languages.” The folks around me chuckle. “We couldn’t have done that in any other language.” I decide to take it as a compliment.

A few of us dine at a fancy restaurant (got to sample the local beer—delicious!), where technical discussions go on unabated. These folks are sharp, full of ideas, and intensely enthusiastic. They fully realize my visit there offers the opportunity to shed collective months of toil from their lives and to propel their work faster. I literally haven’t had ten minutes with myself through the day. The only time I get to check email on my phone is literally when I lock myself into (pardon) a restroom stall. Tomorrow my plan is to sleep in, but there’s so much going on, and so much more yet to come, that again I can only sleep a couple of hours.

May 10 2017

Today’s the big day: the Google Campus Tel Aviv talk. We’re looking at over 160 attendees this evening. But before that there’s more talking to people at Weka. Attribute calculus comes on the table. So for example we have @safe, @trusted, and @system with a little algebra: @safe functions can call only @safe and @trusted functions. Any other function may call any function, and inference works on top of everything. That’s how you encapsulate unsafe code in a large system—all nice and dandy. But how about letting users define their own attributes and calculi? For example, a “may switch fibers” attribute such that functions that yield cannot be called naively. Or a “has acquired a lock” attribute. From here to symbolic computation and execution cost estimation the road is short! To be sure, that would complicate the language. But looking at their code it’s clear how it would be mightily helped by such stuff.
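
To make the existing algebra concrete, here is a minimal sketch (the function names are invented for illustration):

@system void pokeHardware()
{
    int[4] buffer;
    int* p = buffer.ptr;
    p += 10;                // pointer arithmetic like this is not allowed in @safe code
}

@trusted void checkedPoke() // manually reviewed; callable from @safe code
{
    pokeHardware();
}

@safe void application()
{
    checkedPoke();          // fine: @safe may call @safe and @trusted
    // pokeHardware();      // error: @safe function cannot call @system function
}

The user-defined attributes discussed here—“may switch fibers”, “has acquired a lock”—would extend this same pattern to calculi the language does not know about today.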

I got a lunchtime talk scheduled with all Weka employees in attendance, but I’m relaxed—by this point it’s just a family reunion. Better yet, one in which I get to be the offbeat uncle. My only concern is novelty—everything I preach, these folks have lived for two years already. Shortly before the talk Liran comes to me and asks “Do you think you have a bit more advanced material for us?” Sorry mate, I’m at capacity over here—can’t produce revolutionary material in the next five minutes. Fortunately the talk goes well in the sense that Design by Introspection formalizes and internalizes the many things they’ve already been doing. They appreciate that, and I get a warm reception and a great Q&A session.

As I get to the Google campus in Tel Aviv, late afternoon, the crowds start to gather. This is big! And I feel like I’d kill myself right now! I mentioned I take weeks to prepare a single public appearance, by the end of which I will definitely have burned a few neurons. And I’m fine with that—it’s the way it’s supposed to be. Problem is, this one week packs seven appearances. The hosts offer me coffee before, during, and after the talk. Coffee in Israel is delicious, but I’m getting past the point where caffeine may help.

Before the talk I get to chit chat incognito with a few attendees. One asks whether a celebrity will be speaking. “No, it’s just me.” He’s nice enough to not display disappointment.

There’s something in the air that tells you immediately whether an audience will be welcoming or not so much. Like a smell. And the scent in this room is fabulously festive. These folks are here to have a good time, and all I need to do to keep the party going is, in the words of John Lakos, “show up, babble, and drool.” The talk goes over so well, they aren’t bored even two hours later. Person in the front row seems continuously confused, asks a bunch of questions. Fortunately we get to an understanding before resorting to the dreaded “let’s take this offline.” Best questions come of course from soft-spoken dudes in the last row. As I poke fun at Mozilla’s CheckedInt, I realize Rust is also Mozilla’s and I fear malice by proxy will be alleged. Too late. (Late at night, I double checked. Mozilla’s CheckedInt is just as bad as I remembered. They do a division to test for multiplication overflow. Come on, put a line of assembler in there! Portability is worth a price, just not any price.) My talk ends just in time for me to not die. I’m happy. If you’re exhausted it means it went well.

May 11 2017

Again with the early wake-up, this time to catch a train to Beersheba. Zohar rides with me. The train is busy with folks from all walks of life. I trip over the barrel of a Galil. Its owner—a girl in uniform—exchanges smiles with me.

Two talks are on the roster today, college followed by Ben Gurion University. The first goes well except for one detail. It’s so hot and there are so many people in the room that the AC decides—hey, y’know, I don’t care much for all that Design by Introspection. It’s so hot, folks don’t even bother to protest my increasingly controversial jokes about various languages. I take the risk of giving them a coffee break in the middle, thus giving them the opportunity to leave discreetly. To my pleasant surprise, they all return to the sauna for part deux.

The AC works great at Ben Gurion University, but here I’m facing a different problem: it’s the dreaded right-after-lunch spot and some people have difficulty staying awake. Somebody gives up and leaves after 20 minutes. Fortunately a handful of enthused (and probably hungry) students and one professor get into it. The professor really likes the possibilities opened by Design by Introspection and loves the whole macro expansion idea. Asks me a bazillion questions after the talk that make it clear he’s been hacking code generation engines for years. Hope to hear back from him.

Just when I think I’m done, there’s one more small event back at Weka. A few entrepreneurs and executives, friends of Weka’s founders, got wind of their successful use of the D language and were curious to have a sit down with me. One researcher, too. I’m well-prepared for a technical discussion: yes, we know how to do safety. We have a solution in the works for applications that don’t want the garbage collector. “So far so good,” told himself the lamb walking into the slaughterhouse.

To my surprise, the concerns these leaders have are entirely nontechnical. It’s all about community building, leadership, availability of libraries and expertise, the website, the “first five minutes” experience, and such. These people have clearly done their homework; they know the main contributors by name along with their roles, and are familiar with the trendy topics and challenges within the D language community.

“Your package distribution system needs ranking,” mentions a CTO to approving nods from the others. “Downloads, stars, activity—all criteria should be available for sorting and filtering. People say github is better than sourceforge because the latter has a bunch of crap. In fact I’m sure github has even more crap than sourceforge, it’s just that you don’t see it because of ranking.”

“Leadership could be better,” mentions a successful serial entrepreneur. “There’s no shortage of great ideas in the community, and engagement should be toward getting work done on those great ideas. Problem is, great ideas are easy to debate against because almost by definition what makes them great is also what takes them off the beaten path. They’re controversial. There’s risk to them. They may even seem impossible. So those get pecked to death. What gets implemented is good ideas, the less controversial ones. But you don’t want to go with the good ideas. You want the great ideas.”

And so it goes for over three hours. My head is buzzing with excitement as Zohar drives us back to the hotel. The ideas. The opportunities. The responsibility. These people at Weka have a lot riding on the D language. It’s not only money—”not that there’s anything wrong with that!”—but also their hopes, dreams, pride of workmanship. The prime of their careers. We’ve all got our work cut out for us.

I tell Zohar no need to wake up early again tomorrow, I’ll just get a cab to the airport. I reckon a pile of backlogged work is waiting for him. He’s relieved. “Man, I’m so glad you said that. I’m totally pooped after this week. I don’t know how you do it.”

Funny thing is, I don’t know either.

DConf 2017 Ex Post Facto

Another May, another DConf in the rear view. This year’s edition, organized and hosted once again by the talented folks from Sociomantic, and emceed for the second consecutive year by their own Dylan Cromwell, brought more talks, more fun, and an extra day. The livestream this year, barring a glitch or two with the audio, went almost perfectly. And for the first time in DConf history, videos of the talks started appearing online almost as soon as the conference was over. The entire playlist is available now.

As usual, there were three days of talks. The first opened with a keynote by Walter and the last with one by Andrei (he gave a longer version of the same talk at Google’s Tel Aviv campus a few days later). The keynote from Scott Meyers on Day Two, in his second DConf appearance, was actually the second talk of the day thanks to some technical issues. He told everyone about the things he finds most important in software development, a talk recommended for any developer no matter their language of preference.

The keynotes were followed by presentations from a mix of DConf veterans and first-time speakers. This year, livestream viewers were treated to some special segments during the lunch breaks. On Day One, Luís Marques showed off a live demo using D as a hardware description language, which had been the subject of his presentation just before the lunch break (he used a Papilio Pro from Gadget Factory in his demo, and the company was kind enough to provide an FPGA for one lucky attendee to take home). On Day Two, Vladimir Panteleev, after being shuffled from the second spot to the first in the lineup, gave a livestream demo of concepts he had discussed in his talk on Practical Metaprogramming. And on the last day of presentations, Bastiaan Veelo presented the livestream audience with an addendum to his talk, Extending Pegged to Parse Another Programming Language.

Day Two closed out with a panel discussion featuring Scott, Walter and Andrei.

Scott Meyers, Walter Bright, and Andrei Alexandrescu in a panel discussion moderated by Dylan Cromwell.

It was during this discussion that Walter made the claim that memory safety will be the bane of C, eliciting skepticism from Scott Meyers. That exchange prompted more discussion on /r/programming almost two weeks later.

The newest segment of the event this year came in the form of an extra day given over entirely to the DConf Hackathon. As originally envisioned, it was never intended to be a hackathon in the traditional sense. The sole purpose was for members of the D community to get together face-to-face and hash out the pain points, issues, and new ideas they feel strongly about. DConf 2016 had featured a “Birds of a Feather” session, with the goal of hashing out a specific category of issues, as part of the regular talk lineup. It didn’t work out as intended. The hackathon, suggested by Sebastian Wilzbach, was conceived as an expansion of and improvement upon that concept.

The initial plan was to present attendees with a list of issues that need resolving in the D ecosystem, allow them to suggest some more, and break off into teams to solve them. Sebastian put a lot of effort into a shared document everyone could add their ideas and their names to. As it turned out, participants flowed naturally through the venue, working, talking, and just getting things done. Some worked in groups, others worked alone. Some, rather than actively coding, hashed out thorny issues through debate, others provided informal tutoring or advice. In the end, it was a highly productive day. Perhaps the most tangible result was, as Walter put it, “a tsunami of pull requests.” It’s already a safe bet that the Hackathon will become a DConf tradition.

In the evenings between it all, there was much food, drink, and discussion to be had. It was in this “downtime” where more ideas were thrown around. Some brought their laptops to hack away in the hotel lobby, working on pet projects or implementing and testing ideas as they were discussed. It was here where old relationships were strengthened and new ones formed. This aspect of DConf can never be overstated.

A small selection of more highlights that came out of the four days of DConf 2017:

  • Walter gave the green light to change the D logo and a strategy was devised for moving forward.
  • Jonathan Davis finally managed to get std.datetime split from a monolithic module into a package of smaller modules.
  • Some contentious issues regarding workflow in the core repositories were settled.
  • Vladimir Panteleev gave DustMite the ability to reduce diffs.
  • Timon Gehr implemented static foreach (yay!).
  • Ali Çehreli finished updating his book Programming in D to 2.074.0.
  • Nemanja Boric fixed the broken exception handling on FreeBSD-CURRENT.
  • Steven Schveighoffer and Atila Neves earned their wizard hats for submitting their first pull requests to DMD.
  • Adrian Matoga and Sönke Ludwig (and probably others) worked on fixing issues with DUB.
  • Progress was made on the D compiler-as-a-library front.

This is far from an exhaustive list. The venue was a hive of activity during that last day, and who knows what else was accomplished in the halls, restaurants, and hotel lobbies. This short list only scratches the surface.

A very big Thank You to everyone at Sociomantic who treated us to another spectacular DConf. We’re already looking forward to DConf 2018!

Thanks Sociomantic!

Serialization in D

Vladimir Panteleev has spent over a decade using and contributing to D. He is the creator and maintainer of DFeed, the software powering the D forums, has made numerous contributions to Phobos, DRuntime, DMD, and the D website, and has created several tools useful for maintaining D software (like Digger and Dustmite).


A few days ago, I saw this blog post by Justin Turpin on the front page of Hacker News:

The Grass is Always Greener – My Struggles with Rust

This was an interesting coincidence in that it occurred during DConf, where I had mentioned serialization in D a few times during my talk. Naturally, I was curious to see how D stands up to this challenge.

The Task

Justin’s blog starts off with the following Python code:

import configparser
config = configparser.ConfigParser()
config.read("config.conf")

This is actually very similar to a pattern I use in many of my D programs. For example, DFeed (the software behind forum.dlang.org), has this code for configuring its built-in web server:

struct ListenConfig
{
    string addr;
    ushort port = 80;
}

struct Config
{
    ListenConfig listen;
    string staticDomain = null;
    bool indexable = false;
}
const Config config;

import ae.utils.sini;
shared static this() { config = loadIni!Config("config/web.ini"); }

This is certainly more code than the Python example, but that’s only the case because I declare the configuration as a D type. The loadIni function then accepts the type as a template parameter and returns an instance of it. The strong typing makes it easier to catch typos and other mistakes in the configuration – an unknown field or a non-numeric value where a number is expected will immediately result in an error.

On the last line, the configuration is saved to a global by a static constructor (shared indicates it runs once during program initialization, instead of once per thread). Even though loadIni‘s return type is mutable, D allows the implicit conversion to const because, as it occurs in a static constructor, it is treated as an initialization.

Traits

The Rust code from Justin’s blog is as follows:

#[macro_use]
extern crate serde_derive;
extern crate toml;

#[derive(Deserialize)]
struct MyConfiguration {
  jenkins_host: String,
  jenkins_username: String,
  jenkins_token: String
}

fn gimme_config(some_filename: &str) -> MyConfiguration {
  let mut file = File::open(some_filename).unwrap();
  let mut s = String::new();
  file.read_to_string(&mut s).unwrap();
  let my_config: MyConfiguration = toml::from_str(s).unwrap();
  my_config
}

The first thing that jumps out to me is that the MyConfiguration struct is annotated with #[derive(Deserialize)]. It doesn’t seem optional, either – quoting Justin:

This was something that actually really discouraged me upon learning, but you cannot implement a trait for an object that you did not also create. That’s a significant limitation, and I thought that one of the main reason Rust decided to go with Traits and Structs instead of standard classes and inheritance was for this very reason. This limitation is also relevant when you’re trying to serialize and deserialize objects for external crates, like a MySQL row.

D allows introspecting the fields and methods of any type at compile-time, so serializing third-party types is not an issue. For example (and I’ll borrow a slide from my DConf talk), deserializing one struct field from JSON looks something like this:

string jsonField = parseJsonString(s);
enforce(s.skipOver(":"), ": expected");

bool found;
foreach (i, ref field; v.tupleof)
{
    enum name = __traits(identifier, v.tupleof[i]);
    if (name == jsonField)
    {
        field = jsonParse!(typeof(field))(s);
        found = true;
        break;
    }
}
enforce(found, "Unknown field " ~ jsonField);

Because the foreach aggregate is a tuple (v.tupleof is a tuple of v‘s fields), the loop will be unrolled at compile time. Then, all that’s left to do is compare each struct field with the field name we got from the JSON stream and, if it matches, read it in. This is a minimal example that can be improved e.g. by replacing the if statements with a switch, which allows the compiler to optimize the string comparisons to hash lookups.

That’s not to say D lacks means for adding functionality to existing types. Although D does not have struct inheritance like C++ or struct traits like Rust, it does have:

  • alias this, which makes wrapping types trivial (see the sketch after this list);
  • opDispatch, allowing flexible customization of forwarding;
  • template mixins, which allow easily injecting functionality into your types;
  • finally, there is of course classic OOP inheritance if you use classes.
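
As a small illustration of the first item above, here is a hedged sketch of wrapping a third-party type with alias this; the type and the toJson helper are hypothetical:

import std.conv : to;

// Pretend this struct comes from a library we cannot modify.
struct ThirdParty
{
    int value;
}

// Our wrapper: alias this forwards member access to the wrapped instance,
// while we remain free to bolt on members of our own.
struct Wrapped
{
    ThirdParty inner;
    alias inner this;

    string toJson() const
    {
        return `{"value":` ~ inner.value.to!string ~ `}`;
    }
}

unittest
{
    Wrapped w;
    w.value = 42;                         // resolved through alias this
    assert(w.toJson() == `{"value":42}`);
}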

Ad-lib and Error Handling

It doesn’t always make sense to deserialize to a concrete type, such as when we only know or care about a small part of the schema. D’s standard JSON module, std.json, currently only allows deserializing to a tree of variant-like types (essentially a DOM). For example:

auto config = readText("config.json").parseJSON;
string jenkinsServer = config["jenkins_server"].str;

The code above is the D equivalent of the code erickt posted on Hacker News:

let config: Value = serde::from_reader(file)
    .expect("config has invalid json");

let jenkins_server = config.get("jenkins_server")
    .expect("jenkins_server key not in config")
    .as_str()
    .expect("jenkins_server key is not a string");

As D generally uses exceptions for error handling, the checks that must be done explicitly in the Rust example are taken care of by the JSON library.

Final thoughts

In the discussion thread for Justin’s post, Reddit user SilverWingedSeraph writes:

You’re comparing a systems language to a scripting language. Things are harder in systems programming because you have more control over, in this case, the memory representation of data. This means there is more friction because you have to specify that information.

This struck me as a false dichotomy. There is no reason why a programming language which has the necessary traits to be classifiable as a systems programming language cannot also provide the convenience of scripting languages to the extent that it makes sense to do so. For example, D provides type inference and variant types for when you don’t care about strong typing, and garbage collection for when you don’t care about object lifetime, but also provides the tools to get down to the bare metal in the parts of the code where performance matters.

For my personal projects, I’ve greatly enjoyed D’s capability of allowing me to rapidly prototype a design, then optimize the performance-critical parts as needed, without having to use a different language to do so.

automem: Hands-Free RAII for D

Átila Neves has used both C++ and D professionally. He’s responsible for several D libraries and tools, like unit-threaded, cerealed, and reggae.


Garbage collected languages tend to suffer from a framing problem, and D is no exception. Its inclusion of a mark-and-sweep garbage collector makes safe memory management easy and convenient, but, thanks to a widespread perception that GC in general is a performance killer, alienates a lot of potential users due to its mere existence.

Coming to D as I did from C++, the one thing I didn’t like about the language initially was the GC. I’ve since come to realize that my fears were mostly unfounded, but the fact remains that, for many people, the GC is reason enough to avoid the language. Whether or not that is reasonable given their use cases is debatable (and something over which reasonable people may disagree), but the existence of the perception is not.

A lot of work has been done over the years to make sure that D code can be written which doesn’t depend on the GC. The @nogc annotation is especially important here, and I don’t think it’s been publicized enough. A @nogc main function is a compile-time guarantee that the program will not ever allocate any GC memory. For the types of applications that need those sorts of guarantees, this is invaluable.

But if not allocating from the GC heap, where does one get the memory? Still in the experimental package of the standard library, std.experimental.allocator provides building blocks for composing allocators that should satisfy any and all memory allocation needs where the GC is deemed inappropriate. Better still, via the IAllocator interface, one can even switch between GC and custom allocation strategies as needed at runtime.

I’ve recently used std.experimental.allocator in order to achieve @nogc guarantees and, while it works, there’s one area in which the experience wasn’t as smooth as when using C++ or Rust: disposing of memory. Like C++ and Rust, D has RAII. As is usual in all three, it’s considered bad form to explicitly release resources. And yet, in the current state of affairs, while using the D standard library one has to manually dispose of memory if using std.experimental.allocator. D makes it easier than most languages that support exceptions, due to scope(exit), but in a language with RAII that’s just boilerplate. And as the good lazy programmer that I am, I abhor writing code that doesn’t need to be, and shouldn’t be, written. An itch developed.
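
To illustrate the boilerplate in question, here is a minimal sketch (assuming Mallocator as the allocator) of the allocate/dispose dance that std.experimental.allocator requires today:

import std.experimental.allocator : make, dispose;
import std.experimental.allocator.mallocator : Mallocator;

struct Point
{
    int x;
    int y;
}

void useAPoint()
{
    auto p = Mallocator.instance.make!Point(2, 3);
    scope(exit) Mallocator.instance.dispose(p);   // the manual step RAII ought to hide

    assert(p.x == 2 && p.y == 3);
}

It works, but that scope(exit) line is exactly the kind of code that shouldn’t need to be written in a language with RAII.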

The inspiration for the solution I came up with was C++; ever since C++11 I’ve been delighted with using std::unique_ptr and std::shared_ptr and basically never again worrying about manually managing memory. D’s standard library has Unique and RefCounted in std.typecons but they predate std.experimental.allocator and so “bake in” the allocation strategy. Can we have our allocation cake and eat it too?

Enter automem, a library I wrote providing C++-style smart pointers that integrate with std.experimental.allocator. It was clear to me that the design had to be different from the smart pointers it took inspiration from. In C++, it’s assumed that memory is allocated with new and freed with delete (although it’s possible to override both). With custom allocators and no real obvious default choice, I made it so that the smart pointers would allocate memory themselves. This makes it so one can’t allocate with one allocator and deallocate with a different one, which is another benefit.

Another goal was to preserve the possibility for Unique, like std::unique_ptr, to be a zero-cost abstraction. To that end the allocator type must be specified (it defaults to IAllocator); if it’s a value type with no state, then it takes up no space. In fact, if it’s a singleton (determined at compile time by probing whether Allocator.instance exists), then it doesn’t even need to be passed in to the constructor! As in much modern D code, Design by Introspection pays its dues here. Example code:

import automem;
import std.algorithm.mutation : move;
import std.experimental.allocator.mallocator : Mallocator;

struct Point {
    int x;
    int y;
}

{
    // must pass arguments to initialise the contained object,
    // but not an allocator instance, since Mallocator is a
    // singleton (Mallocator.instance returns the only instantiation)
    
    auto u1 = Unique!(Point, Mallocator)(2, 3);
    assert(*u1 == Point(2, 3));
    assert(u1.y == 3); // forwards to the contained object

    // auto u2 = u1; // won't compile, can only move
    typeof(u1) u2;
    move(u1, u2);
    assert(cast(bool)u1 == false); // u1 is now empty
}
// memory freed for the Point structure created in the block

RefCounted is automem’s equivalent of C++’s std::shared_ptr. Unlike std::shared_ptr, however, it doesn’t always do an atomic reference count increment/decrement. The reason is that it leverages D’s type system to determine when it has to: if the payload is shared, then the reference count is changed atomically. If not, it can’t be sent to other threads anyway and the performance penalty doesn’t have to be paid. C++ always does an atomic increment/decrement. Rust gets around this with two types, Arc and Rc. In D the type system disambiguates. Another win for Design by Introspection, something that really is only possible in D. Example code:

{
    auto s1 = RefCounted!(Point, Mallocator)(4, 5);
    assert(*s1 == Point(4, 5));
    assert(s1.x == 4);
    {
        auto s2 = s1; // can be copied, non-atomic reference count
    } // ref count goes to 1 here

} // ref count goes to 0 here, memory released

Given that the allocator type is usually specified, it means that when using a @nogc allocator (most of them), the code using automem can itself be made @nogc, with RAII taking care of any memory management duties. That means compile-time guarantees of no GC allocation for the applications that need them.
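
As a rough sketch of what that looks like in practice (assuming, as the library intends, that Unique paired with Mallocator infers @nogc):

import automem : Unique;
import std.experimental.allocator.mallocator : Mallocator;

struct Point
{
    int x;
    int y;
}

@nogc void noGcWork()
{
    auto p = Unique!(Point, Mallocator)(2, 3);
    assert(p.x == 2 && p.y == 3);
}   // the Point is released here via RAII; no GC involved anywhere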

I hope automem and std.experimental.allocator manage to solve D’s GC framing problem. Now it should be possible to write @nogc code with no manual memory disposal in D, just as it is in C++ and Rust.

The DConf Experience

April 23rd, the deadline for DConf 2017 registrations, is just a few days away. Personally, I wasn’t sure if I’d be able to attend this year or not, but fortunately things worked out. While I’m looking forward to more great presentations this year, that’s not what keeps drawing me back (or makes me regret being unable to attend in 2014 and 2015).

The main draw for me is putting faces to all the GitHub and forum handles, then getting the opportunity to reconnect in person with those I’ve met at past conferences to talk about more than just the D programming language or the projects to which we all contribute.

I’m not the only one who feels that way. Walter Bright had this to say of DConf 2016:

Besides the technical content, I enjoyed the camaraderie of the conference. The pleasant walking and talking going between the Ibis hotel and the conference. The great food and conversations at the event. Chatting at the Ibis after hours.

If you’re staying at the Hotel Ibis Neukölln (which, unfortunately, is booked solid the week of the conference last I heard) and you’re an early riser, you’ll likely find Walter down in the lobby having First Breakfast and up for a good conversation.

Of course, the mingling isn’t necessarily confined to the conference hall or the hotel lobby. Conference goers sometimes share a room in order to save costs. That’s a great opportunity to learn something new about someone you might not otherwise have discovered, as GDC meister Iain Buclaw found out at DConf 2013 when he and Manu Evans were roomies for a few days:

I recall that I had gotten some microwavable food from the nearby groceries ahead of his arrival — “I didn’t know what you would like, or if you had allergies, so I went with the safe option and only got Vegan foods”. It was a pleasant surprise to discover we have a diet in common also.

At that edition, Iain, Manu and Brad Roberts were shuttled to and from the conference by Walter in his rental car. Carpooling has been a common part of the Stateside DConfs. At the venue in Berlin, there’s no need for it. The semi-official hotel is in walking distance and there’s a subway stop nearby for those who are further off. And, as Steven Schveighoffer can attest, rental bicycles are always an option.

I remember last year, Joseph Wakeling, Andrew Edwards and his son, and myself went for a great bike tour of Berlin given by native David Eckardt of Sociomantic. This included the experience of dragging our rented bikes on the subway, and getting turned away by police in riot gear when we had inadvertently tried to go somewhere that political demonstrations were happening. Highly recommended for anyone, especially if David is gracious enough to do the tour again! We randomly met Ethan Watson riding a rented bike along the way too 🙂

Speaking of Ethan, it appears there’s a good reason that he was riding his bike alone and in the wrong direction.

My first BeerConf was in 2016, located in the rather interesting city of Berlin. Every evening, we would gather and drink large amounts of very cheap beer. This would also coincide with some rather enjoyable shenanigans, be they in-depth conversations; pub crawls; more in-depth conversations; convenience store crawls (where the beer is even cheaper); even more in-depth conversations; and a remarkable lack of late night food.

Hours during the day were passed away at a parallel conference that I kinda stumbled in to, DConf. Interestingly, this was stocked with the same characters that I socialised with during BeerConf. Maybe I was drunk then. Maybe I’m drunk now. Did I present a talk there? I think I did, although I can only assume the talk did not have anything to do with beer.

I’ve found out recently that I’m dual-booked for BeerConf 2017 as well as presenting again at DConf. Since attending these events can only be achieved through a submission process my deduction is that I had a thoroughly good time meeting all these people and socialising them, as well as sharing knowledge and technical expertise, and decided I must do it again. Either that, or someone else signed me up. Don’t tell me which option is the truth, I’ll work it out.

Yes, the beer is a big part of it, too. This blog was born at the small bar in the lobby of the Ibis, over a couple of cold ones and the background cacophony of D programmer talk. Who could say what new D initiatives will spawn from the fermented grains this year?

But let’s not forget the presentations. While they’ll be livestreamed and those who don’t attend can watch them on YouTube in perpetuity, the social aspect lends a flavor to them that the videos just can’t. Andrew Edwards is a fan of one speaker in particular.

I’m always fascinated by Don Clugston’s talks. The most memorable is DConf 2013, during which he stated that, “The language made minor steps… We got here like miners following a vein of gold, not by an original master plan,” in his explanation of how D arrived at its powerful metaprogramming capabilities.

Where else can you hear “wonks” recounting their first hand experiences of what makes D “berry, berry good” for them? It is just a great experience talking and exchanging ideas with some of the greatest minds in the D community.

And there’s no substitute for being in the room when something unexpected happens, as it did during Stefan Koch’s lightning talk at DConf 2016, when he wanted to do a bit of live coding on stage.

There was no DMD on the computer 🙂 That surprised me.

Whether your interest is social, technical, alcohological, or otherwise, DConf is just a fun place to be. If you have the time and the means to attend, you’re warmly invited to make your own DConf memories. Do yourself a favor and register before the window closes. If you can’t make it, be sure to follow the livestream (pay attention to the Announce forum for details) or pull up #D on freenode.net to ask questions of the presenters. But we really hope to see you there!

I’ll leave you to consider Andrei Alexandrescu’s take on DConf, which concisely sums up the impact the conference has on many who attend:

To me, DConf has already become the anchoring event that has me inspired and motivated for the whole year. We are a unique programming language community in that we’re entirely decentralized geographically. Even the leadership is spread all over across the two US coasts, Western Europe, Eastern Europe, and Asia. Therefore, the event that brings us all together is powerful – an annual systole of the live, beating heart of a great community.

We don’t lack a good program with strong talks, but to me the best moments are found in the interstices – the communal meals, the long discussions with Walter and others in the hallway, the late nights in the hotel lobby. All of these are much needed accelerated versions of online exchanges.

I’m very much looking forward to this year’s edition. In addition to the usual suspects, we’ll have a strong showing of work done by our graduate scholarship recipients.

Thanks to Walter, Iain, Steven, Ethan, Andrew, Stefan and Andrei for sharing their anecdotes and thoughts.

The New CTFE Engine

Stefan Koch is the maintainer of sqlite-d, a native D sqlite reader, and has contributed to projects like SDC (the Stupid D Compiler) and vibe.d. He was also responsible for a 10% performance improvement in D’s current CTFE implementation and is currently writing a new CTFE engine, the subject of this post.


For the past nine months, I’ve been working on a project called NewCTFE, a reimplementation of the Compile-Time Function Evaluation (CTFE) feature of the D compiler front-end. CTFE is considered one of the game-changing features of D.

As the name implies, CTFE allows certain functions to be evaluated by the compiler while it is compiling the source code in which the functions are implemented. As long as all arguments to a function are available at compile time and the function is pure (has no side effects), the function qualifies for CTFE. The compiler will replace the function call with the result.

Since this is an integral part of the language, pure functions can be evaluated anywhere a compile-time constant may go. A simple example can be found in the standard library module, std.uri, where CTFE is used to compute a lookup table. It looks like this:

private immutable ubyte[128] uri_flags = // indexed by character
({
    ubyte[128] uflags;

    // Compile time initialize
    uflags['#'] |= URI_Hash;

    foreach (c; 'A' .. 'Z' + 1)
    {
        uflags[c] |= URI_Alpha;
        uflags[c + 0x20] |= URI_Alpha; // lowercase letters
    }

    foreach (c; '0' .. '9' + 1) uflags[c] |= URI_Digit;

    foreach (c; ";/?:@&=+$,") uflags[c] |= URI_Reserved;

    foreach (c; "-_.!~*'()") uflags[c] |= URI_Mark;

    return uflags;
})();

Instead of populating the table with magic values, a simple expressive function literal is used. This is much easier to understand and debug than some opaque static array. The ({ starts a function literal, the }) closes it. The () at the end tells the compiler to immediately invoke that literal such that uri_flags becomes the result of the literal.

Functions are only evaluated at compile time if they need to be. uri_flags in the snippet above is declared in module scope. When a module-scope variable is initialized in this manner, the initializer must be available at compile time. In this case, since the initializer is a function literal, an attempt will be made to perform CTFE. This particular literal has no arguments and is pure, so the attempt succeeds.
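
A tiny sketch of that distinction:

int square(int x) pure { return x * x; }

enum atCompileTime = square(4); // an enum needs a compile-time constant, so CTFE runs here

void main()
{
    int n = 4;
    auto atRunTime = square(n); // an ordinary call; no CTFE is involved
}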

For a more in-depth discussion of CTFE, see this article by H. S. Teoh at the D Wiki.

Of course, the same technique can be applied to more complicated problems as well; std.regex, for example, can build a specialized automaton for a regex at compile time using CTFE. However, as soon as std.regex is used with CTFE for non-trivial patterns, compile times can become extremely high (in D everything that takes longer than a second to compile is bloat-ware :)). Eventually, as patterns get more complex, the compiler will run out of memory and probably take the whole system down with it.
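
For reference, the kind of usage in question looks something like this (a minimal sketch with std.regex’s ctRegex; the pattern is arbitrary):

import std.regex;

// The matching engine for this pattern is constructed at compile time via CTFE.
enum emailish = ctRegex!`\w+@\w+\.\w+`;

void main()
{
    assert(!matchFirst("joe@example.com", emailish).empty);
}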

The blame for this can be laid at the feet of the current CTFE interpreter’s architecture. It’s an AST interpreter, which means that it interprets the AST while traversing it. To represent the result of interpreted expressions, it uses DMD’s AST node classes. This means that every expression encountered will allocate one or more AST nodes. Within a tight loop, the interpreter can easily generate over 100_000_000 nodes and eat a few gigabytes of RAM. That can exhaust memory quite quickly.

Issue 12844 complains about std.regex taking more than 16GB of RAM. For one pattern. Then there’s issue 6498, which executes a simple 0 to 10_000_000 while loop via CTFE and runs out of memory.

Simply freeing nodes doesn’t fix the problem; we don’t know which nodes to free, and enabling the garbage collector makes the whole compiler brutally slow. Luckily there is another approach which doesn’t allocate for every expression encountered. It involves compiling the function to a virtual ISA (Instruction Set Architecture). This virtual ISA, also known as bytecode, is then given to a dedicated interpreter for that ISA (in the case where the virtual ISA happens to be the same as the host’s ISA, we call it a JIT (Just in Time) interpreter).

The NewCTFE project concerns itself with implementing such a bytecode interpreter. Writing the actual interpreter (a CPU emulator for a virtual CPU/ISA) is reasonably simple. However, compiling code to a virtual ISA is exactly as much work as compiling it to a real ISA (though, a virtual ISA has the added benefit that it can be extended for customized needs, but that makes it harder to do JIT later). That’s why it took a month just to get the first simple examples running on the new CTFE engine, and why slightly more complicated ones still aren’t running even after 9 months of development. At the end of the post, you’ll find an approximate timeline of the work done so far.

I’ll be giving a presentation at DConf 2017, where I’ll discuss my experience implementing the engine and explain some of the technical details, particularly regarding the trade-offs and design choices I’ve made. The current estimation is that the 1.0 goals will not be met by then, but I’ll keep coding away until it’s done.

Those interested in keeping up with my progress can follow my status updates in the D forums. At some point in the future, I will write another article on some of the technical details of the implementation. In the meantime, I hope the following listing does shed some light on how much work it is to implement NewCTFE 🙂

  • May 9th 2016
    Announcement of the plan to improve CTFE.
  • May 27th 2016
    Announcement that work on the new engine has begun.
  • May 28th 2016
    Simple memory management change failed.
  • June 3rd 2016
    Decision to implement a bytecode interpreter.
  • June 30th 2016
    First code (taken from issue 6498) consisting of simple integer arithmetic runs.
  • July 14th 2016
    ASCII string indexing works.
  • July 15th 2016
    Initial struct support
  • Sometime between July and August
    First switches work.
  • August 17th 2016
    Support for the special cases if(__ctfe) and if(!__ctfe)
  • Sometime between August and September
    Ternary expressions are supported
  • September 08th 2016
    First Phobos unit tests pass.
  • September 25th 2016
    Support for returning strings and ternary expressions.
  • October 16th 2016
    First (almost working) version of the LLVM backend.
  • October 30th 2016
    First failed attempts to support function calls.
  • November 1st 2016
    DRuntime unit tests pass for the first time.
  • November 10th 2016
    Failed attempt to implement string concatenation.
  • November 14th 2016
    Array expansion, e.g. assignment to the length property, is supported.
  • November 14th 2016
    Assignment of array indexes is supported.
  • November 18th 2016
    Support for arrays as function parameters.
  • November 19th 2016
    Performance fixes.
  • November 20th 2016
    Fixing the broken while(true) / for (;;) loops; they can now be broken out of 🙂
  • November 25th 2016
    Fixes to goto and switch handling.
  • November 29th 2016
    Fixes to continue and break handling.
  • November 30th 2016
    Initial support for assert
  • December 02nd 2016
    Bailout on void-initialized values (since they can lead to undefined behavior).
  • December 03rd 2016
    Initial support for returning struct literals.
  • December 05th 2016
    Performance fix to the bytecode generator.
  • December 07th 2016
    Fixes to continue and break in for statements (continue must not skip the increment step)
  • December 08th 2016
    Array literals with variables inside are now supported: [1, n, 3]
  • December 08th 2016
    Fixed a bug in switch statements.
  • December 10th 2016
    Fixed a nasty evaluation order bug.
  • December 13th 2016
    Some progress on function calls.
  • December 14th 2016
    Initial support for strings in switches.
  • December 15th 2016
    Assignment of static arrays is now supported.
  • December 17th 2016
    Fixing goto statements (we were ignoring the last goto to any label :)).
  • December 17th 2016
    De-macrofied string-equals.
  • December 20th 2016
    Implement check to guard against dereferencing null pointers (yes… that one was oh so fun).
  • December 22nd 2016
    Initial support for pointers.
  • December 25th 2016
    static immutable variables can now be accessed (yes the result is recomputed … who cares).
  • January 02nd 2017
    First Function calls are supported !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  • January 17th 2017
    Recursive function calls work now 🙂
  • January 23rd 2017
    The interpret3.d unit-test passes.
  • January 24th 2017
    We are green on 64bit!
  • January 25th 2017
    Green on all platforms !!!!! (with blacklisting though)
  • January 26th 2017
    Fixed special case cast(void*) size_t.max (this one cannot go through the normal pointer support, which assumes that you have something valid to dereference).
  • January 26th 2017
    Member function calls are supported!
  • January 31st 2017
    Fixed a bug in switch handling.
  • February 03rd 2017
    Initial function pointer support.
  • Rest of February 2017
    Wild goose chase for issue #17220
  • March 11th 2017
    Initial support for slices.
  • March 15th 2017
    String slicing works.
  • March 18th 2017
    $ in slice expressions is now supported.
  • March 19th 2017
    The concatenation operator (c = a ~ b) works.
  • March 22nd 2017
    Fixed a switch fallthrough bug.