DIP1000: Memory Safety in a Modern Systems Programming Language Part 2

The previous entry in this series shows how to use the new DIP1000 rules to have slices and pointers refer to the stack, all while being memory safe. But D can refer to the stack in other ways, too, and that’s the topic of this article.

Object-oriented instances are the easiest case

In Part 1, I said that if you understand how DIP1000 works with pointers, then you understand how it works with classes. An example is worth more than mere words:

@safe Object ifNull(return scope Object a, return scope Object b)
{
    return a? a: b;
}

The return scope in the above example works exactly as it does in the following:

@safe int* ifNull(return scope int* a, return scope int* b)
{
    return a? a: b;
}

The principle is: if the scope or return scope storage class is applied to an object in a parameter list, the address of the object instance is protected just as if the parameter were a pointer to the instance. From the perspective of machine code, it is a pointer to the instance.
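A small usage sketch may help here. The unittest below is our own illustration, not part of the original example; compile it with -preview=dip1000 -unittest. Because one argument is a scope instance, the compiler treats the returned reference as scope as well, exactly as it would for the int* version.

```d
@safe Object ifNull(return scope Object a, return scope Object b)
{
    return a ? a : b;
}

@safe unittest
{
    // A scope class instance; the compiler may allocate it on the stack.
    scope Object onStack = new Object;

    // One argument is scope, so `result` is inferred scope too: it can
    // be used here, but not stored in a static or returned.
    auto result = ifNull(null, onStack);
    assert(result is onStack);

    // Error if uncommented: escaping a scope reference into a static.
    // static Object saved;
    // saved = result;
}
```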

From the point of view of regular functions, that’s all there is to it. What about member functions of a class or an interface? This is how it’s done:

interface Talkative
{
    @safe const(char)[] saySomething() scope;
}

class Duck : Talkative
{
    char[8] favoriteWord;
    @safe const(char)[] saySomething() scope
    {
        import std.random : dice;

        // This wouldn't work
        // return favoriteWord[];

        // This does
        return favoriteWord[].dup;

        // Also returning something totally
        // different works. This
        // returns the first entry 40% of the time,
        // the second entry 40% of the time, and
        // the third entry the rest of the time.
        return
        [
            "quack!",
            "Quack!!",
            "QUAAACK!!!"
        ][dice(2,2,1)];
    }
}

scope positioned either before or after the member function name marks the this reference as scope, preventing it from leaking out of the function. Because the address of the instance is protected, nothing that refers directly to the address of the fields is allowed to escape either. That’s why return favoriteWord[] is disallowed; it’s a static array stored inside the class instance, so the returned slice would refer directly to it. favoriteWord[].dup on the other hand returns a copy of the data that isn’t located in the class instance, which is why it’s okay.

Alternatively, one could replace the scope attributes of both Talkative.saySomething and Duck.saySomething with return scope, allowing favoriteWord[] to be returned without duplication.
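A sketch of that alternative, with the random-quack part of the original Duck dropped for brevity. The initial value of favoriteWord is a hypothetical one, chosen to fill all eight characters:

```d
interface Talkative
{
    // return scope: the result may refer to the instance, but the
    // instance's address escapes through no other channel.
    @safe const(char)[] saySomething() return scope;
}

class Duck : Talkative
{
    // Hypothetical default word, exactly 8 characters long.
    char[8] favoriteWord = "quack!!!";

    @safe const(char)[] saySomething() return scope
    {
        // Allowed now: the slice points into this instance,
        // and return scope permits that in the return value.
        return favoriteWord[];
    }
}
```

Calling saySomething on an ordinary GC-allocated Duck yields a plain slice; calling it on a scope instance yields a scope slice that cannot outlive the instance.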

DIP1000 and Liskov Substitution Principle

The Liskov substitution principle states, in simplified terms, that an overriding function can give the caller more guarantees than the function it overrides, but never fewer. DIP1000-related attributes fall into that category. The rule works like this:

  • if a parameter (including the implicit this reference) in the parent function has no DIP1000 attributes, the child function may designate it scope or return scope
  • if a parameter is designated scope in the parent, it must be designated scope in the child
  • if a parameter is return scope in the parent, it must be either scope or return scope in the child

If there is no attribute, the caller cannot assume anything; the function might store the address of the argument somewhere. If return scope is present, the caller can assume the address of the argument is not stored other than in the return value. With scope, the guarantee is that the address is not stored anywhere, which is an even stronger guarantee. Example:

class C1
{   double*[] incomeLog;
    @safe double* imposeTax(double* pIncome)
    {
        incomeLog ~= pIncome;
        return new double(*pIncome * .15);
    }
}

class C2 : C1
{
    // Okay from language perspective (but maybe not fair
    // for the taxpayer)
    override @safe double* imposeTax
        (return scope double* pIncome)
    {
        return pIncome;
    }
}

class C3 : C2
{
    // Also okay.
    override @safe double* imposeTax
        (scope double* pIncome)
    {
        return new double(*pIncome * .18);
    }
}

class C4: C3
{
    // Not okay. The pIncome parameter of C3.imposeTax
    // is scope, and this tries to relax the restriction.
    override @safe double* imposeTax
        (double* pIncome)
    {
        incomeLog ~= pIncome;
        return new double(*pIncome * .16);
    }
}

The special pointer, ref

We have not yet covered how to use structs and unions with DIP1000. We have, of course, covered pointers and arrays, and when they refer to a struct or a union, they work the same as they do when referring to any other type. But pointers and arrays are not the canonical way to use structs in D. Structs are most often passed around by value, or by reference when bound to ref parameters. Now is a good time to explain how ref works with DIP1000.

ref does not work like just any pointer. Once you understand ref, you can use DIP1000 in many ways you otherwise could not.

A simple ref int parameter

The simplest possible way to use ref is probably this:

@safe void fun(ref int arg) {
    arg = 5;
}

What does this mean? ref is internally a pointer—think int* pArg—but is used like a value in the source code. arg = 5 works internally like *pArg = 5. Also, the client calls the function as if the argument were passed by value:

auto anArray = [1,2];
fun(anArray[1]); // or, via UFCS: anArray[1].fun;
// anArray is now [1, 5]

instead of fun(&anArray[1]). Unlike C++ references, D references can be null, but the application will instantly terminate with a segmentation fault if a null ref is used for something other than reading the address with the & operator. So this:

int* ptr = null;
fun(*ptr);

…compiles, but crashes at runtime because the assignment inside fun lands at the null address.

The address of a ref variable is always guarded against escape. In this sense @safe void fun(ref int arg){arg = 5;} is like @safe void fun(scope int* pArg){*pArg = 5;}. For example, @safe int* fun(ref int arg){return &arg;} will not compile, just like @safe int* fun(scope int* pArg){return pArg;} will not.

There is a return ref storage class, however, that allows returning the address of the parameter but no other form of escape, just like return scope. This means that @safe int* fun(return ref int arg){return &arg;} works.
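return ref is just as useful when a function returns by ref instead of returning a pointer, in which case no & is needed at all. A minimal sketch; larger is a hypothetical helper, not taken from the original article:

```d
// `return ref` on both parameters: the result may be a reference to
// either argument, so the caller must not let it outlive either one.
@safe ref int larger(return ref int a, return ref int b)
{
    // A conditional expression whose branches are both lvalues is
    // itself an lvalue, so it can be returned by ref.
    return a > b ? a : b;
}

@safe unittest
{
    int x = 3, y = 9;
    larger(x, y) = 0; // writes through the returned reference
    assert(x == 3 && y == 0);
}
```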

Reference to a reference

A ref to an int or a similar type already allows much nicer syntax than pointers do. But the real power of ref shows when it refers to a type that is itself a reference, such as a pointer or a class. scope or return scope can then be applied to the reference that ref refers to. For example:

@safe float[] mergeSort(ref return scope float[] arr)
{
    import std.algorithm: merge;
    import std.array : Appender;

    if(arr.length < 2) return arr;

    auto firstHalf = arr[0 .. $/2];
    auto secondHalf = arr[$/2 .. $];

    Appender!(float[]) output;
    output.reserve(arr.length);

    foreach
    (
        el;
        firstHalf.mergeSort
        .merge!floatLess(secondHalf.mergeSort)
    )   output ~= el;

    arr = output[];
    return arr;
}

@safe bool floatLess(float a, float b)
{
    import std.math: isNaN;

    return a.isNaN? false:
          b.isNaN? true:
          a<b;
}

mergeSort here guarantees it won’t leak the address of the floats in arr except in the return value. This is the same guarantee that would be had from a return scope float[] arr parameter. But at the same time, because arr is a ref parameter, mergeSort can mutate the array passed to it. Then the client can write:

float[] values = [5, 1.5, 0, 19, 1.5, 1];
values.mergeSort;

With a non-ref parameter, the client would have to write values = values.mergeSort instead (not using ref would be a perfectly reasonable API in this case, because we do not always want to mutate the original array). This is something that cannot be accomplished with pointers, because return scope float[]* arr would protect the address of the array’s metadata (the length and ptr fields of the array), not the address of its contents.
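One consequence is worth spelling out: because arr is a ref to the slice itself, mergeSort reassigns the caller's slice and never writes to the memory that slice originally referred to. The sketch below repeats the two functions above (imports moved to module level) and adds a unittest of our own that sorts a view of a stack array; compile with -preview=dip1000 -unittest.

```d
import std.algorithm : merge;
import std.array : Appender;
import std.math : isNaN;

// As in the article: sorts by reassigning the slice `arr`.
@safe float[] mergeSort(ref return scope float[] arr)
{
    if(arr.length < 2) return arr;

    auto firstHalf = arr[0 .. $/2];
    auto secondHalf = arr[$/2 .. $];

    Appender!(float[]) output;
    output.reserve(arr.length);

    foreach
    (
        el;
        firstHalf.mergeSort
        .merge!floatLess(secondHalf.mergeSort)
    )   output ~= el;

    arr = output[];
    return arr;
}

// NaNs sort after every other value.
@safe bool floatLess(float a, float b)
{
    return a.isNaN? false:
          b.isNaN? true:
          a<b;
}

@safe unittest
{
    float[3] onStack = [3, 1, 2];
    // `view` refers to stack memory, so it is inferred scope.
    float[] view = onStack[];
    view.mergeSort;

    // The slice variable now refers to a new, sorted GC array...
    assert(view == [1.0f, 2.0f, 3.0f]);
    // ...while the stack memory it used to refer to is untouched.
    assert(onStack == [3.0f, 1.0f, 2.0f]);
}
```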

It is also possible to have a returnable ref argument to a scope reference. Since this example has a unit test, remember to use the -unittest compile flag to include it in the compiled binary.

@safe ref Exception nullify(return ref scope Exception obj)
{
    obj = null;
    return obj;
}

@safe unittest
{
    scope obj = new Exception("Error!");
    assert(obj.msg == "Error!");
    obj.nullify;
    assert(obj is null);
    // Since nullify returns by ref, we can assign
    // to its return value.
    obj.nullify = new Exception("Fail!");
    assert(obj.msg == "Fail!");
}

Here we return the address of the argument passed to nullify, but still guard both the address of the object pointer and the address of the class instance against being leaked by other channels.

return is a free keyword that does not mandate ref or scope to follow it. What does void* fun(ref scope return int*) mean, then? The spec states that return without a trailing scope is always treated as ref return. This example is thus equivalent to void* fun(return ref scope int*). However, this only applies if there is a reference to bind to. Writing void* fun(scope return int*) means void* fun(return scope int*). It’s even possible to write void* fun(return int*) with the latter meaning, but I leave it up to you to decide whether this qualifies as conciseness or obfuscation.

Member functions and ref

ref and return ref often require careful consideration to keep track of which address is protected and what can be returned. It takes some experience to get comfortable with them. But once you do, understanding how structs and unions work with DIP1000 is pretty straightforward.

The major difference from classes is that, while the this reference in a class member function is just a regular class reference, this in a struct or union member function is ref StructOrUnionName.

union Uni
{
    int asInt;
    char[4] asCharArr;

    // Return value contains a reference to
    // this union, won't escape references
    // to it via any other channel
    @safe char[] latterHalf() return
    {
        return asCharArr[2 .. $];
    }

    // This argument is implicitly ref, so the
    // following means the return value does
    // not refer to this union, and also that
    // we don't leak it in any other way.
    @safe char[] latterHalfCopy()
    {
        return latterHalf.dup;
    }
}

Note that return ref should not be used with the this argument. char[] latterHalf() return ref fails to parse. The language already has to understand what ref char[] latterHalf() return means: the return value is a reference. The “ref” in return ref would be redundant anyway.

Note that we did not use the scope keyword here. scope would be meaningless with this union, because it does not contain references to anything, just as it is meaningless to have a scope ref int or a scope int function parameter. scope makes sense only for types that refer to memory elsewhere.
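A short usage sketch of our own makes the difference between the two members visible: the slice returned by latterHalf aliases the union, while latterHalfCopy returns independent GC memory. Compile with -preview=dip1000 -unittest.

```d
union Uni
{
    int asInt;
    char[4] asCharArr;

    // `return` on a struct/union member: the result may refer to `this`.
    @safe char[] latterHalf() return
    {
        return asCharArr[2 .. $];
    }

    // No attribute: the result must not refer to `this`.
    @safe char[] latterHalfCopy()
    {
        return latterHalf.dup;
    }
}

@safe unittest
{
    Uni u;
    u.asCharArr = "abcd";

    // `half` points into `u`, so it is inferred scope, and writing
    // through it changes the union itself.
    auto half = u.latterHalf;
    half[0] = 'X';
    assert(u.asCharArr == "abXd");

    // The copy is an independent GC array.
    auto copy = u.latterHalfCopy;
    copy[0] = 'Y';
    assert(copy == "Yd");
    assert(u.asCharArr == "abXd");
}
```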

scope in a struct or union means the same thing as it means in a static array. It means that the memory its members refer to cannot be escaped. Example:

struct CString
{
    // We need to put the pointer in an anonymous
    // union with a dummy member, otherwise @safe user
    // code could assign ptr to point to a character
    // not in a C string.
    union
    {
        // Empty string literals get optimised to null pointers by
        // the D compiler, so we have to do this for the .init value
        // to really point to a '\0'.
        immutable(char)* ptr = &nullChar;
        size_t dummy;
    }

    // In constructors, the "return value" is the
    // constructed data object. Thus, the return scope
    // here makes sure this struct won't live longer
    // than the memory in arr.
    @trusted this(return scope string arr)
    {
        // Note: a normal assert would not do! It may be
        // removed from release builds, but this check
        // is necessary for memory safety, so we need
        // to use assert(0) instead, which never gets
        // removed. The length check comes first so that
        // arr[$-1] is never evaluated on an empty string.
        if(arr.length == 0 || arr[$-1] != '\0') assert(0, "not a C string!");
        ptr = arr.ptr;
    }

    // The return value refers to the same memory as the
    // members in this struct, but we don't leak references
    // to it via any other way, so return scope.
    @trusted ref immutable(char) front() return scope
    {
        return *ptr;
    }

    // No references to the pointed-to array passed
    // anywhere.
    @trusted void popFront() scope
    {
        // Otherwise the user could pop past the
        // end of the string and then read it!
        if(empty) assert(0, "out of bounds!");
        ptr++;
    }

    // Same.
    @safe bool empty() scope
    {
        return front == '\0';
    }
}

immutable nullChar = '\0';

@safe unittest
{
    import std.array : staticArray;

    auto localStr = "hello world!\0".staticArray;
    auto localCStr = CString(localStr[]);
    assert(localCStr.front == 'h');

    static immutable(char)* staticPtr;

    // Error, escaping reference to local.
    // staticPtr = &localCStr.front();

    // Fine.
    staticPtr = &CString("global\0").front();

    localCStr.popFront;
    assert(localCStr.front == 'e');
    assert(!localCStr.empty);
}

Part One said that @trusted is a terrible footgun with DIP1000. This example demonstrates why. Imagine how easy it would be to use a regular assert instead of assert(0), to forget the checks entirely, or to overlook the need for the anonymous union. I think this struct is safe to use, but it’s entirely possible I overlooked something.

Finally

We now know almost all there is to know about using structs, unions, and classes with DIP1000. There are two final things to learn today.

But before that, a short digression regarding the scope keyword. It is not used just for annotating parameters and local variables as illustrated. It is also used for scope classes and scope guard statements. This guide won’t be discussing those, because the former feature is deprecated, and the latter is not related to DIP1000 or control of variable lifetimes. The point of mentioning them is to dispel a potential misconception that scope always means limiting the lifetime of something. Learning about scope guard statements is still a good idea, as it’s a useful feature.

Back to the topic. The first thing is not really specific to structs or classes. We discussed what return, return ref, and return scope usually mean, but there’s an alternative meaning to them. Consider:

@safe void getFirstSpace
(
    ref scope string result,
    return scope string where
)
{
    //...
}

The usual meaning of the return attribute makes no sense here, as the function has a void return type. A special rule applies in this case: if the return type is void, and the first argument is ref or out, any subsequent return [ref/scope] is assumed to be escaped by assigning to the first argument. With struct member functions, they are assumed to be assigned to the struct itself.

@safe unittest
{
    static string output;
    immutable(char)[8] input = "on stack";
    //Trying to assign stack contents to a static
    //variable. Won't compile.
    getFirstSpace(output, input);
}
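To see the rule succeed rather than fail, here is a sketch with a hypothetical body for getFirstSpace, which the article leaves out: it makes result refer to the part of where that starts at the first space, or to an empty slice if there is none. The indexOf-based implementation and the unittest are our assumptions; only the signature comes from the article. Compile with -preview=dip1000 -unittest.

```d
import std.string : indexOf;

// Return type is void and the first parameter is ref, so
// `return scope` on `where` means "may be assigned to `result`".
@safe void getFirstSpace
(
    ref scope string result,
    return scope string where
)
{
    // Hypothetical body: slice `where` from its first space onward,
    // or take an empty slice at its end if it has no space.
    auto i = where.indexOf(' ');
    result = i < 0 ? where[$ .. $] : where[i .. $];
}

@safe unittest
{
    immutable(char)[8] input = "on stack";

    // Fine: `local` is declared after `input`, so it cannot outlive it.
    scope string local;
    getFirstSpace(local, input[]);
    assert(local == " stack");
}
```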

Since out came up, it is worth noting that out would be a better choice for result here than ref. out works like ref, with the one difference that the referenced data is automatically default-initialized at the beginning of the function, meaning any data to which the out parameter refers is guaranteed not to affect the function.

The second thing to learn is that scope is used by the compiler to optimize class allocations inside function bodies. If a new class is used to initialize a scope variable, the compiler can put it on the stack. Example:

class C{int a, b, c;}
@safe @nogc unittest
{
    // Since this unittest is @nogc, this wouldn't
    // compile without the scope optimization.
    scope C c = new C();
}

This feature requires using the scope keyword explicitly. Inference of scope does not work, because initializing a class this way does not normally (meaning, without the @nogc attribute) mandate limiting the lifetime of c. The feature currently works only with classes, but there is no reason it couldn’t work with newed struct pointers and array literals too.

Until next time

This is pretty much all that there is to manual DIP1000 usage. But this blog series shall not be over yet! DIP1000 is not intended to always be used explicitly—it works with attribute inference. That’s what the next post will cover.

It will also cover some considerations when daring to use @trusted and @system code. The need for dangerous systems programming exists and is part of the D language domain. But even systems programming is a responsible affair when people do what they can to minimize risks. We will see that even there it’s possible to do a lot.

Thanks to Walter Bright and Dennis Korpel for reviewing this article.

DIP1000: Memory Safety in a Modern Systems Programming Language Part 1

Memory safety needs no checks

D is both a garbage-collected programming language and an efficient raw memory access language. Modern high-level languages like D are memory safe, preventing users from accidentally reading or writing to unused memory or breaking the type system of the language.

As a systems programming language, not all of D can give such guarantees, but it does have a memory-safe subset that uses the garbage collector to take care of memory management much like Java, C#, or Go. A D codebase, even in a systems programming project, should aim to remain within that memory-safe subset where practical. D provides the @safe function attribute to verify that a function uses only memory-safe features of the language. For instance, try this.

@safe string getBeginning(immutable(char)* cString)
{
    return cString[0..3];
}

The compiler will refuse to compile this code. There’s no way to know what will result from the three-character slice of cString, which could be referring to an empty string (i.e., cString[0] is \0), a string with a length of 1, or even one or two characters without the terminating NUL. The result in those cases would be a memory violation.
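One way to make such a function @safe is to accept a D slice, whose length travels with the data, instead of a raw pointer. A minimal sketch; returning the whole input when it is shorter than three characters is an assumed behavior, not something the article specifies:

```d
// Hypothetical @safe variant taking a slice instead of a C string
// pointer. The length is known, so we never slice past the end.
@safe string getBeginning(string str)
{
    return str.length >= 3 ? str[0 .. 3] : str;
}
```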

@safe does not mean slow

Note that I said above that even a low-level systems programming project should use @safe where practical. How is that possible, given that such projects sometimes cannot use the garbage collector, a major tool used in D to guarantee memory safety?

Indeed, such projects must resort to memory-unsafe constructs every now and then. Even higher-level projects often have reasons to do so, as they want to create interfaces to C or C++ libraries, or avoid the garbage collector when indicated by runtime performance. But still, surprisingly large parts of code can be made @safe without using the garbage collector at all.

D can do this because the memory safe subset does not prevent raw memory access per se.

@safe void add(int* a, int* b, int* sum)
{
    *sum = *a + *b;
}

This compiles and is fully memory safe, despite dereferencing those pointers in the same completely unchecked way they are dereferenced in C. This is memory safe because @safe D does not allow creating an int* that points to unallocated memory areas, or to a float**, for instance. int* can point to the null address, but this is generally not a memory safety problem because the null address is protected by the operating system. Any attempt to dereference it would crash the program before any memory corruption can happen. The garbage collector isn’t involved, because D’s GC can only run if more memory is requested from it, or if a collection is explicitly triggered.

D slices are similar. When indexed, they check at runtime that the index is less than their length, and that’s it. They do no checking whatsoever on whether they refer to a legal memory area. Memory safety is achieved by preventing creation of slices that could refer to illegal memory in the first place, as demonstrated in the first example of this article. And again, there’s no GC involved.

This enables many patterns that are memory-safe, efficient, and independent of the garbage collector.

struct Struct
{
    int[] slice;
    int* pointer;
    int[10] staticArray;
}

@safe @nogc Struct examples(Struct arg)
{
    arg.slice[5] = *arg.pointer;
    arg.staticArray[0..5] = arg.slice[5..10];
    arg.pointer = &arg.slice[8];
    return arg;
}

As demonstrated, D liberally lets one do unchecked memory handling in @safe code. The memory referred to by arg.slice and arg.pointer may be on the garbage collected heap, or it may be in the static program memory. There is no reason the language needs to care. The program will probably need to either call the garbage collector or do some unsafe memory management to allocate memory for the pointer and the slice, but handling already allocated memory does not need to do either. If this function needed the garbage collector, it would fail to compile because of the @nogc attribute.

However…

There’s a historical design flaw here in that the memory may also be on the stack. Consider what happens if we change our function a bit.

@safe @nogc Struct examples(Struct arg)
{
    arg.pointer = &arg.staticArray[8];
    arg.slice = arg.staticArray[0..8];
    return arg;
}

Struct arg is a value type. Its contents are copied to the stack when examples is called and can be overwritten after the function returns. staticArray is also a value type. It’s copied along with the rest of the struct just as if there were ten integers in the struct instead. When we return arg, the contents of staticArray are copied to the return value, but pointer and slice continue to point to arg, not the returned copy!

But we have a fix. It allows one to write code just as performant in @safe functions as before, including references to the stack. It even enables a few formerly @system (the opposite of @safe) tricks to be written in a safe way. That fix is DIP1000. It’s the reason why this example already causes a deprecation warning by default if it’s compiled with the latest nightly dmd.

Born first, dead last

DIP1000 is a set of enhancements to the language rules regarding pointers, slices, and other references. The name stands for D Improvement Proposal number 1000, as that document is what the new rules were initially based on. One can enable the new rules with the preview compiler switch, -preview=dip1000. Existing code may need some changes to work with the new rules, which is why the switch is not enabled by default. It’s going to be the default in the future, so it’s best to enable it where possible and work to make code compatible with it where not.

The basic idea is to let people limit the lifetime of a reference (an array or pointer, for example). A pointer to the stack is not dangerous if it does not exist longer than the stack variable it is pointing to. Regular references continue to exist, but they can refer only to data with an unlimited lifetime—that is, garbage collected memory, or static or global variables.

Let’s get started

The simplest way to construct limited lifetime references is to assign them something with a limited lifetime.

@safe int* test(int arg1, int arg2)
{
    int* notScope = new int(5);
    int* thisIsScope = &arg1;
    int* alsoScope; // Not initially scope...
    alsoScope = thisIsScope; // ...but this makes it so.

    // Error! The variable declared earlier is
    // considered to have a longer lifetime,
    // so disallowed.
    thisIsScope = alsoScope;

    return notScope; // ok
    return thisIsScope; // error
    return alsoScope; // error
}

When testing these examples, remember to use the compiler switch -preview=dip1000 and to mark the function @safe. The checks are not done for non-@safe functions.

Alternatively, the scope keyword can be explicitly used to limit the lifetime of a reference.

@safe int[] test()
{
    int[] normalRef;
    scope int[] limitedRef;

    if(true)
    {
        int[5] stackData = [-1, -2, -3, -4, -5];

        // Lifetime of stackData ends
        // before limitedRef, so this is
        // disallowed.
        limitedRef = stackData[];

        //This is how you do it
        scope int[] evenMoreLimited
            = stackData[];
    }

    return normalRef; // Okay.
    return limitedRef; // Forbidden.
}

If we can’t return limited lifetime references, how are they used at all? Easy. Remember, only the address of the data is protected, not the data itself. This means we have many ways to pass scoped data out of the function.

@safe int[] fun()
{
    scope int[] dontReturnMe = [1,2,3];

    int[] result = new int[](dontReturnMe.length);
    // This copies the data, instead of having
    // result refer to protected memory.
    result[] = dontReturnMe[];
    return result;

    // Shorthand way of doing the same as above
    return dontReturnMe.dup;

    // Also you are not always interested
    // in the contents as a whole; you
    // might want to calculate something else
    // from them
    return
    [
        dontReturnMe[0] * dontReturnMe[1],
        cast(int) dontReturnMe.length
    ];
}

Getting interprocedural

With the tricks discussed so far, DIP1000 would be restricted to language primitives when handling limited lifetime references, but the scope storage class can be applied to function parameters, too. Because this guarantees the memory won’t be used after the function exits, local data references can be used as arguments to scope parameters.

@safe double average(scope int[] data)
{
    double result = 0;
    foreach(el; data) result += el;
    return result / data.length;
}

@safe double use()
{
    int[10] data = [1,2,3,4,5,6,7,8,9,10];
    return data[].average; // works!
}

Initially, it’s probably best to keep attribute auto inference off. Auto inference in general is a good tool, but it silently adds scope attributes to all parameters it can, meaning it’s easy to lose track of what’s happening. That makes the learning process a lot harder. Avoid this by always explicitly specifying the return type (or lack thereof with void or noreturn): @safe const(char[]) fun(int* val) as opposed to @safe auto fun(int* val) or @safe const fun(int* val). The function also must not be a template or inside a template. We’ll dig deeper into scope auto inference in a future post.

scope allows handling pointers and arrays that point to the stack, but forbids returning them. What if that’s the goal? Enter the return scope attribute:

//Being character arrays, strings also work with DIP1000.
@safe string latterHalf(return scope string arg)
{
    return arg[$/2 .. $];
}

@safe string test()
{
    // allocated in static program memory
    auto hello1 = "Hello world!";
    // allocated on the stack, copied from hello1
    immutable(char)[12] hello2 = hello1;

    auto result1 = hello1.latterHalf; // ok
    return result1; // ok

    auto result2 = hello2[].latterHalf; // ok
    // Nice try! result2 is scope and can't
    // be returned.
    return result2;
}

return scope parameters work by checking if any of the arguments passed to them are scope. If so, the return value is treated as a scope value that may not outlive any of the return scope arguments. If none are scope, the return value is treated as a global reference that can be copied freely. Like scope, return scope is conservative. Even if one does not actually return the address protected by return scope, the compiler will still perform the call site lifetime checks just as if one did.
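A sketch of these call site checks with two return scope parameters; longer and demo are hypothetical names, and the code assumes -preview=dip1000. Note that r1 is treated as scope even though, at runtime, it happens to refer to heap memory: the check depends on the arguments' lifetimes, not on which one is actually returned.

```d
// Hypothetical helper: returns whichever argument is longer.
@safe int[] longer(return scope int[] a, return scope int[] b)
{
    return a.length >= b.length ? a : b;
}

@safe int[] demo()
{
    int[2] onStack = [1, 2];
    int[] onHeap = [3, 4, 5];
    int[] alsoOnHeap = [6, 7, 8, 9];

    // One argument refers to the stack, so r1 is inferred scope,
    // even though the heap slice is the one returned here.
    auto r1 = longer(onStack[], onHeap);
    assert(r1 == [3, 4, 5]);
    // return r1; // would not compile

    // No scope arguments: r2 is an ordinary slice and may be returned.
    auto r2 = longer(onHeap, alsoOnHeap);
    return r2;
}
```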

scope is shallow

@safe void test()
{
    scope a = "first";
    scope b = "second";
    string[] arr = [a, b];
}

In test, initializing arr does not compile. This may be surprising given that the language automatically adds scope to a variable on initialization if needed.

However, consider what the scope on scope string[] arr would protect. There are two things it could potentially protect: the addresses of the strings in the array, or the addresses of the characters in the strings. For this assignment to be safe, scope would have to protect the characters in the strings, but it only protects the top-level reference, i.e., the strings in the array. Thus, the example does not work. Now change arr so that it’s a static array:

@safe void test()
{
    scope a = "first";
    scope b = "second";
    string[2] arr = [a, b];
}

This works because static arrays are not references. Memory for all of their elements is allocated in place on the stack (i.e., they contain their elements), as opposed to dynamic arrays which contain a reference to elements stored elsewhere. When a static array is scope, its elements are treated as scope. And since the example would not compile were arr not scope, it follows that scope is inferred.

Some practical tips

Let’s face it, the DIP1000 rules take time to understand, and many would rather spend that time coding something useful. The first and most important tip is: avoid non-@safe code like the plague wherever possible. Of course, this advice is not new, but it appears even more important with DIP1000. In a nutshell, the language does not check the validity of scope and return scope in a non-@safe function, but when calling those functions the compiler assumes that the attributes are respected.

This makes scope and return scope terrible footguns in unsafe code. But by resisting the temptation to mark code @trusted to avoid thinking, a D coder can hardly do damage. Misusing DIP1000 in @safe code can cause needless compilation errors, but it won’t corrupt memory and is unlikely to cause other bugs either.

A second important point worth mentioning is that there is no need to mark function parameters scope or return scope if they receive only static or GC-allocated data. Many languages do not let coders refer to the stack at all; just because D programmers can do so does not mean they must. This way, they don’t have to spend any more time solving compiler errors than they did before DIP1000. And if a desire to work with the stack arises after all, the authors can then return to annotate the functions. Most likely they will accomplish this without breaking the interface.

What’s next?

This concludes today’s blog post. This is enough to know how to use arrays and pointers with DIP1000. In principle, it also enables readers to use DIP1000 with classes and interfaces. The only thing to learn is that a class reference, including the this pointer in member functions, works with DIP1000 just like a pointer would. Still, it’s hard to grasp what that means from one sentence, so later posts shall illustrate the subject.

In any case, there is more to know. DIP1000 has some features for ref function parameters, structs, and unions that we didn’t cover here. We’ll also dig deeper into how DIP1000 plays with non-@safe functions and attribute auto inference. Currently, the plan is to do two more posts for this series.

Do let us know in the comment section or the D forums if you have any useful DIP1000 tips that were not covered!

Thanks to Walter Bright for reviewing this article.

D News May ’22: D 2.100.0; GDC & LDC Releases; DConf ’22 Schedule Published & Early-Bird Registration Ends

May was a busy month in D land. Early on, a major milestone release of GDC, the GCC-based D compiler, hit the virtual shelves. It was followed in the middle of the month by the release of D 2.100.0, along with a release of the same version of DMD, the reference D compiler. That was immediately followed by a beta release of the LLVM-based D compiler, LDC, version 1.30.0. Finally, in the latter half of the month, the DConf ’22 schedule was published and we found a sponsor for the DConf tradition of BeerConf. May 31st marks the final day of DConf ’22 early-bird registration.

A video version of this blog post is available on the D Language Foundation YouTube channel.

D 2.100.0

This latest release of DMD comes to us courtesy of 41 contributors who brought us 22 major changes and 179 fixed Bugzilla issues. Although the community attached a bit of significance to the 2.100.0 version number, there isn’t anything overly exciting in the changelog. This is largely a house-cleaning release—a number of deprecation periods that should have already ended have been terminated— but there are a couple of interesting additions to the language.

D1-style operator overloading

One of these is the end of the deprecation period for D1-style operator overloads. Originally, these were designed to make their purpose clear. Want to overload the addition operator? Then implement opAdd. Want to overload the multiplication operator? Then implement opMul. Walter took this approach because of one of the major complaints about operator overloading in C++: people often overload an operator to do something different from what it is expected to do, such as overloading the + operator to append rather than add. Walter’s reasoning was that if the intent of the operator is included in the name of the function, then anyone overloading it to do something different is essentially violating its contract. Perhaps it would encourage people to stick to the intent.

No one can say for sure if Walter’s approach worked like he hoped, but a more generic design was implemented in D2, and this is the approach all D code must use today. The D1 operators were kept around largely to ease porting D1 code to D2, with the intention that they would one day be deprecated. It finally happened in D 2.088.0, which was released in the fall of 2019. Following the deprecation process, the deprecation period should have ended with 2.098.0 (the first release after 10 non-patch releases including the deprecation).
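For comparison, here is a minimal sketch of the D2 style. A single opBinary template receives the operator as a compile-time string. The Vec2 type is hypothetical, and the commented-out opAdd shows the D1 form for contrast.

```d
// Hypothetical 2D vector type, used only to contrast the two styles.
struct Vec2
{
    double x, y;

    // D1 style, one named function per operator (deprecated since D 2.088.0):
    // Vec2 opAdd(Vec2 rhs) { return Vec2(x + rhs.x, y + rhs.y); }

    // D2 style: one template covers several binary operators. The operator
    // arrives as a compile-time string and is spliced in with mixin.
    Vec2 opBinary(string op)(Vec2 rhs) const
        if (op == "+" || op == "-")
    {
        return Vec2(mixin("x " ~ op ~ " rhs.x"), mixin("y " ~ op ~ " rhs.y"));
    }
}

unittest
{
    auto a = Vec2(1, 2);
    auto b = Vec2(3, 4);
    assert(a + b == Vec2(4, 6));
    assert(b - a == Vec2(2, 2));
}
```

The template constraint limits this overload to + and -. Supporting another operator means widening the constraint, not writing another named function.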

delete

The delete keyword was another D1 feature that was ultimately axed in D2. It was deprecated in D 2.079.0, which was released in the spring of 2018. This was something that had long been planned (see the deprecation page for the rationale), and its use had been discouraged for some time.

The delete keyword would both destroy an object instance (call its destructor) and release the memory allocated for it by the GC. Now, we use the destroy function from the object module, which is imported by default in all D programs. It calls the destructor on an instance and optionally resets the instance to its default .init state. The GC will then free the memory allocated for the instance when necessary, or the programmer can do it manually via the GC.free static member function in core.memory.
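Here is a minimal sketch of the replacement pattern. It uses a hypothetical Connection class whose destructor only counts how often it has run. destroy runs the destructor where D1 code would have used delete, and GC.free can optionally return the memory right away.

```d
import core.memory : GC;

// Hypothetical class, used only to observe when its destructor runs.
class Connection
{
    static int closed;              // number of destructor calls so far
    ~this() { ++closed; }
}

unittest
{
    auto c = new Connection;

    // Where D1 code would write `delete c;`, D2 code calls destroy,
    // which runs the destructor deterministically.
    destroy(c);
    assert(Connection.closed == 1);

    // Optional: release the memory now instead of waiting for a collection.
    // GC.free does not run the destructor, which is why destroy comes first.
    GC.free(cast(void*) c);
    c = null;   // don't keep a dangling reference around
}
```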

@mustuse

Paul Backus took DIP 1038 through the review process from beginning to end. Initially, it introduced an @nodiscard attribute for functions and types. During the Formal Assessment after the review rounds were completed, Walter and Átila were willing to approve it with changes. The final version renamed the attribute to @mustUse and restricted its application to structs and unions.

The feature was implemented in D 2.100.0 as @mustuse, and is now available to use in your D code. When a type marked with the attribute is the result of an expression, the result cannot be ignored.
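Here is a minimal sketch of the attribute in use. The attribute is declared in core.attribute, so it has to be imported. ParseResult and parseDigit are hypothetical. The commented-out lines show a call the compiler is expected to reject, and the cast(void) form the language spec recommends for discarding such a value on purpose.

```d
import core.attribute : mustuse;

// Hypothetical result type that callers must not silently drop.
@mustuse struct ParseResult
{
    bool ok;
    int value;
}

// Hypothetical parser for a single decimal digit.
ParseResult parseDigit(char c)
{
    if (c >= '0' && c <= '9')
        return ParseResult(true, c - '0');
    return ParseResult(false, 0);
}

unittest
{
    auto r = parseDigit('7');   // fine: the result is stored and used
    assert(r.ok && r.value == 7);

    // parseDigit('x');            // error: value of @mustuse type discarded
    // cast(void) parseDigit('x'); // explicit discard via cast(void)
}
```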

.tupleof for static arrays

Many D programmers are familiar with the .tupleof property of structs, which is particularly useful when interfacing with C libraries:

struct Circle {
    float x, y;
    float radius;
    ubyte r, g, b, a;
}

@nogc nothrow
extern(C) void draw_circle (
    float cx, float cy, float radius,
    ubyte r, ubyte g, ubyte b, ubyte a
);

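void foo() {
    Circle c = makeCircle(); // makeCircle is assumed to be defined elsewhere
    draw_circle(c.tupleof);  // expands to c.x, c.y, c.radius, c.r, c.g, c.b, c.a
}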

Now we can do the same thing with static arrays:

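void foo(int, int, int) { /* ... */ }

void main()
{
    int[3] ia = [1, 2, 3];
    foo(ia.tupleof); // same as `foo(1, 2, 3);`

    float[3] fa;
    //fa = ia; // error: int[3] does not implicitly convert to float[3]
    fa.tupleof = ia.tupleof; // element-wise: each int is converted to float
    assert(fa == [1F, 2F, 3F]);
}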

DConf ’22

DConf ’22 is happening in London, August 1–4. If you haven’t registered yet and you’re reading this on or before May 31st, then register now to take advantage of the 15% early-bird discount. The schedule is online and BeerConf is a go!

DConf ’22 schedule

We love to see and hear first-time speakers at DConf, whether it’s their first conference talk ever or their first DConf talk. This year, we have 11 first-time DConf speakers, 12 if you include our invited keynote speaker Roberto Ierusalimschy (the head designer of the Lua programming language). This is awesome!

The DConf ’22 schedule is set up as follows:

  • three keynotes: two from the language maintainers, one from our guest speaker
  • two panels: the traditional DConf Ask Us Anything involving the language maintainers, and a panel on Programming Language Design
  • a Lightning Talks session
  • 15 presentations (11 of which are from first-time DConf speakers)

We’re limiting the talks to 45 minutes this year so that we’ll have more time to mingle between sessions. One of the talks on Day 3 is slated for 25–30 minutes, so we’ve slotted it such that we have a longer lunch that day.

The schedule (excluding the keynotes, as the details of those haven’t yet been provided) has a loose theme. It’s not perfect, but it’ll do:

  • Day One is mostly status reports and tutorials
  • Day Two is largely intermediate to advanced and heavy on the tech
  • Day Three is about the D ecosystem

All of the talks will be livestreamed and recorded, so they’ll be available on our YouTube channel at some point after the conference has ended. Still, DConf is about more than just the talks, as Razvan Nitu and Dennis Korpel noted in an interview. It’s about getting to know in person the people we encounter online in our regular D community interactions. As Razvan said and I can attest, your perspective will surely change after you can match the internet handles with living, breathing, human beings with whom you’ve interacted in person.

So register!

Early-bird registration ends

May 31st is the last day of early-bird registration. With the 15% discount and 20% VAT, the total is $423.30 USD. We also show the GBP equivalent on the site, based on the HMRC exchange rate for the current month, and accept payments in GBP through PayPal. On June 1st, the general registration rate of $498.00 USD (including 20% VAT) kicks in.

If you are a student, there’s a flat rate of $120.00 USD (including 20% VAT). Email social@dlang.org to take advantage of it.

We also offer a flat rate of $240.00 USD (including 20% VAT) for major open source contributors. The keyword here is major. It’s not something for which we can set specific criteria, and we don’t really want to provide examples that may discourage inquiries. If you would like to see if you qualify for this discount, please email social@dlang.org, and we’ll let you know.

Finally, we also offer a hardship rate. If you would like to attend DConf but can’t afford the registration, just email social@dlang.org and we’ll see about helping you out. We can’t help you with transportation, just the registration.

BeerConf

BeerConf is a DConf tradition going back to the very beginning, though we didn’t call it that back then. Every year, we would designate an “official” hotel somewhere in the vicinity of the venue. This would be our gathering spot in the evenings, usually in the hotel lobby or bar. Typically, people would break off into groups for dinner, then several of them would wander over to the gathering spot to hang out and chat, usually over beers. At DConf 2017, Ethan Watson branded this gathering BeerConf and the name has stuck.

At DConf 2019 in London, we couldn’t find a suitable hotel to select as the site of BeerConf. Instead, we hired out the upper floor of a pub close to the venue, thanks to the sponsorship of Mercedes-Benz Research and Development North America. For DConf ’22, we’re back in the same general area, and so we again have to hire out a pub.

The 2019 pub was a bit crowded for us, and is a bit too far of a walk from our ’22 venue, so we’ve got our eyes on another pub within walking distance of the venue and near some of the budget hotels listed at dconf.org. What we’ve been missing is funding.

That has changed, thanks to Funkwerk! With their sponsorship, we’re able to cover the minimum spend the pub asks for each of the evenings of August 1–3. This means that DConf attendees dropping by this pub on those nights can order food and drinks (alcoholic and/or otherwise) for free until the DConf tab runs out. We’ll have a separate tab for each night so that we don’t blow it all in one go.

Unfortunately, I can’t announce the specifics about the pub just yet. Our DConf host, Symmetry Investments, is handling the arrangements for us since they’re in London and we aren’t. Once I receive confirmation that the deal is set, I’ll announce all of the details in the forums, here on the blog, and at dconf.org. So keep your ears open!

Thanks again to Funkwerk for helping us out.

Next time

The next big news roundup will come in late August or early September, but I’ll keep the blog updated with announcements before DConf as they come. If you are planning to attend DConf, then I’m looking forward to seeing you in London. And if you aren’t, then change your plans!

D News Jan-Mar 2022: SAOC 2021, D 2.099.0, DConf ’22


The first three months of 2022 brought some major milestones:

  • Symmetry Autumn of Code 2021 came to an end on January 15, but the judges didn’t render a decision until the middle of February. And what a surprise it was!
  • The D Language Foundation announced in January that we were hiring for a vacant position sponsored by Symmetry Investments, and in February we found the person to fill it.
  • Also in February, we made a long-awaited announcement regarding DConf.
  • In early March, D 2.099.0 was released.

That’s a pretty solid start to 2022, and most of it was made possible thanks to the generous contributions of Symmetry Investments. If you’re looking for a job, Symmetry is always hiring, including D programmers!

And now on with the news.

Symmetry Autumn of Code 2021

We started SAOC 2021 with five participants, each working on projects that would be of value to the D community. Three of them were unable to make it to the end. So it came down to two: Teodor Dutu and Luís Ferreira. Teodor was working on converting DRuntime hooks to templates, and Luís on getting support for D into LLDB, the LLVM debugger.

SAOC is sponsored by Symmetry Investments. Each year, participants promise to work on their projects at least 20 hours per week across four month-long milestones. At the end of each of the first three milestones, a panel of judges evaluates their progress to decide if they pass or fail. A passing participant is awarded a $1000 payment and allowed to continue in the next milestone. A failing participant might be given a reduced payment or none at all, and removed from the event or given a warning, depending on the circumstances leading to the failure. At the end of the fourth milestone, the judges evaluate the overall progress of each participant across the entire event and select one to receive a final $1000 payment and a free trip to DConf.

For the first time in four editions of the event, the SAOC 2021 judges were unable to agree on who should receive the final rewards. It was a three-judge panel, each of whom is a veteran of every edition of SAOC: Jon Colvin, Átila Neves, and Robert Schadek. Two of them were split, one favoring each participant, and the third felt there wasn’t enough to make either of the two participants stand out above the other. Teodor and Luís both did their work, wrote detailed milestone reports, and kept up with their forum updates to the same degree. So the conflicted judge took a proposal to Laeeth Isharc of Symmetry: why not award both candidates the final payment and the DConf trip?

Congratulations to Teodor and Luís on being the first dual recipients of the final SAOC reward. They have continued working on their projects, and we look forward to seeing the work they do in the future. Thanks to all of the SAOC participants, mentors, and judges, and to Symmetry Investments for sponsoring the event every year.

The New Pull Request and Issue Manager

For over a year, Razvan Nitu has been working hard at closing Bugzilla issues and merging pull requests in his role as our Pull Request and Issue Manager. His position is sponsored by Symmetry Investments, which provided funding for two such positions. Unfortunately, real-world circumstances conspired to prevent the person selected for the second position from filling it, so it remained vacant through most of 2021.

At the beginning of this year, Symmetry committed to continuing funding for both positions (as well as a different position, that of my assistant, filled by Max Haughton). In January, we put out a call for applications. In February, we announced that Dennis Korpel was selected for the job. His proven track record as a volunteer contributor to the core D repositories made him the top contender.

Dennis officially started his new job on March 1, and he hit the ground running. We’re happy to have him on board.

Tell them about it–#dbugfix

Razvan and Dennis are here to make sure the bugs are fixed and pull requests are merged. If you have an issue that’s bugging you because it’s been open for ages, or if you feel like a pull request should be getting more attention, let them know! That’s what they’re here for.

One way you can do that is by tweeting the issue number along with #dbugfix. We initiated this hashtag a while back so that D users could bring attention to specific issues, but then the hard part was finding someone with the time and inclination to fix it. Now, with both Razvan and Dennis paid to make sure issues get fixed, the hard part is a lot easier. You can also post about issues in the forums or email social@dlang.org, and I will make sure that they see it.

Razvan and Dennis have their criteria for deciding their priorities in the absence of input, but if you bring an issue or PR to their attention, they will work to resolve it as quickly as they can.

D 2.099.0

Version 2.099.0 of DMD, the reference D compiler, was released on March 6. This is a massive release, containing 20 major changes and 221 closed Bugzilla issues from 100 contributors. Some highlights from this release: D modules can be imported into C code via ImportC; D now has throw expressions; and PE/COFF output is now the default in DMD on Windows. See the changelog for the complete list.

Import modules in C source code with ImportC

ImportC is proving to be a valuable addition to D. Once all the kinks are ironed out and a solution for handling C preprocessor directives is implemented, the need for bindings to C libraries will largely disappear—you’ll be able to bring C headers, and compile C source files, directly into your D programs without any external tools.

As of D 2.099.0, you can also bring D modules directly into C files via the __import keyword.

// dsayhello.d
import core.stdc.stdio : puts;

extern(C) void helloImport() {
    puts("Hello __import!");
}

// dhelloimport.c
__import dsayhello;
__import core.stdc.stdio : puts;

int main(int argc, char** argv) {
    helloImport();
    puts("Cool, eh?");
    return 0;
}

Compile with:

dmd dhelloimport.c dsayhello.d

You can also use it to import C modules that have been compiled via ImportC:

// csayhello.c
__import core.stdc.stdio : puts;

void helloImport() {
    puts("Hello __import!");
}

// chelloimport.c
__import csayhello;
__import core.stdc.stdio : puts;

int main(int argc, char** argv) {
    helloImport();
    puts("Cool, eh?");
    return 0;
}

Compile with:

dmd chelloimport.c csayhello.c

The throw expression has been implemented

For all of D’s lifetime, throw has been a statement and only a statement. It couldn’t be used in expressions because expressions must have a type, and since throw doesn’t return a value, there was no suitable type. This prevented it from being used with the following syntax:

(string err) => throw new Exception(err);

And required this form instead:

(string err) { throw new Exception(err); }

DIP 1034, which introduced a bottom type to the language, provided the means to enable throw expressions: when “a throw statement is seen as an expression returning a bottom type”. As of D 2.099.0, the following code snippet compiles:

void foo(int function() f) {}

void main() {
    foo(() => throw new Exception("error")); // Exception requires a message argument
}
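A throw expression has the bottom type, so it can also appear where a value of some other type is expected, such as one branch of the conditional operator. Here is a minimal sketch with a hypothetical lookup helper:

```d
import std.exception : assertThrown;

// Hypothetical helper: returns the value for key, or throws if it is absent.
// The throw expression is the third operand of ?:, which only compiles
// once throw can be used as an expression (D 2.099.0 and later).
int lookup(int[string] table, string key)
{
    auto p = key in table;   // pointer to the value, or null
    return p ? *p : throw new Exception("missing key: " ~ key);
}

unittest
{
    int[string] ages = ["alice": 30];
    assert(lookup(ages, "alice") == 30);
    assertThrown!Exception(lookup(ages, "bob"));
}
```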

PE/COFF is the default DMD output on Windows

For many years, DMD output object files in the OMF format on Windows. There’s a story behind this, a large part of it related to the culture of software development on Windows, but it can be summarized in two bullet points:

  • Walter Bright already had a C compiler backend that generated OMF output, a license to distribute OMF link libraries for the Win32 API, and a linker that understands OMF (OPTLINK).
  • There was no de facto system linker on Windows when he started working on D in 1999, so he could not rely on a specific linker being installed.

Reusing the compiler backend and the linker allowed Walter to distribute DMD as a compiler that worked out of the box, without the need to install any further development tools. He felt this was important for D’s early adoption. The downside was that it also restricted DMD on Windows to 32-bit. Eventually, he had to support PE/COFF and require the Microsoft linker in order to support 64-bit output, and he implemented PE/COFF 32-bit at the same time, but he was adamant that DMD continue to work out of the box for those who didn’t want to install the Microsoft Build Tools (for the linker) and Windows SDK (for the Win32 link libraries).

Eventually, OPTLINK started showing its age. Linker errors became more common as D codebases grew. There were calls to enable PE/COFF by default. Finally, someone raised the idea of shipping the LLVM linker, LLD, along with link libraries generated from the MinGW project. This would allow DMD to eventually default to PE/COFF while maintaining the out-of-the-box experience.

DMD has been shipping with LLD for several releases, and it seems enough of the kinks have been worked out that it has been ready to become the default for a while now. Nicholas Wilson finally took the step to make that happen, Walter eventually gave it his blessing, and now PE/COFF is the default DMD output on Windows.

In practice, this means that the -m32mscoff switch has been deprecated, -m32 now specifies PE/COFF, and the new switch -m32omf can be used to produce OMF output if needed (though OMF support will eventually be dropped). The -m64 switch has always produced PE/COFF output, so it has not changed.

LDC

The beta release of LDC 1.29.0 was announced on March 10. This version of the LLVM-based D compiler is based on D 2.099.0+. It includes support for LLVM 13, no longer defaults to the ld.gold linker on Linux (LLD is recommended), and includes a breaking change for the extern(D) ABI. See the full release log for details.

DConf ’22 in London

After an unexpected and unwanted hiatus, DConf is returning to the real world! Hosted once again by Symmetry Investments, we’ll be in London, Aug 1–4, 2022. We’re currently accepting submissions and early-bird registration is open.

Guest keynote speaker

Our guest speaker this year is Roberto Ierusalimschy, Associate Professor at the PUC-Rio Department of Informatics and head designer of the Lua programming language. We’re excited that he’s able to join us. Several D community members have used or are using Lua in their D projects, including the gas dynamics toolkit at the University of Queensland that its maintainers wrote about on this blog. (You can also count me in that group. I’ve used Lua in different capacities over the years, and I maintain a set of D bindings for Lua’s C API).

Roberto was the mentor who shepherded the Origins of the D Programming Language paper through the HOPL IV conference, so he already has a connection to the D community.

I don’t know yet if his talk will be related to Lua, but I’m looking forward to hearing what he has to say.

Registration

Early-bird registration is open until May 31. The base early-bird rate is $352.75 ($423.30 after applying 20% VAT), which is 15% off the general registration of $415 ($498 with 20% VAT). We offer a student discount, a discount for major open source contributors, and a hardship rate. You can register now or learn about the discounted rates at dconf.org.

Talks

At past editions of DConf, we’ve allotted talks in 50-minute blocks with 10-minute breaks in between. This year, we’re cutting that down: we’d like to keep the talks no longer than 40–45 minutes. Part of the magic of DConf is the time spent interacting face-to-face with other D enthusiasts, so it only makes sense to make as much room for that as we can while still allowing for educational and informative presentations.

If you have something related to the D programming language that you’d like to share with the world, please send in a submission. Don’t know what to talk about? Then heed Ali Çehreli, from one of his DConf Online 2020 Q & A sessions:

Coming up with an idea for a talk is as simple as the way you use D. Just look at your code, and it makes a presentation…

If you have used the D programming language, then you have material for a talk: describe your project; talk about specific problems you solved or interesting ways in which you’ve employed language features; expound on the ups and downs of your experience learning D so that others can benefit; and so on. Take a look at the DConf and DConf Online talks available on our YouTube channel for inspiration. Even if you’ve never presented at a conference, we encourage you to send us a submission! Several D community members have given their first presentation at DConf, and we are always happy to see more.

The worst that can happen when you submit a talk is that it isn’t accepted. But if it is accepted, then you’ll be entitled to reimbursement for your transportation to and from London, and your lodging for the five nights of the conference. You get to hang out with people who share your interest in D and most of your expenses are covered, with nothing to lose if your talk isn’t accepted.

Don’t let doubt or hesitation hold you back. You can find submission details at dconf.org.

Venue

DConf ’22 is taking place at a nifty venue called CodeNode, between Moorgate and Liverpool Street Stations. All of our talks will be in their CTRL room on the first floor, and we’ll have the basement ESC room to ourselves for mingling between talks and during lunch. They have table tennis and foosball tables, and plenty of space in which to chill.

CodeNode isn’t far from our DConf 2019 venue, so the same budget hotels we stayed at then are also within walking distance this year. You can find a list of those and several other budget hotels in the area at dconf.org.

BeerConf!

For every edition of DConf before 2019, we designated one area hotel as the official gathering spot. Many attendees would take rooms there, and a number of us would gather in the evenings in the hotel lobby or bar to chat over drinks and snacks. In one of our Berlin editions, Ethan Watson coined the term “BeerConf” to refer to these evening meetups. In 2019, we couldn’t find a suitable hotel in which to gather, so we hired space in a pub near the venue. When DConf was canceled in 2020, a couple of community members hosted an online BeerConf to make up for the loss of the real-world version, and they’ve been hosting it every month since.

This year, since we’re back in the same part of London, we’re again looking for a space we can rent for BeerConf. We’ve got our eyes on a couple of spaces, and we’re working to secure funding. I hope to have an update on that before the end of April.

In the meantime, keep an eye on the D Announce forum for news of our monthly online version of BeerConf, and consider picking up a BeerConf shirt from our DLang Swag Emporium!

Looking ahead

We’re looking forward to the rest of 2022. One of our big goals for this year is to lay the groundwork for bringing more structure and organization to the D ecosystem. The PR/Issue managers have made a big difference and brought order to a chaotic contribution process, but we still have a long way to get to where we’d like to be.

Soon, I’ll start publishing tutorials on the foundation’s YouTube channel. These tutorials are going to cover more than just the language syntax and semantics. They’ll also dive into the tools we use as D programmers: compilers, linkers, loaders, object files, etc. These days, it’s not unusual for a programmer new to D to have gone years without ever touching a programming language that uses the same compile-link model. Questions about static linking errors, or confusion about compiler vs. linker errors, are not uncommon. These tutorials will be short and focused on specific topics, and will hopefully serve as a means for new D programmers to up their game with the tools they use.

Once I’ve uploaded the tutorials, I’ll apply for our channel to join the YouTube Partner Program so that we can start raising money from the channel. We’re eligible now, but I don’t want to apply until I’ve established a more frequent pattern of updates.

On that note, I’d like to remind you that the D Language Foundation is available to select as a charity for the Amazon Smile program. When you shop via smile.amazon.com, selecting the D Language Foundation as your preferred charity allows us to receive a small percentage of your payment. If you shop at Amazon, it’s an easy way to support the D Language Foundation. You can find browser extensions that will redirect you to smile.amazon.com every time you visit amazon.com, such as Amazon Smile Redirect, which is available for Chrome/Edge and for Firefox. (Amazon Smile charities are domain-specific, so the D Language Foundation is only available through Amazon’s .com domain).

You can also support us by shopping at the DLang Swag Emporium or donating directly via one of the options listed at dlang.org.

We can’t wait to see you in London!

Reducing Template Compile Times

Templates have been enormously profitable for the D programming language. They allow the programmer to generate efficient and correct code at compile time. Long gone are the days of preprocessor macros or handwritten, per-type data structures. D templates, though designed in the shadow of C++ templates, were not made in their image. D makes templates cleaner and more expressive, and also enables patterns like “Design by Introspection”.

Here is a simple example of a template that would require the use of preprocessor macros in C or C++:

template sizeOfTypeByName(string name)
{
  enum sizeOfTypeByName = mixin(name, ".sizeof");
}
unittest
{
  assert(sizeOfTypeByName!"int" == 4);
}

D’s templates are powerful tools but should not be used unthinkingly. Carelessness could result in long compile times or excessive code generation.

In this blog post, I introduce some simple concepts that can help in writing templates that minimize resource usage. Deeper intuition can also lead to the discovery of new abstractions or increased confidence in existing ones.

Read the memo

The D compiler memoizes template instantiations: if I instantiate MyTemplate!int once, the compiler produces an AST for that instantiation; if I instantiate that exact template again, the previous computation is reused.

As a demonstration, let’s write a generic addition function and use pragma(msg, ...) to print the number of instantiations at compile time. I’m going to use it twice with integers and twice with floating point numbers.

import std.stdio : writeln;

auto genericAdd(T)(const T x, const T y)
{
  pragma(msg, "genericAdd instantiated with ", T);
  return x + y;
}

void main()
{
  // Instantiate with int
  writeln(genericAdd!int(4, 5));
  writeln(genericAdd!int(6, 1));

  // Now for the float type
  writeln(genericAdd!float(24.0, 32.0));
  writeln(genericAdd!float(0.0, float.nan));
}

Now let’s compile the code and look at what our pragma(msg, ...) says about template instantiations in the compiler.

dmd -c generic_add.d

This yields the following output during compilation:

genericAdd instantiated with int
genericAdd instantiated with float

We can see int and float as we expected, but notice that each is only mentioned once. Newcomers to languages with templates or generics can sometimes mistakenly think that using a template requires a potentially expensive instantiation on every use in the source code. For the benefit of those new users of D, the above is categorical proof that this is not the case; you cannot pay twice for templates you have already asked the compiler to instantiate. (You can, however, convince yourself that you are asking the compiler to do something it’s already done when you in fact are not. We’ll go over contrived and real-world examples of this later in this article.)

The benefit of this feature should be obvious, but what may not be obvious is how it can be employed in writing templates. Within the bounds of our desire for ergonomics, we should design the interfaces of our templates to maximize the number of identical instantiations.
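Here is one way to apply this, sketched under the assumption that type qualifiers don't change the template's result. A thin front end strips qualifiers with std.traits.Unqual before forwarding, so int, const(int), and immutable(int) all share one instantiation of the expensive template. Expensive and byteSize are hypothetical names.

```d
import std.traits : Unqual;

// Hypothetical stand-in for an expensive template. Each distinct argument
// produces a new instantiation, announced at compile time.
template Expensive(T)
{
    pragma(msg, "Expensive instantiated with ", T);
    enum size = T.sizeof;
}

// Thin front end: strips const, immutable, shared, etc. before forwarding,
// so qualified variants of a type reuse one instantiation of Expensive.
enum byteSize(T) = Expensive!(Unqual!T).size;

static assert(byteSize!int == 4);
static assert(byteSize!(const int) == 4);
static assert(byteSize!(immutable int) == 4);
```

The front end byteSize is still instantiated once per argument type, but it is trivial. The pragma(msg, ...) in Expensive should fire only once, for int.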

What’s in a name?

The following example, adapted from a real-world change to a large D project, yielded a reduction in compile time of a few percent for unit-test builds.

Let’s say we have an expensive template whose behavior we want to test over a simple type. Our type might be:

struct Vector3
{
  float x;
  float y;
  float z;
}

To demonstrate the phenomenon, we don’t have to do anything fancy, so we’ll just declare a stub called send.

// Let's say this sends a value of type T to a database.
void send(T)(T x);

A note on syntax: Given a variable val of type int, this template could be explicitly instantiated as send!(int)(val). However, the compiler can infer the type T, so we can instantiate it as if it were a normal function call as send(val). Using D’s Uniform Function Call Syntax, we could alternatively call it like a property or member, as val.send() (the approach used in the following example), or even val.send, since parentheses are optional in function calls when there are no arguments.
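Here is a quick check that all of these spellings call the same function. It uses a hypothetical send that returns a description instead of sending anything anywhere:

```d
import std.conv : to;

// Hypothetical stand-in for send: instead of talking to a database,
// it returns the type name and the value as text.
string send(T)(T x)
{
    return T.stringof ~ ": " ~ x.to!string;
}

unittest
{
    int val = 3;
    // Four spellings, one instantiation: send!int.
    assert(send!(int)(val) == "int: 3");
    assert(send(val) == "int: 3");
    assert(val.send() == "int: 3");
    assert(val.send == "int: 3");
}
```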

Our test might then be something like:

struct Vector3
{
  float x;
  float y;
  float z;
}

unittest
{
  Vector3 value;
  value.send();
}

This is reasonable so far. However, an issue arises when we start to write more than one test. If we want to test different behaviors of a fancy template while instantiating it with the same type, we end up spending more time in compilation than we would have expected. A lot more time. We also see large growth in the number of symbols emitted in the resulting binary, and therefore a larger file size than one would expect. Why is that?

Despite our intuition that the compiler should consider multiple declarations of a type like Vector3 in multiple unittest blocks as identical, it actually does not. We can demonstrate this effect with an extremely simple example. We’ll provide an implementation of send that prints at compile time the type of each instantiation. Then we’ll use static foreach to generate five distinct implementations of a single unit test.

void send(T)(T x)
{
  pragma(msg, T); 
}

// Generate 5 unittest blocks
static foreach(_; 0..5)
{
  unittest
  {
    struct JustInt
    {
      int x;
    }
    JustInt value;
    value.send;
  }
}

This results in the following output from the compiler:

JustInt
JustInt
JustInt
JustInt
JustInt

Huh? Doesn’t this violate our “you can’t pay twice” rule? If you were to take this output from the compiler as gospel, then yes, but there’s a more subtle truth here.

Fully qualified names

The name of a type as you would write it in your editor is not the complete name of a type. Let’s amend the implementation of send to print the return value of a template called fullyQualifiedName rather than printing T directly. The rest of the example remains the same.

void send(T)(T x)
{
  import std.traits : fullyQualifiedName;
  pragma(msg, fullyQualifiedName!T); 
}

Assuming the module is named example, this yields something like:

example.__unittest_L13_C3_1.JustInt
example.__unittest_L13_C3_2.JustInt
example.__unittest_L13_C3_3.JustInt
example.__unittest_L13_C3_4.JustInt
example.__unittest_L13_C3_5.JustInt

This explains our previous conundrum. By declaring the type locally in each test, we have actually declared a new type per test, each of which results in a new instantiation.

A type’s fully qualified name includes the name of its enclosing scope ({package-name.}module-name.{scope-name(s).}TypeName). The compiler rewrites each unittest as a unique function with a generated name. We have five unique functions, each with its own local, distinct declaration of a JustInt type. And so we end up with five distinct types.
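The same effect can be reproduced without unittest blocks. In this sketch, two ordinary functions stand in for the compiler-generated unittest functions. Their names, makeInFirst and makeInSecond, are hypothetical. Each function declares its own JustInt. The short names compare equal, but the types are distinct, so a template such as send would be instantiated once for each of them.

```d
// Two functions standing in for two unittest blocks; each declares its
// own JustInt with identical members.
auto makeInFirst()
{
  static struct JustInt { int x; }
  return JustInt(1);
}

auto makeInSecond()
{
  static struct JustInt { int x; }
  return JustInt(1);
}

// The short names printed by pragma(msg, T) are the same...
static assert(typeof(makeInFirst()).stringof == typeof(makeInSecond()).stringof);

// ...but the enclosing scopes differ, so these are two distinct types.
static assert(!is(typeof(makeInFirst()) == typeof(makeInSecond())));
```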

We want to ensure that one instantiation is reused across unittest blocks. We do that by moving the declaration of JustInt to module scope, outside of the unit tests.

struct JustInt
{
  int x;
}

static foreach(_; 0..5)
{
  unittest
  {
    JustInt value;
    value.send;
  }
}

The send template now prints:

example.JustInt

Much better.

Some hard data

To collect some anecdata about the usefulness of these changes, we’ll look at compilation times and the size of compiled binaries. Since this template is trivial, let’s generate a hundred copies of the same unittest rather than five so we can see a trend.

On my system, timing the compilation of our programs shows the locally declared types took 243ms to compile, but the version with a single global type declaration took 159ms to compile. A difference of 84ms is not all that much, sure, but in a large codebase, there may be a lot of these speedups waiting to be found. Any reduction in compile times is to be embraced, especially when it’s cumulative.

As for binary size, I saw a savings of 69K on disk. The quantity of machine code generated by the compiler is worth keeping a close eye on. Larger binaries mean more work for the linker, which in turn means more time waiting for builds to complete. The easiest job is the one you don’t have to do.

A more complex example

The following example demonstrates a very simple but fundamental change to a template that yields an enormous improvement in compile times and other metrics.

Let’s say we have a fairly simple interpreter, and we want to expose functions in our D code to the scripts executed by our interpreter. We can do that with some sort of registration function, which we’ll call register.

The signature of the register function

To prove the point I’m discussing, we don’t need to implement this function; its interface alone is what can cause a big slowdown.

Let’s say our register function looks like this:

// Context is something our hypothetical interpreter works with
void register(alias func, string registeredName)(Context x); 

It’s pretty reasonable, right? It takes a template alias parameter that specifies the function to call (a common idiom in D) and a template value parameter of type string that represents the name of the function as it is exposed to scripts. The implementation of register will presumably map the value of registeredName to the func alias, and then scripts can call the function using that value. Functions can be registered with, e.g., the following:

context = createAContext();
context.register!(writeln!string, "writeln")();

The scripts can call the Phobos writeln function template using the name writeln.

The compile-time performance of the register template

The interface for register looks harmless, but it turns out that it has a significant impact on compile time. We can test this by registering some random functions. The actual contents of the functions don’t matter—this article is about template compile times, so we just want a baseline figure for roughly how much time the infrastructure templates take to compile rather than the code they are hooking together.

Although we will pull the functions out of a hat, the key observation driving our intuition is that a small number of function signatures will likely be reused many times. We could start with a basket of interface stubs like this:

int stub1(string) { return 1; }

int stub2(string) { return 2; }

int stub3(string) { return 3; }

// etc.

More broadly, with a bunch of functions that have identical signatures and a bunch of functions with random parameter lists and return types, we can get a rough baseline. With the set of stubs I used, compile times ended up at roughly 5 seconds.

So what happens if we move the compile-time parameters to run time? Since registeredName is a template value parameter, we can just move it into the function parameter list with no change. We have to handle the func parameter differently. Almost any symbol can bind at compile time to a template alias parameter, but symbols can’t bind at run time to function parameters. We have to use a function pointer instead. In that case, we can use the type of the referenced function as a template parameter.

void register(FuncType)(Context x, FuncType ptr, string registeredName);

With this signature, the compile time drops to roughly 1 second.

What’s going on?

D is a fairly fast language to compile. Good decisions have been made over the lifetime of the language to make that possible. It is also the case that one can happily write slow-to-compile D code. Although we are choosing to ignore the compilation speeds of the non-infrastructure code to simplify the point being made, this can actually (in a certain sense) be the case in real projects, too. As such, it is worth paying attention not to the quantity of metaprogramming the code expresses semantically, but to the quantity of metaprogramming the compiler actually performs.

In this case, with the first interface used for register, the compiler had no opportunity to reuse any instantiations. Because it accepted the registered functions as symbols, each instantiation was unique. By shifting instead to take the type of the registered function as a template parameter and a pointer to the function as a function parameter, the compiler could reuse instantiations. stub1, stub2, and stub3 are distinct symbols, but pointers to them all have the same type, int function(string).
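Here is a minimal sketch of the two interfaces side by side, under some assumptions. Context is a hypothetical class holding a name-to-function table. It is a class so that passing it by value, as in the original signature, still shares the table. Only the int function(string) shape is handled, to keep things short. The names registerAlias and registerPointer are illustrative. With the alias version, each registered symbol yields its own instantiation. With the pointer version, all three stubs share registerPointer!(int function(string)).

```d
// Hypothetical interpreter context: maps script-visible names to functions.
// A real implementation would need to wrap arbitrary signatures.
class Context
{
  int function(string)[string] table;
}

int stub1(string) { return 1; }
int stub2(string) { return 2; }
int stub3(string) { return 3; }

// Compile-time interface: every distinct (func, registeredName) pair
// creates a new instantiation of this template.
void registerAlias(alias func, string registeredName)(Context x)
{
  x.table[registeredName] = &func;
}

// Run-time interface: one instantiation per distinct function *type*.
void registerPointer(FuncType)(Context x, FuncType ptr, string registeredName)
  if (is(FuncType == int function(string)))
{
  x.table[registeredName] = ptr;
}

unittest
{
  auto ctx = new Context;
  ctx.registerAlias!(stub1, "one")();
  ctx.registerPointer(&stub2, "two");   // registerPointer!(int function(string))
  ctx.registerPointer(&stub3, "three"); // same instantiation reused
  assert(ctx.table["three"]("") == 3);
}
```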

To be clear, this is not an indictment of template alias parameters. There are good use cases for them (the Phobos algorithms API is an example). The point of this example is to show how the compile-time costs of unique template instantiations can be hidden. A decision about the trade-offs between compile-time and run-time performance can only be made if the programmer is aware there is a decision to make. So when implementing a template, consider how it will be used. If it’s going to end up creating many unique instantiations, then you can weigh the benefits of keeping that interface versus redesigning it to maximize reuse.

A false friend

In linguistics, a false friend is a pair of words from two different languages that look the same but have different meanings. I’m going to abuse this term by using it to refer to a pair of programming patterns that actually result in the same program behavior, but via different routes through the language implementation, i.e., one of these patterns has, say, worse performance or compile times than the other.

A simple example:

Let’s say we have a library that exposes a template as part of its API, like this one that prints a string when a module is loaded during run-time initialization:

template FunTemplate(string op)
{
  shared static this()
  {
    import std.stdio;
    writeln(op);
  }
}

// Use like this
mixin FunTemplate!"Hello from DLang";

Now let’s say we want to refactor the library in a way that makes it desirable to separate the name FunTemplate from its implementation. How would you go about doing that?

One way would be to tack Impl onto the implementation name, then declare an eponymous template that aliases the shortened name to the implementation name and forwards the template argument like this:

template FunTemplate(string op)
{
  alias FunTemplate = FunTemplateImpl!op;
}

This does the job, but it also creates an additional instantiation for each distinct value of op: one instance of FunTemplate and one of FunTemplateImpl. So if we instantiate it with, e.g., five different values, we end up with ten unique instantiations. Now imagine doing that with a template that’s heavily used throughout a program.

Since we only want to provide an alternate name for the implementation and aren’t doing anything to the parameter list, we can achieve the same result without adding another template into the mix: just use alias by itself.

alias FunTemplate = FunTemplateImpl; 

Since FunTemplate is no longer a template, FunTemplate!"Foo" only creates the one instance of FunTemplateImpl.
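The difference can be checked at compile time. In this sketch, FunTemplateImpl carries a hypothetical enum instead of the shared static this(), so the result can be inspected. Forwarding is a hypothetical name for the eponymous-template approach. The plain alias is the very same template symbol as FunTemplateImpl, while Forwarding is a separate template that must itself be instantiated for each op.

```d
// Hypothetical implementation template; an enum replaces the module
// constructor so the result can be checked with static assert.
template FunTemplateImpl(string op)
{
  enum message = op;
}

// Forwarding version: a second template, instantiated once per op
// in addition to FunTemplateImpl!op.
template Forwarding(string op)
{
  alias Forwarding = FunTemplateImpl!op;
}

// Plain alias: just another name for the same template.
alias FunTemplate = FunTemplateImpl;

static assert(__traits(isSame, FunTemplate, FunTemplateImpl));
static assert(!__traits(isSame, Forwarding, FunTemplateImpl));

// Both spellings produce the same observable result.
static assert(FunTemplate!"Foo".message == "Foo");
static assert(Forwarding!"Foo".message == "Foo");
```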

Normalization of template arguments

Once we know what we want a template to look like, and we’re satisfied with the interface we want it to have, there are sometimes subtle ways to separate the interface and implementation of a template such that we can minimize the total amount of work the compiler has to do.

The definition of “work” in this context can be important to consider, as we can find ways to balance a tradeoff between compile times and the amount of object code generated for each instantiation. One technique to reduce these costs is to normalize a given list of template arguments into something called a canonical form.

Canonical Forms

A canonical form, resulting from a process called canonicalization, is a mathematical structure that is intended to reduce multiple different-looking but equivalent objects to one form that we can then manipulate as we see fit. Using an automatic code formatter is an example of transforming input (in this case, source code) into a canonical form.

Application to templates

Consider a template like this one:

template Expensive(Args...)
{
  /* Some kind of expensive metaprogramming or code generation */
}

If we can think of a useful canonical form that isn’t too hard to compute, we can then write a second template Reduce to implement it, then inject it as in the following example.

template Expensive(Args...)
{
  // Reduce to some kind of canonical form
  alias reduced = Reduce!(Args);

  // Where ExpensiveImpl is the same as above but renamed
  alias Expensive = ExpensiveImpl!(reduced);
}

To be worth doing, ExpensiveImpl must be significantly more expensive than the reduction operation (pay attention to differences in sys time when measuring this), where “significant” is meant statistically rather than informally, i.e., any win is good as long as you can rigorously prove it’s real.

An example: sorting template arguments

Take a templated aggregate like this:

struct ExposeMethods(Types...)
{
  /* Some kind of internal state dependent on Types but not their order */

  static foreach(Type; Types) {
    bool test(Type x) {
      /* Something slow to compile depending on Type */
      return true;
    }
  }
}

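If it’s instantiated with, e.g., five different types in varying orders across a large codebase, we could spend a lot of time redoing semantically identical compilation: ExposeMethods!(int, string) and ExposeMethods!(string, int) are distinct instantiations even though, by assumption, the order of Types doesn’t matter. If all possible types are used as input many times, we could end up with several permutations of the same list, and if not, we will probably still get a few identical subsets in different orders.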

A canonical form that might come to mind (i.e., a potential definition of Reduce) is simply to sort the arguments by name, which can be done with staticSort from std.meta.
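The following sketch shows that reduction in practice. It sorts by each type’s mangled name rather than its short name. This is an assumption on my part, chosen because mangled names are unique per type. byMangledName and ExposeMethodsImpl are illustrative names that follow the ExpensiveImpl renaming pattern. The cheap alias front end sorts its arguments, so every permutation of the same types resolves to a single ExposeMethodsImpl instance.

```d
import std.meta : staticSort;

// Hypothetical ordering predicate: compare types by their mangled names.
enum byMangledName(A, B) = A.mangleof < B.mangleof;

// The expensive implementation, renamed as in the ExpensiveImpl pattern.
struct ExposeMethodsImpl(Types...)
{
  static foreach (Type; Types)
  {
    // Stand-in for "something slow to compile depending on Type".
    bool test(Type x) { return true; }
  }
}

// Cheap front end: canonicalize the argument order, then forward.
alias ExposeMethods(Types...) = ExposeMethodsImpl!(staticSort!(byMangledName, Types));

// Both orders now name the same instantiation.
static assert(is(ExposeMethods!(int, string) == ExposeMethods!(string, int)));
```

The sort itself still costs an instantiation per distinct argument list, so this only pays off when ExposeMethodsImpl is much more expensive than staticSort.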

Conclusion

D has powerful metaprogramming and code generation features. But like anything in programming, their use isn’t free. If you want to avoid the situation where you find yourself making coffee while your project builds, then it’s imperative to be aware of the cost vs. benefits of the metaprogramming features you use. Then you can make intelligent decisions about your compile-time interfaces and implementations.

Appendix – Tracing the D compiler to count template instantiations

Here’s a simple lesson in Linux userspace tracing: you can use a tool like bpftrace or DTrace to spy on the D compiler as it compiles other programs. This gives you basic figures about their compilation without either hacking the compiler or changing their build process.

You’ll need a bpftrace file like the following (saved as, e.g., main.bt):

BEGIN
{
  printf("Tracing a D file\n");
}
uprobe:/home/mhh/dlang/dmd-2.097.0/linux/bin64/dmd:_Dmain
{
  printf("This is the main\n");
}
uprobe:/home/mhh/dlang/dmd-2.097.0/linux/bin64/dmd:_D3dmd10dsymbolsem24templateInstanceSemanticFCQBs9dtemplate16TemplateInstancePSQCz6dscope5ScopePSQDr4root5array__T5ArrayTCQEq10expression10ExpressionZQBkZv
{
  //We do nothing with the knowledge here but if you write some code you can get info about the templates relatively easily
  printf("Instantiating a template\n");
}

What am I hooking here? That big mangled name in the middle of the script is templateInstanceSemantic in the dsymbolsem module in the DMD source. By hooking it, we can get a rough idea of when a template is being worked on.

Running it with sudo bpftrace main.bt (eBPF tracing currently requires root) when building DMD, for example, I see there are about 50,800 template hits.

You can use a more complicated script in a system like bcc to reconstruct the compiler’s internal data structures. With that, we can get output a bit more like 09:05:17 69310 b'/home/mhh/d_dev/dmd/src/dmd/errors.d:85' and actually reconstruct the source/line info (alongside a timestamp and PID).

How I Taught the D Programming Language at a Russian University

This article was originally published in Russian by Grigorii Smorkalov. It was translated to English for the D Blog by Georgy Markov and lightly revised from the original by Michael Parker.

This is the fourth year I’m teaching my D Programming Language course at a very real university in Russia. It’s a full-term course with lectures, practical lessons, and exams, although it’s all remote now. This is the story of how I got there, the challenges I encountered, and how students sometimes surpass their teachers.

What’s in D for university students

The job market for D is very small. As I always say during the first lesson, it’s unlikely that students are going to write D code for a salary, but that doesn’t mean that learning D is useless. Firstly, it’s much easier to learn how to program with D than with C or C++. This is important because many students don’t know how to program even after a full C/C++ course. Secondly, a broader outlook makes for better code. My familiarity with D improved my C++ skills and made it much easier to learn Python, especially its iterators. Most importantly, D is the future—of C++ and beyond. Many C++11/17/20 novelties were first battle-tested in D, and even today D is a much more modern and feature-rich language than C++.

Who am I and what’s in D for me

For simplicity, I would say that I am a C++ programmer. This is my main line of work. Before this course, I’d never been a teacher of any sort. Even at my job, I’m not much involved in mentoring. After earning my master’s degree in Computational Mathematics and Cybernetics at UNN (Lobachevsky State University of Nizhny Novgorod) in 2014, I had no connection to academia at all.

In my first months of university, I entertained the thought of becoming a school teacher. Now I realize that this was just a call for justice of sorts. There is a stark difference between a high school and a university, and for me, the latter was a much better experience. It felt like in just one month at university I learned more than in a whole term in high school, and that was so cool. “Why couldn’t they explain it this way back in high school?” was a common thought of mine. If only educators applied the same methods in regular schools, the experience would be much better. My desire to become a teacher vanished the second I received my first paycheck as a programmer (in Russia, ‘programmer’ is a very well-paid job and ‘school teacher’ is the total opposite), but some memory of that desire remained.

This was also the time when I became interested in D. Compared to C++, it looked like a perfect programming language: you could write code that was just as fast, but without all those C atavisms. I used D for my master’s thesis, and I loved it. My program was half the size of the older C++ version and much simpler, while performing better. Implementing complex and more efficient algorithms in D was much easier; doing the same in C++ would be too much work, and, like any student, I always struggled with my deadlines.

Since then I’ve followed D and used it for my little pet projects. I’ve always wanted to help the D community in some way, but I couldn’t write a useful library, nor could I find the motivation to contribute to open source.

Beginnings

In 2018, an unusual offer surfaced on the D mailing list: does anyone by any chance want to teach students in Moscow, Russia?

The initiator was Dmitry Olshansky, a well-known member of the D community, the creator of std.regex and more. I contacted him and said I was interested. I didn’t see myself as a full-fledged lecturer and expected to be just an assistant. At the beginning that was the plan, with someone else acting as a lecturer. He was going to give lectures remotely via Skype, and I would assist him on site.

To my surprise, the university in question was RSUH, Russian State University for the Humanities. As it turned out, they do have technical faculty there, and the students do actually code. I checked their program: D was introduced for third-term students, and during the first two terms they learned C, C++, Prolog, and even some Lisp, I think (a bit too much, but why not). Their math course was solid, too (yes, I am among those who think that math is important for programmers).

Preparations

I was introduced to the department staff. They explained everyone’s responsibilities and even offered me the opportunity to join in scientific work as a programmer. We started working on the course program, although I barely included myself in that “we”. That was a mistake. With one month left until classes started, the lecturer suddenly pulled out. The news took me by surprise, but… there was still plenty of time, right?

This was happening when it was time for me to complete all the formalities and start work. Though I knew they were hiring only me, for some reason I was still under the impression that I wouldn’t be alone. I can’t say why I thought so. Everything was saying that there would be no help and that I had to do the whole course by myself, but my impression was hard to shake. The grave realization only came one week before Day One. Only then did I start to prepare for real.

Bureaucracy

This is supposed to be the part about the trials and tribulations of the endless bureaucracy awaiting a poor programmer’s soul. The amount of paperwork required to sign the contract was indeed an entertaining story to tell to my peers. And everyone was, as they say, “rolling on the floor laughing” when I described applying for a Mir payroll card (Mir is the Russian national payment system mandated in the state-funded sector). But that’s it, actually.

The next bureaucratic task was composing the formal program, an official paper including the course program and the materials. There were indeed a lot of formalities there, but I got some help: they showed me the paper for a very similar course. At the end of the day, it was easier than I expected. For this, I should thank the university staff. It was a one-year contract. Renewing it every year only takes one piece of paper and a couple of pen strokes.

The first class

The schedule was set up such that my first class happened to be a seminar (a.k.a. a practical lesson) instead of a lecture. This was only on paper, though; with only one group of just 14 people, it didn’t make any difference. The schedule was adjusted later. I got two classes in a row and could decide which was a lecture and which was a seminar. I came up with the following arrangement: in the first class, I would lecture and answer general questions, and in the second, the students would program and ask practical questions.

For the first lesson, I prepared a brief description of the language, a syntax overview, a compiler, and several problems to solve: calculating the area of a triangle, solving a quadratic equation, and similar problems that yield a simple output for simple input. The idea was to immerse students in the language and immediately give them something to do with it. The plan was a success. By the end of the class, most of the students had gotten the hang of using a command-line compiler, written some code in a text editor, and solved some problems.
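To give a flavor of those first exercises, here is a sketch of one of them: the area of a triangle from its three side lengths, using Heron’s formula. The function name is illustrative. This is not presented as the course’s reference solution.

```d
import std.math : sqrt;

// Area of a triangle given its three side lengths (Heron's formula).
double triangleArea(double a, double b, double c)
{
    immutable s = (a + b + c) / 2; // semi-perimeter
    return sqrt(s * (s - a) * (s - b) * (s - c));
}

unittest
{
    assert(triangleArea(3, 4, 5) == 6); // the classic 3-4-5 right triangle
}
```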

Notepad and the command line

I bet many people would take issue with the command line and text editor part. Seriously? No IDE? IDEs for D exist, but I left it up to the students if they wanted to use them. The reason was my own experience in learning C++ and programming in general. Knowing how a compiler works and how to link several files into a project is integral to understanding the language as a whole.

This is especially true for C and C++. Things like the difference between declaration and definition lose their meaning without understanding the build process. In D, such nuances are fewer. For example, header files are not required. Still, I meant to teach how to program in D, not how to use an IDE. Things like the principles of import are easier to grok when using a command-line compiler rather than some “intuitive interface”. And you learn the syntax faster if you write the code by hand without autocomplete doing it for you.

Further development

Lifted by the success of the first seminar, I was slammed to the ground by the first lecture. It contained the full theoretical explanation of type systems and their various types, compared D with other languages, brought out the problems of C and C++, and demonstrated what makes D different. It was a total failure.

I expected the lecture to take the whole 80-minute class, including questions, but it finished in less than an hour without a single question during or after the speech. I even asked if the students couldn’t hear or understand me, or if there was a problem with my diction. But the problem was with the lecture itself. First, it was too much for one class. Second, the lecture was based on the talks I did for my job that were intended for seasoned programmers. I realized that everything that I’d prepared for my future lectures must be tossed aside and rewritten from scratch.

The new program

The first candidate for simplification, and by that I mean expunging, was metaprogramming. Getting rid of templates was impossible since even the most basic language features and algorithms are tied to them, but code generation of all sorts was the first to be removed. Following that was anything that required external libraries. What was left were things that D code can’t be written without.

Since I had better success with live communication during actual programming, I decided to focus on that. Rewriting the course on the fly was tricky, but I came up with a solid plan: throughout the semester we would write a complex calculating program, during lectures we would study language features required for a given task, and then during seminars the ideas would be transformed into code.

The semester assignment

If you’re a nerd like me, you’ve probably heard of the 10,958 problem. The gist of it is that you need to place operators and parentheses between the digits 1 to 9, taken in order (adjacent digits may be joined into multi-digit numbers), so that the result is exactly 10,958. A solution exists for every number up to 11,000 except 10,958. There’s no proof that it doesn’t exist either, hence the 10,958 problem.

I gave the students an assignment to write a program that would find the solution by brute-forcing every possible combination of operators and parentheses. All calculations were done with double instead of some sort of bignum, so it’s not a real solution, but it’s simpler this way.
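Below is a heavily simplified, hypothetical sketch of such a brute force. It is not the course’s reference solution. Digits stay in order. A run of digits is either concatenated into one number or split in two, with the halves recombined using +, -, * or /. As in the assignment, values are doubles. Results are stored as associative-array keys, which drops duplicate values and is a crude way of cutting off equal variants. Exponentiation and any smarter pruning are left out, and without memoization this approach blows up quickly for nine digits.

```d
// All values reachable from the given digits, kept in their original order.
bool[double] reachable(const(int)[] digits)
{
    bool[double] result;

    // Concatenate the whole run into one number, e.g. [1, 2, 3] -> 123.
    double concat = 0;
    foreach (d; digits)
        concat = concat * 10 + d;
    result[concat] = true;

    // Try every split point and combine the values of both halves.
    foreach (mid; 1 .. digits.length)
    {
        auto left = reachable(digits[0 .. mid]);
        auto right = reachable(digits[mid .. $]);
        foreach (a; left.byKey)
            foreach (b; right.byKey)
            {
                result[a + b] = true;
                result[a - b] = true;
                result[a * b] = true;
                if (b != 0)
                    result[a / b] = true; // skip division by zero
            }
    }
    return result;
}

unittest
{
    auto values = reachable([1, 2, 3, 4]);
    assert(10.0 in values);   // 1 + 2 + 3 + 4
    assert(1234.0 in values); // plain concatenation
}
```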

I had several reasons to think that this was a well-fitting problem. First, I could write a solution I expected to see from the students in five hours or two evenings. Compensating for the students’ level, it looked like a good project for a semester-long assignment. Second, the solution isn’t too straightforward: naive brute force would take too much time, so you need to cut off equivalent variants early. And third, I was fascinated by this problem myself, so I thought the students would feel the same. I couldn’t have been more wrong.

Not only were a majority of students not into such mathematics, most of them couldn’t even grasp what the problem was about and what this 10,958 was for. I failed to get them interested.

When I realized the problem, it was too late to change anything. So the first semester wasn’t very engaging. Only a few students could finish the assignment to its fullest. For the rest, it was impossible.

The second semester

Since it was a full-term course, I had time to make up for it. For the second semester, I tried to come up with something practical and interactive. I gave the students an assignment to write a game. They were learning networking, so it was to be a multiplayer game. I recalled playing “tic-tac-toe on an infinite field” with my peers during boring lectures. The smarter name for this game is Gomoku. Two players can play online, and the game logic is simple. I was hoping to spark students’ interest. This assignment turned out much better than the previous one, but I still can’t call it a full success.

I really wanted to show them how coroutines (fibers, as they are called in D) make async programming much more manageable, how nice the vibe.d framework is, and how easy it is to use external libraries with the dub package manager. Lesson One: don’t try to sell coroutines to those who don’t know what callback hell is. Lesson Two: always keep hardware limitations in mind.
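A tiny sketch shows what fibers buy you. A fiber runs straight-line code that can suspend itself with Fiber.yield() and be resumed later with call(). vibe.d builds its tasks on the same mechanism, suspending a fiber while it waits for network I/O. Here a hypothetical “move producer” hands out Gomoku moves one at a time. The collectMoves name and the moves themselves are made up for illustration.

```d
import core.thread : Fiber;

// Collects moves from a fiber that suspends itself after producing each one.
int[] collectMoves()
{
    int[] log;
    int current;
    auto producer = new Fiber(() {
        foreach (move; [3, 7, 5])
        {
            current = move;
            Fiber.yield(); // suspend and hand control back to the caller
        }
    });

    while (true)
    {
        producer.call(); // resume until the next yield or the end
        if (producer.state == Fiber.State.TERM)
            break;
        log ~= current;
    }
    return log;
}

unittest
{
    assert(collectMoves() == [3, 7, 5]);
}
```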

Problems out of nowhere

I would never have thought that single-threaded compilation could be thwarted by the memory limit. It happened because the computers at the university were equipped with only 2 GB of memory, and some students’ netbooks worked on a mere 1 GB. Simple programs build just fine even on weak machines—D is much better than C++ in terms of compilation time—but a large framework like vibe.d requires a lot of memory for its compilation. Now imagine, I was just telling them how everything is so easy-peasy and then half the students couldn’t even compile a networked version of Hello World.

In my defense, I checked everything in advance. I set up a virtual machine with 2 GB of memory and made sure that it worked with the special compiler flags. Theoretically, I was prepared. But dealing with a new library and a new build system and new compiler flags all at once was just too much for the students. Their brains were getting DoS’d and shut down. So even though I demonstrated how to compile a program on a low-memory machine, I still had to explain to them individually why the usual method of compilation wasn’t working. For those who had only 1 GB of memory, I didn’t even have a ready solution.

But still, the results of the second semester were much better: absolutely everyone could write the client side of the game. Some had problems with coroutines on the server side, but in general, they got this part just fine. For me, that wasn’t enough. So I gave them a supertask: write a simple AI for the game. This would help them understand the advantages of the client-server architecture. They had to realize that an AI is just another client, so the server code should be left as it is, and on the client side, the only required modification was move polling. This was a good problem on architecture design, suitable for students who had already gotten a grip on programming.
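The “AI is just another client” idea can be sketched on the client side as an interface. It is a hypothetical design, not the course’s code. The game loop asks a MoveSource for each move, so swapping a human for an AI means supplying another implementation without touching the loop. A scripted player stands in for keyboard input, and the “AI” naively scans outward from the origin for the first free cell.

```d
// A board position on the (conceptually infinite) Gomoku field.
struct Move { int x, y; }

// Where the client gets its next move from: a human, an AI, ...
interface MoveSource
{
    Move nextMove(const bool[Move] occupied);
}

// Stand-in for a human player: replays a fixed list of moves.
class ScriptedPlayer : MoveSource
{
    private Move[] script;
    this(Move[] script) { this.script = script; }

    Move nextMove(const bool[Move] occupied)
    {
        auto m = script[0];
        script = script[1 .. $];
        return m;
    }
}

// A deliberately naive AI: the first free cell in growing squares around (0, 0).
class NaiveAI : MoveSource
{
    Move nextMove(const bool[Move] occupied)
    {
        foreach (r; 0 .. 100)
            foreach (x; -r .. r + 1)
                foreach (y; -r .. r + 1)
                    if (Move(x, y) !in occupied)
                        return Move(x, y);
        assert(0, "no free cell found");
    }
}

// The game loop is identical no matter who is playing.
Move[] playTurns(MoveSource first, MoveSource second, size_t turns)
{
    bool[Move] occupied;
    Move[] history;
    MoveSource[2] players = [first, second];
    foreach (i; 0 .. turns)
    {
        auto m = players[i % 2].nextMove(occupied);
        occupied[m] = true;
        history ~= m;
    }
    return history;
}
```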

We also had a little contest. I invited the students to play some code golf, solving a simple problem with the smallest program possible. For encouragement, I promised to free the best-achieving students from writing a semester report. And if anyone could beat my solution, they would pass the semester examination automatically (in Russian universities, teachers are allowed to set arbitrary conditions, like participating in side projects, for passing the semester without the usual examination).

As I expected, nobody could beat me, though a couple of students came up with some interesting tricks, like writing 10 instead of '\n' to save two characters. Some would say this is an abuse of the type system, but clever hacks like this speak to a student’s knowledge.

The results of the first year

I was asking too much for a single semester; only one student could write an AI that kind of worked (she said that the first time it made a sensible move, it made her really happy). Another student who complained the whole year that programming was not her thing and that she couldn’t understand anything managed to write her own client and server. At the end of the day, all was not for nothing.

The second try

Everything described until this point happened during the 2018/2019 academic year. The higher-ups had no issues with the course, and I was able to continue on the following year. New students, new possibilities, a new program. This time I was much better prepared. I had materials for most of the lectures, no more need to redo everything on the fly, and during the summer I had time to fix the problems with the course.

The 10,958 problem was expelled on the charge of being boring. Instead, the Gomoku game was expanded, now lasting almost the whole term. “Almost” because the first assignment was a problem regarding OOP: students had to implement various geometrical shapes as classes and draw them on screen with some customization. I made this the first step after Hello World so everyone could learn how to use the tools, and so I could judge their level.

It’s not surprising that students are very different. Some are already working as programmers and attend the course out of curiosity, while some still struggle with variables and loops. From the beginning, I decided that my course would be not just for everyone, but for those who are interested. However, disregarding the others would be wrong, so we needed assignments for different levels.

To jump-start the network study, I set up a server with the reference implementation, and everyone could connect to it and play. Even telnet would do. Allotting more time for an assignment and providing better explanations served students well. Many could finish the supertask. We even had a little contest between AIs. A human could still beat them with ease, but it was still a big improvement over the previous year.

A pleasant surprise

I had the same code golf contest again. And again, I promised to give a free pass to whoever could beat me. I’m sure you can see where this is going, but first I must explain the problem in detail. The task was to implement a function, challenge, defined as follows:

import std;

/**
* Params:
* s = Multiline string, each line containing positive integers separated by whitespace.
* Returns: An array containing the sums of each line and the grand total as the last element.
*/
uint[] challenge(string s);

unittest {
    assert(challenge("0") == [0, 0]);
    assert(challenge("1\n1") == [1, 1, 2]);
    assert(challenge("2\n2\n3") == [2, 2, 3, 7]);
    assert(challenge("2\n2 0 3\n3 1 1 4") == [2, 5, 9, 16]);
}

Only the characters inside the function body count. Using any Phobos functionality is allowed, but renamed imports are not.
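For reference, here is an ungolfed sketch that passes the unittest above. It is written for readability rather than character count and uses selective imports instead of import std.

```d
import std.algorithm : map, sum;
import std.array : array, split;
import std.conv : to;
import std.string : splitLines;

uint[] challenge(string s)
{
    // Sum the whitespace-separated integers on each line.
    uint[] lineSums = s.splitLines
        .map!(line => line.split.map!(to!uint).sum)
        .array;
    // Append the grand total as the last element.
    return lineSums ~ lineSums.sum;
}

unittest {
    assert(challenge("0") == [0, 0]);
    assert(challenge("2\n2 0 3\n3 1 1 4") == [2, 5, 9, 16]);
}
```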

My solution from the previous year was pretty straightforward:

auto r=s.split('\n').map!(a=>a.split.map!(i=>i.to!uint).sum); return r.array~r.sum; // 84 characters

I didn’t show them my solution, I just told them its character count. I was absolutely sure that, just like the previous year, nobody could beat it. Imagine my shock when I was outdone in a mere week:

auto r=split(src,"\n").map!"sum(map!(to!uint)(a.split))".array;return r~[sum(r)]; // 82 characters

Even with the unneeded brackets, this solution was shorter, thanks to two shortcuts in the map syntax: passing a lambda as a string in the first case, and passing to!uint directly instead of wrapping it in a lambda in the second. The latter was just my oversight, but the former was a typical teacher’s mistake. You see, back when I first wrote my solution, it wasn’t possible to pass a string to map like that. I don’t remember exactly what the problem was; it was something about scopes. I missed that it had been fixed at some point. That’s how in just one year you become an old geezer teacher who can’t keep up with the times.
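To make the second shortcut concrete, here is a minimal sketch (the function names are hypothetical, not from the course) showing that handing map a lambda and handing it the template instance to!uint produce the same result. The second form simply skips the lambda wrapper, which is where the characters are saved.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.conv : to;

// Wrapping the conversion in a lambda...
uint[] parseWithLambda(string[] words)
{
    return words.map!(w => w.to!uint).array;
}

// ...does the same job as passing the template instance to!uint
// straight to map as its alias parameter.
uint[] parseWithTemplate(string[] words)
{
    return words.map!(to!uint).array;
}
```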

I stayed true to my word and granted a free pass to the resourceful student at the end of the semester. I was concerned that securing the pass in the middle of the semester would discourage him from attending classes, but fortunately, I was wrong.

COVID-19 strikes

Of course, I can’t omit how the pandemic impacted the educational system. In the spring of 2020, all classes were moved online. There are some pros and cons to this.

Giving lectures remotely turned out to be very convenient. You can mutually agree on a suitable time, and you don’t need to find a free lecture room. Another very good thing is that students can either ask questions vocally—as when they raise their hands offline—or write them in the chat so that the teacher can answer them when it’s most appropriate. I really think that this system works very well.

Practical lessons are problematic though, especially with those students who aren’t involved enough. When they sit in a classroom, they at least do something. Why show up at all if you’re not going to do anything? When it’s online, they just skip class. The chemistry of a group coding session with a teacher ready to help won’t kick in. I encouraged them to ask questions not only during class but at any time, hoping that this would allow me to assist them whenever they were coding. But this only worked for a couple of people.

Lessons of the second year

The second year was better than the first but still less than ideal. I saw no need to modify the course, but its presentation had to be addressed.

We need to tear down this wall in communication. Mutual trust between the teacher and the student makes for a better educational process. It’s difficult to work with those who are wary of the teacher, even when they are doing all right. I don’t yet know a robust solution; each case is tackled individually.

Even those who are doing well need some control. I used to think that attendance scrutiny is for students who don’t really want to learn, a way to pressure them into showing up. When I was a student, the best teachers never bothered about absentees. So I too was liberal with these things, even when half the group was missing. But everything has its limits. Bad students will skip classes anyway, but lazy B-graders could benefit from a little scolding.

Fast forward to the present

I don’t have much to say about the last year-and-a-half. Teaching D is now part of my life. Due to the pandemic, we’ve had to move online completely. I couldn’t even meet my students face-to-face. Aside from what I said earlier about how this affects the educational process, this also disrupted one of the ideas I had.

I need some way to track students’ progress to be sure that they actually follow the course and don’t just show up for classes. The way I handled this during the previous two years worked for some students: if they had a problem, they asked questions during a lecture or a practical lesson. But some students just keep quiet when they have problems, so I need some means to identify them. I couldn’t do written tests for programming; that would be nonsense. So during the summer break, I came up with the idea of doing some small quizzes if COVID restrictions were to be lifted. These quizzes would affect the final grade, but not much, as the intention behind them was to help students who are having problems, not to be the final straw that breaks the camel’s back. Unfortunately, remote lessons made this nearly impossible.

My students keep surprising me, coming up with newer and better solutions to the code golf puzzle. Unfortunately, I don’t have the exact code of the student who won in the third year (it appears that she deleted her GitHub account), but it was something like:

auto r=s.split('\n').map!(a=>a.split.to!(uint[]).sum); return r.array~r.sum; // 74 characters 

Again, it was a feature I didn’t know about: to converts between arrays, too.
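As a minimal sketch of that feature (parseLine is a hypothetical helper), converting a whole string[] with a single to!(uint[]) call replaces the inner map!(to!uint) from the earlier solutions:

```d
import std.array : split;
import std.conv : to;

// split() with no separator breaks the line on whitespace, and
// to!(uint[]) then converts every element of the resulting string[].
uint[] parseLine(string line)
{
    return line.split.to!(uint[]);
}
```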

I thought that this problem was done and that there was no room for further improvement. Oh, how wrong I was. This is what they’ve brought me this year:

return(s.split('\n')~s).map!(a=>a.split.to!(uint[]).sum).array; // 63 characters

This completely destroys my solution, even with all the improvements! Note that the trick is not in the syntax but in the logic. It appends the original string to the array of lines: splitting that whole string on whitespace yields every number in the input, so its sum is the grand total. That makes an intermediate array unnecessary and allows the function to be implemented as a single statement. I would never have thought to do that! Algorithms win over micro-optimizations, even in code golf.
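Here is the same idea written out readably, as a sketch rather than the student's actual code (challengeReadable is a hypothetical name). The extra element at the end of the array is the whole input string, whose sum is the grand total:

```d
import std.algorithm.iteration : map, sum;
import std.array : array, split;
import std.conv : to;

uint[] challengeReadable(string s)
{
    // One chunk per line...
    string[] chunks = s.split('\n');
    // ...plus the entire input. Splitting it on whitespace (newlines
    // included) yields every number, so its sum is the grand total.
    chunks ~= s;
    return chunks.map!(chunk => chunk.split.to!(uint[]).sum).array;
}
```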

The biggest change this year is a new, formalized evaluation system. In the semester, students earn points for doing assignments and writing reports on them. The first and simplest assignment is worth 5 points, and the last and hardest is worth 30. The maximum number of points a student can score without participating in code golf is 100. Code golf is scored by the formula 110 − length. The code golf winner this year got 47 points for his 63-character solution (110 − 63 = 47), which earned him an exemption from writing any reports. We have a table listing every student’s points so everybody knows how many points they need to score. Everything is very transparent, so I don’t need to worry about not being objective when evaluating students.

A Gas Dynamics Toolkit in D

The Eilmer flow simulation code is the main simulation program in our collection of gas dynamics simulation tools. An example of its application is shown here with the simulation of the hypersonic flow over the BoLT-II research vehicle that is to be flown in 2022.

BoLT-II simulation with steady-state variant of the Eilmer code. Flow is from bottom-left to top-right of the picture. Only one quarter of the vehicle surface, coloured grey, is shown. Several slices through the flow domain are coloured with the local Mach number, with blue for low values and red for high values. Several streamlines, drawn in black, start at the blunt leading edge of the vehicle and follow the gas flow along the vehicle surface. Image produced by Kyle Damm.

Some history

This simulation program, originally called cns4u, started as a relatively small C program that ran on the Cray Y-MP supercomputer at NASA Langley Research Center in 1991. A PDF of an early report with the title ‘Single-Block Navier-Stokes Integrator’ can be found here. It describes the simple finite-volume formulation of the code that allows simulation of a nonreacting gas on a single, structured grid. Thirty years on, many capabilities have been added through the efforts of a number of academic staff and students. These capabilities include high-temperature thermochemical effects with reacting gases and distributed-memory parallel simulations on cluster computers. The language in which the program was written changed from C to C++, with connections to Tcl, Python and Lua.

The motivation for using C++ in combination with the scripting languages was to allow many code variations and user programmability so that we could tackle any number of initially unimagined gas-dynamic processes as new PhD students arrived to do their studies. By 2010, the Eilmer3 code (as it was called by then) was sitting at about 100k lines of code and was growing. We were, and still are, mechanical engineers and students of gas-dynamics first and programmers second. C++ was a lot of trouble for us. Over the next 4 years, C++ became even more trouble for us as the Eilmer3 code grew to about 250k lines of code and many PhD students used it to do all manner of simulations for their thesis studies.

Also in 2010, a couple of us (PAJ and RJG) living in different parts of the world (Queensland, Australia and Virginia, USA) came across the D programming language and took note of Andrei Alexandrescu’s promise of stability into the future. Here was the promise of a C++ replacement that we could use to rebuild our code and remain somewhat sane. We each bought a copy of Andrei’s book and experimented with the D language to see if it really was the C++-done-right that we wished for. One of us still has a copy of the initial printing of Andrei’s book without his name on the front cover.

Rebuilding in D

In 2014 we got serious about using D for the next iteration of Eilmer and started porting the core gas dynamics code from C++ to D. Over the next four years, in between university teaching activities, we reimplemented much of the Eilmer3 C++ code in D and extended it. We think that this was done to good effect. This conference paper, from late 2015, documents our effort at the initial port of the structured grid solver. (A preprint is hosted on our site.) The Eilmer4 program is as fast as the earlier C++ program but is far more versatile while being implemented in fewer lines of code. It now works with unstructured as well as structured grids and has a new flexible boundary condition model and a high-temperature thermochemistry module. In the past two years, we have also added the Newton-Krylov-accelerated steady-state solver that was used to do the simulation shown above. And importantly for us, with the code now being in D, we have many fewer WTF moments.

If you want more details on our development of the Eilmer4 code in D, we have the slides from a number of presentations given to the Centre for Hypersonics over the past six years.

Features of D that have been of benefit to us include:

  • Template programming that other Mechanical Engineers can understand (thanks Walter!). Many of our numerical routines are defined to work with numbers that we define as an alias to either double or Complex!double values. This has been important to us because we can use the same basic update code and get the sensitivity coefficients via finite differences in the complex direction. We think this saved us a large number of lines of code.

  • String mixins have replaced our use of the M4 preprocessor to generate C++ code in Eilmer3. We still have to do a bit of head-scratching while building the code with mixins, but we have retained most of our hair—something that we did not expect to do if we continued to work with C++.

  • Good error messages from the compiler. We often used to be overwhelmed by the C++ template error messages that could run to hundreds of lines. The D compilers have been much nicer to us and we have found the “did you mean” suggestions to be quite useful.

  • A comprehensive standard library in combination with language features such as delegates and closures that allow us to write less code and instead concentrate on our gas dynamics calculations. I think that having to use C++ Functors was about the tipping point in our 25-year adventure with C++.

  • Ranges and the foreach loops make our D code so much tidier than our equivalent C++ code.

  • Low-barrier shared-memory parallelism. We do many of the flow update calculations in parallel over blocks of cells and we like to take advantage of the many cores that are available on a typical workstation.

  • Simple and direct linkage to C libraries. We make extensive use of Lua for our configuration and do large simulations by using many processors in parallel via the OpenMPI library.

  • The garbage collector is wonderful, even if other people complain about it. It makes life simpler for us. For the input and output, we take the comfortable path of letting the compiler manage the memory and then tell the garbage collector to tidy up after us. Of course, we don’t want to overuse it. @nogc is used in the core of the code to force us not to generate garbage. We allocate much of our data storage at the start of a simulation and then pass references to parts of it into the core functions.

  • Fast compilation and good optimizing compilers. Nearly an hour’s build time was fairly common for our old C++ code, and now we would expect a DMD or LDC debug build in about a quarter of a minute. This builds a basic version of the main simulation code on a Lenovo ThinkPad with a Core i7 processor. An optimized build can take a little over a minute, but the benefit of the faster simulation is paid back by orders of magnitude when a simulation job is run for several hours over hundreds of processors.

  • version(xxxx) { ... } has been a good way to have variants of the code. Some use complex numbers and others are just double numbers. Also, some variants of the code can have multiple chemical species and/or just work with a single-species nonreacting gas. This reduction in physical modelling allows us to reduce the memory required by the simulation. For big simulations of 3D flows, the required memory can be on the order of hundreds of gigabytes.

  • debug { ... } gets used to hide IO code in @nogc functions. If a simulation fails, our first action is often to run the debug flavor of the code to get more information and then, if needed, run the debug flavor of the code under the control of gdb to dig into the details.
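The first bullet mentions getting sensitivity coefficients via finite differences in the complex direction (the complex-step method, df/dx ≈ Im(f(x + ih)) / h), and the version bullet mentions switching between complex and double variants. Here is a minimal sketch of both ideas; the version identifier complex_numbers and the names number, flux, and complexStepDerivative are hypothetical and not taken from Eilmer.

```d
import std.complex : Complex, complex;

// Hypothetical switch: compile with -version=complex_numbers for the
// complex-valued flavor; otherwise `number` is plain double.
version (complex_numbers)
    alias number = Complex!double;
else
    alias number = double;

// A generic "update" routine. It only uses +, * and numeric literals,
// so the same source compiles for double and for Complex!double.
T flux(T)(T u)
{
    return u * u * u + 2.0 * u;
}

// Complex-step derivative: evaluate f at x + ih and divide the imaginary
// part by h. There is no subtraction of nearly equal values, so h can be
// far smaller than in a real-valued finite difference.
double complexStepDerivative(alias f)(double x, double h = 1e-20)
{
    auto fz = f(complex(x, h));
    return fz.im / h;
}
```

For flux above, the exact derivative is 3u² + 2, so at u = 1.5 the complex-step result should come out at 8.75 to within rounding.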

We have a very specialized application and so don’t make use of much of the software ecosystem that has built up around the D language. For a build tool, we use make, and for an IDE, we use Emacs. Its D major mode is very convenient.

There are lots of other features that just work together to make our programming lives a bit better. We are six years in on our adventure with the D programming language and we are still liking it.
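As a rough illustration of a few of the listed features working together (string mixins, @nogc core routines operating on storage allocated up front, and shared-memory parallelism via std.parallelism), here is a minimal sketch. Cell, declareFields, updateBlock, and updateAll are hypothetical names, the physics is a placeholder, and this is not how Eilmer itself is structured.

```d
import std.parallelism : parallel;

// Hypothetical CTFE helper: generates field declarations as a string,
// the kind of boilerplate a preprocessor such as M4 might otherwise emit.
string declareFields(string[] names)
{
    string code;
    foreach (name; names)
        code ~= "double " ~ name ~ " = 0.0;\n";
    return code;
}

struct Cell
{
    // Expands to: double rho = 0.0; double p = 0.0; double e = 0.0;
    mixin(declareFields(["rho", "p", "e"]));
}

// Core update: @nogc, so it cannot allocate from the GC; it only touches
// storage that was allocated before the time-stepping loop started.
void updateBlock(Cell[] cells, double dt) @nogc nothrow @safe
{
    foreach (ref c; cells)
        c.e += dt * c.p; // placeholder physics, not a real flow update
}

// Update every block on the default task pool, one block per work unit.
void updateAll(Cell[][] blocks, double dt)
{
    foreach (block; parallel(blocks, 1))
        updateBlock(block, dt);
}
```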

The Binary Language of Moisture Vaporators


I know why you’re reading this. Like other Alpha programmers, you’re not content with just compiling Vaporator code and testing to see if it works. You need to know the binary code that’s generated. But getting at it is clumsy. I want to make it easy for myself, and why not share it?

One of my earliest memories is being curious about how light bulbs worked and sticking my finger in a hot light bulb socket. I was three or four years old at the time. Later, I was always taking things apart to see how they worked. It was years before I could successfully put them back together again. I remember being baffled at the grey dust inside a resistor I cracked open and when I unwrapped the paper in a capacitor. I took my first car to pieces to see how it worked.

When I first learned Fortran, it was a great mystery how the text of the language turned into machine code. Machine code was the language of the gods. This evolved into wanting to make my own compiler. But to build a compiler, you need to be able to see the output. A disassembler had to be built along with the compiler. That became obj2asm.exe. I’ve spent a great deal of time running the disassembler and poring over what the compiler generated. I look at what other compilers generate, too, using obj2asm.

But running obj2asm is a separate process, and the output is filled with all the boilerplate needed to create a proper object file. The boilerplate is rarely of interest, and I’m only interested in the generated code for a function. Why not just give the compiler a switch, call it -vasm (short for Show Me The Vaporator Assembly), and have it emit the binary and assembler code to the screen, function by function? So I ripped the disassembler logic out of obj2asm and put it into the dmd D compiler.

One would think that the way to do this would be to have the compiler generate the assembler source code, which would then be run through an assembler like MASM or gas to create the object file. I figured this would be slow and too much work. Instead, the disassembler logic actually intercepts the binary data being written to the object file and disassembles it to a string, then prints the string to the console.

For example, for the file vaporator.d:

int demo(int x)
{
     return x * x;
}

Compiling with:

dmd vaporator.d -c -vasm

prints:

_D9vaporator4demoFiZi:
0000:   89 F8                   mov     EAX,EDI
0002:   0F AF C0                imul    EAX,EAX
0005:   C3                      ret

and we see the mangled name of the function, the code offsets, the code binary, and the mnemonic representation for those learning binary.

I am not aware of any other compiler that does this in the same way. This is probably because most programmers are not particularly interested in how the sausages are made. But I find it fascinating and fun. I’ve opined before that programmers who don’t know the assembler their code is transformed into are not likely to be Alpha programmers. With the -vasm switch, it’s so easy to look at the output, why not do it? It works as a great way to learn assembler code, too!

I’ve been using it myself, and the convenience is a game changer. What are you waiting for?

P.S. I made the disassembler as a Boost Licensed standalone module that anyone can use who needs a tool to understand the binary language of moisture vaporators.

Using the GCC Static Analyzer on the D Programming Language

Largely thanks to the tireless work of Iain Buclaw, the D programming language is part of GCC. As well as having access to an extremely potent set of compiler optimizations and a large group of target platforms, D also benefits from upstream features added to GCC as a whole or even for specific languages. For some projects, this can be very important, as some of these features require large quantities of careful work, for example, mitigations for transient execution vulnerabilities.

A few years ago, thanks to David Malcolm at Red Hat, GCC gained a static analyzer. This uses a set of algorithms at compile time to find patterns in a program that would lead to memory safety bugs when the program is executed.

How do I turn it on?

Run GDC like you normally would and add the -fanalyzer flag. If you’re already bored of reading and want to have a go, please use Matt Godbolt’s excellent compiler explorer. Start with this simple example.

Which patterns does it look for?

Some memory bugs

From the GCC documentation, we can get a list of every warning the analyzer can emit:

-Wanalyzer-double-fclose 
-Wanalyzer-double-free 
-Wanalyzer-exposure-through-output-file 
-Wanalyzer-file-leak 
-Wanalyzer-free-of-non-heap 
-Wanalyzer-malloc-leak 
-Wanalyzer-mismatching-deallocation 
-Wanalyzer-possible-null-argument 
-Wanalyzer-possible-null-dereference 
-Wanalyzer-null-argument 
-Wanalyzer-null-dereference 
-Wanalyzer-shift-count-negative 
-Wanalyzer-shift-count-overflow 
-Wanalyzer-stale-setjmp-buffer 
-Wanalyzer-tainted-array-index 
-Wanalyzer-unsafe-call-within-signal-handler 
-Wanalyzer-use-after-free 
-Wanalyzer-use-of-pointer-in-stale-stack-frame 
-Wanalyzer-write-to-const 
-Wanalyzer-write-to-string-literal 

These names are fairly descriptive. However, let’s take a look at some examples before going into detail.

Let’s say we have some code that allocates a buffer for itself via malloc, like the following.

int usesTheHeap(size_t x)
{
    import core.stdc.stdlib : malloc, free;
    int[] slice = (cast(int*) malloc(int.sizeof * x))[0..x];
    slice[] = 0;
    // Algorithm goes here
    return 0;
}

For this code, the static analyzer gives us two warnings, the first of which is the following:

warning: leak of 'slice.ptr' [CWE-401]
   11 | }
      | ^
  'usesTheHeap': events 1-3
    |
    |    8 |     int[] slice = (cast(int*) malloc(int.sizeof * x))[0..x];
    |      |                                     ^
    |      |                                     |
    |      |                                     (1) allocated here
    |    9 |     slice[] = 0;
    |      |     ~                                
    |      |     |
    |      |     (2) assuming 'slice.ptr' is non-NULL
    |   10 |     // Algorithm goes here
    |   11 | }
    |      | ~                                    
    |      | |
    |      | (3) 'slice.ptr' leaks here; was allocated at (1)

As you might expect, since we didn’t free the memory we allocated, the analyzer warns us that the memory leaks at the end of the scope.

The second warning complains that we used the memory from malloc without checking if it was null. Program failure due to dereferencing a null pointer is sometimes desirable in D, so you can turn this off with -Wno-analyzer-possible-null-dereference if you need to.

Thanks to assert being built into the core language and being lowered to a construct that GCC understands, we can use it to make the analyzer assume a pointer is non-null:

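int usesTheHeap(size_t x)
{
    import core.stdc.stdlib : malloc, free;
    void* allocatedBuffer = malloc(int.sizeof * x);
    // The program may not proceed past this point if the pointer is null,
    // so the analyzer can assume it is non-null from here on.
    assert(allocatedBuffer != null);
    // Release the buffer on every exit path so it doesn't leak either.
    scope(exit) free(allocatedBuffer);
    int[] slice = (cast(int*) allocatedBuffer)[0..x];
    slice[] = 0; // Fine: allocatedBuffer was checked above
    // Algorithm goes here
    return 0;
}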

More than malloc and free

Let’s think about something that (obviously) uses memory, but isn’t always considered part of memory safety: although it’s not encouraged, you can use setjmp and longjmp from C in D code. As with many C features, these really can blow up in your face.

Look at the following:

import core.sys.posix.setjmp;

void main()
{
    jmp_buf local;
    void set()
    {
        setjmp(local);
    }
    set();
    longjmp(local, 0);
} 

We set up the buffer inside set, but once set returns, the saved environment refers to a stack frame that no longer exists: the buffer is primed and ready, but it points to nothing (technically it points to something, but that something is chaotic). Thankfully, the analyzer can warn us about this, as in the following:

<source>: In function 'D main':
<source>:11:12: warning: 'longjmp' called after enclosing function of 'setjmp' has returned [-Wanalyzer-stale-setjmp-buffer]
   11 |     longjmp(local, 0);
      |            ^
  'D main': events 1-2
    |
    |    3 | void main()
    |      |      ^
    |      |      |
    |      |      (1) entry to 'D main'
    |......
    |   10 |     set();
    |      |        ~
    |      |        |
    |      |        (2) calling 'set' from 'D main'
    |
    +--> 'set': events 3-5
           |
           |    6 |     void set()
           |      |          ^
           |      |          |
           |      |          (3) entry to 'set'
           |    7 |     {
           |    8 |         setjmp(local);
           |      |               ~
           |      |               |
           |      |               (4) 'setjmp' called here
           |    9 |     }
           |      |     ~     
           |      |     |
           |      |     (5) stack frame is popped here, invalidating saved environment
           |
    <------+
    |
  'D main': events 6-7
    |
    |   10 |     set();
    |      |        ^
    |      |        |
    |      |        (6) returning to 'D main' from 'set'
    |   11 |     longjmp(local, 0);
    |      |            ~
    |      |            |
    |      |            (7) 'longjmp' called after enclosing function of 'setjmp' returned at (5)
    |

Beyond skin-deep

While important, stack corruption and (simple) memory leaks are old hat: catching them is usually relatively easy (touch wood) with modern programming practices, programming language design (i.e., sound memory safety analysis), sanitizers, and tools like Valgrind or your favorite debugger. For less trivial issues, the above tools still make it relatively easy to find an issue when the program fails in a controlled environment, but finding why it happened could require manually instrumenting the program. Finding issues early is important and appreciated.

The analyzer is interprocedural, i.e., it can see across function boundaries (when the information is available). In some older codebases you can sometimes see code like this:

import core.stdc.stdlib : free;

struct Handle
{
    void* x;
    void reset()
    {
        free(x);
    }
    ~this()
    {
        free(x);
    }
}
void accept(Handle x)
{
    x.reset();
    // Destructor called 
}

This yields a double-free. The analyzer is able to see “inside” the destructor and thus correctly warns about the double-free and what causes it.
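One way to remove the double-free, sketched here as a suggestion rather than the article's own fix: have reset null out the pointer after freeing it (free(null) is a no-op), and disable copying so that two Handle values never own the same pointer. The @disable this(this) line is an addition that is not in the original code.

```d
import core.stdc.stdlib : malloc, free;

struct Handle
{
    void* x;

    // Forbid copies: otherwise two Handles could free the same pointer.
    @disable this(this);

    void reset()
    {
        free(x);
        x = null; // free(null) does nothing, so the destructor is now harmless
    }

    ~this()
    {
        free(x);
    }
}

void accept(Handle x)
{
    x.reset();
    // Destructor called here; it frees null, which is a no-op
}
```

Callers then hand over ownership explicitly, e.g. accept(Handle(malloc(16))), or by moving an existing Handle with std.algorithm.mutation.move.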

The following seems to be sensitive to the optimization settings used but is very important when it works: iterator invalidation. That is to say, we hand out a pointer to somewhere, end up (say) realloc-ing, and suddenly that pristine pointer is now a pointer to absolutely nowhere.

import core.stdc.stdlib : realloc, free;

struct Vector
{
    int* handle;
    void expand(size_t sz)
    {
        int* newPtr = cast(int*) realloc(handle, sz);
        assert(newPtr);
        handle = newPtr;
    }
    ~this()
    {
        free(handle);
    }
}
void iter(Vector x)
{
    int* copy = x.handle;
    x.expand(1000);
    *copy = 3;
}

The analyzer sees this and spits out the following:

<source>: In function 'iter':
<source>:23:11: warning: use after 'free' of 'copy_5' [CWE-416] [-Wanalyzer-use-after-free]
   23 |     *copy = 3;
      |           ^
  'iter': events 1-2
    |
    |   19 | void iter(Vector x)
    |      |      ^
    |      |      |
    |      |      (1) entry to 'iter'
    |......
    |   22 |     x.expand(1000);
    |      |             ~
    |      |             |
    |      |             (2) calling 'expand' from 'iter'
    |
    +--> 'expand': events 3-7
           |
           |    8 |     void expand(size_t sz)
           |      |          ^
           |      |          |
           |      |          (3) entry to 'expand'
           |    9 |     {
           |   10 |         int* newPtr = cast(int*) realloc(handle, sz);
           |      |                                         ~
           |      |                                         |
           |      |                                         (4) freed here
           |      |                                         (5) when '__builtin_realloc' succeeds, moving buffer
           |   11 |         assert(newPtr);
           |      |         ~ 
           |      |         |
           |      |         (6) following 'false' branch...
           |   12 |         handle = newPtr;
           |      |                ~
           |      |                |
           |      |                (7) ...to here
           |
    <------+
    |
  'iter': events 8-9
    |
    |   22 |     x.expand(1000);
    |      |             ^
    |      |             |
    |      |             (8) returning to 'iter' from 'expand'
    |   23 |     *copy = 3;
    |      |           ~  
    |      |           |
    |      |           (9) use after 'free' of 'copy_5'; freed at (4)
    |
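A minimal sketch of one way to avoid the invalidation, assuming it is acceptable to change the signature of iter: take the Vector by ref, remember an index rather than a raw pointer, and re-derive the pointer after expand. The length field and the @disable this(this) line are additions for illustration. Note also that realloc takes a size in bytes, so this version multiplies by int.sizeof.

```d
import core.stdc.stdlib : realloc, free;

struct Vector
{
    int* handle;
    size_t length;

    // One owner per buffer, so the destructor frees it exactly once.
    @disable this(this);

    void expand(size_t newLength)
    {
        // realloc takes a size in bytes, not in elements
        auto newPtr = cast(int*) realloc(handle, newLength * int.sizeof);
        assert(newPtr, "out of memory");
        handle = newPtr;
        length = newLength;
    }

    ~this()
    {
        free(handle);
    }
}

void iter(ref Vector x)
{
    size_t idx = 0;     // remember a position, not an address
    x.expand(1000);
    x.handle[idx] = 3;  // pointer re-derived after the reallocation
}
```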

Inline assembly

The analyzer was partly intended to help eliminate bugs in the Linux kernel. As such, it is useful to be able to analyze inline assembly (which is commonplace in the kernel). An example will not be given here, but GCC has gained the ability to analyze basic x86 inline assembly.

Some idiosyncrasies

The static analyzer is implemented as just another pass inside GCC (there are hundreds). This means that some warnings may magically disappear under certain optimization settings as the compiler eliminates dead code and propagates information.

Similarly, the quality of output does vary with the flags used. We won’t discuss it here, but options exist to increase the usefulness of diagnostics by performing more sophisticated analysis, for example, by propagating constraints through analyzed branches and thus eliminating some paths which are superficially “possible” but can, in fact, be eliminated by considering the semantics of the code.

Finding bugs when combining C and D

The static analyzer was designed for use with C (and C++, but mostly the former) and operates on GCC’s IR. If we use link-time optimization, we can combine the IR from compilation units in different languages (D and C), then use the analyzer to look for bugs across language boundaries.

Let’s say we have an unfortunate C library with two functions, doWork and terminate. They both accept void*, but they expect the memory to be allocated by the user of the library rather than by a matching init function.

#include <stdlib.h>
void doWork(void* ptr)
{
    // Do something, doesn't matter what here
}
void terminate(void* ptr)
{
    // Clean up things attached to ptr
    free(ptr);
}

Assuming we have no access to the C source and assuming the library documentation fails to mention that terminate calls free, we would likely write the following code:

extern(C) void doWork(void*);
extern(C) void terminate(void*);

void main()
{
    import core.stdc.stdlib : malloc, free;
    void* buf = malloc(100);
    scope(exit) free(buf);
    buf.doWork();
    buf.terminate();
}

If we’re lucky, we’ll see an error message like

free(): double free detected in tcache 2
Aborted (core dumped)

which is better than nothing but nonetheless not ideal if we were unfamiliar with the code.

If instead, we compile with gdc d.d c.c -fanalyzer -flto (the last flag is essential), we get this warning:

In function ‘D main’:
d.d:11:14: warning: double-‘free’ of ‘buf_6’ [CWE-415] [-Wanalyzer-double-free]
   11 |  scope(exit) free(buf);
      |              ^
  ‘D main’: event 1
    |
    |/usr/lib/gcc/x86_64-linux-gnu/10/include/d/__entrypoint.di:33:5:
    |   33 | int _Dmain(char[][] args);
    |      |     ^
    |      |     |
    |      |     (1) entry to ‘D main’
    |
  ‘D main’: events 2-3
    |
    |d.d:10:8:
    |   10 |  void* buf = malloc(100);
    |      |        ^
    |      |        |
    |      |        (2) allocated here
    |......
    |   13 |  buf.terminate();
    |      |  ~
    |      |  |
    |      |  (3) calling ‘terminate’ from ‘D main’
    |
    +--> ‘terminate’: events 4-5
           |
           |c.c:6:6:
           |    6 | void terminate(void* ptr)
           |      |      ^
           |      |      |
           |      |      (4) entry to ‘terminate’
           |    7 | {
           |    8 |     free(ptr);
           |      |     ~
           |      |     |
           |      |     (5) first ‘free’ here
           |
    <------+
    |
  ‘D main’: events 6-7
    |
    |d.d:13:2:
    |   11 |  scope(exit) free(buf);
    |      |              ~
    |      |              |
    |      |              (7) second ‘free’ here; first ‘free’ was at (5)
    |   12 |  buf.doWork();
    |   13 |  buf.terminate();
    |      |  ^
    |      |  |
    |      |  (6) returning to ‘D main’ from ‘terminate’
    |

This found our bug straight away. Thank you very much, static analysis.

Conclusion

The way this analyzer is implemented can serve as a lesson on the usefulness of IRs as a tool for analysis rather than merely optimization. A similar analysis is currently performed on the AST in the D frontend, but that’s slow and fairly ugly to write (let alone read).

I don’t think using a static analyzer is a replacement for a carefully designed language-level memory safety story, but I am very glad it exists. The fact that it is usable and useful from D is a testament to the benefits of D’s presence in GCC and diversity of implementation.

New Year DLang News: Hello 2022


For many people around the world, 2021 is a year they’d like to forget. The ongoing pandemic has touched all of our lives indirectly, but for too many, including some in the D community, it has had a more direct impact. We wish a full recovery for those of you who have been physically or emotionally affected by the virus. Please don’t forget: the D community is a network of people located around the globe. We are linked by our interest in the D programming language, but we are people before we are D programmers. If you find yourself in circumstances that disrupt any commitments you have in the community, it’s nothing to fret over. Get it sorted and we’ll be here when you get back. And if you need help to get it sorted, there are many among us willing to help if they can. Don’t be afraid to reach out.

Collectively, 2021 was a pretty good year for D. Some highlights:

A small amount of the work done in 2021 was paid for. The rest was carried out by volunteers, without whom the D programming language would not be where it is today. On behalf of the D Language Foundation, thanks again to all of our contributors, large and small, for all that you do.

Now for some updates to lead us into 2022.

We’re hiring

Symmetry Investments has informed us that they will continue sponsoring the three positions they started sponsoring last year. Razvan Nitu will continue in his role as a Pull Request Manager, and Max Haughton will go on as a general-purpose assistant. The second Pull Request Manager role is currently vacant. We are looking for someone to fill it.

The position pays $25,000 USD per year. The ideal candidate is someone who:

  • is familiar with git, GitHub, and Bugzilla;
  • is familiar enough with D to be able to review simple pull requests;
  • is able to recognize when more specialized reviews are required and
  • is able to proofread English text (for reviewing documentation and web site pull requests).

The person who fills the position will work closely with Razvan Nitu. Examples of the role’s responsibilities include:

  • ensuring all pull requests follow procedure;
  • reviewing simple pull requests;
  • finding appropriate reviewers for more complex pull requests;
  • ensuring that pull requests are reviewed in a timely manner;
  • reviving stale pull requests;
  • coordinating between pull request submitters and reviewers to prevent pull requests from going stale;
  • closing pull requests that are no longer valid;
  • identifying Bugzilla issues that are duplicates or invalid;
  • identifying Bugzilla issues that are candidates for bounties;
  • publicizing Bugzilla issues in need of a champion and
  • other related tasks.

We are hoping to hire from within the D community, though we will accept queries from anyone. If you are interested in taking on the role, please send your resume to social@dlang.org.

Symmetry Investments is hiring

Symmetry Investments is looking for people to fill a number of roles. Their monthly job announcement at Hacker News lists those roles along with qualifications, details on how to apply, and more. If you think you don’t qualify because you lack a degree or haven’t built up a history of experience, please pay special attention to the following lines from the job announcement:

We look for virtues and capabilities over only experience and credentials although those things aren’t a disadvantage. Do not let a lack of credentials or qualifications prevent you from applying.

They are hiring for full-time, fixed-term contracts with flexible hours, with the possibility for both remote work and sponsorship for a visa in London, Hong Kong, Singapore, or Jersey.

Symmetry Autumn of Code 2021

Milestone 4 of SAOC 2021 kicked off on December 15th. At this point, only two participants remain eligible for the final Milestone 4 reward, but four of the original five projects are on the road to completion.

  • Replace DRuntime hooks with templates – Teodor Dutu has been steadily making progress on his project and has faced some tough challenges along the way. He successfully completed Milestones 1 – 3 and is continuing the project through Milestone 4.
  • Implement support for D in LLVM Debugger (LLDB) – Luís Ferreira has also faced some hard problems in passing Milestones 1 – 3 and continues his work as well. One major step in his progress: he has been granted commit access to LLVM and is now part of the team that reviews, accepts, and merges D-related code into the LLVM tree.
  • Rethinking the default class hierarchy – Robert Aron submitted a DIP for the ProtoObject at the end of Milestone 1. Unfortunately, he was unable to complete SAOC Milestone 3, but we will launch the first round of Community Review for the DIP in mid-January.
  • Light Weight DRuntime (LWDR) – Dylan Graham had to withdraw from the SAOC event after Milestone 2. However, his LWDR is a passion project that existed prior to SAOC and will still be there after the event ends. He intends to pick up the project again when he is able. We wish him the best and look forward to his future work.
  • Improve DUB: solve dependency hell – Ahmet Sait Koçak picked this project from the community-maintained DLang Project Idea repository. The SAOC judges had concerns about the proposed solution, so before accepting it for SAOC 2021, we discussed the project at the D Language Foundation’s monthly meeting in August. The final decision was to accept the project on the condition that Ahmet first explore a specific alternative and attempt his proposed solution only if that alternative was not viable. The alternative proved a dead end, so he moved forward with his initial proposal. He made progress until he encountered issues that will likely require work beyond the scope of the project to resolve. As such, he will be unable to complete the event. Future work on solving the DUB dependency hell problem may well need to take a different approach.

DConf Online 2021 Q & A videos

As I write, I have published six of the eight Q & A videos that I cut and trimmed down from the Day One and Day Two livestreams. I’ll have the remaining two published, along with the ‘Ask Us Anything!’ session with Walter, Atila, and Razvan, before the middle of January. All of the Q & A videos are available on the DConf Online 2021 Q & A playlist, and links are available in the description of each talk at dconf.org. The AUA will be listed on the DConf Online 2021 playlist and linked from its description in the DConf Online 2021 schedule.

On a related note, we’re all itching to get the real-world DConf going again. We’re currently evaluating the possibility of doing so later this year and what it will look like if it happens. Stay tuned.

Onward and upward!

We’ve got a number of things going on for 2022. Some examples: I’ll be publishing a tutorial series on our YouTube channel; we’ll finally publish a new vision document; we’ll be taking the first steps toward bringing the services in our ecosystem under one roof with multiple admins; we’ll either give Bugzilla an overhaul or port our issues to GitHub; we’ll finally have an implementation of the named arguments DIP; and more.

We are always in need of contributors. There are several ways to contribute:

  • If you’re working on your own D project, please contact me to write about it on this blog. Or write about it on your own blog. Or tweet about it. Let the world know what you’re doing! D exists and people are using it, so we need to be shouting out loud so that more people know about it.
  • If you find an issue, please report it. If there’s an issue you can solve, please submit a PR. If you’re interested in solving multiple issues, please contact Razvan Nitu about joining one of his strike teams.
  • If you don’t have time to solve issues, please consider supporting us financially by posting a bounty on any issues you care about, or donating to one of our funds. Or maybe support us by buying swag at the DLang Swag Emporium using the link in the sidebar so that we get a referral bonus on top of royalties. Or perhaps select the D Language Foundation as your preferred charity at smile.amazon.com so that we get a small percentage of your purchase amount when you shop there. (The D Language Foundation is only available as an option through Amazon’s .com domain.)
  • One of the most impactful ways you can contribute is to help newcomers to the D programming language. Hang out on the D Community Discord server or in the D Forums and employ the knowledge you’ve gained about D in helping others solve their problems. Help us in continuing to grow one of the most helpful communities on the internet.

Together, we can make 2022 a great year for our favorite programming language.

Happy New Year!