Project Highlight: Derelict

Posted on

Previous project highlights on this blog were written up both in my own words and in quotes from the project maintainers. This time around is different — it would be a little odd to quote myself while writing about my own project.

Derelict is a collection of D bindings to C libraries. In its present incarnation, it resides in a GitHub organization called DerelictOrg. There you’ll find bindings to a number of libraries such as SDL, GLFW, SFML, OpenGL, OpenAL, FreeType, FreeImage and more. There are currently 26 repositories in the organization: one for the documentation, one for a utility package, 21 active packages and three that are currently unsupported, maintained primarily by me and Guillaume Piolat.

The Derelict packages primarily provide dynamic bindings, though some can be configured as static bindings at compile time. Dynamic bindings require the C libraries to which they bind to be dynamically loaded at run time. The mechanism for this is provided by the DerelictUtil package. Loading a library is as simple as a function call, e.g. DerelictSDL2.load();.

Static bindings have a link-time dependency, requiring either statically or dynamically linking with the bound C library when building an executable. In that case, the dependency on DerelictUtil goes away (some packages may still use a few DerelictUtil declarations, requiring one of its modules to be imported, but it need not be linked into the program). On Windows, this introduces potential issues with object file formats, but these days they are quite easy to solve.

It’s hard to believe now, but Derelict has been around continuously, in one form or another, since March of 2004. I was first drawn to D because it fit almost perfectly between the two languages I had the most experience with back in 2003, C and Java. It addressed some of the frustrations I had with each while combining things I loved from both. But early on I ran into my first frustration with D.

At the time, interfacing with C libraries on Windows was a bit annoying. D has excellent support for and compatibility with C libraries in the language, but the object file format issues I alluded to above were frustrating. DMD on Windows could only output object files in the OMF format and could only use the OPTLINK linker, an ancient 32-bit linker that only understands OMF. This is because DMD was implemented on top of backend and tool chain used by the DMC, the Digital Mars C and C++ compiler. Meanwhile, the rest of the Microsoft ecosystem was (and is) using the COFF format. In practice, a static binding to a C library required either using tools to convert the object files or libraries from OMF to COFF, or compiling the C library with DMC. You don’t tend to find support for DMC in most build scripts.

When I found that the bindings for the libraries I wanted to use (SDL and OpenGL) were static, I resolved to create my own. I’m a big fan of dynamic loading, so it was a no-brainer to create dynamic bindings. When you load a DLL via the LoadLibrary/GetProcAddress API, the object file format is irrelevant. From that point on, I never had to worry about the COFF/OMF problem again. Even though the problem didn’t exist outside of Windows, I made them cross platform anyway to keep a consistent interface.

In those days, DMD 1.0.0 wasn’t yet a thing, but a new web site, dsource.org, had just been launched to host open source D projects (much of the site is still online in archive mode thanks to Vladimir Panteleev). All of the projects there used Subversion, so I decided to learn my way around it and take a stab at maintaining an open source project by making my new bindings available. Part of my motivation for picking “Derelict” as the title is because I fully expected no one else would be interested and, whether they were or not, I would eventually abandon it anyway.

As it turned out, people really were interested. Contributions started coming in almost immediately, along with questions and suggestions on the DSource forums. I added new bindings, set up criteria on new binding submissions (they must be gamedev-related and cross-platform), and added documentation on how to create “Derelictified” bindings. Tomasz Stachowiak (now a Frostbite game engine developer) came along and started making contributions, including a templated loader in DerelictUtil that replaced the one I had hacked together and became the basis for Derelict2.

Eventually, with the dawning of the D2 era, D development moved away from DSource to GitHub. So did Derelict, in the guise of Derelict 3. I had made use of every D build tool that came along before then, but finally settled on a custom build script (written in D) for this iteration, since the build tools all died. When DUB came along and looked like it was here to stay, I fully committed to it. I took the opportunity to finally split up the monolithic repository, gave every package its own repo, and created DerelictOrg as their new home. Aside from the appalling lack of documentation (a state which is rapidly changing, but is still a WIP), things have been fairly stable.

If I were still counting from Derelict 3, I think we might be at Derelict 6 by now, but that’s not how it rolls anymore. Since the move to DerelictOrg, I’ve twice made iterative improvements to DerelictUtil that broke binary compatibility with previous releases. The first time around, I didn’t properly manage the roll out. Anyone building the Derelict libraries manually, i.e. when they weren’t using DUB to manage their application project, could run into issues where one package used the new version and another used the older. It seems like it ought to have been a mess, but I heard very little about it. Still, in the most recent overhaul, I took steps to keep the new stuff distinctly separate from the old stuff and rolled it all out at once (which is why most of the latest releases as I write have a -beta suffix).

As D has evolved over the years, so has Derelict. Now that DMD supports COFF on Windows, some of the packages can now be configured at compile time as a static binding. Long ago, DerelictUtil gained the ability to selectively ignore missing symbols (curiously missing from the latest docs, but described in the old DerelictUtil Wiki) and more recently gained a feature that enables a package implementation to support multiple versions of a C library via a configurable call to the loader (SharedLibVersion), which only a handful of packages support and few more likely will (due to API compatibility issues with type sizes or enum values, not every package can). The latest version of DerelictGL3 is massively configurable at compile time. For a little while, I even relaxed my criteria for allowing new packages under the DerelictOrg umbrella, but now after I ended up being wholly responsible for them that’s something I’m not inclined to continue. With the DUB registry, it’s not really necessary.

There are a number of C library bindings using DerelictUtil out in the wild. You can find them, as well as all of the DerelictOrg packages, at the DUB package registry. Some of the third-party bindings have a name like “derelict-extras-foo”, but others use “derelict-foo” like those in DerelictOrg. You can also find a collection of static bindings in the Deimos organization, and third-party static bindings in the DUB registry that have adopted the Deimos approach of providing C headers along side the D modules.

Anyone who needs help with any of the Derelict packages can often find it in the #D irc channel at freenode.net. I drop by infrequently, but I monitor the D forums every day. Asking for Derelict help in the Learn forum is not off-topic.

The Derelict bindings have gone well beyond targeting game developers. Though they’re often used in games (like Voxelman and the first D game on Steam, Mayhem Intergalactic), they are also used elsewhere (like DLangUI). If you’re using any of the Derelict bindings in your own projects, I’d love to hear about it. Particularly since I’m always looking for projects I’ve never heard about to highlight here on the blog.

Life in the Fast Lane

Posted on

The first post I wrote in the GC series introduced the D garbage collector and the language features that use it. Two key points that I tried to get across in the article were:

  1. The GC can only run when memory allocations are requested. Contrary to popular misconception, the D GC isn’t generally going to decide to pause your Minecraft clone in the middle of the hot path. It will only run when memory from the GC heap is requested, and then only if it needs to.
  2. Simple C and C++ allocation strategies can mitigate GC pressure. Don’t allocate memory in inner loops – preallocate as much as possible, or fetch it from the stack instead. Minimize the total number of heap allocations. These strategies work because of point #1 above. The programmer can dictate when it is possible for a collection to occur simply by being smart about when GC heap allocations are made.

The strategies in point #2 are fine for code that a programmer writes herself, but they aren’t going to help at all with third-party libraries. For those situations, D provides built-in mechanisms to guarantee that no GC allocations can occur, both in the language and the runtime. There are also command-line options that can help make sure the GC stays out of the way.

Let’s imagine a hypothetical programmer named J.P. who, for reasons he considers valid, has decided he would like to avoid garbage collection completely in his D program. He has two immediate options.

The GC chill pill

One option is to make a call to GC.disable when the program is starting up. This doesn’t stop allocations, but puts a hold on collections. That means all collections, including any that may result from allocations in other threads.

void main() {
    import core.memory;
    import std.stdio;
    GC.disable;
    writeln("Goodbye, GC!");
}

Output:

Goodbye, GC!

This has the benefit that all language features making use of the GC heap will still work as expected. But, considering that allocations are still going without any cleanup, when you do the math you’ll realize this might be problematic. If allocations start to get out of hand, something’s gotta give. From the documentation:

Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition.

Depending on J.P.’s perspective, this might not be a good thing. But if this constraint is acceptable, there are some additional steps that can help keep things under control. J.P. can make calls to GC.enable or GC.collect as necessary. This provides greater control over collection cycles than the simple C and C++ allocation strategies.

The GC wall

When the GC is simply intolerable, J.P. can turn to the @nogc attribute. Slap it at the front of the main function and thou shalt suffer no collections.

@nogc
void main() { ... }

This is the ultimate GC mitigation strategy. @nogc applied to main will guarantee that the garbage collector will never run anywhere further along the callstack. No more caveats about collecting “where the implementation deems necessary”.

At first blush, this may appear to be a much better option than GC.disable. Let’s try it out.

@nogc
void main() {
    import std.stdio;
    writeln("GC be gone!");
}

This time, we aren’t going to get past compilation:

Error: @nogc function 'D main' cannot call non-@nogc function 'std.stdio.writeln!string.writeln'

What makes @nogc tick is the compiler’s ability to enforce it. It’s a very blunt approach. If a function is annotated with @nogc, then any function called from inside it must also be annotated with @nogc. As may be obvious, writeln is not.

That’s not all:

@nogc 
void main() {
    auto ints = new int[](100);
}

The compiler isn’t going to let you get away with that one either.

Error: cannot use 'new' in @nogc function 'D main'

Any language feature that allocates from the GC heap is out of reach inside a function marked @nogc (refer to the first post in this series for an overview of those features). It’s turtles all the way down. The big benefit here is that it guarantees that third-party code can’t use those features either, so can’t be allocating GC memory behind your back. Another downside is that any third-party library that is not @nogc aware is not going to be available in your program.

Using this approach requires a number of workarounds to make up for non-@nogc language features and library functions, including several in the standard library. Some are trivial, some are not, and others can’t be worked around at all (we’ll dive into the details in a future post). One example that might not be obvious is throwing an exception. The idiomatic way is:

throw new Exception("Blah");

Because of the new in that line, this isn’t possible in @nogc functions. Getting around this requires preallocating any exceptions that will be thrown, which in turn runs into the issue that any exception memory allocated from the regular heap still needs to be deallocated, which leads to ideas of reference counting or stack allocation… In other words, it’s a big can of worms. There’s currently a D Improvement Proposal from Walter Bright intended to stuff all the worms back into the can by making throw new Exception work without the GC when it needs to.

It’s not an insurmountable task to get around the limitations of @nogc main, it just requires a good bit of motivation and dedication.

One more thing to note about @nogc main is that it doesn’t banish the GC from the program completely. D has support for static constructors and destructors. The former are executed by the runtime before entering main and the latter upon exiting. If any of these exist in the program and are not annotated with @nogc, then GC allocations and collections can technically be present in the program. Still, @nogc applied to main means there won’t be any collections running once main is entered, so it’s effectively the same as having no GC at all.

Working it out

Here’s where I’m going to offer an opinion. There’s a wide range of programs that can be written in D without disabling or cutting the GC off completely. The simple strategies of minimizing GC allocations and keeping them out of the hot path will get a lot of mileage and should be preferred. It can’t be repeated enough given how often it’s misunderstood: D’s GC will only have a chance to run when the programmer allocates GC memory and it will only run if it needs to. Use that knowledge to your advantage by keeping the allocations small, infrequent, and isolated outside your inner loops.

For those programs where more control is actually needed, it probably isn’t going to be necessary to avoid the GC entirely. Judicious use of @nogc and/or the core.memory.GC API can often serve to avoid any performance issues that may arise. Don’t put @nogc on main, put it on the functions where you really want to disallow GC allocations. Don’t call GC.disable at the beginning of the program. Call it instead before entering a critical path, then call GC.enable when leaving that path. Force collections at strategic points, such as between game levels, with GC.collect.

As with any performance tuning strategy in software development, it pays to understand as fully as possible what’s actually happening under the hood. Adding calls to the core.memory.GC API in places where you think they make sense could potentially make the GC do needless work, or have no impact at all. Better understanding can be achieved with a little help from the toolchain.

The DRuntime GC option --DRT-gcopt=profile:1 can be passed to a compiled program (not to the compiler!) for some tune-up assistance. This will report some useful GC profiling data, such as the total number of collections and the total collection time.

To demonstrate, gcstat.d appends twenty values to a dynamic array of integers.

void main() {
    import std.stdio;
    int[] ints;
    foreach(i; 0 .. 20) {
        ints ~= i;
    }
    writeln(ints);
}

Compiling and running with the GC profile switch:

dmd gcstat.d
gcstat --DRT-gcopt=profile:1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
        Number of collections:  1
        Total GC prep time:  0 milliseconds
        Total mark time:  0 milliseconds
        Total sweep time:  0 milliseconds
        Total page recovery time:  0 milliseconds
        Max Pause Time:  0 milliseconds
        Grand total GC time:  0 milliseconds
GC summary:    1 MB,    1 GC    0 ms, Pauses    0 ms <    0 ms

This reports one collection, which almost certainly happened as the program was shutting down. The runtime terminates the GC as it exits which, in the current implementation, will generally trigger a collection. This is done primarily to run destructors on collected objects, even though D does not require destructors of GC-allocated objects to ever be run (a topic for a future post).

DMD supports a command-line option, -vgc, that will display every GC allocation in a program, including those that are hidden behind language features like the array append operator.

To demonstrate, take a look at inner.d:

void printInts(int[] delegate() dg)
{
    import std.stdio;
    foreach(i; dg()) writeln(i);
} 

void main() {
    int[] ints;
    auto makeInts() {
        foreach(i; 0 .. 20) {
            ints ~= i;
        }
        return ints;
    }

    printInts(&makeInts);
}

Here, makeInts is an inner function. A pointer to a non-static inner function is not a function pointer, but a delegate (a context pointer/function pointer pair; if an inner function is static, a pointer of type function is produced instead). In this particular case, the delegate makes use of a variable in its parent scope. Here’s the output of compiling with -vgc:

dmd -vgc inner.d
inner.d(11): vgc: operator ~= may cause GC allocation
inner.d(7): vgc: using closure causes GC allocation

What we’re seeing here is that memory needs to be allocated so that the delegate can carry the state of ints, making it a closure (which is not itself a type – the type is still delegate). Move the declaration of ints inside the scope of makeInts and recompile. You’ll find that the closure allocation goes away. A better option is to change the declaration of printInts to look like this:

void printInts(scope int[] delegate() dg)

Adding scope to any function parameter ensures that any references in the parameter cannot be escaped. In other words, it now becomes impossible to do something like assign dg to a global variable, or return it from the function. The effect is that there is no longer a need to create a closure, so there will be no allocation. See the documentation for more on function pointers, delegates and closures, and function parameter storage classes.

The gist

Given that the D GC is very different from those in languages like Java and C#, it’s certain to have different performance characteristics. Moreover, D programs tend to produce far less garbage than those written in a language like Java, where almost everything is a reference type. It helps to understand this when embarking on a D project for the first time. The strategies an experienced Java programmer uses to mitigate the impact of collections aren’t likley to apply here.

While there is certainly a class of software in which no GC pauses are ever acceptable, that is an arguably small set. Most D projects can, and should, start out with the simple mitigation strategies from point #2 at the top of this article, then adapt the code to use @nogc or core.memory.GC as and when performance dictates. The command-line options demonstrated here can help ferret out the areas where that may be necessary.

As time goes by, it’s going to become easier to micromanage garbage collection in D programs. There’s a concerted effort underway to make Phobos, D’s standard library, as @nogc-friendly as possible. Language improvements such as Walter’s proposal to modify how exceptions are allocated should speed that work considerably.

Future posts in this series will look at how to allocate memory outside of the GC heap and use it alongside GC allocations in the same program, how to compensate for disabled language features in @nogc code, strategies for handling the interaction of the GC with object destructors, and more.

Thanks to Vladimir Panteleev, Guillaume Piolat, and Steven Schveighoffer for their valuable feedback on drafts of this article.

The article has been amended to remove a misleading line about Java and C#, and to add some information about multiple threads.

Compile-Time Sort in D

Posted on

Björn Fahller recently wrote a blog post showing how to implement a compile-time quicksort in C++17. It’s a skillful demonstration that employs the evolving C++ feature set to write code that, while not quite concise, is more streamlined than previous iterations. He concludes with, “…the usefulness of this is very limited, but it is kind of cool, isn’t it?”

There’s quite a bit of usefulness to be found in evaluating code during compilation. The coolness (of which there is much) arises from the possibilities that come along with it. Starting from Björn’s example, this post sets out to teach a few interesting aspects of compile-time evaluation in the D programming language.

The article came to my attention from Russel Winder’s provocative query in the D forums, “Surely D can do better”, which was quickly resolved with a “No Story”-style answer by Andrei Alexandrescu. “There is nothing to do really. Just use standard library sort,” he quipped, and followed with code:

Example 1

void main() {
    import std.algorithm, std.stdio;
    enum a = [ 3, 1, 2, 4, 0 ];
    static b = sort(a);
    writeln(b); // [0, 1, 2, 3, 4]
}

Though it probably won’t be obvious to those unfamiliar with D, the call to sort really is happening at compile time. Let’s see why.

Compile-time code is runtime code

It’s true. There are no hurdles to jump over to get things running at compile time in D. Any compile-time function is also a runtime function and can be executed in either context. However, not all runtime functions qualify for CTFE (Compile-Time Function Evaluation).

The fundamental requirements for CTFE eligibility are that a function must be portable, free of side effects, contain no inline assembly, and the source code must be available. Beyond that, the only thing deciding whether a function is evaluated during compilation vs. at run time is the context in which it’s called.

The CTFE Documentation includes the following statement:

In order to be executed at compile time, the function must appear in a context where it must be so executed…

It then lists a few examples of where that is true. What it boils down to is this: if a function can be executed in a compile-time context where it must be, then it will be. When it can’t be excecuted (it doesn’t meet the CTFE requirements, for example), the compiler will emit an error.

Breaking down the compile-time sort

Take a look once more at Example 1.

void main() {
    import std.algorithm, std.stdio;
    enum a = [ 3, 1, 2, 4, 0 ];
    static b = sort(a);
    writeln(b);
}

The points of interest that enable the CTFE magic here are lines 3 and 4.

The enum in line 3 is a manifest constant. It differs from other constants in D (those marked immutable or const) in that it exists only at compile time. Any attempt to take its address is an error. If it’s never used, then its value will never appear in the code.

When an enum is used, the compiler essentially pastes its value in place of the symbol name.

enum xinit = 10;
int x = xinit;

immutable yinit = 11;
int y = yinit;

Here, x is initialized to the literal 10. It’s identical to writing int x = 10. The constant yinit is initialized with an int literal, but y is initialized with the value of yinit, which, though known at compile time, is not a literal itself. yinit will exist at run time, but xinit will not.

In Example 1, the static variable b is initialized with the manifest constant a. In the CTFE documentation, this is listed as an example scenario in which a function must be evaluated during compilation. A static variable declared in a function can only be initialized with a compile-time value. Trying to compile this:

Example 2

void main() {
    int x = 10;
    static y = x;
}

Will result in this:

Error: variable x cannot be read at compile time

Using a function call to initialize a static variable means the function must be executed at compile time and, therefore, it will be if it qualifies.

Those two pieces of the puzzle, the manifest constant and the static initializer, explain why the call to sort in Example 1 happens at compile time without any metaprogramming contortions. In fact, the example could be made one line shorter:

Example 3

void main() {
    import std.algorithm, std.stdio;
    static b = sort([ 3, 1, 2, 4, 0 ]);
    writeln(b);
}

And if there’s no need for b to stick around at run time, it could be made an enum instead of a static variable:

Example 4

void main() {
    import std.algorithm, std.stdio;
    enum b = sort([ 3, 1, 2, 4, 0 ]);
    writeln(b);
}

In both cases, the call to sort will happen at compile time, but they handle the result differently. Consider that, due to the nature of enums, the change will produce an equivalent of this: writeln([ 0, 1, 2, 3, 4 ]). Because the call to writeln happens at run time, the array literal might trigger a GC allocation (though it could be, and sometimes will be, optimized away). With the static initializer, there is no runtime allocation, as the result of the function call is used at compile time to initialize the variable.

It’s worth noting that sort isn’t directly returning a value of type int[]. Take a peek at the documentation and you’ll discover that what it’s giving back is a SortedRange. Specifically in our usage, it’s a SortedRange!(int[], "a < b"). This type, like arrays in D, exposes all of the primitives of a random-access range, but additionally provides functions that only work on sorted ranges and can take advantage of their ordering (e.g. trisect). The array is still in there, but wrapped in an enhanced API.

To CTFE or not to CTFE

I mentioned above that all compile-time functions are also runtime functions. Sometimes, it's useful to distinguish between the two inside the function itself. D allows you to do that with the __ctfe variable. Here's an example from my book, 'Learning D'.

Example 5

string genDebugMsg(string msg) {
    if(__ctfe)
        return "CTFE_" ~ msg;
    else
        return "DBG_" ~ msg;
}

pragma(msg, genDebugMsg("Running at compile-time."));
void main() {
    writeln(genDebugMsg("Running at runtime."));
}

The msg pragma prints a message to stderr at compile time. When genDebugMsg is called as its second argument here, then inside that function the variable __ctfe will be true. When the function is then called as an argument to writeln, which happens in a runtime context, __ctfe is false.

It's important to note that __ctfe is not a compile-time value. No function knows if it's being executed at compile-time or at run time. In the former case, it's being evaluated by an interpreter that runs inside the compiler. Even then, we can make a distinction between compile-time and runtime values inside the function itself. The result of the function, however, will be a compile-time value when it's executed at compile time.

Complex compile-time validation

Now let's look at something that doesn't use an out-of-the-box function from the standard library.

A few years back, Andrei published 'The D Programming Language'. In the section describing CTFE, he implemented three functions that could be used to validate the parameters passed to a hypothetical linear congruential generator. The idea is that the parameters must meet a set of criteria, which he lays out in the book (buy it for the commentary -- it's well worth it), for generating the largest period possible. Here they are, minus the unit tests:

Example 6

// Implementation of Euclid’s algorithm
ulong gcd(ulong a, ulong b) { 
    while (b) {
        auto t = b; b = a % b; a = t;
    }
    return a; 
}

ulong primeFactorsOnly(ulong n) {
    ulong accum = 1;
    ulong iter = 2;
    for (; n >= iter * iter; iter += 2 - (iter == 2)) {
        if (n % iter) continue;
        accum *= iter;
        do n /= iter; while (n % iter == 0);
    }
    return accum * n;
}

bool properLinearCongruentialParameters(ulong m, ulong a, ulong c) { 
    // Bounds checking
    if (m == 0 || a == 0 || a >= m || c == 0 || c >= m) return false; 
    // c and m are relatively prime
    if (gcd(c, m) != 1) return false;
    // a - 1 is divisible by all prime factors of m
    if ((a - 1) % primeFactorsOnly(m)) return false;
    // If a - 1 is multiple of 4, then m is a multiple of 4, too. 
    if ((a - 1) % 4 == 0 && m % 4) return false;
    // Passed all tests
    return true;
}

The key point this code was intended to make is the same one I made earlier in this post: properLinearCongruentialParameters is a function that can be used in both a compile-time context and a runtime context. There's no special syntax required to make it work, no need to create two distinct versions.

Want to implement a linear congruential generator as a templated struct with the RNG parameters passed as template arguments? Use properLinearCongruentialParameters to validate the parameters. Want to implement a version that accepts the arguments at run time? properLinearCongruentialParameters has got you covered. Want to implement an RNG that can be used at both compile time and run time? You get the picture.

For completeness, here's an example of validating parameters in both contexts.

Example 7

void main() {
    enum ulong m = 1UL << 32, a = 1664525, c = 1013904223;
    static ctVal = properLinearCongruentialParameters(m, a, c);
    writeln(properLinearCongruentialParameters(m, a, c));
}

If you've been paying attention, you'll know that ctVal must be initialized at compile time, so it forces CTFE on the call to the function. And the call to the same function as an argument to writeln happens at run time. You can have your cake and eat it, too.

Conclusion

Compile-Time Function Evaluation in D is both convenient and painless. It can be combined with other features such as templates (it's particularly useful with template parameters and constraints), string mixins and import expressions to simplify what might otherwise be extremely complex code, some of which wouldn't even be possible in many languages without a preprocessor. As a bonus, Stefan Koch is currently working on a more performant CTFE engine for the D frontend to make it even more convenient. Stay tuned here for more news on that front.

Thanks to the multiple members of the D community who reviewed this article.