Category Archives: Code

DMD 2.076.0 Released

The core D team is proud to announce that version 2.076.0 of DMD, the reference compiler for the D programming language, is ready for download. The two biggest highlights in this release are the new static foreach feature for improved generative and generic programming, and significantly enhanced C language integration making incremental conversion of C projects to D easy and profitable.

static foreach

As part of its support for generic and generative programming, D allows for conditional compilation by way of constructs such as version and static if statements. These are used to choose different code paths during compilation, or to generate blocks of code in conjunction with string and template mixins. Although these features enable possibilities that continue to be discovered, the lack of a compile-time loop construct has been a steady source of inconvenience.

Consider this example, where a series of constants named val0 to valN needs to be generated based on a number N+1 specified in a configuration file. A real configuration file would require a function to parse it, but for this example, assume the file val.cfg is defined to contain a single numerical value, such as 10, and nothing else. Further assuming that val.cfg is in the same directory as the valgen.d source file, use the command line dmd -J. valgen.d to compile.

module valgen;
import std.conv : to;

enum valMax = to!uint(import("val.cfg"));

string genVals() 
{
    string ret;
    foreach(i; 0 .. valMax) 
    {
        ret ~= "enum val" ~ to!string(i) ~ "=" ~ to!string(i) ~ ";";
    }
    return ret;
}

string genWrites() 
{
    string ret;
    foreach(i; 0 .. valMax) 
    {
        ret ~= "writeln(val" ~ to!string(i) ~ ");";
    }
    return ret;
}

mixin(genVals);

void main() 
{
    import std.stdio : writeln;
    mixin(genWrites);
}

The manifest constant valMax is initialized by the import expression, which reads in a file during compilation and treats it as a string literal. Since we’re dealing only with a single number in the file, we can pass the string directly to the std.conv.to function template to convert it to a uint. Because valMax is an enum, the call to to must happen during compilation. Finally, because to meets the criteria for compile-time function evaluation (CTFE), the compiler hands it off to the interpreter to do so.

The genVals function exists solely to generate the declarations of the constants val0 to valN, where N is determined by the value of valMax. The string mixin on line 26 forces the call to genVals to happen during compilation, which means this function is also evaluated by the compile-time interpreter. The loop inside the function builds up a single string containing the declaration of each constant, then returns it so that it can be mixed in as several constant declarations.

Similarly, the genWrites function has the single-minded purpose of generating one writeln call for each constant produced by genVals. Again, each line of code is built up as a single string, and the string mixin inside the main function forces genWrites to be executed at compile-time so that its return value can be mixed in and compiled.

Even with such a trivial example, the fact that the generation of the declarations and function calls is tucked away inside two functions is a detriment to readability. Code generation can get quite complex, and any functions created only to be executed during compilation add to that complexity. The need for iteration is not uncommon for anyone working with D’s compile-time constructs, and in turn neither is the implementation of functions that exist just to provide a compile-time loop. The desire to avoid such boilerplate has put the idea of a static foreach as a companion to static if high on many wish lists.

At DConf 2017, Timon Gehr rolled up his sleeves during the hackathon and implemented a pull request to add support for static foreach to the compiler. He followed that up with a D Improvement Proposal, DIP 1010, so that he could make it official, and the DIP met with enthusiastic approval from the language authors. With DMD 2.076, it’s finally ready for prime time.

With this new feature, the above example can be rewritten as follows:

module valgen2;
import std.conv : to;

enum valMax = to!uint(import("val.cfg"));

static foreach(i; 0 .. valMax) 
{
    mixin("enum val" ~ to!string(i) ~ "=" ~ to!string(i) ~ ";");
}

void main() 
{
    import std.stdio : writeln;
    static foreach(i; 0 .. valMax) 
    {
        mixin("writeln(val" ~ to!string(i) ~ ");");
    }
}

Even such a trivial example brings a noticeable improvement in readability. Don’t be surprised to see compile-time heavy D libraries (and aren’t most of them?) get some major updates in the wake of this compiler release.

Better C integration and interoperation

DMD’s -betterC command line switch has been around for quite a while, though it didn’t really do much and it has languished from inattention while more pressing concerns were addressed. With DMD 2.076, its time has come.

The idea behind the feature is to make it even easier to combine both D and C in the same program, with an emphasis on incrementally replacing C code with D code in a working project. D has been compatible with the C ABI from the beginning and, with some work to translate C headers to D modules, can directly make C API calls without going through any sort of middleman. Going the other way and incorporating D into C programs has also been possible, but not as smooth of a process.

Perhaps the biggest issue has been DRuntime. There are certain D language features that depend on its presence, so any D code intended to be used in C needs to bring the runtime along and ensure that it’s initialized. That, or all references to the runtime need to be excised from the D binaries before linking with the C side, something that requires more than a little effort both while writing code and while compiling it.

-betterC aims to dramatically reduce the effort required to bring D libraries into the C world and modernize C projects by partially or entirely converting them to D. DMD 2.076 makes significant progress toward that end. When -betterC is specified on the command line, all asserts in D modules will now use the C assert handler rather than the D assert handler. And, importantly, neither DRuntime nor Phobos, the D standard library, will be automatically linked in as they normally are. This means it’s no longer necessary to manually configure the build process or fix up the binaries when using -betterC. Now, object files and libraries generated from D modules can be directly linked into a C program without any special effort. This is especially easy when using VisualD, the D plugin for Visual Studio. Not too long ago, it gained support for mixing C and D modules in the same project. The updated -betterC switch makes it an even more convenient feature.

While the feature is now more usable, it’s not yet complete. More work remains to be done in future releases to allow the use of more D features currently prohibited in betterC. Read more about the feature in Walter Bright’s article here on the D Blog, D as a Better C.

A new release schedule

This isn’t a compiler or language feature, but it’s a process feature worth noting. This is the first release conforming to a new release schedule. From here on out, beta releases will be announced on the 15th of every even month, such as 2017–10–15, 2017–12–15, 2018–2–15, etc. All final releases will be scheduled for the 1st of every odd month: 2017–11–01, 2018–01–01, 2018–03–01, etc. This will bring some reliability and predictability to the release schedule, and make it easier to plan milestones for enhancements, changes, and new features.

Get it now!

As always, the changes, fixes, and enhancements for this release can be found in the changelog. This specific release will always be available for download at http://downloads.dlang.org/releases/2.x/2.076.0, and the latest release plus betas and nightlies can be found at the download page on the DLang website.

Open Methods: From C++ to D

Prelude

Earlier this year I attended C++Now, a major conference dedicated to C++. I listened to talks given by very bright people, who used all sorts of avant-garde C++ techniques to accomplish all sorts of feats at compile time. It was a constexpr party! However, at the end of the week I had severe doubts about the future of C++.

I’ll say this for the organizers, though: they were quite open minded. They reserved the largest auditorium for a two-hour presentation of competing languages, one every day. We had Haskell and Rust, and Ali Çehreli talked about D.

I knew next to nothing about D. You see, I learned to program in Forth. Later I did some Lisp programming just for fun. To me, the idea of CTFE was natural right off the bat. So when Ali talked about static if and mixins, he definitely got my attention.

In order to learn (and evaluate) D, I decided to reproduce parts of my C++ library yomm11. It implements open multi-methods and contains code that exercises the “interesting” parts of the language, both at compile time and run time. Initially, I thought I would just see how I could reimplement bits of yomm11, how nice (or ugly) the syntax for declaring methods would turn out to be. The result was satisfying. I would even say intoxicating. I ended up bringing the port to completion and I feel that the result–openmethods.d–is the best implementation of open methods I’ve crafted so far. And it’s all done in a library, relying only on existing language features.

But wait, what are open methods?

From Member to Free

Open methods are just like virtual functions, except that they are declared outside of a class hierarchy. They are often conflated with multi-methods, because they are frequently implemented together (as is the case with this library), but really these are two different concepts. The ‘open’ part is, I believe, the more important, so I will focus more on that in this article.

Here is an example of a virtual function:

interface Animal
{
  string kick();
}

class Dog : Animal
{
  string kick() { return "bark"; }
}

class Pitbull : Dog
{
  override string kick() { return super.kick() ~ " and bite"; }
}

void main()
{
  import std.stdio : writeln;
  Animal snoopy = new Dog, hector = new Pitbull;
  writeln("snoopy.kick(): ", snoopy.kick()); // bark
  writeln("hector.kick(): ", hector.kick()); // bark and bite
}

The direct equivalent, translated to open methods, reads like this:

import openmethods;
mixin(registerMethods);

interface Animal
{
}

class Dog : Animal
{
}

class Pitbull : Dog
{
}

string kick(virtual!Animal);

@method
string _kick(Dog dog) { return "bark"; }

@method
string _kick(Pitbull dog) { return next!kick(dog) ~ " and bite"; }

void main()
{
  updateMethods();
  import std.stdio : writeln;
  Animal snoopy = new Dog, hector = new Pitbull;
  writeln("snoopy.kick(): ", snoopy.kick()); // bark
  writeln("hector.kick(): ", hector.kick()); // bark an dbite
}

Let’s break it down.

The string kick() in interface Animal becomes the free function declaration string kick(virtual!Animal). The implicit this parameter becomes an explicit parameter, and its type is prefixed with virtual!, thus indicating that the parameter is used to resolve calls at run time.
The string kick() override in class Dog becomes the free function definition @method string _kick(Dog dog) { return "bark"; }. Three things here:
- the override is preceded by the @method attribute
- the function name is prefixed with an underscore
- the implicit this argument is explicitly named: Dog dog
The same thing happens to the override in class Pitbull, with an extra twist: super.kick() becomes next!kick(dog)
The calls to kick in main become free function calls – although, incidentally, they could have remained unchanged, thanks to Uniform Function Call Syntax.
After importing the openmethods module, a mixin is called: mixin(registerMethods). It should be called in each module that imports openmethods. It matches method declarations and overrides. It also creates a kick(Animal) function (note: sans the virtual!), which is the entry point in the dynamic dispatch mechanism.
Finally, main calls updateMethods. This should be done before calling any method (typically first thing in main) and each time a library containing methods is dynamically loaded or unloaded.

Open Is Good

What does it gain us? Well, a lot. Now we can add polymorphic behavior to any class hierarchy without modifying it. In fact, this implementation even allows you to add methods to Object, in a matter of speaking. Because, of course, class Object is never modified.

Let’s take a more serious example. Suppose that you have written a nifty matrix math library. Matrices come in all sorts of flavors: diagonal, shallow, tri-diagonal, and of course dense (i.e. “normal” matrices). Depending on the exact nature of a matrix, you can optimize some operations. Transposing a diagonal or a symmetric matrix amounts to returning it, unchanged. Adding sparse matrices does not require adding thousands of zeroes; and so on. And you have exploited all these properties in your matrix library, varying the behavior by means of virtual functions.

Neat.

Now let me ask you a question: should you provide a print function? A persist function?

Almost certainly not. For starters, there are many ways to display a matrix. If it is sparse, you may want to show only the non-zero elements… or all of them. You may want to display the null matrix as [0]… or in full. It is the privilige of the application to decide what matrices should look like on screen or paper. The matrix library should do the maths, and the application should do the presentation. If it needs to display matrices at all, that is. In game programming, there may be no need to display matrices. However, if you provide a print function, given the way they are implemented, the print or the persist code will always be pulled in from the library. Not good.

Now the application programmer will have to write his print and persist functions, but immediately he will be facing a problem: certainly he wants to vary the behavior according to the exact matrix type; he wants polymorphism! So he will probably end up coding a set of type switches.

Open methods solve this problem more neatly:

void print(virtual!Matrix m);

@method
void _print(Matrix m)
{
  const int nr = m.rows;
  const int nc = m.cols;
  for (int i = 0; i < nr; ++i) {
    for (int j = 0; j < nc; ++j) {
      writef("%3g", m.at(i, j));
    }
    writeln();
  }
}

@method
void _print(DiagonalMatrix m)
{
  import std.algorithm;
  import std.format;
  import std.array;
  writeln("diag(", m.elems.map!(x => format("%g", x)).join(", "), ")");
}

Accept No Visitors (c) Yuriy Solodkyy

A popular existing solution to this problem comes in the form of the Visitor pattern. Your matrix library could provide one, thus allowing the application writer to process different matrices according to their type.

In truth, Visitor is more an anti-pattern than a pattern, because the base class is aware of all its derived classes – something that flies in the face of all OOP design rules.

Here it is anyway:

import std.stdio;

interface Matrix
{
  interface Visitor
  {
    void visit(DenseMatrix m);
    void visit(DiagonalMatrix m);
  }

  void accept(Visitor v);
}

class DenseMatrix : Matrix
{
  void accept(Visitor v) { v.visit(this); }
}

class DiagonalMatrix : Matrix
{
  void accept(Visitor v) { v.visit(this); }
}

class PrintVisitor : Matrix.Visitor
{
  this(File of) { this.of = of; }

  void visit(DenseMatrix m) { of.writeln("print a DenseMatrix"); }
  void visit(DiagonalMatrix m) { of.writeln("print a DiagonalMatrix"); }

  File of;
}

void main()
{
  Matrix dense = new DenseMatrix, diagonal = new DiagonalMatrix;
  auto printer = new PrintVisitor(stdout);
  dense.accept(printer);
  diagonal.accept(printer);
}

This approach is more verbose than using an open method, and it has a more fatal flaw: it is not extensible. Suppose that the user of your matrix library wants to add matrices of his own design. For example, a SparseMatrix. The Visitor will be of no help here. With open methods, on the other hand, the solution is available, simple, and elegant:

// from library

void print(virtual!Matrix m, File of);

@method
void _print(DenseMatrix m, File of)
{
  of.writeln("print a DenseMatrix");
}

@method
void _print(DiagonalMatrix m, File of)
{
  of.writeln("print a DiagonalMatrix");
}

// extend library

class SparseMatrix : Matrix
{
  // ...
}

@method
void _print(SparseMatrix m, File of)
{
  of.writeln("print a SparseMatrix");
}

Multiple Dispatch

Occasionally, there is a need to take into account the type of two or more arguments to select the appropriate behavior. This is called multiple dispatch. Most languages only support single dispatch in the form of virtual member functions. Once again, the “solution” involves type switches or visitors. A few languages address this situation directly by means of multi-methods. The most notorious example is the Common Lisp Object System. Recently, a string of new languages have native support for multi-methods: Clojure (unsurprising for a lispoid), Julia, Nice, Cecil, TADS (a language for developing text-based adventure games).

This library implements multi-methods as well. There is no limit to the number of arguments that can be adorned with the virtual! qualifier. They will all be considered during dynamic dispatch.

Continuing the matrix library example, you probably want to provide binary operations on matrices: addition, subtraction and multiplication. If both operands are matrices, you really want to pick the right algorithm depending on the respective types of both operands. There is no point wasting time on adding all the elements if both operands are diagonal matrices; adding the diagonals suffices. Crucially, adding two DiagonalMatrix objects should return a DiagonalMatrix, not a plain DenseMatrix. Adding a DiagonalMatrix and a TriDiagonalMatrix should return a TriDiagonalMatrix, etc.

With open multi-methods, there is no problem at all:

module matrix;

Matrix plus(virtual!Matrix, virtual!Matrix);

module densematrix;

@method
Matrix _plus(Matrix a, Matrix b)
{
  // fallback: add all elements, fetched via interface
  // return a DenseMatrix
}

@method
Matrix _plus(DenseMatrix a, DenseMatrix b)
{
  // add all elements, access representation directly
  // return a DenseMatrix
}

module diagonalmatrix;

@method
Matrix _plus(DiagonalMatrix a, DiagonalMatrix b)
{
  // just add the elements on diagonals
  // return a DiagonalMatrix
}

Once again, open methods make the library extensible. It is trivial to plug new types in:

module mymatrices;

@method
Matrix _plus(SparseMatrix a, SparseMatrix b)
{
  // just add the non-zero elements
  // return a SparseMatrix
}

@method
Matrix _plus(SparseMatrix a, DiagonalMatrix b)
{
  // still don't add all the zeroes
  // return a SparseMatrix
}

@method
Matrix _plus(DiagonalMatrix a, SparseMatrix b)
{
  return plus(b, a); // matrix addition is commutative
}

Implementation Notes and Performance

This implementation uses tables of pointers to select the appropriate function to call. The process is very similar to what happens when a regular, virtual member function is called.

Each class involved in method dispatch–either because it is used as a virtual argument in a method declaration, or because it inherits from a class or an interface used as a virtual argument–has an associated method table (mtbl). The pointer to the method table (mptr) associated to a given class is stored, by default, in the deallocator pointer of the class’s ClassInfo. The first entry in a class’s vtable contains a pointer to its ClassInfo. The deallocator pointer was used to implement the deprecated delete method, so it is reasonable to recycle it. The deallocator pointer may be removed some day, or one may want to use methods in conjunction with classes that implement delete, so an alternative is supported. Tagging a method with @mptr("hash") makes it fetch the method table pointer from an array indexed by a perfect integer hash calculated during updateMethods. In this case, finding the mptr amounts to multiplying the vptr’s value by an integer and applying a bit mask.

The method table contains one entry for each virtual parameter for each method. If the method has a single virtual argument, the entry contains the specialization’s address, just like an ordinary virtual function; otherwise, the entry contains a pointer to a row in a multi-dimensional dispatch table for the first argument, and integer indexes for the subsequent virtual arguments.

Since the set of methods applicable to a given class is known only at run time and may change in the presence of dynamic loading, the position of a method’s entries in the method table is not fixed; it is stored in a table associated with each method. Finally, in the presence of multiple dispatch, a per-method array of strides is used to convert the multi-dimensional index to a linear offset.

However, finding the specialization amounts to a few memory reads, additions and perhaps multiplications. As a result, open methods are almost as fast as virtual functions backed by the compiler. How much slower they are depends on several factors, including the compiler, or whether the call is issued from an interface or a class. The following table sums up some of my benchmarks. Rows come in groups of three: the “usual”, compiler-supported virtual member functions; the functional equivalent using open methods; and the cost, expressed as (method - virtual) / virtual:

mptr in deallocator	dmd	ldc2	gdc
vfunc (interface)	1.84	1.80	1.80
vs 1-method (interface)	10.73	3.53	6.05
delta%	484%	96%	236%

vfunc (class)	1.83	1.80	1.80
vs 1-method (class)	5.12	2.13	1.80
delta%	180%	18%	0%

double dispatch	4.11	2.40	2.13
2-method	7.75	3.14	3.40
delta%	88.45%	30.71	59.85

Times in nanoseconds, measured on my Asus ROG G751JT.

A few results stand out. The first is expected, the others are quite remarkable.

gdc and ldc2 do a better job at optimizing method dispatch
Method calls that take an object perform much better than those taking an interface; there may be some further improvements to be done here.
Method calls from an object are almost as fast as plain virtual function calls when ldc2 is used; they are just as fast with gdc. The latter is surprising and calls for further investigation.
Disappointingly, double dispatch beats binary methods. This is not the case in C++. My intuition is that extracting the method table pointer requires traversing too many indirections, to the point that it is more costly than a plain virtual function call. In contrast, yomm11 sticks the mptr right inside the object (but at the cost of requiring changes to the classes). This deserves further investigation, but I am convinced that a bit of help from the compiler (like reserving the second element of the vtbl for the mptr) would reverse this result.

Memory footprint is also a common concern when implementing table-based multiple dispatch: imagine a method with three virtual arguments, which can each be any of a dozen classes. This gives us a 12x12x12 table, containing 1728 function pointers. Fortunately, it is rare that a specialization is defined for each combination of arguments. Typically, there is a lot of duplication along each axis. This implementation takes advantages of this: it builds tables free of redundancies. The table is not “compressed” per se, as it never exists as a cartesian product of all the class sets; rather, it is built in terms of class partitions, not classes, where all the classes in the same group in the same dimension have the same set of candidate specializations. See
this article for an example.

Extending the Language – in D and in C++

Yomm11, the initial implementation of open methods in C++, takes 1845 lines of code (excluding comments) to implement; the D version weighs 1120 lines. Much of the difference is due to D’s ClassInfo. It contains information on the base class and inherited interfaces. It is used to build a bi-directional inheritance graph of the types that have methods attached to them.

C++’s type_info contains no such informaton, thus yomm11 comes with its own runtime class information system, and a macro that the user must call for each class participating in method dispatch. The usual difficulties with static constructors arise, and necessitates extra code to handle them.

Yomm11 can be used in two modes: intrusive and orthogonal. In the intrusive mode, the user augments the classes using macro calls. One of them allocates a method table pointer in the object; the other–called in each constructor–initializes the method pointer. In the orthogonal mode, no modification of the classes is required: the method pointer is stored in a hash map keyed by the type_info obtained via the typeid operator.

openmethods.d has two modes, too, but they are both orthogonal. The default mode stores the method pointer in the deallocator field of the ClassInfo. The ClassInfo of an object is available as the first pointer of the virtual function table; all this is documented. However, hijacking deallocator is a bit like cheating, and nothing guarantees that that field will be there forever.

For that reason, the library supports another mode, which is only slightly slower than the first: store the method pointer in an array indexed by a perfect integer hash of the virtual table pointer.

Unfortunately, it is not possible to use this approach in C++. It is possible to retrieve an object’s vptr, albeit by resorting to undocumented implementation details. However, the library needs to build the method tables without having instances of objects at hand; in D, on the other hand, the value of the vptr is available in the ClassInfo. Another idea would be to use a pointer to the type_info structure; alas, while a type_info can be obtained from a type as well as from an object, the standard explicitly states that the type_info object for a given type may not be unique.

Thus D provides at bit more information than C++, and that makes all the difference.

As for the meta-programming involved in processing the method declarations and specializations, it is easier, and yields a better syntax, in D than in C++, for several reasons.

Obviously, constructs like static if and foreach on type tuples make meta-programming easier. But the real advantage of D comes from the interplay
of template mixins, string mixins, compile-time reflection and alias. The mixin(registerMethods) incantation scans the entire translation unit and:

locates all the method declarations by detecting the functions that have virtual! in their signature
creates (via an alias created by a string mixin) a function with the same signature, minus the virtual qualifiers, which is what the user calls
finds all the method specializations (by locating the functions that have a @method attribute) and generates code that, at runtime, will register the specializations with the appropriate method

Conclusion

Object-oriented programming became popular in the nineties, but has been subjected to a lot of criticism in the last decade. This is in part because OOP promised modularity and extensibility, but failed to deliver. Instead we got “God” classes and Visitors. It is not the fault of the OOP paradigm per se, but rather of the unnatural and unnecessay fusion of class membership and polymorphism that most OO languages enforce. Open methods correct this mistake. As a bonus, this implementation also supports multiple dispatch. This is OOP done right: not objects “talking” to each other, but applying the appropriate algorithm depending on the arguments’ runtime types.

Open methods can be implemented as a library in C++ and in D, but D has a clear edge when it comes to meta-programming. As a result, the D version of the library delivers a lighter, cleaner syntax.

openmethods.d is available on dub

Jean-Louis Leroy is not French, but Belgian. He got his first taste of programming from a HP-25 calculator. His first real programming language was Forth, where CTFE is pervasive. Later he programmed (a little) in Lisp and Smalltalk, and (a lot) in C, C++, and Perl. He now works for Bloomberg LP in New York. His interests include object-relational mapping, open multi-methods, DSLs, and language extensions in general.

D as a Better C

Walter Bright is the BDFL of the D Programming Language and founder of Digital Mars. He has decades of experience implementing compilers and interpreters for multiple languages, including Zortech C++, the first native C++ compiler. He also created Empire, the Wargame of the Century. This post is the first in a series about D’s BetterC mode

D was designed from the ground up to interface directly and easily to C, and to a lesser extent C++. This provides access to endless C libraries, the Standard C runtime library, and of course the operating system APIs, which are usually C APIs.

But there’s much more to C than that. There are large and immensely useful programs written in C, such as the Linux operating system and a very large chunk of the programs written for it. While D programs can interface with C libraries, the reverse isn’t true. C programs cannot interface with D ones. It’s not possible (at least not without considerable effort) to compile a couple of D files and link them in to a C program. The trouble is that compiled D files refer to things that only exist in the D runtime library, and linking that in (it’s a bit large) tends to be impractical.

D code also can’t exist in a program unless D controls the main() function, which is how the startup code in the D runtime library is managed. Hence D libraries remain inaccessible to C programs, and chimera programs (a mix of C and D) are not practical. One cannot pragmatically “try out” D by add D modules to an existing C program.

That is, until Better C came along.

It’s been done before, it’s an old idea. Bjarne Stroustrup wrote a paper in 1988 entitled “A Better C“. His early C++ compiler was able to compile C code pretty much unchanged, and then one could start using C++ features here and there as they made sense, all without disturbing the existing investment in C. This was a brilliant strategy, and drove the early success of C++.

A more modern example is Kotlin, which uses a different method. Kotlin syntax is not compatible with Java, but it is fully interoperable with Java, relies on the existing Java libraries, and allows a gradual migration of Java code to Kotlin. Kotlin is indeed a “Better Java”, and this shows in its success.

D as Better C

D takes a radically different approach to making a better C. It is not an extension of C, it is not a superset of C, and does not bring along C’s longstanding issues (such as the preprocessor, array overflows, etc.). D’s solution is to subset the D language, removing or altering features that require the D startup code and runtime library. This is, simply, the charter of the -betterC compiler switch.

Doesn’t removing things from D make it no longer D? That’s a hard question to answer, and it’s really a matter of individual preference. The vast bulk of the core language remains. Certainly the D characteristics that are analogous to C remain. The result is a language somewhere in between C and D, but that is fully upward compatible with D.

Removed Things

Most obviously, the garbage collector is removed, along with the features that depend on the garbage collector. Memory can still be allocated the same way as in C – using malloc() or some custom allocator.

Although C++ classes and COM classes will still work, D polymorphic classes will not, as they rely on the garbage collector.

Exceptions, typeid, static construction/destruction, RAII, and unittests are removed. But it is possible we can find ways to add them back in.

Asserts are altered to call the C runtime library assert fail functions rather than the D runtime library ones.

(This isn’t a complete list, for that see http://dlang.org/dmd-windows.html#switch-betterC.)

Retained Things

More importantly, what remains?

What may be initially most important to C programmers is memory safety in the form of array overflow checking, no more stray pointers into expired stack frames, and guaranteed initialization of locals. This is followed by what is expected in a modern language — modules, function overloading, constructors, member functions, Unicode, nested functions, dynamic closures, Compile Time Function Execution, automated documentation generation, highly advanced metaprogramming, and Design by Introspection.

Footprint

Consider a C program:

#include <stdio.h>

int main(int argc, char** argv) {
    printf("hello world\n");
    return 0;
}

It compiles to:

_main:
push EAX
mov [ESP],offset FLAT:_DATA
call near ptr _printf
xor EAX,EAX
pop ECX
ret

The executable size is 23,068 bytes.

Translate it to D:

import core.stdc.stdio;

extern (C) int main(int argc, char** argv) {
    printf("hello world\n");
    return 0;
}

The executable size is the same, 23,068 bytes. This is unsurprising because the C compiler and D compiler generate the same code, as they share the same code generator. (The equivalent full D program would clock in at 194Kb.) In other words, nothing extra is paid for using D rather than C for the same code.

The Hello World program is a little too trivial. Let’s step up in complexity to the infamous sieve benchmark program:

#include <stdio.h>

/* Eratosthenes Sieve prime number calculation. */

#define true    1
#define false   0
#define size    8190
#define sizepl  8191

char flags[sizepl];

int main() {
    int i, prime, k, count, iter;

    printf ("10 iterations\n");
    for (iter = 1; iter <= 10; iter++) {
        count = 0;
        for (i = 0; i <= size; i++)
            flags[i] = true;
        for (i = 0; i <= size; i++) {
            if (flags[i]) {
                prime = i + i + 3;
                k = i + prime;
                while (k <= size) {
                    flags[k] = false;
                    k += prime;
                }
                count += 1;
            }
        }
    }
    printf ("\n%d primes", count);
    return 0;
}

Rewriting it in Better C:

import core.stdc.stdio;

extern (C):

__gshared bool[8191] flags;

int main() {
    int count;

    printf("10 iterations\n");
    foreach (iter; 1 .. 11) {
        count = 0;
        flags[] = true;
        foreach (i; 0 .. flags.length) {
            if (flags[i]) {
                const prime = i + i + 3;
                auto k = i + prime;
                while (k < flags.length) {
                    flags[k] = false;
                    k += prime;
                }
                count += 1;
            }
        }
    }
    printf("%d primes\n", count);
    return 0;
}

It looks much the same, but some things are worthy of note:

extern (C): means use the C calling convention.
D normally puts static data into thread local storage. C sticks them in global storage. __gshared accomplishes that.
foreach is a simpler way of doing for loops over known endpoints.
flags[] = true; sets all the elements in flags to true in one go.
Using const tells the reader that prime never changes once it is initialized.
The types of iter, i, prime and k are inferred, preventing inadvertent type coercion errors.
The number of elements in flags is given by flags.length, not some independent variable.

And the last item leads to a very important hidden advantage: accesses to the flags array are bounds checked. No more overflow errors! We didn’t have to do anything
in particular to get that, either.

This is only the beginning of how D as Better C can improve the expressivity, readability, and safety of your existing C programs. For example, D has nested functions, which in my experience work very well at prying goto’s from my cold, dead fingers.

On a more personal note, ever since -betterC started working, I’ve been converting many of my old C programs still in use into D, one function at a time. Doing it one function at a time, and running the test suite after each change, keeps the program in a correctly working state at all times. If the program doesn’t work, I only have one function to look at to see where it went wrong. I don’t particularly care to maintain C programs anymore, and with -betterC there’s no longer any reason to.

The Better C ability of D is available in the 2.076.0 beta: download it and read the changelog.

A DUB Case Study: Compiling DMD as a Library

In his day job, Jacob Carlborg is a Ruby backend developer for Derivco Sweden, but he’s been using D on his own time since 2006. He is the maintainer of numerous open source projects, including DStep, a utility that generates D bindings from C and Objective-C headers, DWT, a port of the Java GUI library SWT, and DVM, the topic of another post on this blog. He implemented native Thread Local Storage support for DMD on OS X and contributed, along with Michel Fortin, to the integration of Objective-C in D.

DUB is the official build tool and package manager for the D programming language. Originally written and currently maintained by Sönke Ludwig as part of the vibe.d web framework, its acceptance as an official part of the D toolchain means it is now shipping with the most recent DMD and LDC compilers.

A Quick Introduction to DUB

If you have have the latest DMD or LDC installed, you already have DUB installed as well. If not, or if you want to check for a more recent version, you can get the very latest release, beta or release candidate from the DUB download page.

You can create a new DUB project by executing the dub init command. This will start an interactive setup that guides you through project creation.

First decide the format of the package recipe. Two formats are supported: JSON and SDLang. Here we picked SDLang.
Then specify the name of the project. Press enter to use the default name, which is displayed in brackets and is inferred from the directory
Do the same for the description, author, license, copyright, and dependencies to select the default values

$ dub init foo
Package recipe format (sdl/json) [json]: sdl
Name [foo]:
Description [A minimal D application.]:
Author name [Jacob Carlborg]:
License [proprietary]:
Copyright string [Copyright © 2017, Jacob Carlborg]:
Add dependency (leave empty to skip) []:
Successfully created an empty project in '/Users/jacob/tmp/foo'.
Package successfully created in foo

After the setup has completed, the following files and directories will have been created:

$ tree foo
foo
├── dub.sdl
└── source
    └── app.d

1 directory, 2 files

dub.sdl is the package recipe file, which provides instructions telling DUB how to build the package
source is the default path where DUB looks for D source files
app.d contains the main function and is an example Hello World generated by DUB with the following content:

import std.stdio;

void main()
{
	writeln("Edit source/app.d to start your project.");
}

The content of the dub.sdl file is the following:

name "foo"
description "A minimal D application."
authors "Jacob Carlborg"
copyright "Copyright © 2017, Jacob Carlborg"
license "proprietary"

All of which was taken from what we specified during project creation. By default, DUB looks for D source files in either source or src directories and compiles all files it finds there and in any subdirectories.

To build and run the application, navigate to the project’s root directory, foo in this case, and invoke dub:

$ dub
Performing "debug" build using dmd for x86_64.
foo ~master: building configuration "application"...
Linking...
Running ./foo
Edit source/app.d to start your project.

To build without running, invoke dub build:

$ dub build
Performing "debug" build using dmd for x86_64.
foo ~master: building configuration "application"...
Linking...

Case Study: DMD as a Library

Recently there has been some progress in making the D compiler (DMD) available as a library. Razvan Nitu has been working on it as part of his D Foundation scholarship at the University Politechnica of Bucharest. He gave a presentation at DConf 2017 (a video of the talk is available, as well as examples in the DMD repository). So I had the idea that as part of the DConf 2017 hackathon I could create a simple DUB package for DMD to make only the lexer and the parser available as a library, something his work has made possible.

Currently DMD is built using make. There are three Makefiles, one for Posix, one for 32-bit Windows and one for 64-bit Windows (which is only a wrapper of the 32-bit one). I don’t intend to try to completely replicate the Makefiles as a DUB package (they contain some additional tasks besides building the compiler), but instead will start out fresh and only include what’s necessary to build the lexer and parser.

DMD already has all the source code in the src directory, which is one of the directories DUB searches by default. If we would leave it as is, DUB would include the entirety of DMD, including the backend and other parts we don’t want to include at this point.

The first step is to create the DUB package recipe file. We start simple with only the metadata (here using the SDLang format):

name "dmd"
description "The DMD compiler"
authors "Walter Bright"
copyright "Copyright © 1999-2017, Digital Mars"
license "BSL-1.0"

When we have this we need to figure out which files to include in the package. We can do this by invoking DMD with the -deps flag to generate the imports of a module. A good start is the lexer, which is located in src/ddmd/lexer.d. We run the following command to output the imports that lexer.d is using:

$ dmd -deps=deps.txt -o- -Isrc src/ddmd/lexer.d

This will write a file named deps.txt containing all the imports used by lexer.d. The -o- flag is used to tell the compiler not to generate any code. The -I flag is used to add an import path where the compiler will look for additional modules to import (but not compile). An example of the output looks like this (the long path names have been reduced to save space):

core.attribute (druntime/import/core/attribute.d) : private : object (druntime/import/object.d)
object (druntime/import/object.d) : public : core.attribute (druntime/import/core/attribute.d):selector
ddmd.lexer (ddmd/lexer.d) : private : object (druntime/import/object.d)
core.stdc.ctype (druntime/import/core/stdc/ctype.d) : private : object (druntime/import/object.d)
ddmd.root.array (ddmd/root/array.d) : private : object (druntime/import/object.d)
ddmd.root.array (ddmd/root/array.d) : private : core.stdc.string (druntime/import/core/stdc/string.d)

The most interesting part of this output, in this case, is the first column, which consists of a long list of module names. What we are interested in here is a unique list of modules that are located in the ddmd package. All modules in the core package are part of the D runtime and are already precompiled as a library and automatically linked when compiling a D executable, so these modules don’t need to be compiled. The modules from the ddmd package can be extracted with some search-and-replace in your favorite text editor or using some standard Unix command lines tools:

$ cat deps.txt | cut -d ' ' -f 1 | grep ddmd | sort | uniq
ddmd.console
ddmd.entity
ddmd.errors
ddmd.globals
ddmd.id
ddmd.identifier
ddmd.lexer
ddmd.root.array
ddmd.root.ctfloat
ddmd.root.file
ddmd.root.filename
ddmd.root.hash
ddmd.root.outbuffer
ddmd.root.port
ddmd.root.rmem
ddmd.root.rootobject
ddmd.root.stringtable
ddmd.tokens
ddmd.utf

Here we can see that a set of modules is located in the nested package ddmd.root. This package contains common functionality used throughout the DMD source code. Since it doesn’t have any dependencies on any code outside the package it’s a good fit to place in a DUB subpackage. This can be done using the subPackage directive, as follows:

subPackage {
  name "root"
  targetType "library"
  sourcePaths "src/ddmd/root"
}

We specify the name of the subpackage, root. The targetType directive is used to tell DUB whether it should build an executable or a library (though it’s optional — DUB will build an executable if it finds an app.d in the root of the source directory and a library if it doesn’t). Finally, sourcePaths can be used to specify the paths where DUB should look for the D source files if neither of the default directories is used. Fortunately, we want to include all the files in the src/ddmd/root, so using sourcePaths works perfectly fine.

We can verify that the subpackage works and builds by invoking:

$ dub build :root
Building package dmd:root in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:root ~master: building configuration "library"...

:package-name is shorthand that tells DUB to build the package-name subpackage of the current package, in our case the root subpackage.

After removing all the modules from the root package from the initial list of dependencies, the following modules remain:

ddmd.console
ddmd.entity
ddmd.errors
ddmd.globals
ddmd.id
ddmd.identifier
ddmd.lexer
ddmd.tokens
ddmd.utf

The next step is to create a subpackage for the lexer containing the remaning modules.

subPackage {
  name "lexer"
  targetType "library"
  sourcePaths

Again we start by specifying the name of the subpackage and that the target type is a library. Specifying sourcePaths without any value will set it to an empty list, i.e. no source paths. This is done because there are more files than we want to include in this subpackage in the source directory.

sourceFiles \
    "src/ddmd/console.d" \
    "src/ddmd/entity.d" \
    "src/ddmd/errors.d" \
    "src/ddmd/globals.d" \
    "src/ddmd/id.d" \
    "src/ddmd/identifier.d" \
    "src/ddmd/lexer.d" \
    "src/ddmd/tokens.d" \
    "src/ddmd/utf.d"

The above specifies all source files that should be included in this subpackage. The difference between sourcePaths and sourceFiles is that sourcePaths expects a whole directory of source files that should be included, where sourceFiles lists only the individual files that should be included. A list in SDLang is written by separating the items with a space. The backslash (\) is used for line continuation, making it possible spread the list across multiple lines.

The final step of the lexer subpackage is to add a dependency on the root subpackage. This is done with the dependency directive:

dependency "dmd:root" version="*"
}

The first parameter for the dependency directive is the name of another DUB package. The colon is used to separate the package name from the subpackage name. The version attribute is used to specify which version the package should depend on. The * is used to indicate that any version of the dependency matches, i.e. the latest version should always be used. When implementing subpackages in any given package, this is generally what should be used. External projects that depend on any DUB package should specify a SemVer version number corresponding to a known release version.

If we build the lexer subpackage now it will result in an error:

$ dub build :lexer
Building package dmd:lexer in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:lexer ~master: building configuration "library"...
src/ddmd/globals.d(339,21): Error: need -Jpath switch to import text file VERSION
dmd failed with exit code 1.

Looking at the file and line of the error shows that it contains the following code:

_version = (import("VERSION") ~ '\0').ptr;

This code contains an import expression. Import expressions differ from import statements (e.g. import std.stdio;) in that they take a file from the file system and insert its contents into the current module. It’s just as if you copied and pasted the contents yourself. Using an import expression requires that the path where the file is imported from be passed to the compiler as a security mechanism. This can be done using the -J flag. In this case, we want to use the package root, where we are executing DUB, so we can use a single dot: “.“. Passing arbitrary flags to the compiler can be done with the dflags build setting, as follows:

dflags "-J."

Add that to the lexer subpackage configuration and it will compile correctly:

$ dub build :lexer
Building package dmd:lexer in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:lexer ~master: building configuration "library"...

For the final subpackage, we have the parser. The parser is located in src/ddmd/parse.d. To get its dependencies we can use the same approach we used for the lexer. But we will filter out all files that are part of the other subpackages:

$ dmd -deps=deps.txt -Isrc -J. -o- src/ddmd/parse.d
$ cat deps.txt | cut -d ' ' -f 1 | grep ddmd | grep -E -v '(root|console|entity|errors|globals|id|identifier|lexer|tokens|utf)' | sort | uniq
ddmd.parse

Here, we’re supplying the -v flag to grep to filter the results and the -E flag to enable extended regular expressions. All modules from the root package and all modules from the lexer subpackage are filtered out and the only remaining module is the ddmd.parse module.

The subpackage for the parser will look similar to the other subpackages:

subPackage {
  name "parser"
  targetType "library"
  sourcePaths

  sourceFiles "src/ddmd/parse.d"

  dependency "dmd:lexer" version="*"
}

Again, we can verify that it’s working by building the subpackage:

$ dub build :parser
Building package dmd:parser in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:parser ~master: building configuration "library"...

Currently we have three subpackages in the DUB recipe file, but no way to use the main package as a whole. To fix this we add the parser subpackage as a dependency of the main package. We pick the parser subpackage as a dependency because it will include the other two subpackages through its own dependencies.

license "BSL-1.0"

targetType "none"
dependency ":parser" version="*"

subPackage {
  name "root"

In addition to specifying parser as a dependency, we also specify the target type to be none. This will avoid building an empty library out of the main package, since it doesn’t contain any source files of its own.

As a final step, we’ll verify that the whole library is working by creating a separate project that uses the DMD DUB package as a dependency. We create a new DUB project in the test directory, called dub_package:

$ cd test
$ mkdir dub_package
$ cd dub_package
$ cat > dub.sdl <<EOF
> name "dmd-dub-test"
> description "Test of the DMD Dub package"
> license "BSL 1.0"
>
> dependency "dmd" path="../../"
> EOF
$ mkdir source

We create a new file, source/app.d, with the following content:

void main()
{
}

// lexer
unittest
{
    import ddmd.lexer;
    import ddmd.tokens;

    immutable expected = [
        TOKvoid,
        TOKidentifier,
        TOKlparen,
        TOKrparen,
        TOKlcurly,
        TOKrcurly
    ];

    immutable sourceCode = "void test() {} // foobar";
    scope lexer = new Lexer("test", sourceCode.ptr, 0, sourceCode.length, 0, 0);
    lexer.nextToken;

    TOK[] result;

    do
    {
        result ~= lexer.token.value;
    } while (lexer.nextToken != TOKeof);

    assert(result == expected);
}

// parser
unittest
{
    import ddmd.astbase;
    import ddmd.parse;

    scope parser = new Parser!ASTBase(null, null, false);
    assert(parser !is null);
}

The above file contains two unit tests, one for the lexer and one for the parser. We can run dub test to run the unit tests for this package:

$ dub test
No source files found in configuration 'library'. Falling back to "dub -b unittest".
Performing "unittest" build using dmd for x86_64.
dmd:root ~issue-17392-dub: building configuration "library"...
dmd:lexer ~issue-17392-dub: building configuration "library"...
../../src/ddmd/globals.d(339,21): Error: file "VERSION" cannot be found or not in a path specified with -J
dmd failed with exit code 1.

Which gives us the error that it cannot find the VERSION file in any string import paths, even though we added the correct directory to the string import paths. If we run the tests with verbose output enabled, using the --verbose flag we get a hint (the output has been reduced to save space):

dmd:lexer ~issue-17392-dub: building configuration "library"...
dmd -J. -lib

Here we see that the compiler is invoked with the -J. flag, which is what we previously specified in the lexer subpackage. The problem is that the current directory is now of the dmd-dub-test DUB package instead of the dmd DUB package. Looking at the documentation of DUB we can see there’s an environment variable, $PACKAGE_DIR, that we can use as the string import path instead of hardcoding it to use a single dot. We update the dflags setting of the lexer subpackage to use the $PACKAGE_DIR environment variable:

dflags "-J$PACKAGE_DIR"
}

Running the tests again shows that the error is fixed, but now we get a new error, a long list of undefined symbols (shortened here):

$ dub test
No source files found in configuration 'library'. Falling back to "dub -b unittest".
Performing "unittest" build using dmd for x86_64.
dmd:root ~issue-17392-dub: building configuration "library"...
dmd:lexer ~issue-17392-dub: building configuration "library"...
dmd:parser ~issue-17392-dub: building configuration "library"...
dmd-dub-test ~master: building configuration "application"...
Linking...
Undefined symbols for architecture x86_64:
  "_D4ddmd7astbase12__ModuleInfoZ", referenced from:
      _D3app12__ModuleInfoZ in dmd-dub-test.o

The reason for this is that we’re importing the ddmd.astbase module in the test of the parser, but it’s never compiled. We can solve that problem by adding it to the parser subpackage in the dmd DUB package. Running dmd again to show all its dependencies shows that it also depends on the ddmd.astbasevisitor module. We add these two modules as follows:

sourceFiles \
  "src/ddmd/astbase.d" \
  "src/ddmd/astbasevisitor.d" \
  "src/ddmd/parse.d"

Finally, running the tests again shows that everything is working correctly:

$ dub test
No source files found in configuration 'library'. Falling back to "dub -b unittest".
Performing "unittest" build using dmd for x86_64.
dmd:root ~issue-17392-dub: building configuration "library"...
dmd:lexer ~issue-17392-dub: building configuration "library"...
dmd:parser ~issue-17392-dub: building configuration "library"...
dmd-dub-test ~master: building configuration "application"...
Linking...
Running ./dmd-dub-test

After verifying that both the lexer and parser are working in a separate DUB package, this is the final result of the package recipe for the dmd DUB package:

name "dmd"
description "The DMD compiler"
authors "Walter Bright"
copyright "Copyright © 1999-2017, Digital Mars"
license "BSL-1.0"

targetType "none"
dependency ":parser" version="*"

subPackage {
  name "root"
  targetType "library"
  sourcePaths "src/ddmd/root"
}

subPackage {
  name "lexer"
  targetType "library"
  sourcePaths

  sourceFiles \
    "src/ddmd/console.d" \
    "src/ddmd/entity.d" \
    "src/ddmd/errors.d" \
    "src/ddmd/globals.d" \
    "src/ddmd/id.d" \
    "src/ddmd/identifier.d" \
    "src/ddmd/lexer.d" \
    "src/ddmd/tokens.d" \
    "src/ddmd/utf.d"

  dflags "-J$PACKAGE_DIR"

  dependency "dmd:root" version="*"
}

subPackage {
  name "parser"
  targetType "library"
  sourcePaths

  sourceFiles \
    "src/ddmd/astbase.d" \
    "src/ddmd/astbasevisitor.d" \
    "src/ddmd/parse.d"

  dependency "dmd:lexer" version="*"
}

All this has now been merged into master and the DUB package is available here: http://code.dlang.org/packages/dmd. Happy hacking!

New D Compiler Release: DMD 2.075.0

DMD 2.075.0 was released a few days back. As with every release, the changelog is available so you can browse the list of fixed bugs and new features. 2.075.0 can be fetched from the dlang.org download page, which always makes available the latest DMD release alongside a nightly build.

Notable Changes

Every DMD release brings with it a number of bug fixes, changes, and enhancements. Here are some of the more noteworthy changes in this release.

Two array properties removed

Anyone who does a lot of work with D’s ranges will likely have encountered this little annoyance that arises from the built-in .sort property of arrays.

void main()
{
    import std.algorithm : remove, sort;
    import std.array : array;
    int[] nums = [5, 3, 1, 2, 4];
    nums = nums.sort.remove(2).array;
}

The .sort property has been deprecated for ages, so the above would result in the following error:

sorted.d(6): Deprecation: use std.algorithm.sort instead of .sort property

The workaround would be to add an empty set of parentheses to the sort call. With DMD 2.075.0, this is no longer necessary and the above will compile. Both the .sort and .reverse array properties have finally been removed from the language.

For the uninitiated, D has two features that have proven convenient in the functional pipeline programming style typically used with ranges. One is that parentheses on a function call are optional when there are no parameters. The other is Universal Function Call Syntax (UFCS), which allows a function call to be made using the dot notation on the first argument, so that a function int add(int a, int b) can be called as: 10.add(5).

Each of D’s built-in types comes with a set of built-in properties. Given that the built-in properties are not functions, no parentheses are used to access them. The .sort array property has been around since the early days of D1. At the time, it was rather useful and convenient for anyone who was happy with the default implementation. When D2 came along with the range paradigm, the standard library was given a set of functions that can treat arrays as ranges, opening them up to use with the many range based functions in the std.algorithm package and elsewhere.

With optional parentheses, UFCS, and a range-based function in std.algorithm called sort, conflict was inevitable. Now range-based programmers can put that behind them and take one more pair of parentheses out of their pipelines.

The breaking up of std.datetime

The std.datetime module has had a reputation as the largest module in D’s standard library. Some developers have been known to use it a stress test for their tooling. It was added to the library long before D got the special package module feature, which allows multiple modules in a package to be imported as a single module.

Once package modules were added, Jonathan M. Davis, the original std.datetime developer, found it challenging to split the monolith into multiple modules. Then, at DConf 2017, he could be seen toiling away on his laptop in the conference hall and the hotel lobby. On the final day of the conference, the day of the DConf Hackathon, he announced that std.datetime was now a package. DMD 2.075.0 is the first release where the new module structure is available.

Any existing code using the old module should still compile. However, any static libraries or object files lying around with the old symbols stuffed inside may need to be recompiled.

Colorized compiler messages

This one is missing from the changelog. DMD now has the ability to output colorized messages. The implementation required going through the existing error messages and properly annotating them where appropriate, so there may well be some messages for which the colors are missing. Also, given that this is a brand new feature and people can be picky about their terminal colors, more work will likely be done on this in the future. Perhaps that might include support for customization.

Compiler Ddoc documentation online

DMD, though originally written in C++, was converted to D some time ago. Now that more D programmers are able to contribute to the compiler, work has gone into documenting its source using D’s built-in Ddoc syntax. The result is now online, accessible from the sidebar of the existing library reference. A good starting point is the ddmd.mars module.

And more…

The above is a small part of the bigger picture. The bugfix list shows 89 bugs, regressions, and enhancements across the compiler, runtime, standard library, and web site. See the full changelog for the details.

Thanks to everyone who contributed to this release, whether it was by reporting issues, submitting or reviewing pull requests, testing out the beta, or carrying out any of the numerous small tasks that help a new release see the light of day.

DCompute: GPGPU with Native D for OpenCL and CUDA

Nicholas Wilson is a student at Murdoch University, studying for his BEng (Hons)/BSc in Industrial Computer Systems (Hons) and Instrumentation & Control/ Molecular Biology & Genetics and Biomedical Science. He just finished his thesis on low-cost defect detection of solar cells by electroluminescence imaging, which gives him time to work on DCompute and write about it for the D Blog. He plays the piano, ice skates, and has spent 7 years putting D to use on number bashing, automation, and anything else that he could make a computer do for him.

DCompute is a framework and compiler extension to support writing native kernels for OpenCL and CUDA in D to utilise GPUs and other accelerators for computationally intensive code. In development are drivers to automate the interactions between user code and the tedious and error prone compute APIs with the goal of enabling the rapid development of high performance D libraries and applications.

Introduction

After watching John Colvin’s DConf 2016 presentation in May of last year on using D’s metaprogramming to make the OpenCL API marginally less horrible to use, I thought, “This would be so much easier to do if we were able to write kernels in D, rather than doing string manipulations in OpenCL C”. At the time, I was coming up to the end of a rather busy semester and thought that would make a good winter^[1] project. After all, LDC, the LLVM D Compiler, has access to LLVM’s SPIR-V and PTX backends, and I thought, “It can’t be too hard, its only glue code”. I slightly underestimated the time it would take, finishing the first stage of DCompute (because naming things is hard), mainlining the changes I made to LDC at the end of February, eight months later — just in time for the close of submissions to DConf, where I gave a talk on the progress I had made.

Apart from familiarising myself with the LDC and DMD front-end codebases, I also had to understand the LLVM SPIR-V and PTX backends that I was trying to target, because they require the use of special metadata (for e.g. denoting a function is a kernel) and address spaces, used to represent __global & friends in OpenCL C and __global__ & friends in CUDA, and introduce these concepts into LDC.

But once I was familiar with the code and had sorted the above discrepancies, it was mostly smooth sailing translating the OpenCL and CUDA modifiers into compiler-recognised attributes and wrapping the intrinsics into an easy to use and consistent interface.

When it was all working and almost ready to merge into mainline LDC, I hit a bit of a snag with regards to CI: the SPIR-V backend that was being developed by Khronos was based on the quite old LLVM 3.6.1 and, despite my suggestions, did not have any releases. So I forward ported the backend and the conversion utility to the master branch of LLVM and made a release myself. Still in progress on this front are converting magic intrinsics to proper LLVM intrinsics and transitioning to a TableGen-driven approach for the backend in preparation for merging the backend into LLVM Trunk. This should hopefully be done soon™.

Current state of DCompute

With the current state of DCompute we are able to write kernels natively in D and have access to most of its language-defining features like templates & static introspection, UFCS, scope guards, ranges & algorithms and CTFE. Notably missing, for hardware and performance reasons, are those features commonly excluded in kernel languages, like function pointers, virtual functions, dynamic recursion, RTTI, exceptions and the use of the garbage collector. Note that unlike OpenCL C++ we allow kernel functions to be templated and have overloads and default values. Still in development is support for images and pipes.

Example code

To write kernels in D, we need to pass -mdcompute-targets=<targets> to LDC, where <targets> is a comma-separated list of the desired targets to build for, e.g. ocl-120,cuda-350 for OpenCL 1.2 and CUDA compute capability 3.5, respectively (yes, we can do them all at once!). We get one file for each target, e.g. kernels_ocl120_64.spv, when built in 64-bit mode, which contains all of the code for that device.

The vector add kernel in D is:

@compute(CompileFor.deviceOnly) module example;
import ldc.dcompute;
import dcompute.std.index;

alias gf = GlobalPointer!float;

@kernel void vadd(gf a, gf b, gf c) 
{
	auto x = GlobalIndex.x;
	a[x] = b[x]+c[x];
}

Modules marked with the @compute attribute are compiled for each of the command line targets, @kernel makes a function a kernel, and GlobalPointer is the equivalent of the __global qualifier in OpenCL.

Kernels are not restricted to just functions — lambdas & tamplates also work:

@kernel void map(alias F)(KernelArgs!F args)
{
    F(args);
}
//In host code
AutoBuffer!float x,y,z; // y & z initialised with data
q.enqueue!(map!((a,b,c) => a=b+c))(x.length)(x, y, z);

Where KernelArgs translates host types to device types (e.g. buffers to pointers or, as in this example, AutoBuffers to AutoIndexed Pointers) so that we encapsulate the differences in the host and device types.

The last line is the expected syntax for launching kernels, q.enqueue!kernel(dimensions)(args), akin to CUDA’s kernel<<<dimensions,queue>>>(args). The libraries for launching kernels are in development.

Unlike CUDA, where all the magic for transforming the above expression into code on the host lies in the compiler, q.enqueue!func(sizes)(args) will be processed by static introspection of the driver library of DCompute.
The sole reason we can do this in D is that we are able to query the mangled name the compiler will give to a symbol via the symbol’s .mangleof property. This, in combination with D’s easy to use and powerful templates, means we can significantly reduce the mental overhead associated with using the compute APIs. Also, implementing this in the library will be much simpler, and therefore faster to implement, than putting the same behaviour in the compiler. While this may not seem much for CUDA users, this will be a breath of fresh air to OpenCL users (just look at the OpenCL vector add host code example steps 7-11).

While you cant do that just yet in DCompute, development should start to progress quickly and hopefully become a reality soon.

I would like to thank John Colvin for the initial inspiration, Mike Parker for editing, and the LDC folks, David Nadlinger, Kai Nacke, Martin Kinke, with a special thanks to Johan Engelen, for their help with understanding the LDC codebase and reviewing my work.

If you would like to help develop DCompute (or be kept in the loop), feel free to drop a line at the libmir Gitter. Similarly, any efforts preparing the SPIR-V backend for inclusion into LLVM are also greatly appreciated.

[1] Southern hemisphere.

Go Your Own Way (Part One: The Stack)

This is my third post in the GC series. In the first post, I introduced D’s garbage collector and the language features that require it, and touched on simple strategies to use it effectively. In the second post, I showed off the tools provided by the language and library to disable or prohibit the GC in specific parts of a code base, how to use the compiler to assist in that endeavor, and recommended that D programs be written initially to embrace the GC, taking advantage of simple strategies to mitigate its impact, and later tuned to avoid it or further optimize its usage only when profiling shows it’s warranted.

When garbage collection is turned off via GC.disable or prevented by the @nogc function annotation, memory will still need to be allocated from somewhere. And even when the GC is fully embraced, it’s still desirable to minimize the size and quantity of GC heap allocations. That means allocating either via the stack, or via the non-GC heap. The focus of this post is the former. Non-GC heap allocations will be covered in my next post in this series.

Stack allocation

The simplest allocation strategy in D is the same as it is in C: avoid the heap and use the stack whenever possible. When a local array is needed and the size can be known at compile time, use a static rather than a dynamic array. Structs, which are value types and stack-allocated by default, should be preferred where possible over classes, which are reference types and are usually allocated from one heap or another. D’s compile-time features can present opportunities here that might not otherwise be available.

Static arrays

Static array declarations in D require the length to be known at compile-time.

// OK
int[10] nums;

// Error: variable x cannot be read at compile time
int x = 10;
int[x] err;

Unlike dynamic arrays, static arrays can be initialized with array literals with no allocation taking place on the GC heap. The lengths must match, otherwise the compiler will emit an error.

@nogc void main() {
    int[3] nums = [1, 2, 3];
}

Static arrays are automatically sliced when passed to any function taking a slice as a parameter, making them interchangeable with dynamic arrays.

void printNums(int[] nums) {
    import std.stdio : writeln;
    writeln(nums);
}

void main() {
    int[]  dnums = [0, 1, 2];
    int[3] snums = [0, 1, 2];
    printNums(dnums);
    printNums(snums);
}

When compiling with -vgc to find the potential GC allocations in a program and eliminate them where possible, this is an easy win. Just be wary of situations like the following:

int[] foo() {
    auto nums = [0, 1, 2];

    // Do work with nums...

    return nums;
}

Converting nums in this example to a static array would be a mistake. The return statement in that case would be returning a slice to stack-allocated memory, which is a programming error. Luckily, doing so will generate a compiler error.

On the other hand, if the return is conditional, it may be desirable to heap-allocate the array only when absolutely necessary rather than every time the function is called. In that scenario, a static array can be declared locally and a dynamic copy made on return. Enter the .dup property:

int[] foo() {
    int[3] nums = [0, 1, 2];
    
    // Let x = the result of some work with nums
    bool condtion = x;

    if(condition) return nums.dup;
    else return [];
}

This function still uses the GC via .dup, but only allocates if it needs to and avoids allocation when it doesn’t. Note that [] is equivalent to null in this case, a slice (or dynamic array) with a .length of 0 and a .ptr of null.

Structs vs. classes

Struct instances in D are allocated on the stack by default, but can be allocated on the heap when desired. Stack-allocated structs have deterministic destruction, with their destructors called as soon as the enclosing scope exits.

struct Foo {
    int x;
    ~this() {
        import std.stdio;
        writefln("#%s says bye!", x);
    }
}
void main() {
    Foo f1 = Foo(1);
    Foo f2 = Foo(2);
    Foo f3 = Foo(3);
}

As expected, this prints:

#3 says bye!
#2 says bye!
#1 says bye!

Classes, being reference types, are almost always allocated on the heap. Usually, that’s the GC heap via new, though it could also be the non-GC heap through a custom allocator. But there’s no rule saying they can’t be allocated on the stack. The standard library template std.typecons.scoped allows us to easily do so.

class Foo {
    int x;

    this(int x) { 
        this.x = x; 
    }
    
    ~this() {
        import std.stdio;
        writefln("#%s says bye!", x);
    }
}
void main() {
    import std.typecons : scoped;
    auto f1 = scoped!Foo(1);
    auto f2 = scoped!Foo(2);
    auto f3 = scoped!Foo(3);
}

Functionally, this is identical to the struct example above; it prints the same results. Deterministic destruction is achieved via the core.object.destroy function, which allows destructors to be called outside of GC collections.

Note that neither scoped nor destroy are currently usable in @nogc functions. This isn’t necessarily a problem, as a function doesn’t have to be annotated such to avoid the GC, but it can be a headache if you are trying to fit everything into a @nogc call tree. In future posts, we’ll look at some of the design issues that crop up when using @nogc and how to avoid them.

Generally, when implementing custom types in D, the choice between struct and class should be dependent on the type’s intended usage. POD types are obvious candidates for struct, whereas for types in something like a GUI system, where inheritance heirarchies and runtime interfaces are extremely useful, class is a more appropriate choice. Beyond those obvious cases, there are a number of other considerations which could be the focus of a separate blog post on the topic. For our purposes, just keep in mind that whether or not a type is implemented as a struct or a class need not always dictate whether or not instances can be allocated on the stack.

alloca

Given that D makes alloca available, it is also an option for stack allocation. This is a candidate especially for arrays when you want to avoid or eliminate a local GC allocation, but the array size is only known at run time. The following example allocates a dynamic array with a runtime size on the stack.

import core.stdc.stdlib : alloca;

void main() {
    size_t size = 10;
    void* mem = alloca(size);

    // Slice the memory block
    int[] arr = cast(int[])mem[0 .. size];
}

The same caution about using alloca in C applies here: be careful not to blow up the stack. And as with local static arrays, don’t return a slice of arr. Return arr.dup instead.

A simple example

Consider an implementation of a Queue data type. An idiomatic implementation in D is going to be a struct that’s templated on the type it’s intended to contain. In Java, collection usage is interface heavy and it’s recommended to declare an instance using an interface type rather than the implementation type. Structs in D can’t implement interfaces, but in many cases they can still be used to program to interfaces thanks to Design by Introspection (DbI). This allows programming to a common interface that is verified via compile-time introspection without the need for an interface type, so it can work with structs, classes and, thanks to Universal Function Call Syntax (UFCS), even free functions (when the functions are in scope).

D’s arrays are an obvious choice as the backing store for a Queue implementation. Moreover, there’s an opportunity to make the backing store a static array when a queue is intended to be bounded with a fixed size. Since it’s already a templated type, an additional parameter, a template value parameter with a default value can easily be added to decide at compile time if the array should be static or not and, if so, how much space it should require.

// A default Size of 0 means to use a dynamic array for the
// backing store; non-zero indicates a static array.
struct Queue(T, size_t Size = 0) 
{
    // This constant will be inferred as a boolean. By making it
    // public, a DbI template outside of this module can determine
    // whether or not the Queue might grow. 
    enum isFixedSize = Size > 0;

    void enqueue(T item) 
    {
        static if(isFixedSize) {
            assert(_itemCount < _items.length);
        }
        else {
            ensureCapacity();
        }
        push(item);
    }

    T dequeue() {
        assert(_itemCount != 0);
        static if(isFixedSize) {
            return pop();
        }
        else {
            auto ret = pop();
            ensurePacked();
            return ret;
        }
    }

    // Only available on a growable array
    static if(!isFixedSize) {
        void reserve(size_t capacity) { /* Allocate space for new items */ }
    }

private:   
    static if(isFixedSize) {
        T[Size] _items;     
    }
    else T[] _items;
    size_t _head, _tail;
    size_t _itemCount;

    void push(T item) { 
        /* Add item, update _head and _tail */
        static if(isFixedSize) { ... }
        else { ... }
    }

    T pop() { 
        /* Remove item, update _head and _tail */ 
        static if(isFixedSize) { ... }
        else { ... }
    }

    // These are only available on a growable array
    static if(!isFixedSize) {
        void ensureCapacity() { /* Alloc memory if needed */ }
        void ensurePacked() { /* Shrink the array if needed */}
    }
}

With this, the client can declare instances like so:

Queue!Foo qUnbounded;
Queue!(Foo, 128) qBounded;

qBounded requires no heap allocations. What happens with qUnbounded depends on the implementation. Moreover, compile-time introspection can be used to test if an instance is a fixed size or not. The isFixedSize constant is a convenience for that. Clients could alternatively use the built-in __traits(hasMember, T, "reserve") or the standard library function std.traits.hasMember!T("reserve") in one compile-time construct or another (__traits and std.traits are great for DbI, and the latter should be preferred when it provides similar functionality), but including the constant in the type is more convenient.

void doSomethingWithQueueInterface(T)(T queue)
{
    static if(T.isFixedSize) { ... }
    else { ... }
}

Conclusion

This has been a brief overview of a few options for stack allocation in D to avoid allocations from the GC heap. Making use of them when possible is an easy way to minimize the size and quantity of GC allocations, a proactive strategy for mitigating potential negative performance impacts from garbage collection.

The next post in this series will cover some of the options available for non-GC heap allocations.

Life in the Fast Lane

The first post I wrote in the GC series introduced the D garbage collector and the language features that use it. Two key points that I tried to get across in the article were:

The GC can only run when memory allocations are requested. Contrary to popular misconception, the D GC isn’t generally going to decide to pause your Minecraft clone in the middle of the hot path. It will only run when memory from the GC heap is requested, and then only if it needs to.
Simple C and C++ allocation strategies can mitigate GC pressure. Don’t allocate memory in inner loops – preallocate as much as possible, or fetch it from the stack instead. Minimize the total number of heap allocations. These strategies work because of point #1 above. The programmer can dictate when it is possible for a collection to occur simply by being smart about when GC heap allocations are made.

The strategies in point #2 are fine for code that a programmer writes herself, but they aren’t going to help at all with third-party libraries. For those situations, D provides built-in mechanisms to guarantee that no GC allocations can occur, both in the language and the runtime. There are also command-line options that can help make sure the GC stays out of the way.

Let’s imagine a hypothetical programmer named J.P. who, for reasons he considers valid, has decided he would like to avoid garbage collection completely in his D program. He has two immediate options.

The GC chill pill

One option is to make a call to GC.disable when the program is starting up. This doesn’t stop allocations, but puts a hold on collections. That means all collections, including any that may result from allocations in other threads.

void main() {
    import core.memory;
    import std.stdio;
    GC.disable;
    writeln("Goodbye, GC!");
}

Output:

Goodbye, GC!

This has the benefit that all language features making use of the GC heap will still work as expected. But, considering that allocations are still going without any cleanup, when you do the math you’ll realize this might be problematic. If allocations start to get out of hand, something’s gotta give. From the documentation:

Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition.

Depending on J.P.’s perspective, this might not be a good thing. But if this constraint is acceptable, there are some additional steps that can help keep things under control. J.P. can make calls to GC.enable or GC.collect as necessary. This provides greater control over collection cycles than the simple C and C++ allocation strategies.

The GC wall

When the GC is simply intolerable, J.P. can turn to the @nogc attribute. Slap it at the front of the main function and thou shalt suffer no collections.

@nogc
void main() { ... }

This is the ultimate GC mitigation strategy. @nogc applied to main will guarantee that the garbage collector will never run anywhere further along the callstack. No more caveats about collecting “where the implementation deems necessary”.

At first blush, this may appear to be a much better option than GC.disable. Let’s try it out.

@nogc
void main() {
    import std.stdio;
    writeln("GC be gone!");
}

This time, we aren’t going to get past compilation:

Error: @nogc function 'D main' cannot call non-@nogc function 'std.stdio.writeln!string.writeln'

What makes @nogc tick is the compiler’s ability to enforce it. It’s a very blunt approach. If a function is annotated with @nogc, then any function called from inside it must also be annotated with @nogc. As may be obvious, writeln is not.

That’s not all:

@nogc 
void main() {
    auto ints = new int[](100);
}

The compiler isn’t going to let you get away with that one either.

Error: cannot use 'new' in @nogc function 'D main'

Any language feature that allocates from the GC heap is out of reach inside a function marked @nogc (refer to the first post in this series for an overview of those features). It’s turtles all the way down. The big benefit here is that it guarantees that third-party code can’t use those features either, so can’t be allocating GC memory behind your back. Another downside is that any third-party library that is not @nogc aware is not going to be available in your program.

Using this approach requires a number of workarounds to make up for non-@nogc language features and library functions, including several in the standard library. Some are trivial, some are not, and others can’t be worked around at all (we’ll dive into the details in a future post). One example that might not be obvious is throwing an exception. The idiomatic way is:

throw new Exception("Blah");

Because of the new in that line, this isn’t possible in @nogc functions. Getting around this requires preallocating any exceptions that will be thrown, which in turn runs into the issue that any exception memory allocated from the regular heap still needs to be deallocated, which leads to ideas of reference counting or stack allocation… In other words, it’s a big can of worms. There’s currently a D Improvement Proposal from Walter Bright intended to stuff all the worms back into the can by making throw new Exception work without the GC when it needs to.

It’s not an insurmountable task to get around the limitations of @nogc main, it just requires a good bit of motivation and dedication.

One more thing to note about @nogc main is that it doesn’t banish the GC from the program completely. D has support for static constructors and destructors. The former are executed by the runtime before entering main and the latter upon exiting. If any of these exist in the program and are not annotated with @nogc, then GC allocations and collections can technically be present in the program. Still, @nogc applied to main means there won’t be any collections running once main is entered, so it’s effectively the same as having no GC at all.

Working it out

Here’s where I’m going to offer an opinion. There’s a wide range of programs that can be written in D without disabling or cutting the GC off completely. The simple strategies of minimizing GC allocations and keeping them out of the hot path will get a lot of mileage and should be preferred. It can’t be repeated enough given how often it’s misunderstood: D’s GC will only have a chance to run when the programmer allocates GC memory and it will only run if it needs to. Use that knowledge to your advantage by keeping the allocations small, infrequent, and isolated outside your inner loops.

For those programs where more control is actually needed, it probably isn’t going to be necessary to avoid the GC entirely. Judicious use of @nogc and/or the core.memory.GC API can often serve to avoid any performance issues that may arise. Don’t put @nogc on main, put it on the functions where you really want to disallow GC allocations. Don’t call GC.disable at the beginning of the program. Call it instead before entering a critical path, then call GC.enable when leaving that path. Force collections at strategic points, such as between game levels, with GC.collect.

As with any performance tuning strategy in software development, it pays to understand as fully as possible what’s actually happening under the hood. Adding calls to the core.memory.GC API in places where you think they make sense could potentially make the GC do needless work, or have no impact at all. Better understanding can be achieved with a little help from the toolchain.

The DRuntime GC option --DRT-gcopt=profile:1 can be passed to a compiled program (not to the compiler!) for some tune-up assistance. This will report some useful GC profiling data, such as the total number of collections and the total collection time.

To demonstrate, gcstat.d appends twenty values to a dynamic array of integers.

void main() {
    import std.stdio;
    int[] ints;
    foreach(i; 0 .. 20) {
        ints ~= i;
    }
    writeln(ints);
}

Compiling and running with the GC profile switch:

dmd gcstat.d
gcstat --DRT-gcopt=profile:1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
        Number of collections:  1
        Total GC prep time:  0 milliseconds
        Total mark time:  0 milliseconds
        Total sweep time:  0 milliseconds
        Total page recovery time:  0 milliseconds
        Max Pause Time:  0 milliseconds
        Grand total GC time:  0 milliseconds
GC summary:    1 MB,    1 GC    0 ms, Pauses    0 ms <    0 ms

This reports one collection, which almost certainly happened as the program was shutting down. The runtime terminates the GC as it exits which, in the current implementation, will generally trigger a collection. This is done primarily to run destructors on collected objects, even though D does not require destructors of GC-allocated objects to ever be run (a topic for a future post).

DMD supports a command-line option, -vgc, that will display every GC allocation in a program, including those that are hidden behind language features like the array append operator.

To demonstrate, take a look at inner.d:

void printInts(int[] delegate() dg)
{
    import std.stdio;
    foreach(i; dg()) writeln(i);
} 

void main() {
    int[] ints;
    auto makeInts() {
        foreach(i; 0 .. 20) {
            ints ~= i;
        }
        return ints;
    }

    printInts(&makeInts);
}

Here, makeInts is an inner function. A pointer to a non-static inner function is not a function pointer, but a delegate (a context pointer/function pointer pair; if an inner function is static, a pointer of type function is produced instead). In this particular case, the delegate makes use of a variable in its parent scope. Here’s the output of compiling with -vgc:

dmd -vgc inner.d
inner.d(11): vgc: operator ~= may cause GC allocation
inner.d(7): vgc: using closure causes GC allocation

What we’re seeing here is that memory needs to be allocated so that the delegate can carry the state of ints, making it a closure (which is not itself a type – the type is still delegate). Move the declaration of ints inside the scope of makeInts and recompile. You’ll find that the closure allocation goes away. A better option is to change the declaration of printInts to look like this:

void printInts(scope int[] delegate() dg)

Adding scope to any function parameter ensures that any references in the parameter cannot be escaped. In other words, it now becomes impossible to do something like assign dg to a global variable, or return it from the function. The effect is that there is no longer a need to create a closure, so there will be no allocation. See the documentation for more on function pointers, delegates and closures, and function parameter storage classes.

The gist

Given that the D GC is very different from those in languages like Java and C#, it’s certain to have different performance characteristics. Moreover, D programs tend to produce far less garbage than those written in a language like Java, where almost everything is a reference type. It helps to understand this when embarking on a D project for the first time. The strategies an experienced Java programmer uses to mitigate the impact of collections aren’t likley to apply here.

While there is certainly a class of software in which no GC pauses are ever acceptable, that is an arguably small set. Most D projects can, and should, start out with the simple mitigation strategies from point #2 at the top of this article, then adapt the code to use @nogc or core.memory.GC as and when performance dictates. The command-line options demonstrated here can help ferret out the areas where that may be necessary.

As time goes by, it’s going to become easier to micromanage garbage collection in D programs. There’s a concerted effort underway to make Phobos, D’s standard library, as @nogc-friendly as possible. Language improvements such as Walter’s proposal to modify how exceptions are allocated should speed that work considerably.

Future posts in this series will look at how to allocate memory outside of the GC heap and use it alongside GC allocations in the same program, how to compensate for disabled language features in @nogc code, strategies for handling the interaction of the GC with object destructors, and more.

Thanks to Vladimir Panteleev, Guillaume Piolat, and Steven Schveighoffer for their valuable feedback on drafts of this article.

The article has been amended to remove a misleading line about Java and C#, and to add some information about multiple threads.

Compile-Time Sort in D

Björn Fahller recently wrote a blog post showing how to implement a compile-time quicksort in C++17. It’s a skillful demonstration that employs the evolving C++ feature set to write code that, while not quite concise, is more streamlined than previous iterations. He concludes with, “…the usefulness of this is very limited, but it is kind of cool, isn’t it?”

There’s quite a bit of usefulness to be found in evaluating code during compilation. The coolness (of which there is much) arises from the possibilities that come along with it. Starting from Björn’s example, this post sets out to teach a few interesting aspects of compile-time evaluation in the D programming language.

The article came to my attention from Russel Winder’s provocative query in the D forums, “Surely D can do better”, which was quickly resolved with a “No Story”-style answer by Andrei Alexandrescu. “There is nothing to do really. Just use standard library sort,” he quipped, and followed with code:

Example 1

void main() {
    import std.algorithm, std.stdio;
    enum a = [ 3, 1, 2, 4, 0 ];
    static b = sort(a);
    writeln(b); // [0, 1, 2, 3, 4]
}

Though it probably won’t be obvious to those unfamiliar with D, the call to sort really is happening at compile time. Let’s see why.

Compile-time code is runtime code

It’s true. There are no hurdles to jump over to get things running at compile time in D. Any compile-time function is also a runtime function and can be executed in either context. However, not all runtime functions qualify for CTFE (Compile-Time Function Evaluation).

The fundamental requirements for CTFE eligibility are that a function must be portable, free of side effects, contain no inline assembly, and the source code must be available. Beyond that, the only thing deciding whether a function is evaluated during compilation vs. at run time is the context in which it’s called.

The CTFE Documentation includes the following statement:

In order to be executed at compile time, the function must appear in a context where it must be so executed…

It then lists a few examples of where that is true. What it boils down to is this: if a function can be executed in a compile-time context where it must be, then it will be. When it can’t be excecuted (it doesn’t meet the CTFE requirements, for example), the compiler will emit an error.

Breaking down the compile-time sort

Take a look once more at Example 1.

void main() {
    import std.algorithm, std.stdio;
    enum a = [ 3, 1, 2, 4, 0 ];
    static b = sort(a);
    writeln(b);
}

The points of interest that enable the CTFE magic here are lines 3 and 4.

The enum in line 3 is a manifest constant. It differs from other constants in D (those marked immutable or const) in that it exists only at compile time. Any attempt to take its address is an error. If it’s never used, then its value will never appear in the code.

When an enum is used, the compiler essentially pastes its value in place of the symbol name.

enum xinit = 10;
int x = xinit;

immutable yinit = 11;
int y = yinit;

Here, x is initialized to the literal 10. It’s identical to writing int x = 10. The constant yinit is initialized with an int literal, but y is initialized with the value of yinit, which, though known at compile time, is not a literal itself. yinit will exist at run time, but xinit will not.

In Example 1, the static variable b is initialized with the manifest constant a. In the CTFE documentation, this is listed as an example scenario in which a function must be evaluated during compilation. A static variable declared in a function can only be initialized with a compile-time value. Trying to compile this:

Example 2

void main() {
    int x = 10;
    static y = x;
}

Will result in this:

Error: variable x cannot be read at compile time

Using a function call to initialize a static variable means the function must be executed at compile time and, therefore, it will be if it qualifies.

Those two pieces of the puzzle, the manifest constant and the static initializer, explain why the call to sort in Example 1 happens at compile time without any metaprogramming contortions. In fact, the example could be made one line shorter:

Example 3

void main() {
    import std.algorithm, std.stdio;
    static b = sort([ 3, 1, 2, 4, 0 ]);
    writeln(b);
}

And if there’s no need for b to stick around at run time, it could be made an enum instead of a static variable:

Example 4

void main() {
    import std.algorithm, std.stdio;
    enum b = sort([ 3, 1, 2, 4, 0 ]);
    writeln(b);
}

In both cases, the call to sort will happen at compile time, but they handle the result differently. Consider that, due to the nature of enums, the change will produce an equivalent of this: writeln([ 0, 1, 2, 3, 4 ]). Because the call to writeln happens at run time, the array literal might trigger a GC allocation (though it could be, and sometimes will be, optimized away). With the static initializer, there is no runtime allocation, as the result of the function call is used at compile time to initialize the variable.

It’s worth noting that sort isn’t directly returning a value of type int[]. Take a peek at the documentation and you’ll discover that what it’s giving back is a SortedRange. Specifically in our usage, it’s a SortedRange!(int[], "a < b"). This type, like arrays in D, exposes all of the primitives of a random-access range, but additionally provides functions that only work on sorted ranges and can take advantage of their ordering (e.g. trisect). The array is still in there, but wrapped in an enhanced API.

To CTFE or not to CTFE

I mentioned above that all compile-time functions are also runtime functions. Sometimes, it's useful to distinguish between the two inside the function itself. D allows you to do that with the __ctfe variable. Here's an example from my book, 'Learning D'.

Example 5

string genDebugMsg(string msg) {
    if(__ctfe)
        return "CTFE_" ~ msg;
    else
        return "DBG_" ~ msg;
}

pragma(msg, genDebugMsg("Running at compile-time."));
void main() {
    writeln(genDebugMsg("Running at runtime."));
}

The msg pragma prints a message to stderr at compile time. When genDebugMsg is called as its second argument here, then inside that function the variable __ctfe will be true. When the function is then called as an argument to writeln, which happens in a runtime context, __ctfe is false.

It's important to note that __ctfe is not a compile-time value. No function knows if it's being executed at compile-time or at run time. In the former case, it's being evaluated by an interpreter that runs inside the compiler. Even then, we can make a distinction between compile-time and runtime values inside the function itself. The result of the function, however, will be a compile-time value when it's executed at compile time.

Complex compile-time validation

Now let's look at something that doesn't use an out-of-the-box function from the standard library.

A few years back, Andrei published 'The D Programming Language'. In the section describing CTFE, he implemented three functions that could be used to validate the parameters passed to a hypothetical linear congruential generator. The idea is that the parameters must meet a set of criteria, which he lays out in the book (buy it for the commentary -- it's well worth it), for generating the largest period possible. Here they are, minus the unit tests:

Example 6

// Implementation of Euclid’s algorithm
ulong gcd(ulong a, ulong b) { 
    while (b) {
        auto t = b; b = a % b; a = t;
    }
    return a; 
}

ulong primeFactorsOnly(ulong n) {
    ulong accum = 1;
    ulong iter = 2;
    for (; n >= iter * iter; iter += 2 - (iter == 2)) {
        if (n % iter) continue;
        accum *= iter;
        do n /= iter; while (n % iter == 0);
    }
    return accum * n;
}

bool properLinearCongruentialParameters(ulong m, ulong a, ulong c) { 
    // Bounds checking
    if (m == 0 || a == 0 || a >= m || c == 0 || c >= m) return false; 
    // c and m are relatively prime
    if (gcd(c, m) != 1) return false;
    // a - 1 is divisible by all prime factors of m
    if ((a - 1) % primeFactorsOnly(m)) return false;
    // If a - 1 is multiple of 4, then m is a multiple of 4, too. 
    if ((a - 1) % 4 == 0 && m % 4) return false;
    // Passed all tests
    return true;
}

The key point this code was intended to make is the same one I made earlier in this post: properLinearCongruentialParameters is a function that can be used in both a compile-time context and a runtime context. There's no special syntax required to make it work, no need to create two distinct versions.

Want to implement a linear congruential generator as a templated struct with the RNG parameters passed as template arguments? Use properLinearCongruentialParameters to validate the parameters. Want to implement a version that accepts the arguments at run time? properLinearCongruentialParameters has got you covered. Want to implement an RNG that can be used at both compile time and run time? You get the picture.

For completeness, here's an example of validating parameters in both contexts.

Example 7

void main() {
    enum ulong m = 1UL << 32, a = 1664525, c = 1013904223;
    static ctVal = properLinearCongruentialParameters(m, a, c);
    writeln(properLinearCongruentialParameters(m, a, c));
}

If you've been paying attention, you'll know that ctVal must be initialized at compile time, so it forces CTFE on the call to the function. And the call to the same function as an argument to writeln happens at run time. You can have your cake and eat it, too.

Conclusion

Compile-Time Function Evaluation in D is both convenient and painless. It can be combined with other features such as templates (it's particularly useful with template parameters and constraints), string mixins and import expressions to simplify what might otherwise be extremely complex code, some of which wouldn't even be possible in many languages without a preprocessor. As a bonus, Stefan Koch is currently working on a more performant CTFE engine for the D frontend to make it even more convenient. Stay tuned here for more news on that front.

Thanks to the multiple members of the D community who reviewed this article.

Project Highlight: excel-d

Ever had the need to write an Excel plugin? Check this out.

Atila Neves opened his lightning talk at DConf 2017 like this:

I’m going to talk about how you can write Excel add-ins in D. Don’t ask me why. It’s just because people need it.

From there he goes into a quick intro on how to write plugins for Excel and gives a taste of what it looks like to register a single function in a C++ implementation:

Excel12f(
    xlfRegister, NULL, 11,          // 11: Number of args
    &xDLL,                          // name of the DLL
    TempStr12(L"Fibonacci"),        // procedure name
    TempStr12(L"UU"),               // type signature
    TempStr12(L"Compute to..."),    // argument text
    TempStr12(L"1"),                // macro type
    TempStr12(L"Generic Add-In"),   // category
    TempStr12(L""),                 // shortcut text
    TempStr12(L""),                 // help topic
    TempStr12(L"Number to compute to"), // function help
    TempStr12(L"Computes the nth Fibonacci number") // arg help 
);

Two things to note about this. First, Excel12f is a C++ function (wrapping a C API) that must be called in an add-in DLL’s entry point (xlAutoOpen) for each function that needs to be registered when the add-in is loaded by Excel. For a small plugin, the implementations of any registered functions might be in the same source file, but in a larger one they might be a bit of a maintenance headache by being located somewhere else. Also take note of all the comments used to document the function arguments, a common sight in C and C++ code bases.

The example D code Atila showed using excel-d is a world of difference:

@Register(
    ArgumentText("Array"),
    HelpTopic("Length of Array"),
    FunctionHelp("Length of an Array"),
    ArgumentHelp(["array"])
)
double DoublesLength(double[] arg) {
    return arg.length;
}

Here, the boilerplate for the registration is being generated at compile-time via a User Defined Attribute, which is used to annotate the function. Implementation and registration are happening in the same place. Another key difference is that the UDA has fields with descriptive names, eliminating the need to comment each argument. Finally, the UDA only requires four arguments, nine less than the C++ function. This is because it makes use of D’s compile-time introspection features to figure out as much as it possibly can and, at the same time, optional arguments (like the shortcut text) can just be omitted.

Since this is a Project Highlight on the D Blog, we’re going to ignore Atila’s opening request and ask, “Why?” There are actually two parts to that. First, why Excel?

Our customers are traders, and they work with Excel as one of their main tools. They need/want to, amongst other things, receive live stock updates in a cell and have their formulae automatically update. There’s other functionality they’d like to have and that means adding this to Excel somehow.

Of all the possible languages that could be used for this purpose, the business chose D. That brings us to the second part of the question: why D?

This is possible in Visual Basic, Python or C#, and possibly other languages. But none of them match D’s performance. C++ does, but it’s tedious and requires a lot of boilerplate to get going. D combines the speed and power of C++ with the reflection capabilities of those other languages. No boilerplate, just code, runs fast.

There’s more to the story, of course. The company is heavily invested in D.

We use D for nearly everything, even some “scripts”. The bulk of it is calculations for market indicators. Lots of data in -> munge -> new data out that needs to look pretty for traders. So integrating with existing code was an important factor.

Even though excel-d is targeting Windows, much of it was actually developed on Linux.

We use a Linux container as our reference development machine, but people use what they want. I do nearly all of my work on Linux and only boot into Windows when I have to. For the Excel work, that’s a necessity. But, as usual for me, I wrote all the tests to be platform agnostic, so I do the Excel development on Linux and test there. Every now and again that means a particular quirk of Excel wasn’t captured well in my mocking code, but it’s usually a quick fix after that.

He says they use both DMD and LDC for development, and both are running in continuous integration.

Although DMD doesn’t technically require Visual Studio to be installed (out of the box, it generates 32-bit OMF objects, and uses the OPTLINK linker, rather than the VS-compatible COFF), anyone doing serious work on Windows is going to need VS (or the Visual Studio Build Tools and the Windows SDK) for 64-bit and 32-bit COFF support. The latest LDC binary releases currently require the MS tools (support for MinGW was dropped, but according to the D Wiki, could be picked up again if there’s a champion for it).

Atila already had VS on his Windows partition. For this project, he got a bit of help from the VS plugin for D, Visual D.

I had to install VisualD because our reference project for Excel was in a Visual Studio solution, but afterwards I reverse engineered the build and didn’t open Visual Studio ever again.

Currently, excel-d has no support for custom dialogs or menus. Both items are on his TODO list.

If you’re working with D and need to write an Excel add-in, or want to try something cleaner than C++ to do so, excel-d is available in the DUB package registry. If not, the sponsors of the project, Kaleidic Associates and Symmetry Investments, have made several other open source projects available. They are interested in hiring talented hackers with a moral compass who aspire towards excellence and would like to work in D.

excel-d was developed by Stefan Koch and Atila Neves.