Driving with D

Here is what comes to mind when I think of D: fast, expressive, easy, and… driving? That’s right, I drive with D.

Enter my venerable Holden VZ Ute daily driver. From the factory, it came with a rubbish four-speed automatic gearbox. During 18 months of ownership, I destroyed four gearboxes. I could not afford a new vehicle at the time (I’m a 20-year-old Australian computer science student at Monash University), so I had to get creative. I purchased a rock-solid, bulletproof, six-speed automatic gearbox from another car. But that’s where the solutions ended. To make it work, I had to build my own circuit board, computer system, and firmware to control the solenoids, hydraulics, and clutches inside the gearbox, handle user input, perform shifting decisions, and interface to my car by pretending to be the four-speed automatic.

I’m quite proud of my solution. It can perform a shift in 250 milliseconds, which is great for racing. It has a steep first gear, giving it a swift takeoff. It has given some more powerful cars a run for their money. It’s got flappy paddles, diagnostic data on the screen, and the ability to go ahead and change the way it works whenever I want.

Here’s a very old video of the system working. It’s not representative of the current system—that ghastly blue screen is gone, the speedo works, and shifting has improved.

The computer is split into two parts: the user interface board, which drives an OLED display and uses an STM32F042, and the mainboard, which handles everything else, utilizing an STM32F407. The two cooperate over a CAN bus (Controller Area Network). All the firmware to handle this is written in D.

I picked D (as -betterC) because of its ingenious Uniform Function Call Syntax (UFCS), design by contract, metaprogramming, ease of interfacing with C, unit testing, portability, shared, @safe, and codebase maintenance features. Another bonus is the helpful, welcoming community. It has genuinely been a joy discussing D on the forums, and with the founders and community leaders.

The Advantages of D

Uniform Function Call Syntax (UFCS)
This has made my code significantly clearer. My code can accurately follow the flow of data without polluting my stack with single-use variables, nesting many function calls, or other sorts of clutter.
Here is an example of how you could potentially use it in an ECU (Engine Control Unit):

immutable injectorTime = airStoich(100.kpa, 25.degCelsius)
    .airMass
    .fuelMass((14.7f).afr)
    .fuelMol
    .calculateInjectorWidth;

This is equivalent to:

immutable injectorTime =
    calculateInjectorWidth(
        fuelMol(
            fuelMass(
                airMass(
                    airStoich(kpa(100), degCelsius(25))
                ),
                afr(14.7f)
            )
        )
    ); // brackets have been expanded for reading clarity

Please note: the values in this example are hardcoded to simplify the code and demonstrate how UFCS can give a unit of measurement to a value.

Both representations are valid D code; you can use either.

With UFCS, there’s no need to read the code backward or count your brackets, no need to use a gazillion single-use variables. Function calls mirror the flow of data. It’s concise.
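The ECU snippet above depends on functions that are not shown, so here is a minimal sketch that compiles on its own. The names Kpa, kpa, toPascal, and plus are hypothetical, invented for illustration, and not taken from the firmware.

```d
import std.stdio : writeln;

// Hypothetical unit wrapper, invented for this sketch.
struct Kpa { int value; }

// Plain free functions; UFCS lets each be called as if it were a member.
Kpa kpa(int v) { return Kpa(v); }
int toPascal(Kpa p) { return p.value * 1000; }
int plus(int a, int b) { return a + b; }

void main()
{
    // Reads left to right, following the data.
    immutable chained = 100.kpa.toPascal.plus(325);

    // The same computation written as nested calls.
    immutable nested = plus(toPascal(kpa(100)), 325);

    assert(chained == nested);
    writeln(chained); // prints 100325
}
```

The compiler rewrites a.f(b) into f(a, b) when a has no member named f, which is why the two forms are interchangeable.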

Design by Contract
D’s contract programming is quite similar to Ada’s. Functions can be marked with preconditions, designated by in, and postconditions, designated by out. Should a contract fail, an assertion is thrown.

// This demonstrates D’s contract programming for a function.
// It uses the short-hand expression based syntax.
int iHaveAContract(void* ptr)
in(ptr !is null) // this is a precondition, if ptr is null, an assertion is raised
in(ptr !is null, "ptr is null :(") // this is a precondition, if ptr is null, an assertion with the error message "ptr is null :(" is raised
out(result; result > 0) // this is a post condition, it captures the return value as result and tests it
out(result; result > 0, "result was too low") // this is a post condition, it captures the return value as result, tests it, and if it fails, raise an assertion with the message "result was too low"
{
    // normal function code here
}

If you want to do something a bit more complex in your contracts, an alternative syntax is available:

int iHaveAContract(void* ptr)
in {
    assert(ptr !is null); // this is equivalent to in(ptr !is null) from above.
    MyStruct* ms = cast(MyStruct*)ptr; // we can introduce variables local to this contract
    assert(ms.blah == 2, "MyStruct.blah must be 2!"); // test ms.blah, if it fails, raise an assert with an error message
}
out(result) { // captures the return value as result
    int squareIt = result * result;
    assert(squareIt == 4);
}
do { // this designates the function body
    // normal function code here
}

void main(string[] args)
{
    auto i = iHaveAContract(null); // this will violate the contract
}
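For something that can be compiled and run as-is, here is a small sketch of the shorthand syntax. The nextGear function is a hypothetical helper, not code from the gearbox controller. It assumes a build without -release, since that switch removes contract checks.

```d
import core.exception : AssertError;
import std.exception : assertThrown;

// Hypothetical helper for a six-speed gearbox, invented for this sketch.
int nextGear(int current)
in (current >= 1 && current <= 6, "current gear out of range")
out (result; result >= 1 && result <= 6, "next gear out of range")
{
    return current < 6 ? current + 1 : 6;
}

void main()
{
    assert(nextGear(3) == 4);
    assert(nextGear(6) == 6); // already in top gear

    // Violating the precondition raises an AssertError in non-release builds.
    assertThrown!AssertError(nextGear(0));
}
```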

Structs and classes can include contracts, called invariants, that sanity check the state of an instance for its whole lifetime. Invariants are checked after the constructor is run, before the destructor is run, and before and after public function calls.

struct MyStruct
{
    int blah = 2; // start in a state that satisfies the invariant

    invariant {
        assert(blah == 2, "blah must always be 2 for some reason!");
    }
    // shorthand:
    invariant(blah == 2, "blah must be 2 for some reason!");

    // The invariant is checked at function entry and exit. If value is anything other than 2, the invariant will fail when the function exits
    void setBlah(int value)
    {
        blah = value;
    }
}

void main(string[] args)
{
    MyStruct s; // invariant is run after construction
    s.setBlah(3); // invariant contract will be violated.
}
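A runnable variation on the same idea is sketched below. The Gear struct is hypothetical, and the check again assumes a build without -release.

```d
import core.exception : AssertError;
import std.exception : assertThrown;

// Hypothetical type, invented for this sketch.
struct Gear
{
    int number = 1; // the default state already satisfies the invariant

    invariant (number >= 1 && number <= 6, "gear must stay within 1..6");

    // Public member function: the invariant runs on entry and on exit.
    void set(int value) { number = value; }
}

void main()
{
    Gear g;
    g.set(4); // fine, 4 is within range
    assert(g.number == 4);

    // 9 breaks the invariant, which is detected when set() returns.
    assertThrown!AssertError(g.set(9));
}
```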

Metaprogramming
The Don’t Repeat Yourself (DRY) principle is often touted by programmers. D’s metaprogramming is an incredible tool to achieve that goal. I use it in my CAN bus implementation. For example:

struct CANPacket(ushort ID) {
    enum id = ID;
    ubyte[8] data;
}
alias HeartbeatPacket = CANPacket!10;
alias BeepHornPacket = CANPacket!140;

I’ve got specific aliased types as HeartbeatPacket and BeepHornPacket, but I haven’t needed to repeat any code. They all follow the same underlying structure, so if I modify CANPacket, every alias is also updated.
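The following sketch shows what the compiler knows about these aliases. The describe overloads are hypothetical and only there to show that each alias is a distinct type.

```d
struct CANPacket(ushort ID)
{
    enum id = ID;
    ubyte[8] data;
}

alias HeartbeatPacket = CANPacket!10;
alias BeepHornPacket = CANPacket!140;

// Each instantiation is its own type and carries its ID at compile time.
static assert(HeartbeatPacket.id == 10);
static assert(BeepHornPacket.id == 140);
static assert(!is(HeartbeatPacket == BeepHornPacket));
static assert(HeartbeatPacket.sizeof == 8); // the enum occupies no storage

// Hypothetical handlers: the overload is chosen by packet type.
string describe(HeartbeatPacket) { return "heartbeat"; }
string describe(BeepHornPacket) { return "beep horn"; }

void main()
{
    assert(describe(HeartbeatPacket()) == "heartbeat");
    assert(describe(BeepHornPacket()) == "beep horn");
}
```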

Maybe you want a more descriptive CAN packet? Mixin templates can help with that!

struct GenericCanPacket
{
    ushort id;
    ubyte[8] data; // 8 bytes to store the CAN packet payload
}

struct HeartbeatPacket
{
    ubyte deviceID; // first byte is the device ID
    ubyte statusID; // second byte is the status
}

To translate a GenericCanPacket to the descriptive HeartbeatPacket, we can use mixin templates.

mixin template CanPacketHelperFunctions(ushort ID)
{
    enum id = ID;

    // typeof(this) means that the return type of readFromGeneric will be the struct this template is instantiated in.
    static typeof(this) readFromGeneric(const ref GenericCanPacket p)
    in(p.id == ID, "Generic packet cannot be converted!")
    {
        // do some cast
    }
}

struct HeartbeatPacket
{
    // the stuff declared in CanPacketHelperFunctions is pretty much copy-pasted (not literally) into here
    mixin CanPacketHelperFunctions!10;

    ubyte deviceID; // first byte is the device ID
    ubyte statusID; // second byte is the status
}

import std.stdio : writeln;

void main()
{
    GenericCanPacket generic;
    generic.id = 10;
    generic.data = [ 2 /* deviceID */, 3 /* statusID */, 0, 0, 0, 0, 0, 0 ]; // a ubyte[8] needs all eight bytes

    HeartbeatPacket heartbeat = HeartbeatPacket.readFromGeneric(generic);
    writeln(HeartbeatPacket.id); // the packet ID is 10
    writeln(heartbeat.deviceID); // 2
    writeln(heartbeat.statusID); // 3
}

The mixin template CanPacketHelperFunctions can be used over and over for all sorts of packet representations, and since it is only declared once, the implementation remains consistent across all types that use it.
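The body of readFromGeneric was left as a placeholder above. One possible way to fill it in is sketched here, under the assumption that every field of the descriptive struct is a single ubyte and that fields map onto payload bytes in declaration order. That layout rule is an assumption made for this example, not necessarily how the real firmware does it.

```d
struct GenericCanPacket
{
    ushort id;
    ubyte[8] data;
}

mixin template CanPacketHelperFunctions(ushort ID)
{
    enum id = ID;

    static typeof(this) readFromGeneric(const ref GenericCanPacket p)
    in (p.id == ID, "Generic packet cannot be converted!")
    {
        typeof(this) result;
        // Assumed layout: field i of the struct is payload byte i.
        static assert(typeof(this).tupleof.length <= typeof(p.data).length);
        foreach (i, ref field; result.tupleof)
            field = p.data[i];
        return result;
    }
}

struct HeartbeatPacket
{
    mixin CanPacketHelperFunctions!10;

    ubyte deviceID; // first byte
    ubyte statusID; // second byte
}

void main()
{
    GenericCanPacket generic;
    generic.id = 10;
    generic.data[0] = 2;
    generic.data[1] = 3;

    auto heartbeat = HeartbeatPacket.readFromGeneric(generic);
    assert(heartbeat.deviceID == 2);
    assert(heartbeat.statusID == 3);
}
```

Because the loop runs over tupleof, adding a third ubyte field to HeartbeatPacket would be picked up without touching the template.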

Interfacing to C
I frequently must communicate with my microcontroller’s HAL and RTOS; D’s C interface made that a breeze. Just add an extern(C) and it’s good to go.

extern(C) void c_setPwm(int solenoid, void* userData); // declaration (a return type is required; void is assumed here)
c_setPwm(4, null); // usage, pretty easy!
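Since c_setPwm lives in the firmware's C code, it cannot be linked on a desktop. As a stand-in, this sketch declares strlen from the C standard library by hand, which is the same mechanism applied to a function every platform provides.

```d
// A hand-written declaration of a C function. The symbol itself comes
// from the C standard library that the program links against.
extern (C) size_t strlen(const(char)* s) @nogc nothrow;

void main()
{
    // D string literals convert implicitly to const(char)* and are
    // NUL-terminated, so the literal can be passed straight to C.
    assert(strlen("gearbox") == 7);
}
```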

Unit Testing
D’s built-in unit testing has saved me from blowing my foot off a few times. I can run all my unit tests on Windows to guarantee logical correctness, and then build a final target for my microcontroller. Here is an example:

struct MyStruct
{
    int x;
    int squareIt() { return x * x; }
}

unittest
{
    MyStruct s;
    s.x = 9;
    assert(s.squareIt == 9 * 9); // If for some reason the implementation breaks, then the unit test fails
}

Deprecation
Codebases can be ever-changing, and sometimes certain functionality may no longer be considered good practice but must be retained for legacy reasons. D provides a way to explicitly mark this in code. Any use of such deprecated code will trigger a deprecation warning from the compiler. Example:

deprecated struct Example
{
    // ...
}

// This deprecation includes a reason
deprecated("This is the reason for deprecation..") struct ExampleWithMessage
{
    // ...
}

void main()
{
    // This will generate the following compiler warning:
    // "Deprecation: struct `Example` is deprecated"
    Example e;

    // This will generate the following compiler warning:
    // "Deprecation: struct `ExampleWithMessage` is deprecated - This is the reason for deprecation.."
    ExampleWithMessage ewm;
}

Of course, deprecated can be applied to all sorts of things, not just structs.

Portability
Following on from above, D supports a surprisingly large number of targets via GDC and LDC. If it weren’t for D’s portability, I would have had to write my project in C++ (ugh). I use LDC, and cross-compiling can be performed by simply adjusting my command line arguments.

Shared
shared is D’s way of guarding against multi-threaded access of code. It’s not perfect, but I use it as-is, and I think it works well. I do have multiple threads in my codebase, and they need to synchronize data. I mark certain variables as shared, which means I must take special care accessing that data. It works with system locks and mutexes. While locked, I can cast shared away and use it like a normal variable. This is handy with structs and classes.

import core.atomic : atomicStore;

shared int sensorValue;
sensorValue = 4; // using it like a single-thread variable, an error when compiling with -preview=nosharedaccess
atomicStore(sensorValue, 4); // works with atomics
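The lock-then-cast pattern described above can be sketched as follows. Readings, readingsLock, updateReadings, and currentRpm are hypothetical names, and the sketch uses druntime's Mutex so that it runs on a desktop. On a microcontroller the lock would more likely be an RTOS mutex.

```d
import core.atomic : atomicLoad, atomicOp, atomicStore;
import core.sync.mutex : Mutex;

// Hypothetical shared state, invented for this sketch.
struct Readings { int rpm; int oilTemp; }

shared int sensorValue;
shared Readings readings;
__gshared Mutex readingsLock;

shared static this() { readingsLock = new Mutex; }

void updateReadings(int rpm, int oilTemp)
{
    readingsLock.lock();
    scope (exit) readingsLock.unlock();

    // While the lock is held, cast shared away and use plain accesses.
    auto r = cast(Readings*) &readings;
    r.rpm = rpm;
    r.oilTemp = oilTemp;
}

int currentRpm()
{
    readingsLock.lock();
    scope (exit) readingsLock.unlock();
    return (cast(Readings*) &readings).rpm;
}

void main()
{
    // Single values can skip the lock and use atomics instead.
    atomicStore(sensorValue, 4);
    atomicOp!"+="(sensorValue, 1); // atomic read-modify-write
    assert(atomicLoad(sensorValue) == 5);

    updateReadings(3000, 90);
    assert(currentRpm() == 3000);
}
```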

SafeD
@safe exists to prohibit sketchy memory activities and enforce best behavior. I haven’t had to fight @safe much yet because I don’t do anything wicked with my memory, but it is comfortable knowing that if I am going to make a mistake, the compiler can assist me in stopping it.
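As a rough illustration of what @safe allows and rejects, consider the sketch below. The sum function is hypothetical. The rejected lines are left as comments so that the example still compiles.

```d
// Bounds-checked slice access is fine in @safe code.
@safe int sum(const int[] values)
{
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}

@safe void main()
{
    int[] readings = [1, 2, 3, 4];
    assert(sum(readings) == 10);

    // Uncommenting either line below makes the compiler reject main:
    // int* p = readings.ptr; // .ptr on a slice is not allowed in @safe code
    // p += 10;               // neither is pointer arithmetic
}
```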

Mental Friction
Adam D. Ruppe puts it succinctly: D has low mental friction. The flexibility and expressiveness of the language make it easy to translate one’s thoughts into written code and maintain productivity. I don’t have to fight D much. This is my personal opinion, but I feel like D is the language in which I’m most productive.

Final Thoughts

D is the perfect fit for this sort of project— I think it’s going to have a bright future ahead in the embedded world. I’m going to continue using D for my projects. I’ve got another D-powered automotive project in the works which I hope to show off in the future. Even if D isn’t yet suitable for your project, keep an eye on it. D has been making enormous strides in the past few years, especially in regards to memory safety.

The examples shown in this article are purely meant to demonstrate how D’s features can be used in the real world. Do not take them as gospel as to how you should program.

Interfacing D with C: Strings Part One

This post is part of an ongoing series on working with both D and C in the same project. The previous two posts looked into interfacing D and C arrays. Here, we focus on a special kind of array: strings. Readers are advised to read Arrays Part One and Arrays Part Two before continuing with this one.

The same but different

D strings and C strings are both implemented as arrays of character types, but they have nothing more in common. Even that one similarity is only superficial. We’ve seen in previous blog posts that D arrays and C arrays are different under the hood: a C array is effectively a pointer to the first element of the array (or, in C parlance, C arrays decay to pointers, except when they don’t); a D dynamic array is a fat pointer, i.e., a length and pointer pair. A D array does not decay to a pointer, i.e., it cannot be implicitly assigned to a pointer or bound to a pointer parameter in an argument list. Example:

extern(C) void metamorphose(int* a, size_t len);

void main() {
    int[] a = [8, 4, 30];
    metamorphose(a, a.length);      // Error - a is not int*
    metamorphose(a.ptr, a.length);  // Okay
}

Beyond that, we’ve got further incompatibilities:

  • each of D’s three string types, string, wstring, and dstring, is encoded as Unicode: UTF-8, UTF-16, and UTF-32 respectively. The C char* can be encoded as UTF-8, but it isn’t required to be. Then there’s the C wchar_t*, which differs in bit size between implementations, never mind encoding.
  • all of D’s string types are dynamic arrays with immutable contents, i.e., string is an alias to immutable(char)[]. C strings are mutable by default.
  • the last character of every C string is required to be the NUL character (the escape character \0, which is encoded as 0 in most character sets); D strings are not required to be NUL-terminated.

It may appear at first blush as if passing D and C strings back and forth can be a major headache. In practice, that isn’t the case at all. In this and subsequent posts, we’ll see how easy it can be. In this post, we start by looking at how we can deal with NUL termination and wrap up by digging deeper into the related topic of how string literals are stored in memory.

NUL termination

Let’s get this out of the way first: when passing a D string to C, the programmer must ensure it is terminated with \0. std.string.toStringz, a simple utility function in the D standard library (Phobos), can be employed for this:

import core.stdc.stdio : puts;
import std.string : toStringz;

void main() {
    string s0 = "Hello C ";
    string s1 = s0 ~ "from D!";
    puts(s1.toStringz());
}

toStringz takes a single argument of type const(char)[] and returns immutable(char)* (there’s more about const vs. immutable in Part Two). The form s1.toStringz, known as UFCS (Uniform Function Call Syntax), is lowered by the compiler into toStringz(s1).

toStringz is the idiomatic approach, but it’s also possible to append "\0" manually. In that case, puts can be supplied with the string’s pointer directly:

import core.stdc.stdio : puts;

void main() {
    string s0 = "Hello C ";
    string s1 = s0 ~ "from D!" ~ "\0";
    puts(s1.ptr);
}

Forgetting to use .ptr will result in a compilation error, but forget to append the "\0" and who knows when someone will catch it (possibly after a crash in production and one of those marathon debugging sessions which can make some programmers wish they had never heard of programming). So prefer toStringz to avoid such headaches.

However, because strings in D are immutable, toStringz does allocate memory from the GC heap. The same is true when manually appending "\0" with the append operator. If there’s a requirement to avoid garbage collection at the point where the C function is called, e.g., in a @nogc function or when -betterC is enabled, it will have to be done in the same manner as in C, e.g., by allocating/reallocating space with malloc/realloc (or some other allocator) and copying the NUL terminator. (Also note that, in some situations, passing pointers to GC-managed memory from D to C can result in unintended consequences. We’ll dig into what that means, and how to avoid it, in Part Two.)
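A minimal sketch of the malloc route follows. The toCString helper is a hypothetical name introduced for this example and is not part of Phobos. The caller owns the returned memory and must release it with free.

```d
import core.stdc.stdio : puts;
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memcpy;

// Hypothetical helper: copies s into malloc'd memory and appends '\0'.
// Returns null if the allocation fails.
char* toCString(const(char)[] s) @nogc nothrow
{
    auto p = cast(char*) malloc(s.length + 1);
    if (p is null)
        return null;
    memcpy(p, s.ptr, s.length);
    p[s.length] = '\0';
    return p;
}

void main() @nogc nothrow
{
    string s = "Hello C from D!";

    // A slice of a string has no NUL terminator of its own.
    char* cs = toCString(s[0 .. 7]);
    if (cs is null)
        return;
    scope (exit) free(cs);

    puts(cs); // prints "Hello C"
}
```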

None of this applies when we’re dealing directly with string literals, as they get a bit of special treatment from the compiler that makes puts("Hello C from D!".toStringz) redundant. Let’s see why.

String literals in D are special

D programmers very often find themselves passing string literals to C functions. Walter Bright recognized early on how common this would be and decided that it needed to be just as seamless in D as it is in C. So he implemented string literals in a way that mitigates the two major incompatibilities that arise from NUL terminators and differences in array internals:

  1. D string literals are implicitly NUL-terminated.
  2. D string literals are implicitly convertible to const(char)*.

These two features may seem minor, but they are quite major in terms of convenience. That’s why I didn’t pass a literal to puts in the toStringz example. With a literal, it would look like this:

import core.stdc.stdio : puts;

void main() {
    puts("Hello C from D!");
}

No need for toStringz. No need for manual NUL termination or .ptr. It just works.

I want to emphasize that this only applies to string literals (of type string, wstring, and dstring) and not to string variables; once a string literal is included in an expression, the NUL-termination guarantee goes out the window. Also, no other array literal type is implicitly convertible to a pointer, so the .ptr property must be used to bind them to a pointer function parameter, e.g., giveMeIntPointer([1, 2, 3].ptr).

But there is a little more to this story.

String literals in memory

Normal array literals will usually trigger a GC allocation (unless the compiler can elide the allocation, such as when assigning the literal to a static array). Let’s do a bit of digging to see what happens with a D string literal:

import std.stdio;

void main() {
    writeln("Where am I?");
}

To make use of a command-line tool particularly convenient for this example, I compiled the above on 64-bit Linux with all three major compilers using the following command lines:

dmd -ofdmd-memloc memloc.d
gdc -o gdc-memloc memloc.d
ldc2 -ofldc-memloc memloc.d

If we were compiling C or C++, we could expect to find string literals in the read-only data segment, .rodata, of the binary. So let’s look there via the readelf command, which allows us to extract specific data from binaries in the ELF object file format, to see if the same thing happens with D. The following is abbreviated output for each binary:

readelf -x .rodata ./dmd-memloc | less
Hex dump of section '.rodata':
  0x0008e000 01000200 00000000 00000000 00000000 ................
  0x0008e010 04100000 00000000 6d656d6c 6f630000 ........memloc..
  0x0008e020 57686572 6520616d 20493f00 2f757372 Where am I?./usr
  0x0008e030 2f696e63 6c756465 2f646d64 2f70686f /include/dmd/pho
...

readelf -x .rodata ./gdc-memloc | less
Hex dump of section '.rodata':
  0x00003000 01000200 00000000 57686572 6520616d ........Where am
  0x00003010 20493f00 00000000 2f757372 2f6c6962  I?...../usr/lib
...

readelf -x .rodata ./ldc-memloc | less
Hex dump of section '.rodata':
  0x00001e40 57686572 6520616d 20493f00 00000000 Where am I?.....
  0x00001e50 2f757372 2f6c6962 2f6c6463 2f783836 /usr/lib/ldc/x86

In all three cases, the string is right there in the read-only data segment. The D spec explicitly avoids specifying where a string literal will be stored, but in practice, we can bank on the following: it might be in the binary’s read-only segment, or it might be in the normal data segment, but it won’t trigger a GC allocation, and it won’t be allocated on the stack.

Wherever it is, there’s a positive consequence that we can sometimes take advantage of. Notice in the readelf output that there is a dot (.) immediately following the question mark at the end of each string. That represents the NUL terminator. It is not counted in the string’s .length (so "Where am I?".length is 11 and not 12), but it’s still there. So when we initialize a string variable with a string literal or assign a string literal to a variable, the lack of an allocation also means there’s no copying, which in turn means the variable is pointing to the literal’s location in memory. And that means we can safely do this:

import core.stdc.stdio: puts;

void main() {
    string s = "I'm NUL-terminated.";
    puts(s.ptr);
    s = "And so am I.";
    puts(s.ptr);
}

If you’ve read the GC series on this blog, you are aware that the GC can only have a chance to run a collection if an attempt is made to allocate from the GC heap. More allocations mean a higher chance to trigger a collection and more memory that needs to be scanned when a collection runs. Many applications may never notice, but it’s a good policy to avoid GC allocations when it’s easy to do so. The above is a good example of just that: toStringz allocates, we don’t need it in either call to puts because we can trust that s is NUL-terminated, so we don’t use it.
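The hidden terminator can also be checked from within D. This is a small sketch that reads one element past the end of the slice, which is only reasonable here because s points directly at a string literal.

```d
import core.stdc.string : strlen;

void main()
{
    string s = "Where am I?";

    assert(s.length == 11);            // the terminator is not counted
    assert(s.ptr[s.length] == '\0');   // but it sits one past the end
    assert(strlen(s.ptr) == s.length); // so C sees the same length
}
```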

To be very clear: this is only true for string variables that have been directly initialized with a string literal or assigned one. If the value of the variable was the result of any other operation, then it cannot be considered NUL-terminated. Examples:

string s1 = s ~ "...I'm Unreliable!!";
string s2 = s ~ s1;
string s3 = format("I'm %s!!", "Unreliable");

None of these strings can be considered NUL-terminated. Each case will trigger a GC allocation. The runtime pays no mind to the NUL terminator of any of the literals during the append operations or in the format function, so the programmer can’t trust it will be part of the result. Pass any one of these strings to C without first terminating it and trouble will eventually come knocking.

But hold on…

Given that you’re reading a D blog, you’re probably adventurous or like experimenting. That may lead you to discover another case that looks reliable:

import core.stdc.stdio: puts;

void main() {
    string s = "Am I " ~ "reliable?";
    puts(s.ptr);
}

The above very much looks like appending multiple string literals in an initialization or assignment is just as reliable as using a single string literal. We can strengthen that assumption with the following:

import std.stdio : writeln;

void main() {
    writeln("Am I reliable?".ptr);

    string s = "Am I " ~ "reliable?";
    writeln(s.ptr);
}

writeln is a templated function that recognizes when it’s being given a pointer; rather than treating it as a string and printing what it points to, it prints the pointer’s value. So we can print memory addresses in D without a format string.

Compiling the above, again on 64-bit Linux:

dmd -ofdmd-rely rely.d
gdc -o gdc-rely rely.d
ldc2 -ofldc-rely rely.d

Now let’s execute them all:

./dmd-rely
562363F63010
562363F63030

./gdc-rely
5566145E0008
5566145E0008

./ldc-rely
55C63CFB461C
55C63CFB461C

We see that dmd-rely prints two different addresses, but they’re very close together. Both gdc-rely and ldc-rely print a single address in both cases. And if we make use of readelf as we did with the memloc example above, we’ll find that, in every case, the literals are in the read-only data segment. Case closed!

Well, not so fast.

What’s happening is that all three compilers are performing an optimization known as constant folding. In short, they can recognize when all operands of an append expression are compile-time constants, so they can perform the append at compile-time to produce a single string literal. In this case, the net effect is the same as s = "Am I reliable?". LDC and GDC go further and recognize that the resulting literal is identical to the one used earlier, so they reuse the existing literal’s address (a.k.a. string interning). (Note that DMD also performs string interning, but currently it only kicks in when a string literal appears more than twice.)

To be clear: this only works because all of the operands are string literals. No matter how many string literals are involved in an operation, if only one operand is a variable, then the operation triggers a GC allocation.

Although we see that the result of an append operation involving string literals can be passed directly to C just fine, and we’ve proven that it’s stored in read-only memory alongside its NUL terminator, this is not something we should consider reliable. It’s an optimization that no compiler is required to perform. Though it’s unlikely that any of the three major D compilers will suddenly stop constant folding string literals, a future D compiler could possibly be released without this particular optimization and instead trigger a GC allocation.

In short: rely on this at your own risk.

Addendum: Compile rely.d on Windows with dmd and the binary will yield some very different output:

dmd -m64 -ofwin-rely.exe rely.d
./win-rely
7FF76701D440
7FF76702BB30

There is a much bigger difference in the memory addresses here than in the dmd binary on Linux. We’re dealing with the PE/COFF format in this case, and I’m not familiar with anything similar to readelf for that format on Windows. But I do know a little something about Agner Fog’s objconv utility. Not only does it convert between object file formats, but it can also disassemble them:

objconv -fasm win-rely.obj

This produces a file, win-rely.asm. Open it in a text editor and search for a portion of the string, e.g., "I rel". You’ll find the two entries aren’t too far apart, but one is located in a block of text under this heading:

rdata SEGMENT PARA 'CONST' ; section number 4

And the other under this heading:

.data$B SEGMENT PARA 'DATA' ; section number 6

In other words, one of them is in the read-only data segment (rdata SEGMENT PARA 'CONST'), and the other is in the regular data segment. This goes back to what I mentioned earlier about the D spec being explicitly silent on where string literals are stored. Regardless, the behavior of the program on Windows is the same as it is on Linux; the second call to puts doesn’t blow anything up because the NUL terminator is still there, one slot past the last character. But it doesn’t change the fact that constant folding of appended string literals is an optimization and only to be relied upon at your own risk.

Conclusion

This post provides all that’s needed for many of the use cases encountered with strings when interacting with C from D, but it’s not the complete picture. In Part Two, we’ll look at how mutability, immutability, and constness come into the picture, how to avoid a potential problem spot that can arise when passing GC-allocated D strings to C, and how to get D strings from C strings. We’ll save encoding for Part Three.

Thanks to Walter Bright, Ali Çehreli, and Iain Buclaw for their valuable feedback on this article.

A Pull Request Manager’s Perspective

Since January of this year, I have been working as a part-time PR (Pull Request) manager. During this time, I have mostly been reviewing PRs and going through issues on the D Bugzilla. I have also been trying to come up with ways of creating organizational structures and procedures that will ultimately aid the D leadership in motivating and focusing community effort. This blog post presents a few insights I’ve had regarding the PR queues of the dmd, druntime, and phobos repositories, and a couple of proposals that, in my opinion, could benefit the D contribution process.

PR rounds

As a PR manager, I spend most of my time reviewing PRs. Since I started on the job, I have been involved in the merger of more than 400 PRs across our repositories. From this experience I have extracted a few insights:

  • If a new PR is not reviewed within the first 3 days after it was opened, chances are that it will get abandoned.
  • If a PR is not merged during the first 2 weeks after it was opened, chances are that it will be abandoned.
  • Contributions, in terms of PRs per month, are as follows: phobos (130), dmd (85), druntime (30).
  • Although phobos benefits from more contributions, dmd has a larger contributor base.
  • Druntime needs more love.
  • Veteran contributors are more likely to abandon PRs than new/first-time contributors.

Given the first 2 points, I try to make contact as fast as possible with PR authors. It often happens that I do not have the necessary expertise to technically review a PR. In that case, I try to find people who are willing to take a look. However, since we do not have a concrete community hierarchy, it is sometimes difficult to find the needed reviewers. A solution to this problem is proposed later in the blog post.

Regarding the ratio of contributions per repository, it is noteworthy that phobos and dmd get a lot of attention, whereas druntime is by far the least attractive repository. Another interesting aspect is the diversity of the contributor base: in the last month, there have been ten contributors who opened more than one PR for dmd, five for phobos, and four for druntime. This emphasizes the fact that druntime needs more love.

Lastly, I noticed that veteran contributors tend to abandon their PRs more often than newcomers. This can be explained by the fact that veteran contributors usually tackle multiple PRs at the same time, whereas newcomers usually focus on a single PR. I want to take this opportunity to urge all contributors not to abandon their PRs. It is disappointing for reviewers such as myself to put in the time to properly investigate the patch and offer advice to then see it go to waste. I know that it is much more appealing to start working on new things, but it is highly important not to let any work go to waste.

Upcoming projects

From my perspective, D has come a long way from its early days: language features are maturing, adoption is steadily growing and the community is expanding around a nucleus of veteran contributors. But given that growth, it is surprising that from an organizational standpoint we are basically in the same spot: if a critical issue appears (a critical bug report, a CI failure, an expired certificate, etc.), the solution is to make a forum post or a comment on Slack and hope that someone who can fix it, or can get it fixed, notices it soon; non-critical issues depend on someone taking an interest: an issue might eventually be fixed, or we might be stuck with it indefinitely. The problem is not manpower or skill; our community has a lot of talent. Unfortunately, we fail to utilize it to its full potential.

If we want certain things to be done, it is the leadership’s responsibility to:

  1. specifically state what work needs to be done,
  2. organize the community, and
  3. incentivize contributors to do the needed work.

Although there is room for improvement, (1) has usually taken place in the form of forum discussions, DIPs, and blog posts. (2) is difficult to implement, given that people contribute in their own free time. As for (3), the mantra has been “fix it if you need it”, which works well for interesting topics, but not that well for important, hard-to-fix bugs, or high-impact, boring tasks.

Implementing points (2) and (3) in an open-source community and with limited financial resources is difficult. However, there are alternative approaches that have not been explored in the DLang ecosystem. I will outline them below.

Creating strike teams

One way of organizing the community is to create dedicated groups of people, or strike teams, that can be called upon for specific tasks. One will be assigned to each repository (dmd, druntime, phobos, dlang.org). The idea is to add people to these groups who either have expertise but lack time to contribute, or lack expertise but are willing to actively contribute. This way, if you do not have time to contribute code, you can still help the community by offering implementation advice, whereas if you do have time to offer, you can contribute and develop expertise. The strike teams will be populated by a limited number of people who are trusted members of the community. These teams will be approached directly by the leadership (Walter, Atila, Mike, PR Managers) to fix issues or implement work defined in point (1). The members of the strike teams will receive recognition by having their names listed on the dlang.org site (thus satisfying point (3)).

Of course, this will work as long as there are folks out there willing to dedicate their time. If you want to contribute in some form to any of the strike teams, please contact me directly on Slack or via email.

Bugzilla Gamification

The D compiler has around 3000 reported bugs, druntime around 300, and phobos around 900. These numbers have grown over time. Although some issues are fixed, we have had no means to incentivize people to work on the critical ones. To that end, we propose a simple gamification scheme: each issue has an associated severity; once a PR that closes an issue is merged, the GitHub author of the PR is awarded points according to the issue’s severity level; a leaderboard, which is updated in real time, is presented on dlang.org, and anyone can see who the top contributors are. At the end of each “season”, contributors will be awarded prizes and recognition based on different criteria, such as overall point total, number of total contributions, and so on (we have yet to finalize the kinds of prizes that will be awarded).
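
The scoring scheme described above can be sketched in a few lines of D. This is only an illustration under stated assumptions: the point values, the selection of severity levels, and the names points, MergedFix, Entry, and leaderboard are all hypothetical, since the actual rules had not been published at the time of writing.

```d
import std.algorithm.sorting : sort;
import std.stdio : writefln;

// A selection of Bugzilla severity levels (illustrative, not exhaustive).
enum Severity { minor, normal, major, critical, blocker, regression }

// Hypothetical point values; the real rules may differ entirely.
int points(Severity s)
{
    final switch (s)
    {
        case Severity.minor:      return 5;
        case Severity.normal:     return 10;
        case Severity.major:      return 20;
        case Severity.critical:   return 40;
        case Severity.blocker:    return 50;
        case Severity.regression: return 50;
    }
}

// One merged PR that closed an issue of the given severity.
struct MergedFix { string author; Severity severity; }

// One row of the leaderboard.
struct Entry { string author; int score; }

// Sum the points per author and order the result by descending score.
Entry[] leaderboard(const MergedFix[] fixes)
{
    int[string] totals;
    foreach (fix; fixes)
        totals[fix.author] += points(fix.severity);

    Entry[] board;
    foreach (author, score; totals)
        board ~= Entry(author, score);
    board.sort!((a, b) => a.score > b.score);
    return board;
}

void main()
{
    auto board = leaderboard([
        MergedFix("alice", Severity.regression),
        MergedFix("bob", Severity.minor),
        MergedFix("bob", Severity.major),
    ]);
    foreach (rank, e; board)
        writefln("%s. %s: %s points", rank + 1, e.author, e.score);
}
```

Weighting the levels this steeply is one way to make a single regression fix worth more than several trivial ones, which is the incentive the scheme is after.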

By implementing this scoring scheme, we offer some incentive for more experienced contributors to prioritize blocker/critical/major/regression issues over the more trivial or simpler ones, and encourage new contributors to try their hand at a level with which they’re comfortable. We are already working on implementing this and will announce the rules and prize categories once everything is up and running.

Conclusion

We are at a point in the evolution of the D programming language and its ecosystem where motivating community effort towards a common goal is crucial. This is a long-term, complicated task, but we need to start somewhere. I hope that with this initiative we can pave the way to a more sophisticated and better-organized contribution process that is a more satisfying and rewarding experience for our contributors.

D 2.096.0 Released and Other News

The latest version of DMD, the D reference compiler, is now available for download. The changelog notes 17 major changes and 81 resolved Bugzilla issues from 54 contributors. After we get into some notable items from the changelog, we’ll turn our attention to other items of note from the D community: a new release of LDC is right around the corner and one of GDC just beyond the horizon; the Symmetry Autumn of Code wrapped up with a surprise ending; and there are two sometimes forgotten places where anyone can go to find ways to contribute and sharpen their D coding skills.

DMD 2.096.0

This release of DMD is one of those where so many of the improvements are highlight-worthy that it’s difficult to decide which ones to focus on: there are more improvements to the experimental C++ header generation like those I highlighted in the previous release; support for DWARF debug info is surely important to a significant segment of the D community; and changing plain synchronized statements to use runtime-allocated mutexes fixes a critical issue.

The full changelog is there with all the details for anyone who wants them. The features below are a couple that I believe warrant a bit of exposition.

New C-compatible complex types

The complex types cfloat, cdouble, and creal (and their imaginary i-prefixed counterparts) have been a part of D for a very long time, but for much of that time they have been on the block to face future deprecation.

What became clear over time is that they are accompanied by a high maintenance cost, requiring special cases in the frontend to maintain compatibility with other language features. With that and their specialized nature, they have the wrong cost-benefit ratio. The std.complex module introduced the Complex type to shift that cost-benefit ratio in the other direction.

For several reasons, the built-in types have yet to be deprecated, but new D code should be written to use std.complex. One potential problem there is that the library type is not compatible with the C _Complex type, so anyone needing that compatibility has continued to reach for the built-ins.

This release introduces three new aliased types that are automatically configured at compile-time to be ABI compatible with C’s _Complex and should be used in place of the built-ins when interacting with C or C++: c_complex_float, c_complex_double, and c_complex_real. They are declared in the core.stdc.config module, which must be imported to use them (and which includes other aliased types that have ABI variations between C compilers, such as c_long and c_long_double).

Postblit and copy constructor priority

The postblit constructor has long been the second step in D’s approach to copy constructing struct instances:

  1. “blit” (copy) the fields from the source to the destination
  2. invoke the destination’s postblit constructor

The first step would always take place, the second only if the struct declaration included an implementation of this(this). The idea behind this is that the first step does a shallow copy, and the postblit constructor can take care of any extra work that is required, such as making “deep copies” (copying referenced data) or incrementing a reference count. The postblit constructor does not have access to the source object.
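
As a minimal sketch of the two steps, consider a struct that owns a dynamic array. The Buffer type below is hypothetical and exists only for illustration. The blit copies the slice (pointer and length), so without a postblit both instances would share the same elements; the postblit then duplicates the data for the destination.

```d
import std.stdio : writeln;

// Hypothetical example type, not taken from the changelog.
struct Buffer
{
    int[] data;

    // Step 2: runs on the destination after the fields have been blitted.
    // Only the destination is visible here, never the source.
    this(this)
    {
        data = data.dup;    // turn the shallow copy into a deep copy
    }
}

void main()
{
    auto a = Buffer([1, 2, 3]);
    auto b = a;             // step 1: blit; step 2: b's postblit
    b.data[0] = 42;

    writeln(a.data);        // [1, 2, 3]
    writeln(b.data);        // [42, 2, 3]
}
```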

Unfortunately, time and usage uncovered issues with postblit, particularly in how it behaves in the presence of qualifiers like const, immutable, and shared. Changing the behavior of such a long-lived and deeply ingrained feature is problematic due to the impact on existing code. Instead, the approval of DIP 1018 introduced copy constructors to the language.

Now, newly written code should make use of copy constructors rather than postblits, but postblits are not deprecated and remain in the language. Consequently, the two features need to learn to play nice together.
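
For comparison, here is a sketch of the same kind of deep copy written with a copy constructor. The Payload type is again hypothetical. The notable difference from a postblit is that the copy constructor receives the source object as a parameter and can qualify it, here as const.

```d
import std.stdio : writeln;

// Hypothetical example type used only to illustrate the syntax.
struct Payload
{
    int[] data;

    this(int[] d) { data = d; }

    // Copy constructor: the source object is passed in by reference.
    this(ref return scope const Payload rhs)
    {
        data = rhs.data.dup;    // deep copy straight from the source
    }
}

void main()
{
    auto a = Payload([1, 2, 3]);
    auto b = a;                 // invokes the copy constructor
    b.data[0] = 42;

    writeln(a.data);            // [1, 2, 3]
    writeln(b.data);            // [42, 2, 3]
}
```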

Until now, when both a postblit and a copy constructor were present, priority was given to the postblit. Unfortunately, there was a corner case that slipped by.

The example in the changelog looks like this:

// library code using postblit
struct A
{
    this(this) {}
}

// new code using copy constructor
struct B
{
    A a;
    this(const scope ref B) {}
}

Because A implements a postblit, then given an instance b of B, the postblit of a should be invoked anytime a copy of b is made. Since the user defined no postblit for B, the compiler will generate one that, when invoked, will in turn invoke the postblit of a.

Because postblit constructors have priority over copy constructors, the copy constructor of B will never run, even though the programmer who implemented B expects it to.

With 2.096.0, the above code will result in a deprecation message informing the programmer that a generated postblit will be invoked instead of the copy constructor. The programmer then has three options:

  • disable the postblit with @disable this(this);
  • implement a postblit, which will then have priority over any copy constructors
  • remove all copy constructors from B in preference to the generated postblit

LDC

Up-to-date beta versions of the LLVM-based D compiler, LDC, tend to come not too long after DMD releases. As I write, the LDC maintainer Martin Kinkelin is working toward the next beta release, and that will support D 2.096.0. Keep an eye on the D Announce forum and the LDC release page for news of the release.

GDC

Using the current release of GDC, the D compiler which is distributed as part of the GNU Compiler Collection (GCC), the __VERSION__ constant reports 2.076. LDC and DMD moved on from that version with a D frontend that was ported from C++ to D, but to facilitate bootstrapping, GDC had to stick with the C++ version of the frontend for inclusion into GCC. Maintainer Iain Buclaw has been backporting some fixes and improvements from upstream, so the 2.076 of GDC is no longer at full bug/feature parity with DMD 2.076. As just one example of many, a performance boost to static foreach that was merged into DMD last year is also included in the GDC that shipped with GCC 10.2.1.

The upside of this is that GDC is rock-solid stable. The downside is that code that compiles on the latest DMD and LDC may fail to compile on GDC. But in between backporting bugfixes, his day job, and his normal life, Iain has been expending his free time on replacing the old C++ frontend with the newer D implementation. Given time constraints and the amount of work left to do, version 11 of GDC will remain on D 2.076. Once the GCC feature freeze is lifted in May, he will start laying the foundations for making the switch to the newer frontend in a future GCC release.

Symmetry Autumn of Code 2020

The latest edition of SAOC kicked off in September of 2020. With the sponsorship of Symmetry Investments, four students were to work through four milestones to complete projects that would benefit the D ecosystem. Upon successful completion of each of the first three milestones, each student would receive $1,000. At the end of the fourth milestone, the SAOC judges would award one of them with a final $1,000 payment and a free trip to the next real-world DConf.

All four students completed the first two milestones, but two of them were unable to continue past the third. Milestone 4 ended on January 15th. At the end of the month, the two remaining students submitted their final milestone reports and a summary of their experience working on their projects: what they learned, how reality compared to their expectations, and what their plans are going forward. With those documents and the mentors’ student evaluations in hand, the SAOC judges had to decide which of the two should be awarded the final payment.

Their decision was neither obvious nor easy. Both of the students did outstanding work throughout the event. Their milestone reports sufficiently documented their progress. They kept up with their forum updates as required. Their mentors gave them glowing reviews. There was very little to separate them. In the end, the judges looked at the state of both projects and determined that one would have a broader and more immediate benefit to the D community than the other, and so made a decision.

But they didn’t leave it there. The SAOC judges felt that both students had done so well that they both should be rewarded, but our sponsor had only allocated funding for one reward. The judges devised alternative scenarios and asked our sponsor which of the options they would support. They received an answer, and I informed the students in the last week of February (after they had patiently waited several days beyond our initial deadline).

Robert Aron was selected to receive the final payment and a free trip to the next real-world DConf. His project: implementing D clients for Google APIs. By the end, he had two fully functional libraries for talking to Google Drive and Google Calendar, and an in-progress library for interfacing with Gmail, all of which you can find on GitHub. He also has a work-in-progress template-based Google API generator. He intends to finish both it and the Gmail client. He also plans to eventually generate libraries for Google APIs that use RPC.

The “runner-up” is Adela Vais. With Symmetry’s permission, she was offered a choice between a final $1,000 payment and a free trip to the next real-world DConf (I’ll leave it to her to tell folks which she chose). The initial goal of her SAOC project was to implement a D GLR parser for GNU Bison. After seeing the state of the existing D LALR(1) pull parser support in Bison, she shifted her goal to first implement its missing features and add support for a push parser before moving on to the GLR parser. By the end of the event, the push parser was 95% complete, and she intended to take it all the way and then turn to the GLR parser support.

Congratulations to both Robert and Adela! And a special thanks to Laeeth Isharc and Symmetry Investments for providing this opportunity to young programmers every year. Robert and Adela told us they learned more from this experience than they had expected to, and they were obviously proud of the work they had done.

D ecosystem projects and tasks

The DLang Project Idea Repository was created during DConf 2019 in London after a discussion among some of the attendees. Not only is it a solid source of project ideas for potential participants in Google Summer of Code and SAOC applicants, but it’s also useful for anyone looking to make a meaningful contribution to the D community.

The D Ecosystem Task List came about after discussions with representatives of companies using D in production. Any company that is willing to allow one of their employees at least one day per quarter to work on smaller tasks that benefit the D ecosystem can use the repository to get ideas and indicate which task they’re working on. Funkwerk, who have been using D in production since 2008, have been the only ones to take us up on this so far. Their employees have contributed improvements to D-Scanner, dfmt, and dub. We are grateful to them for all they’ve done.

The task list isn’t limited to company employees. It’s a list of broad categories, not specific tasks, for anyone who has an hour or two they’d like to spend on writing D code. So if you are looking for D experience or would simply like to contribute, and you don’t have time for a full-on project, the Ecosystem Task List is a great place to go for ideas. And if you have ideas for other broad task categories that can be added to the list, please submit a pull request!

Symphony of Destruction: Structs, Classes and the GC (Part One)

This post is part of a broader series on garbage collection in D. The motivation is to explore how destructors and the GC interact. To do that, we first need a bit of background. We do not go into a broader discussion on the ins and outs of object destruction, only what is most relevant to the interaction of destructors and the GC.

I’ve split the discussion into two blog posts. Here in Part One, we look at how deterministic and non-deterministic destruction differ, consider the consequences of having a single destructor for both scenarios, and finally establish two simple guidelines that will help us avoid those consequences. In Part Two, we’ll go further and explore how we can still write solid destructors when circumstances dictate that the guidelines don’t apply.

Deterministic destruction

Destruction is deterministic when it is predictable, meaning the programmer can, simply by following the flow of the code, point out where and when an object’s destructor is invoked. This is possible with struct instances allocated on the stack, as the compiler will insert calls to their destructors at well-defined points for automatic and deterministic destruction.

There are two basic rules for automatic destruction:

  1. The destructors of all stack-allocated structs in a given scope are invoked when the scope exits.
  2. Destructors are invoked in reverse lexical order (i.e., the opposite of the order in which the declarations appear in the source).

With these two rules in mind, we can examine the following example and accurately predict its output.

import std.stdio;
struct Predictable
{
    int number;
    this(int n)
    {
        writeln("Constructor #", n);
        number = n;
    }
    ~this()
    {
        writeln("Destructor #", number);
    }
}

void main()
{
    Predictable s0 = Predictable(0);
    {
        Predictable s1 = Predictable(1);
    }
    Predictable s2 = Predictable(2);
}

We see that both s0 and s2 are directly within the scope of the main function, so their destructors will run when main exits. Given that the declaration of s2 comes after that of s0, the destructor of s2 will run before that of s0.

We also see that s1 is declared in an anonymous inner scope between the declarations of s0 and s2. This scope exits before s2 is constructed, so the destructor of s1 will execute before the constructor of s2.

With that, we can expect the following output:

Constructor #0      // declaration of s0
Constructor #1      // declaration of s1
Destructor #1       // anonymous scope exits, s1 destroyed
Constructor #2      // declaration of s2
Destructor #2       // main exits, s2 then s0 destroyed
Destructor #0

Compiling and executing the example proves us accurate seers.

The programmer can implement deterministic destruction manually, as is necessary when destroying instances allocated on the non-GC heap, e.g., with malloc or std.experimental.allocator. In an earlier post, Go Your Own Way (Part Two: The Heap), I covered how to use std.conv.emplace to allocate instances on the non-GC heap and briefly mentioned that destructors can be invoked manually via destroy. That’s a function template declared in the automatically imported object module so that it’s always available. We won’t retread the allocation discussion, but an example of manual destruction isn’t out of bounds for this post.

In the following example, we’ll reuse the definition of the Predictable struct and add a destroyPredictable function to manually invoke the destructors. For completeness, I’ve included functions for allocating and deallocating Predictable instances from the non-GC heap: allocatePredictable and deallocatePredictable. If it isn’t clear to you what these two functions are doing, please read the blog post I mentioned above.

void main()
{
    Predictable* s0 = allocatePredictable(0);
    scope(exit) { destroyPredictable(s0); }
    {
        Predictable* s1 = allocatePredictable(1);
        scope(exit) { destroyPredictable(s1); }
    }
    Predictable* s2 = allocatePredictable(2);
    scope(exit) { destroyPredictable(s2); }
}

void destroyPredictable(Predictable* p)
{
    if(p) {
        destroy(*p);
        deallocatePredictable(p);
    }
}

Predictable* allocatePredictable(int n)
{
    import core.stdc.stdlib : malloc;
    import std.conv : emplace;
    auto p = cast(Predictable*)malloc(Predictable.sizeof);
    return emplace!Predictable(p, n);
}

void deallocatePredictable(Predictable* p)
{
    import core.stdc.stdlib : free;
    free(p);
}

Running this program will result in precisely the same output as the previous example. In the destroyPredictable function, we dereference the struct pointer when calling destroy because there is no overload that takes a pointer. There are specializations for classes, interfaces, and structs passed by reference and a general catch-all that takes all other types by reference. Destructors are invoked on types that have them. Before exiting, the function sets the argument to its default .init value through the reference.

Note that if we were to give destroy a pointer without first dereferencing it, the code would still compile. The pointer would be accepted by reference and simply set to null, the default .init value for pointers, but the struct’s destructor would not be invoked (i.e., the pointer is “destroyed”, not the struct instance).
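
The pitfall is easy to demonstrate with a stack-allocated instance. In this small sketch, the Tracked struct is a hypothetical stand-in for Predictable so that the example is self-contained.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the Predictable struct.
struct Tracked
{
    int number;
    ~this() { writeln("Destructor #", number); }
}

void main()
{
    auto s = Tracked(5);
    Tracked* p = &s;

    destroy(p);                 // resets only the pointer to null
    assert(p is null);
    assert(s.number == 5);      // the instance is untouched: no destructor call

    writeln("leaving main");
}   // "Destructor #5" is printed here, when s goes out of scope
```

The destructor message appears only after "leaving main", which shows that the call to destroy never reached the struct instance.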

Inserting writeln(*p) immediately after destroy(*p) should print

Predictable(0)

for each destroyed instance. (The default .init state for a struct in D is the aggregate of the .init property of each of its members; in this case, the sole member, being of type int, has an .init property of 0, so the struct’s default .init state is Predictable(0). This can be changed in the struct definition, e.g., struct Predictable { int number = 1; }.)

destroy is not restricted to instances allocated on the non-GC heap. Any aggregate type instance (struct, class, or interface) is a valid argument no matter where it was allocated.

Non-deterministic destruction

In languages with support for objects and a garbage collector, the responsibility for destroying object instances allocated on the GC heap falls to the GC. This is known as finalization. Before reclaiming an object’s memory, the GC finalizes the object by invoking its finalizer.

Finalization, though convenient, comes with a price. In Java’s particular circumstances, the price was determined to be so high that its maintainers deprecated the Object.finalize method and left a scary warning about its use in the documentation. It’s worth quoting here:

The finalization mechanism is inherently problematic. Finalization can lead to performance issues, deadlocks, and hangs. Errors in finalizers can lead to resource leaks; there is no way to cancel finalization if it is no longer necessary; and no ordering is specified among calls to finalize methods of different objects. Furthermore, there are no guarantees regarding the timing of finalization. The finalize method might be called on a finalizable object only after an indefinite delay, if at all.

Finalization in D isn’t quite the bugbear it is in Java, but we do see a less dramatic warning about it in the D documentation:

The garbage collector is not guaranteed to run the destructor for all unreferenced objects. Furthermore, the order in which the garbage collector calls destructors for unreferenced objects is not specified.

Although there’s no mention of “finalization” or “finalizers” here, that’s precisely what the text is referring to. The core message is the same in both warnings: finalization is non-deterministic and cannot be relied on.

Unlike structs, classes in D are reference types by default. Some consequences: the programmer never has direct access to the underlying class instance; instances declared uninitialized are null by default; the normal use case is to allocate instances via new. When a class is instantiated in D, it is usually going to be managed by the GC and its destructor will serve as a finalizer.

As an experiment, let’s change the definition of struct Predictable in our first example to class Unpredictable and use new to allocate the instances like so:

import std.stdio;
class Unpredictable
{
    int number;
    this(int n)
    {
        writeln("Constructor #", n);
        number = n;
    }
    ~this()
    {
        writeln("Destructor #", number);
    }
}

void main()
{
    Unpredictable s0 = new Unpredictable(0);
    {
        Unpredictable s1 = new Unpredictable(1);
    }
    Unpredictable s2 = new Unpredictable(2);
}

We’ll see that the output is drastically different:

Constructor #0
Constructor #1
Constructor #2
Destructor #0
Destructor #1
Destructor #2

Anyone familiar with the characteristics of the default DRuntime garbage collector can predict for this very simple program that all the destructors will be run when the GC’s cleanup function is executed as the D runtime shuts down, and that they will be executed in the order in which they were declared (an implementation detail; and note that destruction at shutdown can be disabled via a command line argument). But in a more complex program, this ability to predict breaks down. Destructors can be invoked by the GC at almost any time and in any order.
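
To see the effect of disabling cleanup at shutdown, DRuntime's cleanup setting for the GC can be set to none, either on the command line as --DRT-gcopt=cleanup:none or embedded in the program through rt_options. The following is a sketch under the assumption that the default GC is in use; the NoCleanup class is hypothetical.

```d
import std.stdio : writeln;

// Embedded runtime option, equivalent to passing
// --DRT-gcopt=cleanup:none when launching the program.
extern(C) __gshared string[] rt_options = ["gcopt=cleanup:none"];

// Hypothetical class mirroring Unpredictable.
class NoCleanup
{
    int number;
    this(int n)
    {
        writeln("Constructor #", n);
        number = n;
    }
    ~this()
    {
        writeln("Destructor #", number);
    }
}

void main()
{
    auto obj = new NoCleanup(0);
    writeln("main exits");
}
```

With this setting, the GC skips its final cleanup pass when the runtime terminates, so the destructor message should never appear.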

To be clear, the GC will only perform its cleanup duties if and when it finds more memory is needed to fulfill a specific allocation request. In other words, it isn’t constantly running in the background, marking objects unreachable and calling destructors willy nilly. To that extent, we can predict when the GC has the possibility to perform its duties. Beyond that, all bets are off. We cannot predict with accuracy if any destructors will be invoked during any given allocation request or the order in which they will be invoked. This uncertainty has ramifications for how one implements destructors for any GC-managed type.

For starters, destructors of GC-managed objects should never perform any operation that can potentially result in a GC allocation request. Attempting to do so can result in an InvalidMemoryOperationError at run time. I use the word “potentially” because some operations can indirectly cause the error in certain circumstances, but not in others. Some examples: attempting to index an associative array can trigger an attempt to allocate a RangeError if the key is not present; a failed assert will result in allocation of an AssertError; calling any function not annotated with @nogc means GC operations are always possible in the call stack. These and any such operations should be avoided in the destructors of GC-managed objects. (The first seven items on the list of operations disallowed in @nogc functions are collectively a good guide.)
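
One way to get some help from the compiler here is to annotate the destructor with @nogc, so that direct GC allocations inside it are rejected at compile time. The sketch below uses a hypothetical Connection class. Note that this is only a partial safeguard: a failed assert, for example, is still permitted in @nogc code.

```d
// Hypothetical class used only to illustrate the annotation.
class Connection
{
    static int closed;          // number of destructor runs (thread-local)
    private int handle;

    this(int h) { handle = h; }

    ~this() @nogc nothrow
    {
        // Either of these lines would fail to compile in a @nogc destructor:
        // int[] scratch = new int[](16);
        // string msg = "closing " ~ "connection";
        handle = -1;
        ++closed;
    }
}

void main()
{
    auto c = new Connection(3);
    destroy(c);                 // invoke the destructor explicitly for the demo
    assert(Connection.closed == 1);
}
```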

A larger issue is that one cannot rely on any resource still being valid when a destructor is called by the GC. Consider a class that attempts to close a socket handle in its destructor; it’s quite possible that the destructor won’t be called until after the program has already shut down the network interface. There is no scenario in which the runtime can catch this. In the best case, such circumstances will result in a silent failure, but they could also result in crashes during program shutdown or even sooner.

What it comes down to is that GC-allocated objects should never be used to manage any resource that depends upon deterministic destruction for cleanup.

Designing for destruction

For the D neophyte, it can appear as if destructors in D are useless. Given that both struct and class instances can be allocated from memory that may or may not be managed by the GC, that destructors of GC-managed objects are not guaranteed to run, and that destructors are forbidden to perform GC operations during finalization, how can we ever rely on them?

In practice, it’s not as bad as it may seem. Issues do arise for the unwary, but armed with a basic awareness of the nature of D destructors, it turns out that it’s pretty easy to avoid having problems. This is especially true if the programmer adopts two fairly simple rules.

1. Pretend class destructors don’t exist

Class instances will nearly always be allocated with new. That means their destructors will nearly always be non-deterministic. Much of what one would want to do in a destructor is somehow dependent on program state: either the destructor itself expects a certain state (like writing to a log file that is expected to be open), or the program expects the destructor to have modified the program state in a specific manner (like releasing a resource handle).

Non-deterministic destruction means that all expectations about program state are thrown out the window. That log file might have already been closed, so the message will never be written (I hope it wasn’t important). That system resource handle may never be released until the program ends (I hope that particular resource isn’t scarce). Even if it seems through testing that a class destructor is working the way it’s intended, it’s quite likely down to the fact that the testing has not uncovered the case where it breaks. In a long-running program, that case will inevitably pop up at some point. Have fun debugging when your production game server starts randomly crashing.

So when using classes in D, pretend they have no destructors. Pretend that they are Java classes with a deprecated finalize method.

2. Don’t allocate structs on the GC heap when they have destructors

Since we’re pretending classes have no destructors, we’re going to turn to structs for all of our destructive needs. Allocating structs as value objects on the stack will cover many use cases, but sometimes we may need to allocate them on the heap. When that situation arises, do not allocate any destructor-bearing struct with new. If we allocate a struct that has a destructor on the GC heap, it completely defeats our purpose of avoiding class destructors in the first place. That destructor we intended to be deterministic is now non-deterministic, so we may as well have just used a class.

As we have seen, struct instances can be allocated on the non-GC heap (e.g., with malloc) and their destructors manually invoked with the destroy function. If we need deterministic destruction and we absolutely must have a heap-allocated struct, then we cannot allocate that struct on the GC heap.

Guidelines schmuidelines

I’m sure someone is reading the above guidelines and thinking, “If I have to pretend that classes have no destructors, then why do classes have destructors?” Well, you don’t have to.

There is no One True Path to follow when deciding if an object should be implemented as a class or struct. Personally, I will always prefer structs over classes, and I will only reach for a class if I need something structs can’t give me easily (like a hierarchy) or efficiently. Other people will consider if the object they need to represent has an identity, e.g., an Actor in a simulation versus the Vertex that defines its 3D coordinates. POD (Plain Old Data) types should always be structs, but beyond that it’s largely a matter of preference.

My two guidelines are based on my experience and that of others with whom I’ve spoken. They are intended to help you keep the full implications of D’s distinction between classes and structs at the forefront of your thoughts when architecting your program. They are not commandments that every D programmer must follow.

Realistically, most D programmers will encounter circumstances at one time or another in which the guidelines do not apply. For example, when mixing GC-managed memory and manually-managed memory in the same program, it’s quite possible for a struct intended for stack use to wind up on the GC heap if the programmer is unaware of the circumstances. And some D programmers will always prefer classes over structs because that’s just the way they want it, and so will simply choose to ignore the guidelines. That’s no problem as long as they fully understand the consequences.

So what does that mean? How do you get over the non-deterministic nature of class destructors if your Actor class absolutely must have a destructor, or if you prefer to always follow The Way of the Class? How do you prevent structs intended for the stack from being GC-allocated? These are things we’ll be looking at in Part Two. See you there.

Thanks to Ali Çehreli and Max Haughton for their feedback, and to Adam D. Ruppe for his conversation on the topic in Discord and the title suggestion (it fits in more nicely with the series than the ‘Appetite For Destruction’ I had intended to go with).

A New Year, A New Release of D

Here in DLang Land we’re beginning the new year with a new release of the D reference compiler (DMD) and a beta release of the popular LLVM-based D compiler (LDC). D 2.095.0 is crammed full of 27 major changes and 78 fixes from 61 contributors. Following are some highlights that I expect some D programmers might find interesting, but please see the changelog for the full rundown. Those more interested in Bugzilla issue numbers can jump straight to the bugfix list.

D 2.095.0

D’s support for other programming languages is important for interacting with existing codebases. C ABI compatibility has been strong from the beginning. Support for Objective-C and C++ came later. Though C++-compatibility is a bear to get right, it keeps improving with every compiler release. This release continues that trend and also enhances Objective-C support. We also see a number of QOL (quality-of-life) improvements throughout the compiler, libraries, and tools. DUB, the D build tool and package manager that ships with the compiler (and is also available separately), especially gets a good bit of love in this release.

C++ header generation

For a little while now, DMD has included experimental support for the generation of C++ header files from D source code, via the -HC command-line option, in order to facilitate calling D libraries from C++. For example, given the following D source file:

cpp-ex.d

extern(C++):
struct A {
    int x;
}

void printA(ref A a) {
    import std.stdio : writeln;
    writeln(a);
}

And the following command line:

dmd -HC cpp-ex.d

The compiler outputs the following to stdout (use -HCf to specify a file name, and -HCd to specify a directory):

// Automatically generated by Digital Mars D Compiler

#pragma once

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <math.h>

#ifdef CUSTOM_D_ARRAY_TYPE
#define _d_dynamicArray CUSTOM_D_ARRAY_TYPE
#else
/// Represents a D [] array
template<typename T>
struct _d_dynamicArray
{
    size_t length;
    T *ptr;

    _d_dynamicArray() : length(0), ptr(NULL) { }

    _d_dynamicArray(size_t length_in, T *ptr_in)
        : length(length_in), ptr(ptr_in) { }

    T& operator[](const size_t idx) {
        assert(idx < length);
        return ptr[idx];
    }

    const T& operator[](const size_t idx) const {
        assert(idx < length);
        return ptr[idx];
    }
};
#endif

struct A;

struct A
{
    int32_t x;
    A() :
        x()
    {
    }
};

extern void printA(A& a);

This release brings a number of fixes and improvements to this feature, as can be seen in the changelog. Note that the similarly named -H, -Hf, and -Hd options are a separate feature: they generate D interface (.di) files, not C or C++ headers.

Default C++ standard change

Prior to this release, extern(C++) code was guaranteed to link with C++98 binaries out of the box. This is no longer true, and you will need to pass -extern-std=c++98 on the command line to maintain that behavior. The C++11 standard is now the default.

Additionally, the compiler will now accept -extern-std=c++20. In practice, the only effect this has at the moment is to change the compile-time value, __traits(getTargetInfo, "cppStd"), but new types may be added in the future.
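As a minimal sketch of how that compile-time value might be used, the snippet below reads it and branches on it. The names cppStd and targetsCpp11OrLater are made up for this illustration, and the comparison assumes the value follows the numbering of C++'s __cplusplus macro (201103 for C++11).

```d
import std.stdio : writeln;

// Assumption: the value follows __cplusplus numbering, e.g. 201103 for C++11.
enum int cppStd = __traits(getTargetInfo, "cppStd");

// True when extern(C++) declarations target C++11 or later.
enum bool targetsCpp11OrLater = cppStd >= 201_103;

void main()
{
    writeln("extern(C++) standard: ", cppStd);
    writeln("C++11 or later: ", targetsCpp11OrLater);
}
```

Building the same file with -extern-std=c++98 and again with no flag should show the two values differ, which is an easy way to confirm which standard a build is targeting.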

Improved Objective-C support

Objective-C compatibility is enhanced in this release with support for Objective-C protocols. This is achieved by repurposing interface in an extern(Objective-C) context. Additionally, the attributes @optional and @selector help get the job done. Read the details and see an example in the changelog.

Improved compile-time feedback

Here’s a QOL issue that really became an annoyance after a deprecation in Phobos, the standard library: when instantiating templates, deprecation messages reported the source location deep inside the library where the deprecated feature was used (e.g., template constraints) and not the user-code instantiation that triggered it. No longer. You’ll now get a template instantiation trace just as you do on errors.

Another QOL feedback issue involved the absence of errors. The compiler would silently allow multiple definitions of identical functions in the same module. The compiler will now raise an error when it encounters this situation. However, multiple declarations are allowed as long as there is at most one definition. For mangling schemes where overloading is not supported (extern(C), extern(Windows), and extern(System)), the compiler will emit a deprecation message.

The mainSourceFile in DUB recipes

The mainSourceFile entry in DUB package recipes was a way to specify a source file containing a main function that should be excluded from unit tests when invoking dub test. However, when setting up other configurations where the file should also not be compiled, or where a different main source file was required, it was necessary to add the file to an excludedSourceFiles entry. This is no longer the case. If a mainSourceFile is specified in any configuration, it will automatically be excluded from other configurations.

Propagating compiler flags to dependencies

Not every existing compiler flag has a corresponding build setting for DUB recipes. The dflags entry allows for such flags to be configured for any project. For example, -fPIC, or -preview=in. The catch is, it does not propagate to dependencies. Now, you can explicitly specify compiler flags for dependencies by adding a dflags parameter to any dependency entry in a dub.json recipe. For example:

{
    "name": "example",
    "dependencies": {
        "vibe-d": { "version" : "~>0.9.2", "dflags" : ["-preview=in"] }
    }
}

Unfortunately, it appears the implementation does not work for recipes in SDLang format (dub.sdl), so those of us who prefer that format over JSON will have to wait a bit.

LDC 1.25.0-beta1

This release of LDC brings the compiler up to date with the D 2.095.0 frontend, with the prebuilt packages based on LLVM v11.0.1. The biggest news in this release looks to be the new -linkonce-templates flag. This experimental feature causes the compiler to emit template symbols into each compilation unit that references them, “with optimizer-discardable linkonce-odr linkage”. The implementation has big wins both in terms of compile times when compiling with optimizations turned on and in cutting down on a class of template-related bugs. See the beta1 release notes for the details.

Happy New Year

On behalf of the D Language Foundation, I wish you all the very best for 2021. As a community, we weren’t affected much by the global pandemic. Sure, we were forced to cancel DConf 2020, but the silver lining is that it also motivated us to finally launch DConf Online in November. We fully intend to make this an annual event alongside of, not in place of, the real-world conference (when physically possible). Other than that, it was business as usual in D Land.

At a personal level, the lives of some in our community were disrupted last year in ways large and small. Please remember that, though the primary object that brings us together is our enthusiasm for the D programming language, we are all still human beings behind our keyboards. The majority of work that gets done in our community is carried out on a volunteer basis. All of us, as the beneficiaries, must never forget that the health and well-being of everyone in our community take top priority over any work we may want or expect to see completed. We encourage everyone to keep an ear open for those who may need to borrow it, and never be afraid to communicate that need when it feels necessary. Sometimes, an open ear can make a very big difference.

Thanks to all of you for your participation in the D community, whether as a user, a contributor, or both. Stay safe, and have a very happy 2021.

DConf Online 2020: How to Participate

As I write, we are a little over 24 hours away from the start of DConf Online 2020, our first online version of DConf. All of the talks for Day One are uploaded, the livestreams are scheduled, and #BeerConf is almost ready to launch.

The details

All of the prerecorded talks will be accessible on our YouTube channel via the DConf Online 2020 playlist (look under the live chat box for the full playlist; you may have to scroll down). Use the live chat to ask questions during the talk. The speaker will be available to provide short answers in the chat box. Longer, more complex answers, and/or additional context, will be provided in the Q & A livestream. The speaker will let you know if he is providing more detail in the livestream. If you don’t want to tab over to the livestream and miss part of the talk, the livestream will be saved to our channel once it ends and you will be able to go back and watch any part of it you may be interested in.

Each day, the Q & A livestream will begin at 13:50 UTC. Each speaker will be in the livestream 5 minutes before his talk begins and will be available to answer questions for the duration of the talk and for up to 15 minutes after. As I said above, you may ask questions in the live chat of the talk, but you may also ask them in the live chat of the livestream (and will likely have to if you have questions after the talk ends). Depending on the amount of time available, the number of questions, and the speaker’s schedule, each speaker may stay longer than 15 minutes after the talk, but is not required to.

Please note that speakers are not expected to answer off-topic questions. It’s entirely up to them if they do so.

I’ll be hosting the livestream throughout each day. I’ll be chatting with the speakers about their talks and D in general to fill in the dead time when no one is asking questions. After the conference is over, I intend to chop up the livestream and upload the Q & A session for each talk as separate videos.

On Day One, we have an Ask us Anything session scheduled with Walter and Átila. This will take place in the Q & A livestream for that day. We also have a livecoding session by Adam Ruppe scheduled. That will take place in a separate livestream when the Day One Q & A livestream ends (links below). Adam will be monitoring the chat as he codes, so he will answer any questions you have.

The livestream links:

BeerConf

From 18:00 UTC November 20, we’ll be running a Jitsi Meet instance for our online version of BeerConf. Everyone is welcome to join, no alcohol required. If you aren’t familiar with BeerConf, you can read a brief description of it on the DConf Online 2020 website. You can also read about it here on the blog.

BeerConf will run all weekend long. You can come and go as you please, during talks, in between talks, day time, night time, anytime!

See this D forum thread for details on how to join.

The prizes

Throughout the event, I’ll be announcing different ways for viewers to win various prizes. We’ll be handing out t-shirts, coffee mugs, and other items from the DLang Swag Emporium (and maybe a DMan shirt or two). I’ll announce the details in the Q & A livestream and, if a talk is ongoing, in the talk’s live chat. Sometimes, winning the prize may involve tweeting, in which case I’ll announce the details on Twitter, so be sure to follow us if you aren’t already.

Additionally, everyone who asks a question to which a speaker provides an answer will be entered into at least two random drawings. There will be one random drawing at the end of each day which includes those eligible on that day. The winners of these drawings will receive a $50 Amazon eGift card. The winner of the two-day drawing will receive a $100 Amazon eGift card. If you win on Day One, you will not be eligible to win on Day Two, but both winners will be eligible to win the two-day drawing.

Funding for all prizes comes from the D Language Foundation General Fund. You can contribute by buying DConf Online 2020 swag or other items from the DLang Swag Emporium, by selecting the D Language Foundation as your preferred AmazonSmile charity and shopping through smile.amazon.com, or by donating directly to the General Fund.

Swag prize winners will be announced in a talk’s live chat and/or the Q & A livestream, depending on the nature of the prize task. For prize tasks that take place on Twitter, winners will not be announced, but will be notified through private message. Amazon eGift card winners will be announced in the livestream. Since YouTube apparently no longer allows private messages, winners on YouTube will be instructed on how to claim their prize when they are announced in the livestream.

Enjoy!

We want to thank all of our speakers for volunteering their time to put together these presentations and making themselves available for Q & A. Without them, this event would not be possible. We hope you enjoy DConf Online 2020!

D 2.094.0, DConf Online Schedule, and SAOC 2020

The end of September saw a new release of the reference D compiler, DMD 2.094.0, sporting the latest language features. That was followed not long after by a beta release of LDC, the LLVM-based D compiler, based on the same frontend version. The DMD 2.094.1 patch release entered into beta a few days before this post was published. Meanwhile, the first Milestone of the Symmetry Autumn of Code has come to an end, and the DConf Online 2020 schedule has been published.

DMD 2.094.0

This release of DMD incorporates 21 major changes and 119 fixed Bugzilla issues, thanks to the efforts of 49 contributors. Here are some highlights.

This ain’t your grandpa’s in parameter

Back in the days of yore, when DMD was still a pre-1.000 alpha, the D language supported in, out, and inout parameter storage classes. They had the following meanings:

  • in (input), the default, was the bog standard function parameter which is a mutable copy of its argument, i.e., the normal passed-by-value parameter.
  • out (output) parameters were passed by reference and, upon function entry, initialized to the default initializer value for the parameter type (e.g., 0 for int, float.nan for float, etc).
  • inout (input/output) parameters were passed unmodified by reference.

When D2 came along, there were some changes. inout was replaced by the ref keyword and out kept the same meaning, but now there was an explicit restriction that these parameters could only take lvalue references; rvalue references, commonly used in C++, were forbidden as arguments. With in, things became a little muddy. And that brings us to scope parameters, a D2 feature that has evolved over time.

For quite some time, it was not fully implemented and only affected parameters that were delegates: the compiler would only allocate a closure for a scope delegate if it absolutely needed to. The D2 version of in was intended to be equivalent to const scope, but it was never fully implemented and was effectively equivalent to const. Today, scope is intended to be applied to ref or out parameters to prevent them from escaping the function scope, and with DMD 2.092.0, in finally became equivalent to const scope. In DMD 2.094.0, in has been reimagined and extended to solve the rvalue reference issue.
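The effect of scope on delegate parameters can be seen in a small sketch. The function names apply and addOffset are invented for this example. Because the delegate parameter is marked scope, the lambda that captures a local variable does not need a heap-allocated closure, so the caller can itself be @nogc.

```d
// `scope` promises that `dg` does not outlive the call.
int apply(scope int delegate(int) @nogc dg, int x) @nogc
{
    return dg(x);
}

int addOffset(int x) @nogc
{
    int offset = 10;
    // The lambda captures `offset`. Since the parameter is `scope`, no
    // heap closure is required, so this compiles inside a @nogc function.
    return apply(n => n + offset, x);
}

void main()
{
    assert(addOffset(1) == 11);
}
```

Removing scope from the parameter should make the compiler reject addOffset, since the closure would then have to be allocated with the GC.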

The first thing to know about the new in is that it’s still equivalent to const scope, but now the compiler is free to choose whether to pass an in parameter’s argument by reference or by value. The second thing to know is that in parameters can now take rvalue references. All of this is implemented behind the -preview=in command line switch first introduced in 2.092.0.
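Here is a small sketch of what that looks like in practice. The Big type and the function names are invented for the example. It is meant to be built with -preview=in, although it should also compile without the switch, in which case in falls back to its older meaning and the argument is simply passed by value.

```d
// Build with: dmd -preview=in example.d
struct Big
{
    int[64] data;
}

// `ref const` avoids a copy but only binds to lvalues.
int firstByRef(ref const Big b) { return b.data[0]; }

// With -preview=in, the compiler may pass `b` by reference or by value,
// and rvalue arguments are accepted.
int firstByIn(in Big b) { return b.data[0]; }

Big make(int seed)
{
    Big b;
    b.data[0] = seed;
    return b;
}

void main()
{
    Big lvalue = make(1);
    assert(firstByRef(lvalue) == 1);
    assert(firstByIn(lvalue) == 1);

    // firstByRef(make(2)); // error: an rvalue cannot bind to `ref`
    assert(firstByIn(make(2)) == 2); // fine: `in` accepts rvalues
}
```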

Like any preview feature, the new in may or may not make it into the language proper, and if it does it might not be without changes. But for now, it’s there and waiting to be put through its paces. The more people using it, pushing it, and looking for holes, the sooner we can know if this is the in we’re looking for.

Ddoc Markdown support

Quite a while ago, Ddoc, D’s built-in documentation syntax, was enhanced to support some Markdown features. It was hidden behind a -preview switch. Now, that switch is no longer necessary: Ddoc supports Markdown out of the box.

Note that this is not full-on Markdown. For example, although asterisks are supported for italic and bold text, underscores are not. But Markdown-style links, code blocks, inline code, and images are supported. For the details, see the Documentation Generator documentation.
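For illustration, a documentation comment using the supported subset might look like the sketch below. The add function is just a placeholder. Generating the HTML with dmd -D should render the emphasis, the inline code, and the link.

```d
/**
 * Adds two integers and returns the *sum*, **not** the product.
 *
 * Inline code such as `add(1, 2)` uses backticks, and links use the
 * familiar form: [Ddoc specification](https://dlang.org/spec/ddoc.html).
 *
 * Params:
 *   a = the first operand
 *   b = the second operand
 */
int add(int a, int b)
{
    return a + b;
}

void main()
{
    assert(add(1, 2) == 3);
}
```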

More speed please

Since the release of DMD 2.091.0, the DMD binaries in the Windows release packages have been compiled with LDC. This is a good thing because LDC has a better optimizer than DMD, which makes DMD’s fast compile times even faster. Now, LDC is also used to compile the binary releases on Linux, OSX, and FreeBSD. As a side effect, there are now no more 32-bit releases for FreeBSD, and additional binary tools are no longer included. If you need them, you can still pick them up from https://digitalmars.com/ or from older DMD releases.

Download

The latest release of DMD is always available for download at https://dlang.org/download.html. The latest Beta or Release Candidate can always be found there as well. You can also find links to download LDC and GDC, the GCC-based D compiler (which is now an official component of GCC). While you’re there, if you enjoy the D programming language, consider leaving a tip to the D Language Foundation.

DConf Online 2020 Schedule

DConf Online 2020 is coming together nicely. Over the two days of November 21 and 22, we have nine prerecorded talks, a livestream Q & A with the language maintainers, and a livecoding session. We’ll also be bringing our annual real-world BeerConf to the virtual world.

The talks

The prerecorded talks will be scheduled to premiere on our YouTube channel at the UTC times listed on the schedule. For the duration of each talk and for 15 minutes after, each speaker will be available in a separate livestream for questions and answers related to the talk. We want to record the questions and answers verbally for posterity. The idea is that viewers of the prerecorded talk can ask questions in the video’s chat, or ask in the livestream chat during or up to 15 minutes after the talk. The speaker will read the questions out loud. Short answers will be provided both verbally and in the chat. Longer answers will be provided verbally only. Commenters asking questions during the talk will be notified in the chat if their questions were selected so that they don’t have to tab out to the Q & A and miss a portion of the talk. They can go back and watch the Q & A video later on our YouTube channel.

The livestream Q & A with the language maintainers will run on our YouTube channel. We’ll be streaming a video conference call and questions will be taken from the livestream chat. During the livestream, some viewers will be invited to join in on the conference call and ask their question directly in order to provide more opportunity for follow up and feedback. Details on how to participate will be released on the day of the livestream.

Throughout the weekend, we’ll be handing out prizes to random viewers. Eligibility details will be provided during the course of the event, so pay attention!

BeerConf

BeerConf is a real-world DConf tradition dating back to the first edition of the conference, though the name didn’t come around until Ethan Watson coined it a few years later. Every year, we designate a gathering spot where DConf attendees can mingle every evening to unwind. The DConf days are where we all wear our D programmer hats and spend our time talking about our favorite programming language, but BeerConf is our chance to be human. We still talk about D, but we also have the opportunity to go beyond the code and get to know each other on a more personal level.

So for DConf Online, we’re taking BeerConf online. On the evening (UTC) of Friday, November 20, we’ll open the BeerConf video conference to any and all, and we’ll leave it open all weekend. Despite the name, no alcohol is required to participate. All you need is an internet connection and a web browser, and you can come and go as you please. We’ve been running monthly BeerConf events since June of this year, so we know that, though it’s not quite the same as being in the same place, it’s still a lot of fun.

We hope to see you November 20–22 in BeerConf and DConf!

Symmetry Autumn of Code

We are currently running our third annual Symmetry Autumn of Code (SAOC). Sponsored by Symmetry Investments, the event provides an opportunity for D programmers to make a little money working on projects aimed at improving the D ecosystem. Participants each get paid $1000 for the successful completion of each of three milestones. At the end of a fourth milestone, the progress of each participant will be evaluated by the SAOC committee, then one participant will be awarded a final $1000 payment, and receive free registration and reimbursement for transportation and lodging for the next real-world DConf.

We currently have four programmers coding away toward their goals. Milestone 1 has just come to an end and Milestone 2 is set to begin. The participants will soon be sending in their milestone reports, their mentors will send in progress evaluations, and the SAOC Committee will review it all to determine if everyone has put forth the effort required to continue through the event (we expect no issues on that front!). You can follow the progress of each participant, and perhaps provide them with some timely advice, through their weekly updates in the D General Forum. Search for “SAOC2020”.

Function Generation in D: The Good, the Bad, the Ugly, and the Bolt

Introduction

A while ago, Andrei Alexandrescu started a thread in the D Programming Language forums, titled “Perfect forwarding”, about a challenge which came up during the July 2020 beerconf:

Write an idiomatic template forward that takes an alias fun and defines (generates) one overload for each overload of fun.

Several people proposed solutions. In the course of the discussion, it arose that there is sometimes a need to alter a function’s properties, like adding, removing, or changing user-defined attributes or parameter storage classes. This was exactly the problem I struggled with while trying to support all the bells and whistles of D functions in my openmethods library.

Eventually, I created a module that helps with this problem and contributed it to the bolts meta-programming library. But before we get to that, let’s first take a closer look at function generation in D.

The Good

I speculate that many a programmer who is a moderately advanced beginner in D would quickly come up with a mostly correct solution to the “Perfect forwarding” challenge, especially those with a C++ background who have an interest in performing all sorts of magic tricks by means of template meta-programming. The solution will probably look like this:

template forward(alias fun)
{
  import std.traits: Parameters;
  static foreach (
    ovl; __traits(getOverloads, __traits(parent, fun), __traits(identifier, fun))) {
    auto forward(Parameters!ovl args)
    {
      return ovl(args);
    }
  }
}

...

int plus(int a, int b) { return a + b; }
string plus(string a, string b) { return a ~ b; }

assert(forward!plus(1, 2) == 3);        // pass
assert(forward!plus("a", "b") == "ab"); // pass

This solution is not perfect, as we shall see, but it is not far off either. It covers many cases, including some that a beginner may not even be aware of. For example, forward handles the following function without dropping function attributes or parameter storage classes:

class Matrix { ... }

Matrix times(scope const Matrix a, scope const Matrix b) pure @safe
{
  return ...;
}

pragma(msg, typeof(times));
// pure @safe Matrix(scope const(Matrix) a, scope const(Matrix) b)

pragma(msg, typeof(forward!times));
// pure @safe Matrix(scope const(Matrix) _param_0, scope const(Matrix) _param_1)

It even handles user-defined attributes (UDAs) on parameters:

struct testParameter;

void testPlus(@testParameter int a, @testParameter int b);

pragma(msg, typeof(testPlus));
// void(@(testParameter) int a, @(testParameter) int b)

pragma(msg, typeof(forward!testPlus));
// void(@(testParameter) int a, @(testParameter) int b)

Speaking of UDAs, that’s one of the issues with the solution above: it doesn’t carry function UDAs. It also doesn’t work with functions that return a reference. Both issues are easy to fix:

template forward(alias fun)
{
  import std.traits: Parameters;
  static foreach (ovl; __traits(getOverloads, __traits(parent, fun), __traits(identifier, fun)))
  {
    @(__traits(getAttributes, fun)) // copy function UDAs
    auto ref forward(Parameters!ovl args)
    {
      return ovl(args);
    }
  }
}

This solution is still not 100% correct though. If the forwardee is @trusted, the forwarder will be @safe:

@trusted void useSysCall() { ... }

pragma(msg, typeof(&useSysCall));         // void function() @trusted
pragma(msg, typeof(&forward!useSysCall)); // void function() @safe

This happens because the body of the forwarder consists of a single statement calling the useSysCall function. Since calling a trusted function is safe, the forwarder is automatically deemed safe by the compiler.
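The mismatch can also be pinned down with static assert rather than pragma(msg). The sketch below uses a trimmed copy of the forwarder (without the UDA line) so that it stands alone. The empty body of useSysCall is a placeholder.

```d
import std.traits : Parameters;

// Trimmed copy of the eponymous forwarder from above.
template forward(alias fun)
{
    static foreach (ovl; __traits(getOverloads, __traits(parent, fun), __traits(identifier, fun)))
    {
        auto ref forward(Parameters!ovl args)
        {
            return ovl(args);
        }
    }
}

@trusted void useSysCall() {}

// The forwardee is @trusted, but the forwarder is inferred as @safe.
static assert(is(typeof(&useSysCall) == void function() @trusted));
static assert(is(typeof(&forward!useSysCall) == void function() @safe));

void main()
{
    forward!useSysCall();
}
```

A check like this is handy in a test suite for code that generates functions, since attribute drift is otherwise easy to miss.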

The Bad

However, Andrei’s challenge was not exactly what we discussed in the previous section. It came with a bit of pseudocode that suggested the template should not be eponymous. In other words, I believe that the exact task was to write a template that would be used like this: forward!fun.fun(...). Here is the pseudocode:

// the instantiation of forward!myfun would be (stylized):

template forward!myfun
{
    void myfun(int a, ref double b, out string c)
    {
        return myfun(a, b, c);
    }
    int myfun(in string a, inout double b)
    {
        return myfun(a, b);
    }
}

Though this looks like a small difference, if we want to implement exactly this, a complication arises. In the eponymous forward, we did not need to create a new identifier; we simply used the template name as the function name. Thus, the function name was fixed. Now we need to create a function with a name that depends on the forwardee’s name. And the only way to do this is with a string mixin.

The first time I had to do this, I tried the following:

template forward(alias fun)
{
  import std.format : format;
  import std.traits: Parameters;
  enum name = __traits(identifier, fun);
  static foreach (ovl; __traits(getOverloads, __traits(parent, fun), name)) {
    @(__traits(getAttributes, fun))
    auto ref mixin(name)(Parameters!ovl args)
    {
      return ovl(args);
    }
  }
}

This doesn’t work because a string mixin must produce a complete syntactic unit, such as an expression, a statement, or a declaration; it cannot stand in for just the name inside a declaration. Therefore, the solution is to simply expand the mixin to encompass the entire function definition. The token-quote operator q{} is very handy for this:

template forward(alias fun)
{
  import std.format : format;
  import std.traits: Parameters;
  enum name = __traits(identifier, fun);
  static foreach (ovl; __traits(getOverloads, __traits(parent, fun), name)) {
    mixin(q{
        @(__traits(getAttributes, fun))
          auto ref %s(Parameters!ovl args)
        {
          return ovl(args);
        }
      }.format(name));
  }
}

Though string mixins are powerful, they are essentially C macros. For many D programmers, resorting to a string mixin can feel like a defeat.
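Stripped of the forwarding details, the mechanism is small. This sketch, with the made-up name answer, builds a complete function declaration as a string at compile time and mixes it in:

```d
import std.format : format;

// The name is decided at compile time; in the forwarder it would come
// from __traits(identifier, fun).
enum name = "answer";

// The whole declaration is assembled as a string and then compiled.
mixin(q{
    int %s()
    {
        return 42;
    }
}.format(name));

void main()
{
    assert(answer() == 42);
}
```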

Let us now move on to a similar, yet significantly more difficult, challenge:

Write a class template that mocks an interface.

For example:

interface JsonSerializable
{
  string asJson() const;
}

void main()
{
  auto mock = new Mock!JsonSerializable();
}

Extrapolating the techniques acquired during the previous challenge, a beginner would probably try this first:

class Mock(alias Interface) : Interface
{
  import std.format : format;
  import std.traits: Parameters, ReturnType;
  static foreach (member; __traits(allMembers, Interface)) {
    static foreach (fun; __traits(getOverloads, Interface, member)) {
      mixin(q{
          @(__traits(getAttributes, fun))
          auto ref %s(Parameters!fun args)
          {
            // record call
            static if (!is(ReturnType!fun == void)) {
              return ReturnType!fun.init;
            }
          }
        }.format(member));
    }
  }
}

Alas, this fails to compile, throwing errors like:

Error: function `challenge.Mock!(JsonSerializable).Mock.asJson` return type
inference is not supported if may override base class function

In other words, auto cannot be used here. We have to fall back to explicitly specifying the return type:

class Mock(alias Interface) : Interface
{
  import std.format : format;
  import std.traits: Parameters, ReturnType;
  static foreach (member; __traits(allMembers, Interface)) {
    static foreach (fun; __traits(getOverloads, Interface, member)) {
      mixin(q{
          @(__traits(getAttributes, fun))
          ReturnType!fun %s(Parameters!fun args)
          {
            // record call
            static if (!is(ReturnType!fun == void)) {
              return ReturnType!fun.init;
            }
          }
        }.format(member));
    }
  }
}

This will not handle ref functions though. What about adding a ref in front of the return type, like we did in the first challenge?

// as before
          ref ReturnType!fun %s(Parameters!fun args) ...

This will fail with all the functions in the interface that do not return a reference.

The reason why everything worked almost magically in the first challenge is that we called the wrapped function inside the template. It enabled the compiler to deduce almost all of the characteristics of the original function and copy them to the forwarder function. But we have no model to copy from here. The compiler will copy some of the aspects of the function (pure, @safe, etc.) to match those of the overridden function, but not some others (ref, const, and the other modifiers).

Then, there is the issue of the function modifiers: const, immutable, shared, and static. These are yet another category of function “aspects”.

At this point, there is no other option than to analyze some of the function attributes by means of traits, and convert them to a string to be injected in the string mixin:

      mixin(q{
          @(__traits(getAttributes, fun))
          %sReturnType!fun %s(Parameters!fun args)
          {
            // record call
            static if (!is(ReturnType!fun == void)) {
              return ReturnType!fun.init;
            }
          }
        }.format(
            (functionAttributes!fun & FunctionAttribute.const_ ? "const " : "")
          ~ (functionAttributes!fun & FunctionAttribute.ref_ ? "ref " : "")
          ~ ...,
          member));
    }

If you look at the implementation of std.typecons.wrap, you will see that part of the code deals with synthesizing bits of a string mixin for the storage classes and modifiers.
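One way to keep that string synthesis in one place is a small helper template. The sketch below is illustrative only: the name returnPrefix and the Counter interface are invented here, and only three attributes are handled.

```d
import std.stdio : writeln;
import std.traits : functionAttributes, FunctionAttribute;

// Hypothetical helper: renders a few attributes of `fun` as a prefix that
// can be pasted in front of a generated declaration.
template returnPrefix(alias fun)
{
    enum attrs = functionAttributes!fun;
    enum returnPrefix =
        ((attrs & FunctionAttribute.const_) ? "const " : "")
      ~ ((attrs & FunctionAttribute.shared_) ? "shared " : "")
      ~ ((attrs & FunctionAttribute.ref_) ? "ref " : "");
}

interface Counter
{
    ref int slot();
    int total() const;
}

static assert(returnPrefix!(Counter.slot) == "ref ");
static assert(returnPrefix!(Counter.total) == "const ");

void main()
{
    writeln("slot:  '", returnPrefix!(Counter.slot), "'");
    writeln("total: '", returnPrefix!(Counter.total), "'");
}
```

The helper is an enum template rather than a function so that it can be instantiated with a member function without needing an object instance.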

The Ugly

So far, we have looked at the function storage classes, modifiers, and UDAs, but we have merely passed the parameter list as a single, monolithic block. However, sometimes we need to perform adjustments to the parameter list of the generated function. This may seem far-fetched, but it does happen. I encountered this problem in my openmethods library. During the “Perfect forwarding” discussion, it appeared that I was not the only one who wanted to do this.

I won’t delve into the details of openmethods here (see an older blog post for an overview of the module); for the purpose of this article, it suffices to say that, given a function declaration like this one:

Matrix times(virtual!Matrix a, double b);

openmethods generates this function:

Matrix dispatcher(Matrix a, double b)
{
  return resolve(a)(a, b);
}

The virtual template is a marker; it indicates which parameters should be taken into account (i.e., passed to resolve) when picking the appropriate specialization of times. Note that only a is passed to the resolve function—that is because the first parameter uses the virtual! marker and the second does not.

Bear in mind, though, that dispatcher is not allowed to use the type of the parameters directly. Inside the openmethods module, there is no Matrix type. Thus, when openmethods is handed a function declaration, it needs to synthesize a dispatcher function that refers to the declaration’s parameter types exclusively via the declaration. In other words, it needs to use the ReturnType and Parameters templates from std.traits to extract the types involved in the declaration – just like we did in the examples above.

Let’s put aside function attributes and UDAs – we already discussed those in the previous section. The obvious solution then seems to be:

ReturnType!times dispatcher(
  RemoveVirtual!(Parameters!times[0]) a, Parameters!times[1] b)
{
  return resolve(a)(a, b);
}

pragma(msg, typeof(&dispatcher)); // Matrix function(Matrix, double)

where RemoveVirtual is a simple template that peels off the virtual! marker from the type.
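A template of that kind could be sketched as follows. The virtual struct here is a hypothetical stand-in for the marker, and the actual definitions in openmethods may well differ.

```d
// Hypothetical stand-in for the marker; openmethods defines its own.
struct virtual(T) {}

// Yields U for virtual!U and leaves every other type unchanged.
template RemoveVirtual(T)
{
    static if (is(T == virtual!U, U))
        alias RemoveVirtual = U;
    else
        alias RemoveVirtual = T;
}

class Matrix {}

static assert(is(RemoveVirtual!(virtual!Matrix) == Matrix));
static assert(is(RemoveVirtual!double == double));

void main() {}
```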

Does this preserve parameter storage classes and UDAs? Unfortunately, it does not:

@nogc void scale(ref virtual!Matrix m, lazy double by);

@nogc ReturnType!scale dispatcher(RemoveVirtual!(Parameters!scale[0]) a, Parameters!scale[1] b)
{
  return resolve(a)(a, b);
}

pragma(msg, typeof(&dispatcher)); // void function(Matrix a, double b)

We lost the ref on the first parameter and the lazy on the second. What happened to them?

The culprit is Parameters. This template is a wrapper around an obscure feature of the is operator used in conjunction with the __parameters type specialization. And it is quite a strange beast. We used it above to copy the parameter list of a function, as a whole, to another function, and it worked perfectly. The problem is what happens when you try to process the parameters separately. Let’s look at a few examples:

pragma(msg, Parameters!scale.stringof); // (ref virtual!(Matrix), lazy double)
pragma(msg, Parameters!scale[0].stringof); // virtual!(Matrix)
pragma(msg, Parameters!scale[1].stringof); // double

We see that accessing a parameter individually returns the type… and discards everything else!

There is actually a way to extract everything about a single parameter: use a slice instead of an element of the parameter pack (yes, this is getting strange):

pragma(msg, Parameters!scale[0..1].stringof); // (ref virtual!(Matrix))
pragma(msg, Parameters!scale[1..2].stringof); // (lazy double)
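The difference between indexing and slicing can also be checked at compile time rather than read off the `pragma(msg)` output. The sketch below uses a simplified `scale` with an `int` first parameter (an assumption, to keep it self-contained) and two hypothetical copies, `viaIndex` and `viaSlice`.

```d
// Compile-time checks only; build with `dmd -main`.
import std.traits : Parameters, ParameterStorageClass,
    ParameterStorageClassTuple;

alias STC = ParameterStorageClass;

void scale(ref int m, lazy double by);

// Indexing the pack yields a bare type...
static assert(is(Parameters!scale[0] == int));

// ...so a copy built from elements loses `ref` and `lazy`.
void viaIndex(Parameters!scale[0] a, Parameters!scale[1] b);
alias indexed = ParameterStorageClassTuple!viaIndex;
static assert(indexed[0] == STC.none);
static assert(indexed[1] == STC.none);

// A copy built from one-element slices keeps them.
void viaSlice(Parameters!scale[0 .. 1] a, Parameters!scale[1 .. 2] b);
alias sliced = ParameterStorageClassTuple!viaSlice;
static assert(sliced[0] == STC.ref_);
static assert(sliced[1] == STC.lazy_);
```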

So this gives us a solution for handling the second parameter of scale:

ReturnType!scale dispatcher(???, Parameters!scale[1..2]) { ... }

But what can we put in place of the ??? placeholder? RemoveVirtual!(Parameters!scale[0..1]) would not work. RemoveVirtual expects a type, and Parameters!scale[0..1] is not a type; it is a sort of conglomerate that contains a type, and perhaps storage classes, type constructors, and UDAs.

At this point, we have no other choice but to construct a string mixin once again. Something like this:

mixin(q{
    %s ReturnType!(scale) dispatcher(
      %s RemoveVirtual!(Parameters!(scale)[0]) a,
      Parameters!(scale)[1..2] b)
    {
        resolve(a)(a, b);
    }
  }.format(
    functionAttributes!scale & FunctionAttribute.nogc ? "@nogc " : ""
    /* also handle other function attributes */,
    __traits(getParameterStorageClasses, scale, 0)));

pragma(msg, typeof(dispatcher)); // @nogc void(ref Matrix a, lazy double)

This is not quite sufficient though, because it still doesn’t take care of parameter UDAs.
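For the storage-class part alone, the mixin can be made a little more defensive by joining whatever `__traits(getParameterStorageClasses, ...)` returns into a single prefix, so that parameters with zero or several storage classes are handled too. The sketch below is an illustration under simplifying assumptions: `storageClassPrefix` and `scaled` are hypothetical names, and the parameter types are plain `int` and `long` rather than `Matrix`.

```d
// Compile-time checks only; build with `dmd -main`.
import std.format : format;
import std.traits : Parameters, ParameterStorageClass,
    ParameterStorageClassTuple;

// Joins the storage classes of parameter `i` of `fun` into a prefix such
// as "ref " (or "" when the parameter has none).
string storageClassPrefix(alias fun, size_t i)()
{
    string prefix;
    foreach (sc; __traits(getParameterStorageClasses, fun, i))
        prefix ~= sc ~ " ";
    return prefix;
}

void scale(ref int m, lazy double by);

// Declare `scaled`: the first parameter's type is replaced by long and its
// storage classes are re-applied; the second is copied via the slice trick.
mixin(q{
    void scaled(%s long a, Parameters!(scale)[1 .. 2] b);
}.format(storageClassPrefix!(scale, 0)));

static assert(storageClassPrefix!(scale, 0) == "ref ");
static assert(storageClassPrefix!(scale, 1) == "lazy ");

alias stc = ParameterStorageClassTuple!scaled;
static assert(stc[0] == ParameterStorageClass.ref_);
static assert(stc[1] == ParameterStorageClass.lazy_);
```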

To Boltly Refract…

openmethods once contained kilometers of mixin code like the above. Such heavy use of string mixins was too ugly and messy, so much so that the project began to feel less like fun and more like work. So I decided to sweep all the ugliness under a neat interface, once and for all. The result was a “refraction” module, which I later carved out of openmethods and donated to Ali Akhtarzada’s excellent bolts library. bolts attempts to fill in the gaps and bring some regularity to D’s motley set of meta-programming features.

refraction’s entry point is the refract function template. It takes a function and an “anchor” string, and returns an immutable Function object that captures all the aspects of a function. Function objects can be used at compile-time. It is, actually, their raison d’être.

Function has a mixture property that returns a declaration for the original function. For example:

Matrix times(virtual!Matrix a, double b);
pragma(msg, refract!(times, "times").mixture);
// @system ReturnType!(times) times(Parameters!(times) _0);

Why does refract need the anchor string? Can’t the string "times" be inferred from the function by means of __traits(identifier...)? Yes, it can, but in real applications we don’t want to use this. The whole point of the library is to be used in templates, where the function is typically passed to refract via an alias. In general, the function’s name has no meaning in the template’s scope—or if, by chance, the name exists, it does not name the function. All the meta-expressions used to dissect the function must work in terms of the local symbol that identifies the alias.

Consider:

module matrix;

Matrix times(virtual!Matrix a, double b);

Method!times timesMethod; // openmethods creates a `Method` object for each
                          // declared method

module openmethods;

struct Method(alias fun)
{
    enum returnTypeMixture = refract!(fun, "fun").returnType;
    pragma(msg, returnTypeMixture);              // ReturnType!(fun)
    mixin("alias R = ", returnTypeMixture, ";"); // ok
    pragma(msg, R.stringof);                     // Matrix
}

There is no times and no Matrix in module openmethods. Even if they existed, they could not be the times function and the Matrix class from module matrix, as this would require a circular dependency between the two modules, something that D forbids by default. However, there is a fun symbol, and it aliases to the function; thus, the return type can be expressed as ReturnType!(fun).
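The scoping issue can be reproduced without bolts. In the sketch below, a struct named `MatrixModule` stands in for a separate module (an assumption made so the example fits in one file), and `Method` is a simplified stand-in for the openmethods template. A mixin built from the function’s own name fails to resolve inside the template, while one built from the local alias works.

```d
// Compile-time checks only; build with `dmd -main`.
import std.traits : ReturnType;

struct Method(alias fun)
{
    // Works: the mixin text refers to the local alias `fun`.
    mixin("alias R = ReturnType!(fun);");

    // The function's name as seen from its declaring scope.
    enum originalName = __traits(identifier, fun);

    // Does a mixin built from that name compile in this scope?
    enum nameResolvesHere = __traits(compiles,
        { mixin("alias X = ReturnType!(" ~ originalName ~ ");"); });
}

// Stand-in for another module: `times` is not visible at module scope.
struct MatrixModule
{
    static double times(double a, double b) { return a * b; }
}

alias TimesMethod = Method!(MatrixModule.times);

static assert(is(TimesMethod.R == double));
static assert(TimesMethod.originalName == "times");
static assert(!TimesMethod.nameResolvesHere);
```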

All aspects of the function are available piecemeal. For example:

@nogc void scale(ref virtual!Matrix m, lazy double by);
pragma(msg, refract!(scale, "scale").parameters[0].storageClasses); // ["ref"]

Function also has methods that return a new Function object, with an alteration to one of the aspects. They can be used to create a variation of a function. For example:

pragma(msg,
  refract!(scale, "scale")
  .withName("dispatcher")
  .withBody(q{{ resolve(_0[0])(_0); }})
  .mixture
);

@nogc @system ReturnType!(scale) dispatcher(ref Parameters!(scale)[0] _0, lazy Parameters!(scale)[1] _1)
{
  resolve(_0[0])(_0);
}

This is the reason behind the name “refraction”: the module creates a blueprint of a function, performs some alterations on it, and returns a string—called a mixture—which, when passed to mixin, will create a new function.
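The blueprint, alter, mix in cycle can be illustrated with a toy that has nothing to do with the actual bolts implementation. `Blueprint` below is a hypothetical, drastically simplified stand-in for `Function`: it stores each aspect as a string, returns altered copies, and renders a declaration on demand.

```d
// Compile-time checks only; build with `dmd -main`.

// Hypothetical miniature of the idea; not the bolts API.
struct Blueprint
{
    string returnType, name, parameters, body_;

    // Each `with...` method returns an altered copy.
    Blueprint withName(string newName) const
    {
        return Blueprint(returnType, newName, parameters, body_);
    }

    Blueprint withBody(string newBody) const
    {
        return Blueprint(returnType, name, parameters, newBody);
    }

    // Renders the blueprint as a string suitable for `mixin`.
    string mixture() const
    {
        return returnType ~ " " ~ name ~ "(" ~ parameters ~ ")"
            ~ (body_.length ? " " ~ body_ : ";");
    }
}

enum squareBlueprint = Blueprint("int", "square", "int x", "{ return x * x; }");
enum cubeBlueprint = squareBlueprint
    .withName("cube")
    .withBody("{ return x * x * x; }");

mixin(squareBlueprint.mixture);
mixin(cubeBlueprint.mixture);

static assert(cubeBlueprint.mixture == "int cube(int x) { return x * x * x; }");
static assert(square(3) == 9);
static assert(cube(3) == 27);
```

The real module differs in one important respect described above: instead of storing literal type names, it stores expressions such as ReturnType!(fun) that are anchored on a symbol visible at the mixin site.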

openmethods needs to change the type of the first parameter while preserving storage classes. With bolts.experimental.refraction, this becomes easy:

enum original = refract!(scale, "scale");

pragma(msg,
  original
  .withName("dispatcher")
  .withParameters(
    [original.parameters[0].withType(
        "RemoveVirtual!(%s)".format(original.parameters[0].type)),
     original.parameters[1],
    ])
  .withBody(q{{
      return resolve(_0)(%s);
   }}.format(original.argumentMixture))
  .mixture
);

This time, the generated code splits the parameter pack into individual components:

@nogc @system ReturnType!(scale) dispatcher(
  ref RemoveVirtual!(Parameters!(scale)[0]) _0, Parameters!(scale)[1..2] _1)
{
  return resolve(_0)(_0, _1);
}

Note how the first and second parameters are handled differently. The first parameter is cracked open because we need to replace the type. That forces us to access the first Parameters value via indexing, and that loses the storage classes, UDAs, etc. So they need to be re-applied explicitly.

On the other hand, the second parameter does not have this problem. It is not edited; thus, the Parameters slice trick can be used. The lazy is indeed there, but it is inside the parameter conglomerate.

Conclusion

Initially, D looked almost as good as Lisp for generating functions. As we tried to gain finer control of the generated function, our code started to look a lot more like C macros; in fact, in some respects, it was even worse: we had to put an entire function definition in a string mixin just to set its name.

This is due to the fact that D is not as “regular” a language as Lisp. Some of the people helming the evolution of D are working on changing this, and it is my hope that an improved D will emerge in the not-too-distant future.

In the meantime, the experimental refraction module from the bolts meta-programming library offers a saner, easier way of generating functions without compromising on the idiosyncrasies that come with them. It allows you to pretend that functions can be disassembled and reassembled at will, while hiding all the gory details of the string mixins that are necessarily involved in that task.


Jean-Louis Leroy is not French, but Belgian. He got his first taste of programming from an HP-25 calculator. His first real programming language was Forth, where CTFE is pervasive. Later he programmed (a little) in Lisp and Smalltalk, and (a lot) in C, C++, and Perl. He now works for Bloomberg LP in New York. His interests include object-relational mapping, open multi-methods, DSLs, and language extensions in general.

Symmetry Investments and the D Language Foundation are Hiring

The D Language Foundation is hiring! Thanks to generous funding from Symmetry Investments, we are looking to fill two (mostly) non-programming positions geared toward improving the D ecosystem. Symmetry is also offering a bounty for a specific improvement to DUB, the D build tool and package manager. And on top of all of that, they are hiring D programmers.

D Pull Request/Issue Manager

A lot of good work goes into the D Programming Language GitHub repositories. Unfortunately, some of that good work sometimes gets left behind. A similar story can be told for our Bugzilla database, where some issues are fixed almost as soon as they’re reported and others fall victim to a lack of attention. Efforts have been made in the past to tidy things up, but without someone in a position to permanently keep at it, it’s a task that is never complete.

The D Language Foundation is looking for one or two motivated individuals to take on that permanent position, get the work done, and keep things running smoothly. Symmetry Investments is generously funding this role with $50,000 per year for one person, or $25,000 per year for each of two.

The ideal candidate is someone who:

  • is familiar with git, GitHub, and Bugzilla;
  • is familiar enough with D to be able to review simple pull requests;
  • is able to recognize when more specialized reviews are required and
  • is able to proofread English text (for reviewing documentation and web site pull requests).

Examples of the role’s responsibilities include:

  • ensuring all pull requests follow procedure;
  • reviewing simple pull requests;
  • finding appropriate reviewers for more complex pull requests;
  • ensuring that pull requests are reviewed in a timely manner;
  • reviving stale pull requests;
  • coordinating between pull request submitters and reviewers to prevent pull requests from going stale;
  • closing pull requests that are no longer valid;
  • identifying Bugzilla issues that are duplicates or invalid;
  • identifying Bugzilla issues that are candidates for bounties;
  • publicizing Bugzilla issues in need of a champion and
  • other related tasks.

We are hoping to hire from within the D community, though we will accept queries from anyone. If you are interested in taking on the role, please send your resume to social@dlang.org. You should also indicate if you are willing to do the job full time (just you) or part time (share the responsibilities with someone else).

Community Relations Assistant

I’ve been working with the D Language Foundation for the past three years. Much of what I do falls loosely in the category of Community Relations. These days, I’m in need of an assistant. Symmetry Investments is providing $600 per month for the role.

The job will involve a number of different activities as the need arises, such as:

  • seeking out guest authors and projects to highlight for the D Blog;
  • monitoring our social media accounts;
  • sending out messages from the D Language Foundation (such as thank you notes to new donors);
  • assisting with maintenance of pages at dlang.org and dconf.org;
  • assisting with the organization of events like DConf and SAOC and
  • any odd jobs that pop up now and again.

If you have good communication skills, an optimistic disposition, and enthusiasm for the D Programming Language, I’d like to talk to you. I don’t need a resume. Instead, please send an email to social@dlang.org explaining why you’re the right person for the job.

DUB Bounty

DUB has become a critical component in the D ecosystem. A significant number of projects depend on it and we need it to be able to meet a wide range of project needs. To that end, there are certainly improvements to be made. One such improvement concerns how DUB determines which of a project’s source files are in need of recompilation. Currently, DUB follows in the tradition of the venerable make and uses timestamp comparisons to make that determination.

A new generation of version control and build tools (git, buck, bazel, scons, waf, plz, and more) rely on file checksums to assess the need for action. This is a much more robust approach because it detects actual changes in file content. Timestamps can change in any number of irrelevant ways. Robustness is important if one is to depend on a build working properly even when files are moved, copied, and shared across people, machines, and teams. As hashes are fast to compute on modern hardware, the impact on speed is very low.
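As a rough illustration of the idea (not DUB’s implementation, and not a proposed design), a content-based staleness check might look like the sketch below. The names `contentHash` and `needsRebuild`, and the in-memory table of recorded hashes, are assumptions made for the example; a real tool would read file contents from disk and persist the table between builds.

```d
import std.digest : toHexString;
import std.digest.sha : sha1Of;

// SHA-1 of the content as an uppercase hexadecimal string.
string contentHash(const(ubyte)[] content)
{
    auto hex = toHexString(sha1Of(content)); // char[40] on the stack
    return hex[].idup;
}

// A file needs rebuilding when no hash is recorded for it yet, or when the
// recorded hash differs from the hash of its current content.
bool needsRebuild(const string[string] recorded, string path,
    const(ubyte)[] content)
{
    const previous = path in recorded;
    return previous is null || *previous != contentHash(content);
}

unittest
{
    string[string] recorded;
    auto source = cast(const(ubyte)[]) "void main() {}";

    assert(needsRebuild(recorded, "app.d", source));  // never built
    recorded["app.d"] = contentHash(source);
    assert(!needsRebuild(recorded, "app.d", source)); // unchanged content
    assert(needsRebuild(recorded, "app.d",
        cast(const(ubyte)[]) "void main() { }"));     // content changed
}
```

Because the decision depends only on content, copying or touching a file does not trigger a rebuild, whereas any edit does.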

Symmetry Investments is offering a $2,000 bounty to the programmer who either converts DUB from timestamp-based change detection to SHA-1 hashing throughout, or implements hashing as a global option so that the current behavior is preserved.

For inspiration, see this clip from Linus Torvalds’s Google talk, and the article Build-Systems Should Use Hashes Over Timestamps. Note that shasum $(git ls-files) in Phobos takes 0.05 seconds on a warm SSD in a desktop machine.

Anyone interested in taking on this bounty should contact social@dlang.org beforehand. Anyone interested in contributing to the bounty amount can do so via the bounty card Support for Hash-Based Recompilation in DUB at our Task Bounties page.