Category Archives: Compilers & Tools

How an Engineering Company Chose to Migrate to D

Bastiaan Veelo is the lead developer of Fairway, a specialised program for the
computer aided geometric design of ship hulls, at the company SARC in the
Netherlands.


Imagine there is this little-known programming language in which you enjoy programming in your free time. You know it is ready for prime time and you dream about using it at work every day. This is the story of how I made a dream like that come true.

My early acquaintance with D

Back when “google” was not yet a common verb, I was doing a web search for “parsing C++”. The reason was that writing a report for an assignment had derailed into writing a syntax highlighter for noweb using bison and flex, and I found out firsthand that C++ is not easy to parse. That web search brought up this page (present version) with an overview of the D Programming Language, and the following statement has had me hooked ever since:

D’s lexical analyzer and parser are totally independent of each other and of the semantic analyzer. This means it is easy to write simple tools to manipulate D source perfectly without having to build a full compiler. It also means that source code can be transmitted in tokenized form for specialized applications.

“Genius,” I thought, “here we have someone who knows what he’s doing.” This is representative of the pragmatic professionalism that still radiates from the D community, and it combines with an unpretentious flair that makes it pleasant to be around. This funny quote decorated its homepage for many years:

“Great, just what I need.. another D in programming.” – Segfault

Nevertheless, I didn’t have many opportunities to use the language and I largely remained sitting on the fence, observing its development.

Programming professionally

With mostly academic programming experience, I started programming professionally in 2006 for SARC, a Dutch engineering company serving the maritime industry. Since the early ’80s they have been developing software for ship design and onboard loading calculations, which today amounts to roughly half a million lines of code. I think their success can partly be attributed to their choice of programming language: Extended Pascal (the ISO 10206 standard, not one of the many proprietary extensions of Pascal).

Extended Pascal was a great improvement over ISO 7185 Pascal. Its compiler, by Prospero Software from England, was fast and well documented. The language is small enough and its syntax appropriately verbose to make engineering professionals quickly productive in programming. Personally though, I spent most of my time programming in C++, modernizing their system for computer aided design of ship hulls using Qt and Coin3D.

When your company outlives a programming language

Although selecting an ISO standard in favor of a proprietary Pascal dialect seemed wise at the time, it is apparent now that the company has outlived the language. Prospero Development Software Ltd was officially dissolved 15 years ago. Still, its former director, Tony Hetherington, continued giving support many years after, but he’d be close to 86 years old now and can no longer be reached. Its website is gone, last archived in 2013. There’s GNU Pascal, which also supports ISO 10206, but that project has stopped moving and long ago lost synchrony with gcc. Although there is no immediate crisis, it is clear that something needs to happen sometime if the company wants to continue its activities in the coming decades.

Changing the odds

A couple of years ago, I secretly started playing with the fantasy of replacing Extended Pascal with D. Even though D’s syntax is somewhat different from Pascal, it shares at least four important similarities: support for nested functions, boundary checking, modules, and compilation speed. In addition, it has many traits that make the language attractive to engineers: good focus on performance and numerics, garbage collection, dynamic arrays, easy parallelization, understandable templates, contract programming, memory safety, unit tests, and even wysiwyg strings and formatted numerals. D’s language features encourage experimentation, which resonates well with engineers.
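
As a small taste of two of the more unusual items on that list, here is a minimal sketch (my own, not from the evaluation) showing wysiwyg strings and formatted numerals:

import std.stdio;

void main()
{
    auto path    = r"C:\ships\hull.dat"; // wysiwyg string: backslashes need no escaping
    auto samples = 1_000_000;            // formatted numeral: underscores group digits
    writeln(path, " holds ", samples, " samples.");
}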

So I wondered what I could do to highlight D’s significance to my employer and show it’s an attractive language to switch to. I thought I could make a compelling case if I could write a parser in D that would take Extended Pascal source and transpile it to D source. At least I would have fun trying!

So I went over to code.dlang.org to see if there were any D alternatives to flex and bison. There, I found Pegged, and instantly the fun began. Pegged combines the functionality of flex and bison in one incredibly easy-to-use package, for which its creator Philippe Sigaud obviously enjoyed writing excellent documentation. Nowadays, Pegged is part of the D language tour and you can try it out online without having to install a thing. The beauty is that the grammar from the Extended Pascal language specification maps almost linearly to the PEG from which Pegged generates the parser. For this it makes heavy use of D’s generic programming capabilities and compile-time function evaluation — it can generate a parser at compile time if you want it to!
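
To give a flavor of what that looks like, here is a minimal sketch adapted from Pegged's documented arithmetic example (the actual Extended Pascal grammar is of course far larger):

import pegged.grammar;

// The mixin generates a full parser from the PEG at compile time.
mixin(grammar(`
Arithmetic:
    Term     < Factor (Add / Sub)*
    Add      < "+" Factor
    Sub      < "-" Factor
    Factor   < Primary (Mul / Div)*
    Mul      < "*" Primary
    Div      < "/" Primary
    Primary  < Parens / Number
    Parens   < "(" Term ")"
    Number   < ~([0-9]+)
`));

void main()
{
    auto tree = Arithmetic("1 + 2 * (3 - 4)");
    assert(tree.successful);
}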

However, it wasn’t smooth sailing all along. As I was testing D, I suddenly found myself being tested as well. I learned the hard way that there is a phenomenon called left-recursion, which a PEG parser typically cannot break out of. And the Extended Pascal grammar is left-recursive in several ways. Consequently, I spent many evenings and weekends researching parsing theory, until eventually I managed to extend Pegged with support for all kinds of left-recursion! From one thing came another, and I added longest match alternation, case insensitive literals, the toHTML() method for dynamically browsing the syntax tree, and a tracer for logging the parsing process.

Obviously, I was having fun. But more importantly, I was demonstrating that the D programming language is accessible enough that a naval architect can understand other people’s code and expand it in non-trivial ways. The icing on the cake came when I was asked to present my experiences at DConf 2017 in Berlin, which you can watch here (and here’s the extra bit I presented at lunch time for the livestream audience).

At this time, I was able to automatically translate the following trivial example:

program hello(output);

begin
    writeln('Hello D''s "World"!');
end.

into D:

import std.stdio;

// Program name: hello
void main(string[] args)
{
    writeln("Hello D's \"World\"!");
}

Language competition

Having come this far, the founder of SARC agreed that it was time to investigate the merits of various alternative programming languages. We would do a thorough and objective comparison based on trial translations of a comprehensive set of language features. Due to the amount of manual labor that this requires, we had to drastically prune the space of programming languages in an initial review round. Note that what I am about to present does not declare which programming language is the best in our industry. What we are looking for is a language that allows an efficient transition from Extended Pascal without interrupting our business, and which enables us to take advantage of modern insights and tools.

In the initial review round we looked at general language characteristics. Here I’ll just highlight what fell through the sieve and why.

Performance is important to us, which is why we did not consider interpreted languages. C++ is in use for one component of our software, but that was written from the ground up. We feel that the options for translation are not favorable, that its long compile times are a serious hindrance to productivity, and that there are too many ways in which one can shoot oneself in the foot. We cannot require our expert naval architects to also become experts in C++.

Nowadays, whenever D is publicly evaluated, the younger languages Go and Rust are often brought up as alternatives. Here, we need not go into an in-depth comparison of these languages because both Rust and Go lack one feature that we rely on heavily: nested functions with access to variables in their enclosing scope. Solutions for eliminating nested functions, like bringing them into global scope and passing extra variables, or breaking files up into smaller modules, we find unattractive because it would complicate automated translation, and we’d like to preserve the structure and style of our current code. GNU C does offer nested functions, but it is a non-standard extension and it has been predicted that many will move away from C due to its unsafe features. After this initial pruning, three languages remained on our shortlist: Free Pascal, Ada and D.
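
To make concrete the pattern we rely on, here is a minimal sketch (with hypothetical names, not taken from our code base) of a nested function reading and writing a variable in its enclosing scope:

double totalArea(const double[] sectionAreas)
{
    double sum = 0;

    // Nested function with direct access to the enclosing scope.
    void accumulate(double area)
    {
        sum += area; // mutates the outer variable; no extra parameters needed
    }

    foreach (a; sectionAreas)
        accumulate(a);
    return sum;
}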

As a basis for our detailed comparison, we wrote fifteen small programs that each used a specific feature of Extended Pascal that is important in our current code base. We then translated those programs into each language on our shortlist. We kept a simple score board on how well these features were represented in each language: +1 if the feature is supported or can be implemented, 0 if the lack of the feature can be worked around, and -1 if it can’t. This is what came out of that evaluation:

Test                                               Free Pascal   Ada    D
Arrays beginning at arbitrary indices                   +1        +1   +1
Sets                                                     0         0   +1
Schema types                                             0         0   +1
Types with custom initial values                        -1         0   +1
Classes                                                 +1        +1   +1
Casts                                                   +1        +1   +1
Protection against use of dangling pointers             -1        +1   +1
Thread safe memory [de]allocation                       +1        +1   +1
Calling into Windows API                                +1        +1   +1
Forwarding Windows callbacks to nested functions        +1        +1   +1
Speed of calculations                                   +1        +1   +1
Calling procedures written in assembly                  +1         0   +1
Calling procedures in a DLL                             +1        +1   +1
Binary compatibility of strings                          0        +1   +1
Binary compatible file i/o                              -1         0    0
Score                                                    6        10   14

So, Free Pascal is the only candidate with negative scores, Ada positions itself in the middle, and D achieves an almost perfect score. Not effortlessly, though; we’ll talk about some of the technical challenges later. Because Free Pascal is, like D, fully Open Source and written in itself, extending the language and filling in the gaps is theoretically possible. Although some of its deficiencies could certainly be resolved that way, others would be quite complicated and/or unlikely to be accepted upstream.

We also estimated the productivity of the languages. Free Pascal scored high because it is closest to what we are used to. Despite its dissimilar syntax, D scored high because of its expressiveness and flexibility. Ada scored lowest because of its rigidity and because of the extra work the programmer has to put in (most importantly casts and conversions). Ada is more verbose than Pascal, which some of us disliked because it can somewhat obscure the essence of what a piece of code tries to express; frequently the code became not only verbose but cryptic, which was unanimously disliked.

Third, we estimated the future prospects and the advantages each language could bring to the table. Although Free Pascal has a more active community than we expected it to have, we do not see great potential for growth. Ada is renowned for its support for writing reliable code (although it has no monopoly in that field) but it does come at a cost and requires real effort. D has a dynamic and open community, supports both script-like productivity and high performance, includes various features for writing reliable software (approaching Ada but at a much lower cost), and offers some unique advanced features with which wonders can be accomplished.

Finally, we estimated the effort of translation. Although Free Pascal is very similar to Extended Pascal, missing features pose a real problem and would require a high degree of manual translation and rewriting. Although p2ada exists, it only works partially in our case and does not fully support Extended Pascal. Because Ada frequently requires additional code (casting to the correct type, pulling in a package, instantiating a generic type, adding a pragma, splitting up Put_Lines etc.), writing or extending a reliable transpiler into Ada would be more difficult than doing the same into D.

Selecting a winner

I gave away the winner in the title, but we landed at that conclusion as follows. Ada was the first language to be dropped. We really felt that the extra work that the programmer has to put in is a brake on productivity and creativity. Although it barely played a role in our evaluation, illustrative is the difference between the Ada and D equivalents to the Expressive C++17 Challenge. The D solution is both concise and expressive; the Ada solution is hardly expressive and consists of more lines than I want to write or read. Also of secondary importance, but difficult to ignore, is the difference between the communities surrounding the languages, which in Ada’s case centers on AdaCore Support, which has no problem demanding annual five-figure subscription fees.

Although akin to our current language, Free Pascal was mainly dropped due to its porting challenges and our estimation that its potential is lower and its future outlook is less optimistic than that of D. If we were to choose Free Pascal, we would basically invest a lot of effort only to arrive at a technological solution that we felt would be of lower quality than we currently have.

And that’s where I saw a dream come true: with a clap on the table by the company founder, it was decided to commit to the effort of bringing twenty-five years’ worth of Extended Pascal code to D!

What makes a difference

In short, my experience is that if a feature is not present in the language, D is powerful enough that the feature can be implemented in a library. Translating each sample program by hand has really helped to focus on replicating functionality, leaving the translation process for later concern. This has led to writing a compatibility library with types and functions that are vital for the conversion. Now that equivalents are known and the parser is done, I just have to implement code generation.

Below follows another example that currently translates automatically and executes identically. It iterates over a fixed-length array running from 2 to 20 inclusive, fills it with values, prints the memory footprint and writes it to a binary file:

program arraybase(input,output);

type t = array[2..20] of integer;
var a : t;
    n : integer;
    f : bindable file of t;

begin
  for n := 2 to 20 do
    a[n] := n;
  writeln('Size of t in bytes is ',sizeof(a):1); { 76 }
  if openwrite(f,'array.dat') then
    begin
      write(f,a);
      close(f);
    end;
end.

Transpiled to D (or should I say Dascal?) and post-processed by dfmt to fix up formatting:

import epcompat;
import std.stdio;

// Program name: arraybase
alias t = StaticArray!(int, 2, 20);

t a;
int n;
Bindable!t f;

void main(string[] args)
{
    for (n = 2; n <= 20; n++)
        a[n] = n;
    writeln("Size of t in bytes is ", a.sizeof); // 76
    if (openwrite(f, "array.dat"))
    {
        epcompat.write(f, a);
        close(f);
    }
}

Of course this is by no means idiomatic D, but the fact that it is recognizable and readable is nice, especially for my colleagues who will have to go through an unusual transition. By the way, did you notice that code comments are preserved?

One very-nice-to-have feature is binary file compatibility. In fact, it may have been the killer feature, without which D might not have been so victorious. The case is that whenever a persistent data structure is extended in our software, we make sure that we can still read and convert that structure from its prior format. That way, if a client pulls out an old design from its archives and runs it through our current software, it will still work without the user even being aware that conversion occurs, possibly in multiple steps. Not having to give up that ability is very attractive.

But it wasn’t easy to get there. The main difficulty is the difference in how strings are represented in D and the Prospero implementation of Extended Pascal, in memory and on file. This presented the challenge of how to preserve binary compatibility in file I/O with data structures that contain string members.

Strings

In Prospero Extended Pascal, strings are implemented as a schema type, which is a parameterized type that can be used in the following ways:

type string80 = string(80);
var str1 : string80;
    str2 : string(60);
procedure foo(s : string);

This defines string80 to be an alias for a string type discriminated to have a capacity of 80 characters. Discriminated string variables, like str1 and str2, can be passed to functions and procedures that take undiscriminated strings as arguments, like foo, which thereby work on strings of any capacity. In memory, str1 is laid out as a sequence of 80 chars, followed by a ushort that encodes the length of the string. I say encodes because a shorter string is padded with \0s up to the capacity and the ushort actually contains the length of that padding. This way, when a pointer to the string is passed to a C function and the contents of the string occupy its full capacity, the 0 in the padding length doubles as the terminating \0 of the C string.
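
A hypothetical D rendering of that layout for string(80) (the struct and field names here are invented) would be:

struct ProsperoString80
{
    char[80] data;    // contents, padded with '\0' up to the capacity
    ushort padLength; // length of the padding; string length = 80 - padLength
}

When the contents fill the full capacity, padLength is 0 and doubles as the terminating \0 of the C string, as described above.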

My first thought was to mimic this data representation with a D template. But that would require procedures like foo to be turned into templates as well, which would escalate horribly into template bloat, a problem with multiple string arguments and argument ordering, and would complicate translation. Besides, schema types can also be discriminated at run time, which does not translate to a template.

Could some sort of inheritance scheme be the solution? Not really, because instances of D classes live on the heap, so a string embedded in a struct would just be a pointer instead of the char array and ushort.

But binary layout is actually only relevant in files, and in a stroke of insight I realized that this must be why user-defined attributes, or UDAs, exist. If I annotate the string with the correct capacity for file I/O, then I can just use native D strings everywhere, which genuinely must be the best possible translation and solves the function argument issue. Annotation can be done with an instance of a struct like

struct EPString
{
    ushort capacity;
}

The above Pascal snippet then translates to D like so:

@EPString(80) struct string80 { string _; alias _ this; }
string80 str1;
@EPString(60) string str2;
void foo(string s);

Notice how the string80 alias is translated into the slightly convoluted struct instead of a normal D alias, which would have looked like

@EPString(80) alias string80 = string;

Although that compiles, there is no way to retrieve the UDA in that case because plain alias does not introduce a symbol. Then hasUDA!(typeof(str1), EPString) would have been equivalent to hasUDA!(string, EPString) which evaluates to false. By using the struct, string80 is a symbol so typeof(str1) gives string80, and hasUDA!(string80, EPString) evaluates to true in this example.
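
In code, the difference plays out like this (a minimal check, assuming the declarations above):

import std.traits : hasUDA;

static assert( hasUDA!(string80, EPString)); // the struct introduces a symbol that carries the UDA
static assert(!hasUDA!(string,   EPString)); // what the plain alias version would degenerate to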

There is one side effect that we will have to learn to accept, and that is that taking a slice of a string does not produce the same result in D as it does in Extended Pascal. That is because string indices start at 1 in Extended Pascal and at 0 in D. My strategy is to eliminate slices from the source and replace them with a call to the standard substr function, which I can implement with index correction. Finding all string slices can be accomplished with a switch in the transpiler that makes it insert a static if to test if the slice is being taken on a string, and abort compilation if it is. (Arrays are transpiled into a custom array type that handles slices and indices compatibly with Extended Pascal.)
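
A minimal sketch of such an index-correcting substr, assuming Extended Pascal’s 1-based (start, length) convention:

// Returns length characters of s starting at the 1-based index start.
string substr(string s, size_t start, size_t length)
{
    return s[start - 1 .. start - 1 + length];
}

unittest
{
    assert("Fairway".substr(1, 4) == "Fair");
}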

Binary compatible file I/O

Now, to write structs to file and handle any embedded @EPString()-annotated strings specially, we can use compile-time introspection in an overload to toFile that acts on structs as shown below. I have left out handling of aliased strings for clarity, as well as shortstring, which is a legacy string type with yet a different binary format.

void toFile(S)(S s, File f) if (is(S == struct))
{
    import std.traits;
    static if (!hasIndirections!S)
        f.lockingBinaryWriter.put(s);
    else
        // TODO unions
        foreach(field; FieldNameTuple!S)
        {
            // If the member has itself a toFile method, call it.
            static if (hasMember!(typeof(__traits(getMember, s, field)), "toFile") &&
                       __traits(compiles, __traits(getMember, s, field).toFile(f)))
                __traits(getMember, s, field).toFile(f);
            // If the member is a struct, recurse.
            else static if (is(typeof(__traits(getMember, s, field)) == struct))
                toFile(__traits(getMember, s, field), f);
            // Treat strings specially.
            else static if (is(typeof(__traits(getMember, s, field)) == string))
            {
                // Look for a UDA on the member string.
                static if (hasUDA!(__traits(getMember, s, field), EPString))
                {
                    enum capacity = getUDAs!(__traits(getMember, s, field), EPString)[0].capacity;
                    static assert(capacity > 0);
                    writeAsEPString(__traits(getMember, s, field), capacity, f);
                }
                else static assert(false, `Need an @EPString(n) in front of ` ~ fullyQualifiedName!S ~ `.` ~ field );
            }
            // Just write other data members.
            else static if(!isFunction!(__traits(getMember, s, field)))
                f.lockingBinaryWriter.put(__traits(getMember, s, field));
        }
}
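
The writeAsEPString helper is not shown here; a hypothetical implementation following the in-memory layout described earlier (contents padded with '\0' up to the capacity, then a ushort holding the length of the padding) could look like this:

import std.stdio : File;

void writeAsEPString(string s, ushort capacity, File f)
{
    import std.algorithm.comparison : min;

    immutable len = min(s.length, size_t(capacity));
    f.rawWrite(s[0 .. len]);
    auto padding = new ubyte[](capacity - len); // zero-initialized padding
    f.rawWrite(padding);
    immutable ushort padLength = cast(ushort)(capacity - len);
    f.rawWrite((&padLength)[0 .. 1]);           // the encoded padding length
}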

At the time of writing, I still have work to do for unions, which are used in the translation of variant records (including considering the use of one of the seven existing library solutions 1, 2, 3, 4, 5, 6, 7).

Currently, detecting unions is a bit involved. Also, there is a complication in the determination of the size of a union when the largest variant contains strings: the D version of that variant may not be the largest because D strings are just slices. I’ll probably work around this by adding a dummy variant that is a fixed size array of bytes to force the size of the union to be compatible with Extended Pascal. This is the reason why D scored a mere 0 in file format compatibility. It is amazing what D allows you to do though, so I may be able to do all of that automatically and award D a perfect score retroactively. On the other hand, it is probably easiest to just add the dummy variant in the Pascal source at the few places where it matters and be done with it.

The way forward

Obviously, this is long term planning. It has taken years to grow into D; it will possibly take a year, and probably longer, to migrate to D. Unless others turn up who are in the same boat as us (please contribute!) it’ll be me who has to row this ship to D-land and I still have my regular duties to attend to. My colleagues will continue to develop in Extended Pascal as usual, and once my transpiler is able to translate all or almost all of it, we will make the switch to D overnight. From then on, we’ll be in it for the long run. We trust to be with D and D to be with us for decades to come!

Driving Continuous Improvement in D

Jack Stouffer is a member of the Phobos team and a contributor to dlang.org. You can check out more of his writing on his blog.


In my previous article, I went over the techniques we use in the D Standard Library (a.k.a. Phobos) to develop a wide variety of testing mechanisms. I also briefly mentioned our style checker, dscanner. In this article, I’ll detail how we use dscanner to prevent style, documentation, and best practices regressions in the code, none of which can be covered by standard unit tests.

Keeping a level of quality in software projects is a neverending battle. Bad practices and shortsighted design decisions make their way into code over time, whether from poor oversight, rushing things through, or simple code rot. There are three things we do in Phobos, other than tests, to fight this entropy.

First, Pull Requests are required to be about one thing and one thing only. What counts as “one thing” is subjective, but, for example, a PR to fix a bug in a specific piece of code mustn’t also edit the documentation of that function. This allows Phobos reviewers to merge the documentation change quickly if there’s an issue in the bug fix preventing it from being merged (or vice versa). At the same time, it keeps the reviewers focused on one set of changes.

Second, PRs need to be small; ideally less than 100 lines of code changed. If that’s not possible, they need to be broken into multiple commits of smaller changes. This really helps reviewers keep D best practices in mind, while also fully understanding and internalizing the new code.

Third, we continuously improve the existing code with dscanner, which, among other things, is D’s official linting tool.

What dscanner Can Do

As an example of the checks that dscanner can provide, let’s take a look at some code that contains a very hard-to-spot bug. The following code creates a type that mimics an int, but allows a null state:

struct NullableInt
{
    private int value;
    bool isNull;

    int get()
    {
        assert(!isNull, "can't get a null");
        return value;
    }

    void nullify() { isNull = true; }

    bool opEquals(NullableInt rhs) // D's equality overloading function
    {
        if (isNull && rhs.isNull)
            return true;
        if (isNull != rhs.isNull)
            return false;
        return value == rhs.value;
    }
}

The bug is not in the code that’s there, it’s in the code that’s not there. All structs in D have default versions of the standard operator overloading functions if they aren’t defined by the user. One of those functions provides a hash to represent the value to use in D’s built-in associative arrays. The default version uses all the type fields to make the hash, which is a problem for NullableInt, as we’ve decided that all instances of the type that are null are equal. Here’s an illustration of the bug:

void main()
{
    auto a = NullableInt(10, false);
    auto b = NullableInt(10, false);
    auto c = NullableInt(10, true);
    auto d = NullableInt(0, true);

    assert(a == b);
    assert(b != c);
    assert(c == d);

    import core.internal.hash : hashOf; // D's internal hashing function
    assert(c.hashOf != d.hashOf);
}

dscanner will emit a message every time it finds a type that defines a custom opEquals, but doesn’t define a custom toHash as well, alerting us to this bug.
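
The fix the warning points toward is to define a matching toHash so that all null instances hash equally. A minimal sketch for the type above:

size_t toHash() const nothrow @safe
{
    return isNull ? 0 : hashOf(value); // null instances all hash to the same value
}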

Continuous Improvement and “Ratcheting” Quality

dscanner ties into continuous improvement, the philosophy that improvements to processes or designs should be made in a periodic, timely manner, rather than as one-time “breakthroughs”. Such large breakthrough changes tend to be pushed back infinitely; in a phrase, it’s letting the perfect be the enemy of the good. This easily fits with the continuous delivery or rapid release philosophies, and can vastly reduce bugs in software given enough time.

By using the warnings from dscanner, we can ratchet Phobos’s quality over time. If you’ve never used a socket wrench, a ratchet is a mechanism that allows the user to freely move the wrench back and forth, but allows the bolt to spin only when the wrench moves clockwise. Similarly, we can move the quality of Phobos forward, while not letting it ever slip backwards, with dscanner. It works as follows:

  • Run dscanner with every check but one turned off on all files.
  • Populate a blacklist for that check in dscanner’s configuration file containing each of the files that issued a warning.
  • Repeat until the current code passes with all checks turned on with the blacklist enabled.

Once we have our list of blacklists, new PRs will trigger warnings for issues detected by dscanner for files that weren’t included in the blacklists, thereby keeping the status quo of quality.

Next, we ratchet quality by making a PR that fixes one issue in one file, and then removing that blacklist entry from the configuration. This way, that file will be checked in the future for every new PR. Over time, quality issues will be removed from Phobos, and they won’t crop up again. For example, from the release of 2.076 to now, Phobos has gone from 624 public symbols without examples to 328.

Currently, we’re using this ratchet technique with dscanner to

  • remove unused variables.
  • add const or immutable to variables that aren’t modified after their construction.
  • make sure every public symbol is documented.
  • make sure every symbol in Phobos that has documentation, has a code example in that documentation.
  • force every assert to have an error message to print in case it fails.
  • make sure only Exceptions, and not Errors, are caught in try/catch blocks.

among other things.

This process also has the added benefit of dogfooding dscanner, which helps us find bugs and know which features would be helpful to add from a user perspective. If you have a project that isn’t using a linting tool as part of your test suite, it’s only a matter of time before code rot creeps in.

The #dbugfix Campaign: Round 1 Report

The D Foundation released version 2.080.0 of DMD on May 2nd. That normally would have been accompanied by a blog post highlighting some of the items from the changelog. This time, it should also have included the first update on the #dbugfix campaign. I was on vacation with my wife in the days before and after DConf and never got around to writing the post. It’s a bit late to announce the DMD release now, so this post is instead all about the first round of #dbugfix.

I admit, my initial expectations of participation were low. I suppose I didn’t want to be disappointed. Happily, in the end I had no reason to be pessimistic. There were several issues “nominated” on Twitter and in the forums. I was also pleased that some of the forum posts led to substantial discussion. In fact, at least one of the forum threads led to one of the nominated bugs being fixed well before the period ended, and the fix was included in the 2.080 release.

The results

The following issues were nominated, but were fixed before the nomination period ended. #5227 is the only one I’m certain was fixed as a result of discussion from the #dbugfix campaign. The others might have been, or they may have been fixed in the normal course of activity.

The following issues were open at the time the bug fixers selected which two issues to focus on, along with the “vote” count (+1’s, retweets, or anything which indicated support for fixing the issue) as best as I could determine:

The mandate for the bug fixers is to select at least two of the top five. Of course there has to be some leeway here, given that some issues are extremely difficult to resolve, so they can look beyond the top five if they need to. In this case, given that all of the issues share only four distinct vote counts, the entirety of the list is in play anyway.

I got the bug fixers together before I headed off to my (spectacular) German vacation and asked them to come to a consensus by May 1st on two bugs to fix (I must note that they met their deadline, even though I did not). The two bugs selected were:

There are no time constraints on when these will be fixed. It is hoped that they will be fixed and merged in time for 2.081.0, but they could come in a later release. The point is, each of these is now a subject of attention.

Some details regarding a few of the issues not selected:

  • re: 15692 – a while back, Sebastian Wilzbach revived an old DIP for in-place struct initialization and is part way through an implementation.
  • re: 1983 – there’s an old PR waiting for a champion to revive it.
  • re: 18493 – Mike Franklin has a PR that’s waiting on clarification about whether or not nothrow should be implicit with -betterC.

Keep it going

Now that it’s done, I’m happy to declare the inaugural round of the #dbugfix campaign a success. I had hoped that asking for nominations in the forums as an alternative to Twitter would spark discussions, and I’m pleased to see that happened. So let’s build on this and keep it going.

The slate was wiped clean and the votes reset from the day I declared the first round finished on April 24th. I haven’t noticed any nominations since then (though I haven’t searched for any yet), but please keep them coming. If there’s an issue on the above list that’s not yet fixed but that you feel strongly about, nominate it again. Or nominate something not on the list that’s stuck in your craw. If you see a nomination on Twitter that you support, retweet it to add your vote. In the forums, give a +1 reply. Keep it going!

And while you’re nominating your issues, keep in mind that fixing bugs is only part of the story. If you really care about seeing issues fixed, please try your hand at reviewing pull requests. Contributors whose fixes languish in the PR queue are demotivated from contributing more, but without more eyes reviewing the PRs, the bandwidth to speed up the reviews isn’t there. More PR reviews beget more merges beget more closed issues beget more motivated contributors and satisfied users.

Barring a change in release schedule, DMD 2.081.0 should be released July 1. This round of nominations will close at the beginning of the last week of June. So tweet out or open a new forum thread for your #dbugfix nominee today. Please include both the #dbugfix hashtag and the bugzilla issue number in the tweet or post title, along with a link to the issue report. That will make my job considerably easier.

And review some PRs!

Thanks to Sebastian Wilzbach and Mike Franklin for their help with this, particularly in keeping me updated with info about PRs and DIPs!

LDC 1.8.0 Released

LDC, the D compiler using the LLVM backend, has been actively developed for going on a dozen years, as laid out by co-maintainer David Nadlinger in his DConf 2013 talk. It is considered one of two production compilers for D, along with GDC, which uses the gcc backend and has been accepted for inclusion into the gcc tree.

The LDC developers are proud to announce the release of version 1.8.0, following a short year and a half from the 1.0 release. This version integrates version 2.078.3 of the D front-end (see the DMD 2.078.0 changelog for the important front-end changes), which is itself written in D, making LDC one of the most prominent mixed D/C++ codebases. You can download LDC 1.8.0 and read about the major changes and bug fixes for this release at GitHub.

More platforms

Kai Nacke, the other LDC co-maintainer, talked at DConf 2016 about taking D everywhere, to every CPU architecture that LLVM supports and as many OS platforms as we can. LDC is a cross-compiler: the same program can compile code for different platforms, in contrast to DMD and GDC, where a different DMD/GDC binary is needed for each platform. Towards that end, this release continues the existing Android cross-compilation support, which was introduced with LDC 1.4. A native LDC compiler to both build and run D apps on your Android phone is also available for the Termux Android app. See the wiki page for instructions on how to use one of the desktop compilers or the native compiler to compile D for Android.
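
As a rough illustration only (the exact target triple and the required runtime libraries depend on your setup, as described on the wiki), cross-compilation with LDC comes down to selecting the target via its -mtriple switch:

ldc2 -mtriple=aarch64--linux-android hello.d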

The LDC team has also been putting out LDC builds for ARM boards, such as the Raspberry Pi 3. Download the armhf build if you want to try it out. Finally, some developers have expressed interest in using D for microservices, which usually means running in an Alpine container. This release also comes with an Alpine build of LDC, using the just-merged Musl port by @yshui. This port is brand new. Please try it out and let us know how well it works.

Linking options – shared default libraries

LDC now makes it easier to link with the shared version of the default libraries (DRuntime and the standard library, called Phobos) through the -link-defaultlib-shared compiler flag. The change was paired with a rework of linking-related options. See the new help output:

Linking options:

  -L=                                - Pass to the linker
  -Xcc=                              - Pass to GCC/Clang for linking
  -defaultlib=<lib1,lib2,…>          - Default libraries to link with (overrides previous)
  -disable-linker-strip-dead         - Do not try to remove unused symbols during linking
  -link-defaultlib-debug             - Link with debug versions of default libraries
  -link-defaultlib-shared            - Link with shared versions of default libraries
  -linker=<lld-link|lld|gold|bfd|…>  - Linker to use
  -mscrtlib=<libcmt[d]|msvcrt[d]>    - MS C runtime library to link with
  -static

Other new options

  • -plugin=... for compiling with LLVM-IR pass plugins, such as the AFLfuzz LLVM-mode plugin
  • -fprofile-{generate,use} for Profile-Guided Optimization (PGO) based on the LLVM IR code (instead of PGO based on the D abstract syntax tree)
  • -fxray-{instrument,instruction-threshold} for generating code for XRay instrumentation
  • -profile (LDMD2) and -fdmd-trace-functions (LDC2) to support DMD-style profiling of programs

Vanilla compiler-rt libraries

LDC uses LLVM’s compiler-rt runtime library for Profile-Guided Optimization (PGO), Address Sanitizer, and fuzzing. When PGO was first added to LDC 1.1.0, the required portion of compiler-rt was copied to LDC’s source repository. This made it easy to ship the correct version of the library with LDC, and make changes for LDC specifically. However, a copy was needed for each LLVM version that LDC supports (compiler-rt is compatible with only one LLVM version): the source of LDC 1.7.0 has 6 (!) copies of compiler-rt’s profile library.

For the introduction of ASan and libFuzzer in the official LDC binary packages, a different mechanism was used: when building LDC, we check whether the compiler-rt libraries are available in LLVM’s installation and, if so, copy them into LDC’s lib/ directory. To use the same mechanism for the PGO runtime library, we had to remove our additions to that library. Although the added functionality is rarely used, we didn’t want to just drop it. Instead, the functionality was turned into template-only code, such that it does not need to be compiled into the library (if the templated functionality is called by the user, the template’s code will be generated in the caller’s object file).

With this change, LDC no longer needs its own copy of compiler-rt’s profile library and all copies were removed from LDC’s source repository. LDC 1.8.0 ships with vanilla compiler-rt libraries. LDC’s users shouldn’t notice any difference, but for the LDC team it means less maintenance burden.

Onward

A compiler developer’s work is never done. With this release out the door, we march onward toward 1.9. Until then, start optimizing your D programs by downloading the pre-compiled LDC 1.8.0 binaries for Linux, Mac, Windows, Alpine, ARM, and Android, or build the compiler from source from our GitHub repository.

Thanks to LDC contributors Johan Engelen and Joakim for coauthoring this post.

DMD 2.079.0 Released

The D Language Foundation is happy to announce version 2.079.0 of DMD, the reference compiler for the D programming language. This latest version is available for download in multiple packages. The changelog details the changes and bugfixes that were the product of 78 contributors for this release.

It’s not always easy to choose which enhancements or changes from a release to highlight on the blog. What’s important to some will elicit a shrug from others. This time, there’s so much to choose from that my head is spinning. But two in particular stand out as having the potential to result in a significant impact on the D programming experience, especially for those who are new to the language.

No Visual Studio required

Although it has only a small entry in the changelog, this is a very big deal for programming in D on Windows: the Microsoft toolchain is no longer required to link 64-bit executables. The previous release made things easier by eliminating the need to configure the compiler; it now searches for a Visual Studio or Microsoft Build Tools installation when either -m32mscoff or -m64 is passed on the command line. This release goes much further.

DMD on Windows now ships with a set of platform libraries built from the MinGW definitions and a wrapper library for the VC 2010 C runtime (the changelog only mentions the installer, but this is all bundled in the zip package as well). When given the -m32mscoff or -m64 flags, if the compiler fails to find a Windows SDK installation (which comes installed with newer versions of Visual Studio – with older versions it must be installed separately), it will fall back on these libraries. Moreover, the compiler now ships with lld, the LLVM linker. If it fails to find the MS linker, this will be used instead (note, however, that the use of this linker is currently considered experimental).

So the 64-bit and 32-bit COFF output is now an out-of-the-box experience on Windows, as it has always been with the OMF output (-m32, which is the default). This should make things a whole lot easier for those coming to D without a C or C++ background on Windows, for some of whom the need to install and configure Visual Studio has been a source of pain.

Automatically compiled imports

Another stumbling block for some new D users, particularly those coming from a mostly Java background, has been the way imports are handled. Consider the venerable ‘Hello World’ example:

import std.stdio;

void main() {
    writeln("Hello, World!");
}

Someone coming to D for the first time from a language that automatically compiles imported modules could be forgiven for assuming that’s what’s happening here. Of course, that’s not the case. The std.stdio module is part of Phobos, the D standard library, which ships with the compiler as a precompiled library. When compiling an executable or shared library, the compiler passes it on to the linker along with any generated object files.

The surprise comes when that same newcomer attempts to compile multiple files, such as:

// hellolib.d
module hellolib;
import std.stdio;

void sayHello() {
    writeln("Hello!");
}

// hello.d
import hellolib;

void main() {
    sayHello();
}

The common mistake is to do this, which results in a linker error about the missing sayHello symbol:

dmd hello.d

D compilers have never considered imported modules for compilation. Only source files passed on the command line are actually compiled. So the proper way to compile the above is like so:

dmd hello.d hellolib.d

The import statement informs the compiler which symbols are visible and accessible in the current compilation unit, not which source files should be compiled. In other words, during compilation, the compiler doesn’t care whether imported modules have already been compiled or are intended to be compiled. The user must explicitly pass either all source modules intended for compilation on the command line, or their precompiled object or library files for linking.

It’s not that adding support for compiling imported modules is impossible. It’s that doing so comes with some configuration issues that are unavoidable thanks to the link step. For example, you don’t want to compile imported modules from libFoo when you’re already linking with the libFoo static library. This is getting into the realm of build tools, and so the philosophy has been to leave it up to build tools to handle.

DMD 2.079.0 changes the game. Now, the above example can be compiled and linked like so:

dmd -i hello.d

The -i switch tells the compiler to treat imported modules as if they were passed on the command line. It can be limited to specific modules or packages by passing a module or package name, and the same can be excluded by preceding the name with a dash, e.g.:

dmd -i=foo -i=-foo.bar main.d

Here, any imported module whose fully-qualified name starts with foo will be compiled, unless the name starts with foo.bar. By default, -i means to compile all imported modules except for those from Phobos and DRuntime, i.e.:

-i=-core -i=-std -i=-etc -i=-object

While this is no substitute for a full-on build tool, it makes quick tests and programs with no complex configuration requirements much easier to compile.

The #dbugfix Campaign

On a related note, last month I announced the #dbugfix Campaign. The short of it is, if there’s a D Bugzilla issue you’d really like to see fixed, tweet the issue number along with #dbugfix, or, if you don’t have a Twitter account or you’d like to have a discussion about the issue, make a post in the General forum with the issue number and #dbugfix in the title. The core team will commit to fixing at least two of those issues for a subsequent compiler release.

Normally, I’ll collect the data for the two months between major compiler releases. For the initial batch, we’re going three months to give people time to get used to it. I anticipated it would be slow to catch on, and it seems I was right. There were a few issues tweeted and posted in the days after the announcement, but then it went quiet. So far, this is what we have:

DMD 2.080.0 is scheduled for release just as DConf 2018 kicks off. The cutoff date for consideration during this run will be the day the 2.080.0 beta is announced. That will give our bugfixers time to consider which bugs to work on. I’ll include the tally and the issues they select in the DMD release announcement, then they will work to get the fixes implemented and the PRs merged in a subsequent release (hopefully 2.081.0). When 2.080.0 is released, I’ll start collecting #dbugfix issues for the next cycle.

So if there’s an issue you want fixed that isn’t on that list above, put it out there with #dbugfix! Also, don’t be shy about retweeting #dbugfix issues or +1’ing them in the forums. This will add weight to the consideration of which ones to fix. And remember, include an issue number, otherwise it isn’t going to count!

Project Highlight: The D Community Hub

As has been stressed on this blog before, D is a community-driven language. Because of that, the ecosystem depends on the work of volunteers who are willing to contribute their time and open their projects to the community at large. From IDE and editor plugins to libraries in the DUB registry, it’s all about the efforts of people who are (usually) getting no monetary reward for their efforts.

There are some inherent downsides to that reality. Sometimes projects are abandoned. Sometimes they aren’t updated as frequently as users would like. This can become an issue for those who depend upon these projects, but it’s alleviated by the fact that most D projects are open source and their repositories are publicly available. To keep a project alive and up-to-date only requires more volunteers willing to pitch in.

That’s the motivation behind the D Community Hub (dlang-community) at GitHub. According to Sebastian Wilzbach, it started with Brian Schott’s popular tools used by several IDE and editor plugins:

There were maintenance issues with Brian’s (aka Hackerpilot) awesome projects. He has a full-time job and often could only respond to simple issues every few weeks. This meant that simple bug fix PRs sat in the queue for quite a while. For example, there was one case where the same PR to fix the Windows build script was submitted by three different people (there was no Windows CI at the time).

Brian’s projects weren’t the only ones that motivated the idea. Sebastian and Petar Kirov maintain the DLang Tour, and some of the projects they depend upon were either inactive or slow to update. However, Brian’s tools are widely used, so they started with him. Eventually, they convinced him to move some of his projects to the new organization and others followed.

Sebastian lays out the following benefits that have come from moving multiple projects from disparate developers under an umbrella group:

  • Common best policies (e.g. all repositories have GitHub branch protection)
  • No need to fork an inactive repository – work can be shared easily.
  • No dependence on a single person who might be busy or on vacation (this is especially important for swiftly pulling and releasing bug fixes)
  • One common location whenever updates are required (e.g. package bumps or deprecation fixes)
  • Many of the projects are enabled on the Project Tester (their test suite is run on every PR for the DMD, DRuntime, Phobos, Dub, and tools repositories to prevent regressions) – this is possible because many people have merge rights in case an improvement in the compiler finds critical bugs or deprecations are moved forward
  • Shared knowledge (e.g. all projects support “auto-merge” like the dlang repositories)
  • Automation with bots – Mark Rz (@skl131313) created a bot that automatically triggers update PRs whenever dependencies are updated (some of the projects in dlang-community still support builds with only git submodules and make)
  • Less overhead for automation with CIs (everyone can connect a repo to a third-party provider or restart a failing CI job)

It has also resulted in increased participation. For example, other D users have joined the group, and Sociomantic Labs (the D shop in Berlin that hosted the 2016 and 2017 editions of DConf) has taken over the release process for dfmt, Brian’s tool for formatting D source code.

There are currently 22 repositories in the dlang-community organization, including the following:

  • DCD (the D Completion Daemon) – an autocomplete program that is used by several D IDE and editor plugins
  • dfmt – a formatter for D source code, also used by many IDE and editor plugins
  • D-Scanner – a tool for analyzing D source code
  • dfix – a tool for automatically upgrading D source code
  • libdparse – a library for lexing and parsing D source code
  • drepl – a DMD-based REPL for D
  • stdx-allocator – a frozen version of std.experimental.allocator (which is due for an overhaul)
  • containers – a set of containers backed by stdx.allocator to easily switch between different allocation strategies, with or without the GC
  • D-YAML – A YAML parser and emitter
  • harbored-mod – a documentation generator that supports both D’s built-in Ddoc syntax and Markdown

In addition to other D projects, there’s a repository set up specifically to discuss the dlang-community organization via GitHub issues, and repositories that contain artwork. If you decide to use any of these projects, the discussion repository is the place to ask for help when you need it.

Other projects may be added in the future. According to Sebastian, there are a few questions that form a set of loose criteria for inclusion under the dlang-community umbrella.

  • Is there enough interest from the general public so that it is “worth maintaining”?
  • Is there a similar library with active development out there?
  • Is at least one DLang community member competent in the domain covered by the project? If not, is there anyone who’s willing to fill the role?

Sebastian and the others are looking to add a few features over time. These include:

  • More automatic documentation builds
  • Automatic build of binaries on new tags (especially for Windows)
  • d-apt: Sociomantic is working on moving d-apt to GitHub and enabling full automatic CI builds for it.
  • dfmt: Leandro Lucarella / Sociomantic is introducing neptune and a proper release process

For anyone interested in joining the dlang-community organization, there are two options. If you are already a well-known participant in the D community, simply ping one of the existing members for merge rights. For anyone else, the best approach is to start contributing to one or more of the dlang-community projects to build up trust. At some point, frequent trustworthy contributors will be welcomed into the fold.

As for the current contributors, Sebastian says:

There are many people working behind the scenes on the dlang-community libraries. A special thanks goes to the active reviewers who make it possible that your potential PR gets merged in a timely manner.

  • Basile Burg
  • Brian Schott
  • Jan Jurzitza
  • Leandro Lucarella
  • Martin Nowak
  • Petar Kirov
  • Richard Andrew Cattermole
  • skl131313
  • Stefan Koch

If you have or know of a D project that is suffering from a lack of attention, bringing it to the dlang-community might be the way to breathe new life into it. Don’t be shy in asking for help.

Project Highlight: BSDScheme

Last year, Phil Eaton started working on BSDScheme, a Scheme interpreter that he ultimately intends to bring up to full Scheme R7RS support. In college, he had completed two compiler projects in C++ for two different courses. One was a Scheme to Forth compiler, the other an implementation of the Tiger language from Andrew Appel’s ‘Modern Compiler Implementation’ books.

I hadn’t really written a complete interpreter or compiler since then, but I’d been trying to get back into language implementation. I planned to write a Scheme interpreter that was at least bootstrapped from a compiled language so I could go through the traditional steps of lexing, parsing, optimizing, compiling, etc., and not just build a language of the meta-circular interpreter variety. I was spurred to action when my co-worker at Linode, Brian Steffens, wrote bshift, a compiler for a C-like language.

For his new project, he wanted to use something other than C++. Though he knows the language and likes some of the features, he overall finds it a “complicated mess”. So he started on BSDScheme using C, building generic ADTs via macros.

As he worked on the project, he referred to Brian’s bshift for inspiration. As it happens, bshift is implemented in D. Over time, he discovered that he “really liked the power and simplicity of D”. That eventually led him to drop C for D.

It was clear it would save me a ton of time implementing all the same data structures and flows one implements in almost every new C project. The combination of compile-time type checking, GC, generic ADT support, and great C interoperability was appealing.

He’s been developing the project on Mac and FreeBSD, using LDC, the LLVM-based D compiler. In that time, he has found a number of D features beneficial, but two stand out for him above the rest. The first is nested functions.

They’re a good step up from C, where nested functions are not part of the standard and only unofficially supported (in different ways) by different compilers. C++ has lambdas, but that’s not really the same thing. It is a helpful syntactic sugar used in BSDScheme for defining new functions with external context (the names of the parameters to bind).

As for the second, hold on to your seats: it’s the GC.

The existence of a standard GC is a big benefit of D over C and C++. Sure, you could use the Boehm GC, but how that works with threads is up to you to discover. It is not fun to do prototyping in a GC-less language because the amount of boilerplate distracts from the goals. People often say this when they’re referring to Python, Ruby, or Node, but D is not at all comparable, for a few reasons (among others): 1) compile-time type-checking, 2) dead-simple C interop, 3) multi-processing support.

Spend some time in the D forums and you’ll often find newcomers from C and C++ who, unlike Phil, have a strong aversion to garbage collection and are actively seeking to avoid it. You’ll also find replies from long-time D coders who started out the same way, but eventually came to embrace the GC completely and learned how to make it work for them. The D GC can certainly be problematic for certain types of software, but it is a net win for others. This point is reiterated frequently in the GC series on this blog, which shows how to get the most out of the GC, profile it, and mitigate its impact if it does become a performance problem.

As Phil learned the language, he identified areas for improvement in the D documentation.

Certainly it is advantageous compared to the C preprocessor that there is not an entirely separate language for doing compile-time macros, but the behavior difference and transition guides are missing or poorly written. A comparison between D templates and C++ templates (in all their complexity) could also be a great source of explanation.

We’re always looking to improve the documentation and make it more friendly to newcomers of all backgrounds. The docs are often written by people who are already well-versed in the language and its ecosystem, making blind spots somewhat inevitable. Anyone in the process of learning D is welcome and encouraged to help improve both the Language and Library docs. In the top right corner of each page are two links: “Report a bug” and “Improve this page”. The first takes you to D’s bug tracker to report a documentation bug, the second allows anyone with a logged-in GitHub account to quickly fork dlang.org, edit the page online, and submit a pull request.

In addition to the ultimate goal of supporting Scheme R7RS, Phil plans to add FFI support in order to allow BSDScheme to call D functions directly, as well as support for D threads and an LLVM-based backend.

Overall, he seems satisfied with his decision to move to D for the implementation.

I think D has been a good time investment. It is a very practical language with a lot of the necessary high-level aspects and libraries for modern development. In the future, I plan to dig more into the libraries and ecosystem for backend web systems. Furthermore, unlike with C or C++, so far I’d feel comfortable choosing D for projects where I am not the sole developer. This includes issues ranging from prospective ease of onboarding to long-term performance and maintainability.

A big thanks to Phil for taking the time to contribute to this post (be sure to check out his blog, too). We’re always happy to hear about new projects in D. BSDScheme is released under the three-clause BSD license, so it’s a great place to start for anyone looking for an interesting open-source D project to contribute to or learn from. Have fun!

DMD 2.078.0 Has Been Released

Another major release of DMD, this time 2.078.0, has been packaged and delivered in time for the new year. See the full changelog at dlang.org and download the compiler for your platform either from the main download page or the 2.078.0 release directory.

This release brings a number of quality-of-life improvements, fixing some minor annoyances and inconsistencies, three of which are targeted at smoothing out the experience of programming in D without DRuntime.

C runtime construction and destruction

D has included static constructors and destructors, both as aggregate type members and at module level, for most of its existence. The former are called in lexical order as DRuntime goes through its initialization routine, and the latter are called in reverse lexical order as the runtime shuts down. But when programming in an environment without DRuntime, such as when using the -betterC compiler switch, or using a stubbed-out runtime, static construction and destruction are not available.
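As a refresher, here’s what runtime-backed static construction and destruction look like when DRuntime is present (a minimal example, not from the release notes):

import std.stdio;

// Run by DRuntime during initialization, before main:
static this()
{
    writeln("module constructor");
}

// Run by DRuntime during shutdown, after main:
static ~this()
{
    writeln("module destructor");
}

void main()
{
    writeln("main");
}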

DMD 2.078.0 brings static module construction and destruction to those environments in the form of two new pragmas, pragma(crt_constructor) and pragma(crt_destructor) respectively. The former causes any function to which it’s applied to be executed before the C main, and the latter after the C main, as in this example:

crun1.d

// Compile with:    dmd crun1.d
// Alternatively:   dmd -betterC crun1.d

import core.stdc.stdio;

// Each of the following functions should have
// C linkage (cdecl).
extern(C):

pragma(crt_constructor)
void init()
{
    puts("init");
}

pragma(crt_destructor)
void fini()
{
    puts("fini");
}

void main()
{
    puts("C main");
}

The compiler requires that any function annotated with the new pragmas be declared with the extern(C) linkage attribute. In this example, though it isn’t required, main is also declared as extern(C). The colon syntax on line 8 applies the attribute to every function that follows, up to the end of the module or until a new linkage attribute appears.

In a normal D program, the C main is the entry point for DRuntime and is generated by the compiler. When the C runtime calls the C main, the D runtime does its initialization, which includes starting up the GC, executing static constructors, gathering command-line arguments into a string array, and calling the application’s main function, a.k.a. D main.

When a D module’s main is annotated with extern(C), it essentially replaces DRuntime’s implementation, as the compiler will never generate a C main function for the runtime in that case. If -betterC is not supplied on the command line, or an alternative implementation is not provided, DRuntime itself is still available and can be manually initialized and terminated.

The example above is intended to clearly show that the crt_constructor pragma will cause init to execute before the C main and the crt_destructor causes fini to run after. This introduces new options for scenarios where DRuntime is unavailable. However, take away the extern(C) from main and the same execution order will print to the command line:

crun2.d

// Compile with:    dmd crun2.d

import core.stdc.stdio;

pragma(crt_constructor)
extern(C) void init()
{
    puts("init");
}

pragma(crt_destructor)
extern(C) void fini()
{
    puts("fini");
}

void main()
{
    import std.stdio : writeln;
    writeln("D main");
}

The difference is that the C main now belongs to DRuntime and our main is the D main. The execution order is: init, C main, D main, fini. This means init is effectively called before DRuntime is initialized and fini after it terminates. Because this example uses the DRuntime function writeln, it can’t be compiled with -betterC.

You may discover that writeln works if you import it at the top of the module and substitute it for puts in the example. However, always remember that even though DRuntime may be available, it’s not in a valid state when a crt_constructor and a crt_destructor are executed.
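As noted above, when -betterC is not supplied and you provide your own extern(C) main, DRuntime can still be brought up and torn down manually. A minimal sketch (ours, not from the release notes) using core.runtime:

manualrt.d

// Compile with:    dmd manualrt.d   (without -betterC)

import core.runtime : Runtime;
import core.stdc.stdio : puts;

extern(C) int main()
{
    // Start DRuntime by hand: initializes the GC and runs
    // static constructors.
    if (!Runtime.initialize())
        return 1;
    // Shut it down again on the way out.
    scope(exit) Runtime.terminate();

    puts("DRuntime is now available.");
    return 0;
}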

RAII for -betterC

One of the limitations in -betterC mode has been the absence of RAII. In normal D code, struct destructors are executed when an instance goes out of scope. This has always depended on DRuntime, and since the runtime isn’t available in -betterC mode, neither are struct destructors. With DMD 2.078.0, the are in the preceding sentence becomes were.

destruct.d

// Compile with:    dmd -betterC destruct.d

import core.stdc.stdio : puts;

struct DestroyMe
{
    ~this()
    {
        puts("Destruction complete.");
    }
}

extern(C) void main()
{
    DestroyMe d;
}

Interestingly, this is implemented in terms of try..finally, so a side-effect is that -betterC mode now supports try and finally blocks:

cleanup1.d

// Compile with:    dmd -betterC cleanup1.d

import core.stdc.stdlib,
       core.stdc.stdio;

extern(C) void main()
{
    int* ints;
    try
    {
        // acquire resources here
        ints = cast(int*)malloc(int.sizeof * 10);
        puts("Allocated!");
    }
    finally
    {
        // release resources here
        free(ints);
        puts("Freed!");
    }
}

Since D’s scope(exit) feature is also implemented in terms of try..finally, this is now possible in -betterC mode also:

cleanup2.d

// Compile with: dmd -betterC cleanup2.d

import core.stdc.stdlib,
       core.stdc.stdio;

extern(C) void main()
{
    auto ints1 = cast(int*)malloc(int.sizeof * 10);
    scope(exit)
    {
        puts("Freeing ints1!");
        free(ints1);
    }

    auto ints2 = cast(int*)malloc(int.sizeof * 10);
    scope(exit)
    {
        puts("Freeing ints2!");
        free(ints2);
    }
}

Note that exceptions are not implemented for -betterC mode, so there’s no catch, scope(success), or scope(failure).

Optional ModuleInfo

One of the seemingly obscure features dependent upon DRuntime is the ModuleInfo type. It’s a type that works quietly behind the scenes as one of the enabling mechanisms of reflection and most D programmers will likely never hear of it. That is, unless they start trying to stub out their own minimal runtime. That’s when linker errors start cropping up complaining about the missing ModuleInfo type, since the compiler will have generated an instance of it for each module in the program.

DMD 2.078.0 changes things up. The compiler is aware of the presence of the runtime implementation at compile time, so it can see whether or not the current implementation provides a ModuleInfo declaration. If it does, instances will be generated as appropriate. If it doesn’t, then the instances won’t be generated. This makes it just that much easier to stub out your own runtime, which is something you’d want to do if you were, say, writing a kernel in D.

Other notable changes

New users of DMD on Windows will now have an easier time getting a 64-bit environment set up. It’s still necessary to install the Microsoft build tools, but now DMD will detect the installation of either the Microsoft Build Tools package or Visual Studio at runtime when either -m64 or -m32mscoff is specified on the command line. Previously, configuration was handled automatically only by the installer; manual installs had to be configured by hand.

DRuntime has been enhanced to allow more fine-grained control over unit tests. Of particular note is the --DRT-testmode flag which can be passed to any D executable. With the argument "run-main", the current default, any unit tests present will be run and then main will execute if they all pass; with "test-or-main", the planned default beginning with DMD 2.080.0, any unit tests present will run and the program will exit with a summary of the results, otherwise main will execute; with "test-only", main will not be executed, but test results will still be summarized if present.
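For example, given a program compiled with -unittest (testmode.d is our name; this is a sketch rather than an official example):

testmode.d

// Compile with:    dmd -unittest testmode.d

import std.stdio;

unittest
{
    assert(1 + 1 == 2);
}

void main()
{
    writeln("main");
}

Running ./testmode executes the unit test and then main (the current run-main default), while ./testmode --DRT-testmode=test-only should report the test results and skip main entirely.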

Onward into 2018

This is the first DMD release of 2018. We can’t wait to see what the next 12 months bring for the D programming language community. From everyone at the D Language Foundation, we hope you have a very Happy New Year!

D’s Newfangled Name Mangling

Rainer Schuetze is the creator and maintainer of Visual D, the D plugin for Visual Studio. Recently, he implemented a new name mangling algorithm for the D frontend, which was released in DMD 2.077.0. In this post, he explains why it was needed and what it does.


What is symbol name mangling?

D embraces the separate compilation model that compiles D source code to object files and uses a linker to bind the object files to an executable binary file. This allows the reuse of precompiled object files and libraries, speeding up the build process. As the linker is usually one that’s also used for other languages with the same compilation model, e.g. C/C++ or Fortran, mixing object files from different languages is straightforward.

In an object file, a symbol name is assigned to each function or global variable, both when it is defined and when it is used via a call or access. The linker uses these names to connect references to definitions of the same name with only very bare knowledge about the symbol. For example, the symbol for this C function declaration,

extern(C) const(char)* find(int ch, const(char)* str);

does not tell the linker anything about function arguments or return type, as the C language uses the plain function name find as the symbol name (some platforms prepend a _ to the symbol). If you later change the order of the arguments to

extern(C) const(char)* find(const(char)* str, int ch);

but fail to update and recompile all source files that use the new declaration, the linker will happily bind the resulting object files. In that case, the program is likely to crash, since a character passed to the function will be interpreted as a string pointer and vice versa.

D and C++ avoid this problem by adding more information to the symbol name, i.e. they encode into a symbol name the scope in which the symbol is defined, the function argument types, and the return type. Even if the linker does not interpret this information, linking fails with an undefined symbol error if the definitions used to build the object files don’t match. For example, the D function declaration

module test;
extern(D) const(char)* find(int ch, const(char)* str);

has a symbol name _D4test4findFiPxaZPxa, where _D is a prefix to identify the symbol as being generated from a D source symbol, 4test4find encodes the “fully qualified name” find in module test, and FiPxaZPxa describes the function type with an integer argument (designated by i) and the C-style string pointer type Pxa by just concatenating the encodings for argument types. Z terminates the function argument list and is followed by the encoding for the return type, again Pxa for a C-style string pointer. In contrast,

extern(D) const(char)* find(const(char)* str, int ch);

is encoded as _D4test4findFPxaiZPxa, making it a different symbol with the argument types reversed. The encoding ensures a normalized representation of types and scopes while also providing shorter symbols than minimal source code. This encoding is called “name mangling”.
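Both manglings can be verified at compile time with the .mangleof property (more on that below); a quick sketch:

module test;

extern(D) const(char)* find(int ch, const(char)* str);

// Prints the mangled symbol name during compilation:
pragma(msg, find.mangleof); // _D4test4findFiPxaZPxa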

Ed: Note that extern(C) and extern(D) are linkage attributes. When a function is declared in D without an explicit linkage attribute, extern(D) is the default.

In D, some function attributes are also mangled into the symbol name, e.g. @safe, nothrow, pure and @nogc. In theory, mangling could also cover parameter names, user defined attributes, or even contracts, but that is currently considered excessive.

Please note that even though name mangling can detect some mismatches in the binary interface of functions (i.e. how arguments are passed in registers or on the stack), it won’t catch every error; for example, structs, classes and other user defined types are mangled by name only, so that a change to their definition will still pass unnoticed by the linker.

The mangled name of a symbol is also available during compilation using the .mangleof property. This used to be exploited to provide type reflection of the symbol at compile time. It should no longer be necessary thanks to the introduction of new __traits that make this information accessible more quickly and conveniently, for example,

__traits(getLinkage, symbol);

or

__traits(getFunctionAttributes, symbol);

Thus, usage of .mangleof is not recommended except for debugging purposes.

When reversing the mangling process in the “demangler”, all the encoded information is kept to make it available to the user, but that does not always yield correct D syntax. The first definition above demangles as

const(char)* test.find(int, const(char)*)

i.e. the module name test is added to the function name.

Template symbols

The two definitions of find shown above can coexist in D and C++, so name mangling is not only a way to detect errors at link time but also a necessity to represent overloads. It should at least contain enough information to distinguish different overloads of the same scoped identifier.
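As a small sketch of that (the module name over is ours), both overloads can live side by side, each with its own symbol:

module over;

const(char)* find(int ch, const(char)* str);
const(char)* find(const(char)* str, int ch);

// Print each overload's distinct mangled symbol at compile time:
static foreach (f; __traits(getOverloads, over, "find"))
    pragma(msg, f.mangleof);
// _D4over4findFiPxaZPxa
// _D4over4findFPxaiZPxa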

This becomes even more obvious when considering templates that usually instantiate different functions or variable definitions for each argument type. In D, the template instantiation information is added to the qualified name of a symbol.

Consider expression templates, a common example of meta programming used for delayed evaluation of expressions:

module expr;
struct Mul(X,Y)
{
    X x;
    Y y;
}
struct Add(X,Y)
{
    X x;
    Y y;
}

auto mul(X,Y)(X x, Y y) { return Mul!(X,Y)(x, y); }
auto add(X,Y)(X x, Y y) { return Add!(X,Y)(x, y); }

A function template is lowered by the compiler to an eponymous template:

template mul(X, Y)
{
    auto mul(X x, Y y) { return Mul!(X,Y)(x, y); }
}

The template name is part of the qualified function name, expr.mul!(X,Y).mul, and the auto return type is inferred to be Mul!(X,Y). This causes the symbol to reference the types X and Y three times. Demangling the mangled name of an instantiation of this template with the types double and float yields

expr.Mul!(double,float) expr.mul!(double,float).mul(double,float)

The mangling process of DMD before version 2.077 walks the abstract syntax tree of the declaration and emits the mangled representation of the types whenever it is hit. Now consider stacking operations, e.g.

auto square(X)(X x) { return mul(x, x); }

auto len = square("var");
pragma(msg, len.square.mangleof);
// S4expr66__T3MulTS4expr16__T3MulTAyaTAyaZ3MulTS4expr16__T3MulTAyaTAyaZ3MulZ3Mul

pragma(msg, typeof(len).mangleof.length);
pragma(msg, len.square.mangleof.length);
pragma(msg, len.square.square.mangleof.length);
pragma(msg, len.square.square.square.mangleof.length);
pragma(msg, len.square.square.square.square.mangleof.length);
pragma(msg, len.square.square.square.square.square.mangleof.length);
pragma(msg, len.square.square.square.square.square.square.mangleof.length);

With DMD 2.076 or earlier, this displays 28u, 78u, 179u, 381u, 785u, 1594u, 3212u, showing exponential growth of the mangled symbol name length even though the expression in the source code just grows linearly. This happens because types like Mul!(Mul!(string, string), Mul!(string, string)) are combined and the mangling repeats their full representation every time they are referenced.

Create a chain of 12 calls to square above and the symbol length increases to 207,114. Even worse, the resulting object file for COFF/64-bit is larger than 15 MB and the time to compile increases from 0.1 seconds to about 1 minute. Most of that time is spent generating code for functions only used at compile time.

Voldemort types returned from template functions can be similar, as they carry the function signature including the template arguments as part of the type name. These can also show a dramatic increase in build times without generating as much code as in the example.

Symbol compression to the rescue

In early 2016, a couple of attempts were made to shorten these long symbols:

  • cut off symbol names if they exceed a given threshold, but append a checksum of the full symbol instead. This was already done with an MD5 hash when emitting symbols for the DigitalMars C compiler tool chain as the OMF object file format does not allow symbols longer than 255 characters. The downside to this is that these symbols can no longer be demangled, so that symbols in linker messages cannot be translated back into human digestible names.
  • apply binary compression to the symbol name. Standard techniques use part of the full symbol name as a dictionary to encode repetitions within the name. This is usually done by encoding a position-length pair using characters outside the normal identifier set. Again, this is already in use when DMD tries to fit symbols into the OMF limit of 255 characters (before applying the MD5 hash trick), but it also has shown some disadvantages: when using characters above the ASCII range, this interferes with UTF8 encoded characters that are also allowed as symbol characters in the D language. It can also break linker output as the console might misinterpret it as a locale-specific character encoding. Avoiding this by applying a binary to ASCII conversion like base64 to the symbols would obfuscate the actual symbol name even more.
  • extend the mangling grammar by allowing references to entities already encoded. This is similar to binary compression, but does not need to encode match length as the entities have terminators embedded into the grammar. The most prominent entities are types. This is the road C++ has taken, as it is also affected by the issues described here. In C++, name mangling is not standardised by the language, but by the compiler or the platform. GNU g++ uses Itanium C++ ABI mangling, which does a pretty good job with C++ code similar to the example above or in the Voldemort issue. Even though Microsoft’s Visual C++ can encode recurring types as well, it still generates very long names because it limits the encoding to the first 10 types of an argument list.

The first attempts at applying the latter scheme to the mangling of D symbols showed disappointing results. As it turned out, these implementations missed a subtle detail of the mangler in the DMD front end at that time; it reused cached representations of mangled type names to combine them to more complex types. This fails to find repetitions of the types from which a cached type name was built.

This is where I stepped in to create a proof-of-concept version of the mangling without these omissions. Early results were promising, so I looked for more opportunities to reduce symbol length:

  • with fully qualified names always containing the package and module names of a symbol, identifiers tend to appear often in a mangled name.
  • qualified names are likely to come from the same module or package, so it would be nice to encode them as a single entity.

The unit tests of the Phobos runtime library are benchmark candidates, as they contain a lot of symbols for template-heavy code. At the given time there were 127,172 symbols found in the map file of the Windows build. These were the results of the different manglings:

back references                       max length   average length
none                                     416,133              369
types                                      2,095              157
types+identifiers                          1,263              128
types+identifiers+qualified names          1,114              117

(This has been measured with the implementation at the time, which is not exactly the same as the final mangling; it used special characters for the different back reference types, but this turned out not to be a good idea. The D mangling is supposed to be the same on all platforms, but these characters will have a special meaning to the linker on one of them.)

It’s rather simple in DMD to determine the identity of identifiers and types, as the latter are merged according to their mangling anyway. Qualified names and their associated symbols turned out to introduce a number of complications, though. Namely, mangleFunc in core.demangle allows building a mangled name of a function from a fully qualified name given as a string function argument and a type specified as a template argument. Implementing this for run-time usage requires copying the full mangling machinery and introspection capabilities of the compiler, which is unrealistic. Considering the limited benefit shown by the above Phobos statistics, the idea of encoding qualified names was dropped.

Here are some details about the new mangling:

  • Back references are now encoded by the character Q, followed by the relative position of the original appearance of the same identifier or type. These positions are encoded with respect to base 26, with the last digit encoded by a lowercase letter and the other digits encoded by uppercase letters; a sketch of this digit encoding follows the list. That way, most back references are 2 or 3 characters long, 4 in extreme cases. Using a different encoding for the last digit allows determining the end of a number without looking at the next character. This helps to avoid ambiguities. (The Itanium C++ ABI mangling uses base 36 encoding by combining numbers and letters, but needs a termination character _.)
  • Counting encodable entities as in the C++ mangling would result in slightly shorter mangled names, but needs the mangler to keep a dynamic list of respective positions. The current demangler is designed not to allocate as long as the supplied output buffer is large enough.
  • Relative positions are chosen instead of absolute positions to allow prepending the _D prefix without having to re-encode the symbol. Some platforms also prepend an additional underscore, for which the relative positions are agnostic.
  • The mangling grammar sometimes allows types and identifiers at the same position, so a demangler needs to distinguish between the two even if given by a back reference. That’s why a lookup to the referenced position is necessary to continue demangling; an identifier always starts with a number, while a type always starts with a letter.
  • Using Q for back references grabs the last free letter used to encode types, but there is at least one type defined in the mangling grammar that is not supposed to appear in a mangling anyway (namely TypeIdent), so it can be resurrected if the necessity appears.
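Here is a tiny sketch of that digit encoding (ours, not the compiler’s actual implementation):

// Encode a back reference to relative position 'pos' as 'Q'
// followed by base-26 digits: uppercase letters for leading
// digits, a lowercase letter for the final digit.
char[] encodeBackRef(size_t pos)
{
    char[] buf;
    buf ~= cast(char)('a' + pos % 26); // last digit, lowercase
    pos /= 26;
    while (pos)
    {
        buf = cast(char)('A' + pos % 26) ~ buf; // leading digits, uppercase
        pos /= 26;
    }
    return 'Q' ~ buf;
}

unittest
{
    assert(encodeBackRef(3) == "Qd");   // 2 characters
    assert(encodeBackRef(27) == "QBb"); // 3 characters
}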

For example, the expression template type shown above now mangles as

pragma(msg, len.square.mangleof);
// S4expr__T3MulTSQo__TQlTAyaTQeZQvTQtZQBb
//                ^^   ^^     ^^ ^^ ^^ ^^^ decode to:
//                |    |      |  |  |  |
//                |    |      |  |  |  +- 3Mul
//                |    |      |  |  +---- S4expr__T3MulTAyaTAyaZ3Mul
//                |    |      |  +------- 3Mul
//                |    |      +---------- Aya
//                |    +----------------- 3Mul
//                +---------------------- 4expr

with a length of 39 instead of 78 without back references. The resulting sizes are 23, 39, 57, 76, 95, 114, 133 showing linear growth. The chain of 12 calls to square shrinks from 207,114 characters to 247, i.e. by a factor of more than 800.

Implementing mangleFunc mentioned above for the mangling with back-referencing identifiers is still not straightforward: while the fully qualified name is not supposed to contain any types (e.g., as a struct template argument), identifiers in the mangled name can appear again in the function type. This was solved by extending the demangler to use “Design by Introspection” (DbI) (as coined by Andrei Alexandrescu):

  • make the Demangle struct a template that parameterizes on a struct that supplies a couple of hooks
    struct NoHooks {}  // supports: static bool parseLName(ref Demangle); ...
    private struct Demangle(Hooks = NoHooks)
    {
        Hooks hooks;
        // ...
        void parseLName()
        {
            static if(__traits(hasMember, Hooks, "parseLName"))
                if (hooks.parseLName(this))
                    return;
                // normal decode...
        }
    }
  • create a hook that replaces a reoccurring identifier with the appropriate back reference
    struct RemangleHooks
    {
        char[] result;
        size_t[const(char)[]] idpos;
        // ...
        bool parseLName(ref Demangle!RemangleHooks d)
        {
            // flush input so far to result[]
            if (d.front == 'Q')
            {
                // re-encode back reference...
            }
            else if (auto ppos = currentIdentifier in idpos)
            {
                // encode back reference to identifier at *ppos
            }
            else
            {
                idpos[currentIdentifier] = currentPos;
            }
            return true;
        }
    }
  • combine the qualified name and the type as before (core.demangle is still capable of decoding it) and run it through the hooked demangler
    char[] mangleFunc(FuncType)(const(char)[] qualifiedName)
    {
        const(char)[] mangledQualifiedName = encodeLNames(qualifiedName);
        const(char)[] mangled = mangledQualifiedName ~ FuncType.mangleof;
        auto d = Demangle!RemangleHooks(mangled, null);
        d.mute = true; // no demangled output
        d.parseMangledName();
        return d.hooks.result;
    }

Is the new mangling sound?

The back references encoded into the mangling extend the existing mangling. Unfortunately, the latter had ambiguities reported to the D issue tracking system, with more of these likely yet to be uncovered. The demangler in core.demangle rejected about 3% of the unmodified symbols from the Phobos unit tests, while 15% were demangled only partially.

It’s tough to verify the soundness of an addition to an already complex and fragile definition, especially as any change to the mangling also requires updates to the tooling (demangler, debuggers). Still, it was a good opportunity to get rid of the known ambiguities, too.

So scrutiny of the existing definition was required. To do this mechanically, the mangling specification from the web site was converted into a grammar digestible by the bison parser generator. Bison can create LALR(1) parser tables, which basically means that, while scanning a mangled symbol, looking at a character and its successor is enough to determine whether the character adds to a previous entity or starts a new one. When conflicts are reported when processing a grammar, they might be resolvable with a larger context, but they can also hint at actual problems or undesirable complexity. Adding pseudo-tokens representing handcrafted parser control flow can avoid these conflicts.

This gist shows a grammar for the D mangling scheme without the back references. It still has a couple of conflicts when run through Bison, one of which was determined to be an actual ambiguity in the definition. Adding back references to the grammar doesn’t add any conflicts.

In addition, core.demangle was fixed to work for all symbols but those exposing the known ambiguities.

Aftermath

Some of the implementations in std.traits used the mangling of a symbol to introspect compile-time properties, for example, to determine the linkage. This was done using a simplified demangler. With the introduction of back references, these didn’t work any more except for simple symbol names. Using a solution as for core.mangleFunc is feasible, but can slow down compilation considerably as the demangling needs to be executed via CTFE. Fortunately, new __traits have been added which cover all information that can be found in the mangling.

While most users will not notice any changes to their programs other than smaller object and executable file sizes, the new mangling can be a breaking change to external tools like the linker or a debugger. These will continue to work, but until they are updated to understand the new scheme, be prepared to see mangled names instead of demangled ones for a while.

Thanks to Mike Parker, Walter Bright and Steven Schveighoffer for review.

DMD 2.077.0 Released

The D Language Foundation is happy to announce DMD 2.077.0. This latest release of the reference compiler for the D programming language is available from the dlang.org Downloads page. Among the usual slate of bug and regression fixes, this release brings a couple of particularly beneficial enhancements that will have an immediate impact on some existing projects.

Cutting symbol bloat

Thanks to Rainer Schütze, the compiler now produces significantly smaller mangled names in situations where they had begun to get out of control, particularly in the case of IFTI (Implicit Function Template Instantiation) where Voldemort types are involved. That may call for a bit of a detour here.

The types that shall not be named

Voldemort types are perhaps one of D’s more interesting features. They look like this:

auto getHeWhoShallNotBeNamed() 
{
    struct NoName 
    {
        void castSpell() 
        {
            import std.stdio : writeln;
            writeln("Crucio!");
        }           
    }
    return NoName();
}

void main() 
{
    auto voldemort = getHeWhoShallNotBeNamed();
    voldemort.castSpell();
}

Here we have an auto function, a function for which the return type is inferred, returning an instance of a type declared inside the function. It’s possible to access public members on the instance even though its type can never be named outside of the function where it was declared. Coupled with type inference in variable declarations, it’s possible to store the returned instance and reuse it. This serves as an extra level of encapsulation where it’s desired.

In D, for any given API, as far as the world outside of a module is concerned, module private is the lowest level of encapsulation.

module foobar;

private struct Foo
{
    int x;
}

struct Bar 
{
    private int y;
    int z;
}

Here, the type Foo is module private. Bar is shown here for completeness, as those new to D are often surprised to learn that private members of an aggregate type are also module private (D’s equivalent of the C++ friend relationship). There is no keyword that indicates a lower level of encapsulation.

Sometimes you just may not want Foo to be visible to the entire module. While it’s true that anyone making a breaking change to Foo’s interface also has access to the parts of the module that break (which is the rationale behind module-private members), there are times when you may not want the entire module to have access to Foo at all. Voldemort types fill that role of hiding details not just from the world, but from the rest of the module.

The evil side of Voldemort types

One unforeseen consequence of Voldemort types that was first reported in mid-2016 was that, when used in templated functions, they caused a serious explosion in the size of the mangled function names (in some cases up to 1 MB!), making for some massive object files. There was a good bit of forum discussion on how to trim them down, with a number of ideas tossed around. Ultimately, Rainer Schütze took it on. His persistence has resulted in shorter mangled names all around, but the wins are particularly impressive when it comes to IFTI and Voldemort types. (Rainer is also the maintainer of Visual D, the D programming language plugin for Visual Studio.)

D’s name-mangling scheme is detailed in the ABI documentation. The description of the new enhancement is in the section titled ‘Back references’.

Improved vectorization

D has long supported array operations such as element-wise addition, multiplication, etc. For example:

int[] arr1 = [0, 1, 2];
int[] arr2 = [3, 4, 5];
int[3] arr3 = arr1[] + arr2[];
assert(arr3 == [3, 5, 7]);

In some cases, such operations could be vectorized. The reason it was some cases and not all cases is because dedicated assembly routines were used to achieve the vectorization and they weren’t implemented for every case.

With 2.077.0, that’s no longer true. Vectorization is now templated so that all array operations benefit. Any codebase out there using array operations that were not previously vectorized can expect a sizable performance increase for those operations thanks to the increased throughput (though whether an application benefits overall is of course context-dependent). How the benefit is received depends on the compiler being used. From the changelog:

For GDC/LDC the implementation relies on auto-vectorization, for DMD the implementation performs the vectorization itself. Support for vector operations with DMD is determined statically (-mcpu=native, -mcpu=avx2) to avoid binary bloat and the small test overhead. DMD enables SSE2 for 64-bit targets by default.

Note that the changelog initially showed -march instead of -mcpu in the quoted lines, and the updated version had not yet been posted when this announcement was published.

DMD’s vectorization is implemented in terms of core.simd, which is also part of DRuntime’s public API.

The changelog also notes that there’s a potential for division performed on float arrays in existing code to see a performance decrease in exchange for an increase in precision.

The implementation no longer weakens floating point divisions (e.g. ary[] / scalar) to multiplication (ary[] * (1.0 / scalar)) as that may reduce precision. To preserve the higher performance of float multiplication when loss of precision is acceptable, use either -ffast-math with GDC/LDC or manually rewrite your code to multiply by (1.0 / scalar) for DMD.
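Where the loss of precision is acceptable, that manual rewrite for DMD might look like this (a sketch):

void scale(float[] ary, float scalar)
{
    // Precise but slower: one division per element.
    // ary[] /= scalar;

    // Faster, potentially less precise: divide once,
    // then multiply element-wise.
    immutable inv = 1.0f / scalar;
    ary[] *= inv;
}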

Other assorted treats

Just the other day, someone asked in the forums if DMD supports reproducible builds. As of 2.077.0, the answer is affirmative. DMD now ensures that compilation is deterministic such that given the same source code and the same compiler version, the binaries produced will be identical. If this is important to you, be sure not to use any of the non-deterministic lexer tokens (__DATE__, __TIME__, and __TIMESTAMP__) in your code.
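Those three tokens expand at compile time, so any use of them bakes the build moment into the binary:

// Each of these varies from build to build:
pragma(msg, __DATE__);      // e.g. "Jan  1 2018"
pragma(msg, __TIME__);      // e.g. "00:00:00"
pragma(msg, __TIMESTAMP__); // e.g. "Mon Jan  1 00:00:00 2018"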

DMD’s -betterC command line option gets some more love in this release. When it’s enabled, DRuntime is not available. Library authors can now use the predefined version D_BetterC to determine when that is the case so that, where it’s feasible, they can more conveniently support applications with and without the runtime. Also, the option’s behavior is now documented, so it’s no longer necessary to go to the forums or parse through search results to figure out what is and isn’t actually supported in BetterC mode.
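A sketch of how a library might branch on that version identifier:

// Pick an implementation based on runtime availability.
version (D_BetterC)
{
    // No DRuntime here: stick to C-compatible facilities.
    import core.stdc.stdio : puts;
    void report() { puts("betterC build"); }
}
else
{
    import std.stdio : writeln;
    void report() { writeln("full D build"); }
}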

The entire changelog is, as always, available at dlang.org.