Category Archives: Compilers & Tools

DMD 2.088.0 Released

Digital Mars logoThe newest DMD has rolled off the assembly line and is ready for download. A total of 58 contributors fixed 58 bugs and introduced 27 major changes to version 2.088.0 of the compiler.

I’m always looking for the big ticket items in a new DMD release to highlight on the blog, but this is a workaday release that isn’t showing off anything too shiny in the changleog. Much of it is run-of-the mill maintenance: deprecations, removals, and behavior adjustments. All of that is important, and we all welcome it, but it doesn’t make for great reading on the blog. That said, there are a handful of useful additions that I can point to, one of which actually is a big deal when it comes to C++ interop.

std::string and std::vector

Thanks to the work Manu Evans has been performing and advocating, C++ interoperability gets a big boost in this release with bindings to std::string and std::vector in the DRuntime modules core.stdcpp.string and core.stdcpp.vector, respectively.  There’s one caveat with the std::string binding that anyone intending to use it must be aware of.

When compiling on Linux, where DMD makes use of the GCC libraries and linker, there’s a compatibility issue when using the modern version of std::string which is compliant with C++11. It contains an interior pointer, which in D is both illegal and incompatible with move semantics. The work around is to pass -D_GLIBCXX_USE_CXX11_ABI=0 to g++ and compile your D application with -version=_GLIBCXX_USE_CXX98_ABI. This will be resolved in the future when work on move constructors in D is complete.

New Utilities

The language gets an interesting new compile-time trait in the form of getLocation. Given a symbol, this trait will return a tuple containing the file name, line number, and column number at which the symbol appears in the source code. This opens the door to more informative debug logging and error reporting beyond the functionality already available via __FILE__ and __LINE__. And I’m sure folks will find other uses for it.

The standard library utility module std.file, which provides a lot of convenience functions for working with files as a unit, now has the new function getAvailableDiskSpace. Give it a directory path on Windows, or the path to a directory or file on Posix, and it will give you the number of bytes available on that path.

Other News

The Symmetry Autumn of Code 2019 participants all have mentors now and they are hard at work laying out their milestones. Milestone 1 officially kicks off on September 15, after which we can expect to see weekly updates from the participants in the General forum.

Google Summer of Code 2019 has come to an end. Five of our students submitted their work at the end of August. You can find information about their projects and view their code submissions from our GSOC projects page. Congratulations to all who participated!

The D Language Foundation is currently in discussions to put some of the Human Resource Fund to use in finalizing LDC support for iOS and Android. Hopefully, I’ll have details to report on that front in the very near future. In the meantime, please help us raise the HR Fund even higher than it is now. There’s some important work waiting to be done that will require as much money as we can throw at it. You can donate any amount directly to the HR Fund Campaign or use the special campaign we set up to send $60 to the HR Fund and get a DConf 2019 t-shirt in return.

Speaking of t-shirts, thanks to everyone who has made a purchase in our DLang Swag Emporium. You’ve helped us raise over $77 so far, all of which will go to the General Fund. If you haven’t yet dropped in, what are you waiting for? We’ve got t-shirts, stickers, and coffee mugs, with updates coming soon. It’s an easy way to support our favorite programming language!

 

DMD 2.087.0 Released

Digital Mars logoThe latest release of the Digital Mars D compiler (DMD) is now available. Version 2.087.0 marks 44 closed Bugzilla issues and 22 major changes courtesy of 63 contributors. See the changelog for the details and related links. Visit the Digital Mars Downloads page to get the release package for your platform(s).

One of the changes in this release is the end of a transitional period regarding imports, another involves a certain compiler switch and the compilation of Phobos. There’s also something developers on Windows will find useful, and more options for documenting code with Ddoc.

Endings and beginnings

Once upon a time, two related compiler issues were reported in the D bug tracker, where they remained for years beyond measure (it was actually just shy of a decade). These bugs allowed symbols to sometimes be accessed inside a scope in which they weren’t supposed to be visible. Eventually, once the bugs were fixed, two switches were introduced to help users maintain their existing code: -transtion=import caused the code to compile under the old, incorrect behavior, and -transition=checkimport would report on all occurences of the erroneous behavior in a code base. Steven Schveighoffer did a write up about it all on his blog at the time, which is a good read for anyone interested in the details.

In DMD 2.087.0, the transitional period is over. The -transition=import and -transition=checkimports switches no longer have any effect. Henceforth, if you have any existing code you’ve been compiling with -transition=import, your code will break with the new release if you are still relying on the old buggy behavior.

As one period ends, another begins. A new deprecation will give you warnings if you are initializing immutable global data via a static constructor like so:

immutable int bar;
static this()
{
    bar = 42;
}

This behavior is deprecated. Static constructors and destructors are called once per thread. Given that immutable global data is implicitly shared across threads rather than being thread-local like normal D variables, data like bar would be overwritten every time a new thread is spawned. The fix is to ensure that the static constructor is also shared across threads:

immutable int bar;
shared static this()
{
    bar = 42;
}

Static constructors and destructors marked shared are invoked once per process rather than once per thread.

DIP 1000 and Phobos

DIP 1000 was the first D Improvement Proposal submitted after the DIP process was transformed from an informal, wiki-based approach, to a formal, managed approach with structured review periods. It proposed a feature called “Scoped Pointers” intended to “provide a mechanism to guarantee that a reference cannot escape lexical scope”. Unfortunately, the document itself remained in a sort of limbo as the proposed feature was implemented and evolved. Eventually, the implementation diverged from the proposal to such a degree that the DIP was marked as Superseded and retired. But not the feature!

The -preview=dip1000 flag has been available for some time now, but it has been a bit tricky to use given that the standard library could not be compiled with it. With DMD 2.087.0, that is true no more. Phobos now compiles with -preview=dip1000 and D programmers can now more easily make use of the feature.

Given that this is still in preview mode and hasn’t yet seen enough wide-spread use, please don’t be surprised if you uncover any bugs. Please do report any bugs you find to the issue tracker so that they can be squashed as soon as possible.

Explicitly choose the LLD linker on Windows

From the beginning, DMD shipped with self-contained support for 32-bit development on Windows. This was possible because Walter Bright made use of the existing platform libraries and linker (OPTLINK) that he was already shipping with his Digital Mars C and C++ compiler. Unfortunately, OPTLINK only supports the OMF object format, so the output of DMD on Windows was incompatible with the greater Windows ecosystem which is primarily built around the PE COFF output (see this PDF of the specification) of the Microsoft build tools. PE COFF, known as MSCOFF in the D universe, is a Microsoft-specific version of the COFF format.

Eventually, Walter added MSCOFF support to DMD for 64-bit (and later, 32-bit) development, but that required that developers have the Microsoft linker and platform libraries installed. In the past, that meant installing either Visual Studio or the Microsoft Build Tools package along with the Windows SDK. In recent years, installing Visual Studio Community edition would provide everything necessary. When compiling with -m64 or -m32mscoff, OPTLINK would be ignored in favor of the Microsoft linker.

With the release of DMD 2.079.0, the compiler began shipping with the LLVM linker (LLD), a set of platform libraries derived from those that ship with the MinGW compiler, and a wrapper library for the Visual C++ 2010 runtime. From that point forward, when given -m64 or -m32mscoff on Windows, the compiler would search for the Microsoft installation and, if not found, fallback on the bundled linker and libraries if they were installed. For the first time, DMD had self-contained 64-bit output on Windows. (Interestingly, it’s never been self-contained on other platforms, where DMD relies on the system linker and libraries, the presence of which is a given and not a source of complaint as the dependence on the MS linker has been.)

Now that the new set up has been put through its paces for a while and some of the kinks have been ironed out, DMD 2.087.0 makes it possible to explicitly select the bundled MSCOFF import libraries and LLD linker via the command line switch -mscrtlib=msvcrt100.

Markdown support in Ddoc

Ddoc was originally designed as a macro-based system for documenting source code, but its use has expanded beyond that scenario in the years since. Most sections of the dlang.org website are created with Ddoc, as is the dconf.org site. Ali Çehreli used it to write his book, Programming in D.

Support for Markdown-like syntax has been requested now and again in the forums. Now that’s available in DMD 2.087. It’s currently in preview mode, so it requires the -preview=markdown flag. There are some differences from the other flavors of Markdown you may be familiar with, so be sure to read the list of supported features before putting it to use.

Supporting the development of D

Much of the blood, sweat, and tears that went into this and every release of the compiler was provided by volunteers. They do the work they can within the constraints of their knowledge, skills, experience and, the mother of all limiters, time. There are some tasks that have yet to fit within the bounds of any volunteer’s constraints. These will require dedicated effort to strike off the list.

To that end, I’d like to remind everyone that we are raising money for a Human Resource Fund that we will use to bring in some folks specifically to tackle these difficult tasks. WekaIO seeded the fund with a generous contribution and a few of our community members have thrown some of their own resources into the pot with the gratitude of the D Language Foundation.

But we need more! We still have DConf 2019 t-shirts for anyone who wants to throw $60 at the HR fund, as well as DMan shirts and DConf 2020 registration discounts for those able and willing to donate more. See my recent blog post on the topic for details to make sure you donate to the correct campaign in order to get the shirt you want.

Another reminder: if you want to become a Gold Donor or Personal Sponsor on our Open Collective page (which is separate from the HR Fund – again, read my recent blog post to avoid confusion), please either ensure your email address is included in your profile or contact me directly to let me know who you are. Otherwise, I can’t send you a DMan shirt!

DStep 1.0.0

DStep is a tool for automatically generating D bindings for C and Objective-C libraries. This is implemented by processing C or Objective-C header files and outputting D modules. DStep uses the Clang compiler as a library (libclang) to process the header files.

Background

The first version of DStep was released on the 7th of July, 2012. There have been four subsequent releases, the last of which was on the 16th of January, 2016. Quite a lot has happened in the D world and with DStep since then.

After the release of DStep 0.2.1 in January 2016, there wasn’t much progress on DStep. I had a limited amount of time and chose to spend it on other projects. Fortunately, in 2016, DStep got picked as one of four D-related projects for Google Summer of Code (GSoC). The student who chose to work on DStep was Wojciech Szęszoł. He did a tremendous amount of work and pushed DStep forward by years compared to the time it would have taken me. In fact, I was often a blocker because I couldn’t keep up with reviewing all the changes he made.

New Release

The latest release of DStep contains a huge number of new features and bug fixes. A lot of the new features add support for translating C preprocessor macros in various forms. DStep also gained support for one more platform: Windows. Here follow some of the new features available in DStep 1.0.0:

Support for Simple Defines

This feature adds support for translating a simple form of #define to a manifest constant in D. Example:

#define FOO 1

The above C code is translated to the following D code:

enum FOO = 1;

DStep will try to translate the C code so that the D code looks as much as possible like the original C code. If a #define contains an expression instead of a single literal, DStep will try to preserve the original expression:

#define FOO 1 + 3

Instead of translating this to a manifest constant with the value of 4 (which would be semantically correct), DStep will preserve the original expression and translate it to:

enum FOO = 1 + 3;

This also goes for other types of literals, like hexadecimal literals:

#define FOO 0x1

Here DStep will preserve the hexadecimal literal and translate it to:

enum FOO = 0x1;

Function-Like Macros

DStep is now able to translate function-like macros. This is a pretty advanced feature that requires a small parser for the macros. DStep uses libclang to tokenize the macros and the parses them to be able to do the proper translations. The most basic example looks like:

#define FOO() 0 + 1

The above macro will translate to the following D code:

extern (D) int FOO()
{
    return 0 + 1;
}

Although not shown here (to minimize the examples), DStep will output extern (C): at the top of each file. Therefore, for macros translated to functions, DStep will add extern (D) to give the functions D linkage and mangling.

Here’s an example of a C macro containing parameters:

#define FOO(a, b) a + b

Unfortunately, in C, a and b can be basically anything. D doesn’t have an exact corresponding feature. DStep will translate this as accurately as
possible by outputting a templated function:

extern (D) auto FOO(T0, T1)(auto ref T0 a, auto ref T1 b)
{
    return a + b;
}

The assumption in this translation is that a and b will be a value of some kind of type. They can either be of the same type or of different types. To
avoid copying any of the values, ref parameters are used. Since an rvalue cannot be passed to a ref parameter, auto ref is used instead to properly handle both rvalues and lvalues.

More advanced expressions are supported as well:

#define BAR 4
#define FOO(a, b) a + 3 + (b + BAR) - sizeof(b)

In the above example there’s a combination of parameters, literals, parenthesized expression, usages of other macros, and built-in operators. DStep handles all those and translates it to:

enum BAR = 4;

extern (D) auto FOO(T0, T1)(auto ref T0 a, auto ref T1 b)
{
    return a + 3 + (b + BAR) - b.sizeof;
}

Again, the expression is preserved as closely as possible to the original source code. The parentheses, the reference to the BAR macro, all are preserved.

Token Concatenation

This feature adds support for translating the token concatenation, or token pasting, operator to a D string concatenation:

#define CONCAT(prefix, name) prefix ## name

The above function-like macro concatenates the two given tokens. DStep translates that to a function that converts the arguments to strings and concatenates the two resulting strings. This can then be used together with the string mixin statement to give the same behavior as in C.

extern (D) string CONCAT(T0, T1)(auto ref T0 prefix, auto ref T1 name)
{
    import std.conv : to;

    return to!string(prefix) ~ to!string(name);
}

Another example is parameters combined with tokens:

#define CONCAT(prefix) prefix ## name

This translates similarly to the previous example, but since name is not a parameter this will be translated to a string literal:

extern (D) string CONCAT(T)(auto ref T prefix)
{
    import std.conv : to;

    return to!string(prefix) ~ "name";
}

Preprocessor Constants in Array Sizes

DStep will now preserve preprocessor constants for the size of arrays:

#define Foo 3
int a[Foo];

In previous versions of DStep the translation would just output the size of the array a as 3. This would be semantically accurate but the generated source
code would look less like the original C source code. Now DStep is able to translate preprocessor constants and can, therefore, use the preprocessor constant as the size of the array:

enum Foo = 3;
extern __gshared int[Foo] a;

In the above example, the manifest constant Foo is used as the size of a instead of a plain 3. This more closely matches the original C source code.

Preserving Comments

In previous versions of DStep comments were completely stripped out. With this release DStep is able to preserve comments in the D code from the original C code:

// This comment describes this whole file

// Documentation for the symbol `foo`
void foo();

/* Loose comment */ /* Loose comment */

/*
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
*/ /* Loose comment */

int a; // this is `a`

In the above example there are three types of comments:

  • A header comment for the whole file
  • A preceding comment for the symbol foo
  • Loose comments not belonging to any symbol
  • A trailing comment for the symbol a

All of these comments are now properly preserved:

// This comment describes this whole file

extern (C):

// Documentation for the symbol `foo`
void foo ();

/* Loose comment */ /* Loose comment */

/*
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
*/ /* Loose comment */

extern __gshared int a; // this is `a`

In the above example, notice how the header comment is placed above the extern (C): line. If a module declaration is output in the D file (when the --package flag is used), the header comment will be placed above that as well:

// This comment describes this whole file

module bar.foo;

extern (C):

// Documentation for the symbol `foo`
void foo ();

/* Loose comment */ /* Loose comment */

/*
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
*/ /* Loose comment */

extern __gshared int a; // this is `a`

It’s also possible to disable the preservation of comments using the --comments=false flag.

Package Prefix

To better help organize bindings, DStep supports the
--package <name.of.package> flag. When this flag is enabled DStep will put all the translated modules in the <name> package. Note, this will only add the package prefix to the module declaration of the translated file. It will not place the output file in a directory corresponding to the package.

Removing Excessive Newlines

DStep will now remove excessive newlines but still preserve spacing for the original C code. This is best illustrated with an example:

int a;


int b;

In the above example there are two newlines between the declarations of a and b. DStep will remove the excessive newline and only output one to still preserve the spacing:

extern __gshared int a;

extern __gshared int b;

But if there is no spacing between the declarations, that is respected as well:

int a;
int b;

In the above example there is no newline between the declarations and DStep will preserve that:

extern __gshared int a;
extern __gshared int b;

Preserving Order of Declarations

In previous versions of DStep the declarations in the translated D code would follow a certain order defined by DStep, aliases first, then constants, then types and last functions. With this release, DStep will now preserve the order of the declarations of the original C code:

void bar();

struct Foo
{
    int a;
};

Previous versions would translate the above to:

struct Foo
{
    int a;
}

void bar ();

with the struct first and then the function declaration. With this release, the order is preserved:

void bar ();

struct Foo
{
    int a;
}

Multiple Input Files

Previous versions of DStep only allowed a single header file as input. With this release, multiple files can be passed to DStep at once. Each input file will produce one D source file as input. To pass multiple input files to DStep, just pass the filenames when invoking DStep.

$ dstep foo.h bar.h

Running the above command will produce two D source files: foo.d and bar.d.

If multiple input files and the -o flag are given, the -o flag specifies the output directory where the D source files will be placed. When multiple input
files are given it’s not possible to specify the names of the D source files.

$ dstep foo.h bar.h -o foobar
$ ls foobar
bar.d foo.d

The above command will place the two D source files in the directory foobar.

--reduce-aliases Flag

Normally when DStep translates a header file to a D module it will reduce aliases if possible. DStep contains a set of common typedefs that can be reduced to native D types. That means that code like this:

#include ;
int32_t a = 3;

Will be translated to the following D code:

extern __gshared int a;

In this release of DStep, there’s a new flag, --reduce-aliases. This flag allows the reduce aliases feature to be enabled or disabled. By default it’s enabled, but can be disabled by invoking DStep with the following command: dstep --reduce-aliases=false. When this feature is disabled, it will translate the above example to the following D code:

import core.stdc.stdint;
extern __gshared int32_t a;

It will keep the name int32_t as the type of the variable declaration and add the import for the module that contains the declaration of int32_t.

--alias-enum-members Flag

In C, enum members are accessible directly in the global scope. Example:

enum Foo
{
    foo,
    bar
};

enum Foo a = foo;

In D however, enum members need to be qualified with the enum name. The correct translation of the above would be:

enum Foo
{
    foo = 0,
    bar = 1
}


Foo a = Foo.foo;

In this release of DStep a new flag has been added, --alias-enum-members, that enables the generation of aliases to enum members at module scope. This will allow keeping the translation more closely to the original C code:

enum Foo
{
    foo = 0,
    bar = 1
}

alias foo = Foo.foo;
alias bar = Foo.bar;

By default this feature is not enabled for the translated D code to more closely follow the D conventions.

--translate-macros Flag

In this release, DStep can now translate several kinds of C macros to their equivalent in D. This might not always be desirable because the translations are not bulletproof. Therefore, there’s a new flag, --translate-macros, which will enable or disable the translation of macros. By default translation of macros is enabled.

libclang Bindings

DStep uses Clang as a library to process the C code. There two main ways of using Clang as a library. One is to use the C++ APIs directly. This will give full access to what Clang can do. The problem with this API is that is not a stable API. It’s also a C++ API and when DStep was first implemented, the C++ integration in D was quite lacking. One of my requirements when implementing DStep was to implement it in D. Therefore, the natural choice was to use the C API, called “libclang”, which is also provided. This library is the main interface intended to be used by editors and IDEs that want to leverage Clang as a library. It’s also recommended to use libclang when accessing Clang from a language other than C++—D in my case.

Since libclang is a C library only exposing C header files, I needed bindings to be able to use from it within D. Up until now these bindings were hand made. Fortunately, these headers are the most forgiving I have ever seen when it comes to translating into D. Only a single header file was needed containing hardly any macros at all. It was quite a quick job of translating these headers by using search-and-replace.

With this release of DStep, since DStep has improved so much, the bindings are now self-hosted. That is, DStep has been used to generate the bindings. In addition to that, generation of the bindings has now been added to the test suite to make sure it doesn’t break.

Support for Windows

DStep was originally developed on macOS since that is my main development platform. Thanks to the Posix standard it was easy to port to Linux as well. The first release of DStep was available for both macOS and Linux. Back in 2011 when the development of DStep first started, Clang did not support Windows. There seems to have been some support for MinGW, but that was not a target supported by DMD. The LLVM and Clang team has made huge progress since then and in 2016 when DStep got picked as a GSoC project, support for Windows was available, targeting compatibility with the Visual Studio compiler.

In this release, thanks to Wojciech Szęszoł, DStep is now available on Windows. Due to Clang being compatible with Visual Studio, DStep needs to be built to use the same object format. When compiling with DMD, that means compiling for 64-bit (via the -m64 flag) or using the -m32mscoff flag when compiling for 32 bit. The Dub package automatically takes care of this.

Continuous Integration/Deployment

In the area of CI/CD quite a few things have happened. Originally the test suite of DStep was implemented using Cucumber and Ruby. These tests were a form of end-to-end test and failed to take advantage of the reasons to use Cucumber in the first place. These tests have now been replaced with a combination of unit tests and end-to-end tests, all implemented in D.

LDC

LDC has been added as a supported compiler. That means that DStep is compiled with LDC in addition to DMD as part of the CI pipelines. Every commit and every pull request is now tested with LDC as well.

Upgrade of Compilers

Both DMD and LDC have been upgraded to their latest versions. In addition, beta and nightly releases are being tested in the CI pipelines. A scheduled job has been added to the CI pipelines as well, which will run once every day to make sure new releases of the compilers won’t break DStep even if no changes have been made to DStep. This also means that only the latest versions of LDC and DMD are supported for building DStep.

Testing Windows using AppVeyor

Since DStep now supports Windows as an additional platform a new CI pipeline has been added in the form of AppVeyor. This is a CI service that provides Windows as a platform to run builds on. This build run compiles DStep both using DMD and LDC and it also builds both 32-bit and 64-bit versions.

Complex Floating-Point Types

Another feature that is new in this version of DStep is that complex floating-point types are now supported. There are three complex types that are supported:
float _Complex, double _Complex and long double _Complex.

float _Complex a;
double _Complex b;
long double _Complex c;

The above code snippet in C is translated to the following D code:

extern __gshared cfloat a;
extern __gshared cdouble b;
extern __gshared creal c;

New Alias Syntax

Typedefs in C header files are translated to alias declarations in the D code. Up until this release they used the old alias syntax: alias oldName newName. Since the previous release of DStep the D language has improved and gained new features. One of them is a new (now considered the standard) alias syntax: alias newName = oldName. It’s easier to follow which name is the alias and which is the original when it’s using a more familiar syntax similar to variable declarations. Here’s an example of how the C code is translated into D code with the new alias syntax:

typedef int foo;

The above C code is translated to the following D code:

alias foo = int;

Custom Global Attributes

By default DStep doesn’t add any attributes like @nogc or nothrow to the translated code. In this release of DStep, support for attributes has been added. Custom global attributes can be enabled with the --global-attribute flag. For a C header file with the following content:

int a;

And invoking DStep with the following command:

$ dstep foo.h --global-attribute @nogc --global-attribute nothrow

Will output the following D code:

@nogc:
nothrow:

extern __gshared int a;

Rename Enums

Unlike in D, enums in C don’t create a new scope for their members, even if a name is given to the enum. Example:

enum
{
    a,
    b
};

enum Foo
{
    c,
    d
};

int e = a;
int f = c;

In the above example it’s possible to access the enum members from both of the enums without qualifying the type. In D, this is not the case for named enums. They require the qualifying the enum member with the type name:

enum
{
    a,
    b
}

enum Foo
{
    c,
    d
}

int e = a; // ok since the first enum is anonymous
int f = Foo.c; // need to qualifying the enum member with the type name

To reduce the risk of symbol conflict it’s quite common for C libraries to prefix enum members with the name of the type:

enum Foo
{
    FooC,
    FooD
};

While the following is perfectly fine in D as well, it gets a bit redundant and verbose to have to specify Foo twice:

enum Foo
{
    FooC,
    FooD
}

int c = Foo.FooC;
int d = Foo.FooD;

For this reason DStep now supports a new flag, --rename-enum-members, which when enabled will try to remove any prefix of the enum member names. Given the
following C header file:

enum Foo
{
    FooA,
    FooB
};

And running DStep as follows:

$ dstep foo.h --rename-enum-members

It will produce the following D code:

enum Foo
{
    a = 0,
    b = 1
}

DStep identified the Foo prefix and removed it from the enum member names. It also converted the names to lowercase to better match the standard D naming
conventions.

By default this feature is not enabled to more closely match the original C code.

Normalize Modules

When the --package flag is specified DStep will add a module declaration to all D modules. By default, it will use the name of the input file as the name of the module. In the C world there’s no direct file-naming convention. Some libraries will use all lowercase letters, some will use snake case, some will use camel case, some will use Pascal case, and so on.

The standard D naming convention for modules (and therefore files) is to use only lowercase letters and underscores, i.e snake case. To help with following this convention DStep now supports the new flag --normalize-modules. When this flag is enabled (and the --package flag is used) DStep will try to convert the name of the input file to a name matching the D conventions.

Given a C header file named Foo.h and only invoking DStep with the --package flag:

$ dstep Foo.h --package bar

DStep will produce the following D code:

module bar.Foo;

When the --normalize-modules flag is used as well:

$ dstep Foo.h --package bar --normalize-modules

DStep will output the following D code:

module bar.foo;

Note that Foo has been converted to foo.

Another example with a file using the Pascal naming convention:

$ dstep NSString.h --package bar --normalize-modules

And the result:

module bar.ns_string;

By default this feature is not enabled to more closely match the original C code.

Bit Fields

Another new feature that has been added in this release of DStep is support for bit fields. The bit field is a built-in language construct in C, but there’s no language support for it in D. Fortunately, with the help of D’s metaprogramming capabilities, the bit field has been implemented as a library construct and is available in the standard library [3]. The library construct will generate getters and setters that perform the same bit manipulation that
the C compiler would have generated.

The following snippet in C:

struct Foo
{
    unsigned int a : 1;
    unsigned int b : 2;
    unsigned int c : 5;
};

Is translated to D:

struct Foo
{
    import std.bitmanip : bitfields;

    mixin(bitfields!(
        uint, "a", 1,
        uint, "b", 2,
        uint, "c", 5));
}

The D translation makes use of the bitfields template from the standard library. It’s automatically imported, directly inside the struct, to minimize the scope of where the symbol is available.


In his day job, Jacob Carlborg is a DevOps engineer for Derivco Sweden, but he’s been using D on his own time since 2006. He is the maintainer of numerous open source projects, including DStep, a utility that generates D bindings from C and Objective-C headers, DWT, a port of the Java GUI library SWT, and DVM, the topic of another post on this blog. He implemented native Thread Local Storage support for DMD on OS X and contributed, along with Michel Fortin, to the integration of Objective-C in D.

Project Highlight: DPP

D was designed from the beginning to be ABI compatible with C. Translate the declarations from a C header file into a D module and you can link directly with the corresponding C library or object files. The same is true in the other direction as long as the functions in the D code are annotated with the appropriate linkage attribute. These days, it’s possible to bind with C++ and even Objective-C.

Binding with C is easy, but can sometimes be a bit tedious, particularly when done by hand. I can speak to this personally as I originally implemented the Derelict collection of bindings by hand and, though I slapped together some automation when I ported it all over to its successor project, BindBC, everything there is maintained by hand. Tools like dstep exist and can work well enough, though they come with limitations which require careful attention to and massaging of the output.

Tediousness is an enemy of productivity. That’s why several pages of discussion were generated from Átila Neves’s casual announcement a few weeks before DConf 2018 that it was now possible to #include C headers in D code.

dpp is a compiler wrapper that will parse a D source file with the .dpp extension and expand in place any #include directives it encounters, translating all of the C or C++ symbols to D, and then pass the result to a D compiler (DMD by default). Says Átila:

What motivated the project was a day at Cisco when I wanted to use D but ended up choosing C++ for the task at hand. Why? Because with C++ I could include the relevant header and be on my way, whereas with D (or any other language really) I’d have to somehow translate the header and all its transitive dependencies somehow. I tried dstep and it failed miserably. Then there’s the fact that the preprocessor is nearly always needed to properly use a C API. I wanted to remove one advantage C++ has over D, so I wrote dpp.

Here’s the example he presented in the blog post accompanying the initial announcement:

// stdlib.dpp
#include <stdio.h>
#include <stdlib.h>

void main() {
    printf("Hello world\n".ptr);

    enum numInts = 4;
    auto ints = cast(int*) malloc(int.sizeof * numInts);
    scope(exit) free(ints);

    foreach(int i; 0 .. numInts) {
        ints[i] = i;
        printf("ints[%d]: %d ".ptr, i, ints[i]);
    }

    printf("\n".ptr);
}

Three months later, dpp was successfully compiling the julia.h header allowing the Julia language to be embedded in a D program. The following month, it was enabled by default on run.dlang.io.

C support is fairly solid, though not perfect.

Although preprocessor macro support is one of dpp’s key features, some macros just can’t be translated because they expand to C/C++ code fragments. I can’t parse them because they’re not actual code yet (they only make sense in context), and I can’t guess what the macro parameters are. Strings? Ints? Quoted strings? How does one programmatically determine that #define FOO(S) (S) is meant to be a C cast? Did you know that in C macros can have the same name as functions and it all works? Me neither until I got a bug report!

Push the stdlib.dpp code block from above through run.dlang.io and read the output to see an example of translation difficulties.

The C++ story is more challenging. Átila recently wrote about one of the problems he faced. That one he managed to solve, but others remain.

dpp can’t translate C++ template specialisations on reference types because reference types don’t exist in D. I don’t know how to translate anything that depends on SFINAE because it also doesn’t exist in D.

For those not in the know, classes in D are reference types in the same way that Java classes are reference types, and function parameters annotated with ref accept arguments by reference, but when it comes to variable declarations, D has no equivalent for the C++ lvalue reference declarator, e.g. int& someRef = i;.

Despite the difficulties, Átila persists.

The holy grail is to be able to #include a C++ standard library header, but that’s so difficult that I’m currently concentrating on a much easier problem first: being able to successfully translate C++ headers from a much simpler library that happens to use standard library types (std::string, std::vector, std::map, the usual suspects). The idea there is to treat all types that dpp can’t currently handle as opaque binary blobs, focusing instead on the production library types and member functions. This sounds simple, but in practice I’ve run into issues with LLVM IR with ldc, ABI issues with dmd, mangling issues, and the most fun of all: how to create instances of C++ stdlib types in D to pass back into C++? If a function takes a reference to std::string, how do I give it one? I did find a hacky way to pass D slices to C++ functions though, so that was cool!

On the plus side, he’s found some of D’s features particularly helpful in implementing dpp, though he did say that “this is harder for me to recall since at this point I mostly take D’s advantages for granted.” The first thing that came to mind was a combination of built-in unit tests and token strings:

unittest {
    shouldCompile(
        C(q{ struct Foo { int i; } }),
        D(q{ static assert(is(typeof(Foo.i) == int)); })
    );
}

It’s almost self-explanatory: the first parameter to shouldCompile is C code (a header), and the second D code to be compiled after translating the C header. D’s token strings allow the editor to highlight the code inside, and the fact that C syntax is so similar to D lets me use them on C code as well!

He also found help from D’s contracts and the garbage collector.

libclang is a C library and as such has hardly any abstractions or invariant enforcement. All nodes in the AST are represented by a libclang “cursor”, which can have several “kinds”. D’s contracts allowed me to document and enforce at runtime which kind(s) of cursors a function expects, preventing bugs. Also, libclang in certain places requires the client code to manually manage memory. D’s GC makes for a wrapper API in which that is never a concern.

During development, he exposed some bugs in the DMD frontend.

I tried using sumtype in a separate branch of dpp to first convert libclang AST entities into types that are actually enforced at compile time instead of run time. Unfortunately that caused me to have to switch to compiling all code at once since sumtype behaves differently in separate compilation, triggering previously unseen frontend bugs.

For unit testing, he uses unit-threaded, a library he created to augment D’s built-in unit testing feature with advanced functionality. To achieve this, the library makes use of D’s compile-time reflection features. But dpp has a lot of unit tests.

Given the number of tests I wrote for dpp, compiling takes a very long time. This is exacerbated by -unittest, which is a known issue. Not using unit-threaded’s runner would speed up compilation, but then I’d lose all the features. It’d be better if the compile-time reflection required were made faster.

Perhaps he’ll see some joy there when Stefan Koch’s ongoing work on NewCTFE is completed.

Átila will be speaking about dpp on May 9 as part of his presentation at DConf 2019 in London. The conference runs from May 8–11, so as I write there’s still plenty of time to register. For those who can’t make it, you can watch the livestream (a link to which will be provided in the D forums each day of the conference) or see the videos of all the talks on the D Language Foundation’s YouTube channel after DConf is complete.

DMD 2.085.0 and DConf 2019 News

Coinciding with news of a new release of DMD is news about DConf 2019 in London. From new GC options in DRuntime to free beers and free tours at DConf, we may as well kill two birds with one blog post!

Compiler news

The 2.085.0 release of DMD, the D reference compiler, is now ready for download. Among other things, this release sees support for 32-bit OS X removed, more support for interfacing with Objective-C, and big news on the garbage collection front. There’s also been some work on compatibility with the standard C++ library. In total, 2.085.0 represents 58 closed Bugzilla issues and the efforts of 49 contributors. See the changelog for the full details.

Interfacing with Objective-C

DMD has had limited support for binding with Objective-C for some time. This release expands that support to include classes, instance variables, and super calls.

Previously, binding with Objective-C classes required using D interfaces. No longer. Now, Objective-C classes can be declared directly as D classes. Decorate an Objective-C class with extern(Objective-C), make use of the @selector attribute on the methods, and away you go.

To better facilitate interaction between the two languages, such classes have slightly modified behavior. Any static and final methods in an extern(Objective-C) class are virtual. However, final methods still are forbidden from being overridden by subclasses. static methods are overridable.

extern (Objective-C)
class NSObject
{
    static NSObject alloc() @selector("alloc");
    NSObject init() @selector("init");
    void release() @selector("release");
}

extern (Objective-C)
class Foo : NSObject
{
    override static Foo alloc() @selector("alloc");
    override Foo init() @selector("init");

    int bar(int a) @selector("bar:")
    {
        return a;
    }
}

void main()
{
    auto foo = Foo.alloc.init;
    scope (exit) foo.release();

    assert(foo.bar(3) == 3);
}

It’s also now possible to declare instance variables in Objective-C classes and for a method in an Objective-C class to make a super call.

extern (Objective-C)
class NSObject
{
    void release() @selector("release");
}

extern (Objective-C)
class Foo : NSObject
{
    // instance variable
    int bar;

    int foo() @selector("foo")
    {
        return 3;
    }

    int getBar() @selector("getBar")
    {
        return bar;
    }
}

extern (Objective-C)
class Bar : Foo
{
    static Bar alloc() @selector("alloc");
    Bar init() @selector("init");

    override int foo() @selector("foo")
    {
        // super call
        return super.foo() + 1;
    }
}

New GC stuff

Perhaps the biggest of the GC news items is that DRuntime now ships with a precise garbage collector. This can be enabled on any executable compiled and linked against the latest DRuntime by passing the runtime option --DRT-gcopt=gc:precise. To be clear, this is not a DMD compiler option. Read the documentation on the precise GC for more info.

Another new GC configuration option controls the behavior of the GC at program termination. Currently, the GC runs a collection at termination. This is to present the opportunity to finalize any live objects still holding on to resources that might affect the system state. However, this doesn’t guarantee all objects will be finalized as roots may still exist, nor is the need for it very common, so it’s often just a waste of time. As such, a new cleanup option allows the user of a D program to specify three possible approaches to GC clean up at program termination:

  • collect: the default, current, behavior for backward compatibility
  • none: do no cleanup at all
  • finalize: unconditionally finalize all live objects

This can be passed on the command line of a program compiled and linked against DRuntime as, e.g. --DRT-gcopt=cleanup:finalize.

All DRuntime options, including the two new ones, can be set up in code rather than being passed on the command line by declaring and initializing an array of strings called rt_options. It must be both extern(C) and __gshared:

extern(C) __gshared string[] rt_options = ["gcopt=gc:precise cleanup:none"];

See the documentation on configuring the garbage collector for more GC options.

Additional GC-related enhancements include programmatic access to GC profiling statistics and a new GC registry that allows user-supplied GCs to be linked with the runtime (see the documentation for details).

Standard C++

There are two enhancements to D’s C++ interface in this release. The first is found in the new DRuntime module, core.stdcpp.new_. This provides access to the global C++ new and delete operators so that D programs can allocate from the C++ heap. The second is the new core.stdcpp.allocator module, which exposes the std::allocator<T> class of C++ as a foundation for binding to the STL container types that allocate.

DConf 2019 news

There are two interesting perks for conference goers this year.

The nightly gathering spot

We now have an “official” gathering spot. Usually at DConf, we pick an “official” hotel where the D Language Foundation folks and many attendees stay, but where a number of conference goers gather in the evenings after dinner. This year, a number of factors made it difficult to pick a reasonable spot, so we opted for something different.

There’s a cozy little pub around the corner from the venue, the Prince Arthur, that has a nice room on the second floor available for reservation. There’s a limit on how many bodies we can pack in there at once, but folks generally come and go throughout the evening anyway. Any overflow can head downstairs to the public area. We’ve got the room May 8, 9, and 10.

Additionally, we’ll be offering a couple of free rounds each night courtesy of Mercedes-Benz Research & Development North America. Free drinks in a cozy backstreet London pub sounds like a great way to pass the time!

Check out the DConf venue page for details about the Prince Arthur and how to get there.

A free tour by Joolz Guides

Julian McDonnell of Joolz Guides will be taking some DConf registrants on a guided walk May 6 and 7. If you’ve never seen his YouTube channel, we recommend it. His video guides are quirky, informative, and fun.

This is available for free to all registrants, but space is limited! When you register for DConf, you’ll receive information on how to reserve your spot. We’ve arranged to have the tours in the mid-afternoon on both days so that folks arriving in the morning will have a chance to participate. This is a walking tour that will last somewhere between 3 – 4 hours, so be sure to wear comfortable shoes.

The current plan is to start at Temple Station and end at London Bridge. We hope you join us!

Deadlines

The DConf submission deadline of March 10 is just around the corner. Now’s the time to send in your proposal for a talk, demo, panel or research report. See the DConf 2019 homepage for submission guidelines and selection criteria. And remember, speakers are eligible for reimbursement of travel and accommodation expenses.

The early-bird registration deadline of March 17 is fast approaching. Register now to take advantage of the 15% discount!

Project Highlight: Spasm

In 2014, Sebastiaan Koppe was working on a React project. The app’s target market included three-year-old mobile phones. He encountered some performance issues that, after investigating, he discovered weren’t attributable solely to the mobile platform.

It all became clear to me once I saw the flame graph. There were so many function calls! We managed to fix the performance issues by being a little smarter about updates and redraws, but the sight of that flame graph never quite left me. It made me wonder if things couldn’t be done more efficiently.

As time went on, Sebastiaan gained the insight that a UI is like a state machine.

Interacting with the UI moves you from one state to the next, but the total set of states and the rules governing the UI are all static. In fact, we require them to be static. Each time a user clicks a button, we want it to respond the same way. We want the same input validations to run each time a user submits a form.

So, if everything is static and defined up-front, why do all the javascript UI frameworks work so hard during runtime to figure out exactly what you want rendered? Would it not be more efficient to figure that out before you send the code to the browsers?

This led him to the idea of analyzing UI definitions to create an optimal UI renderer, but he was unable to act on it at the time. Then in 2018, native-language DOM frameworks targeting WebAsm, like asm-dom for C++ and Percy for Rust, came to his attention. Around the same time, the announcement of Vladimir Panteleev’s dscripten-tools introduced him to Sebastien Alaiwan’s older dscripten project. The former is an alternative build toolchain for the latter, which is an example of compiling D to asm.js via Emscripten. Here he saw an opportunity to revisit his old idea using D.

D’s static introspection gave me the tools to create render code at compile time, bypassing the need for a virtual DOM. The biggest challenge was to map existing UI declarations and patterns to plain D code in such a way that static introspection can be used to extract all of the information necessary for generating the rendering code.

One thing he really wanted to avoid was the need to embed React’s Javascript extension, JSX, in the D code, as that would require the creation of a compile-time parser. Instead, he decided to leverage the D compiler.

For the UI declarations, I ended up at a design where every HTML node is represented by a D struct and all the node’s attributes and properties are members of that struct. Combined with annotations, it gives enough information to generate optimal render code. With that, I implemented the famous todo-mvc application. The end result was quite satisfying. The actual source code was on par or shorter than most javascript implementations, the compiled code was only 60kB after gzip, and rendering various stages in the todo app took less than 2ms.

He announced his work on the D forums in September of 2018 (the live demo is still active as I write).

Unfortunately, he wasn’t satisfied with the amount of effort involved to get the end result. It required using the LLVM-based D compiler, LDC, to compile D to LLVM IR, then using Emscripten to produce asm.js, and finally using binaryen to compile that into WebAssembly. On top of that…

…you needed a patched version of LLVM to generate the asm.js, which required a patched LDC. Compiling all those dependencies takes a long time. Not anything I expect end users to willfully subject themselves to. They just want a compiler that’s easy to install and that just works.

As it happened, the 1.11.0 release of LDC in August 2018 actually had rudimentary support for WebAssembly baked in. Sebastiaan started rewriting the todo app and his Javascript glue code to use LDC’s new WebAssembly target. In doing so, he lost Emsripten’s bundled musl libc, so he switched his D code to use -betterC mode, which eliminates D’s dependency on DRuntime and, in turn, the C standard library.

With that, he had the easy-to-install-and-use package he wanted and was able to get the todo-mvc binary down to 5kb after gzip. When he announced this news in the D forums, he was led down a new path.

Someone asked about WebGL. That got me motivated to think about creating bindings to the APIs of the browser. That same night I did a search and found underrun, an entry in the 2018 js13k competition. I decided to port it to D and use it to figure out how to write bindings to WebGL and WebAudio.

He created the bindings by hand, but later he discovered WebIDL. After the underrun port was complete, he started work on using WebIDL to generate bindings.

It formed very quickly over the course of 2–3 months. The hardest part was mapping every feature of WebIDL to D, and at the same time figuring out how to move data, objects and callbacks between D and Javascript. All kinds of hard choices came up. Do I support undefined? Or optional types? Or union types? What about the “any” type? The answer is yes, all are supported.

He’s been happy with the result. D bindings to web APIs are included in Spasm and they follow the Javascript API as much as possible. A post-compile step is used to generate Javascript glue code.

It runs fairly quickly and generates only the Javascript glue code you actually need. It does that by collecting imported functions from the generated WebAssembly binary, cross-referencing those to functions to WebIDL definitions and then generating Javascript code for those.

The result of all this is Spasm, a library for developing single-page WebAssembly applications in D. The latest version is always available in the DUB repository.

I can’t wait to start working on hot module replacement with Spasm. Many Javascript frameworks provide it out-of-the box and it is really valuable when developing. Server-side rendering is also something that just wants to get written. While Spasm doesn’t need it so much to reduce page load times, it is necessary when you want to unittest HTML output or do SEO.

At the moment, his efforts are directed toward creating a set of basic material components. He’s had a hard time getting something together that works in plain D, and at one point considered abandoning the effort and working instead on a declarative UI language that compiles to D, but ultimately he persisted and will be announcing the project soon.

After the material project there are still plenty of challenges. The biggest thing I have been postponing is memory management. Right now the allocator in Spasm is a simple bump-the-pointer allocator. The memory for the WebAssembly instance in the browser is hardcoded to 16mb and when it’s full, it’s full. I could grow the memory of course, but I really need a way to reuse memory. Without help from the compiler – like Rust has – that either means manual memory management or a GC.

One way to solve the problem would be to port DRuntime to WebAssembly, something he says he’s considered “a few times” already.

At least the parts that I need. But so far the GC has always been an issue. In WebAssembly, memory is linear and starts at 0. When you combine that with an imprecise GC, suddenly everything looks like a pointer and, as a consequence, it won’t free any memory. Recently someone wrote a precise GC implementation. So that is definitely back on the table.

He’s also excited that he recently ran WebAssembly generated from D on Cloudflare workers.

The environment is different from a browser, but its the same in many ways. This is all very exciting and lots of possibilities will emerge for D. In part because you can generate WebAssembly binaries that are pretty lean and mean.

We’re pretty excited about the work Sebastiaan is doing and can’t wait to see where it goes. Keep an eye on the Dlang Newsfeed (@dlang_ng) on Twitter, or the official D Twitter feed (@D_Programming) to learn about future Spasm announcements.

DMD 2.083.0 Released

Version 2.083.0 of DMD, the D reference compiler, is ready for download. The changelog lists 47 fixes and enhancements across all components of the release package. Notable among them are some C++ compatibility enhancements and some compile-time love.

C++ compatibility

D’s support for linking to C++ binaries has been evolving and improving with nearly every release. This time, the new things aren’t very dramatic, but still very welcome to those who work with both C++ and D in the same code base.

What’s my runtime?

For a while now, D has had predefined version identifiers for user code to detect the C runtime implementation at compile time. These are:

  • CRuntime_Bionic
  • CRuntime_DigitalMars
  • CRuntime_Glibc
  • CRuntime_Microsoft
  • CRuntime_Musl
  • CRuntime_UClibc

These aren’t reliable when linking against C++ code. Where the C runtime in use often depends on the system, the C++ runtime is compiler-specific. To remedy that, 2.083.0 introduces a few new predefined versions:

  • CppRuntime_Clang
  • CppRuntime_DigitalMars
  • CppRuntime_Gcc
  • CppRuntime_Microsoft
  • CppRuntime_Sun

Why so much conflict?

C++ support also gets a new syntax for declaring C++ linkage, which affects how a symbol is mangled. Consider a C++ library, MyLib, which uses the namespace mylib. The original syntax for binding a function in that namespace looks like this:

/*
 The original C++:
 namespace mylib { void cppFunc(); }
*/
// The D declaration
extern(C++, mylib) void cppFunc();

This declares that cppFunc has C++ linkage (the symbol is mangled in a manner specific to the C++ compiler) and that the symbol belongs to the C++ namespace mylib . On the D side, the function can be referred to either as cppFunc or as mylib.cppFunc.

In practice, this approach creates opportunities for conflict when a namespace is the same as a D keyword. It also has an impact on how one approaches the organization of a binding.  It’s natural to want to name the root package in D mylib, as it matches the library name and it is a D convention to name modules and packages using lowercase. In that case, extern(C++, mylib) declarations will not be compilable anywhere in the mylib package because the symbols conflict.

To alleviate the problem, an alternative syntax was proposed using strings to declare the namespaces in the linkage attribute, rather than identifiers:

/*
 The original C++:
 namespace foo { void cppFunc(); }
*/
// The D declaration
extern(C++, "foo") void cppFunc();

With this syntax, no mylib symbol is created on the D side; it is used solely for name mangling. No more conflicts with keywords, and D packages can be used to match the C++ namespaces on the D side. The old syntax isn’t going away anytime soon, though.

New compile-time things

This release provides two new built-in traits for more compile-time reflection options. Like all built-in traits, they are accessible via the __traits expression. There’s also a new pragma that lets you bring some linker options into the source code in a very specific circumstance.

Are you a zero?

isZeroInit can be used to determine if the default initializer of a given type is 0, or more specifically, it evaluates to true if all of the init value’s bits are zero. The example below uses compile-time asserts to verify the zeroness and nonzeroness of a few default init values, but I’ve saved a version that prints the results at runtime, for more immediate feedback, and can be compiled and run from the browser.

struct ImaZero {
    int x;
}

struct ImaNonZero {
    int x = 10;
}

// double.init == double.nan
static assert(!__traits(isZeroInit, double));

// int.init == 0
static assert(__traits(isZeroInit, int));

// ImaZero.init == ImaZero(0)
static assert(__traits(isZeroInit, ImaZero));

// ImaNonZeror.init == ImaZero(10)
static assert(!__traits(isZeroInit, ImaNonZero));

Computer, query target.

The second new trait is getTargetInfo, which allows compile-time queries about the target platform. The argument is a string that serves as a key, and the result is “an expression describing the requested target information”. Currently supported strings are “cppRuntimeLibrary”, “floatAbi”, and “ObjectFormat”.

The following prints all three at compile time.

pragma(msg, __traits(getTargetInfo, "cppRuntimeLibrary"));
pragma(msg, __traits(getTargetInfo, "floatAbi"));
pragma(msg, __traits(getTargetInfo, "objectFormat"));

On Windows, using the default (-m32) DigitalMars toolchain, I see this:

snn
hard
omf

With the Microsoft Build Tools (via VS 2017), compiling with -m64 and -m32mscoff, I see this:

libcmt
hard
coff

Yo! Linker! Take care of this, will ya?

D has long supported a lib pragma, allowing programmers to tell the compiler to pass a given library to the linker in source code rather than on the command line. Now, there’s a new pragma in town that let’s the programmer specify specific linker commands in source code and behaves rather differently. Meet the linkerDirective pragma:

pragma(linkerDirective, "/FAILIFMISMATCH:_ITERATOR_DEBUG_LEVEL=2");

The behavior is specified as “Implementation Defined”. The current implementation is specced to behave like so:

  • The string literal specifies a linker directive to be embedded in the generated object file.
  • Linker directives are only supported for MS-COFF output.

Just to make sure you didn’t gloss over the first list item, look at it again. The linker directive is not passed by the compiler to the linker, but emitted to the object file. Since it is only supported for MS-COFF, that means its only a thing for folks on Windows when they are compiling with -m64 or -m32mscoff. And some say the D community doesn’t care about Windows!

Of course there’s more!

The above are just a few cherries I picked from the list. For a deeper dive, see the full changelog. And head over to the Downloads page to get the installer for your platform. It looks a lot nicer than the boring list of files linked in the changelog.

DMD 2.082.0 Released

DMD 2.082.0 was released over the weekend. There were 28 major changes and 76 closed Bugzilla issues in this release, including some very welcome improvements in the toolchain. Head over to the download page to pick up the official package for your platform and visit the changelog for the details.

Tooling improvements

While there were several improvements and fixes to the compiler, standard library, and runtime in this release, there were some seemingly innocuous quality-of-life changes to the tooling that are sure to be greeted with more enthusiasm.

DUB gets dubbier

DUB, the build tool and package manager for D that ships with DMD, received a number  of enhancements, including better dependency resolution, variable support in the build settings, and improved environment variable expansion.

Arguably the most welcome change will be the removal of the regular update check. Previously, DUB would check for dependency updates once a day before starting a project build. If there was no internet connection, or if there were any errors in dependency resolution, the process could hang for some time. With the removal of the daily check, upgrades will only occur when running dub upgrade in a project directory. Add to that the brand new --dry-run flag to get a list of any upgradable dependencies without executing the upgrades.

Signed binaries for Windows

For quite some time users of DMD on Windows have had the annoyance of seeing a warning from Windows Smartscreen when running the installer, and the occasional false positive from AntiVirus software when running DMD.

Now those in the Windows D camp can do a little victory dance, as all of the binaries in the distribution, including the installer, are signed with the D Language Foundation’s new code signing certificate. This is one more quality-of-life issue that can finally be laid to rest. On a side note, the cost of the certificate was the first expense entered into our Open Collective page.

Compiler and libraries

Many of the changes and updates in the compiler and library department are unlikely to compel anyone to shout from the rooftops, but a handful are nonetheless notable.

The compiler

One such is an expansion of the User-Defined Attribute syntax. Previously, these were only allowed on declarations. Now, they can be applied to function parameters:

// Previously, it was illegal to attach a UDA to a function parameter
void example(@(22) string param)
{
    // It's always been legal to attach UDAs to type, variable, and function declarations.
    @(11) string var;
    pragma(msg, [__traits(getAttributes, var)] == [11]);
    pragma(msg, [__traits(getAttributes, param)] == [22]);
}

Run this example online

The same goes for enum members (it’s not explicitly listed in the highlights at the top of the changelog, but is mentioned in the bugfix list):

enum Foo {
@(10) one,
@(20) two,
}

void main()
{
pragma(msg, [__traits(getAttributes, Foo.one)] == [10]);
pragma(msg, [__traits(getAttributes, Foo.two)] == [20]);
}

Run this example online

The DasBetterC subset of D is enhanced in this release with some improvements. It’s now possible to use array literals in initializers. Previously, array literals required the use of TypeInfo, which is part of DRuntime and therefore unavailable in -betterC mode. Moreover, comparing arrays of structs is now supported and comparing arrays of byte-sized types should no longer generate any linker errrors.

import core.stdc.stdio;
struct Sint
{
    int x;
    this(int v) { x = v;}
}

extern(C) void main()
{
    // No more TypeInfo error in this initializer
    Sint[6] a1 = [Sint(1), Sint(2), Sint(3), Sint(1), Sint(2), Sint(3)];
    foreach(si; a1) printf("%i\n", si.x);

    // Arrays/slices of structs can now be compared
    assert(a1[0..3] == a1[3..$]);

    // No more linker error when comparing strings, either explicitly
    // or implicitly such as in a switch.
    auto s = "abc";
    switch(s)
    {
        case "abc":
            puts("Got a match!");
            break;
        default:
            break;
    }

    // And the same goes for any byte-sized type
    char[6] a = [1,2,3,1,2,3];
    assert(a[0..3] >= a[3..$]);

    puts("All the asserts passed!");
}

Run this example online

DRuntime

Another quality-of-life fix, this one touching on the debugging experience, is a new run-time flag that can be passed to any D program compiled against the 2.082 release of the runtime or later, --DRT-trapException=0. This allows exception trapping to be disabled from the command line.

Previously, this was supported only via a global variable, rt_trapExceptions. To disable exception trapping, this variable had to be set to false before DRuntime gained control of execution, which meant implementing your own extern(C) main and calling _d_run_main to manually initialize DRuntime which, in turn, would run the normal D main—all of which is demonstrated in the Tip of the Week from the August 7, 2016, edition of This Week in D (you’ll also find there a nice explanation of why you might want to disable this feature. HINT: running in your debugger). A command-line flag is sooo much simpler, no?

Phobos

The std.array module has long had an array function that can be used to create a dynamic array from any finite range. With this release, the module gains a staticArray function that can do the same for static arrays, though it’s limited to input ranges (which includes other arrays). When the length of a range is not knowable at compile time, it must be passed as a template argument. Otherwise, the range itself can be passed as a template argument.

import std.stdio;
void main()
{
    import std.range : iota;
    import std.array : staticArray;

    auto input = 3.iota;
    auto a = input.staticArray!2;
    pragma(msg, is(typeof(a) == int[2]));
    writeln(a);
    auto b = input.staticArray!(long[4]);
    pragma(msg, is(typeof(b) == long[4]));
    writeln(b);
}

Run this example online

September pumpkin spice

Participation in the #dbugfix campaign for this cycle was, like last cycle, rather dismal. Even so, I’ll have an update on that topic later this month in a post of its own.

Three of eight applicants were selected for the Symmetry Autumn of Code, which officially kicked off on September 1. Stay tuned here for a post on that topic as well.

The blog has been quiet for a few weeks, but the gears are slowly and squeakily starting to grind again. Other posts lined up for this month include the next long-overdue installment in the GC Series and the launch of a new ‘D in Production’ profile.

DMD 2.081.0 Released

DMD 2.081.0 is now ready for download. Things that stand out in this release are a few deprecations, the implementation of a recently approved DIP (D Improvement Proposal), and quite a bit of work on C++ compatibility. Be sure to check the changelog for details.

Improving C++ interoperability

D has had binary compatibility with C from the beginning not only because it made sense, but also because it was relatively easy to implement. C++ is a different beast. Its larger-than-C feature set and the differences between it and D introduce complexities that make implementing binary compatibility across all supported platforms a challenge to get right. Because of this, D’s extern(C++) feature has been considered a work in progress since its initial inception.

DMD 2.081.0 brings several improvements to the D <-> C++ story, mostly in the form of name mangling bug fixes and improvements. The mangling of constructors and destructors in extern(C++) now properly match the C++ side, as does that of most of D’s operator overloads (where they are semantically equivalent to C++).

Proper mangling of nullptr_t is implemented now as well. On the D side, use typeof(null):

alias nullptr_t = typeof(null);
extern(C++) void fun(nullptr_t);

The alias in the example is not required, but may help with usability and readability when interfacing with C++. As typing null everywhere is likely reflexive for most D programmers, nullptr_t may be easier to keep in mind than typeof(null) for those special C++ cases.

Most of the D operator overloads in an extern(C++) class will now correctly mangle. This means it’s now possible to bind to operator overloads in C++ classes using the standard D opBinary, opUnary, etc. The exceptions are opCmp, which has no compatible C++ implementation, and the C++ operator!, which has no compatible D implementation.

In addition to name mangling improvements, a nasty bug where extern(C++) destructors were being placed incorrectly in the C++ virtual table has been fixed, and extern(C++) constructors and destructors now semantically match C++. This means mixed-language class hierarchies are now possible and you can now pass extern(C++) classes to object.destroy when you’re done with them.

Indirectly related,  __traits(getLinkage, ...) has been updated to now tell you the ABI with which a struct, class, or interface has been declared, so you can now filter out your extern(C++) aggregates from those which are extern(D) and extern(Objective-C).

The following shows some of the new features in action. First, the C++ class:

#include <iostream>
class CClass {
private:
    int _val;
public:
    CClass(int v) : _val(v) {}
    virtual ~CClass() { std::cout << "Goodbye #" << _val << std::endl; }
    virtual int getVal() { return _val; }
    CClass* operator+(CClass const * const rhs);
};

CClass* CClass::operator+(CClass const * const rhs) {
    return new CClass(_val + rhs->_val);
}

And now the D side:

extern(C++) class CClass
{
    private int _val;
    this(int);
    ~this();
    int getVal();
    CClass opBinary(string op : "+")(const CClass foo);
}

class DClass : CClass
{
    this(int v)
    {
        super(v);
    }
    ~this() {}
    override extern(C++) int getVal() { return super.getVal() + 10; }
}

void main()
{
    import std.stdio : writeln;

    writeln("CClass linkage: ", __traits(getLinkage, CClass));
    writeln("DClass linkage: ", __traits(getLinkage, DClass));

    DClass clazz1 = new DClass(5);
    scope(exit) destroy(clazz1);
    writeln("clazz1._val: ", clazz1.getVal());

    DClass clazz2 = new DClass(6);
    scope(exit) destroy(clazz2);
    writeln("clazz2._val: ", clazz2.getVal());

    CClass clazz3 = clazz1 + clazz2;
    scope(exit) destroy(clazz3);
    writeln("clazz3._val: ", clazz3.getVal);
}

Compile the C++ class to an object file with your C++ compiler, then pass the object file to DMD on the command line with the D source module and Bob’s your uncle (just make sure on Windows to pass -m64 or -m32mscoff to dmd if you compile the C++ file with the 64-bit or 32-bit Microsoft Build Tools, respectively).

This is still a work in progress and users diving into the deep end with C++ and D are bound to hit shallow spots more frequently than they would like, but this release marks a major leap forward in C++ interoperability.

DIP 1009

Given the amount of time DIP 1009 spent crawling through the DIP review process, it was a big relief for all involved when it was finally approved. The DIP proposed a new syntax for contracts in D. For the uninitiated, the old syntax looked like this:

int fun(ref int a, int b)
in
{
    // Preconditions
    assert(a > 0);
    assert(b >= 0, "b cannot be negative");
}
out(result) // (result) is optional if you don't need to test it
{
    // Postconditions
    assert(result > 0, "returned result must be positive");
    assert(a != 0);
}
do
{
 	// The function body
    a += b;
    return b * 100;
}

Thanks to DIP 1009, starting in DMD 2.081.0 you can do all of that more concisely with the new expression-based contract syntax:

int fun(ref int a, int b)
    in(a > 0)
    in(b >= 0, "b cannot be negative")
    out(result; result > 0, "returned result must be positive")
    out(; a != 0)
{
    a += b;
    return b * 100;
}

Note that result is optional in both the old and new out contract syntaxes and can be given any name. Also note that the old syntax will continue to work.

Deprecations

There’s not much information to add here beyond what’s already in the changelog, but these are cases that users should be aware of:

The #dbugfix campaign

The inaugural #dbugfix round prior to the release of DMD 2.080 was a success, but Round 2 has been much, much quieter (few nominations, very little discussion, and no votes).

One of the two nominated bugs selected from Round 1 was issue #18068. It was fixed and merged into the new 2.081.0 release. The second bug selected was issue #15984, which has not yet been fixed.

In Round 2, the following bugs were nominated with one vote each:

I’ll hand this list off to our team of bug fixing volunteers and hope there’s something here they can tackle.

Round 3 of the #dbugfix campaign is on now. Please nominate the bugs you want to see fixed! Create a thread in the General Forum with #dbugfix and the issue number in the title, or send out a tweet containing #dbugfix and the issue number. I’ll tally them up at the end of the cycle (September 28).

And please, if you do use the #dbugfix in a tweet, remember that it’s intended for nominating bugs you want fixed and not for bringing attention to your pull requests!

How an Engineering Company Chose to Migrate to D

Bastiaan Veelo is the lead developer of a specialised program for the
computer aided geometric design of ship hulls called Fairway, for the
company SARC in the Netherlands.


Imagine there is this little-known programming language in which you enjoy programming in your free time. You know it is ready for prime time and you dream about using it at work everyday. This is the story about how I made a dream like that come true.

My early acquaintance with D

Back when “google” was not yet a common verb, I was doing a web search for “parsing C++”. The reason was that writing a report for an assignment had derailed into writing a syntax highlighter for noweb using bison and flex, and I found out firsthand that C++ is not easy to parse. That web search brought up this page (present version) with an overview of the D Programming Language, and the following statement has had me hooked ever since:

D’s lexical analyzer and parser are totally independent of each other and of the semantic analyzer. This means it is easy to write simple tools to manipulate D source perfectly without having to build a full compiler. It also means that source code can be transmitted in tokenized form for specialized applications.

“Genius,” I thought, “here we have someone who knows what he’s doing.” This is representative of the pragmatic professionalism that still radiates from the D community, and it combines with an unpretentious flair that makes it pleasant to be around. This funny quote decorated its homepage for many years:

“Great, just what I need.. another D in programming.” – Segfault

Nevertheless, I didn’t have many opportunities to use the language and I largely remained sitting on the fence, observing its development.

Programming professionally

With mostly academic programming experience, I started programming professionally in 2006 for SARC, a Dutch engineering company serving the maritime industry. Since the early ’80s they have been developing software for ship design and onboard loading calculations, which today amounts to roughly half a million lines of code. I think their success can partly be attributed to their choice of programming language: Extended Pascal (the ISO 10206 standard, not one of the many proprietary extensions of Pascal).

Extended Pascal was a great improvement over ISO 7185 Pascal. Its compiler, by Prospero Software from England, was fast and well documented. The language is small enough and its syntax appropriately verbose to make engineering professionals quickly productive in programming. Personally though, I spent most of my time programming in C++, modernizing their system for computer aided design of ship hulls using Qt and Coin3D.

When your company outlives a programming language

Although selecting an ISO standard in favor of a proprietary Pascal dialect seemed wise at the time, it is apparent now that the company has outlived the language. Prospero Development Software Ltd was officially dissolved 15 years ago. Still, its former director, Tony Hetherington, continued giving support many years after, but he’d be close to 86 years old now and can no longer be reached. Its website is gone, last archived in 2013. There’s GNU Pascal, which also supports ISO 10206, but that project has stopped moving and long ago lost synchrony with gcc. Although there is no immediate crisis, it is clear that something needs to happen sometime if the company wants to continue its activities in the coming decades.

Changing the odds

A couple of years ago, I secretly started playing with the fantasy of replacing Extended Pascal with D. Even though D’s syntax is somewhat different from Pascal, it shares at least four important similarities: support for nested functions, boundary checking, modules, and compilation speed. In addition, it has many traits that make the language attractive to engineers: good focus on performance and numerics, garbage collection, dynamic arrays, easy parallelization, understandable templates, contract programming, memory safety, unit tests, and even wysiwyg strings and formatted numerals. D’s language features encourage experimentation, which resonates well with engineers.

So I wondered what I could do to highlight D’s significance to my employer and show it’s an attractive language to switch to. I thought I could make a compelling case if I could write a parser in D that would take Extended Pascal source and transpile it to D source. At least I would have fun trying!

So I went over to code.dlang.org to see if there were any D alternatives to flex and bison. There, I found Pegged, and instantly the fun began. Pegged combines the functionality of flex and bison in one incredibly easy to use package, for which its creator Philippe Sigaud obviously enjoyed writing excellent documentation. Nowadays, Pegged is part of the D language tour and you can try it out on-line without having to install a thing. The beauty is that the grammar from the Extended Pascal language specification maps almost linearly to the PEG from which Pegged generates the parser. For this it makes heavy use of D’s generic programming capabilities and compile-time function evaluation — it can generate a parser at compile time if you want it to!

However, it wasn’t smooth sailing all along. As I was testing D, I suddenly found myself being tested as well. I learned the hard way that there is a phenomenon called left-recursion, from which a PEG parser typically cannot break out of. And the Extended Pascal grammar is left-recursive in several ways. Consequently, I spent many evenings and weekends researching parsing theory, until eventually I managed to extend Pegged with support for all kinds of left-recursion! From one thing came another, and I added longest match alternation, case insensitive literals, the toHTML() method for dynamically browsing the syntax tree, and a tracer for logging the parsing process.

Obviously, I was having fun. But more importantly, I was demonstrating that the D programming language is accessible enough that a naval architect can understand other people’s code and expand it in non-trivial ways. The icing on the cake came when I was asked to present my experiences at DConf 2017 in Berlin, which you can watch here (and here’s the extra bit I presented at lunch time for the livestream audience).

At this time, I was able to automatically translate the following trivial example:

program hello(output);

begin
    writeln('Hello D''s "World"!');
end.

into D:

import std.stdio;

// Program name: hello
void main(string[] args)
{
    writeln("Hello D's \"World\"!");
}

Language competition

Having come this far, the founder of SARC agreed that it was time to investigate the merits of various alternative programming languages. We would do a thorough and objective comparison based on trial translations of a comprehensive set of language features. Due to the amount of manual labor that this requires, we had to drastically prune the space of programming languages in an initial review round. Note that what I am about to present does not declare which programming language is the best in our industry. What we are looking for is a language that allows an efficient transition from Extended Pascal without interrupting our business, and which enables us to take advantage of modern insights and tools.

In the initial review round we looked at general language characteristics. Here I’ll just highlight what fell through the sieve and why.

Performance is important to us, which is why we did not consider interpreted languages. C++ is in use for one component of our software, but that was written from the ground up. We feel that the options for translation are not favorable, that its long compile times are a serious hindrance to productivity, and that there are too many ways in which one can shoot one’s self in the foot. We cannot require our expert naval architects to also become experts in C++.

Nowadays, whenever D is publicly evaluated, the younger languages Go and Rust are often brought up as alternatives. Here, we need not go into an in-depth comparison of these languages because both Rust and Go lack one feature that we rely on heavily: nested functions with access to variables in their enclosing scope. Solutions for eliminating nested functions, like bringing them into global scope and passing extra variables, or breaking files up into smaller modules, we find unattractive because it would complicate automated translation, and we’d like to preserve the structure and style of our current code. GNU C does offer nested functions, but it is a non-standard extension and it has been predicted that many will move away from C due to its unsafe features. After this initial pruning, three languages remained on our shortlist: Free Pascal, Ada and D.

As a basis for our detailed comparison, we wrote fifteen small programs that each used a specific feature of Extended Pascal that is important in our current code base. We then translated those programs into each language on our shortlist. We kept a simple score board on how well these features were represented in each language: +1 if the feature is supported or can be implemented, 0 if the lack of the feature can be worked around, and -1 if it can’t. This is what came out of that evaluation:

Test Free Pascal Ada D
Arrays beginning at arbitrary indices +1 +1 +1
Sets 0 0 +1
Schema types 0 0 +1
Types with custom initial values -1 0 +1
Classes +1 +1 +1
Casts +1 +1 +1
Protection against use of dangling pointers -1 +1 +1
Thread safe memory [de]allocation +1 +1 +1
Calling into Windows API +1 +1 +1
Forwarding Windows callbacks to nested functions +1 +1 +1
Speed of calculations +1 +1 +1
Calling procedures written in assembly +1 0 +1
Calling procedures in a DLL +1 +1 +1
Binary compatibility of strings 0 +1 +1
Binary compatible file i/o -1 0 0
Score 6 10 14

So, Free Pascal is the only candidate with negative scores, Ada positions itself in the middle, and D achieves an almost perfect score. Not effortlessly, though; we’ll talk about some of the technical challenges later. Because Free Pascal is, like D, fully Open Source and written in itself, extending the language and filling in the gaps is theoretically possible. Although some of its deficiencies could certainly be resolved that way, others would be quite complicated and/or unlikely to be accepted upstream.

We also estimated the productivity of the languages. Free Pascal scored high because it is closest to what we are used to. Despite its dissimilar syntax, D scored high because of its expressiveness and flexibility. Ada scored lowest because of its rigidity and because of the extra work the programmer has to put in (most importantly casts and conversions). Ada is more verbose than Pascal which was disliked by some of us because it can somewhat obscure the essence of what a piece of code tries to express, and frequently the code became not only verbose but cryptic, which was unanimously disliked.

Third, we estimated the future prospects and the advantages each language could bring to the table. Although Free Pascal has a more active community than we expected it to have, we do not see great potential for growth. Ada is renowned for its support for writing reliable code (although it has no monopoly in that field) but it does come at a cost and requires real effort. D has a dynamic and open community, supports both script-like productivity and high performance, includes various features for writing reliable software (approaching Ada but at a much lower cost), and offers some unique advanced features with which wonders can be accomplished.

Finally, we estimated the effort of translation. Although Free Pascal is very similar to Extended Pascal, missing features pose a real problem and would require a high degree of manual translation and rewriting. Although p2ada exists, it only works partially in our case and does not fully support Extended Pascal. Because Ada frequently requires additional code (casting to the correct type, pulling in a package, instantiating a generic type, adding a pragma, splitting up Put_Lines etc.), writing or extending a reliable transpiler into Ada would be more difficult than doing the same into D.

Selecting a winner

I gave away the winner in the title, but we landed at that conclusion as follows. Ada was the first language to be dropped. We really felt that the extra work that the programmer has to put in is a brake on productivity and creativity. Although it barely played a role in our evaluation, illustrative is the difference between the Ada and D equivalents to the Expressive C++17 Challenge. The D solution is both concise and expressive, the Ada solution is hardly expressive and consists of more lines than I want to write or read. Also of secondary importance, but difficult to ignore, is the difference between the communities surrounding the languages, which in Ada’s case is AdaCore Support, who has no problems demanding annual five-figure subscription fees.

Although akin to our current language, Free Pascal was mainly dropped due to its porting challenges and our estimation that its potential is lower and its future outlook is less optimistic than that of D. If we were to choose Free Pascal, we would basically invest a lot of effort only to arrive at a technological solution that we felt would be of lower quality than we currently have.

And that’s were I saw a dream come true: A clap on the table by the company founder and it was decided to commit to the effort of bringing twenty-five years worth of Extended Pascal code to D!

What makes a difference

In short, my experience is that if a feature is not present in the language, D is powerful enough that the feature can be implemented in a library. Translating each sample program by hand has really helped to focus on replicating functionality, leaving the translation process for later concern. This has led to writing a compatibility library with types and functions that are vital for the conversion. Now that equivalents are known and the parser is done, I just have to implement code generation.

Below follows another example that currently translates automatically and executes identically. It iterates over a fixed length array running from 2 to 20 inclusive, fills it with values, prints the memory footprint and writes it to binary file:

program arraybase(input,output);

type t = array[2..20] of integer;
var a : t;
    n : integer;
    f : bindable file of t;

begin
  for n := 2 to 20 do
    a[n] := n;
  writeln('Size of t in bytes is ',sizeof(a):1); { 76 }
  if openwrite(f,'array.dat') then
    begin
      write(f,a);
      close(f);
    end;
end.

Transpiled to D (or should I say Dascal?) and post-processed by dfmt to fix up formatting:

import epcompat;
import std.stdio;

// Program name: arraybase
alias t = StaticArray!(int, 2, 20);

t a;
int n;
Bindable!t f;

void main(string[] args)
{
    for (n = 2; n <= 20; n++)
        a[n] = n;
    writeln("Size of t in bytes is ", a.sizeof); // 76
    if (openwrite(f, "array.dat"))
    {
        epcompat.write(f, a);
        close(f);
    }
}

Of course this is by no means idiomatic D, but the fact that it is recognizable and readable is nice, especially for my colleagues who will have to go through an unusual transition. By the way, did you notice that code comments are preserved?

One very-nice-to-have feature is binary file compatibility; In fact it may have been the killer feature, without which D might not have been so victorious. The case is that whenever a persistent data structure is extended in our software, we make sure that we can still read and convert that structure from its prior format. That way, if a client pulls out an old design from its archives and runs it through our current software, it will still work without the user even being aware that conversion occurs, possibly in multiple steps. Not having to give up that ability is very attractive.

But it wasn’t easy to get there. The main difficulty is the difference in how strings are represented in D and the Prospero implementation of Extended Pascal, in memory and on file. This presented the challenge of how to preserve binary compatibility in file I/O with data structures that contain string members.

Strings

In Prospero Extended Pascal, strings are implemented as a schema type, which is a parameterized type that can be used in the following ways:

type string80 = string(80);
var str1 : string80;
    str2 : string(60);
procedure foo(s : string);

This defines string80 to be an alias for a string type discriminated to have a capacity of 80 characters. Discriminated string variables, like str1 and str2, can be passed to functions and procedures that take undiscriminated strings as arguments, like foo, which thereby work on strings of any capacity. In memory, str1 is laid out as a sequence of 80 chars, followed by a ushort that encodes the length of the string. I say encodes because a shorter string is padded with \0s up to the capacity and the ushort actually contains the length of that padding. This way, when a pointer to the string is passed to a C function and the contents of the string occupy its full capacity, the 0 in the padding length doubles as the terminating \0 of the C string.

My first thought was to mimic this data representation with a D template. But that would require procedures like foo to be turned into templates as well, which would escalate horribly into template bloat, a problem with multiple string arguments and argument ordering, and would complicate translation. Besides, schema types can also be discriminated at run time, which does not translate to a template.

Could some sort of inheritance scheme be the solution? Not really, because instances of D classes live on the heap, so a string embedded in a struct would just be a pointer instead of the char array and ushort.

But binary layout is actually only relevant in files, and in a stroke of insight I realized that this must be why user-defined attributes, or UDAs, exist. If I annotate the string with the correct capacity for file I/O, then I can just use native D strings everywhere, which genuinely must be the best possible translation and solves the function argument issue. Annotation can be done with an instance of a struct like

struct EPString
{
    ushort capacity;
}

The above Pascal snippet then translates to D like so:

@EPString(80) struct string80 { string _; alias _ this; }
string80 str1;
@EPString(60) string str2;
void foo(string s);

Notice how the string80 alias is translated into the slightly convoluted struct instead of a normal D alias, which would have looked like

@EPString(80) alias string80 = string;
</code>

Although that compiles, there is no way to retrieve the UDA in that case because plain alias does not introduce a symbol. Then hasUDA!(typeof(str1), EPString) would have been equivalent to hasUDA!(string, EPString) which evaluates to false. By using the struct, string80 is a symbol so typeof(str1) gives string80, and hasUDA!(string80, EPString) evaluates to true in this example.

There is one side effect that we will have to learn to accept, and that is that taking a slice of a string does not produce the same result in D as it does in Extended Pascal. That is because string indices start at 1 in Extended Pascal and at 0 in D. My strategy is to eliminate slices from the source and replace them with a call to the standard substr function, which I can implement with index correction. Finding all string slices can be accomplished with a switch in the transpiler that makes it insert a static if to test if the slice is being taken on a string, and abort compilation if it is. (Arrays are transpiled into a custom array type that handles slices and indices compatibly with Extended Pascal.)

Binary compatible file I/O

Now, to write structs to file and handle any embedded @EPString()-annotated strings specially, we can use compile-time introspection in an overload to toFile that acts on structs as shown below. I have left out handling of aliased strings for clarity, as well as shortstring, which is a legacy string type with yet a different binary format.

void toFile(S)(S s, File f) if (is(S == struct))
{
    import std.traits;
    static if (!hasIndirections!S)
        f.lockingBinaryWriter.put(s);
    else
        // TODO unions
        foreach(field; FieldNameTuple!S)
        {
            // If the member has itself a toFile method, call it.
            static if (hasMember!(typeof(__traits(getMember, s, field)), "toFile") &&
                       __traits(compiles, __traits(getMember, s, field).toFile(f)))
                __traits(getMember, s, field).toFile(f);
            // If the member is a struct, recurse.
            else static if (is(typeof(__traits(getMember, s, field)) == struct))
                toFile(__traits(getMember, s, field), f);
            // Treat strings specially.
            else static if (is(typeof(__traits(getMember, s, field)) == string))
            {
                // Look for a UDA on the member string.
                static if (hasUDA!(__traits(getMember, s, field), EPString))
                {
                    enum capacity = getUDAs!(__traits(getMember, s, field), EPString)[0].capacity;
                    static assert(capacity > 0);
                    writeAsEPString(__traits(getMember, s, field), capacity, f);
                }
                else static assert(false, `Need an @EPString(n) in front of ` ~ fullyQualifiedName!S ~ `.` ~ field );
            }
            // Just write other data members.
            else static if(!isFunction!(__traits(getMember, s, field)))
                f.lockingBinaryWriter.put(__traits(getMember, s, field));
        }
}

At the time of writing, I still have work to do for unions, which are used in the translation of variant records (including considering the use of one of the seven existing library solutions 1, 2, 3, 4, 5, 6, 7).

Currently, detecting unions is a bit involved . Also, there is a complication in the determination of the size of a union when the largest variant contains strings: the D version of that variant may not be the largest because D strings are just slices. I’ll probably work around this by adding a dummy variant that is a fixed size array of bytes to force the size of the union to be compatible with Extended Pascal. This is the reason why D scored a mere 0 in file format compatibility. It is amazing what D allows you to do though, so I may be able to do all of that automatically and award D a perfect score retroactively. On the other hand, it is probably easiest to just add the dummy variant in the Pascal source at the few places where it matters and be done with it.

The way forward

Obviously, this is long term planning. It has taken years to grow into D; it will possibly take a year, and probably longer, to migrate to D. Unless others turn up who are in the same boat as us (please contribute!) it’ll be me who has to row this ship to D-land and I still have my regular duties to attend to. My colleagues will continue to develop in Extended Pascal as usual, and once my transpiler is able to translate all or almost all of it, we will make the switch to D overnight. From then on, we’ll be in it for the long run. We trust to be with D and D to be with us for decades to come!