Category Archives: Code

Flexible Default Function Parameters via structs with Nullable Fields

The problem

Sometimes we need to combine a set of explicitly specified values with a corresponding set of default values into one result. The result for each member is either the explicitly specified value or, where no value is specified, the default value. This is similar to default function arguments in D. However, D's default arguments require that the explicitly specified arguments always be the first N parameters; I want to be able to specify an arbitrary subset of the values. Example:

Explicit values: a: 1, b: 2

Default values: a: 3, b: 4, c: 5

Combined result: a: 1, b: 2, c: 5

Possible solutions

The first idea is to use associative arrays. This approach is inefficient, however, because it combines values keyed by string (or, at best, enum) names at runtime using associative array lookups and stores. It could be used as in this (untested) example:

combine(["a": 1, "b": 2], ["a": 3, "b": 4, "c": 5])

I was advised to instead use structs with nullable values (see below) to pass multiple values. This is nearly as efficient as possible because the members of the structs are enumerated at compile time (in fact, I use static foreach in my implementation). So I implemented this solution. The source code is released under the Apache 2.0 license.

To represent the (explicit) values of type T, we use a struct member of type Nullable!T. If it is null, this means that the explicit value is missing and the default value is used instead; otherwise the specified value is used.

Example of definition and combination

First, install the struct-params package with DUB (see the DUB documentation; I strongly recommend using DUB to build D projects) or clone my GitHub repository.

Then add the following import to your source:

import struct_params;

Example code:

mixin StructParams!("S", int, "x", float, "y");
immutable S.WithDefaults combinedMain = { x: 12 }; // note y is default initialized to null
immutable S.Regular combinedDefault = { x: 11, y: 3.0 };
immutable combined = combine(combinedMain, combinedDefault);
assert(combined.x == 12 && combined.y == 3.0);

StructParams is a string mixin, a D construct which generates D code at compile time and mixes it in at the point of declaration.

mixin StructParams!("S", int, "x", float, "y"); effectively defines the following struct:

struct S {
  struct Regular {
    int x;
    float y;
  }
  struct WithDefaults {
    Nullable!int x;
    Nullable!float y;
  }
}

WithDefaults is the struct used to pass, for example, the explicit values, each of which can be present (non-null) or missing (null, to be replaced with the default value). For this, the D template type Nullable is used to represent either a value of a type or a null denoting a missing value.

Regular is just a standard struct with the fields; it is the type used to pass the default values to the function combine.

The function combine merges the explicit values with the default values (as described above).
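
Conceptually, combine enumerates the fields at compile time and picks the explicit value where one is present. Here is a simplified sketch for the struct S above (the actual struct-params implementation is more general):

S.Regular combine(S.WithDefaults explicitValues, S.Regular defaults)
{
    S.Regular result;
    static foreach (i; 0 .. S.Regular.tupleof.length)
    {
        // Use the explicit value if it is non-null, the default otherwise.
        result.tupleof[i] = explicitValues.tupleof[i].isNull
            ? defaults.tupleof[i]
            : explicitValues.tupleof[i].get;
    }
    return result;
}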

Then we assert that the result is correct.

Calling functions

Finally, we will do the main thing for which all the above was intended and call a function with combined values:

float f(int a, float b) {
    return a + b;
}
assert(callFunctionWithParamsStruct!f(combined) == combined.x + combined.y);

The structure combined is “split” into members, and the members are passed as parameters to f in the order the fields of the struct are defined, that is, in the order their names are specified as arguments to the StructParams string mixin. For those interested in the details: I use the built-in D struct and class property .tupleof to split the structure into a tuple of members (e.g. explicitly calling f with an instance reg of Regular would look like this: f(reg.tupleof);).

We can also call a member function of a struct or class instance (t in the example below):

struct Test {
    float f(int a, float b) {
        return a + b;
    }
}
Test t;
assert(callMemberFunctionWithParamsStruct!(t, "f")(combined) == combined.x + combined.y);

It is rather unnatural to call the member f by passing its name as a string, but I have not found a better solution.

Another variant would be to use callFunctionWithParamsStruct!((int a, float b) => t.f(a, b))(combined), but this way is inconvenient as it requires specifying arguments explicitly.

Final considerations

Note that we cannot currently use struct initializers with named arguments, like S.Regular(x: 11, y: 3.0), as the current version of D does not have this feature. There is a draft D Improvement Proposal (DIP) to introduce the feature, but I hear its author is going to replace it with a more general-case proposal after DConf 2019 in London.

It would be beneficial to implement such structs with built-in default values, but this seems impossible, because representing every possible D value as a literal is apparently not doable (for example, a structure with circular references to other structures seems not to be representable as a literal). We could attempt to implement it for a subset of types, or even for some values of a type and not for others, but this would require further consideration.

Victor Porton is an open source developer, a math researcher, and a Christian writer. He earns his living as a programmer.

DStep 1.0.0

DStep is a tool for automatically generating D bindings for C and Objective-C libraries. This is implemented by processing C or Objective-C header files and outputting D modules. DStep uses the Clang compiler as a library (libclang) to process the header files.

Background

The first version of DStep was released on the 7th of July, 2012. There have been four subsequent releases, the last of which was on the 16th of January, 2016. Quite a lot has happened in the D world and with DStep since then.

After the release of DStep 0.2.1 in January 2016, there wasn’t much progress on DStep. I had a limited amount of time and chose to spend it on other projects. Fortunately, in 2016, DStep got picked as one of four D-related projects for Google Summer of Code (GSoC). The student who chose to work on DStep was Wojciech Szęszoł. He did a tremendous amount of work and pushed DStep forward by years compared to the time it would have taken me. In fact, I was often a blocker because I couldn’t keep up with reviewing all the changes he made.

New Release

The latest release of DStep contains a huge number of new features and bug fixes. A lot of the new features add support for translating C preprocessor macros in various forms. DStep also gained support for one more platform: Windows. Here follow some of the new features available in DStep 1.0.0:

Support for Simple Defines

This feature adds support for translating a simple form of #define to a manifest constant in D. Example:

#define FOO 1

The above C code is translated to the following D code:

enum FOO = 1;

DStep will try to translate the C code so that the D code looks as much as possible like the original C code. If a #define contains an expression instead of a single literal, DStep will try to preserve the original expression:

#define FOO 1 + 3

Instead of translating this to a manifest constant with the value of 4 (which would be semantically correct), DStep will preserve the original expression and translate it to:

enum FOO = 1 + 3;

This also goes for other types of literals, like hexadecimal literals:

#define FOO 0x1

Here DStep will preserve the hexadecimal literal and translate it to:

enum FOO = 0x1;

Function-Like Macros

DStep is now able to translate function-like macros. This is a pretty advanced feature that requires a small parser for the macros. DStep uses libclang to tokenize the macros and then parses them to be able to do the proper translations. The most basic example looks like:

#define FOO() 0 + 1

The above macro will translate to the following D code:

extern (D) int FOO()
{
    return 0 + 1;
}

Although not shown here (to minimize the examples), DStep will output extern (C): at the top of each file. Therefore, for macros translated to functions, DStep will add extern (D) to give the functions D linkage and mangling.

Here’s an example of a C macro containing parameters:

#define FOO(a, b) a + b

Unfortunately, in C, a and b can be basically anything. D doesn’t have an exact corresponding feature. DStep will translate this as accurately as possible by outputting a templated function:

extern (D) auto FOO(T0, T1)(auto ref T0 a, auto ref T1 b)
{
    return a + b;
}

The assumption in this translation is that a and b will each be a value of some type, either the same type or different types. To avoid copying any of the values, ref parameters are used. Since an rvalue cannot be passed to a ref parameter, auto ref is used instead to properly handle both rvalues and lvalues.

More advanced expressions are supported as well:

#define BAR 4
#define FOO(a, b) a + 3 + (b + BAR) - sizeof(b)

In the above example there’s a combination of parameters, literals, parenthesized expressions, usages of other macros, and built-in operators. DStep handles all of those and translates it to:

enum BAR = 4;

extern (D) auto FOO(T0, T1)(auto ref T0 a, auto ref T1 b)
{
    return a + 3 + (b + BAR) - b.sizeof;
}

Again, the expression is preserved as closely as possible to the original source code. The parentheses, the reference to the BAR macro, all are preserved.

Token Concatenation

This feature adds support for translating the token concatenation, or token pasting, operator to a D string concatenation:

#define CONCAT(prefix, name) prefix ## name

The above function-like macro concatenates the two given tokens. DStep translates that to a function that converts the arguments to strings and concatenates the two resulting strings. This can then be used together with the string mixin statement to give the same behavior as in C.

extern (D) string CONCAT(T0, T1)(auto ref T0 prefix, auto ref T1 name)
{
    import std.conv : to;

    return to!string(prefix) ~ to!string(name);
}
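
As a usage sketch, when the arguments are compile-time constants the call can be evaluated during compilation (via CTFE) and the result fed to a string mixin, emulating C’s token pasting (hypothetical example):

void main()
{
    int fooBar = 42;
    // CONCAT("foo", "Bar") evaluates to "fooBar" at compile time;
    // the string mixin then turns that string into the identifier fooBar.
    assert(mixin(CONCAT("foo", "Bar")) == 42);
}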

Another example is parameters combined with tokens:

#define CONCAT(prefix) prefix ## name

This translates similarly to the previous example, but since name is not a parameter, it is translated to a string literal:

extern (D) string CONCAT(T)(auto ref T prefix)
{
    import std.conv : to;

    return to!string(prefix) ~ "name";
}

Preprocessor Constants in Array Sizes

DStep will now preserve preprocessor constants for the size of arrays:

#define Foo 3
int a[Foo];

In previous versions of DStep the translation would just output the size of the array a as 3. This would be semantically accurate, but the generated source code would look less like the original C source code. Now DStep is able to translate preprocessor constants and can, therefore, use the preprocessor constant as the size of the array:

enum Foo = 3;
extern __gshared int[Foo] a;

In the above example, the manifest constant Foo is used as the size of a instead of a plain 3. This more closely matches the original C source code.

Preserving Comments

In previous versions of DStep comments were completely stripped out. With this release DStep is able to preserve comments in the D code from the original C code:

// This comment describes this whole file

// Documentation for the symbol `foo`
void foo();

/* Loose comment */ /* Loose comment */

/*
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
*/ /* Loose comment */

int a; // this is `a`

In the above example there are four types of comments:

  • A header comment for the whole file
  • A preceding comment for the symbol foo
  • Loose comments not belonging to any symbol
  • A trailing comment for the symbol a

All of these comments are now properly preserved:

// This comment describes this whole file

extern (C):

// Documentation for the symbol `foo`
void foo ();

/* Loose comment */ /* Loose comment */

/*
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
*/ /* Loose comment */

extern __gshared int a; // this is `a`

In the above example, notice how the header comment is placed above the extern (C): line. If a module declaration is output in the D file (when the --package flag is used), the header comment will be placed above that as well:

// This comment describes this whole file

module bar.foo;

extern (C):

// Documentation for the symbol `foo`
void foo ();

/* Loose comment */ /* Loose comment */

/*
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
*/ /* Loose comment */

extern __gshared int a; // this is `a`

It’s also possible to disable the preservation of comments using the --comments=false flag.

Package Prefix

To better help organize bindings, DStep supports the --package <name.of.package> flag. When this flag is enabled DStep will put all the translated modules in the <name> package. Note, this will only add the package prefix to the module declaration of the translated file. It will not place the output file in a directory corresponding to the package.
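
For example, given a header foo.h:

$ dstep foo.h --package bar

The generated foo.d will then start with the declaration module bar.foo; (see also the Normalize Modules section below).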

Removing Excessive Newlines

DStep will now remove excessive newlines but still preserve spacing for the original C code. This is best illustrated with an example:

int a;


int b;

In the above example there are two newlines between the declarations of a and b. DStep will remove the excessive newline and only output one to still preserve the spacing:

extern __gshared int a;

extern __gshared int b;

But if there is no spacing between the declarations, that is respected as well:

int a;
int b;

In the above example there is no newline between the declarations and DStep will preserve that:

extern __gshared int a;
extern __gshared int b;

Preserving Order of Declarations

In previous versions of DStep the declarations in the translated D code would follow a certain order defined by DStep, aliases first, then constants, then types and last functions. With this release, DStep will now preserve the order of the declarations of the original C code:

void bar();

struct Foo
{
    int a;
};

Previous versions would translate the above to:

struct Foo
{
    int a;
}

void bar ();

with the struct first and then the function declaration. With this release, the order is preserved:

void bar ();

struct Foo
{
    int a;
}

Multiple Input Files

Previous versions of DStep only allowed a single header file as input. With this release, multiple files can be passed to DStep at once. Each input file will produce one D source file as output. To pass multiple input files to DStep, just pass the filenames when invoking DStep.

$ dstep foo.h bar.h

Running the above command will produce two D source files: foo.d and bar.d.

If multiple input files and the -o flag are given, the -o flag specifies the output directory where the D source files will be placed. When multiple input files are given it’s not possible to specify the names of the D source files.

$ dstep foo.h bar.h -o foobar
$ ls foobar
bar.d foo.d

The above command will place the two D source files in the directory foobar.

--reduce-aliases Flag

Normally when DStep translates a header file to a D module it will reduce aliases if possible. DStep contains a set of common typedefs that can be reduced to native D types. That means that code like this:

#include <stdint.h>
int32_t a = 3;

Will be translated to the following D code:

extern __gshared int a;

In this release of DStep, there’s a new flag, --reduce-aliases. This flag allows the reduce aliases feature to be enabled or disabled. By default it’s enabled, but can be disabled by invoking DStep with the following command: dstep --reduce-aliases=false. When this feature is disabled, it will translate the above example to the following D code:

import core.stdc.stdint;
extern __gshared int32_t a;

It will keep the name int32_t as the type of the variable declaration and add the import for the module that contains the declaration of int32_t.

--alias-enum-members Flag

In C, enum members are accessible directly in the global scope. Example:

enum Foo
{
    foo,
    bar
};

enum Foo a = foo;

In D however, enum members need to be qualified with the enum name. The correct translation of the above would be:

enum Foo
{
    foo = 0,
    bar = 1
}


Foo a = Foo.foo;

In this release of DStep a new flag has been added, --alias-enum-members, that enables the generation of aliases to enum members at module scope. This allows the translation to stay closer to the original C code:

enum Foo
{
    foo = 0,
    bar = 1
}

alias foo = Foo.foo;
alias bar = Foo.bar;

By default this feature is not enabled, so that the translated D code more closely follows the D conventions.

--translate-macros Flag

In this release, DStep can now translate several kinds of C macros to their equivalent in D. This might not always be desirable because the translations are not bulletproof. Therefore, there’s a new flag, --translate-macros, which will enable or disable the translation of macros. By default translation of macros is enabled.
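
Assuming the same boolean flag syntax as --comments and --reduce-aliases shown elsewhere in this post, disabling the feature would look like this:

$ dstep foo.h --translate-macros=false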

libclang Bindings

DStep uses Clang as a library to process the C code. There are two main ways of using Clang as a library. One is to use the C++ APIs directly. This gives full access to what Clang can do, but the problem with this API is that it is not stable. It’s also a C++ API, and when DStep was first implemented, the C++ integration in D was quite lacking. One of my requirements when implementing DStep was to implement it in D. Therefore, the natural choice was to use the C API, called “libclang”, which is also provided. This library is the main interface intended to be used by editors and IDEs that want to leverage Clang as a library. It’s also recommended to use libclang when accessing Clang from a language other than C++, as in my case with D.

Since libclang is a C library exposing only C header files, I needed bindings to be able to use it from within D. Up until now these bindings were handmade. Fortunately, these headers are the most forgiving I have ever seen when it comes to translating into D. Only a single header file was needed, containing hardly any macros at all, so translating it with search-and-replace was quite a quick job.

With this release of DStep, since DStep has improved so much, the bindings are now self-hosted. That is, DStep has been used to generate the bindings. In addition to that, generation of the bindings has now been added to the test suite to make sure it doesn’t break.

Support for Windows

DStep was originally developed on macOS since that is my main development platform. Thanks to the Posix standard it was easy to port to Linux as well. The first release of DStep was available for both macOS and Linux. Back in 2011 when the development of DStep first started, Clang did not support Windows. There seems to have been some support for MinGW, but that was not a target supported by DMD. The LLVM and Clang team has made huge progress since then and in 2016 when DStep got picked as a GSoC project, support for Windows was available, targeting compatibility with the Visual Studio compiler.

In this release, thanks to Wojciech Szęszoł, DStep is now available on Windows. Due to Clang being compatible with Visual Studio, DStep needs to be built to use the same object format. When compiling with DMD, that means compiling for 64-bit (via the -m64 flag) or using the -m32mscoff flag when compiling for 32 bit. The Dub package automatically takes care of this.

Continuous Integration/Deployment

In the area of CI/CD quite a few things have happened. Originally the test suite of DStep was implemented using Cucumber and Ruby. These tests were a form of end-to-end test and failed to take advantage of the reasons to use Cucumber in the first place. These tests have now been replaced with a combination of unit tests and end-to-end tests, all implemented in D.

LDC

LDC has been added as a supported compiler. That means that DStep is compiled with LDC in addition to DMD as part of the CI pipelines. Every commit and every pull request is now tested with LDC as well.

Upgrade of Compilers

Both DMD and LDC have been upgraded to their latest versions. In addition, beta and nightly releases are being tested in the CI pipelines. A scheduled job has been added to the CI pipelines as well, which will run once every day to make sure new releases of the compilers won’t break DStep even if no changes have been made to DStep. This also means that only the latest versions of LDC and DMD are supported for building DStep.

Testing Windows using AppVeyor

Since DStep now supports Windows as an additional platform, a new CI pipeline has been added in the form of AppVeyor, a CI service that provides Windows as a platform to run builds on. This pipeline compiles DStep with both DMD and LDC, and it builds both 32-bit and 64-bit versions.

Complex Floating-Point Types

Another feature that is new in this version of DStep is support for complex floating-point types. There are three complex types that are supported: float _Complex, double _Complex and long double _Complex.

float _Complex a;
double _Complex b;
long double _Complex c;

The above code snippet in C is translated to the following D code:

extern __gshared cfloat a;
extern __gshared cdouble b;
extern __gshared creal c;

New Alias Syntax

Typedefs in C header files are translated to alias declarations in the D code. Up until this release they used the old alias syntax: alias oldName newName. Since the previous release of DStep the D language has improved and gained new features. One of them is a new (now considered the standard) alias syntax: alias newName = oldName. It’s easier to follow which name is the alias and which is the original when it’s using a more familiar syntax similar to variable declarations. Here’s an example of how the C code is translated into D code with the new alias syntax:

typedef int foo;

The above C code is translated to the following D code:

alias foo = int;

Custom Global Attributes

By default DStep doesn’t add any attributes like @nogc or nothrow to the translated code. In this release of DStep, support for attributes has been added. Custom global attributes can be enabled with the --global-attribute flag. For a C header file with the following content:

int a;

And invoking DStep with the following command:

$ dstep foo.h --global-attribute @nogc --global-attribute nothrow

Will output the following D code:

@nogc:
nothrow:

extern __gshared int a;

Rename Enums

Unlike in D, enums in C don’t create a new scope for their members, even if a name is given to the enum. Example:

enum
{
    a,
    b
};

enum Foo
{
    c,
    d
};

int e = a;
int f = c;

In the above example it’s possible to access the enum members from both of the enums without qualifying the type. In D, this is not the case for named enums. They require qualifying the enum member with the type name:

enum
{
    a,
    b
}

enum Foo
{
    c,
    d
}

int e = a; // ok since the first enum is anonymous
int f = Foo.c; // need to qualify the enum member with the type name

To reduce the risk of symbol conflict it’s quite common for C libraries to prefix enum members with the name of the type:

enum Foo
{
    FooC,
    FooD
};

While the following is perfectly fine in D as well, it gets a bit redundant and verbose to have to specify Foo twice:

enum Foo
{
    FooC,
    FooD
}

int c = Foo.FooC;
int d = Foo.FooD;

For this reason DStep now supports a new flag, --rename-enum-members, which when enabled will try to remove any prefix of the enum member names. Given the following C header file:

enum Foo
{
    FooA,
    FooB
};

And running DStep as follows:

$ dstep foo.h --rename-enum-members

It will produce the following D code:

enum Foo
{
    a = 0,
    b = 1
}

DStep identified the Foo prefix and removed it from the enum member names. It also converted the names to lowercase to better match the standard D naming conventions.

By default this feature is not enabled to more closely match the original C code.

Normalize Modules

When the --package flag is specified DStep will add a module declaration to all D modules. By default, it will use the name of the input file as the name of the module. In the C world there’s no direct file-naming convention. Some libraries will use all lowercase letters, some will use snake case, some will use camel case, some will use Pascal case, and so on.

The standard D naming convention for modules (and therefore files) is to use only lowercase letters and underscores, i.e. snake case. To help with following this convention DStep now supports the new flag --normalize-modules. When this flag is enabled (and the --package flag is used) DStep will try to convert the name of the input file to a name matching the D conventions.

Given a C header file named Foo.h and only invoking DStep with the --package flag:

$ dstep Foo.h --package bar

DStep will produce the following D code:

module bar.Foo;

When the --normalize-modules flag is used as well:

$ dstep Foo.h --package bar --normalize-modules

DStep will output the following D code:

module bar.foo;

Note that Foo has been converted to foo.

Another example with a file using the Pascal naming convention:

$ dstep NSString.h --package bar --normalize-modules

And the result:

module bar.ns_string;

By default this feature is not enabled to more closely match the original C code.

Bit Fields

Another new feature that has been added in this release of DStep is support for bit fields. The bit field is a built-in language construct in C, but there’s no language support for it in D. Fortunately, with the help of D’s metaprogramming capabilities, the bit field has been implemented as a library construct and is available in the standard library [3]. The library construct will generate getters and setters that perform the same bit manipulation that the C compiler would have generated.

The following snippet in C:

struct Foo
{
    unsigned int a : 1;
    unsigned int b : 2;
    unsigned int c : 5;
};

Is translated to D:

struct Foo
{
    import std.bitmanip : bitfields;

    mixin(bitfields!(
        uint, "a", 1,
        uint, "b", 2,
        uint, "c", 5));
}

The D translation makes use of the bitfields template from the standard library. It’s automatically imported, directly inside the struct, to minimize the scope of where the symbol is available.
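
For illustration, the generated accessors can be used like ordinary fields; a minimal sketch:

void main()
{
    Foo f;
    f.a = 1;  // setter generated by the bitfields mixin (1 bit wide)
    f.b = 2;  // 2 bits wide
    f.c = 17; // 5 bits wide
    assert(f.a == 1 && f.b == 2 && f.c == 17);
}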


In his day job, Jacob Carlborg is a DevOps engineer for Derivco Sweden, but he’s been using D on his own time since 2006. He is the maintainer of numerous open source projects, including DStep, a utility that generates D bindings from C and Objective-C headers, DWT, a port of the Java GUI library SWT, and DVM, the topic of another post on this blog. He implemented native Thread Local Storage support for DMD on OS X and contributed, along with Michel Fortin, to the integration of Objective-C in D.

Project Highlight: DPP

D was designed from the beginning to be ABI compatible with C. Translate the declarations from a C header file into a D module and you can link directly with the corresponding C library or object files. The same is true in the other direction as long as the functions in the D code are annotated with the appropriate linkage attribute. These days, it’s possible to bind with C++ and even Objective-C.

Binding with C is easy, but can sometimes be a bit tedious, particularly when done by hand. I can speak to this personally as I originally implemented the Derelict collection of bindings by hand and, though I slapped together some automation when I ported it all over to its successor project, BindBC, everything there is maintained by hand. Tools like dstep exist and can work well enough, though they come with limitations which require careful attention to and massaging of the output.

Tediousness is an enemy of productivity. That’s why several pages of discussion were generated from Átila Neves’s casual announcement a few weeks before DConf 2018 that it was now possible to #include C headers in D code.

dpp is a compiler wrapper that will parse a D source file with the .dpp extension and expand in place any #include directives it encounters, translating all of the C or C++ symbols to D, and then pass the result to a D compiler (DMD by default). Says Átila:

What motivated the project was a day at Cisco when I wanted to use D but ended up choosing C++ for the task at hand. Why? Because with C++ I could include the relevant header and be on my way, whereas with D (or any other language really) I’d have to somehow translate the header and all its transitive dependencies. I tried dstep and it failed miserably. Then there’s the fact that the preprocessor is nearly always needed to properly use a C API. I wanted to remove one advantage C++ has over D, so I wrote dpp.

Here’s the example he presented in the blog post accompanying the initial announcement:

// stdlib.dpp
#include <stdio.h>
#include <stdlib.h>

void main() {
    printf("Hello world\n".ptr);

    enum numInts = 4;
    auto ints = cast(int*) malloc(int.sizeof * numInts);
    scope(exit) free(ints);

    foreach(int i; 0 .. numInts) {
        ints[i] = i;
        printf("ints[%d]: %d ".ptr, i, ints[i]);
    }

    printf("\n".ptr);
}

Three months later, dpp was successfully compiling the julia.h header allowing the Julia language to be embedded in a D program. The following month, it was enabled by default on run.dlang.io.

C support is fairly solid, though not perfect.

Although preprocessor macro support is one of dpp’s key features, some macros just can’t be translated because they expand to C/C++ code fragments. I can’t parse them because they’re not actual code yet (they only make sense in context), and I can’t guess what the macro parameters are. Strings? Ints? Quoted strings? How does one programmatically determine that #define FOO(S) (S) is meant to be a C cast? Did you know that in C macros can have the same name as functions and it all works? Me neither until I got a bug report!

Push the stdlib.dpp code block from above through run.dlang.io and read the output to see an example of translation difficulties.

The C++ story is more challenging. Átila recently wrote about one of the problems he faced. That one he managed to solve, but others remain.

dpp can’t translate C++ template specialisations on reference types because reference types don’t exist in D. I don’t know how to translate anything that depends on SFINAE because it also doesn’t exist in D.

For those not in the know, classes in D are reference types in the same way that Java classes are reference types, and function parameters annotated with ref accept arguments by reference, but when it comes to variable declarations, D has no equivalent for the C++ lvalue reference declarator, e.g. int& someRef = i;.
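
To illustrate in code (a minimal sketch, not taken from dpp):

// D has ref parameters (and ref foreach variables),
// but no local reference variable declarations.
void addOne(ref int x) { x += 1; } // pass-by-reference parameter

void main()
{
    int i = 0;
    addOne(i);
    assert(i == 1);
    // int& someRef = i; // the C++ declaration has no D equivalent
}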

Despite the difficulties, Átila persists.

The holy grail is to be able to #include a C++ standard library header, but that’s so difficult that I’m currently concentrating on a much easier problem first: being able to successfully translate C++ headers from a much simpler library that happens to use standard library types (std::string, std::vector, std::map, the usual suspects). The idea there is to treat all types that dpp can’t currently handle as opaque binary blobs, focusing instead on the production library types and member functions. This sounds simple, but in practice I’ve run into issues with LLVM IR with ldc, ABI issues with dmd, mangling issues, and the most fun of all: how to create instances of C++ stdlib types in D to pass back into C++? If a function takes a reference to std::string, how do I give it one? I did find a hacky way to pass D slices to C++ functions though, so that was cool!

On the plus side, he’s found some of D’s features particularly helpful in implementing dpp, though he did say that “this is harder for me to recall since at this point I mostly take D’s advantages for granted.” The first thing that came to mind was a combination of built-in unit tests and token strings:

unittest {
    shouldCompile(
        C(q{ struct Foo { int i; } }),
        D(q{ static assert(is(typeof(Foo.i) == int)); })
    );
}

It’s almost self-explanatory: the first parameter to shouldCompile is C code (a header), and the second D code to be compiled after translating the C header. D’s token strings allow the editor to highlight the code inside, and the fact that C syntax is so similar to D lets me use them on C code as well!

He also found help from D’s contracts and the garbage collector.

libclang is a C library and as such has hardly any abstractions or invariant enforcement. All nodes in the AST are represented by a libclang “cursor”, which can have several “kinds”. D’s contracts allowed me to document and enforce at runtime which kind(s) of cursors a function expects, preventing bugs. Also, libclang in certain places requires the client code to manually manage memory. D’s GC makes for a wrapper API in which that is never a concern.

During development, he exposed some bugs in the DMD frontend.

I tried using sumtype in a separate branch of dpp to first convert libclang AST entities into types that are actually enforced at compile time instead of run time. Unfortunately that caused me to have to switch to compiling all code at once since sumtype behaves differently in separate compilation, triggering previously unseen frontend bugs.

For unit testing, he uses unit-threaded, a library he created to augment D’s built-in unit testing feature with advanced functionality. To achieve this, the library makes use of D’s compile-time reflection features. But dpp has a lot of unit tests.

Given the number of tests I wrote for dpp, compiling takes a very long time. This is exacerbated by -unittest, which is a known issue. Not using unit-threaded’s runner would speed up compilation, but then I’d lose all the features. It’d be better if the compile-time reflection required were made faster.

Perhaps he’ll see some joy there when Stefan Koch’s ongoing work on NewCTFE is completed.

Átila will be speaking about dpp on May 9 as part of his presentation at DConf 2019 in London. The conference runs from May 8–11, so as I write there’s still plenty of time to register. For those who can’t make it, you can watch the livestream (a link to which will be provided in the D forums each day of the conference) or see the videos of all the talks on the D Language Foundation’s YouTube channel after DConf is complete.

Memoization in the D Programming Language

The D programming language provides advanced facilities for structuring programs logically, almost like Python or Ruby, but with high performance and the higher reliability of static typing and contract programming.

In this article, I will describe how to use D templates and mixins for memoization, that is, to automatically remember a function (or property) result.

std.functional.memoize from the standard library

The first way is built straight into Phobos, the D standard library, and is very easy to use:

import std.functional;
import std.stdio;

float doCalculations() {
    writeln("Doing calculations.");
    return -1.7; // the value of the calculations
}

// Apply the template “memoize” to the function doCalculations():
alias doCalculationsOnce = memoize!doCalculations;

Now the alias doCalculationsOnce() does the same as doCalculations(), but the calculations are done only once (if we call doCalculationsOnce() multiple times, “Doing calculations.” is printed only once). This is useful for long, slow calculations and in some other situations (such as creating only a single window on the screen). That is memoization.
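
For example, building on the declarations above:

void main()
{
    writeln(doCalculationsOnce()); // prints “Doing calculations.” and then -1.7
    writeln(doCalculationsOnce()); // prints only -1.7; the cached result is reused
}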

It’s even possible to memoize a function with arguments:

float doCalculations2(float arg1, float arg2) {
    writeln("Doing calculations.");
    return arg1 + arg2; // the value of the calculations
}

// Apply the template “memoize” to the function  doCalculations2():
alias doCalculationsOnce2 = memoize!doCalculations2;

void main(string[] args)
{
    writeln(doCalculationsOnce2(1.0, 1.2));
    writeln(doCalculationsOnce2(1.0, 1.3));
    writeln(doCalculationsOnce2(1.0, 1.3));
}

This outputs:

Doing calculations.
2.2
Doing calculations.
2.3
2.3

You see that the calculations are not repeated again when the argument values are the same.

Memoizing struct or class properties

I’ve found another way to memoize in D. It involves caching a property of a struct or class. Properties are zero-argument (for reading) or one-argument (for writing) member functions (or sometimes two-argument non-member functions) which differ from ordinary functions only in call syntax. I deemed it more elegant to cache a property of a struct or class rather than to cache a member function’s return value. My code can easily be changed to memoize a member function instead of a property, but you can always convert a zero-argument member function into a property, so why bother?

Here is the source (you can also install it from https://github.com/vporton/memoize-dlang for use in your programs):

module memoize;

mixin template CachedProperty(string name, string baseName = '_' ~ name) {
    mixin("private typeof(" ~ baseName ~ ") " ~ name ~ "Cache;");
    mixin("private bool " ~ name ~ "IsCached = false;");
    mixin("@property typeof(" ~ baseName ~ ") " ~ name ~ "() {\n" ~
          "if (" ~ name ~ "IsCached" ~ ") return " ~ name ~ "Cache;\n" ~
          name ~ "IsCached = true;\n" ~
          "return " ~ name ~ "Cache = " ~ baseName ~ ";\n" ~
          '}');
}

It is used like this (with either structs or classes at your choice):

import memoize;

struct S {
    @property float _x() { return 1.5; }
    mixin CachedProperty!"x";
}

Then just:

S s;
assert(s.x == 1.5);

Or you can specify the explicit name of the cached property:

import memoize;

struct S {
    @property float _x() { return 1.5; }
    mixin CachedProperty!("x", "_x");
}

CachedProperty is a template mixin. Template mixins insert the declarations of the template body directly into the current context. In this case, the template body is composed of string mixins. As you can guess, a string mixin generates code at compile time from strings. So,

struct S {
    @property float _x() { return 1.5; }
    mixin CachedProperty!"x";
}

turns into

struct S {
    @property float _x() { return 1.5; }
    private typeof(_x) xCache;
    private bool xIsCached = false;
    @property typeof(_x) x() {
        if (xIsCached) return xCache;
        xIsCached = true;
        return xCache = _x;
    }
}

That is, it sets xCache to _x unless xIsCached is true, and it sets xIsCached to true when x is first retrieved.

The idea originates from the following Python code:

class cached_property(object):
    """A version of @property which caches the value.  On access, it calls the
    underlying function and sets the value in `__dict__` so future accesses
    will not re-call the property.
    """
    def __init__(self, f):
        self._fname = f.__name__
        self._f = f

    def __get__(self, obj, owner):
        assert obj is not None, 'call {} on an instance'.format(self._fname)
        ret = obj.__dict__[self._fname] = self._f(obj)
        return ret

My way of memoization does not (yet) involve caching return values of functions with arguments.


Victor Porton is an open source developer, a math researcher, and a Christian writer. He earns his living as a programmer.

Using const to Enforce Design Decisions

The saying goes that the best code is no code. As soon as a project starts to grow, technical debt is introduced. When a team is forced to adapt to a new company guideline inconsistent with their previous vision, the debt results from a business decision. This could be tackled at the company level. Sometimes technical debt can arise simply due to the passage of time, when new versions of dependencies or of the compiler introduce breaking changes. You can try to tackle this by letting your local physicist stop the flow of time. More often however, technical debt is caused when issues are fixed by a quick hack, due to time pressure or a lack of knowledge of the code base. Design strategies that were carefully crafted are temporarily neglected. This blog post will focus on using the const modifier. It is one of the convenient tools D offers to minimize the increase of technical debt and enforce design decisions.

To keep a code base consistent, often design guidelines, either explicit or implicit, are put in place. Every developer on the team is expected to adhere to the guidelines as a gentleman’s agreement. This effectively results in a policy that is only enforced if both the programmer and the reviewer have had enough coffee. Simple changes, like adding a call to an object method, might seem innocent, but can reduce the consistency and the traceability of errors. To detect this in a code review requires in-depth knowledge of the method’s implementation.

An example from a real world project I’ve worked on is generating financial transactions for a read-only view, the display function in the following code fragment. Nothing seemed wrong with it, until I realized that those transactions were persisted and eventually used for actual payments without being calculated again, as seen in the process method. Potentially different payments occurred depending on whether the user decided to glance at the summary, thereby triggering the generation with new currency exchange rates, or just blindly clicked the OK button. That’s not what an innocent bystander like myself expects and has caused many frowns already.

public class Order
{
    private Transaction[] _transactions;

    public Transaction[] getTransactions()
    {
        _transactions = calculate();
        return _transactions;
    }

    public void process()
    {
        foreach(t; _transactions){
            // ...
        }
    }
}

void display(Order order)
{
    auto t = order.getTransactions();
    show(t);
}

The internet has taught me that if it is possible, it will one day happen. Therefore, we should make an attempt to make the undesired impossible.

A constant feature

By default, variables and object instances in D are mutable, just like in many other programming languages. If we want to prevent objects from ever changing, we can mark them immutable, i.e. immutable MyClass obj = new MyClass();. immutable means that we can modify neither the object reference (head constant) nor the object’s properties (tail constant). The first case corresponds to final in Java and readonly in C#, both of which signify head constant only. D’s implementation means that nobody can ever modify an object marked immutable. What if an object needs to be mutable in one place, but immutable in another? That’s where D’s const pops in.

Unlike immutable, whose contract states that an object cannot be mutated through any reference, const allows an object to be modified through another, non-const reference. This means it’s illegal to initialize an immutable reference with a mutable one, but a const reference can be initialized with a mutable, const, or immutable reference. In a function parameter list, const is preferred over immutable because it can accept arguments with either qualifier or none. Schematically, it can be visualized as in the following figure.

const relationships
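
In code, these conversion rules look like this (a minimal sketch):

class MyClass {}

void main()
{
    MyClass m = new MyClass();
    const MyClass c1 = m;                 // OK: const accepts a mutable reference
    immutable MyClass i1 = new MyClass(); // OK: a unique new expression converts to immutable
    const MyClass c2 = i1;                // OK: const accepts an immutable reference too
    // immutable MyClass i2 = m;          // Error: immutable cannot alias mutable data
}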

A constant detour

D’s const differs from C++ const in a significant way: it’s transitive (see the const(FAQ) for more details). In other words, it’s not possible to declare any object in D as head constant. This isn’t obvious from examples with classes, since classes in D are reference types, but is more apparent with pointers and arrays. Consider these C++ declarations:

const int *cp0;         // mutable pointer to const data
int const *cp1;         // alternative syntax for the same

Variable declarations in C++ are best read from right to left. Although const int is likely the more common syntax, int const matches the way the declaration should be read: cp0 is a mutable pointer to a const int. In D, the equivalent of cp0 and cp1 is:

const(int)* dp0;

The next example shows what head constant looks like in C++.

int *const cp2;         // const pointer to mutable data

We can read the declaration of cp2 as: cp2 is a const pointer to a mutable int. There is no equivalent in D. It’s possible in C++ to have any number and combination of const and mutable pointers to const or mutable data. But in D, if const is applied to the outermost pointer, then it applies to all the inner pointers and the data as well. Or, as they say, it’s turtles all the way down.

The equivalent in C++ looks like this:

int const *const cp3;         // const pointer to const data

This declaration says cp3 is a const pointer to const data, and is possible in D like so:

const(int*) dp1;
const int* dp2;     // same as dp1

All of the above syntax holds true for D arrays:

const(int)[] a1;    // mutable reference to const data
const(int[]) a2;    // const reference to const data
const int[] a3;     // same as a2

const in the examples can be replaced with immutable, with the caveat that initializers must match the declaration, e.g. immutable(int)* can only be initialized with immutable(int)*.

Finally, note that classes in D are reference types; the reference is baked in, so applying const or immutable to a class reference always behaves like const(int*) above, making both the reference and the data const. There is no class equivalent of const(int)*, a mutable reference to const data. Structs in D, on the other hand, are value types, so pointers to structs can be declared like the int pointers in the examples above.

A constant example

For the sake of argument, we assume that updating the currency exchange rates is a function that, by definition, needs to mutate the order. After updating the order, we want to show the updated prices to our user. Conceptually, the display function should not modify the order. We can prevent mutation by adding the const modifier to our function parameter. The implicit rule is now made explicit: the function takes an Order as input, and treats the object as a constant. We no longer have a gentleman’s agreement, but a formal contract. Changing the contract will, hopefully, require thorough negotiation with your peer reviewer.

void updateExchangeRates(Order order);
void display(const Order order);

void updateAndDisplay()
{
    Order order = //…
    updateExchangeRates(order);
    display(order); // Implicitly converted to const.
}

As with all contracts, defining it is the easiest part. The hard part is enforcing it. The D compiler is our friend in this problem. If we try to compile the code, we will get a compiler error.

void display(const Order order)
{
    // ERROR: cannot call mutable method
    auto t = order.getTransactions();
    show(t);
}

We never explicitly stated that getTransactions doesn’t modify the object. As the method is virtual by default, the compiler cannot derive the behavior either way. Without that knowledge, the compiler is required to assume that the method might modify the object. In other words, in the D justice system one is guilty until proven innocent. Let’s prove our innocence by marking the method itself const, telling the compiler that we do not intend to modify our data.

public class Order
{
    private Transaction[] _transactions;

    public Transaction[] getTransactions() const
    {
        _transactions = calculate(); // ERROR: cannot mutate field
        return _transactions;
    }
}

void display(const Order order)
{
    auto t = order.getTransactions(); // Now compiles :)
    show(t);
}

By marking the method const, the original compile error has moved away. The promise that we do not modify any object state is part of the method signature. The compiler is now satisfied with the method call in the display function, but finds another problem. Our getter, which we stated should not modify data, actually does modify it. We found our code smell by formalizing our guidelines and letting the compiler figure out the rest.

It seems promising enough to try it on a real project.

A constant application

I had a pet project lying around and decided to put the effort into enforcing the constraint. This is what inspired me to write this post. The project is a four-player mahjong game. The relevant part, in abstraction, is highlighted in the image.

Mahjong abstraction

The main engine behind the game is the white box in the center. A player or AI is sent a message with a const view of the game data for display purposes and to determine their next move. A message is sent back to the engine, which then determines the mutation on the internally mutable game data. The most obvious win is that I cannot accidentally modify the game data when drawing my UI. Which, of course, appeared to be the case before I refactored in the const-ness of the game data.

Upon closer inspection, coming from the UI there is only one way to manipulate the state of the game. The UI sends a message to the engine and remains oblivious of the state changes that need to be applied. This also encourages layered development and improves testability of the code. So far, so good. But during the refactoring, a problem arose. Recall that marking an object with const means that only member functions that promise not to modify the object, those marked with const themselves, can be called. Some of these could be trivially fixed by applying const or, at worst, inout (a sort of wildcard). However, the more persistent issues, like in the Order example, required me to go back to the drawing board and rethink my domain. In the end, being forced to think about mutability versus immutability improved my understanding of my own code base.
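
For the curious, inout in a nutshell: it lets one method body serve mutable, const, and immutable receivers alike. A hypothetical sketch (not code from the game):

struct Tile { int value; }

class Player
{
    private Tile[] _hand;

    // Called on a const Player this returns const(Tile)[];
    // called on a mutable Player it returns Tile[], and so on.
    inout(Tile)[] hand() inout { return _hand; }
}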

A constant verdict

Is const all good? It’s not a universal answer and certainly has downsides. The most notable one is that this kills lazy initialization where a property is computed only when first requested and then the result is cached. Sometimes, like in the earlier example, this is a code smell, but there are legit use cases. In my game, I have a class that composes the dashboard per player. Updating it is expensive and rarely required. The screen, however, gets rendered sixty times a second. It makes sense to cache the dashboards and only update them when the player objects change. I could split the method in two, but then my abstraction would leak its optimization. I settled for not using const here, as it was a module-private class and didn’t have a large impact on my codebase.

A complaint that is sometimes heard regarding const is that it does not work well with one of D’s main selling points, ranges. A range is D’s implementation of a lazy iterator, usable in foreach loops and heavily used in std.algorithm. The functions in std.algorithm can handle ranges with const elements perfectly fine. However, iterating a range changes the internal values and ultimately consumes the range. Therefore, when a range itself is const, it cannot be iterated over. I think this makes sense by design, but I can imagine that this behavior can be surprising in edge cases. I haven’t encountered this as I generate new ranges on every query, both on mutable and const objects.

A guideline for when to use const would be to separate the queries from the commands, a.k.a. ye olde Command-Query Separation (CQS). All queries should, in principle, be const. To perform a mutation, even on nested objects, one should call a method on the object. I’d double down on this and state that member functions should be commands, with logic that can be overridden, and therefore never be marked constant. Queries basically serve as a means to peek at encapsulated data and don’t need to be overridden. A query should be a final, non-virtual function that simply exposes a read-only view of the inner field. For example, using D’s module-private modifier on the field in conjunction with a free function in the same module (which, unlike code in other modules, can access the private field), we can put the logic inside the class definition and the queries outside.

// in order.d
public class Order
{
    private Transaction[] _transactions; // Accessible in order.d

    public void process(); // Virtual and not restricted
}

public const(Transaction)[] getTransactions(const Order order)
{
    // Function, not virtual, operates on a read-only view of Order
    return order._transactions;
}

// in view.d
void display(const Order order)
{
    auto t = order.getTransactions();
    show(t);
}

We should take care, however, not to overapply the modifier. The question that we need to answer is not “Do I modify this object here?”, but rather, “Does it make sense that the object is modified here?” Wrongly assuming constant objects will result in trouble when you finally need to change the instance due to a new feature. For example, in my game a central Game class contains the players’ hands, but doesn’t explicitly modify them. However, given the structure of my code, it does not make sense to make the player objects constant, as free functions in the engine do use mutable player instances of the game object.

Reflecting on my design, even while writing this blog post, gave me valuable insights. Taking the effort to properly use the tool called const has paid off for me. It improved the structure of my code and my understanding of the ramblings I write. It is, like any other tool, not a silver bullet. It serves to formalize our gentleman’s agreement and is therefore just as fragile as one.


Marco graduated in physics, where he used Fortran and Matlab. He used Programming in D to learn application programming. After three years as a C# developer, he is now a trainer-coach of mainly Java and C# developers at Sogyo. Marco uses D for programming experiments and side projects, including his darling mahjong game.

Containerize Your D Server Application

A container consists of an application packed together with all of its required dependencies. The container is run as an isolated process on Linux or Windows. The Docker tool has made the handling of containers very popular and is now the de-facto standard for deploying containers to a cloud environment. In this blog post I discuss how to create a simple vibe.d application and ship it as a Docker container.

Setting up the build environment

I use Ubuntu 18.10 as my development environment for this application. Additionally, I installed the packages ldc (the LLVM-based D compiler), dub (the D package manager and build tool), gcc, zlib1g-dev and libssl-dev (all required for compiling my vibe.d application). To build and run my container I use Docker CE. I installed it following the instructions at https://docs.docker.com/install/linux/docker-ce/ubuntu/. As the last step, I added my user to the docker group (sudo adduser kai docker).

A sample REST application

My vibe.d application is a very simple REST server. You can call the /hello endpoint (with an optional name parameter) and you get back a friendly message in JSON format. The second endpoint, /healthz, is intended as a health check and simply returns the string "OK". You can clone my source repository at https://github.com/redstar/vibed-docker/ to get the source code. Here is the application:

import vibe.d;
import std.conv : to;
import std.process : environment;
import std.typecons : Nullable;

shared static this()
{
    logInfo("Environment dump");
    auto env = environment.toAA;
    foreach(k, v; env)
        logInfo("%s = %s", k, v);

    auto host = environment.get("HELLO_HOST", "0.0.0.0");
    auto port = to!ushort(environment.get("HELLO_PORT", "17890"));

    auto router = new URLRouter;
    router.registerRestInterface(new HelloImpl());

    auto settings = new HTTPServerSettings;
    settings.port = port;
    settings.bindAddresses = [host];

    listenHTTP(settings, router);

    logInfo("Please open http://%s:%d/hello in your browser.", host, port);
}

interface Hello
{
    @method(HTTPMethod.GET)
    @path("hello")
    @queryParam("name", "name")
    Msg hello(Nullable!string name);

    @method(HTTPMethod.GET)
    @path("healthz")
    string healthz();
}

class HelloImpl : Hello
{
    Msg hello(Nullable!string name) @safe
    {
        logInfo("hello called");
        return Msg(format("Hello %s", name.isNull ? "visitor" : name));
    }

    string healthz() @safe
    {
        logInfo("healthz called");
        return "OK";
    }
}

struct Msg
{
    string msg;
}

And this is the dub.sdl file to compile the application:

name "hellorest"
description "A minimal REST server."
authors "Kai Nacke"
copyright "Copyright © 2018, Kai Nacke"
license "BSD 2-clause"
dependency "vibe-d" version="~>0.8.4"
dependency "vibe-d:tls" version="*"
subConfiguration "vibe-d:tls" "openssl-1.1"
versions "VibeDefaultMain"

Compile and run the application with dub. Then open the URL http://127.0.0.1:17890/hello to check that you get a JSON result.

A cloud-native application should follow the twelve-factor app methodology. You can read about the twelve-factor app at https://12factor.net/. In this post I only highlight two of the factors: III. Config and XI. Logs.

Ideally, you build an application only once and then deploy it into different environments, e.g. first to your quality testing environment and then to production. When you ship your application as a container, it comes with all of its required dependencies. This solves the problem that different versions of a library might be installed in different environments, possibly causing hard-to-find errors. You still need to find a solution for how to deal with different configuration settings. Port numbers, passwords or the location of databases are all configuration settings which typically differ from environment to environment. The factor III. Config recommends that the configuration be stored in environment variables. This has the advantage that you can change the configuration without touching a single file. My application follows this recommendation. It uses the environment variable HELLO_HOST for the configuration of the host IP and the variable HELLO_PORT for the port number. For easy testing, the application uses the default values 0.0.0.0 and 17890 in case the variables do not exist. (To be sure that every configuration is complete, it would be safer to stop the application with an error message in case an environment variable is not found.)
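
For the stricter approach, here is a minimal sketch using a hypothetical helper named requireEnv (the helper is not part of the sample application):

import std.exception : enforce;
import std.process : environment;

// Hypothetical helper: fail at startup when a required environment
// variable is missing, instead of silently falling back to a default.
string requireEnv(string name)
{
    auto value = environment.get(name);
    enforce(value !is null, "Required environment variable " ~ name ~ " is not set");
    return value;
}

With this in place, the module constructor would call requireEnv("HELLO_HOST") rather than environment.get with a default value.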

The application writes log entries on startup and when a url endpoint is called. The log is written to stdout. This is exactly the point of factor XI. Logs: an application should not bother to handle logs at all. Instead, it should treat logs as an event stream and write everything to stdout. The cloud environment is then responsible for collecting, storing and analyzing the logs.

Building the container

A Docker container is specified with a Dockerfile. Here is the Dockerfile for the application:

FROM ubuntu:cosmic

RUN \
  apt-get update && \
  apt-get install -y libphobos2-ldc-shared81 zlib1g libssl1.1 && \
  rm -rf /var/lib/apt/lists/*

COPY hellorest /

USER nobody

ENTRYPOINT ["/hellorest"]

A Docker container is a stack of read-only layers. With the first line, FROM ubuntu:cosmic, I specify that I want to use this specific Ubuntu version as the base layer of my container. During the first build, this layer is downloaded from Docker Hub. Every other line in the Dockerfile creates a new layer. The RUN line is executed at build time. I use it to install the libraries the application depends on. The COPY command copies the executable into the root directory inside the container. And last, ENTRYPOINT specifies the command which the container will run.

Run the Docker command

docker build -t vibed-docker/hello:v1 .

to build the Docker container. After the container is built successfully, you can run it with

docker run -p 17890:17890 vibed-docker/hello:v1

Now open the URL http://127.0.0.1:17890/hello again. You should get the same result as before. Congratulations! Your vibe.d application is now running in a container!
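
You can also exercise both endpoints from the command line with curl; the name query parameter is optional:

curl "http://127.0.0.1:17890/hello?name=D"
curl http://127.0.0.1:17890/healthz

The first command should print a JSON object along the lines of {"msg":"Hello D"}.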

Using a multi-stage build for the container

The binary hellorest was compiled outside the container. This creates difficulties as soon as dependencies in your development environment change. It is easy to integrate compilation into the Dockerfile, but this creates another issue: the requirements for compiling and running the application are different, e.g. the compiler is not required to run the application.

The solution is to use a multi-stage build. In the first stage, the application is built. The second stage contains only the runtime dependencies and the application binary built in the first stage. This is possible because Docker allows the copying of files between stages. Here is the multi-stage Dockerfile:

FROM ubuntu:cosmic AS build

RUN \
  apt-get update && \
  apt-get install -y ldc gcc dub zlib1g-dev libssl-dev && \
  rm -rf /var/lib/apt/lists/*

COPY . /tmp

WORKDIR /tmp

RUN dub -v build

FROM ubuntu:cosmic

RUN \
  apt-get update && \
  apt-get install -y libphobos2-ldc-shared81 zlib1g libssl1.1 && \
  rm -rf /var/lib/apt/lists/*

COPY --from=build /tmp/hellorest /

USER nobody

ENTRYPOINT ["/hellorest"]

In my repository I called this file Dockerfile.multi. Therefore, you have to specify the file on the command line:

docker build -f Dockerfile.multi -t vibed-docker/hello:v1 .

Building the container now requires much more time because a clean build of the application is included. The advantage is that your build environment is now independent of your host environment.

Where to go from here?

Using containers is fun. But the fun diminishes as soon as the containers get larger. Using Ubuntu as the base image is comfortable but not the best solution. To reduce the size of your container, you may want to try Alpine Linux as the base image, or use no base image at all.
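
For example, if the build stage produces a fully statically linked binary, the final stage can drop the base image entirely. The following is only a sketch; actually producing a fully static build of a vibe.d application (e.g. against musl libc) is an exercise of its own:

FROM scratch

COPY --from=build /tmp/hellorest /hellorest

ENTRYPOINT ["/hellorest"]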

If your application is split over several containers then you can use Docker Compose to manage your containers. For real container orchestration in the cloud you will want to learn about Kubernetes.


A long-time contributor to the D community, Kai Nacke is the author of ‘D Web Development’ and a maintainer of LDC, the LLVM D Compiler.

Writing a D Wrapper for a C Library

While porting to D a program I created for a research project, I wrote a D wrapper for a C library in an object-oriented manner. I want to share my experience with other programmers. This article provides some D tips and tricks for writers of D wrappers around C libraries.

I initially started my research project using the Ada 2012 programming language (see my article “Experiences on Writing Ada Bindings for a C Library” in Ada User Journal, Volume 39, Number 1, March 2018). Due to a number of bugs that I was unable to overcome, I started looking for another programming language. After some unsatisfying experiments with Java and Python, I settled on the D programming language.

The C Library

We have a C library, written in an object-oriented style (C structure pointers serve as objects, and C functions taking such structure pointers serve as methods). Fortunately for us, there is no inheritance in that C library.

The particular libraries we will deal with are the Redland RDF Libraries, a set of libraries which parse Resource Description Framework (RDF) files or other RDF resources, manage them, enable RDF queries, etc. Don’t worry if you don’t know what RDF is; it is not really relevant for this article.

The first stage of this project was to write a D wrapper over librdf. I modeled it on the Ada wrapper I had already written. One advantage I found in D over Ada is that template instantiation is easier—there’s no need in D to instantiate every single template invocation with a separate declaration. I expect this to substantially simplify the code of XML Boiler, my program which uses this library.

I wrote both raw bindings and a wrapper. The bindings translate the C declarations directly into D, and the wrapper is a new API which is a full-fledged D interface. For example, it uses D types with constructors and destructors to represent objects. It also uses some other D features which are not available in C. This is a work in progress and your comments are welcome.

The source code of my library (forked from Dave Beckett’s original multi-language bindings of his libraries) is available at GitHub (currently only in the dlang branch). Initially, I tried some automatic parsers of C headers which generate D code. I found these unsatisfactory, so I wrote the necessary bindings myself.

Package structure

I put my entire API into the rdf.* package hierarchy. I also have the rdf.auxiliary package and its subpackages for things used by or with my bindings. I will discuss some particular rdf.auxiliary.* packages below.

My mixins

In Ada I used tagged types, which are a rough equivalent of D classes, and derived _With_Finalization types from _Without_Finalization types (see below). However, tagged types increase variable sizes and execution time.

In D I use structs instead of classes, mainly for efficiency reasons. D structs do not support inheritance, and therefore have no virtual method table (vtable), but do provide constructors and destructors, making classes unnecessary for my use case (however, see below). To simulate inheritance, I use template mixins (defined in the rdf.auxiliary.handled_record module) and the alias this construct.

As I’ve said above, C objects are pointers to structures. All C pointers to structures have the same format and alignment (ISO/IEC 9899:2011 section 6.2.5 paragraph 28). This allows the representation of any pointer to a C structure as a pointer to an opaque struct (in the below example, URIHandle is an opaque struct declared as struct URIHandle;).
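
As a rough illustration, the raw binding side might look like this (a sketch; the two raptor functions appear in the mixin instantiations below, but their exact signatures should be checked against the real C headers):

// The handle type: an opaque struct. D code only ever manipulates
// pointers to it and never sees the C structure's fields.
struct URIHandle;

extern(C)
{
    // Illustrative declarations; verify against the raptor headers.
    URIHandle* raptor_uri_copy(URIHandle* uri);
    void raptor_free_uri(URIHandle* uri);
}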

Using the mixins shown below, we can declare the public structs of our API this way (you should look into the actual source for real examples):

struct URIWithoutFinalize {
    mixin WithoutFinalize!(URIHandle,
                           URIWithoutFinalize,
                           URI,
                           raptor_uri_copy);
    // …
}
struct URI {
    mixin WithFinalize!(URIHandle,
                        URIWithoutFinalize,
                        URI,
                        raptor_free_uri);
}

The difference between the WithoutFinalize and WithFinalize mixins is explained below.

About finalization and related stuff

The main challenge in writing object-oriented bindings for a C library is finalization.

In the C library in consideration (as well as in many other C libraries), every object is represented as a pointer to a dynamically allocated C structure. The corresponding D object can be a struct holding the pointer (aka handle), but oftentimes a C function returns a so-called “shared handle”—a pointer to a C struct which we should not free because it is a part of a larger C object and shall be freed by the C library only when that larger C object goes away.

As such, I first define both (for example) URIWithoutFinalize and URI. Only URI has a destructor. For URIWithoutFinalize, a shared handle is not finalized. As D does not support inheritance for structs, I do it with template mixins instead. Below is a partial listing. See the above URI example on how to use them:

mixin template WithoutFinalize(alias Dummy,
                               alias _WithoutFinalize,
                               alias _WithFinalize,
                               alias copier = null)
{
    import std.traits : isCallable;

    private Dummy* ptr;
    private this(Dummy* ptr) {
        this.ptr = ptr;
    }
    @property Dummy* handle() const {
        return cast(Dummy*)ptr;
    }
    static _WithoutFinalize fromHandle(const Dummy* ptr) {
        return _WithoutFinalize(cast(Dummy*)ptr);
    }
    static if(isCallable!copier) {
        _WithFinalize dup() {
            return _WithFinalize(copier(ptr));
        }
    }
    // ...
}


mixin template WithFinalize(alias Dummy,
                            alias _WithoutFinalize,
                            alias _WithFinalize,
                            alias destructor,
                            alias constructor = null)
{
    private Dummy* ptr;
    @disable this();
    @disable this(this);
    // Use fromHandle() instead
    private this(Dummy* ptr) {
        this.ptr = ptr;
    }
    ~this() {
        destructor(ptr);
    }
    /*private*/ @property _WithoutFinalize base() { // private does not work in v2.081.2
        return _WithoutFinalize(ptr);
    }
    alias base this;
    @property Dummy* handle() const {
        return cast(Dummy*)ptr;
    }
    static _WithFinalize fromHandle(const Dummy* ptr) {
        return _WithFinalize(cast(Dummy*)ptr);
    }
    // ...
}

I’ve used template alias parameters here, which allow a template to be parameterized with more than just types. The Dummy argument is the type of the handle instance (usually an opaque struct). The destructor and copier arguments are self-explanatory. For the usage of the constructor argument, see the real source (here it is omitted).

The _WithoutFinalize and _WithFinalize template arguments should specify the structs we define, allowing them to reference each other. Note that the alias this construct makes _WithoutFinalize essentially a base of _WithFinalize, allowing us to use all methods and properties of _WithoutFinalize in _WithFinalize.

Also note that instances of the _WithoutFinalize type may become invalid, i.e. they may contain dangling pointers. There seems to be no easy way to deal with this problem because of the way the C library works: we may not know when an object is destroyed by the C library, or we may know but be unable to appropriately “explain” it to the D compiler. Just be careful not to use objects which have already been destroyed.

Dealing with callbacks

To deal with C callbacks (particularly those accepting a void* argument for additional data) in an object-oriented way, we need a way to convert between C void pointers and D class objects (we pass D objects as C “user data” pointers). D structs are enough (and are very efficient) for representing C objects like librdf library objects, but classes are more convenient for callbacks because they provide good callback machinery in the form of virtual functions.

First, a D object which is passed as a callback parameter to C should not unexpectedly be moved in memory by the D garbage collector. So I make such objects descendants of this class:

class UnmovableObject {
    import core.memory : GC;

    this() {
        // Pin the object so the GC never moves it while C holds a pointer to it.
        GC.setAttr(cast(void*)this, GC.BlkAttr.NO_MOVE);
    }
}

Moreover, I add a property context() which returns the object as a void* pointer, suitable for passing to C functions which register callbacks:

abstract class UserObject : UnmovableObject {
    final @property void* context() const { return cast(void*)this; }
}

When we create a callback, we pass a D object as a C pointer, along with an extern(C) function defined by us as the callback. The callback receives the pointer we previously passed, and in the callback code we should (if we want to stay object-oriented) convert this pointer back into a D object.

What we need is a bijective (“back and forth”) mapping between D pointers and C void* pointers. This is trivial in D: just use the cast() operator.
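
In outline, the conversion looks like this (a minimal sketch with a hypothetical handler class):

class MyHandler { /* callback state and methods */ }

void example()
{
    auto handler = new MyHandler;

    // D object -> C user data pointer, when registering the callback
    void* context = cast(void*)handler;

    // C user data pointer -> D object, inside the callback
    MyHandler h = cast(MyHandler)context;
}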

How does this work in practice? The best way to explain is with an example. We will consider how to create an I/O stream class which uses the C library callbacks for its implementation. For example, when the user of our wrapper requests that some information be written to a file, our class receives a write message. To handle this message, our implementation calls our virtual function doWriteBytes(), which actually handles the user’s request.

private immutable DispatcherType Dispatch =
    { version_: 2,
      init: null,
      finish: null,
      write_byte : &raptor_iostream_write_byte_impl,
      write_bytes: &raptor_iostream_write_bytes_impl,
      write_end  : &raptor_iostream_write_end_impl,
      read_bytes : &raptor_iostream_read_bytes_impl,
      read_eof   : &raptor_iostream_read_eof_impl };


class UserIOStream : UserObject {
    IOStream record;
    this(RaptorWorldWithoutFinalize world) {
        IOStreamHandle* handle = raptor_new_iostream_from_handler(world.handle,
                                                                  context,
                                                                  &Dispatch);
        record = IOStream.fromNonnullHandle(handle);
    }
    void doWriteByte(char byte_) {
        if(doWriteBytes(&byte_, 1, 1) != 1)
            throw new IOStreamException();
    }
    abstract int doWriteBytes(char* data, size_t size, size_t count);
    abstract void doWriteEnd();
    abstract size_t doReadBytes(char* data, size_t size, size_t count);
    abstract bool doReadEof();
}

Here, for example, is the implementation of the write-bytes callback:

int raptor_iostream_write_bytes_impl(void* context, const void* ptr, size_t size, size_t nmemb) {
    try {
        return (cast(UserIOStream)context).doWriteBytes(cast(char*)ptr, size, nmemb);
    }
    catch(Exception) {
        return -1;
    }
}

More little things

I “encode” C strings (which can be null) as a D template instance, Nullable!string. If the string is null, the holder is empty. However, it is often enough to transform an empty D string into a null C string (this works only if we don’t need to differentiate between empty and null strings). See rdf.auxiliary.nullable_string for actually useful code.
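
The essence of the idea is captured by this sketch (the helper name is hypothetical; see the real module for the full treatment):

import std.string : toStringz;

// Hypothetical helper: map an empty D string to a null C string.
// This deliberately conflates empty and null strings.
const(char)* toCString(string s)
{
    return s.length == 0 ? null : s.toStringz;
}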

There is a lot more advice I could give on writing D bindings for a C library, but you can simply follow my source code, which can serve as an example.

Static if

One thing which can be done in D but not in Ada is compile-time comparison via static if. This is a D construct (similar to but more advanced than C conditional preprocessor directives) which allows conditional compilation based on compile-time values. I use static if with my custom Version type to enable/disable features of my library depending on the available features of the version of the base C library in use. In the following example, rasqalVersionFeatures is a D constant defined in my rdf.config package, created by the GNU configure script from the config.d.in file.

static if(Version(rasqalVersionFeatures) >= Version("0.9.33")) {
    private extern extern(C)
    QueryResultsHandle* rasqal_new_query_results_from_string(RasqalWorldHandle* world,
                                                             QueryResultsType type,
                                                             URIHandle* base_uri,
                                                             const char* string,
                                                             size_t string_len);
    static auto create(RasqalWorldWithoutFinalize world,
                  QueryResultsType type,
                  URITypeWithoutFinalize baseURI,
                  string value)
    {
        return QueryResults.fromNonnullHandle(
            rasqal_new_query_results_from_string(world.handle,
                                                 type,
                                                 baseURI.handle,
                                                 value.ptr, value.length));
    }
}

Comparisons

Order comparisons between structs can be easily done with this mixin:

mixin template CompareHandles(alias equal, alias compare) {
    import std.traits;
    bool opEquals(const ref typeof(this) s) const {
        static if(isCallable!equal) {
            return equal(handle, s.handle) != 0;
        } else {
            return compare(handle, s.handle) == 0;
        }
    }
    int opCmp(const ref typeof(this) s) const {
        return compare(handle, s.handle);
    }
}

Sadly, this mixin has to be instantiated in both the _WithoutFinalize and the _WithFinalize structs. I found no way to write it just once.

Conclusion

I’ve found that D is a great language for writing object-oriented wrappers around C libraries. There are some small annoyances, like using class wrappers around structs for callbacks, but generally D wraps C well.


Victor Porton is an open source developer, a math researcher, and a Christian writer. He earns his living as a programmer.

Lost in Translation: Encapsulation

I first learned programming in BASIC. Outgrew it, and switched to Fortran. Amusingly, my early Fortran code looked just like BASIC. My early C code looked like Fortran. My early C++ code looked like C. – Walter Bright, the creator of D

Programming in a language is not the same as thinking in that language. A natural side effect of experience with one programming language is that we view other languages through the prism of its features and idioms. Languages in the same family may look and feel similar, but there are guaranteed to be subtle differences that, when not accounted for, can lead to compiler errors, bugs, and missed opportunities. Even when good docs, books, and other materials are available, most misunderstandings are only going to be solved through trial-and-error.

D programmers come from a variety of programming backgrounds, C-family languages perhaps being the most common among them. Understanding the differences and how familiar features are tailored to D can open the door to more possibilities for organizing a code base, and designing and implementing an API. This article is the first of a few that will examine D features that can be overlooked or misunderstood by those experienced in similar languages.

We’re starting with a look at a particular feature that’s common among languages that support Object-Oriented Programming (OOP). There’s one aspect in particular of the D implementation that experienced programmers are sure they already fully understand and are often surprised to later learn they don’t.

Encapsulation

Most readers will already be familiar with the concept of encapsulation, but I want to make sure we’re on the same page. For the purpose of this article, I’m talking about encapsulation in the form of separating interface from implementation. Some people tend to think of it strictly as it relates to object-oriented programming, but it’s a concept that’s more broad than that. Consider this C code:

#include <stdio.h>
static size_t s_count;

void print_message(const char* msg) {
    puts(msg);
    s_count++;
}

size_t num_prints() { return s_count; }

In C, functions and global variables decorated with static become private to the translation unit (i.e. the source file along with any headers brought in via #include) in which they are declared. Non-static declarations are publicly accessible, usually provided in header files that lay out the public API for clients to use. Static functions and variables are used to hide implementation details from the public API.

Encapsulation in C is a minimal approach. C++ supports the same feature, but it also has anonymous namespaces that can encapsulate type definitions in addition to declarations. Like Java, C#, and other languages that support OOP, C++ also has access modifiers (alternatively known as access specifiers, protection attributes, visibility attributes) which can be applied to class and struct member declarations.

C++ supports the following three access modifiers, common among OOP languages:

  • public – accessible to the world
  • private – accessible only within the class
  • protected – accessible only within the class and its derived classes

An experienced Java programmer might raise a hand to say, “Um, excuse me. That’s not a complete definition of protected.” That’s because in Java, it looks like this:

  • protected – accessible within the class, its derived classes, and classes in the same package.

Every class in Java belongs to a package, so it makes sense to factor packages into the equation. Then there’s this:

  • package-private (not a keyword) – accessible within the class and classes in the same package.

This is the default access level in Java when no access modifier is specified. This, combined with protected, makes packages a tool for encapsulation beyond classes in Java.

Similarly, C# has assemblies, which MSDN defines as “a collection of types and resources that forms a logical unit of functionality”. In C#, the meaning of protected is identical to that of C++, but the language has two additional forms of protection that relate to assemblies and that are analogous to Java’s protected and package-private.

  • internal – accessible within the class and classes in the same assembly.
  • protected internal – accessible within the class, its derived classes, and classes in the same assembly.

Examining encapsulation in other programming languages will continue to turn up similarities and differences. Common encapsulation idioms are generally adapted to language-specific features. The fundamental concept remains the same, but the scope and implementation vary. So it should come as no surprise that D also approaches encapsulation in its own, language-specific manner.

Modules

The foundation of D’s approach to encapsulation is the module. Consider this D version of the C snippet from above:

module mymod;

private size_t _count;

void printMessage(string msg) {
    import std.stdio : writeln;

    writeln(msg);
    _count++;
}

size_t numPrints() { return _count; }

In D, access modifiers can apply to module-scope declarations, not just class and struct members. _count is private, meaning it is not visible outside of the module. printMessage and numPrints have no access modifiers; they are public by default, making them visible and accessible outside of the module. Both functions could have been annotated with the keyword public.

Note that imports in module scope are private by default, meaning the symbols in the imported modules are not visible outside the module, and local imports, as in the example, are never visible outside of their parent scope.

Alternative syntaxes are supported, giving more flexibility to the layout of a module. For example, there’s C++ style:

module mymod;

// Everything below this is private until either another
// protection attribute or the end of file is encountered.
private:
    size_t _count;

// Turn public back on
public:
    void printMessage(string msg) {
        import std.stdio : writeln;

        writeln(msg);
        _count++;
    }

    size_t numPrints() { return _count; }

And this:

module mymod;

private {
    // Everything declared within these braces is private.
    size_t _count;
}

// The functions are still public by default
void printMessage(string msg) {
    import std.stdio : writeln;

    writeln(msg);
    _count++;
}

size_t numPrints() { return _count; }

Modules can belong to packages. A package is a way to group related modules together. In practice, the source files corresponding to each module should be grouped together in the same directory on disk. Then, in the source file, each directory becomes part of the module declaration:

// mypack/amodule.d
module mypack.amodule;

// mypack/subpack/anothermodule.d
module mypack.subpack.anothermodule;

Note that it’s possible to have package names that don’t correspond to directories and module names that don’t correspond to files, but it’s bad practice to do so. A deep dive into packages and modules will have to wait for a future post.

mymod does not belong to a package, as no packages were included in the module declaration. Inside printMessage, the function writeln is imported from the stdio module, which belongs to the std package. Packages have no special properties in D and primarily serve as namespaces, but they are a common part of the codescape.

In addition to public and private, the package access modifier can be applied to module-scope declarations to make them visible only within modules in the same package.

Consider the following example. There are three modules in three files (only one module per file is allowed), each belonging to the same root package.

// src/rootpack/subpack1/mod2.d
module rootpack.subpack1.mod2;
import std.stdio;

package void sayHello() {
    writeln("Hello!");
}

// src/rootpack/subpack1/mod1.d
module rootpack.subpack1.mod1;
import rootpack.subpack1.mod2;

class Speaker {
    this() { sayHello(); }
}

// src/rootpack/app.d
module rootpack.app;
import rootpack.subpack1.mod1;

void main() {
    auto speaker = new Speaker;
}

Compile this with the following command line:

cd src
dmd -i rootpack/app.d

The -i switch tells the compiler to automatically compile and link imported modules (excluding those in the standard library namespaces core and std). Without it, each module would have to be passed on the command line, else they wouldn’t be compiled and linked.

The class Speaker has access to sayHello because they belong to modules that are in the same package. Now imagine we do a refactor and we decide that it could be useful to have access to sayHello throughout the rootpack package. D provides the means to make that happen by allowing the package attribute to be parameterized with the fully-qualified name (FQN) of a package. So we can change the declaration of sayHello like so:

package(rootpack) void sayHello() {
    writeln("Hello!");
}

Now all modules in rootpack and all modules in packages that descend from rootpack will have access to sayHello. Don’t overlook that last part. A parameter to the package attribute is saying that a package and all of its descendants can access this symbol. It may sound overly broad, but it isn’t.

For one thing, only a package that is a direct ancestor of the module’s parent package can be used as a parameter. Consider a module rootpack.subpack.subsub.mymod. That name contains all of the packages that are legal parameters to the package attribute in mymod.d, namely rootpack, subpack, and subsub. So we can say the following about symbols declared in mymod:

  • package – visible only to modules in the parent package of mymod, i.e. the subsub package.
  • package(subsub) – visible to modules in the subsub package and modules in all packages descending from subsub.
  • package(subpack) – visible to modules in the subpack package and modules in all packages descending from subpack.
  • package(rootpack) – visible to modules in the rootpack package and modules in all packages descending from rootpack.

This feature makes packages another tool for encapsulation, allowing symbols to be hidden from the outside world but visible and accessible in specific subtrees of a package hierarchy. In practice, there are probably few cases where expanding access to a broad range of packages in an entire subtree is desirable.

It’s common to see parameterized package protection in situations where a package exposes a common public interface and hides implementations in one or more subpackages, such as a graphics package with subpackages containing implementations for DirectX, Metal, OpenGL, and Vulkan. Here, D’s access modifiers allow for three levels of encapsulation:

  • the graphics package as a whole
  • each subpackage containing the implementations
  • individual modules in each package

Notice that I didn’t include class or struct types as a fourth level. The next section explains why.

Classes and structs

Now we come to the motivation for this article. I can’t recall ever seeing anyone come to the D forums professing surprise about package protection, but the behavior of access modifiers in classes and structs is something that pops up now and then, largely because of expectations derived from experience in other languages.

Classes and structs use the same access modifiers as modules: public, package, package(some.pack), and private. The protected attribute can only be used in classes, as inheritance is not supported for structs (nor for modules, which aren’t even objects). public, package, and package(some.pack) behave exactly as they do at the module level. The thing that surprises some people is that private also behaves the same way.

import std.stdio;

class C {
    private int x;
}

void main() {
    C c = new C();
    c.x = 10;
    writeln(c.x);
}

Run this example online

Snippets like this are posted in the forums now and again by people exploring D, accompanying a question along the lines of, “Why does this compile?” (and sometimes, “I think I’ve found a bug!”). This is an example of where experience can cloud expectations. Everyone knows what private means, so it’s not something most people bother to look up in the language docs. However, those who do would find this:

Symbols with private visibility can only be accessed from within the same module.

private in D always means private to the module. The module is the lowest level of encapsulation. It’s easy to understand why some feel an initial resistance to this, seeing it as a break in encapsulation, but the intent behind the design is to strengthen encapsulation. It’s inspired by the C++ friend feature.

Having implemented and maintained a C++ compiler for many years, Walter understood the need for a feature like friend, but felt that it wasn’t the best way to go about it.

Being able to declare a “friend” that is somewhere in some other file runs against notions of encapsulation.

An alternative is to take a Java-like approach of one class per module, but he felt that was too restrictive.

One may desire a set of closely interrelated classes that encapsulate a concept, and those should go into a module.

So the way to view a module in D is not just as a single source file, but as a unit of encapsulation. It can contain free functions, classes, and structs, all operating on the same data declared in module scope and class scope. The public interface is still protected from changes to the private implementation inside the module. Along those same lines, protected class members are accessible not just in derived classes, but also in the module.
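
To see the module boundary in action, move the class from the earlier snippet into its own module; the same member access then fails to compile (a minimal sketch):

// cmod.d
module cmod;

class C {
    private int x;
}

// app.d
module app;
import cmod;

void main() {
    auto c = new C();
    c.x = 10; // compile-time error: x is private to module cmod
}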

Sometimes though, there really is a benefit to denying access to private members in a module. The bigger a module becomes, the more of a burden it is to maintain, especially when it’s being maintained by a team. Every place a private member of a class is accessed in a module means more places to update when a change is made, thereby increasing the maintenance burden. The language provides the means to alleviate the burden in the form of the special package module.

In some cases, we don’t want to require the user to import multiple modules individually. Splitting a large module into smaller ones is one of those cases. Consider the following file tree:

-- mypack
---- mod1.d
---- mod2.d

We have two modules in a package called mypack. Let’s say that mod1.d has grown extremely large and we’re starting to worry about maintaining it. For one, we want to ensure that private members aren’t manipulated outside of class declarations with hundreds or thousands of lines in between. We want to split the module into smaller ones, but at the same time we don’t want to break user code. Currently, users can get at the module’s symbols by importing it with import mypack.mod1. We want that to continue to work. Here’s how we do it:

-- mypack
---- mod1
------ package.d
------ split1.d
------ split2.d
---- mod2.d

We’ve split mod1.d into two new modules and put them in a package named mod1. We’ve also created a special package.d file, which looks like this:

module mypack.mod1;

public import mypack.mod1.split1,
              mypack.mod1.split2;

When the compiler sees package.d, it knows to treat it specially. Users will be able to continue using import mypack.mod1 without ever caring that it’s now split into two modules in a new package. The key is the module declaration at the top of package.d. It’s telling the compiler to treat this package as the module mod1. And instead of automatically importing all modules in the package, the requirement to list them as public imports in package.d allows more freedom in implementing the package. Sometimes, you might want to require the user to explicitly import a module even when a package.d is present.

Now users will continue seeing mod1 as a single module and can continue to import it as such. Meanwhile, encapsulation is now more stringently enforced internally. Because split1 and split2 are now separate modules, they can’t touch each other’s private parts. Any part of the API that needs to be shared by both modules can be annotated with package protection. Despite the internal transformation, the public interface remains unchanged, and encapsulation is maintained.

Wrapping up

The full list of access modifiers in D can be defined as such:

  • public – accessible everywhere.
  • package – accessible to modules in the same package.
  • package(some.pack) – accessible to modules in the package some.pack and to the modules in all of its descendant packages.
  • private – accessible only in the module.
  • protected (classes only) – accessible in the module and in derived classes.

Hopefully, this article has provided you with the perspective to think in D instead of your “native” language when thinking about encapsulation in D.

Thanks to Ali Çehreli, Joakim Noah, and Nicholas Wilson for reviewing and providing feedback on this article.

DMD 2.083.0 Released

Version 2.083.0 of DMD, the D reference compiler, is ready for download. The changelog lists 47 fixes and enhancements across all components of the release package. Notable among them are some C++ compatibility enhancements and some compile-time love.

C++ compatibility

D’s support for linking to C++ binaries has been evolving and improving with nearly every release. This time, the new things aren’t very dramatic, but still very welcome to those who work with both C++ and D in the same code base.

What’s my runtime?

For a while now, D has had predefined version identifiers for user code to detect the C runtime implementation at compile time. These are:

  • CRuntime_Bionic
  • CRuntime_DigitalMars
  • CRuntime_Glibc
  • CRuntime_Microsoft
  • CRuntime_Musl
  • CRuntime_UClibc

These aren’t reliable when linking against C++ code. Where the C runtime in use often depends on the system, the C++ runtime is compiler-specific. To remedy that, 2.083.0 introduces a few new predefined versions:

  • CppRuntime_Clang
  • CppRuntime_DigitalMars
  • CppRuntime_Gcc
  • CppRuntime_Microsoft
  • CppRuntime_Sun
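
Like the C runtime identifiers, these are ordinary version identifiers and can be used in version blocks to adapt code at compile time. A minimal sketch:

version(CppRuntime_Microsoft)
{
    // code specific to the Microsoft C++ runtime
}
else version(CppRuntime_Gcc)
{
    // code specific to GCC's libstdc++
}
else
{
    static assert(false, "C++ runtime not yet supported");
}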

Why so much conflict?

C++ support also gets a new syntax for declaring C++ linkage, which affects how a symbol is mangled. Consider a C++ library, MyLib, which uses the namespace mylib. The original syntax for binding a function in that namespace looks like this:

/*
 The original C++:
 namespace mylib { void cppFunc(); }
*/
// The D declaration
extern(C++, mylib) void cppFunc();

This declares that cppFunc has C++ linkage (the symbol is mangled in a manner specific to the C++ compiler) and that the symbol belongs to the C++ namespace mylib. On the D side, the function can be referred to either as cppFunc or as mylib.cppFunc.

In practice, this approach creates opportunities for conflict when a namespace is the same as a D keyword. It also has an impact on how one approaches the organization of a binding.  It’s natural to want to name the root package in D mylib, as it matches the library name and it is a D convention to name modules and packages using lowercase. In that case, extern(C++, mylib) declarations will not be compilable anywhere in the mylib package because the symbols conflict.

To alleviate the problem, an alternative syntax was proposed using strings to declare the namespaces in the linkage attribute, rather than identifiers:

/*
 The original C++:
 namespace foo { void cppFunc(); }
*/
// The D declaration
extern(C++, "foo") void cppFunc();

With this syntax, no mylib symbol is created on the D side; it is used solely for name mangling. No more conflicts with keywords, and D packages can be used to match the C++ namespaces on the D side. The old syntax isn’t going away anytime soon, though.

New compile-time things

This release provides two new built-in traits for more compile-time reflection options. Like all built-in traits, they are accessible via the __traits expression. There’s also a new pragma that lets you bring some linker options into the source code in a very specific circumstance.

Are you a zero?

isZeroInit can be used to determine if the default initializer of a given type is 0, or more specifically, it evaluates to true if all of the init value’s bits are zero. The example below uses compile-time asserts to verify the zeroness and nonzeroness of a few default init values, but I’ve saved a version that prints the results at runtime, for more immediate feedback, and can be compiled and run from the browser.

struct ImaZero {
    int x;
}

struct ImaNonZero {
    int x = 10;
}

// double.init == double.nan
static assert(!__traits(isZeroInit, double));

// int.init == 0
static assert(__traits(isZeroInit, int));

// ImaZero.init == ImaZero(0)
static assert(__traits(isZeroInit, ImaZero));

// ImaNonZero.init == ImaNonZero(10)
static assert(!__traits(isZeroInit, ImaNonZero));

Computer, query target.

The second new trait is getTargetInfo, which allows compile-time queries about the target platform. The argument is a string that serves as a key, and the result is “an expression describing the requested target information”. Currently supported strings are “cppRuntimeLibrary”, “floatAbi”, and “objectFormat”.

The following prints all three at compile time.

pragma(msg, __traits(getTargetInfo, "cppRuntimeLibrary"));
pragma(msg, __traits(getTargetInfo, "floatAbi"));
pragma(msg, __traits(getTargetInfo, "objectFormat"));

On Windows, using the default (-m32) DigitalMars toolchain, I see this:

snn
hard
omf

With the Microsoft Build Tools (via VS 2017), compiling with -m64 and -m32mscoff, I see this:

libcmt
hard
coff

Yo! Linker! Take care of this, will ya?

D has long supported a lib pragma, allowing programmers to tell the compiler in source code to pass a given library to the linker, rather than specifying it on the command line. Now there’s a new pragma in town that lets the programmer embed specific linker directives in source code, and it behaves rather differently. Meet the linkerDirective pragma:

pragma(linkerDirective, "/FAILIFMISMATCH:_ITERATOR_DEBUG_LEVEL=2");

The behavior is specified as “Implementation Defined”. The current implementation is specced to behave like so:

  • The string literal specifies a linker directive to be embedded in the generated object file.
  • Linker directives are only supported for MS-COFF output.

Just to make sure you didn’t gloss over the first list item, look at it again. The linker directive is not passed by the compiler to the linker, but emitted into the generated object file. Since it is only supported for MS-COFF, that means it’s only a thing for folks on Windows when they are compiling with -m64 or -m32mscoff. And some say the D community doesn’t care about Windows!

Of course there’s more!

The above are just a few cherries I picked from the list. For a deeper dive, see the full changelog. And head over to the Downloads page to get the installer for your platform. It looks a lot nicer than the boring list of files linked in the changelog.

Interfacing D with C: Arrays Part 1

This post is part of an ongoing series on working with both D and C in the same project. The previous post showed how to compile and link C and D objects. This post is the first in a miniseries focused on arrays.

When interacting with C APIs, it’s almost a given that arrays are going to pop up in one way or another (perhaps most often as strings, a subject of a future article in the “D and C” series). Although D arrays are implemented in a manner that is not directly compatible with C, the fundamental building blocks are the same. This makes compatibility between the two relatively painless as long as the differences are not forgotten. This article is the first of a few exploring those differences.

When using a C API from D, it’s sometimes necessary to translate existing code from C to D. A new D program can benefit from existing examples of using the C API, and anyone porting a program from C that uses the API would do well to keep the initial port as close to the original as possible. It’s on that basis that we’re starting off with a look at the declaration and initialization syntax in both languages and how to translate between them. Subsequent posts in this series will cover multidimensional arrays, the anatomy of a D array, passing D arrays to and receiving C arrays from C functions, and how the GC fits into the picture.

My original concept of covering this topic was much smaller in scope, my intent to brush over the boring details and assume that readers would know enough of the basics of C to derive the why from the what and the how. That was before I gave a D tutorial presentation to a group among whom only one person had any experience with C. I’ve also become more aware that there are regular users of the D forums who have never touched a line of C. As such, I’ll be covering a lot more ground than I otherwise would have (hence a two-part article has morphed into at least three). I urge those for whom much of said ground is old hat not to get complacent in their skimming of the page! A comfortable experience with C is more apt than none at all to obscure some of the pitfalls I describe.

Array declarations

Let’s start with a simple declaration of a one-dimensional array:

int c0[3];

This declaration allocates enough memory on the stack to hold three int values. The values are stored contiguously in memory, one right after the other. c0 may or may not be initialized, depending on where it’s declared. Global variables and static local variables are default initialized to 0, as the following C program demonstrates.

definit.c

#include <stdio.h>

// global (can also be declared static)
int c1[3];

int main(int argc, char** argv)
{
    static int c2[3];       // static local
    int c3[3];              // non-static local

    printf("one: %i %i %i\n", c1[0], c1[1], c1[2]);
    printf("two: %i %i %i\n", c2[0], c2[1], c2[2]);
    printf("three: %i %i %i\n", c3[0], c3[1], c3[2]);
}

For me, this prints:

one: 0 0 0
two: 0 0 0
three: -1 8 0

The values for c3 just happened to be lying around at that memory location. Now for the equivalent D declaration:

int[3] d0;

Try it online

Here we can already find the first gotcha.

A general rule of thumb in D is that C code pasted into a D source file should either work as it does in C or fail to compile. For a long while, C array declaration syntax fell into the former category and was a legal alternative to the D syntax. It has since been deprecated and subsequently removed from the language, meaning int d0[3] will now cause the compiler to scold you:

Error: instead of C-style syntax, use D-style int[3] d0

It may seem an arbitrary restriction, but it really isn’t. At its core, it’s about consistency at a couple of different levels.

One is that we read declarations in D from right to left. In the declaration of d0, everything flows from right to left in the same order that we say it: “(d0) is an (array of three) (integers)”. The same is not true of the C-style declaration.

Another is that the type of d0 is actually int[3]. Consider the following pointer declarations:

int* p0, p1;

The type of both p0 and p1 is int* (in C, only p0 would be a pointer; p1 would simply be an int). It’s the same as all type declarations in D—type on the left, symbol on the right. Now consider this:

int d1[3], d2[3];
int[3] d4, d5;

Having two different syntaxes for array declarations, with one that splits the type like an infinitive, sets the stage for the production of inconsistent and potentially confusing code. By making the C-style syntax illegal, consistency is enforced. Code readability is a key component of maintainability.

Another difference between d0 and c0 is that the elements of d0 will be default initialized no matter where or how it’s declared. Module scope, local scope, static local… it doesn’t matter. Unless the compiler is told otherwise, variables in D are always default initialized to the predefined value specified by the init property of each type. Array elements are initialized to the init property of the element type. As it happens, int.init == 0. Translate definit.c to D and see it for yourself (open up run.dlang.io and give it a go).
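
Here’s what such a translation might look like (a sketch of a definit.d; in D, all three arrays print zeros):

import std.stdio : writeln;

// module scope
int[3] c1;

void main()
{
    static int[3] c2;   // static local
    int[3] c3;          // non-static local

    writeln("one: ", c1);    // one: [0, 0, 0]
    writeln("two: ", c2);    // two: [0, 0, 0]
    writeln("three: ", c3);  // three: [0, 0, 0], not garbage
}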

When translating C to D, this default initialization business is a subtle gotcha. Consider this innocently contrived C snippet:

// static variables are default initialized to 0 in C
static float vertex[3];
some_func_that_expects_inited_vert(vertex);

A direct translation straight to D will not produce the expected result, as float.init == float.nan, not 0.0f!

When translating between the two languages, always be aware of which C variables are not explicitly initialized, which are expected to be initialized, and the default initialization value for each of the basic types in D. Failure to account for the subtleties may well lead to debugging sessions of the hair-pulling variety.

Default initialization can easily be disabled in D with = void in the declaration. This is particularly useful for arrays that are going to be loaded with values before they’re read, or that contain elements with an init value that isn’t very useful as anything other than a marker of uninitialized variables.

float[16] matrix = void;
setIdentity(matrix);

On a side note, the purpose of default initialization is not to provide a convenient default value, but to make uninitialized variables stand out (a fact you may come to appreciate in a future debugging session). A common mistake is to assume that types like float and char, with their “not a number” (float.nan) and invalid UTF-8 (0xFF) initializers, are the oddball outliers. Not so. Those values are great markers of uninitialized memory because they aren’t useful for much else. It’s the integer types (and bool) that break the pattern. For these types, the entire range of values has potential meaning, so there’s no single value that universally shouts “Hey! I’m uninitialized!”. As such, integer and bool variables are often left with their default initializer since 0 and false are frequently the values one would pick for explicit initialization for those types. Floating point and character values, however, should generally be explicitly initialized or assigned to as soon as possible.
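
For a quick look at the initializers mentioned above, this snippet prints a few of them (the exact textual rendering of float.init may vary):

import std.stdio : writefln;

void main()
{
    writefln("int.init   = %s", int.init);             // 0
    writefln("bool.init  = %s", bool.init);            // false
    writefln("float.init = %s", float.init);           // nan
    writefln("char.init  = 0x%X", cast(int)char.init); // 0xFF
}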

Explicit array initialization

C allows arrays to be explicitly initialized in different ways:

int ci0[3] = {0, 1, 2};  // [0, 1, 2]
int ci1[3] = {1};        // [1, 0, 0]
int ci2[]  = {0, 1, 2};  // [0, 1, 2]
int ci3[3] = {[2] = 2, [0] = 1}; // [1, 0, 2]
int ci4[]  = {[2] = 2, [0] = 1}; // [1, 0, 2]

What we can see here is:

  • elements are initialized sequentially with the constant values in the initializer list
  • if there are fewer values in the list than array elements, then all remaining elements are initialized to 0 (as seen in ci1)
  • if the array length is omitted from the declaration, the array takes the length of the initializer list (ci2)
  • designated initializers, as in ci3, allow specific elements to be initialized with [index] = value pairs, and indexes not in the list are initialized to 0
  • when the length is omitted from the declaration and a designated initializer is used, the array length is based on the highest index in the initializer and elements at all unlisted indexes are initialized to 0, as seen in ci4

Initializers aren’t supposed to be longer than the array (gcc gives a warning and initializes a three-element array to the first three initializers in the list, ignoring the rest).

Note that it’s possible to mix the designated and non-designated syntaxes in a single initializer:

// [0, 1, 0, 5, 0, 0, 0, 8, 44]
int ci5[] = {0, 1, [3] = 5, [7] = 8, 44};

Each value without a designation is applied in sequential order as normal. If there is a designated initializer immediately preceding it, then it becomes the value for the next index, and all other elements are initialized to 0. Here, 0 and 1 go to indexes ci5[0] and ci5[1] as normal, since they are the first two values in the list. Next comes a designator for ci5[3], so ci5[2] has no corresponding value in this list and is initialized to 0. Next comes the designator for ci5[7].  We have skipped ci5[4], ci5[5], and ci5[6],  so they are all initialized to 0. Finally, 44 lacks a designator, but immediately follows [7], so it becomes the value for the element at ci5[8]. In the end, ci5 is initialized to a length of 9 elements.

Also note that designated array initializers were added to C in C99. Some C compiler versions either don’t support the syntax or require a special command line flag to enable it. As such, it’s probably not something you’ll encounter very much in the wild, but still useful to know about when you do.

Translating all of these to D opens the door to more gotchas. Thankfully, the first one is a compiler error and won’t cause any heisenbugs down the road:

int[3] wrong = {0, 1, 2};
int[3] right = [0, 1, 2];

Array initializers in D are array literals. The same syntax can be used to pass anonymous arrays to functions, as in writeln([0, 1, 2]). For the curious, the declaration of wrong produces the following compiler error:

Error: a struct is not a valid initializer for a int[3]

The {} syntax is used for struct initialization in D (not to be confused with struct literals, which can also be used to initialize a struct instance).

The next surprise comes in the translation of ci1.

// int ci1[3] = {1};
int[3] di1 = [1];

This actually produces a compiler error:

Error: mismatched array lengths, 3 and 1

What gives? First, take a look at the translation of ci2:

// int ci2[] = {0, 1, 2};
int[] di2 = [0, 1, 2];

In the C code, there is no difference between ci1 and ci2. They both are fixed-length, three-element arrays allocated on the stack. In D, this is one case where that general rule of thumb about pasting C code into D source modules breaks down.

Unlike C, D actually makes a distinction between arrays of types int[3] and int[]. The former is, like C, a fixed-length array, commonly referred to in D as a static array. The latter, unlike C, is a dynamic-length array, commonly referred to as a dynamic array or a slice. Its length can grow and shrink as needed.

Initializers for static arrays must have the same length as the array. D simply does not allow initializers shorter than the declared array length. Dynamic arrays take the length of their initializers. di2 is initialized with three elements, but more can be appended. Moreover, the initializer is not required for a dynamic array. In C, int foo[]; is illegal, as the length can only be omitted from the declaration when an initializer is present.

// gcc says "error: array size missing in 'illegalC'"
// int illegalC[]
int[] legalD;
legalD ~= 10;

legalD is an empty array, with no memory allocated for its elements. Elements can be added via the append operator, ~=.

Memory for dynamic arrays is allocated at the point of declaration only when an explicit initializer is provided, as with di2. If no initializer is present, memory is allocated when the first element is appended. By default, dynamic array memory is allocated from the GC heap (though the compiler may determine that it’s safe to allocate on the stack as an optimization) and space for more elements than needed is initialized in order to reduce the need for future allocations (the reserve function can be used to allocate a large block in one go, without initializing any elements). Appended elements go into the preallocated slots until none remain, then the next append triggers a new allocation. Steven Schveighoffer’s excellent array article goes into the details, and also describes array features we’ll touch on in the next part.
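
Here is a small illustration of that allocation behavior (a sketch; exact capacity values are implementation details and will vary):

import std.stdio : writeln;

void main()
{
    int[] a;
    writeln(a.capacity); // 0: nothing allocated yet

    a.reserve(100);      // request space for at least 100 elements
    writeln(a.capacity); // >= 100; a.length is still 0

    a ~= 1;              // appends fill the reserved block
    writeln(a.length);   // 1
}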

Often, when translating a declaration like ci2 to D, the difference between the fixed-length, stack-allocated C array and the dynamic-length, GC-allocated D array isn’t going to matter one iota. One case where it does matter is when the D array is declared inside a function marked @nogc:

@nogc void main()
{
    int[] di2 = [0, 1, 2];
}

Try it online

The compiler ain’t letting you get away with that:

Error: array literal in @nogc function D main may cause a GC allocation

The same error isn’t triggered when the array is static, since it’s allocated on the stack and the literal elements are just shoved right in there. New C programmers coming to D for the first time tend to reach for @nogc almost as if it goes against their very nature not to, so this is something they will bump into until they eventually come to the realization that the GC is not the enemy of the people.

To wrap this up, that big paragraph on designated array initializers in C is about to pull double duty. D also supports designated array initializers, just with a different syntax.

// [0, 1, 0, 5, 0, 0, 0, 8, 44]
// int ci5[] = {0, 1, [3] = 5, [7] = 8, 44};
int[] di5 = [0, 1, 3:5, 7:8, 44];
int[9] di6 = [0, 1, 3:5, 7:8, 44];

Try it online

It works with both static and dynamic arrays, following the same rules and producing the same initialization values as in C.

The main takeaways from this section are:

  • there is a distinction in D between static and dynamic arrays, in C there is not
  • static arrays are allocated on the stack
  • dynamic arrays are allocated on the GC heap
  • uninitialized static arrays are default initialized to the init property of the array elements
  • dynamic arrays can be explicitly initialized and take the length of the initializer
  • dynamic arrays cannot be explicitly initialized in @nogc scopes
  • uninitialized dynamic arrays are empty

This is the time on the D Blog when we dance

There are a lot more words in the preceding sections than I had originally intended to write about array declarations and initialization, and I still have quite a bit more to say about arrays. In the next post, we’ll look at the anatomy of a D array and dig into the art of passing D arrays across the language divide.