Category Archives: Code

D For Data Science: Calling R from D

D is a good language for data science. The advantages include a pleasant syntax, interoperability with C (in many cases as simple as adding an #include directive to import a C header file via the dpp tool), C-like speed, a large standard library, static typing, built-in unit tests and documentation generation, and a garbage collector that’s there when you want it but can be avoided when you don’t.

Library selection for data science is a different story. Although there are some libraries available, such as those provided by the mir project, the available functionality is extremely limited compared with languages like R and Python. The good news is that it’s possible to call functions in either language from D.

This article shows how to embed an R interpreter inside a D program, pass data between the two languages, execute arbitrary R code from within a D program, and call the R interface to C, C++, and Fortran libraries from D. Although I only provide examples for Linux, the same steps apply for Windows if you’re using WSL, and with minor modifications to the DUB package file, everything should work on macOS. Although it is possible to do so, I don’t talk about calling D functions from R, and I don’t include any discussion of interoperability with Python. (This is normally done using pyd.)

Dependencies

The following three dependencies should be installed:

  • R
  • R package RInsideC
  • R package embedr

It’s assumed that anyone reading this post already has R installed or can install it if they don’t. The RInsideC package is a slightly modified version of the excellent RInside project of Dirk Eddelbuettel and Romain Francois. RInside provides a C++ interface to R. The modifications provide a C interface so that R can be called from any language capable of calling C functions. Install the package using devtools:

library(devtools)
install_bitbucket("bachmeil/rinsidec")

The embedr package provides the necessary functions to work with R from within D. That package is also installed with devtools:

install_bitbucket("bachmeil/embedr")

A First Program

The easiest way to do the compilation is to use D’s package manager, called DUB. From within your project directory, open R and create a project skeleton:

library(embedr)
dubNew()

This will create a /src subdirectory to hold your project’s source code if it doesn’t already exist, add a file called r.d to /src and create a dub.sdl file in the project directory. Create a file in the /src directory called hello.d, containing the following program:

import embedr.r;

void main() {
  evalRQ(`print("Hello, World!")`);
}

From the terminal, in the project directory (the one holding dub.sdl, not the /src subdirectory), enter

dub run

This will print out “Hello, World!”. The important thing to realize is that even though you just used DUB to compile and run a D program, it was R that printed “Hello, World!” to the screen.

Executing R Code From D

There are two ways to execute R code from a D program. evalR executes a string in R and returns the output to D, while evalRQ does the same thing but suppresses the output. evalRQ also accepts an array of strings that are executed sequentially.

Create a new project directory and run dubNew inside it, as you did for the first example. In the src/ subdirectory, add a file named reval.d:

import embedr.r;
import std.stdio;

void main() {
  // Example 1
  evalRQ(`print(3+2)`); // evaluates to 5 in R, R prints the output [1] 5 to the screen

  // Example 2
  writeln(evalR(`3+2`).scalar); // evaluates to 5 in R, output is 5

  // Example 3
  evalRQ(`3+2`); // evaluates to 5 in R, but there is no output

  // Example 4
  evalRQ([`x <- 3`, `y <- 2`, `z <- x+y`, `print(z)`]); // evaluates this code in R
}

Example 1 tells R to print the sum of 3 and 2. Because we use evalRQ, no output is returned to D, but R is able to print to the screen. Example 2 evaluates 3+2 in R and returns the output to D in the form of an Robj. evalR(`3+2`).scalar executes 3+2 inside R, captures the output in an Robj, and converts the Robj into a double holding the value 5. This value is passed to the writeln function and printed to the screen. Example 3 doesn’t output anything, because evalRQ returns nothing to D and R isn’t told to print anything to the screen. Example 4 executes the four strings in the array sequentially, returning nothing to D, but the last string tells R to print the value of z to the screen.

There’s not much more to say about executing R code from D. You can execute any valid R code from D, and if there’s an error, it will be caught and printed to the screen. Graphical output is automatically captured in a PDF file. To work interactively with R, or if it’s sufficient to save the results to a text file and read them into D, this is all you need to know. The more interesting cases involve passing data between D and R, and for the times when there is no alternative, using the R interface to call directly into C, C++, or Fortran libraries.
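As one last example of executing R code from D, here is a minimal sketch that produces graphical output (rnorm and plot are ordinary R functions; as noted above, the plot is captured in a PDF file):

import embedr.r;

void main() {
  // Each string is executed in R in order; the plot lands in a PDF
  evalRQ([`rv <- rnorm(100)`, `plot(rv, type = "l")`]);
}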

Passing Data Between D and R

A little background is needed to understand how to pass data between D and R. Everything in R is represented as a C struct named SEXPREC, and a pointer to a SEXPREC struct is called a SEXP in the R source code. Those names reflect R’s origin as a Scheme dialect, where code takes the form of s-expressions. In order to avoid misunderstanding, embedr uses the name Robj instead of SEXP.

It’s necessary to let R allocate the memory for any data passed to R. For instance, you cannot tell D to allocate a double[] array and then pass a pointer to that array to R. You would instead do something like this:

auto v = RVector(100);
foreach(ii; 0..100) {
  v[ii] = 1.5*ii;
}
v.toR("vv");
evalRQ(`print(vv)`);

The first line tells R to allocate a vector with room for 100 elements. v is a D struct holding a pointer to the memory allocated by R plus additional information that allows you to read and change the elements of the vector. Behind the scenes, the RVector struct protects the vector from R’s garbage collector. R is a garbage collected language, and if the only reference to the data is in your D program, there’s nothing to prevent the R garbage collector from freeing that memory. The RVector struct uses the reference counting mechanism described in Adam Ruppe’s D Cookbook to protect objects from R’s garbage collector and unprotect them when they’re no longer in use.

After filling in all 100 elements of v, the toR function creates a new variable in R called vv, and associates it with the vector held inside v. The final line tells R to print out the variable vv.

In practice, no data is ever passed between D and R. The only thing that’s passed around is a single pointer to the memory allocated by R. That means it’s practical to call R functions from D even for very large datasets.
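To make the round trip concrete, here is a minimal sketch that allocates a vector through R, fills it from D, and pulls a scalar result back; the only things crossing the boundary are a pointer and, for the mean, a single double:

auto v = RVector(10);
foreach(ii; 0..10) {
  v[ii] = ii + 1.0;
}
v.toR("v");

// R computes the mean; .scalar converts the returned Robj to a double
double m = evalR(`mean(v)`).scalar;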

Calling the R API

The R API provides a convenient (by C standards) interface to some of R’s functions and constants, including the numerical optimization routines underlying optim, distribution functions, and random number generators. This example shows how to solve an unconstrained nonlinear optimization problem using the Nelder-Mead algorithm, which is the default when calling optim in R.

The objective function is

f = x^2 + y^2

We want to choose x and y to minimize f. The obvious solution is x=0 and y=0.

Create a new project directory and initialize DUB from within R, with the one additional step to add the wrapper for R’s optimization libraries:

library(embedr)
dubNew()
dubOptim()

dubOptim() adds the file optim.d to the src/ directory. Create a file called nelder.d inside the src directory with the following program:

import embedr.r, embedr.optim;
import std.stdio;

extern(C) {
  double f(int n, double * par, void * ex) {
    return par[0]*par[0] + par[1]*par[1];
  }
}

void main() {
  auto nm = NelderMead(&f);
  OptimSolution sol = nm.solve([3.5, -5.5]);
  sol.print;
}

First we define the objective function, f, using the C calling convention so it can be passed to various C functions. We then create a new struct called NelderMead, passing a pointer to f to its constructor. Finally, we call the solve method, using [3.5, -5.5] as the array of starting values, and print out the solution. You’ll want to confirm that the failure code in the output is false (implying the convergence criterion was met). The most common reason that Nelder-Mead will fail to converge is because it took too many iterations. To change the maximum number of iterations to 10,000, you’d add nm.maxit = 10_000; to your program before the call to nm.solve.
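Putting that together, the adjusted main function looks like this:

void main() {
  auto nm = NelderMead(&f);
  nm.maxit = 10_000; // raise the iteration cap before solving
  OptimSolution sol = nm.solve([3.5, -5.5]);
  sol.print;
}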

There’s no overhead associated with calling an interpreted language in this example. We’re calling a C shared library directly, and at no point does the R interpreter get involved. As in the previous example, since there’s no copying of data, this approach is efficient even for large datasets. Finally, if you’re not comfortable with garbage collection, note that the inner loops of the optimization are done entirely in C. We nonetheless take advantage of the convenience and safety of D’s garbage collector when allocating the nm and sol structs, as the performance advantages of manual memory management (to the extent that there are any) are irrelevant here.

Calling R Interfaces from D

The purpose of many R packages is to provide a convenient interface to a C, C++, or Fortran library. The term “R interface” normally means one of two things. For modern C or C++ code, it’s a function taking Robj structs as arguments and returning one Robj struct as the output. For Fortran code and older C or C++ code, it’s a void function taking pointers as arguments. In either case, you can call the R interface directly from D code, meaning any library with an R interface also has a D interface.

An example of an R interface to Fortran code is found in the popular glmnet package.
Lasso estimation using the elnet function is done by passing 28 pointers to the function elnet in libglmnet.so with this interface:

.Fortran("elnet", ka, parm=alpha, nobs, nvars, as.double(x), y,
                  weights, jd, vp, cl, ne, nx, nlam, flmin, ulam, thresh,
                  isd, intr, maxit, lmu=integer(1), a0=double(nlam),
                  ca=double(nx*nlam), ia=integer(nx), nin=integer(nlam),
                  rsq=double(nlam), alm=double(nlam), nlp=integer(1),
                  jerr=integer(1), PACKAGE="glmnet")

You might want to work with the R interface directly if you’re calling elnet inside a loop in your D program. Most of the time, though, it’s better to pass the data to R and then call the R function that calls elnet, because calling Fortran functions directly is error-prone and can lead to hard-to-debug segmentation faults.
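As a sketch of that second approach, assuming the matrix x and the response vector y have already been created on the R side (e.g., with toR), the D program only has to drive R’s own wrapper:

// Let R's glmnet function handle the 28-argument call into elnet
evalRQ([`library(glmnet)`, `fit <- glmnet(x, y)`, `print(coef(fit))`]);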

Conclusion

D was designed from the beginning to be compatible with the C ABI. The intention was to facilitate the integration of new D code into existing C code bases. The practical result has been that, due to C’s lingua franca status, D can be used in combination with myriad languages. Data scientists looking for alternatives to C and C++ when working with R may find benefit in giving D a close look.

Lance Bachmeier is an associate professor of economics at Kansas State University and co-editor of the journal Energy Economics. He does research on macroeconomics and energy economics. He has been using the D programming language in his research since 2013.

Saving Money by Switching from PHP to D

2night was born in 2000 as an online magazine focused on nightlife and restaurants in Italy. Over the years, we have evolved into a full-blown experiential marketing agency, keeping up our vocation of spreading what’s cool to do when you go out, but specialized in producing brand events and below-the-line unconventional marketing campaigns.

We started using D at 2night in 2012 when we developed a webservice used by our Android and iOS apps. It has worked fine since then, but it was just a small experiment. In 2019, after many other experiments, we decided to take the big step: we switched the complete website from PHP to D. The time was right; we had been planning to give our website a new look and we took this opportunity to rewrite the entire infrastructure.

Development

The job turned out to be easier than we had imagined. We implemented a small D backend over our Mongo database in a few hundred lines. We created a Simple Common Gateway Interface (SCGI) library to interface with the NGINX server and another library to work with the DOM. Using the HTML DOM instead of an obscure HTML template language helped us speed up development a lot. In this way, someone who works on HTML or JavaScript is not required to know D or any template language and can deploy plain HTML and CSS files. On the other hand, someone who works on the backend does not care so much about HTML tags, since they can simply access elements by ID, class, etc.; if some HTML tags are moved around the page, the whole thing still works. HTML+CSS+JavaScript on the frontend and D on the backend are totally independent.

Writing code in this way is quite simple. Let’s say we want to build a blog page. We start from a simple HTML file like this:

<!DOCTYPE html>
<html lang="en">
  <head><title>Test page</title></head>
  <body>

    <!-- Main post -->
    <h1>Post title</h1>
    <h2>The optional subheading</h2>
    <p>
      Lorem ipsum dolor sit amet, consectetur adipiscing elit.
      Proin a velit tempus, eleifend ex non, aliquam ipsum.
      Nullam molestie enim leo, viverra finibus diam faucibus a.
      Ut dapibus a orci in eleifend.
    </p>
    <a href="#">More...</a>

    <!-- Two more posts -->
    <div id="others">
      <h3>Other posts</h3>

      <div>
        <h4>Post#2</h4>
        <p>
          Morbi tempus pretium metus, et aliquet dolor.
          Duis venenatis convallis nisi, auctor elementum augue rutrum in.
          Quisque euismod vestibulum velit id pharetra.
          Morbi hendrerit faucibus sem, ac tristique libero...
        </p>
      </div>

      <div>
        <h4>Post #3</h4>
        <p>Sed sit amet vehicula nisl. Nulla in mi est.
          Vivamus mollis purus eu magna ullamcorper, eget posuere metus sodales.
          Vestibulum ipsum ligula, vehicula sit amet libero at, elementum vestibulum mi.
        </p>
      </div>
    </div>

  </body>
</html>

This is a valid HTML5 file that can be edited by anyone who knows HTML. Now we have to fill this template with real data from a database, which we can represent as an array in this example for the sake of simplicity:

// A blog post
struct SimplePost
{
  string heading;
  string subheading;
  string text;
  string uri;
}

SimplePost[] posts = [
  SimplePost("D is awesome!", "This is a real subheading", "Original content was replaced", "http://dlang.org"),
  SimplePost("Example post #1", "Example subheading #1", "Random text #1"),
  SimplePost("Example post #2", "Example subheading #2", "Random text #2"),
  SimplePost("Example post #3", "Example subheading #3", "This will never be shown")
];

First, we must read our HTML template just as it is and parse it using our html5 library:

  auto page = readText("html/test.html");

  // Parse the source
  auto dom = parser.parse(page);

Then we replace the content of the main article with data from the first element of our array. We use the tag name in order to select the correct HTML element:

  // Take the first element from our data source
  auto mainPost = posts.front;

  // Update rendered data of main post
  dom.byTagName("h1").front.innerText = mainPost.heading;
  dom.byTagName("p").front.innerText = mainPost.text;
  dom.byTagName("a").front["href"] = mainPost.uri;

We want to check if our article has a subtitle. If it doesn’t, we’re going to remove the related tag.

  // If we have a subtitle we show it. If not, we remove the node from our page
  if (mainPost.subheading.empty) dom.byTagName("h2").front.detach();
  else dom.byTagName("h2").front.innerText = mainPost.subheading;

If you wanted to get the same result with a template language, you’d probably need to mess up the HTML with something like this:

<!-- We don't like this! -->
<? if(!post.subheading.isEmpty) ?>
<h2><?= post.subheading ?></h2>
<? endif ?>

This mixes logic inside the view and it disrupts the whole HTML file. Anyone who works on the HTML frontend is supposed to know what post is, the logic behind this object, and the template language itself. Last but not least, many HTML editors would probably be driven crazy by any custom syntax. And this is still a simple case!

Going back to our example, to fill the last part of our page we must get the container from the DOM. All we need is to perform a search by ID on the DOM:

auto container = dom.byId("others").front;

Now we use the first element inside the container as a template. So we clone it and we empty the container itself:

  // Use the first children as template
  auto containerItems = container.byCssSelector(`div[id="others"] > div`);
  auto otherPostTemplate = containerItems.front.clone();

  // Remove all existing children from container
  containerItems.each!(item => item.detach);

Finally we add a new child to the container for each post in our data source:

  // Take 2 more posts from list. We drop the first, it's the main one.
  foreach(post; posts.drop(1).take(2))
  {
    // Clone our html template
    auto newOtherPost = otherPostTemplate.clone();

    // Update it with our data
    newOtherPost.byTagName("h4").front.innerText = post.heading;
    newOtherPost.byTagName("p").front.innerText = post.text;

    // Add it to html container
    container.appendChild(newOtherPost);
  }

Putting it all together:

import std;
import arrogant;

// Init
auto parser = Arrogant();

// A blog post
struct SimplePost
{
  string heading;
  string subheading;
  string text;
  string uri;
}

/*
  Of course real data should come from a db query.
  We're using an array for simplicity
*/
SimplePost[] posts = [
  SimplePost("D is awesome!", "This is a real subheading", "Original content was replaced", "http://dlang.org"),
  SimplePost("Example post #1", "Example subheading #1", "Random text #1"),
  SimplePost("Example post #2", "Example subheading #2", "Random text #2"),
  SimplePost("Example post #3", "Example subheading #3", "This will never be shown")
];

void main()
{
  // Our template from disk
  auto page = readText("html/test.html");

  // Parse the source
  auto dom = parser.parse(page);

  // Take the first element from our data source
  auto mainPost = posts.front;

  // Update rendered data of main post
  dom.byTagName("h1").front.innerText = mainPost.heading;
  dom.byTagName("p").front.innerText = mainPost.text;
  dom.byTagName("a").front["href"] = mainPost.uri;

  // If we have a subtitle we show it. If not, we remove the node from our page
  if (mainPost.subheading.empty) dom.byTagName("h2").front.detach();
  else dom.byTagName("h2").front.innerText = mainPost.subheading;

  // -----
  // Other articles
  // -----

  // Get the container
  auto container = dom.byId("others").front;

  // Use the first children as template
  auto containerItems = container.byCssSelector(`div[id="others"] > div`);
  auto otherPostTemplate = containerItems.front.clone();

  containerItems.each!(item => item.detach);

  // Take 2 more posts from list. We drop the first, it's the main one.
  foreach(post; posts.drop(1).take(2))
  {
    // Clone our html template
    auto newOtherPost = otherPostTemplate.clone();

    // Update it with our data
    newOtherPost.byTagName("h4").front.innerText = post.heading;
    newOtherPost.byTagName("p").front.innerText = post.text;

    // Add it to html container
    container.appendChild(newOtherPost);
  }

  writeln(dom.document);

}

This program will output a new valid HTML5 page like this:

<!DOCTYPE html>
<html lang="en">
  <head><title>Test page</title></head>
  <body>
    <h1>D is awesome!</h1>
    <h2>This is a real subheading</h2>
    <p>Original content was replaced</p>
    <a href="http://dlang.org">More...</a>
    <div id="others">
      <h3>Other posts</h3>
      <div>
        <h4>Example post #1</h4>
        <p>Random text #1</p>
      </div>
      <div>
        <h4>Example post #2</h4>
        <p>Random text #2</p>
      </div>
    </div>
  </body>
</html>

Of course, the same results could be achieved in many other ways and in other languages too. Our library is just a wrapper over a plain C library named Modest. But what really makes the difference is how easy it is to write and read code thanks to D’s powerful and easy-to-understand syntax. The code shown above can be easily understood by anyone who has some programming experience. I’ve received pull requests for our project from colleagues who had never heard of D at all.

That’s only one part of the big picture since we’re using many different libraries for different purposes.

Performance

Obviously, performance was a big win. The website felt like it was running on local machines, bringing a dramatic increase in speed and lower latency across the board. After the switch, the load on our cloud servers was at first so low that we thought the website was down! Switching from PHP to D meant we could cut in half the instance size of each Amazon AWS machine in our cloud, and these machines are still underloaded. Our database cloud was greatly affected too; we now use one quarter of its original computational power. All of this brought instantaneous and dramatic cost savings, cutting our bill by more than half.

One more thing…

A few days after launch we realized that some of our costs were rising anyway. We were relying on a third-party service to host and crop the pictures we display on the website. This is not a simple task; in order to crop a picture correctly, you need to know where the subjects of the picture are located and you must try to keep them inside the trimmed frame. On the legacy website we mostly used a fixed proportion for images and turned to a third-party service for some special cases. The new version of 2night.it has several different possible cuts for each “master” picture, and this raised the costs by 15x! Luckily, we found that a D binding to the OpenCV API is available. We used it to develop a smart algorithm that can crop any photo while preserving the subject of the picture. And again, the performance of our service is so impressive that we do not need a new machine to host it. Within a week or so, the costs for pictures dropped from some thousands of euros per month to almost zero.

Ownership and Borrowing in D

Nearly all non-trivial programs allocate and manage memory. Getting it right is becoming increasingly important, as programs get ever more complex and mistakes get ever more costly. The usual problems are:

  1. memory leaks (failure to free memory when no longer in use)
  2. double frees (freeing memory more than once)
  3. use-after-free (continuing to refer to memory already freed)

The challenge is in keeping track of which pointers are responsible for freeing the memory (i.e. owning the memory), which pointers are merely referring to the memory, where they are, and which are active (in scope).

The common solutions are:

  1. Garbage Collection – The GC owns the memory and periodically scans memory looking for any pointers to that memory. If none are found, the memory is released. This scheme is reliable and in common use in languages like Go and Java. It tends to use much more memory than strictly necessary, have pauses, and slow down code because of inserted write gates.
  2. Reference Counting – The RC object owns the memory and keeps a count of how many pointers point to it. When that count goes to zero, the memory is released. This is also reliable and is commonly used in languages like C++ and Objective-C. RC is memory efficient, needing only a slot for the count. The downside of RC is the expense of maintaining the count, building an exception handler to ensure the decrement is done, and the locking needed for objects shared between threads. To regain efficiency, sometimes the programmer will cheat and temporarily refer to the RC object without dealing with the count, engendering a risk that this is not done correctly.
  3. Manual – Manual memory management is exemplified by C’s malloc and free. It is fast and memory efficient, but there’s no language help at all in using them correctly. It’s entirely up to the programmer’s skill and diligence in using it. I’ve been using malloc and free for 35 years, and through bitter and endless experience rarely make a mistake with them anymore. But that’s not the sort of thing a programming shop can rely on, and note I said “rarely” and not “never”.

Solutions 2 and 3 more or less rely on faith in the programmer to do it right. Faith-based systems do not scale well, and memory management issues have proven to be very difficult to audit (so difficult that some coding standards prohibit use of memory allocation).

But there is a fourth way – Ownership and Borrowing. It’s memory efficient, as performant as manual management, and mechanically auditable. It has been recently popularized by the Rust programming language. It has its downsides, too, in the form of a reputation for having to rethink how one composes algorithms and data structures.

The downsides are manageable, and the rest of this article is an outline of how the ownership/borrowing (OB) system works, and how we propose to fit it into D. I had originally thought this would be impossible, but after spending a lot of time thinking about it I’ve found a way to fit it in, much like we’ve fit functional programming into D (with transitive immutability and function purity).

Ownership

The solution to who owns the memory object is ridiculously simple—there is only one pointer to it, so that pointer must be the owner. It is responsible for releasing the memory, after which it will cease to be valid. It follows that any pointers in the memory object are the owners of what they point to, there are no other pointers into the data structure, and the data structure therefore forms a tree.

It also follows that pointers are not copied, they are moved:

T* f();
void g(T*);
T* p = f();
T* q = p; // value of p is moved to q, not copied
g(p);     // error, p has invalid value

Moving a pointer out of a data structure is not allowed:

struct S { T* p; }
S* f();
S* s = f();
T* q = s.p; // error, can't have two pointers to s.p

Why not just mark s.p as being invalid? The trouble there is one would need to do so with a runtime mark, and this is supposed to be a compile-time solution, so attempting it is simply flagged as an error.

Having an owning pointer fall out of scope is also an error:

void h() {
  T* p = f();
} // error, forgot to release p?

It’s necessary to move the pointer somewhere else:

void g(T*);
void h() {
  T* p = f();
  g(p);  // move to g(), it's now g()'s problem
}

This neatly solves memory leaks and use-after-free problems. (Hint: to make it clearer, replace f() with malloc(), and g() with free().)
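Taking the hint literally, the same flow with the C allocator looks like this (a minimal sketch):

import core.stdc.stdlib : malloc, free;

void h() {
  void* p = malloc(16); // p becomes the owner of the allocation
  free(p);              // the pointer is moved into free(), which releases the memory
}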

This can all be tracked at compile time through a function by using Data Flow Analysis (DFA) techniques, like those used to compute Common Subexpressions. DFA can unravel whatever rat’s nest of gotos happen to be there.

Borrowing

The ownership system described above is sound, but it is a little too restrictive. Consider:

struct S { void car(); void bar(); }
S* f();
S* s = f();
s.car();  // s is moved to car()
s.bar();  // error, s is now invalid

To make it work, s.car() would have to have some way of moving the pointer value back into s when s.car() returns.

In a way, this is how borrowing works. s.car() borrows a copy of s for the duration of the execution of s.car(). s is invalid during that execution and becomes valid again when s.car() returns.

In D, struct member functions take the this by reference, so we can accommodate borrowing through an enhancement: taking an argument by ref borrows it.

D also supports scope pointers, which are also a natural fit for borrowing:

void g(scope T*);
T* f();
T* p = f();
g(p);      // g() borrows p
g(p);      // we can use p again after g() returns

(When functions take arguments by ref, or pointers by scope, they are not allowed to escape the ref or the pointer. This fits right in with borrow semantics.)

Borrowing in this way fulfills the promise that only one pointer to the memory object exists at any one time, so it works.

Borrowing can be enhanced further with a little insight that the ownership system is still safe if there are multiple const pointers to it, as long as there are no mutable pointers. (Const pointers can neither release their memory nor mutate it.) That means multiple const pointers can be borrowed from the owning mutable pointer, as long as the owning mutable pointer cannot be used while the const pointers are active.

For example:

T* f();
void g(T*);
T* p = f();  // p becomes owner
{
  scope const T* q = p; // borrow const pointer
  scope const T* r = p; // borrow another one
  g(p); // error, p is invalid while q and r are in scope
}
g(p); // ok

Principles

The above can be distilled into the notion that a memory object behaves as if it is in one of two states:

  1. there exists exactly one mutable pointer to it
  2. there exist one or more const pointers to it

The careful reader will notice something peculiar in what I wrote: “as if”. What do I mean by that weasel wording? Is there some skullduggery going on? Why yes, there is. Computer languages are full of “as if” dirty deeds under the hood, like the money you deposit in your bank account isn’t actually there (I apologize if this is a rude shock to anyone), and this isn’t any different. Read on!

But first, a bit more necessary exposition.

Folding Ownership/Borrowing into D

Isn’t this scheme incompatible with the way people normally write D code, and won’t it break pretty much every D program in existence? And not break them in an easily fixed way, but break them so badly they’ll have to redesign their algorithms from the ground up?

Yup, it sure is. Except that D has a (not so) secret weapon: function attributes. It turns out that the semantics for the Ownership/Borrowing (aka OB) system can be run on a per-function basis after the usual semantic pass has been run. The careful reader may have noticed that no new syntax is added, just restrictions on existing code. D has a history of using function attributes to alter the semantics of a function—for example, adding the pure attribute causes a function to behave as if it were pure. To enable OB semantics for a function, an attribute @live is added.

This means that OB can be added to D code incrementally, as needed, and as time and resources permit. It becomes possible to add OB while, and this is critical, keeping your project in a fully functioning, tested, and releasable state. It’s mechanically auditable how much of the project is memory safe in this manner. It adds to the list of D’s many other memory-safe guarantees (such as no pointers to the stack escaping).
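To make the idea concrete, here is a sketch of how the proposed attribute is intended to behave, reusing f() and g() from the earlier examples (illustrative semantics, not a finished feature):

@live void bad() {
  T* p = f();
}            // error under @live: owning pointer p falls out of scope

@live void good() {
  T* p = f();
  g(p);      // ownership moves into g(); p may not be used afterward
}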

As If

Some necessary things cannot be done with strict OB, such as reference counted memory objects. After all, the whole point of an RC object is to have multiple pointers to it. Since RC objects are memory safe (if built correctly), they can work with OB without negatively impinging on memory safety. They just cannot be built with OB. The solution is that D has other attributes for functions, like @system. @system is where much of the safety checking is turned off. Of course, OB will also be turned off in @system code. It’s there that the RC object’s implementation hides from the OB checker.

But in OB code, the RC object looks to the OB checker like it is obeying the rules, so no problemo!

A number of such library types will be needed to successfully use OB.

Conclusion

This article is a basic overview of OB. I am working on a much more comprehensive specification. It’s always possible I’ve missed something and that there’s a hole below the waterline, but so far it’s looking good. It’s a very exciting development for D and I’m looking forward to getting it implemented.

For further discussion and comments from Walter, see the discussion threads on the /r/programming subreddit and at Hacker News.

Fuzzing Your D Application with LDC and AFL

Fuzzing, or fuzz testing, is a powerful method to find hidden bugs in your application. The basic idea is to present random input to your application and monitor how it behaves. If it crashes or shows some other unusual behavior then you have found a bug.

The use of true random input is not very effective, as most applications reject such input. Therefore many fuzz testing tools mutate valid input, e.g. flipping one or two bits, and present this mutated input to the application. This approach is easy to automate. A fuzz test can run for hours or days until an input is found which crashes your application.

Fuzz testing is very popular. A lot of security bugs have been found with this method. So it’s better to fuzz test your application by yourself instead of waiting for your users to report serious bugs!

Johan Engelen showed at DConf 2018 and in more detail in a blog post how you can use LLVM libFuzzer to fuzz test your application. For libFuzzer, you need to write a test driver. This is powerful because you can make decisions about the function to test. The downside is that you have to code the test driver.

AFL (short for American Fuzzy Lop, a rabbit breed) is another tool for fuzz testing an application. AFL takes a different approach than libFuzzer and does not require coding. The application under test has to read its data from stdin or from a file. The binary must be instrumented, which requires a recompile of the application. In case you have no source code for the application, you can use AFL together with QEMU; no instrumentation is required then, but the tests run much slower.

Because random input is not a good choice, you give AFL one or more valid input files, preferably of a small size. AFL mutates an input file, e.g. by flipping a single bit. This new input file is presented to your application and the application’s reaction to it is observed. With the instrumentation in place, AFL discovers the path the data takes through your application. The relationship between bit flips and the different code paths that run because of the bit flips is recorded and used to discover new paths and to trigger unexpected behavior. Input which causes crashes is saved in a directory. The main UI gives a lot of information, including how many unique crashes occurred in the test session.

AFL works best if the input is a small binary, e.g. a PNG or a ZIP file. If your application has a more verbose and structured input (e.g. a programming language) then you can provide a dictionary which helps AFL with the basic syntax.

The latest release of AFL has an interesting feature. For instrumenting code compiled with clang, a small LLVM plugin is used. This plugin can also be used with LDC, making it possible to fuzz test your D application!

I used AFL to fuzz test LLtool, my recursive-descent parser generator presented at DConf 2019. LLtool expects a grammar description as a file or on stdin. If no error is found, then a D fragment of a recursive-descent parser is produced. Here, I show my approach.

First of all, you need to install AFL. It is included in most Linux distributions, e.g. Ubuntu. A FreeBSD port is also available. One caveat here: please make sure that the AFL plugin is compiled with the same LLVM version as LDC. Otherwise you will see an error message like

ld-elf.so.1: /usr/local/lib/afl/afl-llvm-pass.so: Undefined symbol "...."

during compilation. In this case, download AFL from the link above and compile it yourself.

Different distributions install AFL in different locations. You need to find out the path. E.g. Ubuntu uses /usr/lib/afl, FreeBSD uses /usr/local/lib/afl. I use an environment variable to record this value for later use (bash syntax):

export AFL_PATH=/usr/lib/afl

To instrument your code you have to specify the AFL plugin on the LDC command line:

ldc2 -plugin=$AFL_PATH/afl-llvm-pass.so *.d

You will see a short statistic emitted by the new pass:

afl-llvm-pass 2.52b by <lszekeres@google.com>
[+] Instrumented 16118 locations (non-hardened mode, ratio 100%).

For LLVM instrumentation, AFL requires a small runtime library. You need to link the object file $AFL_PATH/afl-llvm-rt.o into your application.

In my dub.sdl file I created a special build type for AFL. This puts all the steps above into a single place. Plus, you can copy and paste this build type directly to your own dub.sdl file because the only dependencies are AFL and LDC!

buildType "afl" {
    toolchainRequirements dmd="no" gdc="no" ldc=">=1.0.0"
    dflags "-plugin=$AFL_PATH/afl-llvm-pass.so"
    sourceFiles "$AFL_PATH/afl-llvm-rt.o"
    versions "AFL"
    buildOptions "debugMode" "debugInfo" "unittests"
}

Now you can type dub build -b=afl on the command line to instrument your application for use with afl. Do not forget to set the AFL_PATH environment variable, otherwise dub will complain.

Now create two new directories called testcases and findings. Put a small, valid input file into the testcases directory. For example, save this

%token number
%%
expr: term "+" term;
term: factor "*" factor;
factor: number;

as file t1.g in the testcases folder. Inputs which crash the application will be saved in the findings directory.

To call AFL, you type on the command line:

afl-fuzz -i testcases -o findings ./LLtool --DRT-trapExceptions=0 @@

Two parts of the command line require further explanation. If the application requires a file for input, you specify the file path as @@. Otherwise AFL assumes that the application reads the input from stdin.

If the application crashes, then AFL saves the input causing the crash in the findings/crashes directory. But the D runtime is very friendly. Exceptions uncaught by the application are caught by the D runtime, a stack trace is printed, and the application terminates. This does not count as a crash for AFL. To produce a crash you have to specify the D runtime option --DRT-trapExceptions=0. For more information, read the relevant edition of This week in D.

It is worth reading the AFL documentation, as it provides a lot of tips and background information. Enjoy watching AFL crash your application and produce test cases for you!


A long-time contributor to the D community, Kai Nacke is the author of ‘D Web Development’ and a maintainer of LDC, the LLVM D Compiler.

Flexible Default Function Parameters via structs with Nullable Fields

The problem

Sometimes we need to combine an aggregate of a set of values with an aggregate of the corresponding set of default values to create a combined result. The result for each member is either the explicitly specified value or, where no value is specified, the default value. This is similar to default function arguments in D. However, D’s default arguments force one to always specify the first N values and omit only a trailing suffix; I want to be able to specify an arbitrary subset of the values. Example:

Explicit values: a: 1, b: 2

Default values: a: 3, b: 4, c: 5

Combined result: a: 1, b: 2, c: 5
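In terms of D’s default function arguments, the limitation looks like this (a minimal sketch):

void g(int a = 3, int b = 4, int c = 5) {}

g(1, 2); // a: 1, b: 2, c: 5 (trailing defaults kick in)
// but there is no way to supply only b, or only a and c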

Possible solutions

The first idea is to use associative arrays. This approach is inefficient, however, because it combines values with string (or enum at best) names at runtime, with associative array lookups and stores. It could be used as in this (untested) example:

combine(["a": 1, "b": 2], ["a": 3, "b": 4, "c": 5])

I was advised to instead use structs with nullable fields (see below) to pass multiple values. This is nearly as efficient as possible because the members of the structs are enumerated at compile time (in fact, I use static foreach in my implementation). So I implemented this solution. The source code is released under the Apache 2.0 license.

To represent the (explicit) values of type T, we use a struct member of type Nullable!T. If it is null, this means that the explicit value is missing and the default value is used instead; otherwise the specified value is used.
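The gist of the combining logic is a compile-time loop over the fields, something like this simplified sketch of my own (the real implementation uses static foreach and additional checks):

import std.typecons : Nullable;

R combine(W, R)(W explicit, R defaults) {
  R result;
  // foreach over .tupleof is unrolled at compile time
  foreach (i, ref member; result.tupleof) {
    member = explicit.tupleof[i].isNull ? defaults.tupleof[i] : explicit.tupleof[i].get;
  }
  return result;
}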

Example of definition and combination

First, install the struct-params package with DUB (see the DUB documentation; I strongly recommend using DUB to build D projects) or clone my GitHub repository.

Then add the following import to your source:

import struct_params;

Example code:

mixin StructParams!("S", int, "x", float, "y");
immutable S.WithDefaults combinedMain = { x: 12 }; // note y is default initialized to null
immutable S.Regular combinedDefault = { x: 11, y: 3.0 };
immutable combined = combine(combinedMain, combinedDefault);
assert(combined.x == 12 && combined.y == 3.0);

StructParams is a string mixin, a D construct which generates D code at compile time and mixes it in at the point of declaration.

mixin StructParams!("S", int, "x", float, "y"); effectively defines the following struct:

struct S {
  struct Regular {
    int x;
    float y;
  }
  struct WithDefaults {
    Nullable!int x;
    Nullable!float y;
  }
}

WithDefaults is the struct used to pass the explicit values; each field is either present (non-null) or missing (null), in which case it is replaced by the default value. For this, the D template type Nullable is used to represent either a value of a type or a null denoting a missing value.

Regular is just a standard struct with fields; it is the type used to pass the default values to the function combine.

The function combine combines the explicit values with the default values (as described above).

Then we assert that the result is correct.

Calling functions

Finally, we will do the main thing for which all the above was intended and call a function with combined values:

float f(int a, float b) {
    return a + b;
}
assert(callFunctionWithParamsStruct!f(combined) == combined.x + combined.y);

The structure combined is “split” into members and the members are passed as parameters to f in the order the fields of the struct are defined, that is in the order their names are specified as arguments to the StructParams string mixin. For those interested in the details: I use the built-in D struct and class property .tupleof to split the structure into a tuple of members (e.g. explicitly calling f with an instance reg of Regular would look like this: f(reg.tupleof);).
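To make that expansion concrete:

S.Regular reg = { x: 12, y: 3.0 };
f(reg.tupleof); // equivalent to f(reg.x, reg.y), i.e. f(12, 3.0f)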

We can also call a member function of a struct or class instance (t in the example below):

struct Test {
    float f(int a, float b) {
        return a + b;
    }
}
Test t;
assert(callMemberFunctionWithParamsStruct!(t, "f")(combined) == combined.x + combined.y);

It is very unnatural to call the member f using a string of its name, but I have not found a better solution.

Another variant would be to use callFunctionWithParamsStruct!((int a, float b) => t.f(a, b))(combined), but this way is inconvenient as it requires specifying arguments explicitly.

Final considerations

Note that we cannot currently use struct initializers with named arguments, like S.Regular(x: 11, y: 3.0), as the current version of D does not have this feature. There is a draft D Improvement Proposal (DIP) to introduce the feature, but I hear its author is going to replace it with a more general-case proposal after DConf 2019 in London.

It would be beneficial to define such structs with the default values built in, but this seems impossible: representing every possible D value as a literal apparently cannot be done (for example, a structure with circular references to other structures seems not to be representable as a literal). We could attempt to implement it for a subset of types, or even for some values of a type and not for others, but this would require further consideration.

Victor Porton is an open source developer, a math researcher, and a Christian writer. He earns his living as a programmer.

DStep 1.0.0

DStep is a tool for automatically generating D bindings for C and Objective-C libraries. This is implemented by processing C or Objective-C header files and outputting D modules. DStep uses the Clang compiler as a library (libclang) to process the header files.

Background

The first version of DStep was released on the 7th of July, 2012. There have been four subsequent releases, the last of which was on the 16th of January, 2016. Quite a lot has happened in the D world and with DStep since then.

After the release of DStep 0.2.1 in January 2016, there wasn’t much progress on DStep. I had a limited amount of time and chose to spend it on other projects. Fortunately, in 2016, DStep got picked as one of four D-related projects for Google Summer of Code (GSoC). The student who chose to work on DStep was Wojciech Szęszoł. He did a tremendous amount of work and pushed DStep forward by years compared to the time it would have taken me. In fact, I was often a blocker because I couldn’t keep up with reviewing all the changes he made.

New Release

The latest release of DStep contains a huge number of new features and bug fixes. A lot of the new features add support for translating C preprocessor macros in various forms. DStep also gained support for one more platform: Windows. Here follow some of the new features available in DStep 1.0.0:

Support for Simple Defines

This feature adds support for translating a simple form of #define to a manifest constant in D. Example:

#define FOO 1

The above C code is translated to the following D code:

enum FOO = 1;

DStep will try to translate the C code so that the D code looks as much as possible like the original C code. If a #define contains an expression instead of a single literal, DStep will try to preserve the original expression:

#define FOO 1 + 3

Instead of translating this to a manifest constant with the value of 4 (which would be semantically correct), DStep will preserve the original expression and translate it to:

enum FOO = 1 + 3;

This also goes for other types of literals, like hexadecimal literals:

#define FOO 0x1

Here DStep will preserve the hexadecimal literal and translate it to:

enum FOO = 0x1;

Function-Like Macros

DStep is now able to translate function-like macros. This is a pretty advanced feature that requires a small parser for the macros. DStep uses libclang to tokenize the macros and then parses them to be able to do the proper translations. The most basic example looks like:

#define FOO() 0 + 1

The above macro will translate to the following D code:

extern (D) int FOO()
{
    return 0 + 1;
}

Although not shown here (to minimize the examples), DStep will output extern (C): at the top of each file. Therefore, for macros translated to functions, DStep will add extern (D) to give the functions D linkage and mangling.

Here’s an example of a C macro containing parameters:

#define FOO(a, b) a + b

Unfortunately, in C, a and b can be basically anything. D doesn’t have an exact corresponding feature. DStep will translate this as accurately as
possible by outputting a templated function:

extern (D) auto FOO(T0, T1)(auto ref T0 a, auto ref T1 b)
{
    return a + b;
}

The assumption in this translation is that a and b will be a value of some kind of type. They can either be of the same type or of different types. To
avoid copying any of the values, ref parameters are used. Since an rvalue cannot be passed to a ref parameter, auto ref is used instead to properly handle both rvalues and lvalues.

More advanced expressions are supported as well:

#define BAR 4
#define FOO(a, b) a + 3 + (b + BAR) - sizeof(b)

In the above example there’s a combination of parameters, literals, parenthesized expression, usages of other macros, and built-in operators. DStep handles all those and translates it to:

enum BAR = 4;

extern (D) auto FOO(T0, T1)(auto ref T0 a, auto ref T1 b)
{
    return a + 3 + (b + BAR) - b.sizeof;
}

Again, the expression is preserved as closely as possible to the original source code. The parentheses, the reference to the BAR macro, all are preserved.

Token Concatenation

This feature adds support for translating the token concatenation, or token pasting, operator to a D string concatenation:

#define CONCAT(prefix, name) prefix ## name

The above function-like macro concatenates the two given tokens. DStep translates that to a function that converts the arguments to strings and concatenates the two resulting strings. This can then be used together with the string mixin statement to give the same behavior as in C.

extern (D) string CONCAT(T0, T1)(auto ref T0 prefix, auto ref T1 name)
{
    import std.conv : to;

    return to!string(prefix) ~ to!string(name);
}
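
As a usage sketch (my own example, not DStep output): the translated function can be evaluated at compile time, so it can feed a string mixin just as the C macro would paste tokens:

mixin("int " ~ CONCAT("my", "Var") ~ " = 1;"); // declares int myVar = 1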

Another example is parameters combined with tokens:

#define CONCAT(prefix) prefix ## name

This translates similarly to the previous example, but since name is not a parameter this will be translated to a string literal:

extern (D) string CONCAT(T)(auto ref T prefix)
{
    import std.conv : to;

    return to!string(prefix) ~ "name";
}

Preprocessor Constants in Array Sizes

DStep will now preserve preprocessor constants for the size of arrays:

#define Foo 3
int a[Foo];

In previous versions of DStep the translation would just output the size of the array a as 3. This would be semantically accurate but the generated source
code would look less like the original C source code. Now DStep is able to translate preprocessor constants and can, therefore, use the preprocessor constant as the size of the array:

enum Foo = 3;
extern __gshared int[Foo] a;

In the above example, the manifest constant Foo is used as the size of a instead of a plain 3. This more closely matches the original C source code.

Preserving Comments

In previous versions of DStep comments were completely stripped out. With this release DStep is able to preserve comments in the D code from the original C code:

// This comment describes this whole file

// Documentation for the symbol `foo`
void foo();

/* Loose comment */ /* Loose comment */

/*
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
*/ /* Loose comment */

int a; // this is `a`

In the above example there are four kinds of comments:

  • A header comment for the whole file
  • A preceding comment for the symbol foo
  • Loose comments not belonging to any symbol
  • A trailing comment for the symbol a

All of these comments are now properly preserved:

// This comment describes this whole file

extern (C):

// Documentation for the symbol `foo`
void foo ();

/* Loose comment */ /* Loose comment */

/*
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
*/ /* Loose comment */

extern __gshared int a; // this is `a`

In the above example, notice how the header comment is placed above the extern (C): line. If a module declaration is output in the D file (when the --package flag is used), the header comment will be placed above that as well:

// This comment describes this whole file

module bar.foo;

extern (C):

// Documentation for the symbol `foo`
void foo ();

/* Loose comment */ /* Loose comment */

/*
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
    Multi-line loose comment.
*/ /* Loose comment */

extern __gshared int a; // this is `a`

It’s also possible to disable the preservation of comments using the --comments=false flag.

Package Prefix

To better help organize bindings, DStep supports the --package <name.of.package> flag. When this flag is enabled, DStep will put all the translated modules in the given package. Note that this only adds the package prefix to the module declaration of the translated file. It will not place the output file in a directory corresponding to the package.
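For example, translating foo.h so that its module declaration reads module bar.foo (as in the comment-preservation example above) would look like this:

$ dstep --package bar foo.h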

Removing Excessive Newlines

DStep will now remove excessive newlines but still preserve spacing for the original C code. This is best illustrated with an example:

int a;


int b;

In the above example there are two newlines between the declarations of a and b. DStep will remove the excessive newline and only output one to still preserve the spacing:

extern __gshared int a;

extern __gshared int b;

But if there is no spacing between the declarations, that is respected as well:

int a;
int b;

In the above example there is no newline between the declarations and DStep will preserve that:

extern __gshared int a;
extern __gshared int b;

Preserving Order of Declarations

In previous versions of DStep the declarations in the translated D code would follow a certain order defined by DStep, aliases first, then constants, then types and last functions. With this release, DStep will now preserve the order of the declarations of the original C code:

void bar();

struct Foo
{
    int a;
};

Previous versions would translate the above to:

struct Foo
{
    int a;
}

void bar ();

with the struct first and then the function declaration. With this release, the order is preserved:

void bar ();

struct Foo
{
    int a;
}

Multiple Input Files

Previous versions of DStep only allowed a single header file as input. With this release, multiple files can be passed to DStep at once. Each input file will produce one D source file as output. To pass multiple input files to DStep, just pass the filenames when invoking DStep.

$ dstep foo.h bar.h

Running the above command will produce two D source files: foo.d and bar.d.

If multiple input files and the -o flag are given, the -o flag specifies the output directory where the D source files will be placed. When multiple input
files are given it’s not possible to specify the names of the D source files.

$ dstep foo.h bar.h -o foobar
$ ls foobar
bar.d foo.d

The above command will place the two D source files in the directory foobar.

--reduce-aliases Flag

Normally when DStep translates a header file to a D module it will reduce aliases if possible. DStep contains a set of common typedefs that can be reduced to native D types. That means that code like this:

#include <stdint.h>
int32_t a = 3;

Will be translated to the following D code:

extern __gshared int a;

In this release of DStep, there’s a new flag, --reduce-aliases. This flag allows the reduce aliases feature to be enabled or disabled. By default it’s enabled, but can be disabled by invoking DStep with the following command: dstep --reduce-aliases=false. When this feature is disabled, it will translate the above example to the following D code:

import core.stdc.stdint;
extern __gshared int32_t a;

It will keep the name int32_t as the type of the variable declaration and add the import for the module that contains the declaration of int32_t.

--alias-enum-members Flag

In C, enum members are accessible directly in the global scope. Example:

enum Foo
{
    foo,
    bar
};

enum Foo a = foo;

In D, however, enum members need to be qualified with the enum name. The correct translation of the above would be:

enum Foo
{
    foo = 0,
    bar = 1
}


Foo a = Foo.foo;

In this release of DStep a new flag has been added, --alias-enum-members, that enables the generation of aliases to enum members at module scope. This will allow keeping the translation more closely to the original C code:

enum Foo
{
    foo = 0,
    bar = 1
}

alias foo = Foo.foo;
alias bar = Foo.bar;

By default this feature is not enabled for the translated D code to more closely follow the D conventions.

--translate-macros Flag

In this release, DStep can now translate several kinds of C macros to their equivalent in D. This might not always be desirable because the translations are not bulletproof. Therefore, there’s a new flag, --translate-macros, which will enable or disable the translation of macros. By default translation of macros is enabled.

libclang Bindings

DStep uses Clang as a library to process the C code. There are two main ways of using Clang as a library. One is to use the C++ APIs directly. This gives full access to what Clang can do, but the problem with this API is that it is not stable. It’s also a C++ API, and when DStep was first implemented, the C++ integration in D was quite lacking. One of my requirements when implementing DStep was to implement it in D. Therefore, the natural choice was to use the C API, called “libclang”, which is also provided. This library is the main interface intended to be used by editors and IDEs that want to leverage Clang as a library. It’s also the recommended interface when accessing Clang from a language other than C++, as in my case with D.

Since libclang is a C library exposing only C header files, I needed bindings to be able to use it from within D. Up until now these bindings were handmade. Fortunately, these headers are the most forgiving I have ever seen when it comes to translating into D. Only a single header file was needed, containing hardly any macros at all. Translating these headers using search-and-replace was quite a quick job.

With this release of DStep, since DStep has improved so much, the bindings are now self-hosted. That is, DStep has been used to generate the bindings. In addition to that, generation of the bindings has now been added to the test suite to make sure it doesn’t break.

Support for Windows

DStep was originally developed on macOS since that is my main development platform. Thanks to the Posix standard it was easy to port to Linux as well. The first release of DStep was available for both macOS and Linux. Back in 2011 when the development of DStep first started, Clang did not support Windows. There seems to have been some support for MinGW, but that was not a target supported by DMD. The LLVM and Clang team has made huge progress since then and in 2016 when DStep got picked as a GSoC project, support for Windows was available, targeting compatibility with the Visual Studio compiler.

In this release, thanks to Wojciech Szęszoł, DStep is now available on Windows. Due to Clang being compatible with Visual Studio, DStep needs to be built to use the same object format. When compiling with DMD, that means compiling for 64-bit (via the -m64 flag) or using the -m32mscoff flag when compiling for 32 bit. The Dub package automatically takes care of this.

Continuous Integration/Deployment

In the area of CI/CD quite a few things have happened. Originally the test suite of DStep was implemented using Cucumber and Ruby. These tests were a form of end-to-end test and failed to take advantage of the reasons to use Cucumber in the first place. These tests have now been replaced with a combination of unit tests and end-to-end tests, all implemented in D.

LDC

LDC has been added as a supported compiler. That means that DStep is compiled with LDC in addition to DMD as part of the CI pipelines. Every commit and every pull request is now tested with LDC as well.

Upgrade of Compilers

Both DMD and LDC have been upgraded to their latest versions. In addition, beta and nightly releases are being tested in the CI pipelines. A scheduled job has been added to the CI pipelines as well, which will run once every day to make sure new releases of the compilers won’t break DStep even if no changes have been made to DStep. This also means that only the latest versions of LDC and DMD are supported for building DStep.

Testing Windows using AppVeyor

Since DStep now supports Windows as an additional platform, a new CI pipeline has been added in the form of AppVeyor, a CI service that provides Windows as a build platform. This pipeline compiles DStep with both DMD and LDC, and it builds both 32-bit and 64-bit versions.

Complex Floating-Point Types

Another feature that is new in this version of DStep is support for complex floating-point types. The three supported complex types are float _Complex, double _Complex, and long double _Complex.

float _Complex a;
double _Complex b;
long double _Complex c;

The above code snippet in C is translated to the following D code:

extern __gshared cfloat a;
extern __gshared cdouble b;
extern __gshared creal c;

New Alias Syntax

Typedefs in C header files are translated to alias declarations in the D code. Up until this release they used the old alias syntax: alias oldName newName. Since the previous release of DStep the D language has improved and gained new features. One of them is a new (now considered the standard) alias syntax: alias newName = oldName. It’s easier to follow which name is the alias and which is the original when it’s using a more familiar syntax similar to variable declarations. Here’s an example of how the C code is translated into D code with the new alias syntax:

typedef int foo;

The above C code is translated to the following D code:

alias foo = int;

Custom Global Attributes

By default DStep doesn’t add any attributes like @nogc or nothrow to the translated code. In this release of DStep, support for attributes has been added. Custom global attributes can be enabled with the --global-attribute flag. For a C header file with the following content:

int a;

And invoking DStep with the following command:

$ dstep foo.h --global-attribute @nogc --global-attribute nothrow

Will output the following D code:

@nogc:
nothrow:

extern __gshared int a;

Rename Enums

Unlike in D, enums in C don’t create a new scope for their members, even if a name is given to the enum. Example:

enum
{
    a,
    b
};

enum Foo
{
    c,
    d
};

int e = a;
int f = c;

In the above example it’s possible to access the enum members from both of the enums without qualifying the type. In D, this is not the case for named enums. They require qualifying the enum member with the type name:

enum
{
    a,
    b
}

enum Foo
{
    c,
    d
}

int e = a; // ok since the first enum is anonymous
int f = Foo.c; // need to qualify the enum member with the type name

To reduce the risk of symbol conflict it’s quite common for C libraries to prefix enum members with the name of the type:

enum Foo
{
    FooC,
    FooD
};

While the following is perfectly fine in D as well, it gets a bit redundant and verbose to have to specify Foo twice:

enum Foo
{
    FooC,
    FooD
}

int c = Foo.FooC;
int d = Foo.FooD;

For this reason DStep now supports a new flag, --rename-enum-members, which, when enabled, will try to remove any prefix from the enum member names. Given the following C header file:

enum Foo
{
    FooA,
    FooB
};

And running DStep as follows:

$ dstep foo.h --rename-enum-members

It will produce the following D code:

enum Foo
{
    a = 0,
    b = 1
}

DStep identified the Foo prefix and removed it from the enum member names. It also converted the names to lowercase to better match the standard D naming conventions.

By default this feature is not enabled to more closely match the original C code.

Normalize Modules

When the --package flag is specified DStep will add a module declaration to all D modules. By default, it will use the name of the input file as the name of the module. In the C world there’s no direct file-naming convention. Some libraries will use all lowercase letters, some will use snake case, some will use camel case, some will use Pascal case, and so on.

The standard D naming convention for modules (and therefore files) is to use only lowercase letters and underscores, i.e., snake case. To help follow this convention, DStep now supports the new flag --normalize-modules. When this flag is enabled (and the --package flag is used), DStep will try to convert the name of the input file to a name matching the D conventions.

Given a C header file named Foo.h and only invoking DStep with the --package flag:

$ dstep Foo.h --package bar

DStep will produce the following D code:

module bar.Foo;

When the --normalize-modules flag is used as well:

$ dstep Foo.h --package bar --normalize-modules

DStep will output the following D code:

module bar.foo;

Note that Foo has been converted to foo.

Another example with a file using the Pascal naming convention:

$ dstep NSString.h --package bar --normalize-modules

And the result:

module bar.ns_string;

By default this feature is not enabled to more closely match the original C code.

Bit Fields

Another new feature that has been added in this release of DStep is support for bit fields. The bit field is a built-in language construct in C, but there’s no language support for it in D. Fortunately, with the help of D’s metaprogramming capabilities, the bit field has been implemented as a library construct and is available in the standard library [3]. The library construct will generate getters and setters that perform the same bit manipulation the C compiler would have generated.

The following snippet in C:

struct Foo
{
    unsigned int a : 1;
    unsigned int b : 2;
    unsigned int c : 5;
};

Is translated to D:

struct Foo
{
    import std.bitmanip : bitfields;

    mixin(bitfields!(
        uint, "a", 1,
        uint, "b", 2,
        uint, "c", 5));
}

The D translation makes use of the bitfields template from the standard library. It’s automatically imported, directly inside the struct, to minimize the scope of where the symbol is available.
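
A quick usage sketch (my own addition, not part of the release notes) shows that the generated accessors read and write like ordinary fields while doing the masking behind the scenes:

import std.bitmanip : bitfields;

struct Foo
{
    mixin(bitfields!(
        uint, "a", 1,
        uint, "b", 2,
        uint, "c", 5));
}

void main()
{
    Foo f;
    f.a = 1;  // 1-bit field: 0 or 1
    f.b = 2;  // 2-bit field: 0 .. 3
    f.c = 19; // 5-bit field: 0 .. 31
    assert(f.a == 1 && f.b == 2 && f.c == 19);
}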


In his day job, Jacob Carlborg is a DevOps engineer for Derivco Sweden, but he’s been using D on his own time since 2006. He is the maintainer of numerous open source projects, including DStep, a utility that generates D bindings from C and Objective-C headers, DWT, a port of the Java GUI library SWT, and DVM, the topic of another post on this blog. He implemented native Thread Local Storage support for DMD on OS X and contributed, along with Michel Fortin, to the integration of Objective-C in D.

Project Highlight: DPP

D was designed from the beginning to be ABI compatible with C. Translate the declarations from a C header file into a D module and you can link directly with the corresponding C library or object files. The same is true in the other direction as long as the functions in the D code are annotated with the appropriate linkage attribute. These days, it’s possible to bind with C++ and even Objective-C.

Binding with C is easy, but can sometimes be a bit tedious, particularly when done by hand. I can speak to this personally as I originally implemented the Derelict collection of bindings by hand and, though I slapped together some automation when I ported it all over to its successor project, BindBC, everything there is maintained by hand. Tools like dstep exist and can work well enough, though they come with limitations which require careful attention to and massaging of the output.

Tediousness is an enemy of productivity. That’s why several pages of discussion were generated from Átila Neves’s casual announcement a few weeks before DConf 2018 that it was now possible to #include C headers in D code.

dpp is a compiler wrapper that will parse a D source file with the .dpp extension and expand in place any #include directives it encounters, translating all of the C or C++ symbols to D, and then pass the result to a D compiler (DMD by default). Says Átila:

What motivated the project was a day at Cisco when I wanted to use D but ended up choosing C++ for the task at hand. Why? Because with C++ I could include the relevant header and be on my way, whereas with D (or any other language really) I’d have to somehow translate the header and all its transitive dependencies. I tried dstep and it failed miserably. Then there’s the fact that the preprocessor is nearly always needed to properly use a C API. I wanted to remove one advantage C++ has over D, so I wrote dpp.

Here’s the example he presented in the blog post accompanying the initial announcement:

// stdlib.dpp
#include <stdio.h>
#include <stdlib.h>

void main() {
    printf("Hello world\n".ptr);

    enum numInts = 4;
    auto ints = cast(int*) malloc(int.sizeof * numInts);
    scope(exit) free(ints);

    foreach(int i; 0 .. numInts) {
        ints[i] = i;
        printf("ints[%d]: %d ".ptr, i, ints[i]);
    }

    printf("\n".ptr);
}

Three months later, dpp was successfully compiling the julia.h header, allowing the Julia language to be embedded in a D program. The following month, it was enabled by default on run.dlang.io.

C support is fairly solid, though not perfect.

Although preprocessor macro support is one of dpp’s key features, some macros just can’t be translated because they expand to C/C++ code fragments. I can’t parse them because they’re not actual code yet (they only make sense in context), and I can’t guess what the macro parameters are. Strings? Ints? Quoted strings? How does one programmatically determine that #define FOO(S) (S) is meant to be a C cast? Did you know that in C macros can have the same name as functions and it all works? Me neither until I got a bug report!

Push the stdlib.dpp code block from above through run.dlang.io and read the output to see an example of translation difficulties.

The C++ story is more challenging. Átila recently wrote about one of the problems he faced. That one he managed to solve, but others remain.

dpp can’t translate C++ template specialisations on reference types because reference types don’t exist in D. I don’t know how to translate anything that depends on SFINAE because it also doesn’t exist in D.

For those not in the know, classes in D are reference types in the same way that Java classes are reference types, and function parameters annotated with ref accept arguments by reference, but when it comes to variable declarations, D has no equivalent for the C++ lvalue reference declarator, e.g. int& someRef = i;.
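
A minimal sketch of the difference: D supports ref on function parameters (and return values), but not on local variable declarations:

void addOne(ref int x) { x += 1; } // D: parameters can be passed by reference

void main()
{
    int i = 41;
    addOne(i);
    assert(i == 42);
    // int& j = i; // C++ only: a reference variable has no D equivalent
}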

Despite the difficulties, Átila persists.

The holy grail is to be able to #include a C++ standard library header, but that’s so difficult that I’m currently concentrating on a much easier problem first: being able to successfully translate C++ headers from a much simpler library that happens to use standard library types (std::string, std::vector, std::map, the usual suspects). The idea there is to treat all types that dpp can’t currently handle as opaque binary blobs, focusing instead on the production library types and member functions. This sounds simple, but in practice I’ve run into issues with LLVM IR with ldc, ABI issues with dmd, mangling issues, and the most fun of all: how to create instances of C++ stdlib types in D to pass back into C++? If a function takes a reference to std::string, how do I give it one? I did find a hacky way to pass D slices to C++ functions though, so that was cool!

On the plus side, he’s found some of D’s features particularly helpful in implementing dpp, though he did say that “this is harder for me to recall since at this point I mostly take D’s advantages for granted.” The first thing that came to mind was a combination of built-in unit tests and token strings:

unittest {
    shouldCompile(
        C(q{ struct Foo { int i; } }),
        D(q{ static assert(is(typeof(Foo.i) == int)); })
    );
}

It’s almost self-explanatory: the first parameter to shouldCompile is C code (a header), and the second is D code to be compiled after translating the C header. D’s token strings allow the editor to highlight the code inside, and the fact that C syntax is so similar to D lets me use them on C code as well!

He also found help from D’s contracts and the garbage collector.

libclang is a C library and as such has hardly any abstractions or invariant enforcement. All nodes in the AST are represented by a libclang “cursor”, which can have several “kinds”. D’s contracts allowed me to document and enforce at runtime which kind(s) of cursors a function expects, preventing bugs. Also, libclang in certain places requires the client code to manually manage memory. D’s GC makes for a wrapper API in which that is never a concern.
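
A hypothetical sketch of that pattern (the names are invented for illustration, not dpp’s actual code): an in contract documents and enforces at run time which cursor kind a function accepts:

// Stand-in types for libclang's cursor API (hypothetical).
enum CursorKind { structDecl, enumDecl, functionDecl }
struct Cursor { CursorKind kind; }

string translateStruct(Cursor cursor)
in (cursor.kind == CursorKind.structDecl, "expected a struct declaration")
{
    return "struct Translated {}"; // placeholder for the real translation
}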

During development, he exposed some bugs in the DMD frontend.

I tried using sumtype in a separate branch of dpp to first convert libclang AST entities into types that are actually enforced at compile time instead of run time. Unfortunately that caused me to have to switch to compiling all code at once since sumtype behaves differently in separate compilation, triggering previously unseen frontend bugs.

For unit testing, he uses unit-threaded, a library he created to augment D’s built-in unit testing feature with advanced functionality. To achieve this, the library makes use of D’s compile-time reflection features. But dpp has a lot of unit tests.

Given the number of tests I wrote for dpp, compiling takes a very long time. This is exacerbated by -unittest, which is a known issue. Not using unit-threaded’s runner would speed up compilation, but then I’d lose all the features. It’d be better if the compile-time reflection required were made faster.

Perhaps he’ll see some joy there when Stefan Koch’s ongoing work on NewCTFE is completed.

Átila will be speaking about dpp on May 9 as part of his presentation at DConf 2019 in London. The conference runs from May 8–11, so as I write there’s still plenty of time to register. For those who can’t make it, you can watch the livestream (a link to which will be provided in the D forums each day of the conference) or see the videos of all the talks on the D Language Foundation’s YouTube channel after DConf is complete.

Memoization in the D Programming Language

The D programming language provides advanced facilities for structuring programs logically, almost like Python or Ruby, but with high performance and the higher reliability of static typing and contract programming.

In this article, I will describe how to use D templates and mixins for memoization, that is, to automatically remember a function (or property) result.

std.functional.memoize from the standard library

The first way is built straight into Phobos, the D standard library, and is very easy to use:

import std.functional;
import std.stdio;

float doCalculations() {
    writeln("Doing calculations.");
    return -1.7; // the value of the calculations
}

// Apply the template “memoize” to the function doCalculations():
alias doCalculationsOnce = memoize!doCalculations;

Now the alias doCalculationsOnce() does the same as doCalculations(), but the calculations are done only once: no matter how many times we call doCalculationsOnce(), “Doing calculations.” is printed only once. This is useful for long, slow calculations and in some other situations (such as ensuring that only a single window is created on the screen). That is memoization.
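
A short demonstration (my own addition, reusing the declarations above) makes the caching visible:

void main()
{
    writeln(doCalculationsOnce()); // prints "Doing calculations." and then -1.7
    writeln(doCalculationsOnce()); // prints only -1.7; the cached result is reused
}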

It’s even possible to memoize a function with arguments:

float doCalculations2(float arg1, float arg2) {
    writeln("Doing calculations.");
    return arg1 + arg2; // the value of the calculations
}

// Apply the template “memoize” to the function  doCalculations2():
alias doCalculationsOnce2 = memoize!doCalculations2;

void main(string[] args)
{
    writeln(doCalculationsOnce2(1.0, 1.2));
    writeln(doCalculationsOnce2(1.0, 1.3));
    writeln(doCalculationsOnce2(1.0, 1.3));
}

This outputs:

Doing calculations.
2.2
Doing calculations.
2.3
2.3

You see that the calculations are not repeated again when the argument values are the same.

Memoizing struct or class properties

I’ve found another way to memoize in D. It involves caching a property of a struct or class. Properties are zero-argument (for reading) or one-argument (for writing) member functions (or sometimes two-argument non-member functions) that differ from ordinary functions only in call syntax. I deemed it more elegant to cache a property of a struct or class rather than to cache a member function’s return value. My code can easily be changed to memoize a member function instead of a property, but you can always convert a zero-argument member function into a property, so why bother?

Here is the source (you can also install it from https://github.com/vporton/memoize-dlang for use in your programs):

module memoize;

mixin template CachedProperty(string name, string baseName = '_' ~ name) {
    mixin("private typeof(" ~ baseName ~ ") " ~ name ~ "Cache;");
    mixin("private bool " ~ name ~ "IsCached = false;");
    mixin("@property typeof(" ~ baseName ~ ") " ~ name ~ "() {\n" ~
          "if (" ~ name ~ "IsCached" ~ ") return " ~ name ~ "Cache;\n" ~
          name ~ "IsCached = true;\n" ~
          "return " ~ name ~ "Cache = " ~ baseName ~ ";\n" ~
          '}');
}

It is used like this (with either structs or classes at your choice):

import memoize;

struct S {
    @property float _x() { return 1.5; }
    mixin CachedProperty!"x";
}

Then just:

S s;
assert(s.x == 1.5);

Or you can specify the explicit name of the cached property:

import memoize;

struct S {
    @property float _x() { return 1.5; }
    mixin CachedProperty!("x", "_x");
}

CachedProperty is a template mixin. Template mixins insert the declarations of the template body directly into the current context. In this case, the template body is composed of string mixins. As you can guess, a string mixin generates code at compile time from strings. So,

struct S {
    @property float _x() { return 1.5; }
    mixin CachedProperty!"x";
}

turns into

struct S {
    @property float _x() { return 1.5; }
    private typeof(_x) xCache;
    private bool xIsCached = false;
    @property typeof(_x) x() {
        if (xIsCached) return xCache;
        xIsCached = true;
        return xCache = _x;
    }
}

That is, it sets xCache to _x unless xIsCached and also sets xIsCached to true when retrieving x.
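
To verify the caching, here is a small test of my own (assuming the memoize module above is available) that counts how often the underlying property runs:

import memoize;

struct Expensive
{
    static int calls;
    @property float _x() { calls++; return 1.5; }
    mixin CachedProperty!"x";
}

void main()
{
    Expensive e;
    assert(e.x == 1.5);
    assert(e.x == 1.5);           // the second access returns the cached value
    assert(Expensive.calls == 1); // _x was evaluated only once
}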

The idea originates from the following Python code:

class cached_property(object):
    """A version of @property which caches the value.  On access, it calls the
    underlying function and sets the value in `__dict__` so future accesses
    will not re-call the property.
    """
    def __init__(self, f):
        self._fname = f.__name__
        self._f = f

    def __get__(self, obj, owner):
        assert obj is not None, 'call {} on an instance'.format(self._fname)
        ret = obj.__dict__[self._fname] = self._f(obj)
        return ret

My way of memoization does not (yet) involve caching return values of functions with arguments.


Victor Porton is an open source developer, a math researcher, and a Christian writer. He earns his living as a programmer.

Using const to Enforce Design Decisions

The saying goes that the best code is no code. As soon as a project starts to grow, technical debt is introduced. When a team is forced to adapt to a new company guideline inconsistent with their previous vision, the debt results from a business decision. This could be tackled at the company level. Sometimes technical debt can arise simply due to the passage of time, when new versions of dependencies or of the compiler introduce breaking changes. You can try to tackle this by letting your local physicist stop the flow of time. More often however, technical debt is caused when issues are fixed by a quick hack, due to time pressure or a lack of knowledge of the code base. Design strategies that were carefully crafted are temporarily neglected. This blog post will focus on using the const modifier. It is one of the convenient tools D offers to minimize the increase of technical debt and enforce design decisions.

To keep a code base consistent, often design guidelines, either explicit or implicit, are put in place. Every developer on the team is expected to adhere to the guidelines as a gentleman’s agreement. This effectively results in a policy that is only enforced if both the programmer and the reviewer have had enough coffee. Simple changes, like adding a call to an object method, might seem innocent, but can reduce the consistency and the traceability of errors. To detect this in a code review requires in-depth knowledge of the method’s implementation.

An example from a real world project I’ve worked on is generating financial transactions for a read-only view, the display function in the following code fragment. Nothing seemed wrong with it, until I realized that those transactions were persisted and eventually used for actual payments without being calculated again, as seen in the process method. Potentially different payments occurred depending on whether the user decided to glance at the summary, thereby triggering the generation with new currency exchange rates, or just blindly clicked the OK button. That’s not what an innocent bystander like myself expects and has caused many frowns already.

public class Order
{
    private Transaction[] _transactions;

    public Transaction[] getTransactions()
    {
        _transactions = calculate();
        return _transactions;
    }

    public void process()
    {
        foreach(t; _transactions){
            // ...
        }
    }
}

void display(Order order)
{
    auto t = order.getTransactions();
    show(t);
}

The internet has taught me that if it is possible, it will one day happen. Therefore, we should make an attempt to make the undesired impossible.

A constant feature

By default, variables and object instances in D are mutable, just like in many other programming languages. If we want to prevent objects from ever changing, we can mark them immutable, i.e. immutable MyClass obj = new MyClass();. immutable means that we can modify neither the object reference (head constant) nor the object’s properties (tail constant). The first case corresponds to final in Java and readonly in C#, both of which signify head constant only. D’s implementation means that nobody can ever modify an object marked immutable. What if an object needs to be mutable in one place, but immutable in another? That’s where D’s const pops in.

Unlike immutable, whose contract states that an object cannot be mutated through any reference, const allows an object to be modified through another, non-const reference. This means it’s illegal to initialize an immutable reference with a mutable one, but a const reference can be initialized with a mutable, const, or immutable reference. In a function parameter list, const is preferred over immutable because it can accept arguments with either qualifier or none. Schematically, it can be visualized as in the following figure.

[Figure: const relationships]
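
In code, a minimal sketch of those conversion rules:

class MyClass { int x; }

void main()
{
    MyClass m = new MyClass();
    immutable MyClass i = new immutable(MyClass)();

    const MyClass c1 = m; // fine: const accepts a mutable reference
    const MyClass c2 = i; // fine: const accepts an immutable reference
    // immutable MyClass i2 = m; // error: immutable may not alias mutable data
}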

A constant detour

D’s const differs from C++ const in a significant way: it’s transitive (see the const(FAQ) for more details). In other words, it’s not possible to declare any object in D as head constant. This isn’t obvious from examples with classes, since classes in D are reference types, but is more apparent with pointers and arrays. Consider these C++ declarations:

const int *cp0;         // mutable pointer to const data
int const *cp1;         // alternative syntax for the same

Variable declarations in C++ are best read from right to left. Although const int is likely the more common syntax, int const matches the way the declaration should be read: cp0 is a mutable pointer to a const int. In D, the equivalent of cp0 and cp1 is:

const(int)* dp0;

The next example shows what head constant looks like in C++.

int *const cp2;         // const pointer to mutable data

We can read the declaration of cp2 as: cp2 is a const pointer to a mutable int. There is no equivalent in D. It’s possible in C++ to have any number and combination of const and mutable pointers to const or mutable data. But in D, if const is applied to the outermost pointer, then it applies to all the inner pointers and the data as well. Or, as they say, it’s turtles all the way down.

The equivalent in C++ looks like this:

int const *const cp3;         // const pointer to const data

This declaration says cp3 is a const pointer to const data, and is possible in D like so:

const(int*) dp1;
const int* dp2;     // same as dp1

All of the above syntax holds true for D arrays:

const(int)[] a1;    // mutable reference to const data
const(int[]) a2;    // const reference to const data
const int[] a3;     // same as a2

const in the examples can be replaced with immutable, with the caveat that initializers must match the declaration, e.g. immutable(int)* can only be initialized with immutable(int)*.

Finally, note that classes in D are reference types; the reference is baked in, so applying const or immutable to a class reference is always equivalent to const(p*) and there is no equivalent to const(p)* for classes. Structs in D, on the other hand, are value types, so pointers to structs can be declared like the int pointers in the examples above.

A constant example

For the sake of argument, we assume that updating the currency exchange rates is a function that, by definition, needs to mutate the order. After updating the order, we want to show the updated prices to our user. Conceptually, the display function should not modify the order. We can prevent mutation by adding the const modifier to our function parameter. The implicit rule is now made explicit: the function takes an Order as input, and treats the object as a constant. We no longer have a gentleman’s agreement, but a formal contract. Changing the contract will, hopefully, require thorough negotiation with your peer reviewer.

void updateExchangeRates(Order order);
void display(const Order order);

void updateAndDisplay()
{
    Order order = //…
    updateExchangeRates(order);
    display(order); // Implicitly converted to const.
}

As with all contracts, defining it is the easiest part. The hard part is enforcing it. The D compiler is our friend in this problem. If we try to compile the code, we will get a compiler error.

void display(const Order order)
{
    // ERROR: cannot call mutable method
    auto t = order.getTransactions();
    show(t);
}

We never explicitly stated that getTransactions doesn’t modify the object. As the method is virtual by default, the compiler cannot derive the behavior either way. Without that knowledge, the compiler is required to assume that the method might modify the object. In other words, in the D justice system one is guilty until proven innocent. Let’s prove our innocence by marking the method itself const, telling the compiler that we do not intend to modify our data.

public class Order
{
    private Transaction[] _transactions;

    public Transaction[] getTransactions() const
    {
        _transactions = calculate(); // ERROR: cannot mutate field
        return _transactions;
    }
}

void display(const Order order)
{
    auto t = order.getTransactions(); // Now compiles :)
    show(t);
}

By marking the method const, the original compile error has moved away. The promise that we do not modify any object state is part of the method signature. The compiler is now satisfied with the method call in the display function, but finds another problem. Our getter, which we stated should not modify data, actually does modify it. We found our code smell by formalizing our guidelines and letting the compiler figure out the rest.

It seems promising enough to try it on a real project.

A constant application

I had a pet project lying around and decided to put the effort into enforcing the constraint. This is what inspired me to write this post. The project is a four-player mahjong game. The relevant part, in abstraction, is highlighted in the image.

[Figure: Mahjong abstraction]

The main engine behind the game is the white box in the center. A player or AI is sent a message with a const view of the game data for display purposes and to determine their next move. A message is sent back to the engine, which then determines the mutation on the internally mutable game data. The most obvious win is that I cannot accidentally modify the game data when drawing my UI. Which, of course, appeared to be the case before I refactored in the const-ness of the game data.

Upon closer inspection, coming from the UI there is only one way to manipulate the state of the game. The UI sends a message to the engine and remains oblivious of the state changes that need to be applied. This also encourages layered development and improves testability of the code. So far, so good. But during the refactoring, a problem arose. Recall that marking an object with const means that only member functions that promise not to modify the object, those marked with const themselves, can be called. Some of these could be trivially fixed by applying const or, at worst, inout (a sort of wildcard). However, the more persistent issues, like in the Order example, required me to go back to the drawing board and rethink my domain. In the end, being forced to think about mutability versus immutability improved my understanding of my own code base.
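
For reference, here is inout in a nutshell (a generic sketch, not code from the game): one accessor body serves mutable, const, and immutable receivers, with the qualifier of this carried over to the return type:

class Hand { }

class Player
{
    private Hand _hand;

    // inout transfers the qualifier of `this` to the return type.
    inout(Hand) hand() inout { return _hand; }
}

void main()
{
    auto p = new Player();
    const Player cp = p;
    Hand h = p.hand();         // mutable receiver, mutable result
    const Hand ch = cp.hand(); // const receiver, const result
}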

A constant verdict

Is const all good? It’s not a universal answer and certainly has downsides. The most notable one is that this kills lazy initialization where a property is computed only when first requested and then the result is cached. Sometimes, like in the earlier example, this is a code smell, but there are legit use cases. In my game, I have a class that composes the dashboard per player. Updating it is expensive and rarely required. The screen, however, gets rendered sixty times a second. It makes sense to cache the dashboards and only update them when the player objects change. I could split the method in two, but then my abstraction would leak its optimization. I settled for not using const here, as it was a module-private class and didn’t have a large impact on my codebase.

A complaint that is sometimes heard regarding const is that it does not work well with one of D’s main selling points, ranges. A range is D’s implementation of a lazy iterator, usable in foreach loops and heavily used in std.algorithm. The functions in std.algorithm can handle ranges with const elements perfectly fine. However, iterating a range changes the internal values and ultimately consumes the range. Therefore, when a range itself is const, it cannot be iterated over. I think this makes sense by design, but I can imagine that this behavior can be surprising in edge cases. I haven’t encountered this as I generate new ranges on every query, both on mutable and const objects.
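
A small example of the limitation (my own illustration, using a range with indirections such as the result of filter):

import std.algorithm.iteration : filter;

void main()
{
    auto nums = [1, 2, 3];
    const r = nums.filter!(x => x > 1);
    // r.popFront();      // error: popFront mutates the range, so it can't be const
    // foreach (x; r) {}  // error: iterating needs a mutable copy, which const forbids
    foreach (x; nums.filter!(x => x > 1)) {} // fine: query a fresh, mutable range
}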

A guideline for when to use const would be to separate the queries from the command, a.k.a. ye olde Command-Query Separation (CQS). All queries should, in principle, be const. To perform a mutation, even on nested objects, one should call a method on the object. I’d double down on this and state that member functions should be commands, with logic that can be overridden, and therefore never be marked constant. Queries basically serve as a means to peek at encapsulated data and don’t need to be overridden. This should be a final, non-virtual, function that simply exposes a read-only view on the inner field. For example, using D’s module-private modifier on the field in conjunction with an acquainted function in the same module, we can put the logic inside the class definition and the queries outside.

// in order.d
public class Order
{
    private Transaction[] _transactions; // Accessible in order.d

    public void process(); // Virtual and not restricted
}

public const(Transaction)[] getTransactions(const Order order)
{
    // Function, not virtual, operates on a read-only view of Order
    return order._transactions;
}

// in view.d
void display(const Order order)
{
    auto t = order.getTransactions();
    show(t);
}

We should take care, however, not to overapply the modifier. The question that we need to answer is not “Do I modify this object here?”, but rather, “Does it make sense that the object is modified here?” Wrongly assuming constant objects will result in trouble when you finally need to change the instance due to a new feature. For example, in my game a central Game class contains the players’ hands, but doesn’t explicitly modify them. However, given the structure of my code, it does not make sense to make the player objects constant, as free functions in the engine do use mutable player instances of the game object.

Reflecting on my design, even while writing this blog post, gave me valuable insights. Taking the effort to properly use the tool called const has paid off for me. It improved the structure of my code and my understanding of the ramblings I write. Like any other tool, it is not a silver bullet. It serves to formalize our gentleman’s agreement and is therefore just as fragile as one.


Marco graduated in physics, where he used Fortran and Matlab. He used Programming in D to learn application programming. After three years as a C# developer, he is now a trainer-coach of mainly Java and C# developers at Sogyo. Marco uses D for programming experiments and side projects, including his darling mahjong game.

Containerize Your D Server Application

A container consists of an application packed together with all of its required dependencies. The container is run as an isolated process on Linux or Windows. The Docker tool has made the handling of containers very popular and is now the de-facto standard for deploying containers to a cloud environment. In this blog post I discuss how to create a simple vibe.d application and ship it as a Docker container.

Setting up the build environment

I use Ubuntu 18.10 as my development environment for this application. Additionally, I installed the packages ldc (the LLVM-based D compiler), dub (the D package manager and build tool), gcc, zlib1g-dev and libssl-dev (all required for compiling my vibe.d application). To build and run my container I use Docker CE. I installed it following the instructions at https://docs.docker.com/install/linux/docker-ce/ubuntu/. As the last step, I added my user to the docker group (sudo adduser kai docker).

A sample REST application

My vibe.d application is a very simple REST server. You can call the /hello endpoint (with an optional name parameter) and you get back a friendly message in JSON format. The second endpoint, /healthz, is intended as a health check and simply returns the string "OK". You can clone my source repository at https://github.com/redstar/vibed-docker/ to get the source code. Here is the application:

import vibe.d;
import std.conv : to;
import std.process : environment;
import std.typecons : Nullable;

shared static this()
{
    logInfo("Environment dump");
    auto env = environment.toAA;
    foreach(k, v; env)
        logInfo("%s = %s", k, v);

    auto host = environment.get("HELLO_HOST", "0.0.0.0");
    auto port = to!ushort(environment.get("HELLO_PORT", "17890"));

    auto router = new URLRouter;
    router.registerRestInterface(new HelloImpl());

    auto settings = new HTTPServerSettings;
    settings.port = port;
    settings.bindAddresses = [host];

    listenHTTP(settings, router);

    logInfo("Please open http://%s:%d/hello in your browser.", host, port);
}

interface Hello
{
    @method(HTTPMethod.GET)
    @path("hello")
    @queryParam("name", "name")
    Msg hello(Nullable!string name);

    @method(HTTPMethod.GET)
    @path("healthz")
    string healthz();
}

class HelloImpl : Hello
{
    Msg hello(Nullable!string name) @safe
    {
        logInfo("hello called");
        return Msg(format("Hello %s", name.isNull ? "visitor" : name));
    }

    string healthz() @safe
    {
        logInfo("healthz called");
        return "OK";
    }
}

struct Msg
{
    string msg;
}

And this is the dub.sdl file to compile the application:

name "hellorest"
description "A minimal REST server."
authors "Kai Nacke"
copyright "Copyright © 2018, Kai Nacke"
license "BSD 2-clause"
dependency "vibe-d" version="~>0.8.4"
dependency "vibe-d:tls" version="*"
subConfiguration "vibe-d:tls" "openssl-1.1"
versions "VibeDefaultMain"

Compile and run the application with dub. Then open the URL http://127.0.0.1:17890/hello to check that you get a JSON result.

A cloud-native application should follow the twelve-factor app methodology. You can read about the twelve-factor app at https://12factor.net/. In this post I only highlight two of the factors: III. Config and XI. Logs.

Ideally, you build an application only once and then deploy it into different environments, e.g. first to your quality testing environment and then to production. When you ship your application as a container, it comes with all of its required dependencies. This solves the problem that different versions of a library might be installed in different environments, possibly causing hard-to-find errors. You still need to find a solution for how to deal with different configuration settings. Port numbers, passwords or the location of databases are all configuration settings which typically differ from environment to environment. The factor III. Config recommends that the configuration be stored in environment variables. This has the advantage that you can change the configuration without touching a single file. My application follows this recommendation. It uses the environment variable HELLO_HOST for the configuration of the host IP and the variable HELLO_PORT for the port number. For easy testing, the application uses the default values 0.0.0.0 and 17890 in case the variables do not exist. (To be sure that every configuration is complete, it would be safer to stop the application with an error message in case an environment variable is not found.)

The application writes log entries on startup and when a url endpoint is called. The log is written to stdout. This is exactly the point of factor XI. Logs: an application should not bother to handle logs at all. Instead, it should treat logs as an event stream and write everything to stdout. The cloud environment is then responsible for collecting, storing and analyzing the logs.

Building the container

A Docker container is specified with a Dockerfile. Here is the Dockerfile for the application:

FROM ubuntu:cosmic

RUN \
  apt-get update && \
  apt-get install -y libphobos2-ldc-shared81 zlib1g libssl1.1 && \
  rm -rf /var/lib/apt/lists/*

COPY hellorest /

USER nobody

ENTRYPOINT ["/hellorest"]

A Docker container is a stack of read-only layers. With the first line, FROM ubuntu:cosmic, I specify that I want to use this specific Ubuntu version as the base layer of my container. During the first build, this layer is downloaded from Docker Hub. Every other line in the Dockerfile creates a new layer. The RUN line is executed at build time; I use it to install the libraries the application depends on. The COPY command copies the executable into the root directory inside the container. And last, ENTRYPOINT specifies the command the container will run.

Run the Docker command

docker build -t vibed-docker/hello:v1 .

to build the Docker container. After the container is built successfully, you can run it with

docker run -p 17890:17890 vibed-docker/hello:v1

Open the URL http://127.0.0.1:17890/hello again. You should get the same result as before. Congratulations! Your vibe.d application is now running in a container!
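
Because the configuration lives in environment variables (factor III again), the same container image can be reconfigured without rebuilding it, e.g. to serve on a different port:

docker run -p 8080:8080 -e HELLO_PORT=8080 vibed-docker/hello:v1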

Using a multi-stage build for the container

The binary hellorest was compiled outside the container. This creates difficulties as soon as dependencies in your development environment change. It is easy to integrate compilation into the Dockerfile, but this creates another issue: the requirements for compiling and running the application are different, e.g. the compiler is not required to run the application.

The solution is to use a multi-stage build. In the first stage, the application is built. The second stage contains only the runtime dependencies and the application binary produced in the first stage. This is possible because Docker allows the copying of files between stages. Here is the multi-stage Dockerfile:

FROM ubuntu:cosmic AS build

RUN \
  apt-get update && \
  apt-get install -y ldc gcc dub zlib1g-dev libssl-dev && \
  rm -rf /var/lib/apt/lists/*

COPY . /tmp

WORKDIR /tmp

RUN dub -v build

FROM ubuntu:cosmic

RUN \
  apt-get update && \
  apt-get install -y libphobos2-ldc-shared81 zlib1g libssl1.1 && \
  rm -rf /var/lib/apt/lists/*

COPY --from=build /tmp/hellorest /

USER nobody

ENTRYPOINT ["/hellorest"]

In my repository I called this file Dockerfile.multi. Therefore, you have to specify the file on the command line:

docker build -f Dockerfile.multi -t vibed-docker/hello:v1 .

Building the container now requires much more time because a clean build of the application is included. The advantage is that your build environment is now independent of your host environment.

Where to go from here?

Using containers is fun, but the fun diminishes as soon as the containers get larger. Using Ubuntu as the base image is comfortable but not the best solution. To reduce the size of your container you may want to try Alpine Linux as the base image, or use no base image at all.

If your application is split over several containers then you can use Docker Compose to manage your containers. For real container orchestration in the cloud you will want to learn about Kubernetes.


A long-time contributor to the D community, Kai Nacke is the author of ‘D Web Development‘ and a maintainer of LDC, the LLVM D Compiler.