wc in D: 712 Characters Without a Single Branch

After reading “Beating C With 80 Lines Of Haskell: Wc”, which I found on Hacker News, I thought D could do better. So I wrote a wc in D.

The Program

It consists of one file and has 34 lines and 712 characters.

import std.stdio : writefln, File;
import std.algorithm : map, fold, splitter;
import std.range : walkLength;
import std.typecons : Yes;
import std.uni : byCodePoint;

struct Line {
	size_t chars;
	size_t words;
}

struct Output {
	size_t lines;
	size_t words;
	size_t chars;
}

Output combine(Output a, Line b) pure nothrow {
	return Output(a.lines + 1, a.words + b.words, a.chars + b.chars);
}

Line toLine(char[] l) pure {
	return Line(l.byCodePoint.walkLength, l.splitter.walkLength);
}

void main(string[] args) {
	auto f = File(args[1]);
	Output o = f
		.byLine(Yes.keepTerminator)
		.map!(l => toLine(l))
		.fold!(combine)(Output(0, 0, 0));

	writefln!"%u %u %u %s"(o.lines, o.words, o.chars, args[1]);
}

Sure, it is using Phobos, D’s standard library, but then why wouldn’t it? Phobos is awesome and ships with every D compiler. The program itself does not contain a single if statement. The Haskell wc implementation has several if statements. The D program, apart from the main function, contains three tiny functions. I could have easily put all the functionality in one range chain, but then it probably would have exceeded 80 characters per line. That’s a major code smell.
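For comparison, here is a rough sketch of what a single-chain variant could look like. It is not the author's code: Counts and countText are hypothetical names, and the sketch works on an in-memory string through std.string.lineSplitter instead of File.byLine so that it can be tried without a file on disk.

```d
import std.algorithm : fold, map, splitter;
import std.range : walkLength;
import std.string : lineSplitter;
import std.typecons : Tuple, tuple, Yes;
import std.uni : byCodePoint;

// Hypothetical result type for this sketch
alias Counts = Tuple!(size_t, "lines", size_t, "words", size_t, "chars");

// Counts lines, words and code points of text in one range chain
Counts countText(string text) {
	return text
		.lineSplitter!(Yes.keepTerminator)
		// per line: (number of words, number of code points)
		.map!(l => tuple(l.splitter.walkLength,
				l.byCodePoint.walkLength))
		.fold!((acc, l) => Counts(acc.lines + 1,
				acc.words + l[0], acc.chars + l[1]))
			(Counts(0, 0, 0));
}
```

Even in this small form the chain needs awkward line breaks to stay under 80 columns, which is the trade-off the three named functions avoid.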

The Performance

Is the D wc faster than the coreutils wc? No, but it took me 15 minutes to write mine (I had to search for walkLength, because I forgot its name).

file       lines  bytes  coreutils            haskell              D
app.d      46     906    3.5 ms +- 1.9 ms     39.6 ms +- 7.8 ms    8.9 ms +- 2.1 ms
big.txt    862    64k    4.7 ms +- 2.0 ms     39.6 ms +- 7.8 ms    9.8 ms +- 2.1 ms
vbig.txt   1.7M   96M    658.6 ms +- 24.5 ms  226.4 ms +- 29.5 ms  1.102 s +- 0.022 s
vbig2.txt  12.1M  671M   4.4 s +- 0.058 s     1.1 s +- 0.039 s     7.4 s +- 0.085 s

Memory:

file       coreutils  haskell  D
app.d      2052K      7228K    7708K
big.txt    2112K      7512K    7616K
vbig.txt   2288K      42620K   7712K
vbig2.txt  2360K      50860K   7736K

Is the Haskell wc faster? For big files, absolutely, but then it is using threads. For small files, GNU’s coreutils still beats the competition. At this stage my version is very likely IO bound, and it’s fast enough anyway.

I’ll not claim that one language is faster than another. If you spend a chunk of time on optimizing a micro-benchmark, you are likely going to beat the competition. That’s not real life. But I will claim that functional programming in D gives functional programming in Haskell a run for its money.

A Bit About Ranges

A range is an abstraction that you can consume through iteration without consuming the underlying collection (if there is one). Technically, a range can be a struct or a class that adheres to one of a handful of Range interfaces. The most basic form, the InputRange, requires the function

void popFront();

and two members or properties:

T front;
bool empty;

T is the generic type of the elements the range iterates.

In D, ranges are special in a way that other objects are not. When a range is given to a foreach statement, the compiler does a little rewrite.

foreach (e; range) { ... }

is rewritten to

for (auto __r = range; !__r.empty; __r.popFront()) {
    auto e = __r.front;
    ...
}

auto e = infers the type and is equivalent to T e =.

Given this knowledge, building a range that can be used by foreach is easy.

struct Iota {
	int front;
	int end;

	@property bool empty() const {
		return this.front == this.end;
	}

	void popFront() {
		++this.front;
	}
}

unittest {
	import std.stdio;
	foreach(it; Iota(0, 10)) {
		writeln(it);
	}
}

Iota is a very simple range. It functions as a generator, having no underlying collection. It iterates integers from front to end in steps of one. This snippet introduces a little bit of D syntax.

@property bool empty() const {

The @property attribute allows us to use the function empty the same way as a member variable (calling the function without the parentheses). The trailing const means that we don’t modify any data of the instance we call empty on. The built-in unit test prints the numbers 0 to 9; iteration stops as soon as front equals end, so 10 itself is never printed.

Another small feature is the lack of an explicit constructor. The struct Iota has two member variables of type int. In the foreach statement in the test, we create an Iota instance as if it had a constructor that takes two ints. This is a struct literal. When the D compiler sees this, and the struct has no matching constructor, the ints will be assigned to the struct’s member variables from top to bottom in the order of declaration.

The relation between the three members is really simple. If empty is false, front is guaranteed to return a different element, the next one in the iteration, after a call to popFront. After calling popFront the value of empty might have changed. If it is true, this means there are no more elements to iterate and any further calls to front are not valid. According to the InputRange documentation:

  • front can be legally evaluated if and only if evaluating empty has, or would have, equaled false.
  • front can be evaluated multiple times without calling popFront or otherwise mutating the range object or the underlying data, and it yields the same result for every evaluation.
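The contract above can be checked in a small sketch. Iota is repeated here so the snippet stands on its own; drain is a hypothetical helper that spells out the loop foreach is rewritten to, and isInputRange is the Phobos trait that verifies the three primitives at compile time.

```d
import std.range.primitives : isInputRange;

struct Iota {
	int front;
	int end;

	@property bool empty() const {
		return this.front == this.end;
	}

	void popFront() {
		++this.front;
	}
}

// Compile-time check that Iota offers empty, front and popFront
static assert(isInputRange!Iota);

// Hypothetical helper: the hand-written form of foreach(e; range)
int[] drain(Iota range) {
	int[] result;
	for (auto r = range; !r.empty; r.popFront()) {
		// front may be read repeatedly; it only changes on popFront
		assert(r.front == r.front);
		result ~= r.front;
	}
	return result;
}
```

Calling drain(Iota(0, 5)) collects the elements 0 through 4; an Iota whose front already equals end yields nothing, because front is never evaluated once empty is true.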

Now, using foreach statements, or loops in general, is not really functional in my book. Let’s say we want to filter out all odd numbers of the Iota range. We could put an if inside the foreach block, but that would only make it worse. It would be nicer if we had a range that takes a range and a predicate that can decide if an element is okay to pass along or not.

struct Filter {
	Iota input;
	bool function(int) predicate;

	this(Iota input, bool function(int) predicate) {
		this.input = input;
		this.predicate = predicate;
		this.testAndIterate();
	}

	void testAndIterate() {
		while(!this.input.empty
				&& !this.predicate(this.input.front))
		{
			this.input.popFront();
		}
	}

	void popFront() {
		this.input.popFront();
		this.testAndIterate();
	}

	@property int front() {
		return this.input.front;
	}

	@property bool empty() const {
		return this.input.empty;
	}
}

bool isEven(int a) {
	return a % 2 == 0;
}

unittest {
	import std.stdio : writeln;
	foreach(it; Filter(Iota(0,10), &isEven)) {
		writeln(it);
	}
}

Filter is again really simple: it takes one Iota and a function pointer. On construction of Filter, we call testAndIterate, which pops elements from Iota until it is either empty or the predicate returns true. The idea is that the passed predicate decides what to filter out and what to keep. The properties front and empty just forward to Iota. The only thing that actually does any work is popFront. It pops the current element and calls testAndIterate. That’s it. That’s an implementation of filter.

Sure, there is a new while loop in testAndIterate, but rewriting that with recursion is just silly, in my opinion. What makes D great is that you can use the right tool for the job. Functional programming is fine and dandy a lot of the time, but sometimes it’s not. If a bit of inline assembly would be necessary or nicer, use that.

The call to Filter still does not look very nice. Assuming you are used to reading from left to right, Filter comes before Iota, even though it is executed after Iota. D has no pipe operator, but it does have Uniform Function Call Syntax (UFCS). If an expression can be implicitly converted to the first parameter of a function, the function can be called as if it were a member function of the type of the expression. That’s a lot of words, I know. An example helps:

string foo(string a) {
	return a ~ "World";
}

unittest {
	string a = foo("Hello ");
	string b = "Hello ".foo();
	assert(a == b);
}

The above example shows two calls to the function foo. As the assert indicates, both calls are equivalent. What does that mean for our Iota Filter example? UFCS allows us to rewrite the unit test to:

unittest {
	import std.stdio : writeln;
	foreach(it; Iota(0,10).Filter(&isEven)) {
		writeln(it);
	}
}

Implementing a map/transform range should now be possible for every reader. Sure, Filter can be made more abstract through the use of templates, but that’s just work, nothing conceptually new.
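As one possible answer to that exercise, here is a sketch of a templated map range. The names Map, transform and timesTwo are made up for this example; the template accepts any input range instead of only Iota, which is the kind of abstraction mentioned above.

```d
import std.algorithm : equal;
import std.range.primitives;

// Applies fun to every element of the wrapped input range
struct Map(alias fun, R) if (isInputRange!R) {
	R input;

	@property auto front() {
		return fun(this.input.front);
	}

	@property bool empty() {
		return this.input.empty;
	}

	void popFront() {
		this.input.popFront();
	}
}

// Helper so the range type is inferred and UFCS chains read left to right
auto transform(alias fun, R)(R input) if (isInputRange!R) {
	return Map!(fun, R)(input);
}

int timesTwo(int a) {
	return a * 2;
}

unittest {
	assert(equal([1, 2, 3].transform!timesTwo, [2, 4, 6]));
}
```

Unlike Filter, this range does no work on construction or in popFront; the function is applied only when front is read.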

Of course, there are different kinds of ranges, like a bidirectional range. Guess what that allows you to do. A small tip: a bidirectional range has two new primitives called back and popBack. There are other range types as well, but after you understand the input range demonstrated twice above, you pretty much know them all.
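To make the tip concrete, here is a sketch of an Iota variant that can also be consumed from the end. BiIota is a hypothetical name. Besides back and popBack, Phobos expects a bidirectional range to be a forward range, so the sketch also adds save.

```d
import std.algorithm : equal;
import std.range : retro;
import std.range.primitives : isBidirectionalRange;

struct BiIota {
	int front;
	int end;

	@property bool empty() const {
		return this.front == this.end;
	}

	void popFront() {
		++this.front;
	}

	// Forward-range primitive: an independent copy of the current state
	@property BiIota save() {
		return this;
	}

	// Bidirectional primitives: look at and drop the last element
	@property int back() const {
		return this.end - 1;
	}

	void popBack() {
		--this.end;
	}
}

static assert(isBidirectionalRange!BiIota);

unittest {
	// retro walks a bidirectional range from back to front
	assert(equal(BiIota(0, 5).retro, [4, 3, 2, 1, 0]));
}
```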

P.S. Just to be clear, do not implement your own filter, map, or fold; the D standard library Phobos has everything you will ever need. Have a look at std.algorithm and std.range.
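A short sketch of the same ideas using only Phobos; sumOfEvenSquares is a made-up function name, while iota, filter, map and fold are the standard library versions of the ranges discussed in this post.

```d
import std.algorithm : filter, fold, map;
import std.range : iota;

// Sums the squares of all even numbers in [0, limit)
int sumOfEvenSquares(int limit) {
	return iota(0, limit)
		.filter!(a => a % 2 == 0)
		.map!(a => a * a)
		.fold!((sum, a) => sum + a)(0);
}
```

For a limit of 10 the even numbers are 0, 2, 4, 6 and 8, so the result is 0 + 4 + 16 + 36 + 64 = 120.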

About the Author

Robert Schadek received a master’s degree in Computer Science at the University of Oldenburg. His master’s thesis was titled “DMCD A Distributed Multithreading Caching D Compiler”, in which he worked on building a D compiler from scratch. He was a computer science PhD student from 2012–2018 at the University of Oldenburg. His PhD research focused on quorum systems in combination with graphs. Since 2018 he has been happily using D in his day job working for Symmetry Investments.

What is Symmetry Investments?

Symmetry Investments is a global investment company with offices in Hong Kong, Singapore and London. We have been in business since 2014 after successfully spinning off from a major New York-based hedge fund.

At Symmetry, we seek to engage in intelligent risk-taking to create value for our clients, partners and employees. We derive our edge from our capacity to generate Win-Wins – in the broadest sense. Win-Win is our fundamental ethical and strategic principle. By generating Win-Wins, we can create unique solutions that reconcile perspectives that are usually seen as incompatible or opposites, and encompass the best that each side has to offer. We integrate fixed-income arbitrage with global macro strategies in a novel way. We invent and develop technology that focuses on the potential of human-machine integration. We build systems where machines do what they do best, supporting people to do what people do best. We are creating a collaborative meritocracy: a culture where individual contribution serves both personal and collective goals – and is rewarded accordingly. We value both ownership thinking AND cooperative team spirit, self-realisation AND community.

People at Symmetry Investments have been active participants in the D community since 2014. We have sponsored the development of excel-d, dpp, autowrap, libmir, and various other projects. We started Symmetry Autumn of Code in 2018 and hosted DConf 2019 in London.

Fuzzing Your D Application with LDC and AFL

Fuzzing, or fuzz testing, is a powerful method to find hidden bugs in your application. The basic idea is to present random input to your application and monitor how it behaves. If it crashes or shows some other unusual behavior then you have found a bug.

The use of true random input is not very effective, as most applications reject such input. Therefore many fuzz testing tools mutate valid input, e.g. flipping one or two bits, and present this mutated input to the application. This approach is easy to automate. A fuzz test can run for hours or days until an input is found which crashes your application.

Fuzz testing is very popular. A lot of security bugs have been found with this method. So it’s better to fuzz test your application by yourself instead of waiting for your users to report serious bugs!

Johan Engelen showed at DConf 2018 and in more detail in a blog post how you can use LLVM libFuzzer to fuzz test your application. For libFuzzer, you need to write a test driver. This is powerful because you can make decisions about the function to test. The downside is that you have to code the test driver.

AFL (short for American Fuzzy Lop, a rabbit breed) is another tool to fuzz test an application. AFL has a different approach than libFuzzer and does not require coding. The application under test has to read its data from stdin or from a file. The binary must be instrumented, which requires a recompile of the application. In case you have no source code for the application you can use AFL together with QEMU. No instrumentation is required but the tests run much slower.

Because random input is not a good choice, you give AFL one or more valid input files, preferably of a small size. AFL mutates this input file, e.g. by flipping a single bit. This new input file is presented to your application and the reaction to it is observed. With the instrumentation in place, AFL discovers the path the data takes through your application. The relationship between bit flips and different code paths that run because of the bit flips is recorded and used to discover new paths and to trigger unexpected behavior. Input which causes crashes is saved in a directory. The main UI gives a lot of information, including how many unique crashes occurred in the test session.

AFL works best if the input is a small binary, e.g. a PNG or a ZIP file. If your application has a more verbose and structured input (e.g. a programming language) then you can provide a dictionary which helps AFL with the basic syntax.

The latest release of AFL has an interesting feature. For instrumenting code compiled with clang, a small LLVM plugin is used. This plugin can also be used with LDC, making it possible to fuzz test your D application!

I used AFL to fuzz test LLtool, my recursive-descent parser generator presented at DConf 2019. LLtool expects a grammar description as a file or on stdin. If no error is found, then a D fragment of a recursive-descent parser is produced. Here, I show my approach.

First of all, you need to install AFL. It is included in most Linux distributions, e.g. Ubuntu. A FreeBSD port is also available. One caveat here: please make sure that the AFL plugin is compiled with the same LLVM version as LDC. Otherwise you will see an error message like

ld-elf.so.1: /usr/local/lib/afl/afl-llvm-pass.so: Undefined symbol "...."

during compilation. In this case, download AFL from the link above and compile it yourself.

Different distributions install AFL in different locations. You need to find out the path. E.g. Ubuntu uses /usr/lib/afl, FreeBSD uses /usr/local/lib/afl. I use an environment variable to record this value for later use (bash syntax):

export AFL_PATH=/usr/lib/afl

To instrument your code you have to specify the AFL plugin on the LDC command line:

ldc2 -plugin=$AFL_PATH/afl-llvm-pass.so *.d

You will see a short statistic emitted by the new pass:

afl-llvm-pass 2.52b by <lszekeres@google.com>
[+] Instrumented 16118 locations (non-hardened mode, ratio 100%).

For LLVM instrumentation, AFL requires a small runtime library. You need to link the object file $AFL_PATH/afl-llvm-rt.o into your application.

In my dub.sdl file I created a special build type for AFL. This puts all the steps above into a single place. Plus, you can copy and paste this build type directly to your own dub.sdl file because the only dependencies are AFL and LDC!

buildType "afl" {
    toolchainRequirements dmd="no" gdc="no" ldc=">=1.0.0"
    dflags "-plugin=$AFL_PATH/afl-llvm-pass.so"
    sourceFiles "$AFL_PATH/afl-llvm-rt.o"
    versions "AFL"
    buildOptions "debugMode" "debugInfo" "unittests"
}

Now you can type dub build -b=afl on the command line to instrument your application for use with AFL. Do not forget to set the AFL_PATH environment variable, otherwise dub will complain.

Now create two new directories called testcases and findings. Put a small, valid input file into the testcases directory. For example save this

%token number
%%
expr: term "+" term;
term: factor "*" factor;
factor: number;

as file t1.g in the testcases folder. Inputs which crash the application will be saved in the findings directory.

To call AFL, you type on the command line:

afl-fuzz -i testcases -o findings ./LLtool --DRT-trapExceptions=0 @@

Two parts of the command line require further explanation. If the application requires a file for input, you specify the file path as @@. Otherwise AFL assumes that the application reads the input from stdin.

If the application crashes, then AFL saves the input causing the crash in the findings/crashes directory. But the D runtime is very friendly. Exceptions uncaught by the application are caught by the D runtime, a stack trace is printed, and the application terminates. This does not count as a crash for AFL. To produce a crash you have to specify the D runtime option --DRT-trapExceptions=0. For more information, read the relevant edition of This week in D.
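As an alternative to the runtime switch, the application itself can turn uncaught errors into a hard crash when it is built for fuzzing. The following is only a sketch of that idea and not part of LLtool: runFuzzTarget is a hypothetical helper, and it relies on the AFL version identifier being set by the build (as the buildType above does with versions "AFL").

```d
import core.stdc.stdlib : abort;

// Hypothetical helper: runs the code under test. In AFL builds any
// Throwable that escapes is converted into abort(), which raises
// SIGABRT and is therefore counted as a crash by AFL.
void runFuzzTarget(scope void delegate() work) {
    version (AFL) {
        try {
            work();
        } catch (Throwable t) {
            abort();
        }
    } else {
        // Regular builds keep the normal D exception behavior
        work();
    }
}
```

With this wrapper around the body of main, the --DRT-trapExceptions=0 option would no longer be needed, at the cost of losing the stack trace in AFL builds.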

It is worth reading the AFL documentation because it provides a lot of tips and background information. Enjoy watching AFL crashing your application and producing test cases for you!


A long-time contributor to the D community, Kai Nacke is the author of ‘D Web Development’ and a maintainer of LDC, the LLVM D Compiler.

Containerize Your D Server Application

A container consists of an application packed together with all of its required dependencies. The container is run as an isolated process on Linux or Windows. The Docker tool has made the handling of containers very popular and is now the de-facto standard for deploying containers to a cloud environment. In this blog post I discuss how to create a simple vibe.d application and ship it as a Docker container.

Setting up the build environment

I use Ubuntu 18.10 as my development environment for this application. Additionally, I installed the packages ldc (the LLVM-based D compiler), dub (the D package manager and build tool), gcc, zlib1g-dev and libssl-dev (all required for compiling my vibe.d application). To build and run my container I use Docker CE. I installed it following the instructions at https://docs.docker.com/install/linux/docker-ce/ubuntu/. As the last step, I added my user to the docker group (sudo adduser kai docker).

A sample REST application

My vibe.d application is a very simple REST server. You can call the /hello endpoint (with an optional name parameter) and you get back a friendly message in JSON format. The second endpoint, /healthz, is intended as a health check and simply returns the string "OK". You can clone my source repository at https://github.com/redstar/vibed-docker/ to get the source code. Here is the application:

import vibe.d;
import std.conv : to;
import std.process : environment;
import std.typecons : Nullable;

shared static this()
{
    logInfo("Environment dump");
    auto env = environment.toAA;
    foreach(k, v; env)
        logInfo("%s = %s", k, v);

    auto host = environment.get("HELLO_HOST", "0.0.0.0");
    auto port = to!ushort(environment.get("HELLO_PORT", "17890"));

    auto router = new URLRouter;
    router.registerRestInterface(new HelloImpl());

    auto settings = new HTTPServerSettings;
    settings.port = port;
    settings.bindAddresses = [host];

    listenHTTP(settings, router);

    logInfo("Please open http://%s:%d/hello in your browser.", host, port);
}

interface Hello
{
    @method(HTTPMethod.GET)
    @path("hello")
    @queryParam("name", "name")
    Msg hello(Nullable!string name);

    @method(HTTPMethod.GET)
    @path("healthz")
    string healthz();
}

class HelloImpl : Hello
{
    Msg hello(Nullable!string name) @safe
    {
        logInfo("hello called");
        return Msg(format("Hello %s", name.isNull ? "visitor" : name.get));
    }

    string healthz() @safe
    {
        logInfo("healthz called");
        return "OK";
    }
}

struct Msg
{
    string msg;
}

And this is the dub.sdl file to compile the application:

name "hellorest"
description "A minimal REST server."
authors "Kai Nacke"
copyright "Copyright © 2018, Kai Nacke"
license "BSD 2-clause"
dependency "vibe-d" version="~>0.8.4"
dependency "vibe-d:tls" version="*"
subConfiguration "vibe-d:tls" "openssl-1.1"
versions "VibeDefaultMain"

Compile and run the application with dub. Then open the URL http://127.0.0.1:17890/hello to check that you get a JSON result.

A cloud-native application should follow the twelve-factor app methodology. You can read about the twelve-factor app at https://12factor.net/. In this post I only highlight two of the factors: III. Config and XI. Logs.

Ideally, you build an application only once and then deploy it into different environments, e.g. first to your quality testing environment and then to production. When you ship your application as a container, it comes with all of its required dependencies. This solves the problem that different versions of a library might be installed in different environments, possibly causing hard-to-find errors. You still need to find a solution for how to deal with different configuration settings. Port numbers, passwords or the location of databases are all configuration settings which typically differ from environment to environment. The factor III. Config recommends that the configuration be stored in environment variables. This has the advantage that you can change the configuration without touching a single file. My application follows this recommendation. It uses the environment variable HELLO_HOST for the configuration of the host IP and the variable HELLO_PORT for the port number. For easy testing, the application uses the default values 0.0.0.0 and 17890 in case the variables do not exist. (To be sure that every configuration is complete, it would be safer to stop the application with an error message in case an environment variable is not found.)
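The stricter variant mentioned in the parenthesis could look roughly like this. It is a sketch, not code from the sample application: requireEnv and requirePort are hypothetical helpers built on std.process.environment, whose get function returns null when a variable is not set.

```d
import std.conv : to;
import std.exception : enforce;
import std.process : environment;

// Hypothetical helper: returns the value of a required variable or
// throws with a message naming the missing variable
string requireEnv(string name) {
    auto value = environment.get(name); // null when not set
    enforce(value !is null, "Missing environment variable " ~ name);
    return value;
}

// Hypothetical helper: reads a required port number;
// std.conv.to throws a ConvException on malformed input
ushort requirePort(string name) {
    return requireEnv(name).to!ushort;
}
```

In the sample application, the two environment.get calls with default values would be replaced by requireEnv("HELLO_HOST") and requirePort("HELLO_PORT"), so that an incomplete configuration stops the server at startup.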

The application writes log entries on startup and when a URL endpoint is called. The log is written to stdout. This is exactly the point of factor XI. Logs: an application should not bother to handle logs at all. Instead, it should treat logs as an event stream and write everything to stdout. The cloud environment is then responsible for collecting, storing and analyzing the logs.

Building the container

A Docker container is specified with a Dockerfile. Here is the Dockerfile for the application:

FROM ubuntu:cosmic

RUN \
  apt-get update && \
  apt-get install -y libphobos2-ldc-shared81 zlib1g libssl1.1 && \
  rm -rf /var/lib/apt/lists/*

COPY hellorest /

USER nobody

ENTRYPOINT ["/hellorest"]

A Docker container is a stack of read-only layers. With the first line, FROM ubuntu:cosmic, I specify that I want to use this specific Ubuntu version as the base layer of my container. During the first build, this layer is downloaded from Docker Hub. Every other line in the Dockerfile creates a new layer. The RUN line is executed at build time. I use it to install dependent libraries which are needed for the application. The COPY command copies the executable into the root directory inside the container. USER nobody makes the application run as an unprivileged user instead of root. And last, ENTRYPOINT specifies the command which the container will run.

Run the Docker command

docker build -t vibed-docker/hello:v1 .

to build the Docker container. After the container is built successfully, you can run it with

docker run -p 17890:17890 vibed-docker/hello:v1

Now open the URL http://127.0.0.1:17890/hello again. You should get the same result as before. Congratulations! Your vibe.d application is now running in a container!

Using a multi-stage build for the container

The binary hellorest was compiled outside the container. This creates difficulties as soon as dependencies in your development environment change. It is easy to integrate compilation into the Dockerfile, but this creates another issue. The requirements for compiling and running the application are different, e.g. the compiler is not required to run the application.

The solution is to use a multi-stage build. In the first stage, the application is built. The second stage contains only the runtime dependencies and the application binary built in the first stage. This is possible because Docker allows the copying of files between stages. Here is the multi-stage Dockerfile:

FROM ubuntu:cosmic AS build

RUN \
  apt-get update && \
  apt-get install -y ldc gcc dub zlib1g-dev libssl-dev && \
  rm -rf /var/lib/apt/lists/*

COPY . /tmp

WORKDIR /tmp

RUN dub -v build

FROM ubuntu:cosmic

RUN \
  apt-get update && \
  apt-get install -y libphobos2-ldc-shared81 zlib1g libssl1.1 && \
  rm -rf /var/lib/apt/lists/*

COPY --from=build /tmp/hellorest /

USER nobody

ENTRYPOINT ["/hellorest"]

In my repository I called this file Dockerfile.multi. Therefore, you have to specify the file on the command line:

docker build -f Dockerfile.multi -t vibed-docker/hello:v1 .

Building the container now requires much more time because a clean build of the application is included. The advantage is that your build environment is now independent of your host environment.

Where to go from here?

Using containers is fun. But the fun diminishes as soon as the containers get larger. Using Ubuntu as the base image is comfortable but not the best solution. To reduce the size of your container you may want to try Alpine Linux as the base image, or use no base image at all.

If your application is split over several containers then you can use Docker Compose to manage your containers. For real container orchestration in the cloud you will want to learn about Kubernetes.


A long-time contributor to the D community, Kai Nacke is the author of ‘D Web Development’ and a maintainer of LDC, the LLVM D Compiler.

Writing a D Wrapper for a C Library

In porting to D a program I created for a research project, I wrote a D wrapper of a C library in an object-oriented manner. I want to share my experience with other programmers. This article provides some D tips and tricks for writers of D wrappers around C libraries.

I initially started my research project using the Ada 2012 programming language (see my article “Experiences on Writing Ada Bindings for a C Library” in Ada User Journal, Volume 39, Number 1, March 2018). Due to a number of bugs that I was unable to overcome, I started looking for another programming language. After some unsatisfying experiments with Java and Python, I settled on the D programming language.

The C Library

We have a C library, written in an object-oriented style (C structure pointers serve as objects, and C functions taking such structure pointers serve as methods). Fortunately for us, there is no inheritance in that C library.

The particular libraries we will deal with are the Redland RDF Libraries, a set of libraries which parse Resource Description Framework (RDF) files or other RDF resources, manage them, enable RDF queries, etc. Don’t worry if you don’t know what RDF is; it is not really relevant for this article.

The first stage of this project was to write a D wrapper over librdf. I modeled it on the Ada wrapper I had already written. One advantage I found in D over Ada is that template instantiation is easier—there’s no need in D to instantiate every single template invocation with a separate declaration. I expect this to substantially simplify the code of XML Boiler, my program which uses this library.

I wrote both raw bindings and a wrapper. The bindings translate the C declarations directly into D, and the wrapper is a new API which is a full-fledged D interface. For example, it uses D types with constructors and destructors to represent objects. It also uses some other D features which are not available in C. This is a work in progress and your comments are welcome.

The source code of my library (forked from Dave Beckett’s original multi-language bindings of his libraries) is available at GitHub (currently only in the dlang branch). Initially, I tried some automatic parsers of C headers which generate D code. I found these unsatisfactory, so I wrote the necessary bindings myself.

Package structure

I put my entire API into the rdf.* package hierarchy. I also have the rdf.auxiliary package and its subpackages for things used by or with my bindings. I will discuss some particular rdf.auxiliary.* packages below.

My mixins

In Ada I used tagged types, which are a rough equivalent of D classes, and derived _With_Finalization types from _Without_Finalization types (see below). However, tagged types increase variable sizes and execution time.

In D I use structs instead of classes, mainly for efficiency reasons. D structs do not support inheritance, and therefore have no virtual method table (vtable), but do provide constructors and destructors, making classes unnecessary for my use case (however, see below). To simulate inheritance, I use template mixins (defined in the rdf.auxiliary.handled_record module) and the alias this construct.

As I’ve said above, C objects are pointers to structures. All C pointers to structures have the same format and alignment (ISO/IEC 9899:2011 section 6.2.5 paragraph 28). This allows the representation of any pointer to a C structure as a pointer to an opaque struct (in the below example, URIHandle is an opaque struct declared as struct URIHandle;).

Using the mixins shown below, we can declare the public structs of our API this way (you should look into the actual source for real examples):

struct URIWithoutFinalize {
    mixin WithoutFinalize!(URIHandle,
                           URIWithoutFinalize,
                           URI,
                           raptor_uri_copy);
    // …
}
struct URI {
    mixin WithFinalize!(URIHandle,
                        URIWithoutFinalize,
                        URI,
                        raptor_free_uri);
}

The difference between the WithoutFinalize and WithFinalize mixins is explained below.

About finalization and related stuff

The main challenge in writing object-oriented bindings for a C library is finalization.

In the C library in consideration (as well as in many other C libraries), every object is represented as a pointer to a dynamically allocated C structure. The corresponding D object can be a struct holding the pointer (aka handle), but oftentimes a C function returns a so-called “shared handle”—a pointer to a C struct which we should not free because it is a part of a larger C object and shall be freed by the C library only when that larger C object goes away.

As such, I first define both (for example) URIWithoutFinalize and URI. Only URI has a destructor. For URIWithoutFinalize, a shared handle is not finalized. As D does not support inheritance for structs, I do it with template mixins instead. Below is a partial listing. See the above URI example on how to use them:

mixin template WithoutFinalize(alias Dummy,
                               alias _WithoutFinalize,
                               alias _WithFinalize,
                               alias copier = null)
{
    private Dummy* ptr;
    private this(Dummy* ptr) {
        this.ptr = ptr;
    }
    @property Dummy* handle() const {
        return cast(Dummy*)ptr;
    }
    static _WithoutFinalize fromHandle(const Dummy* ptr) {
        return _WithoutFinalize(cast(Dummy*)ptr);
    }
    static if(isCallable!copier) {
        _WithFinalize dup() {
            return _WithFinalize(copier(ptr));
        }
    }
    // ...
}


mixin template WithFinalize(alias Dummy,
                            alias _WithoutFinalize,
                            alias _WithFinalize,
                            alias destructor,
                            alias constructor = null)
{
    private Dummy* ptr;
    @disable this();
    @disable this(this);
    // Use fromHandle() instead
    private this(Dummy* ptr) {
        this.ptr = ptr;
    }
    ~this() {
        destructor(ptr);
    }
    /*private*/ @property _WithoutFinalize base() { // private does not work in v2.081.2
        return _WithoutFinalize(ptr);
    }
    alias base this;
    @property Dummy* handle() const {
        return cast(Dummy*)ptr;
    }
    static _WithFinalize fromHandle(const Dummy* ptr) {
        return _WithFinalize(cast(Dummy*)ptr);
    }
    // ...
}

I’ve used template alias parameters here, which allow a template to be parameterized with more than just types. The Dummy argument is the type of the handle instance (usually an opaque struct). The destructor and copier arguments are self-explanatory. For the usage of the constructor argument, see the real source (here it is omitted).

The _WithoutFinalize and _WithFinalize template arguments should specify the structs we define, allowing them to reference each other. Note that the alias this construct makes _WithoutFinalize essentially a base of _WithFinalize, allowing us to use all methods and properties of _WithoutFinalize in _WithFinalize.
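To see the alias this relationship in isolation, here is a minimal sketch that does not touch librdf at all. The names View, Owner, readThrough, and destroyedCount are hypothetical, and an int* stands in for the opaque C handle; the real mixins above are more general.

```d
// Hypothetical stand-ins: an int* plays the role of the opaque C handle.
struct View {                        // like _WithoutFinalize: no destructor
    private int* ptr;
    int value() const { return *ptr; }
}

int destroyedCount;                  // counts destructor runs, for illustration only

struct Owner {                       // like _WithFinalize: owns the handle
    private int* ptr;
    @disable this(this);             // no copies, so the handle is finalized once
    this(int* p) { ptr = p; }
    ~this() {
        // A real wrapper would call the C library's destructor here.
        if (ptr !is null) ++destroyedCount;
    }
    @property View base() { return View(ptr); }
    alias base this;                 // Owner is usable wherever a View is expected
}

int readThrough(View v) { return v.value; }

int demo() {
    int storage = 42;
    auto o = Owner(&storage);
    // Both uses reach View through alias this.
    return o.value + readThrough(o);
}

void main() {
    assert(demo() == 84);
    assert(destroyedCount == 1);     // the Owner inside demo() was finalized
}
```

Note that the View returned by base carries the same pointer but no ownership, which is exactly why it can dangle once the Owner is gone.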

Also note that instances of the _WithoutFinalize type may become invalid, i.e. they may contain dangling pointers. It seems that there is no easy way to deal with this problem because of the way the C library works. We may not know when an object is destroyed by the C library. Or we may know but be unable to appropriately “explain” it to the D compiler. Just be careful when using this library not to use objects which have already been destroyed.

Dealing with callbacks

To deal with C callbacks (particularly when accepting a void* argument for additional data) in an object-oriented way, we need a way to convert between C void pointers and D class objects (we pass D objects as C “user data” pointers). D structs are enough (and are very efficient) to represent C objects like librdf library objects, but for conveniently working with callbacks, classes are more useful because they provide good callback machinery in the form of virtual functions.

First, the D object, which is passed as a callback parameter to C, should not unexpectedly be moved in memory by the D garbage collector. So I make them descendants of this class:

import core.memory : GC;

class UnmovableObject {
    this() {
        // Ask the GC never to relocate the memory block holding this object
        GC.setAttr(cast(void*)this, GC.BlkAttr.NO_MOVE);
    }
}

Moreover, I add the property context() to pass it as a void* pointer to C functions which register callbacks:

abstract class UserObject : UnmovableObject {
    final @property void* context() const { return cast(void*)this; }
}

When we create a callback we need to pass a D object as a C pointer and an extern(C) function defined by us as the callback. The callback receives the pointer previously passed by us and in the callback code we should (if we want to stay object-oriented) convert this pointer into a D object pointer.

What we need is a bijective (“back and forth”) mapping between D pointers and C void* pointers. This is trivial in D: just use the cast() operator.
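As a rough sketch of that round trip, the following self-contained program casts a class object to void*, hands it to a callback, and casts it back. Everything here is hypothetical: Counter, trampoline, and fakeLibraryFire are made-up names, and fakeLibraryFire is plain D code standing in for a C library that would store the pointer and call back later.

```d
class Counter {
    int hits;
    void onEvent() { ++hits; }           // virtual, so subclasses may override it
}

extern(C) {
    // Function pointer type a C library might accept (hypothetical).
    alias Callback = void function(void* context) nothrow;

    // The trampoline casts the user-data pointer back to the D object.
    void trampoline(void* context) nothrow {
        try {
            (cast(Counter) context).onEvent();
        } catch (Exception) {
            // Never let a D exception unwind into C code.
        }
    }
}

// Stand-in for the C library: it simply calls back at once.
void fakeLibraryFire(Callback cb, void* context) { cb(context); }

int roundTrip() {
    auto c = new Counter;
    void* raw = cast(void*) c;           // D object -> void*
    fakeLibraryFire(&trampoline, raw);
    fakeLibraryFire(&trampoline, raw);
    assert(cast(Counter) raw is c);      // void* -> the very same object
    return c.hits;
}

void main() {
    assert(roundTrip() == 2);
}
```

The local variable c keeps the object alive here. When a real C library holds the only reference, the object must be kept reachable from the D side some other way.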

How to do this in practice? The best way to explain is with an example. We will consider how to create an I/O stream class which uses the C library callbacks to implement it. For example, when the user of our wrapper requests to write some information to a file, our class receives a write message. To handle this message, our implementation calls our virtual function doWriteBytes(), which actually handles the user’s request.

private immutable DispatcherType Dispatch =
    { version_: 2,
      init: null,
      finish: null,
      write_byte : &raptor_iostream_write_byte_impl,
      write_bytes: &raptor_iostream_write_bytes_impl,
      write_end  : &raptor_iostream_write_end_impl,
      read_bytes : &raptor_iostream_read_bytes_impl,
      read_eof   : &raptor_iostream_read_eof_impl };


class UserIOStream : UserObject {
    IOStream record;
    this(RaptorWorldWithoutFinalize world) {
        IOStreamHandle* handle = raptor_new_iostream_from_handler(world.handle,
                                                                  context,
                                                                  &Dispatch);
        record = IOStream.fromNonnullHandle(handle);
    }
    void doWriteByte(char byte_) {
        if(doWriteBytes(&byte_, 1, 1) != 1)
            throw new IOStreamException();
    }
    abstract int doWriteBytes(char* data, size_t size, size_t count);
    abstract void doWriteEnd();
    abstract size_t doReadBytes(char* data, size_t size, size_t count);
    abstract bool doReadEof();
}

And for example:

extern(C) int raptor_iostream_write_bytes_impl(void* context, const void* ptr, size_t size, size_t nmemb) {
    try {
        return (cast(UserIOStream)context).doWriteBytes(cast(char*)ptr, size, nmemb);
    }
    catch(Exception) {
        return -1;
    }
}

More little things

I “encode” C strings (which can be null) as a D template instance, Nullable!string. If the string is null, the holder is empty. However, it is often enough to transform an empty D string into a null C string (this can work only if we don’t differentiate between empty and null strings). See rdf.auxiliary.nullable_string for actually useful code.
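The idea can be sketched with nothing but Phobos. The helper names toCString and fromCString below are hypothetical and are not taken from rdf.auxiliary.nullable_string; they only illustrate the null-versus-empty distinction.

```d
import std.string : fromStringz, toStringz;
import std.typecons : Nullable;

// An empty Nullable becomes a null C string; anything else is zero-terminated.
const(char)* toCString(Nullable!string s) {
    return s.isNull ? null : s.get.toStringz;
}

// A null C string becomes an empty Nullable; anything else is copied into a D string.
Nullable!string fromCString(const(char)* p) {
    if (p is null)
        return Nullable!string.init;
    return Nullable!string(p.fromStringz.idup);
}

void main() {
    assert(toCString(Nullable!string.init) is null);
    assert(toCString(Nullable!string("")) !is null);   // empty, but not null
    assert(fromCString(null).isNull);
    assert(fromCString(toCString(Nullable!string("abc"))).get == "abc");
}
```

Keep in mind that toStringz may allocate from the GC heap, so the C side should not hold on to the pointer unless the D side keeps a reference to it.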

I would write a lot more advice on how to write D bindings for a C library, but you can just follow my source, which can serve as an example.

Static if

One thing which can be done in D but not in Ada is compile-time comparison via static if. This is a D construct (similar to but more advanced than C conditional preprocessor directives) which allows conditional compilation based on compile-time values. I use static if with my custom Version type to enable/disable features of my library depending on the available features of the version of the base C library in use. In the following example, rasqalVersionFeatures is a D constant defined in my rdf.config package, created by the GNU configure script from the config.d.in file.

static if(Version(rasqalVersionFeatures) >= Version("0.9.33")) {
    private extern extern(C)
    QueryResultsHandle* rasqal_new_query_results_from_string(RasqalWorldHandle* world,
                                                             QueryResultsType type,
                                                             URIHandle* base_uri,
                                                             const char* string,
                                                             size_t string_len);
    static create(RasqalWorldWithoutFinalize world,
                  QueryResultsType type,
                  URITypeWithoutFinalize baseURI,
                  string value)
    {
        return QueryResults.fromNonnullHandle(
            rasqal_new_query_results_from_string(world.handle,
                                                 type,
                                                 baseURI.handle,
                                                 value.ptr, value.length));
    }
}
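For readers who want to experiment without my library, here is a minimal sketch of a version type that can be compared at compile time. SimpleVersion, libraryVersion, newFeature, and hasNewFeature are hypothetical names, and the parser assumes a well-formed "major.minor.patch" string; my real Version type is not shown here.

```d
struct SimpleVersion {
    int[3] parts;

    // Parses "major.minor.patch"; simple enough to run during compilation (CTFE).
    this(string s) {
        size_t i = 0;
        foreach (char ch; s) {
            if (ch == '.') { ++i; continue; }
            parts[i] = parts[i] * 10 + (ch - '0');
        }
    }

    // Component-wise ordering, so 0.10.0 sorts after 0.9.33.
    int opCmp(const SimpleVersion other) const {
        foreach (k; 0 .. parts.length) {
            if (parts[k] != other.parts[k])
                return parts[k] < other.parts[k] ? -1 : 1;
        }
        return 0;
    }
}

enum libraryVersion = "0.9.33";      // assumed to come from a generated config module

static if (SimpleVersion(libraryVersion) >= SimpleVersion("0.9.33")) {
    // Only compiled when the underlying library is new enough.
    string newFeature() { return "available"; }
    enum hasNewFeature = true;
} else {
    enum hasNewFeature = false;
}

void main() {
    static assert(hasNewFeature);
    assert(newFeature() == "available");
    assert(SimpleVersion("0.10.0") > SimpleVersion("0.9.33"));
    assert(SimpleVersion("0.9.32") < SimpleVersion("0.9.33"));
}
```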

Comparisons

Order comparisons between structs can be easily done with this mixin:

mixin template CompareHandles(alias equal, alias compare) {
    import std.traits;
    bool opEquals(const ref typeof(this) s) const {
        static if(isCallable!equal) {
          return equal(handle, s.handle) != 0;
        } else {
          return compare(handle, s.handle) == 0;
        }
    }
    int opCmp(const ref typeof(this) s) const {
      return compare(handle, s.handle);
    }
}

Sadly, this mixin has to be instantiated in both the _WithoutFinalize and the _WithFinalize structs. I found no way to write it only once.

Conclusion

I’ve found that D is a great language for writing object-oriented wrappers around C libraries. There are some small annoyances like using class wrappers around structs for callbacks, but generally, D wraps around C well.


Victor Porton is an open source developer, a math researcher, and a Christian writer. He earns his living as a programmer.

Interfacing D with C: Arrays Part 1

This post is part of an ongoing series on working with both D and C in the same project. The previous post showed how to compile and link C and D objects. This post is the first in a miniseries focused on arrays.

When interacting with C APIs, it’s almost a given that arrays are going to pop up in one way or another (perhaps most often as strings, a subject of a future article in the “D and C” series). Although D arrays are implemented in a manner that is not directly compatible with C, the fundamental building blocks are the same. This makes compatibility between the two relatively painless as long as the differences are not forgotten. This article is the first of a few exploring those differences.

When using a C API from D, it’s sometimes necessary to translate existing code from C to D. A new D program can benefit from existing examples of using the C API, and anyone porting a program from C that uses the API would do well to keep the initial port as close to the original as possible. It’s on that basis that we’re starting off with a look at the declaration and initialization syntax in both languages and how to translate between them. Subsequent posts in this series will cover multidimensional arrays, the anatomy of a D array, passing D arrays to and receiving C arrays from C functions, and how the GC fits into the picture.

My original concept for covering this topic was much smaller in scope; my intent was to brush over the boring details and assume that readers would know enough of the basics of C to derive the why from the what and the how. That was before I gave a D tutorial presentation to a group among whom only one person had any experience with C. I’ve also become more aware that there are regular users of the D forums who have never touched a line of C. As such, I’ll be covering a lot more ground than I otherwise would have (hence a two-part article has morphed into at least three). I urge those for whom much of said ground is old hat not to get complacent in their skimming of the page! A comfortable experience with C is more apt than none at all to obscure some of the pitfalls I describe.

Array declarations

Let’s start with a simple declaration of a one-dimensional array:

int c0[3];

This declaration reserves enough memory to hold three int values, on the stack if c0 is a non-static local variable. The values are stored contiguously in memory, one right after the other. c0 may or may not be initialized, depending on where it’s declared. Global variables and static local variables are default initialized to 0, as the following C program demonstrates.

definit.c

#include <stdio.h>

// global (can also be declared static)
int c1[3];

int main(int argc, char** argv)
{
    static int c2[3];       // static local
    int c3[3];              // non-static local

    printf("one: %i %i %i\n", c1[0], c1[1], c1[2]);
    printf("two: %i %i %i\n", c2[0], c2[1], c2[2]);
    printf("three: %i %i %i\n", c3[0], c3[1], c3[2]);
}

For me, this prints:

one: 0 0 0
two: 0 0 0
three: -1 8 0

The values for c3 just happened to be lying around at that memory location. Now for the equivalent D declaration:

int[3] d0;


Here we can already find the first gotcha.

A general rule of thumb in D is that C code pasted into a D source file should either work as it does in C or fail to compile. For a long while, C array declaration syntax fell into the former category and was a legal alternative to the D syntax. It has since been deprecated and subsequently removed from the language, meaning int d0[3] will now cause the compiler to scold you:

Error: instead of C-style syntax, use D-style int[3] d0

It may seem an arbitrary restriction, but it really isn’t. At its core, it’s about consistency at a couple of different levels.

One is that we read declarations in D from right to left. In the declaration of d0, everything flows from right to left in the same order that we say it: “(d0) is an (array of three) (integers)”. The same is not true of the C-style declaration.

Another is that the type of d0 is actually int[3]. Consider the following pointer declarations:

int* p0, p1;

The type of both p0 and p1 is int* (in C, only p0 would be a pointer; p1 would simply be an int). It’s the same as all type declarations in D—type on the left, symbol on the right. Now consider this:

int d1[3], d2[3];
int[3] d4, d5;

Having two different syntaxes for array declarations, with one that splits the type like an infinitive, sets the stage for the production of inconsistent and potentially confusing code. By making the C-style syntax illegal, consistency is enforced. Code readability is a key component of maintainability.
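The claim about the type living entirely on the left is easy to check with the compiler itself. This is a small sketch; allPointers is a hypothetical helper added only so there is something to call at run time.

```d
int* p0, p1;        // in D, both are int*
int[3] d4, d5;      // both are int[3]

// These checks are evaluated at compile time.
static assert(is(typeof(p0) == int*));
static assert(is(typeof(p1) == int*));
static assert(is(typeof(d4) == int[3]));
static assert(is(typeof(d5) == int[3]));

bool allPointers() { return is(typeof(p0) == typeof(p1)); }

void main() {
    assert(allPointers());
    assert(d5.length == 3);
}
```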

Another difference between d0 and c0 is that the elements of d0 will be default initialized no matter where or how it’s declared. Module scope, local scope, static local… it doesn’t matter. Unless the compiler is told otherwise, variables in D are always default initialized to the predefined value specified by the init property of each type. Array elements are initialized to the init property of the element type. As it happens, int.init == 0. Translate definit.c to D and see it for yourself (open up run.dlang.io and give it a go).
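If you’d rather not do the translation yourself, one possible version looks like this. The function names localArray and staticLocalArray are my own for this sketch; the variables mirror c1, c2, and c3 from definit.c.

```d
import std.stdio : writefln;

int[3] d1;                       // module scope (thread-local by default in D)

int[3] staticLocalArray() {
    static int[3] d2;            // static local
    return d2;
}

int[3] localArray() {
    int[3] d3;                   // non-static local, still default initialized
    return d3;
}

void main() {
    writefln("one: %s", d1);
    writefln("two: %s", staticLocalArray());
    writefln("three: %s", localArray());

    // Unlike the C program, all three are guaranteed to hold zeros.
    assert(d1 == [0, 0, 0]);
    assert(staticLocalArray() == [0, 0, 0]);
    assert(localArray() == [0, 0, 0]);
}
```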

When translating C to D, this default initialization business is a subtle gotcha. Consider this innocently contrived C snippet:

// static variables are default initialized to 0 in C
static float vertex[3];
some_func_that_expects_inited_vert(vertex);

A direct translation straight to D will not produce the expected result, as float.init is float.nan, not 0.0f!

When translating between the two languages, always be aware of which C variables are not explicitly initialized, which are expected to be initialized, and the default initialization value for each of the basic types in D. Failure to account for the subtleties may well lead to debugging sessions of the hair-pulling variety.
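Here is a small sketch of the vertex case and one way to keep the C behavior in the translation. The function names are hypothetical; the point is only the difference between the two declarations.

```d
import std.math : isNaN;

// Direct translation: every element starts out as NaN.
float[3] defaultVertex() {
    float[3] vertex;
    return vertex;
}

// Explicit initializer: every element is set to 0, matching the C static array.
float[3] zeroedVertex() {
    float[3] vertex = 0.0f;
    return vertex;
}

bool allNaN(float[3] v) {
    foreach (x; v)
        if (!isNaN(x)) return false;
    return true;
}

void main() {
    assert(allNaN(defaultVertex()));
    assert(zeroedVertex() == [0.0f, 0.0f, 0.0f]);
}
```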

Default initialization can easily be disabled in D with = void in the declaration. This is particularly useful for arrays that are going to be loaded with values before they’re read, or that contain elements with an init value that isn’t very useful as anything other than a marker of uninitialized variables.

float[16] matrix = void;
setIdentity(matrix);

On a side note, the purpose of default initialization is not to provide a convenient default value, but to make uninitialized variables stand out (a fact you may come to appreciate in a future debugging session). A common mistake is to assume that types like float and char, with their “not a number” (float.nan) and invalid UTF-8 (0xFF) initializers, are the oddball outliers. Not so. Those values are great markers of uninitialized memory because they aren’t useful for much else. It’s the integer types (and bool) that break the pattern. For these types, the entire range of values has potential meaning, so there’s no single value that universally shouts “Hey! I’m uninitialized!”. As such, integer and bool variables are often left with their default initializer since 0 and false are frequently the values one would pick for explicit initialization for those types. Floating point and character values, however, should generally be explicitly initialized or assigned to as soon as possible.
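The character and integer initializers mentioned here can be confirmed directly. In this sketch, markersAsDescribed is a hypothetical helper that exists only to exercise default-initialized locals at run time.

```d
// The character types default to values that are invalid as text.
static assert(char.init == 0xFF);
static assert(wchar.init == 0xFFFF);
static assert(dchar.init == 0x0000FFFF);

// The integer types and bool have no spare "marker" value.
static assert(int.init == 0);
static assert(bool.init == false);

bool markersAsDescribed() {
    char c;      // no explicit initializer on any of these
    int i;
    bool b;
    return c == 0xFF && i == 0 && !b;
}

void main() {
    assert(markersAsDescribed());
}
```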

Explicit array initialization

C allows arrays to be explicitly initialized in different ways:

int ci0[3] = {0, 1, 2};  // [0, 1, 2]
int ci1[3] = {1};        // [1, 0, 0]
int ci2[]  = {0, 1, 2};  // [0, 1, 2]
int ci3[3] = {[2] = 2, [0] = 1}; // [1, 0, 2]
int ci4[]  = {[2] = 2, [0] = 1}; // [1, 0, 2]

What we can see here is:

  • elements are initialized sequentially with the constant values in the initializer list
  • if there are fewer values in the list than array elements, then all remaining elements are initialized to 0 (as seen in ci1)
  • if the array length is omitted from the declaration, the array takes the length of the initializer list (ci2)
  • designated initializers, as in ci3, allow specific elements to be initialized with [index] = value pairs, and indexes not in the list are initialized to 0
  • when the length is omitted from the declaration and a designated initializer is used, the array length is based on the highest index in the initializer and elements at all unlisted indexes are initialized to 0, as seen in ci4

Initializers aren’t supposed to be longer than the array (gcc gives a warning and initializes a three-element array to the first three initializers in the list, ignoring the rest).

Note that it’s possible to mix the designated and non-designated syntaxes in a single initializer:

// [0, 1, 0, 5, 0, 0, 0, 8, 44]
int ci5[] = {0, 1, [3] = 5, [7] = 8, 44};

Each value without a designation is applied in sequential order as normal. If there is a designated initializer immediately preceding it, then it becomes the value for the next index, and all other elements are initialized to 0. Here, 0 and 1 go to indexes ci5[0] and ci5[1] as normal, since they are the first two values in the list. Next comes a designator for ci5[3], so ci5[2] has no corresponding value in this list and is initialized to 0. Next comes the designator for ci5[7]. We have skipped ci5[4], ci5[5], and ci5[6], so they are all initialized to 0. Finally, 44 lacks a designator, but immediately follows [7], so it becomes the value for the element at ci5[8]. In the end, ci5 is initialized to a length of 9 elements.

Also note that designated array initializers were added to C in C99. Some C compiler versions either don’t support the syntax or require a special command line flag to enable it. As such, it’s probably not something you’ll encounter very much in the wild, but still useful to know about when you do.

Translating all of these to D opens the door to more gotchas. Thankfully, the first one is a compiler error and won’t cause any heisenbugs down the road:

int[3] wrong = {0, 1, 2};
int[3] right = [0, 1, 2];

Array initializers in D are array literals. The same syntax can be used to pass anonymous arrays to functions, as in writeln([0, 1, 2]). For the curious, the declaration of wrong produces the following compiler error:

Error: a struct is not a valid initializer for a int[3]

The {} syntax is used for struct initialization in D (not to be confused with struct literals, which can also be used to initialize a struct instance).

The next surprise comes in the translation of ci1.

// int ci1[3] = {1};
int[3] di1 = [1];

This actually produces a compiler error:

Error: mismatched array lengths, 3 and 1

What gives? First, take a look at the translation of ci2:

// int ci2[] = {0, 1, 2};
int[] di2 = [0, 1, 2];

In the C code, there is no difference between ci1 and ci2. They both are fixed-length, three-element arrays allocated on the stack. In D, this is one case where that general rule of thumb about pasting C code into D source modules breaks down.

Unlike C, D actually makes a distinction between arrays of types int[3] and int[]. The former is, like C, a fixed-length array, commonly referred to in D as a static array. The latter, unlike C, is a dynamic-length array, commonly referred to as a dynamic array or a slice. Its length can grow and shrink as needed.

Initializers for static arrays must have the same length as the array. D simply does not allow initializers shorter than the declared array length. Dynamic arrays take the length of their initializers. di2 is initialized with three elements, but more can be appended. Moreover, the initializer is not required for a dynamic array. In C, int foo[]; is illegal, as the length can only be omitted from the declaration when an initializer is present.

// gcc says "error: array size missing in 'illegalC'"
// int illegalC[]
int[] legalD;
legalD ~= 10;

legalD is an empty array, with no memory allocated for its elements. Elements can be added via the append operator, ~=.
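The difference between the two kinds of array shows up in a few lines. This is only a sketch; appendDemo and growDemo are hypothetical names.

```d
int[] appendDemo() {
    int[] legalD;                        // empty: no memory allocated yet
    assert(legalD.length == 0 && legalD.ptr is null);
    legalD ~= 10;                        // the first append allocates
    legalD ~= 20;
    return legalD;
}

int[] growDemo() {
    int[] di2 = [0, 1, 2];               // takes the length of its initializer
    di2 ~= 3;                            // and can still grow
    return di2;
}

void main() {
    int[3] fixed = [0, 1, 2];
    // The length of a static array is part of its type.
    static assert(is(typeof(fixed) == int[3]));
    static assert(typeof(fixed).length == 3);

    assert(growDemo() == [0, 1, 2, 3]);
    assert(appendDemo() == [10, 20]);
}
```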

Memory for dynamic arrays is allocated at the point of declaration only when an explicit initializer is provided, as with di2. If no initializer is present, memory is allocated when the first element is appended. By default, dynamic array memory is allocated from the GC heap (though the compiler may determine that it’s safe to allocate on the stack as an optimization) and space for more elements than needed is allocated in order to reduce the need for future allocations (the reserve function can be used to allocate a large block in one go, without initializing any elements). Appended elements go into the preallocated slots until none remain, then the next append triggers a new allocation. Steven Schveighoffer’s excellent array article goes into the details, and also describes array features we’ll touch on in the next part.
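A quick way to watch reserve at work is to check that a run of appends never moves the array. This is a sketch with a hypothetical function name, reserveDemo; the exact capacity returned depends on the runtime, so the code only relies on it being at least what was requested.

```d
bool reserveDemo() {
    int[] a;
    const cap = a.reserve(100);      // one allocation up front, no elements yet
    if (cap < 100 || a.length != 0)
        return false;

    a ~= 0;
    const first = a.ptr;             // where the reserved block starts
    foreach (i; 1 .. 100)
        a ~= i;                      // each append fits in the reserved block

    // Same address means no reallocation happened along the way.
    return a.ptr is first && a.length == 100;
}

void main() {
    assert(reserveDemo());
}
```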

Often, when translating a declaration like ci2 to D, the difference between the fixed-length, stack-allocated C array and the dynamic-length, GC-allocated D array isn’t going to matter one iota. One case where it does matter is when the D array is declared inside a function marked @nogc:

@nogc void main()
{
    int[] di2 = [0, 1, 2];
}


The compiler ain’t letting you get away with that:

Error: array literal in @nogc function D main may cause a GC allocation

The same error isn’t triggered when the array is static, since it’s allocated on the stack and the literal elements are just shoved right in there. C programmers coming to D for the first time tend to reach for @nogc almost as if it goes against their very nature not to, so this is something they will bump into until they eventually come to the realization that the GC is not the enemy of the people.
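For comparison, here is a sketch of what does compile under @nogc. The function names sumStatic and sumSlice are hypothetical; the second one shows that a slice is fine as long as it views memory that did not come from the GC.

```d
@nogc int sumStatic() {
    int[3] di = [0, 1, 2];       // static array: the elements live on the stack
    int total;
    foreach (x; di)
        total += x;
    return total;
}

@nogc int sumSlice() {
    int[3] storage = [3, 4, 5];
    int[] view = storage[];      // a slice over stack memory, no GC allocation
    int total;
    foreach (x; view)
        total += x;
    return total;
}

void main() {
    assert(sumStatic() == 3);
    assert(sumSlice() == 12);
}
```

Such a slice must not outlive the static array it refers to.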

To wrap this up, that big paragraph on designated array initializers in C is about to pull double duty. D also supports designated array initializers, just with a different syntax.

// [0, 1, 0, 5, 0, 0, 0, 8, 44]
// int ci5[] = {0, 1, [3] = 5, [7] = 8, 44};
int[] di5 = [0, 1, 3:5, 7:8, 44];
int[9] di6 = [0, 1, 3:5, 7:8, 44];


It works with both static and dynamic arrays, following the same rules and producing the same initialization values as in C.

The main takeaways from this section are:

  • there is a distinction in D between static and dynamic arrays; in C there is not
  • static arrays are allocated on the stack
  • dynamic arrays are allocated on the GC heap
  • uninitialized static arrays are default initialized to the init property of the array elements
  • dynamic arrays can be explicitly initialized and take the length of the initializer
  • dynamic arrays cannot be explicitly initialized in @nogc scopes
  • uninitialized dynamic arrays are empty

This is the time on the D Blog when we dance

There are a lot more words in the preceding sections than I had originally intended to write about array declarations and initialization, and I still have quite a bit more to say about arrays. In the next post, we’ll look at the anatomy of a D array and dig into the art of passing D arrays across the language divide.

Complicated Types: Prefer “alias this” Over “alias” For Easier-To-Read Error Messages

Nick Sabalausky is a long-time D user and contributor. He is the maintainer of mysql-native and Scriptlike. In this post, he presents a way to use a specific D language feature to improve error messages involving aliased types.


In the D programming language, alias is a common and handy feature that can be used to provide a simple name for a complex and verbose templated type.

As an example, consider the case of an algebraic type or tagged union:

// A type that can be either an int or a string
Algebraic!(int, string) someVariable;

That’s a fairly simple example. Much more complicated type names are common in D. This sort of thing can be a pain to repeat everywhere it’s used, and can make code difficult to read. alias is often used in situations like this to create a simpler shorthand name:

// A type that can be either an int or a string
alias MyType = Algebraic!(int, string);

// Ahh, much nicer!
MyType someVariable;

There’s one problem this still doesn’t solve. Anytime a compiler error message shows a type name, it shows the full original name, not the convenient easy-to-read alias. Instead of errors saying MyType, they’ll still say Algebraic!(int, string). This can be especially unfriendly if MyType is in the public API of a library and happens to be constructed using some private, internal-only template.

That can be fixed, and error messages forced to provide the customized name, by creating MyType as a separate type on its own, rather than an alias. But how? If this was C or C++, typedef would do the job nicely. There is a D equivalent, std.typecons.Typedef!T, which will create a separate type. But naming the type still involves alias, which just leads back to the same problem.

Luckily, D has another feature which can help simulate a C-style typedef: alias this. Used inside a struct (or class), alias this allows the struct to be implicitly converted to and behave just like any one of its members.

Incidentally, although alias and alias this are separate features of the language, they do have a shared history as their names suggest. Originally, alias was intended to be a variation on C’s typedef, one which would result in two names for the same type instead of two separate types. At the time, D had typedef as well, but it was eventually dropped as a language feature in favor of a standard library solution (the aforementioned std.typecons.Typedef template). As a variant of typedef, alias used the same syntax (alias TypeName AliasName;). Later, alias spawned the alias this feature, which was given a similar syntax: alias memberName this. When alias gained its modern syntax (alias AliasName = TypeName), a lengthy debate resulted in keeping the existing syntax for alias this.

Here is how alias this can be used to solve our problem:

// A type that can be either an int or a string
struct MyType {
    private Algebraic!(int, string) _data;
    alias _data this;
}

// Ahh, much nicer! And now error messages say "MyType"!
MyType someVariable;

There’s an important difference to be aware of, though. Before, when using alias, MyType and Algebraic!(int, string) were considered the same type. Now, they’re not. Is that a problem? What does that imply? Mainly, it means two things:

    1. Although this doesn’t affect any actual code, it can mean the compiler generates extra, duplicate template instantiations. If MyType is passed to one template, and somewhere else Algebraic!(int, string) is passed to the same template, the compiler will now generate two separate template instantiations instead of just one.
      In practice though, this shouldn’t be a problem unless you’re already in a genuine template-bloat situation and are trying to reduce template instantiations. Usually, this won’t be an issue.
    2. Although the alias this means MyType can still be implicitly converted to Algebraic!(int, string), the other way around no longer works. An Algebraic!(int, string) can no longer be implicitly converted to a MyType.
      Arguably, this can be considered a good thing if you believe, as I do, in using domain-specific types. But in any case, you can still manually convert the original type to your MyType with the basic built-in struct constructor:

      Algebraic!(int, string) algebVar;
      auto myVar = MyType(algebVar);

    So when you’re aliasing a large, complicated type name to a simpler name, consider using a struct and alias this instead, especially if it’s a type on offer in a library. There’s little downside, and it will greatly improve the readability of error messages for both yourself and your library’s users.
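The conversion rules described above can be verified with a few compile-time checks. This is a sketch: Raw and unwrap are hypothetical names introduced to keep the example short, and everything lives in one module so that the private member stays accessible.

```d
import std.variant : Algebraic;

alias Raw = Algebraic!(int, string);     // plain alias: just another name

struct MyType {
    private Raw _data;
    alias _data this;
}

static assert(!is(MyType == Raw));       // now a distinct type
static assert(is(MyType : Raw));         // MyType converts implicitly to Raw
static assert(!is(Raw : MyType));        // but not the other way around
static assert(MyType.stringof == "MyType");

int unwrap() {
    Raw algebVar = 42;
    auto myVar = MyType(algebVar);       // explicit conversion via the struct constructor
    return myVar.get!int;                // get is forwarded to _data by alias this
}

void main() {
    assert(unwrap() == 42);
}
```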

Interfacing D with C: Getting Started

One of the early design goals behind the D programming language was the ability to interface with C. To that end, it provides ABI compatibility, allows access to the C standard library, and makes use of the same object file formats and system linkers that C and C++ compilers use. Most built-in D types, even structs, are directly compatible with their C counterparts and can be passed freely to C functions, provided the functions have been declared in D with the appropriate linkage attribute. In many cases, one can copy a chunk of C code, paste it into a D module, and compile it with minimal adjustment. Conversely, appropriately declared D functions can be called from C.

That’s not to say that D carries with it all of C’s warts. It includes features intended to eliminate, or more easily avoid, some of the errors that are all too easy to make in C. For example, bounds checking of arrays is enabled by default, and a safe subset of the language provides compile-time enforcement of memory safety. D also changes or avoids some things that C got wrong, such as what Walter Bright sees as C’s biggest mistake: conflating pointers with arrays. It’s in these differences of implementation that surprises lurk for the uninformed.

This post is the first in a series exploring the interaction of D and C in an effort to inform the uninformed. I’ve previously written about the basics of this topic in an article at GameDev.net, and in my book, ‘Learning D’, where the entirety of Chapter 9 covers it in depth.

This blog series will focus on those aforementioned corner cases so that it’s not necessary to buy the book or to employ trial and error in order to learn them. As such, I’ll leave the basics to the GameDev.net article and recommend that anyone interfacing D with C (or C++) give it a read along with the official documentation.

The C and D code that I provide to highlight certain behavior is intended to be compiled and linked by the reader. The code demonstrates both error and success conditions. Recognizing and understanding compiler errors is just as important as knowing how to fix them, and seeing them in action can help toward that end. That implies some prerequisite knowledge of compiling and linking C and D source files. Happily, that’s the focus of the next section of this post.

For the C code, we’ll be using the Digital Mars C/C++ and Microsoft C/C++ compilers on Windows, and GCC and Clang elsewhere. On the D side, we’ll be working exclusively with the D reference compiler, DMD. Windows users unfamiliar with setting up DMD to work with the Microsoft tools will be well served by the post on this blog titled, ‘DMD, Windows, and C’.

We’ll finish the post with a look at one of the corner cases, one that is likely to rear its head early on in any exploration of interfacing D with C, particularly when creating bindings to existing C libraries.

Compiling and linking

The articles in this series will present example C source code that is intended to be saved and compiled into object files for linking with D programs. The command lines for generating the object files look pretty much the same on every platform, with a couple of caveats. We’ll look first at Windows, then lump all the other supported systems together in a single section.

In the next two sections, we’ll be working with the following C and D source files. Save them in the same directory (for convenience) and make sure to keep the names distinct. If both files have the same name in the same directory, then the object files created by the C compiler and DMD will also have the same name, causing the latter to overwrite the former. There are compiler switches to get around this, but for a tutorial we’re better off keeping the command lines simple.

chello.c

#include <stdio.h>
void say_hello(void) 
{
    puts("Hello from C!");
}

hello.d

extern(C) void say_hello();
void main() 
{
    say_hello();
}

The extern(C) bit in the declaration of the C function in the D code is a linkage attribute. That’s covered by the other material I referenced above, but it’s a potential gotcha we’ll look at later in this series.
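The same attribute works for functions you did not write. As a sketch that needs no separate C file, the following declares strlen from the C runtime that D programs already link against; lengthViaC is a hypothetical helper, and in real code you would normally import core.stdc.string rather than repeat the declaration by hand.

```d
import std.string : toStringz;

// Hand-written binding to the C standard library's strlen.
extern(C) size_t strlen(const(char)* s) nothrow @nogc;

size_t lengthViaC(string s) {
    // D strings are not guaranteed to be zero-terminated, so convert first.
    return strlen(s.toStringz);
}

void main() {
    assert(lengthViaC("Hello from D!") == 13);
    assert(lengthViaC("") == 0);
}
```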

Windows

The official DMD packages for Windows, available at dlang.org as a zip archive and an installer, are the only released versions of DMD that do not require any additional tooling to be installed as a prerequisite to compile D files. These packages ship with everything they need to compile 32-bit executables in the OMF format (again, I refer you to ‘DMD, Windows, and C’ for the details).

When linking any foreign object files with a D program, it’s important that the object file format and architecture match the D compiler output. The former is an issue primarily on Windows, while attention must be paid to the latter on all platforms.

Compiling C source to a format compatible with vanilla DMD on Windows requires the Digital Mars C/C++ compiler. It’s a free download and ships with some of the same tools as DMD. It outputs object files in the OMF format. With both it and DMD installed and on the system path, the above source files can be compiled, linked, and executed like so:

dmc -c chello.c
dmd hello.d chello.obj
hello

The -c option tells DMC to forego linking, causing it to only compile the C source and write out the object file chello.obj.

To get 64-bit output on Windows, DMC is not an option. In that case, DMD requires the Microsoft build tools. Again, see ‘DMD, Windows, and C’ on this blog for information on how to get them and how to open the preconfigured command prompt, which may have a slightly different name depending on the version of Visual Studio or the MS Build Tools installed. Once the tools are installed and set up, open the preconfigured x64 Native Tools Command Prompt from the Start menu and execute the following commands:

cl /c chello.c
dmd -m64 hello.d chello.obj
hello

Again, the /c option tells the compiler not to link. To produce 32-bit output with the MS compiler, open a preconfigured x86 Native Tools Command Prompt and execute these commands:

cl /c chello.c
dmd -m32mscoff hello.d chello.obj
hello

DMD recognizes the -m32 switch on Windows, but that tells it to produce 32-bit OMF output (the default), which is not compatible with Microsoft’s linker, so we must use -m32mscoff here instead.

Other platforms

On the other platforms D supports, the system C compiler is likely going to be GCC or Clang, one of which you will already have installed if you have a functioning dmd command. On macOS, clang can be installed via Xcode in the App Store. Most Linux and BSD systems have a GCC package available. On Debian and Debian-based systems, for example, the often recommended command line is apt-get install build-essential. Please see the documentation for your system for details.

On these systems, the environment variable CC is often set to the system compiler command. In the lines below, CC is a placeholder for that command, so substitute gcc or clang as appropriate for your system.

CC -c chello.c
dmd hello.d chello.o
./hello

This will produce either 32-bit or 64-bit output, depending on your system configuration. If you are on a 64-bit system and have 32-bit developer tools installed, you can pass -m32 to both CC and dmd to generate 32-bit binaries.
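
If you are unsure which of the two you ended up with, one quick check is to have the C side report its pointer width. This is only a minimal sketch. The file name bits.c and the function name pointer_bits are made up for illustration and are not part of the tutorial's source files.

```c
/* bits.c (hypothetical file name) */
#include <limits.h>  /* CHAR_BIT */

/* Returns the width of a data pointer in bits for this build:
   typically 32 for a 32-bit target and 64 for a 64-bit target. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

Compile it with the same compiler and the same -m32 setting as chello.c so that the answer reflects the build you are actually linking.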

The long way

Now that we’re configured to compile and link C and D source in the same binary, let’s take a look at a rather common gotcha. To fully appreciate this one, it helps to compile it on both Windows and another platform.

One of the features of D is that all of the integral types have a fixed size. A short is always 2 bytes and an int is always 4. This never changes, no matter the underlying system architecture. This is quite different from C, where the spec only imposes relative requirements on the size of each integral type and leaves the specifics to the implementation. Even so, there are wide areas of agreement across modern compilers such that on every platform D currently supports the sizes for almost all the integral types match those in D. The exceptions are long and ulong.

In D, long and ulong are always 8 bytes across all platforms. This never changes. It lines up with the corresponding C types just fine on most 64-bit systems under the version(Posix) umbrella, where the C long and unsigned long are also 8 bytes. However, the C types are 4 bytes on 32-bit architectures. Moreover, they’re always 4 bytes on Windows, even on a 64-bit architecture.
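
To see which case applies to your toolchain, you can ask the C compiler directly. A small sketch follows. The file and function names are hypothetical, and the constant 8 simply restates the size of D's long given above.

```c
/* longsize.c (hypothetical file name) */
#include <stddef.h>  /* size_t */

/* Size in bytes of the C long on this compiler and target. */
size_t c_long_size(void)
{
    return sizeof(long);
}

/* Nonzero when the C long has the same size as D's 8-byte long,
   i.e. when a D declaration using long would happen to line up. */
int c_long_matches_d_long(void)
{
    return sizeof(long) == 8;
}
```

Going by the sizes described above, the second function should return nonzero on most 64-bit Posix systems and zero on 32-bit targets and on Windows.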

Most C code these days will account for these differences either by using the preprocessor to define custom integral types or by making use of the C99 stdint.h where types such as int32_t and int64_t are unambiguously defined. Yet, it’s still possible to encounter C libraries using long in the wild.
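
As a sketch of the stdint.h approach, here is a C function whose return type has the same size wherever it compiles. The name max_val32 is hypothetical. On the D side, such a function could be declared with a uint return type, which is always 4 bytes.

```c
/* maxval32.c (hypothetical file name) */
#include <stdint.h>

/* uint32_t is exactly 32 bits wherever it is defined, so the size of the
   return value does not depend on the compiler or the architecture. */
uint32_t max_val32(void)
{
    return UINT32_MAX;
}
```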

Consider the following C function:

maxval.c

#include <limits.h>
unsigned long max_val(void)
{
    return ULONG_MAX;
}

A naive D program that binds to it and calls it looks like this:

showmax1.d

extern(C) ulong max_val();
void main()
{
    import std.stdio : writeln;
    writeln(max_val());
}

What this does depends on the C compiler and architecture. For example, on Windows with dmc I get 7316910580432895, with x86 cl I get 59663353508790271, and with x64 cl I get 4294967295. The last one is actually the correct value, even though the size of the unsigned long on the C side is still 4 bytes, as it is in the other two scenarios. I assume this is because the x64 ABI stores return values in the 8-byte RAX register, so it can be read into the 8-byte ulong on the D side with no corruption. The important point here is that the two values in the x86 code are garbage because the D side is expecting a 64-bit return value from 32-bit registers, so it’s reading more than it’s being given.
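
The arithmetic is consistent with that explanation. Splitting each of the two garbage values into 32-bit halves shows that the low half is 4294967295, the value the C function actually returned, while the high half is unrelated data. This assumes the usual 32-bit x86 convention of returning a 64-bit integer in a pair of 32-bit registers. The helper names below are hypothetical, and the input values are the ones reported above.

```c
/* halves.c (hypothetical file name) */
#include <stdint.h>

/* Low 32 bits of a 64-bit value. */
uint32_t low_half(uint64_t v)
{
    return (uint32_t)(v & 0xFFFFFFFFu);
}

/* High 32 bits of a 64-bit value. */
uint32_t high_half(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* For the two values reported above:
   7316910580432895  -> low half 4294967295, high half 1703600
   59663353508790271 -> low half 4294967295, high half 13891456 */
```

In both cases the C side delivered the correct 32 bits, and the D side added 32 more that were never part of the return value.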

Thankfully, DRuntime provides a way around this in core.stdc.config, where you’ll find c_long and c_ulong. Both of these are conditionally configured to match the compile-time C runtime implementation and architecture configuration. With this, all that’s needed is to change the declaration of max_val in the D module, like so:

showmax2.d

import core.stdc.config : c_ulong;
extern(C) c_ulong max_val();

void main()
{
    import std.stdio : writeln;
    writeln(max_val());
}

Compile and run with this and you’ll find it does the right thing everywhere. On Windows, it’s 4294967295 across the board.

Though less commonly encountered, core.stdc.config also declares a portable c_long_double type to match any long double that might pop up in a C library to which a D module must bind.
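
The C long double varies between implementations in much the same way as long, and the C compiler can report what it uses. A minimal sketch with hypothetical file and function names:

```c
/* ldsize.c (hypothetical file name) */
#include <float.h>   /* LDBL_MANT_DIG */
#include <stddef.h>  /* size_t */

/* Size in bytes of long double on this compiler and target. */
size_t c_long_double_size(void)
{
    return sizeof(long double);
}

/* Number of mantissa digits (in base FLT_RADIX) that long double carries. */
int c_long_double_mant_dig(void)
{
    return LDBL_MANT_DIG;
}
```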

Looking ahead

In this post, we’ve gotten set up to compile and link C and D in the same executable and have looked at the first of several potential problem spots. We used DMD here, but it should be possible to substitute one of the other D compilers (ldc or gdc) without changing the command line (with the exception of -m32mscoff, which is specific to DMD). The next post in this series will focus entirely on getting D arrays and C arrays to cooperate. See you there!