Big Performance Improvement for std.regex

Dmitry Olshansky has been a frequent contributor to the D programming language. Perhaps his best known work is his overhaul of the std.regex module, which he architected as part of Google Summer of Code 2011. In this post, he describes an algorithmic optimization he implemented this past summer that resulted in a big performance win.


Optimizing std.regex has been my favorite pastime, but it has gotten harder over the years. It eventually became clear that micro-optimizing the engine’s state copy routine, or trying to avoid that extra write, wasn’t going to cut it anymore. To move further, I needed a new algorithmic improvement. This is how the so-called “Bit-NFA” came to be implemented. Developed in May of this year, it has come a long way to land in the main repository.

Before going into details, a short overview of the engine is called for. A user-specified pattern given to the engine first goes through a compilation process, where it gets transformed into a bytecode program, along with a bunch of lookup tables and auxiliary data-structures. Bytecode implies a VM, not unlike, say, the Java VM, but far simpler and more specific. In fact, there are two of them in std.regex, one that evaluates execution of threads in a backtracking manner and another one which evaluates all threads in lock-step, resolving any duplicates along the way.

Now, running a full blown VM, even a tiny one, on each character of input doesn’t sound all that high-performance. That’s why there is an extra trick, a kickstart engine (it should probably be called a “sidekick engine”), which is a dumb approximation of the full engine. It is run over the input first. When it spots something that looks like a match, the full engine is run to check it. The only requirement is that it can have no false negatives. That is, it has to detect as positive all matches of the regex pattern. This kickstart engine is the central piece of today’s post.

Historically, I intended for there to be a lot of different kickstart engines, ranging from a simple ‘memchr the first byte of the pattern’ to a Boyer-Moore search on the prefix of a pattern. But during the gory days of GSOC 2011, a simple solution came first and overshadowed all others: the Shift Or algorithm.

Basically, it is an NFA (Nondeterministic Finite Automaton), where each state is a bit in a word. Shifting this word advances all the states. Masking removes the states that don’t match the current character. Importantly, shifting also places 0 as the first bit, indicating the active state.

With these two insights, the whole process of searching for a string becomes shifting + OR-masking the bits. The last step is checking for a successful match – whether one of the bits is in the finish state.
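
To make that concrete, here is a minimal, self-contained Shift Or substring search in D. It is only a sketch of the technique just described, not the actual std.regex code, and it assumes a pattern no longer than one machine word:

// Returns the index of the first match of needle in haystack, or size_t.max.
size_t shiftOrSearch(const(char)[] haystack, const(char)[] needle)
{
    assert(needle.length > 0 && needle.length <= size_t.sizeof * 8);

    // charMask[c] has a 0 bit at position i wherever needle[i] == c.
    size_t[256] charMask = size_t.max;
    foreach (i, c; needle)
        charMask[cast(ubyte) c] &= ~(size_t(1) << i);

    size_t state = size_t.max;                      // 0 bit = active state
    immutable finish = size_t(1) << (needle.length - 1);

    foreach (pos, c; haystack)
    {
        // The shift advances every state and activates state 0; the OR-mask
        // then kills the states that don't match the current character.
        state = (state << 1) | charMask[cast(ubyte) c];
        if ((state & finish) == 0)                  // finish state is active
            return pos + 1 - needle.length;
    }
    return size_t.max;
}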

Looking at this marvelous construction, it’s tempting to try and overcome its limitation – the straight-forward execution of states. So let’s introduce some control flow by denoting some bits as jumps. To carry out a jump, we just need to map every combination of jump bits to the mask of the resulting positions. A basic hashmap could serve us well in this regard. Then the cycle becomes:

  1. Shift the word
  2. Capture control flow bits
  3. Lookup control flow table
  4. Mask AND with control flow bits
  5. Check for finish state(s) bits
  6. Mask OR with match filter table

In the end, we execute the whole engine with nothing more than a hash-map lookup, a table lookup, and a bit of bitwise operations. This is the essence of what I call the Bit-NFA engine.

Of course, there are some tricky bits, such as properly mapping bytecodes to bits. Then there comes Unicode… oh gosh. The trick to Unicode, though, is having a fast path for ASCII < 0x80 and a separate path for the rest. For ASCII, we just go with a simple table. Unicode uses a two-staged variation of it. Two-staging the table lets us coalesce identical pages, saving space over the whole 21 bits of the code point range.
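
As a rough sketch of the two-stage idea (the names and layout below are invented for illustration, not the std.regex internals), the high bits of a code point select a page, and identical pages are stored only once:

struct TwoStageTable
{
    ubyte[] pageOf;   // code point >> 8  ->  page number; identical pages share a number
    size_t[] pages;   // page number * 256 + low 8 bits  ->  bit mask for that character

    size_t lookup(dchar ch) const
    {
        return pages[pageOf[ch >> 8] * 256 + (ch & 0xFF)];
    }
}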

Overall, the picture really is worth a thousand words. Here is how the new kickstart engine stacks up.

This now only leaves us to optimize the VMs further. The proven technique is JIT-ing the bytecode, which is what the top engines are doing. Still, I’m glad there are notable tricks that speed up regex execution in general without pulling out this heavy-handed weapon.

Project Highlight: libasync

libasync is a cross-platform event loop library written completely in D. It was created, and continues to be maintained, by Etienne Cimon, who started it as a native driver for vibe.d, a modular asynchronous I/O framework most often used for web app development in D.

In 2014 or so, I was looking for a framework to power my future web development projects. I wasn’t going to use an interpreted language, as binary executables were too attractive. I found vibe.d appealing because, coming from C++, it was relatively simple and featureful. So I studied it, along with the D programming language and the Phobos standard library.

vibe.d has always used libevent under the hood by default. This is where Etienne ran into a problem that bothered him.

I stumbled on some workflow issues when deploying vibe.d apps to other operating systems which may or may not have the right version of libevent in the package repository. I didn’t want to package a DLL with my server, or have to go through dependency hell with my software, and I wanted everything to be consistently written in D to reduce the mental complexity of switching programming languages or to debug other issues.

So he decided to study up on the system APIs across the platforms supported by DMD (Windows, Linux, *BSD and OS X) and create his own event loop library in D. Now he, and anyone using libasync, can issue a single command with DUB to compile and execute a web application without needing to worry about external event loop dependencies.

libasync takes advantage of D’s delegates to provide a very intuitive interface.

// AsyncDNS and NetworkAddress come from libasync; g_evl is the example's
// shared event loop and g_swDns a running StopWatch, both set up elsewhere.
import std.stdio : writeln;

void testDNS() {
	auto dns = new shared AsyncDNS(g_evl);
	dns.handler((NetworkAddress addr) {
		writeln("Resolved to: ", addr.toString(), ", it took: ", g_swDns.peek().usecs, " usecs");
	}).resolveHost("127.0.0.1");
}

Etienne says of the code snippet above:

The D garbage collector will keep the AsyncDNS object in dns alive for as long as the delegate passed to dns.handler is alive in the heap, which is in this object. The delegate syntax is simpler to declare than JavaScript’s, and it is also type-safe. This DNS resolver will work on any platform thrown at it, thanks to D’s compile-time version conditions.

libasync makes use of the asynchronous I/O facilities available on each supported platform and provides a number of event-handlers out of the box.

Cross-platform event handlers have been defined for DNS resolution, UDP Messages, (Buffered/Unbuffered) TCP Connections, TCP Listeners, File Operations, Thread-local (Notifiers) and Cross-thread Signals, Timers and File Watchers. The intrinsics involve EPoll for Linux, KQueue for OS X and BSD, and overlapped I/O for Windows. With all of these features thoroughly tested through a vibe.d driver, libasync has become a very fast and reliable library which I use in all of my projects. My benchmarks show it as being a little slower than the libevent driver in vibe.d, though its self-explanatory code base makes it seamless to understand, maintain, and deploy.

A libasync driver has been added to vibe.d and work is going on to improve the library’s performance.

The stability of the underlying OS features makes for very little need for changes, although there is a big improvement involving the proactor pattern in the works for libasync and a new architecture for vibe.d. Together, those two developments are likely to increase the library’s performance significantly.

If you find yourself needing an event loop in D and want to give libasync a spin, you can visit the library’s page at the DUB repository for information on how to add it as a dependency to your own DUB-managed projects. libasync, in turn, has only one dependency itself, another library maintained by Etienne that provides a set of allocators and allocator-friendly containers called memutils.

It wasn’t so long ago that anyone using D who wanted something like libasync or memutils would need to either roll their own or bind to a C library. The ever-expanding list of libraries in the DUB repository, created and made available by members of the D community like Etienne, makes it much easier to jump into D today than ever before.

GSoC Report: std.experimental.xml

Lodovico Giaretta is currently pursuing a Bachelor Degree in Computer Science at the University of Trento, Italy. He participated in Google Summer of Code 2016, working on a new XML module for D’s standard library, Phobos.


I started coding in high school with Pascal. I immediately fell in love with programming, so I started studying it by myself and learned both Java and C++. But when I was using Java, I was missing the powerful metaprogramming facilities and the low level features of C++. When I was using C++, I was missing the simplicity and usability of Java. So I started looking for a language that “filled the gap” between these two worlds. After looking into many languages, I finally found D. Despite being more geared towards C++, D provides a very high level of productivity, as correct code is easier to read and write. As an example, I was programming in D for several months before I was bitten by a segfault for the first time. It easily became one of my favorite languages.

The apparent lack of libraries, my lack of time, and the need to use other languages for university projects made me forget D for some time, at least until someone told me about Google Summer of Code. When I discovered that the D Foundation was participating, I immediately decided to take part and found that there was the need for a new XML library. So I contacted Craig Dillabaugh and Robert Schadek and started to plan my adventure. I want to take this occasion to thank them for their great continuous support, and the entire community for their feedback and help.

This was my first public codebase and my first contribution to a big open source project, so I didn’t really know anything about project management. The advice about this field from my mentor Robert has been fundamental for my success; he helped me improve my workflow, keep my efforts focused towards the goal, and set up correctness tests and performance benchmarks. Without his help, I would never have been able to reach this point.

The first thing to do when writing a library is to pick a set of principles that will guide development. This choice is what will give the library its peculiar shape, and by having a look around one finds that there are XML libraries that want to be minimal in terms of codebase size, or very small in terms of binary size, or fully featured and 101% adherent to the specification. For std.experimental.xml, I decided to focus on genericity and extensibility. The processing is divided into many small, quite simple stages with well-defined interfaces implemented by templated components. The result is a pipeline that is fully customizable; you can add or substitute components anywhere, and add custom validation steps and custom error handlers.

From an XML library, a programmer expects different high level constructs: a SAX parser, a DOM parser, a DOM writer and maybe some extensions like XPath. He also expects to be able to process different kinds of input and, for std.experimental.xml, to “hack in” his own logic in the process. This requires a simple, yet very flexible, intermediate representation, which is produced by the parsing stage and can be easily manipulated, validated, and transformed into whatever high-level construct is needed. For this, I chose a concept called Cursor, a pointer inside an XML document, which can be queried for properties of a given XML node or advanced to a subsequent one. It’s akin to Java’s StAX (Streaming API for XML), from which I took inspiration. In std.experimental.xml, all validations and transformations are implemented as chains of Cursors, which are then usually processed by a SAX parser or a DOM builder, but can also be used directly in user code, providing more control and speed.
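
To give a feel for the concept, a Cursor exposes roughly this kind of interface. The names and signatures below are invented for illustration and are not the actual std.experimental.xml API:

enum XMLKind { document, elementStart, elementEnd, text, comment, processingInstruction }

interface Cursor(StringType)
{
    XMLKind kind();        // what kind of node the cursor currently points at
    StringType name();     // the node's name, if it has one
    StringType content();  // the node's text content, if any
    bool enter();          // descend to the node's first child, if there is one
    bool next();           // advance to the next sibling, if there is one
    void exit();           // go back up to the parent
}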

Talking about speed, which in XML processing can be very important, I have to admit that I didn’t spend much time on optimization, leaving a lot of room for future performance improvements. Yet, the library is fast enough to guarantee that, for big files (where performance matters), an SSD (Solid State Drive) is needed to move the bottleneck from the fetching to the processing of the data. Since this is an extensible and configurable library, the user can choose his tradeoffs with fine granularity, trading input validation and higher level constructs for speed at will.

To conclude, the GSoC is finished, but the library is not. Although most parts are there, some bits are still missing. As a new university semester has started, time is becoming a rare and valuable resource, but I’ll do my best to finish the work in a short time so that Phobos can finally have a modern XML library to be proud of. I also have a plan to add more advanced functionality, like XML Schemas and XPath, but I don’t know when I’ll manage to work on that, as it is quite a lot to do.

Project Highlight: DlangUI

Vadim Lopatin is an active D user who, like many in the D community, comes from a Java and C++ background.

My current job is writing a Java backend for a virtual call center. I’ve also worked as a C++ developer on IP PBX devices. Programming is my hobby as well. My biggest hobby project, which I’ve rewritten from scratch twice in the last 15 years, is CoolReader, a cross platform e-book reader written in C++.

He kept hearing news about D and, over time, became more interested in its “cool features”, like CTFE and code generation. So, three years ago, he decided to initiate a couple of projects to learn its features.

DDBC is a database connector similar to Java’s JDBC, with an API close to the original. HibernateD is an ORM library, similar to the Java-based Hibernate. Unlike Java, D allows the use of compile-time code introspection and code generation. It was interesting work, and I was impressed by the power of D.

Both projects proved to be no more than learning exercises, however, as he never used either himself and neither became popular in the community. Now they are largely abandoned, but he has since found another area where he could apply his talents and, as it turns out, where community interest has been much higher.

The new idea came about as he surveyed the state of available GUI libraries in D. While there were several options to choose from, he wasn’t satisfied by the fact that they were all either non-native wrappers or not cross-platform. He had already written a cross-platform GUI in C++ for CoolReader GL, a version of his ebook reader that uses the same GUI on all supported platforms. Why not implement another one in D?

He has a long list of items he thinks are important for a GUI library to check off. A few of them are:

  • Cross-platform — the same code should work on all platforms with simple recompilation.
  • Internationalization — it should be easy to write multilingual apps. Unicode everywhere. Strings externalized to resources.
  • Hardware acceleration — take advantage of DirectX or OpenGL where available, but it should be possible to use software rendering where they aren’t.
  • Resolution independence — flexible layouts must be used instead of fixed pixel-by-pixel positioning of controls.

A markup language for describing layouts, touch screen support, 3D rendering, customizable look-and-feel, easy event handling, and several other items complete the list. A big set of requirements for one person to work on alone, but he already had a good deal of experience with the GUI he wrote for CoolReader. So when he got going with his DlangUI project, his previous work is where he started.

Part of DlangUI is a direct port of the CoolReader GL GUI. It was easy to reuse big parts of C++ code thanks to the similarity of D and C++ syntax.

So he set about checking items off of his list, such as support for hardware acceleration via an interface that easily supports different rendering backends, one of which is implemented using OpenGL. But as things got under way, he discovered that there is one particular issue with porting C++ to D that arises in the parts that can’t be directly reused.

The D GC does not bring any help for resource management, since object destructors may be called in any thread, in any order, or never at all. If an object owns some resources, it ought to be destroyed in a predictable way. Therefore, widgets and other objects holding resources must be destroyed manually by their owners.

DlangUI uses reference counting for easy freeing of owned objects. Widgets remove their children on destroy. Windows remove their widgets when closing. I had to add debug mode instance counts for various objects, and corresponding messages in the log, to make sure all resources are freed gracefully.

Some resources (e.g. images) are cached. Their references may be taken from the cache, used, and then released often. To allow cleanup of caches, all such resources have usage flags. The cache provides a checkpoint method which removes the usage flag from all items, and a cleanup method which frees all cache items which have not been used since the last checkpoint.
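
The checkpoint/cleanup scheme can be sketched in a few lines of D. This is an illustration of the idea only, not DlangUI’s actual cache code:

class ResourceCache(T)
{
    private struct Entry { T resource; bool used; }
    private Entry[string] items;

    // Fetch a resource, loading it on first use, and flag it as used.
    T get(string id, T delegate() loader)
    {
        if (auto p = id in items)
        {
            p.used = true;
            return p.resource;
        }
        auto e = Entry(loader(), true);
        items[id] = e;
        return e.resource;
    }

    // Remove the usage flag from every item.
    void checkpoint()
    {
        foreach (ref e; items)
            e.used = false;
    }

    // Free every item not used since the last checkpoint.
    void cleanup()
    {
        foreach (id; items.keys)
            if (!items[id].used)
            {
                destroy(items[id].resource);
                items.remove(id);
            }
    }
}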

He has worked on a number of items from his list, such as theme customization.

DlangUI themes are inspired by the Android API. It borrows Android’s state drawables (they may even be used as is), nine-patch PNGs, and resource versions for different screen sizes or resolutions. Usually widgets don’t use a hardcoded look and feel or layout properties. Instead, they use a style ID referencing the currently selected theme. If the theme is changed at runtime, all widgets receive a corresponding notification so that they can reload any cached values from the new theme. Simply providing a new theme changes the look and feel significantly.

Currently, two standard themes are provided in DlangUI: default (light) and dark. Applications may specify a standard theme as a parent, and override only the styles they need. Standard theme resources are usually embedded into the application executable using the cool D feature import("filename"). Applications may embed their own resources as well. This allows creating a single-file app without any additional resource files needing to be shipped with the executable.

Another check mark can be placed next to layouts. Here, he again looked to Android.

To support multiple screen resolutions and sizes, widgets must be placed and resized using layouts instead of direct pixel-based positioning. DlangUI uses Android API-like layouts for grouping, placing and resizing widgets, based on a two-phase measure/layout scheme.

And, while a GUI can be assembled entirely in code, he took inspiration from elsewhere for ideas to knock the markup item off his list.

Manually writing code to create a widget hierarchy and set its properties is a bit boring. DlangUI offers the possibility of creating widgets using a JSON-based description similar to Qt QML. I call it DML. Currently, only the creation of widgets and the setting of their properties are supported. In the future, I hope to add the ability to describe signal handlers in DML, and automatically assign signals to handlers and widget instances to variables. There is a GUI app, dlangui:dmledit, which helps to write DML. It combines a text editor for DML and a preview window to see the results.

When it comes to being cross-platform, a lot has been done so far, thanks to different backend implementations: Win32, SDL2, DSFML, X11, and Android. Not long ago, Vadim even announced a text-based interface which works in the Linux terminal or Windows console.

It was a real surprise for me how few changes were required to implement text-mode support. Besides the backend code and the text-mode drawing buffer implementation, most of the changes came in the form of a Console theme. Only a few fixes were required in the widgets, removing several hardcoded margins and sizes. Even DlangIDE, a DlangUI-based IDE for the D programming language, is now usable in terminals.

Here’s what DlangUI’s components normally look like on Windows.

And this is what DlangIDE looks like running in the Windows console.


When compared to screenshots of programs running with different DlangUI backends, seeing it in a terminal like that is pretty darn cool.

DlangUI manages event handling via signals and has built-in support for 3D graphics, including a 3D scene package. And work still continues on making Vadim’s list smaller, as well as addressing the problems with the library.

The most mentioned issue is the non-native look and feel of widgets. Although it’s possible to make a theme that looks exactly like a native one, it would not track system theme changes anyway. There’s no system menu support on OS X or in Gnome (where a common menu is used for all apps). The documentation is poor. There is some DDOX-generated documentation, but it’s not detailed enough and I seldom update it. I need more tutorials and examples. And some advanced controls are missing, e.g. an HTML view.

He also says that there are too few developers working on the project. While some users have submitted PRs, the majority of the work has been done by Vadim alone. Given what he has produced so far, that’s a pretty impressive achievement. But, in addition to solving the problems above, he’s got a lot more he wants to implement, such as:

  • An XML+CSS rendering widget to show/edit HTML or rich text
  • Refactoring DlangUI to extract window creation, OpenGL context creation, drawing, font support, and input event code from the widget set, for cases when no widgets are needed
  • Mobile platform support improvements: an iOS backend, better Android support, better touch mode support
  • Native system menu support on OS X and Gnome
  • Support for fallback fonts in font engines, from which to get missing symbols
  • A native OS X backend based on Cocoa instead of libSDL2
  • Improvements to Scene3D to make it suitable for writing 3D games

If you need a GUI for your D app, DlangUI is a viable option today. More importantly, if you’re able and willing to help out here and there, Vadim sure could use a few more hands on a few more keyboards!

How to Write @trusted Code in D

Steven Schveighoffer is the creator and maintainer of the dcollections and iopipe libraries. He was the primary instigator of D’s inout feature and the architect of a major rewrite of the language’s built-in arrays. He also authored the oft-recommended introductory article on the latter.


In computer programming, there is a concept of memory-safe code, which is guaranteed at some level not to cause memory corruption issues. The ultimate holy grail of memory safety is to be able to mechanically verify you will not corrupt memory no matter what. This would provide immunity from attacks via buffer overflows and the like. The D language provides a definition of memory safety that allows quite a bit of useful code, but conservatively forbids things that are sketchy. In practice, the compiler is not omniscient, and it lacks the context that we humans are so good at seeing (most of the time), so there is often the need to allow otherwise risky behavior. Because the compiler is very rigid on memory safety, we need the equivalent of a cast to say “yes, I know this is normally forbidden, but I’m guaranteeing that it is fine”. That tool is called @trusted.

Because it’s very difficult to explain why @trusted code might be incorrect without first discussing memory safety and D’s @safe mechanism, I’ll go over that first.

What is Memory Safe Code?

The easiest way to explain what is safe is to examine what results in unsafe code. There are generally three main ways to create a safety violation in a statically-typed language:

  1. Write or read from a buffer outside the valid segment of memory that you have access to.
  2. Cast some value to a type that allows you to treat a piece of memory that is not a pointer as a pointer.
  3. Use a pointer that is dangling, or no longer valid.

The first item is quite simple to achieve in D:

auto buf = new int[1]; 
buf[2] = 1;

With default bounds checks on, this results in an exception at runtime, even in code that is not checked for safety. But D allows circumventing this by accessing the pointer of the array:

buf.ptr[2] = 1;

For an example of the second, all that is needed is a cast:

*cast(int*)(0xdeadbeef) = 5;

And the third is relatively simple as well:

auto buf = new int[1];
auto buf2 = buf;
delete buf;  // frees the memory and sets buf to null...
buf2[0] = 5; // ...but buf2 still points at the freed memory

Dangling pointers also frequently manifest by pointing at stack data that is no longer in use (or is being used for a different reason). It’s very simple to achieve:

int[] foo()
{
    int[4] buf;
    int[] result = buf[];
    return result; // escapes a slice of stack memory that dies when foo returns
}

So simply put, safe code avoids doing things that could potentially result in memory corruption. To that end, we must follow some rules that prohibit such behavior.

Note: dereferencing a null pointer in user-space is not considered a memory safety issue in D. Why not? Because this triggers a hardware exception, and generally does not leave the program in an undefined state with corrupted memory. It simply aborts the program. This may seem undesirable to the user or the programmer, but it’s perfectly fine in terms of preventing exploits. There are potential memory issues possible with null pointers, if one has a null pointer to a very large memory space. But for safe D, this requires an unusually large struct to even begin to worry about it. In the eyes of the D language, instrumenting all pointer dereferences to check for null is not worth the performance degradation for these rare situations.

D’s @safe rules

D provides the @safe attribute that tags a function to be mechanically checked by the compiler to follow rules that should prevent all possible memory safety problems. Of course, there are cases where developers need to make exceptions in order to get some meaningful work done.

The following rules are geared to prevent issues like the ones discussed above (listed in the spec here).

  1. Changing a raw pointer value is not allowed. If @safe D code has a pointer, it has access only to the value pointed at, no others. This includes indexing a pointer.
  2. Casting pointers to any type other than void* is not allowed. Casting from any non-pointer type to a pointer type is not allowed. All other casts are OK (e.g. casting from float to int) as long as they are valid. Casting a dynamic array to a void[] is also allowed.
  3. Unions that have pointer types that overlap other types cannot be accessed. This is similar to rules 1 and 2 above.
  4. Accessing an element in or taking a slice from a dynamic array must be either proven safe by the compiler, or incur a bounds check during runtime. This even happens in release mode, when bounds checks are normally omitted (note: dmd’s option -boundscheck=off will override this, so use with extreme caution).
  5. In normal D, you can create a dynamic array from a pointer by slicing the pointer. In @safe D, this is not allowed, since the compiler has no idea how much space you actually have available via that pointer.
  6. Taking a pointer to a local variable or function parameter (variables that are stored on the stack) or taking a pointer to a reference parameter are forbidden. An exception is slicing a local static array, as the function foo above does; this is a known issue.
  7. Explicit casting between immutable and mutable types that are or contain references is not allowed. Casting value-types between immutable and mutable can be done implicitly and is perfectly fine.
  8. Explicit casting between thread-local and shared types that are or contain references is not allowed. Again, casting value-types is fine (and can be done implicitly).
  9. The inline assembler feature of D is not allowed in @safe code.
  10. Catching thrown objects that are not derived from class Exception is not allowed.
  11. In D, all variables are default initialized. However, this can be changed to uninitialized by using a void initializer:
    int *s = void;

    Such usage is not allowed in @safe D. The above pointer would point to random memory and create an obvious dangling pointer.

  12. __gshared variables are static variables that are not properly typed as shared, but are still in global space. Often these are used when interfacing with C code. Accessing such variables is not allowed in @safe D.
  13. Using the ptr property of a dynamic array is forbidden (a new rule that will be released in version 2.072 of the compiler).
  14. Writing to void[] data by means of slice-assigning from another void[] is not allowed (this rule is also new, and will be released in 2.072).
  15. Only @safe functions or those inferred to be @safe can be called.
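
To make a few of these rules concrete, here is a small sketch of what the compiler accepts and rejects (the rejected lines are commented out so the snippet compiles):

@safe void examples(int* p, int[] arr)
{
    int x = *p;                        // using the pointed-at value is fine
    // p[1] = 0;                       // rule 1: indexing a pointer is rejected
    // auto q = cast(int*) 0xdeadbeef; // rule 2: non-pointer to pointer cast is rejected
    // auto s = p[0 .. 3];             // rule 5: slicing a pointer is rejected
    arr[0] = 1;                        // rule 4: allowed, but bounds-checked at runtime
    int local;
    // auto lp = &local;               // rule 6: taking the address of a local is rejected
}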

The need for @trusted

The above rules work well to prevent memory corruption, but they prevent a lot of valid, and actually safe, code. For example, consider a function that wants to use the system call read, which is prototyped like this:

ssize_t read(int fd, void* ptr, size_t nBytes);

For those unfamiliar with this function, it reads data from the given file descriptor, and puts it into the buffer pointed at by ptr and expected to be nBytes bytes long. It returns the number of bytes actually read, or a negative value if an error occurs.

Using this function to read data into a stack-allocated buffer might look like this:

ubyte[128] buf;
auto nread = read(fd, buf.ptr, buf.length);

How is this done inside a @safe function? The main issue with using read in @safe code is that pointers can only pass a single value, in this case a single ubyte. read expects to store more bytes of the buffer. In D, we would normally pass the data to be read as a dynamic array. However, read is not D code, and uses a common C idiom of passing the buffer and length separately, so it cannot be marked @safe. Consider the following call from @safe code:

auto nread = read(fd, buf.ptr, 10_000);

This call is definitely not safe. What is safe in the above read example is only the one call, where the understanding of the read function and calling context assures memory outside the buffer will not be written.

To solve this situation, D provides the @trusted attribute, which tells the compiler that the code inside the function is assumed to be @safe, but will not be mechanically checked. It’s on you, the developer, to make sure the code is actually @safe.

A function that solves the problem might look like this in D:

auto safeRead(int fd, ubyte[] buf) @trusted
{
    return read(fd, buf.ptr, buf.length);
}

Whenever marking an entire function @trusted, consider if code could call this function from any context that would compromise memory safety. If so, this function should not be marked @trusted under any circumstances. Even if the intention is to only call it in safe ways, the compiler will not prevent unsafe usage by others. safeRead should be fine to call from any @safe context, so it’s a great candidate to mark @trusted.

A more liberal API for the safeRead function might take a void[] array as the buffer. However, recall that in @safe code, one can cast any dynamic array to a void[] array — including an array of pointers. Reading file data into an array of pointers could result in an array of dangling pointers. This is why ubyte[] is used instead.

@trusted escapes

A @trusted escape is a single expression that allows @system (the unsafe default in D) calls such as read without exposing the potentially unsafe call to any other part of the program. Instead of writing the safeRead function, the same feat can be accomplished inline within a @safe function:

auto nread = ( () @trusted => read(fd, buf.ptr, buf.length) )();

Let’s take a closer look at this escape to see what is actually happening. D allows declaring a lambda function that evaluates and returns a single expression, with the () => expr syntax. In order to call the lambda function, parentheses are appended to the lambda. However, operator precedence will apply those parentheses to the expression and not the lambda, so the entire lambda must be wrapped in parentheses to clarify the call. And finally, the lambda can be tagged @trusted as shown, so the call is now usable from the @safe context that contains it.

In addition to simple lambdas, whole nested functions or multi-statement lambdas can be used. However, remember that adding a trusted nested function or saving a lambda to a variable exposes the rest of the function to potential safety concerns! Take care not to expose the escape too much because this risks having to manually verify code that should just be mechanically checked.

Rules of Thumb for @trusted

The previous examples show that tagging something as @trusted has huge implications. If you are disabling memory safety checks, but allowing any @safe code to call it, then you must be sure that it cannot result in memory corruption. These rules should give guidance on where to put @trusted marks and avoid getting into trouble:

Keep @trusted code small

@trusted code is never mechanically checked for safety, so every line must be reviewed for correctness. For this reason, it’s always advisable to keep the code that is @trusted as small as possible.

Apply @trusted to entire functions when the unsafe calls are leaky

Code that modifies or uses data that @safe code also uses creates the potential for unsafe calls to leak into the mechanically checked portion of a @safe function. This means that portion of the code must be manually reviewed for safety issues. It’s better to mark the whole thing @trusted, as that’s more in line with the truth. This is not a hard and fast rule; for example, the read call from the earlier example is perfectly safe, even though it will affect data that is used later by the function in @safe mode.

A pointer allocated with C’s malloc at the beginning of the function, and freed later, could have been copied somewhere in between. In this case, the dangling pointer may violate @safe, even in the mechanically checked part. Instead, try wrapping the entire portion that uses the pointer as @trusted, or even the entire function. Alternatively, use scope guards to guarantee the lifetime of the data until the end of the function.
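
For instance, the scope guard suggestion might look something like this (a sketch, not code from the article), keeping the allocation and the deallocation adjacent so a reviewer can see the pointer’s whole lifetime at a glance:

import core.stdc.stdlib : malloc, free;

void processData(size_t n) @trusted
{
    auto p = cast(int*) malloc(n * int.sizeof);
    if (p is null) return;
    scope (exit) free(p);      // guaranteed to run on every exit path

    auto buf = p[0 .. n];      // the slice never escapes this function
    buf[] = 0;
    // ... work with buf ...
}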

Never use @trusted on template functions that accept arbitrary types

D is smart enough to infer @safe for template functions that follow the rules. This includes member functions of templated types. Just let the compiler do its job here. To ensure the function is actually @safe in the right contexts, create an @safe unittest to call it. Marking the function @trusted allows any operator overloads or members that might violate memory safety to be ignored by the safety checker! Some tricky ones to remember are postblit and opCast.

It’s still OK to use @trusted escapes here, but be very careful. Consider especially possible types that contain pointers when thinking about how such a function could be abused. A common mistake is to mark a range function or range usage @trusted. Remember that most ranges are templates, and can be easily inferred as @system when the type being iterated has a @system postblit or constructor/destructor, or is generated from a user-provided lambda.

Use @safe to find the parts you need to mark as @trusted

Sometimes, a template intended to be @safe may not be inferred @safe, and it’s not clear why. In this case, try temporarily marking the template function @safe to see where the compiler complains. That’s where @trusted escapes should be inserted if appropriate.

In some cases, a template is used pervasively, and tagging it as @safe may make too many parts break. Make a copy of the template under a different name that you mark @safe, and change the calls that are to be checked so that they call the alternative template instead.

Consider how the function may be edited in the future

When writing a trusted function, always think about how it could be called with the given API, and ensure that it should be @safe. A good example from above is making sure safeRead cannot accept an array of pointers. However, another possibility for unsafe code to creep in is when someone edits a part of the function later, invalidating the previous verification, and the whole function needs to be rechecked. Insert comments to explain the danger of changing something that would then violate safety. Remember, pull request diffs don’t always show the entire context, including that a long function being edited is @trusted!

Use types to encapsulate @trusted operations with defined lifetimes

Sometimes, a resource is only dangerous to create and/or destroy, but not to use during its lifetime. The dangerous operations can be encapsulated into a type’s constructor and destructor, marked @trusted, which allows @safe code to use the resource in between those calls. This takes a lot of planning and care. At no time can you allow @safe code to ferret out the actual resource so that it can keep a copy past the lifetime of the managing struct! It is essential to make sure the resource is alive as long as @safe code has a reference to it.
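
Here is a bare-bones sketch of that pattern, with copying disabled rather than reference counted (illustrative only): the unsafe create and destroy steps live in @trusted members, and @safe code can read and write the elements but never obtains the raw pointer.

struct TempBuffer
{
    import core.stdc.stdlib : malloc, free;

    private ubyte* ptr;
    private size_t len;

    this(size_t n) @trusted
    {
        ptr = cast(ubyte*) malloc(n);
        len = ptr is null ? 0 : n;
    }

    ~this() @trusted
    {
        free(ptr);
        ptr = null;
        len = 0;
    }

    @disable this(this);   // a copy could outlive the buffer, so forbid it

    ubyte opIndex(size_t i) const @trusted
    {
        if (i >= len) assert(0, "index out of bounds");
        return ptr[i];
    }

    void opIndexAssign(ubyte value, size_t i) @trusted
    {
        if (i >= len) assert(0, "index out of bounds");
        ptr[i] = value;
    }

    size_t length() const @safe { return len; }
}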

For example, a reference-counted type can be perfectly safe, as long as a raw pointer to the payload data is never available. D’s std.typecons.RefCounted cannot be marked @safe, since it uses alias this to devolve to the protected allocated struct in order to function, and any calls into this struct are unaware of the reference counting. Make one copy of that payload pointer, and when the struct is freed, a dangling pointer is left behind.

This can’t be @safe!

Sometimes, the compiler allows a function to be marked @safe, or infers it as @safe, when it’s obvious that shouldn’t be allowed. This is caused by one of two things: either a function that is called by the @safe function (or some deeper function) is marked @trusted but allows unsafe calls, or there is a bug or hole in the @safe system. Most of the time, it is the former. @trusted is a very tricky attribute to get correct, as is shown by most of this post. Frequently, developers will mark a function @trusted while only thinking of some uses of their function, not realizing the dangers it allows. Even core D developers make this mistake! There can be template functions that are inferred safe because of this, and sometimes it’s difficult to even find the culprit. Even after the root cause is discovered, it’s often difficult to remove the @trusted tag, as doing so will break many users of the function. However, it’s better to break code that is expecting a promise of memory safety than subject it to possible memory exploits. The sooner you can deprecate and remove the tag, the better. Then insert trusted escapes for cases that can be proven safe.

If it does happen to be a hole in the system, please report the issue, or ask questions on the D forums. The D community is generally happy to help, and memory safety is a particular focus for Walter Bright, the creator of the language.

The Origins of Learning D

In early 2015, Adam Ruppe asked in the D forums if anyone was interested in authoring a new book for Packt Publishing, the company that published his D Cookbook. I had submitted a book proposal to Packt a few months before, one with a different concept, and had heard nothing back. So I began to mull over the idea of putting my name forward, but I was concerned about the time investment. Before I could decide, Packt contacted me with some details and an offer. With a target publication date of November 2015, I figured I had plenty of time to get it done, so I accepted. It wasn’t long before I learned how my concept of “plenty of time” was quite a bit off base.

Learning D was not the first programming book I had worked on. I coauthored Learn to Tango with D, which was published in 2008 by Apress, with three other D users. It was a book that simply aimed to introduce the language (version 1) and Tango, the community-driven alternative standard library for D1. I wrote two chapters and had nothing to do with the outline. It was a relatively easy experience that left me completely unprepared for the process of authoring an in-depth programming language book on my own. Tango was a bit controversial at the time and, though it’s no longer actively developed, a fork is still maintained and usable with modern D for those so inclined. The recently released Ocean library from Sociomantic Labs is derived from Tango.

When all the legalities were out of the way, work on the book began in earnest. Packt proposed an outline, with a handful of topics they specifically wanted me to cover. I also had to decide how to approach the book and who my target audience would be. Andrei Alexandrescu’s The D Programming Language already served as the language reference and Ali Çehreli’s Programming in D was targeting programming novices. I wanted to hit somewhere in the middle. I envisioned a young college student or recent undergraduate who, having already picked up some level of proficiency with a C-family language, was not a complete beginner, but also did not yet have the level of experience required to easily think in different languages.

Once I had an imaginary reader to talk to, I had a good idea of what I wanted to say. It was fairly easy to create two groups of features: those fundamental to becoming proficient enough with the language and the standard library to be productive, and those that are not essential or could be more aptly considered advanced. The former group would comprise the major focus of the book. But in addition to teaching the language, I had a general message I wanted to drive home.

Through all my years of visiting the D forums and maintaining a popular set of bindings to C libraries, I had encountered more than one new D user who was trying to program C++ or Java in D. While that approach will certainly enable some progress, it is bound to lead to compiler errors or unexpected results sooner rather than later. I explicitly pushed this message early in the book, and reiterated it each time I discussed one of the features that look like C++ or Java, but behave differently. Thinking in one language while learning another is a mistake that requires experience–it’s not the sort of thing novices do–and, as such, it’s a hard habit to break. My goal was to help minimize the pain.

I also wanted to create a sample program that demonstrated the language and library features I was discussing. My original plan was to create a new version of the program whenever a new paradigm was discussed. After gaining a false sense of confidence from completing Chapter 1 well before the deadline, Chapter 2 quickly disabused me of any thoughts that the whole book would go that way. It went well over the page budget and took longer than I had estimated, which meant the program for Chapter 2 was written over the space of a few hours during an all-night marathon. It was rightly lambasted by the technical reviewers. In the end, I scrapped the sample programs during the revision process and created a single program that evolves with new features as the book progresses. It didn’t turn out the way I had hoped, but it does show working examples of different paradigms in one program. The web app version developed in Chapter 10, which demonstrates the use of vibe.d, went more smoothly. Since it was the focus of the chapter, I was able to develop it in tandem with the text.

Speaking of the reviewers, the book would have been a mess had it not been for their valuable feedback. In the past, I had always believed authors were simply being kind when they said such a thing, but I can now attest that it is absolutely true. Packt asked me to recommend some technical reviewers, so I gave them a shortlist of people whom I knew to have strengths in areas where I was weak. John Colvin, Jonathan M. Davis, David Nadlinger, and Steven Schveighoffer came in early on, and were later joined by Kingsley Hendrickse and Ilya Yaroshenko. I was determined to be discriminating in implementing their suggestions, but they were so good that I agreed with and implemented most of them. And their criticisms were spot-on, catching coding errors, Big-O crimes, incorrect statements, and so much more. They taught me a few things I hadn’t known about D, cleared up a few misconceptions, and even spotted a couple of compiler bugs.

In the Foreword, Walter Bright says that this book, like D itself, is a labor of love. He is entirely correct. I first encountered D in 2003 and have been using it ever since. It’s just such a fun language. Though it’s a cliché to any long-time D user, I will tell anyone who cares to listen that this language is very much the sum of its parts; it’s not any one specific feature that makes D such a pleasure to use, but all of them taken together. Yes, it has its warts. Yes, it’s possible to encounter frustrating scenarios that require less-than-attractive workarounds. But, for me, my list of cons is nowhere near long enough to detract from the overall experience. I simply enjoy it. I want to see the cons list shrink and hope that more people see in the language what I see in it, so that one day ads looking for a D programmer are as common as those for other major languages. I accepted the offer to write this book as a way to contribute to that process. Most gratifyingly, I have received personal messages from a few readers letting me know that it helped them learn D. As a long time teacher, that’s a feeling I will never get tired of.

Project Highlight: Timur Gafarov

To begin with, let’s be clear that Timur Gafarov is a person and not a project. The impetus for this post was an open source first-person shooter, called Atrium, that he develops and maintains. In the course of making the game, he has created a few other D projects, each of which could be the focus of its own post. So this time around, we’re going to do a plural Project Highlight and introduce you to the GitHub repository of Timur Gafarov.

When Timur first came to D six years ago, you might say it was love at first sight.

As an indie game developer with a strong bias toward graphics engines and rendering tech, I always try to keep track of modern compiled languages effective enough for writing real-time stuff. The most obvious choice in this field is C++, and I actually used it for several years until I found D in 2010. I immediately fell in love with the language’s clean, beautiful syntax, its powerful template system, the numerous built-in features absent in C++, and the rich and easy to use standard library.

Interestingly, he was actually attracted by one of the things often cited as a turn-off about D back then: the lack of libraries. It was a situation in which he saw opportunities to create things from scratch, without worrying about reinventing the wheel. At the time, DUB was not yet a thing, so the first task he set himself was coding up a build system called Cook, which he still uses sometimes instead of DUB.

After that, he was ready to start making a game. He wanted to use OpenGL, and found an existing binding in Derelict (a collection of dynamic bindings maintained by this post’s author — there also exists a collection of static bindings called Deimos) that allowed him to do so in D. With that off his plate, he next sat down to write a game engine. The first few steps in that direction resulted in a set of utility packages that coalesced into dlib.

At first, there were no clear plans or goals. After a previous period of using third-party engines, I had some experience with low-level graphics coding in C++ and just wanted to port my stuff to D for a start. I began with vector/matrix algebra and image I/O. These efforts resulted in dlib, a general purpose library.

He next turned his attention to 3D physics. Even though there were existing libraries with bindable C interfaces, like ODE (with a dynamic binding that existed then in the form of DerelictODE and a static binding in Deimos) and Newton (for which Deimos-like and Derelict-like bindings have since been created by third parties), he just couldn’t help himself. Enter dmech.

Atrium was born as part of my experiment with writing a 3D physics engine, called dmech. OK, this wasn’t strictly necessary, but the chance of being the first one to write this kind of thing in D was so attractive that I couldn’t stand it 🙂 Of course, I didn’t dare to compete with such industry standards as Bullet, but nevertheless it was an amazing experience. I’ve learned a lot.

A screenshot from Atrium.

A physics engine and utility library weren’t all he needed. That’s where DGL comes in.

The next milestone was writing a graphics engine that I call DGL. I can’t consider this step fully completed, because I’m never happy with my abstractions and design solutions. Finally, I ended up with some kind of simplified Physically-Based Rendering pipeline, Percentage Closer Filtering shadows and multipass rendering, with which I’m sort of satisfied for now.

There was a period when I had to use outdated hardware, so DGL uses OpenGL 1.x, relying on extensions to utilize modern GPU technologies. Yeah, in 2016 this sounds funny 🙂 But recently I started experimenting with OpenGL 4, so the engine is likely to be rewritten. Again!

But for Timur, the graphics system isn’t the most important part of Atrium.

It’s the collision detection and character kinematics. I think that believable character behavior and interactions with the virtual world are crucial for any realistic game. In Atrium, these are fully physics-based, with as few hacks and workarounds as possible. Rigid body dynamics natively ‘talk’ with player-controllable kinematics. That means, for example, that a character can be pushed with moving objects and can push other objects himself. A stack of dynamic boxes can be transported via a kinematic moving platform. A gravity gun can be used to move things. And so on. I’m deeply inspired by Valve’s masterpieces, Half-Life 2 and Portal, so I want to make my own ‘physical’ first person puzzle.

Working on this project has certainly been a labor of love.

It has already taken me about six years of spare-time work and it still exists only in the form of an early gameplay demo. Of course, if I’d used an existing mature engine, like Unity or UE, things would be much simpler. Constraining myself to use D for such a complex task may look strange, but it’s fun. And that’s that.

Through those years, he has learned a good deal about the ups and downs of game development with D. Particularly that ever popular bugbear known as the GC.

Modern D is a very attractive choice as a language for game development. Even the garbage collector is not a problem, because you can use object pools, custom allocators, or simply malloc and free. The key point is to know when the GC is invoked and try to avoid those cases in performance critical code. Personally, I prefer using malloc so that I can free the memory when I want, since delete has been deprecated and destroy  just releases all the references to an object instance without actually deleting it. Using manual memory management imposes some restrictions on the code–for example, you can’t use closures or D’s built-in containers–but that, again, is not a big problem. A large effort is currently underway to lessen GC usage in dlib, so that you can use it to write fully unmanaged applications with ease. It has GC-free containers, file I/O streams, image decoders, and so on.
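
As a rough illustration of the malloc-plus-emplace approach Timur describes (generic D, not dlib’s actual allocators; the Bullet class is made up for the example):

import core.stdc.stdlib : malloc, free;
import std.conv : emplace;

class Bullet { float x = 0, y = 0, vx = 0, vy = 0; }

Bullet allocBullet()
{
    enum size = __traits(classInstanceSize, Bullet);
    void* raw = malloc(size);
    if (raw is null) assert(0, "out of memory");
    return emplace!Bullet(raw[0 .. size]);   // construct the object outside the GC heap
}

void freeBullet(Bullet b)
{
    destroy(b);              // run the destructor
    free(cast(void*) b);     // release the memory when we decide, not the GC
}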

If you are interested in game development with D, Timur’s GitHub repository should be an early point of call. Even if you aren’t making a game or a game engine, you may well find something useful in dlib. With its growing list of contributors, it’s getting a good deal of care and attention.

Thanks to Timur for taking the time to contribute to this post. We wish him luck with all of his projects!

GSoC Report: DStep

Wojciech Szęszoł is a Computer Science major at the University of Wrocław. As part of Google Summer of Code 2016, he chose to make improvements to Jacob Carlborg’s DStep, a tool to generate D bindings from C and Objective-C header files.


It was December of last year and I was writing an image processing project for a course at my university. I would normally use Python, but the project required some custom processing, so I wasn’t able to use numpy. And writing the inner loops of image processing algorithms in plain Python isn’t the best idea. So I decided to use D.

I’ve been conscious of the existence of the D language for as long as I can remember, but I’d never convinced myself before to try it out. The first thing I needed to do was to load an image. At the time, I didn’t know that there is a DUB repository containing bindings to image loading libraries, so I started writing bindings to libjpeg by myself. It didn’t end very well, so I thought there should be a tool that would do the job for me. That’s when I found DStep and htod.

Unfortunately, DStep’s capabilities weren’t satisfactory (mostly due to the lack of any kind of support for the preprocessor) and htod didn’t run on Linux. I ended up coding my project in C++, but as GSoC (Google Summer of Code) was lurking on the horizon, I decided that I should give it a try with DStep. I began by contacting Craig Dillabaugh (Ed. Note: Craig volunteers to organize GSoC projects using D) to learn if there was any need for developing such a project. It sparked some discussion on the forum, the idea was accepted, and, more importantly, Russel Winder agreed to be the mentor of the project. After some time I needed to prepare and submit an official proposal. There was an interview and fortunately I was accepted.

The first commit I made for DStep is dated February 1. It was a proof of concept that C preprocessor definitions can be translated to D using libclang. Then I improved the testing framework by replacing the old Cucumber-based tests with some written in D. I made a few more improvements before the actual GSoC coding period began.

During GSoC, I added support for translation of preprocessor macros (constants and functions). It required implementing a parser for a small part of the C language as the information from libclang was insufficient. I implemented translation of comments, improved formatting of the output code (e.g. made DStep keep the spacing from C sources), fixed most of the issues from the GitHub issue list and ported DStep to Windows. While I was coding I was getting support from Jacob Carlborg. He did a great job by reviewing all of the commits I made. When I didn’t know how to accomplish something with D, I could always count on help on forum.dlang.org.

DStep was the first project of such a size that I coded in D. I enjoyed its modern features, notably the module system, garbage collector, built-in arrays, and powerful templates. I used unittest blocks a lot. It would be nice to have named unit tests, so that they could be run selectively. From the perspective of a newcomer, the lack of consistency and symmetry in some features is troubling, at least before getting used to it. For example, there is a built-in hash map but no hash set, some identifiers that should be keywords start with @ (Ed. Note: see the Attributes documentation), etc. I was very sad when I read that the in keyword is not yet fully implemented. Despite those little issues, the language served me very well overall. I suppose I will use D for my personal toy projects in the future. And for DStep of course. I have some unfinished business with it :).

I would like to encourage all students to take part in future editions of GSoC. And I must say the D Language Foundation is a very good place to do this. I learned a lot during this past summer and it was a very exciting experience.

Ruminations on D: An Interview with Walter Bright

Joakim is the resident interviewer for the D Blog. He has also interviewed members of the D community for This Week in D and is responsible for the Android port of LDC.


Walter Bright is the creator and first implementer of the D programming language. He was an early developer of C++ compilers starting from the mid-’80s, including the first C++ compiler to translate source code directly to object code without using C as an intermediate, and has written compilers for ABEL, C, Java, and Javascript. He believes he is the only person to have written a full C++98 compiler by himself. Empire, one of the first computer strategy games, was written by Walter at Caltech. Before getting into computer programming full-time, Walter worked for Boeing from 1979-1982 on the 757’s flight controls, particularly the stabilizer trim gearbox and system and the elevators.

Joakim: D appears to be picking up speed. The fourth straight DConf took place earlier this year, Wired wrote a nice article a couple years ago, Gartner ranked it in the top 20 languages soon afterwards, and downloads of the reference compiler, DMD, average 1200 per day in recent years. Talk about the current popularity and status of the language and what you’re doing to take it to the next level.

Walter: I don’t worry too much about that. I spend my efforts making D the best language possible, and let the metrics take care of themselves. It’s like being a CEO; he shouldn’t be sweating the stock price, he should be working on making money for the company, then the stock price will take care of itself.

There’s always a stack of things to do, more or less with the most important one on the top, and I pop the top off and work on it, mostly the one with the maximum benefit/cost ratio. The ordering in the stack changes all the time. If an item has others strongly interested in working on it, I defer to them.

I get the jobs that nobody else wants to do. 🙂 Regression fixes for complex problems are a big one. It's much more fun designing new stuff than doing maintenance on existing stuff, dealing with technical debt, etc.

Joakim: In the last year, @nogc and interfacing with C++ have been at the top of your stack. Why? You always used to say that interfacing more with C++ would be too much work.

Walter: It became clear that the garbage collector didn't need to be embedded in most things, that memory allocation could be decided separately from the algorithm. Letting the user decide seemed like a great way forward.

As for C++, I figured out a way to support it that avoided the problems I thought were impractical to deal with. I did a talk about it (PDF slides here), and a Rust user wrote up a partial transcript.

Since that talk, I've come up with a solution for exceptions. C++ can throw/catch exceptions of any type, but few people do anything other than throw/catch a reference to a class, like std::exception. All D needed to do was allow throw/catch of a class marked with the extern (C++) attribute. Throw/catch of other types is not supported and most likely never will be.

In order to make that work, the custom exception handling code in the code generation and the library had to change to work with the DWARF exception handling mechanism. That turned out to be a fair amount of work, as it is rather under-documented. But it’s pretty simple for the user.

Joakim: What do you plan on working on in D for the rest of the year? Work on @nogc and C++ support seems to be ongoing and you’ve tried to increase the use of ranges in the standard library for some time now. Anything else? What is the status of those efforts?

Walter: Those are still high-priority and ongoing efforts. But we're also flexible; if a major opportunity comes up that requires us to push in a different direction, we'll adapt. The pervasive use of ranges is advancing rapidly, and I've been very pleased with the results.

The current focus is on improving memory safety. It has become more and more important to have verifiable memory safety in a programming language, as the costs incurred when unsafe code is exploited to install malware grow greater and greater. D has always supported memory safety, but recently we've embarked on a much more comprehensive review of memory safety in the core language and are making changes to close the gaps.
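
(Ed. Note: as a small, editor-supplied illustration of verifiable memory safety, the @safe attribute makes the compiler reject operations it cannot prove safe.)

```d
@safe int readThrough(int* p)
{
    // return *(p + 3);  // rejected in @safe code: pointer arithmetic is not allowed
    return *p;           // a plain dereference is permitted
}
```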

Joakim: How much time do you spend on D and what is your daily routine?

Walter: I work full time on D. I probably spend half my time working on the language, and the other half helping others with it, discussing things, doing interviews (!), writing articles, doing conferences, etc.

Joakim: You were 42 when you started working on D and I guess it is the first language you designed? Talk about why you started working on it so late (you were probably older then than most of the D contributors are now) and what insights your experience gave you that your younger self or other young contributors may not have.

Walter: ABEL is the first language I designed, and was a solid success for Data I/O. It’s obsolete now, because the electronic devices it was aimed at are now obsolete, but it had a great run for 10 years or so.

Having been writing compilers my whole career, doing tech support for them, and following the various changes in the languages inevitably gave me much insight on what worked and what didn’t. Probably the biggest thing is that simpler is better. But making something simple is actually quite difficult. Anybody can (and does) design a complex solution, but few can see through the complexity to find the underlying simplicity.

Many successful languages were designed by older engineers.

Joakim: Please give some examples of such simplicity and how you were able to find it.

Walter: We nailed it with arrays (Jan Knepper's idea), the basic template design, compile-time function execution (CTFE), and static if. I have no idea what the thought process is in any repeatable manner. If anything, it's simply a dogged sense that there's got to be a better way. It took me years to suddenly realize that a template function is nothing more than a function with two sets of parameters, compile time and run time, and then everything just fell into place.
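
(Ed. Note: a minimal, editor-supplied example of that view; the first parameter list is fixed at compile time, the second at run time.)

```d
// T and n belong to the compile-time parameter list, value to the run-time one.
T scaled(T, int n)(T value)
{
    return value * n;
}

unittest
{
    assert(scaled!(int, 3)(5) == 15);
    assert(scaled!(double, 2)(1.5) == 3.0);
}
```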

I was more or less blinded by the complexity of templates such that I had simply missed what they fundamentally were. There was also a bit of the “gee, templates are hard” that predisposes one to believe they actually are hard, and then confirmation bias sets in.

I once attended a Scott Meyers presentation on type lists. He took an hour to explain it, with slide after slide of complexity. I had the thought that if it was an array of ints, his presentation would be 2 minutes long. I realized that an array of types should be equally straightforward.
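
(Ed. Note: in today's D, such a type list is available as std.meta.AliasSeq, which can be indexed and iterated much like an ordinary array; a small sketch follows.)

```d
import std.meta : AliasSeq;

alias Types = AliasSeq!(int, double, string);

static assert(Types.length == 3);       // length, like an array
static assert(is(Types[1] == double));  // indexing, like an array

unittest
{
    size_t count;
    foreach (T; Types)  // a compile-time unrolled loop over the types
        ++count;
    assert(count == Types.length);
}
```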

With CTFE, we just went straight in the front door from asking the question “why can’t we execute this function at compile time, just like constant folding?” I did the initial one just by extending the constant folding logic. Don Clugston took it much further, but at its heart it’s still a modified dwarf. Stefan Koch is currently working on making a real interpreter out of it.
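
(Ed. Note: a small, editor-supplied example of CTFE; any ordinary function can be evaluated by the compiler simply by using its result where a constant is required.)

```d
int fib(int n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

enum atCompileTime = fib(10);   // forces CTFE; the compiler computes 55
static assert(atCompileTime == 55);

unittest
{
    assert(fib(10) == 55);      // the very same function, now at run time
}
```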

Joakim: Why do you think there hasn’t been a killer app for D yet? For example, Ruby was kind of an obscure language for a dozen years till Rails propelled it into the spotlight.

Walter: I suspect the age of the killer app is behind us. There is so much software existing and being written, for every imaginable purpose, that it’s hard to believe there will even be another killer app of any sort. Of course, predictions are notoriously unreliable.

Joakim: D has a unique approach in the compiled languages market, being mostly community-developed without a major corporate sponsor. C++ had Bell Labs, Rust has Mozilla, Go has Google; all have full-time paid devs on the language, even if they also take open-source contributions. Do you think this is a problem and is D being left behind?

Walter: We recently started a D foundation which will make it a lot easier for corporations to sponsor D.

Joakim: Do you make money off D? I know you’ve contracted with Facebook to write a C++ linter, Flint, and preprocessor, Warp, and that you work with Sociomantic and other companies using D.

Walter: I do some paid consulting work for D, but am careful not to take on so much that it interferes with working on D itself.

Joakim: Do you write much code in D outside of the standard library? If so, talk about recent stuff you’ve written and how the experience has been, plus a bit about using some of the new features.

Walter: D consumes so much of my effort that there's not much time to write D apps other than smallish utilities. Currently, of course, the D compiler front end itself is in D. But it was translated from C++, so it isn't idiomatic D.

Joakim: OK, so I guess Warp is the largest program you’ve written in D lately, about 10 klocs. Can you talk about the experience of actually coding that in D, as opposed to C/C++? What stood out for you?

Walter: What stood out is the speed with which it went together, and the remarkably small number of bugs that surfaced in it after it was released. I credit the extensive use of unit tests for the latter.

Having written another preprocessor before certainly helped, and it would be hard to tease that out as a separate effect. But I still believe that unit testing made the difference, because the way the preprocessor worked was very different from the one I’d done before.

What stood out with D was the ease of changing the data structures to try different ways, compared with doing this in other languages.

Joakim: When you think back to your vision for D as you were starting out in 1999, does it at all resemble that today? Anything big missing that you wanted back then?

Walter: D is far more advanced than what I thought of 15 years ago. Programming language ideas have certainly advanced since then, and D along with it.

D originally wasn’t going to have templates at all, based on my earlier experience with them. But finding a simple way to do them changed everything – even to the point where well over half of a modern D program is templates!

The idea of ranges slowly evolved over time; we're still learning how to do it right.

bit as a basic type was unworkable. complex as a basic type turned out to be pointless (it works better as a library type). Auto-decoding of UTF-8 to code points turned out to be not nearly as useful as expected.

Transitive const was a leap of faith, and is consistently overlooked by other languages looking to adopt D features. I have a lot of faith in transitive const, and it is already paying off in making it possible to have pure functions, a key feature for modern programming.
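
(Ed. Note: a short, editor-supplied sketch of what transitive const buys; through a const reference, everything reachable is const, which is part of what lets a pure function promise not to mutate its inputs.)

```d
struct Node
{
    int value;
    Node* next;
}

pure int sum(const(Node)* head)
{
    int total;
    for (auto n = head; n !is null; n = n.next)
    {
        total += n.value;
        // n.value = 0;       // error: const applies to the node itself
        // n.next.value = 0;  // error: and, transitively, to everything it points to
    }
    return total;
}
```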

Inside D Version Manager

In his day job, Jacob Carlborg is a Ruby backend developer for Derivco Sweden, but he’s been using D on his own time since 2006. He is the maintainer of numerous open source projects, including DStep, a utility that generates D bindings from C and Objective-C headers, DWT, a port of the Java GUI library SWT, and the topic of this post, DVM. He implemented native Thread Local Storage support for DMD on OS X and contributed, along with Michel Fortin, to the integration of Objective-C in D.


D Version Manager (DVM) is a cross-platform tool that allows you to easily download, install, and manage multiple D compiler versions. With DVM, you can select a specific version of the compiler to use without having to manually modify the PATH environment variable. The compiler selection is local to each shell session, and it's possible to configure a default compiler.

The main advantage of DVM is the easy downloading and installation of different compiler versions. Specify the version of the compiler you would like to install, e.g. dvm install 2.071.1, and it will automatically download and install that version. Then you can tell DVM to use that version by executing dvm use 2.071.1. After that, you can invoke the compiler as usual with dmd. The selected compiler version will persist until the end of the shell session.

DVM makes it possible for the user to select a specific compiler version without having to modify any makefiles or build scripts. It’s enough for any build script to refer to the compiler by name, i.e. dmd, as long as the user selects the compiler version with DVM before invoking the script.

History

DVM was created at the beginning of 2011. That was a different time for D. No proper installers existed, D1 was still a viable option, and each new release of DMD brought with it a number of regressions. Because of all the regressions, it was basically impossible to use the latest compiler (and often even older ones) for all of your projects. Factor in projects from other developers, some written in D1 and some in D2, and it was inconvenient to have only one compiler version installed.

It was for these reasons I created DVM. Being able to have different versions of the compiler active in different shell sessions makes it easy to work on different projects requiring different versions of the compiler. For example, it was possible to open one tab for a D1 compiler and another for a D2 compiler.

The concept of DVM comes directly from the Ruby tool RVM. Where DVM installs D compilers, RVM installs Ruby interpreters. RVM can do everything DVM can do and a lot more. One of the major things I did not want to copy from RVM is that it’s completely written in shell script (bash). I wanted DVM to be written in D. Because it’s written in shell script, RVM enables some really useful features that DVM does not support, but some of them are questionable (some might call them hacks). For example, when navigating to an RVM-enabled project, RVM will automatically select the correct Ruby interpreter. However, it accomplishes this by overriding the built-in cd command. When the command is invoked, RVM will look in the target directory for one of the files .rvmrc or .ruby-version. If either is present, it will read that file to determine which Ruby interpreter to select.

Implementation and Usage

One of the goals of DVM was that it should be implemented in D. In the end, it was mostly written in D with a few bits of shell script. Note that the following implementation details are specific to the platforms that fall under D’s Posix umbrella, i.e. version(Posix), but DVM is certainly available for Windows with the same functionality.

Structure of the DVM Installation

Before DVM can be used, it needs to install itself. This is accomplished with the command dvm install dvm. This will create the ~/.dvm directory. It contains the following subdirectories: archives, bin, compilers, env and scripts.

archives contains a cache of downloaded zip archives of D compilers.

bin contains shell scripts that act as symbolic links to all installed D compilers. The name of each contains the compiler version, e.g. dmd-2.071.1, making it possible to invoke a specific compiler without first having to invoke the use command. This directory also contains one shell script, dvm-current-dc, pointing to the currently active D compiler. It allows the currently active compiler to be invoked without knowing which version has been selected, which can be useful when executing the compiler from within an editor or IDE, for example. A shell script for the default compiler exists as well. Finally, this directory also contains the dvm binary itself.

The compilers directory contains all installed compilers; all downloaded compilers are unpacked here. Due to the varying quality of the D compiler archives throughout the years, the install command will also make a few adjustments if necessary. In the old days there was only one archive for all platforms; in that case, the install command includes only the binaries and libraries for the current platform. Another adjustment is to make sure all executables have the executable permission set.

The env directory contains helper shell scripts for the use command. There’s one script for each installed compiler and one for the default selected compiler.

The scripts directory currently only contains one file, dvm. It’s a shell script which wraps the dvm binary in the bin directory. The purpose of this wrapper is to aid the use command.

The use Command

The most interesting part of the implementation is the use command, which selects a specific compiler, e.g. dvm use 2.071.1. The selection of a compiler will persist for the duration of the shell session (window, tab, script file).

The command works by prepending the path of the specified compiler to the PATH environment variable. This can be ~/.dvm/compilers/dmd-2.071.1/{platform}/bin, for example, where {platform} is the currently running platform. Prepending the path guarantees that the selected compiler takes precedence over any other compilers in the PATH. The {platform} section of the path exists because of the structure of the downloaded archive; keeping that structure avoids having to modify the compiler's configuration file, dmd.conf.
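
(Ed. Note: in D terms, the effect of the use command amounts to something like the following sketch, with illustrative names rather than DVM's actual code; note that this only changes the environment of the current process, which is exactly why DVM needs the shell-function trick described next.)

```d
import std.path : buildPath;
import std.process : environment;

// Prepend the selected compiler's bin directory to PATH so it shadows
// any other dmd already on the PATH.
void prependCompilerToPath(string dvmRoot, string ver, string platform)
{
    auto binDir = buildPath(dvmRoot, "compilers", "dmd-" ~ ver, platform, "bin");
    environment["PATH"] = binDir ~ ":" ~ environment.get("PATH", "");
}
```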

The interesting part here is that it's not possible to modify the environment variables of the parent process, which in this case is the shell. The magic behind the use command is that the dvm command you're actually invoking is not the D binary; it's the shell script in the ~/.dvm/scripts path. That shell script contains a function called dvm. You can verify this by invoking type dvm | head -n 1, which should print dvm is a function if everything is installed correctly.

The installation of DVM adds a line to the shell initialization file, .bashrc, .bash_profile or similar. This line will load/source the DVM shell script in the ~/.dvm/scripts path which will make the dvm command available. When the dvm function is invoked, it will forward the call to the dvm binary located in ~/.dvm/bin/dvm. The dvm binary contains all of the command logic. When the use command is invoked, the dvm binary will write a new file to ~/.dvm/tmp/result and exit. This file contains a command for loading/sourcing the environment file available in ~/.dvm/env that corresponds to the version that was specified when the use command was invoked. After the dvm binary has exited, the shell script function takes over again and loads/sources the result file if it exists. Since the shell script is loaded/sourced instead of executed, the code will be evaluated in the current shell instead of a sub-shell. This is what makes it possible to modify the PATH environment variable. After the result file is loaded/sourced, it’s removed.
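
(Ed. Note: the hand-off can be sketched as follows; the paths and file contents are assumptions based on the description above, not DVM's exact implementation. The dvm binary writes a source command to the result file, which the wrapping shell function then sources and removes.)

```d
import std.file : write;
import std.path : buildPath, expandTilde;

// Executed inside the dvm binary: it cannot change the parent shell's
// environment, so it leaves a command behind for the shell function.
void writeResultFile(string ver)
{
    auto dvmDir = expandTilde("~/.dvm");
    auto envScript = buildPath(dvmDir, "env", "dmd-" ~ ver);
    auto resultFile = buildPath(dvmDir, "tmp", "result");
    write(resultFile, ". " ~ envScript ~ "\n");  // the shell function sources this, then deletes it
}
```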

If you find yourself with the need to build your D project(s) with multiple compiler versions, such as the current release of DMD, one or more previous releases, and/or the latest beta, then DVM will allow you to do so in a hassle-free manner. Pull up a shell, execute use on the version you want, and away you go.