# A Pattern for Head-mutable Structures

When Andrei Alexandrescu introduced ranges to the D programming language, the gap between built-in and user-defined types (UDTs) narrowed, enabling new abstractions and greater composability. Even today, though, UDTs are still second-class citizens in D. One example of this is support for head mutability—the ability to manipulate a reference without changing the referenced value(s). This document details a pattern that will further narrow the UDT gap by introducing functions for defining and working with head-mutable user-defined types.

### Introduction

D is neither Kernel nor Scheme—it has first-class and second-class citizens. Among its first-class citizens are arrays and pointers. One of the benefits these types enjoy is implicit conversion to head-mutable. For instance, const(T[]) is implicitly convertible to const(T)[]. Partly to address this difference, D has many ways to define how one type may convert to or behave like another – alias this, constructors, opDispatch, opCast, and, of course, subclassing. The way pointers and dynamic arrays decay into their head-mutable variants is different from the semantics of any of these features, so we would need to define a new type of conversion if we were to mimic this behavior.

Changing the compiler and the language to permit yet another way of converting one type into another is not desirable: it makes the job harder for compiler writers, makes an already complex language even harder to learn, and any implicit conversion can make code harder to read and maintain. If we can define conversions to head-mutable data structures without introducing compiler or language changes, this will also make the feature available to users sooner, since such a mechanism would not necessarily require changes in the standard library, and users could gradually implement it in their own code and benefit from the code in the standard library catching up at a later point.

### Unqual

The tool used today to get a head-mutable version of a type is std.traits.Unqual. In some cases, this is the right tool—it strips away one layer of const, immutable, inout, and shared. For some types though, it either does not give a head-mutable result, or it gives a head-mutable result with mutable indirections:

struct S(T) {
T[] arr;
}

With Unqual, this code fails to compile:

void foo(T)(T a) {
Unqual!T b = a; // cannot implicitly convert immutable(S!int) to S!int
}

unittest {
immutable s = S!int([1,2,3]);
foo(s);
}

A programmer who sees that message hopefully finds a different way to achieve the same goal. However, the error message says that the conversion failed, indicating that a conversion is possible, perhaps even without issue. An inexperienced programmer, or one who knows that doing so is safe right now, could use a cast to shut the compiler up:

void bar(T)(T a) {
Unqual!T b = cast(Unqual!T)a;
b.arr[0] = 4;
}

unittest {
immutable s = S!int([1,2,3]);
bar(s);
assert(s.arr[0] == 1); // Fails, since bar() changed it.
}

If, instead of S!int, the programmer had used int[], the first example would have compiled, and the cast in the second example would have never seen the light of day. However, since S!int is a user-defined type, we are forced to write a templated function that either fails to compile for some types it really should support or gives undesirable behavior at run time.

Clearly, we should be able to do better than Unqual, and in fact we can. D has template this parameters which pick up on the dynamic type of the this reference, and with that, its const or immutable status:

struct S {
void foo(this T)() {
import std.stdio : writeln;
writeln(T.stringof);
}
}
unittest {
S s1;
const S s2;
s1.foo(); // Prints "S".
s2.foo(); // Prints "const(S)".
}

This way, the type has the necessary knowledge of which type qualifiers a head-mutable version needs. We can now define a method that uses this information to create the correct head-mutable type:

struct S(T) {
T[] arr;
import std.traits : CopyTypeQualifiers;
return S!(CopyTypeQualifiers!(This, T))(arr);
}
}
unittest {
const a = S!int([1,2,3]);
assert(is(typeof(b) == S!(const int))); // The correct part of the type is now const.
assert(a.arr is b.arr); // It's the same array, no copying has taken place.
b.arr[0] = 3; // Correctly fails to compile: cannot modify const expression.
}

Thanks to the magic of Uniform Function Call Syntax, we can also define headMutable() for built-in types:

auto headMutable(T)(T value) {
import std.traits;
import std.typecons : rebindable;
static if (isPointer!T) {
// T is a pointer and decays naturally.
return value;
} else static if (isDynamicArray!T) {
// T is a dynamic array and decays naturally.
return value;
} else static if (!hasAliasing!(Unqual!T)) {
// T is a POD datatype - either a built-in type, or a struct with only POD members.
return cast(Unqual!T)value;
} else static if (is(T == class)) {
// Classes are reference types, so only the reference may be made head-mutable.
return rebindable(value);
} else static if (isAssociativeArray!T) {
// AAs are reference types, so only the reference may be made head-mutable.
return rebindable(value);
} else {
}
}
unittest {
const(int*[3]) a = [null, null, null];
assert(is(typeof(b) == const(int)*[3]));
}

Now, whenever we need a head-mutable variable to point to tail-const data, we can simply call headMutable() on the value we need to store. Unlike the ham-fisted approach of casting to Unqual!T, which may throw away important type information and also silences any error messages that may inform you of the foolishness of your actions, attempting to call headMutable() on a type that doesn’t support it will give an error message explaining what you tried to do and why it didn’t work (“Type T cannot be made head-mutable.”). The only thing missing now is a way to get the head-mutable type. Since headMutable() returns a value of that type, and is defined for all types we can convert to head-mutable, that’s a template one-liner:

import std.traits : ReturnType;
alias HeadMutable(T) = ReturnType!((T t) => t.headMutable());

Where Unqual returns a type with potentially the wrong semantics and only gives an error once you try assigning to it, HeadMutable disallows creating the type in the first place. The programmer will have to deal with that before casting or otherwise coercing a value into the variable. Since HeadMutable uses headMutable() to figure out the type, it also gives the same informative error message when it fails.

Lastly, since one common use case requires us to preserve the tail-const or tail-immutable properties of a type, it is beneficial to define a template that converts to head-mutable while propagating const or immutable using std.traits.CopyTypeQualifiers:

import std.traits : CopyTypeQualifiers;
alias HeadMutable(T, ConstSource) = HeadMutable!(CopyTypeQualifiers!(ConstSource, T));

This way, immutable(MyStruct!int) can become MyStruct!(immutable int), while the const version would propagate constness instead of immutability.

### Example Code

Since the pattern for range functions in Phobos is to have a constructor function (e.g. map) that forwards its arguments to a range type (e.g. MapResult), the code changes required to use headMutable() are rather limited. Likewise, user code should generally not need to change at all in order to use headMutable(). To give an impression of the code changes needed, I have implemented map and equal:

import std.range;

// Note that we check not if R is a range, but if HeadMutable!R is
auto map(alias Fn, R)(R range) if (isInputRange!(HeadMutable!R)) {
// This is basically the extent to which code that uses head-mutable data types will need to change.
}

struct MapResult(alias Fn, R) {
R range;

this(R _range) {
range = _range;
}

void popFront() {
range.popFront();
}

@property
auto ref front() {
return Fn(range.front);
}

@property
bool empty() {
return range.empty;
}

static if (isBidirectionalRange!R) {
@property
auto ref back() {
return Fn(range.back);
}

void popBack() {
range.popBack();
}
}

static if (hasLength!R) {
@property
auto length() {
return range.length;
}
alias opDollar = length;
}

static if (isRandomAccessRange!R) {
auto ref opIndex(size_t idx) {
return Fn(range[idx]);
}
}

static if (isForwardRange!R) {
@property
auto save() {
return MapResult(range.save);
}
}

static if (hasSlicing!R) {
auto opSlice(size_t from, size_t to) {
return MapResult(range[from..to]);
}
}

// All the above is as you would normally write it.
// We also need to implement headMutable().
// Generally, headMutable() will look very much like this - instantiate the same
// type template that defines typeof(this), use HeadMutable!(T, ConstSource) to make
// the right parts const or immutable, and call headMutable() on fields as we pass
// them to the head-mutable type.
}
}

// Need to get head-mutable version of the parameters to iterate over them.
while (!_r1.empty && !_r2.empty) {
if (_r1.front != _r2.front) return false;
_r1.popFront();
_r2.popFront();
}
return _r1.empty && _r2.empty;
}

unittest {
// User code does not use headMutable at all:
const arr = [1,2,3];
const squares = arr.map!(a => a*a);
const squaresPlusTwo = squares.map!(a => a+2);
assert(equal(squaresPlusTwo, [3, 6, 11]));
}

(Note that these implementations are simplified slightly from Phobos code to better showcase the use of headMutable)

The unittest block shows a use case where the current Phobos map would fail—it is perfectly possible to create a const MapResult, but there is no way of iterating over it. Note that only two functions are impacted by the addition of headMutable(): map tests if HeadMutable!R is an input range and converts its arguments to head-mutable when passing them to MapResult, and MapResult needs to implement headMutable(). The rest of the code is exactly as you would otherwise write it.

The implementation of equal() shows a situation where implicit conversions would be beneficial. For const(int[]) the call to headMutable() is superfluous—it is implicitly converted to const(int)[] when passed to the function. For user-defined types however, this is not the case, so the call is necessary in the general case.

While I have chosen to implement a range here, ranges are merely the most common example of a place where headmutable would be useful; the idea has merits beyond ranges. Another type in the standard library that would benefit from headmutable is RefCounted!T: const(RefCounted!(T)) should convert to RefCounted!(const(T)).

### Why not Tail-Const?

In previous discussions of this problem, the solution has been described as tail-const, and a function tailConst() has been proposed. While this idea might at first seem the most intuitive solution, it has some problems, which together make headMutable() far superior.

The main problem with tailConst() is that it does not play well with D’s existing const system. It needs to be called on a mutable value, and there is no way to convert a const(Foo!T) to Foo!(const(T)). It thus requires that the programmer explicitly call tailConst() on any value that is to be passed to a function expecting a non-mutable value and, abstain from using const or immutable to convey the same information. This creates a separate world of tail-constness and plays havoc with generic code, which consequently has no way to guarantee that it won’t mutate its arguments.

Secondly, the onus is placed on library users to call tailConst() whenever they pass an argument anywhere, causing an inversion of responsibility: the user has to tell the library that it is not allowed to edit the data instead of the other way around. In the best case, this merely causes unnecessary verbiage. In other cases, the omission of const will lead to mutation of data expected to be immutable.

A minor quibble in comparison is that the tail-const solution also requires the existence of tailImmutable to cover the cases where the values are immutable.

## Issues

The ideas outlined in this document concern only conversion to head-mutable. A related issue is conversion to tail const, e.g. from RefCounted!T or RefCounted!(immutable T) to RefCounted!(const T), a conversion that, again, is implicit for arrays and pointers in D today.

One issue that may be serious is the fact that headMutable often cannot be @safe and may, in fact, need to rely on undefined behavior in some places. For instance, RefCounted!T contains a pointer to the actual ref count. For immutable(RefCounted!T), headMutable() would need to cast away immutable, which is undefined behavior per the spec.

### The Compiler Solution

It is logical to think that, as with built-in types, headMutable() could be elided in its entirety, and the compiler could handle the conversions for us. In many cases, this would be possible, and in fact the compiler already does so for POD types like struct S { int n; }—a const or immutable S may be assigned to a mutable variable of type S. This breaks down, however, when the type includes some level of mutable indirection. For templated types it would be possible to wiggle the template parameters to see if the resulting type compiles and has fields with the same offsets and similar types, but even such an intelligent solution breaks down in the presence of D’s Turing-complete template system, and some cases will always need to be handled by the implementer of a type.

It is also a virtue that the logic behind such an implementation be understandable to the average D programmer. The best case result of that not being true is that the forums would be inundated with a flood of posts about why types don’t convert the way users expect them to.

For these reasons, headMutable() will be necessary even with compiler support. But what would that support look like? Implicit casting to head-mutable happens in the language today in two situations:

• Assignment to head-mutable variables: const(int)[] a = create!(const(int[]))(); (all POD types, pointers and arrays)
• Function calls: fun(create!(const(int[]))(); (only pointers and arrays)

The first is covered by existing language features (alias headMutable this; fits the bill perfectly). The second is not but is equivalent to calling .headMutable whenever a const or immutable value is passed to a function that does not explicitly expect a const or immutable argument. This would change the behavior of existing code, in that templated functions would prefer a.headMutable over a, but would greatly improve the experience of working with const types that do define headMutable(). If headMutable is correctly implemented, the different choice of template instantiations should not cause any actual breakage.

### Future Work

While this document proposes to implement the described feature without any changes to the compiler or language, it would be possible for the compiler in the future to recognize headMutable() and call it whenever a type that defines that method is passed to a function that doesn’t explicitly take exactly that type, or upon assignment to a variable that matches headMutable()’s return value. This behavior would mirror the current behavior of pointers and arrays.

### Conclusion

It is possible to create a framework for defining head-mutable types in D today without compiler or language changes. It requires a little more code in the methods that use head-mutable types but offers a solution to a problem that has bothered the D community for a long time.

While this document deals mostly with ranges, other types will also benefit from this pattern: smart pointers and mutable graphs with immutable nodes are but two possible examples.

### Definitions

A type is head-mutable if some or all of its members without indirections are mutable. Note that a head-mutable datatype may also have const or immutable members without indirections; the requirement is merely that some subset of its members may be mutated. A head-mutable datatype may be tail-const, tail-immutable or tail-mutable—head-mutable only refers to its non-indirected members. Examples of head-mutable types include const(int)[], int*, string, and Rebindable!MyClass. Types without indirections (like int, float and struct S { int n; }) are trivially head-mutable.

##### Tail-const

A type is tail-const if some of its members with indirections have the const type qualifier. A tail-const type may be head-mutable or head-const. Examples of tail-const types are const(int)*, const(int[]), const(immutable(int)[])* and string.

### Source

The source code for HeadMutable and headMutable is available here.

# std.variant Is Everything Cool About D

Jared Hanson has been involved with the D community since 2012, and an active contributor since 2013. Recently, he joined the Phobos team and devised a scheme to make it look like he’s contributing by adding at least 1 tag to every new PR. He holds a Bachelor of Computer Science degree from the University of New Brunswick and works as a Level 3 Support Engineer at one of the largest cybersecurity companies in the world.

I recently read a great article by Matt Kline on how std::visit is everything wrong with modern C++. My C++ skills have grown rusty from disuse (I have long since left for the greener pastures of D), but I was curious as to how things had changed in my absence.

Despite my relative unfamiliarity with post–2003 C++, I had heard about the addition of a library-based sum type in std for C++17. My curiosity was mildly piqued by the news, although like many new additions to C++ in the past decade, it’s something D has had for years. Given the seemingly sensational title of Mr. Kline’s article, I wanted to see just what was so bad about std::visit, and to get a feel for how well D’s equivalent measures up.

My intuition was that the author was exaggerating for the sake of an interesting article. We’ve all heard the oft-repeated criticism that C++ is complex and inconsistent (even some of its biggest proponents think so), and it is true that the ergonomics of templates in D are vastly improved over C++. Nevertheless, the underlying concepts are the same; I was dubious that std::visit could be much harder to use than std.variant.visit, if at all.

For the record, my intuition was completely and utterly wrong.

## Exploring std.variant

Before we continue, let me quickly introduce D’s std.variant module. The module centres around the Variant type: this is not actually a sum type like C++’s std::variant, but a type-safe container that can contain a value of any type. It also knows the type of the value it currently contains (if you’ve ever implemented a type-safe union, you’ll realize why that part is important). This is akin to C++’s std::any as opposed to std::variant; very unfortunate, then, that C++ chose to use the name variant for its implementation of a sum type instead. C’est la vie. The type is used as follows:

import std.variant;

Variant a;
a = 42;
assert(a.type == typeid(int));
a += 1;
assert(a == 43);

float f = a.get!float; //convert a to float
assert(f == 43);
a /= 2;
f /= 2;
assert(a == 21 && f == 21.5);

a = "D rocks!";
assert(a.type == typeid(string));

Variant b = new Object();
Variant c = b;
assert(c is b); //c and b point to the same object
b /= 2; //Error: no possible match found for Variant / int

Luckily, std.variant does provide a sum type as well: enter Algebraic. The name Algebraic refers to algebraic data types, of which one kind is a “sum type”. Another example is the tuple, which is a “product type”.

In actuality, Algebraic is not a separate type from Variant; the former is an alias for the latter that takes a compile-time specified list of types. The values which an Algebraic may take on are limited to those whose type is in that list. For example, an Algebraic!(int, string) can contain a value of type int or string, but if you try to assign a string value to an Algebraic!(float, bool), you’ll get an error at compile time. The result is that we effectively get an in-library sum type for free! Pretty darn cool. It’s used like this:

alias Null = typeof(null); //for convenience
alias Option(T) = Algebraic!(T, Null);

Option!size_t indexOf(int[] haystack, int needle) {
foreach (size_t i, int n; haystack)
if (n == needle)
return Option!size_t(i);
return Option!size_t(null);
}

int[] a = [4, 2, 210, 42, 7];
Option!size_t index = a.indexOf(42); //call indexOf like a method using UFCS
assert(!index.peek!Null); //assert that index does not contain a value of type Null
assert(index == size_t(3));

Option!size_t index2 = a.indexOf(117);
assert(index2.peek!Null);

The peek function takes a Variant as a runtime argument, and a type T as a compile-time argument. It returns a pointer to T that points to the Variant’s contained value iff the Variant contains a value of type T; otherwise, the pointer is null.

Note: I’ve made use of Universal Function Call Syntax to call the free function indexOf as if it were a member function of int[].

In addition, just like C++, D’s standard library has a special visit function that operates on Algebraic. It allows the user to supply a visitor for each type of value the Algebraic may hold, which will be executed iff it holds data of that type at runtime. More on that in a moment.

## To recap:

• std.variant.Variant is the equivalent of std::any. It is a type-safe container that can contain a value of any type.
• std.variant.Algebraic is the equivalent of std::variant and is a sum type similar to what you’d find in Swift, Haskell, Rust, etc. It is a thin wrapper over Variant that restricts what type of values it may contain via a compile-time specified list.
• std.variant provides a visit function akin to std::visit which dispatches based on the type of the contained value.

With that out of the way, let’s now talk about what’s wrong with std::visit in C++, and how D makes std.variant.visit much more pleasant to use by leveraging its powerful toolbox of compile-time introspection and code generation features.

## The problem with std::visit

The main problems with the C++ implementation are that – aside from clunkier template syntax – metaprogramming is very arcane and convoluted, and there are very few static introspection tools included out of the box. You get the absolute basics in std::type_traits, but that’s it (there are a couple third-party solutions, which are appropriately horrifying and verbose). This makes implementing std::visit much more difficult than it has to be, and also pushes that complexity down to consumers of the library, which makes using it that much more difficult as well. My eyes bled at this code from Mr. Kline’s article which generates a visitor struct from the provided lambda functions:

template <class... Fs>

template <class F0, class... Frest>
{

using F0::operator();
};

template <class F0>
{

using F0::operator();
};

template <class... Fs>
auto make_visitor(Fs... fs)
{
}

Now, as he points out, this can be simplified down to the following in C++17:

template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };

template <class... Fs>
auto make_visitor(Fs... fs)
{
}

However, this code is still quite ugly (though I suspect I could get used to the elipses syntax eventually). Despite being a massive improvement on the preceding example, it’s hard to get right when writing it, and hard to understand when reading it. To write (and more importantly, read) code like this, you have to know about:

There’s a lot of complicated template expansion and code generation going on, but it’s hidden behind the scenes. And boy oh boy, if you screw something up you’d better believe that the compiler is going to spit some supremely perplexing errors back at you.

Note: As a fun exercise, try leaving out an overload for one of the types contained in your variant and marvel at the truly cryptic error message your compiler prints.

Here’s an example from cppreference.com showcasing the minimal amount of work necessary to use std::visit:

template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };

using var_t = std::variant<int, long, double, std::string>;
std::vector<var_t> vec = {10, 15l, 1.5, "hello"};

for (auto& v: vec) {
[](auto arg) { std::cout << arg << ' '; },
[](double arg) { std::cout << std::fixed << arg << ' '; },
[](const std::string& arg) { std::cout << std::quoted(arg) << ' '; },
}, v);
}

Note: I don’t show it in this article, but if you want to see this example re-written in D, it’s here.

Why is this extra work forced on us by C++, just to make use of std::visit? Users are stuck between a rock and a hard place: either write some truly stigmata-inducing code to generate a struct with the necessary overloads, or bite the bullet and write a new struct every time you want to use std::visit. Neither is very appealing, and both are a one-way ticket to Boilerplate Hell. The fact that you have to jump through such ridiculous hoops and write some ugly-looking boilerplate for something that should be very simple is just… ridiculous. As Mr. Kline astutely puts it:

The rigmarole needed for std::visit is entirely insane.

We can do better in D.

## The D solution

This is how the typical programmer would implement make_visitor, using D’s powerful compile-time type introspection tools and code generation abilities:

import std.traits: Parameters;

struct variant_visitor(Funs...)
{
Funs fs;
this(Funs fs) { this.fs = fs; }

static foreach(i, Fun; Funs) //Generate a different overload of opCall for each Fs
auto opCall(Parameters!Fun params) { return fs[i](params); }
}

auto make_visitor(Funs...)(Funs fs)
{
return variant_visitor!Funs(fs);
}

And… that’s it. We’re done. No pain, no strain, no bleeding from the eyes. It is a few more lines than the C++ version, granted, but in my opinion, it is also much simpler than the C++ version. To write and/or read this code, you have to understand a demonstrably smaller number of concepts:

• Templates
• Template argument packs
• static foreach

However, a D programmer would not write this code. Why? Because std.variant.visit does not take a callable struct. From the documentation:

Applies a delegate or function to the given Algebraic depending on the held type, ensuring that all types are handled by the visiting functions. Visiting handlers are passed in the template parameter list. (emphasis mine)

So visit only accepts a delegate or function, and figures out which one to pass the contained value to based on the functions’ signatures.

Why do this and give the user fewer options? D is what I like to call an anti-boilerplate language. In all things, D prefers the most direct method, and thus, visit takes a compile-time specified list of functions as template arguments. std.variant.visit may give the user fewer options, but unlike std::visit, it does not require them to painstakingly create a new struct that overloads opCall for each case, or to waste time writing something like make_visitor.

This also highlights the difference between the two languages themselves. D may sometimes give the user fewer options (don’t worry though – D is a systems programming language, so you’re never completely without options), but it is in service of making their lives easier. With D, you get faster, safer code, that combines the speed of native compilation with the productivity of scripting languages (hence, D’s motto: “Fast code, fast”). With std.variant.visit, there’s no messing around defining structs with callable methods or unpacking tuples or wrangling arguments; just straightforward, understandable code:

Algebraic!(string, int, bool) v = "D rocks!";
v.visit!(
(string s) => writeln("string: ", s),
(int    n) => writeln("int: ", n),
(bool   b) => writeln("bool: ", b),
);

And in a puff of efficiency, we’ve completely obviated all this machinery that C++ requires for std::visit, and greatly simplified our users’ lives.

For comparison, the C++ equivalent:

template<class... Ts> struct overloaded : Ts... { using Ts::operator()...; };

std::variant<std::string, int, bool> v = "C++ rocks?";
[](const std::string& s) { std::cout << s << '\n'; },
[](int  n) { std::cout << n << '\n'; },
[](bool b) { std::cout << b << '\n'; }
}, v);

Note: Unlike the C++ version, the error message you get when you accidentally leave out a function to handle one of the types is comprehendable by mere mortals. Check it out for yourself.

As a bonus, our D example looks very similar to the built-in pattern matching syntax that you find in many up-and-coming languages that take inspiration from their functional forebears, but is implemented completely in user code.

That’s powerful.

## Other considerations

As my final point – if you’ll indulge me for a moment – I’d like to argue with a strawman C++ programmer of my own creation. In his article, Mr. Kline also mentions the new if constexpr feature added in C++17 (which of course, D has had for over a decade now). I’d like to forestall arguments from my strawman friend such as:

But you’re cheating! You can use the new if constexpr to simplify the code and cut out make_visitor entirely, just like in your D example!

Yes, you could use if constexpr (and by the same token, static if in D), but you shouldn’t (and Mr. Kline explicitly rejects using it in his article).There are a few problems with this approach which make it all-around inferior in both C++ and D. One, this method is error prone and inflexible in the case where you need to add a new type to your variant/Algebraic. Your old code will still compile but will now be wrong. Two, doing it this way is uglier and more complicated than just passing functions to visit directly (at least, it is in D). Three, the D version would still blow C++ out of the water on readability. Consider:

//D
v.visit!((arg) {
alias T = Unqual!(typeof(arg)); //Remove const, shared, etc.

static if (is(T == string)) {
writeln("string: ", arg);
}
else static if (is(T == int)) {
writeln("int: ", arg);
}
else static if (is(T == bool)) {
writeln("bool: ", arg);
}
});

vs.

//C++
visit([](auto& arg) {
using T = std::decay_t<decltype(arg)>;

if constexpr (std::is_same_v<T, string>) {
printf("string: %s\n", arg.c_str());
}
else if constexpr (std::is_same_v<T, int>) {
printf("integer: %d\n", arg);
}
else if constexpr (std::is_same_v<T, bool>) {
printf("bool: %d\n", arg);
}
}, v);

Which version of the code would you want to have to read, understand, and modify? For me, it’s the D version – no contest.

If this article has whet your appetite and you want to find out more about D, you can visit the official Dlang website, and join us on the mailing list or #D on IRC at freenode.net. D is a community-driven project, so we’re also always looking for people eager to jump in and get their hands dirty – pull requests welcome!.

# Testing In The D Standard Library

Jack Stouffer is a member of the Phobos team and a contributor to dlang.org. You can check out more of his writing on his blog.

In the D standard library, colloquially named Phobos, we take a multi-pronged approach to testing and code review. Currently, there are five different services any addition has to go through:

1. The whole complier chain of tests: DMD’s and DRuntime’s test suite, and Phobos’s unit tests
2. A documentation builder
3. Coverage analysis
4. A style checker
5. And a community project builder/test runner

Using these, we’re able to automatically catch the vast majority of common problems that we see popping up in PRs. And we make regressions much less likely using the full test suite and examining coverage reports.

Hopefully this will provide some insight into how a project like a standard library can use testing in order to increase stability. Also, it can spark some ideas on how to improve your own testing and review process.

## Unit Tests

In D, unit tests are an integrated part of the language rather than a library
feature:

size_t sum(int[] a)
{
size_t result;

foreach (e; a)
{
result += e;
}

return result;
}

unittest
{
assert(sum([1, 2, 3]) == 6);
assert(sum([0, 50, 100]) == 150);
}

void main() {}

Save this as test.d and run dmd -unittest -run test.d. Before your main function is run, all of the unittest blocks will be executed. If any of the asserts fail, execution is terminated and an error is printed to stderr.

The effect of putting unit tests in the language has been enormous. One of the main ones we’ve seen is tests no longer have the “out of sight, out of mind” problem. Comprehensive tests in D projects are the rule and not the exception. Phobos dogfoods inline unittest blocks and uses them as its complete test suite. There are no other tests for Phobos than the inline tests, meaning for a reviewer to check their changes, they can just run dmd -main -unittest -run std/algorithm/searching.d (this is just for quick and dirty tests; full tests are done via make).

Every PR onto Phobos runs the inline unit tests, DMD’s tests, and the DRuntime tests on the following platforms:

• Windows 32 and 64 bit
• MacOS 32 and 64 bit
• Linux 32 and 64 bit
• FreeBSD 32 and 64 bit

This is done by Brad Roberts‘s auto-tester. As a quick aside, work is currently progressing to make bring D to iOS and Android.

### Idiot Proof

In order to avoid pulling untested PRs, we have three mechanisms in place. First, only PRs which have at least one Github review by someone with pull rights can be merged.

Second, we don’t use the normal button for merging PRs. Instead, once a reviewer is satisfied with the code, we tell the auto-tester to merge the PR if and only if all tests have passed on all platforms.

Third, every single change to any of the official repositories has to go through the PR review process. This includes changes made by the BDFL Walter Bright and the Language Architect Andrei Alexandrescu. We have even turned off pushing directly to the master branch in Github to make sure that nothing gets around this.

### Unit Tests and Examples

Unit tests in D solve the perennial problem of out of date docs by using the unit test code itself as the example code in the documentation. This way, documentation examples are part of the test suite rather than just some comment which will go out of date.

With this format, if the unit test goes out of date, then the test suite fails. When the tests are updated, the docs change automatically.

Here’s an example:

/**
* Sums an array of ints.
*
* Params:
*      a = the array to sum
* Returns:
*      The sum of the array.
*/
size_t sum(int[] a)
{
size_t result;

foreach (e; a)
{
result += e;
}

return result;
}

///
unittest
{
assert(sum([1, 2, 3]) == 6);
assert(sum([0, 50, 100]) == 150);
}

// only tests with a doc string above them get included in the
// docs
unittest
{
assert(sum([100, 100, 100]) == 300);
}

void main() {}

Run dmd -D test.d and it generates the following un-styled HTML:

Phobos uses this to great effect. The vast majority of examples in the Phobos documentation are from unittest blocks. For example, here is the documentation for std.algorithm.find and here is the unit test that generates that example.

This is not a catch all approach. Wholesale example programs, which are very useful when introducing a complex module or function, still have to be in comments.

### Protecting Against Old Bugs

Despite our best efforts, bugs do find their way into released code. When they do, we require the person who’s patching the code to add in an extra unit test underneath the buggy function in order to protect against future regressions.

## Docs

For Phobos, the documentation pages which were changed are generated on a test server for every PR. Developed by Vladimir Panteleev, the DAutoTest allows reviewers to compare the old page and the new page from one location.

For example, this PR changed the docs for two structs and their member functions. This page on DAutoTest shows a summary of the changed pages with links to view the final result.

## Coverage

Perfectly measuring the effectiveness of a test suite is impossible, but we can get a good rough approximation with test coverage. For those unaware, coverage is a ratio which represents the number of lines of code that were executed during a test suite vs. lines that weren’t executed.

DMD has built-in coverage analysis to work in tandem with the built-in unit tests. Instead of dmd -unittest -run main.d, do dmd -unittest -cov -run main.d and a file will be generated showing a report of how many times each line of code was executed with a final coverage ratio at the end.

We generate this report for each PR. Also, we use codecov in order to get details on how well the new code is covered, as well as how coverage for the whole project has changed. If coverage for the patch is lower than 80%, then codecov marks the PR as failed.

At the time of writing, of the 77,601 lines of code (not counting docs or whitespace) in Phobos, 68,549 were covered during testing and 9,052 lines were not. This gives Phobos a test coverage of 88.3%, which is increasing all of the time. This is all achieved with the built in unittest blocks.

## Project Tester

Because test coverage doesn’t necessarily “cover” all real world use cases and combinations of features, D uses a Jenkins server to download, build, and run the tests for a select number of popular D projects with the master branches of Phobos, DRuntime, and DMD. If any of the tests fail, the reviewers are notified.

## Style And Anti-Pattern Checker

Having a code style set from on high stops a lot of pointless Internet flame wars (tabs vs spaces anyone?) dead in their tracks. D has had such a style guide for a couple of years now, but its enforcement in official code repos was spotty at best, and was mostly limited to brace style.

Now, we use CircleCI in order to run a series of bash scripts and the fantastically helpful dscanner which automatically checks for all sorts of things you shouldn’t be doing in your code. For example, CircleCI will give an error if it finds:

• trailing whitespace
• using whole module imports inside of functions
• redundant parenthesis

And so on.

The automation of the style checker and coverage reports was done by Sebastian Wilzbach. dscanner was written by Brian Schott.

## Closing Thoughts

We’re still working to improve somethings. Currently, Sebastian is writing a script to automatically check the documentation of every function for at least one example. Plus, the D Style Guide can be expanded to end arguments over the formatting of template constraints and other contested topics.

Practically speaking, other than getting the coverage of Phobos up to >= 95%, there’s not too much more to do. Phobos is one of the most throughly tested projects I’ve ever worked on, and it shows. Just recently, Phobos hit under 1000 open bugs, and that’s including enhancement requests.

# Big Performance Improvement for std.regex

Dmitry Olshansky has been a frequent contributor to the D programming language. Perhaps his best known work is his overhaul of the std.regex module, which he architected as part of Google Summer of Code 2011. In this post, he describes an algorithmic optimization he implemented this past summer that resulted in a big performance win.

Optimizing std.regex has been my favorite pastime, but it has gotten harder over the years. It eventually became clear that micro-optimizing the engine’s state copy routine, or trying to avoid that extra write, wasn’t going to cut it anymore. To move further, I needed a new algorithmic improvement. This is how the so-called “Bit-NFA” came to be implemented. Developed in May of this year, it has come a long way to land in the main repository.

Before going into details, a short overview of the engine is called for. A user-specified pattern given to the engine first goes through a compilation process, where it gets transformed into a bytecode program, along with a bunch of lookup tables and auxiliary data-structures. Bytecode implies a VM, not unlike, say, the Java VM, but far simpler and more specific. In fact, there are two of them in std.regex, one that evaluates execution of threads in a backtracking manner and another one which evaluates all threads in lock-step, resolving any duplicates along the way.

Now, running a full blown VM, even a tiny one, on each character of input doesn’t sound all that high-performance. That’s why there is an extra trick, a kickstart engine (it should probably be called a “sidekick engine”), which is a dumb approximation of the full engine. It is run over the input first. When it spots something that looks like a match, the full engine is run to check it. The only requirement is that it can have no false negatives. That is, it has to detect as positive all matches of the regex pattern. This kickstart engine is the central piece of today’s post.

Historically, I intended for there to be a lot of different kickstart engines, ranging from a simple ‘memchr the first byte of the pattern’ to a Boyer-Moore search on the prefix of a pattern. But during the gory days of GSOC 2011, a simple solution came first and out-shadowed all others: the Shift Or algorithm.

Basically, it is an NFA (Nondeterministic Finite Automation), where each state is a bit in a word. Shifting this word advances all the states. Masking removes those that don’t match the current character. Importantly, shifting also places 0 as the first bit, indicating the active state.

With these two insights, the whole process of searching for a string becomes shifting + OR-masking the bits. The last point is checking for the successful match – one of the bits is in the finish state.

Looking at this marvelous construction, it’s tempting to try and overcome its limitation – the straight-forward execution of states. So let’s introduce some control flow by denoting some bits as jumps. To carry out a jump, we just need to map every combination of jump bits to the mask of the resulting positions. A basic hashmap could serve us well in this regard. Then the cycle becomes:

1. Shift the word
2. Capture control flow bits
3. Lookup control flow table
4. Mask AND with control flow bits
5. Check for finish state(s) bits
6. Mask OR with match filter table

In the end, we execute the whole engine with nothing more than a hash-map lookup, a table lookup, and a bit of bitwise operations. This is the essence of what I call the Bit-NFA engine.

Of course, there are some tricky bits, such as properly mapping bytecodes to bits. Then there comes Unicode.. oh gosh. The trick to Unicode, though, is having a fast path for ASCII < 0x80 and the rest. For ASCII, we just go with a simple table. Unicode is a two-staged variation of it. Two-staging the table let’s us coalesce identical pages, saving space for the whole 21 bits of the code point range.

Overall, the picture really is worth a thousand words. Here is how the new kickstart engine stacks up.

This now only leaves us to optimize the VMs further. The proven technique is JIT-ing the bytecode and is what top engines are doing. Still, I’m glad there are notable tricks to speed up regex execution in general without pulling out this heavy handed weapon.

# GSoC Report: std.experimental.xml

Lodovico Giaretta is currently pursuing a Bachelor Degree in Computer Science at the University of Trento, Italy. He participated in Google Summer of Code 2016, working on a new XML module for D’s standard library, Phobos.

I started coding in high school with Pascal. I immediately fell in love with programming, so I started studying it by myself and learned both Java and C++. But when I was using Java, I was missing the powerful metaprogramming facilities and the low level features of C++. When I was using C++, I was missing the simplicity and usability of Java. So I started looking for a language that “filled the gap” between these two worlds. After looking into many languages, I finally found D. Despite being more geared towards C++, D provides a very high level of productivity, as correct code is easier to read and write. As an example, I was programming in D for several months before I was bitten by a segfault for the first time. It easily became one of my favorite languages.

The apparent lack of libraries, my lack of time, and the need to use other languages for university projects made me forget D for some time, at least until someone told me about Google Summer of Code. When I discovered that the D Foundation was participating, I immediately decided to take part and found that there was the need for a new XML library. So I contacted Craig Dillabaugh and Robert Schadek and started to plan my adventure. I want to take this occasion to thank them for their great continuous support, and the entire community for their feedback and help.

This was my first public codebase and my first contribution to a big open source project, so I didn’t really know anything about project management. The advice about this field from my mentor Robert has been fundamental for my success; he helped me improve my workflow, keep my efforts focused towards the goal, and set up correctness tests and performance benchmarks. Without his help, I would never have been able to reach this point.

The first thing to do when writing a library is to pick a set of principles that will guide development. This choice is what will give the library its peculiar shape, and by having a look around one finds that there are XML libraries that want to be minimal in terms of codebase size, or very small in terms of binary size, or fully featured and 101% adherent to the specification. For std.experimental.xml, I decided to focus on genericity and extensibility. The processing is divided in many small, quite simple stages with well-defined interfaces implemented by templated components. The result is a pipeline that is fully customizable; you can add or substitute components anywhere, and add custom validation steps and custom error handlers.

From an XML library, a programmer expects different high level constructs: a SAX parser, a DOM parser, a DOM writer and maybe some extensions like XPath. He also expects to be able to process different kinds of input and, for std.experimental.xml, to “hack in” his own logic in the process. This requires a simple, yet very flexible, intermediate representation, which is produced by the parsing stage and can be easily manipulated, validated, and transformed into whatever high-level construct is needed. For this, I chose a concept called Cursor, a pointer inside an XML document, which can be queried for properties of a given XML node or advanced to a subsequent one. It’s akin to Java’s StAX (Streaming API for XML), from which I took inspiration. In std.experimental.xml, all validations and transformations are implemented as chains of Cursors, which are then usually processed by a SAX parser or a DOM builder, but can also be used directly in user code, providing more control and speed.

Talking about speed, which in XML processing can be very important, I have to admit that I didn’t spend much time on optimization, leaving a lot of space for future performance improvements. Yet, the library is fast enough to guarantee that, for big files (where performance matters), an SSD (Solid State Drive) is needed to move the bottleneck from the fetching to the processing of the data. Being this is an extensible and configurable library, the user can choose his tradeoffs with fine granularity, trading input validation and higher level constructs for speed at will.

To conclude, the GSoC is finished, but the library is not. Although most parts are there, some bits are still missing. As a new university semester has started, time is becoming a rare and valuable resource, but I’ll do my best to finish the work in a short time so that Phobos can finally have a modern XML library to be proud of. I also have a plan to add more advanced functionality, like XML Schemas and XPath, but I don’t know when I’ll manage to work on that, as it is quite a lot to do.

# Find Was Too Damn Slow, So We Fixed It

This is a guest post from Andreas Zwinkau, a problem solving thinker, working as a doctoral researcher at the IPD Snelting within the InvasIC project on compiler and language perfection. He manages and teaches students at the KIT. With a beautiful wife and two jolly kids, he lives in Karlsruhe, Germany.

Please throw this hat into the ring as well, Andrei wrote when he submitted the winning algorithm. Let me tell you about this ring and how we made string search in D’s standard library, Phobos, faster. You might learn something about performance engineering and benchmarking. Maybe you’ll want to contribute to Phobos yourself after you read this.

The story started for me when I decided to help out D for some recreational programming. I went to the Get Involved wiki page. There was a link to the issue tracker for preapproved stuff and there I found issue 9646, which was about splitter being too slow in a specific case. It turned out that it was actually find, called inside splitter, which was slow. The find algorithm searches for one string (the needle) inside another (the haystack). It returns a substring of the haystack, starting with the first occurrence of the needle and ending with the end of the haystack. Find being slow meant that a naively implemented find (two nested for loops) was much faster than what the standard library provided.

So, let’s fix find!

Before I started coding, the crucial question was: How will I know I am done? At that moment, I only had one test case from the splitter issue. I had to ensure that any solution was fast in the general case. I wanted to fix issue 9646 without making find worse for everybody else. I needed a benchmark.

As a first step, I created a repository. It initially contained a simple program which compared Phobos’s find, a naive implementation from issue 9646, and a copy from Phobos I could modify and tune. My first change: insert the same naive algorithm into my Phobos copy. This was the Hello World of bugfixing and proved two things:

1. I was working on the correct code. Phobos contained multiple overloads of find and I wanted to work on the right one. For example, there was an indirection which cast string into a ubyte array to avoid auto decoding.
2. It was not an issue of meta programming. Phobos code is generic and uses D’s capabilities for meta programming. This means the compiler is responsible for specializing the generic code to every specific case. Fixing that would have required changing the compiler, but not the standard library.

At this point I knew which specific lines I needed to speed up and I had a benchmark to quickly see the effects of my changes. It was time to get creative, try things, and find a better find.

For a start, I tried the good old classic Boyer-Moore, which the standard library provides but wasn’t using for find. I quickly discarded it, as it was orders of magnitude slower in my benchmark. Gigabytes of data are probably needed to make that worthwhile with a modern processor.

I considered simply inserting the naive algorithm. It would have fixed the problem. On the other hand, Phobos contained a slightly more advanced algorithm which tried to skip elements. It first checks the end of the needle and, on a mismatch, we can advance the needle its whole length if the end element does not appear twice in the needle. This requires one pass over the needle initially to determine the step size. That algorithm looked nice. Someone had probably put some thought into it. Maybe my benchmark was biased? To be safe, I decided to fix the performance problem without changing the algorithm (too much).

How? Did the original code have any stupid mistakes? How else could you fix a performance problem without changing the whole algorithm?

One trick could be to use D’s meta programming. The code was generic, but in certain cases we could use a more efficient version. D has static-if, which means we could switch between the versions at compile time without any runtime overhead.

static if (isRandomAccessRange!Needle) {
// new optimized algorithm
} else {
// old algorithm
}

The main difference from the old algorithm was that we could avoid creating a temporary slice to use startsWith on. Instead, a simple for-loop was enough. The requirement was that the needle must be a random access range.

When I had a good version, the time was ripe for a pull request. Of course, I had to fix issues like style guide formatting before the pull request was acceptable. The D community wants high-quality code, so the autotester checked my pull request by running tests on different platforms. Reviewers checked it manually.

Meanwhile in the forum, others chimed in. Chris and Andrei proposed more algorithms. Since we had a benchmark now, it was easy to include them. Here are some numbers:

DMD:                       LDC:
std find:    178 ±32       std find:    156 ±33
manual find: 140 ±28       manual find: 117 ±24
qznc find:   102 ±4        qznc find:   114 ±14
Chris find:  165 ±31       Chris find:  136 ±25
Andrei find: 130 ±25       Andrei find: 112 ±26

You see the five mentioned algorithms. The first number is the mean slowdown compared to the fastest one on each single run. The annotated ± number is the mean absolute deviation. I considered LDC’s performance more relevant than DMD’s. You see manual, qznc, and Andrei find report nearly the same slowdown (117, 114, 112), and the deviation was much larger than the differences. This meant they all had roughly the same speed. Which algorithm would you choose?

We certainly needed to pick one of the three top algorithms and we had to base the decision on this benchmark. Since the numbers were not clear, we needed to improve the benchmark. When we ran it on different computers (usually an Intel i5 or i7) the numbers changed a lot.

So far, the benchmark had been generating a random scenario and then measuring each algorithm against it. The fastest algorithm got a speed of 100 and the others got higher numbers which measured their slowdown. Now we could generate a lot of different scenarios and measure the mean across them for each algorithm. This design placed a big responsibility on the scenario generator. For example, it chose the length of the haystack and the needle from a uniform distribution within a certain range. Was the uniform distribution realistic? Were the boundaries of the range realistic?

After discussion in the forum, it came down to three basic use cases:

1. Searching for a few words within english text. The benchmark has a copy of ‘Alice in Wonderland’ and the task is to search for a part of the last sentence.
2. Searching for a short needle in a short haystack. This corresponds to something like finding line breaks as in the initial splitter use case. This favors naive algorithms which do not require any precomputation or other overhead.
3. Searching in a long haystack for a needle which it doesn’t contain. This favors more clever algorithms which can skip over elements. To guarantee a mismatch, the generator inserts a special character into the needle, which we do not use to generate the haystack.
4. Just for comparison, the previous random scenario is still measured.

At this point, we had a good benchmark on which we could base a decision.

Please throw this hat into the ring as well.

Andrei found another algorithm in his magic optimization bag. He knew the algorithm was good in some cases, but how would it fare in our benchmark? What were the numbers with this new contender?

In short: Andrei’s second algorithm completely dominated the ring. It has two names in the plot: Andrei2 as he posted it and A2Phobos as I generalized it and integrated it into Phobos. In the plots you see those two algorithms always close to the optimal result 100.

It was interesting that the naive algorithm still won in the ‘Alice’ benchmark, but the new Phobos was really close. The old Phobos std was roughly twice as slow for short strings, which we already knew from issue 9646.

What did this new algorithm look like? It used the same nested loop design as Andrei’s first one, but it computed the skip length only on demand. This meant one more conditional branch, but modern branch predictors seem to handle that easily.

Here is the final winning algorithm. The version in Phobos is only slightly more generic.

T[] find(T)(T[] haystack, T[] needle) {
if (needle.length == 0) return haystack;
immutable lastIndex = needle.length - 1;
auto last = needle[lastIndex];
size_t j = lastIndex, skip = 0;
while (j < haystack.length) {
if (haystack[j] != last) {
++j;
continue;
}
immutable k = j - lastIndex;
// last elements match, check rest of needle
for (size_t i = 0; ; ++i) {
if (i == lastIndex)
return haystack[k..$]; // needle found if (needle[i] != haystack[k + i]) break; } if (skip == 0) { // compute skip length skip = 1; while (skip < needle.length && needle[$-1-skip] != needle[$-1]) { ++skip; } } j += skip; } return haystack[$ .. \$];
}

Now you might want to run the benchmark yourself on your specific architecture. Get it from Github and run it with make dmd or make ldc. We are still interested in results from a wide range of architectures.

For me personally, this was my biggest contribution to D’s standard library so far. I’m pleased with the community. I deserved all criticism and it was professionally expressed. Now we can celebrate a faster find and fix the next issue. If you want to help, the D community will welcome you!