Open Methods: From C++ to D

Posted on

Jean-Louis Leroy is not French, but Belgian. He got his first taste of programming from a HP-25 calculator. His first real programming language was Forth, where CTFE is pervasive. Later he programmed (a little) in Lisp and Smalltalk, and (a lot) in C, C++, and Perl. He now works for Bloomberg LP in New York. His interests include object-relational mapping, open multi-methods, DSLs, and language extensions in general.


Prelude


Earlier this year I attended C++Now, a major conference dedicated to C++. I listened to talks given by very bright people, who used all sorts of avant-garde C++ techniques to accomplish all sorts of feats at compile time. It was a constexpr party! However, at the end of the week I had severe doubts about the future of C++.

I’ll say this for the organizers, though: they were quite open minded. They reserved the largest auditorium for a two-hour presentation of competing languages, one every day. We had Haskell and Rust, and Ali Çehreli talked about D.

I knew next to nothing about D. You see, I learned to program in Forth. Later I did some Lisp programming just for fun. To me, the idea of CTFE was natural right off the bat. So when Ali talked about static if and mixins, he definitely got my attention.

In order to learn (and evaluate) D, I decided to reproduce parts of my C++ library yomm11. It implements open multi-methods and contains code that exercises the “interesting” parts of the language, both at compile time and run time. Initially, I thought I would just see how I could reimplement bits of yomm11, how nice (or ugly) the syntax for declaring methods would turn out to be. The result was satisfying. I would even say intoxicating. I ended up bringing the port to completion and I feel that the result–openmethods.d–is the best implementation of open methods I’ve crafted so far. And it’s all done in a library, relying only on existing language features.

But wait, what are open methods?

From Member to Free

Open methods are just like virtual functions, except that they are declared outside of a class hierarchy. They are often conflated with multi-methods, because they are frequently implemented together (as is the case with this library), but really these are two different concepts. The ‘open’ part is, I believe, the more important, so I will focus more on that in this article.

Here is an example of a virtual function:

interface Animal
{
  string kick();
}

class Dog : Animal
{
  string kick() { return "bark"; }
}

class Pitbull : Dog
{
  override string kick() { return super.kick() ~ " and bite"; }
}

void main()
{
  import std.stdio : writeln;
  Animal snoopy = new Dog, hector = new Pitbull;
  writeln("snoopy.kick(): ", snoopy.kick()); // bark
  writeln("hector.kick(): ", hector.kick()); // bark and bite
}

The direct equivalent, translated to open methods, reads like this:

import openmethods;
mixin(registerMethods);

interface Animal
{
}

class Dog : Animal
{
}

class Pitbull : Dog
{
}

string kick(virtual!Animal);

@method
string _kick(Dog dog) { return "bark"; }

@method
string _kick(Pitbull dog) { return next!kick(dog) ~ " and bite"; }

void main()
{
  updateMethods();
  import std.stdio : writeln;
  Animal snoopy = new Dog, hector = new Pitbull;
  writeln("snoopy.kick(): ", snoopy.kick()); // bark
  writeln("hector.kick(): ", hector.kick()); // bark an dbite
}

Let’s break it down.

  • The string kick() in interface Animal becomes the free function declaration string kick(virtual!Animal). The implicit this parameter becomes an explicit parameter, and its type is prefixed with virtual!, thus indicating that the parameter is used to resolve calls at run time.
  • The string kick() override in class Dog becomes the free function definition @method string _kick(Dog dog) { return "bark"; }. Three things here:
    • the override is preceded by the @method attribute
    • the function name is prefixed with an underscore
    • the implicit this argument is explicitly named: Dog dog
  • The same thing happens to the override in class Pitbull, with an extra twist: super.kick() becomes next!kick(dog)
  • The calls to kick in main become free function calls – although, incidentally, they could have remained unchanged, thanks to Uniform Function Call Syntax.
  • After importing the openmethods module, a mixin is called: mixin(registerMethods). It should be called in each module that imports openmethods. It matches method declarations and overrides. It also creates a kick(Animal) function (note: sans the virtual!), which is the entry point in the dynamic dispatch mechanism.
  • Finally, main calls updateMethods. This should be done before calling any method (typically first thing in main) and each time a library containing methods is dynamically loaded or unloaded.

Open Is Good

What does it gain us? Well, a lot. Now we can add polymorphic behavior to any class hierarchy without modifying it. In fact, this implementation even allows you to add methods to Object, in a matter of speaking. Because, of course, class Object is never modified.

Let’s take a more serious example. Suppose that you have written a nifty matrix math library. Matrices come in all sorts of flavors: diagonal, shallow, tri-diagonal, and of course dense (i.e. “normal” matrices). Depending on the exact nature of a matrix, you can optimize some operations. Transposing a diagonal or a symmetric matrix amounts to returning it, unchanged. Adding sparse matrices does not require adding thousands of zeroes; and so on. And you have exploited all these properties in your matrix library, varying the behavior by means of virtual functions.

Neat.

Now let me ask you a question: should you provide a print function? A persist function?

Almost certainly not. For starters, there are many ways to display a matrix. If it is sparse, you may want to show only the non-zero elements… or all of them. You may want to display the null matrix as [0]… or in full. It is the privilige of the application to decide what matrices should look like on screen or paper. The matrix library should do the maths, and the application should do the presentation. If it needs to display matrices at all, that is. In game programming, there may be no need to display matrices. However, if you provide a print function, given the way they are implemented, the print or the persist code will always be pulled in from the library. Not good.

Now the application programmer will have to write his print and persist functions, but immediately he will be facing a problem: certainly he wants to vary the behavior according to the exact matrix type; he wants polymorphism! So he will probably end up coding a set of type switches.

Open methods solve this problem more neatly:

void print(virtual!Matrix m);

@method
void _print(Matrix m)
{
  const int nr = m.rows;
  const int nc = m.cols;
  for (int i = 0; i < nr; ++i) {
    for (int j = 0; j < nc; ++j) {
      writef("%3g", m.at(i, j));
    }
    writeln();
  }
}

@method
void _print(DiagonalMatrix m)
{
  import std.algorithm;
  import std.format;
  import std.array;
  writeln("diag(", m.elems.map!(x => format("%g", x)).join(", "), ")");
}

Accept No Visitors (c) Yuriy Solodkyy

A popular existing solution to this problem comes in the form of the Visitor pattern. Your matrix library could provide one, thus allowing the application writer to process different matrices according to their type.

In truth, Visitor is more an anti-pattern than a pattern, because the base class is aware of all its derived classes – something that flies in the face of all OOP design rules.

Here it is anyway:

import std.stdio;

interface Matrix
{
  interface Visitor
  {
    void visit(DenseMatrix m);
    void visit(DiagonalMatrix m);
  }

  void accept(Visitor v);
}

class DenseMatrix : Matrix
{
  void accept(Visitor v) { v.visit(this); }
}

class DiagonalMatrix : Matrix
{
  void accept(Visitor v) { v.visit(this); }
}

class PrintVisitor : Matrix.Visitor
{
  this(File of) { this.of = of; }

  void visit(DenseMatrix m) { of.writeln("print a DenseMatrix"); }
  void visit(DiagonalMatrix m) { of.writeln("print a DiagonalMatrix"); }

  File of;
}

void main()
{
  Matrix dense = new DenseMatrix, diagonal = new DiagonalMatrix;
  auto printer = new PrintVisitor(stdout);
  dense.accept(printer);
  diagonal.accept(printer);
}

This approach is more verbose than using an open method, and it has a more fatal flaw: it is not extensible. Suppose that the user of your matrix library wants to add matrices of his own design. For example, a SparseMatrix. The Visitor will be of no help here. With open methods, on the other hand, the solution is available, simple, and elegant:

// from library

void print(virtual!Matrix m, File of);

@method
void _print(DenseMatrix m, File of)
{
  of.writeln("print a DenseMatrix");
}

@method
void _print(DiagonalMatrix m, File of)
{
  of.writeln("print a DiagonalMatrix");
}

// extend library

class SparseMatrix : Matrix
{
  // ...
}

@method
void _print(SparseMatrix m, File of)
{
  of.writeln("print a SparseMatrix");
}

Multiple Dispatch

Occasionally, there is a need to take into account the type of two or more arguments to select the appropriate behavior. This is called multiple dispatch. Most languages only support single dispatch in the form of virtual member functions. Once again, the “solution” involves type switches or visitors. A few languages address this situation directly by means of multi-methods. The most notorious example is the Common Lisp Object System. Recently, a string of new languages have native support for multi-methods: Clojure (unsurprising for a lispoid), Julia, Nice, Cecil, TADS (a language for developing text-based adventure games).

This library implements multi-methods as well. There is no limit to the number of arguments that can be adorned with the virtual! qualifier. They will all be considered during dynamic dispatch.

Continuing the matrix library example, you probably want to provide binary operations on matrices: addition, subtraction and multiplication. If both operands are matrices, you really want to pick the right algorithm depending on the respective types of both operands. There is no point wasting time on adding all the elements if both operands are diagonal matrices; adding the diagonals suffices. Crucially, adding two DiagonalMatrix objects should return a DiagonalMatrix, not a plain DenseMatrix. Adding a DiagonalMatrix and a TriDiagonalMatrix should return a TriDiagonalMatrix, etc.

With open multi-methods, there is no problem at all:

module matrix;

Matrix plus(virtual!Matrix, virtual!Matrix);

module densematrix;

@method
Matrix _plus(Matrix a, Matrix b)
{
  // fallback: add all elements, fetched via interface
  // return a DenseMatrix
}

@method
Matrix _plus(DenseMatrix a, DenseMatrix b)
{
  // add all elements, access representation directly
  // return a DenseMatrix
}

module diagonalmatrix;

@method
Matrix _plus(DiagonalMatrix a, DiagonalMatrix b)
{
  // just add the elements on diagonals
  // return a DiagonalMatrix
}

Once again, open methods make the library extensible. It is trivial to plug new types in:

module mymatrices;

@method
Matrix _plus(SparseMatrix a, SparseMatrix b)
{
  // just add the non-zero elements
  // return a SparseMatrix
}

@method
Matrix _plus(SparseMatrix a, DiagonalMatrix b)
{
  // still don't add all the zeroes
  // return a SparseMatrix
}

@method
Matrix _plus(DiagonalMatrix a, SparseMatrix b)
{
  return plus(b, a); // matrix addition is commutative
}

Implementation Notes and Performance

This implementation uses tables of pointers to select the appropriate function to call. The process is very similar to what happens when a regular, virtual member function is called.

Each class involved in method dispatch–either because it is used as a virtual argument in a method declaration, or because it inherits from a class or an interface used as a virtual argument–has an associated method table (mtbl). The pointer to the method table (mptr) associated to a given class is stored, by default, in the deallocator pointer of the class’s ClassInfo. The first entry in a class’s vtable contains a pointer to its ClassInfo. The deallocator pointer was used to implement the deprecated delete method, so it is reasonable to recycle it. The deallocator pointer may be removed some day, or one may want to use methods in conjunction with classes that implement delete, so an alternative is supported. Tagging a method with @mptr("hash") makes it fetch the method table pointer from an array indexed by a perfect integer hash calculated during updateMethods. In this case, finding the mptr amounts to multiplying the vptr’s value by an integer and applying a bit mask.

The method table contains one entry for each virtual parameter for each method. If the method has a single virtual argument, the entry contains the specialization’s address, just like an ordinary virtual function; otherwise, the entry contains a pointer to a row in a multi-dimensional dispatch table for the first argument, and integer indexes for the subsequent virtual arguments.

Since the set of methods applicable to a given class is known only at run time and may change in the presence of dynamic loading, the position of a method’s entries in the method table is not fixed; it is stored in a table associated with each method. Finally, in the presence of multiple dispatch, a per-method array of strides is used to convert the multi-dimensional index to a linear offset.

However, finding the specialization amounts to a few memory reads, additions and perhaps multiplications. As a result, open methods are almost as fast as virtual functions backed by the compiler. How much slower they are depends on several factors, including the compiler, or whether the call is issued from an interface or a class. The following table sums up some of my benchmarks. Rows come in groups of three: the “usual”, compiler-supported virtual member functions; the functional equivalent using open methods; and the cost, expressed as (method - virtual) / virtual:

mptr in deallocator dmd ldc2 gdc
vfunc (interface) 1.84 1.80 1.80
vs 1-method (interface) 10.73 3.53 6.05
delta% 484% 96% 236%
vfunc (class) 1.83 1.80 1.80
vs 1-method (class) 5.12 2.13 1.80
delta% 180% 18% 0%
double dispatch 4.11 2.40 2.13
2-method 7.75 3.14 3.40
delta% 88.45% 30.71 59.85

Times in nanoseconds, measured on my Asus ROG G751JT.

A few results stand out. The first is expected, the others are quite remarkable.

  1. gdc and ldc2 do a better job at optimizing method dispatch
  2. Method calls that take an object perform much better than those taking an interface; there may be some further improvements to be done here.
  3. Method calls from an object are almost as fast as plain virtual function calls when ldc2 is used; they are just as fast with gdc. The latter is surprising and calls for further investigation.
  4. Disappointingly, double dispatch beats binary methods. This is not the case in C++. My intuition is that extracting the method table pointer requires traversing too many indirections, to the point that it is more costly than a plain virtual function call. In contrast, yomm11 sticks the mptr right inside the object (but at the cost of requiring changes to the classes). This deserves further investigation, but I am convinced that a bit of help from the compiler (like reserving the second element of the vtbl for the mptr) would reverse this result.

Memory footprint is also a common concern when implementing table-based multiple dispatch: imagine a method with three virtual arguments, which can each be any of a dozen classes. This gives us a 12x12x12 table, containing 1728 function pointers. Fortunately, it is rare that a specialization is defined for each combination of arguments. Typically, there is a lot of duplication along each axis. This implementation takes advantages of this: it builds tables free of redundancies. The table is not “compressed” per se, as it never exists as a cartesian product of all the class sets; rather, it is built in terms of class partitions, not classes, where all the classes in the same group in the same dimension have the same set of candidate specializations. See
this article for an example.

Extending the Language – in D and in C++

Yomm11, the initial implementation of open methods in C++, takes 1845 lines of code (excluding comments) to implement; the D version weighs 1120 lines. Much of the difference is due to D’s ClassInfo. It contains information on the base class and inherited interfaces. It is used to build a bi-directional inheritance graph of the types that have methods attached to them.

C++’s type_info contains no such informaton, thus yomm11 comes with its own runtime class information system, and a macro that the user must call for each class participating in method dispatch. The usual difficulties with static constructors arise, and necessitates extra code to handle them.

Yomm11 can be used in two modes: intrusive and orthogonal. In the intrusive mode, the user augments the classes using macro calls. One of them allocates a method table pointer in the object; the other–called in each constructor–initializes the method pointer. In the orthogonal mode, no modification of the classes is required: the method pointer is stored in a hash map keyed by the type_info obtained via the typeid operator.

openmethods.d has two modes, too, but they are both orthogonal. The default mode stores the method pointer in the deallocator field of the ClassInfo. The ClassInfo of an object is available as the first pointer of the virtual function table; all this is documented. However, hijacking deallocator is a bit like cheating, and nothing guarantees that that field will be there forever.

For that reason, the library supports another mode, which is only slightly slower than the first: store the method pointer in an array indexed by a perfect integer hash of the virtual table pointer.

Unfortunately, it is not possible to use this approach in C++. It is possible to retrieve an object’s vptr, albeit by resorting to undocumented implementation details. However, the library needs to build the method tables without having instances of objects at hand; in D, on the other hand, the value of the vptr is available in the ClassInfo. Another idea would be to use a pointer to the type_info structure; alas, while a type_info can be obtained from a type as well as from an object, the standard explicitly states that the type_info object for a given type may not be unique.

Thus D provides at bit more information than C++, and that makes all the difference.

As for the meta-programming involved in processing the method declarations and specializations, it is easier, and yields a better syntax, in D than in C++, for several reasons.

Obviously, constructs like static if and foreach on type tuples make meta-programming easier. But the real advantage of D comes from the interplay
of template mixins, string mixins, compile-time reflection and alias. The mixin(registerMethods) incantation scans the entire translation unit and:

  • locates all the method declarations by detecting the functions that have virtual! in their signature
  • creates (via an alias created by a string mixin) a function with the same signature, minus the virtual qualifiers, which is what the user calls
  • finds all the method specializations (by locating the functions that have a @method attribute) and generates code that, at runtime, will register the specializations with the appropriate method

Conclusion

Object-oriented programming became popular in the nineties, but has been subjected to a lot of criticism in the last decade. This is in part because OOP promised modularity and extensibility, but failed to deliver. Instead we got “God” classes and Visitors. It is not the fault of the OOP paradigm per se, but rather of the unnatural and unnecessay fusion of class membership and polymorphism that most OO languages enforce. Open methods correct this mistake. As a bonus, this implementation also supports multiple dispatch. This is OOP done right: not objects “talking” to each other, but applying the appropriate algorithm depending on the arguments’ runtime types.

Open methods can be implemented as a library in C++ and in D, but D has a clear edge when it comes to meta-programming. As a result, the D version of the library delivers a lighter, cleaner syntax.

openmethods.d is available on dub

D as a Better C

Posted on

D was designed from the ground up to interface directly and easily to C, and to a lesser extent C++. This provides access to endless C libraries, the Standard C runtime library, and of course the operating system APIs, which are usually C APIs.

But there’s much more to C than that. There are large and immensely useful programs written in C, such as the Linux operating system and a very large chunk of the programs written for it. While D programs can interface with C libraries, the reverse isn’t true. C programs cannot interface with D ones. It’s not possible (at least not without considerable effort) to compile a couple of D files and link them in to a C program. The trouble is that compiled D files refer to things that only exist in the D runtime library, and linking that in (it’s a bit large) tends to be impractical.

D code also can’t exist in a program unless D controls the main() function, which is how the startup code in the D runtime library is managed. Hence D libraries remain inaccessible to C programs, and chimera programs (a mix of C and D) are not practical. One cannot pragmatically “try out” D by add D modules to an existing C program.

That is, until Better C came along.

It’s been done before, it’s an old idea. Bjarne Stroustrup wrote a paper in 1988 entitled “A Better C“. His early C++ compiler was able to compile C code pretty much unchanged, and then one could start using C++ features here and there as they made sense, all without disturbing the existing investment in C. This was a brilliant strategy, and drove the early success of C++.

A more modern example is Kotlin, which uses a different method. Kotlin syntax is not compatible with Java, but it is fully interoperable with Java, relies on the existing Java libraries, and allows a gradual migration of Java code to Kotlin. Kotlin is indeed a “Better Java”, and this shows in its success.

D as Better C

D takes a radically different approach to making a better C. It is not an extension of C, it is not a superset of C, and does not bring along C’s longstanding issues (such as the preprocessor, array overflows, etc.). D’s solution is to subset the D language, removing or altering features that require the D startup code and runtime library. This is, simply, the charter of the -betterC compiler switch.

Doesn’t removing things from D make it no longer D? That’s a hard question to answer, and it’s really a matter of individual preference. The vast bulk of the core language remains. Certainly the D characteristics that are analogous to C remain. The result is a language somewhere in between C and D, but that is fully upward compatible with D.

Removed Things

Most obviously, the garbage collector is removed, along with the features that depend on the garbage collector. Memory can still be allocated the same way as in C – using malloc() or some custom allocator.

Although C++ classes and COM classes will still work, D polymorphic classes will not, as they rely on the garbage collector.

Exceptions, typeid, static construction/destruction, RAII, and unittests are removed. But it is possible we can find ways to add them back in.

Asserts are altered to call the C runtime library assert fail functions rather than the D runtime library ones.

(This isn’t a complete list, for that see http://dlang.org/dmd-windows.html#switch-betterC.)

Retained Things

More importantly, what remains?

What may be initially most important to C programmers is memory safety in the form of array overflow checking, no more stray pointers into expired stack frames, and guaranteed initialization of locals. This is followed by what is expected in a modern language — modules, function overloading, constructors, member functions, Unicode, nested functions, dynamic closures, Compile Time Function Execution, automated documentation generation, highly advanced metaprogramming, and Design by Introspection.

Footprint

Consider a C program:

#include <stdio.h>

int main(int argc, char** argv) {
    printf("hello world\n");
    return 0;
}

It compiles to:

_main:
push EAX
mov [ESP],offset FLAT:_DATA
call near ptr _printf
xor EAX,EAX
pop ECX
ret

The executable size is 23,068 bytes.

Translate it to D:

import core.stdc.stdio;

extern (C) int main(int argc, char** argv) {
    printf("hello world\n");
    return 0;
}

The executable size is the same, 23,068 bytes. This is unsurprising because the C compiler and D compiler generate the same code, as they share the same code generator. (The equivalent full D program would clock in at 194Kb.) In other words, nothing extra is paid for using D rather than C for the same code.

The Hello World program is a little too trivial. Let’s step up in complexity to the infamous sieve benchmark program:

#include <stdio.h>

/* Eratosthenes Sieve prime number calculation. */

#define true    1
#define false   0
#define size    8190
#define sizepl  8191

char flags[sizepl];

int main() {
    int i, prime, k, count, iter;

    printf ("10 iterations\n");
    for (iter = 1; iter <= 10; iter++) {
        count = 0;
        for (i = 0; i <= size; i++)
            flags[i] = true;
        for (i = 0; i <= size; i++) {
            if (flags[i]) {
                prime = i + i + 3;
                k = i + prime;
                while (k <= size) {
                    flags[k] = false;
                    k += prime;
                }
                count += 1;
            }
        }
    }
    printf ("\n%d primes", count);
    return 0;
}

Rewriting it in Better C:

import core.stdc.stdio;

extern (C):

__gshared bool[8191] flags;

int main() {
    int count;

    printf("10 iterations\n");
    foreach (iter; 1 .. 11) {
        count = 0;
        flags[] = true;
        foreach (i; 0 .. flags.length) {
            if (flags[i]) {
                const prime = i + i + 3;
                auto k = i + prime;
                while (k < flags.length) {
                    flags[k] = false;
                    k += prime;
                }
                count += 1;
            }
        }
    }
    printf("%d primes\n", count);
    return 0;
}

It looks much the same, but some things are worthy of note:

  • extern (C): means use the C calling convention.
  • D normally puts static data into thread local storage. C sticks them in global storage. __gshared accomplishes that.
  • foreach is a simpler way of doing for loops over known endpoints.
  • flags[] = true; sets all the elements in flags to true in one go.
  • Using const tells the reader that prime never changes once it is initialized.
  • The types of iter, i, prime and k are inferred, preventing inadvertent type coercion errors.
  • The number of elements in flags is given by flags.length, not some independent variable.

And the last item leads to a very important hidden advantage: accesses to the flags array are bounds checked. No more overflow errors! We didn’t have to do anything
in particular to get that, either.

This is only the beginning of how D as Better C can improve the expressivity, readability, and safety of your existing C programs. For example, D has nested functions, which in my experience work very well at prying goto’s from my cold, dead fingers.

On a more personal note, ever since -betterC started working, I’ve been converting many of my old C programs still in use into D, one function at a time. Doing it one function at a time, and running the test suite after each change, keeps the program in a correctly working state at all times. If the program doesn’t work, I only have one function to look at to see where it went wrong. I don’t particularly care to maintain C programs anymore, and with -betterC there’s no longer any reason to.

The Better C ability of D is available in the 2.076.0 beta: download it and read the changelog.

On Tilix and D: An Interview with Gerald Nunn

Posted on

Joakim is the resident interviewer for the D Blog. He has also interviewed members of the D community for This Week in D and is responsible for the Android port of LDC.


Gerald Nunn is the developer of Tilix (formerly called Terminix), an advanced, open-source, Gtk3-based tiling terminal emulator, which is the most-starred D project on github, recently surpassing even the D reference compiler, DMD. Earlier this year at DConf in Berlin, he talked about how he chose D. His slides and a recorded video are available. In his day job, which has nothing to do with desktop GUI applications, he is a Senior Middleware Solutions Architect at Red Hat. [Read more about Gerald’s background in the extended interview — Ed.]

Joakim: What is a tiling terminal emulator?

Gerald: A tiling terminal emulator allows you to split the terminal into multiple tiles and rearrange them in any layout that makes the most sense for the particular task you are working on. People that work in multiple terminals simultaneously typically find them the most useful, particularly with ever-increasing monitor sizes and resolutions.

While the tiling is cool, the primary reason I created Tilix was that I wanted a terminal emulator that followed the Gnome Human Interface Guidelines (HIG) and used Client-Side Decorations (CSD). Tilix follows the Gnome HIG published here, which means adhering to spacing, layout and other recommendations. Following the HIG is important in order for your application to be consistent with the desktop experience as a whole.

CSD refers to the titlebar of the window, where the client assumes responsibility for it, rather than the display manager, and can populate the titlebar with buttons and other controls. This is part of the Gnome HIG and most default Gnome applications (gedit, files, videos, etc.) use this. The one exception is gnome-terminal which does not use the CSD at all.

Joakim: Can you give some examples of how you conform better to the Gnome HIG?

Gerald: The Gnome HIG specifies a specific design language with regards to how applications running in Gnome should look and feel. Some examples of Tilix following the HIG include the use of CSDs and application menus, following various guidelines such as spacing and layouts, etc. Additionally, the Gnome designers have put together a variety of mockups of how they think various applications should look. Tilix uses the mockups developed by Gnome designers for the terminal where feasible. For example, in Tilix the preferences and profiles dialog used to be separate, however one of the Gnome designers proposed this mockup for gnome-terminal. I went ahead and implemented it in Tilix, for a much better experience than what I had previously.

As a result of using CSDs and following the Gnome HIG, hopefully the experience of using Tilix in Gnome feels more organic to its users.

The interesting thing is the tension between people who use Tilix on Gnome and those who use it on other distributions. While I make no bones about the fact that Gnome is my primary target, I do try to make the experience better in other desktop environments by allowing the user to disable the CSD in favor of a normal titlebar if they so wish.

Joakim: You come to D from primarily a Java background. Do you still write Java-style code in D? Was that easy, i.e. how much did you have to change to write D instead?

Gerald: Yes, my background is in Java. I found it quite interesting at DConf when I asked how many people came from a non C/C++ background that only one other fellow raised his hand.

If you look at my code in Tilix, it does look a lot like Java code. Some of that is due to my background and some is due to GtkD being a class-based wrapper. I find switching between D and Java to be pretty seamless for the most part; there is much less cognitive friction between the two than say, switching between Java and Python.

The biggest D idiom I had to learn was ranges, since it is a pretty foundational feature of D. However that wasn’t overly complicated. Compile-Time Function Evaluation (CTFE) is still something that doesn’t come naturally. I have to look it up again every time I need to use it and my current attempts at CTFE from a code perspective are pretty ugh. I’d like to leverage CTFE more in Tilix as I continue to get more experience with D.

Finally, my lack of C experience means that the other area I struggle with a bit is interfacing with C code. While for the most part it’s pretty straightforward, when I have to start deciphering something complex, it gets a bit hoary. Support for flatpack is currently being held up, as I haven’t summoned up the mental energy to work through some of the issues I am having interfacing with C code.

Having said that, I do have some experience with native code development, as I spent a fair amount of time writing code in Delphi and Object Pascal many, many years ago. This is where most of my GUI experience comes from as well.

Joakim: From your DConf talk, you’re obviously not worried much about the Garbage Collector (GC). Have you had to think about it at all when you’re coding Tilix? Any issues with the GC causing stuttering in the GUI?

Gerald: Coming from Java, the GC is pretty natural for me and I definitely do not consider it a negative for D. I think the GC in D gets a lot of bad press based on Java experiences, but it’s important to remember that the GC in D is quite different than the one in Java. The biggest difference to me is that you have more options to control the GC, since it’s well understood when it can kick off a GC cycle. I’ve been very happy to see more people pushing back on the various reddit threads and forum posts complaining about the use of GC.

I haven’t had any issues with the GC in terms of pauses, and no Tilix users have opened cases about it. I have had a few GC-related issues, primarily around leaking memory due to holding references, but these have all been programmer error rather than an issue with D’s implementation of GC. I have another GTK D application, Visual Grep, where I ran into abysmal performance when loading a large amount of matches in a tight loop. However, simply disabling the GC for that section of the code sped things up immensely.

Joakim: The github repository for Tilix is remarkably clean, no open Pull Requests (PRs) and a low percentage of issues still open. How much time do you spend weekly on Tilix? Is Tilix strictly a hobby or has it become something more?

Gerald: Tilix is strictly a hobby. I probably spend 5 to 10 hours a week on it. At this point it’s a mature application, hence the relatively low number of issues. I also prioritize fixing bugs over adding new features, which helps keep the list manageable.

With regards to pull requests, I’m a firm believer in being responsive, so I will typically respond to a PR within a day or two. As a contributor myself, I know nothing kills interest like seeing your PR languish for weeks, months, or even years. If you want people to contribute, which I definitely do, then I feel you owe contributors the courtesy of responding in a timely fashion.

Now, Tilix is a relatively small project so it’s easy for me to adopt that approach. I can understand why larger projects may have more difficulty in this area.

Joakim: D has a lot of features, how well do you know it? You mentioned that you want to use some of the compile-time capabilities more: which D features do you think Tilix would benefit from in the future and how?

Gerald: I don’t think I know it that well to be honest, beyond the core set of features I use in Tilix. I’m always amazed at the guys on the forum that can argue the merits/drawbacks of the low-level details of the language. That’s definitely not me. I’d like to get better at it, but the reality is that I only use D as a hobby so I can’t invest the same amount of time I do with Java. In addition, my job at Red Hat has more of an infrastructure component than my previous jobs, so a lot of my learning time is spent ramping up on that side of the house.

In terms of features that Tilix would benefit from, I think using CTFE and ranges would be very beneficial for exposing some of the capabilities in GtkD in a more idiomatic way. I have a fair amount of code where it could be a lot more concise with appropriate usage of CTFE. As a simple example, using ranges to support iterating over various artifacts using foreach rather than a classical for loop. However, I think some of the more complex use cases, like supporting D-Bus and GObject, would be very useful.

For those not familiar with GObject, it’s the base level object in GTK and is reference-counted. Being able to create GObjects easily in D, like you can in Python, would make it much easier to interface with some of the APIs; right now doing so is the equivalent of writing it in raw C and it’s somewhat laborious. Mike Wey, the GtkD maintainer, has started doing some work on this.

Joakim: You mentioned at DConf that D has a quick edit-compile cycle: how do you enable that, i.e. what IDE, compilers, toolchain do you use, both for development and then for release?

Gerald: I use MS Visual Studio Code on Linux with the excellent code-d plugin, written by Jan “WebFreak” Jurzitza. It gives me all of the features I need (code completion, tooltip hints, etc.). The only thing I find missing from Java IDEs is a refactoring capability. For development, I use DMD as it has the fastest compile time. Release builds are done using LDC, the D compiler with an LLVM backend, since it generates a smaller and faster binary. I rarely need to fire up a debugger, but when I do I just use GDB from the command-line.

Joakim: What problems have you had with D? What features do you dislike?

Gerald: No major problems from my perspective. I’m generally very happy with the language and find it strikes the right balance between ease of use and capabilities. Most of my dislikes would be centered on the standard library, Phobos, rather than the language itself, and those dislikes directly correlate to lack of manpower.

No major issues with Phobos, but rather a bunch of irritants. For example, you cannot easily use immutable with send in std.concurrency, std.experimental.logger is still experimental, the json parser has issues if localization is set to use a comma instead of a period for decimals, etc., etc. None of them in and of themselves are showstoppers, and most of these are really about having the manpower to polish things up. I’m actually somewhat reluctant to complain about them because I could probably fix some of them myself and submit PRs.

I tend to get more annoyed about the negativity in the forums with regards to GC. I do feel that sometimes people get so wrapped up in what D needs for it to be a perfect systems language (i.e. no GC, memory safety, etc.), it gets overlooked that it is a very good language for building native applications as it is now. While D is often compared to Rust, in some ways the comparison to Go is more interesting to me. Both are GC-based languages and both started as systems languages, however Go pivoted and doubled down on the GC and has seen success. One of the Red Hat products I support, OpenShift, leverages Kubernetes (a Google project) for container orchestration and it’s written in Go.

I think D as a language is far superior to Go, and I wish we would toot our horn a little more in this regard instead of the constant negative discussion around systems programming. Now in fairness, Go has a large corporate sponsor whereas D does not, however the contrast in positioning is still interesting to me.

Joakim: What are your future plans for Tilix?

Gerald: The two biggest features I’d like to add is support for tmux control mode and adding the ability for the popout sidebar to be permanently displayed.

For those not familiar with tmux, it is a terminal multiplexer; it essentially does terminal tiling, but within the terminal itself. It also supports a number of other features, but the most interesting is keeping terminal sessions alive outside of the terminal. Since it does tiling within the terminal, there is a bit of a performance hit, plus from a GUI perspective it can’t leverage native widgets like scrollbars. To mitigate this, it supports something called control mode that enables it to integrate with a tiling terminal emulator, so that spawning of new terminals, i.e. tiling, is managed outside of tmux. This greatly improves its performance, while allowing users to leverage other features that tmux supports. At the moment, only iterm2 on OSX supports this AFAIK.

The sidebar in Tilix is one of the more controversial UI elements. I opted not to implement a tabbed interface, because I found them useless in terms of figuring out which tab I wanted; there simply isn’t enough room on the tab to distinguish them and manually renaming them is a pain. The sidebar is my attempt at an alternative, it renders a thumbnail of each session (aka tab) in a sidebar that can be popped out as needed to switch between sessions.

The Tilix sidebar in action.

Some people have a strong preference for something that is permanently available, unfortunately making the sidebar always visible isn’t easy, as generating the thumbnails takes a significant amount of time due to the way GTK is structured. There are potentially ways to make it work, but there is a time investment required to try different options to see what is feasible and then what is the most effective.

In terms of a dream feature, I’d love to switch to using a terminal emulator that is written natively in D, rather than the GTK VTE (Virtual Terminal Emulator) that is written in C that I’m using now. For those not familiar with it, the VTE is the terminal emulation widget used by Gnome Terminal and is available as a reusable widget. Many terminal emulators in Linux use this widget (gnome-terminal, guake, terminator, tilix, etc.), as it provides a production-ready emulator that has been through a huge amount of testing.

The downside with using it is that any custom features you want to implement that involve the actual terminal emulation layer require modifying the VTE and getting those modifications in upstream. I have a few patches that Tilix supports (for triggers and badges), but frankly I’ve done a poor job of getting them into upstream. Part of the reason for this is VTE is written in C and getting up to speed enough with C to make a quality patch is time-consuming.

So having the terminal emulation written in D would make this much easier, however it’s a huge time investment as people underestimate the amount of work involved. There’s a lot of edge cases in terminal emulation, plus adding all of the stuff to make it user-friendly (search, clicking links, etc.), it’s a lot to take on. I simply don’t have the time to make this a reality unless I win the lottery. If someone wanted to take that part on and create a GTK terminal emulation widget that has all of the needed features and stick with it for the long haul, I’d be happy to work with them to get it integrated with Tilix. Adam Ruppe has already created one that works quite well from my testing of it; if someone wanted to work on converting it to a GTK widget, add the necessary improvements and agree to maintain it, feel free to ping me. 🙂

Joakim: Please take us from your experience first discovering and using D to writing Tilix.

Gerald: Throughout my working career, I’ve always had hobby projects going on. Some things I worked on included a popular add-in for Delphi called Gexperts, a Windows file explorer replacement, a Java IDE called Gel and a popular Android app called OnTrack Diabetes that I sold a few years ago. A couple of years ago, I was looking for something new to work on as my hobby program and settled on the idea of building a desktop application for Linux. I knew it needed to use the GTK toolkit, since Gnome is my preferred desktop environment, and particularly with my past Delphi experience in GUIs which I could leverage. I also knew I wasn’t interested in coding in C or C++, so I had a look at what the alternatives were.

I started with Python, as it has excellent support for GTK and I had done a bit of Jython programming in WebLogic, as the Weblogic Scripting Tool (WLST) used it. However, most of my previous work had been small scripts and I quickly realized that at larger scales Python wasn’t for me. I really prefer statically-typed languages in general and Python’s dynamic typing drove me batty, particularly on a hobby program where I constantly needed to rebuild my mental stack since I worked on it infrequently.

I also looked at Rust and Go, but at the time neither of them had feature-complete GTK bindings. Also, while Rust had a lot of positive press, it had a formidable learning curve and I was less than convinced that its focus on memory safety via the borrow checker was a better approach than GC.

I was aware of D, as I had looked at D previously many years ago and liked the language, but the ecosystem was so weak it just wasn’t that useful for practical work. I gave it a second look and found that it had greatly improved and amazingly, full GtkD bindings were available. It also helped that D and Java are similar enough that picking up D was incredibly easy. Once I learned how ranges worked, it was easy to start cranking out code.

I first created a small application called Visual Grep that wraps grep into a GUI. When I was consulting, I often had to grep large code bases looking for specific patterns and a GUI that made browsing the matches was an absolute necessity. This was a great first application, as I learned quite a few things about D. The application performance was initially shit with large result sets, because the GC was constantly kicking in when loading results due to the constant allocation. Disabling the GC during that tight loop improved the performance immeasurably. I also learned about integrating GTK with D’s multi-threading capabilities.

I was inspired by the UI from Gnome Builder, an IDE, and thought that it would work quite well for a terminal emulator. Thus Tilix was born. Well, actually at first it was called Terminix, but once it started getting popular I received a polite cease-and-desist order from Terminix, the American pest control company. Being Canadian, I wasn’t overly aware of them, hence why I didn’t think too much about the name. Lesson learned: spend time choosing a good name in case your app does become more popular than you expect.

I’ve enjoyed my time with D and it’s been a great language for creating desktop applications in GTK. In many ways, I feel like D is a natural successor to Vala, which was a language created specifically for building GTK applications but has been slowly dying, largely due to its single focus. If you are not building GTK apps, you aren’t using Vala, which means the pool of people using it and working on it is by definition very small.

I also feel that D is a natural successor to Delphi, at least with GtkD, as a potent tool for creating desktop applications. With its fast compile time and as an easy-to-learn language, it brings many of Delphi’s best attributes into the modern age. Plus I can type { and } instead of Begin and End and increase my efficiency by 75% or so. 🙂

A DUB Case Study: Compiling DMD as a Library

Posted on

In his day job, Jacob Carlborg is a Ruby backend developer for Derivco Sweden, but he’s been using D on his own time since 2006. He is the maintainer of numerous open source projects, including DStep, a utility that generates D bindings from C and Objective-C headers, DWT, a port of the Java GUI library SWT, and DVM, the topic of another post on this blog. He implemented native Thread Local Storage support for DMD on OS X and contributed, along with Michel Fortin, to the integration of Objective-C in D.


DUB is the official build tool and package manager for the D programming language. Originally written and currently maintained by Sönke Ludwig as part of the vibe.d web framework, its acceptance as an official part of the D toolchain means it is now shipping with the most recent DMD and LDC compilers.

A Quick Introduction to DUB

If you have have the latest DMD or LDC installed, you already have DUB installed as well. If not, or if you want to check for a more recent version, you can get the very latest release, beta or release candidate from the DUB download page.

You can create a new DUB project by executing the dub init command. This will start an interactive setup that guides you through project creation.

  1. First decide the format of the package recipe. Two formats are supported: JSON and SDLang. Here we picked SDLang.
  2. Then specify the name of the project. Press enter to use the default name, which is displayed in brackets and is inferred from the directory
  3. Do the same for the description, author, license, copyright, and dependencies to select the default values
$ dub init foo
Package recipe format (sdl/json) [json]: sdl
Name [foo]:
Description [A minimal D application.]:
Author name [Jacob Carlborg]:
License [proprietary]:
Copyright string [Copyright © 2017, Jacob Carlborg]:
Add dependency (leave empty to skip) []:
Successfully created an empty project in '/Users/jacob/tmp/foo'.
Package successfully created in foo

After the setup has completed, the following files and directories will have been created:

$ tree foo
foo
├── dub.sdl
└── source
    └── app.d

1 directory, 2 files
  • dub.sdl is the package recipe file, which provides instructions telling DUB how to build the package
  • source is the default path where DUB looks for D source files
  • app.d contains the main function and is an example Hello World generated by DUB with the following content:
import std.stdio;

void main()
{
	writeln("Edit source/app.d to start your project.");
}

The content of the dub.sdl file is the following:

name "foo"
description "A minimal D application."
authors "Jacob Carlborg"
copyright "Copyright © 2017, Jacob Carlborg"
license "proprietary"

All of which was taken from what we specified during project creation. By default, DUB looks for D source files in either source or src directories and compiles all files it finds there and in any subdirectories.

To build and run the application, navigate to the project’s root directory, foo in this case, and invoke dub:

$ dub
Performing "debug" build using dmd for x86_64.
foo ~master: building configuration "application"...
Linking...
Running ./foo
Edit source/app.d to start your project.

To build without running, invoke dub build:

$ dub build
Performing "debug" build using dmd for x86_64.
foo ~master: building configuration "application"...
Linking...

Case Study: DMD as a Library

Recently there has been some progress in making the D compiler (DMD) available as a library. Razvan Nitu has been working on it as part of his D Foundation scholarship at the University Politechnica of Bucharest. He gave a presentation at DConf 2017 (a video of the talk is available, as well as examples in the DMD repository). So I had the idea that as part of the DConf 2017 hackathon I could create a simple DUB package for DMD to make only the lexer and the parser available as a library, something his work has made possible.

Currently DMD is built using make. There are three Makefiles, one for Posix, one for 32-bit Windows and one for 64-bit Windows  (which is only a wrapper of the 32-bit one). I don’t intend to try to completely replicate the Makefiles as a DUB package (they contain some additional tasks besides building the compiler), but instead will start out fresh and only include what’s necessary to build the lexer and parser.

DMD already has all the source code in the src directory, which is one of the directories DUB searches by default. If we would leave it as is, DUB would include the entirety of DMD, including the backend and other parts we don’t want to include at this point.

The first step is to create the DUB package recipe file. We start simple with only the metadata (here using the SDLang format):

name "dmd"
description "The DMD compiler"
authors "Walter Bright"
copyright "Copyright © 1999-2017, Digital Mars"
license "BSL-1.0"

When we have this we need to figure out which files to include in the package. We can do this by invoking DMD with the -deps flag to generate the imports of a module. A good start is the lexer, which is located in src/ddmd/lexer.d. We run the following command to output the imports that lexer.d is using:

$ dmd -deps=deps.txt -o- -Isrc src/ddmd/lexer.d

This will write a file named deps.txt containing all the imports used by lexer.d. The -o- flag is used to tell the compiler not to generate any code. The -I flag is used to add an import path where the compiler will look for additional modules to import (but not compile). An example of the output looks like this (the long path names have been reduced to save space):

core.attribute (druntime/import/core/attribute.d) : private : object (druntime/import/object.d)
object (druntime/import/object.d) : public : core.attribute (druntime/import/core/attribute.d):selector
ddmd.lexer (ddmd/lexer.d) : private : object (druntime/import/object.d)
core.stdc.ctype (druntime/import/core/stdc/ctype.d) : private : object (druntime/import/object.d)
ddmd.root.array (ddmd/root/array.d) : private : object (druntime/import/object.d)
ddmd.root.array (ddmd/root/array.d) : private : core.stdc.string (druntime/import/core/stdc/string.d)

The most interesting part of this output, in this case, is the first column, which consists of a long list of module names. What we are interested in here is a unique list of modules that are located in the ddmd package. All modules in the core package are part of the D runtime and are already precompiled as a library and automatically linked when compiling a D executable, so these modules don’t need to be compiled. The modules from the ddmd package can be extracted with some search-and-replace in your favorite text editor or using some standard Unix command lines tools:

$ cat deps.txt | cut -d ' ' -f 1 | grep ddmd | sort | uniq
ddmd.console
ddmd.entity
ddmd.errors
ddmd.globals
ddmd.id
ddmd.identifier
ddmd.lexer
ddmd.root.array
ddmd.root.ctfloat
ddmd.root.file
ddmd.root.filename
ddmd.root.hash
ddmd.root.outbuffer
ddmd.root.port
ddmd.root.rmem
ddmd.root.rootobject
ddmd.root.stringtable
ddmd.tokens
ddmd.utf

Here we can see that a set of modules is located in the nested package ddmd.root. This package contains common functionality used throughout the DMD source code. Since it doesn’t have any dependencies on any code outside the package it’s a good fit to place in a DUB subpackage. This can be done using the subPackage directive, as follows:

subPackage {
  name "root"
  targetType "library"
  sourcePaths "src/ddmd/root"
}

We specify the name of the subpackage, root. The targetType directive is used to tell DUB whether it should build an executable or a library (though it’s optional — DUB will build an executable if it finds an app.d in the root of the source directory and a library if it doesn’t). Finally, sourcePaths can be used to specify the paths where DUB should look for the D source files if neither of the default directories is used. Fortunately, we want to include all the files in the src/ddmd/root, so using sourcePaths works perfectly fine.

We can verify that the subpackage works and builds by invoking:

$ dub build :root
Building package dmd:root in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:root ~master: building configuration "library"...

:package-name is shorthand that tells DUB to build the package-name subpackage of the current package, in our case the root subpackage.

After removing all the modules from the root package from the initial list of dependencies, the following modules remain:

ddmd.console
ddmd.entity
ddmd.errors
ddmd.globals
ddmd.id
ddmd.identifier
ddmd.lexer
ddmd.tokens
ddmd.utf

The next step is to create a subpackage for the lexer containing the remaning modules.

subPackage {
  name "lexer"
  targetType "library"
  sourcePaths

Again we start by specifying the name of the subpackage and that the target type is a library. Specifying sourcePaths without any value will set it to an empty list, i.e. no source paths. This is done because there are more files than we want to include in this subpackage in the source directory.

sourceFiles \
    "src/ddmd/console.d" \
    "src/ddmd/entity.d" \
    "src/ddmd/errors.d" \
    "src/ddmd/globals.d" \
    "src/ddmd/id.d" \
    "src/ddmd/identifier.d" \
    "src/ddmd/lexer.d" \
    "src/ddmd/tokens.d" \
    "src/ddmd/utf.d"

The above specifies all source files that should be included in this subpackage. The difference between sourcePaths and sourceFiles is that sourcePaths expects a whole directory of source files that should be included, where sourceFiles lists only the individual files that should be included. A list in SDLang is written by separating the items with a space. The backslash (\) is used for line continuation, making it possible spread the list across multiple lines.

The final step of the lexer subpackage is to add a dependency on the root subpackage. This is done with the dependency directive:

dependency "dmd:root" version="*"
}

The first parameter for the dependency directive is the name of another DUB package. The colon is used to separate the package name from the subpackage name. The version attribute is used to specify which version the package should depend on. The * is used to indicate that any version of the dependency matches, i.e. the latest version should always be used. When implementing subpackages in any given package, this is generally what should be used. External projects that depend on any DUB package should specify a SemVer version number corresponding to a known release version.

If we build the lexer subpackage now it will result in an error:

$ dub build :lexer
Building package dmd:lexer in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:lexer ~master: building configuration "library"...
src/ddmd/globals.d(339,21): Error: need -Jpath switch to import text file VERSION
dmd failed with exit code 1.

Looking at the file and line of the error shows that it contains the following code:

_version = (import("VERSION") ~ '\0').ptr;

This code contains an import expression. Import expressions differ from import statements (e.g. import std.stdio;) in that they take a file from the file system and insert its contents into the current module. It’s just as if you copied and pasted the contents yourself. Using an import expression requires that the path where the file is imported from be passed to the compiler as a security mechanism. This can be done using the -J flag. In this case, we want to use the package root, where we are executing DUB, so we can use a single dot: “.“. Passing arbitrary flags to the compiler can be done with the dflags build setting, as follows:

dflags "-J."

Add that to the lexer subpackage configuration and it will compile correctly:

$ dub build :lexer
Building package dmd:lexer in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:lexer ~master: building configuration "library"...

For the final subpackage, we have the parser. The parser is located in src/ddmd/parse.d. To get its dependencies we can use the same approach we used for the lexer. But we will filter out all files that are part of the other subpackages:

$ dmd -deps=deps.txt -Isrc -J. -o- src/ddmd/parse.d
$ cat deps.txt | cut -d ' ' -f 1 | grep ddmd | grep -E -v '(root|console|entity|errors|globals|id|identifier|lexer|tokens|utf)' | sort | uniq
ddmd.parse

Here, we’re supplying the -v flag to grep to filter the results and the -E flag to enable extended regular expressions. All modules from the root package and all modules from the lexer subpackage are filtered out and the only remaining module is the ddmd.parse module.

The subpackage for the parser will look similar to the other subpackages:

subPackage {
  name "parser"
  targetType "library"
  sourcePaths

  sourceFiles "src/ddmd/parse.d"

  dependency "dmd:lexer" version="*"
}

Again, we can verify that it’s working by building the subpackage:

$ dub build :parser
Building package dmd:parser in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:parser ~master: building configuration "library"...

Currently we have three subpackages in the DUB recipe file, but no way to use the main package as a whole. To fix this we add the parser subpackage as a dependency of the main package. We pick the parser subpackage as a dependency because it will include the other two subpackages through its own dependencies.

license "BSL-1.0"

targetType "none"
dependency ":parser" version="*"

subPackage {
  name "root"

In addition to specifying parser as a dependency, we also specify the target type to be none. This will avoid building an empty library out of the main package, since it doesn’t contain any source files of its own.

As a final step, we’ll verify that the whole library is working by creating a separate project that uses the DMD DUB package as a dependency. We create a new DUB project in the test directory, called dub_package:

$ cd test
$ mkdir dub_package
$ cd dub_package
$ cat > dub.sdl <<EOF
> name "dmd-dub-test"
> description "Test of the DMD Dub package"
> license "BSL 1.0"
>
> dependency "dmd" path="../../"
> EOF
$ mkdir source

We create a new file, source/app.d, with the following content:

void main()
{
}

// lexer
unittest
{
    import ddmd.lexer;
    import ddmd.tokens;

    immutable expected = [
        TOKvoid,
        TOKidentifier,
        TOKlparen,
        TOKrparen,
        TOKlcurly,
        TOKrcurly
    ];

    immutable sourceCode = "void test() {} // foobar";
    scope lexer = new Lexer("test", sourceCode.ptr, 0, sourceCode.length, 0, 0);
    lexer.nextToken;

    TOK[] result;

    do
    {
        result ~= lexer.token.value;
    } while (lexer.nextToken != TOKeof);

    assert(result == expected);
}

// parser
unittest
{
    import ddmd.astbase;
    import ddmd.parse;

    scope parser = new Parser!ASTBase(null, null, false);
    assert(parser !is null);
}

The above file contains two unit tests, one for the lexer and one for the parser. We can run dub test to run the unit tests for this package:

$ dub test
No source files found in configuration 'library'. Falling back to "dub -b unittest".
Performing "unittest" build using dmd for x86_64.
dmd:root ~issue-17392-dub: building configuration "library"...
dmd:lexer ~issue-17392-dub: building configuration "library"...
../../src/ddmd/globals.d(339,21): Error: file "VERSION" cannot be found or not in a path specified with -J
dmd failed with exit code 1.

Which gives us the error that it cannot find the VERSION file in any string import paths, even though we added the correct directory to the string import paths. If we run the tests with verbose output enabled, using the --verbose flag we get a hint (the output has been reduced to save space):

dmd:lexer ~issue-17392-dub: building configuration "library"...
dmd -J. -lib

Here we see that the compiler is invoked with the -J. flag, which is what we previously specified in the lexer subpackage. The problem is that the current directory is now of the dmd-dub-test DUB package instead of the dmd DUB package. Looking at the documentation of DUB we can see there’s an environment variable, $PACKAGE_DIR, that we can use as the string import path instead of hardcoding it to use a single dot. We update the dflags setting of the lexer subpackage to use the $PACKAGE_DIR environment variable:

dflags "-J$PACKAGE_DIR"
}

Running the tests again shows that the error is fixed, but now we get a new error, a long list of undefined symbols (shortened here):

$ dub test
No source files found in configuration 'library'. Falling back to "dub -b unittest".
Performing "unittest" build using dmd for x86_64.
dmd:root ~issue-17392-dub: building configuration "library"...
dmd:lexer ~issue-17392-dub: building configuration "library"...
dmd:parser ~issue-17392-dub: building configuration "library"...
dmd-dub-test ~master: building configuration "application"...
Linking...
Undefined symbols for architecture x86_64:
  "_D4ddmd7astbase12__ModuleInfoZ", referenced from:
      _D3app12__ModuleInfoZ in dmd-dub-test.o

The reason for this is that we’re importing the ddmd.astbase module in the test of the parser, but it’s never compiled. We can solve that problem by adding it to the parser subpackage in the dmd DUB package. Running dmd again to show all its dependencies shows that it also depends on the ddmd.astbasevisitor module. We add these two modules as follows:

sourceFiles \
  "src/ddmd/astbase.d" \
  "src/ddmd/astbasevisitor.d" \
  "src/ddmd/parse.d"

Finally, running the tests again shows that everything is working correctly:

$ dub test
No source files found in configuration 'library'. Falling back to "dub -b unittest".
Performing "unittest" build using dmd for x86_64.
dmd:root ~issue-17392-dub: building configuration "library"...
dmd:lexer ~issue-17392-dub: building configuration "library"...
dmd:parser ~issue-17392-dub: building configuration "library"...
dmd-dub-test ~master: building configuration "application"...
Linking...
Running ./dmd-dub-test

After verifying that both the lexer and parser are working in a separate DUB package, this is the final result of the package recipe for the dmd DUB package:

name "dmd"
description "The DMD compiler"
authors "Walter Bright"
copyright "Copyright © 1999-2017, Digital Mars"
license "BSL-1.0"

targetType "none"
dependency ":parser" version="*"

subPackage {
  name "root"
  targetType "library"
  sourcePaths "src/ddmd/root"
}

subPackage {
  name "lexer"
  targetType "library"
  sourcePaths

  sourceFiles \
    "src/ddmd/console.d" \
    "src/ddmd/entity.d" \
    "src/ddmd/errors.d" \
    "src/ddmd/globals.d" \
    "src/ddmd/id.d" \
    "src/ddmd/identifier.d" \
    "src/ddmd/lexer.d" \
    "src/ddmd/tokens.d" \
    "src/ddmd/utf.d"

  dflags "-J$PACKAGE_DIR"

  dependency "dmd:root" version="*"
}

subPackage {
  name "parser"
  targetType "library"
  sourcePaths

  sourceFiles \
    "src/ddmd/astbase.d" \
    "src/ddmd/astbasevisitor.d" \
    "src/ddmd/parse.d"

  dependency "dmd:lexer" version="*"
}

All this has now been merged into master and the DUB package is available here: http://code.dlang.org/packages/dmd. Happy hacking!