A DUB Case Study: Compiling DMD as a Library

Posted on

In his day job, Jacob Carlborg is a Ruby backend developer for Derivco Sweden, but he’s been using D on his own time since 2006. He is the maintainer of numerous open source projects, including DStep, a utility that generates D bindings from C and Objective-C headers, DWT, a port of the Java GUI library SWT, and DVM, the topic of another post on this blog. He implemented native Thread Local Storage support for DMD on OS X and contributed, along with Michel Fortin, to the integration of Objective-C in D.


DUB is the official build tool and package manager for the D programming language. Originally written and currently maintained by Sönke Ludwig as part of the vibe.d web framework, its acceptance as an official part of the D toolchain means it is now shipping with the most recent DMD and LDC compilers.

A Quick Introduction to DUB

If you have have the latest DMD or LDC installed, you already have DUB installed as well. If not, or if you want to check for a more recent version, you can get the very latest release, beta or release candidate from the DUB download page.

You can create a new DUB project by executing the dub init command. This will start an interactive setup that guides you through project creation.

  1. First decide the format of the package recipe. Two formats are supported: JSON and SDLang. Here we picked SDLang.
  2. Then specify the name of the project. Press enter to use the default name, which is displayed in brackets and is inferred from the directory
  3. Do the same for the description, author, license, copyright, and dependencies to select the default values
$ dub init foo
Package recipe format (sdl/json) [json]: sdl
Name [foo]:
Description [A minimal D application.]:
Author name [Jacob Carlborg]:
License [proprietary]:
Copyright string [Copyright © 2017, Jacob Carlborg]:
Add dependency (leave empty to skip) []:
Successfully created an empty project in '/Users/jacob/tmp/foo'.
Package successfully created in foo

After the setup has completed, the following files and directories will have been created:

$ tree foo
foo
├── dub.sdl
└── source
    └── app.d

1 directory, 2 files
  • dub.sdl is the package recipe file, which provides instructions telling DUB how to build the package
  • source is the default path where DUB looks for D source files
  • app.d contains the main function and is an example Hello World generated by DUB with the following content:
import std.stdio;

void main()
{
	writeln("Edit source/app.d to start your project.");
}

The content of the dub.sdl file is the following:

name "foo"
description "A minimal D application."
authors "Jacob Carlborg"
copyright "Copyright © 2017, Jacob Carlborg"
license "proprietary"

All of which was taken from what we specified during project creation. By default, DUB looks for D source files in either source or src directories and compiles all files it finds there and in any subdirectories.

To build and run the application, navigate to the project’s root directory, foo in this case, and invoke dub:

$ dub
Performing "debug" build using dmd for x86_64.
foo ~master: building configuration "application"...
Linking...
Running ./foo
Edit source/app.d to start your project.

To build without running, invoke dub build:

$ dub build
Performing "debug" build using dmd for x86_64.
foo ~master: building configuration "application"...
Linking...

Case Study: DMD as a Library

Recently there has been some progress in making the D compiler (DMD) available as a library. Razvan Nitu has been working on it as part of his D Foundation scholarship at the University Politechnica of Bucharest. He gave a presentation at DConf 2017 (a video of the talk is available, as well as examples in the DMD repository). So I had the idea that as part of the DConf 2017 hackathon I could create a simple DUB package for DMD to make only the lexer and the parser available as a library, something his work has made possible.

Currently DMD is built using make. There are three Makefiles, one for Posix, one for 32-bit Windows and one for 64-bit Windows  (which is only a wrapper of the 32-bit one). I don’t intend to try to completely replicate the Makefiles as a DUB package (they contain some additional tasks besides building the compiler), but instead will start out fresh and only include what’s necessary to build the lexer and parser.

DMD already has all the source code in the src directory, which is one of the directories DUB searches by default. If we would leave it as is, DUB would include the entirety of DMD, including the backend and other parts we don’t want to include at this point.

The first step is to create the DUB package recipe file. We start simple with only the metadata (here using the SDLang format):

name "dmd"
description "The DMD compiler"
authors "Walter Bright"
copyright "Copyright © 1999-2017, Digital Mars"
license "BSL-1.0"

When we have this we need to figure out which files to include in the package. We can do this by invoking DMD with the -deps flag to generate the imports of a module. A good start is the lexer, which is located in src/ddmd/lexer.d. We run the following command to output the imports that lexer.d is using:

$ dmd -deps=deps.txt -o- -Isrc src/ddmd/lexer.d

This will write a file named deps.txt containing all the imports used by lexer.d. The -o- flag is used to tell the compiler not to generate any code. The -I flag is used to add an import path where the compiler will look for additional modules to import (but not compile). An example of the output looks like this (the long path names have been reduced to save space):

core.attribute (druntime/import/core/attribute.d) : private : object (druntime/import/object.d)
object (druntime/import/object.d) : public : core.attribute (druntime/import/core/attribute.d):selector
ddmd.lexer (ddmd/lexer.d) : private : object (druntime/import/object.d)
core.stdc.ctype (druntime/import/core/stdc/ctype.d) : private : object (druntime/import/object.d)
ddmd.root.array (ddmd/root/array.d) : private : object (druntime/import/object.d)
ddmd.root.array (ddmd/root/array.d) : private : core.stdc.string (druntime/import/core/stdc/string.d)

The most interesting part of this output, in this case, is the first column, which consists of a long list of module names. What we are interested in here is a unique list of modules that are located in the ddmd package. All modules in the core package are part of the D runtime and are already precompiled as a library and automatically linked when compiling a D executable, so these modules don’t need to be compiled. The modules from the ddmd package can be extracted with some search-and-replace in your favorite text editor or using some standard Unix command lines tools:

$ cat deps.txt | cut -d ' ' -f 1 | grep ddmd | sort | uniq
ddmd.console
ddmd.entity
ddmd.errors
ddmd.globals
ddmd.id
ddmd.identifier
ddmd.lexer
ddmd.root.array
ddmd.root.ctfloat
ddmd.root.file
ddmd.root.filename
ddmd.root.hash
ddmd.root.outbuffer
ddmd.root.port
ddmd.root.rmem
ddmd.root.rootobject
ddmd.root.stringtable
ddmd.tokens
ddmd.utf

Here we can see that a set of modules is located in the nested package ddmd.root. This package contains common functionality used throughout the DMD source code. Since it doesn’t have any dependencies on any code outside the package it’s a good fit to place in a DUB subpackage. This can be done using the subPackage directive, as follows:

subPackage {
  name "root"
  targetType "library"
  sourcePaths "src/ddmd/root"
}

We specify the name of the subpackage, root. The targetType directive is used to tell DUB whether it should build an executable or a library (though it’s optional — DUB will build an executable if it finds an app.d in the root of the source directory and a library if it doesn’t). Finally, sourcePaths can be used to specify the paths where DUB should look for the D source files if neither of the default directories is used. Fortunately, we want to include all the files in the src/ddmd/root, so using sourcePaths works perfectly fine.

We can verify that the subpackage works and builds by invoking:

$ dub build :root
Building package dmd:root in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:root ~master: building configuration "library"...

:package-name is shorthand that tells DUB to build the package-name subpackage of the current package, in our case the root subpackage.

After removing all the modules from the root package from the initial list of dependencies, the following modules remain:

ddmd.console
ddmd.entity
ddmd.errors
ddmd.globals
ddmd.id
ddmd.identifier
ddmd.lexer
ddmd.tokens
ddmd.utf

The next step is to create a subpackage for the lexer containing the remaning modules.

subPackage {
  name "lexer"
  targetType "library"
  sourcePaths

Again we start by specifying the name of the subpackage and that the target type is a library. Specifying sourcePaths without any value will set it to an empty list, i.e. no source paths. This is done because there are more files than we want to include in this subpackage in the source directory.

sourceFiles \
    "src/ddmd/console.d" \
    "src/ddmd/entity.d" \
    "src/ddmd/errors.d" \
    "src/ddmd/globals.d" \
    "src/ddmd/id.d" \
    "src/ddmd/identifier.d" \
    "src/ddmd/lexer.d" \
    "src/ddmd/tokens.d" \
    "src/ddmd/utf.d"

The above specifies all source files that should be included in this subpackage. The difference between sourcePaths and sourceFiles is that sourcePaths expects a whole directory of source files that should be included, where sourceFiles lists only the individual files that should be included. A list in SDLang is written by separating the items with a space. The backslash (\) is used for line continuation, making it possible spread the list across multiple lines.

The final step of the lexer subpackage is to add a dependency on the root subpackage. This is done with the dependency directive:

dependency "dmd:root" version="*"
}

The first parameter for the dependency directive is the name of another DUB package. The colon is used to separate the package name from the subpackage name. The version attribute is used to specify which version the package should depend on. The * is used to indicate that any version of the dependency matches, i.e. the latest version should always be used. When implementing subpackages in any given package, this is generally what should be used. External projects that depend on any DUB package should specify a SemVer version number corresponding to a known release version.

If we build the lexer subpackage now it will result in an error:

$ dub build :lexer
Building package dmd:lexer in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:lexer ~master: building configuration "library"...
src/ddmd/globals.d(339,21): Error: need -Jpath switch to import text file VERSION
dmd failed with exit code 1.

Looking at the file and line of the error shows that it contains the following code:

_version = (import("VERSION") ~ '\0').ptr;

This code contains an import expression. Import expressions differ from import statements (e.g. import std.stdio;) in that they take a file from the file system and insert its contents into the current module. It’s just as if you copied and pasted the contents yourself. Using an import expression requires that the path where the file is imported from be passed to the compiler as a security mechanism. This can be done using the -J flag. In this case, we want to use the package root, where we are executing DUB, so we can use a single dot: “.“. Passing arbitrary flags to the compiler can be done with the dflags build setting, as follows:

dflags "-J."

Add that to the lexer subpackage configuration and it will compile correctly:

$ dub build :lexer
Building package dmd:lexer in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:lexer ~master: building configuration "library"...

For the final subpackage, we have the parser. The parser is located in src/ddmd/parse.d. To get its dependencies we can use the same approach we used for the lexer. But we will filter out all files that are part of the other subpackages:

$ dmd -deps=deps.txt -Isrc -J. -o- src/ddmd/parse.d
$ cat deps.txt | cut -d ' ' -f 1 | grep ddmd | grep -E -v '(root|console|entity|errors|globals|id|identifier|lexer|tokens|utf)' | sort | uniq
ddmd.parse

Here, we’re supplying the -v flag to grep to filter the results and the -E flag to enable extended regular expressions. All modules from the root package and all modules from the lexer subpackage are filtered out and the only remaining module is the ddmd.parse module.

The subpackage for the parser will look similar to the other subpackages:

subPackage {
  name "parser"
  targetType "library"
  sourcePaths

  sourceFiles "src/ddmd/parse.d"

  dependency "dmd:lexer" version="*"
}

Again, we can verify that it’s working by building the subpackage:

$ dub build :parser
Building package dmd:parser in /Users/jacob/development/d/dlang/dmd/
Performing "debug" build using dmd for x86_64.
dmd:parser ~master: building configuration "library"...

Currently we have three subpackages in the DUB recipe file, but no way to use the main package as a whole. To fix this we add the parser subpackage as a dependency of the main package. We pick the parser subpackage as a dependency because it will include the other two subpackages through its own dependencies.

license "BSL-1.0"

targetType "none"
dependency ":parser" version="*"

subPackage {
  name "root"

In addition to specifying parser as a dependency, we also specify the target type to be none. This will avoid building an empty library out of the main package, since it doesn’t contain any source files of its own.

As a final step, we’ll verify that the whole library is working by creating a separate project that uses the DMD DUB package as a dependency. We create a new DUB project in the test directory, called dub_package:

$ cd test
$ mkdir dub_package
$ cd dub_package
$ cat > dub.sdl <<EOF
> name "dmd-dub-test"
> description "Test of the DMD Dub package"
> license "BSL 1.0"
>
> dependency "dmd" path="../../"
> EOF
$ mkdir source

We create a new file, source/app.d, with the following content:

void main()
{
}

// lexer
unittest
{
    import ddmd.lexer;
    import ddmd.tokens;

    immutable expected = [
        TOKvoid,
        TOKidentifier,
        TOKlparen,
        TOKrparen,
        TOKlcurly,
        TOKrcurly
    ];

    immutable sourceCode = "void test() {} // foobar";
    scope lexer = new Lexer("test", sourceCode.ptr, 0, sourceCode.length, 0, 0);
    lexer.nextToken;

    TOK[] result;

    do
    {
        result ~= lexer.token.value;
    } while (lexer.nextToken != TOKeof);

    assert(result == expected);
}

// parser
unittest
{
    import ddmd.astbase;
    import ddmd.parse;

    scope parser = new Parser!ASTBase(null, null, false);
    assert(parser !is null);
}

The above file contains two unit tests, one for the lexer and one for the parser. We can run dub test to run the unit tests for this package:

$ dub test
No source files found in configuration 'library'. Falling back to "dub -b unittest".
Performing "unittest" build using dmd for x86_64.
dmd:root ~issue-17392-dub: building configuration "library"...
dmd:lexer ~issue-17392-dub: building configuration "library"...
../../src/ddmd/globals.d(339,21): Error: file "VERSION" cannot be found or not in a path specified with -J
dmd failed with exit code 1.

Which gives us the error that it cannot find the VERSION file in any string import paths, even though we added the correct directory to the string import paths. If we run the tests with verbose output enabled, using the --verbose flag we get a hint (the output has been reduced to save space):

dmd:lexer ~issue-17392-dub: building configuration "library"...
dmd -J. -lib

Here we see that the compiler is invoked with the -J. flag, which is what we previously specified in the lexer subpackage. The problem is that the current directory is now of the dmd-dub-test DUB package instead of the dmd DUB package. Looking at the documentation of DUB we can see there’s an environment variable, $PACKAGE_DIR, that we can use as the string import path instead of hardcoding it to use a single dot. We update the dflags setting of the lexer subpackage to use the $PACKAGE_DIR environment variable:

dflags "-J$PACKAGE_DIR"
}

Running the tests again shows that the error is fixed, but now we get a new error, a long list of undefined symbols (shortened here):

$ dub test
No source files found in configuration 'library'. Falling back to "dub -b unittest".
Performing "unittest" build using dmd for x86_64.
dmd:root ~issue-17392-dub: building configuration "library"...
dmd:lexer ~issue-17392-dub: building configuration "library"...
dmd:parser ~issue-17392-dub: building configuration "library"...
dmd-dub-test ~master: building configuration "application"...
Linking...
Undefined symbols for architecture x86_64:
  "_D4ddmd7astbase12__ModuleInfoZ", referenced from:
      _D3app12__ModuleInfoZ in dmd-dub-test.o

The reason for this is that we’re importing the ddmd.astbase module in the test of the parser, but it’s never compiled. We can solve that problem by adding it to the parser subpackage in the dmd DUB package. Running dmd again to show all its dependencies shows that it also depends on the ddmd.astbasevisitor module. We add these two modules as follows:

sourceFiles \
  "src/ddmd/astbase.d" \
  "src/ddmd/astbasevisitor.d" \
  "src/ddmd/parse.d"

Finally, running the tests again shows that everything is working correctly:

$ dub test
No source files found in configuration 'library'. Falling back to "dub -b unittest".
Performing "unittest" build using dmd for x86_64.
dmd:root ~issue-17392-dub: building configuration "library"...
dmd:lexer ~issue-17392-dub: building configuration "library"...
dmd:parser ~issue-17392-dub: building configuration "library"...
dmd-dub-test ~master: building configuration "application"...
Linking...
Running ./dmd-dub-test

After verifying that both the lexer and parser are working in a separate DUB package, this is the final result of the package recipe for the dmd DUB package:

name "dmd"
description "The DMD compiler"
authors "Walter Bright"
copyright "Copyright © 1999-2017, Digital Mars"
license "BSL-1.0"

targetType "none"
dependency ":parser" version="*"

subPackage {
  name "root"
  targetType "library"
  sourcePaths "src/ddmd/root"
}

subPackage {
  name "lexer"
  targetType "library"
  sourcePaths

  sourceFiles \
    "src/ddmd/console.d" \
    "src/ddmd/entity.d" \
    "src/ddmd/errors.d" \
    "src/ddmd/globals.d" \
    "src/ddmd/id.d" \
    "src/ddmd/identifier.d" \
    "src/ddmd/lexer.d" \
    "src/ddmd/tokens.d" \
    "src/ddmd/utf.d"

  dflags "-J$PACKAGE_DIR"

  dependency "dmd:root" version="*"
}

subPackage {
  name "parser"
  targetType "library"
  sourcePaths

  sourceFiles \
    "src/ddmd/astbase.d" \
    "src/ddmd/astbasevisitor.d" \
    "src/ddmd/parse.d"

  dependency "dmd:lexer" version="*"
}

All this has now been merged into master and the DUB package is available here: http://code.dlang.org/packages/dmd. Happy hacking!

Inside D Version Manager

Posted on

In his day job, Jacob Carlborg is a Ruby backend developer for Derivco Sweden, but he’s been using D on his own time since 2006. He is the maintainer of numerous open source projects, including DStep, a utility that generates D bindings from C and Objective-C headers, DWT, a port of the Java GUI library SWT, and the topic of this post, DVM. He implemented native Thread Local Storage support for DMD on OS X and contributed, along with Michel Fortin, to the integration of Objective-C in D.


D Version Manager (DVM), is a cross-platform tool that allows you to easily download, install and manage multiple D compiler versions. With DVM, you can select a specific version of the compiler to use without having to manually modify the PATH environment variable. A selected compiler is unique in each shell session, and it’s possible to configure a default compiler.

The main advantage of DVM is the easy downloading and installation of different compiler versions. Specify the version of the compiler you would like to install, e.g. dvm install 2.071.1, and it will automatically download and install that version. Then you can tell DVM to use that version by executing dvm use 2.071.1. After that, you can invoke the compiler as usual with dmd. The selected compiler version will persist until the end of the shell session.

DVM makes it possible for the user to select a specific compiler version without having to modify any makefiles or build scripts. It’s enough for any build script to refer to the compiler by name, i.e. dmd, as long as the user selects the compiler version with DVM before invoking the script.

History

DVM was created in the beginning of 2011. That was a different time for D. No proper installers existed, D1 was still a viable option, and each new release of DMD brought with it a number of regressions. Because of all the regressions, it was basically impossible to always use the latest compiler, and often even older compilers, for all of your projects. Taking into consideration projects from other developers, some were written in D1 and some in D2, making it inconvenient to have only one compiler version installed.

It was for these reasons I created DVM. Being able to have different versions of the compiler active in different shell sessions makes it easy to work on different projects requiring different versions of the compiler. For example, it was possible to open one tab for a D1 compiler and another for a D2 compiler.

The concept of DVM comes directly from the Ruby tool RVM. Where DVM installs D compilers, RVM installs Ruby interpreters. RVM can do everything DVM can do and a lot more. One of the major things I did not want to copy from RVM is that it’s completely written in shell script (bash). I wanted DVM to be written in D. Because it’s written in shell script, RVM enables some really useful features that DVM does not support, but some of them are questionable (some might call them hacks). For example, when navigating to an RVM-enabled project, RVM will automatically select the correct Ruby interpreter. However, it accomplishes this by overriding the built-in cd command. When the command is invoked, RVM will look in the target directory for one of the files .rvmrc or .ruby-version. If either is present, it will read that file to determine which Ruby interpreter to select.

Implementation and Usage

One of the goals of DVM was that it should be implemented in D. In the end, it was mostly written in D with a few bits of shell script. Note that the following implementation details are specific to the platforms that fall under D’s Posix umbrella, i.e. version(Posix), but DVM is certainly available for Windows with the same functionality.

Structure of the DVM Installation

Before DVM can be used, it needs to install itself. This is accomplished with the command, dvm install dvm. This will create the ~/.dvm directory. It contains the following subdirectories: archives, bin, compilers, env and scripts.

archives contains a cache of downloaded zip archives of D compilers.

bin contains shell scripts, acting as symbolic links, to all installed D compilers. The name of each contains the version of the compiler, e.g. dmd-2.071.1, making it possible to invoke a specific compiler without first having to invoke the use command. This directory also contains one shell script, dvm-current-dc, pointing to the currently active D compiler. This allows the currently active D compiler to be invoked without knowing which version has been set. This can be useful for executing the compiler from within an editor or IDE, for example. A shell script for the default compiler exists as well. Finally, this directory also contains the binary dvm itself.

The compilers directory contains all installed compilers. All of the downloaded compilers are unpacked here. Due to the varying quality of the D compiler archives throughout the years, the install command will also make a few adjustments if necessary. In the old days, there was only one archive for all platforms. This command will only include binaries and libraries for the current platform. Another adjustment is to make sure all executables have the executable permission set.

The env directory contains helper shell scripts for the use command. There’s one script for each installed compiler and one for the default selected compiler.

The scripts directory currently only contains one file, dvm. It’s a shell script which wraps the dvm binary in the bin directory. The purpose of this wrapper is to aid the use command.

The use Command

The most interesting part of the implementation is the use command, which selects a specific compiler, e.g. dvm use 2.071.1. The selection of a compiler will persist for the duration of the shell session (window, tab, script file).

The command works by prepending the path of the specified compiler to the PATH environment variable. This can be ~/.dvm/compilers/dmd-2.071.1/{platform}/bin for example, where {platform} is the currently running platform. By prepending the path to the environment variable, it guarantees the selected compiler takes precedence over any other possible compilers in the PATH. The reason the {platform} section of the path exists is related to the structure of the downloaded archive. Keeping this structure avoids having to modify the compiler’s configuration file, dmd.conf.

The interesting part here is that it’s not possible to modify the environment variables of the parent process, which in this case is the shell. The magic behind the use command is that the dvm command that you’re actually invoking is not the D binary; it’s the shell script in the ~/.dvm/scripts path. This shell script contains a function called dvm. This can be verified by invoking type dvm | head -n 1, which should print dvm is a function if everything is installed correctly.

The installation of DVM adds a line to the shell initialization file, .bashrc, .bash_profile or similar. This line will load/source the DVM shell script in the ~/.dvm/scripts path which will make the dvm command available. When the dvm function is invoked, it will forward the call to the dvm binary located in ~/.dvm/bin/dvm. The dvm binary contains all of the command logic. When the use command is invoked, the dvm binary will write a new file to ~/.dvm/tmp/result and exit. This file contains a command for loading/sourcing the environment file available in ~/.dvm/env that corresponds to the version that was specified when the use command was invoked. After the dvm binary has exited, the shell script function takes over again and loads/sources the result file if it exists. Since the shell script is loaded/sourced instead of executed, the code will be evaluated in the current shell instead of a sub-shell. This is what makes it possible to modify the PATH environment variable. After the result file is loaded/sourced, it’s removed.

If you find yourself with the need to build your D project(s) with multiple compiler versions, such as the current release of DMD, one or more previous releases, and/or the latest beta, then DVM will allow you to do so in a hassle-free manner. Pull up a shell, execute use on the version you want, and away you go.