Project Highlight: The New CTFE Engine

CTFE (Compile-Time Function Execution) is today a core feature of the D Programming Language. D creator Walter Bright first implemented it in DMD as an extension of the constant folding logic that was already there. Don Clugston (of FastDelegate fame) made a pass at improving it and, according to Walter, “took it much further”. Since that time, usage of CTFE has shown up in one D project after another, including in D’s standard library. For example, Dmitry Olshansky employed it in his overhaul of std.regex to great effect.
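
For readers new to the feature, here’s what CTFE looks like in practice: an ordinary function, when called in a context that requires a compile-time value (an enum initializer, a static assert, a template argument), is executed by the compiler itself. A minimal sketch:

    int factorial(int n)
    {
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    enum f = factorial(5);   // evaluated at compile time via CTFE
    static assert(f == 120); // checked before the program ever runs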

On the last day of DConf 2016, Stefan Koch gave a lightning talk on his thoughts about CTFE in D. At the end of the talk, in response to a question from Andrei Alexandrescu on how D’s implementation could be improved, he said the following:

CTFE is really a hack. You can see that it’s a hack. It’s implemented as a hack. It is the most useful hack that I’ve ever seen, and it is definitely a hacker’s tool to do stuff that are like magic. But to be fast, it would need to be heavily redesigned, reimplemented, possibly executed in multiple threads, because it is used for stuff that we could never have envisioned when it was invented.

Not long after that, Stefan opened a discussion on the forums and took up the torch to improve the CTFE engine. As to why he got started on this journey in the first place, Stefan says, “I started work on the CTFE engine because I said so at DConf.” But, of course, there’s more to it than that.

I have pretty heavy-weight CTFE needs (I worked on a compile-time trans-compiler). Also my CTFE SQLite reader is failing if you want to read a database bigger than 2MB at CTFE.

His investigations into the performance of the CTFE interpreter shed light on its problems.

The current interpreter interprets every AST-Node it sees directly. This leaves very little space to collect information about the code that is being interpreted. It doesn’t know when something will be used as a reference, so it needs to copy every variable on every mutation. It has to do a deep-copy for this. That means it copies the whole chain of mutations every time.

To clarify, he offers the following example.

Imagine foreach(i;0 .. 10) { a = i; }. On the first iteration we save a` = 0 and set a`` to 1. On the second iteration we save a``` = 1 and a```` = 0 and we set a````` to 2, then a`````` = 1 and a``````` = 0 and so on. As you can see, the memory requirements just shoot up. It’s basically a factorial function with a very small coefficient. That is why for very small workloads this extreme overhead is not noticeable.

That flaw looked unfixable. Indeed the whole architecture in dinterpret.d is very convoluted and hard to understand. I did a few experiments on improving memory-management of the interpreter but it proved fruitless.
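
To make the quoted example concrete, here is a hedged sketch of the kind of code that triggers this behavior: the loop itself is trivial, but forcing it through CTFE makes the old interpreter deep-copy the mutation chain of a on every assignment.

    // The loop from Stefan’s example, wrapped so it can run under CTFE.
    int last()
    {
        int a;
        foreach (i; 0 .. 10)
            a = i; // each mutation triggers a deep copy in the old interpreter
        return a;
    }

    enum result = last(); // interpreted at compile time; yields 9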

Once he realized there was going to be no quick fix, Stefan sat down and drew up a plan to avoid digging himself into the same hole the current interpreter was in. The result of his planning led him down a road he hadn’t expected to travel.

Direct Interpretation was out of the question since it would give the new engine too little time to analyze data-flow and decide whether a copy was really needed or not. I had to implement an Intermediate Representation. It had to be portable to different evaluation back-ends. I ended up with a solution, inspired by OpenGL, of defining my interface in the form of function calls an evaluation back end had to implement. That meant I would not be able to simply modify the current interpreter. This made the start very steep, but it is a decision I do not regret.

His implementation consists of a front end and a back end.

The front end walks the AST and issues calls to the back end. And the back end transforms those calls into actual bytecode. This bytecode is interpreted by the back end as soon as the front end requires it.
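
As a rough illustration of that OpenGL-inspired, call-based design, the back-end interface might look something like the sketch below. The names here are hypothetical, chosen for illustration rather than taken from the actual implementation.

    // Hypothetical sketch of a call-based bytecode back end. The
    // AST-walking front end invokes these hooks; the back end turns
    // them into bytecode and interprets it on demand.
    alias BCValue = uint; // a virtual register / temporary

    interface BCBackend
    {
        void beginFunction(string name);
        BCValue genTemporary();                          // allocate a temporary
        void set(BCValue dst, long constant);            // dst = constant
        void add(BCValue dst, BCValue lhs, BCValue rhs); // dst = lhs + rhs
        void ret(BCValue val);                           // return val
        void endFunction();
        long run(long[] args); // interpret the emitted bytecode
    }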

In terms of functionality, he likens the current implementation to an immediate mode graphics API, and his revamp to retained mode. In this case, though, it’s the immediate mode that’s the memory hog.

You can read about his progress in the CTFE Status thread, where he has been posting frequent updates. His updates include problems he encounters, features he implements, and performance statistics. Eventually, every compiler that uses the DMD front end will benefit from his improvements.

GSoC Report: DStep

Wojciech Szęszoł is a Computer Science major at the University of Wrocław. As part of Google Summer of Code 2016, he chose to make improvements to Jacob Carlborg’s DStep, a tool to generate D bindings from C and Objective-C header files.


It was December of last year and I was writing an image processing project for a course at my university. I would normally use Python, but the project required some custom processing, so I wasn’t able to use numpy. And writing the inner loops of image processing algorithms in plain Python isn’t the best idea. So I decided to use D.

I’d been aware of the D language for as long as I can remember, but I’d never convinced myself to try it out. The first thing I needed to do was to load an image. At the time, I didn’t know that there is a DUB repository containing bindings to image loading libraries, so I started writing bindings to libjpeg by myself. It didn’t end very well, so I thought there should be a tool that would do the job for me. That’s when I found DStep and htod.

Unfortunately, DStep’s capabilities weren’t satisfactory (mostly the lack of any kind of support for the preprocessor) and htod didn’t run on Linux. I ended up coding my project in C++, but as GSoC (Google Summer of Code) was lurking on the horizon, I decided that I should give it a try with DStep. I began by contacting Craig Dillabaugh (Ed. Note: Craig volunteers to organize GSoC projects using D) to learn if there was any need for such a project. It sparked some discussion on the forum, the idea was accepted, and, more importantly, Russel Winder agreed to be the mentor of the project. After some time I needed to prepare and submit an official proposal. There was an interview and fortunately I was accepted.

The first commit I made for DStep is dated February 1. It was a proof of concept that C preprocessor definitions can be translated to D using libclang. Then I improved the testing framework by replacing the old Cucumber-based tests with some written in D. I made a few more improvements before the actual GSoC coding period began.

During GSoC, I added support for translation of preprocessor macros (constants and functions). It required implementing a parser for a small part of the C language, as the information from libclang was insufficient. I implemented translation of comments, improved formatting of the output code (e.g. made DStep keep the spacing from the C sources), fixed most of the issues from the GitHub issue list, and ported DStep to Windows. While I was coding, I was getting support from Jacob Carlborg. He did a great job reviewing all of the commits I made. When I didn’t know how to accomplish something with D, I could always count on help on forum.dlang.org.
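
(Ed. Note: to give a feel for the macro translation work, here is an illustrative example of the kind of mapping involved; the D output shown is a plausible translation, not verbatim DStep output.)

    // C input:
    //   #define BUFFER_SIZE 1024
    //   #define MIN(a, b) ((a) < (b) ? (a) : (b))

    // A plausible D translation: the constant becomes an enum, and the
    // function-like macro becomes a template so it stays generic.
    enum BUFFER_SIZE = 1024;

    auto MIN(A, B)(auto ref A a, auto ref B b)
    {
        return (a < b) ? a : b;
    }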

DStep was the first project of such a size that I coded in D. I enjoyed its modern features, notably the module system, garbage collector, built-in arrays, and powerful templates. I used unittest blocks a lot. It would be nice to have named unit tests, so that they could be run selectively. From the perspective of a newcomer, the lack of consistency and symmetry in some features is troubling, at least until you get used to it. For example, there is a built-in hash map but no hash set, some identifiers that should be keywords start with @ (Ed. Note: see the Attributes documentation), etc. I was very sad when I read that the in keyword is not yet fully implemented. Despite those little issues, the language served me very well overall. I suppose I will use D for my personal toy projects in the future. And for DStep of course. I have some unfinished business with it :).
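
(Ed. Note: the hash map/hash set asymmetry mentioned above is commonly worked around by using the built-in associative array as a set, as in this small sketch.)

    unittest
    {
        // D has a built-in hash map (the associative array) but no
        // built-in hash set, so a map to bool often stands in for one.
        bool[string] seen;
        seen["dstep"] = true;
        assert("dstep" in seen);
    }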

I would like to encourage all students to take part in future editions of GSoC. And I must say the D Language Foundation is a very good place to do this. I learned a lot during this past summer and it was a very exciting experience.

Inside D Version Manager

In his day job, Jacob Carlborg is a Ruby backend developer for Derivco Sweden, but he’s been using D on his own time since 2006. He is the maintainer of numerous open source projects, including DStep, a utility that generates D bindings from C and Objective-C headers, DWT, a port of the Java GUI library SWT, and the topic of this post, DVM. He implemented native Thread Local Storage support for DMD on OS X and contributed, along with Michel Fortin, to the integration of Objective-C in D.


D Version Manager (DVM) is a cross-platform tool that allows you to easily download, install, and manage multiple D compiler versions. With DVM, you can select a specific version of the compiler to use without having to manually modify the PATH environment variable. Compiler selections are local to each shell session, and it’s possible to configure a default compiler.

The main advantage of DVM is the easy downloading and installation of different compiler versions. Specify the version of the compiler you would like to install, e.g. dvm install 2.071.1, and it will automatically download and install that version. Then you can tell DVM to use that version by executing dvm use 2.071.1. After that, you can invoke the compiler as usual with dmd. The selected compiler version will persist until the end of the shell session.
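
A typical session might look like this (the version number is just an example):

    $ dvm install 2.071.1
    $ dvm use 2.071.1
    $ dmd --version   # now resolves to the selected compiler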

DVM makes it possible for the user to select a specific compiler version without having to modify any makefiles or build scripts. It’s enough for any build script to refer to the compiler by name, i.e. dmd, as long as the user selects the compiler version with DVM before invoking the script.

History

DVM was created in the beginning of 2011. That was a different time for D. No proper installers existed, D1 was still a viable option, and each new release of DMD brought with it a number of regressions. Because of all the regressions, it was basically impossible to use the latest compiler, or often even an older one, for all of your projects. Taking into consideration projects from other developers, some were written in D1 and some in D2, making it inconvenient to have only one compiler version installed.

It was for these reasons I created DVM. Being able to have different versions of the compiler active in different shell sessions makes it easy to work on different projects requiring different versions of the compiler. For example, it was possible to open one tab for a D1 compiler and another for a D2 compiler.

The concept of DVM comes directly from the Ruby tool RVM. Where DVM installs D compilers, RVM installs Ruby interpreters. RVM can do everything DVM can do and a lot more. One of the major things I did not want to copy from RVM is that it’s completely written in shell script (bash). I wanted DVM to be written in D. Because it’s written in shell script, RVM enables some really useful features that DVM does not support, but some of them are questionable (some might call them hacks). For example, when navigating to an RVM-enabled project, RVM will automatically select the correct Ruby interpreter. However, it accomplishes this by overriding the built-in cd command. When the command is invoked, RVM will look in the target directory for one of the files .rvmrc or .ruby-version. If either is present, it will read that file to determine which Ruby interpreter to select.

Implementation and Usage

One of the goals of DVM was that it should be implemented in D. In the end, it was mostly written in D with a few bits of shell script. Note that the following implementation details are specific to the platforms that fall under D’s Posix umbrella, i.e. version(Posix), but DVM is certainly available for Windows with the same functionality.

Structure of the DVM Installation

Before DVM can be used, it needs to install itself. This is accomplished with the command dvm install dvm. This will create the ~/.dvm directory, which contains the following subdirectories: archives, bin, compilers, env and scripts.
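
Laid out on disk, the result looks like this:

    ~/.dvm/
        archives/     # cached zip archives of downloaded compilers
        bin/          # wrapper scripts and the dvm binary itself
        compilers/    # unpacked compiler installations
        env/          # helper scripts for the use command
        scripts/      # the shell script that wraps the dvm binary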

archives contains a cache of downloaded zip archives of D compilers.

bin contains shell scripts that act as symbolic links to all installed D compilers. The name of each contains the version of the compiler, e.g. dmd-2.071.1, making it possible to invoke a specific compiler without first having to invoke the use command. This directory also contains one shell script, dvm-current-dc, pointing to the currently active D compiler. This allows the currently active D compiler to be invoked without knowing which version has been set. This can be useful for executing the compiler from within an editor or IDE, for example. A shell script for the default compiler exists as well. Finally, this directory also contains the dvm binary itself.
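
Such a wrapper can be as simple as the following sketch (the actual scripts may differ in detail, and the platform segment of the path depends on the operating system):

    #!/bin/sh
    # ~/.dvm/bin/dmd-2.071.1 -- forwards to that compiler’s real binary
    exec ~/.dvm/compilers/dmd-2.071.1/linux/bin/dmd "$@"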

The compilers directory contains all installed compilers. All of the downloaded compilers are unpacked here. Due to the varying quality of the D compiler archives over the years, the install command will also make a few adjustments if necessary. In the old days, there was only one archive for all platforms; in that case, the command keeps only the binaries and libraries for the current platform. Another adjustment is to make sure all executables have the executable permission set.

The env directory contains helper shell scripts for the use command. There’s one script for each installed compiler and one for the default selected compiler.

The scripts directory currently only contains one file, dvm. It’s a shell script which wraps the dvm binary in the bin directory. The purpose of this wrapper is to aid the use command.

The use Command

The most interesting part of the implementation is the use command, which selects a specific compiler, e.g. dvm use 2.071.1. The selection of a compiler will persist for the duration of the shell session (window, tab, script file).

The command works by prepending the path of the specified compiler to the PATH environment variable. This can be ~/.dvm/compilers/dmd-2.071.1/{platform}/bin for example, where {platform} is the currently running platform. By prepending the path to the environment variable, it guarantees the selected compiler takes precedence over any other possible compilers in the PATH. The reason the {platform} section of the path exists is related to the structure of the downloaded archive. Keeping this structure avoids having to modify the compiler’s configuration file, dmd.conf.
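
In effect, the environment scripts boil down to something like this (illustrative; the real scripts may set more than PATH):

    # What dvm use 2.071.1 effectively does to the current shell:
    export PATH=~/.dvm/compilers/dmd-2.071.1/linux/bin:$PATH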

The interesting part here is that it’s not possible to modify the environment variables of the parent process, which in this case is the shell. The magic behind the use command is that the dvm command that you’re actually invoking is not the D binary; it’s the shell script in the ~/.dvm/scripts path. This shell script contains a function called dvm. This can be verified by invoking type dvm | head -n 1, which should print dvm is a function if everything is installed correctly.

The installation of DVM adds a line to the shell initialization file, .bashrc, .bash_profile or similar. This line will load/source the DVM shell script in the ~/.dvm/scripts path which will make the dvm command available. When the dvm function is invoked, it will forward the call to the dvm binary located in ~/.dvm/bin/dvm. The dvm binary contains all of the command logic. When the use command is invoked, the dvm binary will write a new file to ~/.dvm/tmp/result and exit. This file contains a command for loading/sourcing the environment file available in ~/.dvm/env that corresponds to the version that was specified when the use command was invoked. After the dvm binary has exited, the shell script function takes over again and loads/sources the result file if it exists. Since the shell script is loaded/sourced instead of executed, the code will be evaluated in the current shell instead of a sub-shell. This is what makes it possible to modify the PATH environment variable. After the result file is loaded/sourced, it’s removed.
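
Put together, the mechanism can be pictured as this simplified sketch of the shell function (not the actual DVM source):

    dvm() {
        # Run the real binary; the use command leaves its result file behind.
        ~/.dvm/bin/dvm "$@"
        if [ -f ~/.dvm/tmp/result ]; then
            . ~/.dvm/tmp/result   # sourced, so PATH changes affect this shell
            rm ~/.dvm/tmp/result
        fi
    }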

If you find yourself with the need to build your D project(s) with multiple compiler versions, such as the current release of DMD, one or more previous releases, and/or the latest beta, then DVM will allow you to do so in a hassle-free manner. Pull up a shell, execute use on the version you want, and away you go.

Core Team Update: Martin Nowak

In the early days of DMD, new releases were put out when Walter decided they were ready. There was no formal process, no commonly accepted set of criteria that defined the point at which a new compiler version was due or the steps involved in building the release packages. In the early days, that was just fine. As time passed, not so much.

According to Martin Nowak:

The old release process was completely opaque, inconsistent, irregular, and also very time-consuming for Walter. So at some point it more or less failed and Walter could no longer manage to build the next release.

Martin was eager to do something about it, but he wasn’t the first person to take action on the issue. He decided to start with the work that had come before.

At this point, I took a fairly complex script from Nick Sabalausky, which tried to emulate Walter’s process, and started to improve upon it. It got trimmed down to a much smaller size over time.

But that was just the beginning.

Next, I prepared OS images for Vagrant for all supported platforms. That alone took more than a week. After that, I wired up the release script with Vagrant. From there on we had kind of reproducible builds.

That was a major step forward and eliminated some of the common mistakes that had crept in now and again with the previous, not-so-reproducible process. With the pieces in place, Martin got some help from another community member.

Andrew Edwards took over the actual release building, but I was still doing the main work of keeping the scripts running and managing bugfixes. At some point, Andrew no longer had time. As the back and forth between us was almost as much work as the release build process itself, I completely took over. Nowadays, we have the script running fully automated to build nightlies.

Thanks to Martin, the process for building DMD release packages is described step-by-step in the D Wiki so that anyone can do it if necessary. Moreover, he coauthored and implemented a D Improvement Proposal to clearly define a release schedule. This process was adopted beginning with DMD 2.068 and continues today.

With the improved release schedule, DMD users can plan ahead to anticipate new releases, or take advantage of nightly builds and point releases to test out bug fixes or, whenever they come around, new features in the language or the standard library. However, the schedule is not etched in stone, as any particular release may be delayed for one reason or another. For example, the 2.071 release introduced major changes to D’s import and symbol lookup to fix some long-standing annoyances, with the subsequent 2.071.1 fixing some regressions they introduced. The release of 2.072 has been delayed until all of the known issues related to the changes have been fixed.

Those who were around before Martin stepped up to take charge of the release process surely notice the difference. He and the others who have contributed along the way (like Nick and Andrew) have done the D community a major service. As a result, Martin has become an essential member of the core D team.

Making Of: LDC 1.0

This is a guest post from Kai Nacke. A long-time contributor to the D community, Kai is the author of D Web Development and the maintainer of LDC, the LLVM D Compiler.


LDC has been under development for more than 10 years. From release to release, the software has gotten better and better, but the version number has always implied that LDC was still the new kid on the block. Who would use a version 0.xx compiler for production code?

These were my thoughts when I raised the question, “Version number: Are we ready for 1.0?” in the forum about a year ago. At that time, the current LDC compiler was 0.15.1. In several discussions, the idea was born that the first version of LDC based on the frontend written in D should be version 1.0, because this would really be a major milestone. Version 0.18.0 should become 1.0!

Was LDC really as mature as I thought? Looking back, this was an optimistic view. At DConf 2015, Liran Zvibel from Weka.IO mentioned in his talk about large scale primary storage systems that he couldn’t use LDC because of bugs! Additionally, the beta version of 0.15.2 had some serious issues and was finally abandoned in favor of 0.16.0. And did I mention that I was busy writing a book about vibe.d?

Fortunately, over the past two years, more and more people began contributing to LDC. The number of active committers grew. Suddenly, the progress of LDC was very impressive: Johan added DMD-style code coverage and worked on merging the new frontend. Dan worked on an iOS version and Joakim on an Android version. Together, they made ARM a first-class target of LDC. Martin and Rainer worked on the Windows version. David went ahead and fixed a lot of the errors which had occurred with the Weka code base. Countless small fixes from other contributors improved the overall quality.

Now it was obvious that a 1.x version was overdue. Shortly after DMD made the transition to the D-based frontend, LDC was able to use it. After the usual alpha and beta versions, I built the final release version on Sunday, June 5, and officially announced it the next day. Version 1.0 is shipping now!

Creating a release is always a major effort. I would like to say “Thank you!” to everybody who made this special release happen. And a big thanks to our users; your feedback is always a motivation to make the next LDC release even better.

Onward to 1.1!