Stefan Koch is the maintainer of sqlite-d, a native D sqlite reader, and has contributed to projects like SDC (the Stupid D Compiler) and vibe.d. He was also responsible for a 10% performance improvement in D’s current CTFE implementation and is currently writing a new CTFE engine, the subject of this post.
For the past nine months, I’ve been working on a project called NewCTFE, a reimplementation of the Compile-Time Function Evaluation (CTFE) feature of the D compiler front-end. CTFE is considered one of the game-changing features of D.
As the name implies, CTFE allows certain functions to be evaluated by the compiler while it is compiling the source code in which the functions are implemented. As long as all arguments to a function are available at compile time and the function is pure (has no side effects), then the function qualifies for CTFE. The compiler will replace the function call with the result.
Since this is an integral part of the language, pure functions can be evaluated anywhere a compile-time constant may go. A simple example can be found in the standard library module, std.uri
, where CTFE is used to compute a lookup table. It looks like this:
private immutable ubyte[128] uri_flags = // indexed by character ({ ubyte[128] uflags; // Compile time initialize uflags['#'] |= URI_Hash; foreach (c; 'A' .. 'Z' + 1) { uflags[c] |= URI_Alpha; uflags[c + 0x20] |= URI_Alpha; // lowercase letters } foreach (c; '0' .. '9' + 1) uflags[c] |= URI_Digit; foreach (c; ";/?:@&=+$,") uflags[c] |= URI_Reserved; foreach (c; "-_.!~*'()") uflags[c] |= URI_Mark; return uflags; })();
Instead of populating the table with magic values, a simple expressive function literal is used. This is much easier to understand and debug than some opaque static array. The ({
starts a function-literal, the })
closes it. The ()
at the end tells the compiler to immediately evoke that literal such that uri_flags
becomes the result of the literal.
Functions are only evaluated at compile time if they need to be. uri_flags
in the snippet above is declared in module scope. When a module-scope variable is initialized in this manner, the initializer must be available at compile time. In this case, since the initializer is a function literal, an attempt will be made to perform CTFE. This particular literal has no arguments and is pure, so the attempt succeeds.
For a more in-depth discussion of CTFE, see this article by H. S. Teoh at the D Wiki.
Of course, the same technique can be applied to more complicated problems as well; std.regex
, for example, can build a specialized automaton for a regex at compile time using CTFE. However, as soon as std.regex
is used with CTFE for non-trivial patterns, compile times can become extremely high (in D everything that takes longer than a second to compile is bloat-ware :)). Eventually, as patterns get more complex, the compiler will run out of memory and probably take the whole system down with it.
The blame for this can be laid at the feet of the current CTFE interpreter’s architecture. It’s an AST interpreter, which means that it interprets the AST while traversing it. To represent the result of interpreted expressions, it uses DMD’s AST node classes. This means that every expression encountered will allocate one or more AST nodes. Within a tight loop, the interpreter can easily generate over 100_000_000
nodes and eat a few gigabytes of RAM. That can exhaust memory quite quickly.
Issue 12844 complains about std.regex
taking more than 16GB of RAM. For one pattern. Then there’s issue 6498, which executes a simple 0
to 10_000_000
while loop via CTFE and runs out of memory.
Simply freeing nodes doesn’t fix the problem; we don’t know which nodes to free and enabling the garbage collector makes the whole compiler brutally slow. Luckily there is another approach which doesn’t allocate for every expression encountered. It involves compiling the function to a virtual ISA (Instruction Set Architecture). This virtual ISA, also known as bytecode, is then given to a dedicated interpreter for that ISA (in the case in which a virtual ISA happens to be the same as the ISA of the host, we call it a JIT (Just in Time) interpreter).
The NewCTFE project concerns itself with implementing such a bytecode interpreter. Writing the actual interpreter (a CPU emulator for a virtual CPU/ISA) is reasonably simple. However, compiling code to a virtual ISA is exactly as much work as compiling it to a real ISA (though, a virtual ISA has the added benefit that it can be extended for customized needs, but that makes it harder to do JIT later). That’s why it took a month just to get the first simple examples running on the new CTFE engine, and why slightly more complicated ones still aren’t running even after 9 months of development. At the end of the post, you’ll find an approximate timeline of the work done so far.
I’ll be giving a presentation at DConf 2017, where I’ll discuss my experience implementing the engine and explain some of the technical details, particularly regarding the trade-offs and design choices I’ve made. The current estimation is that the 1.0 goals will not be met by then, but I’ll keep coding away until it’s done.
Those interested in keeping up with my progress can follow my status updates in the D forums. At some point in the future, I will write another article on some of the technical details of the implementation. In the meantime, I hope the following listing does shed some light on how much work it is to implement NewCTFE 🙂
- May 9th 2016
Announcement of the plan to improve CTFE. - May 27th 2016
Announcement that work on the new engine has begun. - May 28th 2016
Simple memory management change failed. - June 3rd 2016
Decision to implement a bytecode interpreter. - June 30th 2016
First code (taken from issue 6498) consisting of simple integer arithmetic runs. - July 14th 2016
ASCII string indexing works. - July 15th 2016
Initialstruct
support - Sometime between July and August
First switches work. - August 17th 2016
Support for the special casesif(__ctfe)
andif(!__ctfe)
- Sometime between August and September
Ternary expressions are supported - September 08th 2016
First Phobos unit tests pass. - September 25th 2016
Support for returning strings and ternary expressions. - October 16th 2016
First (almost working) version of the LLVM backend. - October 30th 2016
First failed attempts to support function calls. - November 01st
DRuntime unit tests pass for the first time. - November 10th 2016
Failed attempt to implement string concatenation. - November 14th 2016
Array expansion, e.g. assignment to thelength
property, is supported. - November 14th 2016
Assignment of array indexes is supported. - November 18th 2016
Support for arrays as function parameters. - November 19th 2016
Performance fixes. - November 20th 2016
Fixing the brokenwhile(true) / for (;;)
loops; they can now be broken out of 🙂 - November 25th 2016
Fixes togoto
and switch handling. - November 29th 2016
Fixes tocontinue
andbreak
handling. - November 30th 2016
Initial support forassert
- December 02nd 2016
Bailout onvoid
-initialized values (since they can lead to undefined behavior). - December 03rd 2016
Initial support for returningstruct
literals. - December 05th 2016
Performance fix to the bytecode generator. - December 07th 2016
Fixes tocontinue
andbreak
infor
statements (continue
must not skip the increment step) - December 08th 2016
Array literals with variables inside are now supported:[1, n, 3]
- December 08th 2016
Fixed a bug inswitch
statements. - December 10th 2016
Fixed a nasty evaluation order bug. - December 13th 2016
Some progress on function calls. - December 14th 2016
Initial support for strings in switches. - December 15th 2016
Assignment of static arrays is now supported. - December 17th 2016
Fixinggoto
statements (we were ignoring the lastgoto
to any label :)). - December 17th 2016
De-macrofied string-equals. - December 20th 2016
Implement check to guard against dereferencing null pointers (yes… that one was oh so fun). - December 22ed 2016
Initial support for pointers. - December 25th 2016
static immutable
variables can now be accessed (yes the result is recomputed … who cares). - January 02nd 2017
First Function calls are supported !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! - January 17th 2017
Recursive function calls work now 🙂 - January 23rd 2017
The interpret3.d unit-test passes. - January 24th 2017
We are green on 64bit! - January 25th 2017
Green on all platforms !!!!! (with blacklisting though) - January 26th 2017
Fixed special casecast(void*) size_t.max
(this one cannot go through the normal pointer support, which assumes that you have something valid to dereference). - January 26th 2017
Member function calls are supported! - January 31st 2017
Fixed a bug inswitch
handling. - February 03rd 2017
Initial function pointer support. - Rest of Feburary 2017
Wild goose chase for issue #17220 - March 11th 2017
Initial support for slices. - March 15th 2017
String slicing works. - March 18th 2017
$
in slice expressions is now supported. - March 19th 2017
The concatenation operator (c = a ~ b
) works. - March 22ed 2017
Fixed aswitch
fallthrough bug.