Serialization in D

Posted on

Vladimir Panteleev has spent over a decade using and contributing to D. He is the creator and maintainer of DFeed, the software powering the D forums, has made numerous contributions to Phobos, DRuntime, DMD, and the D website, and has created several tools useful for maintaining D software (like Digger and Dustmite).


A few days ago, I saw this blog post by Justin Turpin on the front page of Hacker News:

The Grass is Always Greener – My Struggles with Rust

This was an interesting coincidence in that it occurred during DConf, where I had mentioned serialization in D a few times during my talk. Naturally, I was curious to see how D stands up to this challenge.

The Task

Justin’s blog starts off with the following Python code:

import configparser
config = ConfigParser()
config.read("config.conf")

This is actually very similar to a pattern I use in many of my D programs. For example, DFeed (the software behind forum.dlang.org), has this code for configuring its built-in web server:

struct ListenConfig
{
    string addr;
    ushort port = 80;
}

struct Config
{
    ListenConfig listen;
    string staticDomain = null;
    bool indexable = false;
}
const Config config;

import ae.utils.sini;
shared static this() { config = loadIni!Config("config/web.ini"); }

This is certainly more code than the Python example, but that’s only the case because I declare the configuration as a D type. The loadIni function then accepts the type as a template parameter and returns an instance of it. The strong typing makes it easier to catch typos and other mistakes in the configuration – an unknown field or a non-numeric value where a number is expected will immediately result in an error.

On the last line, the configuration is saved to a global by a static constructor (shared indicates it runs once during program initialization, instead of once per thread). Even though loadIni‘s return type is mutable, D allows the implicit conversion to const because, as it occurs in a static constructor, it is treated as an initialization.

Traits

The Rust code from Justin’s blog is as follows:

#[macro_use]
extern crate serde_derive;
extern crate toml;

#[derive(Deserialize)]
struct MyConfiguration {
  jenkins_host: String,
  jenkins_username: String,
  jenkins_token: String
}

fn gimme_config(some_filename: &str) -> MyConfiguration {
  let mut file = File::open(some_filename).unwrap();
  let mut s = String::new();
  file.read_to_string(&mut s).unwrap();
  let my_config: MyConfiguration = toml::from_str(s).unwrap();
  my_config
}

The first thing that jumps out to me is that the MyConfiguration struct is annotated with #[derive(Deserialize)]. It doesn’t seem optional, either – quoting Justin:

This was something that actually really discouraged me upon learning, but you cannot implement a trait for an object that you did not also create. That’s a significant limitation, and I thought that one of the main reason Rust decided to go with Traits and Structs instead of standard classes and inheritance was for this very reason. This limitation is also relevant when you’re trying to serialize and deserialize objects for external crates, like a MySQL row.

D allows introspecting the fields and methods of any type at compile-time, so serializing third-party types is not an issue. For example (and I’ll borrow a slide from my DConf talk), deserializing one struct field from JSON looks something like this:

string jsonField = parseJsonString(s);
enforce(s.skipOver(":"), ": expected");

bool found;
foreach (i, ref field; v.tupleof)
{
    enum name = __traits(identifier, v.tupleof[i]);
    if (name == jsonField)
    {
        field = jsonParse!(typeof(field))(s);
        found = true;
        break;
    }
}
enforce(found, "Unknown field " ~ jsonField);

Because the foreach aggregate is a tuple (v.tupleof is a tuple of v‘s fields), the loop will be unrolled at compile time. Then, all that’s left to do is compare each struct field with the field name we got from the JSON stream and, if it matches, read it in. This is a minimal example that can be improved e.g. by replacing the if statements with a switch, which allows the compiler to optimize the string comparisons to hash lookups.

That’s not to say D lacks means for adding functionality to existing types. Although D does not have struct inheritance like C++ or struct traits like Rust, it does have:

  • alias this, which makes wrapping types trivial;
  • opDispatch, allowing flexible customization of forwarding;
  • template mixins, which allow easily injecting functionality into your types;
  • finally, there is of course classic OOP inheritance if you use classes.

Ad-lib and Error Handling

It doesn’t always make sense to deserialize to a concrete type, such as when we only know or care about a small part of the schema. D’s standard JSON module, std.json, currently only allows deserializing to a tree of variant-like types (essentially a DOM). For example:

auto config = readText("config.json").parseJSON;
string jenkinsServer = config["jenkins_server"].str;

The code above is the D equivalent of the code erickt posted on Hacker News:

let config: Value = serde::from_reader(file)
    .expect("config has invalid json");

let jenkins_server = config.get("jenkins_server")
    .expect("jenkins_server key not in config")
    .as_str()
    .expect("jenkins_server key is not a string");

As D generally uses exceptions for error handling, the checks that must be done explicitly in the Rust example are taken care of by the JSON library.

Final thoughts

In the discussion thread for Justin’s post, Reddit user SilverWingedSeraph writes:

You’re comparing a systems language to a scripting language. Things are harder in systems programming because you have more control over, in this case, the memory representation of data. This means there is more friction because you have to specify that information.

This struck me as a false dichotomy. There is no reason why a programming language which has the necessary traits to be classifiable as a system programming language can not also provide the convenience of scripting languages to the extent that it makes sense to do so. For example, D provides type inference and variant types for when you don’t care about strong typing, and garbage collection for when you don’t care about object lifetime, but also provides the tools to get down to the bare metal in the parts of the code where performance matters.

For my personal projects, I’ve greatly enjoyed D’s capability of allowing rapidly prototyping a design, then optimizing the performance-critical parts as needed without having to use a different language to do so.

See also

The New CTFE Engine

Posted on

Stefan Koch is the maintainer of sqlite-d, a native D sqlite reader, and has contributed to projects like SDC (the Stupid D Compiler) and vibe.d. He was also responsible for a 10% performance improvement in D’s current CTFE implementation and is currently writing a new CTFE engine, the subject of this post.


For the past nine months, I’ve been working on a project called NewCTFE, a reimplementation of the Compile-Time Function Evaluation (CTFE) feature of the D compiler front-end. CTFE is considered one of the game-changing features of D.

As the name implies, CTFE allows certain functions to be evaluated by the compiler while it is compiling the source code in which the functions are implemented. As long as all arguments to a function are available at compile time and the function is pure (has no side effects), then the function qualifies for CTFE. The compiler will replace the function call with the result.

Since this is an integral part of the language, pure functions can be evaluated anywhere a compile-time constant may go. A simple example can be found in the standard library module, std.uri, where CTFE is used to compute a lookup table. It looks like this:

private immutable ubyte[128] uri_flags = // indexed by character
({

    ubyte[128] uflags;

    // Compile time initialize
    uflags['#'] |= URI_Hash;

    foreach (c; 'A' .. 'Z' + 1)
    {
        uflags[c] |= URI_Alpha;
        uflags[c + 0x20] |= URI_Alpha; // lowercase letters

    }

    foreach (c; '0' .. '9' + 1) uflags[c] |= URI_Digit;

    foreach (c; ";/?:@&=+$,") uflags[c] |= URI_Reserved;

    foreach (c; "-_.!~*'()") uflags[c] |= URI_Mark;

    return uflags;

})();

Instead of populating the table with magic values, a simple expressive function literal is used. This is much easier to understand and debug than some opaque static array. The ({ starts a function-literal, the }) closes it. The () at the end tells the compiler to immediately evoke that literal such that uri_flags becomes the result of the literal.

Functions are only evaluated at compile time if they need to be. uri_flags in the snippet above is declared in module scope. When a module-scope variable is initialized in this manner, the initializer must be available at compile time. In this case, since the initializer is a function literal, an attempt will be made to perform CTFE. This particular literal has no arguments and is pure, so the attempt succeeds.

For a more in-depth discussion of CTFE, see this article by H. S. Teoh at the D Wiki.

Of course, the same technique can be applied to more complicated problems as well; std.regex, for example, can build a specialized automaton for a regex at compile time using CTFE. However, as soon as std.regex is used with CTFE for non-trivial patterns, compile times can become extremely high (in D everything that takes longer than a second to compile is bloat-ware :)). Eventually, as patterns get more complex, the compiler will run out of memory and probably take the whole system down with it.

The blame for this can be laid at the feet of the current CTFE interpreter’s architecture. It’s an AST interpreter, which means that it interprets the AST while traversing it. To represent the result of interpreted expressions, it uses DMD’s AST node classes. This means that every expression encountered will allocate one or more AST nodes. Within a tight loop, the interpreter can easily generate over 100_000_000 nodes and eat a few gigabytes of RAM. That can exhaust memory quite quickly.

Issue 12844 complains about std.regex taking more than 16GB of RAM. For one pattern. Then there’s issue 6498, which executes a simple 0 to 10_000_000 while loop via CTFE and runs out of memory.

Simply freeing nodes doesn’t fix the problem; we don’t know which nodes to free and enabling the garbage collector makes the whole compiler brutally slow. Luckily there is another approach which doesn’t allocate for every expression encountered. It involves compiling the function to a virtual ISA (Instruction Set Architecture). This virtual ISA, also known as bytecode, is then given to a dedicated interpreter for that ISA (in the case in which a virtual ISA happens to be the same as the ISA of the host, we call it a JIT (Just in Time) interpreter).

The NewCTFE project concerns itself with implementing such a bytecode interpreter. Writing the actual interpreter (a CPU emulator for a virtual CPU/ISA) is reasonably simple. However, compiling code to a virtual ISA is exactly as much work as compiling it to a real ISA (though, a virtual ISA has the added benefit that it can be extended for customized needs, but that makes it harder to do JIT later). That’s why it took a month just to get the first simple examples running on the new CTFE engine, and why slightly more complicated ones still aren’t running even after 9 months of development. At the end of the post, you’ll find an approximate timeline of the work done so far.

I’ll be giving a presentation at DConf 2017, where I’ll discuss my experience implementing the engine and explain some of the technical details, particularly regarding the trade-offs and design choices I’ve made. The current estimation is that the 1.0 goals will not be met by then, but I’ll keep coding away until it’s done.

Those interested in keeping up with my progress can follow my status updates in the D forums. At some point in the future, I will write another article on some of the technical details of the implementation. In the meantime, I hope the following listing does shed some light on how much work it is to implement NewCTFE 🙂

  • May 9th 2016
    Announcement of the plan to improve CTFE.
  • May 27th 2016
    Announcement that work on the new engine has begun.
  • May 28th 2016
    Simple memory management change failed.
  • June 3rd 2016
    Decision to implement a bytecode interpreter.
  • June 30th 2016
    First code (taken from issue 6498) consisting of simple integer arithmetic runs.
  • July 14th 2016
    ASCII string indexing works.
  • July 15th 2016
    Initial struct support
  • Sometime between July and August
    First switches work.
  • August 17th 2016
    Support for the special cases if(__ctfe) and if(!__ctfe)
  • Sometime between August and September
    Ternary expressions are supported
  • September 08th 2016
    First Phobos unit tests pass.
  • September 25th 2016
    Support for returning strings and ternary expressions.
  • October 16th 2016
    First (almost working) version of the LLVM backend.
  • October 30th 2016
    First failed attempts to support function calls.
  • November 01st
    DRuntime unit tests pass for the first time.
  • November 10th 2016
    Failed attempt to implement string concatenation.
  • November 14th 2016
    Array expansion, e.g. assignment to the length property, is supported.
  • November 14th 2016
    Assignment of array indexes is supported.
  • November 18th 2016
    Support for arrays as function parameters.
  • November 19th 2016
    Performance fixes.
  • November 20th 2016
    Fixing the broken while(true) / for (;;) loops; they can now be broken out of 🙂
  • November 25th 2016
    Fixes to goto and switch handling.
  • November 29th 2016
    Fixes to continue and break handling.
  • November 30th 2016
    Initial support for assert
  • December 02nd 2016
    Bailout on void-initialized values (since they can lead to undefined behavior).
  • December 03rd 2016
    Initial support for returning struct literals.
  • December 05th 2016
    Performance fix to the bytecode generator.
  • December 07th 2016
    Fixes to continue and break in for statements (continue must not skip the increment step)
  • December 08th 2016
    Array literals with variables inside are now supported: [1, n, 3]
  • December 08th 2016
    Fixed a bug in switch statements.
  • December 10th 2016
    Fixed a nasty evaluation order bug.
  • December 13th 2016
    Some progress on function calls.
  • December 14th 2016
    Initial support for strings in switches.
  • December 15th 2016
    Assignment of static arrays is now supported.
  • December 17th 2016
    Fixing goto statements (we were ignoring the last goto to any label :)).
  • December 17th 2016
    De-macrofied string-equals.
  • December 20th 2016
    Implement check to guard against dereferencing null pointers (yes… that one was oh so fun).
  • December 22ed 2016
    Initial support for pointers.
  • December 25th 2016
    static immutable variables can now be accessed (yes the result is recomputed … who cares).
  • January 02nd 2017
    First Function calls are supported !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  • January 17th 2017
    Recursive function calls work now 🙂
  • January 23rd 2017
    The interpret3.d unit-test passes.
  • January 24th 2017
    We are green on 64bit!
  • January 25th 2017
    Green on all platforms !!!!! (with blacklisting though)
  • January 26th 2017
    Fixed special case cast(void*) size_t.max (this one cannot go through the normal pointer support, which assumes that you have something valid to dereference).
  • January 26th 2017
    Member function calls are supported!
  • January 31st 2017
    Fixed a bug in switch handling.
  • February 03rd 2017
    Initial function pointer support.
  • Rest of Feburary 2017
    Wild goose chase for issue #17220
  • March 11th 2017
    Initial support for slices.
  • March 15th 2017
    String slicing works.
  • March 18th 2017
    $ in slice expressions is now supported.
  • March 19th 2017
    The concatenation operator (c = a ~ b) works.
  • March 22ed 2017
    Fixed a switch fallthrough bug.

Snowflake Strings

Posted on

Walter Bright is the BDFL of the D Programming Language and founder of Digital Mars. He has decades of experience implementing compilers and interpreters for multiple languages, including Zortech C++, the first native C++ compiler. He also created Empire, the Wargame of the Century.


Consider the following D code in file.d:

int foo(int i) {
    assert(i < 3);
    return i;
}

This is equivalent to the C code:

#include <assert.h>

int foo(int i) {
    assert(i < 3);
    return i;
}

The assert() in the D code is “lowered” (i.e. rewritten by the compiler) to the following:

(i < 3 || _d_assertp("file.d", 2))

We’re interested in how the compiler writes that string literal, "file.d" to the generated object file. The most obvious implementation is to write the characters into the data section and push the address of that to call _d_assertp().

Indeed, that does work, and it’s tempting to stop there. But since we’re professional compiler nerds obsessed with performance, details, etc., there’s a lot more oil in that olive (to borrow one of Andrei’s favorite sayings). Let’s put it in a press and start turning the screw, because assert()s are used a lot.

First off, string literals are immutable (they were originally mutable in C and C++, but are no longer, and D tries to learn from such mistakes). This suggests the string can
be put in read-only memory. What advantages does that deliver?

  • Read-only memory is, well, read only. Attempts to write to it are met with a seg fault exception courtesy of the CPUs memory management logic. Seg faults are a bit graceless, like a cop pulling you over for driving on the wrong side of the road, but at least there wasn’t a head-on collision or corruption of the program’s memory.
  • Read-only pages are never swapped out by the virtual memory system; they never get marked as “dirty” because they are never written to. They may get discarded and reloaded, but that’s much less costly.
  • Read-only memory is safe from corruption by malware (unless the malware infects the MMU, sigh).
  • Read-only memory in a shared library is shared – copies do not have to be made for each user of the shared library.
  • Read-only memory does not need to be scanned for pointers to the heap by the garbage collection system (if the application does GC).

Essentially, shoving as much as possible into read-only memory is good for performance, size and safety. There’s the first drop of oil.

The next issue crops up as soon as there’s more than one assert:

int foo(int i, int j) {
    assert(i < 3);
    assert(j & 1);
    return i + j;
}

The compiler emits two copies of "file.d" into the object file. Since they’re identical, and read-only, it makes sense to only emit one copy:

string TMP = "file.d";
int foo(int i, int j) {
    (i < 3 || _d_assertp(TMP, 2))
    (j & 1 || _d_assertp(TMP, 3))
    return i + j;
}

This is called string pooling and is fairly simple to implement. The compiler maintains a hash table of string literals and their associated symbol names (TMP in this case).

So far, this is working reasonably well. But when asserts migrate into header files, macros, and templates, the same string can appear in lots of object files, since the compiler doesn’t know what is happening in other object files (the separate compilation model). Other string literals can exhibit this behavior, too, when generic coding practices are used. There needs to be some way to present these in the object file so the linker can pool identical strings.

The dmd D compiler currently supports four different object file formats on different platforms:

  • Elf, for Linux and FreeBSD
  • Mach-O, for OSX
  • MS-COFF, for Win64
  • OMF, for Win32

Each does it in a different way, with different tradeoffs. The methods tend to be woefully under documented, and figuring this stuff out is why I get paid the big bucks.

Elf

Elf turns out to have a magic section just for this purpose. It’s named .rodata.strM.N where N is replace by the number of bytes a character has, and M is the alignment. For good old char strings, that would be .rodata.str1.1. The compiler just dumps the strings into that section, and the Elf linker looks through it, removing the redundant strings and adjusting the relocations accordingly. It’ll handle the usual string types – char, wchar, and dchar – with aplomb.

There’s just a couple flaws. The end of a string is determined by a nul character. This means that strings cannot have embedded nuls, or the linker will regard them as multiple strings and shuffle them about in unexpected ways. One cannot have relocations in those sections, either. This means it’s only good for C string literals, not other kinds of data.

This poses a problem for D, where the strings are length-delineated strings, not nul-terminated ones. Does this mean D is doomed to being unable to take advantage of the C-centric file formats and linker design? Not at all. The D compiler simply appends a nul when emitting string literals. If the string does have an embedded nul (allowed in D), it is not put it in these special sections (and the benefit is lost, but such strings are thankfully rare).

Mach-O

Mach-O uses a variant of the Elf approach, a special section named __cstring. It’s more limited in that it only works with single byte chars. No wchar_ts for you! If there ever was confirmation that UTF-16 and UTF-32 are dead end string types, this should be it.

MS-COFF

Microsoft invented MS-COFF by extending the old Unix COFF format. It has many magic sections, but none specifically for strings. Instead, it uses what are called COMDAT sections, one for each string. COMDATs are sections tagged with a unique name, and when the linker is faced with multiple COMDATs with the same name, one is picked and all references to the other COMDATs are rewritten to refer to the Anointed One. COMDATs first appeared in object formats with the advent of C++ templates, since template code generation tends to generate the same code over and over in separate files.

(Isn’t it interesting how object file formats are driven by the needs of C and C++?)

The COMDAT for "hello" would look something like this:

??_C@_05CJBACGMB@hello?$AA@:
db 'h', 'e', 'l', 'l', 'o', 0

The tty noise there is the mangled name of the COMDAT which is generated from the string literal’s contents. The algorithm must match across compilation units, as that is how the linker decides which ones are the same (experimenting with it will show that the substring CJBACGMB is some sort of hash). Microsoft’s algorithm for the mangling and hash is undocumented as far as I can determine, but it doesn’t matter anyway, it only has to have a 1:1 mapping between name and string literal. That screams “MD5 hash” to me, so that’s what dmd does. The name is an MD5 hash of the string literal contents, which also has the nice property that no matter how large the string gets, the identifier doesn’t grow.

COMDATs can have anything stuffed in them, so this is a feature that is usable for a lot more than just strings.

The downside of the COMDAT scheme is the space taken up by all those names, so shipping a program with the debug symbols in it could get much larger.

OMF

The caboose is OMF, an ancient format going back to the early 80’s. It was extended with a kludgy COMDAT system to support C++ just before most everyone abandoned it. DMD still emits it for Win32 programs. We’re stuck with it because it’s the only format the default linker (OPTLINK) understands, and so we find a way to press it into service.

Since it has COMDATs, that’s the mechanism used. The wrinkle is that COMDATs are code sections or data sections only; there are no other options. We want it to be read-only, so the string COMDATs are emitted as code sections (!). Hey, it works.

Conclusion

I don’t think we’ve pressed all the oil out of that olive yet. It may be like memcpy, where every new crop of programmers thinks of a way to speed it up.

I hope you’ve enjoyed our little tour of literals, and may all your string literals be unique snowflakes.

Thanks to Mike Parker for his help with this article.

The D Language Foundation’s Scholarship Program

Posted on

d6The D Language Foundation recently announced a new scholarship program aimed at EE and CS majors attending University “Politehnica” Bucharest (UPB). I contacted Andrei Alexandrescu for a few details on how the initiative came together, hoping for just enough tidbits of backstory to craft a blog post around. He obliged in a big way, turning my one question and “a few details” into an informative conversation.

Mike: I assume quite a lot of work went into this. Could you share a few details about how it came about?

Andrei: Gladly! The story starts back in 2012, when I gave a talk at the How to Web conference in Bucharest, my native city. It was a great event and I got to meet many great people. Except for one whose name kept coming up all over the Romanian IT space, Andrei Pitis.

I heard he was an instructor in the CS department at UPB (the best IT school in Romania, also noted internationally). He’s been directly involved in a number of IT-related foundations and professional organizations, and he created and led the immensely successful Vector Smart Watch startup. So, having heard he’d be around, I went to the conference speakers’ dinner hoping to bump into him.

Not knowing what he looked like, I was just craning my neck in search of someone who seemed popular. Meanwhile, I was passing time by making chit chat with a nice fellow who introduced himself to me. Now, you know how these group parties go. There’s always loud music and conversation, so I didn’t even hear his name and assumed he hadn’t heard mine.

As the evening progressed, I figured Andrei Pitis wasn’t going to show, so I had more time to chat with that fine gentleman. And I noticed two things. First, he was incredibly insightful. Second, he seemed equally excited about meeting me as I was about meeting Andrei Pitis. After a long while, the coin dropped: they were one and the same.

Thus started a great friendship. Andrei gave me great tips about how to start and conduct The D Language Foundation. Recently, he introduced me to two UPB CS systems professors, Razvan Deaconescu and Razvan Rughinis (together, the three had created the Tech Lounge nonprofit organization dedicated to helping graduating CS students start their careers).

Razvan Rughinis came up with the scholarship idea while we were chatting over beers in the quaint old town of Bucharest. In great part the idea was motivated by the strong interest UPB systems graduate students had in participating in a high-impact open source project such as the D language as part of their MSc thesis. In systems research (unlike e.g. CS theory), actual system building is a key part of the research project; therefore, a visible OSS project makes for a much stronger dissertation than the usual throwaway experimental code.

Clearly a strong opportunity had presented itself, and the DLang UPB scholarship is its realization.

Mike: How does the selection process work?

Andrei: The two professors introduce a few candidates, which I pass through the rigors of the typical Facebook interview. We also ask for the usual suspects – proof of enrollment, transcripts, motivation letter, and references.

Of all components, the most important are (in order) the interview, the quality of the BSc projects, and the recommendation letters from their professors. The four current scholarship recipients passed the interview with flying colors and have very strong BSc projects and references. Some of them returned from summer internships at prestigious companies such as Bloomberg, others won CS awards. I have no doubt any company in the Bay Area or elsewhere would be happy to work with them. Once they finish their MSc, of course :o).

And I should mention here that the two professors aren’t only involved in the selection process. They will make themselves available to help manage the students on an ongoing basis. We’re very fortunate to have them.

Mike: Can you provide any info on the current recipients and their projects?

Andrei: The current recipients are Alexandru Razvan Caciulescu, Lucia Cojocaru, Eduard Staniloiu, and Razvan Nitu. I have posted an introduction to each on the D forums and, now that you mention it, I told them to create a wiki page with a blurb for each. They are hosted in a nice shared office kindly donated by Tech-Lounge.ro and… we’re in the process of getting a coffee machine up there :o).

They are all obviously interested in taking large systems projects that benefit their research interests and have an impact on the D language. To get them started, I took a page from Facebook’s practice and defined a “bootcamp” program. Bootcamp is a month-long process (six weeks at Facebook) during which the so-called n00bs get familiar with the technologies used in the organization: the language proper; the core runtime and standard library; the build process; the way code changes are created, reviewed, accepted, and committed; and, last but not least, the community ethos and the kind of problems we are facing that are fit for ingenious solutions.

To kickstart the bootcamp program, I defined a “bootcamp” label in our Bugzilla and applied it to a bunch of existing bugs, with an eye for the kind of bug that simultaneously has low surface (you don’t need to know a lot of internal details to get into it) and offers a good learning experience. Right now each student is busy fixing a couple of such bugs.

Long-term we are looking at high-impact libraries and tools. I do have a few ideas, but I have no doubt the students will come up with their own. Just give them time.

Mike: Speaking of time… is there any room here for an update on the D Foundation’s finances?

Andrei: Of course. To be honest, right now we’re in better shape than ever before (and than I would have hoped). Thanks to Sociomantic, who footed a large part of DConf 2016’s bills, we have quite a bit of change left from conference registration fees. I have also personally carried a number of high-profile appearances at public tech events and private corporate training events, with proceeds flowing to the Foundation.

So we have accumulated a little war chest – not much, but definitely not negligible. With our current funds and operational costs, we are covered for over two years. Of course, the situation is fluid and I am working on expanding both income and (useful) expenditures.

We’re running a very tight operation, and I want to keep it that way. By the Foundation bylaws, its officers (Walter Bright, Ali Çehreli, and myself) cannot get income from the Foundation, which preempts a variety of conflicts of interest. We are a public charity, which reduces and simplifies our taxation. We use modern, low-overhead money transfer methods such as transferwise.com and constantly scan for better ones. Anyone who considers donating should know that about every five dollars donated goes straight to pay for one hour of an exceptional graduate student’s time.

Mike: Are there more applications in the queue? Do you plan to extend scholarships to other universities?

Andrei: UPB seems to be off to a great start, but it’s also a happy case for many reasons: it’s my undergrad alma mater, we know professors there, and we don’t need to pay tuition. If we wanted to extend a scholarship to another university we’d need to avail ourselves of similar strategic advantages. Needless to say, if anyone who reads this has ideas on the matter, please contact me.

Anyhow, for the time being, we got one more strong DLang UPB scholarship application literally today.

Mike: To close out, is there anything you’d like to say to people who’d like to help out?

Andrei: I’m very excited about this scholarship program and possible extensions to it. The reason for my excitement is that this is but a part of a larger strategy. Allow me to explain.

Up until now, we had no idea what to do with money even if we had it. A while ago, I met this potential donor who said, “OK, say I gave the Foundation half a million dollars over two years, no strings attached. What would you do with it?” To my own surprise, I had only vague answers. I asked Walter the same question, and he had even less of a clue than me.

So then I figured it’s essential for the Foundation to have a strong response to that. I’m a big believer in the adage “luck helps the prepared”, of which the converse is “luck is wasted on the unprepared”. By that paradigm, not knowing what we’d do with money was a definite way to ensure we’d never be big. Now that we have the scholarship program, there exists a powerful reason for people to donate to the Foundation: donations help us find and support good students to work on high-impact D-related projects that push the state of CS systems research forward.

Another thing that would be great to have “donations” of is contributor time. Receiving more students starts pushing against our management capacity. Currently, and somewhat to my surprise, I am effectively a manager, seeing that all of these things I just gave you an earful of (bringing money in to the Foundation, managing bootcamp, finances, operations) take enough time to be a full-time job that leaves little time for coding. At some point, I won’t be able to help everyone with their research, so I’ll need to delegate some of that work to other folks. I’m talking any capacity here – from code reviews to managing to co-authoring papers to co-advising.

There are more things I have in mind, but it’s early to share those. In brief, we need to organize ourselves for further growth. What’s clear to me is we’re no longer a seat-of-the-pants operation in a (virtual) basement. The D Language is exiting its adolescence.

Ruminations on D: An Interview with Walter Bright

Posted on

Joakim is the resident interviewer for the D Blog. He has also interviewed members of the D community for This Week in D and is responsible for the Android port of LDC.


d6Walter Bright is the creator and first implementer of the D programming language. He was an early developer of C++ compilers starting from the mid-’80s, including the first C++ compiler to translate source code directly to object code without using C as an intermediate, and has written compilers for ABEL, C, Java, and Javascript. He believes he is the only person to have written a full C++98 compiler by himself. Empire, one of the first computer strategy games, was written by Walter at Caltech. Before getting into computer programming full-time, Walter worked for Boeing from 1979-1982 on the 757’s flight controls, particularly the stabilizer trim gearbox and system and the elevators.

Joakim: D appears to be picking up speed. The fourth straight DConf took place earlier this year, Wired wrote a nice article a couple years ago, Gartner ranked it in the top 20 languages soon afterwards, and downloads of the reference compiler, DMD, average 1200 per day in recent years. Talk about the current popularity and status of the language and what you’re doing to take it to the next level.

Walter: I don’t worry too much about that. I spend my efforts making D the best language possible, and let the metrics take care of themselves. It’s like being a CEO; he shouldn’t be sweating the stock price, he should be working on making money for the company, then the stock price will take care of itself.

There’s always a stack of things to do, more or less with the most important one on the top, and I pop the top off and work on it, mostly the one with the maximum benefit/cost ratio. The ordering in the stack changes all the time. If an item has others strongly interested in working on it, I defer to them.

I get the jobs that nobody else wants to do. 🙂 Regression fixes for complex problems is a big one. It’s much more fun designing new stuff than maintenance on the existing stuff, dealing with technical debt, etc.

Joakim: In the last year, @nogc and interfacing with C++ have been at the top of your stack. Why? You always used to say that interfacing more with C++ would be too much work.

Walter: It became clear that the garbage collector wasn’t needed to be embedded in most things, that memory allocation could be decided separately from the algorithm. Letting the user decide seemed like a great way forward.

As for C++, I figured out a way to support it that avoided the problems I thought were impractical to deal with. I did a talk about it PDF slides here– and a Rust user wrote up a partial transcript.

Since that talk, I’ve come up with a solution for exceptions. C++ can throw/catch exceptions of any type. But few people do anything other than throw/catch a reference to a class, like std::exception. All D needed to do was allow throw/catch of a class marked with the extern (C++) attribute. Throw/catch of other types remains unsupported, and most likely will remain unsupported.

In order to make that work, the custom exception handling code in the code generation and the library had to change to work with the DWARF exception handling mechanism. That turned out to be a fair amount of work, as it is rather under-documented. But it’s pretty simple for the user.

Joakim: What do you plan on working on in D for the rest of the year? Work on @nogc and C++ support seems to be ongoing and you’ve tried to increase the use of ranges in the standard library for some time now. Anything else? What is the status of those efforts?

Walter: Those are still high-priority and ongoing efforts. But also we’re flexible; if a major opportunity comes up that needs us to push in a different direction, we’ll adapt. The pervasive use of ranges is advancing rapidly, I’ve been very pleased with the results.

The current focus is on improving memory safety. It’s become more and more important to have verifiable memory safety in a programming language, as the expenses involved when unsafe code is exploited in order to install malware become greater and greater. D has always supported memory safety, but recently we’ve embarked on a much more comprehensive review of memory safety in the core language and are making changes to close the gaps.

Joakim: How much time do you spend on D and what is your daily routine?

Walter: I work full time on D. I probably spend half my time working on the language, and the other half helping others with it, discussing things, doing interviews (!), writing articles, doing conferences, etc.

Joakim: You were 42 when you started working on D and I guess it is the first language you designed? Talk about why you started working on it so late–you were probably older then than most of the D contributors now–and what insights your experience gave you that your younger self or other young contributors may not have.

Walter: ABEL is the first language I designed, and was a solid success for Data I/O. It’s obsolete now, because the electronic devices it was aimed at are now obsolete, but it had a great run for 10 years or so.

Having been writing compilers my whole career, doing tech support for them, and following the various changes in the languages inevitably gave me much insight on what worked and what didn’t. Probably the biggest thing is that simpler is better. But making something simple is actually quite difficult. Anybody can (and does) design a complex solution, but few can see through the complexity to find the underlying simplicity.

Many successful languages were designed by older engineers.

Joakim: Please give some examples of such simplicity and how you were able to find it.

Walter: We nailed it with arrays (Jan Knepper’s idea), the basic template design, compile-time function execution (CTFE), and static if. I have no idea what the thought process is in any repeatable manner. If anything, it’s simply a dogged sense that there’s got to be a better way. It took me years to suddenly realize that a template function is nothing more than a function with two sets of parameters –compile time and run time–and then everything just fell into place.

I was more or less blinded by the complexity of templates such that I had simply missed what they fundamentally were. There was also a bit of the “gee, templates are hard” that predisposes one to believe they actually are hard, and then confirmation bias sets in.

I once attended a Scott Meyers presentation on type lists. He took an hour to explain it, with slide after slide of complexity. I had the thought that if it was an array of ints, his presentation would be 2 minutes long. I realized that an array of types should be equally straightforward.

With CTFE, we just went straight in the front door from asking the question “why can’t we execute this function at compile time, just like constant folding?” I did the initial one just by extending the constant folding logic. Don Clugston took it much further, but at its heart it’s still a modified dwarf. Stefan Koch is currently working on making a real interpreter out of it.

Joakim: Why do you think there hasn’t been a killer app for D yet? For example, Ruby was kind of an obscure language for a dozen years till Rails propelled it into the spotlight.

Walter: I suspect the age of the killer app is behind us. There is so much software existing and being written, for every imaginable purpose, that it’s hard to believe there will even be another killer app of any sort. Of course, predictions are notoriously unreliable.

Joakim: D has a unique approach in the compiled languages market, being mostly community-developed without a major corporate sponsor. C++ had Bell Labs, Rust has Mozilla, Go has Google; all have full-time paid devs on the language, even if they also take open-source contributions. Do you think this is a problem and is D being left behind?

Walter: We recently started a D foundation which will make it a lot easier for corporations to sponsor D.

Joakim: Do you make money off D? I know you’ve contracted with Facebook to write a C++ linter, Flint, and preprocessor, Warp, and that you work with Sociomantic and other companies using D.

Walter: I do some paid consulting work for D, but am careful not to take on so much that it interferes with working on D itself.

Joakim: Do you write much code in D outside of the standard library? If so, talk about recent stuff you’ve written and how the experience has been, plus a bit about using some of the new features.

Walter: D consumes so much of my efforts, there’s not much time to write other D apps other than smallish utilities. Currently, of course, the D compiler front end itself is now in D. But it’s been translated from C++, so isn’t idiomatic D.

Joakim: OK, so I guess Warp is the largest program you’ve written in D lately, about 10 klocs. Can you talk about the experience of actually coding that in D, as opposed to C/C++? What stood out for you?

Walter: What stood out is the speed with which it went together, and the remarkably small number of bugs that surfaced in it after it was released. I credit the extensive use of unit tests for the latter.

Having written another preprocessor before certainly helped, and it would be hard to tease that out as a separate effect. But I still believe that unit testing made the difference, because the way the preprocessor worked was very different from the one I’d done before.

What stood out with D was the ease of changing the data structures to try different ways, compared with doing this in other languages.

Joakim: When you think back to your vision for D as you were starting out in 1999, does it at all resemble that today? Anything big missing that you wanted back then?

Walter: D is far more advanced than what I thought of 15 years ago. Programming language ideas have certainly advanced since then, and D along with it.

D originally wasn’t going to have templates at all, based on my earlier experience with them. But finding a simple way to do them changed everything – even to the point where well over half of a modern D program is templates!

The idea of ranges slowly evolved over time, we’re still learning how to do it right.

bit as a basic type was unworkable. complex as a basic type turned out to be pointless (it works better as a library type). Auto-decoding of UTF-8 to code points turned out not nearly as useful as expected.

Transitive const was a leap of faith, and is consistently overlooked by other languages looking to adopt D features. I have a lot of faith in transitive const, and it is already paying off in making it possible to have pure functions, a key feature for modern programming.

Core Team Update: Martin Nowak

Posted on

In the early days of DMD, new releases were put out when Walter decided they were ready. There was no formal process, no commonly accepted set of criteria that defined the point at which a new compiler version was due or the steps involved in building the release packages. In the early days, that was just fine. As time passed, not so much.

According to Martin Nowak:

The old release process was completely opaque, inconsistent, irregular, and also very time-consuming for Walter. So at some point it more or less failed and Walter could no longer manage to build the next release.

Martin was eager to do something about it, but he wasn’t the first person to take action on the issue. He decided to start with the work that had come before.

At this point, I took a fairly complex script from Nick Sabalausky, which tried to emulate Walter’s process, and started to improve upon it. It got trimmed down to a much smaller size over time.

But that was just the beginning.

Next, I prepared OS images for Vagrant for all supported platforms. That alone took more than a week. After that, I wired up the release script with Vagrant. From there on we had kind of reproducible builds.

That was a major step forward and eliminated some of the common mistakes that had crept in now and again with the previous, no-so-reproducible, process. With the pieces in place, Martin got some help from another community member.

Andrew Edwards took over the actual release building, but I was still doing the main work of keeping the scripts running and managing bugfixes. At some point, Andrew no longer had time. As the back and forth between us was almost as much work as the release build process itself, I completely took over. Nowadays, we have the script running fully automated to build nightlies.

Thanks to Martin, the process for building DMD release packages is described step-by-step in the D Wiki so that anyone can do it if necessary. Moreover, he coauthored and implemented a D Improvement Proposal to clearly define a release schedule. This process was adopted beginning with DMD 2.068 and continues today.

With the improved release schedule, DMD users can plan ahead to anticipate new releases, or take advantage of nightly builds and point releases to test out bug fixes or, whenever they come around, new features in the language or the standard library. However, the schedule is not etched in stone, as any particular release may be delayed for one reason or another. For example, the 2.071 release introduced major changes to D’s import and symbol lookup to fix some long-standing annoyances, with the subsequent 2.071.1 fixing some regressions they introduced. The release of 2.072 has been delayed until all of the known issues related to the changes have been fixed.

Those who were around before Martin stepped up to take charge of the release process surely notice the difference. He and the others who have contributed along the way (like Nick and Andrew) have done the D community a major service. As a result, Martin has become an essential member of the core D team.

Core Team Update: Vladimir Panteleev

Posted on

When Walter Bright unleashed the first public release of DMD on the world, the whole release and maintenance process surrounding the language and the compiler was largely a one man show. Today, there are a number of regular contributors at every step of the process, but at the heart, in addition to Walter, are three core team members who keep the engine running. One of them is Vladimir Panteleev.

Vladimir maintains a number of D projects that have proven useful to D users over the years. Today’s entry focuses on one that had as much of an impact as any other D project you could name at the time upon its release, if not more.

If the name DFeed isn’t familiar to you, it’s the software that powers the D Forums. Posts pop up now and again from new users (and even some old ones!) asking for features like post editing or deletion, or other capabilities that are common in forum software found around the web. What they don’t realize is that DFeed is actually not the traditional sort of forum package they may be familiar with.

Go back a few years and the primary vehicle powering D discussions was an NNTP server that most people accessed via newsreaders. Over time, there were additional interfaces added, as Vladimir points out:

D only had an NNTP news server, a mailing list connected to it, and two shoddy PHP web interfaces for the NNTP server, both somehow worse than the other. A lot of people were asking for a regular web forum, such as phpBB. However, Walter would always give the same answer – they were inferior in many aspects to NNTP, thus he wouldn’t use them. What’s the point of an official forum if the community leaders don’t visit them?

Vladimir decided to take the initiative and do something about it. He took a project he had already been working on and enhanced it.

The program was originally simply an IRC bot (and called “DIrcFeed”), which announced newsgroup posts, GitHub activity, Wikipedia edits and such to the #d channel on FreeNode. Adding a local message cache and a web interface thus turned it into a forum.

And so DFeed was born. Walter announced that it had gone live on Valentine’s Day 2012. It even managed to get a positive reddit thread behind it, which was a big deal for D in those days. The NNTP server and the mailing list interface to it are still alive and well, but it’s probably a safe assumption that a number of users eventually ditched their newsreaders for DFeed’s web interface and never looked back. I know I did.

So if you’ve ever wondered why you can’t delete your posts in the D Forums, you now have the reason. DFeed does not have the authoritative database. That belongs to the NNTP server. To illustrate two major problems this brings about, consider the idea of adding Markdown support to DFeed.

First, people using NNTP/email won’t see the rendered versions. Which isn’t a big deal by itself since it’s just text, but does create feature imparity. It *is* possible to write Markdown that looks fine when rendered but is unreadable in source form, especially with some common extensions such as GitHub Flavored Markdown.

Second, unless we’re careful with this, people using NNTP/mailing lists might trigger Markdown formatting that could make their post unreadable. This could be avoided, though, by only rendering messages with Markdown if they originate from the web interface, which allows previewing posts.

Even so, he’s hoping to add support for Markdown at some point in the future. Another enhancement he’s eyeing is optional delayed NNTP/email posting.

A lot of younger people communicate mainly through web forums, and are very used to being able to edit their post after sending it. This is not supported on forum.dlang.org for the simple reason that you can’t unsend an email. One solution to this problem is to save all messages sent from the web interface locally, but delay their propagation to NNTP/email for a few minutes. This creates a window during which the message can still be edited, or even deleted.

Other features coming in a future update are OAuth support, a thread overview widget, and performance improvements. He says DFeed isn’t as fast as it used to be. He wants to change that. Don’t look for the new version of DFeed too soon, though. Right now, Vladimir’s attention is turned in directions that he believes will have a bigger impact on D. As soon as I know what that means, you’ll read about it here.

In the meantime, if you have any ideas on how to improve the forums, keeping in mind that any new features have to play nice with both the NNTP server and the mailing list, don’t hesitate to bring it up.

Thanks to Vladimir for taking the time to provide me with an update and for maintaining DFeed over the years. Here’s to many more!