Category Archives: Core Team

DMD Compiler as a Library: A Call to Arms

Digital Mars D logo

Having a flexible and powerful compiler library has been one of the stated goals of the D Language Foundation for some time now. This makes sense, as a proper compiler library will channel the efforts of contributors into building developer tools, which in turn, will increase the adoption rate of the language. However, progress on this topic has been slow, mainly due to two aspects: (1) the lack of a clear direction, and (2) the intimidating complexity of the DMD frontend, which requires significant work on the compiler codebase.

The good news is that we now have a plan, which I will outline in this blog post. The bad news is that implementing this plan requires significant effort, and we need more contributors. However, the silver lining is that the work, while extensive, mostly involves refactoring the code. This provides an excellent opportunity for contributors to familiarize themselves with the compiler codebase while delivering real value. Before delving into the specifics, let me give you some background.

Current Status And How We Got Here

To fully understand the work done so far on the compiler-as-a-library project, I highly recommend watching my talk on this subject.

In summary:

Several years ago, we began packaging the compiler as a library.
Our goal was to clearly separate compilation phases: lexing, parsing, semantic analysis, optimizations, and code generation.
The parsing and semantic analysis modules were interdependent, necessitating a method for separation.
We opted to template the parser with an ASTFamily template parameter, defining the AST nodes required for parsing.
We created ASTBase (containing AST nodes essential for parsing) and ASTCodegen (containing AST nodes needed for code generation).
ASTBase, as it stands, is code duplicated from ASTCodegen.
We started extracting semantic routines and fields from AST nodes to eliminate ASTBase’s code duplication by importing a subset of modules used by ASTCodegen.
Additionally, we began replacing third-party libraries (like libdparse) with the DMD-as-a-library package.

For more detailed information on each of these points, I recommend watching the talk I referenced.

Recently, I proposed to Walter a modification to the codebase that would significantly enhance the flexibility of the compiler library, allowing any AST node to be overwritten. Walter was hesitant to accept my proposal, concerned about the potential “ugliness” it would introduce to the codebase. He cited the addition of ASTBase and the resulting code duplication as a precedent. He then suggested that if we eliminate ASTBase, he would reconsider my proposal.

What You Can Do To Help

We are now focused on eliminating the duplication in ASTBase. To achieve this, we need to extract all information related to semantic analysis from the existing AST nodes. The challenge is the sheer number of AST nodes and the multitude of functions associated with each. I have been working on this sporadically over the past few months, and progress is slow due to the nature of the work: it mostly involves moving code, creating visitors, breaking dependencies, etc. While not overly complex, it isn’t particularly creative work either. However, for someone interested in understanding a real-life compiler codebase, it’s an ideal starting point.

If you’re willing to support this initiative, I’ve put together a guide on where to start and what you can do. Feel free to contact me on Slack (razvan.nitu), Discord, or email (razvan.nitu1305@gmail.com) for more details or to request a review of your PR.

I see this as an excellent opportunity to onboard new people into compiler development in a way that benefits both the language and the contributor. So, if you have some spare time, please join us in getting this work done!

DustMite: The General-Purpose Data Reduction Tool

If you’ve been around for a while, or are a particularly adventurous developer who enjoys mixing language features in interesting ways, you may have run into one compiler bug or two:

Implementation bugs are inevitably a part of using cutting-edge programming languages. Should you run into one, the steps to proceed are generally as follows:

Reduce the failing program to a minimal, self-contained example.
Add a description of what happens and what you expect to happen.
Post it on the bug tracker.

Nine years ago, an observation was made that when filing and fixing compiler bugs, a disproportionate amount of time was spent on the first step. When your program stops compiling “out of the blue”, or when the bug stops reproducing after the code is taken out of its context, manually paring down a large codebase by repeatedly cutting out code and checking if the bug still occurs becomes a tedious and repetitive task.

Fortunately, tedious and repetitive tasks are what computers are good for; they just have to be tricked into doing them, usually by writing a program. Enter DustMite.

The first version.

The basic operation is simple. The tool takes as inputs:

a data set to reduce (such as, a directory containing D source code which exhibits a particular compiler bug)
an oracle (or, more mundanely, a test script), which itself:
takes as input a variation of the data set, and
produces a yes-or-no answer on whether the input still satisfies the sought property (such as reproducing the particular compiler bug).

DustMite’s output is some local minimum variation of the data set, which it reaches by consecutively trying to remove parts of the data set and saving the results which the oracle approves. In the compiler bug example, this means removing bits of code which are not required to reproduce the problem at hand.

DustMite wouldn’t be very efficient if it attempted to remove things line-by-line or character-by-character. In order to maximize the chance of finding good reductions, the input is parsed into a tree according to the syntax of the input files.

Each tree node consists of a “head” (string), children (list of node pointers), and “tail” (string). Technically, it is redundant to have both “head” and “tail”, but they make representing some constructs and performing reductions much simpler, such as paren/bracket pairs.

Nodes are arranged into a binary tree as an optimization.

Additionally, nodes may have a list of dependencies. The dependency relationship essentially means “if this node is removed, these nodes should be removed too”. These constraints are not representable using just the tree structure described above, and are used to allow reducing things such as lists where trailing item delimiters are not allowed, or removing a function parameter and corresponding arguments from the entire code base at once.

In the case of D source code, declarations, statements, and subexpressions get their own tree nodes, so that they can be removed in one go if unneeded. The parser DustMite uses for D source code is intentionally very simple because it needs to handle potentially invalid D code, and you don’t want your bug reduction tool to also crash on top of the compiler.

How DustMite sees a simple D program.

An algorithm decides the order in which nodes are queued for potential deletion; DustMite implements several (it calls them “strategies”). Fundamentally, a strategy’s interface is (state_i, result_i) ⇒ (state_i+1, reduction_i+1), i.e., which reduction is chosen next depends on the previous reduction and its result. The default “inbreadth” strategy visits nodes in ascending depth order (row by row) and starts over from the top as long as it finds new reductions.

DustMite today supports quite a few more options:

The current version.

Probably, the most interesting of these is the -j switch—one reason being that DustMite’s task is inherently not parallelizable. Which reduction is chosen next, and the tree version to which that reduction is applied, depends on the previous reduction’s result.

DustMite works around this by putting unused CPU cores to work on lookahead: using a highly sophisticated predictor, it guesses what the result of the current reduction will be, and based on that assumption, calculates the next reduction. If the guess was right, great! We get to use that result. Otherwise, the work is wasted. Implementing this meant that strategies now needed to have copyable state, and so had to be refactored from an imperative style to a state machine.

Unfortunately, although the highly expected feature was implemented four years ago, the initial implementation was rather underwhelming. DustMite still did too much work in the main thread and wasted too much CPU time on rescanning the data set tree on every reduction. The problem was so bad that, at high core counts, lookahead mode was even slower than single-threaded mode.

I have recently set out to resolve these inadequacies. The following obstacles stood in the way:

Problem 1: Hashing was too slow. Because the oracle’s services (i.e., running the test script) are usually expensive, DustMite keeps track of a cache of previously attempted reductions and their outcome. This helps because not all tree transformations result in a change of output, and some strategies will retry reductions in successive iterations. A hash of the tree is used as the cache key; however, calculating it requires walking the entire tree every time, which is slow for large inputs.

Would it be possible to make the hash calculation incremental? One approach would be Merkle trees (each node’s hash is the hash of its children’s hashes), however that is suboptimal in the case of e.g., empty leaf nodes. CS erudite Ivan Kazmenko blesses us with an answer: polynomial hashes! By representing strings as polynomials, it is possible to use modulo arithmetic to calculate an incremental fixed-size hash and cache subtree hashes per node.

Each node holds its cumulative hash and length.

The number theory staggered me at first, so I recruited the assistance of feep from #d. After we went through a few draft implementations, I could begin working on the final version. The first improvement was replacing the naive exponentiation algorithm with exponentiation by squaring (D CTFE allowed precomputing a table at compile-time and a faster calculation than the classical method). Next, there was the matter of the modulo.

Initially, we used integer overflow for modulo arithmetic (i.e. q=2⁶⁴), however Ivan cautioned against using powers of two as the modulo, as this makes the algorithm susceptible to Thue-Morse strings. Not long ago I was experimenting with using long multiplication/division CPU instructions (where multiplying one machine word by another yields the result in two machine words with a high and low part, and vice-versa for division). D allows generating assembler code specific to the types that the function template is instantiated with, though in DustMite we only use the unsigned 64-bit variant (on x86 we fall back to using integer overflow).

With the hashing algorithm implemented, all that remained was to mark dirty nodes (they or their children had their content edited) and incrementally recalculate their hashes as needed. Dependencies posed a small obstacle: at the time, they were implemented as simply an array of pointers to the dependency node within the tree. As such, we didn’t know how to get to their parents (to mark them dirty as well), however this was easily overcome by adding a “parent” pointer to each node.

Well, or so I thought, until I got to work on the next problem.

Problem 2: Copying the tree. At the time, the current version of the tree representing the data set was part of the global state. Because of this, applying a reduction was implemented twice:

Temporarily, when saving the data set with a proposed reduction for testing:
void save(Reduction reduction, string savedir)
The tree here is not mutated, instead the reduction was applied “dynamically” to the output while walking the tree.
Permanently, when accepting a successful reduction, and irreversibly mutating the in-memory tree to apply it:
void applyReduction(ref Reduction r)

This was clumsy, but faster and less complicated than making a copy of the entire tree just to change one part of it to test a reduction. However, doing so was a requirement for proper lookahead, otherwise we would be unable to test reductions based on results where past tests predicted a positive outcome, or do nearly anything in a separate thread.

One issue was the tree “internal pointers”—making a copy would require updating all pointers within the tree to point to the new copies in the new tree. This was easy for children/parent pointers (since we can reliably visit every such pointer exactly once), but not quite for dependencies: because they were also implemented as simple pointers to nodes, we would have to keep track of a map of which node was copied where in order to update the dependency pointers.

One way to solve this would be to change the representation of node references from pointers to indices into a node array; this way, copying the tree would be as simple as a .dup. However, large inputs meant many nodes, and I wanted to see if it was possible to avoid iterating over every node in the tree (i.e. O(n)) for every reduction.

Was it possible? It would mean that we would copy only the modified nodes and their parents, leaving the rest of the tree in-place, and only reusing it as the copies’ children. This goal conflicted with the existence of “parent” pointers, because a parent would have to point towards either the old or new root, so to resolve this ambiguity every node would have to be copied. As a result, the way we handled dependencies needed to be rethought.

Editing trees with “copy on write” involves copying just the edited nodes (🔴), and their parents.

With internal pointers out, the next best thing to array indices for referencing a node was a series of instructions for how to reach the node from the tree root: an address. The representation of these addresses that I chose was a bit string represented as a linked list, where each list node holds the child index at that depth, starting from the deep end. Such a representation can be arranged in a tree where the root-side ends are shared, mimicking the structure of the tree containing the nodes for the reduced data, and thus allowing us to reuse memory and minimize allocations.

Nodes cannot hold their own address (as that would make them unmovable),
which is why they need to be stored outside of the main tree.

For addresses to work, the object they point at needs to remain the same, which means that we can no longer simply remove children from tree nodes—an address going through the second child would become invalid if the first child was removed. Rewriting all affected addresses for every tree edit is, of course, impractical, which leads us to the introduction of tombstones—dead nodes that only serve to preserve the index of the children that follow it. Because one of the possible reduction types involves moving subtrees around the tree, we now also have “redirects” (which are just tombstones with a “see here” address attached).

With the above changes in place, we can finally move forward with fixing and optimizing lookahead, as well as implementing incremental rehashing in a way that’s compatible with the above! The mutable global “current” tree variable is gone, save now simply takes a tree root as an argument, and applyReduction is now:

/// Apply a reduction to this tree, and return the resulting tree.
/// The original tree remains unchanged.
/// Copies only modified parts of the tree, and whatever references them.
Entity applyReduction(Entity origRoot, ref Reduction r)

With the biggest hurdle behind us, and a few more rounds of applying Walter Bright’s secret weapon, the performance metrics started to look more like what they should:

Going deeper would likely involve using OS-specific I/O APIs or rewriting D’s GC.

A mere 3.5x speed-up from a 32-fold increase in computational power may seem underwhelming. Here are some reasons for this:

With a 50/50 predictor, predictions form a complete binary tree, so doubling the number of parallel jobs gives you +1x more speed. That’s roughly log₂(jobs)-1, or 4 for 32 jobs – not far off!
The results will depend on the reduction being performed, so YMMV. For a certain artificial test case, one optimization (not pictured above) yielded a 500x speed-up!
DustMite does not try to keep all CPU cores busy all the time. If a prediction turns out false, all lookahead jobs based on it become wasted work, so DustMite only starts new lookahead tasks when a reduction’s outcome is resolved. Perhaps ideally DustMite would keep starting new jobs but kill them as soon as it discovers they’re based on a misprediction. As there is no cross-platform process group management in Phobos, the D standard library, this is something I left for future versions.
Some work is still done in the main thread, because moving it to a worker thread actually makes things slower due to the global GC lock.

There still remains one last place where DustMite iterates over every tree node per reduction: saving the tree to disk (so that it could be read by the test script). This seems unavoidable at first, but could actually be avoided by caching each node’s full text contents within the node itself.

I opted to leave this one out. With the other related improvements, such as using lockingBinaryWriter and aggregating writes of contiguous strings as one I/O operation, the increase in memory usage was much more dramatic than the decrease in execution time, even when optimized to just one allocation per reduction (polynomial hashing gives us every node’s total length for free). But, for a brief instant, DustMite processed reductions in sub-O(n) time.

One more addition is worth mentioning: Andrej Mitrovic suggested a switch which would replace removed text with whitespace, which would allow searching for exact line numbers in the test script. At the time, its addition posed significant challenges, as there needed to be some way to keep removed nodes in the tree but exclude them from future removal attempts. With the new tree representation, this became much easier, and also allowed creating the following animation:

In conclusion, I’d like to bring up that DustMite is good at more than just reducing compiler test cases. The wiki lists some ideas:

Finding the source of ambiguous or misleading compiler error messages (e.g., errors with the file/line information pointing only inside the standard library).
Alternative (much slower, but also much more thorough) method of verifying unit test code coverage. Just because a line of code is executed, that doesn’t mean it’s necessary; DustMite can be made to remove all code that does not affect the execution of your unit tests.
Similarly, if you have complete test coverage, it can be used for reducing the source tree to a minimal tree which includes support for only enabled unittests. This can be used to create a version of a program or library with a test-defined subset of features.
The --obfuscate mode can obfuscate your code’s identifiers. It can be used for preparing submission of proprietary code to bug trackers.
The --fuzz mode (a new feature) can help find bugs in compilers and tools by creating random programs (using fragments of other programs as input).

But DustMite is not limited to D programs (or any kind of programs) as input. With the --split option, we can tell DustMite how to parse and reduce other kinds of files. DustMite successfully handled the following scenarios:

reducing C++ programs (the D parser supports some C++-only syntax too);
reducing Python programs (using the indent split mode);
reducing a large commit to a minimal diff (using the diff split mode);
reducing a commit list, when git bisect is insufficient due to the problem being introduced across more than any single commit;
reducing a large data set to a minimal one, resulting in the same code coverage, with the purpose of creating a test suite;
and many more which I do not remember.

Today, some version of DustMite is readily available in major distributions (usually as part of some D-related package), so I’m happy having a favorite tool one apt-get / pacman -S away when I’m not at my PC.

Discovering a problem which can be elegantly reduced away by DustMite is always exciting for me, and I’m hoping you will find it useful too.

My Vision of D’s Future

When Andrei Alexandrescu stepped down as deputy leader of the D programming language, I was asked to take over the role going forward. It’s needless to say, but I’ll say it anyway, that those are some pretty big shoes to fill.

I’m still settling into my new role in the community and figuring out how I want to do things and what those things even are. None of this happens in a vacuum either, since Walter needs to be on board as well.

I was asked on the D forums to write a blog post on my “dreams and way forward for D”, so here it is. What I’d like to happen with D in the near future:

Memory safety

“But D has a GC!”, I hear you exclaim. Yes, but it’s also a systems programming language with value types and pointers, meaning that today, D isn’t memory safe. DIP1000 was a step in the right direction, but we have to be memory safe unless programmers opt-out via a “I know what I’m doing” @trusted block or function. This includes transitioning to @safe by default.

Safe and easy concurrency

We’re mostly there—using the actor model eliminates a lot of problems that would otherwise naturally occur. We need to finalize shared and make everything @safe as well.

Make D the default implementation language

D’s static reflection and code generation capabilities make it an ideal candidate to implement a codebase that needs to be called from several different languages and environments (e.g. Python, Excel, R, …). Traditionally this is done by specifying data structures and RPC calls in an Interface Definition Language (IDL) then translating that to the supported languages, with a wire protocol to go along with it.

With D, none of that is necessary. One can write the production code in D and have libraries automagically make that code callable from other languages. Add to all of this that it’s possible and easy to write D code that runs as fast or faster than the alternatives, and it’s a win on all fronts.

Second to none reflection.

Instead of disparate ways of getting things done with fragmented APIs (__traits, std.traits, custom code), I’d like for there to be a library that centralizes all reflection needs with a great API. I’m currently working on it.

Easier interop with C++.

As I mentioned in my DConf 2019 talk, C++ has had the success it’s enjoyed so far by making the transition from C virtually seamless. I want current C++ programmers with legacy codebases to just as easily be able to start writing D code. That’s why I wrote dpp, but it’s not quite there yet and we might have to make language changes to accommodate this going forward.

Fast development times.

I think we need a ridiculously fast interpreter so that we can skip machine code generation and linking. To me, this should be the default way of running unittest blocks for faster feedback, with programmers only compiling their code for runtime performance and/or to ship binaries to final users. This would also enable a REPL.

String interpolation

I was initially against this, but the more I think about it the more it seems to make sense for D. Why? String mixins. Code generation is one of D’s greatest strengths, and token strings enable visually pleasing blocks of code that are actually “just strings”. String interpolation would make them vastly easier to use. As it happens, there’s a draft DIP for it in the pipeline.

That’s what I came up with after a long walk by Lake Geneva. I’d love to know what the community thinks of this, what their pet peeves and pet features would be, and how they think this would help or hinder D going forward.

Ownership and Borrowing in D

Digital Mars logo Nearly all non-trivial programs allocate and manage memory. Getting it right is becoming increasingly important, as programs get ever more complex and mistakes get ever more costly. The usual problems are:

memory leaks (failure to free memory when no longer in use)
double frees (freeing memory more than once)
use-after-free (continuing to refer to memory already freed)

The challenge is in keeping track of which pointers are responsible for freeing the memory (i.e. owning the memory), which pointers are merely referring to the memory, where they are, and which are active (in scope).

The common solutions are:

Garbage Collection – The GC owns the memory and periodically scans memory looking for any pointers to that memory. If none are found, the memory is released. This scheme is reliable and in common use in languages like Go and Java. It tends to use much more memory than strictly necessary, have pauses, and slow down code because of inserted write gates.
Reference Counting – The RC object owns the memory and keeps a count of how many pointers point to it. When that count goes to zero, the memory is released. This is also reliable and is commonly used in languages like C++ and ObjectiveC. RC is memory efficient, needing only a slot for the count. The downside of RC is the expense of maintaining the count, building an exception handler to ensure the decrement is done, and the locking for all this needed for objects shared between threads. To regain efficiency, sometimes the programmer will cheat and temporarily refer to the RC object without dealing with the count, engendering a risk that this is not done correctly.
Manual – Manual memory management is exemplified by C’s malloc and free. It is fast and memory efficient, but there’s no language help at all in using them correctly. It’s entirely up to the programmer’s skill and diligence in using it. I’ve been using malloc and free for 35 years, and through bitter and endless experience rarely make a mistake with them anymore. But that’s not the sort of thing a programming shop can rely on, and note I said “rarely” and not “never”.

Solutions 2 and 3 more or less rely on faith in the programmer to do it right. Faith-based systems do not scale well, and memory management issues have proven to be very difficult to audit (so difficult that some coding standards prohibit use of memory allocation).

But there is a fourth way – Ownership and Borrowing. It’s memory efficient, as performant as manual management, and mechanically auditable. It has been recently popularized by the Rust programming language. It has its downsides, too, in the form of a reputation for having to rethink how one composes algorithms and data structures.

The downsides are manageable, and the rest of this article is an outline of how the ownership/borrowing (OB) system works, and how we propose to fit it into D. I had originally thought this would be impossible, but after spending a lot of time thinking about it I’ve found a way to fit it in, much like we’ve fit functional programming into D (with transitive immutability and function purity).

Ownership

The solution to who owns the memory object is ridiculously simple—there is only one pointer to it, so that pointer must be the owner. It is responsible for releasing the memory, after which it will cease to be valid. It follows that any pointers in the memory object are the owners of what they point to, there are no other pointers into the data structure, and the data structure therefore forms a tree.

It also follows that pointers are not copied, they are moved:

T* f();
void g(T*);
T* p = f();
T* q = p; // value of p is moved to q, not copied
g(p);     // error, p has invalid value

Moving a pointer out of a data structure is not allowed:

struct S { T* p; }
S* f();
S* s = f();
T* q = s.p; // error, can't have two pointers to s.p

Why not just mark s.p as being invalid? The trouble there is one would need to do so with a runtime mark, and this is supposed to be a compile-time solution, so attempting it is simply flagged as an error.

Having an owning pointer fall out of scope is also an error:

void h() {
  T* p = f();
} // error, forgot to release p?

It’s necessary to move the pointer somewhere else:

void g(T*);
void h() {
  T* p = f();
  g(p);  // move to g(), it's now g()'s problem
}

This neatly solves memory leaks and use-after-free problems. (Hint: to make it clearer, replace f() with malloc(), and g() with free().)

This can all be tracked at compile time through a function by using Data Flow Analysis (DFA) techniques, like those used to compute Common Subexpressions. DFA can unravel whatever rat’s nest of gotos happen to be there.

Borrowing

The ownership system described above is sound, but it is a little too restrictive. Consider:

struct S { void car(); void bar(); }
struct S* f();
S* s = f();
s.car();  // s is moved to car()
s.bar();  // error, s is now invalid

To make it work, s.car() would have to have some way of moving the pointer value back into s when s.car() returns.

In a way, this is how borrowing works. s.car() borrows a copy of s for the duration of the execution of s.car(). s is invalid during that execution and becomes valid again when s.car() returns.

In D, struct member functions take the this by reference, so we can accommodate borrowing through an enhancement: taking an argument by ref borrows it.

D also supports scope pointers, which are also a natural fit for borrowing:

void g(scope T*);
T* f();
T* p = f();
g(p);      // g() borrows p
g(p);      // we can use p again after g() returns

(When functions take arguments by ref, or pointers by scope, they are not allowed to escape the ref or the pointer. This fits right in with borrow semantics.)

Borrowing in this way fulfills the promise that only one pointer to the memory object exists at any one time, so it works.

Borrowing can be enhanced further with a little insight that the ownership system is still safe if there are multiple const pointers to it, as long as there are no mutable pointers. (Const pointers can neither release their memory nor mutate it.) That means multiple const pointers can be borrowed from the owning mutable pointer, as long as the owning mutable pointer cannot be used while the const pointers are active.

For example:

T* f();
void g(T*);
T* p = f();  // p becomes owner
{
  scope const T* q = p; // borrow const pointer
  scope const T* r = p; // borrow another one
  g(p); // error, p is invalid while q and r are in scope
}
g(p); // ok

Principles

The above can be distilled into the notion that a memory object behaves as if it is in one of two states:

there exists exactly one mutable pointer to it
there exist one or more const pointers to it

The careful reader will notice something peculiar in what I wrote: “as if”. What do I mean by that weasel wording? Is there some skullduggery going on? Why yes, there is. Computer languages are full of “as if” dirty deeds under the hood, like the money you deposit in your bank account isn’t actually there (I apologize if this is a rude shock to anyone), and this isn’t any different. Read on!

But first, a bit more necessary exposition.

Folding Ownership/Borrowing into D

Isn’t this scheme incompatible with the way people normally write D code, and won’t it break pretty much every D program in existence? And not break them in an easily fixed way, but break them so badly they’ll have to redesign their algorithms from the ground up?

Yup, it sure is. Except that D has a (not so) secret weapon: function attributes. It turns out that the semantics for the Ownership/Borrowing (aka OB) system can be run on a per-function basis after the usual semantic pass has been run. The careful reader may have noticed that no new syntax is added, just restrictions on existing code. D has a history of using function attributes to alter the semantics of a function—for example, adding the pure attribute causes a function to behave as if it were pure. To enable OB semantics for a function, an attribute @live is added.

This means that OB can be added to D code incrementally, as needed, and as time and resources permit. It becomes possible to add OB while, and this is critical, keeping your project in a fully functioning, tested, and releasable state. It’s mechanically auditable how much of the project is memory safe in this manner. It adds to the list of D’s many other memory-safe guarantees (such as no pointers to the stack escaping).

As If

Some necessary things cannot be done with strict OB, such as reference counted memory objects. After all, the whole point of an RC object is to have multiple pointers to it. Since RC objects are memory safe (if built correctly), they can work with OB without negatively impinging on memory safety. They just cannot be built with OB. The solution is that D has other attributes for functions, like @system. @system is where much of the safety checking is turned off. Of course, OB will also be turned off in @system code. It’s there that the RC object’s implementation hides from the OB checker.

But in OB code, the RC object looks to the OB checker like it is obeying the rules, so no problemo!

A number of such library types will be needed to successfully use OB.

Conclusion

This article is a basic overview of OB. I am working on a much more comprehensive specification. It’s always possible I’ve missed something and that there’s a hole below the waterline, but so far it’s looking good. It’s a very exciting development for D and I’m looking forward to getting it implemented.

For further discussion and comments from Walter, see the discussion threads on the /r/programming subreddit and at Hacker News.

The New Fundraising Campaign

On January 23, 2011, Walter announced that development of the core D projects had moved to GitHub. It’s somewhat entertaining to go through the thread and realize that it’s almost entirely about people coming to terms with using git, a reminder that there was a time when git was still relatively young and had yet to dominate the source control market.

The move to GitHub has since been cited by both Walter and Andrei as a big win, thanks to a subsequent increase in contributions. It was smooth sailing for quite some time, but eventually some grumbling began to be heard below the surface. Pull Requests (PRs) were being ignored. Reviews weren’t happening fast enough. The grumbling grew louder. There were long communication delays. PRs were sometimes closed by frustrated contributors, demotivated and unwilling to contribute further.

The oldest open PR in the DMD queue as I write is dated June 10, 2013. If we dig into it, we see that it was ultimately done in by a break in communication. The contributor was asking for more feedback, then there was silence for nine months. Six months later, when asked to rebase, the contributor no longer had time for it. A year and a half.

There are other reasons why PRs can stall, but in the end many of them have one thing in common: there’s no one pushing all parties involved toward a resolution. There’s no one following up every few days to make sure communication hasn’t broken down, or that any action that must be taken is followed through. Everyone involved in maintaining the PR queue has other roles and responsibilities both inside and outside of D development, but none of them have the bandwidth to devote to regular PR management.

We have had people step up and try to revive old PRs. We have seen some of them closed or merged. But there are some really challenging or time-consuming PRs in the list. The one linked above, for example. The last comment is from Sebastian Wilzbach in March of this year, who points out that it will be difficult to revive because it’s based on the old C++ code base. So who has the motivation to get it done?

I promised in the forums that I would soon be launching targeted funding campaigns. This seems like an excellent place to start.

Fostering an environment that encourages more contributions benefits everyone who uses D. The pool of people in the D community who have the skill and knowledge necessary to manage those contributions is small. If they had the time, we wouldn’t have this problem. A community effort to compensate one of them to make more time for it seems a just cause.

The D Language Foundation is asking the community as a whole to contribute to a fund to pay a part-time PR manager $1,000 per month for a period of three months, from November 15 to February 14. The manager will be paid the full amount after February 14, in one lump sum. At that time, we’ll evaluate our progress and decide whether to proceed with another three-month campaign.

We’ve already roped someone in who’s willing to do the job. Nicholas Wilson has been rummaging around the PR queue lately, trying to get merged those he has an immediate interest in. He’s also shown an interest in improving the D development process as a whole. That, and the fact that he said yes, makes him an ideal candidate for the role.

He’ll have two primary goals: preventing active PRs from becoming stale, and reviving PRs that have gone dormant. He’ll also be looking into open Bugzilla issues when he’s got some time to fill. He’ll have the weight of the Foundation behind his finger when he pokes it in the shoulder of anyone who is being unresponsive (and that includes everyone on the core team). Where he is unable to merge a PR himself, he’ll contact the people he needs to make it happen.

Click on the campaign card below and you’ll be taken to the campaign page. Our target is $3,000 by February, 14. If we exceed that goal, the overage will go toward the next three-month cycle should we continue. If we decide not to continue, overage will go to the General Fund.

Note that there are options for recurring donations that go beyond the campaign period. All donations through any campaign we launch through Flipcause go to the D Language Foundation’s account. In other words, they aren’t tied specifically to a single campaign. If you choose to initiate a recurring donation through any of our campaigns, you’ll be helping us meet our goal for that campaign and, once the campaign ends, your donations will go toward our general fund. If you do set up a monthly or quarterly donation, leave a note if you want to continue putting it toward paying the PR manager and we’ll credit it toward future PR manager campaigns.

When you’re viewing the blog on a desktop system, you’ll be able to see all of our active campaigns by clicking on the Donate Now button that I’ve added to the sidebar. Of course, the other donation options that have always been supported are still available from the donation page, accessible through the menu bar throughout dlang.org.

Thanks to Nicholas for agreeing to take on this job, and thanks in advance to everyone who supports us in getting it done!

DasBetterC: Converting make.c to D

Walter Bright is the BDFL of the D Programming Language and founder of Digital Mars. He has decades of experience implementing compilers and interpreters for multiple languages, including Zortech C++, the first native C++ compiler. He also created Empire, the Wargame of the Century. This post is the third in a series about D’s BetterC mode

D as BetterC (a.k.a. DasBetterC) is a way to upgrade existing C projects to D in an incremental manner. This article shows a step-by-step process of converting a non-trivial C project to D and deals with common issues that crop up.

While the dmd D compiler front end has already been converted to D, it’s such a large project that it can be hard to see just what was involved. I needed to find a smaller, more modest project that can be easily understood in its entirety, yet is not a contrived example.

The old make program I wrote for the Datalight C compiler in the early 1980’s came to mind. It’s a real implementation of the classic make program that’s been in constant use since the early 80’s. It’s written in pre-Standard C, has been ported from system to system, and is a remarkably compact 1961 lines of code, including comments. It is still in regular use today.

Here’s the make manual, and the source code. The executable size for make.exe is 49,692 bytes and the last modification date was Aug 19, 2012.

The Evil Plan is:

Minimize diffs between the C and D versions. This is so that if the programs behave differently, it is far easier to figure out the source of the difference.
No attempt will be made to fix or improve the C code during translation. This is also in the service of (1).
No attempt will be made to refactor the code. Again, see (1).
Duplicate the behavior of the C program as exactly and as much as possible,
bugs and all.
Do whatever is necessary as needed in the service of (4).

Once that is completed, only then is it time to fix, refactor, clean up, etc.

Spoiler Alert!

The completed conversion. The resulting executable is 52,252 bytes (quite comparable to the original 49,692). I haven’t analyzed the increment in size, but it is likely due to instantiations of the NEWOBJ template (a macro in the C version), and changes in the DMC runtime library since 2012.

Step By Step

Here are the differences between the C and D versions. It’s 664 out of 1961 lines, about a third, which looks like a lot, but I hope to convince you that nearly all of it is trivial.

The #include files are replaced by corresponding D imports, such as replacing #include <stdio.h> with import core.stdc.stdio;. Unfortunately, some of the #include files are specific to Digital Mars C, and D versions do not exist (I need to fix that). To not let that stop the project, I simply included the relevant declarations in lines 29 to 64. (See the documentation for the import declaration.)

#if _WIN32 is replaced with version (Windows). (See the documentation for the version condition and predefined versions.)

extern (C): marks the remainder of the declarations in the file as compatible with C. (See the documentation for the linkage attribute.)

A global search/replace changes uses of the debug1, debug2 and debug3 macros to debug printf. In general, #ifdef DEBUG preprocessor directives are replaced with debug conditional compilation. (See the documentation for the debug statement.)

/* Delete these old C macro definitions...
#ifdef DEBUG
-#define debug1(a)       printf(a)
-#define debug2(a,b)     printf(a,b)
-#define debug3(a,b,c)   printf(a,b,c)
-#else
-#define debug1(a)
-#define debug2(a,b)
-#define debug3(a,b,c)
-#endif
*/

// And replace their usage with the debug statement
// debug2("Returning x%lx\n",datetime);
debug printf("Returning x%lx\n",datetime);

The TRUE, FALSE and NULL macros are search/replaced with true, false, and null.

The ESC macro is replaced by a manifest constant. (See the documentation for manifest constants.)

// #define ESC     '!'
enum ESC =      '!';

The NEWOBJ macro is replaced with a template function.

// #define NEWOBJ(type)    ((type *) mem_calloc(sizeof(type)))
type* NEWOBJ(type)() { return cast(type*) mem_calloc(type.sizeof); }

The filenamecmp macro is replaced with a function.

Support for obsolete platforms is removed.

Global variables in D are placed by default into thread-local storage (TLS). But since make is a single-threaded program, they can be inserted into global storage with the __gshared storage class. (See the documentation for the __gshared attribute.)

// int CMDLINELEN;
__gshared int CMDLINELEN

D doesn’t have a separate struct tag name space, so the typedefs are not necessary. An
alias can be used instead. (See the documentation for alias declarations.) Also, struct is omitted from variable declarations.

/*
typedef struct FILENODE
        {       char            *name,genext[EXTMAX+1];
                char            dblcln;
                char            expanding;
                time_t          time;
                filelist        *dep;
                struct RULE     *frule;
                struct FILENODE *next;
        } filenode;
*/
struct FILENODE
{
        char            *name;
        char[EXTMAX1]  genext;
        char            dblcln;
        char            expanding;
        time_t          time;
        filelist        *dep;
        RULE            *frule;
        FILENODE        *next;
}

alias filenode = FILENODE;

macro is a keyword in D, so we’ll just use MACRO instead.

Grouping together multiple pointer declarations is not allowed in D, use this instead:

// char *name,*text;
// In D, the * is part of the type and 
// applies to each symbol in the declaration.
char* name, text;

C array declarations are transformed to D array declarations. (See the documentation for D’s declaration syntax.)

// char            *name,genext[EXTMAX+1];
char            *name;
char[EXTMAX+1]  genext;

static has no meaning at module scope in D. static globals in C are equivalent to private module-scope variables in D, but that doesn’t really matter when the module is never imported anywhere. They still need to be __gshared and that can be applied to an entire block of declarations. (See the documentation for the static attribute)

/*
static ignore_errors = FALSE;
static execute = TRUE;
static gag = FALSE;
static touchem = FALSE;
static debug = FALSE;
static list_lines = FALSE;
static usebuiltin = TRUE;
static print = FALSE;
...
*/

__gshared
{
    bool ignore_errors = false;
    bool execute = true;
    bool gag = false;
    bool touchem = false;
    bool xdebug = false;
    bool list_lines = false;
    bool usebuiltin = true;
    bool print = false;
    ...
}

Forward reference declarations for functions are not necessary in D. Functions defined in a module can be called at any point in the same module, before or after their definition.

Wildcard expansion doesn’t have much meaning to a make program.

Function parameters declared with array syntax are pointers in reality, and are declared as pointers in D.

// int cdecl main(int argc,char *argv[])
int main(int argc,char** argv)

mem_init() expands to nothing and we previously removed the macro.

C code can play fast and loose with arguments to functions, D demands that function prototypes be respected.

void cmderr(const char* format, const char* arg) {...}

// cmderr("can't expand response file\n");
cmderr("can't expand response file\n", null);

Global search/replace C’s arrow operator (->) with the dot operator (.), as member access in D is uniform.

Replace conditional compilation directives with D’s version.

/*
 #if TERMCODE
    ...
 #endif
*/
    version (TERMCODE)
    {
        ...
    }

The lack of function prototypes shows the age of this code. D requires proper prototypes.

// doswitch(p)
// char *p;
void doswitch(char* p)

debug is a D keyword. Rename it to xdebug.

The \n\ line endings for C multiline string literals are not necessary in D.

Comment out unused code using D’s /+ +/ nesting block comments. (See the documentation for line, block and nesting block comments.)

static if can replace many uses of #if. (See the documentation for the static if condition.)

Decay of arrays to pointers is not automatic in D, use .ptr.

// utime(name,timep);
utime(name,timep.ptr);

Use const for C-style strings derived from string literals in D, because D won’t allow taking mutable pointers to string literals. (See the documentation for const and immutable.)

// linelist **readmakefile(char *makefile,linelist **rl)
linelist **readmakefile(const char *makefile,linelist **rl)

void* cannot be implicitly cast to char*. Make it explicit.

// buf = mem_realloc(buf,bufmax);
buf = cast(char*)mem_realloc(buf,bufmax);

Replace unsigned with uint.

inout can be used to transfer the “const-ness” of a function from its argument to its return value. If the parameter is const, so will be the return value. If the parameter is not const, neither will be the return value. (See the documentation for inout functions.)

// char *skipspace(p) {...}
inout(char) *skipspace(inout(char)* p) {...}

arraysize can be replaced with the .length property of arrays. (See the documentation for array properties.)

// useCOMMAND  |= inarray(p,builtin,arraysize(builtin));
useCOMMAND  |= inarray(p,builtin.ptr,builtin.length)

String literals are immutable, so it is necessary to replace mutable ones with a stack allocated array. (See the documentation for string literals.)

// static char envname[] = "@_CMDLINE";
char[10] envname = "@_CMDLINE";

.sizeof replaces C’s sizeof(). (See the documentation for the .sizeof property).

// q = (char *) mem_calloc(sizeof(envname) + len);
q = cast(char *) mem_calloc(envname.sizeof + len);

Don’t care about old versions of Windows.

Replace ancient C usage of char * with void*.

And that wraps up the changes! See, not so bad. I didn’t set a timer, but I doubt this took more than an hour, including debugging a couple errors I made in the process.

This leaves the file man.c, which is used to open the browser on the make manual page when the -man switch is given. Fortunately, this was already ported to D, so we can just copy that code.

Building make is so easy it doesn’t even need a makefile:

\dmd2.079\windows\bin\dmd make.d dman.d -O -release -betterC -I. -I\dmd2.079\src\druntime\import\ shell32.lib

Summary

We’ve stuck to the Evil Plan of translating a non-trivial old school C program to D, and thereby were able to do it quickly and get it working correctly. An equivalent executable was generated.

The issues encountered are typical and easily dealt with:

Replacement of #include with import
Lack of D versions of #include files
Global search/replace of things like ->
Replacement of preprocessor macros with:
- manifest constants
- simple templates
- functions
- version declarations
- debug declarations
Handling identifiers that are D keywords
Replacement of C style declarations of pointers and arrays
Unnecessary forward references
More stringent typing enforcement
Array handling
Replacing C basic types with D types

None of the following was necessary:

Reorganizing the code
Changing data or control structures
Changing the flow of the program
Changing how the program works
Changing memory management

Future

Now that it is in DasBetterC, there are lots of modern programming features available to improve the code:

modules!
memory safety (including buffer overflow checking)
metaprogramming
RAII
Unicode
nested functions
member functions
operator overloading
documentation generation
functional programming support
Compile Time Function Execution
etc.

Action

Let us know over at the D Forum how your DasBetterC project is coming along!

The #dbugfix Campaign: Round 1 Report

The D Foundation released version 2.080.0 of DMD on May 2nd. That normally would have been accompanied by a blog post highlighting some of the items from the changelog. This time, it should also have included the first update on the #dbugfix campaign. I was on vacation with my wife in the days before and after DConf and never got around to writing the post. It’s a bit late to announce the DMD release now, so this post is instead all about the first round of #dbugfix.

I admit, my initial expectations of participation were low. I suppose I didn’t want to be disappointed. Happily, in the end I had no reason to be pessimistic. There were several issues “nominated” on Twitter and in the forums. I was also pleased that some of the forum posts led to substantial discussion. In fact, at least one of the forum threads led to one of the nominated bugs being fixed well before the period ended, and the fix was included in the 2.080 release.

The results

The following issues were nominated, but were fixed before the nomination period ended. #5227 is the only one I’m certain was fixed as a result of discussion from the #dbugfix campaign. The others might have been, or they may have been fixed in the normal course of activity.

5227 X ^^ FP at compile-time
16189 Optimizer bug, with simple test case
18562 expression is not evaluated when accessing manifest constant – this one was marked as a duplicate of 12486, but that was closed as RESOLVED FIXED on April 8.
18574 Unclear error message when trying to inherit from multiple classes

The following issues were open at the time the bug fixers selected which two issues to focus on, along with the “vote” count (+1’s, retweets, or anything which indicated support for fixing the issue) as best as I could determine:

15984 [REG2.071] Interface contracts retrieve garbage instead of parameters – 5 votes
15692 Allow struct member initializer everywhere – 4 votes
5710 cannot use delegates as parameters to non-global template – 3 votes
18068 No file names and line numbers in stack trace – 3 votes
1983 Delegates violate const – 1 vote
2043 Closure outer variables in nested blocks are not allocated/instantiated correctly: should have multiple instances but only have one.
16486 Compiler see template alias like a separate type in template function definition – 1 vote (duplicate of 10884 and 1807)
17592 destroy isn’t marked @nogc when the class destructor is marked as @nogc – 1 vote (duplicate of 15246)
17957 D shared library throws asserts when called from C detached pthread but not terminated with dlclose – 1 vote
18147 Debug information limited in size – 1 vote
18353 Unexpected OPTLINK Termination at EIP = 0040F60A – 1 vote (duplicate of 17508)
18493 [betterC] Can’t use aggregated type with postblit – 1 vote
18572 AliasSeq default arguments are broken – 1 vote

The mandate for the bug fixers is to select at least two of the top five. Of course there has to be some leeway here, given that some issues are extremely difficult to resolve, so they can look beyond the top five if they need to. In this case, given that all of the issues share only four distinct vote counts, the entirety of the list is in play anyway.

I got the bug fixers together before I headed off to my (spectacular) German vacation and asked them to come to a consensus by May 1st on two bugs to fix (I must note that they met their deadline, even though I did not). The two bugs selected were:

There are no time constraints on when these will be fixed. It is hoped that they will be fixed and merged in time for 2.081.0, but they could come in a later release. The point is, each of these is now a subject of attention.

Some details regarding a few of the issues not selected:

re: 15692 – a while back, Sebastian Wilzbach revived an old DIP for in-place struct initialization and is part way through an implementation.
re: 1983 – there’s an old PR waiting for a champion to revive it.
re: 18493 – Mike Franklin has a PR that’s waiting on clarification about whether or not nothrow should be implicit with -betterC.

Keep it going

Now that it’s done, I’m happy to declare the inaugural round of the #dbugfix campaign a success. I had hoped that asking for nominations in the forums would as an alternative to Twitter would spark discussions and am pleased to see that happened. So let’s build on this and keep it going.

The slate was wiped clean and the votes reset from the day I declared the first round finished on April 24th. I haven’t noticed any nominations since then (though I haven’t searched for any yet), but please keep them coming. If there’s an issue on the above list that’s not yet fixed but that you feel strongly about, nominate it again. Or nominate something not on the list that’s stuck in your craw. If you see a nomination on Twitter that you support, retweet it to add your vote. In the forums, give a +1 reply. Keep it going!

And while you’re nominating your issues, keep in mind that fixing bugs is only part of the story. If you really care about seeing issues fixed, please try your hand at reviewing pull requests. Contributors whose fixes languish in the PR queue are demotivated from contributing more, but without more eyes reviewing the PRs, the bandwidth to speed up the reviews isn’t there. More PR reviews beget more merges beget more closed issues beget more motivated contributors and satisfied users.

Barring a change in release schedule, DMD 2.081.0 should be released July 1. This round of nominations will close at the beginning of the last week of June. So tweet out or open a new forum thread for your #dbugfix nominee today. Please include both the #dbugfix hashtag and the bugzilla issue number in the tweet or post title, along with a link to the issue report. That will make my job considerably easier.

And review some PRs!

Thanks to Sebasitan Wilzbach and Mike Franklin for their help with this, particularly in keeping me updated with info about PR and DIPs!

The New New DIP Process

When I took on the role of DIP Manager last year, my number one goal was to clear out the queue. I made a few revisions to the process and got busy. Over the next few months, things went along fairly well, not so much from anything I did as from the quality of the submissions. But at some point, things broke down and the process stalled.

Near the end of the year, Andrei asked me to make two specific changes to the process. One of them was to come up with a different approach for handling the final review. To that end, he suggested I look at how other languages handle the review of their enhancement proposals.

Before I started, I identified some other areas of the process that were problematic so that I could keep an eye out for ideas on how to shore them up. Might as well overhaul the whole process rather than one small part of it. So I put the entire DIP process on hold until the new process was ready to go.

The main thing I learned from looking at other language processes was that about the only thing they have in common is that they use GitHub repositories to store their proposals and that new submissions are made through PRs. Beyond that, there’s a quite a bit of variety in how they handle review and evaluation.

Ultimately, I decided the basic framework we already had was well-suited for our circumstances. It just needed some serious tweaking to iron out the problem spots. So I set out to write up a new procedure document to address the things that needed addressing. When it was done, it took a while to get the seal of approval, but it finally came through and the process is no longer stalled.

So first, before describing the major changes, it will help to understand their motivation.

What went wrong

One of the earliest stumbles I had as DIP Manager was a mix up over DIP 1006. I had gotten it in my head that the author intended to rewrite it before moving forward. The reality was that I had informed him before DConf that I would get it going at the end of May. The result was that it sat there for several months before I realized my error.

Another problem came with DIP 1009. There was an issue with the way it was written – the style didn’t meet the standard laid out in the Guidelines document. This led to multiple email exchanges, with massive delays and more misunderstandings, that resulted in the DIP being stuck in limbo for quite a while.

The communication problem over DIP 1009 was what prompted the process revision. For the Final Review of each DIP, I was acting as the middle-man between Walter and Andrei on one side and the DIP Author on the other. It worked well enough as long as little further effort on the DIP was needed, but as soon as there were questions that needed answering or more work to do, it became inefficient, cumbersome, and prone to misunderstanding.

Perhaps the biggest issue of all was time. DIP 1006 sat in Draft Review with nothing from me for several months. DIPs 1006, 1009, and 1011 have been in the Final Review stage for ages. There’s no reason any DIP author should have to wait for months on end with no feedback, or vague promises, no matter which stage of the process a DIP is in. It’s discouraging and demotivating. The process should require some motivation and effort from DIP authors, but it should also require a commitment from the other side to keep the authors informed and to get each DIP through from beginning to end as efficiently as possible.

Some of these problems could have been avoided if I had taken a different view of my role. I saw the role of DIP Manager more as that of a Shepherd than a Gatekeeper. The ultimate fate of a DIP rested on the Author’s shoulders, not mine. The Guidelines were “more what you call guidelines than actual rules”. After I made my revisions to the Guidelines document early on, they fell right out of my head and I never looked at the document again.

Righting the wrongs

The new Procedure document outlines the new process. Following is a summary of the big ones.

A minor issue is that there was some confusion about the existing review-stage names. There are now four review stages rather than three: Draft Review, Community Review, Final Review, and Formal Assessment. The Draft Review is the same as before. The Community Review is the new name for the old Preliminary Review. The old Final Review, which had two parts, has been split out into the Final Review and the Formal Assessment – the former is the last chance for the community to leave feedback, and the latter is Walter and Andrei’s decision round.

For all but the Draft Review, each stage specifies a maximum amount of time that a DIP can go without progress. For example, a DIP may remain in the Post-Community Round N state for 180 days, and a DIP in Formal Assessment should receive a final disposition within 30 days. The document defines the steps that must be taken when these deadlines are not met.

Related, though not in the document, is what I will do to keep to the deadlines. I’ll be making use of the calendar in the D Foundation’s Google account to post the start and finish date of each stage for each DIP. When a DIP is between stages, I’ll set milestone dates so that the DIP Author and I can have a clear target to aim for. If we’re on the same page, there will be less opportunity for uncertainty and misunderstanding.

The document provides for a new process for handling the Formal Assessment. No longer will I be a middleman between two email chains. Now, Walter and Andrei will provide their feedback on a private gist, with direct participation by the DIP Author. This should help things move more quickly and will eliminate (or greatly reduce) the chance of anyone (me in particular) causing more delay by getting things mixed up.

Another change is the requirement for a Point of Contact (POC). From here on out, every DIP must have a POC. For a single-author DIP, the DIP Author is the POC. If there are multiple authors, they must select one from among themselves. The need for this came to light after a misunderstanding that arose from the communication problem. The POC must commit to seeing the DIP through to the end of the process. The document outlines what happens to a DIP when the POC becomes unavailable.

Another change that’s not outlined in the document is in how I view my role as DIP Manager. From here on out, I will consider the guidelines as actual rules. I’ll do my best to make sure a DIP meets the standards expected in terms of language and style before it leaves the Draft Review stage. We can tweak it as we go, of course, but never again should a DIP be sent back for revision because it’s too informal.

Open to refinement

The new Procedure document and the undocumented tweaks to my process are the result of lessons learned over several months. That doesn’t mean they’re perfect. We’ll always be open to suggestions on how to patch up any holes that are identified. Not every change was mentioned above, so please read the document for the details.

Hopefully, the three DIPs currently awaiting a final disposition will be resolved before too much longer. After that, DIP 1012 will be moved forward for the Final Review and Formal Assessment to become the first DIP to go through the new gist-based review. DIP 1013 (which will likely be the one introducing binary assignment operators for properties), will be the first test-case for the new process in its entirety. Let’s all keep an eye open for what works and what needs work.

And to everyone, thanks for your patience while I went through my growing pains and we got the mess sorted out. Now that the train is back on the tracks, I’ll do my best to keep it moving.

DMD 2.079.0 Released

The D Language Foundation is happy to announce version 2.079.0 of DMD, the reference compiler for the D programming language. This latest version is available for download in multiple packages. The changelog details the changes and bugfixes that were the product of 78 contributors for this release.

It’s not always easy to choose which enhancements or changes from a release to highlight on the blog. What’s important to some will elicit a shrug from others. This time, there’s so much to choose from that my head is spinning. But two in particular stand out as having the potential to result in a significant impact on the D programming experience, especially for those who are new to the language.

No Visual Studio required

Although it has only a small entry in the changelog, this is a very big deal for programming in D on Windows: the Microsoft toolchain is no longer required to link 64-bit executables. The previous release made things easier by eliminating the need to configure the compiler; it now searches for a Visual Studio or Microsoft Build Tools installation when either -m32mscoff or -m64 are passed on the command line. This release goes much further.

DMD on Windows now ships with a set of platform libraries built from the MinGW definitions and a wrapper library for the VC 2010 C runtime (the changelog only mentions the installer, but this is all bundled in the zip package as well). When given the -m32mscoff or -m64 flags, if the compiler fails to find a Windows SDK installation (which comes installed with newer versions of Visual Studio – with older versions it must be installed separately), it will fallback on these libraries. Moreover, the compiler now ships with lld, the LLVM linker. If it fails to find the MS linker, this will be used instead (note, however, that the use of this linker is currently considered experimental).

So the 64-bit and 32-bit COFF output is now an out-of-the-box experience on Windows, as it has always been with the OMF output (-m32, which is the default). This should make things a whole lot easier for those coming to D without a C or C++ background on Windows, for some of whom the need to install and configure Visual Studio has been a source of pain.

Automatically compiled imports

Another trigger for some new D users, particularly those coming from a mostly Java background, has been the way imports are handled. Consider the venerable ‘Hello World’ example:

import std.stdio;

void main() {
    writeln("Hello, World!");
}

Someone coming to D for the first time from a language that automatically compiles imported modules could be forgiven for assuming that’s what’s happening here. Of course, that’s not the case. The std.stdio module is part of Phobos, the D standard library, which ships with the compiler as a precompiled library. When compiling an executable or shared library, the compiler passes it on to the linker along any generated object files.

The surprise comes when that same newcomer attempts to compile multiple files, such as:

// hellolib.d
module hellolib;
import std.stdio;

void sayHello() {
    writeln("Hello!");
}

// hello.d
import hellolib;

void main() {
    sayHello();
}

The common mistake is to do this, which results in a linker error about the missing sayHello symbol:

dmd hello.d

D compilers have never considered imported modules for compilation. Only source files passed on the command line are actually compiled. So the proper way to compile the above is like so:

dmd hello.d hellolib.d

The import statement informs the compiler which symbols are visible and accessible in the current compilation unit, not which source files should be compiled. In other words, during compilation, the compiler doesn’t care whether imported modules have already been compiled or are intended to be compiled. The user must explicitly pass either all source modules intended for compilation on the command line, or their precompiled object or library files for linking.

It’s not that adding support for compiling imported modules is impossible. It’s that doing so comes with some configuration issues that are unavoidable thanks to the link step. For example, you don’t want to compile imported modules from libFoo when you’re already linking with the libFoo static library. This is getting into the realm of build tools, and so the philosophy has been to leave it up to build tools to handle.

DMD 2.079.0 changes the game. Now, the above example can be compiled and linked like so:

dmd -i hello.d

The -i switch tells the compiler to treat imported modules as if they were passed on the command line. It can be limited to specific modules or packages by passing a module or package name, and the same can be excluded by preceding the name with a dash, e.g.:

dmd -i=foo -i=-foo.bar main.d

Here, any imported module whose fully-qualified name starts foo will be compiled, unless the name starts with foo.bar. By default, -i means to compile all imported modules except for those from Phobos and DRuntime, i.e.:

-i=-core -i=-std -i=-etc -i=-object

While this is no substitute for a full on build tool, it makes quick tests and programs with no complex configuration requirements much easier to compile.

The #dbugfix Campaign

On a related note, last month I announced the #dbugfix Campaign. The short of it is, if there’s a D Bugzilla issue you’d really like to see fixed, tweet the issue number along with #dbugfix, or, if you don’t have a Twitter account or you’d like to have a discussion about the issue, make a post in the General forum with the issue number and #dbugfix in the title. The core team will commit to fixing at least two of those issues for a subsequent compiler release.

Normally, I’ll collect the data for the two months between major compiler releases. For the initial batch, we’re going three months to give people time to get used to it. I anticipated it would be slow to catch on, and it seems I was right. There were a few issues tweeted and posted in the days after the announcement, but then it went quiet. So far, this is what we have:

DMD 2.080.0 is scheduled for release just as DConf 2018 kicks off. The cutoff date for consideration during this run will be the day the 2.080.0 beta is announced. That will give our bugfixers time to consider which bugs to work on. I’ll include the tally and the issues they select in the DMD release announcement, then they will work to get the fixes implemented and the PRs merged in a subsequent release (hopefully 2.081.0). When 2.080.0 is released, I’ll start collecting #dbugfix issues for the next cycle.

So if there’s an issue you want fixed that isn’t on that list above, put it out there with #dbugfix! Also, don’t be shy about retweeting #dbugfix issues or +1’ing them in the forums. This will add weight to the consideration of which ones to fix. And remember, include an issue number, otherwise it isn’t going to count!

The State of D 2018 Survey

NOTE: The survey is closed. Thanks to everyone who participated!

Strange things are afoot at the D Language Foundation. Odd noises and varicolored lights have been reported emanating from the cellar into the wee hours of the morning. Foundation members have been sighted, stumbling dazed and bleary-eyed in and out of the front door, arms full of mysterious black boxes. Neighbors whisper, and rumor has it that the spawn of so much secretive activity is only one arcane ritual away from seeing the light of day.

How right they are! For the past few weeks, the initiate Sebastian Wilzbach has devoted his energies to studying the Book of Modern Arcana in preparation for the ritual known as the State of D 2018 Survey. With feedback from those already steeped in the Dark arts, he has been refining the incantations of the ritual so that they prove most effective. Now, at long last, his preparations are complete and the ritual has been unleashed upon the world!

Upon the D community anyway.

And now it’s on you. This is your chance to turn your praise, complaints, and nitpicks into action. By participating in the State of D survey, you’ll be providing guidance to the D Language Foundation to help identify both short and long-term goals for the future development of D and its ecosystem.

A handful of initiatives are already coming together in the cellar. You’ll be able to read about them in more detail here as they are announced in the coming days, weeks, and months. The 2018 H1 document, presenting an overview of the current focus, will be announced soon. This survey will help build on existing plans, fill in the details of the general goals, and identify any course corrections that are necessary for 2018 H2 and beyond.

The Foundation is eager to address the issues that matter most to the community. Members frequently make their thoughts known in the forums, but the conversations can be long, sprawling, wandering, and hard to follow. The State of D Survey will allow the core team to see at a glance what’s going right and what’s not, and to focus their attention where it’s needed most. It could take 15 minutes or more of your time to complete, depending on the amount of thought you put into your answers. If you care about the D programming language, it’s well worth every minute.

We’ll leave the survey open for at least two weeks. A short while after it closes, we’ll publish the results here. From then on, the blog will cover what’s happening in response, providing updates on progress at reasonable intervals. Then next year we’ll do it all again.

Remember, D is a community-driven language, but not everyone is able to contribute as much as they would like. Whether you’re a frequent contributor or just getting started writing D programs, this is your chance to help make D an even better language than it is today.

Make your opinion count and take the survey!