The State of D 2018 Survey

NOTE: The survey is closed. Thanks to everyone who participated!


Strange things are afoot at the D Language Foundation. Odd noises and varicolored lights have been reported emanating from the cellar into the wee hours of the morning. Foundation members have been sighted, stumbling dazed and bleary-eyed in and out of the front door, arms full of mysterious black boxes. Neighbors whisper, and rumor has it that the spawn of so much secretive activity is only one arcane ritual away from seeing the light of day.

How right they are! For the past few weeks, the initiate Sebastian Wilzbach has devoted his energies to studying the Book of Modern Arcana in preparation for the ritual known as the State of D 2018 Survey. With feedback from those already steeped in the Dark arts, he has been refining the incantations of the ritual so that they prove most effective. Now, at long last, his preparations are complete and the ritual has been unleashed upon the world!

Upon the D community anyway.

And now it’s on you. This is your chance to turn your praise, complaints, and nitpicks into action. By participating in the State of D survey, you’ll be providing guidance to the D Language Foundation to help identify both short and long-term goals for the future development of D and its ecosystem.

A handful of initiatives are already coming together in the cellar. You’ll be able to read about them in more detail here as they are announced in the coming days, weeks, and months. The 2018 H1 document, presenting an overview of the current focus, will be announced soon. This survey will help build on existing plans, fill in the details of the general goals, and identify any course corrections that are necessary for 2018 H2 and beyond.

The Foundation is eager to address the issues that matter most to the community. Members frequently make their thoughts known in the forums, but the conversations can be long, sprawling, wandering, and hard to follow. The State of D Survey will allow the core team to see at a glance what’s going right and what’s not, and to focus their attention where it’s needed most. It could take 15 minutes or more of your time to complete, depending on the amount of thought you put into your answers. If you care about the D programming language, it’s well worth every minute.

We’ll leave the survey open for at least two weeks. A short while after it closes, we’ll publish the results here. From then on, the blog will cover what’s happening in response, providing updates on progress at reasonable intervals. Then next year we’ll do it all again.

Remember, D is a community-driven language, but not everyone is able to contribute as much as they would like. Whether you’re a frequent contributor or just getting started writing D programs, this is your chance to help make D an even better language than it is today.

Make your opinion count and take the survey!

DConf 2018 Munich: The Venue

The deadline for DConf 2018 submissions is this Sunday. If you’re on the fence about sending in a proposal, don’t still be poised there when midnight AOE strikes on the 25th! Come down before then on the submission side. If you’re selected to speak, you may be eligible for reimbursement for your hotel and travel expenses (reasonable expenses will be covered). This is our first time in Munich, and if you can pad out your visit by two or three days, there’s a lot to see while you’re there.

The venue is the NH Munich Messe hotel, located in the Zamdorf area of the city.

There’s a bus stop right outside that will get you to the Marienplatz and the New Town Hall, the world-famous center of the Bavarian capital, in short order. Not far from there, you’ll find the original Hofbräuhaus, where servers in traditional costumes pamper thousands of daily visitors from Munich and around the world, who come for the regional cuisine, music, folk dances, and historic atmosphere.

After you see the Old Town, be sure to make time for the modern world. The Deutsches Museum, which according to Wikipedia is the world’s largest science and technology museum, is a good place to start. With over 28,000 exhibits, it may be difficult to pull yourself away.

There are plenty of daytrip destinations outside of the city. One must-see spot is Neuschwanstein castle, one of the most recognizable structures in the world. World War II history buffs may be interested in a trip to Nuremburg. There are plenty of options for guided tours that can get you to these and other locations and back in a day, but it’s not difficult to get there on your own. Sites like TripAdvisor can help with the planning.

As for the hotel:

All the 253 rooms have just been refurbished, so you can expect stylish, comfortable bases. Nice touches include free Wi-Fi and pillow menus. Other highlights include a restaurant serving Bavarian dishes, a stylish lobby bar, and a compact fitness center. The Hotel also has Sky TV, allowing you to catch up on the day’s sporting events.

There’s a bar with a terrace which has the look and feel of a typical Bavarian beer garden. It’s surrounded by a little garden and is a great spot to enjoy a glass of wine or a light meal in the sunshine.

For the health-conscious, they also have a gym that’s open from 2:00 pm to 11:00 pm, and it can be opened at other times upon request. It was refurbished in 2015 and includes a sauna.

Most importantly for us, they are offering DConf attendees a discount on single rooms. Drop a line to reservierungen@nh-hotels.com to take advantage of this offer.

When the submission deadline passes this weekend, the next date to focus on is March 17th. That’s when the early-bird registration discount ends. Head over to the registration page before then!

Project Highlight: The D Community Hub

As has been stressed on this blog before, D is a community-driven language. Because of that, the ecosystem depends on the work of volunteers who are willing to contribute their time and open their projects to the community at large. From IDE and editor plugins to libraries in the DUB registry, it’s all about the efforts of people who are (usually) getting no monetary reward for their efforts.

There are some inherent downsides to that reality. Sometimes projects are abandoned. Sometimes they aren’t updated as frequently as users would like. This can become an issue for those who depend upon these projects, but it’s alleviated by the fact that most D projects are open source and their repositories are publicly available. To keep a project alive and up-to-date only requires more volunteers willing to pitch in.

That’s the motivation behind the D Community Hub (dlang-community) at GitHub. According to Sebastian Wilzbach, it started with Brian Schott’s popular tools used by several IDE and editor plugins:

There were maintenance issues with Brian’s (aka Hackerpilot) awesome projects. He has a full-time job and often could only respond to simple issues every few weeks. This meant that simple bug fix PRs sat in the queue for quite a while. For example, there was one case where the same PR to fix the Windows build script was submitted by three different people (there was no Windows CI at the time).

Brian’s projects weren’t the only ones that motivated the idea. Sebastian and Petar Kirov maintain the DLang Tour, and some of the projects they depend upon were either inactive or slow to update. However, Brian’s tools are widely used, so they started with him. Eventually, they convinced him to move some of his projects to the new organization and others followed.

Sebastian lays out the following benefits that have come from moving multiple projects from disparate developers under an umbrella group:

  • Common best policies (e.g. all repositories have GitHub branch protection)
  • No need to fork an inactive repository – work can be shared easily.
  • No dependence on a single person who might be busy or on vacation (this is especially important for swiftly pulling and releasing bug fixes )
  • One common location whenever updates are required (e.g. package bumps or deprecation fixes)
  • Many of the projects are enabled on the Project Tester (their test suite is run on every PR for the DMD, DRuntime, Phobos, Dub, and tools repositories to prevent regressions) – this is possible because many people have merge rights in case an improvement in the compiler finds critical bugs or deprecations are moved forward
  • Shared knowledge (e.g. all projects support “auto-merge” like the dlang repositories)
  • Automation with bots – Mark Rz (@skl131313) created a bot that automatically triggers update PRs whenever dependencies are updated (some of the projects in dlang-community still support builds with only git submodules and make)
  • Less overhead for automation with CIs (everyone can connect a repo to a third-party provider or restart a failing CI job)

It has also resulted in increased participation. For example, other D users have joined the group, and Sociomantic Labs (the D shop in Berlin that hosted the 2016 and 2017 editions of DConf) has taken over the release process for dfmt, Brian’s tool for formatting D source code.

There are currently 22 repositories in the dlang-community organization, including the following:

  • DCD (the D Completion Daemon) – an autocomplete program that is used by several D IDE and editor plugins
  • dfmt – a formatter for D source code, also used by many IDE and editor plugins
  • D-Scanner – a tool for analyzing D source code
  • dfix – a tool for automatically upgrading D source code
  • libparse – a library for lexing and parsing D source code
  • drepl – a DMD-based REPL for D
  • stdx-allocator – a frozen version of std.experimental.allocator (which is due for an overhaul)
  • containers – a set of containers backed by stdx.allocator to easily switch between different allocation strategies, with or without the GC
  • D-YAML – A YAML parser and emitter
  • harbored-mod – a documentation generator that supports both D’s built-in Ddoc syntax and Markdown

In addition to other D projects, there’s a repository set up specifically to discuss the dlang-community organization via GitHub issues, and repositories that contain artwork. If you decide to use any of these projects, the discussion repository is the place to ask for help when you need it.

Other projects may be added in the future. According to Sebastian, there are a few questions that form a set of loose criteria for inclusion under the dlang-community umbrella.

  • Is there enough interest from the general public so that it is “worth maintaining”?
  • Is there a similar library with active development out there?
  • Is at least one DLang community member competent for the domain covered by the project? If no, is there anyone who’s willing to fill the role?

Sebastian and the others are looking to add a few features over time. These include:

  • More automatic documentation builds
  • Automatic build of binaries on new tags (especially for Windows)
  • d-apt: Sociomantic is working on moving d-apt to GitHub and enabling full automatic CI builds for it.
  • dfmt: Leandro Lucarella / Sociomantic is introducing neptune and a proper release process

For anyone interested in joining the dlang-community organization, there are two options. If you are already a well-known participant in the D community, simply ping one of the existing members for merge rights. For anyone else, the best approach is to start contributing to one or more of the dlang-community projects to build up trust. At some point, frequent trustworthy contributors will be welcomed into the fold.

As for the current contributors, Sebasitian says:

There are many people working behind the scenes on the dlang-community libraries. A special thanks goes to the active reviewers who make it possible that your potential PR gets merged in a timely manner.

  • Basile Burg
  • Brian Schott
  • Jan Jurzitza
  • Leandro Lucarella
  • Martin Nowak
  • Petar Kirov
  • Richard Andrew Cattermole
  • skl131313
  • Stefan Koch

If you have or know of a D project that is suffering from a lack of attention, bringing it to the dlang-community might be the way to breathe new life into it. Don’t be shy in asking for help.

Vanquish Forever These Bugs That Blasted Your Kingdom

Walter Bright is the BDFL of the D Programming Language and founder of Digital Mars. He has decades of experience implementing compilers and interpreters for multiple languages, including Zortech C++, the first native C++ compiler. He also created Empire, the Wargame of the Century. This post is the second in a series about D’s BetterC mode.


Do you ever get tired of bugs that are easy to make, hard to check for, often don’t show up in testing, and blast your kingdom once they are widely deployed? They cost you time and money again and again. If you were only a better programmer, these things wouldn’t happen, right?

Maybe it’s not you at all. I’ll show how these bugs are not your fault – they’re the tools’ fault, and by improving the tools you’ll never have your kingdom blasted by them again.

And you won’t have to compromise, either.

Array Overflow

Consider this conventional program to calculate the sum of an array:

#include <stdio.h>

#define MAX 10

int sumArray(int* p) {
    int sum = 0;
    int i;
    for (i = 0; i <= MAX; ++i)
        sum += p[i];
    return sum;
}

int main() {
    static int values[MAX] = { 7,10,58,62,93,100,8,17,77,17 };
    printf("sum = %d\n", sumArray(values));
    return 0;
}

The program should print:

sum = 449

And indeed it does, on my Ubuntu Linux system, with both gcc and clang and -Wall. I’m sure you already know what the bug is:

for (i = 0; i <= MAX; ++i)
              ^^

This is the classic “fencepost problem”. It goes through the loop 11 times instead of 10. It should properly be:

for (i = 0; i < MAX; ++i)

Note that even with the bug, the program still produced the correct result! On my system, anyway. So I wouldn’t have detected it. On the customer’s system, well, then it mysteriously fails, and I have a remote heisenbug. I’m already tensing up anticipating the time and money this is going to cost me.

It’s such a rotten bug that over the years I have reprogrammed my brain to:

  1. Never, ever use “inclusive” upper bounds.
  2. Never, ever use <= in a for loop condition.

By making myself a better programmer, I have solved the problem! Or have I? Not really. Let’s look again at the code from the perspective of the poor schlub who has to review it. He wants to ensure that sumArray() is correct. He must:

  1. Look at all callers of sumArray() to see what kind of pointer is being passed.
  2. Verify that the pointer actually is pointing to an array.
  3. Verify that the size of the array is indeed MAX.

While this is trivial for the trivial program as presented here, it doesn’t really scale as the program complexity goes up. The more callers there are of sumArray, and the more indirect the data structures being passed to sumArray, the harder it is to do what amounts to data flow analysis in your head to ensure it is correct.

Even if you get it right, are you sure? What about when someone else checks in a change, is it still right? Do you want to do that analysis again? I’m sure you have better things to do. This is a tooling problem.

The fundamental issue with this particular problem is that a C array decays to a pointer when it’s an argument to a function, even if the function parameter is declared to be an array. There’s just no escaping it. There’s no detecting it, either. (At least gcc and clang don’t detect it, maybe someone has developed an analyzer that does).

And so the tool to fix it is D as a BetterC compiler. D has the notion of a dynamic array, which is simply a fat pointer, that is laid out like:

struct DynamicArray {
    T* ptr;
    size_t length;
}

It’s declared like:

int[] a;

and with that the example becomes:

import core.stdc.stdio;

extern (C):   // use C ABI for declarations

enum MAX = 10;

int sumArray(int[] a) {
    int sum = 0;
    for (int i = 0; i <= MAX; ++i)
        sum += a[i];
    return sum;
}

int main() {
    __gshared int[MAX] values = [ 7,10,58,62,93,100,8,17,77,17 ];
    printf("sum = %d\n", sumArray(values));
    return 0;
}

Compiling:

dmd -betterC sum.d

Running:

./sum
Assertion failure: 'array overflow' on line 11 in file 'sum.d'

That’s more like it. Replacing the <= with < we get:

./sum
sum = 449

What’s happening is the dynamic array a is carrying its dimension along with it and the compiler inserts an array bounds overflow check.

But wait, there’s more.

There’s that pesky MAX thing. Since the a is carrying its dimension, that can be used instead:

for (int i = 0; i < a.length; ++i)

This is such a common idiom, D has special syntax for it:

foreach (value; a)
    sum += value;

The whole function sumArray() now looks like:

int sumArray(int[] a) {
    int sum = 0;
    foreach (value; a)
        sum += value;
    return sum;
}

and now sumArray() can be reviewed in isolation from the rest of the program. You can get more done in less time with more reliability, and so can justify getting a pay raise. Or at least you won’t have to come in on weekends on an emergency call to fix the bug.

“Objection!” you say. “Passing a to sumArray() requires two pushes to the stack, and passing p is only one. You said no compromise, but I’m losing speed here.”

Indeed you are, in cases where MAX is a manifest constant, and not itself passed to the function, as in:

int sumArray(int *p, size_t length);

But let’s get back to “no compromise.” D allows parameters to be passed by reference,
and that includes arrays of fixed length. So:

int sumArray(ref int[MAX] a) {
    int sum = 0;
    foreach (value; a)
        sum += value;
    return sum;
}

What happens here is that a, being a ref parameter, is at runtime a mere pointer. It is typed, though, to be a pointer to an array of MAX elements, and so the accesses can be array bounds checked. You don’t need to go checking the callers, as the compiler’s type system will verify that, indeed, correctly sized arrays are being passed.

“Objection!” you say. “D supports pointers. Can’t I just write it the original way? What’s to stop that from happening? I thought you said this was a mechanical guarantee!”

Yes, you can write the code as:

import core.stdc.stdio;

extern (C):   // use C ABI for declarations

enum MAX = 10;

int sumArray(int* p) {
    int sum = 0;
    for (int i = 0; i <= MAX; ++i)
        sum += p[i];
    return sum;
}

int main() {
    __gshared int[MAX] values = [ 7,10,58,62,93,100,8,17,77,17 ];
    printf("sum = %d\n", sumArray(&values[0]));
    return 0;
}

It will compile without complaint, and the awful bug will still be there. Though this time I get:

sum = 39479

which looks suspicious, but it could have just as easily printed 449 and I’d be none the wiser.

How can this be guaranteed not to happen? By adding the attribute @safe to the code:

import core.stdc.stdio;

extern (C):   // use C ABI for declarations

enum MAX = 10;

@safe int sumArray(int* p) {
    int sum = 0;
    for (int i = 0; i <= MAX; ++i)
        sum += p[i];
    return sum;
}

int main() {
    __gshared int[MAX] values = [ 7,10,58,62,93,100,8,17,77,17 ];
    printf("sum = %d\n", sumArray(&values[0]));
    return 0;
}

Compiling it gives:

sum.d(10): Error: safe function 'sum.sumArray' cannot index pointer 'p'

Granted, a code review will need to include a grep to ensure @safe is being used, but that’s about it.

In summary, this bug is vanquished by preventing an array from decaying to a pointer when passed as an argument, and is vanquished forever by disallowing indirections after arithmetic is performed on a pointer. I’m sure a rare few of you have never been blasted by buffer overflow errors. Stay tuned for the next installment in this series. Maybe your moat got breached by the next bug! (Or maybe your tool doesn’t even have a moat.)

The #dbugfix Campaign

Why so many bugs?

Every major release of DMD comes with a list of closed issues from Bugzilla. For example, looking at the changelog for DMD 2.078.0 shows the following counts for closed regressions, bugs, and enhancements: 51 for the compiler, 37 for the standard library, 6 for the runtime, 17 for the website, and 1 for the linker. That’s 112 total issues, the majority related to the compiler. The total number of closed issues fluctuates between releases, but the compiler and standard library normally get the lion’s share.

This isn’t news to anyone who regularly follows DMD releases. But spend enough time on the forums and you’ll eventually see someone under the impression that bugs aren’t getting fixed. They cite the number of open issues in the database, or the age of some of the open issues, or the fact that they can’t find any formal process for fixing bugs. In reaction, it’s easy to point to the changelogs, or cite the number of closed issues in the database, or bring up the number of open issues in other language compilers. And, of course, to explain once again that this is a volunteer community where people work on the things that matter to them, and organizing groups to complete specific tasks is like herding cats.

That’s all quite reasonable, but really isn’t going to matter to someone who found the motivation to check D out, but is still looking for the motivation to stay. For me personally, I really don’t care how many issues are in the database, or the age of the oldest. All I care about is that it works for me. But I’m already invested in D. I don’t need to be motivated to stick around. And while I wouldn’t use a bug database as criteria to judge a new language, I can see that others do. It’s akin to looking at a stable repository on GitHub and dismissing it as abandoned because of its lack of recent activity. If you don’t see the whole picture, you can’t make an informed judgement.

If perception were the only issue, then it would simply be a matter of web design and PR. However, there have been, and are, people invested in D who have become frustrated because issues they reported, or that directly affect them, have languished in Bugzilla for months or even years. This can’t simply be dismissed as not seeing the whole picture. This is a matter of manpower and process. A number of issues are still open because there isn’t a simple fix, or perhaps because no one has taken an interest. The set of people who can solve complex issues is small. The set of people willing to work on issues that aren’t interesting to them is smaller. But again, how do you get a disparate group of volunteers of varying skill levels to devote their free time to fixing other peoples’ problems?

This is something the D community has struggled with as it has grown. There are no easy, comprehensive solutions without a full-time team of dedicated personnel, something we simply don’t have. However, it’s possible that there are opportunities to take baby steps and chip away at some of these issues without the complications inherent in herding cats.

The #dbugfix campaign

To recap, there are two primary complaints about the D bug-fixing process (such as it is):

  • Too many (old) bugs in the database
  • Bugs you care about aren’t getting fixed

In an effort to alleviate these problems, one baby step to chip away at it, I’m announcing the #dbugfix campaign.

It works like this. If there is an issue in D’s Bugzilla that you want to see fixed, whatever the reason (maybe it’s blocking you, or it’s annoying you, or it’s an enhancement you want, or you think it’s too old – it doesn’t matter), then either tweet out the issue number with #dbugfix in the tweet, or create a topic in the general forum with the issue number and #dbugfix in the subject line. I’ll monitor both Twitter and the forums and keep a running tally of issue numbers.

A week before a major version of DMD is released (starting with 2.080.0, which is slated for May 1), I’ll look at the tally and find the top five issues. I’ve already gotten people to commit to fixing at least two of the top five. That doesn’t mean only two. It could well be more. It depends on the complexity of the issues and how many other volunteers we can scrounge up. Hopefully, the two (or more) fixed bugs will be ready to be merged in the subsequent major release.

In the blog post announcing each major release, I’ll report on which bugs in the current release were fixed as a result of the campaign and announce the two selected for the subsequent release. If any of the top five from the previous release were not fixed, I’ll call for volunteers to help so that they can be squashed as soon as possible.

Yes, I know. We enabled voting on Bugzilla issues and that didn’t change anything. That’s because there was no real commitment to fixing any of the highest-voted issues. The votes simply served a guideline for the people browsing the database, looking for issues to fix. For this campaign, there really are people committed to fixing at least two of the issues that float to the top for every major release.

But two is not a lot! No, it isn’t. But it also isn’t zero. As I mentioned at the top of this post, dozens of issues are already fixed with each major DMD release. The problem (for those who see it as such) is that there’s currently next to zero community involvement in deciding which issues get fixed. This campaign gives the community more input into the selection process and it provides public updates on the status of that process. It is my hope that, in addition to changing perception and chipping away at the bug count, it encourages more people to help fix bugs.

If you would like to volunteer your time and knowledge to helping out with this campaign and increase the number of #dbugfix bugs fixed in each release, please email me at aldacron@gmail.com. For everyone else, I’ve got a search for #dbugfix set up in my Twitter client, so start tweeting!

DConf 2018: Register Now!

It was the middle of November when DConf 2018 was announced here on this blog in a Q & A session with Andrei Alexandrescu. Since then, the DConf train has slowly been building up steam as things have been happening behind the scenes. Now it’s full steam ahead!

 

The venue

DConf 2018 is being hosted at the NH München Messe hotel. They’re offering a discount (single room, breakfast included) to all conference attendees. If you’d like to cut out the commute time between your hotel and DConf, everything you need to take advantage of the discount and get to the hotel is over at the DConf 2018 venue page.

The registration fees

The cost of general admission to DConf 2018 is US $400. A 15% early-bird discount is available from now until March 17. This year, there’s a special deal for past attendees. If you signed up for DConf 2017, the 15% discount doesn’t go away in March. For you, it applies right up to the regular registration deadline. Whenever you’re ready to sign up, head on over to the DConf 2018 registration page where you can pay via PayPal or Eventbrite.

The invited keynote speaker

The D Language Foundation is excited to announce that Martin Odersky, the inventor of the Scala language, a professor at EPFL in Lausanne, Switzerland, and a founder of Lightbend, is this year’s invited keynote speaker. He’ll be presenting a talk titled, “How to Abstract Over Context”, in which he’ll “argue that implicit parameters as they are found in Scala are a canonical way to express context and that implicit function types are the right way to abstract over it.” We’re looking forward to it!

The partners

The time around, the conference is being hosted by QA Systems, a provider of tools for automating unit testing, code coverage, integration testing, and static analysis, in conjunction with HLMC, a company specialized in the organization of IT events. They’re working hard to ensure DConf 2018 is a success. We know it will be.

The call for submissions

Don’t forget, we’re taking submissions for D-language related papers, talks, demos, panels, and research reports until February 25. We’re eager to hear about what’s happening out there in the world of the D programming language. Put your proposal together and send it to foundation@dlang.org for consideration. If you’ve never submitted to DConf before, please give the guidelines a look over before you do so.

The uninitiated

We’re eager to see new faces this year, especially those who know little or nothing about the D programming language. If you or someone you know hasn’t yet figured out what all the fuss is about, we want a chance to show you. The D language community are a friendly bunch, happy to partake in engaging and intelligent conversation well into the night. And we love to meet new people! Every DConf is a chance to reinforce old bonds and forge new ones. You may arrive as a stranger, but you’ll leave as a friend.

The months ahead

As May 2 draws closer, keep an eye on this blog for more DConf 2018 updates, including posts from our partners, scheduled speakers, and the D Language Foundation.

Project Highlight: BSDScheme

Last year, Phil Eaton started working on BSDScheme, a Scheme interpreter that he ultimately intends to support Scheme R7RS. In college, he had completed two compiler projects in C++ for two different courses. One was a Scheme to Forth compiler, the other an implementation of the Tiger language from Andrew Appel’s ‘Modern Compiler Implementation’ books.

I hadn’t really written a complete interpreter or compiler since then, but I’d been trying to get back into language implementation. I planned to write a Scheme interpreter that was at least bootstrapped from a compiled language so I could go through the traditional steps of lexing, parsing, optimizing, compiling, etc., and not just build a language of the meta-circular interpreter variety. I was spurred to action when my co-worker at Linode, Brian Steffens, wrote bshift, a compiler for a C-like language.

For his new project, he wanted to use something other than C++. Though he knows the language and likes some of the features, he overall finds it a “complicated mess”. So he started on BSDScheme using C, building generic ADTs via macros.

As he worked on the project, he referred to Brian’s bshift for inspiration. As it happens, bshift is implemented in D. Over time, he discovered that he “really liked the power and simplicity of D”. That eventually led him to drop C for D.

It was clear it would save me a ton of time implementing all the same data structures and flows one implements in almost every new C project. The combination of compile-time type checking, GC, generic ADT support, and great C interoperability was appealing.

He’s been developing the project on Mac and FreeBSD, using LDC, the LLVM-based D compiler. In that time, he has found a number of D features beneficial, but two stand out for him above the rest. The first is nested functions.

They’re a good step up from C, where nested functions are not part of the standard and only unofficially supported (in different ways) by different compilers. C++ has lambdas, but that’s not really the same thing. It is a helpful syntactic sugar used in BSDScheme for defining new functions with external context (the names of the parameters to bind).

As for the second, hold on to your seats: it’s the GC.

The existence of a standard GC is a big benefit of D over C and C++. Sure, you could use the Boehm GC, but how that works with threads is up to you to discover. It is not fun to do prototyping in a GC-less language because the amount of boilerplate distracts from the goals. People often say this when they’re referring to Python, Ruby, or Node, but D is not at all comparable for (among) a few reasons: 1) compile-time type-checking, 2) dead-simple C interop, 3) multi-processing support.

Spend some time in the D forums and you’ll often find newcomers from C and C++ who, unlike Phil, have a strong aversion to garbage collection and are actively seeking to avoid it. You’ll also find replies from long-time D coders who started out the same way, but eventually came to embrace the GC completely and learned how to make it work for them. The D GC can certainly be problematic for certain types of software, but it is a net win for others. This point is reiterated frequently in the GC series on this blog, which shows how to get the most out of the GC, profile it, and mitigate its impact if it does become a performance problem.

As Phil learned the language, he identified areas for improvement in the D documentation.

Certainly it is advantageous compared to the C preprocessor that there is not an entirely separate language for doing compile-time macros, but the behavior difference and transition guides are missing or poorly written. A comparison between D templates and C++ templates (in all their complexity) could also be a great source of explanation.

We’re always looking to improve the documentation and make it more friendly to newcomers of all backgrounds. The docs are often written by people who are already well-versed in the language and its ecosystem, making blind spots somewhat inevitable. Anyone in the process of learning D is welcome and encouraged to help improve both the Language and Library docs. In the top right corner of each page are two links: “Report a bug” and “Improve this page”. The first takes you to D’s bug tracker to report a documentation bug, the second allows anyone with a logged-in GitHub account to quickly fork dlang.org, edit the page online, and submit a pull request.

In addition to the ulitmate goal of supporting Scheme R7RS, Phil plans to add FFI support in order to allow BSDScheme to call D functions directly, as well as support for D threads and an LLVM-based backend.

Overall, he seems satisfied with his decision to move to D for the implementation.

I think D has been a good time investment. It is a very practical language with a lot of the necessary high-level aspects and libraries for modern development. In the future, I plan to dig more into the libraries and ecosystem for backend web systems. Furthermore, unlike with C or C++, so far I’d feel comfortable choosing D for projects where I am not the sole developer. This includes issues ranging from prospective ease of onboarding to long-term performance and maintainability.

A big thanks to Phil for taking the time to contribute to this post (be sure to check out his blog, too). We’re always happy to hear about new projects in D. BSDScheme is released under the three-clause BSD license, so it’s a great place to start for anyone looking for an interesting open-source D project to contribute to or learn from. Have fun!

The D Blog in 2017

The first full year of the D blog is now in the rear view. Last year around this time, I posted some statistics from the seven months of 2016 that the blog was in business. This time, looking back on 2017, we’ve got a full twelve to draw on.

The fun stuff

From my perspective, 2017 was a fun year for managing the blog. The only negatives for me are that I didn’t get as many Project Highlights as I would have liked and I never started the series on DUB that I had envisioned. But there were some new features that I quite enjoyed working on:

  • the GC series came about in response some posts in the D forums. The next post in the series should have come at the end of December, but I’ve had to put it off for just a bit. I’m not done with the series yet. I’m also still looking for more contributors willing to share their strategies and libraries for working with, around, and without the GC.
  • I started a new series on interfacing D with C. I’ll be continuing on with that in the coming months, eventually writing about the other direction (interfacing C with D), and pushing out a few words about -betterC mode.
  • at DConf 2017, the Funkwerk crew agreed to cooperate with me in putting out a series of posts from their perspective of using D in production. This is the first of what I hope will become a regular series highlighting companies working with D in production, the projects they’re building, and the tools they use.
  • as a reaction to the pain that comes when I cut content out of posts that feel too long, I decided to create a new domain for “extended posts”: dblog-ext.info, and a corresponding page of links here at the blog. The separate domain and the simple layout are both to make it clear that it’s not part of the official blog. I don’t know if it will be permanent, but I intend to keep it alive for a while yet until to see if more people actually make use of it. I now encourage anyone writing guest posts not to feel constrained by word count. If the post is too long, we can split it into multiple posts or, where there’s not enough content for that, make an extended post for it when it makes sense to do so.

The stats

We saw 44 new posts added to the blog in 2017. Across the entire blog, including the front page, there were a total of 132,985 page views from 96,101 visitors who left 117 comments.

The top five referrers:

Referrer Page Views
Reddit 21,920
Hacker News 20,693
Google Search 10,761
D Forums 7,008
Twitter 5,265

The top five countries:

Country Page Views
United States 43,495
Germany 9,606
United Kingdom 8,226
Russia 5,022
Canada 4,890

Several posts included links to D projects at GitHub. Counting projects, profiles, organizations, and specific file links, the top five most-clicked were:

  1. voxelman
  2. dlangui
  3. DerelictOrg
  4. DIP 1005
  5. yomm11(Open multi-methods for C++)

The top five posts of 2017:

Post Title Page Views
D as a Better C 17,502
Faster Command Line Tools in D 10,143
Don’t Fear the Reaper 8,277
D’s Newfangled Name Mangling 6,011
Compile–Time Sort in D 4,874

All time (as of a few minutes before the timestamp on this post), there are 70 posts (aside from this one) that have had 187,946 views from 136,850 visitors. The top five most-viewed posts of all time :

Post Title Page Views
D as a Better C 17,553
Faster Command Line Tools in D 10,151
Don’t Fear the Reaper 8,282
Find Was Too Damn Slow, So We Fixed It 6,228
D’s Newfangled Name Mangling 6,102

In 2018…

This year, look for the GC series and the D & C series to continue. I’m hoping to recruit a couple of semi-regular guest posters to help me up the post count a little bit. At the moment, I’m pretty much at peak output and could use the help. I’m on constant lookout for projects to highlight, and plan to bring at least one more company highlight this year (Funkwerk’s series will be wrapping up this month). I hope to bring some DConf 2018-themed posts in the runup to this year’s conference.

Finally, as always, if you have something D to write about, whether it’s your project, a language feature, a tutorial, an algorithm… anything about programming in D, please let me know!

DMD 2.078.0 Has Been Released

Another major release of DMD, this time 2.078.0, has been packaged and delivered in time for the new year. See the full changelog at dlang.org and download the compiler for your platform either from the main download page or the 2.078.0 release directory.

This release brings a number of quality-of-life improvements, fixing some minor annoyances and inconsistencies, three of which are targeted at smoothing out the experience of programming in D without DRuntime.

C runtime construction and destruction

D has included static constructors and destructors, both as aggregate type members and at module level, for most of its existence. The former are called in lexical order as DRuntime goes through its initialization routine, and the latter are called in reverse lexical order as the runtime shuts down. But when programming in an environment without DRuntime, such as when using the -betterC compiler switch, or using a stubbed-out runtime, static construction and destruction are not available.

DMD 2.078.0 brings static module construction and destruction to those environments in the form of two new pragmas, pragma(crt_constructor) and pragma(crt_destructor) respectively. The former causes any function to which it’s applied to be executed before the C main, and the latter after the C main, as in this example:

crun1.d

// Compile with:    dmd crun1.d
// Alternatively:   dmd -betterC crun1.d

import core.stdc.stdio;

// Each of the following functions should have
// C linkage (cdecl).
extern(C):

pragma(crt_constructor)
void init()
{
    puts("init");
}

pragma(crt_destructor)
void fini()
{
    puts("fini");
}

void main()
{
    puts("C main");
}

The compiler requires that any function annotated with the new pragmas be declared with the extern(C) linkage attribute. In this example, though it isn’t required, main is also declared as extern(C). The colon syntax on line 8 applies the attribute to every function that follows, up to the end of the module or until a new linkage attribute appears.

In a normal D program, the C main is the entry point for DRuntime and is generated by the compiler. When the C runtime calls the C main, the D runtime does its initialization, which includes starting up the GC, executing static constructors, gathering command-line arguments into a string array, and calling the application’s main function, a.k.a. D main.

When a D module’s main is annotated with extern(C), it essentially replaces DRuntime’s implementation, as the compiler will never generate a C main function for the runtime in that case. If -betterC is not supplied on the command line, or an alternative implementation is not provided, DRuntime itself is still available and can be manually initialized and terminated.

The example above is intended to clearly show that the crt_constructor pragma will cause init to execute before the C main and the crt_destructor causes fini to run after. This introduces new options for scenarios where DRuntime is unavailable. However, take away the extern(C) from main and the same execution order will print to the command line:

crun2.d

// Compile with:    dmd crun2.d

import core.stdc.stdio;

pragma(crt_constructor)
extern(C) void init()
{
    puts("init");
}

pragma(crt_destructor)
extern(C) void fini()
{
    puts("fini");
}

void main()
{
    import std.stdio : writeln;
    writeln("D main");
}

The difference is that the C main now belongs to DRuntime and our main is the D main. The execution order is: init, C main, D main, fini. This means init is effectively called before DRuntime is initialized and fini after it terminates. Because this example uses the DRuntime function writeln, it can’t be compiled with -betterC.

You may discover that writeln works if you import it at the top of the module and substitute it for puts in the example. However, always remember that even though DRuntime may be available, it’s not in a valid state when a crt_constructor and a crt_destructor are executed.

RAII for -betterC

One of the limitations in -betterC mode has been the absence of RAII. In normal D code, struct destructors are executed when an instance goes out of scope. This has always depended on DRuntime, and since the runtime isn’t available in -betterC mode, neither are struct destructors. With DMD 2.078.0, the are in the preceding sentence becomes were.

destruct.d

// Compile with:    dmd -betterC destruct.d

import core.stdc.stdio : puts;

struct DestroyMe
{
    ~this()
    {
        puts("Destruction complete.");
    }
}

extern(C) void main()
{
    DestroyMe d;
}

Interestingly, this is implemented in terms of try..finally, so a side-effect is that -betterC mode now supports try and finally blocks:

cleanup1.d

// Compile with:    dmd -betterC cleanup1.d

import core.stdc.stdlib,
       core.stdc.stdio;

extern(C) void main()
{
    int* ints;
    try
    {
        // acquire resources here
        ints = cast(int*)malloc(int.sizeof * 10);
        puts("Allocated!");
    }
    finally
    {
        // release resources here
        free(ints);
        puts("Freed!");
    }
}

Since D’s scope(exit) feature is also implemented in terms of try..finally, this is now possible in -betterC mode also:

cleanup2.d

// Compile with: dmd -betterC cleanup2.d

import core.stdc.stdlib,
       core.stdc.stdio;

extern(C) void main()
{
    auto ints1 = cast(int*)malloc(int.sizeof * 10);
    scope(exit)
    {
        puts("Freeing ints1!");
        free(ints1);
    }

    auto ints2 = cast(int*)malloc(int.sizeof * 10);
    scope(exit)
    {
        puts("Freeing ints2!");
        free(ints2);
    }
}

Note that exceptions are not implemented for -betterC mode, so there’s no catch, scope(success), or scope(failure).

Optional ModuleInfo

One of the seemingly obscure features dependent upon DRuntime is the ModuleInfo type. It’s a type that works quietly behind the scenes as one of the enabling mechanisms of reflection and most D programmers will likely never hear of it. That is, unless they start trying to stub out their own minimal runtime. That’s when linker errors start cropping up complaining about the missing ModuleInfo type, since the compiler will have generated an instance of it for each module in the program.

DMD 2.078.0 changes things up. The compiler is aware of the presence of the runtime implementation at compile time, so it can see whether or not the current implementation provides a ModuleInfo declaration. If it does, instances will be generated as appropriate. If it doesn’t, then the instances won’t be generated. This makes it just that much easier to stub out your own runtime, which is something you’d want to do if you were, say, writing a kernel in D.

Other notable changes

New users of DMD on Windows will now have an easier time getting a 64-bit environment set up. It’s still necessary to install the Microsoft build tools, but now DMD will detect the installation of either the Microsoft Build Tools package or Visual Studio at runtime when either -m64 or -m32mscoff is specified on the command line. Previously, configuration was handled automatically only by the installer; manual installs had to be configured manually.

DRuntime has been enhanced to allow more fine-grained control over unit tests. Of particular note is the --DRT-testmode flag which can be passed to any D executable. With the argument "run-main", the current default, any unit tests present will be run and then main will execute if they all pass; with "test-or-main", the planned default beginning with DMD 2.080.0, any unit tests present will run and the program will exit with a summary of the results, otherwise main will execute; with "test-only", main will not be executed, but test results will still be summarized if present.

Onward into 2018

This is the first DMD release of 2018. We can’t wait to see what the next 12 months bring for the D programming language community. From everyone at the D Language Foundation, we hope you have a very Happy New Year!

D’s Newfangled Name Mangling

Rainer Schuetze is the creator and maintainer of Visual D, the D plugin for Visual Studio. Recently, he implemented a new name mangling algorithm for the D frontend, which was released in DMD 2.077.0. In this post, he explains why it was needed and what it does.


What is symbol name mangling?

D embraces the separate compilation model that compiles D source code to object files and uses a linker to bind the object files to an executable binary file. This allows the reuse of precompiled object files and libraries, speeding up the build process. As the linker is usually one that’s also used for other languages with the same compilation model, e.g. C/C++ or Fortran, mixing object files from different languages is straightforward.

In an object file, a symbol name is assigned to each function or global variable, both when it is defined and when it is used via a call or access. The linker uses these names to connect references to definitions of the same name with only very bare knowledge about the symbol. For example, the symbol for this C function declaration,

extern(C) const(char)* find(int ch, const(char)* str);

does not tell the linker anything about function arguments or return type, as the C language uses the plain function name find as the symbol name (some platforms prepend a _ to the symbol). If you later change the order of the arguments to

extern(C) const(char)* find(const(char)* str, int ch);

but fail to update and recompile all source files that use the new declarartion, the linker will happily bind the resulting object files. In that case, the program is likely to crash, since a character passed to the function will be interpreted as a string pointer and vice versa.

D and C++ avoid this problem by adding more information to the symbol name, i.e. they encode into a symbol name the scope in which the symbol is defined, the function argument types, and the return type. Even if the linker does not interpret this information, linking fails with an undefined symbol error if the definitions used to build the object files don’t match. For example, the D function declaration

module test;
extern(D) const(char)* find(int ch, const(char)* str);

has a symbol name _D4test4findFiPxaZPxa, where _D is a prefix to identify the symbol as being generated from a D source symbol, 4test4find encodes the “fully qualified name” find in module test, and FiPxaZPxa describes the function type with an integer argument (designated by i) and the C-style string pointer type Pxa by just concatenating the encodings for argument types. Z terminates the function argument list and is followed by the encoding for the return type, again Pxa for a C-style string pointer. In contrast,

extern(D) const(char)* find(const(char)* str, int ch);

is encoded as _D4test4findFPxaiZPxa, making it a different symbol with the argument types reversed. The encoding ensures a normalized representation of types and scopes while also providing shorter symbols than minimal source code. This encoding is called “name mangling”.

Ed: Note that extern(C) and extern(D) are linkage attributes. When a function is declared in D without an explicit linkage attribute, extern(D) is the default.

In D, some function attributes are also mangled into the symbol name, e.g. @safe, nothrow, pure and @nogc. In theory, mangling could also cover parameter names, user defined attributes, or even contracts, but that is currently considered excessive.

Please note that even though name mangling can detect some mismatches in the binary interface of functions (i.e. how arguments are passed in registers or on the stack), it won’t catch every error; for example, structs, classes and other user defined types are mangled by name only, so that a change to their definition will still pass unnoticed by the linker.

The mangled name of a symbol is also available during compilation using the .mangleof property. This used to be exploited to provide type reflection of the symbol at compile time. This should no longer be necessary due to the introduction of new __traits that make this information accessible faster and more convenient, for example,

__traits(getLinkage,symbol);

or

__traits(getFunctionAttributes, symbol);

Thus, usage of .mangleof is not recommended except for debugging purposes.

When reversing the mangling process in the “demangler”, all the encoded information is kept to make it available to the user, but that does not always yield correct D syntax. The first definition above demangles as

const(char)* test.find(int, const(char)*)

i.e. the module name test is added to the function name.

Template symbols

The two definitions of find shown above can coexist in D and C++, so name mangling is not only a way to detect errors at link time but also a necessity to represent overloads. It should at least contain enough information to distinguish different overloads of the same scoped identifier.

This becomes even more obvious when considering templates that usually instantiate different functions or variable definitions for each argument type. In D, the template instantiation information is added to the qualified name of a symbol.

Consider expression templates, a common example of meta programming used for delayed evaluation of expressions:

module expr;
struct Mul(X,Y)
{
    X x;
    Y y;
}
struct Add(X,Y)
{
    X x;
    Y y;
}

auto mul(X,Y)(X x, Y y) { return Mul!(X,Y)(x, y); }
auto add(X,Y)(X x, Y y) { return Add!(X,Y)(x, y); }

A function template is lowered by the compiler to an eponymous template:

template mul(X, Y)
{
    auto mul(X x, Y y) { return Mul!(X,Y)(x, y); }
}

The template name is part of the qualified function name, expr.mul!(X,Y).mul, and the auto return type is inferred to be Mul!(X,Y). This causes the symbol to reference the types X and Y three times. The demangled mangled name of an instantiation with types double and float of this template is

expr.Mul!(double,float) expr.mul!(double,float).mul(double,float)

The mangling process of DMD before version 2.077 walks the abstract syntax tree of the declaration and emits the mangled representation of the types whenever it is hit. Now consider stacking operations, e.g.

auto square(X)(X x) { return mul(x, x); }

auto len = square("var");
pragma(msg, len.square.mangleof);
// S4expr66__T3MulTS4expr16__T3MulTAyaTAyaZ3MulTS4expr16__T3MulTAyaTAyaZ3MulZ3Mul

pragma(msg, typeof(len).mangleof.length);
pragma(msg, len.square.mangleof.length);
pragma(msg, len.square.square.mangleof.length);
pragma(msg, len.square.square.square.mangleof.length);
pragma(msg, len.square.square.square.square.mangleof.length);
pragma(msg, len.square.square.square.square.square.mangleof.length);
pragma(msg, len.square.square.square.square.square.square.mangleof.length);

With DMD 2.076 or earlier, this displays 28u, 78u, 179u, 381u, 785u, 1594u, 3212u, showing exponential growth of the mangled symbol name length even though the expression in the source code just grows linearly. This happens because types like Mul!(Mul!(string, string), Mul!(string, string)) are combined and the mangling repeats their full representation every time they are referenced.

Create a chain of 12 calls to square above and the symbol length increases to 207,114. Even worse, the resulting object file for COFF/64-bit is larger than 15 MB and the time to compile increases from 0.1 seconds to about 1 minute. Most of that time is spent generating code for functions only used at compile time.

Voldemort types returned from template functions can be similar, as they carry the function signature including the template arguments as part of the type name. These can also show a dramatic increase in build times without generating as much code as in the example.

Symbol compression to the rescue

In early 2016, a couple of attempts were made to shorten these long symbols:

  • cut off symbol names if they exceed a given threshold, but append a checksum of the full symbol instead. This was already done with an MD5 hash when emitting symbols for the DigitalMars C compiler tool chain as the OMF object file format does not allow symbols longer than 255 characters. The downside to this is that these symbols can no longer be demangled, so that symbols in linker messages cannot be translated back into human digestible names.
  • apply binary compression to the symbol name. Standard techniques use part of the full symbol name as a dictionary to encode repetitions within the name. This is usually done by encoding a position-length pair using characters outside the normal identifier set. Again, this is already in use when DMD tries to fit symbols into the OMF limit of 255 characters (before applying the MD5 hash trick), but it also has shown some disadvantages: when using characters above the ASCII range, this interferes with UTF8 encoded characters that are also allowed as symbol characters in the D language. It can also break linker output as the console might misinterpret it as a locale-specific character encoding. Avoiding this by applying a binary to ASCII conversion like base64 to the symbols would obfuscate the actual symbol name even more.
  • extend the mangling grammar by allowing references to entities already encoded. This is similar to binary compression, but does not need to encode match length as the entities have terminators embedded into the grammar. The most prominent entities are types. This is the road C++ has taken, as it is also affected by the issues described here. In C++, name mangling is not standardised by the language, but by the compiler or the platform. GNU g++ uses Itanium C++ ABI mangling, which does a pretty good job with C++ code similar to the example above or in the Voldemort issue. Even though Microsoft’s Visual C++ can encode recurring types as well, it still generates very long names because it limits the encoding to the first 10 types of an argument list.

The first attempts at applying the latter scheme to the mangling of D symbols showed disappointing results. As it turned out, these implementations missed a subtle detail of the mangler in the DMD front end at that time; it reused cached representations of mangled type names to combine them to more complex types. This fails to find repetitions of the types from which a cached type name was built.

This is where I stepped in to create a proof-of-concept version of the mangling without these omissions. Early results were promising, so I looked for more opportunities to reduce symbol length:

  • with fully qualified names always containing the package and module names of a symbol, identifiers tend to appear often in a mangled name.
  • qualified names are likely to come from the same module or package, so it would be nice to encode them as a single entity.

The unit tests of the Phobos runtime library are benchmark candidates, as they contain a lot of symbols for template-heavy code. At the given time there were 127,172 symbols found in the map file of the Windows build. These were the results of the different manglings:

back references max length average length
none 416133 369
types 2095 157
types+identifiers 1263 128
types+identifiers+qualified names 1114 117

(This has been measured with the implementation at the time, which is not exactly the same as the final mangling; it used special characters for the different back reference types, but this turned out not to be a good idea. The D mangling is supposed to be the same on all platforms, but these characters will have a special meaning to the linker on one of them.)

It’s rather simple in DMD to determine the identity of identifiers and types, as the latter are merged according to their mangling anyway. Qualified names and their associated symbols turned out to introduce a number of complications, though. Namely, mangleFunc in core.demangle allows building a mangled name of a function from a fully qualified name given as a string function argument and a type specified as a template argument. Implementing this for run-time usage requires copying the full mangling machinery and introspection capabilities of the compiler, which is unrealistic. Considering the limited benefit shown by the above Phobos statistics, the idea of encoding qualified names was dropped.

Here are some details about the new mangling:

  • Back references are now encoded by the character Q, followed by the relative position of the original appearance of the same identifier or type. These positions are encoded with respect to base 26, with the last digit encoded by a lowercase letter and the other digits encoded by an uppercase letter. That way, most back references are 2 or 3 characters long, 4 in extreme cases. Using a different encoding for the last digit allows determining the end of a number without looking at the next character. This helps to avoid ambiguities. (The Itanium C++ ABI mangling uses base 36 encoding by combining numbers and letters, but need a termination character _.)
  • Counting encodable entities as in the C++ mangling would result in slightly shorter mangled names, but needs the mangler to keep a dynamic list of respective positions. The current demangler is designed not to allocate as long as the supplied output buffer is large enough.
  • Relative positions are chosen instead of absolute positions to allow prepending the _D prefix without having to re-encode the symbol. Some platforms also prepend an additional underscore, for which the relative positions are agnostic.
  • The mangling grammar sometimes allows types and identifiers at the same position, so a demangler needs to distinguish between the two even if given by a back reference. That’s why a lookup to the referenced position is necessary to continue demangling; an identifier always starts with a number, while a type always starts with a letter.
  • Using Q for back references grabs the last free letter used to encode types, but there is at least one type defined in the mangling grammar that is not supposed to appear in a mangling anyway (namely TypeIdent), so it can be resurrected if the necessity appears.

For example, the expression template type shown above now mangles as

pragma(msg, len.square.mangleof);
// S4expr__T3MulTSQo__TQlTAyaTQeZQvTQtZQBb
//                ^^   ^^     ^^ ^^ ^^ ^^^ decode to:
//                |    |      |  |  |  |
//                |    |      |  |  |  +- 3Mul
//                |    |      |  |  +---- S4expr__T3MulTAyaTAyaZ3Mul
//                |    |      |  +------- 3Mul
//                |    |      +---------- Aya
//                |    +----------------- 3Mul
//                +---------------------- 4expr

with a length of 39 instead of 78 without back references. The resulting sizes are 23, 39, 57, 76, 95, 114, 133 showing linear growth. The chain of 12 calls to square shrinks from 207,114 characters to 247, i.e. by a factor of more than 800.

Implementing mangleFunc mentioned above for the mangling with back referencing identifiers still is not obvious; while the fully qualified name is not supposed to contain any types (e.g. as a struct template argument) identifiers in the mangled name can appear again in the function type. This was solved by extending the demangler to use “Design by Introspection” (DbI) (as coined by Andrei Alexandrescu):

  • make the Demangle struct a template that parameterizes on a struct that supplies a couple of hooks
    struct NoHooks {}  // supports: static bool parseLName(ref Demangle); ...
    private struct Demangle(Hooks = NoHooks)
    {
    Hooks hooks;
        // ...
        void parseLName()
        {
            static if(__traits(hasMember, Hooks, "parseLName"))
                if (hooks.parseLName(this))
                    return;
                // normal decode...
        }
    }
  • create a hook that replaces a reoccurring identifier with the appropriate back reference
    struct RemangleHooks
    {
        char[] result;
        size_t[const(char)[]] idpos;
        // ...
        bool parseLName(ref Demangler!RemangleHooks d)
        {
            // flush input so far to result[]
            if (d.front == 'Q')
            {
                // re-encode back reference...
            }
            else if (auto ppos = currentIdentifier in idpos)
            {
                // encode back reference to identifier at *ppos
            }
            else
            {
                idpos[currentIdentifier] = currentPos;
            }
            return true;
        }
    }
  • combine the qualified name and the type as before (core.demangle is still capable of decoding it) and run it through the hooked demangler
    char[] mangleFunc(FuncType)(const(char)[] qualifiedName)
    {
        const(char)mangledQualifiedName = encodeLNames(qualifiedName);
        const(char)mangled = mangledQualifiedName ~ FuncType.mangleof;
        auto d = Demangle!RemangleHooks(mangled, null);
        d.mute = true; // no demangled output
        d.parseMangledName();
        return d.hooks.result;
    }

Is the new mangling sound?

The back references encoded into the mangling extend the existing mangling. Unfortunately, the latter had ambiguities reported to the D issue tracking system, with more of these likely yet to be uncovered. The demangler in core.demangle rejected about 3% of the unmodified symbols from the Phobos unit tests, while 15% were demangled only partially.

It’s tough to verify the soundness of an addition to an already complex and fragile definition, as a change to the mangling would need an update to the tooling (demangler, debuggers). Anyway, it was a good opportunity to get rid of these, too.

So scrutiny of the existing definition was required. To do this mechanically, the mangling specification from the web site was converted into a grammar digestible by the bison parser generator. Bison can create LALR(1) parser tables, which basically means that, while scanning a mangled symbol, looking at a character and its successor is enough to determine whether the character adds to a previous entity or starts a new one. When conflicts are reported when processing a grammar, they might be resolvable with a larger context, but they can also hint at actual problems or undesirable complexity. Adding pseudo-tokens representing handcrafted parser control flow can avoid these conflicts.

This gist shows a grammar for the D mangling scheme without the back references. It still has a couple of conflicts when run through Bison, one of which was determined to be an actual ambiguity in the definition. Adding back references to
the grammar doesn’t add any conflicts.

In addition, core.demangle was fixed to work for all symbols but those exposing the known ambiguities.

Aftermath

Some of the implementations in std.traits used the mangling of a symbol to introspect compile-time properties, for example, to determine the linkage. This was done using a simplified demangler. With the introduction of back references, these
didn’t work any more except for simple symbol names. Using a solution as for core.mangleFunc is feasible, but can slow down compilation considerably as the demangling needs to be executed via CTFE. Fortunately, new __traits have been added which cover all information that can be found in the mangling.

While most users will not notice any changes to their programs other than smaller object and executable file sizes, the new mangling can be a breaking change to external tools like the linker or a debugger. These will continue to work, but until they are updated, be prepared to eventually see the new mangled names for a little while instead of the demangled ones.

Thanks to Mike Parker, Walter Bright and Steven Schveighoffer for review.