# How I Taught the D Programming Language at a Russian University

This article was originally published in Russian by Grigorii Smorkalov. It was translated to English for the D Blog by Georgy Markov and lightly revised from the original by Michael Parker.

This is the fourth year I’m teaching my D Programming Language course at a very real university in Russia. It’s a full-term course with lectures, practical lessons, and exams, although it’s all remote now. This is the story about how I got there, the challenges I encountered, and how students sometimes surpass their teachers.

## What’s in D for university students

The job market for D is very small. As I always say during the first lesson, it’s unlikely that students are going to write D code for a salary, but that doesn’t mean that learning D is useless. Firstly, it’s much easier to learn how to program with D than with C or C++. This is important because many students don’t know how to program even after a full C/C++ course. Secondly, a broader outlook makes for better code. My familiarity with D improved my C++ skills and made it much easier to learn Python, especially its iterators. Most importantly, D is the future—of C++ and beyond. Many C++11/17/20 novelties were first battle-tested in D, and even today D is a much more modern and feature-rich language than C++.

## Who am I and what’s in D for me

For simplicity, I would say that I am a C++ programmer. This is my main line of work. Before this course, I’d never been a teacher of any sort. Even on my job, I’m not involved much in mentoring. After earning my master’s degree in Computational Mathematics and Cybernetics at UNN (Lobachevsky State University of Nizhny Novgorod) in 2014, I had no relation to academics at all.

In my first months of university, I entertained the thought of being a school teacher. Now I realize that this was just a call for justice of sorts. There is a stark difference between a high school and a university, and for me, the latter was a much better experience. It felt like in just one month at university I learned more than in one term in high school, and that was so cool. Why couldn’t they explain it this way back in high school, was my common thought. If only educators applied the same educational methods in a regular school, it would make the experience much better. My desire to become a teacher vanished the second I received my first paycheck as a programmer (in Russia, ‘programmer’ is a very well-paid job and ‘school teacher’ is the total opposite), but some memory of that desire remained.

This was also the time when I became interested in D. Compared to C++ it looked like a perfect programming language. You can write code that would be as fast, but without all those C atavisms. I used D for my master’s thesis, and I loved it. My program was twice as small and simple as the older C++ version while performing better. Implementing complex and more efficient algorithms in D was much easier; doing the same in C++ would be too much work, and, like any student, I always struggled with my deadlines.

Since then I’ve followed D and used it for my little pet projects. I’ve always wanted to help the D community in some way, but I couldn’t write any useful library, nor could I find the motivation for contributing to open source.

## Beginnings

In 2018, an unusual offer surfaced on the D mailing list: does anyone by any chance want to teach students in Moscow, Russia?

The initiator was Dmitry Olshansky, a well-known member of the D community, the creator of std.regex and more. I contacted him and said I was interested. I didn’t see myself as a full-fledged lecturer and expected to be just an assistant. At the beginning that was the plan, with someone else acting as a lecturer. He was going to give lectures remotely via Skype, and I would assist him on site.

To my surprise, the university in question was RSUH, Russian State University for the Humanities. As it turned out, they do have technical faculty there, and the students do actually code. I checked their program: D was introduced for third-term students, and during the first two terms they learned C, C++, Prolog, and even some Lisp, I think (a bit too much, but why not). Their math course was solid, too (yes, I am among those who think that math is important for programmers).

## Preparations

I was introduced to the department staff. They explained everyone’s responsibilities and even offered the opportunity to join in scientific work as a programmer. We started working on the course program, although I barely included myself in that “we”. That was a mistake. With one month left until the classes started, the lecturer was suddenly leaving us. The news took me by surprise, but… there was still plenty of time, right?

This was happening when it was time for me to complete all the formalities and start work. Though I knew they were hiring only me, for some reason I was still under the impression that I wouldn’t be alone. I can’t say why I thought so. Everything was saying that there would be no help and that I had to do the whole course by myself, but my impression was hard to shake. The grave realization only came one week before Day One. Only then did I start to prepare for real.

## Bureaucracy

This is supposed to be the part about the trials and tribulations of the endless bureaucracy awaiting a poor programmer’s soul. The amount of paperwork required to sign the contract was indeed an entertaining story to tell to my peers. And everyone was, as they say, “rolling on the floor laughing” when I described applying for a Mir payroll card (Mir is the Russian national payment system mandated in the state-funded sector). But that’s it, actually.

The next bureaucratic task was composing the formal program, an official paper including the course program and the materials. There were indeed a lot of formalities there, but I got some help: they showed me the paper for a very similar course. At the end of the day, it was easier than I expected. For this, I should thank the university staff. It was a one-year contract. Renewing it every year only takes one piece of paper and a couple of pen strokes.

## The first class

The schedule was set up such that my first class happened to be a seminar (a.k.a. a practical lesson) instead of a lecture. This is only on paper, though with only one group consisting of just 14 people it didn’t make any difference. The schedule was adjusted later. I got two classes in a row and could decide which was a lecture and which was a seminar. I came up with the following arrangement: in the first class, I would lecture and answer general questions, and in the second, the students would program and ask practical questions.

For the first lesson, I prepared a brief description of the language, a syntax overview, a compiler, and several problems to solve: calculating the area of a triangle, solving a quadratic equation, and similar problems that yield a simple output for simple input. The idea was to immerse students in the language and immediately give them something to do with it. The plan was a success. By the end of the class, most of the students had gotten the hang of using a command-line compiler, written some code in a text editor, and solved some problems.

## Notepad and the command line

I bet many people would take issue with the command line and text editor part. Seriously? No IDE? IDEs for D exist, but I left it up to the students if they wanted to use them. The reason was my own experience in learning C++ and programming in general. Knowing how a compiler works and how to link several files into a project is integral to understanding the language as a whole.

This is especially true for C and C++. Things like the difference between declaration and definition lose their meaning without understanding the build process. In D, such nuances are fewer. For example, header files are not required. Still, I meant to teach how to program in D, not how to use an IDE. Things like the principles of import are easier to grok when using a command-line compiler rather than some “intuitive interface”. And you learn the syntax faster if you write the code by hand without autocomplete doing it for you.

## Further development

Lifted by the success of the first seminar, I was slammed to the ground by the first lecture. It contained the full theoretical explanation of type systems and their various types, compared D with other languages, brought out the problems of C and C++, and demonstrated what makes D different. It was a total failure.

I expected the lecture to take the whole 80-minute class, including questions, but it finished in less than an hour without a single question during or after the speech. I even asked if the students couldn’t hear or understand me, or if there was a problem with my diction. But the problem was with the lecture itself. First, it was too much for one class. Second, the lecture was based on the talks I did for my job that were intended for seasoned programmers. I realized that everything that I’d prepared for my future lectures must be tossed aside and rewritten from scratch.

## The new program

The first candidate for simplification, and by that I mean expunging, was metaprogramming. Getting rid of templates was impossible since even the most basic language features and algorithms are tied to them, but code generation of all sorts was the first to be removed. Following that was anything that required external libraries. What was left were things that D code can’t be written without.

Since I had better success with live communication during actual programming, I decided to focus on that. Rewriting the course on the fly was tricky, but I came up with a solid plan: throughout the semester we would write a complex calculating program, during lectures we would study language features required for a given task, and then during seminars the ideas would be transformed into code.

## The semester assignment

If you’re a nerd like me, you’ve probably heard of the 10,958 problem. The gist of it is that you need to put signs and parentheses between numbers composed of the digits 1 to 9 so that the result would be exactly 10,958. The solution exists for any number up to 11,000, except 10,958. There’s no proof that it doesn’t exist either, hence the 10,958 problem.

I gave the students an assignment to write a program that would find the solution, brute-forcing any possible combination of signs and parenthesis. All calculations were done with double instead of some sort of bignum, so it’s not a real solution, but it’s simpler this way.

I had several reasons to think that this was a well-fitting problem. First, I could write a solution I expected to see from the students in five hours or two evenings. Compensating for the students’ level, it looked like a good project for a semester-long assignment. Second, the solution isn’t too straightforward. Simple brute force would take too much time so you need to cut off the equal variants in the beginning. And third, I was fascinated by this problem myself, so I thought the students would feel the same. I couldn’t be more wrong.

Not only were a majority of students not into such mathematics, most of them couldn’t even grasp what the problem was about and what this 10,958 was for. I failed to get them interested.

When I realized the problem, it was too late to change anything. So the first semester wasn’t very engaging. Only a few students could finish the assignment to its fullest. For the rest, it was impossible.

## The second semester

Since it was a full-term course, I had time to make up for it. For the second semester, I tried to come up with something practical and interactive. I gave the students an assignment to write a game. They were learning networking, so it was to be a multiplayer game. I recalled playing “tic-tac-toe on an infinite field” with my peers during boring lectures. The smarter name for this game is Gomoku. Two players can play online, and the game logic is simple. I was hoping to spark students’ interest. This assignment turned out much better than the previous one, but I still can’t call it a full success.

I really wanted to show them how coroutines (fibers, as they are called in D) make async programming much more manageable, how nice the vibe.d framework is, and how easy it is to use external libraries with the dub package manager. Lesson One: don’t try to sell coroutines to those who don’t know what callback hell is. Lesson Two: always keep hardware limitations in mind.

## Problems out of nowhere

I would never have thought that single-threaded compilation could be thwarted by the memory limit. It happened because the computers at the university were equipped with only 2 GB of memory, and some students’ netbooks worked on a mere 1 GB. Simple programs build just fine even on weak machines—D is much better than C++ in terms of compilation time—but a large framework like vibe.d requires a lot of memory for its compilation. Now imagine, I was just telling them how everything is so easy-peasy and then half the students couldn’t even compile a networked version of Hello World.

In my defense, I checked everything in advance. I set up a virtual machine with 2 GB of memory and made sure that it worked with the special compiler flags. Theoretically, I was prepared. But dealing with a new library and a new build system and new compiler flags all at once was just too much for the students. Their brains were getting DoS’d and shut down. So even though I demonstrated how to compile a program on a low-memory machine, I still had to explain to them individually why the usual method of compilation wasn’t working. For those who had only 1 GB of memory, I didn’t even have a ready solution.

But still, the results of the second semester were much better: absolutely everyone could write the client side of the game. Some had problems with coroutines on the server side, but in general, they got this part just fine. For me, that wasn’t enough. So I gave them a supertask: write a simple AI for the game. This would help them understand the advantages of the client-server architecture. They had to realize that an AI is just another client, so the server code should be left as it is, and on the client side, the only required modification was move polling. This was a good problem on architecture design, suitable for students who already had gotten a grip on programming.

We also had a little contest. I invited the students to play some code golf, solving a simple problem with the smallest program possible. For encouragement, I promised to free the best-achieving students from writing a semester report. And if anyone could beat my solution, they would pass the semester examination automatically (in Russian universities, teachers are allowed to set arbitrary conditions, like participating in side projects, for passing the semester without the usual examination).

As I expected, nobody could beat me, though a couple of students came up with some interesting tricks, like writing 10 instead of "\0" to save two symbols. Some would say this is an abuse of the type system, but clever hacks like this speak to a student’s knowledge.

## The results of the first year

I was asking too much for a single semester; only one student could write an AI that kind of worked (she said that the first time it made a sensible move, it made her really happy). Another student who complained the whole year that programming was not her thing and that she couldn’t understand anything managed to write her own client and server. At the end of the day, all was not for nothing.

## The second try

Everything described until this point happened during the 2018/2019 academic year. The higher-ups had no issues with the course, and I was able to continue on the following year. New students, new possibilities, a new program. This time I was much better prepared. I had materials for most of the lectures, no more need to redo everything on the fly, and during the summer I had time to fix the problems with the course.

The 10,958 problem was expelled on the charge of being boring. Instead, the Gomoku game was expanded, now lasting almost the whole term. “Almost” because the first assignment was a problem regarding OOP: students had to implement various geometrical shapes as classes and draw them on screen with some customization. I made this the first step after Hello World so everyone could learn how to use the tools, and so I could judge their level.

It’s not surprising that students are very different. Some are already working as programmers and attend the course out of curiosity, while some still struggle with variables and loops. From the beginning, I decided that my course would be not just for everyone, but for those who are interested. However, disregarding the others would be wrong, so we needed assignments for different levels.

To jump-start the network study, I set up a server with the reference implementation, and everyone could connect to it and play. Even telnet would do. Allotting more time for an assignment and providing better explanations served students well. Many could finish the supertask. We even had a little contest between AIs. A human could still beat them with ease, but it was still a big improvement over the previous year.

## A pleasant surprise

I had the same code golf contest again. And again, I promised to give a free pass to whoever could beat me. I’m sure you can see where this is going, but first I must explain the problem in detail. The task was to implement a function, challenge, defined as follows:

import std;

/**
* Params:
* s = Multiline string, each line containing positive integers separated by whitespace.
* Returns: An array containing the sums of each line and the grand total as the last element.
*/
uint[] challenge(string s);

unittest {
assert(challenge("0") == [0, 0]);
assert(challenge("1\n1") == [1, 1, 2]);
assert(challenge("2\n2\n3") == [2, 2, 3, 7]);
assert(challenge("2\n2 0 3\n3 1 1 4") == [2, 5, 9, 16]);
}

Only the symbols inside the function body count. Using any Phobos functionality is allowed, but no renamed imports.

My solution from the previous year was pretty straightforward:

auto r=s.split('\n').map!(a=>a.split.map!(i=>i.to!uint).sum); return r.array~r.sum; // 84 characters

I didn’t show them my solution, I just told them its character count. I was absolutely sure that, just like the previous year, nobody could beat it. Imagine my shock when I was outdone in a mere week:

auto r=split(src,"\n").map!"sum(map!(to!uint)(a.split))".array;return r~[sum(r)]; // 82 characters

Even with the unneeded brackets, this solution was better, thanks to the shortened map syntax: passing a lambda as a string in the first case and passing to directly in the second. The latter was just my oversight, but the former was a typical teacher’s mistake. You see, back when I first wrote my solution, it wasn’t possible to pass a string to map like that. I don’t remember well what the problem was, something about scopes. I missed that it was fixed at some point. That’s how in just one year you become an old geezer teacher who can’t keep up with the times.

I stayed true to my word and granted a free pass to the resourceful student at the end of the semester. I was concerned that securing the pass in the middle of the semester would demotivate him to attend classes, but fortunately, I was wrong.

## COVID-19 strikes

Of course, I can’t omit how the pandemic impacted the educational system. In the spring of 2020, all classes were moved online. There are some pros and cons to this.

Giving lectures remotely turned out to be very convenient. You can mutually agree on a suitable time, and you don’t need to find a free lecture room. Another very good thing is that students can either ask questions vocally—as when they raise their hands offline—or write them in the chat so that the teacher can answer them when it’s most appropriate. I really think that this system works very well.

Practical lessons are problematic though, especially with those students who aren’t involved enough. When they sit in a classroom they at least do something. Why show up at all if you’re not going to do anything? When it’s online, they just skip class. The chemistry of a group coding session with a teacher ready to help won’t kick in. I encouraged them to ask questions not only during class but at any time, hoping that this would allow me to assist them whenever they are coding. But this only worked for a couple of people.

## Lessons of the second year

The second year was better than the first but still less than ideal. I saw no need to modify the course, but its presentation had to be addressed.

We need to tear down this wall in communication. Mutual trust between the teacher and the student makes for a better educational process. It’s difficult to work with those who are wary of the teacher even when doing alright. I don’t yet know a robust solution; each case is tackled individually.

Even those who are doing well need some control. I used to think that attendance scrutiny is for those students who don’t really want to learn, to intimidate them to show up. When I was a student, the best teachers never bothered about absentees. So I too was liberal with these things, even when half the group was missing. But everything has its limits. Bad students will skip classes anyway, but lazy B-graders could benefit from a little scolding.

## Fast forward to the present

I don’t have much to say about the last year-and-a-half. Teaching D is now part of my life. Due to the pandemic, we’ve had to move online completely. I couldn’t even meet my students face-to-face. Aside from what I said earlier about how this affects the educational process, this also disrupted one of the ideas I had.

I need some way to track students’ progress to be sure that they actually follow the course and don’t just show up for classes. The way I handled this during the previous two years worked for some students: if they had a problem, they asked questions during a lecture or a practical lesson. But some students just keep quiet when they have problems, so I need some means to identify them. I couldn’t do written tests for programming; that would be nonsense. So during the summer break, I came up with the idea of doing some small quizzes if COVID restrictions were to be lifted. These quizzes would affect the final grade, but not much, as the intention behind them was to help students who are having problems, not to be the final straw that breaks the camel’s back. Unfortunately, remote lessons made this nearly impossible.

My students keep surprising me, coming up with newer and better solutions to the code golf puzzle. Unfortunately, I don’t have the exact code of the student who won in the third year (it appears that she deleted her Github account), but it was something like:

auto r=s.split('\n').map!(a=>a.split.to!(uint[]).sum); return r.array~r.sum; // 74 characters

Again, it was a feature I didn’t know about: to converts between arrays, too.

I thought that this problem was done and that there was no room for further improvement. Oh, how wrong I was. This is what they’ve brought me this year:

return(s.split('\n')~s).map!(a=>a.split.to!(uint[]).sum).array; // 63 characters

This completely destroys my solution with all the improvements! Note that the trick is not in the syntax, but in the logic. This works by concatenating the split lines with the original string, making it unnecessary to declare an intermediate array, which allows implementing this function as a single statement. I would never have thought to do that! Algorithms win over micro-optimizations, even in code golf.

The biggest change this year is a new, formalized evaluation system. In the semester, students earn points for doing assignments and writing reports on them. The first and simplest assignment is worth 5 points, and the last and hardest is worth 30. The maximum number of points a student can score without participating in code golf is 100. Code golf is scored by the formula 110 − length. The code golf winner this year got 47 points for his solution which earned him an exemption from writing any reports. We have a table listing every student’s points so everybody knows how many points they need to score. Everything is very transparent, so I don’t need to worry about not being objective when evaluating students.

# A Gas Dynamics Toolkit in D

The Eilmer flow simulation code is the main simulation program in our collection of gas dynamics simulation tools. An example of its application is shown here with the simulation of the hypersonic flow over the BoLT-II research vehicle that is to be flown in 2022.

### Some history

This simulation program, originally called cns4u, started as a relatively small C program that ran on the Cray-Y/MP supercomputer at NASA Langley Research Center in 1991. A PDF of an early report with the title ‘Single-Block Navier-Stokes Integrator’ can be found here. It describes the simple finite-volume formulation of the code that allows simulation of a nonreacting gas on a single, structured grid. Thirty years on, many capabilities have been added through the efforts of a number of academic staff and students. These capabilities include high-temperature thermochemical effects with reacting gases and distributed-memory parallel simulations on cluster computers. The language in which the program was written changed from C to C++, with connections to Tcl, Python and Lua.

The motivation for using C++ in combination with the scripting languages was to allow many code variations and user programmability so that we could tackle any number of initially unimagined gas-dynamic processes as new PhD students arrived to do their studies. By 2010, the Eilmer3 code (as it was called by then) was sitting at about 100k lines of code and was growing. We were, and still are, mechanical engineers and students of gas-dynamics first and programmers second. C++ was a lot of trouble for us. Over the next 4 years, C++ became even more trouble for us as the Eilmer3 code grew to about 250k lines of code and many PhD students used it to do all manner of simulations for their thesis studies.

Also in 2010, a couple of us (PAJ and RJG) living in different parts of the world (Queensland, Australia and Virginia, USA) came across the D programming language and took note of Andrei Alexandrescu’s promise of stability into the future. Here was the promise of a C++ replacement that we could use to rebuild our code and remain somewhat sane. We each bought a copy of Andrei’s book and experimented with the D language to see if it really was the C++-done-right that we wished for. One of us still has the copy of the initial printing of Andrei’s book without his name on the front cover.

### Rebuilding in D

In 2014 we got serious about using D for the next iteration of Eilmer and started porting the core gas dynamics code from C++ to D. Over the next four years, in between university teaching activities, we reimplemented much of the Eilmer3 C++ code in D and extended it. We think that this was done to good effect. This conference paper, from late 2015, documents our effort at the initial port of the structured grid solver. (A preprint is hosted on our site.) The Eilmer4 program is as fast as the earlier C++ program but is far more versatile while being implemented in fewer lines of code. It now works with unstructured as well as structured grids and has a new flexible boundary condition model, a high-temperature thermochemistry module, and in the past two years we have added the Newton-Krylov-accelerated steady-state solver that was used to do the simulation shown above. And importantly for us, with the code now being in D, we now have have many fewer WTF moments.

If you want more details on our development of the Eilmer4 code in D, we have the slides from a number of presentations given to the Centre for Hypersonics over the past six years.

### Features of D that have been of benefit to us include:

• Template programming that other Mechanical Engineers can understand (thanks Walter!). Many of our numerical routines are defined to work with numbers that we define as an alias to either double or Complex!double values. This has been important to us because we can use the same basic update code and get the sensitivity coefficients via finite differences in the complex direction. We think this saved us a large number of lines of code.

• String mixins have replaced our use of the M4 preprocessor to generate C++ code in Eilmer3. We still have to do a bit of head-scratching while building the code with mixins, but we have retained most of our hair—something that we did not expect to do if we continued to work with C++.

• Good error messages from the compiler. We often used to be overwhelmed by the C++ template error messages that could run to hundreds of lines. The D compilers have been much nicer to us and we have found the “did you mean” suggestions to be quite useful.

• A comprehensive standard library in combination with language features such as delegates and closures that allow us to write less code and instead concentrate on our gas dynamics calculations. I think that having to use C++ Functors was about the tipping point in our 25-year adventure with C++.

• Ranges and the foreach loops make our D code so much tidier than our equivalent C++ code.

• Low-barrier shared-memory parallelism. We do many of the flow update calculations in parallel over blocks of cells and we like to take advantage of the many cores that are available on a typical workstation.

• Simple and direct linkage to C libraries. We make extensive use of Lua for our configuration and do large simulations by using many processors in parallel via the OpenMPI library.

• The garbage collector is wonderful, even if other people complain about it. It makes life simpler for us. For the input and output, we take the comfortable path of letting the compiler manage the memory and then tell the garbage collector to tidy up after us. Of course, we don’t want to overuse it. @nogc is used in the core of the code to force us not to generate garbage. We allocate much of our data storage at the start of a simulation and then pass references to parts of it into the core functions.

• Fast compilation and good optimizing compilers. Nearly an hour’s build time was fairly common for our old C++ code, and now we would expect a DMD or LDC debug build in about a quarter of a minute. This builds a basic version of the main simulation code on a Lenovo ThinkPad with Core i7 processor. An optimized build can take a little over a minute but the benefit of the faster simulation is paid back by orders of magnitude when a simulation job is run for several hours over hundreds of processors.

• version(xxxx) { ... } has been a good way to have variants of the code. Some use complex numbers and others are just double numbers. Also, some variants of the code can have multiple chemical species and/or just work with a single-species nonreacting gas. This reduction in physical modelling allows us to reduce the memory required by the simulation. For big simulations of 3D flows, the required memory can be on the order of hundreds of gigabytes.

• debug { ... } gets used to hide IO code in @nogc functions. If a simulation fails, our first action is often to run the debug-flavor of the code to get more information and then, if needed, run the debug flavor of the code under the control of gdb to dig into the details.

We have a very specialized application and so don’t make use of much of the software ecosystem that has built up around the D language. For a build tool, we use make and for an IDE, we use emacs. The D major mode is very convenient.

There are lots of other features that just work together to make our programming lives a bit better. We are six years in on our adventure with the D programming language and we are still liking it.

# Teaching D from Scratch: Is it a viable first language?

About two-and-a-half years ago, one of my son’s homeschooling friends asked if I would teach a coding class. Both of my kids have shown interest in coding via Scratch, and I also need to pull my weight when it comes to homeschooling them (my wife doing most of the instruction). So I thought, surely, this can’t be too hard. The challenges I would encounter over the course of the last few years surprised me more often than I would have expected!

First, a bit about my background. I’ve been writing software professionally for over 20 years, about half using system languages and half doing web development. I’ve dabbled in a lot of programming projects, including embedded microcontrollers with 256 bytes of RAM, collaborative video systems using high-end Unix workstations, and automating a factory testing rack-mounted appliances. What I haven’t ever done to any official degree is teach programming. For sure, this is a challenge that is novel to me, and also one that I have found very satisfying.

A bit about the kids. I had ten students, aged 9 – 15, all of whom had very little experience writing real software. Since this class was going to be a side project for me on top of my normal job, we went rather slowly, meeting once every two weeks.

One of the goals of my class was to try not to dumb down coding into an abstraction. I wanted the kids to experience real programming without too much of a pre-built hand-holding framework, and I believed they could do it. I’ve seen many (and purchased some) online resources for teaching to code that hide most of the real work and just focus on using custom libraries or simplified languages to learn the “basics”, while not actually teaching to be productive in a standard environment. I also witnessed firsthand my own kids getting bored when the material was presented too slowly or too abstractly.

Note: you can see some of my homework assignments and review pages on a dedicated website that I created for this class

## First Try: Javascript and Lua

The first language we worked with, because many of the kids didn’t have actual full-blown computers to work on (this was pre-Covid so it was at my house), was Javascript. Javascript is possible to develop in a browser, using a tablet or a Chromebook, so all the kids I was teaching could participate. My focus in these early days was basic concepts like loops, branching, and basic statements. I think Javascript can be a great first language, though the things you can do with it as a beginner are a bit bland, and to do anything exciting you have to start learning the DOM (including how objects and methods work).

After half a year we moved on to our second language: Lua, specifically in the video game system Roblox. I almost wish we had skipped this one because it was a bit too advanced for these students, and I was not familiar with the language and the system to begin with. The kids did have fun with the model-building UI. However, I can see an experienced teacher finding good ways to instruct using this environment. Roblox, with its pre-built rendering and physics engines, has the nice benefit of allowing one to generate a really cool game with relatively little code—something new coders quite enjoy.

## Just use D

And that leads us to the second year, where I decided to just try teaching what I know best: the D programming language. Since all the kids have a keen interest in building games, I searched for a simple library I could use to build games. I have to give a shout-out to the library I discovered, Raylib, and the YouTuber Ki Rill, whose very straightforward and well-made videos gave me the confidence that this also would be something I could teach.

For a quick demo of what Raylib looks like, here is a program I wrote in a few minutes that bounces a Raylib-drawn D-man around the window.

Everyone in the class is using Windows, so the toolchain that I am having them use is the stock DMD compiler and the Visual Studio Code development environment with the code-d extension. Throughout the last year-and-a-half, I have had the kids build such classics as hangman and tic-tac-toe using text-based “graphics”. After that came the introduction to 2D graphics and Raylib, where we built a brick-breaking game similar to Arkanoid and others.

Finally, I let the students pick what they wanted for the last project, and their pick was a rock-paper-scissors online tournament game. I’m not sure of the lasting power of that one…

Keep in mind that my goal was to teach them programming, not necessarily D. Working through all these projects allowed me to introduce a lot of the typical programming concepts (branching, looping, functions, data types, arrays, etc.). This year we are focusing on aggregates (specifically arrays and structs), and how they can be useful to encapsulate your coding solution into pieces that are easier to deal with. The game we are writing this year is a snake clone. I’m hoping to wrap things up on that by the end of the year, and then next year add in networking to have the game playable between the students (all the kids these days are into online tournament-style games).

## My Take on Teaching With D

So how is D as a beginner’s programming language? I can say with confidence that the kids understand the flavor of D that I have taught just as well as the other languages. What flavor is that you may ask? Why vanilla of course! As D was the first typed language they were exposed to, I had to first explain types and why they are important. I also had to pick and choose which concepts in D I did not want to confuse them with. To that end, I’ve left out a lot of the advanced features (so far), such as:

• Templates
• Type inference
• __traits
• Classes/Interfaces/OOP
• Pointers/references (mostly)
• Memory management in general (thank you GC!)

This means that a lot of language features are missing from “Vanilla D”, and a lot of those are pieces I like. I had to force myself to avoid those things while teaching so as not to confuse them. I also try to avoid shortcuts if I can, as I want them to master the basic concepts before learning how to do it quicker or less verbosely. I tried hard to always use curly braces for scope blocks, for instance, instead of just writing single statements without them.

The features that I found the kids handled well were basic types like ints and strings (though what a funny name “string” is, that took a while to sink in), if statements and foreach/while loops, structs, and functions. They had a harder time understanding arrays and associative arrays, which might be due to my lack of experience teaching. I’ll also note that teaching virtually has some significant challenges—there’s just no substitute to being able to walk over to a student’s computer and observe and help their progress or draw out on a whiteboard how data is laid out. One surprising thing (to me anyway) that they handled completely in stride, was nested functions. That’s a nice D feature that is not available in C, barely in C++, and something that I personally took a while to get used to.

## What Could Go Wrong?

The challenges were numerous. First I had to try and remember what it’s like to know nothing about programming. I did not take any classes on teaching programming and did not look up the “proper” way to teach it, I just went with what I thought would be the next best thing to do based on their skills and experience. Having class only once every two weeks is a big drawback since the students’ brains tend to garbage-collect some of the stuff we worked on last time. For sure, a daily or maybe twice-weekly class schedule would improve the situation, but that would mean I’d have to generate material to teach that many classes, which I unfortunately don’t have the time for.

Secondly, I previously had no experience with games programming. Raylib made this pretty easy, and as long as you can understand event loops, it’s pretty straightforward (and fun—I wish I had learned it sooner).

Aside from my personal and environmental shortcomings, I have some suggestions for the D language and community that would make D a much better “first language”. In no particular order:

### Error Messages

Error messages in D are sometimes esoteric, and I’m not just talking about templates. The most common error the students make is bad punctuation. This leads to some of the worst messages. Most of the time, I need to explain what is happening, but even when explaining it myself, I am sometimes thwarted by the compiler. For instance, if you forget a semicolon:

void main()
{
import std.stdio;
int x = 5
writeln(x); // line 5
}

The resulting error is:

foo.d(5): Error: semicolon expected, not writeln

But wait, there is a semicolon on line 5! So why is it having a problem? Technically, putting an extra semicolon just before the writeln call on line 5 would solve the error (after all, D doesn’t have significant whitespace). But really, the compiler shouldn’t be suggesting that.

I’d rather see something like:

foo.d(5): Error: previous statement not terminated, perhaps you need
a semicolon on line 4?

What if you are missing a brace or have an extra one? That spits out a lot of error vomit that is sometimes really difficult to decipher. I know this isn’t as easy a problem to solve, but since the compiler is not even to the tricky semantic parts, what if it used some other hints of the source to try and tease out a suggested solution? Like maybe taking cues from the indentation.

A great feature of the D compiler is the spelling correction suggestions. But sometimes it’s not easy to see what has changed in the suggested spelling correction (especially when it’s just different capitalization). Why not highlight the changed letters? Since D has gained colorized output for errors, things have become much clearer in my opinion.

I do have to relay a comment from one of my students, who said he appreciated the ability of VS Code to jump to the line that the compiler error is referencing. This helps put into perspective where their mindset is.

### Debugger on VSCode for Windows

Another challenge I’ve faced is the lack of a good debugging experience for VS Code on Windows. The C debugger for VS Code displays, for example, a string as a length and a C string (which might have trailing garbage). I’d rather not confuse them with this. I have read that Visual D has a better debugging experience, but I need dub support for these projects, and Visual D focuses on Visual Studio integration, something I don’t necessarily want to deal with in teaching these kids. I’m hoping that a recent focus on debugging (e.g., the LLDB integration project) will help.

I think WebFreak has done a great job on the VS Code plugin, and for the most part, it works amazingly well. Thank you! I know debugging is a piece that is hard to get right.

I spend an inordinate amount of time helping the kids set up their build environments, and having this built-in to the IDE or dub would be immensely helpful for using D for learning.

## Suggestions for the Future

I am confident that D can be a viable first language. I have students that, prior to my class, had never before written real code now creating 2D games using D and Raylib (with a lot of hand-holding, darn it!). What I think D needs for this success to be scalable is a set of resources that assumes no programming experience. Most of D’s learning resources are aimed at teaching experienced programmers how D works in terms of their existing knowledge. They answer the question, “How do I do <insert feature> from <insert language> in D?” What we need is to answer the question, “How do I write code?”, that just happens to be using D.

Editor’s Note:Some readers may be thinking at this point of Ali Çehreli’s Programming in D. Though written for beginner-level programmers, it’s focused more on teaching the language than teaching to program generally.

This shouldn’t be too hard to create, as there are a ton of resources on learning to program in C already. D is so similar you could almost just replace C with D and call it a day. In fact, I’m considering creating a YouTube series on the topic with the experience I’ve gained. One important tool that needs creating is a map of D features and where they fall on the difficulty scale. This allows one to write D tutorials that follow a gradual introduction of features. Broaching the subject of complex features is probably needed early, as they will experience some of them (you are using a template whenever you use writeln, or to), but explanation of the entire feature may be deferred to a later level.

And finally, such materials should be engaging! I can’t stress enough how important it is to be doing something fun when you are learning to program. The kids definitely are more engaged since we are creating games that they might actually want to play vs. simple hello world examples or even Javascript web pages. A nice way to proceed might be to write a game in Vanilla D and then introduce features that allow you to refine and improve the code.

I hope this becomes a reality because I think this area of development has been relegated to either frameworks that attempt to water down programming to get kids engaged or sterile hello world-style programming that is intended for rote instruction or reference. I think we can show the full potential of D and, at the same time, engage eager learners with interesting and fun projects to trick them into learning code with our favorite language!

# I Wrote a High-Frequency Trading Platform In D

I’ve used the D programming language to implement a high-frequency trading (HFT) platform. I’ve been quite satisfied with the experience and thought I’d share how I got here. It wasn’t a direct path.

In 2008, I stumbled across a book on Amazon called Learn to Tango with D. That grabbed my curiosity, so I decided to research D further. That led me to Digital Mars and Walter Bright. I had first heard of Walter when I learned about Zortech C++, the first native C++ compiler. His work had been a huge influence on my C++ learning experience. So I was immediately interested in the language just because it was his, and excited to learn that he was working with Andrei Alexandrescu on version 2. Still, I decided to wait until they were further along with the new version before I dove in.

In 2010, I bought Andrei’s The D Programming Language as soon as it was published and started reading. At the time, I was working at BNP Paribas using C++ to optimize their HFT platform, so high performance was prevalent in my thoughts. When I saw that D’s classes were reference types, with functions that are virtual by default, I was disappointed. I didn’t see how this could be useful for low-latency programming. I became too busy with work to explore further at the time, so I put the book and the language aside.

In 2014, I began preparing for a new adventure. As part of that, I started working on a feed handler framework from scratch in C++, using my own long-maintained C++ library of low-level components useful in low-latency, high-performance applications. Andrei’s book came to my attention again, so I decided to give it another look.

This time, I read the book through to the end and learned that my initial impression had been misplaced. I found that I liked D’s metaprogramming features and its support for programming in a functional style. By the end of the book, I was ready to give D a try.

I started by porting my C++ library and feed handler to D. It wasn’t difficult. I use very little inheritance in my C++ code, preferring composition and concrete classes. I found myself quite productive with D’s structs, templates, and mixins. All the while, I kept a close eye on performance benchmarks. When D turned out to give me the same performance as my C++ code, I was sold. I found D to be much more elegant, cleaner, more readable, and easier to maintain. I made the switch and never looked back.

My goal was to develop a complete HFT system using D. The system would consist of different subsystems:

• Feed-Handler Framework: receives market data from exchanges; builds the books for all securities; publishes the updates to the other subsystems.
• Strategies Framework: receives market data updates from feed handlers; facilitates communications with the Order Management System; allows for plugging into it strategies that make decisions on stock trades.
• Order Management System: communicates with the exchange and the strategies framework; maintains a database of orders.
• Signal Generator: receives market data updates from feed handlers; generates different signals as indicator values, predictions of stock prices, etc.; sends the different signals to strategies.

Ultimately, I found a new data structure and better design for my feed-handler framework. I developed the new version completely in D. This implementation heavily uses templates. I like D’s template syntax and generally find the error messages clearer than the complex error messages I was used to from C++. I needed to drop down to assembly for some specific x86 instructions and it was straightforward to do in D.

Later, I needed to work with configuration files. I prefer to write my config files in Lua, a lightweight scripting language that is easy to integrate into a program as an extension via its C API. For this, I found a D Lua binding called DerelictLua. Using, again, D’s metaprogramming facilities, I developed a very easy and practical way to interface with Lua on top of DerelictLua. Editor’s Note: DerelictLua has since been discontinued; new projects should use its successor, bindbc-lua, instead.

The feed handler on the Bats market comes on 31 simultaneous channels, so it is more efficient to use multithreading. For this, I chose not to use the multithreading facilities provided by Phobos. I felt I needed more control in such a low-latency environment, particularly the ability to map each thread to a specific core. I opted to use the pthreads library and its affinity feature. D’s C ABI compatibility made it a straightforward thing to do.

I’m running on FreeBSD. For my intercommunication needs, I’m using kernel queues and sockets. The same functionality is available on macOS, my preferred development platform. D did not get in my way in using these APIs on either macOS or FreeBSD. It was as seamless as using kernel queues from C.

A few notes about problems and limitations:

• I encountered one compiler bug. I found a workaround, so it wasn’t a blocker. I was able to reproduce it with a few lines of code and contacted the D community. They solved the problem and had a fix in a later version of the compiler.
• I did not use D’s garbage collector. This is not a strike against D or its GC, though. In a low-latency system like this, even the use of malloc and free can be costly, so I’m not going to take a chance on a nondeterministic system with unpredictable latency. Instead, I used my library to handle allocation/deallocation via free lists, with memory preallocated upfront. As a consequence, I also decided not to use D’s standard library for anything.
• I had to work with fixed-size ASCII strings that are not NUL-terminated and are, instead, padded with spaces at the end. Without the standard library, I found it easier to manipulate them C-style via pointers.

I was the sole developer on this project but completed it successfully in a relatively short period. Big credit to D and its productivity, readability, and ease of modifications.

# D Summer School v3

The third edition of the D Summer School, held at University POLITEHNICA of Bucharest, took place from the 5th to the 25th of July. It was three weeks of boot camping bachelor students into the basics of D across eight sessions of hands-on workshops and a hackathon. We will describe our experience in organizing the program, teaching the students, and trying to integrate them into the D community.

## Recap from past editions

For the first two editions we had the following process:

• we started advertising the summer school two months before the event;
• we selected students from among the applicants;
• the students had to complete a project during the summer school;
• we helped students improve their projects during the hackathon;
• we collected feedback at the end.

For more information, we recommend our previous article on the first edition.

## DSSv3

### Pre-event actions

In contrast to previous years, this time we started marketing the event very early in February, five months before the start date. We used the most influential vectors we could to promote it: the most popular professors. We nagged them ceaselessly to promote DSSv3 during their courses. The results were spectacular: we received 86 student applications (as opposed to 25 in the previous years). Since this was an online version of the summer school, we decided to drop the selection process and open the door to everyone. This had the added benefit that we didn’t have to conduct interviews with everyone and a wider range of students had a chance to be introduced to D.

To cope with the larger number of participants, we had to grow our team. Former Symmetry Autumn of Code participants Robert Aron and Adela Vais, and Symmetry employee Alexandru Militaru, agreed to join us. As such, we were able to diversify our teaching materials and raise the overall quality of the presentations.

The teaching materials were mostly the same as in the previous years; we simply reshuffled the order to have an incremental level of complexity and added a lab, “WebApp Tutorial using Vibe.d”. Besides this, we also changed the project competition. During the previous editions of DSS we found that students were not very engaged with the project assignment, so this time we came up with something different. We created a Dlang Bug Fix Competition: the top three students who had the largest number of merged PRs that solved a Bugzilla issue would win Raspberry Pi 400 kits (you may have noticed the “DSSv3” labeled PRs on our three main repositories). You may think that contributing to a programming language is a scary task, however, the truth is there are dozens of approachable, easy-to-fix issues that give instant gratification to the contributor.

With all the planning into place, we were now ready to start DSSv3.

### The teaching sessions

A teaching session is comprised of an hour-long lecture and two hours of hands-on exercises. This year, we abandoned the slides in favor of tutorial-like examples. Since everything was online, we simply shared our screen and highlighted the different aspects of D in a practical way by directly compiling and running examples. This made the lessons more interactive as students enthusiastically asked “what happens if…?” questions, and we could easily demonstrate the results.

For the exercises, we followed a team-play system where students were grouped in teams of four, and they worked together to solve their tasks. This made it easier for us to organize everything on the Teams platform (we would enter rooms of four students instead of talking students individually) and it encouraged them to help each other.

From among the 86 applicants, we had an average of 35 students attending the sessions, with a record high of 56 (“Introduction to D”) and a low of 25 (“C\C++ Interoperability and Tooling”). It seems that from the first lecture to the last, almost half of the students abandon the course. This may seem like a grim figure, but note that we had students from all types of backgrounds, some of them in their first year at university. Since we are teaching subjects like memory management and multithreading, it’s only natural that some of them will be lost along the way. Regardless, the lowest number of attendees was higher than the highest number from previous years.

The hackathon was attended by eight people. Again, a low figure, but that was not a surprise. Keep in mind that the majority of the participants had never made an open-source contribution. We expected that only the best students would manage to contribute. One other factor that may have influenced the low number was our uninspired decision to organize the hackathon on a Sunday; several people noted in our feedback form that they would have participated had they not had other plans. The result of the hackathon:

• 9 PRs submitted to Phobos: 5 were merged, 2 were closed but led to closing the associated Bugzilla entries, and 2 were rejected
• 1 PR submitted to DRuntime that was merged
• 4 PRs submitted to DMD: 2 were merged, 2 were rejected

We had hoped that students would submit PRs before the hackathon, but we were wrong. It seems that students should be assisted when making their first PR.

The winners of the hackathon were:

1. danandrei279 with 3 PRs merged
2. vladchicos with 2 PRs merged
3. lucica28 with 1 PR merged

### Feedback

We asked the students to fill out a feedback form, and we received 15 responses. It is highly possible that the results are biased since the feedback form was available at the end of the hackathon; by that time some students had already dropped out. Although it would have been great to understand their perspective, we still had valuable feedback on what went well and what could be improved.

From aggregating the results, we have the following conclusions.

• The difficulty of DSS is perceived as being high. Those who are well prepared love it, but those who don’t have too much programming experience are lost along the way.
• The introductory courses are much more popular than advanced ones such as “C\C++ interop” and “Multithreading in D”.
• The hackathon was appreciated by hardcore programmers (a small percentage of the total number of attendees), but the rest were intimidated by it.
• Students appreciated the relaxed interactive atmosphere of the sessions, with some commenting: “The general feel of the summer school was a chill evening hanging out with your friends.”

Overall, DSSv3 generated enthusiasm among the programming geeks, but we still have some work to do to make it attractive to a less savvy audience.

### Future plans

Now that our team has grown, we plan to do a bigger restructuring of the course. Given the high drop-out rate, we would like to make the course welcoming for any type of CS student regardless of background or experience. To that end, we are considering creating two tracks for the course: one at a beginner level, and one for more advanced students. That way we can accommodate any type of audience.

Another aspect to think about is the hackathon. We still haven’t found the most appealing project that would motivate students to commit to it. Experience has shown that creating a project from scratch in a language you’ve just learned doesn’t really work (or we haven’t found the adequate project) and contributing to the D language may be intimidating. We are still searching for better solutions, so if you have any ideas, please contact us.

Also, right now, UPB is going through a major redesign of its curriculum. Proposals for new courses are being accepted, and we will forward this course as our choice. There’s a long wait for acceptance, but we’re keeping our fingers crossed.

## Conclusions

Overall, we are happy with this year’s edition. We managed to expand our team, grow our reach, and motivate some students to contribute to the language. Even though we still have some work to do to engage less passionate students, we think we are on the right track.

See you next time at DSSv4!

# Driving with D

Here is what comes to mind when I think of D: fast, expressive, easy, and… driving? That’s right, I drive with D.

Enter my venerable Holden VZ Ute daily driver. From the factory, it came with a rubbish four-speed automatic gearbox. During 18 months of ownership, I destroyed four gearboxes. I could not afford a new vehicle at the time (I’m a 20-year-old Australian computer science student at Monash University), so I had to get creative. I purchased a rock-solid, bulletproof, six-speed automatic gearbox from another car. But that’s where the solutions ended. To make it work, I had to build my own circuit board, computer system, and firmware to control the solenoids, hydraulics, and clutches inside the gearbox, handle user input, perform shifting decisions, and interface to my car by pretending to be the four-speed automatic.

I’m quite proud of my solution. It can perform a shift in 250 milliseconds, which is great for racing. It has a steep first gear, giving it a swift takeoff. It has given some more powerful cars a run for their money. It’s got flappy paddles, diagnostic data on the screen, and the ability to go ahead and change the way it works whenever I want.

Here’s a very old video of the system working. It’s not representative of the current system—that ghastly blue screen is gone, the speedo works, and shifting has improved.

The computer is split into two parts: the user interface board, which drives an OLED display and uses an STM32F042, and the mainboard, which handles everything else, utilizing an STM32F407. The two cooperate over a CAN bus (Controller Area Network). All the firmware to handle this is written in D.

I picked D (as -betterC) because of its ingenious Uniform Function Call Syntax (UFCS), design by contract, metaprogramming, ease of interfacing with C, unit testing, portability, shared, @safe, and codebase maintenance features. Another bonus is the helpful, welcoming community. It has genuinely been a joy discussing D on the forums, and with the founders and community leaders.

Uniform Function Call Syntax (UFCS)
This has made my code significantly clearer. My code can accurately follow the flow of data without polluting my stack with single-use variables, nesting many function calls, or other sorts of clutter.
Here is an example of how you could potentially use it in an ECU (Engine Control Unit):

immutable injectorTime = airStoich(100.kpa, 25.degCelsius)
.airMass
.fuelMass((14.7f).afr)
.fuelMol
.calculateInjectorWidth;

This is equivalent to:

immutable injectorTime =
calculateInjectorWidth(
fuelMol(
fuelMass(
airMass(
airStoich(kpa(100), degCelsius(25))
),
afr(14.7)
)
)
); // brackets have been expanded for reading clarity

Please note: the values in this example are hardcoded to simplify the code and demonstrate how UFCS can give a unit of measurement to a value.

Both representations are valid D code; you can use either.

With UFCS, there’s no need to read the code backward or count your brackets, no need to use a gazillion single-use variables. Function calls mirror the flow of data. It’s concise.

Design by Contract
D’s contract programming is quite similar to Ada’s. Functions can be marked with preconditions, designated by in, and postconditions, designated by out. Should a contract fail, an assertion is thrown.

// This demonstrates D’s contract programming for a function.
// It uses the short-hand expression based syntax.
int iHaveAContract(void* ptr)
in(ptr !is null) // this is a precondition, if ptr is null, an assertion is raised
in(ptr !is null, "ptr is null :(") // this is a precondition, if ptr is null, an assertion with the error message "ptr is null :(" is raised
out(result; result > 0) // this is a post condition, it captures the return value as result and tests it
out(result; result > 0, "result was too low") // this is a post condition, it captures the return value as result, tests it, and if it fails, raise an assertion with the message "result was too low"
{
// normal function code here
}

If you want to do something a bit more complex in your contracts, an alternative syntax is available:

int iHaveAContract(void* ptr)
in {
assert(ptr !is null); // this is equivalent to in(ptr !is null) from above.
MyStruct* ms = cast(MyStruct*)ptr; // we can introduce variables local to this contract
assert(ms.blah == 2, "MyStruct.blah must be 2!"); // test ms.blah, if it fails, raise an assert with an error message
}
out(result) { // captures the return value as result
int squareIt = result * result;
assert(squareIt == 4);
}
do { // this designates the function body
// normal function code here
}

void main(string[] args)
{
auto i = iHaveAContract(null); // this will violate the contract
}

Structs and classes can include contracts, called invariants, that sanity check the state of an instance for its whole lifetime. Invariants are checked after the constructor is run, before the destructor is run, and before and after public function calls.

struct MyStruct
{
int blah;

invariant {
assert(blah == 2, "blah must always be 2 for some reason!");
}
// shorthand:
invariant(blah == 2, "blah must be 2 for some reason!");

// The invariant is checked at function entry and exit. If value is anything other than 2, the invariant will fail when the function exits
void setBlah(int value)
{
blah = value;
}
}

void main(string[] args)
{
MyStruct s; // invariant is run after construction
s.setBlah(3); // invariant contract will be violated.
}

Metaprogramming
The Don’t Repeat Yourself (DRY) principle is often touted by programmers. D’s metaprogramming is an incredible tool to achieve that goal. I use it in my CAN bus implementation. For example:

struct CANPacket(ushort ID) {
enum id = ID;
ubyte[8] data;
}
alias HeartbeatPacket = CANPacket!10;
alias BeepHornPacket = CANPacket!140;

I’ve got specific aliased types as HeartbeatPacket and BeepHornPacket, but I haven’t needed to repeat any code. They all follow the same underlying structure, so if I modify CANPacket, every alias is also updated.

Maybe you want a more descriptive CAN packet? Mixin templates can help with that!

struct GenericCanPacket
{
ushort id;
ubyte[8] data; // 8 bytes to store the CAN packet payload
}

struct HeartbeatPacket
{
ubyte deviceID; // first byte is the device ID
ubyte statusID; // second byte is the status
}

To translate a GenericCanPacket to the descriptive HeartbeatPacket, we can use mixin templates.

mixin template CanPacketHelperFunctions(ushort ID)
{
enum id = ID;

// typeof(this) means that the return type of readFromGeneric will be the struct this template is instantiated in.
static typeof(this) readFromGeneric(const ref GenericCanPacket p)
in(p.id == ID, "Generic packet cannot be converted!")
{
// do some cast
}
}

struct HeartbeatPacket
{
// the stuff declared in CanPacketHelperFunctions is pretty much copy-pasted (not literally) into here
mixin CanPacketHelperFunctions!10;

ubyte deviceID; // first byte is the device ID
ubyte statusID; // second byte is the status
}

void main()
{
GenericCanPacket generic;
generic.id = 10;
generic.data = [ 2 /* deviceID */, 3 /* statusID */ ];

writeln(HeartbeatPacket.id); // the packet ID is 10
writeln(heartbeat.deviceID); // 2
writeln(heratbeat.statusID); // 3
}

The mixin template CanPacketHelperFunctions can be used over and over for all sorts of packet representations, and since it is only declared once, the implementation remains consistent across all types that use it.

Interfacing to C
I frequently must communicate with my microcontroller’s HAL and RTOS; D’s C interface made that a breeze. Just add an extern(C) and it’s good to go.

extern(C) c_setPwm(int solenoid, void* userData); // declaration
c_setPwm(4, null); // usage, pretty easy!

Unit Testing
D’s built-in unit testing has saved me from blowing my foot off a few times. I can run all my unit tests on Windows to guarantee logical correctness, and then build a final target for my microcontroller. Here is an example:

struct MyStruct
{
int x;
int squareIt() { return x * x; }
}

unittest
{
MyStruct s;
s.x = 9;
assert(s.squareIt == 9 * 9); // If for some reason the implementation breaks, then the unit test fails
}

Deprecation
Codebases can be ever-changing, and sometimes certain functionality may no longer be considered good practice but must be retained for legacy reasons. D provides a way to explicitly mark this in code. Any use of such deprecated code will trigger a deprecation warning from the compiler. Example:

deprecated struct Example
{
// ...
}

// This deprecation includes a reason
deprecated("This is the reason for deprecation..") struct ExampleWithMessage
{
// ...
}

void main()
{
// This will generate the following compiler warning:
// "Deprecation: struct Example is deprecated"
Example e;

// This will generate the following compiler warning:
// "Deprecation: struct ExampleWithMessage is deprecated - This is the reason for deprecation.."
ExampleWithMessage ewm;
}

Of course, deprecated can be applied to all sorts of things, not just structs.

Portability
Following on from above, D supports a surprisingly large number of targets via GDC and LDC. If it weren’t for D’s portability, I would have had to write my project in C++ (ugh). I use LDC, and cross-compiling can be performed by simply adjusting my command line arguments.

Shared
shared is D’s way of guarding against multi-threaded access of code. It’s not perfect, but I use it as-is, and I think it works well. I do have multiple threads in my codebase, and they need to synchronize data. I mark certain variables as shared, which means I must take special care accessing that data. It works with system locks and mutexes. While locked, I can cast shared away and use it like a normal variable. This is handy with structs and classes.

shared int sensorValue;
sensorValue = 4; // using it like a single-thread variable, error
atomicStore(sensorValue, 4); // works with atomics

SafeD
@safe exists to prohibit sketchy memory activities and enforce best behavior. I haven’t had to fight @safe much yet because I don’t do anything wicked with my memory, but it is comfortable knowing that if I am going to make a mistake, the compiler can assist me in stopping it.

Mental Friction
Adam D. Ruppe puts it succinctly: D has low mental friction. The flexibility and expressiveness of the language make it easy to translate one’s thoughts into written code and maintain productivity. I don’t have to fight D much. This is my personal opinion, but I feel like D is the language in which I’m most productive.

### Final Thoughts

D is the perfect fit for this sort of project— I think it’s going to have a bright future ahead in the embedded world. I’m going to continue using D for my projects. I’ve got another D-powered automotive project in the works which I hope to show off in the future. Even if D isn’t yet suitable for your project, keep an eye on it. D has been making enormous strides in the past few years, especially in regards to memory safety.

The examples shown in this article are purely meant to demonstrate how D’s features can be used in the real world. Do not take them as gospel as to how you should program.

# Function Generation in D: The Good, the Bad, the Ugly, and the Bolt

## Introduction

A while ago, Andrei Alexandrescu started a thread in the D Programming Language forums, titled “Perfect forwarding”, about a challenge which came up during the July 2020 beerconf:

Write an idiomatic template forward that takes an alias fun and defines (generates) one overload for each overload of fun.

Several people proposed solutions. In the course of the discussion, it arose that there is sometimes a need to alter a function’s properties, like adding, removing, or hanging user-defined attributes or parameter storage classes. This was exactly the problem I struggled with while trying to support all the bells and whistles of D functions in my openmethods library.

Eventually, I created a module that helps with this problem and contributed it to the bolts meta-programming library. But before we get to that, let’s first take a closer look at function generation in D.

## The Good

I speculate that many a programmer who is a moderately advanced beginner in D would quickly come up with a mostly correct solution to the “Perfect forwarding” challenge, especially those with a C++ background who have an interest in performing all sorts of magic tricks by means of template meta-programming. The solution will probably look like this:

template forward(alias fun)
{
import std.traits: Parameters;
static foreach (
ovl; __traits(getOverloads, __traits(parent, fun), __traits(identifier, fun))) {
auto forward(Parameters!ovl args)
{
return ovl(args);
}
}
}

...

int plus(int a, int b) { return a + b; }
string plus(string a, string b) { return a ~ b; }

assert(forward!plus(1, 2) == 3);        // pass
assert(forward!plus("a", "b") == "ab"); // pass

This solution is not perfect, as we shall see, but it is not far off either. It covers many cases, including some that a beginner may not even be aware of. For example, forward handles the following function without dropping function attributes or parameter storage classes:

class Matrix { ... }

Matrix times(scope const Matrix a, scope const Matrix b) pure @safe
{
return ...;
}

pragma(msg, typeof(times));
// pure @safe Matrix(scope const(Matrix) a, scope const(Matrix) b)

pragma(msg, typeof(forward!times));
// pure @safe Matrix(scope const(Matrix) _param_0, scope const(Matrix) _param_1)

It even handles user-defined attributes (UDAs) on parameters:

struct testParameter;

void testPlus(@testParameter int a, @testParameter int b);

pragma(msg, typeof(testPlus));
// void(@(testParameter) int a, @(testParameter) int b)

pragma(msg, typeof(forward!testPlus));
// void(@(testParameter) int a, @(testParameter) int b)

Speaking of UDAs, that’s one of the issues with the solution above: it doesn’t carry function UDAs. It also doesn’t work with functions that return a reference. Both issues are easy to fix:

template forward(alias fun)
{
import std.traits: Parameters;
static foreach (ovl; __traits(getOverloads, __traits(parent, fun), __traits(identifier, fun)))
{
@(__traits(getAttributes, fun)) // copy function UDAs
auto ref forward(Parameters!ovl args)
{
return ovl(args);
}
}
}

This solution is still not 100% correct though. If the forwardee is @trusted, the forwarder will be @safe:

@trusted void useSysCall() { ... }

pragma(msg, typeof(&useSysCall));         // void function() @trusted
pragma(msg, typeof(&forward!useSysCall)); // void function() @safe

This happens because the body of the forwarder consists of a single statement calling the useSysCall function. Since calling a trusted function is safe, the forwarder is automatically deemed safe by the compiler.

However, Andrei’s challenge was not exactly what we discussed in the previous section. It came with a bit of pseudocode that suggested the template should not be eponymous. In other words, I believe that the exact task was to write a template that would be used like this: forward!fun.fun(...). Here is the pseudocode:

// the instantiation of forward!myfun would be (stylized):

template forward!myfun
{
void myfun(int a, ref double b, out string c)
{
return myfun(a, b, c);
}
int myfun(in string a, inout double b)
{
return myfun(a, b);
}
}

Though this looks like a small difference, if we want to implement exactly this, a complication arises. In the eponymous forward, we did not need to create a new identifier; we simply used the template name as the function name. Thus, the function name was fixed. Now we need to create a function with a name that depends on the forwardee’s name. And the only way to do this is with a string mixin.

The first time I had to do this, I tried the following:

template forward(alias fun)
{
import std.format : format;
import std.traits: Parameters;
enum name = __traits(identifier, fun);
static foreach (ovl; __traits(getOverloads, __traits(parent, fun), name)) {
@(__traits(getAttributes, fun))
auto ref mixin(name)(Parameters!ovl args)
{
return ovl(args);
}
}
}

This doesn’t work because a string mixin can only be used to create expressions or statements. Therefore, the solution is to simply expand the mixin to encompass the entire function definition. The token-quote operator q{} is very handy for this:

template forward(alias fun)
{
import std.format : format;
import std.traits: Parameters;
enum name = __traits(identifier, fun);
static foreach (ovl; __traits(getOverloads, __traits(parent, fun), name)) {
mixin(q{
@(__traits(getAttributes, fun))
auto ref %s(Parameters!ovl args)
{
return ovl(args);
}
}.format(name));
}
}

Though string mixins are powerful, they are essentially C macros. For many D programmers, resorting to a string mixin can feel like a defeat.

Let us now move on to a similar, yet significantly more difficult, challenge:

Write a class template that mocks an interface.

For example:

interface JsonSerializable
{
string asJson() const;
}

void main()
{
auto mock = new Mock!JsonSerializable();
}

Extrapolating the techniques acquired during the previous challenge, a beginner would probably try this first:

class Mock(alias Interface) : Interface
{
import std.format : format;
import std.traits: Parameters;
static foreach (member; __traits(allMembers, Interface)) {
static foreach (fun; __traits(getOverloads, Interface, member)) {
mixin(q{
@(__traits(getAttributes, fun))
auto ref %s(Parameters!fun args)
{
// record call
static if (!is(ReturnType!fun == void)) {
return ReturnType!fun.init;
}
}
}.format(member));
}
}
}

Alas, this fails to compile, throwing errors like:

Error: function challenge.Mock!(JsonSerializable).Mock.asJson return type
inference is not supported if may override base class function

In other words, auto cannot be used here. We have to fall back to explicitly specifying the return type:

class Mock(alias Interface) : Interface
{
import std.format : format;
import std.traits: Parameters, ReturnType;
static foreach (member; __traits(allMembers, Interface)) {
static foreach (fun; __traits(getOverloads, Interface, member)) {
mixin(q{
@(__traits(getAttributes, fun))
ReturnType!fun %s(Parameters!fun args)
{
// record call
static if (!is(ReturnType!fun == void)) {
return ReturnType!fun.init;
}
}
}.format(member));
}
}
}

This will not handle ref functions though. What about adding a ref in front of the return type, like we did in the first challenge?

// as before
ref ReturnType!fun %s(Parameters!fun args) ...

This will fail with all the functions in the interface that do not return a reference.

The reason why everything worked almost magically in the first challenge is that we called the wrapped function inside the template. It enabled the compiler to deduce almost all of the characteristics of the original function and copy them to the forwarder function. But we have no model to copy from here. The compiler will copy some of the aspects of the function (pure, @safe, etc.) to match those of the overriden function, but not some others (ref, const, and the other modifiers).

Then, there is the issue of the function modifiers: const, immutable, shared, and static. These are yet another category of function “aspects”.

At this point, there is no other option than to analyze some of the function attributes by means of traits, and convert them to a string to be injected in the string mixin:

      mixin(q{
@(__traits(getAttributes, fun))
%sReturnType!fun %s(Parameters!fun args)
{
// record call
static if (!is(ReturnType!fun == void)) {
return ReturnType!fun.init;
}
}
}.format(
(functionAttributes!fun & FunctionAttribute.const_ ? "const " : "")
~ (functionAttributes!fun & FunctionAttribute.ref_ ? "ref " : "")
~ ...,
member));
}

If you look at the implementation of std.typecons.wrap, you will see that part of the code deals with synthesizing bits of a string mixin for the storage classes and modifiers.

## The Ugly

So far, we have looked at the function storage classes, modifiers, and UDAs, but we have merely passed the parameter list as a single, monolithic block. However, sometimes we need to perform adjustments to the parameter list of the generated function. This may seem far-fetched, but it does happen. I encountered this problem in my openmethods library. During the “Perfect forwarding” discussion, it appeared that I was not the only one who wanted to do this.

I won’t delve into the details of openmethods here (see an older blog post for an overview of the module); for the purpose of this article, it suffices to say that, given a function declaration like this one:

Matrix times(virtual!Matrix a, double b);

openmethods generates this function:

Matrix dispatcher(Matrix a, double b)
{
return resolve(a)(a, b);
}

The virtual template is a marker; it indicates which parameters should be taken into account (i.e., passed to resolve) when picking the appropriate specialization of times. Note that only a is passed to the resolve function—that is because the first parameter uses the virtual! marker and the second does not.

Bear in mind, though, that dispatcher is not allowed to use the type of the parameters directly. Inside the openmethods module, there is no Matrix type. Thus, when openmethods is handed a function declaration, it needs to synthesize a dispatcher function that refers to the declaration’s parameter types exclusively via the declaration. In other words, it needs to use the ReturnType and Parameters templates from std.traits to extract the types involved in the declaration – just like we did in the examples above.

Let’s put aside function attributes and UDAs – we already discussed those in the previous section. The obvious solution then seems to be:

ReturnType!times dispatcher(
RemoveVirtual!(Parameters!times[0]) a, Parameters!times[1] b)
{
return resolve(a)(a, b);
}

pragma(msg, typeof(&dispatcher)); // Matrix function(Matrix, double)

where RemoveVirtual is a simple template that peels off the virtual! marker from the type.

Does this preserve parameter storage classes and UDAs? Unfortunately, it does not:

@nogc void scale(ref virtual!Matrix m, lazy double by);

@nogc ReturnType!scale dispatcher(RemoveVirtual!(Parameters!scale[0]) a, Parameters!scale[1] b)
{
return resolve(a)(a, b);
}

pragma(msg, typeof(&dispatcher)); // void function(Matrix a, double b)

We lost the ref on the first parameter and the lazy on the second. What happened to them?

The culprit is Parameters. This template is a wrapper around an obscure feature of the is operator used in conjunction with the __parameters type specialization. And it is quite a strange beast. We used it above to copy the parameter list of a function, as a whole, to another function, and it worked perfectly. The problem is what happens when you try to process the parameters separately. Let’s look at a few examples:

pragma(msg, Parameters!scale.stringof); // (ref virtual!(Matrix), lazy double)
pragma(msg, Parameters!scale[0].stringof); // virtual!(Matrix)
pragma(msg, Parameters!scale[1].stringof); // double

We see that accessing a parameter individually returns the type… and discards everything else!

There is actually a way to extract everything about a single parameter: use a slice instead of an element of the paramneter pack (yes, this is getting strange):

pragma(msg, Parameters!scale[0..1].stringof); // (ref virtual!(Matrix))
pragma(msg, Parameters!scale[1..2].stringof); // (lazy double)

So this gives us a solution for handling the second parameter of scale:

ReturnType!scale dispatcher(???, Parameters!scale[1..2]) { ... }

But what can we put in place of ???. RemoveVirtual!(Parameters!scale[0..1]) would not work. RemoveVirtual expects a type, and Parameters!scale[1..2] is not a type—it is a sort of conglomerate that contains a type, and perhaps storage classes, type constructors, and UDAs.

At this point, we have no other choice but to construct a string mixin once again. Something like this:

mixin(q{
%s ReturnType!(scale) dispatcher(
%s RemoveVirtual!(Parameters!(scale)[1]) a,
Parameters!(scale)[1..2] b)
{
resolve(a)(a, b);
}
}.format(
functionAttributes!scale & FunctionAttribute.nogc ? "@nogc " : ""
/* also handle other function attributes */,
__traits(getParameterStorageClasses, scale, 0)));

pragma(msg, typeof(dispatcher)); // @nogc void(ref double a, lazy double)

This is not quite sufficient though, because it still doesn’t take care of parameter UDAs.

### To Boltly Refract…

openmethods once contained kilometers of mixin code like the above. Such heavy use of string mixins was too ugly and messy, so much so that the project began to feel less like fun and more like work. So I decided to sweep all the ugliness under a neat interface, once and for all. The result was a “refraction” module, which I later carved out of openmethods and donated to Ali Akhtarzada’s excellent bolts library. bolts attempts to fill in the gaps and bring some regularity to D’s motley set of meta-programming features.

refraction’s entry point is the refract function template. It takes a function and an “anchor” string, and returns an immutable Function object that captures all the aspects of a function. Function objects can be used at compile-time. It is, actually, their raison d’être.

Function has a mixture property that returns a declaration for the original function. For example:

Matrix times(virtual!Matrix a, double b);
pragma(msg, refract!(times, "times").mixture);
// @system ReturnType!(times) times(Parameters!(times) _0);

Why does refract need the anchor string? Can’t the string "times" be inferred from the function by means of __traits(identifier...)? Yes, it can, but in real applications we don’t want to use this. The whole point of the library is to be used in templates, where the function is typically passed to refract via an alias. In general, the function’s name has no meaning in the template’s scope—or if, by chance, the name exists, it does not name the function. All the meta-expressions used to dissect the function must work in terms of the local symbol that identifies the alias.

Consider:

module matrix;

Matrix times(virtual!Matrix a, double b);

Method!times timesMethod; // openmethods creates a Method object for each
// declared method

module openmethods;

struct Method(alias fun)
{
enum returnTypeMixture = refract!(fun, "fun").returnType;
pragma(msg, returnTypeMixture);              // ReturnType!(fun)
mixin("alias R = ", returnTypeMixture, ";"); // ok
pragma(msg, R.stringof);                     // Matrix
}

There is no times and no Matrix in module openmethods. Even if they existed, they could not be the times function and the Matrix class from module matrix, as this would require a circular dependency between the two modules, something that D forbids by default. However, there is a fun symbol, and it aliases to the function; thus, the return type can be expressed as ReturnType!(fun).

All aspects of the function are available piecemeal. For example:

@nogc void scale(ref virtual!Matrix m, lazy double by);
pragma(msg, refract!(scale, "scale").parameters[0].storageClasses); // ["ref"]

Function also has methods that return a new Function object, with an alteration to one of the aspects. They can be used to create a variation of a function. For example:

pragma(msg,
refract!(scale, "scale")
.withName("dispatcher")
.withBody(q{{ resolve(_0[0])(_0); }})
.mixture
);

@nogc @system ReturnType!(scale) dispatcher(ref Parameters!(scale)[0] _0, lazy Parameters!(scale)[1] _1)
{
resolve(_0[0])(_0);
}

This is the reason behind the name “refraction”: the module creates a blueprint of a function, performs some alterations on it, and returns a string—called a mixture—which, when passed to mixin, will create a new function.

openmethods needs to change the type of the first parameter while preserving storage classes. With bolts.experimental.refraction, this becomes easy:

original = refract!(scale, "scale");

pragma(msg,
original
.withName("dispatcher")
.withParameters(
[original.parameters[0].withType(
"RemoveVirtual!(%s)".format(original.parameters[0].type)),
original.parameters[1],
])
.withBody(q{{
return resolve(_0)(%s);
}}.format(original.argumentMixture))
);

This time, the generated code splits the parameter pack into individual components:

@nogc @system ReturnType!(scale) dispatcher(
ref RemoveVirtual!(Parameters!(scale)[0]) _0, Parameters!(scale)[1..2] _1)
{
return resolve(_0)(_0);
}

Note how the first and second parameters are handled differently. The first parameter is cracked open because we need to replace the type. That forces us to access the first Parameters value via indexing, and that loses the storage classes, UDAs, etc. So they need to be re-applied explicitly.

On the other hand, the second parameter does not have this problem. It is not edited; thus, the Parameters slice trick can be used. The lazy is indeed there, but it is inside the parameter conglomerate.

## Conclusion

Initially, D looked almost as good as Lisp for generating functions. As we tried to gain finer control of the generated function, our code started to look a lot more like C macros; in fact, in some respects, it was even worse: we had to put an entire function definition in a string mixin just to set its name.

This is due to the fact that D is not as “regular” a language as Lisp. Some of the people helming the evolution of D are working on changing this, and it is my hope that an improved D will emerge in the not-too-distant future.

In the meantime, the experimental refraction module from the bolts meta-programming library offers a saner, easier way of generating functions without compromising on the idiosyncrasies that come with them. It allows you to pretend that functions can be disassembled and reassembled at will, while hiding all the gory details of the string mixins that are necessarily involved in that task.

Jean-Louis Leroy is not French, but Belgian. He got his first taste of programming from a HP-25 calculator. His first real programming language was Forth, where CTFE is pervasive. Later he programmed (a little) in Lisp and Smalltalk, and (a lot) in C, C++, and Perl. He now works for Bloomberg LP in New York. His interests include object-relational mapping, open multi-methods, DSLs, and language extensions in general.

# A Pattern for Head-mutable Structures

When Andrei Alexandrescu introduced ranges to the D programming language, the gap between built-in and user-defined types (UDTs) narrowed, enabling new abstractions and greater composability. Even today, though, UDTs are still second-class citizens in D. One example of this is support for head mutability—the ability to manipulate a reference without changing the referenced value(s). This document details a pattern that will further narrow the UDT gap by introducing functions for defining and working with head-mutable user-defined types.

### Introduction

D is neither Kernel nor Scheme—it has first-class and second-class citizens. Among its first-class citizens are arrays and pointers. One of the benefits these types enjoy is implicit conversion to head-mutable. For instance, const(T[]) is implicitly convertible to const(T)[]. Partly to address this difference, D has many ways to define how one type may convert to or behave like another – alias this, constructors, opDispatch, opCast, and, of course, subclassing. The way pointers and dynamic arrays decay into their head-mutable variants is different from the semantics of any of these features, so we would need to define a new type of conversion if we were to mimic this behavior.

Changing the compiler and the language to permit yet another way of converting one type into another is not desirable: it makes the job harder for compiler writers, makes an already complex language even harder to learn, and any implicit conversion can make code harder to read and maintain. If we can define conversions to head-mutable data structures without introducing compiler or language changes, this will also make the feature available to users sooner, since such a mechanism would not necessarily require changes in the standard library, and users could gradually implement it in their own code and benefit from the code in the standard library catching up at a later point.

### Unqual

The tool used today to get a head-mutable version of a type is std.traits.Unqual. In some cases, this is the right tool—it strips away one layer of const, immutable, inout, and shared. For some types though, it either does not give a head-mutable result, or it gives a head-mutable result with mutable indirections:

struct S(T) {
T[] arr;
}

With Unqual, this code fails to compile:

void foo(T)(T a) {
Unqual!T b = a; // cannot implicitly convert immutable(S!int) to S!int
}

unittest {
immutable s = S!int([1,2,3]);
foo(s);
}

A programmer who sees that message hopefully finds a different way to achieve the same goal. However, the error message says that the conversion failed, indicating that a conversion is possible, perhaps even without issue. An inexperienced programmer, or one who knows that doing so is safe right now, could use a cast to shut the compiler up:

void bar(T)(T a) {
Unqual!T b = cast(Unqual!T)a;
b.arr[0] = 4;
}

unittest {
immutable s = S!int([1,2,3]);
bar(s);
assert(s.arr[0] == 1); // Fails, since bar() changed it.
}

If, instead of S!int, the programmer had used int[], the first example would have compiled, and the cast in the second example would have never seen the light of day. However, since S!int is a user-defined type, we are forced to write a templated function that either fails to compile for some types it really should support or gives undesirable behavior at run time.

Clearly, we should be able to do better than Unqual, and in fact we can. D has template this parameters which pick up on the dynamic type of the this reference, and with that, its const or immutable status:

struct S {
void foo(this T)() {
import std.stdio : writeln;
writeln(T.stringof);
}
}
unittest {
S s1;
const S s2;
s1.foo(); // Prints "S".
s2.foo(); // Prints "const(S)".
}

This way, the type has the necessary knowledge of which type qualifiers a head-mutable version needs. We can now define a method that uses this information to create the correct head-mutable type:

struct S(T) {
T[] arr;
import std.traits : CopyTypeQualifiers;
return S!(CopyTypeQualifiers!(This, T))(arr);
}
}
unittest {
const a = S!int([1,2,3]);
assert(is(typeof(b) == S!(const int))); // The correct part of the type is now const.
assert(a.arr is b.arr); // It's the same array, no copying has taken place.
b.arr[0] = 3; // Correctly fails to compile: cannot modify const expression.
}

Thanks to the magic of Uniform Function Call Syntax, we can also define headMutable() for built-in types:

auto headMutable(T)(T value) {
import std.traits;
import std.typecons : rebindable;
static if (isPointer!T) {
// T is a pointer and decays naturally.
return value;
} else static if (isDynamicArray!T) {
// T is a dynamic array and decays naturally.
return value;
} else static if (!hasAliasing!(Unqual!T)) {
// T is a POD datatype - either a built-in type, or a struct with only POD members.
return cast(Unqual!T)value;
} else static if (is(T == class)) {
// Classes are reference types, so only the reference may be made head-mutable.
return rebindable(value);
} else static if (isAssociativeArray!T) {
// AAs are reference types, so only the reference may be made head-mutable.
return rebindable(value);
} else {
}
}
unittest {
const(int*[3]) a = [null, null, null];
assert(is(typeof(b) == const(int)*[3]));
}

Now, whenever we need a head-mutable variable to point to tail-const data, we can simply call headMutable() on the value we need to store. Unlike the ham-fisted approach of casting to Unqual!T, which may throw away important type information and also silences any error messages that may inform you of the foolishness of your actions, attempting to call headMutable() on a type that doesn’t support it will give an error message explaining what you tried to do and why it didn’t work (“Type T cannot be made head-mutable.”). The only thing missing now is a way to get the head-mutable type. Since headMutable() returns a value of that type, and is defined for all types we can convert to head-mutable, that’s a template one-liner:

import std.traits : ReturnType;
alias HeadMutable(T) = ReturnType!((T t) => t.headMutable());

Where Unqual returns a type with potentially the wrong semantics and only gives an error once you try assigning to it, HeadMutable disallows creating the type in the first place. The programmer will have to deal with that before casting or otherwise coercing a value into the variable. Since HeadMutable uses headMutable() to figure out the type, it also gives the same informative error message when it fails.

Lastly, since one common use case requires us to preserve the tail-const or tail-immutable properties of a type, it is beneficial to define a template that converts to head-mutable while propagating const or immutable using std.traits.CopyTypeQualifiers:

import std.traits : CopyTypeQualifiers;
alias HeadMutable(T, ConstSource) = HeadMutable!(CopyTypeQualifiers!(ConstSource, T));

This way, immutable(MyStruct!int) can become MyStruct!(immutable int), while the const version would propagate constness instead of immutability.

### Example Code

Since the pattern for range functions in Phobos is to have a constructor function (e.g. map) that forwards its arguments to a range type (e.g. MapResult), the code changes required to use headMutable() are rather limited. Likewise, user code should generally not need to change at all in order to use headMutable(). To give an impression of the code changes needed, I have implemented map and equal:

import std.range;

// Note that we check not if R is a range, but if HeadMutable!R is
auto map(alias Fn, R)(R range) if (isInputRange!(HeadMutable!R)) {
// This is basically the extent to which code that uses head-mutable data types will need to change.
}

struct MapResult(alias Fn, R) {
R range;

this(R _range) {
range = _range;
}

void popFront() {
range.popFront();
}

@property
auto ref front() {
return Fn(range.front);
}

@property
bool empty() {
return range.empty;
}

static if (isBidirectionalRange!R) {
@property
auto ref back() {
return Fn(range.back);
}

void popBack() {
range.popBack();
}
}

static if (hasLength!R) {
@property
auto length() {
return range.length;
}
alias opDollar = length;
}

static if (isRandomAccessRange!R) {
auto ref opIndex(size_t idx) {
return Fn(range[idx]);
}
}

static if (isForwardRange!R) {
@property
auto save() {
return MapResult(range.save);
}
}

static if (hasSlicing!R) {
auto opSlice(size_t from, size_t to) {
return MapResult(range[from..to]);
}
}

// All the above is as you would normally write it.
// We also need to implement headMutable().
// Generally, headMutable() will look very much like this - instantiate the same
// type template that defines typeof(this), use HeadMutable!(T, ConstSource) to make
// the right parts const or immutable, and call headMutable() on fields as we pass
// them to the head-mutable type.
}
}

// Need to get head-mutable version of the parameters to iterate over them.
while (!_r1.empty && !_r2.empty) {
if (_r1.front != _r2.front) return false;
_r1.popFront();
_r2.popFront();
}
return _r1.empty && _r2.empty;
}

unittest {
// User code does not use headMutable at all:
const arr = [1,2,3];
const squares = arr.map!(a => a*a);
const squaresPlusTwo = squares.map!(a => a+2);
assert(equal(squaresPlusTwo, [3, 6, 11]));
}

(Note that these implementations are simplified slightly from Phobos code to better showcase the use of headMutable)

The unittest block shows a use case where the current Phobos map would fail—it is perfectly possible to create a const MapResult, but there is no way of iterating over it. Note that only two functions are impacted by the addition of headMutable(): map tests if HeadMutable!R is an input range and converts its arguments to head-mutable when passing them to MapResult, and MapResult needs to implement headMutable(). The rest of the code is exactly as you would otherwise write it.

The implementation of equal() shows a situation where implicit conversions would be beneficial. For const(int[]) the call to headMutable() is superfluous—it is implicitly converted to const(int)[] when passed to the function. For user-defined types however, this is not the case, so the call is necessary in the general case.

While I have chosen to implement a range here, ranges are merely the most common example of a place where headmutable would be useful; the idea has merits beyond ranges. Another type in the standard library that would benefit from headmutable is RefCounted!T: const(RefCounted!(T)) should convert to RefCounted!(const(T)).

### Why not Tail-Const?

In previous discussions of this problem, the solution has been described as tail-const, and a function tailConst() has been proposed. While this idea might at first seem the most intuitive solution, it has some problems, which together make headMutable() far superior.

The main problem with tailConst() is that it does not play well with D’s existing const system. It needs to be called on a mutable value, and there is no way to convert a const(Foo!T) to Foo!(const(T)). It thus requires that the programmer explicitly call tailConst() on any value that is to be passed to a function expecting a non-mutable value and, abstain from using const or immutable to convey the same information. This creates a separate world of tail-constness and plays havoc with generic code, which consequently has no way to guarantee that it won’t mutate its arguments.

Secondly, the onus is placed on library users to call tailConst() whenever they pass an argument anywhere, causing an inversion of responsibility: the user has to tell the library that it is not allowed to edit the data instead of the other way around. In the best case, this merely causes unnecessary verbiage. In other cases, the omission of const will lead to mutation of data expected to be immutable.

A minor quibble in comparison is that the tail-const solution also requires the existence of tailImmutable to cover the cases where the values are immutable.

## Issues

The ideas outlined in this document concern only conversion to head-mutable. A related issue is conversion to tail const, e.g. from RefCounted!T or RefCounted!(immutable T) to RefCounted!(const T), a conversion that, again, is implicit for arrays and pointers in D today.

One issue that may be serious is the fact that headMutable often cannot be @safe and may, in fact, need to rely on undefined behavior in some places. For instance, RefCounted!T contains a pointer to the actual ref count. For immutable(RefCounted!T), headMutable() would need to cast away immutable, which is undefined behavior per the spec.

### The Compiler Solution

It is logical to think that, as with built-in types, headMutable() could be elided in its entirety, and the compiler could handle the conversions for us. In many cases, this would be possible, and in fact the compiler already does so for POD types like struct S { int n; }—a const or immutable S may be assigned to a mutable variable of type S. This breaks down, however, when the type includes some level of mutable indirection. For templated types it would be possible to wiggle the template parameters to see if the resulting type compiles and has fields with the same offsets and similar types, but even such an intelligent solution breaks down in the presence of D’s Turing-complete template system, and some cases will always need to be handled by the implementer of a type.

It is also a virtue that the logic behind such an implementation be understandable to the average D programmer. The best case result of that not being true is that the forums would be inundated with a flood of posts about why types don’t convert the way users expect them to.

For these reasons, headMutable() will be necessary even with compiler support. But what would that support look like? Implicit casting to head-mutable happens in the language today in two situations:

• Assignment to head-mutable variables: const(int)[] a = create!(const(int[]))(); (all POD types, pointers and arrays)
• Function calls: fun(create!(const(int[]))(); (only pointers and arrays)

The first is covered by existing language features (alias headMutable this; fits the bill perfectly). The second is not but is equivalent to calling .headMutable whenever a const or immutable value is passed to a function that does not explicitly expect a const or immutable argument. This would change the behavior of existing code, in that templated functions would prefer a.headMutable over a, but would greatly improve the experience of working with const types that do define headMutable(). If headMutable is correctly implemented, the different choice of template instantiations should not cause any actual breakage.

### Future Work

While this document proposes to implement the described feature without any changes to the compiler or language, it would be possible for the compiler in the future to recognize headMutable() and call it whenever a type that defines that method is passed to a function that doesn’t explicitly take exactly that type, or upon assignment to a variable that matches headMutable()’s return value. This behavior would mirror the current behavior of pointers and arrays.

### Conclusion

It is possible to create a framework for defining head-mutable types in D today without compiler or language changes. It requires a little more code in the methods that use head-mutable types but offers a solution to a problem that has bothered the D community for a long time.

While this document deals mostly with ranges, other types will also benefit from this pattern: smart pointers and mutable graphs with immutable nodes are but two possible examples.

### Definitions

A type is head-mutable if some or all of its members without indirections are mutable. Note that a head-mutable datatype may also have const or immutable members without indirections; the requirement is merely that some subset of its members may be mutated. A head-mutable datatype may be tail-const, tail-immutable or tail-mutable—head-mutable only refers to its non-indirected members. Examples of head-mutable types include const(int)[], int*, string, and Rebindable!MyClass. Types without indirections (like int, float and struct S { int n; }) are trivially head-mutable.

##### Tail-const

A type is tail-const if some of its members with indirections have the const type qualifier. A tail-const type may be head-mutable or head-const. Examples of tail-const types are const(int)*, const(int[]), const(immutable(int)[])* and string.

### Source

The source code for HeadMutable and headMutable is available here.

# A Look at Chapel, D, and Julia Using Kernel Matrix Calculations

## Introduction

It seems each time you turn around there is a new programming language aimed at solving some specific problem set. Increased proliferation of programming languages and data are deeply connected in a fundamental way, and increasing demand for “data science” computing is a related phenomenon. In the field of scientific computing, Chapel, D, and Julia are highly relevant programming languages. They arise from different needs and are aimed at different problem sets: Chapel focuses on data parallelism on single multi-core machines and large clusters; D was initially developed as a more productive and safer alternative to C++; Julia was developed for technical and scientific computing and aimed at getting the best of both worlds—the high performance and safety of static programming languages and the flexibility of dynamic programming languages. However, they all emphasize performance as a feature. In this article, we look at how their performance varies over kernel matrix calculations and present approaches to performance optimization and other usability features of the languages.

Kernel matrix calculations form the basis of kernel methods in machine learning applications. They scale rather poorly—O(m n^2), where n is the number of items and m is the number of elements in each item. In our exercsie, m will be constant and we will be looking at execution time in each implementation as n increases. Here m = 784 and n = 1k, 5k, 10k, 20k, 30k, each calculation is run three times and an average is taken. We disallow any use of BLAS and only allow use of packages or modules from the standard library of each language, though in the case of D the benchmark is compared with calculations using Mir, a multidimensional array package, to make sure that my matrix implementation reflects the true performance of D. The details for the calculation of the kernel matrix and kernel functions are given here.

While preparing the code for this article, the Chapel, D, and Julia communities were very helpful and patient with all inquiries, so they are acknowledged here.

In terms of bias, going in I was much more familiar with D and Julia than I was with Chapel. However, getting the best performance from each language required a lot of interaction with each programming community, and I have done my best to be aware of my biases and correct for them where necessary.

## Language Benchmarks for Kernel Matrix Calculation

The above chart (generated using R’s ggplot2 using a script) shows the performance benchmark time taken against the number of items n for Chapel, D, and Julia, for nine kernels. D performs best in five of the nine kernels, Julia performs best in two of the nine kernels, and in two of the kernels (Dot and Gaussian) the picture is mixed. Chapel was the slowest for all of the kernel functions examined.

It is worth noting that the mathematics functions used in D were pulled from C’s math API made available in D through its core.stdc.math module because the mathematical functions in D’s standard library std.math can be quite slow. The math functions used are given here. By way of comparison, consider the mathdemo.d script comparing the imported C log function D’s log function from std.math:

$ldc2 -O --boundscheck=off --ffast-math --mcpu=native --boundscheck=off mathdemo.d && ./mathdemo Time taken for c log: 0.324789 seconds. Time taken for d log: 2.30737 seconds. The Matrix object used in the D benchmark was implemented specifically because the use of modules outside standard language libraries was disallowed. To make sure that this implementation is competitive, i.e., it does not unfairly represent D’s performance, it is compared to Mir’s ndslice library written in D. The chart below shows matrix implementation times minus ndslice times; negative means that ndslice is slower, indicating that the implementation used here does not negatively represent D’s performance. ## Environment The code was run on a computer with an Ubuntu 20.04 OS, 32 GB memory, and an Intel® Core™ i9–8950HK CPU @ 2.90GHz with 6 cores and 12 threads. $ julia --version
julia version 1.4.1
$dmd --version DMD64 D Compiler v2.090.1 ldc2 --version LDC - the LLVM D compiler (1.18.0): based on DMD v2.088.1 and LLVM 9.0.0 $ chpl --version
chpl version 1.22.0

### Compilation

Chapel:

chpl script.chpl kernelmatrix.chpl --fast && ./script

D:

ldc2 script.d kernelmatrix.d arrays.d -O5 --boundscheck=off --ffast-math -mcpu=native && ./script

Julia (no compilation required but can be run from the command line):

julia script.jl

## Implementations

Efforts were made to avoid non-standard libraries while implementing these kernel functions. The reasons for this are:

• To make it easy for the reader after installing the language to copy and run the code. Having to install external libraries can be a bit of a “faff”.
• Packages outside standard libraries can go extinct, so avoiding external libraries keeps the article and code relevant.
• It’s completely transparent and shows how each language works.

### Chapel

Chapel uses a forall loop to parallelize over threads. Also, C pointers to each item are used rather than the default array notation, and guided iteration over indices is used:

proc calculateKernelMatrix(K, data: [?D] ?T)
{
var n = D.dim(0).last;
var p = D.dim(1).last;
var E: domain(2) = {D.dim(0), D.dim(0)};
var mat: [E] T;
var rowPointers: [1..n] c_ptr(T) =
forall i in 1..n do c_ptrTo(data[i, 1]);

forall j in guided(1..n by -1) {
for i in j..n {
mat[i, j] = K.kernel(rowPointers[i], rowPointers[j], p);
mat[j, i] = mat[i, j];
}
}
return mat;
}

Chapel code was the most difficult to optimize for performance and required the highest number of code changes.

### D

D uses a taskPool of threads from its std.parallel package to parallelize code. The D code underwent the fewest number of changes for performance optimization—a lot of the performance benefits came from the specific compiler used and the flags selected (discussed later). My implementation of a Matrix allows columns to be selected by reference via refColumnSelect.

auto calculateKernelMatrix(alias K, T)(K!(T) kernel, Matrix!(T) data)
{
long n = data.ncol;
auto mat = Matrix!(T)(n, n);

{
auto arrj = data.refColumnSelect(j).array;
foreach(long i; j..n)
{
mat[i, j] = kernel(data.refColumnSelect(i).array, arrj);
mat[j, i] = mat[i, j];
}
}
return mat;
}

### Julia

The Julia code uses the @threads macro for parallelising the code and @views macro for referencing arrays. One confusing thing about Julia’s arrays is their reference status. Sometimes, as in this case, arrays will behave like value objects and they have to be referenced by using the @views macro, otherwise they generate copies. At other times they behave like reference objects, for example, when passing them into a function. It can be a little tricky dealing with this because you don’t always know what set of operations will generate a copy, but where this occurs @views provides a good solution.

The Symmetric type saves the small bit of extra work needed for allocating to both sides of the matrix.

function calculateKernelMatrix(Kernel::K, data::Array{T}) where {K <: AbstractKernel,T <: AbstractFloat}
n = size(data)[2]
mat = zeros(T, n, n)
@views for i in j:n
mat[i,j] = kernel(Kernel, data[:, i], data[:, j])
end
end
return Symmetric(mat, :L)
end

The @bounds and @simd macros in the kernel functions were used to turn bounds checking off and apply SIMD optimization to the calculations:

struct DotProduct <: AbstractKernel end
@inline function kernel(K::DotProduct, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T,N}
ret = zero(T)
m = length(x)
@inbounds @simd for k in 1:m
ret += x[k] * y[k]
end
return ret
end

These optimizations are quite visible but very easy to apply.

## Memory Usage

The total time for each benchmark and the total memory used was captured using the /usr/bin/time -v command. The output for each of the languages is given below.

Chapel took the longest total time but consumed the least amount of memory (nearly 6GB RAM peak memory):

Command being timed: "./script"
User time (seconds): 113190.32
System time (seconds): 6.57
Percent of CPU this job got: 1196%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:37:39
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 5761116
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1439306
Voluntary context switches: 653
Involuntary context switches: 1374820
Swaps: 0
File system inputs: 0
File system outputs: 8
Socket messages sent: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

D consumed the highest amount of memory (around 20GB RAM peak memory) but took less total time than Chapel to execute:

Command being timed: "./script"
User time (seconds): 106065.71
System time (seconds): 58.56
Percent of CPU this job got: 1191%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:28:29
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 20578840
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 18249033
Voluntary context switches: 3833
Involuntary context switches: 1782832
Swaps: 0
File system inputs: 0
File system outputs: 8
Socket messages sent: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Julia consumed a moderate amount of memory (around 7.5 GB peak memory) but ran the quickest—probably because its random number generator is the fastest:

Command being timed: "julia script.jl"
User time (seconds): 49794.85
System time (seconds): 30.58
Percent of CPU this job got: 726%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:54:18
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 7496184
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 794
Minor (reclaiming a frame) page faults: 38019472
Voluntary context switches: 2629
Involuntary context switches: 523063
Swaps: 0
File system inputs: 368360
File system outputs: 8
Socket messages sent: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

## Performance optimization

The process of performance optimization in all three languages was very different, and all three communities were very helpful in the process. But there were some common themes.

• Static dispatching of kernel functions instead of using polymorphism. This means that when passing the kernel function, use parametric (static compile time) polymorphism rather than runtime (dynamic) polymorphism where dispatch with virtual functions carries a performance penalty.
• Using views/references rather than copying data over multiple threads makes a big difference.
• Parallelising the calculations makes a huge difference.
• Knowing if your array is row/column major and using that in your calculation makes a huge difference.
• Bounds checks and compiler optimizations make a tremendous difference, especially in Chapel and D.
• Enabling SIMD in D and Julia made a contribution to the performance. In D this was done using the -mcpu=native flag, and in Julia this was done using the @simd macro.

In terms of language-specific issues, getting to performant code in Chapel was the most challenging, and the Chapel code changed the most from easy-to-read array operations to using pointers and guided iterations. But on the compiler side it was relatively easy to add --fast and get a large performance boost.

The D code changed very little, and most of the performance was gained by the choice of compiler and its optimization flags. D’s LDC compiler is rich in terms of options for performance optimization. It has 8 -O optimization levels, but some are repetitions of others. For instance, -O, -O3, and -O5 are identical, and there are myriad other flags that affect performance in various ways. In this case the flags used were -O5 --boundscheck=off –ffast-math, representing aggressive compiler optimizations, bounds checking, and LLVM’s fast-math, and -mcpu=native to enable CPU vectorization instructions.

In Julia the macro changes discussed previously markedly improved the performance, but they were not too intrusive. I tried changing the optimization -O level, but this did not improve performance.

## Quality of life

This section examines the relative pros and cons around the convenience and ease of use of each language. People underestimate the effort it takes to use a language day-to-day; the support and infrastructure required is significant, so it is worth comparing various facets of each language. Readers seeking to avoid the TLDR should scroll to the end of this section for the table comparing the language features discussed here. Every effort has been made to be as objective as possible, but comparing programming languages is difficult, bias prone, and contentious, so read this section with that in mind. Some elements looked at, such as arrays, are from the “data science”/technical/scientific computing point of view, and others are more general.

### Interactivity

Programmers want a fast code/compile/result loop during development to quickly observe results and outputs in order to make progress or necessary changes. Julia’s interpreter is hands down the best for this and offers a smooth and feature-rich development experience, and D comes a close second. This code/compile/result loop in compilers can be slow even when compiling small code volumes. D has three compilers, the standard DMD compiler, the LLVM-based LDC compiler, and the GCC-based GDC. In this development process, the DMD and LDC compilers were used. DMD has very fast compilation times which is great for development. The LDC compiler is great at creating fast code. Chapel’s compiler is very slow in comparison. To give an example, running Linux’s time command on DMD vs Chapel’s compiler for the kernel matrix code with no optimizations gives us for D:

real	0m0.545s
user	0m0.447s
sys	0m0.101s

Compared with Chapel:

real	0m5.980s
user	0m5.787s
sys	0m0.206s

That’s a large actual and psychological difference, it can make programmers reluctant to check their work and delay the development loop if they have to wait for outputs, especially when source code increases in volume and compilation times become significant.

It is worth mentioning, however, that when developing packages in Julia, compilation times can be very long, and users have noticed that when they load some packages ,compilation times can stretch. So the experience of the development loop in Julia could vary, but in this specific case the process was seamless.

### Documentation and examples

One way of comparing documentation in the different languages is to compare them all with Python’s official documentation, which is the gold standard for programming languages. It combines examples with formal definitions and tutorials in a seamless and user-friendly way. Since many programmers are familiar with the Python documentation, this approach gives an idea of how they compare.

Julia’s documentation is the closest to Python’s documentation quality and gives the user a very smooth, detailed, and relatively painless transition into the language. It also has a rich ecosystem of blogs, and topics on many aspects of the language are easy to come by. D’s official documentation is not as good and can be challenging and frustrating, however there is a very good free book “Programming in D” which is a great introduction to the language, but no single book can cover a programming language and there are not many sources for advanced topics. Chapel’s documentation is quite good for getting things done, though examples vary in presence and quality. Often, the programmer needs a lot of knowledge to look in the right place. A good topic for comparison is file I/O libraries in Chapel, D, and Julia. Chapel’s I/O library has too few examples but is relatively clear and straightforward; D’s I/O is kind of spread across a few modules, and documentation is more difficult to follow; Julia’s I/O documentation has lots of examples and is clear and easy to follow.

Perhaps one factor affecting Chapel’s adoption is lack of example—since its arrays have a non-standard interface, the user has to work hard to become familiar with them. Whereas even though D’s documentation may not be as good in places, the language has many similarities to C and C++, so it gets away with more sparse documentation.

### Multi-dimensional Array support

“Arrays” here does not refer to native C and C++ style arrays available in D, but mathematical arrays. Julia and Chapel ship with array support and D does not, but it has the Mir library which has multidimensional arrays (ndslice). In the implementation of kernel matrix, I wrote my own matrix object in D, which is not difficult if you understand the principle, but it’s not something a user wants to do. However, D has a linear algebra library called Lubeck which has impressive performance characteristics and interfaces with all the usual BLAS implementations. Julia’s arrays are by far the easiest and most familiar. Chapel’s arrays are more difficult to get started with than Julia’s but are designed to be run on single-core, multi-core, and computer clusters using the same or very similar code, which is a good unique selling point.

### Language power

Since Julia is a dynamic programming language, some might say, “well Julia is a dynamic language which is far more permissive than static programming languages, therefore the debate is over”, but it’s more complicated than that. There is power in static type systems. Julia has a type system similar in nature to type systems from static languages, so you can write code as if you were using a static language, but you can do things reserved only for dynamic languages. It has a highly developed generic and meta-programming syntax and powerful macros. It also has a highly flexible object system and multiple dispatch. This mix of features is what makes Julia the most powerful language of the three.

D was intended to be a replacement for C++ and takes very much after C++ (and also borrows from Java), but makes template programming and compile-time evaluation much more user-friendly than in C++. It is a single dispatch language (though multi-methods are available in a package). Instead of macros, D has string and template “mixins” which serve a similar purpose.

Chapel has generic programming support and nascent support for single dispatch OOP, no macro support, and is not yet as mature as D or Julia in these terms.

### Concurrency & Parallel Programming

Nowadays, new languages tout support for concurrency and its popular subset, parallelism, but the details vary a lot between languages. Parallelism is more relevant in this example and all three languages deliver. Writing parallel for loops is straightforward in all three languages.

Chapel’s concurrency model has much more emphasis on data parallelism but has tools for task parallelism and ships with support for cluster-based concurrency.

Julia has good support for both concurrency and parallelism.

D has industry strength support for parallelism and concurrency, though its support for threading is much less well documented with examples.

### Standard Library

How good is the standard library of all three languages in general? What range of tasks do they allow users to easily tend to? It’s a tough question because library quality and documentation factor in. All three languages have very good standard libraries. D has the most comprehensive standard library, but Julia is a great second, then Chapel, but things are never that simple. For example, a user seeking to write binary I/O may find Julia the easiest to start with; it has the most straightforward, clear interface and documentation, followed by Chapel, and then D. Though in my implementation of an IDX file reader, D’s I/O was the fastest, but then Julia code was easy to write for cases unavailable in the other two languages.

### Package Managers & Package Ecosystems

In terms of documentation, usage, and features, D’s Dub package manager is the most comprehensive. D also has a rich package ecosystem in the Dub website, Julia’s package manager runs tightly integrated with GitHub and is a good package system with good documentation. Chapel has a package manager but does not have a highly developed package ecosystem.

### C Integration

C interop is easy in all three languages; Chapel has good documentation but is not as well popularised as the others. D’s documentation is better and Julia’s documentation is the most comprehensive. Oddly enough though, none of the languages’ documentation show the commands required to compile your own C code and integrate it with the language, which is an oversight especially when it comes to novices. It is, however, easy to search for and find examples for the compilation process in D and Julia.

### Community

All three languages have convenient places where users can ask questions. For Chapel, the easiest place is Gitter, for Julia it’s Discourse (though there is a Julia Gitter), and for D it’s the official website forum. The Julia community is the most active, followed by D, and then Chapel. I’ve found that you’ll get good responses from all three communities, but you’ll probably get quicker answers from the D and Julia communities.

Chapel D Julia
Compilation/Interactivty Slow Fast Best
Documentation & Examples Detailed Patchy Best
Multi-dimensional Arrays Yes Native Only
(library support)
Yes
Language Power Good Great Best
Concurrency & Parallelism Great Great Good
Standard Library Good Great Great
Package Manager & Ecosystem Nascent Best Great
C Integration Great Great Great
Community Small Vibrant Largest

Table for quality of life features in Chapel, D & Julia

## Summary

If you are a novice programmer writing numerical algorithms and doing calculations based in scientific computing and want a fast language that’s easy to use, Julia is your best bet. If you are an experienced programmer working in the same space, Julia is still a great option. If you specifically want a more conventional, “industrial strength”, statically compiled, high-performance language with all the “bells and whistles”, but want something more productive, safer, and less painful than C++, then D is your best bet. You can write “anything” in D and get great performance from its compilers. If you need to get array calculations happening on clusters, then Chapel is probably the easiest place to go.

In terms of raw performance on this task, D was the winner, clearly performing better in 5 out of the 9 kernels benchmarked. This exercise reveals that Julia’s label as a high-performance language is more than just hype—it has held it’s own against highly competitive languages. It was harder than expected to get competitive performance from Chapel—it took a lot of investigation from the Chapel team to come up with the current solution. However, as the Chapel language matures we could see further improvement.

# DustMite: The General-Purpose Data Reduction Tool

If you’ve been around for a while, or are a particularly adventurous developer who enjoys mixing language features in interesting ways, you may have run into one compiler bug or two:

Implementation bugs are inevitably a part of using cutting-edge programming languages. Should you run into one, the steps to proceed are generally as follows:

1. Reduce the failing program to a minimal, self-contained example.
2. Add a description of what happens and what you expect to happen.
3. Post it on the bug tracker.

Nine years ago, an observation was made that when filing and fixing compiler bugs, a disproportionate amount of time was spent on the first step. When your program stops compiling “out of the blue”, or when the bug stops reproducing after the code is taken out of its context, manually paring down a large codebase by repeatedly cutting out code and checking if the bug still occurs becomes a tedious and repetitive task.

Fortunately, tedious and repetitive tasks are what computers are good for; they just have to be tricked into doing them, usually by writing a program. Enter DustMite.

The first version.

The basic operation is simple. The tool takes as inputs:

• a data set to reduce (such as, a directory containing D source code which exhibits a particular compiler bug)
• an oracle (or, more mundanely, a test script), which itself:
• takes as input a variation of the data set, and
• produces a yes-or-no answer on whether the input still satisfies the sought property (such as reproducing the particular compiler bug).

DustMite’s output is some local minimum variation of the data set, which it reaches by consecutively trying to remove parts of the data set and saving the results which the oracle approves. In the compiler bug example, this means removing bits of code which are not required to reproduce the problem at hand.

DustMite wouldn’t be very efficient if it attempted to remove things line-by-line or character-by-character. In order to maximize the chance of finding good reductions, the input is parsed into a tree according to the syntax of the input files.

Each tree node consists of a “head” (string), children (list of node pointers), and “tail” (string). Technically, it is redundant to have both “head” and “tail”, but they make representing some constructs and performing reductions much simpler, such as paren/bracket pairs.

Nodes are arranged into a binary tree as an optimization.

Additionally, nodes may have a list of dependencies. The dependency relationship essentially means “if this node is removed, these nodes should be removed too”. These constraints are not representable using just the tree structure described above, and are used to allow reducing things such as lists where trailing item delimiters are not allowed, or removing a function parameter and corresponding arguments from the entire code base at once.

In the case of D source code, declarations, statements, and subexpressions get their own tree nodes, so that they can be removed in one go if unneeded. The parser DustMite uses for D source code is intentionally very simple because it needs to handle potentially invalid D code, and you don’t want your bug reduction tool to also crash on top of the compiler.

How DustMite sees a simple D program.

An algorithm decides the order in which nodes are queued for potential deletion; DustMite implements several (it calls them “strategies”). Fundamentally, a strategy’s interface is (statei, resulti) ⇒ (statei+1, reductioni+1), i.e., which reduction is chosen next depends on the previous reduction and its result. The default “inbreadth” strategy visits nodes in ascending depth order (row by row) and starts over from the top as long as it finds new reductions.

DustMite today supports quite a few more options:

The current version.

Probably, the most interesting of these is the -j switch—one reason being that DustMite’s task is inherently not parallelizable. Which reduction is chosen next, and the tree version to which that reduction is applied, depends on the previous reduction’s result.

DustMite works around this by putting unused CPU cores to work on lookahead: using a highly sophisticated predictor, it guesses what the result of the current reduction will be, and based on that assumption, calculates the next reduction. If the guess was right, great! We get to use that result. Otherwise, the work is wasted. Implementing this meant that strategies now needed to have copyable state, and so had to be refactored from an imperative style to a state machine.

Unfortunately, although the highly expected feature was implemented four years ago, the initial implementation was rather underwhelming. DustMite still did too much work in the main thread and wasted too much CPU time on rescanning the data set tree on every reduction. The problem was so bad that, at high core counts, lookahead mode was even slower than single-threaded mode.

I have recently set out to resolve these inadequacies. The following obstacles stood in the way:

Problem 1: Hashing was too slow. Because the oracle’s services (i.e., running the test script) are usually expensive, DustMite keeps track of a cache of previously attempted reductions and their outcome. This helps because not all tree transformations result in a change of output, and some strategies will retry reductions in successive iterations. A hash of the tree is used as the cache key; however, calculating it requires walking the entire tree every time, which is slow for large inputs.

Would it be possible to make the hash calculation incremental? One approach would be Merkle trees (each node’s hash is the hash of its children’s hashes), however that is suboptimal in the case of e.g., empty leaf nodes. CS erudite Ivan Kazmenko blesses us with an answer: polynomial hashes! By representing strings as polynomials, it is possible to use modulo arithmetic to calculate an incremental fixed-size hash and cache subtree hashes per node.

Each node holds its cumulative hash and length.

The number theory staggered me at first, so I recruited the assistance of feep from #d. After we went through a few draft implementations, I could begin working on the final version. The first improvement was replacing the naive exponentiation algorithm with exponentiation by squaring (D CTFE allowed precomputing a table at compile-time and a faster calculation than the classical method). Next, there was the matter of the modulo.

Initially, we used integer overflow for modulo arithmetic (i.e. q=264), however Ivan cautioned against using powers of two as the modulo, as this makes the algorithm susceptible to Thue-Morse strings. Not long ago I was experimenting with using long multiplication/division CPU instructions (where multiplying one machine word by another yields the result in two machine words with a high and low part, and vice-versa for division). D allows generating assembler code specific to the types that the function template is instantiated with, though in DustMite we only use the unsigned 64-bit variant (on x86 we fall back to using integer overflow).

With the hashing algorithm implemented, all that remained was to mark dirty nodes (they or their children had their content edited) and incrementally recalculate their hashes as needed. Dependencies posed a small obstacle: at the time, they were implemented as simply an array of pointers to the dependency node within the tree. As such, we didn’t know how to get to their parents (to mark them dirty as well), however this was easily overcome by adding a “parent” pointer to each node.

Well, or so I thought, until I got to work on the next problem.

Problem 2: Copying the tree. At the time, the current version of the tree representing the data set was part of the global state. Because of this, applying a reduction was implemented twice:

This was clumsy, but faster and less complicated than making a copy of the entire tree just to change one part of it to test a reduction. However, doing so was a requirement for proper lookahead, otherwise we would be unable to test reductions based on results where past tests predicted a positive outcome, or do nearly anything in a separate thread.

One issue was the tree “internal pointers”—making a copy would require updating all pointers within the tree to point to the new copies in the new tree. This was easy for children/parent pointers (since we can reliably visit every such pointer exactly once), but not quite for dependencies: because they were also implemented as simple pointers to nodes, we would have to keep track of a map of which node was copied where in order to update the dependency pointers.

One way to solve this would be to change the representation of node references from pointers to indices into a node array; this way, copying the tree would be as simple as a .dup. However, large inputs meant many nodes, and I wanted to see if it was possible to avoid iterating over every node in the tree (i.e. O(n)) for every reduction.

Was it possible? It would mean that we would copy only the modified nodes and their parents, leaving the rest of the tree in-place, and only reusing it as the copies’ children. This goal conflicted with the existence of “parent” pointers, because a parent would have to point towards either the old or new root, so to resolve this ambiguity every node would have to be copied. As a result, the way we handled dependencies needed to be rethought.

Editing trees with “copy on write” involves copying just the edited nodes (🔴), and their parents.

With internal pointers out, the next best thing to array indices for referencing a node was a series of instructions for how to reach the node from the tree root: an address. The representation of these addresses that I chose was a bit string represented as a linked list, where each list node holds the child index at that depth, starting from the deep end. Such a representation can be arranged in a tree where the root-side ends are shared, mimicking the structure of the tree containing the nodes for the reduced data, and thus allowing us to reuse memory and minimize allocations.

Nodes cannot hold their own address (as that would make them unmovable),
which is why they need to be stored outside of the main tree.

For addresses to work, the object they point at needs to remain the same, which means that we can no longer simply remove children from tree nodes—an address going through the second child would become invalid if the first child was removed. Rewriting all affected addresses for every tree edit is, of course, impractical, which leads us to the introduction of tombstones—dead nodes that only serve to preserve the index of the children that follow it. Because one of the possible reduction types involves moving subtrees around the tree, we now also have “redirects” (which are just tombstones with a “see here” address attached).

With the above changes in place, we can finally move forward with fixing and optimizing lookahead, as well as implementing incremental rehashing in a way that’s compatible with the above! The mutable global “current” tree variable is gone, save now simply takes a tree root as an argument, and applyReduction is now:

/// Apply a reduction to this tree, and return the resulting tree.
/// The original tree remains unchanged.
/// Copies only modified parts of the tree, and whatever references them.
Entity applyReduction(Entity origRoot, ref Reduction r)

With the biggest hurdle behind us, and a few more rounds of applying Walter Bright’s secret weapon, the performance metrics started to look more like what they should:

Going deeper would likely involve using OS-specific I/O APIs or rewriting D’s GC.

A mere 3.5x speed-up from a 32-fold increase in computational power may seem underwhelming. Here are some reasons for this:

• With a 50/50 predictor, predictions form a complete binary tree, so doubling the number of parallel jobs gives you +1x more speed. That’s roughly log₂(jobs)-1, or 4 for 32 jobs – not far off!

• The results will depend on the reduction being performed, so YMMV. For a certain artificial test case, one optimization (not pictured above) yielded a 500x speed-up!

• DustMite does not try to keep all CPU cores busy all the time. If a prediction turns out false, all lookahead jobs based on it become wasted work, so DustMite only starts new lookahead tasks when a reduction’s outcome is resolved. Perhaps ideally DustMite would keep starting new jobs but kill them as soon as it discovers they’re based on a misprediction. As there is no cross-platform process group management in Phobos, the D standard library, this is something I left for future versions.

• Some work is still done in the main thread, because moving it to a worker thread actually makes things slower due to the global GC lock.

There still remains one last place where DustMite iterates over every tree node per reduction: saving the tree to disk (so that it could be read by the test script). This seems unavoidable at first, but could actually be avoided by caching each node’s full text contents within the node itself.

I opted to leave this one out. With the other related improvements, such as using lockingBinaryWriter and aggregating writes of contiguous strings as one I/O operation, the increase in memory usage was much more dramatic than the decrease in execution time, even when optimized to just one allocation per reduction (polynomial hashing gives us every node’s total length for free). But, for a brief instant, DustMite processed reductions in sub-O(n) time.

One more addition is worth mentioning: Andrej Mitrovic suggested a switch which would replace removed text with whitespace, which would allow searching for exact line numbers in the test script. At the time, its addition posed significant challenges, as there needed to be some way to keep removed nodes in the tree but exclude them from future removal attempts. With the new tree representation, this became much easier, and also allowed creating the following animation:

In conclusion, I’d like to bring up that DustMite is good at more than just reducing compiler test cases. The wiki lists some ideas:

• Finding the source of ambiguous or misleading compiler error messages (e.g., errors with the file/line information pointing only inside the standard library).

• Alternative (much slower, but also much more thorough) method of verifying unit test code coverage. Just because a line of code is executed, that doesn’t mean it’s necessary; DustMite can be made to remove all code that does not affect the execution of your unit tests.

• Similarly, if you have complete test coverage, it can be used for reducing the source tree to a minimal tree which includes support for only enabled unittests. This can be used to create a version of a program or library with a test-defined subset of features.

• The --obfuscate mode can obfuscate your code’s identifiers. It can be used for preparing submission of proprietary code to bug trackers.

• The --fuzz mode (a new feature) can help find bugs in compilers and tools by creating random programs (using fragments of other programs as input).

But DustMite is not limited to D programs (or any kind of programs) as input. With the --split option, we can tell DustMite how to parse and reduce other kinds of files. DustMite successfully handled the following scenarios:

• reducing C++ programs (the D parser supports some C++-only syntax too);

• reducing Python programs (using the indent split mode);

• reducing a large commit to a minimal diff (using the diff split mode);

• reducing a commit list, when git bisect is insufficient due to the problem being introduced across more than any single commit;

• reducing a large data set to a minimal one, resulting in the same code coverage, with the purpose of creating a test suite;

• and many more which I do not remember.

Today, some version of DustMite is readily available in major distributions (usually as part of some D-related package), so I’m happy having a favorite tool one apt-get / pacman -S away when I’m not at my PC.

Discovering a problem which can be elegantly reduced away by DustMite is always exciting for me, and I’m hoping you will find it useful too.