Author Archives: Steven Schveighoffer

Teaching D from Scratch: Is it a viable first language?

About two-and-a-half years ago, one of my son’s homeschooling friends asked if I would teach a coding class. Both of my kids have shown interest in coding via Scratch, and I also need to pull my weight when it comes to homeschooling them (my wife doing most of the instruction). So I thought, surely, this can’t be too hard. The challenges I would encounter over the course of the last few years surprised me more often than I would have expected!

First, a bit about my background. I’ve been writing software professionally for over 20 years, about half using system languages and half doing web development. I’ve dabbled in a lot of programming projects, including embedded microcontrollers with 256 bytes of RAM, collaborative video systems using high-end Unix workstations, and automating a factory testing rack-mounted appliances. What I haven’t ever done to any official degree is teach programming. For sure, this is a challenge that is novel to me, and also one that I have found very satisfying.

A bit about the kids. I had ten students, aged 9 – 15, all of whom had very little experience writing real software. Since this class was going to be a side project for me on top of my normal job, we went rather slowly, meeting once every two weeks.

One of the goals of my class was to try not to dumb down coding into an abstraction. I wanted the kids to experience real programming without too much of a pre-built hand-holding framework, and I believed they could do it. I’ve seen many (and purchased some) online resources for teaching to code that hide most of the real work and just focus on using custom libraries or simplified languages to learn the “basics”, while not actually teaching to be productive in a standard environment. I also witnessed firsthand my own kids getting bored when the material was presented too slowly or too abstractly.

Note: you can see some of my homework assignments and review pages on a dedicated website that I created for this class

First Try: Javascript and Lua

The first language we worked with, because many of the kids didn’t have actual full-blown computers to work on (this was pre-Covid so it was at my house), was Javascript. Javascript is possible to develop in a browser, using a tablet or a Chromebook, so all the kids I was teaching could participate. My focus in these early days was basic concepts like loops, branching, and basic statements. I think Javascript can be a great first language, though the things you can do with it as a beginner are a bit bland, and to do anything exciting you have to start learning the DOM (including how objects and methods work).

After half a year we moved on to our second language: Lua, specifically in the video game system Roblox. I almost wish we had skipped this one because it was a bit too advanced for these students, and I was not familiar with the language and the system to begin with. The kids did have fun with the model-building UI. However, I can see an experienced teacher finding good ways to instruct using this environment. Roblox, with its pre-built rendering and physics engines, has the nice benefit of allowing one to generate a really cool game with relatively little code—something new coders quite enjoy.

Just use D

And that leads us to the second year, where I decided to just try teaching what I know best: the D programming language. Since all the kids have a keen interest in building games, I searched for a simple library I could use to build games. I have to give a shout-out to the library I discovered, Raylib, and the YouTuber Ki Rill, whose very straightforward and well-made videos gave me the confidence that this also would be something I could teach.

For a quick demo of what Raylib looks like, here is a program I wrote in a few minutes that bounces a Raylib-drawn D-man around the window.

Everyone in the class is using Windows, so the toolchain that I am having them use is the stock DMD compiler and the Visual Studio Code development environment with the code-d extension. Throughout the last year-and-a-half, I have had the kids build such classics as hangman and tic-tac-toe using text-based “graphics”. After that came the introduction to 2D graphics and Raylib, where we built a brick-breaking game similar to Arkanoid and others.

Bricker

Finally, I let the students pick what they wanted for the last project, and their pick was a rock-paper-scissors online tournament game. I’m not sure of the lasting power of that one…

Keep in mind that my goal was to teach them programming, not necessarily D. Working through all these projects allowed me to introduce a lot of the typical programming concepts (branching, looping, functions, data types, arrays, etc.). This year we are focusing on aggregates (specifically arrays and structs), and how they can be useful to encapsulate your coding solution into pieces that are easier to deal with. The game we are writing this year is a snake clone. I’m hoping to wrap things up on that by the end of the year, and then next year add in networking to have the game playable between the students (all the kids these days are into online tournament-style games).

Snake

My Take on Teaching With D

So how is D as a beginner’s programming language? I can say with confidence that the kids understand the flavor of D that I have taught just as well as the other languages. What flavor is that you may ask? Why vanilla of course! As D was the first typed language they were exposed to, I had to first explain types and why they are important. I also had to pick and choose which concepts in D I did not want to confuse them with. To that end, I’ve left out a lot of the advanced features (so far), such as:

  • Templates
  • Type inference
  • Operator overloading
  • Overloading in general!
  • __traits
  • Classes/Interfaces/OOP
  • Pointers/references (mostly)
  • Memory management in general (thank you GC!)

This means that a lot of language features are missing from “Vanilla D”, and a lot of those are pieces I like. I had to force myself to avoid those things while teaching so as not to confuse them. I also try to avoid shortcuts if I can, as I want them to master the basic concepts before learning how to do it quicker or less verbosely. I tried hard to always use curly braces for scope blocks, for instance, instead of just writing single statements without them.

The features that I found the kids handled well were basic types like ints and strings (though what a funny name “string” is, that took a while to sink in), if statements and foreach/while loops, structs, and functions. They had a harder time understanding arrays and associative arrays, which might be due to my lack of experience teaching. I’ll also note that teaching virtually has some significant challenges—there’s just no substitute to being able to walk over to a student’s computer and observe and help their progress or draw out on a whiteboard how data is laid out. One surprising thing (to me anyway) that they handled completely in stride, was nested functions. That’s a nice D feature that is not available in C, barely in C++, and something that I personally took a while to get used to.

What Could Go Wrong?

The challenges were numerous. First I had to try and remember what it’s like to know nothing about programming. I did not take any classes on teaching programming and did not look up the “proper” way to teach it, I just went with what I thought would be the next best thing to do based on their skills and experience. Having class only once every two weeks is a big drawback since the students’ brains tend to garbage-collect some of the stuff we worked on last time. For sure, a daily or maybe twice-weekly class schedule would improve the situation, but that would mean I’d have to generate material to teach that many classes, which I unfortunately don’t have the time for.

Secondly, I previously had no experience with games programming. Raylib made this pretty easy, and as long as you can understand event loops, it’s pretty straightforward (and fun—I wish I had learned it sooner).

Aside from my personal and environmental shortcomings, I have some suggestions for the D language and community that would make D a much better “first language”. In no particular order:

Error Messages

Error messages in D are sometimes esoteric, and I’m not just talking about templates. The most common error the students make is bad punctuation. This leads to some of the worst messages. Most of the time, I need to explain what is happening, but even when explaining it myself, I am sometimes thwarted by the compiler. For instance, if you forget a semicolon:

void main()
{
    import std.stdio;
    int x = 5
    writeln(x); // line 5
}

The resulting error is:

foo.d(5): Error: semicolon expected, not `writeln`

But wait, there is a semicolon on line 5! So why is it having a problem? Technically, putting an extra semicolon just before the writeln call on line 5 would solve the error (after all, D doesn’t have significant whitespace). But really, the compiler shouldn’t be suggesting that.

I’d rather see something like:

foo.d(5): Error: previous statement not terminated, perhaps you need 
a semicolon on line 4?

What if you are missing a brace or have an extra one? That spits out a lot of error vomit that is sometimes really difficult to decipher. I know this isn’t as easy a problem to solve, but since the compiler is not even to the tricky semantic parts, what if it used some other hints of the source to try and tease out a suggested solution? Like maybe taking cues from the indentation.

A great feature of the D compiler is the spelling correction suggestions. But sometimes it’s not easy to see what has changed in the suggested spelling correction (especially when it’s just different capitalization). Why not highlight the changed letters? Since D has gained colorized output for errors, things have become much clearer in my opinion.

I do have to relay a comment from one of my students, who said he appreciated the ability of VS Code to jump to the line that the compiler error is referencing. This helps put into perspective where their mindset is.

Debugger on VSCode for Windows

Another challenge I’ve faced is the lack of a good debugging experience for VS Code on Windows. The C debugger for VS Code displays, for example, a string as a length and a C string (which might have trailing garbage). I’d rather not confuse them with this. I have read that Visual D has a better debugging experience, but I need dub support for these projects, and Visual D focuses on Visual Studio integration, something I don’t necessarily want to deal with in teaching these kids. I’m hoping that a recent focus on debugging (e.g., the LLDB integration project) will help.

I think WebFreak has done a great job on the VS Code plugin, and for the most part, it works amazingly well. Thank you! I know debugging is a piece that is hard to get right.

Linking External Libraries

My next challenge is using dub for linking. Dub is great if you are only doing things with D and OS-supplied libraries. As soon as you need to depend on an external C library that is not part of your OS (such as Raylib), now you are not just adding dependencies but downloading pre-built libraries, or trying to build them yourself. I’m not sure if I have great suggestions on this, but I would like some way for dub to be told where to download these libs or have a central place to put them where the linker can find them outside of specifying the location in your dub.json file. Many libraries are released on GitHub, which makes it a prime candidate for downloading and installing external libraries (maybe even when ImportC becomes viable, it can automatically generate the bindings as well).

I spend an inordinate amount of time helping the kids set up their build environments, and having this built-in to the IDE or dub would be immensely helpful for using D for learning.

Suggestions for the Future

I am confident that D can be a viable first language. I have students that, prior to my class, had never before written real code now creating 2D games using D and Raylib (with a lot of hand-holding, darn it!). What I think D needs for this success to be scalable is a set of resources that assumes no programming experience. Most of D’s learning resources are aimed at teaching experienced programmers how D works in terms of their existing knowledge. They answer the question, “How do I do <insert feature> from <insert language> in D?” What we need is to answer the question, “How do I write code?”, that just happens to be using D.

Editor’s Note:Some readers may be thinking at this point of Ali Çehreli’s Programming in D. Though written for beginner-level programmers, it’s focused more on teaching the language than teaching to program generally.

This shouldn’t be too hard to create, as there are a ton of resources on learning to program in C already. D is so similar you could almost just replace C with D and call it a day. In fact, I’m considering creating a YouTube series on the topic with the experience I’ve gained. One important tool that needs creating is a map of D features and where they fall on the difficulty scale. This allows one to write D tutorials that follow a gradual introduction of features. Broaching the subject of complex features is probably needed early, as they will experience some of them (you are using a template whenever you use writeln, or to), but explanation of the entire feature may be deferred to a later level.

And finally, such materials should be engaging! I can’t stress enough how important it is to be doing something fun when you are learning to program. The kids definitely are more engaged since we are creating games that they might actually want to play vs. simple hello world examples or even Javascript web pages. A nice way to proceed might be to write a game in Vanilla D and then introduce features that allow you to refine and improve the code.

I hope this becomes a reality because I think this area of development has been relegated to either frameworks that attempt to water down programming to get kids engaged or sterile hello world-style programming that is intended for rote instruction or reference. I think we can show the full potential of D and, at the same time, engage eager learners with interesting and fun projects to trick them into learning code with our favorite language!

How to Write @trusted Code in D

Steven Schveighoffer is the creator and maintainer of the dcollections and iopipe libraries. He was the primary instigator of D’s inout feature and the architect of a major rewrite of the language’s built-in arrays. He also authored the oft-recommended introductory article on the latter.


d6In computer programming, there is a concept of memory-safe code, which is guaranteed at some level not to cause memory corruption issues. The ultimate holy grail of memory safety is to be able to mechanically verify you will not corrupt memory no matter what. This would provide immunity from attacks via buffer overflows and the like. The D language provides a definition of memory safety that allows quite a bit of useful code, but conservatively forbids things that are sketchy. In practice, the compiler is not omnipotent, and it lacks the context that we humans are so good at seeing (most of the time), so there is often the need to allow otherwise risky behavior. Because the compiler is very rigid on memory safety, we need the equivalent of a cast to say “yes, I know this is normally forbidden, but I’m guaranteeing that it is fine”. That tool is called @trusted.

Because it’s very difficult to explain why @trusted code might be incorrect without first discussing memory safety and D’s @safe mechanism, I’ll go over that first.

What is Memory Safe Code?

The easiest way to explain what is safe, is to examine what results in unsafe code. There are generally 3 main ways to create a safety violation in a statically-typed language:

  1. Write or read from a buffer outside the valid segment of memory that you have access to.
  2. Cast some value to a type that allows you to treat a piece of memory that is not a pointer as a pointer.
  3. Use a pointer that is dangling, or no longer valid.

The first item is quite simple to achieve in D:

auto buf = new int[1]; 
buf[2] = 1;

With default bounds checks on, this results in an exception at runtime, even in code that is not checked for safety. But D allows circumventing this by accessing the pointer of the array:

buf.ptr[2] = 1;

For an example of the second, all that is needed is a cast:

*cast(int*)(0xdeadbeef) = 5;

And the third is relatively simple as well:

auto buf = new int[1];
auto buf2 = buf;
delete buf;  // sets buf to null
buf2[0] = 5; // but not buf2.

Dangling pointers also frequently manifest by pointing at stack data that is no longer in use (or is being used for a different reason). It’s very simple to achieve:

int[] foo()
{
    int[4] buf;
    int[] result = buf[];
    return result;
}

So simply put, safe code avoids doing things that could potentially result in memory corruption. To that end, we must follow some rules that prohibit such behavior.

Note: dereferencing a null pointer in user-space is not considered a memory safety issue in D. Why not? Because this triggers a hardware exception, and generally does not leave the program in an undefined state with corrupted memory. It simply aborts the program. This may seem undesirable to the user or the programmer, but it’s perfectly fine in terms of preventing exploits. There are potential memory issues possible with null pointers, if one has a null pointer to a very large memory space. But for safe D, this requires an unusually large struct to even begin to worry about it. In the eyes of the D language, instrumenting all pointer dereferences to check for null is not worth the performance degradation for these rare situations.

D’s @safe rules

D provides the @safe attribute that tags a function to be mechanically checked by the compiler to follow rules that should prevent all possible memory safety problems. Of course, there are cases where developers need to make exceptions in order to get some meaningful work done.

The following rules are geared to prevent issues like the ones discussed above (listed in the spec here).

  1. Changing a raw pointer value is not allowed. If @safe D code has a pointer, it has access only to the value pointed at, no others. This includes indexing a pointer.
  2. Casting pointers to any type other than void* is not allowed. Casting from any non-pointer type to a pointer type is not allowed. All other casts are OK (e.g. casting from float to int) as long as they are valid. Casting a dynamic array to a void[] is also allowed.
  3. Unions that have pointer types that overlap other types cannot be accessed. This is similar to rules 1 and 2 above.
  4. Accessing an element in or taking a slice from a dynamic array must be either proven safe by the compiler, or incur a bounds check during runtime. This even happens in release mode, when bounds checks are normally omitted (note: dmd’s option -boundscheck=off will override this, so use with extreme caution).
  5. In normal D, you can create a dynamic array from a pointer by slicing the pointer. In @safe D, this is not allowed, since the compiler has no idea how much space you actually have available via that pointer.
  6. Taking a pointer to a local variable or function parameter (variables that are stored on the stack) or taking a pointer to a reference parameter are forbidden. An exception is slicing a local static array, including the function foo above. This is a known issue.
  7. Explicit casting between immutable and mutable types that are or contain references is not allowed. Casting value-types between immutable and mutable can be done implicitly and is perfectly fine.
  8. Explicit casting between thread-local and shared types that are or contain references is not allowed. Again, casting value-types is fine (and can be done implicitly).
  9. The inline assembler feature of D is not allowed in @safe code.
  10. Catching thrown objects that are not derived from class Exception is not allowed.
  11. In D, all variables are default initialized. However, this can be changed to uninitialized by using a void initializer:
    int *s = void;

    Such usage is not allowed in @safe D. The above pointer would point to random memory and create an obvious dangling pointer.

  12. __gshared variables are static variables that are not properly typed as shared, but are still in global space. Often these are used when interfacing with C code. Accessing such variables is not allowed in @safe D.
  13. Using the ptr property of a dynamic array is forbidden (a new rule that will be released in version 2.072 of the compiler).
  14. Writing to void[] data by means of slice-assigning from another void[] is not allowed (this rule is also new, and will be released in 2.072).
  15. Only @safe functions or those inferred to be @safe can be called.

The need for @trusted

The above rules work well to prevent memory corruption, but they prevent a lot of valid, and actually safe, code. For example, consider a function that wants to use the system call read, which is prototyped like this:

ssize_t read(int fd, void* ptr, size_t nBytes);

For those unfamiliar with this function, it reads data from the given file descriptor, and puts it into the buffer pointed at by ptr and expected to be nBytes bytes long. It returns the number of bytes actually read, or a negative value if an error occurs.

Using this function to read data into a stack-allocated buffer might look like this:

ubyte[128] buf;
auto nread = read(fd, buf.ptr, buf.length);

How is this done inside a @safe function? The main issue with using read in @safe code is that pointers can only pass a single value, in this case a single ubyte. read expects to store more bytes of the buffer. In D, we would normally pass the data to be read as a dynamic array. However, read is not D code, and uses a common C idiom of passing the buffer and length separately, so it cannot be marked @safe. Consider the following call from @safe code:

auto nread = read(fd, buf.ptr, 10_000);

This call is definitely not safe. What is safe in the above read example is only the one call, where the understanding of the read function and calling context assures memory outside the buffer will not be written.

To solve this situation, D provides the @trusted  attribute, which tells the compiler that the code inside the function is assumed to be @safe, but will not be mechanically checked. It’s on you, the developer, to make sure the code is actually @safe.

A function that solves the problem might look like this in D:

auto safeRead(int fd, ubyte[] buf) @trusted
{
    return read(fd, buf.ptr, buf.length);
}

Whenever marking an entire function @trusted, consider if code could call this function from any context that would compromise memory safety. If so, this function should not be marked @trusted under any circumstances. Even if the intention is to only call it in safe ways, the compiler will not prevent unsafe usage by others. safeRead should be fine to call from any @safe context, so it’s a great candidate to mark @trusted.

A more liberal API for the safeRead function might take a void[] array as the buffer. However, recall that in @safe code, one can cast any dynamic array to a void[] array — including an array of pointers. Reading file data into an array of pointers could result in an array of dangling pointers. This is why ubyte[] is used instead.

@trusted escapes

A @trusted escape is a single expression that allows @system (the unsafe default in D) calls such as read without exposing the potentially unsafe call to any other part of the program. Instead of writing the safeRead function, the same feat can be accomplished inline within a @safe function:

auto nread = ( () @trusted => read(fd, buf.ptr, buf.length) )();

Let’s take a closer look at this escape to see what is actually happening. D allows declaring a lambda function that evaluates and returns a single expression, with the () => expr syntax. In order to call the lambda function, parentheses are appended to the lambda. However, operator precedence will apply those parentheses to the expression and not the lambda, so the entire lambda must be wrapped in parentheses to clarify the call. And finally, the lambda can be tagged @trusted as shown, so the call is now usable from the @safe context that contains it.

In addition to simple lambdas, whole nested functions or multi-statement lambdas can be used. However, remember that adding a trusted nested function or saving a lambda to a variable exposes the rest of the function to potential safety concerns! Take care not to expose the escape too much because this risks having to manually verify code that should just be mechanically checked.

Rules of Thumb for @trusted

The previous examples show that tagging something as @trusted has huge implications. If you are disabling memory safety checks, but allowing any @safe code to call it, then you must be sure that it cannot result in memory corruption. These rules should give guidance on where to put @trusted marks and avoid getting into trouble:

Keep @trusted code small

@trusted code is never mechanically checked for safety, so every line must be reviewed for correctness. For this reason, it’s always advisable to keep the code that is @trusted as small as possible.

Apply @trusted to entire functions when the unsafe calls are leaky

Code that modifies or uses data that @safe code also uses creates the potential for unsafe calls to leak into the mechanically checked portion of a @safe function. This means that portion of the code must be manually reviewed for safety issues. It’s better to mark the whole thing @trusted, as that’s more in line with the truth. This is not a hard and fast rule; for example, the read call from the earlier example is perfectly safe, even though it will affect data that is used later by the function in @safe mode.

A pointer allocated with C’s malloc in the beginning of the function, and free‘d later, could have been copied somewhere in between. In this case, the dangling pointer may violate @safe, even in the mechanically checked part. Instead, try wrapping the entire portion that uses the pointer as @trusted, or even the entire function. Alternatively, use scope guards to guarantee the lifetime of the data until the end of the function.

Never use @trusted on template functions that accept arbitrary types

D is smart enough to infer @safe for template functions that follow the rules. This includes member functions of templated types. Just let the compiler do its job here. To ensure the function is actually @safe in the right contexts, create an @safe unittest  to call it. Marking the function @trusted allows any operator overloads or members that might violate memory safety to be ignored by the safety checker! Some tricky ones to remember are postblit and opCast.

It’s still OK to use @trusted escapes here, but be very careful. Consider especially possible types that contain pointers when thinking about how such a function could be abused. A common mistake is to mark a range function or range usage @trusted. Remember that most ranges are templates, and can be easily inferred as @system when the type being iterated has a @system postblit or constructor/destructor, or is generated from a user-provided lambda.

Use @safe to find the parts you need to mark as @trusted

Sometimes, a template intended to be @safe may not be inferred @safe, and it’s not clear why. In this case, try temporarily marking the template function @safe to see where the compiler complains. That’s where @trusted escapes should be inserted if appropriate.

In some cases, a template is used pervasively, and tagging it as @safe may make too many parts break. Make a copy of the template under a different name that you mark @safe, and change the calls that are to be checked so that they call the alternative template instead.

Consider how the function may be edited in the future

When writing a trusted function, always think about how it could be called with the given API, and ensure that it should be @safe. A good example from above is making sure safeRead cannot accept an array of pointers. However, another possibility for unsafe code to creep in is when someone edits a part of the function later, invalidating the previous verification, and the whole function needs to be rechecked. Insert comments to explain the danger of changing something that would then violate safety. Remember, pull request diffs don’t always show the entire context, including that a long function being edited is @trusted!

Use types to encapsulate @trusted operations with defined lifetimes

Sometimes, a resource is only dangerous to create and/or destroy, but not to use during its lifetime. The dangerous operations can be encapsulated into a type’s constructor and destructor, marked @trusted, which allows @safe code to use the resource in between those calls. This takes a lot of planning and care. At no time can you allow @safe code to ferret out the actual resource so that it can keep a copy past the lifetime of the managing struct! It is essential to make sure the resource is alive as long as @safe code has a reference to it.

For example, a reference-counted type can be perfectly safe, as long as a raw pointer to the payload data is never available. D’s std.typecons.RefCounted cannot be marked @safe, since it uses alias this to devolve to the protected allocated struct in order to function, and any calls into this struct are unaware of the reference counting. One copy of that payload pointer, and then when the struct is free‘d, a dangling pointer is present.

This can’t be @safe!

Sometimes, the compiler allows a function to be @safe, or is inferred @safe, and it’s obvious that shouldn’t be allowed. This is caused by one of two things: either a function that is called by the @safe function (or some deeper function) is marked @trusted but allows unsafe calls, or there is a bug or hole in the @safe system. Most of the time, it is the former. @trusted is a very tricky attribute to get correct, as is shown by most of this post. Frequently, developers will mark a function @trusted only thinking of some uses of their function, not realizing the dangers it allows. Even core D developers make this mistake! There can be template functions that are inferred safe because of this, and sometimes it’s difficult to even find the culprit. Even after the root cause is discovered, it’s often difficult to remove the @trusted tag as it will break many users of the function. However, it’s better to break code that is expecting a promise of memory safety than subject it to possible memory exploits. The sooner you can deprecate and remove the tag, the better. Then insert trusted escapes for cases that can be proven safe.

If it does happen to be a hole in the system, please report the issue, or ask questions on the D forums. The D community is generally happy to help, and memory safety is a particular focus for Walter Bright, the creator of the language.