automem: Hands-Free RAII for D

Átila Neves has used both C++ and D professionally. He’s responsible for several D libraries and tools, like unit-threaded, cerealed, and reggae.


Garbage collected languages tend to suffer from a framing problem, and D is no exception. Its inclusion of a mark-and-sweep garbage collector makes safe memory management easy and convenient, but, thanks to a widespread perception that GC in general is a performance killer, alienates a lot of potential users due to its mere existence.

Coming to D as I did from C++, the one thing I didn’t like about the language initially was the GC. I’ve since come to realize that my fears were mostly unfounded, but the fact remains that, for many people, the GC is reason enough to avoid the language. Whether or not that is reasonable given their use cases is debatable (and something over which reasonable people may disagree), but the existence of the perception is not.

A lot of work has been done over the years to make sure that D code can be written which doesn’t depend on the GC. The @nogc annotation is especially important here, and I don’t think it’s been publicized enough. A @nogc main function is a compile-time guarantee that the program will not ever allocate any GC memory. For the types of applications that need those sorts of guarantees, this is invaluable.
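
As a minimal sketch of what that guarantee looks like, the program below marks both main and a helper as @nogc and takes its scratch memory from the C heap instead. The helper name sumFirst is made up for illustration; malloc, free, and printf come from D's bindings to the C standard library.

```d
import core.stdc.stdio : printf;
import core.stdc.stdlib : free, malloc;

// Hypothetical helper: sums 0 .. n using a scratch buffer from the C heap.
// Returns -1 if the buffer cannot be allocated.
int sumFirst(size_t n) @nogc nothrow
{
    auto raw = cast(int*) malloc(int.sizeof * n);
    if (raw is null)
        return -1;
    scope(exit) free(raw); // manual cleanup, no GC involved

    int[] buf = raw[0 .. n];
    int total = 0;
    foreach (i, ref slot; buf)
    {
        slot = cast(int) i;
        total += slot;
    }
    return total;
}

void main() @nogc nothrow
{
    // Uncommenting the next line is a compile-time error in a @nogc
    // function, because appending may allocate from the GC heap:
    // int[] ints; ints ~= 1;
    printf("%d\n", sumFirst(10));
}
```

Because @nogc is checked transitively, main can only call functions that are themselves @nogc, which is what turns the annotation into a whole-program guarantee.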

But if not allocating from the GC heap, where does one get the memory? Still in the experimental package of the standard library, std.experimental.allocator provides building blocks for composing allocators that should satisfy any and all memory allocation needs where the GC is deemed inappropriate. Better still, via the IAllocator interface, one can even switch between GC and custom allocation strategies as needed at runtime.

I’ve recently used std.experimental.allocator in order to achieve @nogc guarantees and, while it works, there’s one area in which the experience wasn’t as smooth as when using C++ or Rust: disposing of memory. Like C++ and Rust, D has RAII. As is usual in all three, it’s considered bad form to explicitly release resources. And yet, in the current state of affairs, while using the D standard library one has to manually dispose of memory if using std.experimental.allocator. D makes it easier than most languages that support exceptions, due to scope(exit), but in a language with RAII that’s just boilerplate. And as the good lazy programmer that I am, I abhor writing code that doesn’t need to be, and shouldn’t be, written. An itch developed.
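
To make the boilerplate concrete, here is a small sketch of the manual pattern using make and dispose from std.experimental.allocator with Mallocator. The Point type and the function name sumOfCoordinates are only illustrative.

```d
import std.experimental.allocator : dispose, make;
import std.experimental.allocator.mallocator : Mallocator;

struct Point
{
    int x;
    int y;
}

// Hypothetical helper showing manual disposal with scope(exit).
int sumOfCoordinates(int x, int y)
{
    // Allocate and construct a Point via malloc rather than the GC.
    Point* p = Mallocator.instance.make!Point(x, y);

    // Forget this line and the memory leaks; this is the boilerplate
    // that RAII is supposed to make unnecessary.
    scope(exit) Mallocator.instance.dispose(p);

    return p.x + p.y;
}

void main()
{
    assert(sumOfCoordinates(2, 3) == 5);
}
```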

The inspiration for the solution I came up with was C++; ever since C++11 I’ve been delighted with using std::unique_ptr and std::shared_ptr and basically never again worrying about manually managing memory. D’s standard library has Unique and RefCounted in std.typecons but they predate std.experimental.allocator and so “bake in” the allocation strategy. Can we have our allocation cake and eat it too?

Enter automem, a library I wrote providing C++-style smart pointers that integrate with std.experimental.allocator. It was clear to me that the design had to be different from the smart pointers it took inspiration from. In C++, it’s assumed that memory is allocated with new and freed with delete (although it’s possible to override both). With custom allocators and no real obvious default choice, I made it so that the smart pointers would allocate memory themselves. This makes it so one can’t allocate with one allocator and deallocate with a different one, which is another benefit.

Another goal was to preserve the possibility of Unique, like std::unique_ptr, being a zero-cost abstraction. To that end, the allocator type must be specified (it defaults to IAllocator); if it’s a value type with no state, then it takes up no space. In fact, if it’s a singleton (determined at compile time by probing whether Allocator.instance exists), then it doesn’t even need to be passed in to the constructor! As in much modern D code, Design by Introspection pays dividends here. Example code:

struct Point {
    int x;
    int y;
}

{
    // Must pass arguments to initialise the contained object,
    // but not an allocator instance: Mallocator is a singleton
    // (Mallocator.instance returns its only instance).
    
    auto u1 = Unique!(Point, Mallocator)(2, 3);
    assert(*u1 == Point(2, 3));
    assert(u1.y == 3); // forwards to the contained object

    // auto u2 = u1; // won't compile, can only move
    typeof(u1) u2;
    move(u1, u2);
    assert(cast(bool)u1 == false); // u1 is now empty
}
// memory freed for the Point structure created in the block
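
The compile-time probing mentioned above can be sketched roughly as follows. This is not automem's actual implementation: the names isSingleton, CountingAllocator, and Owner are invented here, and the only point is to show how a static if can decide whether an allocator needs to be stored at all.

```d
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical trait: does the allocator type expose an `instance` member?
enum isSingleton(Allocator) = __traits(hasMember, Allocator, "instance");

// Hypothetical stateful allocator with no global instance.
struct CountingAllocator
{
    size_t bytesHandedOut;
}

// Hypothetical owning wrapper; only its layout matters for this sketch.
struct Owner(T, Allocator)
{
    T* payload;

    static if (isSingleton!Allocator)
        alias allocator = Allocator.instance; // nothing stored per object
    else
        Allocator allocator;                  // stored next to the payload
}

static assert(isSingleton!Mallocator);
static assert(!isSingleton!CountingAllocator);

// With a singleton allocator the wrapper is exactly one pointer wide.
static assert(Owner!(int, Mallocator).sizeof == (int*).sizeof);
static assert(Owner!(int, CountingAllocator).sizeof > (int*).sizeof);

void main() {}
```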

RefCounted is automem’s equivalent of C++’s std::shared_ptr. Unlike std::shared_ptr, however, it doesn’t always do an atomic reference count increment/decrement. The reason is that it leverages D’s type system to determine when it has to: if the payload is shared, then the reference count is changed atomically. If not, it can’t be sent to other threads anyway and the performance penalty doesn’t have to be paid. C++ always does an atomic increment/decrement. Rust gets around this with two types, Arc and Rc. In D the type system disambiguates. Another win for Design by Introspection, something that really is only possible in D. Example code:

{
    auto s1 = RefCounted!(Point, Mallocator)(4, 5);
    assert(*s1 == Point(4, 5));
    assert(s1.x == 4);
    {
        auto s2 = s1; // can be copied, non-atomic reference count
    } // ref count goes to 1 here

} // ref count goes to 0 here, memory released
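
The choice between atomic and plain counting can be illustrated with a small, self-contained sketch. Again, this is not automem's code: the Count type is hypothetical and only shows how testing a payload type for shared can select the implementation at compile time.

```d
import core.atomic : atomicLoad, atomicOp;

// Hypothetical counter that picks its implementation from the payload type.
struct Count(Payload)
{
    // Only a shared payload can be seen by other threads.
    enum isAtomic = is(Payload == shared);

    static if (isAtomic)
    {
        private shared size_t value_;
        void increment() { atomicOp!"+="(value_, 1); }
        void decrement() { atomicOp!"-="(value_, 1); }
        size_t value() { return atomicLoad(value_); }
    }
    else
    {
        private size_t value_;
        void increment() { ++value_; }
        void decrement() { --value_; }
        size_t value() { return value_; }
    }
}

void main()
{
    static assert(!Count!int.isAtomic);
    static assert(Count!(shared int).isAtomic);

    Count!int local;
    local.increment();
    local.increment();
    local.decrement();
    assert(local.value == 1);

    Count!(shared int) crossThread;
    crossThread.increment();
    assert(crossThread.value == 1);
}
```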

Given that the allocator type is usually specified, it means that when using a @nogc allocator (most of them), the code using automem can itself be made @nogc, with RAII taking care of any memory management duties. That means compile-time guarantees of no GC allocation for the applications that need them.

I hope automem and std.experimental.allocator manage to solve D’s GC framing problem. Now it should be possible to write @nogc code with no manual memory disposal in D, just as it is in C++ and Rust.

Don’t Fear the Reaper

D, like many other programming languages in active use today, comes with a garbage collector out of the box. There are many types of software that can be written without worrying at all about the GC, taking full advantage of its benefits. But the GC does have drawbacks, and there are certainly scenarios in which garbage collection is undesirable. For those situations, the language provides the means to temporarily disable it, or even avoid it completely.

In order to maximize the positive impacts of garbage collection and minimize any negative, it’s necessary to have a good grounding in how the GC works in D. A good place to start is the Garbage Collection page at dlang.org, which outlines the rationale for D’s GC and provides some tips on working with it. This post is the first in a series intended to expand on the information provided on that page.

This time, we’ll look at the very basics, focusing on the language features that can trigger GC allocations. Future posts will introduce ways to disable the GC when necessary, as well as idioms useful in dealing with its nondeterministic nature (e.g. managing resources in the destructors of GC-managed objects).

The first thing to understand about D’s garbage collector is that it only runs during allocation, and only if there is no memory available to allocate. It isn’t sitting in the background, periodically scanning the heap and collecting garbage. This knowledge is fundamental to writing code that makes efficient use of GC-managed memory. Take the following example:

void main() {
    int[] ints;
    foreach(i; 0..100) {
        ints ~= i;
    }
}

This declares a dynamic array of int, then uses D’s append operator to append the numbers 0 to 99 in a foreach range loop. What isn’t obvious to the untrained eye is that the append operator makes use of the GC heap to allocate space for the values it adds to the array.

DRuntime’s array implementation isn’t a dumb one. In the example, there aren’t going to be one hundred allocations, one for each value. When more memory is needed, the implementation will allocate more space than is requested. In this particular case, it’s possible to determine how many allocations are actually made by making use of the capacity property of D’s dynamic arrays and slices. This returns the total number of elements the array can hold before an allocation is necessary.

void main() {
    import std.stdio : writefln;
    int[] ints;
    size_t before, after;
    foreach(i; 0..100) {
        before = ints.capacity;
        ints ~= i;
        after = ints.capacity;
        if(before != after) {
            writefln("Before: %s After: %s",
                before, after);
        }
    }
}

Executing this when compiled with DMD 2.073.2 shows the message is printed six times, meaning there were six total GC allocations in the loop. That means there were six opportunities for the GC to collect garbage. In this small example, it almost certainly didn’t. If this loop were part of a larger program, with GC allocations throughout, it very well might.

On a side note, it’s informative to examine the values of before and after. Doing so shows a sequence of 0, 3, 7, 15, 31, 63, and 127. So by the end, ints contains 100 values and has space for appending 27 more before the next allocation, after which, extrapolating from the values in the sequence, the capacity should be 255. That’s an implementation detail of DRuntime, though, and could be tweaked or changed in any release. For more details on how arrays and slices are managed by the GC, take a look at Steven Schveighoffer’s excellent article on the topic.

So, six allocations, six opportunities for the GC to initiate one of its pauses of unpredictable length, even in that simple little loop. In the general case, that could become an issue depending on whether the loop is in a hot part of the code and how much total memory is allocated from the GC heap. But even then, it’s not necessarily a reason to disable the GC in that part of the code.

Even with languages that don’t come with a stock GC out of the box, like C and C++, most programmers learn at some point that it’s better for overall performance to allocate as much as possible up front and minimize allocations in the inner loops. It’s one of the many types of premature optimization that are not actually the root of all evil, something we tend to call best practice. Given that D’s GC only runs when memory is allocated, the same strategy can be applied as a simple way to mitigate any potential impact it could have on performance. Here’s one way to rewrite the example:

void main() {
    int[] ints = new int[](100);
    foreach(i; 0..100) {
        ints[i] = i;
    }
}

Now we’ve gone from six allocations down to one. The only opportunity the GC has to run is before the inner loop. This actually allocates space for at least 100 values and initializes them all to 0 before entering the loop. The array will have a length of 100 after new, but will almost certainly have additional capacity.

There’s an alternative to new for arrays, the reserve function:

void main() {
    int[] ints;
    ints.reserve(100);
    foreach(i; 0..100) {
        ints ~= i;
    }
}

This allocates memory for at least 100 values, but the array is still empty (its length property will return 0) when it returns, so nothing is default initialized. Given that the loop only appends 100 values, it’s guaranteed not to allocate.
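
One way to check this is to reuse the capacity trick from earlier and count how often the capacity changes while appending. The helper capacityChanges below is made up for this sketch; the exact count without reserve is an implementation detail of DRuntime, so no particular number is assumed.

```d
// Hypothetical helper: appends 0 .. n and counts how many times the
// array's capacity changes after the optional up-front reservation.
size_t capacityChanges(size_t n, bool reserveFirst)
{
    int[] ints;
    if (reserveFirst)
        ints.reserve(n);

    size_t changes = 0;
    foreach (i; 0 .. n)
    {
        immutable before = ints.capacity;
        ints ~= cast(int) i;
        if (ints.capacity != before)
            ++changes;
    }
    return changes;
}

void main()
{
    import std.stdio : writeln;

    writeln("without reserve: ", capacityChanges(100, false));
    writeln("with reserve:    ", capacityChanges(100, true));
}
```

With the reservation in place the count should be zero, since every append fits in the block that reserve already obtained.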

In addition to new and reserve, it’s possible to call GC.malloc directly for explicit allocation.

import core.memory;
void* intsPtr = GC.malloc(int.sizeof * 100);
auto ints = (cast(int*)intsPtr)[0 .. 100];

Array literals will usually allocate.

auto ints = [0, 1, 2];

This is also true when an array literal enum is used.

enum intsLiteral = [0, 1, 2];
auto ints1 = intsLiteral;
auto ints2 = intsLiteral;

An enum exists only at compile time and has no memory address. Its name is a synonym for its value. Everywhere you use one, it’s like copying and pasting its value in place of its name. Both ints1 and ints2 trigger allocations exactly as if they were declared like so:

auto ints1 = [0, 1, 2];
auto ints2 = [0, 1, 2];

Array literals do not allocate when the target is a static array. Also, string literals (strings in D are arrays under the hood) are an exception to the rule.

int[3] noAlloc1 = [0, 1, 2];
auto noAlloc2 = "No Allocation!";
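
A simple way to confirm these rules is to let the compiler check them with @nogc. In the sketch below, the function names are invented for illustration; the two functions compile because nothing in them touches the GC heap, while the function literal inside the static assert is rejected because its array literal needs a GC allocation.

```d
// Compiles: a static array initialized from a literal lives on the stack.
int sumStatic() @nogc nothrow
{
    int[3] values = [0, 1, 2];
    int total = 0;
    foreach (v; values)
        total += v;
    return total;
}

// Compiles: a string literal is stored in the program's static data.
size_t greetingLength() @nogc nothrow
{
    string greeting = "No Allocation!";
    return greeting.length;
}

// Does not compile as @nogc: the literal initializes a dynamic array
// that escapes the function, so it must come from the GC heap.
static assert(!__traits(compiles, () @nogc {
    int[] values = [0, 1, 2];
    return values;
}));

void main()
{
    assert(sumStatic() == 3);
    assert(greetingLength() == 14);
}
```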

The concatenate operator will always allocate:

auto a1 = [0, 1, 2];
auto a2 = [3, 4, 5];
auto a3 = a1 ~ a2;
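
As a quick illustration that the result is a fresh block rather than a view of either operand, the following sketch (with a hypothetical helper named joined) concatenates two arrays and then modifies the result.

```d
// Hypothetical helper: returns a newly allocated array holding lhs then rhs.
int[] joined(int[] lhs, int[] rhs)
{
    return lhs ~ rhs;
}

void main()
{
    auto a1 = [0, 1, 2];
    auto a2 = [3, 4, 5];
    auto a3 = joined(a1, a2);

    assert(a3 == [0, 1, 2, 3, 4, 5]);
    assert(a3.ptr !is a1.ptr); // different memory block

    a3[0] = 42;                // writing to the copy...
    assert(a1[0] == 0);        // ...leaves the original untouched
}
```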

D’s associative arrays have their own allocation strategy, but you can expect them to allocate when items are added and potentially when removed. They also expose two properties, keys and values, which both allocate arrays and fill them with copies of the respective items. When it’s desirable to modify the underlying associative array during iteration, or when the items need to be sorted or otherwise manipulated independently of the associative array, these properties are just what the doctor ordered. Otherwise, they’re needless allocations that put undue pressure on the GC.

When the GC does run, the total amount of memory it needs to scan is going to determine how long it takes. The smaller, the better. Avoiding unnecessary allocations isn’t going to hurt anyone and is another good mitigation strategy. D’s associative arrays provide three properties that help do just that: byKey, byValue, and byKeyValue. These each return forward ranges that can be iterated lazily. They do not allocate because they actually refer to the items in the associative array, so it should not be modified while iterating them. For more on ranges, see the chapters titled Ranges and More Ranges in Ali Çehreli’s Programming in D.
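
Here is a short sketch contrasting the two families of properties. The function names and the stock data are invented for illustration: the first two functions only read, so the lazy ranges are enough, while the third needs an independent array it can sort, which is where keys earns its allocation.

```d
import std.algorithm.sorting : sort;

// Read-only traversal: byValue walks the items in place.
int totalLazily(int[string] stock)
{
    int total = 0;
    foreach (count; stock.byValue)
        total += count;
    return total;
}

// Read-only traversal of pairs: byKeyValue exposes .key and .value.
string bestStocked(int[string] stock)
{
    string best;
    int bestCount = int.min;
    foreach (pair; stock.byKeyValue)
    {
        if (pair.value > bestCount)
        {
            bestCount = pair.value;
            best = pair.key;
        }
    }
    return best;
}

// Sorting needs an independent array, so the copy made by keys is justified.
string[] sortedNames(int[string] stock)
{
    string[] names = stock.keys; // allocates a new array
    sort(names);
    return names;
}

void main()
{
    int[string] stock = ["apples": 3, "pears": 5, "figs": 1];
    assert(totalLazily(stock) == 9);
    assert(bestStocked(stock) == "pears");
    assert(sortedNames(stock) == ["apples", "figs", "pears"]);
}
```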

Closures, which are delegates or function literals that need to carry around a pointer to the local stack frame, may also allocate. The last allocating language feature listed on the Garbage Collection page is the assertion. An assertion will allocate when it fails because it needs to throw an AssertError, which is part of D’s class-based exception hierarchy (we’ll look at how classes interact with the GC in a future post).
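
The closure case can be sketched as follows, with all names invented for illustration. In makeCounter the delegate outlives the function, so the captured variable has to be moved to the GC heap. In addOffsetTwice the delegate is passed to a scope parameter and cannot escape, so the compiler accepts the code as @nogc.

```d
// The returned delegate outlives this call, so `count` is placed in a
// GC-allocated closure.
int delegate() makeCounter()
{
    int count = 0;
    return () => ++count;
}

// `scope` promises the delegate is not stored anywhere, so callers do
// not need a heap-allocated closure.
int applyTwice(scope int delegate(int) @nogc fn, int x) @nogc
{
    return fn(fn(x));
}

// Captures `offset`, yet compiles as @nogc thanks to the scope parameter.
int addOffsetTwice(int x, int offset) @nogc
{
    return applyTwice(n => n + offset, x);
}

void main()
{
    auto next = makeCounter();
    assert(next() == 1);
    assert(next() == 2);

    assert(addOffsetTwice(1, 10) == 21);
}
```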

Then there’s Phobos, D’s standard library. Once upon a time, much of Phobos was implemented with little concern for GC allocations, making it difficult to use in situations where they were undesirable. However, a massive effort was initiated to make it more conservative in its GC usage. Some functions were made to work with lazy ranges, others were rewritten to take buffers supplied by the caller, and some were reimplemented to avoid unnecessary allocations internally. The result is a standard library more amenable to GC-free code (though there are still probably some corners of the library that have not yet been renovated; PRs welcome).

Now that we’ve seen the basics of using the GC, the next post in this series will look at the tools the language and the compiler provide for turning the GC off and making sure specific sections of code are GC-free.

Thanks to Guillaume Piolat and Steven Schveighoffer for their help with this article.