Category Archives: GC

Symphony of Destruction: Structs, Classes and the GC (Part One)

Digital Mars D logo

This post is part of a broader series on garbage collection in D. The motivation is to explore how destructors and the GC interact. To do that, we first need a bit of background. We do not go into a broader discussion on the ins and outs of object destruction, only what is most relevant to the interaction of destructors and the GC.

I’ve split the discussion into two blog posts. Here in Part One, we look at how deterministic and non-deterministic destruction differ, consider the consequences of having a single destructor for both scenarios, and finally establish two simple guidelines that will help us avoid those consequences. In Part Two, we’ll go further and explore how we can still write solid destructors when circumstances dictate that the guidelines don’t apply.

Deterministic destruction

Destruction is deterministic when it is predictable, meaning the programmer can, simply by following the flow of the code, point out where and when an object’s destructor is invoked. This is possible with struct instances allocated on the stack, as the compiler will insert calls to their destructors at well-defined points for automatic and deterministic destruction.

There are two basic rules for automatic destruction:

  1. The destructors of all stack-allocated structs in a given scope are invoked when the scope exits.
  2. Destructors are invoked in reverse lexical order (i.e., the opposite of the order in which the declarations appear in the source).

With these two rules in mind, we can examine the following example and accurately predict its output.

import std.stdio;
struct Predictable
{
    int number;
    this(int n)
    {
        writeln("Constructor #", n);
        number = n;
    }
    ~this()
    {
        writeln("Destructor #", number);
    }
}

void main()
{
    Predictable s0 = Predictable(0);
    {
        Predictable s1 = Predictable(1);
    }
    Predictable s2 = Predictable(2);
}

We see that both s0 and s2 are directly within the scope of the main function, so their destructors will run when main exits. Given that the declaration of s2 comes after that of s0, the destructor of s2 will run before that of s0.

We also see that s1 is declared in an anonymous inner scope between the declarations of s0 and s2. This scope exits before s2 is constructed, so the destructor of s1 will execute before the constructor of s2.

With that, we can expect the following output:

Constructor #0      // declaration of s0
Constructor #1      // declaration of s1
Destructor #1       // anonymous scope exits, s1 destroyed
Constructor #2      // declaration of s2
Destructor #2       // main exits, s2 then s0 destroyed
Destructor #0

Compiling and executing the example proves us accurate seers.

The programmer can implement deterministic destruction manually, as is necessary when destroying instances allocated on the non-GC heap, e.g., with malloc or std.experimental.allocator. In an earlier post, Go Your Own Way (Part Two: The Heap), I covered how to use std.conv.emplace to allocate instances on the non-GC heap and briefly mentioned that destructors can be invoked manually via destroy. That’s a function template declared in the automatically imported object module so that it’s always available. We won’t retread the allocation discussion, but an example of manual destruction isn’t out of bounds for this post.

In the following example, we’ll reuse the definition of the Predictable struct and a destroyPredictable function to manually invoke the destructors. For completeness, I’ve included functions for allocating and deallocating Predictable instances from the non-GC heap: allocatePredictable and deallocatePredictable. If it isn’t clear to you what these two functions are doing, please read the blog post I mentioned above.

void main()
{
    Predictable* s0 = allocatePredictable(0);
    scope(exit) { destroyPredictable(s0); }
    {
        Predictable* s1 = allocatePredictable(1);
        scope(exit) { destroyPredictable(s1); }
    }
    Predictable* s2 = allocatePredictable(2);
    scope(exit) { destroyPredictable(s2); }
}

void destroyPredictable(Predictable* p)
{
    if(p) {
        destroy(*p);
        deallocatePredictable(p);
    }
}

Predictable* allocatePredictable(int n)
{
    import core.stdc.stdlib : malloc;
    import std.conv : emplace;
    auto p = cast(Predictable*)malloc(Predictable.sizeof);
    return emplace!Predictable(p, n);
}

void deallocatePredictable(Predictable* p)
{
    import core.stdc.stdlib : free;
    free(p);
}

Running this program will result in precisely the same output as the previous example. In the destroyPredictable function, we dereference the struct pointer when calling destroy because there is no overload that takes a pointer. There are specializations for classes, interfaces, and structs passed by reference and a general catch-all that takes all other types by reference. Destructors are invoked on types that have them. Before exiting, the function sets the argument to its default .init value through the reference.

Note that if we were to give destroy a pointer without first dereferencing it, the code would still compile. The pointer would be accepted by reference and simply set to null, the default .init value for pointers, but the struct’s destructor would not be invoked (i.e., the pointer is “destroyed”, not the struct instance).

Inserting writeln(*p) immediately after destroy(*p) should print

Predictable(0)

for each destroyed instance. (The default .init state for a struct in D is the aggregate of the .init property of each of its members; in this case, the sole member, being of type int, has an .init property of 0, so the struct’s default .init state is Predictable(0). This can be changed in the struct definition, e.g., struct Predictable { int id = 1; }.)

destroy is not restricted to instances allocated on the non-GC heap. Any aggregate type instance (struct, class, or interface) is a valid argument no matter where it was allocated.

Non-deterministic destruction

In languages with support for objects and a garbage collector, the responsibility for destroying object instances allocated on the GC heap falls to the GC. This is known as finalization. Before reclaiming an object’s memory, the GC finalizes the object by invoking its finalizer.

Finalization, though convenient, comes with a price. In Java’s particular circumstances, the price was determined to be so high that its maintainers deprecated the Object.finalize method and left a scary warning about its use in the documentation. It’s worth quoting here:

The finalization mechanism is inherently problematic. Finalization can lead to performance issues, deadlocks, and hangs. Errors in finalizers can lead to resource leaks; there is no way to cancel finalization if it is no longer necessary; and no ordering is specified among calls to finalize methods of different objects. Furthermore, there are no guarantees regarding the timing of finalization. The finalize method might be called on a finalizable object only after an indefinite delay, if at all.

Finalization in D isn’t quite the bugbear it is in Java, but we do see a less dramatic warning about it in the D documentation:

The garbage collector is not guaranteed to run the destructor for all unreferenced objects. Furthermore, the order in which the garbage collector calls destructors for unreferenced objects is not specified.

Although there’s no mention of “finalization” or “finalizers” here, that’s precisely what the text is referring to. The core message is the same in both warnings: finalization is non-deterministic and cannot be relied on.

Unlike structs, classes in D are reference types by default. Some consequences: the programmer never has direct access to the underlying class instance; instances declared uninitialized are null by default; the normal use case is to allocate instances via new. When a class is instantiated in D, it is usually going to be managed by the GC and its destructor will serve as a finalizer.

As an experiment, let’s change the definition of struct Predictable in our first example to class Unpredictable and use new to allocate the instances like so:

import std.stdio;
class Unpredictable
{
    int number;
    this(int n)
    {
        writeln("Constructor #", n);
        number = n;
    }
    ~this()
    {
        writeln("Destructor #", number);
    }
}

void main()
{
    Unpredictable s0 = new Unpredictable(0);
    {
        Unpredictable s1 = new Unpredictable(1);
    }
    Unpredictable s2 = new Unpredictable(2);
}

We’ll see that the output is drastically different:

Constructor #0
Constructor #1
Constructor #2
Destructor #0
Destructor #1
Destructor #2

Anyone familiar with the characteristics of the default DRuntime constructor can predict for this very simple program that all the destructors will be run when the GC’s cleanup function is executed as the D runtime shuts down, and that they will be executed in the order in which they were declared (an implementation detail; and note that destruction at shut down can be disabled via a command line argument). But in a more complex program, this ability to predict breaks down. Destructors can be invoked by the GC at almost any time and in any order.

To be clear, the GC will only perform its cleanup duties if and when it finds more memory is needed to fulfill a specific allocation request. In other words, it isn’t constantly running in the background, marking objects unreachable and calling destructors willy nilly. To that extent, we can predict when the GC has the possibility to perform its duties. Beyond that, all bets are off. We cannot predict with accuracy if any destructors will be invoked during any given allocation request or the order in which they will be invoked. This uncertainty has ramifications for how one implements destructors for any GC-managed type.

For starters, destructors of GC-managed objects should never perform any operation that can potentially result in a GC allocation request. Attempting to do so can result in an InvalidMemoryOperationError at run time. I use the word “potentially” because some operations can indirectly cause the error in certain circumstances, but not in others. Some examples: attempting to index an associative array can trigger an attempt to allocate a RangeError if the key is not present; a failed assert will result in allocation of an AssertError; calling any function not annotated with @nogc means GC operations are always possible in the call stack. These and any such operations should be avoided in the destructors of GC-managed objects. (The first seven items on the list of operations disallowed in @nogc functions are collectively a good guide.)

A larger issue is that one cannot rely on any resource still being valid when a destructor is called by the GC. Consider a class that attempts to close a socket handle in its destructor; it’s quite possible that the destructor won’t be called until after the program has already shutdown the network interface. There is no scenario in which the runtime can catch this. In the best case, such circumstances will result in a silent failure, but they could also result in crashes during program shutdown or even sooner.

What it comes down to is that GC-allocated objects should never be used to manage any resource that depends upon deterministic destruction for cleanup.

Designing for destruction

For the D neophyte, it can appear as if destructors in D are useless. Given that both struct and class instances can be allocated from memory that may or may not be managed by the GC, that destructors of GC-managed objects are not guaranteed to run, and that destructors are forbidden to perform GC operations during finalization, how can we ever rely on them?

In practice, it’s not as bad as it may seem. Issues do arise for the unwary, but armed with a basic awareness of the nature of D destructors, it turns out that it’s pretty easy to avoid having problems. This is especially true if the programmer adopts two fairly simple rules.

1. Pretend class destructors don’t exist

Class instances will nearly always be allocated with new. That means their destructors will nearly always be non-deterministic. Much of what one would want to do in a destructor is somehow dependent on program state: either the destructor itself expects a certain state (like writing to a log file that is expected to be open), or the program expects the destructor to have modified the program state in a specific manner (like releasing a resource handle).

Non-deterministic destruction means that all expectations about program state are thrown out the window. That log file might have already been closed, so the message will never be written (I hope it wasn’t important). That system resource handle may never be released until the program ends (I hope that particular resource isn’t scarce). Even if it seems through testing that a class destructor is working the way it’s intended, it’s quite likely down to the fact that the testing has not uncovered the case where it breaks. In a long-running program, that case will inevitably pop up at some point. Have fun debugging when your production game server starts randomly crashing.

So when using classes in D, pretend they have no destructors. Pretend that they are Java classes with a deprecated finalize method.

2. Don’t allocate structs on the GC heap when they have destructors

Since we’re pretending classes have no destructors, then we’re going to turn to structs for all of our destructive needs. Allocating structs as value objects on the stack will cover many use cases, but sometimes we may need to allocate them on the heap. When that situation arises, do not allocate any destructor-bearing struct with new. If we allocate a struct that has a destructor on the GC heap, it completely defeats our purpose of avoiding class destructors in the first place. That destructor we intended to be deterministic is now non-deterministic, so we may as well have just used a class.

As we have seen, struct instances can be allocated on the non-GC heap (e.g., with malloc) and their destructors manually invoked with the destroy function. If we need deterministic destruction and we absolutely must have a heap-allocated struct, then we cannot allocate that struct on the GC heap.

Guidelines schmuidelines

I’m sure someone is reading the above guidelines and thinking, “If I have to pretend that classes have no destructors, then why do classes have destructors?” Well, you don’t have to.

There is no One True Path to follow when deciding if an object should be implemented as a class or struct. Personally, I will always prefer structs over classes, and I will only reach for a class if I need something structs can’t give me easily (like a hierarchy) or efficiently. Other people will consider if the object they need to represent has an identity, e.g., an Actor in a simulation versus the Vertex that defines its 3D coordinates. POD (Plain Old Data) types should always be structs, but beyond that it’s largely a matter of preference.

My two guidelines are based on my experience and that of others with whom I’ve spoken. They are intended to help you keep the full implications of D’s distinction between classes and structs at the forefront of your thoughts when architecting your program. They are not commandments that every D programmer must follow.

Realistically, most D programmers will encounter circumstances at one time or another in which the guidelines do not apply. For example, when mixing GC-managed memory and manually-managed memory in the same program, it’s quite possible for a struct intended for stack use to wind up on the GC heap if the programmer is unaware of the circumstances. And some D programmers will always prefer classes over structs because that’s just the way they want it, and so will simply choose to ignore the guidelines. That’s no problem as long as they fully understand the consequences.

So what does that mean? How do you get over the non-deterministic nature of class destructors if your Actor class absolutely must have a destructor, or if you prefer to always follow The Way of the Class? How do you prevent structs intended for the stack from being GC-allocated? These are things we’ll be looking at in Part Two. See you there.

Thanks to Ali Çehreli and Max Haughton for their feedback. And to Adam D. Ruppe for his conversation on the topic in Discord and the title suggestion (it fits in more nicely with the series than the ‘Appetite For Destruction’ I had intended to go with)

Go Your Own Way (Part Two: The Heap)

This post is part of an ongoing series on garbage collection in the D Programming Language, and the second of two regarding the allocation of memory outside of the GC. Part One discusses stack allocation. Here, we’ll look at allocating memory from the non-GC heap.

Although this is only my fourth post in the series, it’s the third in which I talk about ways to avoid the GC. Lest anyone jump to the wrong conclusion, that fact does not signify an intent to warn programmers away from the D garbage collector. Quite the contrary. Knowing how and when to avoid the GC is integral to understanding how to efficiently embrace it.

To hammer home a repeated point, efficient garbage collection requires reducing stress on the GC. As highlighted in the first and subsequent posts in this series, that doesn’t necessarily mean avoiding it completely. It means being judicious in how often and how much GC memory is allocated. Fewer GC allocations means fewer opportunities for a collection to trigger. Less total memory allocated from the GC heap means less total memory to scan.

It’s impossible to make any accurate, generalized statement about what sort of applications may or may not feel an impact from the GC; such is highly application specific. What can be said is that it may not be necessary for many applications to temporarily avoid or disable the GC, but when it is, it’s important to know how. Allocating from the stack is an obvious approach, but D also allows allocating from the non-GC heap.

The ubiquitous C

For better or worse, C is everywhere. Any software written today, no matter the source language, is probably interacting with a C API at some level. Despite the C specification defining no standard ABI, its platform-specific quirks and differences are understood well enough that most languages know how to interface with it. D is no exception. In fact, all D programs have access to the C standard library by default.

The core.stdc package, part of DRuntime, is a collection of D modules translated from C standard library headers. When a D executable is linked, the C standard library is linked along with it. All that need be done to gain access is to import the appropriate modules.

import core.stdc.stdio : puts;
void main() 
{
    puts("Hello C standard library.");
}

Some who are new to D may be laboring under a misunderstanding that functions which call into C require an extern(C) annotation, or, after Walter’s Bright’s recent ‘D as a Better C’ article, must be compiled with -betterC on the command line. Neither is true. Normal D functions can call into C without any special effort beyond the presence of an extern(C) declaration of the function being called. In the snippet above, the declaration of puts is in the core.stdc.stdio module, and that’s all we need to call it.

malloc and friends

Given that we have access to C’s standard library in D, we therefore have access to the functions malloc, calloc, realloc and free. All of these can be made available by importing core.stdc.stdlib. And thanks to D’s slicing magic, using these functions as the foundation of a non-GC memory management strategy is a breeze.

import core.stdc.stdlib;
void main() 
{
    enum totalInts = 10;
    
    // Allocate memory for 10 ints
    int* intPtr = cast(int*)malloc(int.sizeof * totalInts);

    // assert(0) (and assert(false)) will always remain in the binary,
    // even when asserts are disabled, which makes it nice for handling
    // malloc failures    
    if(!intPtr) assert(0, "Out of memory!");

    // Free when the function exits. Not necessary for this example, but
    // a potentially useful strategy for temporary allocations in functions 
    // other than main.
    scope(exit) free(intPtr);

    // Slice the D pointer to get a more manageable length/pointer pair.
    int[] intArray = intPtr[0 .. totalInts];
}

Not only does this bypass the GC, it also bypasses D’s default initialization. A GC-allocated array of type T would have all of its elements initialized to T.init, which is 0 for int. If mimicking D’s default initialization is the desired behavior, more work needs to be done. In this example, we could replace malloc with calloc for the same effect, but that would only be correct for integrals. float.init, for example, is float.nan rather than 0.0f. We’ll come back to this later in the article.

Of course, it would be more idiomatic to wrap both malloc and free and work with slices of memory. A minimal example:

import core.stdc.stdlib;

// Allocate a block of untyped bytes that can be managed
// as a slice.
void[] allocate(size_t size)
{
    // malloc(0) is implementation defined (might return null 
    // or an address), but is almost certainly not what we want.
    assert(size != 0);

    void* ptr = malloc(size);
    if(!ptr) assert(0, "Out of memory!");
    
    // Return a slice of the pointer so that the address is coupled
    // with the size of the memory block.
    return ptr[0 .. size];
}

T[] allocArray(T)(size_t count) 
{ 
    // Make sure to account for the size of the
    // array element type!
    return cast(T[])allocate(T.sizeof * count); 
}

// Two versions of deallocate for convenience
void deallocate(void* ptr)
{   
    // free handles null pointers fine.
    free(ptr);
}

void deallocate(void[] mem) 
{ 
    deallocate(mem.ptr); 
}

void main() {
    import std.stdio : writeln;
    int[] ints = allocArray!int(10);
    scope(exit) deallocate(ints);
    
    foreach(i; 0 .. 10) {
        ints[i] = i;
    }

    foreach(i; ints[]) {
        writeln(i);
    }
}

allocate returns void[] rather than void* because it carries with it the number of allocated bytes in its length property. In this case, since we’re allocating an array, we could instead rewrite allocArray to slice the returned pointer immediately, but anyone calling allocate directly would still have to take into account the size of the memory. The disassociation between arrays and their length in C is a major source of bugs, so the sooner we can associate them the better. Toss in some templates for calloc and realloc and you’ve got the foundation of a memory manager based on the C heap.

On a side note, the preceding three snippets (yes, even the one with the allocArray template) work with and without -betterC. But from here on out, we’ll restrict ourselves to features in normal D code.

Avoid leaking like a sieve

When working directly with slices of memory allocated outside of the GC heap, be careful about appending, concatenating, and resizing. By default, the append (~=) and concatenate (~) operators on built-in dynamic arrays and slices will allocate from the GC heap. Concatenation will always allocate a new memory block for the combined string. Normally, the append operator will allocate to expand the backing memory only when it needs to. As the following example demonstrates, it always needs to when it’s given a slice of non-GC memory.

import core.stdc.stdlib : malloc;
import std.stdio : writeln;

void main()
{
    int[] ints = (cast(int*)malloc(int.sizeof * 10))[0 .. 10];
    writeln("Capacity: ", ints.capacity);

    // Save the array pointer for comparison
    int* ptr = ints.ptr;
    ints ~= 22;
    writeln(ptr == ints.ptr);
}

This should print the following:

Capacity: 0
false

A capacity of 0 on a slice indicates that the next append will trigger an allocation. Arrays allocated from the GC heap normally have space for extra elements beyond what was requested, meaning some appending can occur without triggering a new allocation. It’s more like a property of the memory backing the array rather than of the array itself. Memory allocated from the GC does some internal bookkeeping to keep track of how many elements the memory block can hold so that it knows at any given time if a new allocation is needed. Here, because the memory for ints was not allocated by the GC, none of that bookkeeping is being done by the runtime on the existing memory block, so it must allocate on the next append (see Steven Schveighoffer’s ’D Slices article for more info).

This isn’t necessarily a bad thing when it’s the desired behavior, but anyone who’s not prepared for it can easily run into ballooning memory usage thanks to leaks from malloced memory never being deallocated. Consider these two functions:

void leaker(ref int[] arr)
{
    ...
    arr ~= 10;
    ...
}

void cleaner(int[] arr)
{
    ...
    arr ~= 10;
    ...
}

Although arrays are reference types, meaning that modifying existing elements of an array argument inside a function will modify the elements in the original array, they are passed by value as function parameters. Any activity that modifies the structure of an array argument, i.e. its length and ptr properties, only affects the local variable inside the function. The original will remain unchanged unless the array is passed by reference.

So if an array backed by the C heap is passed to leaker, the append will cause a new array to be allocated from the GC heap. Worse, if free is subsequently called on the ptr property of the original array, which now points into the GC heap rather than the C heap, we’re in undefined behavior territory. cleaner, on the other hand, is fine. Any array passed into it will remain unchanged. Internally, the GC will allocate, but the ptr property of the original array still points to the original memory block.

As long as the original array isn’t overwritten or allowed to go out of scope, this is a non-issue. Functions like cleaner can do what they want with their local slice and things will be fine externally. Otherwise, if the original array is to be discarded, you can prevent all of this by tagging functions that you control with @nogc. Where that’s either not possible or not desirable, then either a copy of the pointer to the original malloced memory must be kept and freeed at some point after the reallocation takes place, custom appending and concatenation needs to be implemented, or the allocation strategy needs to be reevaluated.

Note that std.container.array contains an Array type that does not rely on the GC and may be preferable over managing all of this manually.

Other APIs

The C standard library isn’t the only game in town for heap allocations. A number of alternative malloc implementations exist and any of those can be used instead. This requires manually compiling the source and linking with the resultant objects, but that’s not an onerous task. Heap memory can also be allocated through system APIs, like the Win32 HeapAlloc function on Windows (available by importing core.sys.windows.windows). As long as there’s a way to get a pointer to a block of heap memory, it can be sliced and manipulated in a D program in place of a block of GC memory.

Aggregate types

If we only had to worry about allocating arrays in D, then we could jump straight on to the next section. However, we also need to concern ourselves with struct and class types. For this discussion, however, we will only focus on the former. The next couple of posts in the series will focus exclusively on classes.

Allocating an array of struct types, or a single instance of one, is often no different than when the type is int.

struct Point { int x, y; }
Point* onePoint = cast(Point*)malloc(Point.sizeof);
Point* tenPoints = cast(Point*)malloc(Point.sizeof * 10);

Where things break down is when contructors enter the mix. malloc and friends know nothing about constructing D object instances. Thankfully, Phobos provides us with a function template that does.

std.conv.emplace can take either a pointer to typed memory or an untyped void[], along with an optional number of arguments, and return a pointer to a single, fully initialized and constructed instance of that type. This example shows how to do so using both malloc and the allocate function template from above:

struct Vertex4f 
{ 
    float x, y, z, w; 
    this(float x, float y, float z, float w = 1.0f)
    {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
    }
}

void main()
{
    import core.stdc.stdlib : malloc;
    import std.conv : emplace;
    import std.stdio : writeln;
    
    Vertex4f* temp1 = cast(Vertex4f*)malloc(Vertex4f.sizeof);
    Vertex4f* vert1 = emplace(temp1, 4.0f, 3.0f, 2.0f); 
    writeln(*vert1);

    void[] temp2 = allocate(Vertex4f.sizeof);
    Vertex4f* vert2 = emplace!Vertex4f(temp2, 10.0f, 9.0f, 8.0f);
    writeln(*vert2);
}

Another feature of emplace is that it also handles default initialization. Consider that struct types in D need not implement constructors. Here’s what happens when we change the implementation of Vertex4f to remove the constructor:

struct Vertex4f 
{
    // x, y, z are default inited to float.nan
    float x, y, z;

    // w is default inited to 1.0f
    float w = 1.0f;
}

void main()
{
    import core.stdc.stdlib : malloc;
    import std.conv : emplace;
    import std.stdio : writeln;

    Vertex4f vert1, vert2 = Vertex4f(4.0f, 3.0f, 2.0f);
    writeln(vert1);
    writeln(vert2);    
    
    auto vert3 = emplace!Vertex4f(allocate(Vertex4f.sizeof));
    auto vert4 = emplace!Vertex4f(allocate(Vertex4f.sizeof), 4.0f, 3.0f, 2.0f);
    writeln(*vert3);
    writeln(*vert4);
}

This prints the following:

Vertex4f(nan, nan, nan, 1)
Vertex4f(4, 3, 2, 1)
Vertex4f(nan, nan, nan, 1)
Vertex4f(4, 3, 2, 1)

So emplace allows heap-allocated struct instances to be initialized in the same manner as stack allocated struct instances, with or without a constructor. It also works with the built-in types like int and float. Just always remember that emplace is intended to initialize and construct a single instance, not an array of instances.

If the aggregate type has a destructor, it should be invoked before its memory is deallocated. This can be achieved with the destroy function (always available through the implicit import of std.object).

std.experimental.allocator

The entirety of the text above describes the fundamental building blocks of a custom memory manager. For many use cases, it may be sufficient to forego cobbling something together by hand and instead take advantage of the D standard library’s std.experimental.allocator package. This is a high-level API that makes use of low-level techniques like those described above, along with Design by Introspection, to facilitate the assembly of different types of allocators that know how to allocate, initialize, and construct arrays and type instances. Allocators like Mallocator and GCAllocator can be used to grab chunks of memory directly, or combined with other building blocks for specialized behavior. See the emsi-containers library for a real-world example.

Keeping the GC informed

Given that it’s rarely recommended to disable the GC entirely, most D programs allocating outside the GC heap will likely also be using memory from the GC heap in the same program. In order for the GC to properly do its job, it needs to be informed of any non-GC memory that contains, or may potentially contain, references to memory from the GC heap. For example, a linked list whose nodes are allocated with malloc might contain references to classes allocated with new.

The GC can be given the news via GC.addRange.

import core.memory;
enum size = int.sizeof * 10;
void* p1 = malloc(size);
GC.addRange(p1, size);

void[] p2 = allocate!int(10);
GC.addRange(p2.ptr, p2.length);

When the memory block is no longer needed, the corresponding GC.removeRange can be called to prevent it from being scanned. This does not deallocate the memory block. That will need to be done manually via free or whatever allocator interface was used to allocate it. Be sure to read the documentation before using either function.

Given that one of the goals of allocating from outside the GC heap is to reduce the amount of memory the GC must scan, this may seem self-defeating. That’s the wrong way to look at it. If non-GC memory is going to hold references to GC memory, then it’s vital to let the GC know about it. Not doing so can cause the GC to free up memory to which a reference still exists. addRange is a tool specifically designed for that situation. If it can be guaranteed that no GC-memory references live inside a non-GC memory block, such as a malloced array of vertices, then addRange need not be called on that memory block.

A word of warning

Be careful when passing typed pointers to addRange. Because the function was implemented with the C like approach of taking a pointer to a block of memory and the number of bytes it contains, there is an opportunity for error.

struct Item { SomeClass foo; }
auto items = (cast(Item*)malloc(Item.sizeof * 10))[0 .. 10];
GC.addRange(items.ptr, items.length);

With this, the GC would be scanning a block of memory exactly ten bytes in size. The length property returns the number of elements the slice refers to. Only when the type is void (or the element type is one-byte long, like byte and ubyte) does it equate to the size of the memory block the slice refers to. The correct thing to do here is:

GC.addRange(items.ptr, items.length * Item.sizeof);

However, until DRuntime is updated with an alternative, it may be best to implement a wrapper that takes a void[] parameter.

void addRange(void[] mem) 
{
    import core.memory;
    GC.addRange(mem.ptr, mem.length);
}

Then calling addRange(items) will do the correct thing. The implicit conversion of the slice to void[] in the function call will mean that mem.length is the same as items.length * Item.sizeof.

The GC series marches on

This post has covered the very basics of using the non-GC heap in D programs. One glaring omission, in addition to class types, is what to do about destructors. I’m saving that topic for the post about classes, where it is highly relevant. That’s the next scheduled post in the GC series. Stay tuned!

Thanks to Walter Bright, Guillaume Piolat, Adam D. Ruppe, and Steven Schveighoffer for their valuable feedback on a draft of this article.

Go Your Own Way (Part One: The Stack)

This is my third post in the GC series. In the first post, I introduced D’s garbage collector and the language features that require it, and touched on simple strategies to use it effectively. In the second post, I showed off the tools provided by the language and library to disable or prohibit the GC in specific parts of a code base, how to use the compiler to assist in that endeavor, and recommended that D programs be written initially to embrace the GC, taking advantage of simple strategies to mitigate its impact, and later tuned to avoid it or further optimize its usage only when profiling shows it’s warranted.

When garbage collection is turned off via GC.disable or prevented by the @nogc function annotation, memory will still need to be allocated from somewhere. And even when the GC is fully embraced, it’s still desirable to minimize the size and quantity of GC heap allocations. That means allocating either via the stack, or via the non-GC heap. The focus of this post is the former. Non-GC heap allocations will be covered in my next post in this series.

Stack allocation

The simplest allocation strategy in D is the same as it is in C: avoid the heap and use the stack whenever possible. When a local array is needed and the size can be known at compile time, use a static rather than a dynamic array. Structs, which are value types and stack-allocated by default, should be preferred where possible over classes, which are reference types and are usually allocated from one heap or another. D’s compile-time features can present opportunities here that might not otherwise be available.

Static arrays

Static array declarations in D require the length to be known at compile-time.

// OK
int[10] nums;

// Error: variable x cannot be read at compile time
int x = 10;
int[x] err;

Unlike dynamic arrays, static arrays can be initialized with array literals with no allocation taking place on the GC heap. The lengths must match, otherwise the compiler will emit an error.

@nogc void main() {
    int[3] nums = [1, 2, 3];
}

Static arrays are automatically sliced when passed to any function taking a slice as a parameter, making them interchangeable with dynamic arrays.

void printNums(int[] nums) {
    import std.stdio : writeln;
    writeln(nums);
}

void main() {
    int[]  dnums = [0, 1, 2];
    int[3] snums = [0, 1, 2];
    printNums(dnums);
    printNums(snums);
}

When compiling with -vgc to find the potential GC allocations in a program and eliminate them where possible, this is an easy win. Just be wary of situations like the following:

int[] foo() {
    auto nums = [0, 1, 2];

    // Do work with nums...

    return nums;
}

Converting nums in this example to a static array would be a mistake. The return statement in that case would be returning a slice to stack-allocated memory, which is a programming error. Luckily, doing so will generate a compiler error.

On the other hand, if the return is conditional, it may be desirable to heap-allocate the array only when absolutely necessary rather than every time the function is called. In that scenario, a static array can be declared locally and a dynamic copy made on return. Enter the .dup property:

int[] foo() {
    int[3] nums = [0, 1, 2];
    
    // Let x = the result of some work with nums
    bool condtion = x;

    if(condition) return nums.dup;
    else return [];
}

This function still uses the GC via .dup, but only allocates if it needs to and avoids allocation when it doesn’t. Note that [] is equivalent to null in this case, a slice (or dynamic array) with a .length of 0 and a .ptr of null.

Structs vs. classes

Struct instances in D are allocated on the stack by default, but can be allocated on the heap when desired. Stack-allocated structs have deterministic destruction, with their destructors called as soon as the enclosing scope exits.

struct Foo {
    int x;
    ~this() {
        import std.stdio;
        writefln("#%s says bye!", x);
    }
}
void main() {
    Foo f1 = Foo(1);
    Foo f2 = Foo(2);
    Foo f3 = Foo(3);
}

As expected, this prints:

#3 says bye!
#2 says bye!
#1 says bye!

Classes, being reference types, are almost always allocated on the heap. Usually, that’s the GC heap via new, though it could also be the non-GC heap through a custom allocator. But there’s no rule saying they can’t be allocated on the stack. The standard library template std.typecons.scoped allows us to easily do so.

class Foo {
    int x;

    this(int x) { 
        this.x = x; 
    }
    
    ~this() {
        import std.stdio;
        writefln("#%s says bye!", x);
    }
}
void main() {
    import std.typecons : scoped;
    auto f1 = scoped!Foo(1);
    auto f2 = scoped!Foo(2);
    auto f3 = scoped!Foo(3);
}

Functionally, this is identical to the struct example above; it prints the same results. Deterministic destruction is achieved via the core.object.destroy function, which allows destructors to be called outside of GC collections.

Note that neither scoped nor destroy are currently usable in @nogc functions. This isn’t necessarily a problem, as a function doesn’t have to be annotated such to avoid the GC, but it can be a headache if you are trying to fit everything into a @nogc call tree. In future posts, we’ll look at some of the design issues that crop up when using @nogc and how to avoid them.

Generally, when implementing custom types in D, the choice between struct and class should be dependent on the type’s intended usage. POD types are obvious candidates for struct, whereas for types in something like a GUI system, where inheritance heirarchies and runtime interfaces are extremely useful, class is a more appropriate choice. Beyond those obvious cases, there are a number of other considerations which could be the focus of a separate blog post on the topic. For our purposes, just keep in mind that whether or not a type is implemented as a struct or a class need not always dictate whether or not instances can be allocated on the stack.

alloca

Given that D makes alloca available, it is also an option for stack allocation. This is a candidate especially for arrays when you want to avoid or eliminate a local GC allocation, but the array size is only known at run time. The following example allocates a dynamic array with a runtime size on the stack.

import core.stdc.stdlib : alloca;

void main() {
    size_t size = 10;
    void* mem = alloca(size);

    // Slice the memory block
    int[] arr = cast(int[])mem[0 .. size];
}

The same caution about using alloca in C applies here: be careful not to blow up the stack. And as with local static arrays, don’t return a slice of arr. Return arr.dup instead.

A simple example

Consider an implementation of a Queue data type. An idiomatic implementation in D is going to be a struct that’s templated on the type it’s intended to contain. In Java, collection usage is interface heavy and it’s recommended to declare an instance using an interface type rather than the implementation type. Structs in D can’t implement interfaces, but in many cases they can still be used to program to interfaces thanks to Design by Introspection (DbI). This allows programming to a common interface that is verified via compile-time introspection without the need for an interface type, so it can work with structs, classes and, thanks to Universal Function Call Syntax (UFCS), even free functions (when the functions are in scope).

D’s arrays are an obvious choice as the backing store for a Queue implementation. Moreover, there’s an opportunity to make the backing store a static array when a queue is intended to be bounded with a fixed size. Since it’s already a templated type, an additional parameter, a template value parameter with a default value can easily be added to decide at compile time if the array should be static or not and, if so, how much space it should require.

// A default Size of 0 means to use a dynamic array for the
// backing store; non-zero indicates a static array.
struct Queue(T, size_t Size = 0) 
{
    // This constant will be inferred as a boolean. By making it
    // public, a DbI template outside of this module can determine
    // whether or not the Queue might grow. 
    enum isFixedSize = Size > 0;

    void enqueue(T item) 
    {
        static if(isFixedSize) {
            assert(_itemCount < _items.length);
        }
        else {
            ensureCapacity();
        }
        push(item);
    }

    T dequeue() {
        assert(_itemCount != 0);
        static if(isFixedSize) {
            return pop();
        }
        else {
            auto ret = pop();
            ensurePacked();
            return ret;
        }
    }

    // Only available on a growable array
    static if(!isFixedSize) {
        void reserve(size_t capacity) { /* Allocate space for new items */ }
    }

private:   
    static if(isFixedSize) {
        T[Size] _items;     
    }
    else T[] _items;
    size_t _head, _tail;
    size_t _itemCount;

    void push(T item) { 
        /* Add item, update _head and _tail */
        static if(isFixedSize) { ... }
        else { ... }
    }

    T pop() { 
        /* Remove item, update _head and _tail */ 
        static if(isFixedSize) { ... }
        else { ... }
    }

    // These are only available on a growable array
    static if(!isFixedSize) {
        void ensureCapacity() { /* Alloc memory if needed */ }
        void ensurePacked() { /* Shrink the array if needed */}
    }
}

With this, the client can declare instances like so:

Queue!Foo qUnbounded;
Queue!(Foo, 128) qBounded;

qBounded requires no heap allocations. What happens with qUnbounded depends on the implementation. Moreover, compile-time introspection can be used to test if an instance is a fixed size or not. The isFixedSize constant is a convenience for that. Clients could alternatively use the built-in __traits(hasMember, T, "reserve") or the standard library function std.traits.hasMember!T("reserve") in one compile-time construct or another (__traits and std.traits are great for DbI, and the latter should be preferred when it provides similar functionality), but including the constant in the type is more convenient.

void doSomethingWithQueueInterface(T)(T queue)
{
    static if(T.isFixedSize) { ... }
    else { ... }
}

Conclusion

This has been a brief overview of a few options for stack allocation in D to avoid allocations from the GC heap. Making use of them when possible is an easy way to minimize the size and quantity of GC allocations, a proactive strategy for mitigating potential negative performance impacts from garbage collection.

The next post in this series will cover some of the options available for non-GC heap allocations.

Life in the Fast Lane

The first post I wrote in the GC series introduced the D garbage collector and the language features that use it. Two key points that I tried to get across in the article were:

  1. The GC can only run when memory allocations are requested. Contrary to popular misconception, the D GC isn’t generally going to decide to pause your Minecraft clone in the middle of the hot path. It will only run when memory from the GC heap is requested, and then only if it needs to.
  2. Simple C and C++ allocation strategies can mitigate GC pressure. Don’t allocate memory in inner loops – preallocate as much as possible, or fetch it from the stack instead. Minimize the total number of heap allocations. These strategies work because of point #1 above. The programmer can dictate when it is possible for a collection to occur simply by being smart about when GC heap allocations are made.

The strategies in point #2 are fine for code that a programmer writes herself, but they aren’t going to help at all with third-party libraries. For those situations, D provides built-in mechanisms to guarantee that no GC allocations can occur, both in the language and the runtime. There are also command-line options that can help make sure the GC stays out of the way.

Let’s imagine a hypothetical programmer named J.P. who, for reasons he considers valid, has decided he would like to avoid garbage collection completely in his D program. He has two immediate options.

The GC chill pill

One option is to make a call to GC.disable when the program is starting up. This doesn’t stop allocations, but puts a hold on collections. That means all collections, including any that may result from allocations in other threads.

void main() {
    import core.memory;
    import std.stdio;
    GC.disable;
    writeln("Goodbye, GC!");
}

Output:

Goodbye, GC!

This has the benefit that all language features making use of the GC heap will still work as expected. But, considering that allocations are still going without any cleanup, when you do the math you’ll realize this might be problematic. If allocations start to get out of hand, something’s gotta give. From the documentation:

Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition.

Depending on J.P.’s perspective, this might not be a good thing. But if this constraint is acceptable, there are some additional steps that can help keep things under control. J.P. can make calls to GC.enable or GC.collect as necessary. This provides greater control over collection cycles than the simple C and C++ allocation strategies.

The GC wall

When the GC is simply intolerable, J.P. can turn to the @nogc attribute. Slap it at the front of the main function and thou shalt suffer no collections.

@nogc
void main() { ... }

This is the ultimate GC mitigation strategy. @nogc applied to main will guarantee that the garbage collector will never run anywhere further along the callstack. No more caveats about collecting “where the implementation deems necessary”.

At first blush, this may appear to be a much better option than GC.disable. Let’s try it out.

@nogc
void main() {
    import std.stdio;
    writeln("GC be gone!");
}

This time, we aren’t going to get past compilation:

Error: @nogc function 'D main' cannot call non-@nogc function 'std.stdio.writeln!string.writeln'

What makes @nogc tick is the compiler’s ability to enforce it. It’s a very blunt approach. If a function is annotated with @nogc, then any function called from inside it must also be annotated with @nogc. As may be obvious, writeln is not.

That’s not all:

@nogc 
void main() {
    auto ints = new int[](100);
}

The compiler isn’t going to let you get away with that one either.

Error: cannot use 'new' in @nogc function 'D main'

Any language feature that allocates from the GC heap is out of reach inside a function marked @nogc (refer to the first post in this series for an overview of those features). It’s turtles all the way down. The big benefit here is that it guarantees that third-party code can’t use those features either, so can’t be allocating GC memory behind your back. Another downside is that any third-party library that is not @nogc aware is not going to be available in your program.

Using this approach requires a number of workarounds to make up for non-@nogc language features and library functions, including several in the standard library. Some are trivial, some are not, and others can’t be worked around at all (we’ll dive into the details in a future post). One example that might not be obvious is throwing an exception. The idiomatic way is:

throw new Exception("Blah");

Because of the new in that line, this isn’t possible in @nogc functions. Getting around this requires preallocating any exceptions that will be thrown, which in turn runs into the issue that any exception memory allocated from the regular heap still needs to be deallocated, which leads to ideas of reference counting or stack allocation… In other words, it’s a big can of worms. There’s currently a D Improvement Proposal from Walter Bright intended to stuff all the worms back into the can by making throw new Exception work without the GC when it needs to.

It’s not an insurmountable task to get around the limitations of @nogc main, it just requires a good bit of motivation and dedication.

One more thing to note about @nogc main is that it doesn’t banish the GC from the program completely. D has support for static constructors and destructors. The former are executed by the runtime before entering main and the latter upon exiting. If any of these exist in the program and are not annotated with @nogc, then GC allocations and collections can technically be present in the program. Still, @nogc applied to main means there won’t be any collections running once main is entered, so it’s effectively the same as having no GC at all.

Working it out

Here’s where I’m going to offer an opinion. There’s a wide range of programs that can be written in D without disabling or cutting the GC off completely. The simple strategies of minimizing GC allocations and keeping them out of the hot path will get a lot of mileage and should be preferred. It can’t be repeated enough given how often it’s misunderstood: D’s GC will only have a chance to run when the programmer allocates GC memory and it will only run if it needs to. Use that knowledge to your advantage by keeping the allocations small, infrequent, and isolated outside your inner loops.

For those programs where more control is actually needed, it probably isn’t going to be necessary to avoid the GC entirely. Judicious use of @nogc and/or the core.memory.GC API can often serve to avoid any performance issues that may arise. Don’t put @nogc on main, put it on the functions where you really want to disallow GC allocations. Don’t call GC.disable at the beginning of the program. Call it instead before entering a critical path, then call GC.enable when leaving that path. Force collections at strategic points, such as between game levels, with GC.collect.

As with any performance tuning strategy in software development, it pays to understand as fully as possible what’s actually happening under the hood. Adding calls to the core.memory.GC API in places where you think they make sense could potentially make the GC do needless work, or have no impact at all. Better understanding can be achieved with a little help from the toolchain.

The DRuntime GC option --DRT-gcopt=profile:1 can be passed to a compiled program (not to the compiler!) for some tune-up assistance. This will report some useful GC profiling data, such as the total number of collections and the total collection time.

To demonstrate, gcstat.d appends twenty values to a dynamic array of integers.

void main() {
    import std.stdio;
    int[] ints;
    foreach(i; 0 .. 20) {
        ints ~= i;
    }
    writeln(ints);
}

Compiling and running with the GC profile switch:

dmd gcstat.d
gcstat --DRT-gcopt=profile:1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
        Number of collections:  1
        Total GC prep time:  0 milliseconds
        Total mark time:  0 milliseconds
        Total sweep time:  0 milliseconds
        Total page recovery time:  0 milliseconds
        Max Pause Time:  0 milliseconds
        Grand total GC time:  0 milliseconds
GC summary:    1 MB,    1 GC    0 ms, Pauses    0 ms <    0 ms

This reports one collection, which almost certainly happened as the program was shutting down. The runtime terminates the GC as it exits which, in the current implementation, will generally trigger a collection. This is done primarily to run destructors on collected objects, even though D does not require destructors of GC-allocated objects to ever be run (a topic for a future post).

DMD supports a command-line option, -vgc, that will display every GC allocation in a program, including those that are hidden behind language features like the array append operator.

To demonstrate, take a look at inner.d:

void printInts(int[] delegate() dg)
{
    import std.stdio;
    foreach(i; dg()) writeln(i);
} 

void main() {
    int[] ints;
    auto makeInts() {
        foreach(i; 0 .. 20) {
            ints ~= i;
        }
        return ints;
    }

    printInts(&makeInts);
}

Here, makeInts is an inner function. A pointer to a non-static inner function is not a function pointer, but a delegate (a context pointer/function pointer pair; if an inner function is static, a pointer of type function is produced instead). In this particular case, the delegate makes use of a variable in its parent scope. Here’s the output of compiling with -vgc:

dmd -vgc inner.d
inner.d(11): vgc: operator ~= may cause GC allocation
inner.d(7): vgc: using closure causes GC allocation

What we’re seeing here is that memory needs to be allocated so that the delegate can carry the state of ints, making it a closure (which is not itself a type – the type is still delegate). Move the declaration of ints inside the scope of makeInts and recompile. You’ll find that the closure allocation goes away. A better option is to change the declaration of printInts to look like this:

void printInts(scope int[] delegate() dg)

Adding scope to any function parameter ensures that any references in the parameter cannot be escaped. In other words, it now becomes impossible to do something like assign dg to a global variable, or return it from the function. The effect is that there is no longer a need to create a closure, so there will be no allocation. See the documentation for more on function pointers, delegates and closures, and function parameter storage classes.

The gist

Given that the D GC is very different from those in languages like Java and C#, it’s certain to have different performance characteristics. Moreover, D programs tend to produce far less garbage than those written in a language like Java, where almost everything is a reference type. It helps to understand this when embarking on a D project for the first time. The strategies an experienced Java programmer uses to mitigate the impact of collections aren’t likley to apply here.

While there is certainly a class of software in which no GC pauses are ever acceptable, that is an arguably small set. Most D projects can, and should, start out with the simple mitigation strategies from point #2 at the top of this article, then adapt the code to use @nogc or core.memory.GC as and when performance dictates. The command-line options demonstrated here can help ferret out the areas where that may be necessary.

As time goes by, it’s going to become easier to micromanage garbage collection in D programs. There’s a concerted effort underway to make Phobos, D’s standard library, as @nogc-friendly as possible. Language improvements such as Walter’s proposal to modify how exceptions are allocated should speed that work considerably.

Future posts in this series will look at how to allocate memory outside of the GC heap and use it alongside GC allocations in the same program, how to compensate for disabled language features in @nogc code, strategies for handling the interaction of the GC with object destructors, and more.

Thanks to Vladimir Panteleev, Guillaume Piolat, and Steven Schveighoffer for their valuable feedback on drafts of this article.

The article has been amended to remove a misleading line about Java and C#, and to add some information about multiple threads.

automem: Hands-Free RAII for D

Átila Neves has used both C++ and D professionally. He’s responsible for several D libraries and tools, like unit-threaded, cerealed, and reggae.


Garbage collected languages tend to suffer from a framing problem, and D is no exception. Its inclusion of a mark-and-sweep garbage collector makes safe memory management easy and convenient, but, thanks to a widespread perception that GC in general is a performance killer, alienates a lot of potential users due to its mere existence.

Coming to D as I did from C++, the one thing I didn’t like about the language initially was the GC. I’ve since come to realize that my fears were mostly unfounded, but the fact remains that, for many people, the GC is reason enough to avoid the language. Whether or not that is reasonable given their use cases is debatable (and something over which reasonable people may disagree), but the existence of the perception is not.

A lot of work has been done over the years to make sure that D code can be written which doesn’t depend on the GC. The @nogc annotation is especially important here, and I don’t think it’s been publicized enough. A @nogc main function is a compile-time guarantee that the program will not ever allocate any GC memory. For the types of applications that need those sorts of guarantees, this is invaluable.

But if not allocating from the GC heap, where does one get the memory? Still in the experimental package of the standard library, std.experimental.allocator provides building blocks for composing allocators that should satisfy any and all memory allocation needs where the GC is deemed inappropriate. Better still, via the IAllocator interface, one can even switch between GC and custom allocation strategies as needed at runtime.

I’ve recently used std.experimental.allocator in order to achieve @nogc guarantees and, while it works, there’s one area in which the experience wasn’t as smooth as when using C++ or Rust: disposing of memory. Like C++ and Rust, D has RAII. As is usual in all three, it’s considered bad form to explicitly release resources. And yet, in the current state of affairs, while using the D standard library one has to manually dispose of memory if using std.experimental.allocator. D makes it easier than most languages that support exceptions, due to scope(exit), but in a language with RAII that’s just boilerplate. And as the good lazy programmer that I am, I abhor writing code that doesn’t need to be, and shouldn’t be, written. An itch developed.

The inspiration for the solution I came up with was C++; ever since C++11 I’ve been delighted with using std::unique_ptr and std::shared_ptr and basically never again worrying about manually managing memory. D’s standard library has Unique and RefCounted in std.typecons but they predate std.experimental.allocator and so “bake in” the allocation strategy. Can we have our allocation cake and eat it too?

Enter automem, a library I wrote providing C++-style smart pointers that integrate with std.experimental.allocator. It was clear to me that the design had to be different from the smart pointers it took inspiration from. In C++, it’s assumed that memory is allocated with new and freed with delete (although it’s possible to override both). With custom allocators and no real obvious default choice, I made it so that the smart pointers would allocate memory themselves. This makes it so one can’t allocate with one allocator and deallocate with a different one, which is another benefit.

Another goal was to preserve the possibility of Unique, like std::unique_ptr, to be a zero-cost abstraction. In that sense the allocator type must be specified (it defaults to IAllocator); if it’s a value type with no state, then it takes up no space. In fact, if it’s a singleton (determined at compile-time by probing where Allocator.instance exists), then it doesn’t even need to be passed in to the constructor! As in much modern D code, Design by Instropection pays its dues here. Example code:

struct Point {
    int x;
    int y;
}

{
    // must pass arguments to initialise the contained object
    // but not an allocator instance since Mallocator is
    // a singleton (Mallocator.instance) returns the only
    // instantiation
    
    auto u1 = Unique!(Point, Mallocator)(2, 3);
    assert(*u1 == Point(2, 3));
    assert(u1.y == 3); // forwards to the contained object

    // auto u2 = u1; // won't compile, can only move
    typeof(u1) u2;
    move(u1, u2);
    assert(cast(bool)u1 == false); // u1 is now empty
}
// memory freed for the Point structure created in the block

RefCounted is automem’s equivalent of C++’s std::shared_ptr. Unlike std::shared_ptr however, it doesn’t always do an atomic reference count increment/decrement. The reason is that it leverage’s D’s type system to determine when it has to; if the payload is shared, then the reference count is changed atomically. If not, it can’t be sent to other threads anyway and the performance penalty doesn’t have to be paid. C++ always does an atomic increment/decrement. Rust gets around this with two types, Arc and Rc. In D the type system disambiguates. Another win for Design by Introspection, something that really is only possible in D. Example code:

{
    auto s1 = RefCounted!(Point, Mallocator)(4, 5);
    assert(*s1 == Point(4, 5));
    assert(s1.x == 4);
    {
        auto s2 = s1; // can be copied, non-atomic reference count
    } // ref count goes to 1 here

} // ref count goes to 0 here, memory released

Given that the allocator type is usually specified, it means that when using a @nogc allocator (most of them), the code using automem can itself be made @nogc, with RAII taking care of any memory management duties. That means compile-time guarantees of no GC allocation for the applications that need them.

I hope automem and std.experimental.allocator manage to solve D’s GC framing problem. Now it should be possible to write @nogc code with no manual memory disposal in D, just as it is in C++ and Rust.

Don’t Fear the Reaper

D, like many other programming languages in active use today, comes with a garbage collector out of the box. There are many types of software that can be written without worrying at all about the GC, taking full advantage of its benefits. But the GC does have drawbacks, and there are certainly scenarios in which garbage collection is undesirable. For those situations, the language provides the means to temporarily disable it, or even avoid it completely.

In order to maximize the positive impacts of garbage collection and minimize any negative, it’s necessary to have a good grounding in how the GC works in D. A good place to start is the Garbage Collection page at dlang.org, which outlines the rationale for D’s GC and provides some tips on working with it. This post is the first in a series intended to expand on the information provided on that page.

This time, we’ll look at the very basics, focusing on the language features that can trigger GC allocations. Future posts will introduce ways to disable the GC when necessary, as well as idioms useful in dealing with its nondeterministic nature (e.g. managing resources in the destructors of GC-managed objects).

The first thing to understand about D’s garbage collector is that it only runs during allocation, and only if there is no memory available to allocate. It isn’t sitting in the background, periodically scanning the heap and collecting garbage. This knowledge is fundamental to writing code that makes efficient use of GC-managed memory. Take the following example:

void main() {
    int[] ints;
    foreach(i; 0..100) {
        ints ~= i;
    }
}

This declares a dynamic array of int, then uses D’s append operator to append the numbers 0 to 99 in a foreach range loop. What isn’t obvious to the untrained eye is that the append operator makes use of the GC heap to allocate space for the values it adds to the array.

DRuntime’s array implementation isn’t a dumb one. In the example, there aren’t going to be one hundred allocations, one for each value. When more memory is needed, the implementation will allocate more space than is requested. In this particular case, it’s possible to determine how many allocations are actually made by making use of the capacity property of D’s dynamic arrays and slices. This returns the total number of elements the array can hold before an allocation is necessary.

void main() {
    import std.stdio : writefln;
    int[] ints;
    size_t before, after;
    foreach(i; 0..100) {
        before = ints.capacity;
        ints ~= i;
        after = ints.capacity;
        if(before != after) {
            writefln("Before: %s After: %s",
                before, after);
        }
    }
}

Executing this when compiled with DMD 2.073.2 shows the message is printed six times, meaning there were six total GC allocations in the loop. That means there were six opportunities for the GC to collect garbage. In this small example, it almost certainly didn’t. If this loop were part of a larger program, with GC allocations throughout, it very well might.

On a side note, it’s informative to examine the values of before and after. Doing so shows a sequence of 0, 3, 7, 15, 31, 63, and 127. So by the end, ints contains 100 values and has space for appending 27 more before the next allocation, which, extrapolating from the values in the sequence, should be 255. That’s an implementation detail of DRuntime, though, and could be tweaked or changed in any release. For more details on how arrays and slices are managed by the GC, take a look at Steven Schveighoffer’s excellent article on the topic.

So, six allocations, six opportunities for the GC to initiate one of its pauses of unpredictable length even in that simple little loop. In the general case, that could become an issue depending on if the loop is in a hot part of code and how much total memory is allocated from the GC heap. But even then, it’s not necessarily a reason to disable the GC in that part of the code.

Even with languages that don’t come with a stock GC out of the box, like C and C++, most programmers learn at some point that it’s better for overall performance to allocate as much as possible up front and minimize allocations in the inner loops. It’s one of the many types of premature optimization that are not actually the root of all evil, something we tend to call best practice. Given that D’s GC only runs when memory is allocated, the same strategy can be applied as a simple way to mitigate any potential impact it could have on performance. Here’s one way to rewrite the example:

void main() {
    int[] ints = new int[](100);
    foreach(i; 0..100) {
        ints[i] = i;
    }
}

Now we’ve gone from six allocations down to one. The only opportunity the GC has to run is before the inner loop. This actually allocates space for at least 100 values and initializes them all to 0 before entering the loop. The array will have a length of 100 after new, but will almost certainly have additional capacity.

There’s an alternative to new for arrays: the reserve function:

void main() {
    int[] ints;
    ints.reserve(100);
    foreach(i; 0..100) {
        ints ~= i;
    }
}

This allocates memory for at least 100 values, but the array is still empty (its length property will return 0) when it returns, so nothing is default initialized. Given that the loop only appends 100 values, it’s guaranteed not to allocate.

In addition to new and reserve, it’s possible to call GC.malloc directly for explicit allocation.

import core.memory;
void* intsPtr = GC.malloc(int.sizeof * 100);
auto ints = (cast(int*)intsPtr)[0 .. 100];

Array literals will usually allocate.

auto ints = [0, 1, 2];

This is also true when an array literal enum is used.

enum intsLiteral = [0, 1, 2];
auto ints1 = intsLiteral;
auto ints2 = intsLiteral;

An enum exists only at compile time and has no memory address. Its name is a synonym for its value. Everywhere you use one, it’s like copying and pasting its value in place of its name. Both ints1 and ints2 trigger allocations exactly as if they were declared like so:

auto ints1 = [0, 1, 2];
auto ints2 = [0, 1, 2];

Array literals do not allocate when the target is a static array. Also, string literals (strings in D are arrays under the hood) are an exception to the rule.

int[3] noAlloc1 = [0, 1, 2];
auto noAlloc2 = "No Allocation!";

The concatenate operator will always allocate:

auto a1 = [0, 1, 2];
auto a2 = [3, 4, 5];
auto a3 = a1 ~ a2;

D’s associative arrays have their own allocation strategy, but you can expect them to allocate when items are added and potentially when removed. They also expose two properties, keys and values, which both allocate arrays and fill them with copies of the respective items. When its desirable to modify the underlying associative array during iteration, or when the items need to be sorted or otherwise manipulated independently of the associative array, these properties are just what the doctor ordered. Otherwise, they’re needless allocations that put undue pressure on the GC.

When the GC does run, the total amount of memory it needs to scan is going to determine how long it takes. The smaller, the better. Avoiding unnecessary allocations isn’t going to hurt anyone and is another good mitigation strategy. D’s associative arrays provide three properties that help do just that: byKey, byValue, and byKeyValue. These each return forward ranges that can be iterated lazily. They do not allocate because they actually refer to the items in the associative array, so it should not be modified while iterating them. For more on ranges, see the chapters titled Ranges and More Ranges in Ali Çehreli’s Programming in D.

Closures, which are delegates or function literals that need to carry around a pointer to the local stack frame, may also allocate. The last allocating language feature listed on the Garbage Collection page is the assertion. An assertion will allocate when it fails because it needs to throw an AssertError, which is part of D’s class-based exception hierarchy (we’ll look at how classes interact with the GC in a future post).

Then there’s Phobos, D’s standard library. Once upon a time, much of Phobos was implemented with little concern for GC allocations, making it difficult to use in situations where they were undesirable. However, a massive effort was initiated
to make it more conservative in its GC usage. Some functions were made to work with lazy ranges, others were rewritten to take buffers supplied by the caller, and some were reimplemented to avoid unnecessary allocations internally. The result is a standard library more amenable to GC-free code (though there are still probably some corners of the library that have not yet been renovated — PRs welcome).

Now that we’ve seen the basics of using the GC, the next post in this series looks at the tools the language and the compiler provide for turning the GC off and making sure specific sections of code are GC-free.

Thanks to Guillaume Piolat and Steven Schveighoffer for their help with this article.