Life in the Fast Lane

The first post I wrote in the GC series introduced the D garbage collector and the language features that use it. Two key points that I tried to get across in the article were:

  1. The GC can only run when memory allocations are requested. Contrary to popular misconception, the D GC isn’t generally going to decide to pause your Minecraft clone in the middle of the hot path. It will only run when memory from the GC heap is requested, and then only if it needs to.
  2. Simple C and C++ allocation strategies can mitigate GC pressure. Don’t allocate memory in inner loops – preallocate as much as possible, or fetch it from the stack instead. Minimize the total number of heap allocations. These strategies work because of point #1 above. The programmer can dictate when it is possible for a collection to occur simply by being smart about when GC heap allocations are made.

The strategies in point #2 are fine for code that a programmer writes herself, but they aren’t going to help at all with third-party libraries. For those situations, D provides built-in mechanisms to guarantee that no GC allocations can occur, both in the language and the runtime. There are also command-line options that can help make sure the GC stays out of the way.

Let’s imagine a hypothetical programmer named J.P. who, for reasons he considers valid, has decided he would like to avoid garbage collection completely in his D program. He has two immediate options.

The GC chill pill

One option is to make a call to GC.disable when the program is starting up. This doesn’t stop allocations, but puts a hold on collections. That means all collections, including any that may result from allocations in other threads.

void main() {
    import core.memory;
    import std.stdio;
    GC.disable;
    writeln("Goodbye, GC!");
}

Output:

Goodbye, GC!

This has the benefit that all language features making use of the GC heap will still work as expected. But allocations are still happening with no cleanup to balance them, and if they start to get out of hand, something's gotta give. From the documentation:

Collections may continue to occur in instances where the implementation deems necessary for correct program behavior, such as during an out of memory condition.

Depending on J.P.’s perspective, this might not be a good thing. But if this constraint is acceptable, there are some additional steps that can help keep things under control. J.P. can make calls to GC.enable or GC.collect as necessary. This provides greater control over collection cycles than the simple C and C++ allocation strategies.
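For example, J.P. might disable the GC on startup and force collections himself at points where a pause is acceptable. The following is an illustrative sketch, not a prescription; the frame loop and the every-100-frames schedule are made up for the example:

import core.memory;

void main() {
    GC.disable;          // put collections on hold

    foreach(frame; 0 .. 1_000) {
        // Allocations here still succeed, but won't trigger a collection
        // (barring the out-of-memory caveat from the documentation above).
        auto buffer = new ubyte[](256);

        // Force a collection at points J.P. chooses, e.g. every 100 frames.
        // GC.collect runs even while collections are otherwise disabled.
        if(frame % 100 == 0) GC.collect();
    }

    GC.enable;           // let the GC manage itself again
}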

The GC wall

When the GC is simply intolerable, J.P. can turn to the @nogc attribute. Slap it at the front of the main function and thou shalt suffer no collections.

@nogc
void main() { ... }

This is the ultimate GC mitigation strategy. @nogc applied to main will guarantee that the garbage collector will never run anywhere further along the callstack. No more caveats about collecting “where the implementation deems necessary”.

At first blush, this may appear to be a much better option than GC.disable. Let’s try it out.

@nogc
void main() {
    import std.stdio;
    writeln("GC be gone!");
}

This time, we aren’t going to get past compilation:

Error: @nogc function 'D main' cannot call non-@nogc function 'std.stdio.writeln!string.writeln'

What makes @nogc tick is the compiler’s ability to enforce it. It’s a very blunt approach. If a function is annotated with @nogc, then any function called from inside it must also be annotated with @nogc. As may be obvious, writeln is not.

That’s not all:

@nogc 
void main() {
    auto ints = new int[](100);
}

The compiler isn’t going to let you get away with that one either.

Error: cannot use 'new' in @nogc function 'D main'

Any language feature that allocates from the GC heap is out of reach inside a function marked @nogc (refer to the first post in this series for an overview of those features). It's turtles all the way down. The big benefit is the guarantee that third-party code can't use those features either, so it can't be allocating GC memory behind your back. The downside is that any third-party library that is not @nogc aware is not going to be available in your program.

Using this approach requires a number of workarounds to make up for non-@nogc language features and library functions, including several in the standard library. Some are trivial, some are not, and others can’t be worked around at all (we’ll dive into the details in a future post). One example that might not be obvious is throwing an exception. The idiomatic way is:

throw new Exception("Blah");

Because of the new in that line, this isn’t possible in @nogc functions. Getting around this requires preallocating any exceptions that will be thrown, which in turn runs into the issue that any exception memory allocated from the regular heap still needs to be deallocated, which leads to ideas of reference counting or stack allocation… In other words, it’s a big can of worms. There’s currently a D Improvement Proposal from Walter Bright intended to stuff all the worms back into the can by making throw new Exception work without the GC when it needs to.
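Until then, the workaround looks something like the following sketch. The names g_outOfData and mayFail are made up for illustration; the key point is that the exception is allocated outside any @nogc code and only thrown inside it:

// Allocated once at startup, outside any @nogc code. Note that this is
// a thread-local variable and static this() runs once per thread.
Exception g_outOfData;

static this() {
    g_outOfData = new Exception("Out of data");
}

void mayFail(bool fail) @nogc {
    // Throwing an existing instance is fine in @nogc code; it's the
    // `new` that's forbidden. The same instance is reused for every
    // throw, which is the main compromise of this approach.
    if(fail) throw g_outOfData;
}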

It’s not an insurmountable task to get around the limitations of @nogc main; it just requires a good bit of motivation and dedication.

One more thing to note about @nogc main is that it doesn’t banish the GC from the program completely. D has support for static constructors and destructors. The former are executed by the runtime before entering main and the latter upon exiting. If any of these exist in the program and are not annotated with @nogc, then GC allocations and collections can technically be present in the program. Still, @nogc applied to main means there won’t be any collections running once main is entered, so it’s effectively the same as having no GC at all.
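Here's a short example of that caveat. Even though main is annotated with @nogc, the module constructor below allocates from the GC heap before main is ever entered:

// A module constructor; the runtime executes this before main.
int[] table;

static this() {
    table = new int[](64);   // GC allocation, despite @nogc main
}

@nogc
void main() {
    // No GC allocations or collections can be triggered from here on.
}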

Working it out

Here’s where I’m going to offer an opinion. There’s a wide range of programs that can be written in D without disabling or cutting the GC off completely. The simple strategies of minimizing GC allocations and keeping them out of the hot path will get a lot of mileage and should be preferred. It can’t be repeated enough given how often it’s misunderstood: D’s GC will only have a chance to run when the programmer allocates GC memory and it will only run if it needs to. Use that knowledge to your advantage by keeping the allocations small, infrequent, and isolated outside your inner loops.

For those programs where more control is actually needed, it probably isn’t going to be necessary to avoid the GC entirely. Judicious use of @nogc and/or the core.memory.GC API can often serve to avoid any performance issues that may arise. Don’t put @nogc on main, put it on the functions where you really want to disallow GC allocations. Don’t call GC.disable at the beginning of the program. Call it instead before entering a critical path, then call GC.enable when leaving that path. Force collections at strategic points, such as between game levels, with GC.collect.

As with any performance tuning strategy in software development, it pays to understand as fully as possible what’s actually happening under the hood. Adding calls to the core.memory.GC API in places where you think they make sense could potentially make the GC do needless work, or have no impact at all. Better understanding can be achieved with a little help from the toolchain.

The DRuntime GC option --DRT-gcopt=profile:1 can be passed to a compiled program (not to the compiler!) for some tune-up assistance. This will report some useful GC profiling data, such as the total number of collections and the total collection time.

To demonstrate, gcstat.d appends twenty values to a dynamic array of integers.

void main() {
    import std.stdio;
    int[] ints;
    foreach(i; 0 .. 20) {
        ints ~= i;
    }
    writeln(ints);
}

Compiling and running with the GC profile switch:

dmd gcstat.d
gcstat --DRT-gcopt=profile:1
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
        Number of collections:  1
        Total GC prep time:  0 milliseconds
        Total mark time:  0 milliseconds
        Total sweep time:  0 milliseconds
        Total page recovery time:  0 milliseconds
        Max Pause Time:  0 milliseconds
        Grand total GC time:  0 milliseconds
GC summary:    1 MB,    1 GC    0 ms, Pauses    0 ms <    0 ms

This reports one collection, which almost certainly happened as the program was shutting down. The runtime terminates the GC as it exits which, in the current implementation, will generally trigger a collection. This is done primarily to run destructors on collected objects, even though D does not require destructors of GC-allocated objects to ever be run (a topic for a future post).

DMD supports a command-line option, -vgc, that will display every GC allocation in a program, including those that are hidden behind language features like the array append operator.

To demonstrate, take a look at inner.d:

void printInts(int[] delegate() dg)
{
    import std.stdio;
    foreach(i; dg()) writeln(i);
} 

void main() {
    int[] ints;
    auto makeInts() {
        foreach(i; 0 .. 20) {
            ints ~= i;
        }
        return ints;
    }

    printInts(&makeInts);
}

Here, makeInts is an inner function. A pointer to a non-static inner function is not a function pointer, but a delegate (a context pointer/function pointer pair; if an inner function is static, a pointer of type function is produced instead). In this particular case, the delegate makes use of a variable in its parent scope. Here’s the output of compiling with -vgc:

dmd -vgc inner.d
inner.d(11): vgc: operator ~= may cause GC allocation
inner.d(7): vgc: using closure causes GC allocation

What we’re seeing here is that memory needs to be allocated so that the delegate can carry the state of ints, making it a closure (which is not itself a type – the type is still delegate). Move the declaration of ints inside the scope of makeInts and recompile. You’ll find that the closure allocation goes away. A better option is to change the declaration of printInts to look like this:

void printInts(scope int[] delegate() dg)

Adding scope to any function parameter ensures that any references in the parameter cannot be escaped. In other words, it now becomes impossible to do something like assign dg to a global variable, or return it from the function. The effect is that there is no longer a need to create a closure, so there will be no allocation. See the documentation for more on function pointers, delegates and closures, and function parameter storage classes.
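For reference, here's inner.d with that one change applied. Compiling this version with -vgc should report only the ~= allocation; the closure allocation is gone:

void printInts(scope int[] delegate() dg)
{
    import std.stdio;
    foreach(i; dg()) writeln(i);
}

void main() {
    int[] ints;
    auto makeInts() {
        foreach(i; 0 .. 20) {
            ints ~= i;
        }
        return ints;
    }

    printInts(&makeInts);
}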

The gist

Given that the D GC is very different from those in languages like Java and C#, it’s certain to have different performance characteristics. Moreover, D programs tend to produce far less garbage than those written in a language like Java, where almost everything is a reference type. It helps to understand this when embarking on a D project for the first time. The strategies an experienced Java programmer uses to mitigate the impact of collections aren’t likely to apply here.

While there is certainly a class of software in which no GC pauses are ever acceptable, that is an arguably small set. Most D projects can, and should, start out with the simple mitigation strategies from point #2 at the top of this article, then adapt the code to use @nogc or core.memory.GC as and when performance dictates. The command-line options demonstrated here can help ferret out the areas where that may be necessary.

As time goes by, it’s going to become easier to micromanage garbage collection in D programs. There’s a concerted effort underway to make Phobos, D’s standard library, as @nogc-friendly as possible. Language improvements such as Walter’s proposal to modify how exceptions are allocated should speed that work considerably.

Future posts in this series will look at how to allocate memory outside of the GC heap and use it alongside GC allocations in the same program, how to compensate for disabled language features in @nogc code, strategies for handling the interaction of the GC with object destructors, and more.

Thanks to Vladimir Panteleev, Guillaume Piolat, and Steven Schveighoffer for their valuable feedback on drafts of this article.

The article has been amended to remove a misleading line about Java and C#, and to add some information about multiple threads.

9 thoughts on “Life in the Fast Lane”

  1. Jo Hick

You should mention that in some cases we lose even more power. Equality comparisons and opEquals have problems with @nogc because the base implementation is not @nogc. We cannot make a class @nogc and override opEquals using the built-in object comparer, because that comparer is not @nogc. We also cannot disable the GC completely, because we will eventually run out of memory, but if we enable it, a large collection might occur. If we cannot control that (say, in a real-time app), it effectively becomes useless (better to try to minimize GC allocations but allow the GC to run often to amortize the effect).

These types of problems creep up constantly with @nogc because it was added on as an afterthought (relatively recently).

    1. Michael Parker Post author

      There are more posts coming in this series that will explore a range of GC-related topics, including issues like the one you mention. These first few posts I’m writing are introductory, so I’m trying to focus on big picture stuff without getting lost in the details. Later posts will dig deeper.

    1. Michael Parker Post author

      Yes, you can. You can publish translations of any post on this blog. You don’t need to ask permission for each one 🙂

  2. Chris Katko

    What’s really frustrating from an outsider’s perspective (who is eager to learn and harness the power of D in his applications), I still don’t feel like I understand what’s going on.

    Allow me to elaborate.

When you program something, you have to keep a set of ideas in your head. Assumptions/requirements/constraints/etc. You have to know the difference between stack and heap variables, value and reference types, and so on. The thing is, the more balls you juggle in the air, the more likely you’re going to eventually miss something and drop one of those balls… and some people simply cannot physically keep track of more than X ideas at once.

    So my issue with D is, the GC not only adds to the list of ideas, but MULTIPLIES the list of ideas. Now I have to think “will GC affect stack.”, “will GC affect an const, heap-allocated, inherited class, method in thread 2?”. And even worse, GC adds the TEMPORAL element to it. The GC may run and pause your thread when ANY THREAD decides to allocate (ala use a string or array). So your one function may look perfect, then some 3rd-party library calls a thread and blows the determinism to hell.

    C programmers (and many C++ programmers) want to feel confident that their code will act in a predictable way. And so far, it still seems like we have to jump through tons of hoops and read many articles just to get a grasp for WHY things are happening, and we’re still stuck with solutions that “mostly work in X situations but not Y” (another thing to remember while coding!).

    If I can’t succinctly explain to someone how D’s GC works in 60 seconds, or a ~3 paragraph forum post, most people simply aren’t going to be persuaded.

    I get that “most people won’t even notice GC problems.” But that really needs to be provable somehow instead of just saying “take my word for it.” If Facebook/Netflix/big project comes out and says “Yeah, we never really had problems with D’s GC.” that would be GREAT and you could simply post that link.

    But it seems like over-and-over GC is the biggest issue that pops up and the “solutions” to GC seem to explode the amount of complexity. “Did I remember to unlock my thread from the garbage collector?”

And one of the problems is still, even if I know I can disable the GC (or keep it), WHEN? WHEN do I choose? Is simply keeping tight-loops non-allocated fast enough in most cases? Or should I plan ahead and have X/Y/Z already using static pools to avoid the GC. I don’t know. It seems like the only solution is “benchmark it.” But that means I’m basically writing TWO versions of my same program in the off chance the GC is a problem and I have to switch to the non-GC version. I can’t feel confident about when and where to pick GC or noGC.

    Surely that comes with time and experience. Perhaps I’m nitpicking and C# people wouldn’t even care because they have no choice at all.

    That’s just my stream-of-consciousness on the matter as I’ve been looking into this on-and-off for at least a year or two now and I still don’t feel “confident” that I’ll be making the right choice without coding twice and testing it.

Either way, thanks for these articles. They’re definitely one of the top article series I’m following relating to D.

    1. Michael Parker Post author

      Hi, Chris. Thanks for commenting. I stand behind my position that most people can just pretend D’s GC isn’t there. It’s hard to say if or when it will be a problem for any given program without profiling and testing. Some programs may experience little hiccups here and there, but I’m convinced that the set of programs that must disable it is tiny. From that perspective, there is little of the complexity you are concerned about, just the two points I keep reiterating: minimize the size and the total number of allocations as you can to relieve the pressure. That’s it.

      When profiling shows that there are issues caused by the GC, then you need to know the details. But going beyond the basics into the realm of optimization increases the complexity of what you need to know in any language. The reason I’ve always preferred C and Java over C++ is because of the level of complexity C++ requires to really know how to use it effectively. I’m not interested in investing the time to learn it. D is nowhere near that complex, but it does have a few dark corners and some arcane incantations that you need to know if you want to squeeze the most out of it.

This series serves two purposes. First, some people start reflexively gagging when they hear the term “garbage collector”. This means they immediately start trying to figure out how to avoid it before they ever know if they need to. It’s the quintessential example of premature optimization. Ultimately, I’d like this series to show that D’s GC ain’t the evil beast some people think it is. Second, I hope to provide a starting point for those who do want to dig into the details and take more control over their memory management.

      Personally, for my own D code I never worry about it. There isn’t a single @nogc, or call to an external allocator, anywhere in my current project’s codebase (a 2D game engine I’ll be open sourcing with a game at some point). I’ve embraced the GC fully. I decided long ago that mucking around with @nogc just isn’t worth it for the kind of programs I want to write. I’ll be blogging about my GC usage once the code is out there.

      So yeah, until we have people out there writing about their firsthand D GC experiences, there’s little I can show to back up my claim that it isn’t going to matter for most people. However, I recently published a post highlighting Funkwerk’s passenger information system, which is actively running on several long-distance and local train platforms in Europe. Mario Kröplin said, “With garbage collection, changing to D is easy enough for both C++ and Java programmers.” So they’ve embraced it. The only issue regarding GC that came up in our discussion about problems they ran into: GC signals were interrupting their blocking socket reads. They worked around it and moved on. They were more concerned about the limitations of D’s unittest blocks. I’m sure if they had seen any major problems with the GC, he would have told me.

      At any rate, I’m hoping to include perspectives from other people in the GC series, including posts from people who use it fully and people who have needed to minimize its use. It’s still in early days yet.
