{"id":3128,"date":"2023-01-05T12:03:51","date_gmt":"2023-01-05T12:03:51","guid":{"rendered":"https:\/\/dlang.org\/blog\/?p=3128"},"modified":"2023-07-11T16:48:48","modified_gmt":"2023-07-11T16:48:48","slug":"memory-safety-in-a-systems-programming-language-part-3","status":"publish","type":"post","link":"https:\/\/dlang.org\/blog\/2023\/01\/05\/memory-safety-in-a-systems-programming-language-part-3\/","title":{"rendered":"Memory Safety in a Systems Programming Language Part 3"},"content":{"rendered":"<p><img loading=\"lazy\" src=\"https:\/\/dlang.org\/blog\/wp-content\/uploads\/2019\/03\/brain02.png\" alt=\"\" width=\"200\" height=\"200\" class=\"alignleft size-full wp-image-2025\" \/><\/p>\n<p><a href=\"https:\/\/dlang.org\/blog\/2022\/06\/21\/dip1000-memory-safety-in-a-modern-system-programming-language-pt-1\/\">The first entry in this series<\/a> shows how to use the new DIP1000 rules to have slices and pointers refer to the stack, all while being memory safe. <a href=\"https:\/\/dlang.org\/blog\/2022\/10\/08\/dip1000-memory-safety-in-a-modern-systems-programming-language-part-2\/\">The second entry in this series<\/a> teaches about the <code>ref<\/code> storage class and how DIP1000 works with aggregate types (classes, structs, and unions).<\/p>\n<p>So far the series has deliberately avoided templates and <code>auto<\/code> functions. This kept the first two posts simpler in that they did not have to deal with function attribute inference, which I have referred to as &#8220;attribute auto inference&#8221; in earlier posts. However, both <code>auto<\/code> functions and templates are very common in D code, so a series on DIP1000 can&#8217;t be complete without explaining how those features work with the language changes. 
Function attribute inference is our most important tool in avoiding so-called &#8220;attribute soup&#8221;, where a function is decorated with several attributes, which arguably decreases readability.<\/p>\n<p>We will also dig deeper into unsafe code. The previous two posts in this series focused on the <code>scope<\/code> attribute, but this post is more focused on attributes and memory safety in general. Since DIP1000 is ultimately about memory safety, we can&#8217;t get around discussing those topics.<\/p>\n<h2 id=\"avoidingrepetitionwithattributes\">Avoiding repetition with attributes<\/h2>\n<p><a href=\"https:\/\/dlang.org\/spec\/function.html#function-attribute-inference\">Function attribute inference<\/a> means that the language will analyze the body of a function and will automatically add the <code>@safe<\/code>, <code>pure<\/code>, <code>nothrow<\/code>, and <code>@nogc<\/code> attributes where applicable. It will also attempt to add <code>scope<\/code> or <code>return scope<\/code> attributes to parameters and <code>return ref<\/code> to <code>ref<\/code> parameters that can&#8217;t otherwise be compiled. Some attributes are never inferred. For instance, the compiler will not insert any <code>ref<\/code>, <code>lazy<\/code>, <code>out<\/code> or <code>@trusted<\/code> attributes, because very likely they are explicitly not wanted where they are left out.<\/p>\n<p>There are many ways to turn on function attribute inference. One is by omitting the return type in the function signature. Note that the <code>auto<\/code> keyword is not required for this. <code>auto<\/code> is a placeholder keyword used when no return type, storage class, or attribute is specified. For example, the declaration <code>half(int x) { return x\/2; }<\/code> does not parse, so we use <code>auto half(int x) { return x\/2; }<\/code> instead. 
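<\/p>\n<p>As a small sketch, the compiler can be asked what it inferred by using <code>functionAttributes<\/code> from <code>std.traits<\/code> inside <code>static assert<\/code>s (the name <code>halfExplicit<\/code> is made up for the comparison). The <code>auto<\/code> version should get all four attributes, while the same body with an explicit <code>int<\/code> return type gets none of them and stays <code>@system<\/code>:<\/p>\n<pre class=\"prettyprint lang-d\">import std.traits : attr = FunctionAttribute, functionAttributes;\n\n\/\/ Return type omitted, so attributes are inferred from the body.\nauto half(int x) { return x \/ 2; }\n\n\/\/ Explicit return type and not a template: nothing is inferred.\nint halfExplicit(int x) { return x \/ 2; }\n\nenum inferred = functionAttributes!half;\nstatic assert(inferred &amp; attr.safe);\nstatic assert(inferred &amp; attr.pure_);\nstatic assert(inferred &amp; attr.nothrow_);\nstatic assert(inferred &amp; attr.nogc);\n\nenum declared = functionAttributes!halfExplicit;\nstatic assert(!(declared &amp; attr.safe));\nstatic assert(!(declared &amp; attr.pure_));\nstatic assert(!(declared &amp; attr.nothrow_));\nstatic assert(!(declared &amp; attr.nogc));\n\nvoid main() {}<\/pre>\n<p>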
But we could just as well write <code>@safe half(int x) { return x\/2; }<\/code> and the rest of the attributes (<code>pure<\/code>, <code>nothrow<\/code>, and <code>@nogc<\/code>) will be inferred just as they are with the <code>auto<\/code> keyword.<\/p>\n<p>The second way to enable attribute inference is to templatize the function. With our <code>half<\/code> example, it can be done this way:<\/p>\n<pre class=\"prettyprint lang-d\">int divide(int denominator)(int x) { return x\/denominator; }\nalias half = divide!2;<\/pre>\n<p>The D spec does not require a template to have any parameters. An empty parameter list can be used to turn attribute inference on: <code>int half()(int x) { return x\/2; }<\/code>. Calling this function doesn&#8217;t even require the template instantiation syntax at the call site: <code>half(12)<\/code> compiles just as well as <code>half!()(12)<\/code>.<\/p>\n<p>Another way to turn on attribute inference is to declare the function inside another function. <a href=\"https:\/\/dlang.org\/spec\/function.html#nested\">These are called nested functions<\/a>. Inference is enabled not only on functions nested directly inside another function but also on most things nested in a type or a template inside the function. Example:<\/p>\n<pre class=\"prettyprint lang-d\">@safe void parentFun()\n{\n    \/\/ This is auto-inferred.\n    int half(int x){ return x\/2; }\n\n    class NestedType\n    {\n        \/\/ This is auto-inferred.\n        final int half1(int x) { return x\/2; }\n\n        \/\/ This is not auto-inferred; it's a\n        \/\/ virtual function and the compiler\n        \/\/ can't know if it has an unsafe override\n        \/\/ in a derived class.\n        int half2(int x) { return x\/2; }\n    }\n\n    int a = half(12); \/\/ Works. Inferred as @safe.\n    auto cl = new NestedType;\n    int b = cl.half1(18); \/\/ Works. 
Inferred as @safe.\n    int c = cl.half2(26); \/\/ Error.\n}<\/pre>\n<p>A downside of nested functions is that they can only be used in lexical order (the call site must be below the function declaration) unless both the nested function and the call are inside the same struct, class, union, or template that is in turn inside the parent function. Another downside is that they don&#8217;t work with <a href=\"https:\/\/dlang.org\/spec\/function.html#pseudo-member\">Uniform Function Call Syntax<\/a>.<\/p>\n<p>Finally, attribute inference is always enabled for <a href=\"https:\/\/dlang.org\/spec\/expression.html#FunctionLiteral\">function literals<\/a> (a.k.a. lambda functions). The halving function would be defined as <code>enum half = (int x) =&gt; x\/2;<\/code> and called exactly as normal. However, the language does not consider this declaration a function. It considers it a function pointer. This means that in global scope it&#8217;s important to use <code>enum<\/code> or <code>immutable<\/code> instead of <code>auto<\/code>. Otherwise, the lambda can be changed to something else from anywhere in the program and cannot be accessed from <code>pure<\/code> functions. In rare cases, such mutability can be desirable, but most often it is an antipattern (like global variables in general).<\/p>\n<h3 id=\"limitsofinference\">Limits of inference<\/h3>\n<p>Aiming for minimal manual typing isn&#8217;t always wise. Neither is aiming for maximal attribute bloat.<\/p>\n<p>The primary problem of auto inference is that subtle changes in the code can lead to inferred attributes turning on and off in an uncontrolled manner. To see when it matters, we need to have an idea of what will be inferred and what will not.<\/p>\n<p>The compiler in general will go to great lengths to infer <code>@safe<\/code>, <code>pure<\/code>, <code>nothrow<\/code>, and <code>@nogc<\/code> attributes. If your function <em>can<\/em> have those, it almost always will. 
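<\/p>\n<p>Each attribute is inferred on its own, so losing one does not cost the others. In the following sketch (the function names are made up for illustration), a GC-allocated array literal costs <code>@nogc<\/code> but nothing else, while throwing a freshly allocated exception costs both <code>nothrow<\/code> and <code>@nogc<\/code> but keeps <code>@safe<\/code> and <code>pure<\/code>:<\/p>\n<pre class=\"prettyprint lang-d\">import std.traits : attr = FunctionAttribute, functionAttributes;\n\n\/\/ Allocates with the GC: loses @nogc only.\nauto wrap(int x) { return [x]; }\n\n\/\/ Throws a GC-allocated exception: loses nothrow and @nogc.\nauto checked(int x)\n{\n    if (x &lt; 0) throw new Exception(&quot;negative&quot;);\n    return x;\n}\n\nenum wrapAttrs = functionAttributes!wrap;\nstatic assert(!(wrapAttrs &amp; attr.nogc));\nstatic assert(wrapAttrs &amp; attr.nothrow_);\nstatic assert(wrapAttrs &amp; attr.pure_);\nstatic assert(wrapAttrs &amp; attr.safe);\n\nenum checkedAttrs = functionAttributes!checked;\nstatic assert(!(checkedAttrs &amp; attr.nothrow_));\nstatic assert(!(checkedAttrs &amp; attr.nogc));\nstatic assert(checkedAttrs &amp; attr.pure_);\nstatic assert(checkedAttrs &amp; attr.safe);\n\nvoid main() {}<\/pre>\n<p>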
The specification says that recursion is an exception: a function calling itself should not be inferred as <code>@safe<\/code>, <code>pure<\/code>, or <code>nothrow<\/code> unless explicitly specified as such. But in my testing, I found those attributes actually are inferred for recursive functions. It turns out that there is an ongoing effort to get recursive attribute inference working, and it already partially works.<\/p>\n<p>Inference of <code>scope<\/code> and <code>return<\/code> on function parameters is less reliable. In the most mundane cases it&#8217;ll work, but the compiler gives up pretty quickly. The smarter the inference engine is, the more time it takes to compile, so the current design decision is to infer those attributes in only the simplest of cases.<\/p>\n<h3 id=\"wheretoletthecompilerinfer\">Where to let the compiler infer?<\/h3>\n<p>A D programmer should get into the habit of asking, &#8220;What will happen if I mistakenly do something that makes this function unsafe, impure, throwing, garbage-collecting, or escaping?&#8221; If the answer is &#8220;immediate compiler error&#8221;, auto inference is probably fine. On the other hand, the answer could be &#8220;user code will break when updating this library I&#8217;m maintaining&#8221;. In that case, annotate manually.<\/p>\n<p>In addition to the risk of losing attributes the author intends to apply, there is a second, subtler one:<\/p>\n<pre class=\"prettyprint lang-d\">@safe pure nothrow @nogc firstNewline(string from)\n{\n    foreach(i; 0 .. from.length) switch(from[i])\n    {\n        case '\\r':\n        if (from.length &gt; i + 1 &amp;&amp; from[i + 1] == '\\n')\n        {\n            return &quot;\\r\\n&quot;;\n        }\n        else return &quot;\\r&quot;;\n\n        case '\\n': return &quot;\\n&quot;;\n\n        default: break;\n    }\n\n    return &quot;&quot;;\n}<\/pre>\n<p>You might think that since the author is manually specifying the attributes, there&#8217;s no problem. 
Unfortunately, that&#8217;s wrong. Suppose the author decides to rewrite the function such that all the return values are slices of the <code>from<\/code> parameter rather than string literals:<\/p>\n<pre class=\"prettyprint lang-d\">@safe pure nothrow @nogc firstNewline(string from)\n{\n    foreach(i; 0 .. from.length) switch(from[i])\n    {\n        case '\\r':\n        if (from.length &gt; i + 1 &amp;&amp; from[i + 1] == '\\n')\n        {\n            return from[i .. i + 2];\n        }\n        else return from[i .. i + 1];\n\n        case '\\n': return from[i .. i + 1];\n\n        default: break;\n    }\n\n    return &quot;&quot;;\n}<\/pre>\n<p>Surprise! The parameter <code>from<\/code> was previously inferred as <code>scope<\/code>, and a library user was relying on that, but now it&#8217;s inferred as <code>return scope<\/code> instead, breaking client code.<\/p>\n<p>Still, for internal functions, auto inference is a great way to save both our fingers when writing and our eyes when reading. Note that it&#8217;s perfectly fine to rely on auto inference of the <code>@safe<\/code> attribute as long as the function is called from explicitly <code>@safe<\/code> functions or unit tests. If something potentially unsafe is done inside the auto-inferred function, it gets inferred as <code>@system<\/code>, not <code>@trusted<\/code>. Calling a <code>@system<\/code> function from a <code>@safe<\/code> function results in a compiler error, meaning auto inference is safe to rely on in this case.<\/p>\n<p>It still sometimes makes sense to manually apply attributes to internal functions, because the error messages generated when they are violated tend to be better with manual attributes.<\/p>\n<h3 id=\"whatabouttemplates\">What about templates?<\/h3>\n<p>Auto inference is always enabled for templated functions. What if a library interface needs to expose one? 
There is a way to block the inference, albeit an ugly one:<\/p>\n<pre class=\"prettyprint lang-d\">private template FunContainer(T)\n{\n    \/\/ Not auto inferred\n    \/\/ (only eponymous template functions are)\n    @safe T fun(T arg){return arg + 3;}\n}\n\n\/\/ Auto inferred per se, but since the function it calls\n\/\/ is not, only @safe is inferred.\nauto addThree(T)(T arg){return FunContainer!T.fun(arg);}<\/pre>\n<p>However, which attributes a template should have often depends on its compile-time parameters. It would be possible to use metaprogramming to designate attributes depending on the template parameters, but that would be a lot of work, hard to read, and easily as error-prone as relying on auto inference.<\/p>\n<p>It&#8217;s more practical to just test that the function template infers the wanted attributes. Such testing doesn&#8217;t have to, and probably shouldn&#8217;t, be done manually each time the function is changed. Instead:<\/p>\n<pre class=\"prettyprint lang-d\">float multiplyResults(alias fun)(float[] arr)\n    if (is(typeof(fun(new float)) : float))\n{\n    float result = 1.0f;\n    foreach (ref e; arr) result *= fun(&amp;e);\n    return result;\n}\n\n@safe pure nothrow unittest\n{\n    float fun(float* x){return *x+1;}\n    \/\/ Using a static array makes sure\n    \/\/ arr argument is inferred as scope or\n    \/\/ return scope\n    float[5] elements = [1.0f, 2.0f, 3.0f, 4.0f, 5.0f];\n\n    \/\/ No need to actually do anything with\n    \/\/ the result. 
The idea is that since this\n    \/\/ compiles, multiplyResults is proven @safe\n    \/\/ pure nothrow, and its argument is scope or\n    \/\/ return scope.\n    multiplyResults!fun(elements);\n}<\/pre>\n<p>Thanks to D&#8217;s compile-time introspection powers, testing against unwanted attributes is also covered:<\/p>\n<pre class=\"prettyprint lang-d\">@safe unittest\n{\n    import std.traits : attr = FunctionAttribute,\n        functionAttributes, isSafe;\n\n    float fun(float* x)\n    {\n        \/\/ Makes the function both throwing\n        \/\/ and garbage collector dependent.\n        if (*x &gt; 5) throw new Exception(&quot;&quot;);\n        static float* impureVar;\n\n        \/\/ Makes the function impure.\n        auto result = impureVar? *impureVar: 5;\n\n        \/\/ Makes the argument unscoped.\n        impureVar = x;\n        return result;\n    }\n\n    enum attrs = functionAttributes!(multiplyResults!fun);\n\n    assert(!(attrs &amp; attr.nothrow_));\n    assert(!(attrs &amp; attr.nogc));\n    assert(!(attrs &amp; attr.pure_));\n\n    \/\/ Checks against accepting scope arguments.\n    \/\/ Note that this check does not work with\n    \/\/ @system functions.\n    assert(!isSafe!(\n    {\n        float[5] stackFloats;\n        multiplyResults!fun(stackFloats[]);\n    }));\n\n    \/\/ It's a good idea to do positive tests with\n    \/\/ similar methods to make sure the tests above\n    \/\/ would fail if the tested function had the\n    \/\/ wrong attributes.\n    assert(attrs &amp; attr.safe);\n    assert(isSafe!(\n    {\n        float[] heapFloats;\n        multiplyResults!fun(heapFloats[]);\n    }));\n}<\/pre>\n<p>If assertion failures are wanted at compile time before the unit tests are run, adding the <code>static<\/code> keyword before each of those <code>assert<\/code>s will get the job done. 
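<\/p>\n<p>As a sketch of that compile-time variant, here are positive checks reduced to <code>static assert<\/code>s for a made-up callback, <code>plusOne<\/code>. It sticks to <code>pure<\/code>, <code>nothrow<\/code>, and <code>@nogc<\/code>, which do not depend on DIP1000, and like any unit test it is only compiled with the <code>-unittest<\/code> switch:<\/p>\n<pre class=\"prettyprint lang-d\">import std.traits : attr = FunctionAttribute, functionAttributes;\n\nfloat multiplyResults(alias fun)(float[] arr)\n    if (is(typeof(fun(new float)) : float))\n{\n    float result = 1.0f;\n    foreach (ref e; arr) result *= fun(&amp;e);\n    return result;\n}\n\n\/\/ Hypothetical callback, used only for this check.\nfloat plusOne(float* x) pure nothrow @nogc { return *x + 1; }\n\nunittest\n{\n    enum attrs = functionAttributes!(multiplyResults!plusOne);\n\n    \/\/ Evaluated by the compiler: a regression stops\n    \/\/ the build before any unit test gets to run.\n    static assert(attrs &amp; attr.pure_);\n    static assert(attrs &amp; attr.nothrow_);\n    static assert(attrs &amp; attr.nogc);\n}\n\nvoid main() {}<\/pre>\n<p>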
Those compiler errors can even be had in non-unittest builds by converting that unit test to a regular function, e.g., by replacing <code>@safe unittest<\/code> with, say, <code>private @safe testAttrs()<\/code>.<\/p>\n<h2 id=\"thelivefireexercise:system\">The live fire exercise: @system<\/h2>\n<p>Let&#8217;s not forget that D is a systems programming language. As this series has shown, in most D code the programmer is well protected from memory errors, but D would not be D if it didn&#8217;t allow going low-level and bypassing the type system in the same manner as C or C++: bit arithmetic on pointers, writing and reading directly to hardware ports, executing a <code>struct<\/code> destructor on a raw byte blob&#8230; D is designed to do all of that.<\/p>\n<p>The difference is that in C and C++ it takes only one mistake to break the type system and <a href=\"https:\/\/blog.regehr.org\/archives\/213\">cause undefined behavior<\/a> anywhere in the code. A D programmer is only at risk when not in a <code>@safe<\/code> function, or when using dangerous compiler switches such as <code>-release<\/code> or <code>-check=assert=off<\/code> (failing a disabled assertion is undefined behavior), and even then the semantics tend to be less UB-prone. For example:<\/p>\n<pre class=\"prettyprint lang-d\">float cube(float arg)\n{\n    float result;\n    result *= arg;\n    result *= arg;\n    return result;\n}<\/pre>\n<p>This is a language-agnostic function that compiles in C, C++, and D. Someone intended to calculate the cube of <code>arg<\/code> but forgot to initialize <code>result<\/code> with <code>arg<\/code>. In D, nothing dangerous happens despite this being a <code>@system<\/code> function. 
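<\/p>\n<p>Calling the function shows what happens instead. This small sketch repeats <code>cube<\/code> and checks its result with <code>isNaN<\/code> from <code>std.math<\/code>:<\/p>\n<pre class=\"prettyprint lang-d\">import std.math : isNaN;\n\nfloat cube(float arg)\n{\n    float result; \/\/ no initializer, so result starts as float.init\n    result *= arg;\n    result *= arg;\n    return result;\n}\n\nvoid main()\n{\n    \/\/ The default initializer of a float is NaN...\n    assert(isNaN(float.init));\n    \/\/ ...and NaN survives the multiplications, so the\n    \/\/ bug is visible on the very first call.\n    assert(isNaN(cube(3.0f)));\n}<\/pre>\n<p>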
Having no initializer means <code>result<\/code> is default-initialized to <code>NaN<\/code> (not-a-number), so the result of the function is also <code>NaN<\/code>, a glaringly obvious &#8220;error&#8221; value the first time anyone uses this function.<\/p>\n<p>However, in C and C++, not initializing a local variable means reading it is (sans a few narrow exceptions) <em>undefined behavior<\/em>. This function does not even handle pointers, yet according to the standard, calling this function could just as well have <code>*(int*) rand() = 0XDEADBEEF;<\/code> in it, all due to a trivial mistake. While many compilers will catch this one when warnings are enabled, not all do, and these languages are full of similar examples where even warnings don&#8217;t help.<\/p>\n<p>In D, even if you explicitly requested no default initialization with <code>float result = void<\/code>, it would only mean that the return value of the function is undefined, not that anything and everything may happen when the function is called. Consequently, that function could be annotated <code>@safe<\/code> even with such an initializer.<\/p>\n<p>Still, for anyone who cares about memory safety, as they probably should for anything intended for a wide audience, it&#8217;s a bad idea to assume that D <code>@system<\/code> code is safe enough to be the default mode. Two examples will demonstrate what can happen.<\/p>\n<h3 id=\"whatundefinedbehaviorcando\">What undefined behavior can do<\/h3>\n<p>Some people assume that &#8220;undefined behavior&#8221; simply means &#8220;erroneous behavior&#8221; or crashing at runtime. While that is often what ultimately happens, undefined behavior is far more dangerous than, say, an uncaught exception or an infinite loop. The difference is that with undefined behavior, you have no guarantees at all about what happens. This might not sound any worse than an infinite loop, but an accidental infinite loop is discovered the first time it&#8217;s entered. 
Code with undefined behavior, on the other hand, might do what was intended when it&#8217;s tested, but then do something completely different in production. Even if the code is tested with the same flags it&#8217;s compiled with in production, the behavior may change from one compiler version to another, or when making completely unrelated changes to the code. Time for an example:<\/p>\n<pre class=\"prettyprint lang-d\">\/\/ return whether the exception itself is in the array\nbool replaceExceptions(Object[] arr, ref Exception e)\n{\n    bool result;\n    foreach (ref o; arr)\n    {\n        if (&amp;o is &amp;e) result = true;\n        if (cast(Exception) o) o = e;\n    }\n\n    return result;\n}<\/pre>\n<p>The idea here is that the function replaces all exceptions in the array with <code>e<\/code>. If <code>e<\/code> itself is in the array, it returns <code>true<\/code>, otherwise <code>false<\/code>. And indeed, testing confirms it works. The function is used like this:<\/p>\n<pre class=\"prettyprint lang-d\">auto arr = [new Exception(&quot;a&quot;), null, null, new Exception(&quot;c&quot;)];\nauto result = replaceExceptions\n(\n    cast(Object[]) arr,\n    arr[3]\n);<\/pre>\n<p>This cast is not a problem, right? Object references are always of the same size regardless of their type, and we&#8217;re casting the exceptions to the parent type, <code>Object<\/code>. It&#8217;s not like the array contains anything other than object references.<\/p>\n<p>Unfortunately, that&#8217;s not how <a href=\"https:\/\/dlang.org\/spec\/expression.html#AssignExpression\">the D specification views it<\/a>. Having two class references (or any references, for that matter) in the same memory location but with different types, and then assigning one of them to the other, is undefined behavior. That&#8217;s exactly what happens in<\/p>\n<pre class=\"prettyprint lang-d\">if (cast(Exception) o) o = e;<\/pre>\n<p>if the array does contain the <code>e<\/code> argument. 
Since <code>true<\/code> can only be returned when undefined behavior is triggered, it means that any compiler would be free to optimize <code>replaceExceptions<\/code> to always return <code>false<\/code>. This is a dormant bug no amount of testing will find, but that might, years later, completely mess up the application when compiled with the powerful optimizations of an advanced compiler.<\/p>\n<p>It may seem that requiring a cast to use a function is an obvious warning sign that a good D programmer would not ignore. I wouldn&#8217;t be so sure. Casts aren&#8217;t that rare even in fine high-level code. Even if you disagree, other cases are provably bad enough to bite anyone. Last summer, this case <a href=\"https:\/\/forum.dlang.org\/thread\/t7qd45$1lrb$1@digitalmars.com\">appeared in the D forums<\/a>:<\/p>\n<pre class=\"prettyprint lang-d\">string foo(in string s)\n{\n    return s;\n}\n\nvoid main()\n{\n    import std.stdio;\n    string[] result;\n    foreach(c; &quot;hello&quot;)\n    {\n        result ~= foo([c]);\n    }\n    writeln(result);\n}<\/pre>\n<p>This problem was encountered by Steven Schveighoffer, a long-time D veteran who has himself <a href=\"https:\/\/dlang.org\/blog\/2016\/09\/28\/how-to-write-trusted-code-in-d\/\">lectured about <code>@safe<\/code> and <code>@system<\/code><\/a> on <a href=\"https:\/\/dconf.org\/2020\/online\/#steven\">more than one occasion<\/a>. Anything that can burn him can burn any of us.<\/p>\n<p>Normally, this works just as one would think and is fine according to the spec. However, if one enables <a href=\"https:\/\/dlang.org\/spec\/function.html#in-params\">another soon-to-be-default language feature<\/a> with the <code>-preview=in<\/code> compiler switch along with DIP1000, the program starts malfunctioning. 
The old semantics for <code>in<\/code> are the same as <code>const<\/code>, but the new semantics make it <code>const scope<\/code>.<\/p>\n<p>Since the argument of <code>foo<\/code> is <code>scope<\/code>, the compiler assumes that <code>foo<\/code> will copy <code>[c]<\/code> before returning it, or return something else, and therefore it allocates <code>[c]<\/code> on the same stack position for each of the &#8220;hello&#8221; letters. The result is that the program prints <code>[&quot;o&quot;, &quot;o&quot;, &quot;o&quot;, &quot;o&quot;, &quot;o&quot;]<\/code>. At least for me, it&#8217;s already somewhat hard to understand what&#8217;s happening in this simple example. Hunting down this sort of bug in a complex codebase could be a nightmare.<\/p>\n<p>(With my nightly DMD version somewhere between 2.100 and 2.101, a compile-time error is printed instead. With 2.100.2, the example runs as described above.)<\/p>\n<p>The fundamental problem in both of these examples is the same: <code>@safe<\/code> is not used. Had it been, both of these undefined behaviors would have resulted in compilation errors (the <code>replaceExceptions<\/code> function itself can be <code>@safe<\/code>, but the cast at the usage site cannot). By now it should be clear that <code>@system<\/code> code should be used sparingly.<\/p>\n<h3 id=\"whentoproceedanyway\">When to proceed anyway<\/h3>\n<p>Sooner or later, though, the time comes when the guard rail has to be temporarily lowered. 
Here&#8217;s an example of a good use case:<\/p>\n<pre class=\"prettyprint lang-d\">\/\/\/ Undefined behavior: Passing a non-null pointer\n\/\/\/ to a standalone character other than '\\0', or\n\/\/\/ to an array without '\\0' at or after the\n\/\/\/ pointed character, as utf8Stringz\nextern(C) @system pure\nbool phobosValidateUTF8(const char* utf8Stringz)\n{\n    import std.string, std.utf;\n\n    try utf8Stringz.fromStringz.validate();\n    catch (UTFException) return false;\n\n    return true;\n}<\/pre>\n<p>This function lets code written in another language validate a UTF-8 string using Phobos. C being C, it tends to use zero-terminated strings, so the function accepts a pointer to one as the argument instead of a D array. This is why the function has to be unsafe. There is no way to safely check that <code>utf8Stringz<\/code> is pointing to either <code>null<\/code> or a valid C string. If the character being pointed to is not <code>'\\0'<\/code>, meaning the next character has to be read, the function has no way of knowing whether that character belongs to the memory allocated for the string. It can only trust that the calling code got it right.<\/p>\n<p>Still, this function is a good use of the <code>@system<\/code> attribute. First, it is presumably called primarily from C or C++. Those languages do not get any safety guarantees anyway. Even a <code>@safe<\/code> function is safe only if it gets only those parameters that can be created in <code>@safe<\/code> D code. Passing <code>cast(const char*) 0xFE0DA1<\/code> as an argument to a function is unsafe no matter what the attribute says, and nothing in C or C++ verifies what arguments are passed.<\/p>\n<p>Second, the function clearly documents the cases that would trigger undefined behavior. 
However, it does not mention that passing an invalid pointer, such as the aforementioned <code>cast(const char*) 0xFE0DA1<\/code>, is UB, because UB is always the default assumption with <code>@system<\/code>-only values unless it can be shown otherwise.<\/p>\n<p>Third, the function is small and easy to review manually. No function should be needlessly big, but it&#8217;s many times more important than usual to keep <code>@system<\/code> and <code>@trusted<\/code> functions small and simple to review. <code>@safe<\/code> functions can be debugged into pretty good shape by testing, but as we saw earlier, undefined behavior can be immune to testing. Analyzing the code is the only general answer to UB.<\/p>\n<p>There is a reason why the parameter does not have a <code>scope<\/code> attribute. It could have one, since no pointers to the string are escaped. However, it would not provide many benefits. Any code calling the function has to be <code>@system<\/code>, <code>@trusted<\/code>, or in a foreign language, meaning it can pass a pointer to the stack in any case. <code>scope<\/code> could potentially improve the performance of D client code, in exchange for an increased potential for undefined behavior if this function is erroneously refactored. Such a tradeoff is unwanted in general unless it can be shown that the attribute helps with a performance problem. On the other hand, the attribute would make it clearer for the reader that the string is not supposed to escape. It&#8217;s a difficult judgment call whether <code>scope<\/code> would be a wise addition here.<\/p>\n<h3 id=\"furtherimprovements\">Further improvements<\/h3>\n<p>It should be documented why a <code>@system<\/code> function is <code>@system<\/code> when it&#8217;s not obvious. Often there is a safer alternative: our example function could have taken a D array or the <code>CString<\/code> struct from the previous post in this series. Why was an alternative not taken? 
In our case, we could write that the ABI would be different for either of those options, complicating matters on the C side, and the intended client (C code) is unsafe anyway.<\/p>\n<p><code>@trusted<\/code> functions are like <code>@system<\/code> functions, except they can be called from <code>@safe<\/code> functions, whereas <code>@system<\/code> functions cannot. When something is declared <code>@trusted<\/code>, it means the authors have verified that it&#8217;s just as safe to use as an actual <code>@safe<\/code> function with any arguments that can be created within safe code. They need to be reviewed at least as carefully as <code>@system<\/code> functions.<\/p>\n<p>For <code>@trusted<\/code> functions, it should be documented (for other developers, not users) how the function was deemed to be safe in all situations. Or, if the function is not fully safe to use and the attribute is just a temporary hack, it should have a big ugly warning about that.<\/p>\n<p><img loading=\"lazy\" src=\"https:\/\/dlang.org\/blog\/wp-content\/uploads\/2023\/01\/unsafe-safe.jpg\" alt=\"\" width=\"651\" height=\"383\" class=\"aligncenter size-full wp-image-3131\" srcset=\"https:\/\/dlang.org\/blog\/wp-content\/uploads\/2023\/01\/unsafe-safe.jpg 651w, https:\/\/dlang.org\/blog\/wp-content\/uploads\/2023\/01\/unsafe-safe-300x176.jpg 300w, https:\/\/dlang.org\/blog\/wp-content\/uploads\/2023\/01\/unsafe-safe-624x367.jpg 624w\" sizes=\"(max-width: 651px) 100vw, 651px\" \/><\/p>\n<p>Such greenwashing is of course highly discouraged, but if there&#8217;s a codebase full of <code>@system<\/code> code that&#8217;s just too difficult to make <code>@safe<\/code> otherwise, it&#8217;s better than giving up. 
Even as we often talk about the dangers of UB and memory corruption, in our actual work our attitudes tend to be much more carefree, meaning such codebases are unfortunately common.<\/p>\n<p>It might be tempting to define a small <code>@trusted<\/code> function inside a bigger <code>@safe<\/code> function to do something unsafe without disabling checks for the whole function:<\/p>\n<pre class=\"prettyprint lang-d\">extern(C) @safe pure\nbool phobosValidateUTF8(const char* utf8Stringz)\n{\n    import std.string, std.utf;\n\n    try (() @trusted =&gt; utf8Stringz.fromStringz)()\n      .validate();\n    catch (UTFException) return false;\n\n    return true;\n}<\/pre>\n<p>Keep in mind, though, that the parent function needs to be documented and reviewed like an overt <code>@trusted<\/code> function, because the encapsulated <code>@trusted<\/code> function can let the parent function do anything. In addition, since the function is marked <code>@safe<\/code>, it isn&#8217;t obvious at first glance that it needs special care. Thus, a visible warning comment is needed if you elect to use <code>@trusted<\/code> like this.<\/p>\n<p>Most importantly, don&#8217;t trust yourself! Just as any codebase of non-trivial size has bugs, any codebase with more than a handful of <code>@system<\/code> functions will include latent UB at some point. The remaining hardening features of D, meaning asserts, contracts, invariants, and bounds checking, should be used aggressively and kept enabled in production. This is recommended even if the program is fully <code>@safe<\/code>. In addition, a project with a considerable amount of unsafe code should use external tools like LLVM&#8217;s AddressSanitizer and Valgrind to at least some extent.<\/p>\n<p>Note that the idea in many of these hardening tools, both those in the language and those of the external tools, is to crash as soon as any fault is detected. 
It decreases the chance of any surprise from undefined behavior doing more serious damage.<\/p>\n<p>This requires that the program is designed to accept a crash at any moment. The program must never hold such amounts of unsaved data that there would be any hesitation in crashing it. If it controls anything important, it must be able to regain control after being restarted by a user or another process, or it must have another backup program. Any program that &#8220;can&#8217;t afford&#8221; to run potentially crashing checks is in no business to be trusted with systems programming either.<\/p>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>That concludes this blog series on DIP1000. There are some topics related to DIP1000 we have left up to the readers to experiment with themselves, such as associative arrays. Still, this should be enough to get them going.<\/p>\n<p>Though we have uncovered some practical tips in addition to language rules, there surely is a lot more that could be said. Tell us your memory safety tips <a href=\"https:\/\/forum.dlang.org\/\">in the D forums<\/a>!<\/p>\n<p><em>Thanks to Walter Bright and Dennis Korpel for providing feedback on this article.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The first entry in this series shows how to use the new DIP1000 rules to have slices and pointers refer to the stack, all while being memory safe. The second entry in this series teaches about the ref storage class and how DIP1000 works with aggregate types (classes, structs, and unions). 
So far the series [&hellip;]<\/p>\n","protected":false},"author":48,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[26,39,9,20,30],"tags":[],"_links":{"self":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/3128"}],"collection":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/users\/48"}],"replies":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/comments?post=3128"}],"version-history":[{"count":5,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/3128\/revisions"}],"predecessor-version":[{"id":3134,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/3128\/revisions\/3134"}],"wp:attachment":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/media?parent=3128"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/categories?post=3128"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/tags?post=3128"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}