{"id":3116,"date":"2022-10-08T15:19:06","date_gmt":"2022-10-08T15:19:06","guid":{"rendered":"https:\/\/dlang.org\/blog\/?p=3116"},"modified":"2023-07-11T16:48:45","modified_gmt":"2023-07-11T16:48:45","slug":"dip1000-memory-safety-in-a-modern-systems-programming-language-part-2","status":"publish","type":"post","link":"https:\/\/dlang.org\/blog\/2022\/10\/08\/dip1000-memory-safety-in-a-modern-systems-programming-language-part-2\/","title":{"rendered":"Memory Safety in a Modern Systems Programming Language Part 2"},"content":{"rendered":"<h1 id=\"dip1000:memorysafetyinamodernsystemprogramminglanguagept.2\">DIP1000: Memory Safety in a Modern System Programming Language Pt. 2<\/h1>\n<p><img loading=\"lazy\" src=\"https:\/\/dlang.org\/blog\/wp-content\/uploads\/2019\/03\/brain02.png\" alt=\"\" width=\"200\" height=\"200\" class=\"alignleft size-full wp-image-2025\" \/><\/p>\n<p><a href=\"https:\/\/dlang.org\/blog\/2022\/06\/21\/dip1000-memory-safety-in-a-modern-system-programming-language-pt-1\/\">The previous entry in this series<\/a> shows how to use the new DIP1000 rules to have slices and pointers refer to the stack, all while being memory safe. But D can refer to the stack in other ways, too, and that&#8217;s the topic of this article.<\/p>\n<h2 id=\"object-orientedinstancesaretheeasiestcase\">Object-oriented instances are the easiest case<\/h2>\n<p>In Part 1, I said that if you understand how DIP1000 works with pointers, then you understand how it works with classes. An example is worth more than mere words:<\/p>\n<pre class=\"prettyprint lang-d\">@safe Object ifNull(return scope Object a, return scope Object b)\n{\n    return a? a: b;\n}<\/pre>\n<p>The <code>return scope<\/code> in the above example works exactly as it does in the following:<\/p>\n<pre class=\"prettyprint lang-d\">@safe int* ifNull(return scope int* a, return scope int* b)\n{\n    return a? 
a: b;\n}<\/pre>\n<p>The principle is: if the <code>scope<\/code> or <code>return scope<\/code> storage class is applied to an object in a parameter list, the address of the object instance is protected just as if the parameter were a pointer to the instance. From the perspective of machine code, it <strong>is<\/strong> a pointer to the instance.<\/p>\n<p>From the point of view of regular functions, that&#8217;s all there is to it. What about member functions of a class or an interface? This is how it&#8217;s done:<\/p>\n<pre class=\"prettyprint lang-d\">interface Talkative\n{\n    @safe const(char)[] saySomething() scope;\n}\n\nclass Duck : Talkative\n{\n    char[8] favoriteWord;\n    @safe const(char)[] saySomething() scope\n    {\n        import std.random : dice;\n\n        \/\/ This wouldn't work\n        \/\/ return favoriteWord[];\n\n        \/\/ This does\n        return favoriteWord[].dup;\n\n        \/\/ Also returning something totally\n        \/\/ different works. This\n        \/\/ returns the first entry 40% of the time,\n        \/\/ The second entry 40% of the time, and\n        \/\/ the third entry the rest of the time.\n        return\n        [\n            &quot;quack!&quot;,\n            &quot;Quack!!&quot;,\n            &quot;QUAAACK!!!&quot;\n        ][dice(2,2,1)];\n    }\n}<\/pre>\n<p><code>scope<\/code> positioned either before or after the member function name marks the <code>this<\/code> reference as <code>scope<\/code>, preventing it from leaking out of the function. Because the address of the instance is protected, nothing that refers directly to the address of the fields is allowed to escape either. That&#8217;s why <code>return favoriteWord[]<\/code> is disallowed; it&#8217;s a static array stored inside the class instance, so the returned slice would refer directly to it. 
<code>favoriteWord[].dup<\/code> on the other hand returns a copy of the data that isn&#8217;t located in the class instance, which is why it&#8217;s okay.<\/p>\n<p>Alternatively one could replace the <code>scope<\/code> attributes of both <code>Talkative.saySomething<\/code> and <code>Duck.saySomething<\/code> with <code>return scope<\/code>, allowing the return of <code>favoriteWord<\/code> without duplication.<\/p>\n<h3 id=\"dip1000andliskovsubstitutionprinciple\">DIP1000 and Liskov Substitution Principle<\/h3>\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Liskov_substitution_principle\">The Liskov substitution principle<\/a> states, in simplified terms, that an inherited function can give the caller more guarantees than its parent function, but never fewer. DIP1000-related attributes fall in that category. The rule works like this:<\/p>\n<ul>\n<li> if a parameter (including the implicit <code>this<\/code> reference) in the parent functions has no DIP1000 attributes, the child function may designate it <code>scope<\/code> or <code>return scope<\/code><\/li>\n<li> if a parameter is designated <code>scope<\/code> in the parent, it must be designated <code>scope<\/code> in the child<\/li>\n<li> if a parameter is <code>return scope<\/code> in the parent, it must be either <code>scope<\/code> or <code>return scope<\/code> in the child<\/li>\n<\/ul>\n<p>If there is no attribute, the caller can not assume anything; the function might store the address of the argument somewhere. If <code>return scope<\/code> is present, the caller can assume the address of the argument is not stored other than in the return value. With <code>scope<\/code>, the guarantee is that the address is not stored anywhere, which is an even stronger guarantee. 
Example:<\/p>\n<pre class=\"prettyprint lang-d\">class C1\n{   double*[] incomeLog;\n    @safe double* imposeTax(double* pIncome)\n    {\n        incomeLog ~= pIncome;\n        return new double(*pIncome * .15);\n    }\n}\n\nclass C2 : C1\n{\n    \/\/ Okay from language perspective (but maybe not fair\n    \/\/ for the taxpayer)\n    override @safe double* imposeTax\n        (return scope double* pIncome)\n    {\n        return pIncome;\n    }\n}\n\nclass C3 : C2\n{\n    \/\/ Also okay.\n    override @safe double* imposeTax\n        (scope double* pIncome)\n    {\n        return new double(*pIncome * .18);\n    }\n}\n\nclass C4: C3\n{\n    \/\/ Not okay. The pIncome parameter of C3.imposeTax\n    \/\/ is scope, and this tries to relax the restriction.\n    override @safe double* imposeTax\n        (double* pIncome)\n    {\n        incomeLog ~= pIncome;\n        return new double(*pIncome * .16);\n    }\n}<\/pre>\n<h2 id=\"thespecialpointerref\">The special pointer, <code>ref<\/code><\/h2>\n<p>We still have not uncovered how to use <code>struct<\/code>s and <code>union<\/code>s with DIP1000. Well, obviously we&#8217;ve uncovered pointers and arrays. When referring to a <code>struct<\/code> or a <code>union<\/code>, they work the same as they do when referring to any other type. But pointers and arrays are not the canonical way to use structs in D. They are most often passed around by value, or by reference when <a href=\"https:\/\/dlang.org\/spec\/function.html#ref-params\">bound to <code>ref<\/code> parameters<\/a>. Now is a good time to explain how <code>ref<\/code> works with DIP1000.<\/p>\n<p>They don&#8217;t work like just any pointer. 
Once you understand <code>ref<\/code>, you can use DIP1000 in many ways you otherwise could not.<\/p>\n<h3 id=\"asimplerefintparameter\">A simple <code>ref int<\/code> parameter<\/h3>\n<p>The simplest possible way to use <code>ref<\/code> is probably this:<\/p>\n<pre class=\"prettyprint lang-d\">@safe void fun(ref int arg) {\n    arg = 5;\n}<\/pre>\n<p>What does this mean? <code>ref<\/code> is internally a pointer&mdash;think <code>int* pArg<\/code>&mdash;but is used like a value in the source code. <code>arg = 5<\/code> works internally like <code>*pArg = 5<\/code>. Also, the client calls the function as if the argument were passed by value:<\/p>\n<pre class=\"prettyprint lang-d\">auto anArray = [1,2];\nfun(anArray[1]); \/\/ or, via UFCS: anArray[1].fun;\n\/\/ anArray is now [1, 5]<\/pre>\n<p>instead of <code>fun(&amp;anArray[1])<\/code>. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Reference_(C%2B%2B)\">Unlike C++ references<\/a>, D <code>ref<\/code>erences can be <code>null<\/code>, but the application will instantly terminate with a segmentation fault if a null <code>ref<\/code> is used for something other than reading the address with the <code>&amp;<\/code> operator. So this:<\/p>\n<pre class=\"prettyprint lang-d\">int* ptr = null;\nfun(*ptr);<\/pre>\n<p>&#8230;compiles, but crashes at runtime because the assignment inside <code>fun<\/code> lands at the null address.<\/p>\n<p>The address of a <code>ref<\/code> variable is always guarded against escape. In this sense <code>@safe void fun(ref int arg){arg = 5;}<\/code> is like <code>@safe void fun(scope int* pArg){*pArg = 5;}<\/code>. For example, <code>@safe int* fun(ref int arg){return &amp;arg;}<\/code> will not compile, just like <code>@safe int* fun(scope int* pArg){return pArg;}<\/code> will not.<\/p>\n<p>There is a <code>return ref<\/code> storage class, however, that allows returning the address of the parameter but no other form of escape, just like <code>return scope<\/code>. 
This means that <code>@safe int* fun(return ref int arg){return &amp;arg;}<\/code> works.<\/p>\n<h3 id=\"referencetoareference\"><code>ref<\/code>erence to a reference<\/h3>\n<p><code>ref<\/code>erence to an <code>int<\/code> or similar type already allows much nicer syntax than one can get with pointers. But the real power of <code>ref<\/code> shows when it refers to a type that is a reference itself&mdash;a pointer or a class, for instance. <code>scope<\/code> or <code>return scope<\/code> can be applied to a reference that is referred to by <code>ref<\/code>. For example:<\/p>\n<pre class=\"prettyprint lang-d\">@safe float[] mergeSort(ref return scope float[] arr)\n{\n    import std.algorithm: merge;\n    import std.array : Appender;\n\n    if(arr.length &lt; 2) return arr;\n\n    auto firstHalf = arr[0 .. $\/2];\n    auto secondHalf = arr[$\/2 .. $];\n\n    Appender!(float[]) output;\n    output.reserve(arr.length);\n\n    foreach\n    (\n        el;\n        firstHalf.mergeSort\n        .merge!floatLess(secondHalf.mergeSort)\n    )   output ~= el;\n\n    arr = output[];\n    return arr;\n}\n\n@safe bool floatLess(float a, float b)\n{\n    import std.math: isNaN;\n\n    return a.isNaN? false:\n          b.isNaN? true:\n          a&lt;b;\n}<\/pre>\n<p><code>mergeSort<\/code> here guarantees it won&#8217;t leak the address of the <code>float<\/code>s in <code>arr<\/code> except in the return value. This is the same guarantee that would be had from a <code>return scope float[] arr<\/code> parameter. But at the same time, because <code>arr<\/code> is a <code>ref<\/code> parameter, <code>mergeSort<\/code> can mutate the array passed to it. 
Then the client can write:<\/p>\n<pre class=\"prettyprint lang-d\">float[] values = [5, 1.5, 0, 19, 1.5, 1];\nvalues.mergeSort;<\/pre>\n<p>With a non-<code>ref<\/code> argument, the client would have to write <code>values = values.mergeSort<\/code> instead (not using <code>ref<\/code> would be a perfectly reasonable API in this case, because we do not always want to mutate the original array). This is something that cannot be accomplished with pointers, because <code>return scope float[]* arr<\/code> would protect the address of the array&#8217;s metadata (the <code>length<\/code> and <code>ptr<\/code> fields of the array), not the address of its contents.<\/p>\n<p>It is also possible to have a returnable <code>ref<\/code> argument to a <code>scope<\/code> reference. Since this example has a unit test, remember to use the <code>-unittest<\/code> compile flag to include it in the compiled binary.<\/p>\n<pre class=\"prettyprint lang-d\">@safe ref Exception nullify(return ref scope Exception obj)\n{\n    obj = null;\n    return obj;\n}\n\n@safe unittest\n{\n    scope obj = new Exception(&quot;Error!&quot;);\n    assert(obj.msg == &quot;Error!&quot;);\n    obj.nullify;\n    assert(obj is null);\n    \/\/ Since nullify returns by ref, we can assign\n    \/\/ to its return value.\n    obj.nullify = new Exception(&quot;Fail!&quot;);\n    assert(obj.msg == &quot;Fail!&quot;);\n}<\/pre>\n<p>Here we return the address of the argument passed to <code>nullify<\/code>, but still guard both the address of the object pointer and the address of the class instance against being leaked by other channels.<\/p>\n<p><code>return<\/code> is a free keyword that does not mandate <code>ref<\/code> or <code>scope<\/code> to follow it. What does <code>void* fun(ref scope return int*)<\/code> mean then? 
<a href=\"https:\/\/dlang.org\/spec\/function.html#ref-return-scope-parameters\">The spec states<\/a> that <code>return<\/code> without a trailing <code>scope<\/code> is always treated as <code>ref return<\/code>. This example thus is equivalent to <code>void* fun(return ref scope int*)<\/code>. However, this only applies if there is a <code>ref<\/code>erence to bind to. Writing <code>void* fun(scope return int*)<\/code> means <code>void* fun(return scope int*)<\/code>. It&#8217;s even possible to write <code>void* fun(return int*)<\/code> with the latter meaning, but I leave it up to you to decide whether this qualifies as conciseness or obfuscation.<\/p>\n<h2 id=\"memberfunctionsandref\">Member functions and <code>ref<\/code><\/h2>\n<p><code>ref<\/code> and <code>return ref<\/code> often require careful consideration to keep track of which address is protected and what can be returned. It takes some experience to get comfortable with them. But once you do, understanding how <code>struct<\/code>s and <code>union<\/code>s work with DIP1000 is pretty straightforward.<\/p>\n<p>The major difference from classes is that where the <code>this<\/code> reference is just a regular class reference in class member functions, <code>this<\/code> in a struct or union member function is <code>ref StructOrUnionName<\/code>.<\/p>\n<pre class=\"prettyprint lang-d\">union Uni\n{\n    int asInt;\n    char[4] asCharArr;\n\n    \/\/ Return value contains a reference to\n    \/\/ this union, won't escape references\n    \/\/ to it via any other channel\n    @safe char[] latterHalf() return\n    {\n        return asCharArr[2 .. 
$];\n    }\n\n    \/\/ This argument is implicitly ref, so the\n    \/\/ following means the return value does\n    \/\/ not refer to this union, and also that\n    \/\/ we don't leak it in any other way.\n    @safe char[] latterHalfCopy()\n    {\n        return latterHalf.dup;\n    }\n}<\/pre>\n<p>Note that <code>return ref<\/code> should not be used with the <code>this<\/code> argument. <code>char[] latterHalf() return ref<\/code> fails to parse. The language already has to understand what <code>ref char[] latterHalf() return<\/code> means: the return value is a <code>ref<\/code>erence. The &#8220;ref&#8221; in <code>return ref<\/code> would be redundant anyway.<\/p>\n<p>Note that we did not use the <code>scope<\/code> keyword here. <code>scope<\/code> would be meaningless with this union, because it does not contain references to anything. Just like it is meaningless to have a <code>scope ref int<\/code>, or a <code>scope int<\/code> function argument. <code>scope<\/code> makes sense only for types that refer to memory elsewhere.<\/p>\n<p><code>scope<\/code> in a <code>struct<\/code> or <code>union<\/code> means the same thing as it means in a static array. It means that the memory its members refer to cannot be escaped. Example:<\/p>\n<pre class=\"prettyprint lang-d\">struct CString\n{\n    \/\/ We need to put the pointer in an anonymous\n    \/\/ union with a dummy member, otherwise @safe user\n    \/\/ code could assign ptr to point to a character\n    \/\/ not in a C string.\n    union\n    {\n        \/\/ Empty string literals get optimised to null pointers by D\n        \/\/ compiler, we have to do this for the .init value to really point to\n        \/\/ a '\\0'.\n        immutable(char)* ptr = &amp;nullChar;\n        size_t dummy;\n    }\n\n    \/\/ In constructors, the &quot;return value&quot; is the\n    \/\/ constructed data object. 
Thus, the return scope\n    \/\/ here makes sure this struct won't live longer\n    \/\/ than the memory in arr.\n    @trusted this(return scope string arr)\n    {\n        \/\/ Note: Normal assert would not do! They may be\n        \/\/ removed from release builds, but this assert\n        \/\/ is necessary for memory safety so we need\n        \/\/ to use assert(0) instead which never gets\n        \/\/ removed.\n        if(arr[$-1] != '\\0') assert(0, &quot;not a C string!&quot;);\n        ptr = arr.ptr;\n    }\n\n    \/\/ The return value refers to the same memory as the\n    \/\/ members in this struct, but we don't leak references\n    \/\/ to it via any other way, so return scope.\n    @trusted ref immutable(char) front() return scope\n    {\n        return *ptr;\n    }\n\n    \/\/ No references to the pointed-to array passed\n    \/\/ anywhere.\n    @trusted void popFront() scope\n    {\n        \/\/ Otherwise the user could pop past the\n        \/\/ end of the string and then read it!\n        if(empty) assert(0, &quot;out of bounds!&quot;);\n        ptr++;\n    }\n\n    \/\/ Same.\n    @safe bool empty() scope\n    {\n        return front == '\\0';\n    }\n}\n\nimmutable nullChar = '\\0';\n\n@safe unittest\n{\n    import std.array : staticArray;\n\n    auto localStr = &quot;hello world!\\0&quot;.staticArray;\n    auto localCStr = localStr.CString;\n    assert(localCStr.front == 'h');\n\n    static immutable(char)* staticPtr;\n\n    \/\/ Error, escaping reference to local.\n    \/\/ staticPtr = &amp;localCStr.front();\n\n    \/\/ Fine.\n    staticPtr = &amp;CString(&quot;global\\0&quot;).front();\n\n    localCStr.popFront;\n    assert(localCStr.front == 'e');\n    assert(!localCStr.empty);\n}\n<\/pre>\n<p>Part One said that <code>@trusted<\/code> is a terrible footgun with DIP1000. This example demonstrates why. Imagine how easy it&#8217;d be to use a regular assert or forget about them totally, or overlook the need to use the anonymous union. 
I <em>think<\/em> this struct is safe to use, but it&#8217;s entirely possible I overlooked something.<\/p>\n<h2 id=\"finally\">Finally<\/h2>\n<p>We almost know all there is to know about using structs, unions, and classes with DIP1000. We have two final things to learn today.<\/p>\n<p>But before that, a short digression regarding the <code>scope<\/code> keyword. It is not used for just annotating parameters and local variables as illustrated. It is also used for <a href=\"https:\/\/dlang.org\/spec\/class.html#auto\">scope classes<\/a> and <a href=\"https:\/\/dlang.org\/spec\/statement.html#scope-guard-statement\">scope guard statements<\/a>. This guide won&#8217;t be discussing those, because the former feature is deprecated, and the latter is not related to DIP1000 or control of variable lifetimes. The point of mentioning them is to dispel a potential misconception that <code>scope<\/code> always means limiting the lifetime of something. Learning about scope guard statements is still a good idea, as it&#8217;s a useful feature.<\/p>\n<p>Back to the topic. The first thing is not really specific to structs or classes. We discussed what <code>return<\/code>, <code>return ref<\/code>, and <code>return scope<\/code> usually mean, but there&#8217;s an alternative meaning to them. Consider:<\/p>\n<pre class=\"prettyprint lang-d\">@safe void getFirstSpace\n(\n    ref scope string result,\n    return scope string where\n)\n{\n    \/\/...\n}<\/pre>\n<p>The usual meaning of the <code>return<\/code> attribute makes no sense here, as the function has a <code>void<\/code> return type. A special rule applies in this case: if the return type is <code>void<\/code>, and the first argument is <code>ref<\/code> or <code>out<\/code>, any subsequent <code>return<\/code> [<code>ref<\/code>\/<code>scope<\/code>] is assumed to be escaped by assigning to the first argument. 
With struct member functions, they are assumed to be assigned to the struct itself.<\/p>\n<pre class=\"prettyprint lang-d\">@safe unittest\n{\n    static string output;\n    immutable(char)[8] input = &quot;on stack&quot;;\n    \/\/Trying to assign stack contents to a static\n    \/\/variable. Won't compile.\n    getFirstSpace(output, input);\n}<\/pre>\n<p>Since <code>out<\/code> came up, it should be said it would be a better choice for <code>result<\/code> here than <code>ref<\/code>. <code>out<\/code> works like <code>ref<\/code>, with the one difference that the referenced data is automatically default-initialized at the beginning of the function, meaning any data to which the <code>out<\/code> parameter refers is guaranteed to not affect the function.<\/p>\n<p>The second thing to learn is that <code>scope<\/code> is used by the compiler to optimize class allocations inside function bodies. If a <code>new<\/code> class is used to initialize a <code>scope<\/code> variable, the compiler can put it on the stack. Example:<\/p>\n<pre class=\"prettyprint lang-d\">class C{int a, b, c;}\n@safe @nogc unittest\n{\n    \/\/ Since this unittest is @nogc, this wouldn't\n    \/\/ compile without the scope optimization.\n    scope C c = new C();\n}<\/pre>\n<p>This feature requires using the <code>scope<\/code> keyword explicitly. Inference of <code>scope<\/code> does not work, because initializing a class this way does not normally (meaning, without the <code>@nogc<\/code> attribute) mandate limiting the lifetime of <code>c<\/code>. The feature currently works only with classes, but there is no reason it couldn&#8217;t work with <code>new<\/code>ed struct pointers and array literals too.<\/p>\n<h3 id=\"untilnexttime\">Until next time<\/h3>\n<p>This is pretty much all that there is to manual DIP1000 usage. But this blog series shall not be over yet! DIP1000 is not intended to always be used explicitly&mdash;it works with attribute inference. 
That&#8217;s what the next post will cover.<\/p>\n<p>It will also cover some considerations when daring to use <code>@trusted<\/code> and <code>@system<\/code> code. The need for dangerous systems programming exists and is part of the D language domain. But even systems programming is a responsible affair when people do what they can to minimize risks. We will see that even there it&#8217;s possible to do a lot.<\/p>\n<p><em>Thanks to Walter Bright and Dennis Korpel for reviewing this article<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>DIP1000: Memory Safety in a Modern System Programming Language Pt. 2 The previous entry in this series shows how to use the new DIP1000 rules to have slices and pointers refer to the stack, all while being memory safe. But D can refer to the stack in other ways, too, and that&#8217;s the topic of [&hellip;]<\/p>\n","protected":false},"author":48,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[26,39,9,20,30],"tags":[],"_links":{"self":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/3116"}],"collection":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/users\/48"}],"replies":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/comments?post=3116"}],"version-history":[{"count":5,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/3116\/revisions"}],"predecessor-version":[{"id":3126,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/3116\/revisions\/3126"}],"wp:attachment":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/media?parent=3116"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/categories?post=3116"},{"taxonomy":"post_tag","embeddable":true,"href":
"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/tags?post=3116"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}