{"id":2136,"date":"2019-07-15T14:51:39","date_gmt":"2019-07-15T14:51:39","guid":{"rendered":"http:\/\/dlang.org\/blog\/?p=2136"},"modified":"2021-09-30T13:38:54","modified_gmt":"2021-09-30T13:38:54","slug":"ownership-and-borrowing-in-d","status":"publish","type":"post","link":"https:\/\/dlang.org\/blog\/2019\/07\/15\/ownership-and-borrowing-in-d\/","title":{"rendered":"Ownership and Borrowing in D"},"content":{"rendered":"<p><img loading=\"lazy\" class=\"alignright size-full wp-image-181\" src=\"http:\/\/dlang.org\/blog\/wp-content\/uploads\/2016\/08\/d6.png\" alt=\"Digital Mars logo\" width=\"200\" height=\"200\" \/>Nearly all non-trivial programs allocate and manage memory. Getting it right is becoming increasingly important, as programs get ever more complex and mistakes get ever more costly. The usual problems are:<\/p>\n<ol>\n<li>memory leaks (failure to free memory when no longer in use)<\/li>\n<li>double frees (freeing memory more than once)<\/li>\n<li>use-after-free (continuing to refer to memory already freed)<\/li>\n<\/ol>\n<p>The challenge is in keeping track of which pointers are responsible for freeing the memory (i.e. owning the memory), which pointers are merely referring to the memory, where they are, and which are active (in scope).<\/p>\n<p>The common solutions are:<\/p>\n<ol>\n<li>Garbage Collection &#8211; The GC owns the memory and periodically scans memory looking for any pointers to that memory. If none are found, the memory is released. This scheme is reliable and in common use in languages like Go and Java. It tends to use much more memory than strictly necessary, have pauses, and slow down code because of inserted write gates.<\/li>\n<li>Reference Counting &#8211; The RC object owns the memory and keeps a count of how many pointers point to it. When that count goes to zero, the memory is released. This is also reliable and is commonly used in languages like C++ and ObjectiveC. RC is memory efficient, needing only a slot for the count. The downside of RC is the expense of maintaining the count, building an exception handler to ensure the decrement is done, and the locking for all this needed for objects shared between threads. To regain efficiency, sometimes the programmer will cheat and temporarily refer to the RC object without dealing with the count, engendering a risk that this is not done correctly.<\/li>\n<li>Manual &#8211; Manual memory management is exemplified by C\u2019s <code>malloc<\/code> and <code>free<\/code>. It is fast and memory efficient, but there\u2019s no language help at all in using them correctly. It\u2019s entirely up to the programmer\u2019s skill and diligence in using it. I\u2019ve been using <code>malloc<\/code> and <code>free<\/code> for 35 years, and through bitter and endless experience rarely make a mistake with them anymore. But that\u2019s not the sort of thing a programming shop can rely on, and note I said \u201crarely\u201d and not \u201cnever\u201d.<\/li>\n<\/ol>\n<p>Solutions 2 and 3 more or less rely on faith in the programmer to do it right. Faith-based systems do not scale well, and memory management issues have proven to be very difficult to audit (so difficult that some coding standards prohibit use of memory allocation).<\/p>\n<p>But there is a fourth way &#8211; Ownership and Borrowing. It\u2019s memory efficient, as performant as manual management, and mechanically auditable. It has been recently popularized by the Rust programming language. It has its downsides, too, in the form of a reputation for having to rethink how one composes algorithms and data structures.<\/p>\n<p>The downsides are manageable, and the rest of this article is an outline of how the ownership\/borrowing (OB) system works, and how we propose to fit it into D. I had originally thought this would be impossible, but after spending a lot of time thinking about it I\u2019ve found a way to fit it in, much like we\u2019ve fit functional programming into D (with transitive immutability and function purity).<\/p>\n<h2 id=\"ownership\">Ownership<\/h2>\n<p>The solution to who owns the memory object is ridiculously simple\u2014there is only one pointer to it, so that pointer must be the owner. It is responsible for releasing the memory, after which it will cease to be valid. It follows that any pointers in the memory object are the owners of what they point to, there are no other pointers into the data structure, and the data structure therefore forms a tree.<\/p>\n<p>It also follows that pointers are not copied, they are moved:<\/p>\n<pre class=\"prettyprint lang-d\">T* f();\r\nvoid g(T*);\r\nT* p = f();\r\nT* q = p; \/\/ value of p is moved to q, not copied\r\ng(p);     \/\/ error, p has invalid value<\/pre>\n<p>Moving a pointer out of a data structure is not allowed:<\/p>\n<pre class=\"prettyprint lang-d\">struct S { T* p; }\r\nS* f();\r\nS* s = f();\r\nT* q = s.p; \/\/ error, can't have two pointers to s.p<\/pre>\n<p>Why not just mark <code>s.p<\/code> as being invalid? The trouble there is one would need to do so with a runtime mark, and this is supposed to be a compile-time solution, so attempting it is simply flagged as an error.<\/p>\n<p>Having an owning pointer fall out of scope is also an error:<\/p>\n<pre class=\"prettyprint lang-d\">void h() {\r\n  T* p = f();\r\n} \/\/ error, forgot to release p?<\/pre>\n<p>It\u2019s necessary to move the pointer somewhere else:<\/p>\n<pre class=\"prettyprint lang-d\">void g(T*);\r\nvoid h() {\r\n  T* p = f();\r\n  g(p);  \/\/ move to g(), it's now g()'s problem\r\n}<\/pre>\n<p>This neatly solves memory leaks and use-after-free problems. (Hint: to make it clearer, replace <code>f()<\/code> with <code>malloc()<\/code>, and <code>g()<\/code> with <code>free()<\/code>.)<\/p>\n<p>This can all be tracked at compile time through a function by using <a href=\"https:\/\/en.wikipedia.org\/wiki\/Data-flow_analysis\">Data Flow Analysis (DFA)<\/a> techniques, like those used to compute <a href=\"https:\/\/en.wikipedia.org\/wiki\/Common_subexpression_elimination\">Common Subexpressions<\/a>. DFA can unravel whatever rat\u2019s nest of <code>goto<\/code>s happen to be there.<\/p>\n<h2 id=\"borrowing\">Borrowing<\/h2>\n<p>The ownership system described above is sound, but it is a little too restrictive. Consider:<\/p>\n<pre class=\"prettyprint lang-d\">struct S { void car(); void bar(); }\r\nstruct S* f();\r\nS* s = f();\r\ns.car();  \/\/ s is moved to car()\r\ns.bar();  \/\/ error, s is now invalid<\/pre>\n<p>To make it work, <code>s.car()<\/code> would have to have some way of moving the pointer value back into <code>s<\/code> when <code>s.car()<\/code> returns.<\/p>\n<p>In a way, this is how borrowing works. <code>s.car()<\/code> borrows a copy of <code>s<\/code> for the duration of the execution of <code>s.car()<\/code>. <code>s<\/code> is invalid during that execution and becomes valid again when <code>s.car()<\/code> returns.<\/p>\n<p>In D, struct member functions take the <code>this<\/code> by reference, so we can accommodate borrowing through an enhancement: taking an argument by ref borrows it.<\/p>\n<p>D also supports scope pointers, which are also a natural fit for borrowing:<\/p>\n<pre class=\"prettyprint lang-d\">void g(scope T*);\r\nT* f();\r\nT* p = f();\r\ng(p);      \/\/ g() borrows p\r\ng(p);      \/\/ we can use p again after g() returns<\/pre>\n<p>(When functions take arguments by ref, or pointers by scope, they are not allowed to escape the ref or the pointer. This fits right in with borrow semantics.)<\/p>\n<p>Borrowing in this way fulfills the promise that only one pointer to the memory object exists at any one time, so it works.<\/p>\n<p>Borrowing can be enhanced further with a little insight that the ownership system is still safe if there are multiple const pointers to it, as long as there are no mutable pointers. (Const pointers can neither release their memory nor mutate it.) That means multiple const pointers can be borrowed from the owning mutable pointer, as long as the owning mutable pointer cannot be used while the const pointers are active.<\/p>\n<p>For example:<\/p>\n<pre class=\"prettyprint lang-d\">T* f();\r\nvoid g(T*);\r\nT* p = f();  \/\/ p becomes owner\r\n{\r\n  scope const T* q = p; \/\/ borrow const pointer\r\n  scope const T* r = p; \/\/ borrow another one\r\n  g(p); \/\/ error, p is invalid while q and r are in scope\r\n}\r\ng(p); \/\/ ok<\/pre>\n<h2 id=\"principles\">Principles<\/h2>\n<p>The above can be distilled into the notion that a memory object behaves as if it is in one of two states:<\/p>\n<ol>\n<li>there exists exactly one mutable pointer to it<\/li>\n<li>there exist one or more const pointers to it<\/li>\n<\/ol>\n<p>The careful reader will notice something peculiar in what I wrote: \u201cas if\u201d. What do I mean by that weasel wording? Is there some skullduggery going on? Why yes, there is. Computer languages are full of \u201cas if\u201d dirty deeds under the hood, like the money you deposit in your bank account isn\u2019t actually there (I apologize if this is a rude shock to anyone), and this isn\u2019t any different. Read on!<\/p>\n<p>But first, a bit more necessary exposition.<\/p>\n<h2 id=\"foldingownershipborrowingintod\">Folding Ownership\/Borrowing into D<\/h2>\n<p>Isn\u2019t this scheme incompatible with the way people normally write D code, and won\u2019t it break pretty much every D program in existence? And not break them in an easily fixed way, but break them so badly they\u2019ll have to redesign their algorithms from the ground up?<\/p>\n<p>Yup, it sure is. Except that D has a (not so) secret weapon: function attributes. It turns out that the semantics for the Ownership\/Borrowing (aka OB) system can be run on a per-function basis after the usual semantic pass has been run. The careful reader may have noticed that no new syntax is added, just restrictions on existing code. D has a history of using function attributes to alter the semantics of a function&#8212;for example, adding the <code>pure<\/code> attribute causes a function to behave as if it were pure. To enable OB semantics for a function, an attribute <code>@live<\/code> is added.<\/p>\n<p>This means that OB can be added to D code incrementally, as needed, and as time and resources permit. It becomes possible to add OB while, and this is critical, keeping your project in a fully functioning, tested, and releasable state. It\u2019s mechanically auditable how much of the project is memory safe in this manner. It adds to the list of D\u2019s many other memory-safe guarantees (such as no pointers to the stack escaping).<\/p>\n<h2 id=\"asif\">As If<\/h2>\n<p>Some necessary things cannot be done with strict OB, such as reference counted memory objects. After all, the whole point of an RC object is to have multiple pointers to it. Since RC objects are memory safe (if built correctly), they can work with OB without negatively impinging on memory safety. They just cannot be built with OB. The solution is that D has other attributes for functions, like <code>@system<\/code>. <code>@system<\/code> is where much of the safety checking is turned off. Of course, OB will also be turned off in <code>@system<\/code> code. It\u2019s there that the RC object\u2019s implementation hides from the OB checker.<\/p>\n<p>But in OB code, the RC object looks to the OB checker like it is obeying the rules, so no problemo!<\/p>\n<p>A number of such library types will be needed to successfully use OB.<\/p>\n<h2 id=\"conclusion\">Conclusion<\/h2>\n<p>This article is a basic overview of OB. I am working on a much more comprehensive specification. It\u2019s always possible I\u2019ve missed something and that there\u2019s a hole below the waterline, but so far it\u2019s looking good. It\u2019s a very exciting development for D and I\u2019m looking forward to getting it implemented.<\/p>\n<p><i>For further discussion and comments from Walter, see the discussion threads on <a href=\"https:\/\/www.reddit.com\/r\/programming\/comments\/cdifbu\/ownership_and_borrowing_in_d\/\">the \/r\/programming subreddit<\/a> and <a href=\"https:\/\/news.ycombinator.com\/item?id=20441519\">at Hacker News<\/a>.<\/i><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Nearly all non-trivial programs allocate and manage memory. Getting it right is becoming increasingly important, as programs get ever more complex and mistakes get ever more costly. The usual problems are: memory leaks (failure to free memory when no longer in use) double frees (freeing memory more than once) use-after-free (continuing to refer to memory [&hellip;]<\/p>\n","protected":false},"author":15,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[26,8,20],"tags":[],"_links":{"self":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2136"}],"collection":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/comments?post=2136"}],"version-history":[{"count":10,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2136\/revisions"}],"predecessor-version":[{"id":2153,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2136\/revisions\/2153"}],"wp:attachment":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/media?parent=2136"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/categories?post=2136"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/tags?post=2136"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}