Category Archives: DIPs

DIP1000: Memory Safety in a Modern System Programming Language Pt. 1

Memory safety needs no checks

D is both a garbage-collected programming language and an efficient raw memory access language. Modern high-level languages like D are memory safe, preventing users from accidentally reading or writing to unused memory or breaking the type system of the language.

As a systems programming language, not all of D can give such guarantees, but it does have a memory-safe subset that uses the garbage collector to take care of memory management much like Java, C#, or Go. A D codebase, even in a systems programming project, should aim to remain within that memory-safe subset where practical. D provides the @safe function attribute to verify that a function uses only memory-safe features of the language. For instance, try this.

@safe string getBeginning(immutable(char)* cString)
{
    return cString[0..3];
}

The compiler will refuse to compile this code. There’s no way to know what will result from the three-character slice of cString, which could be referring to an empty string (i.e., cString[0] is \0), a string with a length of 1, or even one or two characters without the terminating NUL. The result in those cases would be a memory violation.
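For comparison, here is a minimal sketch of a variant that @safe does accept, assuming the caller can pass a D string instead of a raw C pointer (the parameter name str is illustrative). A D string is a slice, so it carries its own length, and the compiler can check the [0 .. 3] slice against that length at runtime instead of trusting the caller. It can be run with dmd -unittest -main.

```d
// Hypothetical @safe variant of getBeginning. A string is a slice
// (pointer plus length), so slicing it is bounds-checked.
@safe string getBeginning(string str)
{
    // If str.length < 3, this throws a RangeError instead of
    // reading memory past the end of the string.
    return str[0 .. 3];
}

@safe unittest
{
    assert(getBeginning("Hello world!") == "Hel");
}
```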

@safe does not mean slow

Note that I said above that even a low-level systems programming project should use @safe where practical. How is that possible, given that such projects sometimes cannot use the garbage collector, a major tool used in D to guarantee memory safety?

Indeed, such projects must resort to memory-unsafe constructs every now and then. Even higher-level projects often have reasons to do so, as they want to create interfaces to C or C++ libraries, or avoid the garbage collector when runtime performance calls for it. But still, surprisingly large parts of code can be made @safe without using the garbage collector at all.

D can do this because the memory-safe subset does not prevent raw memory access per se.

@safe void add(int* a, int* b, int* sum)
{
    *sum = *a + *b;
}

This compiles and is fully memory safe, despite dereferencing those pointers in the same completely unchecked way they are dereferenced in C. This is memory safe because @safe D does not allow creating an int* that points to unallocated memory areas, or one that actually points to something else, such as a float**. An int* can point to the null address, but this is generally not a memory safety problem because the null address is protected by the operating system: any attempt to dereference it crashes the program before any memory corruption can happen. The garbage collector isn’t involved either, because D’s GC can only run when more memory is requested from it or when a collection is explicitly triggered.
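A short usage sketch of add, assuming the pointers come from GC allocations (the values are illustrative). It shows that @safe code obtains pointers only from safe sources, while the dereferences inside add stay unchecked. Run it with dmd -unittest -main.

```d
// add() as in the text: the dereferences are unchecked, as in C.
@safe void add(int* a, int* b, int* sum)
{
    *sum = *a + *b;
}

@safe unittest
{
    // @safe code can only obtain pointers from safe sources,
    // such as these GC allocations.
    int* a = new int(2);
    int* b = new int(3);
    int* sum = new int; // default-initialized to 0
    add(a, b, sum);
    assert(*sum == 5);

    // Forging a pointer is what @safe rules out, e.g.:
    // add(cast(int*) 0x1234, b, sum); // rejected: integer-to-pointer cast
}
```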

D slices are similar. When indexed, they check at runtime that the index is less than their length, and that’s it. They do no checking whatsoever on whether they refer to a legal memory area. Memory safety is achieved by preventing the creation of slices that could refer to illegal memory in the first place, as demonstrated in the first example of this article. And again, there’s no GC involved.
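To make the runtime check concrete, here is a small sketch; the function name elementAt is hypothetical. The only check on indexing is the length comparison, and slicing creates a view of the same memory without copying. The test block is not @safe because it deliberately catches an Error via std.exception.assertThrown. Run it with dmd -unittest -main.

```d
import core.exception : RangeError;
import std.exception : assertThrown;

// Indexing a slice: the only runtime check is index < length.
@safe int elementAt(int[] arr, size_t index)
{
    return arr[index];
}

unittest // @system: it deliberately catches an Error
{
    int[] data = [10, 20, 30];
    assert(elementAt(data, 1) == 20);

    // An out-of-range index is caught by the length check.
    assertThrown!RangeError(elementAt(data, 3));

    // Slicing copies no elements and allocates nothing:
    // the new slice is a view of the same memory.
    int[] tail = data[1 .. $];
    tail[0] = 99;
    assert(data[1] == 99);
}
```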

This enables many patterns that are memory-safe, efficient, and independent of the garbage collector.

struct Struct
{
    int[] slice;
    int* pointer;
    int[10] staticArray;
}

@safe @nogc Struct examples(Struct arg)
{
    arg.slice[5] = *arg.pointer;
    arg.staticArray[0..5] = arg.slice[5..10];
    arg.pointer = &arg.slice[8];
    return arg;
}

As demonstrated, D liberally lets one do unchecked memory handling in @safe code. The memory referred to by arg.slice and arg.pointer may be on the garbage collected heap, or it may be in the static program memory. There is no reason the language needs to care. The program will probably need to either call the garbage collector or do some unsafe memory management to allocate memory for the pointer and the slice, but handling already allocated memory does not need to do either. If this function needed the garbage collector, it would fail to compile because of the @nogc attribute.
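A minimal usage sketch, assuming the caller allocates the slice and the pointer with the GC (the values are illustrative). The allocation happens outside examples, which itself stays @nogc; the returned struct still refers to the caller’s GC memory. Run it with dmd -unittest -main.

```d
struct Struct
{
    int[] slice;
    int* pointer;
    int[10] staticArray;
}

@safe @nogc Struct examples(Struct arg)
{
    arg.slice[5] = *arg.pointer;
    arg.staticArray[0..5] = arg.slice[5..10];
    arg.pointer = &arg.slice[8];
    return arg;
}

@safe unittest
{
    // The caller allocates with the GC; examples() itself is @nogc.
    Struct s;
    s.slice = new int[](10);
    foreach (i, ref el; s.slice)
        el = cast(int) i; // 0, 1, ..., 9
    s.pointer = new int(42);

    Struct r = examples(s);
    assert(r.slice[5] == 42);        // written through the slice...
    assert(s.slice[5] == 42);        // ...into the caller's GC memory
    assert(r.staticArray[0] == 42);  // copied from slice[5 .. 10]
    assert(r.staticArray[4] == 9);
    assert(r.pointer is &s.slice[8]);
}
```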

However…

There’s a historical design flaw here in that the memory may also be on the stack. Consider what happens if we change our function a bit.

@safe @nogc Struct examples(Struct arg)
{
    arg.pointer = &arg.staticArray[8];
    arg.slice = arg.staticArray[0..8];
    return arg;
}

Struct arg is a value type. Its contents are copied to the stack when examples is called and can be overwritten after the function returns. staticArray is also a value type. It’s copied along with the rest of the struct, just as if there were ten integers in the struct instead. When we return arg, the contents of staticArray are copied to the return value, but pointer and slice continue to point to arg, not the returned copy!

But we have a fix. It allows one to write code just as performant in @safe functions as before, including references to the stack. It even enables a few formerly @system (the opposite of @safe) tricks to be written in a safe way. That fix is DIP1000. It’s the reason why this example already causes a deprecation warning by default if it’s compiled with the latest nightly dmd.

Born first, dead last

DIP1000 is a set of enhancements to the language rules regarding pointers, slices, and other references. The name stands for D Improvement Proposal number 1000, as that document is what the new rules were initially based on. One can enable the new rules with the preview compiler switch, -preview=dip1000. Existing code may need some changes to work with the new rules, which is why the switch is not enabled by default. It’s going to be the default in the future, so it’s best to enable it where possible and work to make code compatible with it where not.

The basic idea is to let people limit the lifetime of a reference (an array or pointer, for example). A pointer to the stack is not dangerous if it does not exist longer than the stack variable it is pointing to. Regular references continue to exist, but they can refer only to data with an unlimited lifetime: garbage-collected memory, or static or global variables.

Let’s get started

The simplest way to construct a limited lifetime reference is to assign something with a limited lifetime to it.

@safe int* test(int arg1, int arg2)
{
    int* notScope = new int(5);
    int* thisIsScope = &arg1;
    int* alsoScope; // Not initially scope...
    alsoScope = thisIsScope; // ...but this makes it so.

    // Error! The variable declared earlier is
    // considered to have a longer lifetime,
    // so disallowed.
    thisIsScope = alsoScope;

    return notScope; // ok
    return thisIsScope; // error
    return alsoScope; // error
}

When testing these examples, remember to use the compiler switch -preview=dip1000 and to mark the function @safe. The checks are not done for non-@safe functions.
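The test function above deliberately contains errors. As a compilable counterpart, here is a sketch (the name sumThroughPointers is hypothetical) that uses the same inferred scope pointers only in ways DIP1000 allows: reading through them, never letting them escape. It needs dmd -preview=dip1000, since taking the address of a parameter in @safe code is otherwise rejected.

```d
// Requires: dmd -preview=dip1000
@safe int sumThroughPointers(int arg1, int arg2)
{
    int* notScope = new int(5); // GC memory: unlimited lifetime
    int* thisIsScope = &arg1;   // address of a parameter: scope
    int* alsoScope;             // not initially scope...
    alsoScope = thisIsScope;    // ...but inferred scope from here on

    // Reading through scope pointers is allowed; only letting the
    // pointers themselves escape (e.g. returning them) is not.
    return *notScope + *alsoScope + arg2;
}
```

For example, sumThroughPointers(1, 2) returns 5 + 1 + 2, that is, 8.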

Alternatively, the scope keyword can be explicitly used to limit the lifetime of a reference.

@safe int[] test()
{
    int[] normalRef;
    scope int[] limitedRef;

    if(true)
    {
        int[5] stackData = [-1, -2, -3, -4, -5];

        // Lifetime of stackData ends
        // before limitedRef, so this is
        // disallowed.
        limitedRef = stackData[];

        // This is how you do it
        scope int[] evenMoreLimited
            = stackData[];
    }

    return normalRef; // Okay.
    return limitedRef; // Forbidden.
}

If we can’t return limited lifetime references, how are they used at all? Easy. Remember, only the address of the data is protected, not the data itself. This means we have many ways to pass scoped data out of the function.

@safe int[] fun()
{
    scope int[] dontReturnMe = [1,2,3];

    int[] result = new int[](dontReturnMe.length);
    // This copies the data, instead of having
    // result refer to protected memory.
    result[] = dontReturnMe[];
    return result;

    // Shorthand way of doing the same as above
    return dontReturnMe.dup;

    // Also you are not always interested
    // in the contents as a whole; you
    // might want to calculate something else
    // from them
    return
    [
        dontReturnMe[0] * dontReturnMe[1],
        cast(int) dontReturnMe.length
    ];
}
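Since fun above shows several alternatives with consecutive return statements, here is a single-return sketch of the last alternative; the name summary and the buffer contents are illustrative. It computes values from a scope view of stack data and returns them in freshly allocated GC memory, never the stack address itself.

```d
@safe int[] summary()
{
    int[4] buffer = [3, 1, 4, 1]; // lives on the stack
    scope int[] view = buffer[];  // scope slice of the stack data

    int total = 0;
    foreach (x; view)
        total += x;               // reading the data is fine

    // Return fresh GC memory holding computed values,
    // never the address held in `view`.
    return [total, cast(int) view.length];
}
```

Here summary() returns [9, 4]: the sum of the four elements and their count.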

Getting interprocedural

With the tricks discussed so far, DIP1000 would be restricted to language primitives when handling limited lifetime references, but the scope storage class can be applied to function parameters, too. Because a scope parameter guarantees that the reference won’t be kept after the function exits, references to local data can be used as arguments to scope parameters.

@safe double average(scope int[] data)
{
    double result = 0;
    foreach(el; data) result += el;
    return result / data.length;
}

@safe double use()
{
    int[10] data = [1,2,3,4,5,6,7,8,9,10];
    return data[].average; // works!
}
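Another sketch in the same spirit, with the hypothetical name squares: the scope parameter lets callers pass a slice of stack data, while the function returns a newly allocated GC array, which has no lifetime restriction. Run it with dmd -unittest -main (add -preview=dip1000 to have the scope checks enforced).

```d
@safe int[] squares(scope const(int)[] input)
{
    int[] result = new int[](input.length);
    foreach (i, x; input)
        result[i] = x * x;
    return result; // GC memory: no lifetime restriction
}

@safe unittest
{
    int[3] onStack = [1, 2, 3];
    // A slice of a local static array is scope; the
    // scope parameter accepts it.
    assert(squares(onStack[]) == [1, 4, 9]);
}
```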

Initially, it’s probably best to keep attribute auto inference off. Auto inference in general is a good tool, but it silently adds scope attributes to all parameters it can, meaning it’s easy to lose track of what’s happening. That makes the learning process a lot harder. Avoid this by always explicitly specifying the return type (or lack thereof with void or noreturn): @safe const(char[]) fun(int* val) as opposed to @safe auto fun(int* val) or @safe const fun(int* val). The function also must not be a template or inside a template. We’ll dig deeper into scope auto inference in a future post.

scope allows handling pointers and arrays that point to the stack, but forbids returning them. What if that’s the goal? Enter the return scope attribute:

//Being character arrays, strings also work with DIP1000.
@safe string latterHalf(return scope string arg)
{
    return arg[$/2 .. $];
}

@safe string test()
{
    // allocated in static program memory
    auto hello1 = "Hello world!";
    // allocated on the stack, copied from hello1
    immutable(char)[12] hello2 = hello1;

    auto result1 = hello1.latterHalf; // ok
    return result1; // ok

    auto result2 = hello2[].latterHalf; // ok
    // Nice try! result2 is scope and can't
    // be returned.
    return result2;
}

return scope parameters work by checking if any of the arguments passed to them are scope. If so, the return value is treated as a scope value that may not outlive any of the return scope arguments. If none are scope, the return value is treated as a global reference that can be copied freely. Like scope, return scope is conservative. Even if one does not actually return the address protected by return scope, the compiler will still perform the call site lifetime checks just as if one did.
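A sketch with two return scope parameters, using the hypothetical names larger, fromHeap, and fromStack. When neither argument is scope, the result can be returned freely; when either is, the result is scope, but the value it points to can still be copied out. It needs dmd -preview=dip1000, because fromStack takes the addresses of locals.

```d
// Requires: dmd -preview=dip1000
@safe int* larger(return scope int* a, return scope int* b)
{
    return *a >= *b ? a : b;
}

@safe int* fromHeap()
{
    int* x = new int(3);
    int* y = new int(7);
    return larger(x, y);     // neither argument is scope: may escape
}

@safe int fromStack()
{
    int x = 3, y = 7;
    int* p = larger(&x, &y); // scope arguments: p is scope too
    return *p;               // returning a copy of the value is fine
}
```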

scope is shallow

@safe void test()
{
    scope a = "first";
    scope b = "second";
    string[] arr = [a, b];
}

In test, initializing arr does not compile. This may be surprising given that the language automatically adds scope to a variable on initialization if needed.

However, consider what the scope on scope string[] arr would protect. There are two things it could potentially protect: the addresses of the strings in the array, or the addresses of the characters in the strings. For this assignment to be safe, scope would have to protect the characters in the strings, but it only protects the top-level reference, i.e., the strings in the array. Thus, the example does not work. Now change arr so that it’s a static array:

@safe void test()
{
    scope a = "first";
    scope b = "second";
    string[2] arr = [a, b];
}

This works because static arrays are not references. Memory for all of their elements is allocated in place on the stack (i.e., they contain their elements), as opposed to dynamic arrays which contain a reference to elements stored elsewhere. When a static array is scope, its elements are treated as scope. And since the example would not compile were arr not scope, it follows that scope is inferred.
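The dynamic-array version can also be made to compile, assuming a and b do not actually point to the stack, as with these string literals: simply drop the explicit scope from them. A sketch, with the function name both chosen for illustration:

```d
@safe string[] both()
{
    string a = "first";    // literals live in static memory,
    string b = "second";   // so a and b need not be scope
    string[] arr = [a, b]; // the GC-allocated array is not scope either
    return arr;
}
```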

Some practical tips

Let’s face it, the DIP1000 rules take time to understand, and many would rather spend that time coding something useful. The first and most important tip is: avoid non-@safe code like the plague wherever possible. Of course, this advice is not new, but it matters even more with DIP1000. In a nutshell, the language does not check the validity of scope and return scope in a non-@safe function, but when calling those functions, the compiler assumes that the attributes are respected.

This makes scope and return scope terrible footguns in unsafe code. But as long as a D coder resists the temptation to mark code @trusted just to avoid thinking, they can hardly do damage. Misusing DIP1000 in @safe code can cause needless compilation errors, but it won’t corrupt memory and is unlikely to cause other bugs either.

A second important point worth mentioning is that there is no need for scope and return scope on function parameters if the functions receive only static or GC-allocated data. Many languages do not let coders refer to the stack at all; just because D programmers can do so does not mean they must. This way, they don’t have to spend any more time solving compiler errors than they did before DIP1000. And if a desire to work with the stack arises after all, the authors can then return to annotate the functions. Most likely they will accomplish this without breaking the interface.

What’s next?

This concludes today’s blog post. This is enough to know how to use arrays and pointers with DIP1000. In principle, it also enables readers to use DIP1000 with classes and interfaces. The only thing to learn is that a class reference, including the this pointer in member functions, works with DIP1000 just like a pointer would. Still, it’s hard to grasp what that means from one sentence, so later posts shall illustrate the subject.

In any case, there is more to know. DIP1000 has some features for ref function parameters, structs, and unions that we didn’t cover here. We’ll also dig deeper into how DIP1000 plays with non-@safe functions and attribute auto inference. Currently, the plan is to do two more posts for this series.

Do let us know in the comment section or the D forums if you have any useful DIP1000 tips that were not covered!

Thanks to Walter Bright for reviewing this article.

DIP Reviews: Discussion vs. Feedback

For a while now, I’ve been including a link to the DIP Reviewer Guidelines in the initial forum post for every DIP review. My hope was that it would encourage reviewers to keep the thread on topic and also to provide more focused feedback. As it turns out, a link to reviewer guidelines is not quite enough. Recent review threads have turned into massive, 20+ page discussions touching on a number of tangential topics.

The primary purpose of the DIP review process, as I’ve tried to make clear in blog posts, forum discussions, and the reviewer guidelines, is to improve the DIP. It is not a referendum on the DIP. In every review round, the goal is to strengthen the content where it is lacking, bring clarity and precision to the language, make sure all the bases are covered, etc.

At the same time, we don’t want to discourage discussion on the merits of the proposal. Opinions about the necessity or the validity of a DIP can raise points that the language maintainers can take into consideration when they are deciding whether to approve or reject it, or even cause the DIP author to withdraw the proposal. It’s happened before. That’s why such discussion is encouraged in the Community Review rounds (though it’s generally discouraged in Final Review, which should be focused wholly on improving the proposal).

The problem

One issue with allowing such free-form discussion in the review threads is that there is a tremendous amount of noise drowning out the signal. Finding specific DIP-related feedback requires trawling through every post, digging through multiple paragraphs of mixed discussion and feedback. Sometimes, one or more people will level a criticism that spawns a long discussion and results in a changing of minds. This makes it time-consuming for me as the DIP manager when I have to summarize the review. It also increases the likelihood that I’ll overlook something.

My summary isn’t just for the ‘Reviews’ section at the bottom of the DIP. It’s also my way of ensuring that the DIP author is aware of and has considered all the unique points of feedback. More than once I have found something the DIP author missed or had forgotten about. But if I overlook something and the DIP author also overlooks it, then we may have missed an opportunity to improve the DIP.

I have threatened to delete posts that go off topic in these threads, but I can count on one hand the number of posts I’ve actually deleted. In reality, these discussions branch off in so many directions that it’s not easy to say definitively that a post that isn’t focused on the DIP itself is actually off topic. So I tend to let the posts stand rather than risk derailing the thread or removing information that is actually relevant.

The solution

Starting with the upcoming Final Review of DIP 1027, I’m going to take a new approach to soliciting feedback. Rather than one review thread, I’ll be launching two for each DIP.

The Discussion Thread will be much the same as the current review thread. Opinions and discussion will be welcome and encouraged. I’ll still delete posts that are completely off topic, but other than that I’ll let the discussion flow where it may.

The Feedback Thread will be exclusively for feedback on the document and its contents. There will be no discussion allowed. Every post must contain specific points of feedback (preferably actionable items) intended to improve the proposal. Each post should be a direct reply to my initial post. There are only two exceptions: a poster who decides to retract feedback made in a previous post may reply to that post in order to make the retraction, and the DIP author may reply directly to any feedback post in order to indicate agreement or disagreement.

Posts in the feedback thread should contain answers to the questions posed in the DIP Reviewer Guidelines. It would be great if reviewers could take the time to do what Joseph Rushton Wakeling did in the Community Review for DIP 1028, where he explicitly listed and answered each question, but we won’t be requiring it. Feedback as bullet points is also very welcome.

Opinions on the validity of the proposed feature will be allowed in the feedback thread as long as they are backed with supporting arguments. In other words, “I’m against this! This is a terrible feature.” is not valid for the feedback thread. That sort of post goes in the discussion thread. However, “I’m against this. This is a terrible feature because <reasoned argument goes here>” is acceptable.

The rules of the feedback thread will be enforced without prejudice. Any post that is not a reply to my initial post, retraction of previous feedback, or a response by the DIP author will be deleted. Any post that does not provide the sort of feedback described above will be deleted. If I do delete a post, I won’t leave a new post explaining why. I’m going to update the DIP Reviewer Guidelines and each opening post in a feedback thread will include a link to that document as well as a paragraph or two summarizing the rules.

I’ll require DIP authors to follow both threads and to participate in the discussion thread. When it comes time to summarize the review, the feedback thread will be my primary source. I will, of course, follow the discussion thread as well and take notes on anything relevant. But if you want to ensure any specific criticisms you may have about a DIP are accounted for, be sure to post them in the feedback thread.

Hopefully, this new approach won’t be too disruptive. We’ll see how it goes.


Revisions to the DIP Process

At the AGM that was held prior to the Hackathon at DConf 2019 in London, I announced that I would be making revisions to the DIP process aimed at shortening the length of time required to go from the Community Review to a final verdict. I also, in response to Joseph Rushton Wakeling’s feedback about guidance for reviewers, agreed to enhance the existing documentation to clarify what is expected of reviewers during the Community and Final Review stages and to provide guidance on how to provide a good review.

The new documentation is now live in four documents under the new docs subdirectory in the DIP repository (all of which are linked in the README). The PROCEDURE and GUIDELINES documents are still there so that existing links remain valid, but all of their content has been replaced with links to the new documentation. Please consider the new documentation to be in draft form. I have not yet subjected them to intense editing, so any corrections are welcome, as are suggestions on how to enhance them.

Anyone intending to participate in a DIP review by leaving feedback in a review thread should familiarize themselves with the DIP Reviewer Guidelines. I can’t force anyone to do so, but I do consider this mandatory. In my role as DIP manager, trying to summarize long threads that have gone off into in-depth discussions of one thing or another, reading through post after post for any sign of actual feedback, is a time-consuming (and highly annoying!) process. Henceforth, I will be deleting any posts in Community and Final review discussion threads that do not adhere to the guidelines laid out in the above document. As I declare in the document, such posts will be copied and pasted into a separate thread where off-topic discussions of the DIP may continue. It’s not my intention to stifle debate, censor anyone, or in any other way restrict discussion of the DIP; I just want to make my job, and the DIP author’s, easier. So please, for my sake and yours, read and understand the reviewer guidelines.

The content of the GUIDELINES document, which was targeted at DIP authors, has been moved (and slightly modified) into the DIP Author Guidelines. The portion of the PROCEDURE document that was aimed at DIP authors is now in The DIP Authoring Process document. All potential DIP authors should read and understand both documents before submitting a DIP. Failure to do so may result in surprises. For example, I’m going to be more proactive in closing pull requests that are submitted while a DIP is still in development and not yet in a first draft state. So please read!

Finally, the portion of the PROCEDURE document that described the different review stages is now found in the document titled The DIP Review Process. Everyone should read this. The primary difference from the previous document is that I’ve explicitly declared that Community Reviews will always begin in the first seven days of a month and Final Reviews will always begin in the third week of a month. Additionally, where I would formerly allow multiple Community Reviews to take place simultaneously, I now restrict it to one at any given time. The goal is to streamline the process and minimize the time it takes to go from Community Review to a final verdict.

As the new document outlines, the best-case scenario, in which only one round of Community Review is required and no DIPs are under active consideration by the language maintainers when another DIP finishes the Final Review, should look like this:

  • the DIP enters Community Review in the first week of Month A.
  • after Community Review, the DIP author will have four weeks to complete any required revisions.
  • in the third week of Month B, the Final Review begins.
  • after the Final Review, no revisions are required and no other DIP is under active consideration, so the DIP may immediately move into Formal Assessment.
  • the language maintainers have enough information to render a verdict on the DIP within 30 days.

So it should take between two and three months for the review process to complete. Again, this is the best-case scenario. I expect that it will more typically take between four and five months, given that some DIPs will need multiple Community Review rounds and some will require revision during Formal Assessment.

As I said at the AGM, I’m always open to improving the process to the extent I can within the boundaries of the current framework. Any fundamental structural changes will need approval by Walter and Átila. If you have any suggestions to strengthen the documentation, please let me know.

Updates in D Land

As we approach the end of 2018, a recent Reddit thread wishing D a happy 17th birthday reminded me how far the language has come since I first stumbled upon it in 2003. Many of the steps along the way were powered by the energy of users who had little incentive to contribute beyond personal interest. And here we are, all these years later, still moving forward.

There are a number of current and upcoming happenings that will play a role in keeping that progress going. In this post, I’d like to remind you, update you, or inform you about some of them.

The Pull Request Manager Campaign

If you haven’t heard, the D Language Foundation has hired a pull request manager, to be paid out of a pool of donations. This is our first major fundraising campaign through Flipcause. I’m happy to report that it’s going well. As I write, we’ve raised $1,864 of our $3,000 target in 66 days thanks to the kindness of 30 supporters. If you’d like to support us in this cause, click on the campaign card.

You can access our full campaign menu at any time via the “Donate Now” button in the sidebar here on the blog. A pull request has also been submitted to integrate the menu into dlang.org’s donation page. Currently, we only have two campaigns (this one and the General Fund) but any future campaigns will be accessible through those menus.

Symmetry Autumn of Code

Earlier this year, Symmetry Investments partnered with the D Language Foundation to sponsor the Symmetry Autumn of Code (SAoC). Three participants were selected to work on D-related projects over the course of four months, with milestones to mark their progress. If you haven’t heard of it or had forgotten, you can read the details on the SAoC page here at the blog.

Unfortunately, one participant was unable to continue after the first milestone. The other two, whom we have come to refer to as the two Francescos, have each successfully completed three milestones and are in the home stretch, aiming for that final payment and free trip to DConf 2019.

Francesco Mecca is working on porting an old D1 GC to modern D, and Francesco Galla’ is busy adding HTTP/2 support to the vibe-http library. Both have made significant progress and are on track to a successful SAoC. Read more about their projects in my previous SAoC update.

DIP Updates

I’ve received partial feedback on a decision regarding DIP 1013 (The Deprecation Process) and expect to hear the final verdict soon. As soon as I do, I’ll move Manu’s DIP 1016 (ref T accepts r-values) into the Formal Assessment stage for Walter and Andrei to consider.

I had intended to move Razvan’s Copy Constructor DIP into Community Review by now, as that is a high priority for Walter and Andrei. However, he’s been working out some more details so it’s not quite yet ready. So as not to hold up the process any longer, I’ll be starting Community Review for one of the other DIPs in the PR queue at the end of this week. When the Copy Constructor DIP is ready, I’ll run its review in parallel.

Google Summer of Code (GSoC) 2019

At the end of last month, I announced in the forums that we’re ramping up for GSoC 2019. I seeded our Wiki page with two potential project ideas to get us started. So far, only one additional idea has been added and no one has contacted me about participating as a student or a mentor.

It’s been a while since we were last accepted into GSoC and we’d very much like to get into it this time. To do so, we need more project ideas, students to execute them, and mentors to provide guidance to the students. If you’re looking for another way to contribute to the D community, this is a great way to do so. Adding project ideas costs little beyond the time it takes to add the details to the Wiki and, if you are lacking in ideas already dying to escape the confines of your neurocranium, the time it takes to brainstorm something. Student and mentor participation is a more significant commitment, but it’s also a lot more rewarding. If you’re interested, tell me at aldacron@gmail.com.

DConf 2019

Finally, I’m happy to announce… Just kidding. I can’t announce anything yet about DConf 2019, but I hope to be able to soon. What I can say with certainty is that in 2019, DConf will be where DConf has never gone before. We’re currently working out some details with an eye toward making 2019 a big year for DConf.

I’m really excited about it and eager to let everyone know. I’ll do so as soon as I’m able. Watch this space!