Field | Value |
---|---|
DIP: | 1035 |
Review Count: | 3 |
Author: | Dennis Korpel (dkorpel@gmail.com) Paul Backus |
Implementation: | |
Status: | Accepted |
The memory-safety of a program depends on the ability of the programmer and the language implementation to maintain the run-time invariants of the program's data.
The D compiler is aware of the run-time invariants of built-in types, like arrays and pointers, and can use compile-time checks to ensure they are maintained.
These checks are not always sufficient for user-defined types.
In order to reliably maintain invariants beyond those of which the compiler has hard-coded knowledge, D programmers must resort to manual verification of @safe
code and defensive run-time checks.
This DIP proposes a new language feature, @system
variables, to address this lack of expressiveness in D's memory-safety system.
In @safe
code, @system
variables cannot be directly written to and cannot have their values altered in uncontrolled ways via casting, overlapping, void
-initialization, etc.
As such, they can be relied upon to store data subject to arbitrary run-time invariants.
- Background
- Rationale
- Prior work
- Description
- Alternatives
- Breaking Changes and Deprecations
- Reference
- Copyright & License
- Reviews
D's memory safety system distinguishes between safe values, which can be used freely in @safe
code without causing undefined behavior, and unsafe values, which cannot.
A type that has only safe values is a safe type; one that has both safe and unsafe values is an unsafe type.
(For more detailed definitions of these and other related terms, refer to the Function Safety section of the D language spec.)
The D compiler has built-in knowledge of which types are safe and which are not. In broad terms, pointers, arrays, and other reference types are unsafe; integers, characters, and floating-point numbers are safe; and the safety of aggregate types is determined by the safety of their members.
A run-time invariant (or just "invariant") of a type is a rule that distinguishes between that type's safe and unsafe values. (N.B. "invariant" in this DIP is not referring to invariant blocks in contract programming) The values that satisfy the invariant are safe; those that do not are unsafe. It follows that any type with a run-time invariant is unsafe, and that a safe type has no run-time invariants.
To ensure that their invariants are not violated, the use of unsafe types is restricted in @safe
code:
- They cannot be void-initialized.
- They cannot be overlapped in a union.
- A
T[]
cannot be cast to aU[]
whenU
is an unsafe type.
Though the system described above works well for built-in types and their invariants, it does not provide any way for the programmer to indicate that a user-defined type has additional invariants of which the compiler may not be aware.
As a result, maintaining such invariants requires extra effort from the programmer.
For unsafe types, the programmer may be required to manually verify that those invariants are maintained in @safe
code.
For safe types, the programmer may additionally be required to insert defensive run-time checks to ensure that those invariants are maintained.
module intslice;
struct IntSlice
{
private int* ptr;
private size_t length;
@safe
this(int[] src)
{
ptr = &src[0];
length = src.length;
}
@trusted
ref int opIndex(size_t i)
{
if (i >= length) assert(0);
return ptr[i];
}
}
Invariant: The value of length
must be equal to the length of the array pointed to by ptr
.
First, observe that this code is memory-safe as-written (modulo bugs in the compiler).
There are only two functions that directly access ptr
and length
, and both correctly maintain the invariant.
However, in order to prove that this code is memory-safe, it is not sufficient for the programmer to verify the correctness of its @trusted
functions.
Instead, every function that touches ptr
and length
must be checked manually.
If ptr
and length
were @system
variables, then all code that directly accesed them would have to be @trusted
, and the programmer would not need to manually verify any @safe
code in order to prove that IntSlice
's invariant is maintained.
The same general pattern occurs with other user-defined types whose invariants involve the relationship between two or more variables, such as tagged unions and reference-counted smart pointers.
module shortstring;
struct ShortString
{
private ubyte length;
private char[15] data;
@safe
this(const(char)[] src)
{
assert(src.length <= data.length);
length = cast(ubyte) src.length;
data[0 .. src.length] = src[];
}
@trusted
const(char)[] opIndex() const
{
// should be ok to skip the bounds check here
return data.ptr[0 .. length];
}
}
Invariant: length <= 15
Once again, there is a constructor that establishes an invariant, and a member function that relies on the invariant to do its work. Unlike in the previous example, however, this code is not memory-safe as-written, though it may appear to be at first glance.
To understand why, consider the following program, which uses ShortString
to cause undefined behavior in @safe
code:
@safe
void main()
{
import shortstring;
import std.stdio;
ShortString oops = void;
writeln(oops[]);
}
void
-initializing a ShortString
will very likely produce an instance that violates its invariant.
Because opIndex
relies on that invariant to skip the bounds check, this results in an out-of-bounds memory access rather than a safe, predictable crash.
Why does the compiler allow a ShortString
to be void
-initialized in @safe
code?
Because, according to the rules in the language spec, a struct
containing only ubyte
and char
data is a safe type, and therefore must not have any invariants.
It follows that @safe
code is free to initialize a ShortString
to any value, including an unspecified one, without risking memory corruption.
In order to make this code memory-safe, the programmer must include an additional bounds check in opIndex
:
@safe
const(char)[] opIndex() const
{
return data[0 .. length];
}
This solution is unsatisfying: the program must do redundant work at run-time to compensate for the language's lack of expressiveness, or give up on the guarantees of @safe
.
If ShortString.length
could be marked as @system
, this dilemma would not exist.
The same general pattern occurs with other user-defined types that attempt to impose invariants on types the compiler considers "safe", such as enum
types used in final switch
statements and integer "handles" used as array indices by external libraries.
Allowing fields of an aggregate to be marked @system
helps the compiler maintain run-time invariants on user-defined types, but it is also important to ensure the variable was not constructed with an unsafe value to begin with.
Constructing unsafe values in a @safe
function is not allowed, and constructing them in @system
or @trusted
functions leaves the responsibility of memory safety up to the programmer.
When accessing global variables of an unsafe type in a @safe
function, the compiler should be either conservative and reject any access, or do basic taint-checking:
int* x = cast(int*) 0xDEADBEEF;
extern int* y;
int* z = new int(20);
void main() @safe
{
*x = 10; // Not allowed
*y = 10; // Not allowed
*z = 10; // Maybe allowed
}
Since the initialization expression cast(int*) 0xDEADBEEF
would not be allowed in a @safe
function, and since the initial value of y
is unknown, the compiler should annotate variables x
and y
as possibly containing an unsafe value, so they cannot be accessed in a @safe
function.
Only z
is known to have a safe initial value in this case, so the compiler could allow access to it in @safe
code.
Allowing @trusted
and @safe
to be applied to variables is useful when the programmer wants to relax the constraints, and applying @system
is useful to tighten the constraints:
@trusted int* x = cast(int*) 0xD000; // Assumed to be a good address
@safe extern int* y0; // Assumed to always have safe value
@system extern int* y1; // May have unsafe value
@system int* z = new int(20); // Starts out safe, but may be set to unsafe value in @trusted code
enum Opt {a, b, c}
@system Opt opt = Opt.a; // @trusted code relies on this being in range and not e.g. `cast(Opt) 100`
The need for encapsulation of data / restricted access to data in order to achieve memory safety has been mentioned in several discussions:
-
#8035: tupleof ignoring private shouldn't be accepted in
@safe
code (March 15, 2018) -
Re: shared - i need it to be useful (October 22, 2018)
-
Re: Manu's
shared
vs the @trusted promise (October 23, 2018) -
Re: Both safe and wrong? (February 7, 2019)
-
Should modifying private members be @system? (October 4, 2019)
-
Borrowing and Ownership (October 27, 2019)
-
#7347: Fix issue 20495 (choose copies unused union member, which is unsafe) (January 9, 2020)
-
Re: @trusted attribute should be replaced with @trusted blocks (January 16, 2020)
-
Trust Me: An Exploration of @trusted Code in D - Steven Schveighoffer (November 22, 2020)
Many other languages either do not allow systems programming at all (e.g. Java, Python) or do not support language-enforced memory safety (e.g. C/C++).
A notable exception is Rust, where the equivalent of this DIP has been proposed multiple times: Unsafe fields #381
Some excerpts from the discussion there are:
OTOH, privacy is primarily intended for abstraction (preventing users from depending on incidental details), not for protection (ensuring that invariants always hold). The fact that it can be used for protection is basically an happy accident. To clarify the difference, C strings have no abstraction whatever - they are a raw pointer to memory. However, they do have an invariant - they must point to a valid NUL-terminated string. Every place that constructs such a string must ensure it is valid, and every place that consumes it can rely on it. OTOH, a safe, say, buffered reader needs abstraction but doesn't need protection - it does not hold any critical invariant, but may want to change its internal representation.
This doesn't seem very useful to me. Within a module I would expect the authors to know what they're doing, and the unit-tests to save them when they do not. For other users, you could simply introduce getters and setters, and functions/methods can already be marked unsafe.
Ultimately the proposal has not yet been accepted.
The idea of using private
instead of @system
variables for D is discussed in the alternatives section.
More information about Rust's stand on unsafe functions can be found here:
Before the proposed changes, here is an overview of the relevant existing rules of which declarations can have the @system
attribute.
@system int w = 2; // compiles, does nothing
@system enum int x = 3; // compiles, does nothing
enum E
{
@system x, // error: @system is not a valid attribute for enum members
y,
}
@system alias x = E; // compiles, does nothing
@system template T() {} // compiles, does nothing
void func(@system int x) // error: @system attribute for function parameter is not supported
{
@system int x; // compiles, does nothing
}
template Temp(@system int x) {} // error: basic type expected, not @
Any function attribute can be attached to a variable declaration, but they cannot be retrieved:
@system @nogc pure nothrow int x;
pragma(msg, __traits(getFunctionAttributes, x)); // Error: first argument is not a function
pragma(msg, __traits(getAttributes, x)); // tuple()
(0) Accessing variables or fields marked @system
is not allowed in @safe
code
Even though read-only access of a @system
variable with a safe type could still be allowed without breaking @safe
, it is decided to restrict any 'access' (as defined by the specification) for simplicity.
Examples:
@system int x;
struct S
{
@system int y;
}
S s;
@safe
void main()
{
x += 10; // error: cannot modify @system variable 'x'
s.y += 10; // error: cannot modify @system field 'y'
int y = x; // error: cannot read @system variable 'x'
@system int z;
z += 1; // error: cannot modify @system variable 'z'
}
// inferred as a @system function
auto foo()
{
x = 0;
}
When using an alias
to a @system
variable, that alias has the same restrictions as the symbol to which it aliases.
Initialization of a @system
variable or field is allowed in @safe
code.
This includes static initialization, the automatically generated constructor, user-defined constructors, and the .init
value of a type.
@system int x;
shared static this() @safe
{
x = 3; // allowed, this is initialization
x = 3; // second time disallowed, this is assignment to a `@system` variable
}
struct T
{
@system int y;
@system int z = 3; // allowed
this(int y, int z) @safe
{
this.y = y; // allowed, this is initialization
this.y = y; // second time disallowed, this is assignment to a `@system` variable
this.z = z; // allowed
}
}
struct S
{
@system int y = 2;
}
void main() @safe
{
S s0 = {y: 3}; // static initialization
S s1 = S(3); // automatically generated constructor
S s2 = S.init; // .init value
S s3; // same as above
s3 = s2; // disallowed
}
Note that while it may be desirable to require a @trusted
annotation near initialization of @system
variables, realizing this is problematic since there is no syntax for @trusted
assignment.
@trusted
as a function annotation has its limitations:
- it does not work for global or local variables since a
@trusted
lambda there would move the declaration to that function's scope. - it not only trusts initialization of the variable on the left-hand side of the
=
, but also the initialization expression on right-hand side. Using a@trusted
function to return a variable byref
and assigning it does not count as initialization of that variable. - it disables the
scope
/return scope
checks of-dip1000
struct S
{
this(ref scope S s) @system
{
*(cast(int*) 0xDEADBEEF) = 0;
}
}
struct Wrapper(T)
{
@system T t;
this(T t) @trusted
{
this.t = t; // Oops! Calls a `@system` copy constructor
}
}
void main() @safe
{
auto w = Wrapper!S(S.init); // program killed by signal 11
() @trusted {@system int x = 3;}();
// x is not in scope anymore
}
@system int x = (() @trusted => 3)(); // this still does not mark the assignment `@trusted`
//() @trusted {@system int x = 3;}(); // does not work
(1) An aggregate with at least one @system
field is an unsafe type
Such an aggregate receives the same restrictions as pointer types in @safe
code, making implicit writes to @system
variables using e.g., array casting, impossible.
struct Handle
{
@system int handle;
}
void main() @safe
{
Handle h = void; // error
union U
{
Handle h;
int i;
}
U u;
u.i = 3; // error
ubyte[Handle.sizeof] storage;
auto array = cast(Handle[]) storage[]; // error
}
(2) Variables and fields without annotation are @safe
unless their initial value is not @safe
The rules regarding variables and fields are as follows:
- An initialization expression
x
is@system
when the function(() => x)
is inferred as@system
. - When marked
@system
, the result is always@system
regardless of the type. - When marked
@trusted
, the initialization expressionx
is treated as(() @trusted => x)
. - When marked
@safe
, the initialization expression must be@safe
. - In the absence of an annotation, the result is
@system
if the type is unsafe and the initialization expression is@system
, or if the type is unsafe and the variable isextern
.
int* getPtr() @system {return cast(int*) 0x8035FDF0;}
int getVal() @system {return -1;}
int* x1 = x0; // @safe, (() => x0) is @safe
int* x2 = cast(int*) 0x8035FDF0; // @system, (() => cast(int*) 0x8035FDF0) is @system
int* x3 = getPtr(); // @system, (() => getPtr()) is @system
int x4 = getVal(); // @safe, int is not an unsafe type
extern int* ext; // @system, unsafe type and initial value unknown
@system int x5 = 1; // @system as requested
@trusted int* x6 = getPtr(); // @safe, the getPtr call gets trusted
@safe int* x7 = getPtr(); // error: cannot initialize @safe variable with @system initializer
struct S {
// same rules for fields:
int* x9 = x3; // @system
int x8 = x5; // @safe
}
An exception to the above rules is made on unsafe types when the compiler knows the resulting value is safe.
int* getNull() pure @system {return null;}
int* n = getNull(); // despite unsafe type with @system initialization expression, inferred as @safe
Annotations with a scope (@system {}
) or colon (@system:
) affect variables just like they do functions.
@system
{
int y0; // @system
}
@system:
int y1; // @system
Placing @system
annotations is already allowed in the places where it's needed for this DIP, so there is no grammar change.
It has been suggested before that bypassing private
using e.g. .tupleof
or __traits(getMember)
should not be allowed in @safe
code.
While the need for giving a way of ensuring struct
invariants in @safe
code is in line with this DIP, the idea to use private
for it is argued against.
First of all, disallowing bypassing private
in @safe
code is not sufficient for ensuring run-time invariants on user-defined types.
When an aggregate has no members with an unsafe type, the private fields can still be indirectly written to via overlap in a union, void-initialization, or array casting.
Second, private
only acts on the module level, so a @trusted
member function cannot assume that a struct's invariants are upheld unless all other @safe
code in the module has been manually certified not to violate them.
This undermines the ability of the programmer to easily distinguish code requiring manual verification from code that can be checked automatically, especially since certain member functions like constructors, destructors, and operator overloads must be defined in the same module as the data on which they operate.
Finally, disallowing bypassing visibility with __traits(getMember, ...)
or .tupleof
would break @safe
code that relied on this, and issue 15371 explicitly requested this behavior.
Some have suggested that a struct
can be made into an unsafe type by adding an invariant
block.
struct Handle
{
invariant
{
// no run-time checks, just marking `Handle` as an unsafe type
}
private int fd;
}
However, Contract Programming is currently a separate feature from Memory Safety, and an empty invariant {}
block looks like something that can be safely removed.
Suddenly introducing @safe
restrictions and scope
semantics to types with invariant
blocks can be undesirable.
On top of that, it still does not protect from modifications outside of @trusted
code.
Attaching the @system
attribute to variables is already permitted, but doing so adds no compiler checks.
The additional checks for @system
variables in this proposal can cause existing @safe
code to break (note that @system
code is completely unaffected by everything in this DIP).
However, since @system
on variables does not currently do anything, the author suspects that users generally do not add this attribute to any variables at all, let alone variables that are meant to be used in @safe
code.
The biggest risk here is that variables accidentily fall inside a @system {}
block or under a @system:
section.
@system:
int x; // suddenly not writable in @safe code anymore
void unsafeFuncA() {};
void unsafeFuncB() {};
void main() @safe
{
x++; // not allowed anymore
}
Misconstructed pointers can be inferred @system
under the new rules.
struct S
{
int* a = cast(int*) 0x8035FDF0;
}
void main() @safe
{
S s;
*s.a = 0; // this gives an error now
}
Whenever this happens, there is a risk of memory corruption, so a compiler error would be in its place.
Still, a two-year deprecation period is proposed where instead of raising an error, a deprecation message is given whenever the new memory safety rules are broken.
A preview flag -preview=systemVariables
can additionally be added that immediately raises errors for violations while leaving other deprecation messages as warnings.
At the end of the preview period, there will also be a flag to revert it, -revert=systemVariables
, so that users can choose to keep the old behavior for a little longer.
Copyright (c) 2020-2022 by the D Language Foundation
Licensed under Creative Commons Zero 1.0
In the Feedback Thread, most of the feedback was related to details such as terminology, whether to use assert(x)
in the examples, etc.
The one structural piece of criticism was that making initialization of @system
variables safe is unsound, to wit, "Memory safety cannot depend on the correctness of a @safe
constructor." The DIP author replied that this boils down to "@trusted assumptions about @safe code", on which there is no consensus, and he has yet to determine a satisfactory design.
Of note, a detailed list of feedback was misplaced in the Discussion Thread. In short, the reviewer asserted that this proposal is essentially a response to bugs in the implementation of @safe
, and those bugs should be fixed rather than a new feature added to the language. Subsequent discussion appears to have led to consensus among the particpants that the DIP is necessary.
Only two items of actionable feedback were provided in the Feedback Thread:
- The Rationale and Description appear to have a conflict: the Rationale states that module-level
extern
variables should be disallowed in@safe
code, but a later example declares such "@safe
by default". A DIP author responded that there is no conflict; the Rationale describes existing behavior, the example describes the behavior following implementation of this DIP. - A specifc quote from the DIP ("...when the compiler knows the resulting value is safe") leads one to the question of what the compiler considers to be safe; the DIP should elaborate on this. A DIP author replied that the language spec already provides that information.
The following actionable feedback was provided in the Feedback Thread:
-
extern
variables should not be@safe
by default. The DIP author responded that this was originally intended to be consistent with the@safe
-by-default DIP. That DIP was rejected, so he will change this. -
The DIP provides an example, "int as pointer", and is correct in stating that there is no memory-safe risk in allowing a value without indirections to escape from a function. This undermines the motivation of the example: if there is no benefit from applying
scope
checking to data without indirections, then there is no justification to enabling such checks in@safe
code. This example should be removed. The DIP author responded that the motivation here was not to add scope checking to plain integers for non-memory safe reasons, but to be able to create custom types that represent an indirection under the hood. However, the DIP does not demonstrate what kind of@trusted
code this enables, and there are cases where one might want unsafe values without scope checking, so maybe it shouldn't be an effect of@system
members. -
The description says that
scope
"is not stripped away from an aggregate with at least one@system
field, even when the aggregate has no members that contain pointers." The example noted in the item above appears to be the only justification, so this sentence should be removed. See the DIP author's response to the previous item. -
Relating to the workaround described in the "int as pointer" example, wouldn't it work to put the handle in the union with
void[1]
? The DIP said no, asvoid[1]
is not a type with unsafe values. -
In the "Proposed changes" section, the "Further operations disallowed" appears to allow a case that defeats the purpose of disallowing reads of
@system
variables in@safe
code:ref const identity(T)(return ref const T var){ return var; } @safe void main() { auto x = someContainer.internalRepresentation.identity; }
The DIP author agreed.
-
In one of the examples in the "Proposed changes" section, the comment in the line
this.z = z; // disallowed, this is assignment
is wrong; this is not assignment, but construction. The DIP author agreed.
The language maintainers accepted this DIP, saying that it correctly identifies a loophole in the @safe
checks and provides a reasonable solution.