{"id":1723,"date":"2018-10-17T15:09:40","date_gmt":"2018-10-17T15:09:40","guid":{"rendered":"http:\/\/dlang.org\/blog\/?p=1723"},"modified":"2021-10-08T11:02:14","modified_gmt":"2021-10-08T11:02:14","slug":"interfacing-d-with-c-arrays-part-1","status":"publish","type":"post","link":"https:\/\/dlang.org\/blog\/2018\/10\/17\/interfacing-d-with-c-arrays-part-1\/","title":{"rendered":"Interfacing D with C: Arrays Part 1"},"content":{"rendered":"<p>This post is <a href=\"https:\/\/dlang.org\/blog\/the-d-and-c-series\/\">part of an ongoing series<\/a> on working with both D and C in the same project. <a href=\"https:\/\/dlang.org\/blog\/2017\/12\/05\/interfacing-d-with-c-getting-started\/\">The previous post<\/a> showed how to compile and link C and D objects. This post is the first in a miniseries focused on arrays.<\/p>\n<p>When interacting with C APIs, it\u2019s almost a given that arrays are going to pop up in one way or another (perhaps most often as strings, a subject of a future article in the \u201cD and C\u201d series). Although D arrays are implemented in a manner that is not directly compatible with C, the fundamental building blocks are the same. This makes compatibility between the two relatively painless as long as the differences are not forgotten. This article is the first of a few exploring those differences.<\/p>\n<p>When using a C API from D, it\u2019s sometimes necessary to translate existing code from C to D. A new D program can benefit from existing examples of using the C API, and anyone porting a program from C that uses the API would do well to keep the initial port as close to the original as possible. It\u2019s on that basis that we\u2019re starting off with a look at the declaration and initialization syntax in both languages and how to translate between them. 
Subsequent posts in this series will cover multidimensional arrays, the anatomy of a D array, passing D arrays to and receiving C arrays from C functions, and how the GC fits into the picture.<\/p>\n<p>My original concept of covering this topic was much smaller in scope, my intent to brush over the boring details and assume that readers would know enough of the basics of C to derive the why from the what and the how. That was before I gave a D tutorial presentation to a group among whom only one person had any experience with C. I\u2019ve also become more aware that there <a href=\"https:\/\/forum.dlang.org\/\">are regular users of the D forums<\/a> who have never touched a line of C. As such, I\u2019ll be covering a lot more ground than I otherwise would have (hence a two-part article has morphed into at least three). I urge those for whom much of said ground is old hat not to get complacent in their skimming of the page! A comfortable experience with C is more apt than none at all to obscure some of the pitfalls I describe.<\/p>\n<h3 id=\"arraydeclarations\">Array declarations<\/h3>\n<p>Let\u2019s start with a simple declaration of a one-dimensional array:<\/p>\n<pre class=\"prettyprint lang-c_cpp\">int c0[3];<\/pre>\n<p>This declaration allocates enough memory on the stack to hold three <code>int<\/code> values. The values are stored contiguously in memory, one right after the other. <code>c0<\/code> may or may not be initialized, depending on where it\u2019s declared. 
Global variables and <code>static<\/code> local variables are default initialized to <code>0<\/code>, as the following C program demonstrates.<\/p>\n<p><strong>definit.c<\/strong><\/p>\n<pre class=\"prettyprint lang-c_cpp\">#include &lt;stdio.h&gt;\n\n\/\/ global (can also be declared static)\nint c1[3];\n\nint main(void)\n{\n    static int c2[3];       \/\/ static local\n    int c3[3];              \/\/ non-static local, deliberately left uninitialized\n\n    printf(\"one: %i %i %i\\n\", c1[0], c1[1], c1[2]);\n    printf(\"two: %i %i %i\\n\", c2[0], c2[1], c2[2]);\n    printf(\"three: %i %i %i\\n\", c3[0], c3[1], c3[2]);\n    return 0;\n}<\/pre>\n<p>For me, this prints:<\/p>\n<pre>one: 0 0 0\ntwo: 0 0 0\nthree: -1 8 0<\/pre>\n<p>The values for <code>c3<\/code> just happened to be lying around at that memory location. A non-static local array is not initialized, so its contents are indeterminate and the third line will vary between compilers, platforms, and runs. Now for the equivalent D declaration:<\/p>\n<pre class=\"prettyprint lang-d\">int[3] d0;<\/pre>\n<p><em><a href=\"https:\/\/run.dlang.io\/is\/moXqNt\">Try it online<\/a><\/em><\/p>\n<p>Here we can already find the first gotcha.<\/p>\n<p>A general rule of thumb in D is that C code pasted into a D source file should either work as it does in C or fail to compile. For a long while, C array declaration syntax fell into the former category and was a legal alternative to the D syntax. It has since been deprecated and subsequently removed from the language, meaning <code>int d0[3]<\/code> will now cause the compiler to scold you:<\/p>\n<pre>Error: instead of C-style syntax, use D-style int[3] d0<\/pre>\n<p>It may seem an arbitrary restriction, but it really isn\u2019t. At its core, it\u2019s about consistency at a couple of different levels.<\/p>\n<p>One is that <a href=\"https:\/\/dlang.org\/spec\/declaration.html\">we read declarations in D from right to left<\/a>. In the declaration of <code>d0<\/code>, everything flows from right to left in the same order that we say it: \u201c(d0) is an (array of three) (integers)\u201d. 
The same is not true of the C-style declaration.<\/p>\n<p>Another is that the type of <code>d0<\/code> is actually <code>int[3]<\/code>. Consider the following pointer declarations:<\/p>\n<pre class=\"prettyprint lang-d\">int* p0, p1;<\/pre>\n<p>The type of both <code>p0<\/code> and <code>p1<\/code> is <code>int*<\/code> (in C, only <code>p0<\/code> would be a pointer; <code>p1<\/code> would simply be an <code>int<\/code>). It\u2019s the same as all type declarations in D\u2014type on the left, symbol on the right. Now consider this:<\/p>\n<pre class=\"prettyprint lang-d\">int d1[3], d2[3];\nint[3] d4, d5;<\/pre>\n<p>Having two different syntaxes for array declarations, with one that splits the type like an infinitive, sets the stage for the production of inconsistent and potentially confusing code. By making the C-style syntax illegal, consistency is enforced. Code readability is a key component of maintainability.<\/p>\n<p>Another difference between <code>d0<\/code> and <code>c0<\/code> is that the elements of <code>d0<\/code> will be default initialized no matter where or how it\u2019s declared. Module scope, local scope, static local\u2026 it doesn\u2019t matter. Unless the compiler is told otherwise, variables in D are always default initialized to the predefined value specified by <a href=\"https:\/\/dlang.org\/spec\/property.html#init\">the <code>init<\/code> property of each type<\/a>. Array elements are initialized to the <code>init<\/code> property of the element type. As it happens, <code>int.init == 0<\/code>. Translate <strong>definit.c<\/strong> to D and see it for yourself (open up <a href=\"https:\/\/run.dlang.io\/\">run.dlang.io and give it a go<\/a>).<\/p>\n<p>When translating C to D, this default initialization business is a subtle gotcha. 
Consider this innocently contrived C snippet:<\/p>\n<pre class=\"prettyprint lang-c_cpp\">\/\/ static variables are default initialized to 0 in C\nstatic float vertex[3];\nsome_func_that_expects_inited_vert(vertex);<\/pre>\n<p>A direct translation to D will not produce the expected result, as <code>float.init<\/code> is <code>float.nan<\/code>, not <code>0.0f<\/code>!<\/p>\n<p>When translating between the two languages, always be aware of which C variables are not explicitly initialized, which are expected to be initialized, and <a href=\"https:\/\/dlang.org\/spec\/type.html#basic-data-types\">the default initialization value for each of the basic types<\/a> in D. Failure to account for the subtleties may well lead to debugging sessions of the hair-pulling variety.<\/p>\n<p>Default initialization can easily be disabled in D with <code>= void<\/code> in the declaration. This is particularly useful for arrays that are going to be loaded with values before they\u2019re read, or that contain elements with an <code>init<\/code> value that isn\u2019t very useful as anything other than a marker of uninitialized variables.<\/p>\n<pre class=\"prettyprint lang-d\">float[16] matrix = void;\nsetIdentity(matrix);<\/pre>\n<p>On a side note, the purpose of default initialization is not to provide a convenient default value, but to make uninitialized variables stand out (a fact you may come to appreciate in a future debugging session). A common mistake is to assume that types like <code>float<\/code> and <code>char<\/code>, with their \u201cnot a number\u201d (<code>float.nan<\/code>) and invalid UTF-8 (<code>0xFF<\/code>) initializers, are the oddball outliers. Not so. Those values are great markers of uninitialized memory because they aren\u2019t useful for much else. It\u2019s the integer types (and <code>bool<\/code>) that break the pattern. For these types, the entire range of values has potential meaning, so there\u2019s no single value that universally shouts \u201cHey! 
I\u2019m uninitialized!\u201d. As such, integer and <code>bool<\/code> variables are often left with their default initializer since <code>0<\/code> and <code>false<\/code> are frequently the values one would pick for explicit initialization for those types. Floating point and character values, however, should generally be explicitly initialized or assigned to as soon as possible.<\/p>\n<h3 id=\"explicitarrayinitialization\">Explicit array initialization<\/h3>\n<p>C allows arrays to be explicitly initialized in different ways:<\/p>\n<pre class=\"prettyprint lang-c_cpp\">int ci0[3] = {0, 1, 2};  \/\/ [0, 1, 2]\nint ci1[3] = {1};        \/\/ [1, 0, 0]\nint ci2[]  = {0, 1, 2};  \/\/ [0, 1, 2]\nint ci3[3] = {[2] = 2, [0] = 1}; \/\/ [1, 0, 2]\nint ci4[]  = {[2] = 2, [0] = 1}; \/\/ [1, 0, 2]<\/pre>\n<p>What we can see here is:<\/p>\n<ul>\n<li>elements are initialized sequentially with the constant values in the initializer list<\/li>\n<li>if there are fewer values in the list than array elements, then all remaining elements are initialized to <code>0<\/code> (as seen in <code>ci1<\/code>)<\/li>\n<li>if the array length is omitted from the declaration, the array takes the length of the initializer list (<code>ci2<\/code>)<\/li>\n<li>designated initializers, as in <code>ci3<\/code>, allow specific elements to be initialized with <code>[index] = value<\/code> pairs, and indexes not in the list are initialized to <code>0<\/code><\/li>\n<li>when the length is omitted from the declaration and a designated initializer is used, the array length is based on the highest index in the initializer and elements at all unlisted indexes are initialized to <code>0<\/code>, as seen in <code>ci4<\/code><\/li>\n<\/ul>\n<p>Initializers aren\u2019t supposed to be longer than the array (<code>gcc<\/code> gives a warning and initializes a three-element array to the first three initializers in the list, ignoring the rest).<\/p>\n<p>Note that it\u2019s possible to mix the designated and 
non-designated syntaxes in a single initializer:<\/p>\n<pre class=\"prettyprint lang-c_cpp\">\/\/ [0, 1, 0, 5, 0, 0, 0, 8, 44]\nint ci5[] = {0, 1, [3] = 5, [7] = 8, 44};<\/pre>\n<p>Each value without a designation is applied in sequential order as normal. If a designated initializer immediately precedes it, it becomes the value for the index that follows the designated one. Any element that is never given a value is initialized to <code>0<\/code>. Here, <code>0<\/code> and <code>1<\/code> go to indexes <code>ci5[0]<\/code> and <code>ci5[1]<\/code> as normal, since they are the first two values in the list. Next comes a designator for <code>ci5[3]<\/code>, so <code>ci5[2]<\/code> has no corresponding value in this list and is initialized to <code>0<\/code>. Next comes the designator for <code>ci5[7]<\/code>. We have skipped <code>ci5[4]<\/code>, <code>ci5[5]<\/code>, and <code>ci5[6]<\/code>, so they are all initialized to <code>0<\/code>. Finally, <code>44<\/code> lacks a designator, but immediately follows <code>[7]<\/code>, so it becomes the value for the element at <code>ci5[8]<\/code>. In the end, <code>ci5<\/code> has a length of <code>9<\/code> elements.<\/p>\n<p>Also note that designated array initializers were added to C in C99. Some C compiler versions either don&#8217;t support the syntax or require a special command line flag to enable it. As such, it&#8217;s probably not something you&#8217;ll encounter very much in the wild, but it&#8217;s still useful to know about when you do.<\/p>\n<p>Translating all of these to D opens the door to more gotchas. Thankfully, the first one is a compiler error and won\u2019t cause any heisenbugs down the road:<\/p>\n<pre class=\"prettyprint lang-d\">int[3] wrong = {0, 1, 2};\nint[3] right = [0, 1, 2];<\/pre>\n<p>Array initializers in D are array literals. The same syntax can be used to pass anonymous arrays to functions, as in <code>writeln([0, 1, 2])<\/code>. 
For the curious, the declaration of <code>wrong<\/code> produces the following compiler error:<\/p>\n<pre>Error: a struct is not a valid initializer for a int[3]<\/pre>\n<p>The <code>{}<\/code> syntax <a href=\"https:\/\/dlang.org\/spec\/struct.html#static_struct_init\">is used for <code>struct<\/code> initialization<\/a> in D (not to be confused with struct literals, which can also <a href=\"https:\/\/dlang.org\/spec\/struct.html#struct-literal\">be used to initialize a <code>struct<\/code> instance<\/a>).<\/p>\n<p>The next surprise comes in the translation of <code>ci1<\/code>.<\/p>\n<pre class=\"prettyprint lang-d\">\/\/ int ci1[3] = {1};\nint[3] di1 = [1];<\/pre>\n<p>This actually produces a compiler error:<\/p>\n<pre>Error: mismatched array lengths, 3 and 1<\/pre>\n<p>What gives? First, take a look at the translation of <code>ci2<\/code>:<\/p>\n<pre class=\"prettyprint lang-d\">\/\/ int ci2[] = {0, 1, 2};\nint[] di2 = [0, 1, 2];<\/pre>\n<p>In the C code, there is no difference between <code>ci1<\/code> and <code>ci2<\/code>. They both are fixed-length, three-element arrays allocated on the stack. In D, this is one case where that general rule of thumb about pasting C code into D source modules breaks down.<\/p>\n<p>Unlike C, D <a href=\"https:\/\/dlang.org\/spec\/arrays.html#static-arrays\">actually makes a distinction between arrays<\/a> of types <code>int[3]<\/code> and <code>int[]<\/code>. The former is, like C, a fixed-length array, commonly referred to in D as a static array. The latter, unlike C, is a dynamic-length array, commonly referred to as a dynamic array or a slice. Its length can grow and shrink as needed.<\/p>\n<p>Initializers for static arrays must have the same length as the array. D simply does not allow initializers shorter than the declared array length. Dynamic arrays take the length of their initializers. <code>di2<\/code> is initialized with three elements, but more can be appended. 
Moreover, the initializer is not required for a dynamic array. In C, a local declaration such as <code>int foo[];<\/code> is illegal, as the length can only be omitted from the declaration when an initializer is present.<\/p>\n<pre class=\"prettyprint lang-d\">\/\/ gcc says \"error: array size missing in 'illegalC'\"\n\/\/ int illegalC[];\nint[] legalD;\nlegalD ~= 10;<\/pre>\n<p><code>legalD<\/code> is an empty array, with no memory allocated for its elements. Elements can be added via the append operator, <code>~=<\/code>.<\/p>\n<p>Memory for dynamic arrays is allocated at the point of declaration only when an explicit initializer is provided, as with <code>di2<\/code>. If no initializer is present, memory is allocated when the first element is appended. By default, dynamic array memory is allocated from the GC heap (though the compiler may determine that it\u2019s safe to allocate on the stack as an optimization) and space for more elements than needed is allocated in order to reduce the need for future allocations (<a href=\"https:\/\/dlang.org\/phobos\/object.html#.reserve\">the <code>reserve<\/code> function can be used<\/a> to allocate a large block in one go, without initializing any elements). Appended elements go into the preallocated slots until none remain; the next append then triggers a new allocation. <a href=\"https:\/\/dlang.org\/articles\/d-array-article.html\">Steven Schveighoffer&#8217;s excellent array article<\/a> goes into the details, and also describes array features we&#8217;ll touch on in the next part.<\/p>\n<p>Often, when translating a declaration like <code>ci2<\/code> to D, the difference between the fixed-length, stack-allocated C array and the dynamic-length, GC-allocated D array isn\u2019t going to matter one iota. 
One case where it does matter is when the D array is declared inside a function marked <code>@nogc<\/code>:<\/p>\n<pre class=\"prettyprint lang-d\">@nogc void main()\n{\n    int[] di2 = [0, 1, 2];\n}<\/pre>\n<p><em><a href=\"https:\/\/run.dlang.io\/is\/4AO9vT\">Try it online<\/a><\/em><\/p>\n<p>The compiler ain\u2019t letting you get away with that:<\/p>\n<pre>Error: array literal in @nogc function D main may cause a GC allocation<\/pre>\n<p>The same error isn\u2019t triggered when the array is static, since it\u2019s allocated on the stack and the literal elements are just shoved right in there. New C programmers coming to D for the first time tend to reach for <code>@nogc<\/code> almost as if it goes against their very nature not to, so this is something they will bump into until they eventually come to the realization that <a href=\"https:\/\/dlang.org\/blog\/the-gc-series\/\">the GC is not the enemy of the people<\/a>.<\/p>\n<p>To wrap this up, that big paragraph on designated array initializers in C is about to pull double duty. 
D also supports designated array initializers, just with a different syntax.<\/p>\n<pre class=\"prettyprint lang-d\">\/\/ [0, 1, 0, 5, 0, 0, 0, 8, 44]\n\/\/ int ci5[] = {0, 1, [3] = 5, [7] = 8, 44};\nint[] di5 = [0, 1, 3:5, 7:8, 44];\nint[9] di6 = [0, 1, 3:5, 7:8, 44];<\/pre>\n<p><em><a href=\"https:\/\/run.dlang.io\/is\/4kAt6u\">Try it online<\/a><\/em><\/p>\n<p>It works with both static and dynamic arrays, following the same rules and producing the same initialization values as in C.<\/p>\n<p>The main takeaways from this section are:<\/p>\n<ul>\n<li>D distinguishes between static and dynamic arrays; C does not<\/li>\n<li>static arrays are allocated on the stack<\/li>\n<li>dynamic arrays are allocated on the GC heap by default<\/li>\n<li>uninitialized static arrays are default initialized to the <code>init<\/code> property of the element type<\/li>\n<li>dynamic arrays can be explicitly initialized and take the length of the initializer<\/li>\n<li>dynamic arrays cannot be initialized with array literals in <code>@nogc<\/code> scopes<\/li>\n<li>uninitialized dynamic arrays are empty<\/li>\n<\/ul>\n<h3 id=\"thisisthetimeonthedblogwherewedance\">This is the time on the D Blog when we dance<\/h3>\n<p>There are a lot more words in the preceding sections than I had originally intended to write about array declarations and initialization, and I still have quite a bit more to say about arrays. In the next post, we\u2019ll look at the anatomy of a D array and dig into the art of passing D arrays across the language divide.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When interacting with C APIs, it\u2019s almost a given that arrays are going to pop up in one way or another (perhaps most often as strings, a subject of a future article in the \u201cD and C\u201d series). Although D arrays are implemented in a manner that is not directly compatible with C, the fundamental building blocks are the same. 
This makes compatibility between the two relatively painless as long as the differences are not forgotten. This article is the first of a few exploring those differences.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[26,29,30],"tags":[],"_links":{"self":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/1723"}],"collection":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/comments?post=1723"}],"version-history":[{"count":25,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/1723\/revisions"}],"predecessor-version":[{"id":2862,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/1723\/revisions\/2862"}],"wp:attachment":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/media?parent=1723"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/categories?post=1723"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/tags?post=1723"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}