{"id":2385,"date":"2020-04-28T14:22:53","date_gmt":"2020-04-28T14:22:53","guid":{"rendered":"http:\/\/dlang.org\/blog\/?p=2385"},"modified":"2021-09-30T13:36:27","modified_gmt":"2021-09-30T13:36:27","slug":"interfacing-d-with-c-arrays-and-functions-arrays-part-two","status":"publish","type":"post","link":"https:\/\/dlang.org\/blog\/2020\/04\/28\/interfacing-d-with-c-arrays-and-functions-arrays-part-two\/","title":{"rendered":"Interfacing D with C: Arrays and Functions (Arrays Part 2)"},"content":{"rendered":"<p><img loading=\"lazy\" src=\"http:\/\/dlang.org\/blog\/wp-content\/uploads\/2016\/08\/d6.png\" alt=\"Digital Mars D logo\" width=\"200\" height=\"200\" class=\"alignleft size-full wp-image-181\" \/><\/p>\n<p>This post is part of <a href=\"https:\/\/dlang.org\/blog\/the-d-and-c-series\/\">an ongoing series<\/a> on working with both D and C in the same project. <a href=\"https:\/\/dlang.org\/blog\/2018\/10\/17\/interfacing-d-with-c-arrays-part-1\/\">The previous post explored the differences<\/a> in array declaration and initialization. This post takes the next step: declaring and calling C functions that take arrays as parameters.<\/p>\n<h2 id=\"arraysandcfunctiondeclarations\">Arrays and C function declarations<\/h2>\n<p>Using C libraries in D is extremely easy. Most of the time, things work exactly as one would expect, but as we saw in the previous article there can be subtle differences. When working with C functions that expect arrays, it&#8217;s necessary to fully understand these differences.<\/p>\n<p>The most straightforward and common way of declaring a C function that accepts an array as a parameter is to use a pointer in the parameter list. For example, this hypothetical C function:<\/p>\n<pre class=\"prettyprint lang-c_cpp\">void f0(int *arr);<\/pre>\n<p>In C, any array of <code>int<\/code> can be passed to this function no matter how it was declared. 
Given <code>int a[]<\/code>, <code>int b[3]<\/code>, or <code>int *c<\/code>, the function calls <code>f0(a)<\/code>, <code>f0(b)<\/code>, and <code>f0(c)<\/code> are all the same: a pointer to the first element of each array is passed to the function. Or, using the lingo of C programmers, arrays <em>decay<\/em> to pointers.<\/p>\n<p>Typically, in a function like <code>f0<\/code>, the implementer will expect the array to have been terminated with a marker appropriate to the context. For example, strings in C are arrays of <code>char<\/code> that are terminated with the <code>\\0<\/code> character (we&#8217;ll look at D strings vs. C strings in a future post). This is necessary because, without that character, the implementation of <code>f0<\/code> has no way to know which element in the array is the last one. Sometimes, a function is simply documented to expect a certain length, either in comments or in the function name, e.g., a <code>vector3f_add(float *vec)<\/code> will expect that <code>vec<\/code> points to exactly 3 elements. Another option is to require the length of the array as a separate argument:<\/p>\n<pre class=\"prettyprint lang-c_cpp\">void f1(int *arr, size_t len);<\/pre>\n<p>None of these approaches is foolproof. If <code>f0<\/code> receives an array with no end marker or which is shorter than documented, or if <code>f1<\/code> receives an array with an actual length shorter than <code>len<\/code>, then the door is open for memory corruption. D arrays take this possibility into account, making it much easier to avoid such problems. 
But again, even D&#8217;s safety features aren&#8217;t 100% foolproof when calling C functions from D.<\/p>\n<p>There are other, less common, ways array parameters may be declared in C:<\/p>\n<pre class=\"prettyprint lang-c_cpp\">void f2(int arr[]);\nvoid f3(int arr[9]);\nvoid f4(int arr[static 9]);<\/pre>\n<p>Although these parameters are declared using C&#8217;s array syntax, they boil down to the exact same function signature as <code>f0<\/code> because of the aforementioned pointer decay. The <code>[9]<\/code> in <code>f3<\/code> triggers no special enforcement by the compiler; <code>arr<\/code> is still effectively a pointer to <code>int<\/code> with unknown length. The <code>[9]<\/code> serves as documentation of what the function expects, and the implementation cannot rely on the array having nine elements.<\/p>\n<p>The only potential difference is in <code>f4<\/code>. The <code>static<\/code> added to the declaration tells the compiler that the function must take an array of, in this case, <em>at least<\/em> nine elements. It could have more than nine, but it can&#8217;t have fewer. That also rules out null pointers. The problem is, this isn&#8217;t necessarily enforced. Depending on which C compiler you use, if you shortchange the function and send it fewer than nine elements, you might see warnings if they are enabled, but the compiler might not complain at all. (I haven&#8217;t tested current compilers for this article to see if any are actually reporting errors for this, or which ones provide warnings.)<\/p>\n<p>The behavior of C compilers doesn&#8217;t matter from the D side. All we need be concerned with is declaring these functions appropriately so that we can call them from D such that there are no crashes or unexpected results. 
Because they are all effectively the same, we could declare them all in D like so:<\/p>\n<pre class=\"prettyprint lang-d\">extern(C):\nvoid f0(int* arr);\nvoid f1(int* arr, size_t len);\nvoid f2(int* arr);\nvoid f3(int* arr);\nvoid f4(int* arr);<\/pre>\n<p>But just because we can do a thing doesn&#8217;t mean we should. Consider these alternative declarations of <code>f2<\/code>, <code>f3<\/code>, and <code>f4<\/code>:<\/p>\n<pre class=\"prettyprint lang-d\">extern(C):\nvoid f2(int[] arr);\nvoid f3(int[9] arr);\nvoid f4(int[9] arr);<\/pre>\n<p>Are there any consequences of taking this approach? The answer is yes, but that doesn&#8217;t mean we should default to <code>int*<\/code> in each case. To understand why, we need first to explore the innards of D arrays.<\/p>\n<h2 id=\"theanatomyofadarray\">The anatomy of a D array<\/h2>\n<p>The previous article showed that D makes a distinction between dynamic and static arrays:<\/p>\n<pre class=\"prettyprint lang-d\">int[] a0;\nint[9] a1;<\/pre>\n<p><code>a0<\/code> is a dynamic array and <code>a1<\/code> is a static array. Both have the properties <code>.ptr<\/code> and <code>.length<\/code>. Both may be indexed using the same syntax. But there are some key differences between them.<\/p>\n<h3 id=\"dynamicarrays\">Dynamic arrays<\/h3>\n<p>Dynamic arrays are usually allocated on the heap (though that isn&#8217;t a requirement). In the above case, no memory for <code>a0<\/code> has been allocated. It would need to be initialized with memory allocated via <code>new<\/code> or <code>malloc<\/code>, or some other allocator, or with an array literal. Because <code>a0<\/code> is uninitialized, <code>a0.ptr<\/code> is <code>null<\/code> and <code>a0.length<\/code> is <code>0<\/code>.<\/p>\n<p>A dynamic array in D is an aggregate type that contains the two properties as members. 
Something like this:<\/p>\n<pre class=\"prettyprint lang-d\">struct DynamicArray {\n    size_t length;\n    int* ptr; \/\/ pointer to the first element (int* for an int[])\n}<\/pre>\n<p>In other words, a dynamic array is essentially a reference type, with the pointer\/length pair serving as a handle that refers to the elements at the memory address contained in the <code>ptr<\/code> member. Every built-in D type has a <code>.sizeof<\/code> property, so if we take <code>a0.sizeof<\/code>, we&#8217;ll find it to be <code>8<\/code> on 32-bit systems, where <code>size_t<\/code> is a 4-byte <code>uint<\/code> and pointers are also 4 bytes, and <code>16<\/code> on 64-bit systems, where <code>size_t<\/code> is an 8-byte <code>ulong<\/code> and pointers are also 8 bytes. In short, it&#8217;s the size of the handle and not the cumulative size of the array elements.<\/p>\n<h3 id=\"staticarrays\">Static arrays<\/h3>\n<p>Static arrays are generally allocated on the stack. In the declaration of <code>a1<\/code>, stack space is allocated for nine <code>int<\/code> values, all of which are initialized to <code>int.init<\/code> (which is <code>0<\/code>) by default. Because <code>a1<\/code> is initialized, <code>a1.ptr<\/code> points to the allocated space and <code>a1.length<\/code> is <code>9<\/code>. Although these two properties are the same as those of the dynamic array, the implementation details differ.<\/p>\n<p>A static array is a value type, with the value being <em>all of its elements<\/em>. So given the declaration of <code>a1<\/code> above, its nine <code>int<\/code> elements indicate that <code>a1.sizeof<\/code> is <code>9 * int.sizeof<\/code>, or <code>36<\/code>. 
The <code>.length<\/code> property is a compile-time constant that never changes, and the <code>.ptr<\/code> property, though not readable at compile time, is also a constant that never changes (it&#8217;s not even an lvalue, which means it&#8217;s impossible to make it point somewhere else).<\/p>\n<p>These implementation details are why we must pay attention when we cut and paste C array declarations into D source modules.<\/p>\n<h2 id=\"passingdarraystoc\">Passing D arrays to C<\/h2>\n<p>Let&#8217;s go back to the declaration of <code>f2<\/code> in C and give it an implementation:<\/p>\n<pre class=\"prettyprint lang-c_cpp\">#include &lt;stdio.h&gt;\n\nvoid f2(int arr[]) {\n    for(int i=0; i&lt;3; ++i)\n        printf(&quot;%d\\n&quot;, arr[i]);\n}<\/pre>\n<p>A na\u00efve declaration in D:<\/p>\n<pre class=\"prettyprint lang-d\">extern(C) void f2(int[]);\n\nvoid main() {\n    int[] a = [10, 20, 30];\n    f2(a);\n}<\/pre>\n<p>I say na\u00efve because this is never the right answer. Compiling <code>f2.c<\/code> with <code>df2.d<\/code> on Windows (<code>cl \/c f2.c<\/code> in the &#8220;x64 Native Tools&#8221; command prompt for Visual Studio, followed by <code>dmd -m64 df2.d f2.obj<\/code>), then running <code>df2.exe<\/code>, shows me the following output:<\/p>\n<pre>3\n0\n1970470928<\/pre>\n<p>There is no compiler error because the declaration of <code>f2<\/code> is perfectly valid D. The <code>extern(C)<\/code> indicates that this function uses the <code>cdecl<\/code> calling convention. Calling conventions affect the way arguments are passed to functions and how the function&#8217;s symbol is mangled. In this case, the symbol will be either <code>_f2<\/code> or <code>f2<\/code> (other calling conventions, like <code>stdcall<\/code>&#8212;<code>extern(Windows)<\/code> in D&#8212;have different mangling schemes). The declaration still has to be valid D. 
(In fact, any D function can be marked as <code>extern(C)<\/code>, something which is necessary when creating a D library that will be called from other languages.)<\/p>\n<p>There is also no linker error. DMD is calling out to the system linker (in this case, Microsoft&#8217;s <code>link.exe<\/code>), the same linker used by the system&#8217;s C and C++ compilers. That means the linker has no special knowledge about D functions. All it knows is that there is a call to a symbol, <code>f2<\/code> or <code>_f2<\/code>, that needs to be linked with the implementation. Since the type and number of parameters are not mangled into the symbol name, the linker will happily link with any matching symbol it finds (which, by the way, is the same thing it would do if a C program tried to call a C function which was declared with an incorrect parameter list).<\/p>\n<p>The C function is expecting a single pointer as an argument, but it&#8217;s instead receiving two values: the array length followed by the array pointer.<\/p>\n<p>The moral of this story is that any C function with array parameters declared using array syntax, like <code>int[]<\/code>, should be declared to accept pointers in D. Change the D source to the following and recompile using the same command line as before (there&#8217;s no need to recompile the C file):<\/p>\n<pre class=\"prettyprint lang-d\">extern(C) void f2(int*);\n\nvoid main() {\n    int[] a = [10, 20, 30];\n    f2(a.ptr);\n}<\/pre>\n<p>Note the use of <code>a.ptr<\/code>. 
It&#8217;s an error to try to pass a D array argument where a pointer is expected (with one very special exception, string literals, which I&#8217;ll cover in the next article in this series), so the array&#8217;s <code>.ptr<\/code> property must be used instead.<\/p>\n<p>The story for <code>f3<\/code> and <code>f4<\/code> is similar:<\/p>\n<pre class=\"prettyprint lang-c_cpp\">void f3(int arr[9]);\nvoid f4(int arr[static 9]);<\/pre>\n<p>Remember, <code>int[9]<\/code> in D is a static array, not a dynamic array. The following do not match the C declarations:<\/p>\n<pre class=\"prettyprint lang-d\">void f3(int[9]);\nvoid f4(int[9]);<\/pre>\n<p>Try it yourself. The C implementation:<\/p>\n<pre class=\"prettyprint lang-c_cpp\">#include &lt;stdio.h&gt;\n\nvoid f3(int arr[9]) {\n    for(int i=0; i&lt;9; ++i)\n        printf(&quot;%d\\n&quot;, arr[i]);\n}<\/pre>\n<p>And the D implementation:<\/p>\n<pre class=\"prettyprint lang-d\">extern(C) void f3(int[9]);\n\nvoid main() {\n    int[9] a = [10, 20, 30, 40, 50, 60, 70, 80, 90];\n    f3(a);\n}<\/pre>\n<p>This is likely to crash, depending on the system. Rather than passing a pointer to the array, this code is instead passing all nine array elements by value! Now consider a C library that does something like this:<\/p>\n<pre class=\"prettyprint lang-c_cpp\">typedef float mat4f[16];\nvoid do_stuff(mat4f mat);<\/pre>\n<p>Generally, when writing D bindings to C libraries, it&#8217;s a good idea to keep the same interface as the C library. But if the above is translated like the following in D:<\/p>\n<pre class=\"prettyprint lang-d\">alias mat4f = float[16];\nextern(C) void do_stuff(mat4f);<\/pre>\n<p>The sixteen floats will be passed to <code>do_stuff<\/code> every time it&#8217;s called. The same goes for all functions that take a <code>mat4f<\/code> parameter. One solution is just to do the same as in the <code>int[]<\/code> case and declare the function to take a pointer. 
However, that&#8217;s no better than C, as it allows the function to be called with an array that has fewer elements than expected. We can&#8217;t do anything about that in the <code>int[]<\/code> case, but that will usually be accompanied by a length parameter on the C side anyway. C functions that take typedef&#8217;d types like <code>mat4f<\/code> usually don&#8217;t have a length parameter and rely on the caller to get it right. <\/p>\n<p>In D, we can do better:<\/p>\n<pre class=\"prettyprint lang-d\">extern(C) void do_stuff(ref mat4f);<\/pre>\n<p>Not only does this match the API implementor&#8217;s intent, the compiler will guarantee that any arrays passed to <code>do_stuff<\/code> are static float arrays with 16 elements. Since a <code>ref<\/code> parameter is just a pointer under the hood, all is as it should be on the C side.<\/p>\n<p>With that, we can rewrite the <code>f3<\/code> example:<\/p>\n<pre class=\"prettyprint lang-d\">extern(C) void f3(ref int[9]);\n\nvoid main() {\n    int[9] a = [10, 20, 30, 40, 50, 60, 70, 80, 90];\n    f3(a);\n}<\/pre>\n<h3 id=\"conclusion\">Conclusion<\/h3>\n<p>Most of the time, when interfacing with C from D, the C API declarations and any example code can be copied verbatim in D. But <em>most of the time<\/em> is not <em>all of the time<\/em>, so care must be taken to account for those exceptional cases. As we saw in the previous article, carelessness when declaring array variables can usually be caught by the compiler. As this article shows, the same is not the case for C function declarations. Interfacing D with C requires the same care as when writing C code.<\/p>\n<p>In the next article in this series, we&#8217;ll look at mixing D strings and C strings in the same program and some of the pitfalls that may arise. 
In the meantime, Steven Schveighoffer&#8217;s excellent article, &#8220;D Slices&#8221;, <a href=\"https:\/\/dlang.org\/articles\/d-array-article.html\">is a great place to start<\/a> for more details about D arrays.<\/p>\n<p><em>Thanks to Walter Bright and \u00c1tila Neves for their valuable feedback on this article.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post is part of an ongoing series on working with both D and C in the same project. The previous post explored the differences in array declaration and initialization. This post takes the next step: declaring and calling C functions that take arrays as parameters. Arrays and C function declarations Using C libraries in [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[26,29,30],"tags":[],"_links":{"self":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2385"}],"collection":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/comments?post=2385"}],"version-history":[{"count":24,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2385\/revisions"}],"predecessor-version":[{"id":2863,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2385\/revisions\/2863"}],"wp:attachment":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/media?parent=2385"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/categories?post=2385"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/tags?post=2385"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}