Walter Bright is the BDFL of the D Programming Language and founder of Digital Mars. He has decades of experience implementing compilers and interpreters for multiple languages, including Zortech C++, the first native C++ compiler. He also created Empire, the Wargame of the Century. This post is the third in a series about D’s BetterC mode.
D as BetterC (a.k.a. DasBetterC) is a way to upgrade existing C projects to D in an incremental manner. This article shows a step-by-step process of converting a non-trivial C project to D and deals with common issues that crop up.
While the dmd D compiler front end has already been converted to D, it’s such a large project that it can be hard to see just what was involved. I needed to find a smaller, more modest project that can be easily understood in its entirety, yet is not a contrived example.
The old make program I wrote for the Datalight C compiler in the early 1980s came to mind. It’s a real implementation of the classic make program that’s been in constant use since the early 80s. It’s written in pre-Standard C, has been ported from system to system, and is a remarkably compact 1961 lines of code, including comments. It is still in regular use today.
Here’s the make manual, and the source code. The executable size for make.exe is 49,692 bytes and the last modification date was Aug 19, 2012.
The Evil Plan is:
1. Minimize diffs between the C and D versions. This is so that if the programs behave differently, it is far easier to figure out the source of the difference.
2. No attempt will be made to fix or improve the C code during translation. This is also in the service of (1).
3. No attempt will be made to refactor the code. Again, see (1).
4. Duplicate the behavior of the C program as exactly and as much as possible, bugs and all.
5. Do whatever is necessary as needed in the service of (4).
Once that is completed, only then is it time to fix, refactor, clean up, etc.
Spoiler Alert!
The completed conversion. The resulting executable is 52,252 bytes (quite comparable to the original 49,692). I haven’t analyzed the increment in size, but it is likely due to instantiations of the NEWOBJ template (a macro in the C version), and changes in the DMC runtime library since 2012.
Step By Step
Here are the differences between the C and D versions. It’s 664 out of 1961 lines, about a third, which looks like a lot, but I hope to convince you that nearly all of it is trivial.
The #include files are replaced by corresponding D imports. For example, #include <stdio.h> becomes import core.stdc.stdio; in the D version. Unfortunately, some of the #include files are specific to Digital Mars C, and D versions do not exist (I need to fix that). To not let that stop the project, I simply included the relevant declarations in lines 29 to 64. (See the documentation for the import declaration.)
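As a minimal sketch of both cases (not taken from the make source), the following shows standard headers replaced by core.stdc imports and a hand-written declaration for a C function. The function abs is only a stand-in for a function whose header has no D module (in real code it is available from core.stdc.stdlib), and distance and show are hypothetical names.

```d
// Standard C headers have ready-made D equivalents under core.stdc.
import core.stdc.stdio;   // replaces #include <stdio.h>
import core.stdc.string;  // replaces #include <string.h>

// When no D module exists for a header, declare what is needed by hand.
// abs is an illustrative stand-in for such a function.
extern (C) int abs(int);

// Hypothetical helper that calls the hand-declared C function.
int distance(int a, int b)
{
    return abs(a - b);
}

// Hypothetical helper that calls a function from an imported module.
void show(int a, int b)
{
    printf("distance = %d\n", distance(a, b));
}
```

The hand-written declaration has to match the C prototype exactly, because the D compiler has no header to check it against.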
#if _WIN32 is replaced with version (Windows). (See the documentation for the version condition and predefined versions.)
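A small sketch of the pattern, assuming a hypothetical DIRSEP constant and platformName function that are not part of the make source:

```d
// C:  #if _WIN32 ... #else ... #endif
version (Windows)
    enum char DIRSEP = '\\';
else
    enum char DIRSEP = '/';

// version also works as a statement inside a function body.
const(char)* platformName()
{
    version (Windows)
        return "Windows";
    else
        return "not Windows";
}
```

Unlike #if, a version condition tests only whether an identifier is set. It cannot evaluate an arbitrary expression.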
extern (C): marks the remainder of the declarations in the file as compatible with C. (See the documentation for the linkage attribute.)
A global search/replace changes uses of the debug1, debug2 and debug3 macros to debug printf. In general, #ifdef DEBUG preprocessor directives are replaced with debug conditional compilation. (See the documentation for the debug statement.)
    /* Delete these old C macro definitions...
    #ifdef DEBUG
    #define debug1(a)       printf(a)
    #define debug2(a,b)     printf(a,b)
    #define debug3(a,b,c)   printf(a,b,c)
    #else
    #define debug1(a)
    #define debug2(a,b)
    #define debug3(a,b,c)
    #endif
    */
    // And replace their usage with the debug statement
    // debug2("Returning x%lx\n",datetime);
    debug printf("Returning x%lx\n",datetime);
The TRUE, FALSE and NULL macros are search/replaced with true, false, and null.
The ESC macro is replaced by a manifest constant. (See the documentation for manifest constants.)
    // #define ESC '!'
    enum ESC = '!';
The NEWOBJ macro is replaced with a template function.
    // #define NEWOBJ(type) ((type *) mem_calloc(sizeof(type)))
    type* NEWOBJ(type)() { return cast(type*) mem_calloc(type.sizeof); }
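To show what a call site might look like, here is a self-contained sketch. The mem_calloc below is a stand-in built on calloc (the project's own version is not shown in this article), and the Node struct and makeNode function are hypothetical.

```d
import core.stdc.stdlib : calloc, free;

// Stand-in for the project's mem_calloc; assumed to return zeroed memory.
void* mem_calloc(size_t n)
{
    return calloc(1, n);
}

// Hypothetical struct, used only to have something to allocate.
struct Node
{
    int value;
    Node* next;
}

// The template function from the conversion.
type* NEWOBJ(type)()
{
    return cast(type*) mem_calloc(type.sizeof);
}

Node* makeNode(int value)
{
    // C:  Node *n = NEWOBJ(Node);
    Node* n = NEWOBJ!Node();   // the type is passed as a template argument
    n.value = value;
    return n;
}
```

Each distinct type used with NEWOBJ produces its own instantiation of the function, whereas the C macro expanded inline at every use.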
The filenamecmp macro is replaced with a function.
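The macro body is not shown here, so the following is only a sketch of the general pattern. It assumes, purely for illustration, that the macro mapped to a case-insensitive string compare on Windows and a plain strcmp elsewhere. The loop is hand-written so the example does not depend on a non-standard library function.

```d
import core.stdc.ctype : tolower;
import core.stdc.string : strcmp;

// Assumed original (hypothetical):
//   #define filenamecmp(s1,s2) stricmp(s1,s2)
int filenamecmp(const(char)* s1, const(char)* s2)
{
    version (Windows)
    {
        // Compare byte by byte, ignoring ASCII case.
        int c1, c2;
        do
        {
            c1 = tolower(cast(ubyte) *s1++);
            c2 = tolower(cast(ubyte) *s2++);
        } while (c1 == c2 && c1 != 0);
        return c1 - c2;
    }
    else
    {
        return strcmp(s1, s2);
    }
}
```

Because the function has the same name and argument order as the macro, the call sites do not need to change.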
Support for obsolete platforms is removed.
Global variables in D are placed by default into thread-local storage (TLS). But since make is a single-threaded program, they can be inserted into global storage with the __gshared storage class. (See the documentation for the __gshared attribute.)
    // int CMDLINELEN;
    __gshared int CMDLINELEN;
D doesn’t have a separate struct tag name space, so the typedefs are not necessary. An alias can be used instead. (See the documentation for alias declarations.) Also, struct is omitted from variable declarations.
    /*
    typedef struct FILENODE
    {
        char *name,genext[EXTMAX+1];
        char dblcln;
        char expanding;
        time_t time;
        filelist *dep;
        struct RULE *frule;
        struct FILENODE *next;
    } filenode;
    */
    struct FILENODE
    {
        char *name;
        char[EXTMAX+1] genext;
        char dblcln;
        char expanding;
        time_t time;
        filelist *dep;
        RULE *frule;
        FILENODE *next;
    }
    alias filenode = FILENODE;
macro is a keyword in D, so we’ll just use MACRO instead.
Declaring several pointers C-style, with a * attached to each name, is not allowed in D. Use this instead:
    // char *name,*text;
    // In D, the * is part of the type and
    // applies to each symbol in the declaration.
    char* name, text;
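The difference is easy to verify at compile time. This sketch uses a hypothetical Pair struct to confirm that both symbols end up with the pointer type:

```d
// Hypothetical struct, used only to hold the two declarations.
struct Pair
{
    // C:  char *name, *text;
    char* name, text;   // in D, both name and text are char*
}

// Checked by the compiler; no runtime cost.
static assert(is(typeof(Pair.name) == char*));
static assert(is(typeof(Pair.text) == char*));
```

In C, writing char* name, text; would make text a plain char, which is a classic source of bugs that the D rule removes.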
C array declarations are transformed to D array declarations. (See the documentation for D’s declaration syntax.)
    // char *name,genext[EXTMAX+1];
    char *name;
    char[EXTMAX+1] genext;
static has no meaning at module scope in D. static globals in C are equivalent to private module-scope variables in D, but that doesn’t really matter when the module is never imported anywhere. They still need to be __gshared and that can be applied to an entire block of declarations. (See the documentation for the static attribute.)
    /*
    static ignore_errors = FALSE;
    static execute = TRUE;
    static gag = FALSE;
    static touchem = FALSE;
    static debug = FALSE;
    static list_lines = FALSE;
    static usebuiltin = TRUE;
    static print = FALSE;
    ...
    */
    __gshared
    {
        bool ignore_errors = false;
        bool execute = true;
        bool gag = false;
        bool touchem = false;
        bool xdebug = false;
        bool list_lines = false;
        bool usebuiltin = true;
        bool print = false;
        ...
    }
Forward reference declarations for functions are not necessary in D. Functions defined in a module can be called at any point in the same module, before or after their definition.
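A tiny sketch with hypothetical function names shows the effect. No prototype for addTwo is needed before its first use:

```d
int twice(int x)
{
    // addTwo is defined further down in the module;
    // D resolves the name without a forward declaration.
    return addTwo(x, x);
}

int addTwo(int a, int b)
{
    return a + b;
}
```

In the C version, the equivalent code would need a prototype such as int addTwo(int, int); above twice, and those prototypes can simply be deleted during the conversion.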
Wildcard expansion doesn’t have much meaning to a make program.
Function parameters declared with array syntax are pointers in reality, and are declared as pointers in D.
    // int cdecl main(int argc,char *argv[])
    int main(int argc,char** argv)
mem_init() expands to nothing and we previously removed the macro.
C code can play fast and loose with arguments to functions; D demands that function prototypes be respected.
    void cmderr(const char* format, const char* arg) {...}
    // cmderr("can't expand response file\n");
    cmderr("can't expand response file\n", null);
Global search/replace C’s arrow operator (->) with the dot operator (.), as member access in D is uniform.
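As an illustration, here is a list walk through a pointer using only the dot operator. The LINE struct and countLines function are hypothetical and are not taken from the make source.

```d
// Hypothetical singly linked list node.
struct LINE
{
    const(char)* text;
    LINE* next;
}

int countLines(LINE* list)
{
    int n = 0;
    // C:  for (p = list; p; p = p->next)
    for (LINE* p = list; p; p = p.next)   // . dereferences the pointer
        ++n;
    return n;
}
```

The same dot works whether the left-hand side is a struct value or a pointer to one, which is why a blind search/replace is safe here.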
Replace conditional compilation directives with D’s version.
    /*
    #if TERMCODE
    ...
    #endif
    */
    version (TERMCODE)
    {
        ...
    }
The lack of function prototypes shows the age of this code. D requires proper prototypes.
    // doswitch(p)
    // char *p;
    void doswitch(char* p)
debug is a D keyword. Rename it to xdebug.
The \n\ line endings for C multiline string literals are not necessary in D.
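A sketch of the difference, using a made-up usage message rather than the text from the make source:

```d
import core.stdc.string : strcmp;

// C version, where each source line ends in \n followed by a backslash:
//   char *usage = "usage: make [options]\n\
//     -f file   read makefile from file\n";

// D version: a string literal may span lines, and each line break
// in the source becomes a \n in the string.
__gshared const(char)* usage = "usage: make [options]
  -f file   read makefile from file
";
```

Any indentation placed before the continuation lines becomes part of the string, so they have to start where the text itself should start.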
Comment out unused code using D’s /+ +/ nesting block comments. (See the documentation for line, block and nesting block comments.)
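A short sketch (the function is hypothetical) of why the nesting form is convenient for disabling code that already contains comments:

```d
int answer()
{
    /+ Everything up to the matching +/ is ignored, including:
       /* an old C-style block comment */
       /+ another nesting comment +/
       return 0;
    +/
    return 42;
}
```

With /* */ the first */ would end the comment early, so code containing block comments could not be commented out in one step.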
static if can replace many uses of #if. (See the documentation for the static if condition.)
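A minimal sketch of the substitution. The value given to EXTMAX and the names LONGEXT and extBufferSize are assumptions made for this example only.

```d
enum EXTMAX = 5;   // value assumed for illustration

// C:  #if EXTMAX > 3 ... #else ... #endif
static if (EXTMAX > 3)
    enum LONGEXT = true;
else
    enum LONGEXT = false;

// static if can also select statements inside a function.
int extBufferSize()
{
    static if (LONGEXT)
        return EXTMAX + 1;
    else
        return 4;
}
```

Two differences from #if are worth keeping in mind: the condition must be a D expression that can be evaluated at compile time, and the code in both branches must be syntactically valid D.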
Decay of arrays to pointers is not automatic in D; use .ptr.
    // utime(name,timep);
    utime(name,timep.ptr);
Use const for C-style strings derived from string literals in D, because D won’t allow taking mutable pointers to string literals. (See the documentation for const and immutable.)
    // linelist **readmakefile(char *makefile,linelist **rl)
    linelist **readmakefile(const char *makefile,linelist **rl)
void* cannot be implicitly cast to char*. Make it explicit.
    // buf = mem_realloc(buf,bufmax);
    buf = cast(char*)mem_realloc(buf,bufmax);
inout can be used to transfer the “const-ness” of a function from its argument to its return value. If the parameter is const, so will be the return value. If the parameter is not const, neither will be the return value. (See the documentation for inout functions.)
    // char *skipspace(p) {...}
    inout(char) *skipspace(inout(char)* p) {...}
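The following sketch fills in a plausible body to show inout at work. The body is an assumption (the original is elided above), and firstWordChar is a hypothetical caller.

```d
// Body assumed for illustration: skip leading spaces and tabs.
inout(char)* skipspace(inout(char)* p)
{
    while (*p == ' ' || *p == '\t')
        ++p;
    return p;
}

// Hypothetical caller showing both uses of the same function.
char firstWordChar()
{
    // Immutable in (a string literal), so the result is not writable.
    const(char)* c = skipspace("   make");

    // Mutable in, mutable out: the result can be written through.
    char[4] buf = " \tx\0";
    char* m = skipspace(buf.ptr);
    *m = 'y';

    return *c;
}
```

Without inout, the conversion would need either two copies of the function or a cast at the call sites.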
arraysize can be replaced with the .length property of arrays. (See the documentation for array properties.)
    // useCOMMAND |= inarray(p,builtin,arraysize(builtin));
    useCOMMAND |= inarray(p,builtin.ptr,builtin.length);
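Here is a self-contained sketch of a C-style function that takes a pointer and a count, called with .ptr and .length. The inarray body and the contents of builtin are stand-ins invented for this example. The real definitions are in the make source.

```d
import core.stdc.string : strcmp;

// Hypothetical stand-in: is the string p one of the dim entries in array?
bool inarray(const(char)* p, const(char*)* array, size_t dim)
{
    foreach (i; 0 .. dim)
    {
        if (strcmp(p, array[i]) == 0)
            return true;
    }
    return false;
}

// Hypothetical table of names; the real list is not shown in the article.
__gshared const(char)*[3] builtin = ["cd", "del", "dir"];

bool isBuiltin(const(char)* p)
{
    // .ptr gives the pointer C would get by array decay,
    // .length replaces the arraysize macro.
    return inarray(p, builtin.ptr, builtin.length);
}
```

Since .length is a property of the array type, it stays correct when entries are added to the table.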
String literals are immutable, so it is necessary to replace mutable ones with a stack allocated array. (See the documentation for string literals.)
    // static char envname[] = "@_CMDLINE";
    char[10] envname = "@_CMDLINE";
.sizeof replaces C’s sizeof(). (See the documentation for the .sizeof property.)
    // q = (char *) mem_calloc(sizeof(envname) + len);
    q = cast(char *) mem_calloc(envname.sizeof + len);
Don’t care about old versions of Windows.
Replace ancient C usage of char * with void*.
And that wraps up the changes! See, not so bad. I didn’t set a timer, but I doubt this took more than an hour, including debugging a couple errors I made in the process.
This leaves the file man.c, which is used to open the browser on the make manual page when the -man switch is given. Fortunately, this was already ported to D, so we can just copy that code.
Building make is so easy it doesn’t even need a makefile:
    \dmd2.079\windows\bin\dmd make.d dman.d -O -release -betterC -I. -I\dmd2.079\src\druntime\import\ shell32.lib
Summary
We’ve stuck to the Evil Plan of translating a non-trivial old school C program to D, and thereby were able to do it quickly and get it working correctly. An equivalent executable was generated.
The issues encountered are typical and easily dealt with:
- Replacement of #include with import
- Lack of D versions of #include files
- Global search/replace of things like ->
- Replacement of preprocessor macros with:
  - manifest constants
  - simple templates
  - functions
  - version declarations
  - debug declarations
- Handling identifiers that are D keywords
- Replacement of C style declarations of pointers and arrays
- Unnecessary forward references
- More stringent typing enforcement
- Array handling
- Replacing C basic types with D types
None of the following was necessary:
- Reorganizing the code
- Changing data or control structures
- Changing the flow of the program
- Changing how the program works
- Changing memory management
Future
Now that it is in DasBetterC, there are lots of modern programming features available to improve the code:
- modules!
- memory safety (including buffer overflow checking)
- metaprogramming
- RAII
- Unicode
- nested functions
- member functions
- operator overloading
- documentation generation
- functional programming support
- Compile Time Function Execution
- etc.
Action
Let us know over at the D Forum how your DasBetterC project is coming along!