DasBetterC: Converting make.c to D

Walter Bright is the BDFL of the D Programming Language and founder of Digital Mars. He has decades of experience implementing compilers and interpreters for multiple languages, including Zortech C++, the first native C++ compiler. He also created Empire, the Wargame of the Century. This post is the third in a series about D’s BetterC mode


D as BetterC (a.k.a. DasBetterC) is a way to upgrade existing C projects to D in an incremental manner. This article shows a step-by-step process of converting a non-trivial C project to D and deals with common issues that crop up.

While the dmd D compiler front end has already been converted to D, it’s such a large project that it can be hard to see just what was involved. I needed to find a smaller, more modest project that can be easily understood in its entirety, yet is not a contrived example.

The old make program I wrote for the Datalight C compiler in the early 1980’s came to mind. It’s a real implementation of the classic make program that’s been in constant use since the early 80’s. It’s written in pre-Standard C, has been ported from system to system, and is a remarkably compact 1961 lines of code, including comments. It is still in regular use today.

Here’s the make manual, and the source code. The executable size for make.exe is 49,692 bytes and the last modification date was Aug 19, 2012.

The Evil Plan is:

  1. Minimize diffs between the C and D versions. This is so that if the programs behave differently, it is far easier to figure out the source of the difference.
  2. No attempt will be made to fix or improve the C code during translation. This is also in the service of (1).
  3. No attempt will be made to refactor the code. Again, see (1).
  4. Duplicate the behavior of the C program as exactly and as much as possible,
    bugs and all.
  5. Do whatever is necessary as needed in the service of (4).

Once that is completed, only then is it time to fix, refactor, clean up, etc.

Spoiler Alert!

The completed conversion. The resulting executable is 52,252 bytes (quite comparable to the original 49,692). I haven’t analyzed the increment in size, but it is likely due to instantiations of the NEWOBJ template (a macro in the C version), and changes in the DMC runtime library since 2012.

Step By Step

Here are the differences between the C and D versions. It’s 664 out of 1961 lines, about a third, which looks like a lot, but I hope to convince you that nearly all of it is trivial.

The #include files are replaced by corresponding D imports, such as replacing #include <stdio.h> with import core.stdc.stdio;. Unfortunately, some of the #include files are specific to Digital Mars C, and D versions do not exist (I need to fix that). To not let that stop the project, I simply included the relevant declarations in lines 29 to 64. (See the documentation for the import declaration.)

#if _WIN32 is replaced with version (Windows). (See the documentation for the version condition and predefined versions.)

extern (C): marks the remainder of the declarations in the file as compatible with C. (See the documentation for the linkage attribute.)

A global search/replace changes uses of the debug1, debug2 and debug3 macros to debug printf. In general, #ifdef DEBUG preprocessor directives are replaced with debug conditional compilation. (See the documentation for the debug statement.)

/* Delete these old C macro definitions...
#ifdef DEBUG
-#define debug1(a)       printf(a)
-#define debug2(a,b)     printf(a,b)
-#define debug3(a,b,c)   printf(a,b,c)
-#else
-#define debug1(a)
-#define debug2(a,b)
-#define debug3(a,b,c)
-#endif
*/

// And replace their usage with the debug statement
// debug2("Returning x%lx\n",datetime);
debug printf("Returning x%lx\n",datetime);

The TRUE, FALSE and NULL macros are search/replaced with true, false, and null.

The ESC macro is replaced by a manifest constant. (See the documentation for manifest constants.)

// #define ESC     '!'
enum ESC =      '!';

The NEWOBJ macro is replaced with a template function.

// #define NEWOBJ(type)    ((type *) mem_calloc(sizeof(type)))
type* NEWOBJ(type)() { return cast(type*) mem_calloc(type.sizeof); }

The filenamecmp macro is replaced with a function.

Support for obsolete platforms is removed.

Global variables in D are placed by default into thread-local storage (TLS). But since make is a single-threaded program, they can be inserted into global storage with the __gshared storage class. (See the documentation for the __gshared attribute.)

// int CMDLINELEN;
__gshared int CMDLINELEN

D doesn’t have a separate struct tag name space, so the typedefs are not necessary. An
alias can be used instead. (See the documentation for alias declarations.) Also, struct is omitted from variable declarations.

/*
typedef struct FILENODE
        {       char            *name,genext[EXTMAX+1];
                char            dblcln;
                char            expanding;
                time_t          time;
                filelist        *dep;
                struct RULE     *frule;
                struct FILENODE *next;
        } filenode;
*/
struct FILENODE
{
        char            *name;
        char[EXTMAX1]  genext;
        char            dblcln;
        char            expanding;
        time_t          time;
        filelist        *dep;
        RULE            *frule;
        FILENODE        *next;
}

alias filenode = FILENODE;

macro is a keyword in D, so we’ll just use MACRO instead.

Grouping together multiple pointer declarations is not allowed in D, use this instead:

// char *name,*text;
// In D, the * is part of the type and 
// applies to each symbol in the declaration.
char* name, text;

C array declarations are transformed to D array declarations. (See the documentation for D’s declaration syntax.)

// char            *name,genext[EXTMAX+1];
char            *name;
char[EXTMAX+1]  genext;

static has no meaning at module scope in D. static globals in C are equivalent to private module-scope variables in D, but that doesn’t really matter when the module is never imported anywhere. They still need to be __gshared and that can be applied to an entire block of declarations. (See the documentation for the static attribute)

/*
static ignore_errors = FALSE;
static execute = TRUE;
static gag = FALSE;
static touchem = FALSE;
static debug = FALSE;
static list_lines = FALSE;
static usebuiltin = TRUE;
static print = FALSE;
...
*/

__gshared
{
    bool ignore_errors = false;
    bool execute = true;
    bool gag = false;
    bool touchem = false;
    bool xdebug = false;
    bool list_lines = false;
    bool usebuiltin = true;
    bool print = false;
    ...
}

Forward reference declarations for functions are not necessary in D. Functions defined in a module can be called at any point in the same module, before or after their definition.

Wildcard expansion doesn’t have much meaning to a make program.

Function parameters declared with array syntax are pointers in reality, and are declared as pointers in D.

// int cdecl main(int argc,char *argv[])
int main(int argc,char** argv)

mem_init() expands to nothing and we previously removed the macro.

C code can play fast and loose with arguments to functions, D demands that function prototypes be respected.

void cmderr(const char* format, const char* arg) {...}

// cmderr("can't expand response file\n");
cmderr("can't expand response file\n", null);

Global search/replace C’s arrow operator (->) with the dot operator (.), as member access in D is uniform.

Replace conditional compilation directives with D’s version.

/*
 #if TERMCODE
    ...
 #endif
*/
    version (TERMCODE)
    {
        ...
    }

The lack of function prototypes shows the age of this code. D requires proper prototypes.

// doswitch(p)
// char *p;
void doswitch(char* p)

debug is a D keyword. Rename it to xdebug.

The \n\ line endings for C multiline string literals are not necessary in D.

Comment out unused code using D’s /+ +/ nesting block comments. (See the documentation for line, block and nesting block comments.)

static if can replace many uses of #if. (See the documentation for the static if condition.)

Decay of arrays to pointers is not automatic in D, use .ptr.

// utime(name,timep);
utime(name,timep.ptr);

Use const for C-style strings derived from string literals in D, because D won’t allow taking mutable pointers to string literals. (See the documentation for const and immutable.)

// linelist **readmakefile(char *makefile,linelist **rl)
linelist **readmakefile(const char *makefile,linelist **rl)

void* cannot be implicitly cast to char*. Make it explicit.

// buf = mem_realloc(buf,bufmax);
buf = cast(char*)mem_realloc(buf,bufmax);

Replace unsigned with uint.

inout can be used to transfer the “const-ness” of a function from its argument to its return value. If the parameter is const, so will be the return value. If the parameter is not const, neither will be the return value. (See the documentation for inout functions.)

// char *skipspace(p) {...}
inout(char) *skipspace(inout(char)* p) {...}

arraysize can be replaced with the .length property of arrays. (See the documentation for array properties.)

// useCOMMAND  |= inarray(p,builtin,arraysize(builtin));
useCOMMAND  |= inarray(p,builtin.ptr,builtin.length)

String literals are immutable, so it is necessary to replace mutable ones with a stack allocated array. (See the documentation for string literals.)

// static char envname[] = "@_CMDLINE";
char[10] envname = "@_CMDLINE";

.sizeof replaces C’s sizeof(). (See the documentation for the .sizeof property).

// q = (char *) mem_calloc(sizeof(envname) + len);
q = cast(char *) mem_calloc(envname.sizeof + len);

Don’t care about old versions of Windows.

Replace ancient C usage of char * with void*.

And that wraps up the changes! See, not so bad. I didn’t set a timer, but I doubt this took more than an hour, including debugging a couple errors I made in the process.

This leaves the file man.c, which is used to open the browser on the make manual page when the -man switch is given. Fortunately, this was already ported to D, so we can just copy that code.

Building make is so easy it doesn’t even need a makefile:

\dmd2.079\windows\bin\dmd make.d dman.d -O -release -betterC -I. -I\dmd2.079\src\druntime\import\ shell32.lib

Summary

We’ve stuck to the Evil Plan of translating a non-trivial old school C program to D, and thereby were able to do it quickly and get it working correctly. An equivalent executable was generated.

The issues encountered are typical and easily dealt with:

  • Replacement of #include with import
  • Lack of D versions of #include files
  • Global search/replace of things like ->
  • Replacement of preprocessor macros with:
    • manifest constants
    • simple templates
    • functions
    • version declarations
    • debug declarations
  • Handling identifiers that are D keywords
  • Replacement of C style declarations of pointers and arrays
  • Unnecessary forward references
  • More stringent typing enforcement
  • Array handling
  • Replacing C basic types with D types

None of the following was necessary:

  • Reorganizing the code
  • Changing data or control structures
  • Changing the flow of the program
  • Changing how the program works
  • Changing memory management

Future

Now that it is in DasBetterC, there are lots of modern programming features available to improve the code:

Action

Let us know over at the D Forum how your DasBetterC project is coming along!

4 thoughts on “DasBetterC: Converting make.c to D

  1. Niels Vilsk

    Incredibly clear and pragmatic explanations on how any C programmer can progressively convert its C legacy software to the modern D language.

    Probably the best “D tutorial for C programmers” to date !!!

  2. Rugxulo

    Pardon the suggestion, but I think minised would be a good tool to convert. (I halfway wanted to convert it to Turbo Pascal [FPC] myself!) It’s BSD-licensed, has a good test suite, and is a standard POSIX tool. I often find sed very useful (even if I’m not really *nix savvy).

    http://exactcode.com/opensource/minised/

  3. hen3ry

    I started on Dash (that is, Bash) as a training exercise. The C code is surprisingly clean. It does, however, use YACC, which I would replace with a Pratt parser solution in the Future.

Comments are closed.