Author Archives: Jean-Louis Leroy

Function Generation in D: The Good, the Bad, the Ugly, and the Bolt

Introduction

Digital Mars D logo

A while ago, Andrei Alexandrescu started a thread in the D Programming Language forums, titled “Perfect forwarding”, about a challenge which came up during the July 2020 beerconf:

Write an idiomatic template forward that takes an alias fun and defines (generates) one overload for each overload of fun.

Several people proposed solutions. In the course of the discussion, it arose that there is sometimes a need to alter a function’s properties, like adding, removing, or hanging user-defined attributes or parameter storage classes. This was exactly the problem I struggled with while trying to support all the bells and whistles of D functions in my openmethods library.

Eventually, I created a module that helps with this problem and contributed it to the bolts meta-programming library. But before we get to that, let’s first take a closer look at function generation in D.

The Good

I speculate that many a programmer who is a moderately advanced beginner in D would quickly come up with a mostly correct solution to the “Perfect forwarding” challenge, especially those with a C++ background who have an interest in performing all sorts of magic tricks by means of template meta-programming. The solution will probably look like this:

template forward(alias fun)
{
  import std.traits: Parameters;
  static foreach (
    ovl; __traits(getOverloads, __traits(parent, fun), __traits(identifier, fun))) {
    auto forward(Parameters!ovl args)
    {
      return ovl(args);
    }
  }
}

...

int plus(int a, int b) { return a + b; }
string plus(string a, string b) { return a ~ b; }

assert(forward!plus(1, 2) == 3);        // pass
assert(forward!plus("a", "b") == "ab"); // pass

This solution is not perfect, as we shall see, but it is not far off either. It covers many cases, including some that a beginner may not even be aware of. For example, forward handles the following function without dropping function attributes or parameter storage classes:

class Matrix { ... }

Matrix times(scope const Matrix a, scope const Matrix b) pure @safe
{
  return ...;
}

pragma(msg, typeof(times));
// pure @safe Matrix(scope const(Matrix) a, scope const(Matrix) b)

pragma(msg, typeof(forward!times));
// pure @safe Matrix(scope const(Matrix) _param_0, scope const(Matrix) _param_1)

It even handles user-defined attributes (UDAs) on parameters:

struct testParameter;

void testPlus(@testParameter int a, @testParameter int b);

pragma(msg, typeof(testPlus));
// void(@(testParameter) int a, @(testParameter) int b)

pragma(msg, typeof(forward!testPlus));
// void(@(testParameter) int a, @(testParameter) int b)

Speaking of UDAs, that’s one of the issues with the solution above: it doesn’t carry function UDAs. It also doesn’t work with functions that return a reference. Both issues are easy to fix:

template forward(alias fun)
{
  import std.traits: Parameters;
  static foreach (ovl; __traits(getOverloads, __traits(parent, fun), __traits(identifier, fun)))
  {
    @(__traits(getAttributes, fun)) // copy function UDAs
    auto ref forward(Parameters!ovl args)
    {
      return ovl(args);
    }
  }
}

This solution is still not 100% correct though. If the forwardee is @trusted, the forwarder will be @safe:

@trusted void useSysCall() { ... }

pragma(msg, typeof(&useSysCall));         // void function() @trusted
pragma(msg, typeof(&forward!useSysCall)); // void function() @safe

This happens because the body of the forwarder consists of a single statement calling the useSysCall function. Since calling a trusted function is safe, the forwarder is automatically deemed safe by the compiler.

The Bad

However, Andrei’s challenge was not exactly what we discussed in the previous section. It came with a bit of pseudocode that suggested the template should not be eponymous. In other words, I believe that the exact task was to write a template that would be used like this: forward!fun.fun(...). Here is the pseudocode:

// the instantiation of forward!myfun would be (stylized):

template forward!myfun
{
    void myfun(int a, ref double b, out string c)
    {
        return myfun(a, b, c);
    }
    int myfun(in string a, inout double b)
    {
        return myfun(a, b);
    }
}

Though this looks like a small difference, if we want to implement exactly this, a complication arises. In the eponymous forward, we did not need to create a new identifier; we simply used the template name as the function name. Thus, the function name was fixed. Now we need to create a function with a name that depends on the forwardee’s name. And the only way to do this is with a string mixin.

The first time I had to do this, I tried the following:

template forward(alias fun)
{
  import std.format : format;
  import std.traits: Parameters;
  enum name = __traits(identifier, fun);
  static foreach (ovl; __traits(getOverloads, __traits(parent, fun), name)) {
    @(__traits(getAttributes, fun))
    auto ref mixin(name)(Parameters!ovl args)
    {
      return ovl(args);
    }
  }
}

This doesn’t work because a string mixin can only be used to create expressions or statements. Therefore, the solution is to simply expand the mixin to encompass the entire function definition. The token-quote operator q{} is very handy for this:

template forward(alias fun)
{
  import std.format : format;
  import std.traits: Parameters;
  enum name = __traits(identifier, fun);
  static foreach (ovl; __traits(getOverloads, __traits(parent, fun), name)) {
    mixin(q{
        @(__traits(getAttributes, fun))
          auto ref %s(Parameters!ovl args)
        {
          return ovl(args);
        }
      }.format(name));
  }
}

Though string mixins are powerful, they are essentially C macros. For many D programmers, resorting to a string mixin can feel like a defeat.

Let us now move on to a similar, yet significantly more difficult, challenge:

Write a class template that mocks an interface.

For example:

interface JsonSerializable
{
  string asJson() const;
}

void main()
{
  auto mock = new Mock!JsonSerializable();
}

Extrapolating the techniques acquired during the previous challenge, a beginner would probably try this first:

class Mock(alias Interface) : Interface
{
  import std.format : format;
  import std.traits: Parameters;
  static foreach (member; __traits(allMembers, Interface)) {
    static foreach (fun; __traits(getOverloads, Interface, member)) {
      mixin(q{
          @(__traits(getAttributes, fun))
          auto ref %s(Parameters!fun args)
          {
            // record call
            static if (!is(ReturnType!fun == void)) {
              return ReturnType!fun.init;
            }
          }
        }.format(member));
    }
  }
}

Alas, this fails to compile, throwing errors like:

Error: function `challenge.Mock!(JsonSerializable).Mock.asJson` return type
inference is not supported if may override base class function

In other words, auto cannot be used here. We have to fall back to explicitly specifying the return type:

class Mock(alias Interface) : Interface
{
  import std.format : format;
  import std.traits: Parameters, ReturnType;
  static foreach (member; __traits(allMembers, Interface)) {
    static foreach (fun; __traits(getOverloads, Interface, member)) {
      mixin(q{
          @(__traits(getAttributes, fun))
          ReturnType!fun %s(Parameters!fun args)
          {
            // record call
            static if (!is(ReturnType!fun == void)) {
              return ReturnType!fun.init;
            }
          }
        }.format(member));
    }
  }
}

This will not handle ref functions though. What about adding a ref in front of the return type, like we did in the first challenge?

// as before
          ref ReturnType!fun %s(Parameters!fun args) ...

This will fail with all the functions in the interface that do not return a reference.

The reason why everything worked almost magically in the first challenge is that we called the wrapped function inside the template. It enabled the compiler to deduce almost all of the characteristics of the original function and copy them to the forwarder function. But we have no model to copy from here. The compiler will copy some of the aspects of the function (pure, @safe, etc.) to match those of the overriden function, but not some others (ref, const, and the other modifiers).

Then, there is the issue of the function modifiers: const, immutable, shared, and static. These are yet another category of function “aspects”.

At this point, there is no other option than to analyze some of the function attributes by means of traits, and convert them to a string to be injected in the string mixin:

      mixin(q{
          @(__traits(getAttributes, fun))
          %sReturnType!fun %s(Parameters!fun args)
          {
            // record call
            static if (!is(ReturnType!fun == void)) {
              return ReturnType!fun.init;
            }
          }
        }.format(
            (functionAttributes!fun & FunctionAttribute.const_ ? "const " : "")
          ~ (functionAttributes!fun & FunctionAttribute.ref_ ? "ref " : "")
          ~ ...,
          member));
    }

If you look at the implementation of std.typecons.wrap, you will see that part of the code deals with synthesizing bits of a string mixin for the storage classes and modifiers.

The Ugly

So far, we have looked at the function storage classes, modifiers, and UDAs, but we have merely passed the parameter list as a single, monolithic block. However, sometimes we need to perform adjustments to the parameter list of the generated function. This may seem far-fetched, but it does happen. I encountered this problem in my openmethods library. During the “Perfect forwarding” discussion, it appeared that I was not the only one who wanted to do this.

I won’t delve into the details of openmethods here (see an older blog post for an overview of the module); for the purpose of this article, it suffices to say that, given a function declaration like this one:

Matrix times(virtual!Matrix a, double b);

openmethods generates this function:

Matrix dispatcher(Matrix a, double b)
{
  return resolve(a)(a, b);
}

The virtual template is a marker; it indicates which parameters should be taken into account (i.e., passed to resolve) when picking the appropriate specialization of times. Note that only a is passed to the resolve function—that is because the first parameter uses the virtual! marker and the second does not.

Bear in mind, though, that dispatcher is not allowed to use the type of the parameters directly. Inside the openmethods module, there is no Matrix type. Thus, when openmethods is handed a function declaration, it needs to synthesize a dispatcher function that refers to the declaration’s parameter types exclusively via the declaration. In other words, it needs to use the ReturnType and Parameters templates from std.traits to extract the types involved in the declaration – just like we did in the examples above.

Let’s put aside function attributes and UDAs – we already discussed those in the previous section. The obvious solution then seems to be:

ReturnType!times dispatcher(
  RemoveVirtual!(Parameters!times[0]) a, Parameters!times[1] b)
{
  return resolve(a)(a, b);
}

pragma(msg, typeof(&dispatcher)); // Matrix function(Matrix, double)

where RemoveVirtual is a simple template that peels off the virtual! marker from the type.

Does this preserve parameter storage classes and UDAs? Unfortunately, it does not:

@nogc void scale(ref virtual!Matrix m, lazy double by);

@nogc ReturnType!scale dispatcher(RemoveVirtual!(Parameters!scale[0]) a, Parameters!scale[1] b)
{
  return resolve(a)(a, b);
}

pragma(msg, typeof(&dispatcher)); // void function(Matrix a, double b)

We lost the ref on the first parameter and the lazy on the second. What happened to them?

The culprit is Parameters. This template is a wrapper around an obscure feature of the is operator used in conjunction with the __parameters type specialization. And it is quite a strange beast. We used it above to copy the parameter list of a function, as a whole, to another function, and it worked perfectly. The problem is what happens when you try to process the parameters separately. Let’s look at a few examples:

pragma(msg, Parameters!scale.stringof); // (ref virtual!(Matrix), lazy double)
pragma(msg, Parameters!scale[0].stringof); // virtual!(Matrix)
pragma(msg, Parameters!scale[1].stringof); // double

We see that accessing a parameter individually returns the type… and discards everything else!

There is actually a way to extract everything about a single parameter: use a slice instead of an element of the paramneter pack (yes, this is getting strange):

pragma(msg, Parameters!scale[0..1].stringof); // (ref virtual!(Matrix))
pragma(msg, Parameters!scale[1..2].stringof); // (lazy double)

So this gives us a solution for handling the second parameter of scale:

ReturnType!scale dispatcher(???, Parameters!scale[1..2]) { ... }

But what can we put in place of ???. RemoveVirtual!(Parameters!scale[0..1]) would not work. RemoveVirtual expects a type, and Parameters!scale[1..2] is not a type—it is a sort of conglomerate that contains a type, and perhaps storage classes, type constructors, and UDAs.

At this point, we have no other choice but to construct a string mixin once again. Something like this:

mixin(q{
    %s ReturnType!(scale) dispatcher(
      %s RemoveVirtual!(Parameters!(scale)[1]) a,
      Parameters!(scale)[1..2] b)
    {
        resolve(a)(a, b);
    }
  }.format(
    functionAttributes!scale & FunctionAttribute.nogc ? "@nogc " : ""
    /* also handle other function attributes */,
    __traits(getParameterStorageClasses, scale, 0)));

pragma(msg, typeof(dispatcher)); // @nogc void(ref double a, lazy double)

This is not quite sufficient though, because it still doesn’t take care of parameter UDAs.

To Boltly Refract…

openmethods once contained kilometers of mixin code like the above. Such heavy use of string mixins was too ugly and messy, so much so that the project began to feel less like fun and more like work. So I decided to sweep all the ugliness under a neat interface, once and for all. The result was a “refraction” module, which I later carved out of openmethods and donated to Ali Akhtarzada’s excellent bolts library. bolts attempts to fill in the gaps and bring some regularity to D’s motley set of meta-programming features.

refraction’s entry point is the refract function template. It takes a function and an “anchor” string, and returns an immutable Function object that captures all the aspects of a function. Function objects can be used at compile-time. It is, actually, their raison d’être.

Function has a mixture property that returns a declaration for the original function. For example:

Matrix times(virtual!Matrix a, double b);
pragma(msg, refract!(times, "times").mixture);
// @system ReturnType!(times) times(Parameters!(times) _0);

Why does refract need the anchor string? Can’t the string "times" be inferred from the function by means of __traits(identifier...)? Yes, it can, but in real applications we don’t want to use this. The whole point of the library is to be used in templates, where the function is typically passed to refract via an alias. In general, the function’s name has no meaning in the template’s scope—or if, by chance, the name exists, it does not name the function. All the meta-expressions used to dissect the function must work in terms of the local symbol that identifies the alias.

Consider:

module matrix;

Matrix times(virtual!Matrix a, double b);

Method!times timesMethod; // openmethods creates a `Method` object for each
                          // declared method

module openmethods;

struct Method(alias fun)
{
    enum returnTypeMixture = refract!(fun, "fun").returnType;
    pragma(msg, returnTypeMixture);              // ReturnType!(fun)
    mixin("alias R = ", returnTypeMixture, ";"); // ok
    pragma(msg, R.stringof);                     // Matrix
}

There is no times and no Matrix in module openmethods. Even if they existed, they could not be the times function and the Matrix class from module matrix, as this would require a circular dependency between the two modules, something that D forbids by default. However, there is a fun symbol, and it aliases to the function; thus, the return type can be expressed as ReturnType!(fun).

All aspects of the function are available piecemeal. For example:

@nogc void scale(ref virtual!Matrix m, lazy double by);
pragma(msg, refract!(scale, "scale").parameters[0].storageClasses); // ["ref"]

Function also has methods that return a new Function object, with an alteration to one of the aspects. They can be used to create a variation of a function. For example:

pragma(msg,
  refract!(scale, "scale")
  .withName("dispatcher")
  .withBody(q{{ resolve(_0[0])(_0); }})
  .mixture
);

@nogc @system ReturnType!(scale) dispatcher(ref Parameters!(scale)[0] _0, lazy Parameters!(scale)[1] _1)
{
  resolve(_0[0])(_0);
}

This is the reason behind the name “refraction”: the module creates a blueprint of a function, performs some alterations on it, and returns a string—called a mixture—which, when passed to mixin, will create a new function.

openmethods needs to change the type of the first parameter while preserving storage classes. With bolts.experimental.refraction, this becomes easy:

original = refract!(scale, "scale");

pragma(msg,
  original
  .withName("dispatcher")
  .withParameters(
    [original.parameters[0].withType(
        "RemoveVirtual!(%s)".format(original.parameters[0].type)),
     original.parameters[1],
    ])
  .withBody(q{{
      return resolve(_0)(%s);
   }}.format(original.argumentMixture))
);

This time, the generated code splits the parameter pack into individual components:

@nogc @system ReturnType!(scale) dispatcher(
  ref RemoveVirtual!(Parameters!(scale)[0]) _0, Parameters!(scale)[1..2] _1)
{
  return resolve(_0)(_0);
}

Note how the first and second parameters are handled differently. The first parameter is cracked open because we need to replace the type. That forces us to access the first Parameters value via indexing, and that loses the storage classes, UDAs, etc. So they need to be re-applied explicitly.

On the other hand, the second parameter does not have this problem. It is not edited; thus, the Parameters slice trick can be used. The lazy is indeed there, but it is inside the parameter conglomerate.

Conclusion

Initially, D looked almost as good as Lisp for generating functions. As we tried to gain finer control of the generated function, our code started to look a lot more like C macros; in fact, in some respects, it was even worse: we had to put an entire function definition in a string mixin just to set its name.

This is due to the fact that D is not as “regular” a language as Lisp. Some of the people helming the evolution of D are working on changing this, and it is my hope that an improved D will emerge in the not-too-distant future.

In the meantime, the experimental refraction module from the bolts meta-programming library offers a saner, easier way of generating functions without compromising on the idiosyncrasies that come with them. It allows you to pretend that functions can be disassembled and reassembled at will, while hiding all the gory details of the string mixins that are necessarily involved in that task.


Jean-Louis Leroy is not French, but Belgian. He got his first taste of programming from a HP-25 calculator. His first real programming language was Forth, where CTFE is pervasive. Later he programmed (a little) in Lisp and Smalltalk, and (a lot) in C, C++, and Perl. He now works for Bloomberg LP in New York. His interests include object-relational mapping, open multi-methods, DSLs, and language extensions in general.

Open Methods: From C++ to D

Prelude


Earlier this year I attended C++Now, a major conference dedicated to C++. I listened to talks given by very bright people, who used all sorts of avant-garde C++ techniques to accomplish all sorts of feats at compile time. It was a constexpr party! However, at the end of the week I had severe doubts about the future of C++.

I’ll say this for the organizers, though: they were quite open minded. They reserved the largest auditorium for a two-hour presentation of competing languages, one every day. We had Haskell and Rust, and Ali Çehreli talked about D.

I knew next to nothing about D. You see, I learned to program in Forth. Later I did some Lisp programming just for fun. To me, the idea of CTFE was natural right off the bat. So when Ali talked about static if and mixins, he definitely got my attention.

In order to learn (and evaluate) D, I decided to reproduce parts of my C++ library yomm11. It implements open multi-methods and contains code that exercises the “interesting” parts of the language, both at compile time and run time. Initially, I thought I would just see how I could reimplement bits of yomm11, how nice (or ugly) the syntax for declaring methods would turn out to be. The result was satisfying. I would even say intoxicating. I ended up bringing the port to completion and I feel that the result–openmethods.d–is the best implementation of open methods I’ve crafted so far. And it’s all done in a library, relying only on existing language features.

But wait, what are open methods?

From Member to Free

Open methods are just like virtual functions, except that they are declared outside of a class hierarchy. They are often conflated with multi-methods, because they are frequently implemented together (as is the case with this library), but really these are two different concepts. The ‘open’ part is, I believe, the more important, so I will focus more on that in this article.

Here is an example of a virtual function:

interface Animal
{
  string kick();
}

class Dog : Animal
{
  string kick() { return "bark"; }
}

class Pitbull : Dog
{
  override string kick() { return super.kick() ~ " and bite"; }
}

void main()
{
  import std.stdio : writeln;
  Animal snoopy = new Dog, hector = new Pitbull;
  writeln("snoopy.kick(): ", snoopy.kick()); // bark
  writeln("hector.kick(): ", hector.kick()); // bark and bite
}

The direct equivalent, translated to open methods, reads like this:

import openmethods;
mixin(registerMethods);

interface Animal
{
}

class Dog : Animal
{
}

class Pitbull : Dog
{
}

string kick(virtual!Animal);

@method
string _kick(Dog dog) { return "bark"; }

@method
string _kick(Pitbull dog) { return next!kick(dog) ~ " and bite"; }

void main()
{
  updateMethods();
  import std.stdio : writeln;
  Animal snoopy = new Dog, hector = new Pitbull;
  writeln("snoopy.kick(): ", snoopy.kick()); // bark
  writeln("hector.kick(): ", hector.kick()); // bark an dbite
}

Let’s break it down.

  • The string kick() in interface Animal becomes the free function declaration string kick(virtual!Animal). The implicit this parameter becomes an explicit parameter, and its type is prefixed with virtual!, thus indicating that the parameter is used to resolve calls at run time.
  • The string kick() override in class Dog becomes the free function definition @method string _kick(Dog dog) { return "bark"; }. Three things here:
    • the override is preceded by the @method attribute
    • the function name is prefixed with an underscore
    • the implicit this argument is explicitly named: Dog dog
  • The same thing happens to the override in class Pitbull, with an extra twist: super.kick() becomes next!kick(dog)
  • The calls to kick in main become free function calls – although, incidentally, they could have remained unchanged, thanks to Uniform Function Call Syntax.
  • After importing the openmethods module, a mixin is called: mixin(registerMethods). It should be called in each module that imports openmethods. It matches method declarations and overrides. It also creates a kick(Animal) function (note: sans the virtual!), which is the entry point in the dynamic dispatch mechanism.
  • Finally, main calls updateMethods. This should be done before calling any method (typically first thing in main) and each time a library containing methods is dynamically loaded or unloaded.

Open Is Good

What does it gain us? Well, a lot. Now we can add polymorphic behavior to any class hierarchy without modifying it. In fact, this implementation even allows you to add methods to Object, in a matter of speaking. Because, of course, class Object is never modified.

Let’s take a more serious example. Suppose that you have written a nifty matrix math library. Matrices come in all sorts of flavors: diagonal, shallow, tri-diagonal, and of course dense (i.e. “normal” matrices). Depending on the exact nature of a matrix, you can optimize some operations. Transposing a diagonal or a symmetric matrix amounts to returning it, unchanged. Adding sparse matrices does not require adding thousands of zeroes; and so on. And you have exploited all these properties in your matrix library, varying the behavior by means of virtual functions.

Neat.

Now let me ask you a question: should you provide a print function? A persist function?

Almost certainly not. For starters, there are many ways to display a matrix. If it is sparse, you may want to show only the non-zero elements… or all of them. You may want to display the null matrix as [0]… or in full. It is the privilige of the application to decide what matrices should look like on screen or paper. The matrix library should do the maths, and the application should do the presentation. If it needs to display matrices at all, that is. In game programming, there may be no need to display matrices. However, if you provide a print function, given the way they are implemented, the print or the persist code will always be pulled in from the library. Not good.

Now the application programmer will have to write his print and persist functions, but immediately he will be facing a problem: certainly he wants to vary the behavior according to the exact matrix type; he wants polymorphism! So he will probably end up coding a set of type switches.

Open methods solve this problem more neatly:

void print(virtual!Matrix m);

@method
void _print(Matrix m)
{
  const int nr = m.rows;
  const int nc = m.cols;
  for (int i = 0; i < nr; ++i) {
    for (int j = 0; j < nc; ++j) {
      writef("%3g", m.at(i, j));
    }
    writeln();
  }
}

@method
void _print(DiagonalMatrix m)
{
  import std.algorithm;
  import std.format;
  import std.array;
  writeln("diag(", m.elems.map!(x => format("%g", x)).join(", "), ")");
}

Accept No Visitors (c) Yuriy Solodkyy

A popular existing solution to this problem comes in the form of the Visitor pattern. Your matrix library could provide one, thus allowing the application writer to process different matrices according to their type.

In truth, Visitor is more an anti-pattern than a pattern, because the base class is aware of all its derived classes – something that flies in the face of all OOP design rules.

Here it is anyway:

import std.stdio;

interface Matrix
{
  interface Visitor
  {
    void visit(DenseMatrix m);
    void visit(DiagonalMatrix m);
  }

  void accept(Visitor v);
}

class DenseMatrix : Matrix
{
  void accept(Visitor v) { v.visit(this); }
}

class DiagonalMatrix : Matrix
{
  void accept(Visitor v) { v.visit(this); }
}

class PrintVisitor : Matrix.Visitor
{
  this(File of) { this.of = of; }

  void visit(DenseMatrix m) { of.writeln("print a DenseMatrix"); }
  void visit(DiagonalMatrix m) { of.writeln("print a DiagonalMatrix"); }

  File of;
}

void main()
{
  Matrix dense = new DenseMatrix, diagonal = new DiagonalMatrix;
  auto printer = new PrintVisitor(stdout);
  dense.accept(printer);
  diagonal.accept(printer);
}

This approach is more verbose than using an open method, and it has a more fatal flaw: it is not extensible. Suppose that the user of your matrix library wants to add matrices of his own design. For example, a SparseMatrix. The Visitor will be of no help here. With open methods, on the other hand, the solution is available, simple, and elegant:

// from library

void print(virtual!Matrix m, File of);

@method
void _print(DenseMatrix m, File of)
{
  of.writeln("print a DenseMatrix");
}

@method
void _print(DiagonalMatrix m, File of)
{
  of.writeln("print a DiagonalMatrix");
}

// extend library

class SparseMatrix : Matrix
{
  // ...
}

@method
void _print(SparseMatrix m, File of)
{
  of.writeln("print a SparseMatrix");
}

Multiple Dispatch

Occasionally, there is a need to take into account the type of two or more arguments to select the appropriate behavior. This is called multiple dispatch. Most languages only support single dispatch in the form of virtual member functions. Once again, the “solution” involves type switches or visitors. A few languages address this situation directly by means of multi-methods. The most notorious example is the Common Lisp Object System. Recently, a string of new languages have native support for multi-methods: Clojure (unsurprising for a lispoid), Julia, Nice, Cecil, TADS (a language for developing text-based adventure games).

This library implements multi-methods as well. There is no limit to the number of arguments that can be adorned with the virtual! qualifier. They will all be considered during dynamic dispatch.

Continuing the matrix library example, you probably want to provide binary operations on matrices: addition, subtraction and multiplication. If both operands are matrices, you really want to pick the right algorithm depending on the respective types of both operands. There is no point wasting time on adding all the elements if both operands are diagonal matrices; adding the diagonals suffices. Crucially, adding two DiagonalMatrix objects should return a DiagonalMatrix, not a plain DenseMatrix. Adding a DiagonalMatrix and a TriDiagonalMatrix should return a TriDiagonalMatrix, etc.

With open multi-methods, there is no problem at all:

module matrix;

Matrix plus(virtual!Matrix, virtual!Matrix);

module densematrix;

@method
Matrix _plus(Matrix a, Matrix b)
{
  // fallback: add all elements, fetched via interface
  // return a DenseMatrix
}

@method
Matrix _plus(DenseMatrix a, DenseMatrix b)
{
  // add all elements, access representation directly
  // return a DenseMatrix
}

module diagonalmatrix;

@method
Matrix _plus(DiagonalMatrix a, DiagonalMatrix b)
{
  // just add the elements on diagonals
  // return a DiagonalMatrix
}

Once again, open methods make the library extensible. It is trivial to plug new types in:

module mymatrices;

@method
Matrix _plus(SparseMatrix a, SparseMatrix b)
{
  // just add the non-zero elements
  // return a SparseMatrix
}

@method
Matrix _plus(SparseMatrix a, DiagonalMatrix b)
{
  // still don't add all the zeroes
  // return a SparseMatrix
}

@method
Matrix _plus(DiagonalMatrix a, SparseMatrix b)
{
  return plus(b, a); // matrix addition is commutative
}

Implementation Notes and Performance

This implementation uses tables of pointers to select the appropriate function to call. The process is very similar to what happens when a regular, virtual member function is called.

Each class involved in method dispatch–either because it is used as a virtual argument in a method declaration, or because it inherits from a class or an interface used as a virtual argument–has an associated method table (mtbl). The pointer to the method table (mptr) associated to a given class is stored, by default, in the deallocator pointer of the class’s ClassInfo. The first entry in a class’s vtable contains a pointer to its ClassInfo. The deallocator pointer was used to implement the deprecated delete method, so it is reasonable to recycle it. The deallocator pointer may be removed some day, or one may want to use methods in conjunction with classes that implement delete, so an alternative is supported. Tagging a method with @mptr("hash") makes it fetch the method table pointer from an array indexed by a perfect integer hash calculated during updateMethods. In this case, finding the mptr amounts to multiplying the vptr’s value by an integer and applying a bit mask.

The method table contains one entry for each virtual parameter for each method. If the method has a single virtual argument, the entry contains the specialization’s address, just like an ordinary virtual function; otherwise, the entry contains a pointer to a row in a multi-dimensional dispatch table for the first argument, and integer indexes for the subsequent virtual arguments.

Since the set of methods applicable to a given class is known only at run time and may change in the presence of dynamic loading, the position of a method’s entries in the method table is not fixed; it is stored in a table associated with each method. Finally, in the presence of multiple dispatch, a per-method array of strides is used to convert the multi-dimensional index to a linear offset.

However, finding the specialization amounts to a few memory reads, additions and perhaps multiplications. As a result, open methods are almost as fast as virtual functions backed by the compiler. How much slower they are depends on several factors, including the compiler, or whether the call is issued from an interface or a class. The following table sums up some of my benchmarks. Rows come in groups of three: the “usual”, compiler-supported virtual member functions; the functional equivalent using open methods; and the cost, expressed as (method - virtual) / virtual:

mptr in deallocator dmd ldc2 gdc
vfunc (interface) 1.84 1.80 1.80
vs 1-method (interface) 10.73 3.53 6.05
delta% 484% 96% 236%
vfunc (class) 1.83 1.80 1.80
vs 1-method (class) 5.12 2.13 1.80
delta% 180% 18% 0%
double dispatch 4.11 2.40 2.13
2-method 7.75 3.14 3.40
delta% 88.45% 30.71 59.85

Times in nanoseconds, measured on my Asus ROG G751JT.

A few results stand out. The first is expected, the others are quite remarkable.

  1. gdc and ldc2 do a better job at optimizing method dispatch
  2. Method calls that take an object perform much better than those taking an interface; there may be some further improvements to be done here.
  3. Method calls from an object are almost as fast as plain virtual function calls when ldc2 is used; they are just as fast with gdc. The latter is surprising and calls for further investigation.
  4. Disappointingly, double dispatch beats binary methods. This is not the case in C++. My intuition is that extracting the method table pointer requires traversing too many indirections, to the point that it is more costly than a plain virtual function call. In contrast, yomm11 sticks the mptr right inside the object (but at the cost of requiring changes to the classes). This deserves further investigation, but I am convinced that a bit of help from the compiler (like reserving the second element of the vtbl for the mptr) would reverse this result.

Memory footprint is also a common concern when implementing table-based multiple dispatch: imagine a method with three virtual arguments, which can each be any of a dozen classes. This gives us a 12x12x12 table, containing 1728 function pointers. Fortunately, it is rare that a specialization is defined for each combination of arguments. Typically, there is a lot of duplication along each axis. This implementation takes advantages of this: it builds tables free of redundancies. The table is not “compressed” per se, as it never exists as a cartesian product of all the class sets; rather, it is built in terms of class partitions, not classes, where all the classes in the same group in the same dimension have the same set of candidate specializations. See
this article for an example.

Extending the Language – in D and in C++

Yomm11, the initial implementation of open methods in C++, takes 1845 lines of code (excluding comments) to implement; the D version weighs 1120 lines. Much of the difference is due to D’s ClassInfo. It contains information on the base class and inherited interfaces. It is used to build a bi-directional inheritance graph of the types that have methods attached to them.

C++’s type_info contains no such informaton, thus yomm11 comes with its own runtime class information system, and a macro that the user must call for each class participating in method dispatch. The usual difficulties with static constructors arise, and necessitates extra code to handle them.

Yomm11 can be used in two modes: intrusive and orthogonal. In the intrusive mode, the user augments the classes using macro calls. One of them allocates a method table pointer in the object; the other–called in each constructor–initializes the method pointer. In the orthogonal mode, no modification of the classes is required: the method pointer is stored in a hash map keyed by the type_info obtained via the typeid operator.

openmethods.d has two modes, too, but they are both orthogonal. The default mode stores the method pointer in the deallocator field of the ClassInfo. The ClassInfo of an object is available as the first pointer of the virtual function table; all this is documented. However, hijacking deallocator is a bit like cheating, and nothing guarantees that that field will be there forever.

For that reason, the library supports another mode, which is only slightly slower than the first: store the method pointer in an array indexed by a perfect integer hash of the virtual table pointer.

Unfortunately, it is not possible to use this approach in C++. It is possible to retrieve an object’s vptr, albeit by resorting to undocumented implementation details. However, the library needs to build the method tables without having instances of objects at hand; in D, on the other hand, the value of the vptr is available in the ClassInfo. Another idea would be to use a pointer to the type_info structure; alas, while a type_info can be obtained from a type as well as from an object, the standard explicitly states that the type_info object for a given type may not be unique.

Thus D provides at bit more information than C++, and that makes all the difference.

As for the meta-programming involved in processing the method declarations and specializations, it is easier, and yields a better syntax, in D than in C++, for several reasons.

Obviously, constructs like static if and foreach on type tuples make meta-programming easier. But the real advantage of D comes from the interplay
of template mixins, string mixins, compile-time reflection and alias. The mixin(registerMethods) incantation scans the entire translation unit and:

  • locates all the method declarations by detecting the functions that have virtual! in their signature
  • creates (via an alias created by a string mixin) a function with the same signature, minus the virtual qualifiers, which is what the user calls
  • finds all the method specializations (by locating the functions that have a @method attribute) and generates code that, at runtime, will register the specializations with the appropriate method

Conclusion

Object-oriented programming became popular in the nineties, but has been subjected to a lot of criticism in the last decade. This is in part because OOP promised modularity and extensibility, but failed to deliver. Instead we got “God” classes and Visitors. It is not the fault of the OOP paradigm per se, but rather of the unnatural and unnecessay fusion of class membership and polymorphism that most OO languages enforce. Open methods correct this mistake. As a bonus, this implementation also supports multiple dispatch. This is OOP done right: not objects “talking” to each other, but applying the appropriate algorithm depending on the arguments’ runtime types.

Open methods can be implemented as a library in C++ and in D, but D has a clear edge when it comes to meta-programming. As a result, the D version of the library delivers a lighter, cleaner syntax.

openmethods.d is available on dub


Jean-Louis Leroy is not French, but Belgian. He got his first taste of programming from a HP-25 calculator. His first real programming language was Forth, where CTFE is pervasive. Later he programmed (a little) in Lisp and Smalltalk, and (a lot) in C, C++, and Perl. He now works for Bloomberg LP in New York. His interests include object-relational mapping, open multi-methods, DSLs, and language extensions in general.