Lost in Translation: Encapsulation


I first learned programming in BASIC. Outgrew it, and switched to Fortran. Amusingly, my early Fortran code looked just like BASIC. My early C code looked like Fortran. My early C++ code looked like C. – Walter Bright, the creator of D

Programming in a language is not the same as thinking in that language. A natural side effect of experience with one programming language is that we view other languages through the prism of its features and idioms. Languages in the same family may look and feel similar, but there are guaranteed to be subtle differences that, when not accounted for, can lead to compiler errors, bugs, and missed opportunities. Even when good docs, books, and other materials are available, most misunderstandings are only going to be solved through trial-and-error.

D programmers come from a variety of programming backgrounds, C-family languages perhaps being the most common among them. Understanding the differences and how familiar features are tailored to D can open the door to more possibilities for organizing a code base, and designing and implementing an API. This article is the first of a few that will examine D features that can be overlooked or misunderstood by those experienced in similar languages.

We’re starting with a look at a particular feature that’s common among languages that support Object-Oriented Programming (OOP). There’s one aspect in particular of the D implementation that experienced programmers are sure they already fully understand and are often surprised to later learn they don’t.

Encapsulation

Most readers will already be familiar with the concept of encapsulation, but I want to make sure we’re on the same page. For the purpose of this article, I’m talking about encapsulation in the form of separating interface from implementation. Some people tend to think of it strictly as it relates to object-oriented programming, but the concept is broader than that. Consider this C code:

#include <stdio.h>
static size_t s_count;

void print_message(const char* msg) {
    puts(msg);
    s_count++;
}

size_t num_prints(void) { return s_count; }

In C, functions and global variables decorated with static become private to the translation unit (i.e. the source file along with any headers brought in via #include) in which they are declared. Non-static declarations are publicly accessible, usually provided in header files that lay out the public API for clients to use. Static functions and variables are used to hide implementation details from the public API.

Encapsulation in C is a minimal approach. C++ supports the same feature, but it also has anonymous namespaces that can encapsulate type definitions in addition to declarations. Like Java, C#, and other languages that support OOP, C++ also has access modifiers (alternatively known as access specifiers, protection attributes, visibility attributes) which can be applied to class and struct member declarations.

C++ supports the following three access modifiers, common among OOP languages:

  • public – accessible to the world
  • private – accessible only within the class
  • protected – accessible only within the class and its derived classes

An experienced Java programmer might raise a hand to say, “Um, excuse me. That’s not a complete definition of protected.” That’s because in Java, it looks like this:

  • protected – accessible within the class, its derived classes, and classes in the same package.

Every class in Java belongs to a package, so it makes sense to factor packages into the equation. Then there’s this:

  • package-private (not a keyword) – accessible within the class and classes in the same package.

This is the default access level in Java when no access modifier is specified. This combined with protected make packages a tool for encapsulation beyond classes in Java.

Similarly, C# has assemblies, which MSDN defines as “a collection of types and resources that forms a logical unit of functionality”. In C#, the meaning of protected is identical to that of C++, but the language has two additional forms of protection that relate to assemblies and that are analogous to Java’s package-private and protected, respectively.

  • internal – accessible within the class and classes in the same assembly.
  • protected internal – accessible within the class, its derived classes, and classes in the same assembly.

Examining encapsulation in other programming languages will continue to turn up similarities and differences. Common encapsulation idioms are generally adapted to language-specific features. The fundamental concept remains the same, but the scope and implementation vary. So it should come as no surprise that D also approaches encapsulation in its own, language-specific manner.

Modules

The foundation of D’s approach to encapsulation is the module. Consider this D version of the C snippet from above:

module mymod;

private size_t _count;

void printMessage(string msg) {
    import std.stdio : writeln;

    writeln(msg);
    _count++;
}

size_t numPrints() { return _count; }

In D, access modifiers can apply to module-scope declarations, not just class and struct members. _count is private, meaning it is not visible outside of the module. printMessage and numPrints have no access modifiers; they are public by default, making them visible and accessible outside of the module. Both functions could have been annotated with the keyword public.

Note that imports in module scope are private by default, meaning the symbols in the imported modules are not visible outside the module, and local imports, as in the example, are never visible outside of their parent scope.
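To see the scoping of a local import in isolation, here is a minimal sketch. The module name importdemo is made up for illustration, and the static assert uses __traits(compiles) to check that writeln cannot be reached from module scope.

```d
module importdemo;

void printMessage(string msg) {
    // Local import: writeln is visible only inside this function body.
    import std.stdio : writeln;
    writeln(msg);
}

// At module scope, writeln was never imported, so the call does not compile.
static assert(!__traits(compiles, writeln("hello")));

void main() {
    printMessage("hello");
}
```

Moving the import to the top of the file would make the static assert fail, because writeln would then be visible throughout the module (though still not to anyone importing importdemo).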

Alternative syntaxes are supported, giving more flexibility to the layout of a module. For example, there’s C++ style:

module mymod;

// Everything below this is private until either another
// protection attribute or the end of file is encountered.
private:
    size_t _count;

// Turn public back on
public:
    void printMessage(string msg) {
        import std.stdio : writeln;

        writeln(msg);
        _count++;
    }

    size_t numPrints() { return _count; }

And this:

module mymod;

private {
    // Everything declared within these braces is private.
    size_t _count;
}

// The functions are still public by default
void printMessage(string msg) {
    import std.stdio : writeln;

    writeln(msg);
    _count++;
}

size_t numPrints() { return _count; }

Modules can belong to packages. A package is a way to group related modules together. In practice, the source files corresponding to each module should be grouped together in the same directory on disk. Then, in the source file, each directory becomes part of the module declaration:

// mypack/amodule.d
module mypack.amodule;

// mypack/subpack/anothermodule.d
module mypack.subpack.anothermodule;

Note that it’s possible to have package names that don’t correspond to directories and module names that don’t correspond to files, but it’s bad practice to do so. A deep dive into packages and modules will have to wait for a future post.

mymod does not belong to a package, as no packages were included in the module declaration. Inside printMessage, the function writeln is imported from the stdio module, which belongs to the std package. Packages have no special properties in D and primarily serve as namespaces, but they are a common part of the codescape.

In addition to public and private, the package access modifier can be applied to module-scope declarations to make them visible only within modules in the same package.

Consider the following example. There are three modules in three files (only one module per file is allowed), each belonging to the same root package.

// src/rootpack/subpack1/mod2.d
module rootpack.subpack1.mod2;
import std.stdio;

package void sayHello() {
    writeln("Hello!");
}

// src/rootpack/subpack1/mod1.d
module rootpack.subpack1.mod1;
import rootpack.subpack1.mod2;

class Speaker {
    this() { sayHello(); }
}

// src/rootpack/app.d
module rootpack.app;
import rootpack.subpack1.mod1;

void main() {
    auto speaker = new Speaker;
}

Compile this with the following command line:

cd src
dmd -i rootpack/app.d

The -i switch tells the compiler to automatically compile and link imported modules (excluding those in the standard library namespaces core and std). Without it, each module would have to be passed on the command line, else they wouldn’t be compiled and linked.

The class Speaker has access to sayHello because they belong to modules that are in the same package. Now imagine we do a refactor and we decide that it could be useful to have access to sayHello throughout the rootpack package. D provides the means to make that happen by allowing the package attribute to be parameterized with the fully-qualified name (FQN) of a package. So we can change the declaration of sayHello like so:

package(rootpack) void sayHello() {
    writeln("Hello!");
}

Now all modules in rootpack and all modules in packages that descend from rootpack will have access to sayHello. Don’t overlook that last part. A parameter to the package attribute is saying that a package and all of its descendants can access this symbol. It may sound overly broad, but it isn’t.

For one thing, only the module’s parent package or one of that package’s ancestors can be used as a parameter, and it must be given by its fully-qualified name. Consider a module rootpack.subpack.subsub.mymod. That name contains all of the packages that are legal parameters to the package attribute in mymod.d, namely rootpack, rootpack.subpack, and rootpack.subpack.subsub. So we can say the following about symbols declared in mymod:

  • package – visible only to modules in the parent package of mymod, i.e. the subsub package.
  • package(rootpack.subpack.subsub) – visible to modules in the subsub package and modules in all packages descending from subsub.
  • package(rootpack.subpack) – visible to modules in the subpack package and modules in all packages descending from subpack.
  • package(rootpack) – visible to modules in the rootpack package and modules in all packages descending from rootpack.

This feature makes packages another tool for encapsulation, allowing symbols to be hidden from the outside world but visible and accessible in specific subtrees of a package hierarchy. In practice, there are probably few cases where expanding access to a broad range of packages in an entire subtree is desirable.
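As a rough sketch of how those levels play out, consider three files laid out like the earlier example. The function names and the sibling module are hypothetical, and each package parameter is written as a fully-qualified package name. The static asserts use __traits(compiles) to confirm which calls are rejected.

```d
// src/rootpack/subpack/subsub/mymod.d
module rootpack.subpack.subsub.mymod;

package void forSubsub() {}                     // rootpack.subpack.subsub only
package(rootpack.subpack) void forSubpack() {}  // rootpack.subpack and below
package(rootpack) void forRootpack() {}         // rootpack and below

// src/rootpack/subpack/sibling.d
module rootpack.subpack.sibling;
import rootpack.subpack.subsub.mymod;

void callFromSubpack() {
    forSubpack();   // OK: sibling is in rootpack.subpack
    forRootpack();  // OK: sibling is below rootpack
    // sibling is not in the subsub package, so this is rejected.
    static assert(!__traits(compiles, forSubsub()));
}

// src/rootpack/app.d
module rootpack.app;
import rootpack.subpack.sibling;
import rootpack.subpack.subsub.mymod;

void main() {
    callFromSubpack();
    forRootpack();  // OK: app is in rootpack
    // app is outside rootpack.subpack, so this is rejected.
    static assert(!__traits(compiles, forSubpack()));
}
```

As before, this can be built from the src directory with dmd -i rootpack/app.d.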

It’s common to see parameterized package protection in situations where a package exposes a common public interface and hides implementations in one or more subpackages, such as a graphics package with subpackages containing implementations for DirectX, Metal, OpenGL, and Vulkan. Here, D’s access modifiers allow for three levels of encapsulation:

  • the graphics package as a whole
  • each subpackage containing the implementations
  • individual modules in each package

Notice that I didn’t include class or struct types as a fourth level. The next section explains why.

Classes and structs

Now we come to the motivation for this article. I can’t recall ever seeing anyone come to the D forums professing surprise about package protection, but the behavior of access modifiers in classes and structs is something that pops up now and then, largely because of expectations derived from experience in other languages.

Classes and structs use the same access modifiers as modules: public, package, package(some.pack), and private. The protected attribute can only be used in classes, as inheritance is not supported for structs (nor for modules, which aren’t even objects). public, package, and package(some.pack) behave exactly as they do at the module level. The thing that surprises some people is that private also behaves the same way.

import std.stdio;

class C {
    private int x;
}

void main() {
    C c = new C();
    c.x = 10;
    writeln(c.x);
}


Snippets like this are posted in the forums now and again by people exploring D, accompanying a question along the lines of, “Why does this compile?” (and sometimes, “I think I’ve found a bug!”). This is an example of where experience can cloud expectations. Everyone knows what private means, so it’s not something most people bother to look up in the language docs. However, those who do would find this:

Symbols with private visibility can only be accessed from within the same module.

private in D always means private to the module. The module is the lowest level of encapsulation. It’s easy to understand why some people initially resist this on the grounds that it breaks encapsulation, but the intent behind the design is to strengthen encapsulation. It’s inspired by the C++ friend feature.
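It may help to see the other side of that rule. In this sketch the class lives in its own module, and the code in a second module can no longer touch the private member. The module names counter and app, and the accessor functions, are invented for illustration.

```d
// counter.d
module counter;

class C {
    private int x;  // private to module counter
    int getX() const { return x; }
    void setX(int value) { x = value; }
}

// app.d
module app;
import counter;

void main() {
    auto c = new C;
    // Outside of module counter, the private member is off limits.
    static assert(!__traits(compiles, c.x = 10));
    // The public interface is the only way in.
    c.setX(10);
    assert(c.getX() == 10);
}
```

Build it with dmd -i app.d. Replacing the static assert with a plain c.x = 10 should produce a compile error.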

Having implemented and maintained a C++ compiler for many years, Walter understood the need for a feature like friend, but felt that it wasn’t the best way to go about it.

Being able to declare a “friend” that is somewhere in some other file runs against notions of encapsulation.

An alternative is to take a Java-like approach of one class per module, but he felt that was too restrictive.

One may desire a set of closely interrelated classes that encapsulate a concept, and those should go into a module.

So the way to view a module in D is not just as a single source file, but as a unit of encapsulation. It can contain free functions, classes, and structs, all operating on the same data declared in module scope and class scope. The public interface is still protected from changes to the private implementation inside the module. Along those same lines, protected class members are accessible not just in derived classes, but also in the module.
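Here is a small single-file sketch of that idea. The names (account, Account, deposit, Auditor) are hypothetical. A free function and an unrelated struct both reach into the class, much as C++ friends would, without any declaration granting them access.

```d
module account;

class Account {
    private long _cents;      // private to the module, not to the class
    protected string _owner;  // derived classes, plus this module

    this(string owner) { _owner = owner; }
    long cents() const { return _cents; }
}

// A free function in the same module plays the role of a C++ friend.
void deposit(Account a, long cents) {
    a._cents += cents;
}

// An unrelated type in the same module can see both members, too.
struct Auditor {
    string describe(Account a) {
        import std.conv : to;
        return a._owner ~ ": " ~ a._cents.to!string;
    }
}

void main() {
    auto acct = new Account("alice");
    deposit(acct, 250);
    assert(acct.cents == 250);
    assert(Auditor().describe(acct) == "alice: 250");
}
```

Code in any other module sees only the constructor, cents, deposit, and Auditor. The fields _cents and _owner remain implementation details of the module.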

Sometimes though, there really is a benefit to denying access to private members in a module. The bigger a module becomes, the more of a burden it is to maintain, especially when it’s being maintained by a team. Every place a private member of a class is accessed in a module means more places to update when a change is made, thereby increasing the maintenance burden. The language provides the means to alleviate the burden in the form of the special package module.

In some cases, we don’t want to require the user to import multiple modules individually. Splitting a large module into smaller ones is one of those cases. Consider the following file tree:

-- mypack
---- mod1.d
---- mod2.d

We have two modules in a package called mypack. Let’s say that mod1.d has grown extremely large and we’re starting to worry about maintaining it. For one, we want to ensure that private members aren’t manipulated outside of class declarations with hundreds or thousands of lines in between. We want to split the module into smaller ones, but at the same time we don’t want to break user code. Currently, users can get at the module’s symbols by importing it with import mypack.mod1. We want that to continue to work. Here’s how we do it:

-- mypack
---- mod1
------ package.d
------ split1.d
------ split2.d
---- mod2.d

We’ve split mod1.d into two new modules and put them in a package named mod1. We’ve also created a special package.d file, which looks like this:

module mypack.mod1;

public import mypack.mod1.split1,
              mypack.mod1.split2;

When the compiler sees package.d, it knows to treat it specially. Users will be able to continue using import mypack.mod1 without ever caring that it’s now split into two modules in a new package. The key is the module declaration at the top of package.d. It’s telling the compiler to treat this package as the module mod1. And instead of automatically importing all modules in the package, the requirement to list them as public imports in package.d allows more freedom in implementing the package. Sometimes, you might want to require the user to explicitly import a module even when a package.d is present.

Now users will continue seeing mod1 as a single module and can continue to import it as such. Meanwhile, encapsulation is now more stringently enforced internally. Because split1 and split2 are now separate modules, they can’t touch each other’s private parts. Any part of the API that needs to be shared by both modules can be annotated with package protection. Despite the internal transformation, the public interface remains unchanged, and encapsulation is maintained.
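To make that concrete, here is one way the split might look. The contents of split1 and split2 are invented for illustration (Counter, defaultStep, and bump are hypothetical names), and app.d stands in for user code sitting next to the mypack directory.

```d
// mypack/mod1/split1.d
module mypack.mod1.split1;

class Counter {
    private int _count;  // now private to split1 alone
    void increment() { ++_count; }
    int count() const { return _count; }
}

// Shared with split2, but hidden from users of mypack.mod1.
package int defaultStep() { return 1; }

// mypack/mod1/split2.d
module mypack.mod1.split2;
import mypack.mod1.split1;

void bump(Counter c) {
    // split2 can no longer reach into Counter's private state.
    static assert(!__traits(compiles, c._count));
    foreach (i; 0 .. defaultStep()) c.increment();
}

// mypack/mod1/package.d
module mypack.mod1;

public import mypack.mod1.split1,
              mypack.mod1.split2;

// app.d
module app;
import mypack.mod1;  // unchanged user code

void main() {
    auto c = new Counter;
    bump(c);
    assert(c.count == 1);
    // The package-level helper is not part of the public API.
    static assert(!__traits(compiles, defaultStep()));
}
```

Build it with dmd -i app.d from the directory that contains app.d and mypack.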

Wrapping up

The full list of access modifiers in D can be summarized as follows:

  • public – accessible everywhere.
  • package – accessible to modules in the same package.
  • package(some.pack) – accessible to modules in the package some.pack and to the modules in all of its descendant packages.
  • private – accessible only in the module.
  • protected (classes only) – accessible in the module and in derived classes.

Hopefully, this article has given you the perspective to think in D, rather than in your “native” language, when approaching encapsulation.

Thanks to Ali Çehreli, Joakim Noah, and Nicholas Wilson for reviewing and providing feedback on this article.

5 thoughts on “Lost in Translation: Encapsulation”

    1. It’s listed as one, yes, but it serves a very different purpose from those I describe here, so it isn’t relevant to a discussion about encapsulation.

  1. It’s an interesting article.

    I find the mention of ‘expectations’ a little watered down, since C++/Java/C# represent perhaps the 3 most widely used languages on the planet, where private means the same thing in each language – for decades now. In other words, a great many programmers would expect that. It is not a trivial expectation that a language seeking to attract such programmers should dismiss so easily. That such programmers, when they come to D, need to ‘read the documentation’ just to discover that private means something different, seems an unrealistic burden to put on them. But they will surely learn it one way or the other.

    But there are other options a programmer might want in D, besides having everything in a module be friends.

    For example, non-friend, non-member functions, inside a module with a class.

    Two tightly coupled classes, but properly encapsulated from each other, in the same module.

    Static, compile-time verification of the use of your type’s interface by other code in the module…

    unittests inside the module that cannot bypass the declared interface of your types.

    ..the list can go on and on… it’s really easy to think of advantages from being able to better encapsulate/hide information, from other code in the same module.

    You cannot do simple things like this in D. Not even the ‘option’ is there.

    Additionally, there is major pushback from the D community whenever there is a discussion about the possibility of making that an option, for the programmer.

    That’s a real shame, and for me, demonstrates a real weakness in the language.

    As your article demonstrates, the best D offers to those programmers, are workarounds.

    1. The argument that private class members in a module are more encapsulated if they can’t be accessed from outside the class boundaries within the module is a weak one. It’s an abstract, purely conceptual idea. From a practical perspective, it just isn’t true. Anyone editing the module has access to those private members. As far as the public API is concerned, encapsulation is maintained. D’s solution is not a “workaround”. It’s a common-sense application of private protection to D’s language-specific features.

      The only reasonable argument against this feature that I’ve seen is the one I referred to in the article regarding maintenance. You want to minimize the number of points private members are directly accessed so that when a change is made, there are fewer places that need to be updated. It’s why in Java it’s recommended that private members be accessed through getters/setters even inside the class. In practice, it’s only going to be a potential issue with extremely large classes maintained by teams over a long period of time. And even then, it’s a rare occurrence and is something that is usually going to be caught at compile time.

      Java offers no means of solving this problem other than breaking a single class into multiple classes. Large classes in D would present the same potential issue. It would never go away. Private-to-the-module also presents the same issue, but with the more practical remedy that the module can be split into multiple modules and the public interface remains unchanged.

      I suspect you and I have had this discussion many times already, but just in case you actually aren’t who I think you are, I’ll just point out the underlying theme of my post: you aren’t thinking in D. I’ve been programming in D since 2003. I can count the number of voices I have heard raising this feature as a potential issue on one hand. The number I’m aware of who have reported running into maintenance issues because of it is zero.

      It’s just not a problem in practice. Step outside this idea of the class as some sacrosanct boundary and you’ll eventually come to see that the module as the unit of encapsulation provides exactly the same guarantees.

      1. I just disagree. When a class defines a private member, and then defines a public method that can be called to operate on that private member, then, the only way to operate on that private member variable, should be by that public method – the defined interface.

        It’s a core concept in C++/C#/Java. There are no ifs or buts about it. The compiler in those languages will enforce that constraint, regardless of what code surrounds that class.

        Sure, there might be exceptions to that. That’s why C++ got some friends.

        Now Walter apparently thought the idea of having friends over here, and over there, is problematic. I agree. But Walter’s solution was to make everyone in the module friends – with no ability to unfriend – except through extensive redesign of how and where you lay out your code (i.e. the one class per module thing).

        I tell you, I much prefer to have the ability to define friends than not have the ability to unfriend – in life, as well as in code.

        Now you’re telling me, I’ll be fine if other code in the module can just ignore those interface constraints you went to some effort to design, because I have control of that other code in the module?

        How can I take that argument seriously, I mean gee!

        Just ask Scott Meyers, how many mistakes are in his books, despite the really extensive effort he and others go to, to not have mistakes.

        The compiler should prevent you from making such mistakes. The language shouldn’t be telling me, ‘you’ll be fine, you won’t shoot yourself in the foot – just trust yourself that you’re writing correct code’. No! I don’t trust myself not to make mistakes.

        So, sorry, but I just cannot take that argument seriously.

        And anyway, if D really is a ‘multiparadigm’ language, then it shouldn’t demand that I ‘think in D’.
