{"id":2303,"date":"2020-01-28T13:54:16","date_gmt":"2020-01-28T13:54:16","guid":{"rendered":"http:\/\/dlang.org\/blog\/?p=2303"},"modified":"2021-09-30T13:37:19","modified_gmt":"2021-09-30T13:37:19","slug":"wc-in-d-712-characters-without-a-single-branch","status":"publish","type":"post","link":"https:\/\/dlang.org\/blog\/2020\/01\/28\/wc-in-d-712-characters-without-a-single-branch\/","title":{"rendered":"wc in D: 712 Characters Without a Single Branch"},"content":{"rendered":"<p>After reading <a href=\"https:\/\/chrispenner.ca\/posts\/wc\">\u201cBeating C With 80 Lines Of Haskell: Wc\u201d<\/a>, which I found on <a href=\"https:\/\/news.ycombinator.com\/news\">Hacker News<\/a>, I thought D could do better. So I wrote a <code>wc<\/code> in D.<\/p>\n<h2 id=\"theprogram\">The Program<\/h2>\n<p>It consists of one file and has 34 lines and 712 characters.<\/p>\n<pre class=\"prettyprint lang-d\">import std.stdio : writefln, File;\r\nimport std.algorithm : map, fold, splitter;\r\nimport std.range : walkLength;\r\nimport std.typecons : Yes;\r\nimport std.uni : byCodePoint;\r\n\r\nstruct Line {\r\n\tsize_t chars;\r\n\tsize_t words;\r\n}\r\n\r\nstruct Output {\r\n\tsize_t lines;\r\n\tsize_t words;\r\n\tsize_t chars;\r\n}\r\n\r\nOutput combine(Output a, Line b) pure nothrow {\r\n\treturn Output(a.lines + 1, a.words + b.words, a.chars + b.chars);\r\n}\r\n\r\nLine toLine(char[] l) pure {\r\n\treturn Line(l.byCodePoint.walkLength, l.splitter.walkLength);\r\n}\r\n\r\nvoid main(string[] args) {\r\n\tauto f = File(args[1]);\r\n\tOutput o = f\r\n\t\t.byLine(Yes.keepTerminator)\r\n\t\t.map!(l =&gt; toLine(l))\r\n\t\t.fold!(combine)(Output(0, 0, 0));\r\n\r\n\twritefln!\"%u %u %u %s\"(o.lines, o.words, o.chars, args[1]);\r\n}<\/pre>\n<p>Sure, it is using <a href=\"https:\/\/dlang.org\/phobos\/index.html\">Phobos, D\u2019s standard library<\/a>, but then why wouldn\u2019t it? Phobos is awesome and ships with every D compiler. The program itself does not contain a single <code>if<\/code> statement. The Haskell <code>wc<\/code> implementation has several <code>if<\/code> statements. The D program, apart from the <code>main<\/code> function, contains three tiny functions. I could have easily put all the functionally in one range chain, but then it probably would have exceeded 80 characters per line. That\u2019s a major code-smell.<\/p>\n<h2 id=\"theperformance\">The Performance<\/h2>\n<p>Is the D <code>wc<\/code> faster than the coreutils <code>wc<\/code>? No, but it took me 15 minutes to write mine (I had to <a href=\"https:\/\/dlang.org\/phobos\/std_range_primitives.html#walkLength\">search for <code>walkLength<\/code><\/a>, because I forgot its name).<\/p>\n<table>\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/>\n<col \/> <\/colgroup>\n<thead>\n<tr>\n<th>file<\/th>\n<th>lines<\/th>\n<th>bytes<\/th>\n<th>coreutils<\/th>\n<th>haskell<\/th>\n<th>D<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>app.d<\/td>\n<td>46<\/td>\n<td>906<\/td>\n<td>3.5 ms +- 1.9 ms<\/td>\n<td>39.6 ms +- 7.8 ms<\/td>\n<td>8.9 ms +- 2.1 ms<\/td>\n<\/tr>\n<tr>\n<td>big.txt<\/td>\n<td>862<\/td>\n<td>64k<\/td>\n<td>4.7 ms +- 2.0 ms<\/td>\n<td>39.6 ms +- 7.8 ms<\/td>\n<td>9.8 ms +- 2.1 ms<\/td>\n<\/tr>\n<tr>\n<td>vbig.txt<\/td>\n<td>1.7M<\/td>\n<td>96M<\/td>\n<td>658.6ms +- 24.5ms<\/td>\n<td>226.4 ms +- 29.5 ms<\/td>\n<td>1.102 s +- 0.022 s<\/td>\n<\/tr>\n<tr>\n<td>vbig2.txt<\/td>\n<td>12.1M<\/td>\n<td>671M<\/td>\n<td>4.4 s +- 0.058 s<\/td>\n<td>1.1 s +- 0.039 s<\/td>\n<td>7.4 s +- 0.085 s<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Memory:<\/p>\n<table>\n<colgroup>\n<col \/>\n<col \/>\n<col \/>\n<col \/> <\/colgroup>\n<thead>\n<tr>\n<th>file<\/th>\n<th>coreutils<\/th>\n<th>haskell<\/th>\n<th>D<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>app.d<\/td>\n<td>2052K<\/td>\n<td>7228K<\/td>\n<td>7708K<\/td>\n<\/tr>\n<tr>\n<td>big.txt<\/td>\n<td>2112K<\/td>\n<td>7512K<\/td>\n<td>7616K<\/td>\n<\/tr>\n<tr>\n<td>vbig.txt<\/td>\n<td>2288K<\/td>\n<td>42620K<\/td>\n<td>7712K<\/td>\n<\/tr>\n<tr>\n<td>vbig2.txt<\/td>\n<td>2360K<\/td>\n<td>50860K<\/td>\n<td>7736K<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Is the Haskell <code>wc<\/code> faster? For big files, absolutely, but then it is using threads. For small files, GNU&#8217;s coreutils still beats the competition. At this stage my version is very likely IO bound, and it\u2019s fast enough anyway.<\/p>\n<p>I\u2019ll not claim that one language is faster than another. If you spend a chunk of time on optimizing a micro-benchmark, you are likely going to beat the competition. That\u2019s not real life. <strong>But I will claim that functional programming in D gives functional programming in Haskell a run for its money.<\/strong><\/p>\n<h2 id=\"abitaboutranges\">A Bit About Ranges<\/h2>\n<p><img loading=\"lazy\" class=\"alignleft size-full wp-image-181\" src=\"http:\/\/dlang.org\/blog\/wp-content\/uploads\/2016\/08\/d6.png\" alt=\"Digital Mars D logo\" width=\"200\" height=\"200\" \/>A range is an abstraction that you can consume through iteration without consuming the underlying collection (if there is one). Technically, a range can be a struct or a class that adheres to one of a handful of <code>Range<\/code> interfaces. The most basic form, <a href=\"https:\/\/dlang.org\/phobos\/std_range_primitives.html#isInputRange\">the <code>InputRange<\/code><\/a>, requires the function<\/p>\n<pre class=\"prettyprint lang-d\">void popFront();<\/pre>\n<p>and two members or properties:<\/p>\n<pre class=\"prettyprint lang-d\">T front;\r\nbool empty;<\/pre>\n<p><code>T<\/code> is the generic type of the elements the range iterates.<\/p>\n<p>In D, ranges are special in a way that other objects are not. When a range is given to a <code>foreach<\/code> statement, the compiler does a little rewrite.<\/p>\n<pre class=\"prettyprint lang-d\">foreach (e; range) { ... }<\/pre>\n<p>is rewritten to<\/p>\n<pre class=\"prettyprint lang-d\">for (auto __r = range; !__r.empty; __r.popFront()) {\r\n    auto e = __r.front;\r\n    ...\r\n}<\/pre>\n<p><code>auto e =<\/code> infers the type and is equivalent to <code>T e =<\/code>.<\/p>\n<p>Given this knowledge, building a range that can be used by <code>foreach<\/code> is easy.<\/p>\n<pre class=\"prettyprint lang-d\">struct Iota {\r\n\tint front;\r\n\tint end;\r\n\r\n\t@property bool empty() const {\r\n\t\treturn this.front == this.end;\r\n\t}\r\n\r\n\tvoid popFront() {\r\n\t\t++this.front;\r\n\t}\r\n}\r\n\r\nunittest {\r\n\timport std.stdio;\r\n\tforeach(it; Iota(0, 10)) {\r\n\t\twriteln(it);\r\n\t}\r\n}<\/pre>\n<p><code>Iota<\/code> is a very simple range. It functions as a generator, having no underlying collection. It iterates integers from front to end in steps of one. This snippet introduces a little bit of D syntax.<\/p>\n<pre class=\"prettyprint lang-d\">@property bool empty() const {<\/pre>\n<p>The <code>@property<\/code> attribute allows us to use the function <code>empty<\/code> the same way as a member variable (calling the function without the parenthesis). The trailing <code>const<\/code> means that we don\u2019t modify any data of the instance we call <code>empty<\/code> on. The built-in unit test prints the numbers <code>0<\/code> to <code>10<\/code>.<\/p>\n<p>Another small feature is the lack of an explicit constructor. The struct <code>Iota<\/code> has two member variables of type <code>int<\/code>. In the <code>foreach<\/code> statement in the test, we create an <code>Iota<\/code> instance as if it had a constructor that takes two <code>int<\/code>s. This is <a href=\"https:\/\/dlang.org\/spec\/struct.html#struct-literal\">a struct literal<\/a>. When the D compiler sees this, and the struct has no matching constructor, the <code>int<\/code>s will be assigned to the struct\u2019s member variables from top to bottom in the order of declaration.<\/p>\n<p>The relation between the three members is really simple. If <code>empty<\/code> is <code>false<\/code>, <code>front<\/code> is guaranteed to return a different element, the next one in the iteration, after a call to <code>popFront<\/code>. After calling <code>popFront<\/code> the value of <code>empty<\/code> might have changed. If it is <code>true<\/code>, this means there are no more elements to iterate and any further calls to <code>front<\/code> are not valid. According to <a href=\"https:\/\/dlang.org\/phobos\/std_range_primitives.html#isInputRange\">the <code>InputRange<\/code> documentation<\/a>:<\/p>\n<ul>\n<li><code>front<\/code> can be legally evaluated if and only if evaluating <code>empty<\/code> has, or would have, equaled <code>false<\/code>.<\/li>\n<li><code>front<\/code> can be evaluated multiple times without calling <code>popFront<\/code> or otherwise mutating the range object or the underlying data, and it yields the same result for every evaluation.<\/li>\n<\/ul>\n<p>Now, using <code>foreach<\/code> statements, or loops in general, is not really functional in my book. Lets say we want to filter all uneven numbers of the <code>Iota<\/code> range. We could put an <code>if<\/code> inside the <code>foreach<\/code> block, but that would only make it worse. It would be nicer if we had a range that takes a range and a predicate that can decide if an element is okay to pass along or not.<\/p>\n<pre class=\"prettyprint lang-d\">struct Filter {\r\n\tIota input;\r\n\tbool function(int) predicate;\r\n\r\n\tthis(Iota input, bool function(int) predicate) {\r\n\t\tthis.input = input;\r\n\t\tthis.predicate = predicate;\r\n\t\tthis.testAndIterate();\r\n\t}\r\n\r\n\tvoid testAndIterate() {\r\n\t\twhile(!this.input.empty\r\n\t\t\t\t&amp;&amp; !this.predicate(this.input.front))\r\n\t\t{\r\n\t\t\tthis.input.popFront();\r\n\t\t}\r\n\t}\r\n\r\n\tvoid popFront() {\r\n\t\tthis.input.popFront();\r\n\t\tthis.testAndIterate();\r\n\t}\r\n\r\n\t@property int front() {\r\n\t\treturn this.input.front;\r\n\t}\r\n\r\n\t@property bool empty() const {\r\n\t\treturn this.input.empty;\r\n\t}\r\n}\r\n\r\nbool isEven(int a) {\r\n\treturn a % 2 == 0;\r\n}\r\n\r\nunittest {\r\n\tforeach(it; Filter(Iota(0,10), &amp;isEven)) {\r\n\t\twriteln(it);\r\n\t}\r\n}<\/pre>\n<p><code>Filter<\/code> is again really simple: it takes one <code>Iota<\/code> and a function pointer. On construction of <code>Filter<\/code>, we call <code>testAndIterate<\/code>, which pops elements from <code>Iota<\/code> until it is either empty or the predicate returns <code>false<\/code>. The idea is that the passed predicate decides what to filter out and what to keep. The properties <code>front<\/code> and <code>empty<\/code> just forward to <code>Iota<\/code>. The only thing that actually does any work is <code>popFront<\/code>. It pops the current element and calls <code>testAndIterate<\/code>. That\u2019s it. That\u2019s an implementation of filter.<\/p>\n<p>Sure, there is a new <code>while<\/code> loop in <code>testAndIterate<\/code>, but rewriting that with recursion is just silly, in my opinion. What makes D great is that you can use the right tool for the job. Functional programming is fine and dandy a lot of the time, but sometimes it\u2019s not. If a bit of inline assembly would be necessary or nicer, use that.<\/p>\n<p>The call to <code>Filter<\/code> still does not look very nice. Assuming, you are used to reading from left to right, <code>Filter<\/code> comes before <code>Iota<\/code>, even though it is executed after <code>Iota<\/code>. D has no pipe operator, but it does have <a href=\"https:\/\/tour.dlang.org\/tour\/en\/gems\/uniform-function-call-syntax-ufcs\">Uniform Function Call Syntax (UFCS)<\/a>. If an expression can be implicitly converted to the first parameter of a function, the function can be called like it is a member function of the type of the expression. That&#8217;s a lot of words, I know. An example helps:<\/p>\n<pre class=\"prettyprint lang-d\">string foo(string a) {\r\n\treturn a ~ \"World\";\r\n}\r\n\r\nunittest {\r\n\tstring a = foo(\"Hello \");\r\n\tstring b = \"Hello \".foo();\r\n\tassert(a == b);\r\n}<\/pre>\n<p>The above example shows two calls to the function <code>foo<\/code>. As the <code>assert<\/code> indicates, both calls are equivalent. What does that mean for our Iota Filter example? UFCS allows us to rewrite the unit test to:<\/p>\n<pre class=\"prettyprint lang-d\">unittest {\r\n\tforeach(it; Iota(1,10).Filter(&amp;isEven)) {\r\n\t\twriteln(it);\r\n\t}\r\n}<\/pre>\n<p>Implementing a map\/transform range should now be possible for every reader. Sure, <code>Filter<\/code> can be made more abstract through <a href=\"https:\/\/dlang.org\/spec\/template.html\">the use of templates<\/a>, but that\u2019s just work, nothing conceptually new.<\/p>\n<p>Of course, there are different kinds of ranges, <a href=\"https:\/\/dlang.org\/phobos\/std_range_primitives.html#isBidirectionalRange\">like a bidirectional range<\/a>. Guess what that allows you to do. A small tip: a bidirectional range has two new primitives called <code>back<\/code> and <code>popBack<\/code>. There are other range types as well, but after you understand the input range demonstrated twice above, you pretty much know them all.<\/p>\n<p>P.S. Just to be clear, do not implement your own <code>filter<\/code>, <code>map<\/code>, or <code>fold<\/code>; the D standard library Phobos has everything you every need. Have a look at <a href=\"https:\/\/dlang.org\/phobos\/std_algorithm.html\"><code>std.algorithm<\/code><\/a> and <a href=\"https:\/\/dlang.org\/phobos\/std_range.html\"><code>std.range<\/code><\/a>.<\/p>\n<h2 id=\"abouttheauthor\">About the Author<\/h2>\n<p>Robert Schadek received a master&#8217;s degree in Computer Science at the University of Oldenburg. His master thesis was titled \u201cDMCD A Distributed Multithreading Caching D Compiler\u201d where he work on building a D compiler from scratch. He was a computer science PhD student from 2012\u20132018 at the University of Oldenburg. His PhD research focuses on quorum systems in combination with graphs. Since 2018 he is happily using D in his day job working for Symmetry Investments.<\/p>\n<h3 id=\"whatissymmetryinvestments\">What is Symmetry Investments?<\/h3>\n<p>Symmetry Investments is a global investment company with offices in Hong Kong, Singapore and London. We have been in business since 2014 after successfully spinning off from a major New York-based hedge fund.<\/p>\n<p>At Symmetry, we seek to engage in intelligent risk-taking to create value for our clients, partners and employees. We derive our edge from our capacity to generate Win-Wins \u2013 in the broadest sense. Win-Win is our fundamental ethical and strategic principle. By generating Win-Wins, we can create unique solutions that reconcile perspectives that are usually seen as incompatible or opposites, and encompass the best that each side has to offer. We integrate fixed-income arbitrage with global macro strategies in a novel way. We invent and develop technology that focuses on the potential of human-machine integration. We build systems where machines do what they do best, supporting people to do what people do best. We are creating a collaborative meritocracy: a culture where individual contribution serves both personal and collective goals \u2013 and is rewarded accordingly. We value both ownership thinking AND cooperative team spirit, self-realisation AND community.<\/p>\n<p>People at Symmetry Investments have been active participants in the D community since 2014. We have sponsored the development of <a href=\"https:\/\/github.com\/symmetryinvestments\/excel-d\">excel-d<\/a>, <a href=\"https:\/\/github.com\/atilaneves\/dpp\">dpp<\/a>, <a href=\"https:\/\/github.com\/symmetryinvestments\/autowrap\">autowrap<\/a>, <a href=\"https:\/\/github.com\/libmir\">libmir<\/a>, and various other projects. We started Symmetry Autumn of Code in 2018 and hosted DConf 2019 in London.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>After reading \u201cBeating C With 80 Lines Of Haskell: Wc\u201d, which I found on Hacker News, I thought D could do better. So I wrote a wc in D. The Program It consists of one file and has 34 lines and 712 characters. import std.stdio : writefln, File; import std.algorithm : map, fold, splitter; import [&hellip;]<\/p>\n","protected":false},"author":38,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[26,9,20,30],"tags":[],"_links":{"self":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2303"}],"collection":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/users\/38"}],"replies":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/comments?post=2303"}],"version-history":[{"count":3,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2303\/revisions"}],"predecessor-version":[{"id":2306,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2303\/revisions\/2306"}],"wp:attachment":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/media?parent=2303"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/categories?post=2303"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/tags?post=2303"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}