{"id":2579,"date":"2020-06-03T14:27:04","date_gmt":"2020-06-03T14:27:04","guid":{"rendered":"http:\/\/dlang.org\/blog\/?p=2579"},"modified":"2021-09-30T13:35:42","modified_gmt":"2021-09-30T13:35:42","slug":"a-look-at-chapel-d-and-julia-using-kernel-matrix-calculations","status":"publish","type":"post","link":"https:\/\/dlang.org\/blog\/2020\/06\/03\/a-look-at-chapel-d-and-julia-using-kernel-matrix-calculations\/","title":{"rendered":"A Look at Chapel, D, and Julia Using Kernel Matrix Calculations"},"content":{"rendered":"<h2 id=\"introduction\">Introduction<\/h2>\n<p>It seems each time you turn around there is a new programming language aimed at solving some specific problem set. Increased proliferation of programming languages and data are deeply connected in a fundamental way, and increasing demand for \u201cdata science\u201d computing is a related phenomenon. In the field of scientific computing, Chapel, D, and Julia are highly relevant programming languages. They arise from different needs and are aimed at different problem sets: Chapel focuses on data parallelism on single multi-core machines and large clusters; D was initially developed as a more productive and safer alternative to C++; Julia was developed for technical and scientific computing and aimed at getting the best of both worlds&#8212;the high performance and safety of static programming languages and the flexibility of dynamic programming languages. However, they all emphasize performance as a feature. In this article, we look at how their performance varies over kernel matrix calculations and present approaches to performance optimization and other usability features of the languages.<\/p>\n<p>Kernel matrix calculations form the basis of kernel methods in machine learning applications. They scale rather poorly&#8212;<code>O(m n^2)<\/code>, where <code>n<\/code> is the number of items and <code>m<\/code> is the number of elements in each item. 
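<\/p>
<p>As a concrete illustration of the computation (a minimal Python sketch for exposition only, not one of the benchmark implementations), a naive symmetric kernel matrix calculation with a dot-product kernel can be written as:<\/p>

```python
# Illustrative sketch only: a naive symmetric kernel matrix calculation.
# `data` holds n items, each with m elements; the result is an n x n matrix.
def dot_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def kernel_matrix(data, kernel=dot_kernel):
    n = len(data)
    mat = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Compute the lower triangle and mirror it across the diagonal,
        # roughly halving the O(m n^2) work.
        for i in range(j, n):
            mat[i][j] = kernel(data[i], data[j])
            mat[j][i] = mat[i][j]
    return mat

print(kernel_matrix([[1.0, 2.0], [3.0, 4.0]]))  # [[5.0, 11.0], [11.0, 25.0]]
```

<p>Exploiting the symmetry of the matrix in this way is the same trick used in all three benchmark implementations below.<\/p>
<p>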
In our exercise, <code>m<\/code> will be constant and we will be looking at execution time in each implementation as <code>n<\/code> increases. Here <code>m = 784<\/code> and <code>n = 1k, 5k, 10k, 20k, 30k<\/code>; each calculation is run three times and an average is taken. We disallow any use of BLAS and only allow use of packages or modules from the standard library of each language, though in the case of D the benchmark is compared with calculations using <a href=\"https:\/\/github.com\/libmir\">Mir, a multidimensional array package<\/a>, to make sure that my matrix implementation reflects the true performance of D. The details for the calculation of the kernel matrix and kernel functions <a href=\"https:\/\/github.com\/dataPulverizer\/KernelMatrixBenchmark\/blob\/master\/docs\/kernel.pdf\">are given here<\/a>.<\/p>\n<p>While preparing the code for this article, the Chapel, D, and Julia communities were very helpful and patient with all inquiries, so they are acknowledged here.<\/p>\n<p>In terms of bias, going in I was much more familiar with D and Julia than I was with Chapel. However, getting the best performance from each language required a lot of interaction with each programming community, and I have done my best to be aware of my biases and correct for them where necessary.<\/p>\n<h2 id=\"languagebenchmarksforkernelmatrixcalculation\">Language Benchmarks for Kernel Matrix Calculation<\/h2>\n<p><img loading=\"lazy\" src=\"http:\/\/dlang.org\/blog\/wp-content\/uploads\/2020\/06\/benchplot.jpg\" alt=\"\" width=\"1400\" height=\"1000\" class=\"alignleft size-full wp-image-2582\" \/><\/p>\n<p>The above chart (generated using R&#8217;s ggplot2 <a href=\"https:\/\/github.com\/dataPulverizer\/KernelMatrixBenchmark\/blob\/master\/images\/charts.r\">using a script<\/a>) shows the performance benchmark time taken against the number of items <code>n<\/code> for Chapel, D, and Julia, for nine kernels. 
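<\/p>
<p>As noted above, each timing is an average of three runs. That methodology can be sketched in Python (illustrative only; the actual harnesses are written in each benchmarked language, and the workload here is a stand-in):<\/p>

```python
import time

def average_runtime(func, runs=3):
    # Run `func` `runs` times and return the mean wall-clock time,
    # mirroring the benchmark methodology of averaging three runs.
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        func()
        total += time.perf_counter() - start
    return total / runs

# Stand-in workload: sum the first million integers.
print(average_runtime(lambda: sum(range(1_000_000))))
```

<p>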
D performs best in five of the nine kernels, Julia performs best in two of the nine kernels, and in two of the kernels (Dot and Gaussian) the picture is mixed. Chapel was the slowest for all of the kernel functions examined.<\/p>\n<p>It is worth noting that the mathematics functions used in D were pulled from C&#8217;s math API made available in D through its <code>core.stdc.math<\/code> module because the mathematical functions in D&#8217;s standard library <a href=\"https:\/\/dlang.org\/phobos\/std_math.html\"><code>std.math<\/code><\/a> can be quite slow. The math functions used <a href=\"https:\/\/github.com\/dataPulverizer\/KernelMatrixBenchmark\/blob\/master\/d\/math.d\">are given here<\/a>. By way of comparison, <a href=\"https:\/\/github.com\/dataPulverizer\/KernelMatrixBenchmark\/blob\/master\/d\/mathdemo.d\">consider the <code>mathdemo.d<\/code><\/a> script comparing the imported C <code>log<\/code> function with D&#8217;s <code>log<\/code> function from <code>std.math<\/code>:<\/p>\n<pre>$ ldc2 -O --boundscheck=off --ffast-math --mcpu=native mathdemo.d &amp;&amp; .\/mathdemo\r\nTime taken for c log: 0.324789 seconds.\r\nTime taken for d log: 2.30737 seconds.<\/pre>\n<p>The <code>Matrix<\/code> object used in the D benchmark was implemented specifically because the use of modules outside standard language libraries was disallowed. To make sure that this implementation is competitive, i.e., it does not unfairly represent D&#8217;s performance, it is compared to Mir&#8217;s ndslice library written in D. 
The chart below shows matrix implementation times minus ndslice times; negative means that ndslice is slower, indicating that the implementation used here does not negatively represent D&#8217;s performance.<\/p>\n<p><img loading=\"lazy\" src=\"http:\/\/dlang.org\/blog\/wp-content\/uploads\/2020\/06\/ndsliceDiagnostic.jpg\" alt=\"\" width=\"1200\" height=\"800\" class=\"alignleft size-full wp-image-2581\" \/><\/p>\n<h2 id=\"environment\">Environment<\/h2>\n<p>The code was run on a computer with an Ubuntu 20.04 OS, 32 GB memory, and an Intel\u00ae Core\u2122 i9-8950HK CPU @ 2.90GHz with 6 cores and 12 threads.<\/p>\n<pre>$ julia --version\r\njulia version 1.4.1<\/pre>\n<pre>$ dmd --version\r\nDMD64 D Compiler v2.090.1<\/pre>\n<pre>$ ldc2 --version\r\nLDC - the LLVM D compiler (1.18.0):\r\n  based on DMD v2.088.1 and LLVM 9.0.0<\/pre>\n<pre>$ chpl --version\r\nchpl version 1.22.0<\/pre>\n<h3 id=\"compilation\">Compilation<\/h3>\n<p>Chapel:<\/p>\n<pre>chpl script.chpl kernelmatrix.chpl --fast &amp;&amp; .\/script<\/pre>\n<p>D:<\/p>\n<pre>ldc2 script.d kernelmatrix.d arrays.d -O5 --boundscheck=off --ffast-math -mcpu=native &amp;&amp; .\/script<\/pre>\n<p>Julia (no compilation required but can be run from the command line):<\/p>\n<pre>julia script.jl<\/pre>\n<h2 id=\"implementations\">Implementations<\/h2>\n<p>Efforts were made to avoid non-standard libraries while implementing these kernel functions. The reasons for this are:<\/p>\n<ul>\n<li>To make it easy for the reader to copy and run the code after installing the language. Having to install external libraries can be a bit of a &#8220;faff&#8221;.<\/li>\n<li>Packages outside standard libraries can go extinct, so avoiding external libraries keeps the article and code relevant.<\/li>\n<li>It&#8217;s completely transparent and shows how each language works.<\/li>\n<\/ul>\n<h3 id=\"chapel\">Chapel<\/h3>\n<p>Chapel uses a <code>forall<\/code> loop to parallelize over threads. 
Also, C pointers to each item are used rather than the default array notation, and <code>guided<\/code> iteration over indices is used:<\/p>\n<pre class=\"prettyprint lang-chapel\">proc calculateKernelMatrix(K, data: [?D] ?T)\r\n{\r\n  var n = D.dim(0).last;\r\n  var p = D.dim(1).last;\r\n  var E: domain(2) = {D.dim(0), D.dim(0)};\r\n  var mat: [E] T;\r\n  var rowPointers: [1..n] c_ptr(T) =\r\n    forall i in 1..n do c_ptrTo(data[i, 1]);\r\n\r\n  forall j in guided(1..n by -1) {\r\n    for i in j..n {\r\n      mat[i, j] = K.kernel(rowPointers[i], rowPointers[j], p);\r\n      mat[j, i] = mat[i, j];\r\n    }\r\n  }\r\n  return mat;\r\n}<\/pre>\n<p>The Chapel code was the most difficult to optimize for performance and required the highest number of code changes.<\/p>\n<h3 id=\"d\">D<\/h3>\n<p>D uses a <code>taskPool<\/code> of threads from its <code>std.parallelism<\/code> package to parallelize code. The D code underwent the fewest changes for performance optimization&#8212;a lot of the performance benefits came from the specific compiler used and the flags selected (discussed later). My implementation of a <code>Matrix<\/code> allows columns to be selected by reference via <code>refColumnSelect<\/code>.<\/p>\n<pre class=\"prettyprint lang-d\">auto calculateKernelMatrix(alias K, T)(K!(T) kernel, Matrix!(T) data)\r\n{\r\n  long n = data.ncol;\r\n  auto mat = Matrix!(T)(n, n);\r\n\r\n  foreach(j; taskPool.parallel(iota(n)))\r\n  {\r\n    auto arrj = data.refColumnSelect(j).array;\r\n    foreach(long i; j..n)\r\n    {\r\n      mat[i, j] = kernel(data.refColumnSelect(i).array, arrj);\r\n      mat[j, i] = mat[i, j];\r\n    }\r\n  }\r\n  return mat;\r\n}<\/pre>\n<h3 id=\"julia\">Julia<\/h3>\n<p>The Julia code uses the <code>@threads<\/code> macro for parallelizing the code and the <code>@views<\/code> macro for referencing arrays. One confusing thing about Julia&#8217;s arrays is their reference status. 
Sometimes, as in this case, arrays will behave like value objects and they have to be referenced by using the <code>@views<\/code> macro; otherwise, operations on them generate copies. At other times they behave like reference objects, for example, when passing them into a function. It can be a little tricky dealing with this because you don&#8217;t always know what set of operations will generate a copy, but where this occurs <code>@views<\/code> provides a good solution.<\/p>\n<p>The <code>Symmetric<\/code> type saves the small bit of extra work needed for allocating to both sides of the matrix.<\/p>\n<pre class=\"prettyprint lang-julia\">function calculateKernelMatrix(Kernel::K, data::Array{T}) where {K &lt;: AbstractKernel,T &lt;: AbstractFloat}\r\n  n = size(data)[2]\r\n  mat = zeros(T, n, n)\r\n  @threads for j in 1:n\r\n      @views for i in j:n\r\n          mat[i,j] = kernel(Kernel, data[:, i], data[:, j])\r\n      end\r\n  end\r\n  return Symmetric(mat, :L)\r\nend<\/pre>\n<p>The <code>@inbounds<\/code> and <code>@simd<\/code> macros in the kernel functions were used to turn bounds checking off and apply SIMD optimization to the calculations:<\/p>\n<pre class=\"prettyprint lang-julia\">struct DotProduct &lt;: AbstractKernel end\r\n@inline function kernel(K::DotProduct, x::AbstractArray{T, N}, y::AbstractArray{T, N}) where {T,N}\r\n  ret = zero(T)\r\n  m = length(x)\r\n  @inbounds @simd for k in 1:m\r\n      ret += x[k] * y[k]\r\n  end\r\n  return ret\r\nend<\/pre>\n<p>These optimizations are quite visible but very easy to apply.<\/p>\n<h2 id=\"memoryusage\">Memory Usage<\/h2>\n<p>The total time for each benchmark and the total memory used was captured using the <code>\/usr\/bin\/time -v<\/code> command. 
The output for each of the languages is given below.<\/p>\n<p>Chapel took the longest total time but consumed the least amount of memory (nearly 6GB RAM peak memory):<\/p>\n<pre>Command being timed: \".\/script\"\r\n\tUser time (seconds): 113190.32\r\n\tSystem time (seconds): 6.57\r\n\tPercent of CPU this job got: 1196%\r\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 2:37:39\r\n\tAverage shared text size (kbytes): 0\r\n\tAverage unshared data size (kbytes): 0\r\n\tAverage stack size (kbytes): 0\r\n\tAverage total size (kbytes): 0\r\n\tMaximum resident set size (kbytes): 5761116\r\n\tAverage resident set size (kbytes): 0\r\n\tMajor (requiring I\/O) page faults: 0\r\n\tMinor (reclaiming a frame) page faults: 1439306\r\n\tVoluntary context switches: 653\r\n\tInvoluntary context switches: 1374820\r\n\tSwaps: 0\r\n\tFile system inputs: 0\r\n\tFile system outputs: 8\r\n\tSocket messages sent: 0\r\n\tSocket messages received: 0\r\n\tSignals delivered: 0\r\n\tPage size (bytes): 4096\r\n\tExit status: 0<\/pre>\n<p>D consumed the highest amount of memory (around 20GB RAM peak memory) but took less total time than Chapel to execute:<\/p>\n<pre>Command being timed: \".\/script\"\r\n\tUser time (seconds): 106065.71\r\n\tSystem time (seconds): 58.56\r\n\tPercent of CPU this job got: 1191%\r\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 2:28:29\r\n\tAverage shared text size (kbytes): 0\r\n\tAverage unshared data size (kbytes): 0\r\n\tAverage stack size (kbytes): 0\r\n\tAverage total size (kbytes): 0\r\n\tMaximum resident set size (kbytes): 20578840\r\n\tAverage resident set size (kbytes): 0\r\n\tMajor (requiring I\/O) page faults: 0\r\n\tMinor (reclaiming a frame) page faults: 18249033\r\n\tVoluntary context switches: 3833\r\n\tInvoluntary context switches: 1782832\r\n\tSwaps: 0\r\n\tFile system inputs: 0\r\n\tFile system outputs: 8\r\n\tSocket messages sent: 0\r\n\tSocket messages received: 0\r\n\tSignals delivered: 0\r\n\tPage size (bytes): 4096\r\n\tExit status: 
0<\/pre>\n<p>Julia consumed a moderate amount of memory (around 7.5 GB peak memory) but ran the quickest&#8212;probably because its random number generator is the fastest:<\/p>\n<pre>Command being timed: \"julia script.jl\"\r\n\tUser time (seconds): 49794.85\r\n\tSystem time (seconds): 30.58\r\n\tPercent of CPU this job got: 726%\r\n\tElapsed (wall clock) time (h:mm:ss or m:ss): 1:54:18\r\n\tAverage shared text size (kbytes): 0\r\n\tAverage unshared data size (kbytes): 0\r\n\tAverage stack size (kbytes): 0\r\n\tAverage total size (kbytes): 0\r\n\tMaximum resident set size (kbytes): 7496184\r\n\tAverage resident set size (kbytes): 0\r\n\tMajor (requiring I\/O) page faults: 794\r\n\tMinor (reclaiming a frame) page faults: 38019472\r\n\tVoluntary context switches: 2629\r\n\tInvoluntary context switches: 523063\r\n\tSwaps: 0\r\n\tFile system inputs: 368360\r\n\tFile system outputs: 8\r\n\tSocket messages sent: 0\r\n\tSocket messages received: 0\r\n\tSignals delivered: 0\r\n\tPage size (bytes): 4096\r\n\tExit status: 0<\/pre>\n<h2 id=\"performanceoptimization\">Performance optimization<\/h2>\n<p>The process of performance optimization in all three languages was very different, and all three communities were very helpful in the process. But there were some common themes.<\/p>\n<ul>\n<li>Static dispatching of kernel functions instead of using polymorphism. 
This means that when the kernel function is passed, parametric (compile-time) polymorphism is used rather than runtime (dynamic) polymorphism, where dispatch through virtual functions carries a performance penalty.<\/li>\n<li>Using views\/references rather than copying data over multiple threads makes a big difference.<\/li>\n<li>Parallelizing the calculations makes a huge difference.<\/li>\n<li>Knowing if your array is row\/column major and using that in your calculation makes a huge difference.<\/li>\n<li>Bounds checks and compiler optimizations make a tremendous difference, especially in Chapel and D.<\/li>\n<li>Enabling SIMD in D and Julia contributed to the performance. In D this was done using the <code>-mcpu=native<\/code> flag, and in Julia this was done using the <code>@simd<\/code> macro.<\/li>\n<\/ul>\n<p>In terms of language-specific issues, getting to performant code in Chapel was the most challenging, and the Chapel code changed the most from easy-to-read array operations to using pointers and guided iterations. But on the compiler side it was relatively easy to add <code>--fast<\/code> and get a large performance boost.<\/p>\n<p>The D code changed very little, and most of the performance was gained by the choice of compiler and its optimization flags. D\u2019s LDC compiler is rich in terms of options for performance optimization. It has eight <code>-O<\/code> optimization levels, but some are repetitions of others. For instance, <code>-O<\/code>, <code>-O3<\/code>, and <code>-O5<\/code> are identical, and there are myriad other flags that affect performance in various ways. In this case the flags used were <code>-O5 --boundscheck=off --ffast-math<\/code>, representing aggressive compiler optimizations, disabled bounds checking, and LLVM\u2019s fast-math, and <code>-mcpu=native<\/code> to enable CPU vectorization instructions.<\/p>\n<p>In Julia the macro changes discussed previously markedly improved the performance, but they were not too intrusive. 
I tried changing the optimization <code>-O<\/code> level, but this did not improve performance.<\/p>\n<h2 id=\"qualityoflife\">Quality of life<\/h2>\n<p>This section examines the relative pros and cons around the convenience and ease of use of each language. People underestimate the effort it takes to use a language day-to-day; the support and infrastructure required is significant, so it is worth comparing various facets of each language. Readers seeking to avoid the TLDR should scroll to the end of this section for the table comparing the language features discussed here. Every effort has been made to be as objective as possible, but comparing programming languages is difficult, bias-prone, and contentious, so read this section with that in mind. Some elements looked at, such as arrays, are from the \u201cdata science\u201d\/technical\/scientific computing point of view, and others are more general.<\/p>\n<h3 id=\"interactivity\">Interactivity<\/h3>\n<p>Programmers want a fast code\/compile\/result loop during development to quickly observe results and outputs in order to make progress or necessary changes. Julia\u2019s interpreter is hands down the best for this and offers a smooth and feature-rich development experience, and D comes a close second. With compilers, this code\/compile\/result loop can be slow even for small amounts of code. D has three compilers: the standard DMD compiler, the LLVM-based LDC compiler, and the GCC-based GDC. In this development process, the DMD and LDC compilers were used. DMD has <strong>very<\/strong> fast compilation times, which is great for development. The LDC compiler is great at creating <strong>fast<\/strong> code. Chapel&#8217;s compiler is very slow in comparison. 
To give an example, running Linux\u2019s <code>time<\/code> command on DMD vs Chapel\u2019s compiler for the kernel matrix code with no optimizations gives us, for D:<\/p>\n<pre>real\t0m0.545s\r\nuser\t0m0.447s\r\nsys\t0m0.101s<\/pre>\n<p>Compared with Chapel:<\/p>\n<pre>real\t0m5.980s\r\nuser\t0m5.787s\r\nsys\t0m0.206s<\/pre>\n<p>That\u2019s a large actual and <em>psychological<\/em> difference; it can make programmers reluctant to check their work and can delay the development loop when they have to wait for outputs, especially when source code increases in volume and compilation times become significant.<\/p>\n<p>It is worth mentioning, however, that when developing packages in Julia, compilation times can be very long, and users have noticed that when they load some packages, compilation times can stretch. So the experience of the development loop in Julia could vary, but in this specific case the process was seamless.<\/p>\n<h3 id=\"documentationandexamples\">Documentation and examples<\/h3>\n<p>One way of comparing documentation in the different languages is to compare them all with Python\u2019s official documentation, which is <em>the<\/em> gold standard for programming languages. It combines examples with formal definitions and tutorials in a seamless and user-friendly way. Since many programmers are familiar with the Python documentation, this approach gives an idea of how they compare.<\/p>\n<p>Julia\u2019s documentation is the closest to Python\u2019s documentation quality and gives the user a very smooth, detailed, and relatively painless transition into the language. It also has a rich ecosystem of blogs, and topics on many aspects of the language are easy to come by. 
D\u2019s official documentation is not as good and can be challenging and frustrating. However, there is a <em>very<\/em> good free book, <a href=\"https:\/\/wiki.dlang.org\/Books\">\u201cProgramming in D\u201d<\/a>, which is a great introduction to the language; still, no single book can cover a programming language, and there are not many sources for advanced topics. Chapel\u2019s documentation is quite good for getting things done, though examples vary in presence and quality. Often, the programmer needs a lot of knowledge to look in the right place. A good topic for comparison is file I\/O libraries in Chapel, D, and Julia. Chapel\u2019s I\/O library has too few examples but is relatively clear and straightforward; D\u2019s I\/O is spread across a few modules, and documentation is more difficult to follow; Julia\u2019s I\/O documentation has lots of examples and is clear and easy to follow.<\/p>\n<p>Perhaps one factor affecting Chapel\u2019s adoption is a lack of examples&#8212;since its arrays have a non-standard interface, the user has to work hard to become familiar with them. By contrast, even though D\u2019s documentation may not be as good in places, the language has many similarities to C and C++, so it gets away with sparser documentation.<\/p>\n<h3 id=\"multi-dimensionalarraysupport\">Multi-dimensional Array support<\/h3>\n<p>\u201cArrays\u201d here does not refer to native C and C++ style arrays available in D, but mathematical arrays. Julia and Chapel ship with array support and D does not, but <a href=\"http:\/\/mir-algorithm.libmir.org\/\">it has the Mir library<\/a> which has multidimensional arrays (ndslice). In the kernel matrix implementation, I wrote my own matrix object in D, which is not difficult if you understand the principle, but it&#8217;s not something a user wants to do. 
However, D has <a href=\"https:\/\/github.com\/kaleidicassociates\/lubeck\">a linear algebra library called Lubeck<\/a> which has impressive performance characteristics and interfaces with all the usual BLAS implementations. Julia\u2019s arrays are by far the easiest and most familiar. Chapel&#8217;s arrays are more difficult to get started with than Julia\u2019s but are designed to be run on single-core, multi-core, and computer clusters using the same or very similar code, which is a good unique selling point.<\/p>\n<h3 id=\"languagepower\">Language power<\/h3>\n<p>Since Julia is a dynamic programming language, some might say, \u201cwell Julia is a dynamic language which is far more permissive than static programming languages, therefore the debate is over\u201d, but it\u2019s more complicated than that. There is power in static type systems. Julia has a type system similar in nature to type systems from static languages, so you can write code as if you were using a static language, but you can do things reserved only for dynamic languages. It has a highly developed generic and meta-programming syntax and powerful macros. It also has a highly flexible object system and multiple dispatch. This mix of features is what makes Julia the most powerful language of the three.<\/p>\n<p>D was intended to be a replacement for C++ and takes very much after C++ (and also borrows from Java), but makes template programming and compile-time evaluation much more user-friendly than in C++. It is a single dispatch language (though multi-methods are available in a package). 
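<\/p>
<p>The dispatch distinction can be made concrete in a neutral language. This Python sketch (illustrative only) uses <code>functools.singledispatch<\/code>, which, like single dispatch generally, selects an implementation from the runtime type of one argument; Julia&#8217;s multiple dispatch instead considers the types of all arguments:<\/p>

```python
from functools import singledispatch

# Single dispatch: the implementation is selected using the runtime
# type of the first argument only.
@singledispatch
def describe(x):
    return "something else"

@describe.register
def _(x: int):
    return "an integer"

@describe.register
def _(x: str):
    return "a string"

print(describe(42), describe("kernel"))  # an integer a string
```

<p>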
Instead of macros, D has string and template \u201cmixins\u201d, which serve a similar purpose.<\/p>\n<p>Chapel has generic programming support and nascent support for single-dispatch OOP, no macro support, and is not yet as mature as D or Julia in these terms.<\/p>\n<h3 id=\"concurrencyparallelprogramming\">Concurrency &amp; Parallel Programming<\/h3>\n<p>Nowadays, new languages tout support for concurrency and its popular subset, parallelism, but the details vary a lot between languages. Parallelism is more relevant in this example, and all three languages deliver. Writing parallel for loops is straightforward in all three languages.<\/p>\n<p>Chapel\u2019s concurrency model has much more emphasis on data parallelism but has tools for task parallelism and ships with support for cluster-based concurrency.<\/p>\n<p>Julia has good support for both concurrency and parallelism.<\/p>\n<p>D has industrial-strength support for parallelism and concurrency, though its threading support is much less well documented with examples.<\/p>\n<h3 id=\"standardlibrary\">Standard Library<\/h3>\n<p>How good is the standard library of all three languages in general? What range of tasks do they allow users to handle easily? It\u2019s a tough question because library quality and documentation factor in. All three languages have very good standard libraries. D has the most comprehensive standard library, with Julia a great second and then Chapel; but things are never that simple. For example, a user seeking to write binary I\/O may find Julia the easiest to start with; it has the most straightforward, clear interface and documentation, followed by Chapel, and then D. 
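<\/p>
<p>Binary I\/O of this kind often amounts to reading fixed-size, big-endian fields from a file header. As a neutral illustration (a Python sketch with a made-up in-memory header, not code from the benchmark), reading an IDX-style header looks like:<\/p>

```python
import io
import struct

def read_idx_header(stream):
    # An IDX file begins with a 4-byte magic number: two zero bytes,
    # a type code (e.g., 0x08 for unsigned byte), and the number of
    # dimensions, followed by one big-endian 32-bit size per dimension.
    _zero1, _zero2, type_code, ndims = struct.unpack(">BBBB", stream.read(4))
    dims = struct.unpack(">" + "I" * ndims, stream.read(4 * ndims))
    return type_code, dims

# Illustrative in-memory header: unsigned-byte data, 2 dimensions (3 x 4).
header = bytes([0, 0, 0x08, 2]) + struct.pack(">II", 3, 4)
print(read_idx_header(io.BytesIO(header)))  # (8, (3, 4))
```

<p>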
In my implementation of an IDX file reader, D\u2019s I\/O was the fastest, though Julia code was easy to write for cases unavailable in the other two languages.<\/p>\n<h3 id=\"packagemanagerspackageecosystems\">Package Managers &amp; Package Ecosystems<\/h3>\n<p>In terms of documentation, usage, and features, D\u2019s Dub package manager is the most comprehensive. D also has a rich package ecosystem <a href=\"https:\/\/code.dlang.org\/\">on the Dub website<\/a>. Julia\u2019s package manager is tightly integrated with GitHub and is a good package system with good documentation. Chapel has a package manager but does not have a highly developed package ecosystem.<\/p>\n<h3 id=\"cintegration\">C Integration<\/h3>\n<p>C interop is easy in all three languages; Chapel has good documentation but is not as well popularised as the others. D\u2019s documentation is better, and Julia\u2019s is the most comprehensive. Oddly enough though, none of the languages&#8217; documentation shows the commands required to compile your own C code and integrate it with the language, which is an oversight, especially for novices. It is, however, easy to search for and find examples for the compilation process in D and Julia.<\/p>\n<h3 id=\"community\">Community<\/h3>\n<p>All three languages have convenient places where users can ask questions. For Chapel, the easiest place is Gitter; for Julia, it\u2019s Discourse (though there is a Julia Gitter); and for D, it\u2019s the official website forum. The Julia community is the most active, followed by D, and then Chapel. 
I\u2019ve found that you\u2019ll get good responses from all three communities, but you\u2019ll probably get quicker answers from the D and Julia communities.<\/p>\n<table>\n<colgroup>\n<col \/>\n<col style=\"text-align:center\" \/>\n<col style=\"text-align:center\" \/>\n<col style=\"text-align:right\" \/>\n<\/colgroup>\n<thead>\n<tr>\n<th>        <\/th>\n<th style=\"text-align:center\"> Chapel  <\/th>\n<th style=\"text-align:center\"> D         <\/th>\n<th style=\"text-align:right\"> Julia <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td> Compilation\/Interactivity <\/td>\n<td style=\"text-align:center\"> Slow   <\/td>\n<td style=\"text-align:center\"> Fast        <\/td>\n<td style=\"text-align:right\"> Best  <\/td>\n<\/tr>\n<tr>\n<td> Documentation &amp; Examples <\/td>\n<td style=\"text-align:center\"> Detailed  <\/td>\n<td style=\"text-align:center\"> Patchy        <\/td>\n<td style=\"text-align:right\"> Best  <\/td>\n<\/tr>\n<tr>\n<td> Multi-dimensional Arrays <\/td>\n<td style=\"text-align:center\"> Yes   <\/td>\n<td style=\"text-align:center\"> Native Only <br \/>(library support) <\/td>\n<td style=\"text-align:right\"> Yes  <\/td>\n<\/tr>\n<tr>\n<td> Language Power    <\/td>\n<td style=\"text-align:center\"> Good   <\/td>\n<td style=\"text-align:center\"> Great        <\/td>\n<td style=\"text-align:right\"> Best  <\/td>\n<\/tr>\n<tr>\n<td> Concurrency &amp; Parallelism <\/td>\n<td style=\"text-align:center\"> Great   <\/td>\n<td style=\"text-align:center\"> Great        <\/td>\n<td style=\"text-align:right\"> Good  <\/td>\n<\/tr>\n<tr>\n<td> Standard Library   <\/td>\n<td style=\"text-align:center\"> Good   <\/td>\n<td style=\"text-align:center\"> Great        <\/td>\n<td style=\"text-align:right\"> Great <\/td>\n<\/tr>\n<tr>\n<td> Package Manager &amp; Ecosystem <\/td>\n<td style=\"text-align:center\"> Nascent  <\/td>\n<td style=\"text-align:center\"> Best        <\/td>\n<td style=\"text-align:right\"> Great <\/td>\n<\/tr>\n<tr>\n<td> C Integration    
<\/td>\n<td style=\"text-align:center\"> Great   <\/td>\n<td style=\"text-align:center\"> Great        <\/td>\n<td style=\"text-align:right\"> Great <\/td>\n<\/tr>\n<tr>\n<td> Community     <\/td>\n<td style=\"text-align:center\"> Small   <\/td>\n<td style=\"text-align:center\"> Vibrant        <\/td>\n<td style=\"text-align:right\"> Largest <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Table of quality-of-life features in Chapel, D &amp; Julia<\/p>\n<h2 id=\"summary\">Summary<\/h2>\n<p>If you are a novice programmer writing numerical algorithms and doing calculations in scientific computing and want a fast language that&#8217;s easy to use, Julia is your best bet. If you are an experienced programmer working in the same space, Julia is still a great option. If you specifically want a more conventional, &#8220;industrial strength&#8221;, statically compiled, high-performance language with all the &#8220;bells and whistles&#8221;, but want something more productive, safer, and less painful than C++, then D is your best bet. You can write &#8220;anything&#8221; in D and get great performance from its compilers. If you need to get array calculations happening on clusters, then Chapel is probably the easiest place to go.<\/p>\n<p>In terms of raw performance on this task, D was the winner, clearly performing better in five of the nine kernels benchmarked. This exercise reveals that Julia&#8217;s label as a high-performance language is more than just hype&#8212;it has held its own against highly competitive languages. It was harder than expected to get competitive performance from Chapel&#8212;it took a lot of investigation from the Chapel team to come up with the current solution. However, as the Chapel language matures, we could see further improvement.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction It seems each time you turn around there is a new programming language aimed at solving some specific problem set. 
Increased proliferation of programming languages and data are deeply connected in a fundamental way, and increasing demand for \u201cdata science\u201d computing is a related phenomenon. In the field of scientific computing, Chapel, D, and [&hellip;]<\/p>\n","protected":false},"author":40,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[26,9],"tags":[],"_links":{"self":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2579"}],"collection":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/users\/40"}],"replies":[{"embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/comments?post=2579"}],"version-history":[{"count":4,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2579\/revisions"}],"predecessor-version":[{"id":2585,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/posts\/2579\/revisions\/2585"}],"wp:attachment":[{"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/media?parent=2579"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/categories?post=2579"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dlang.org\/blog\/wp-json\/wp\/v2\/tags?post=2579"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}