Performance and Concrete Types in Julia

There are many reasons that I like the Julia language. One of them is simply that Julia is fast. But even though Julia is fast right out of the box, there are a few insights that can greatly improve the speed and efficiency of your code. I’ll discuss one of these here; but first, here’s a bit of essential background on Julia’s type system.

Notes on Types

Julia is a dynamic language with a very interesting type system. Of particular interest for us here is that the type system is hierarchical, with the highest (i.e., most general) levels occupied by abstract types. For instance, in terms of numeric types, the most general type in Julia is Number. Below Number in the type hierarchy there are other abstract types, including Real and Integer. As we continue further down the type hierarchy, we encounter the concrete types, such as Float64 and Int.

If you’re interested, you can investigate Julia’s type hierarchy using the <: infix operator, which tests whether the type on its left is a subtype of the type on its right. For instance:

 
julia> Complex <: Number     
  true
 
julia> Real <: Number     
  true
 
julia> String <: Any     
  true
 
julia> UTF8String <: AbstractString     
  true
 
julia> Bool <: Int     
  false

Julia’s type system gives you a great deal of flexibility and control. And with respect to performance, it can also be used to help the compiler generate very fast native code.

Like the compilers for many other languages, notably Scala and Haskell, Julia’s compiler performs type inference. One consequence is that you can write a function such as:


function cube(x)
    res = x^3
    return res 
end

You’ll note in the code above that we don’t do any type annotation. The code looks a lot like what you’d write in R, Python, or Matlab. We don’t have to tell the compiler the type of res and neither do we tell it the type of x. And yet, if we were to run cube(17), the compiler knows to generate native code with the expectation that res will be of type Int.
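To make this concrete, here’s a small sketch of my own showing that the same generic cube() gets specialized per input type:

```julia
function cube(x)
    res = x^3
    return res
end

# The compiler specializes cube() for each concrete input type:
cube(17)    # Int input, so res (and the result) is an Int
cube(2.5)   # Float64 input, so the result is a Float64
```

No annotations were needed; the input’s concrete type alone tells the compiler everything about res.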

Now, had we wanted to, we could re-write our function such that it specifies the types of x and res. So, something like this:


function cube2(x::Int)
    res::Int = x^3
    return res 
end 

But that turns out to be unnecessary in this case; for Int inputs, the original cube() function and cube2() behave identically. If you’re skeptical, or just curious, you can use Julia’s @code_native macro on both functions to see that they generate identical assembly code. This works because the compiler is smart enough to deduce that x being of type Int guarantees that res will also be of type Int. Both functions use concrete types and are “type-stable”: the variables’ types don’t change in the body of the function, and knowing the type of the input determines the type of the output.
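For contrast, here’s a small function of my own (not part of the discussion above) that is not type-stable: the type of its result depends on a runtime value, not just on the type of its input, so the compiler can’t pin the return type down.

```julia
# Type-UNSTABLE: the result is an Int for positive inputs
# but a Float64 otherwise, so the inferred return type is a Union.
function unstable(x::Int)
    if x > 0
        return x      # Int
    else
        return 0.0    # Float64
    end
end
```

Running @code_warntype unstable(1) highlights the problematic return type, just as we’ll see with the dictionary example below.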

Functions with Concrete Types

One insight concerning performance in the Julia language is that it is generally very beneficial to write your code such that the compiler can correctly infer the type of the variables (and that those types are concrete). This allows the compiler to generate native code that is more specialized.

Consider a toy example. Suppose we want to write a function that, when given a vector of integers, returns a dictionary with counts of how often each number occurs in the vector. Let’s give our first version of this function the highly imaginative name count_numbers1().


function count_numbers1(v)
    n = length(v)
    d = Dict()
    for i = 1:n 
        d[v[i]] = get(d, v[i], 0) + 1
    end 
    return d 
end 

Even if you haven’t seen much Julia code before, the above function should be fairly clear. We loop over every element of the vector v and increment the counts in our dictionary d. And in case it’s not immediately obvious, the get() function in Julia operates much like a dictionary’s get() method in Python. It takes three arguments: a dictionary, a key, and a default value to be used if the dictionary doesn’t contain the key.
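As a quick sketch of get() in isolation:

```julia
d = Dict("a" => 1)
get(d, "a", 0)   # key present: returns the stored value, 1
get(d, "b", 0)   # key absent: returns the default, 0
```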

But now suppose we want to know how well the compiler has inferred the types of the variables in count_numbers1(). We can take a look at this using the macro @code_warntype:


julia> v = rand([1, 2, 3], 1000)         # 1000 draws from vector [1, 2, 3]

julia> @code_warntype count_numbers1(v)

This produces the following output (truncated for legibility):


Variables:
  #self#::#count_numbers1
  v::Array{Int64,1}
  n@_3::Int64
  d::Dict{Any,Any}
  #temp#@_5::Int64
  i::Int64
  n@_7::Int64
  index::Int64
  #temp#@_9::Any
...

 

The output is a bit cryptic, but I’ll draw your attention to a couple of things. First, we note that the compiler has correctly inferred the types of our input vector v, as well as n, which is the length of our vector. And of course, we see that the compiler knows the type of our loop’s index i.

We also note that the compiler has created a few temporary variables, which appear with the prefix #temp#. Of note is that the compiler hasn’t inferred a concrete type for #temp#@_9, so it’s been left with the most general abstract type: Any (for this reason, it’s highlighted in red in the REPL). This function ends up working with abstract types, and that turns out to be sub-optimal for performance. So let’s re-write our function so that it uses concrete types, and so that knowing the input type gives the compiler complete information about the output type. There are a couple of ways to do this, but we’ll use the most straightforward (though not the most general).


function count_numbers2(v)
    n = length(v)
    d = Dict{Int, Int}()
    for i = 1:n 
        d[v[i]] = get(d, v[i], 0) + 1
    end 
    return d 
end 

 

The only change we’ve made in count_numbers2() is that now we declare d to be a dictionary where both its keys and its values are of type Int. And if we examine the output of the @code_warntype macro, we can confirm that the compiler has inferred the types of all the variables and they are concrete.


julia> @code_warntype count_numbers2(v)

  Variables:
    #self#::#count_numbers2
    v::Array{Int64,1}
    n@_3::Int64
    d::Dict{Int64,Int64}
    #temp#@_5::Int64
    i::Int64
    n@_7::Int64
    index::Int64
    #temp#@_9::Int64

Now let’s examine the performance difference between our two functions. Let’s look at our non-concrete-type version first; we’ll run it a few times on vectors of length 100,000, 1 million, and 10 million.


julia> v = rand([1, 2, 3], 100_000);

julia> @time count_numbers1(v)
  0.014738 seconds (196.94 k allocations: 3.006 MB)
Dict{Any,Any} with 3 entries:
  2 => 33402
  3 => 33385
  1 => 33213

julia> v = rand([1, 2, 3], 1_000_000);

julia> @time count_numbers1(v)
  0.136329 seconds (2.00 M allocations: 30.472 MB, 1.26% gc time)
Dict{Any,Any} with 3 entries:
  2 => 333218
  3 => 333323
  1 => 333459


julia> v = rand([1, 2, 3], 10_000_000);

julia> @time count_numbers1(v)
  1.362224 seconds (20.01 M allocations: 305.813 MB, 4.78% gc time)
Dict{Any,Any} with 3 entries:
  2 => 3333486
  3 => 3330745
  1 => 3335769

As we can see from the output above, the non-concrete-type version of the function seems to scale roughly linearly with its input size in terms of run time, number of allocations, and memory consumption. Now let’s take a look at our concrete-type version.


julia> v = rand([1, 2, 3], 100_000);

julia> @time count_numbers2(v)
  0.004238 seconds (8 allocations: 768 bytes)
Dict{Int64,Int64} with 3 entries:
  2 => 33303
  3 => 33233
  1 => 33464

julia> v = rand([1, 2, 3], 1_000_000);

julia> @time count_numbers2(v)
  0.037438 seconds (8 allocations: 768 bytes)
Dict{Int64,Int64} with 3 entries:
  2 => 332261
  3 => 334466
  1 => 333273

julia> v = rand([1, 2, 3], 10_000_000);

julia> @time count_numbers2(v)
  0.322777 seconds (8 allocations: 768 bytes)
Dict{Int64,Int64} with 3 entries:
  2 => 3333816
  3 => 3334099
  1 => 3332085

The above results are really striking. In terms of speed, our concrete-type version of the function is about 4x faster than the non-concrete-type version (incidentally, it’s about 14x faster than Python for vectors of length 10 million). We also see that its run time scales linearly (or a bit better) with input size.

But what I find fascinating is that the allocations and memory usage are actually constant with respect to input size! On the vector of length 10 million, the non-concrete version allocates over 300 MB of memory, while the concrete-type version uses only 768 bytes. The difference is that a Dict{Any,Any} must heap-allocate (box) each integer it stores, whereas a Dict{Int,Int} stores the machine integers directly.

This floored me when I came across it. It feels basically like magic. The lesson here seems clear: when performance is critical, do everything you can to use concrete types.
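As a closing note, hard-coding the key type to Int was the straightforward fix, not the most general one. A more general sketch (my own variant; I haven’t benchmarked it against the versions above) derives the key type from the element type of the input, so the function stays type-stable for vectors of any element type:

```julia
function count_numbers3(v::AbstractVector)
    # Key type follows the input's element type; value type is Int.
    d = Dict{eltype(v), Int}()
    for x in v
        d[x] = get(d, x, 0) + 1
    end
    return d
end
```

Since eltype(v) is known to the compiler from v’s type alone, the type of d (and of the return value) is still fully inferable.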
