Recently, I have been working on a very interesting and sophisticated project called
typerep-map. A lot of advanced features and tricks were used during the development process and I have discovered many amusing and new sides of Haskell. So, I decided to share the ideas, steps, issues, etc. in this blog post.
If you want to skip all the funny parts, here is the link to the code itself:
What it’s all about
The basic idea behind
typerep-map is to have a data structure like
Map, but where types serve as keys, and values stored in the map are of the type specified in the corresponding key.
An example image of this structure:
There can be only one key-value pair for each type.
And here is an example written in pseudo-code for better understanding:
We also want our values to be indexed by a polymorphic type, but that will be explained later.
There already exist some libraries that implement ideas similar to typerep-map:
- type-map appears to resemble our project; however, the interface is different. It tracks the elements in the types and doesn’t provide the desired parametrization.
- dependent-map is closer to our goal in terms of the interface, but the package has a complete reimplementation of Data.Map.Lazy, and the goal of the typerep-map project is to have an efficient implementation based on primitive unboxed arrays.
You might wonder what
typerep-map brings to the table if there are other packages that aim to fulfil the same purpose. The primary goal is to use it in the
caps library instead of the
DMap type from
dependent-map parametrized by TypeRep. In caps, the performance of lookups is extremely important, so it makes sense to prioritize it above that of other functions.
Sections below describe the details of the implementation phases and the general concepts.
NOTE: in this blog post I am talking about
The code snippets in this blog post assume that the following extensions are enabled:
The reference implementation is more or less straightforward. It uses a lazy Map from containers as the internal representation of the desired data type.
Normally, types in Haskell are only present at compile-time: after type-checking, they are completely erased from the program. And yet we want to use types as keys in a map. This requires a runtime representation for types. Luckily, the standard library provides a runtime representation for types in the form of
TypeRep. But there are actually two different definitions of TypeRep, one in Data.Typeable and one in Type.Reflection.
The one in
Type.Reflection was introduced in GHC 8.2 and the old one was rewritten to be based on it.
Type.Reflection.TypeRep has kind
TypeRep :: k -> * while the old one has kind
TypeRep :: *.
To get a basic idea of what TypeRep actually is, you can think of the old TypeRep as an infinite ADT with all types enumerated as tag constructors:
and the new TypeRep is equivalent to an infinite GADT:
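To make the analogy concrete, here is a finite model of both ideas. This is only a sketch: the names OldRep, NewRep, defaultValue and forget are made up for illustration, and the real TypeRep in base is not defined this way.

```haskell
{-# LANGUAGE GADTs #-}

-- | Finite stand-in for the old, unindexed TypeRep: plain tags.
data OldRep = OldInt | OldBool | OldChar
  deriving (Eq, Ord, Show)

-- | Finite stand-in for the new, indexed TypeRep:
-- every tag records the type it describes in its index.
data NewRep a where
  NewInt  :: NewRep Int
  NewBool :: NewRep Bool
  NewChar :: NewRep Char

-- | Matching on an indexed tag refines the type variable 'a',
-- so each branch can return a value of the right type.
defaultValue :: NewRep a -> a
defaultValue NewInt  = 0
defaultValue NewBool = False
defaultValue NewChar = 'a'

-- | Forgetting the index recovers the old representation.
forget :: NewRep a -> OldRep
forget NewInt  = OldInt
forget NewBool = OldBool
forget NewChar = OldChar
```

All values of OldRep share one type, which is what lets them serve as keys of an ordinary Map, while NewRep Int and NewRep Bool are different types.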
If you are interested in the actual difference between old and new versions of the
TypeRep and motivation for this change, here is a nice ICFP video by Simon Peyton Jones:
I use the old TypeRep that comes from Data.Typeable, for two reasons. First, a regular Map requires all keys to be of the same type, which is not possible to achieve with the parameterized TypeRep. Second, the old TypeRep will never be deprecated (since GHC 8.2 it is just a different interface to the new TypeRep, so it’s not obsolete), and it is sufficient for our goal of supporting older GHC versions.
Here is a usage example of basic
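As a small illustration of the Data.Typeable interface, the sketch below obtains the old TypeRep in two ways and compares keys. The helper names intRep and sameKey are hypothetical and not part of typerep-map.

```haskell
import Data.Proxy (Proxy (..))
import Data.Typeable (TypeRep, Typeable, typeOf, typeRep)

-- | Runtime representation of 'Int', obtained from a proxy.
intRep :: TypeRep
intRep = typeRep (Proxy :: Proxy Int)

-- | Two values have the same key exactly when their types coincide.
sameKey :: (Typeable a, Typeable b) => a -> b -> Bool
sameKey x y = typeOf x == typeOf y
```

Since every TypeRep here has the same type, values like intRep can be stored as keys in a regular Map.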
For the first prototype, I decided to use
Dynamic as values in our
So we’ve got:
and the initial interface looks like this:
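A minimal sketch of what such a Dynamic-based prototype could look like, assuming a newtype over Map TypeRep Dynamic. The field name unMap and the exact set of functions are my assumptions for illustration, not necessarily the original code.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Prelude hiding (lookup)

import Data.Dynamic (Dynamic, fromDynamic, toDyn)
import qualified Data.Map.Lazy as LMap
import Data.Proxy (Proxy (..))
import Data.Typeable (TypeRep, Typeable, typeRep)

-- | Types are keys; each value is wrapped into 'Dynamic'.
newtype TypeRepMap = TypeRepMap { unMap :: LMap.Map TypeRep Dynamic }

empty :: TypeRepMap
empty = TypeRepMap LMap.empty

-- | The key is derived from the type of the inserted value.
insert :: forall a . Typeable a => a -> TypeRepMap -> TypeRepMap
insert val = TypeRepMap . LMap.insert (typeRep (Proxy :: Proxy a)) (toDyn val) . unMap

-- | The key is derived from the requested result type.
lookup :: forall a . Typeable a => TypeRepMap -> Maybe a
lookup = (fromDynamic =<<) . LMap.lookup (typeRep (Proxy :: Proxy a)) . unMap
```

For example, `lookup (insert True empty) :: Maybe Bool` evaluates to `Just True`, while asking the same map for a `Maybe Int` gives `Nothing`.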
If you look at the implementation of the Dynamic data type, you can notice that it already stores a TypeRep inside, so using it is a somewhat suboptimal decision due to the redundancy. Instead, we can safely use Any as our value type.
Following the Dynamic implementation, we can use the unsafeCoerce function to convert values to Any and back.
So we get:
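Here is a sketch of the same interface with Any as the value type, under the assumption that Any is imported from GHC.Base and that the functions keep the shapes used above. The coercion in lookup is safe only because insert stores each value under the TypeRep of its own type.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Prelude hiding (lookup)

import qualified Data.Map.Lazy as LMap
import Data.Proxy (Proxy (..))
import Data.Typeable (TypeRep, Typeable, typeRep)
import GHC.Base (Any)
import Unsafe.Coerce (unsafeCoerce)

-- | Values are stored untyped; the key remembers their real type.
newtype TypeRepMap = TypeRepMap { unMap :: LMap.Map TypeRep Any }

empty :: TypeRepMap
empty = TypeRepMap LMap.empty

-- | Erase the type of the value and store it under its TypeRep.
insert :: forall a . Typeable a => a -> TypeRepMap -> TypeRepMap
insert val = TypeRepMap . LMap.insert (typeRep (Proxy :: Proxy a)) (unsafeCoerce val) . unMap

-- | Restore the type: a value found under the key of 'a' was inserted as an 'a'.
lookup :: forall a . Typeable a => TypeRepMap -> Maybe a
lookup = fmap unsafeCoerce . LMap.lookup (typeRep (Proxy :: Proxy a)) . unMap
```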
Let’s check how it’s all working:
All right, we have a simple working version. But there are ways to improve it.
The next step is to parametrize our data type by a type variable f with kind f :: k -> *. This
f will be the interpretation of our keys. Such parameterization allows us to encode additional structure common between all elements, making it possible to use
TypeRepMap to model a variety of things from extensible records to monadic effects. This sort of parametrization may be familiar to users of
Note that the input kind is
k — we want to support arbitrary kinds as well. Since
TypeRep is poly-kinded, the interpretation can use any kind for the keys (see some examples in documentation).
The implementation of the functions stays the same, but the types are different:
Our previous implementation is just equivalent to
TypeRepMap Identity in the context of the described design.
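A sketch of the parametrized version, assuming the same Map TypeRep Any representation as before (the field name unMap is my own choice). The type variable f is phantom in the representation and only shows up in the types of the functions.

```haskell
{-# LANGUAGE KindSignatures      #-}
{-# LANGUAGE PolyKinds           #-}
{-# LANGUAGE ScopedTypeVariables #-}

import Prelude hiding (lookup)

import Data.Functor.Identity (Identity (..))
import Data.Kind (Type)
import qualified Data.Map.Lazy as LMap
import Data.Proxy (Proxy (..))
import Data.Typeable (TypeRep, Typeable, typeRep)
import GHC.Base (Any)
import Unsafe.Coerce (unsafeCoerce)

-- | For every key type 'a' the map stores a value of type 'f a'.
newtype TypeRepMap (f :: k -> Type) = TypeRepMap { unMap :: LMap.Map TypeRep Any }

empty :: TypeRepMap f
empty = TypeRepMap LMap.empty

insert :: forall a f . Typeable a => f a -> TypeRepMap f -> TypeRepMap f
insert val = TypeRepMap . LMap.insert (typeRep (Proxy :: Proxy a)) (unsafeCoerce val) . unMap

lookup :: forall a f . Typeable a => TypeRepMap f -> Maybe (f a)
lookup = fmap unsafeCoerce . LMap.lookup (typeRep (Proxy :: Proxy a)) . unMap

-- | The unparametrized map from the previous step, recovered with 'Identity'.
simple :: TypeRepMap Identity
simple = insert (Identity True) (insert (Identity (42 :: Int)) empty)
```

Note that the constraint is Typeable a rather than Typeable (f a): the key depends only on a.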
NOTE: Another reason to get rid of Dynamic: if we keep it, we have to specify a Typeable (f a) constraint instead of Typeable a in the type declarations. And having the Typeable a constraint would let us implement the following function efficiently:
The next step is to write an alternative implementation based on unboxed vectors, which is expected to be faster.
We want to use
Vector (TypeRep, Any) instead of our lazy map. This vector is going to be sorted by TypeRep. The insert and lookup algorithms should be implemented manually in the following way:
- insert: allocate a new vector of n + 1 elements and copy everything from the initial vector, adding the new element while keeping the sort order.
- lookup: a simple binary search.
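The two steps above can be sketched as follows. To stay self-contained, this sketch uses Data.Array from the array package instead of vector and a generic Ord key instead of TypeRep. The names SortedArr, emptyArr, insertArr and lookupArr are hypothetical.

```haskell
import Data.Array (Array, bounds, elems, listArray, (!))

-- | Key-value pairs stored in an array; invariant: keys strictly increase.
newtype SortedArr k v = SortedArr (Array Int (k, v))

emptyArr :: SortedArr k v
emptyArr = SortedArr (listArray (0, -1) [])

-- | Copy all elements into a fresh array, placing the new pair so that
-- the keys stay sorted. An existing pair with the same key is replaced.
insertArr :: Ord k => k -> v -> SortedArr k v -> SortedArr k v
insertArr k v (SortedArr arr) = SortedArr (listArray (0, length new - 1) new)
  where
    (smaller, rest) = span ((< k) . fst) (elems arr)
    new = smaller ++ (k, v) : dropWhile ((== k) . fst) rest

-- | Plain binary search over the half-open index range [lo, hi).
lookupArr :: Ord k => k -> SortedArr k v -> Maybe v
lookupArr k (SortedArr arr) = go 0 (snd (bounds arr) + 1)
  where
    go lo hi
      | lo >= hi  = Nothing
      | otherwise =
          let mid     = lo + (hi - lo) `div` 2
              (k', v) = arr ! mid
          in case compare k k' of
               LT -> go lo mid
               GT -> go (mid + 1) hi
               EQ -> Just v
```

Insertion is linear in the size of the map, which is acceptable here because lookups are the operation we want to be fast.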
The point of the unboxed vector is that it helps to get rid of the pointer indirection. If we take just Vector we will observe this picture (Ty stands for TypeRep, El stands for an element):
Instead, what we would like to see is:
This way, accessing El takes exactly one pointer dereference fewer.
However, it turned out that it’s more efficient to store keys and values in separate vectors at corresponding indices:
TypeRep doesn’t have an Unbox instance, and it looks like it’s not possible to write one. So instead of storing TypeRep we will be storing a Fingerprint. Fingerprint is the hash of TypeRep, so it makes sense to move in this direction.
Data.Typeable module is defined as
If we take a look at the Ord instance of TypeRep in base, we’ll see confirmation that Fingerprint is unique for each TypeRep. That means it’s okay to use Fingerprint as a key instead of TypeRep.
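A small sketch of how the Fingerprint of a type can be obtained with functions from base. The helper names fingerprintOf and halves are hypothetical.

```haskell
import Data.Proxy (Proxy (..))
import Data.Typeable (Typeable, typeRep, typeRepFingerprint)
import Data.Word (Word64)
import GHC.Fingerprint (Fingerprint (..))

-- | The hash that identifies a type at runtime.
fingerprintOf :: Typeable a => Proxy a -> Fingerprint
fingerprintOf = typeRepFingerprint . typeRep

-- | A 'Fingerprint' consists of two 64-bit words.
halves :: Fingerprint -> (Word64, Word64)
halves (Fingerprint a b) = (a, b)
```

Fingerprint has Eq and Ord instances, so it can be used as a key in any ordered structure.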
This is the initial vector-based implementation:
We want to use an unboxed vector as the type of the fingerprints field of TypeRepMap.
Every unboxed vector is a newtype wrapper over some primitive vector. In order to use an unboxed vector of Fingerprint we need to implement an instance of the Prim typeclass from the primitive package for Fingerprint. Adding this instance was proposed in this issue in the primitive library (having the instance inside the library would simplify the implementation a lot):
As a reference for the Prim instance implementation, we can use the Storable type class, which contains similar functions. There is already a Storable instance for Fingerprint. The assumption was that there is no significant difference between Storable and Prim for our lookup checks, so we could use a storable vector instead of an unboxed one. For more information about the difference between those typeclasses see this SO answer.
However, this assumption turned out to be false: benchmarks showed that Storable doesn’t give the desired performance boost.
According to the source code, Fingerprint is a pair of (Word64, Word64). So instead of having a single vector of Fingerprints we can have a vector of Word64, with the Fingerprint number i stored at indices 2 * i and 2 * i + 1.
But actually, it’s better to split it into two separate vectors of Word64, where one vector stores the first halves of the Fingerprints and the other one stores the corresponding second halves. This makes the implementation easier and also faster (checked with benchmarks), because comparing only the first half should almost always be enough, which makes key comparison faster.
After all described optimizations were done our structure took the following form:
The lookup function was implemented like this:
It uses a manually implemented version of the binary search algorithm optimized for unboxed vectors. The algorithm initially performs a binary search using the
fingerprintAs vector only. And then, after finding the first half, walks through the
At first, a simple naive binary search was implemented but later it was rewritten into a cache-optimized binary search (see the description here) which boosted the performance significantly.
But that’s not all. Later we noticed that every vector has the following definition:
As you can see, it contains two Int fields (an offset and a length). So we can make our representation more efficient by using Array instead of a boxed vector and PrimArray instead of an unboxed vector directly in the TypeRepMap data type.
After all optimizations, the final shape of TypeRepMap is the following:
Initially, I was frustrated about this part because I had no idea how to create the
Map of 1000 elements as that means I needed to somehow generate 1000 types. But there was actually a pretty elegant solution for this puzzle — polymorphic recursion.
Let’s introduce the following data types as type-level natural numbers:
Using these data types we can now implement the function which builds
TypeRepMap of the desired size.
So when I run buildBigMap with size n and Proxy a, it calls itself recursively with n - 1 and Proxy (S a), so the types grow at each step.
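A self-contained sketch of the polymorphic recursion trick. To avoid depending on the rest of the code, it collects the generated keys in a plain Map TypeRep Int (the hypothetical KeyMap) instead of a TypeRepMap, so the signature differs from the real buildBigMap.

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import qualified Data.Map.Strict as Map
import Data.Proxy (Proxy (..))
import Data.Typeable (TypeRep, Typeable, typeRep)

-- | Type-level natural numbers: Z, S Z, S (S Z), ...
data Z
data S a

-- | Stand-in for TypeRepMap: each generated type is mapped to the
-- value of the counter at the moment it was inserted.
type KeyMap = Map.Map TypeRep Int

-- | Every recursive call is made at a different type, 'S a' instead of 'a'.
-- This is polymorphic recursion, so the type signature is mandatory.
buildBigMap :: forall a . Typeable a => Int -> Proxy a -> KeyMap -> KeyMap
buildBigMap n p acc
  | n <= 0    = acc
  | otherwise =
      buildBigMap (n - 1) (Proxy :: Proxy (S a)) (Map.insert (typeRep p) n acc)
```

Calling `buildBigMap 1000 (Proxy :: Proxy Z) Map.empty` produces a map with 1000 distinct type keys without writing a single one of those types by hand.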
But this wasn’t the only challenge in benchmarking
TypeRepMap. There were also a few interesting things with benchmarks to keep in mind:
- We should force maps to normal form before benchmarking.
- We can’t use
TypeRepMap is not possible because there can be no
Any. We won’t be able to use
rnf because it would try to force both the keys and the values, as our values are
Any (can’t force them), but since evaluating the values is not important at all for the benchmark, we could try to define a function like
rnf but without touching the values.
- In the Map-based implementation, we need to benchmark the
lookup function on different depths of our tree (as
Map is internally a tree). But the key can be very close to the root so our benchmarks won’t be honest enough. Thus we need to test on different
Proxys with different types.
Here is the diagram of how the tree’s been constructed. You can notice that the
Char element is the direct child of the root:
size: 16 tree: +--Proxy * (S (S (S (S (S (S (S (S (S (S (S (S (S (S Z)))))))))))))) | +--Char | | | | +--Proxy * (S (S (S (S (S (S Z)))))) | | | | +--Proxy * (S (S (S (S (S (S (S (S (S (S (S Z))))))))))) | | | +--| | Proxy * (S (S (S (S (S (S (S (S (S (S (S (S (S Z))))))))))))) | | +--Proxy * (S (S (S (S (S (S (S (S (S (S Z)))))))))) | | | +--Proxy * (S (S (S (S (S (S (S (S (S Z))))))))) | | | | | +--Proxy * (S (S (S (S (S (S (S (S Z)))))))) | | +--Proxy * (S (S (S (S (S (S (S Z))))))) | | +--Proxy * (S Z) | | | +--Proxy * (S (S (S Z))) | | | | | +--Proxy * (S (S (S (S (S (S (S (S (S (S (S (S Z)))))))))))) | | +--Proxy * (S (S (S (S Z)))) | | +--Proxy * (S (S Z)) | | +--Proxy * (S (S (S (S (S Z))))) | +--Proxy * Z
Since we can’t predict how TypeRep will behave, we need to select a Proxy from our range randomly. However, because our types were huge, we introduced the following type family to solve that issue:
type family BigProxy (n :: Nat) :: * where
    BigProxy 0 = Z
    BigProxy n = S (BigProxy (n - 1))
While running this version of the benchmarks, it turned out that the rnf function was spending most of its time normalising the enormous TypeRep keys, which consisted of tall nested types like S (S (S ...)).
So, eventually, I ended up using the GHC plugin ghc-typelits-knownnat and the type of the
In order to benchmark the lookup function, we implemented a special function fromList to use in place of a bunch of inserts, so that we can measure the real time of the lookup operation itself.
data TF f where
    TF :: Typeable a => f a -> TF f

fromList :: [TF f] -> TypeRepMap f
buildBigMap function will have type
The benchmarks make 10 lookups to collect average performance statistics:
and compare the map-based implementation with the optimized array-based implementation. Here are the achieved results:
NOTE: time in the report is for 10 lookups. To get the average time of a single lookup, divide the time by 10.
- Benches GHC-8.4.3
benchmarking map-based/lookup
time                 2.198 μs   (2.195 μs .. 2.202 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 2.196 μs   (2.193 μs .. 2.199 μs)
std dev              10.46 ns   (8.436 ns .. 12.67 ns)

benchmarking dependent map/lookup
time                 819.0 ns   (810.7 ns .. 829.1 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 815.8 ns   (812.1 ns .. 822.5 ns)
std dev              16.11 ns   (9.371 ns .. 23.09 ns)

benchmarking vector-binary-search/lookup
time                 370.7 ns   (368.9 ns .. 372.2 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 368.9 ns   (368.2 ns .. 369.7 ns)
std dev              2.512 ns   (1.938 ns .. 3.474 ns)

benchmarking array-cache-optimized-binary-search/lookup
time                 183.5 ns   (183.2 ns .. 183.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 183.6 ns   (183.3 ns .. 184.4 ns)
std dev              1.535 ns   (958.3 ps .. 2.631 ns)
In this blog post, I wanted to show the difficulties, tricks, and useful information which I personally learned during the implementation of an optimized version of
TypeRepMap. Also, I needed to somehow structure the knowledge I’ve gained while working on this project. You can say that some parts of the post can be skipped or might be irrelevant but I wrote it in such a way on purpose to highlight the topics that I find very hard to find and understand quickly. So I hope you too will find this knowledge useful!
Many thanks to Vladislav Zavialov (@int-index) for mentoring this project! It was a great experience for me.
A few more challenges on the way to the release
While enhancing the interface, I ran into a weird issue, described below.
It’s nice to have the member function, and it makes sense to implement it using the already written lookup function:
The type of the lookup function is the following:
Unfortunately, this implementation of member doesn’t compile! The problem is that the compiler can’t infer that the type variable a and the argument to f have the same kind. These two functions have the following types with implicitly inferred kinds:
After this ghc proposal is implemented, it should be possible to write such type signatures directly in code. The current workaround is to use this trick with
New TypeRep performance
While benchmarking the Map-based implementation of TypeRepMap, we noticed a very perceptible performance degradation. Here is the comparison table with the results we have with our
We didn’t observe this performance degradation when we used
Fingerprint as keys, so it’s probably an issue with the new
KnownNat and Typeable
The initial version of the buildBigMap function had this type signature:
But, unfortunately, it broke on GHC-8.4.3! It turned out that Typeable and KnownNat constraints don’t play well together. This observation resulted in the following ghc ticket with quite an interesting discussion: