October 2017

Semilattices and their tensor products

It is becoming clear that semilattices are a powerful unifying abstraction in concurrent and distributed programming. See, for example, CvRDTs, Lindsey Kuper’s LVars, the BOOM group’s Bloom^L, and Sussman’s propagators (recently popularized by Edward Kmett). In this article, I will:

briefly explain what semilattices are;
describe an underappreciated mathematical construction, the tensor product of semilattices, which generalizes both sets of tuples (e.g. tables in a database) and finite maps (also called dictionaries, hashes, or associative arrays);
observe that semilattices under tensor product form a closed symmetric monoidal category, suggesting a connection to linear logic;
complain about how hard it is to compute with tensor products — which may explain their under-appreciation.

This article is written for a mathematical audience. (In particular, the notions of free and quotient types are used without explanation.) It’s more a “brain dump” than an essay; it has no thesis, just a set of observations.

1 Semilattices

A semilattice¹ is a type A with

a binary join operator ∨ : A × A → A that is idempotent, commutative, and associative; and
an identity or bottom element ⊥ : A.

Altogether, a semilattice must satisfy the following laws:

      a ∨ ⊥ = a                     identity
      a ∨ a = a                     idempotence
      a ∨ b = b ∨ a                 commutativity
a ∨ (b ∨ c) = (a ∨ b) ∨ c           associativity

The classic example of a semilattice is sets under union ∪, with the empty set ∅ as bottom:

      a ∪ ∅ = a                     identity
      a ∪ a = a                     idempotence
      a ∪ b = b ∪ a                 commutativity
a ∪ (b ∪ c) = (a ∪ b) ∪ c           associativity

Given a type X, one particularly useful collection of sets are the finite sets of values of type X (viewing X as a set, these are the finite subsets of X). These form the free semilattice on X, which I will write as Sets[X].

Here are a few more examples of semilattices:

Type	∨	⊥
Sets	union	empty set
Booleans	logical or	`false`
Booleans	logical and	`true`
Natural numbers	maximum	0
Positive integers	least common multiple	1

Semilattices are particularly useful in distributed programming. Determining order between events in a distributed system is hard; commutativity and associativity mean order does not matter. Likewise, idempotence means that multiplicity does not matter, and reliably avoiding or detecting duplication can also be difficult or expensive. A bottom element is convenient, serving as an “initial value” or “null message”; a natural ⊥ usually exists, and if not, it’s easy to add one.

For example, consider convergent replicated datatypes, a technique for building eventually consistent distributed systems — systems whose nodes can operate entirely independently, and may temporarily diverge, but eventually arrive at the same state if allowed to communicate. CvRDTs represent the system’s state using a type equipped with an idempotent, commutative, associative merge function, used to combine distinct nodes’ states. This merge function is just a semilattice join by another name!

2 Tensor product of semilattices

The tensor product A ⊗ B of two semilattices A, B is an interesting construction little-known outside of pure mathematics. A ⊗ B is defined as the free semilattice on A × B, modulo the equations:²

    (a, ⊥) = ⊥ = (⊥, b)                 bottoming out
(a₁ ∨ a₂, b) = (a₁, b) ∨ (a₂, b)        left distribution
(a, b₁ ∨ b₂) = (a, b₁) ∨ (a, b₂)        right distribution

These equations seem somewhat arbitrary. But! if we we pretend our variables stand for sets, replacing ∨ by ∪ and ⊥ by ∅ — and moreover replace pairing (a,b) with cross product a × b — then out pop some simple identities relating unions and cross products:

      a × ∅ = ∅ = ∅ × b                 absorption
(a₁ ∪ a₂) × b = (a₁ × b) ∪ (a₂ × b)     left distribution
a × (b₁ ∪ b₂) = (a × b₁) ∪ (a × b₂)     right distribution

This suggests there is an isomorphism of the form:

Sets[X] ⊗ Sets[Y] ≅ Sets[X × Y]

The tensor-product of a pair of sets — is sets of pairs! I sketch a proof of this in the footnotes.

Dictionaries as tensor products

The definition of tensor product is not specific to finite sets or cross products. We can take the tensor product of any two semilattices. So: what else can we construct using tensor product?

Suppose only one of the semilattices we tensor is a finite set semilattice: given a type X and a semilattice A, consider Sets[X] ⊗ A. This turns out to be equivalent to finite maps from X to A, or equivalently, dictionaries with keys of type X and values of type A. We observe this as follows:

A value of Sets[X] ⊗ A can always be expressed in the form

(s₁, a₁) ∨ (s₂, a₂) ∨ ... ∨ (sₙ, aₙ)

for some sequence of sets sᵢ : Sets[X] and non-bottom values aᵢ : A. (The “bottoming-out” rule lets us drop any pairs (sᵢ, ⊥) = ⊥.) The keys of the corresponding dictionary are the union ⋃ᵢ sᵢ. The value of a key x : X is the join of all aᵢ paired with a set sᵢ containing x:

keys = ⋃ᵢ sᵢ
value x = ⋁ᵢ {aᵢ | x ∈ sᵢ}

I’ll write this type of finite maps as Maps[X,A].

Tuple-sets as dictionaries

So far, we’ve learned that:

Sets[X × Y]	≅	Sets[X] ⊗ Sets[Y]
Sets[X] ⊗ A	≅	Maps[X, A]

Together, these suggest the principle behind database indexing! Given any table (set of tuples), pick some subset of its fields to be the key X, and let Y be the remainder. Then, applying each isomorphism in turn, our table has type:

Sets[X × Y] ≅ Sets[X] ⊗ Sets[Y] ≅ Maps[X, Sets[Y]]

That is, a set of X × Y pairs can be viewed as a dictionary mapping X keys to sets of Y values — a table indexed on the column X.

3 The category of semilattices

For the category theorists in the audience: the category SemiLat of semilattices and their homomorphisms (∨- and ⊥-preserving functions) forms a closed symmetric monoidal category where

⊗ is the tensor product.
The internal Hom is ⊸, where A ⊸ B is the semilattice of all homomorphisms from A to B, ordered pointwise.

This implies that linear λ-calculus (or a linearly typed process calculus) is a sound typed language for defining semilattice homomorphisms. I conjecture moreover that preserving ∨ arises from affine-ness (no duplication) and preserving ⊥ arises from relevance (no dropped variables):

4 Computing with tensor products

How should we represent values of type A ⊗ B so as to compute with them efficiently? We’ve seen some strategies that work for specific choices of A and B:

Sets[A] ⊗ Sets[B] can be seen as a set of tuples, Sets[A × B].
Sets[A] ⊗ B can be seen as a dictionary, Maps[A,B].

Sets and dictionaries are commonly represented using trees or hashtables.

However, it’s not obvious that there’s a good general strategy: a way to represent A ⊗ B that doesn’t depend on the particulars of A and B. In fact, only in 1978 was it shown it was even possible to decide equality of expressions of type A ⊗ B, and there was still work being done generalizing this result as recently as 2005.

Databases and tensor products

Let’s consider the restricted case Sets[A] ⊗ Sets[B]. Even then, the strategies we explored above conflict! Should we represent Sets[A] ⊗ Sets[B] as:

Sets[A × B], a set of tuples, or
Maps[A, Sets[B]], a dictionary keyed on A, or
Maps[B, Sets[A]], a dictionary keyed on B?

Lucky for us, this particular problem has been tackled before. In relational databases, this becomes the question: how shall we index our tables? (This example has only two columns, A and B, but the analogy generalizes.) Databases do not solve this problem automagically — you usually have to tell your database what indexes you want. But they do permit separating two concerns:

How your data will be represented, i.e. what to index on, and
The program to run on your data, i.e. your queries.

Databases accomplish this via multiple representations: one table may have many indexes. These representations are carefully kept consistent with one another. The query planner chooses which indexes to use to make a particular program (query) run efficiently.

Footnotes

What I call semilattices are more correctly termed join-semilattices with a least element (or almost equivalently, meet-semilattices with a greatest element), but that’s a mouthful. ↩
For the values of type A ⊗ B that I write (a, b), most sources write a ⊗ b. The latter notation distinguishes values of A ⊗ B from values of A × B; but it confuses the operation on semilattices with that on elements. It’s like writing 2 × 3 for (2, 3). ↩
Here is a sketch of a proof that Sets[A] ⊗ Sets[B] ≅ Sets[A × B] in the category SemiLat; that is, there exists a bijection between them preserving ∨ and ⊥. (This bijection is also natural in A and B, but I will not show this.)

We construct f : Sets[A] ⊗ Sets[B] → Sets[A × B] as follows:
```
f ⊥ = ∅
f (a,b) = a × b
f (x ∨ y) = f x ∪ f y
```
To justify this definition we show it respects the quotient structure of ⊗:
```
f (a, ∅) = a × ∅ = ∅ = f ⊥
f (a₁ ∪ a₂, b) = (a₁ ∪ a₂) × b
               = (a₁ × b) ∪ (a₂ × b)
               = f (a₁,b) ∪ f (a₂,b)
               = f ((a₁,b) ∨ (a₂,b))
```
The other cases are symmetric.

Now we construct f's inverse, g : Sets[A × B] → Sets[A] ⊗ Sets[B].
```
g {(a₁,b₁), (a₂,b₂), ..., (aₙ,bₙ)}
= ({a₁},{b₁}) ∨ ({a₂},{b₂}) ∨ ... ∨ ({aₙ},{bₙ})
```
As a special case, g ∅ = ⊥. And by construction, g (a ∪ b) = g a ∨ g b.

Now we show that f ∘ g = id:
```
f (g {(a₁,b₁), ..., (aₙ,bₙ)})
= f (({a₁},{b₁}) ∨ ... ∨ ({aₙ},{bₙ}))
= f ({a₁},{b₁}) ∪ ... ∪ f ({aₙ},{bₙ})
= ({a₁} × {b₁}) ∪ ... ∪ ({aₙ} × {bₙ})
= {(a₁,b₁), ..., (aₙ, bₙ)}
```
I omit the proof that g ∘ f = id, because I’m lazy and it involves induction.

Thus, f and g are inverses which preserve ⊥ and ∨, forming a semilattice isomorphism. QED. ↩