One of the key features of bound is that it introduces redundancies in the representation of syntaxes in order to avoid traversing entire subterms just to apply a weakening. However, equality testing currently quotients out these redundancies. In this post, I present a lazy equality test for the untyped lambda calculus with redundancies which does not perform any of this extra work.

This works stems out of a bold statement I made and was called out on by Thomas Braibant during an emacs-driven seminar given in front of the smart gallium people. The lazy equality test I claimed could be implemented turned out to be quite a bit more work than I expected & I was not able to deliver it when asked to. Better late than never.

Our running example will be the untyped lambda calculus with its well-known 3 constructors: var, lam and app. We will start by motivating the use of the notion of Scope when dealing with syntaxes with binding both for its added safety, genericity and efficiency. We will then move on to the actual meat of this post: the lazy equality test.

# Binding? de Bruijn!

De Bruijn indices are the canonical solution to representing binding in a language whilst not having to bother with α-conversion. Here is the inductive definition of the untyped lambda calculus:

data Lam : Set where var : (n : Nat) → Lam app : (t u : Lam) → Lam lam : (b : Lam) → Lam

Unfortunately this ease of use comes at a price: de Bruijn indices are an inherently untyped approach to context representation which means that they don't push any sanity check into the type system. Being the silly programmer I am, I always end up writing nonsensical expressions. The canonical example may very well be the definition of substitution [1] where one goes under a binder without even realizing that the former substitution must be weakened and extended:

subst : Lam → (Nat → Lam) → Lam subst (var x) ρ = ρ x subst (app t u) ρ = app (subst t ρ) (subst u ρ) subst (lam b) ρ = lam (subst b ρ) -- this typechecks... -- ouch!

# Type level de Bruijn indices

Now, de Bruijn indices are a really great idea when it comes to handling α-equivalence, capture avoidance, looking variables up in environments, etc. We would like to retain the general idea behind them whilst being reminded by the typechecker that some magic needs to happen after a new variable has been bound.

Typed alternatives to de Bruijn indices do exist; we can e.g. think of terms well-scoped by constructions using Fin or the non-regular datatypes reflecting the indices at the type level we are going to use:

data Lam (A : Set) : Set where var : (a : A) → Lam A app : (t u : Lam A) → Lam A lam : (b : Lam (Maybe A)) → Lam A

Lam quite clearly is a functor hence the existence of a higher-order function map which loosely corresponds to the notion of renaming / weakening [2]. The var and app cases are unchanged but the type of the lamdba's body now rejects the erroneous recursive call described earlier. We are forced to both extend the substitution with the new bound variable which is unchanged, and weaken the elements in the rest of the substitution.

subst : (t : Lam A) (ρ : A → Lam B) → Lam B subst (var a) ρ = ρ a subst (app t u) ρ = app (subst t ρ) (subst u ρ) subst (lam b) ρ = lam (subst b ρ') where ρ' none = var none ρ' (some a) = map some (ρ a) -- this traverses -- the whole term

Unfortunately, this is rather inefficient in that we have to walk through all the terms in the substitution every time we pass under a binder.

# Lazy weakenings

In order to avoid traversing the whole term whenever one just wants to weaken it, it is quite natural to add some redundancies in the representation. This can be done by letting the variables in a subterm be either bound by the nearest binder or whole subterms living in the non-extended context (aka weakened subterms). This naturally translates to the following notions of Scope:

Scope : (F : Set → Set) (A : Set) → Set Scope F A = F (Maybe (F A))

In this case, the definition is simple enough that Agda has no problem seeing that the Scope transformer preserves positivity and that Lam is therefore a legal inductive definition. However in more intricate examples, e.g. when describing a universe of syntaxes with Scopes, only a tool like MiniAgda which allows for positivity annotations will accept the definition as valid.

data Lam (A : Set) : Set where var : (a : A) → Lam A app : (t u : Lam A) → Lam A lam : (b : Scope Lam A) → Lam A

# Equality testing

All is well when it comes to substitution but what about equality testing? We have added redundancies which therefore need to be quotiented out when comparing two terms. The simplest way to implement such a comparison function is to define a function flatten which will push the weakenings into the leaves of the terms thus producing terms in de Bruijn normal form.

flatten : (t : Scope Lam A) → Lam (Maybe A)

# Lazy equality testing

Now, we know that we can test terms for equality. But it is not quite satisfactory yet: we introduced redundancies precisely to avoid having to perform all of these costly flattenings. But, with a little bit of work, it is actually possible to perform lazy equality testing.

# Context inclusions

One of the tools we are going to need is a way to describe how different contexts are related. The terms we are going to compare will both live in a common context but unfolding them may very-well bring us to widely different places based on the number of weakenings respectively encountered on the way down to the current subterms.

We introduce this notion of inclusion [3] whose 2nd constructor (↑_) tags the moment one goes under a lambda abstraction whilst the 3rd one (◂_) tags the moment one encounters a weakening.

data _⊆_ : (A B : Set) → Set₁ where ■ : A ⊆ A ↑_ : A ⊆ B → Maybe (Lam A) ⊆ Maybe (Lam B) ◂_ : Maybe (Lam A) ⊆ B → A ⊆ B

# Term comparison

Now, we are ready to describe the comparison of two terms living in their respective local contexts A and B both sharing a common super-context T.

data EQ : {T A : Set} (incA : A ⊆ T) (t : Lam A) {B : Set} (incB : B ⊆ T) (u : Lam B) → Set₁ where

Let's start off with the easy cases: two applications are equal whenever their functions (resp. arguments) are equal ; and two lambda abstractions are equal whenever their bodies are equal in the extended contexts.

EQapp : EQ incA t₁ incB t₂ → EQ incA u₁ incB u₂ → EQ incA (app t₁ u₁) incB (app t₂ u₂) EQlam : EQ (↑ incA) b₁ (↑ incB) b₂ → EQ incA (lam b₁) incB (lam b₂)

When two terms claim to be bound variables, we can compare the context inclusions to check that they refer to the same lambda binder. We will describe what this check amounts to in the next section.

EQvar : EQVar incA zero incB zero → EQ incA (var none) incB (var none)

When one of the terms is a weakening of a subterm, we record this information in the proof of context inclusion and keep on unfolding further.

EQvarFre₁ : EQ (◂ incA) v incB u → EQ incA (var (some v)) incB u EQvarFre₂ : EQ incA t (◂ incB) v → EQ incA t incB (var (some v))

The careful reader will have noticed that this description is mostly syntax-directed except for the two last rules which happen to commute. Turning this specification into a recursive algorithm will therefore just be a matter of picking an order in which to check that either EQvarFre₁ or EQvarFre₂ applies. Which is good news when you are in the business of actually deciding equality.

# Variable comparison

Variable comparison is always done up to weakenings. We count the numbers of weakenings encountered thus far respectively in kA and kB and manage (in/de)crements based on the tokens we stumble upon.

data EQVar : {TA A : Set} (incA : A ⊆ TA) (kA : Nat) {TB B : Set} (incB : B ⊆ TB) (kB : Nat) → Set₁ where

When both context inclusions are synchronized on a lambda binder (tagged by ↑_ in the derivations) and no weakening whatsoever has been applied (both kA and kB are zero) then we can conclude that the variables are indeed equal.

EQVarZRO : EQVar (↑ incA) zero (↑ incB) zero

When, on the other hand, they both claim to be referring to a binder which is at least one level up (both kA and kB are non-zero), one can forget about the binder at hand and go explore the context inclusions upward.

EQVarSUC : EQVar incA kA incB kB → EQVar (↑ incA) (suc kA) (↑ incB) (suc kB)

Finally, when we encounter a weakening (marked with a ◂_), we record its presence by incrementing the appropriate counter and keep comparing the rest of the inclusion proofs.

EQVarWK₁ : EQVar incA (suc kA) incB kB → EQVar (◂ incA) kA incB kB EQVarWK₂ : EQVar incA kA incB (suc kB) → EQVar incA kA (◂ incB) kB

# Testing equality

As expected, this specification also is mostly syntax-directed except for commutative steps. It is thus easy to write an algorithm checking whether such an equality derivation can be built. And it's quite a lot simpler to do now that the rules are guiding the implementation.

# Footnotes

Weakening is also a good candidate for this sort of nonsense given that one easily forgets to protect bound variables and end up treating them as if they were free.

It is respectively renaming / weakening provided that the function passed is well-behaved (resp. bijective / injective).

It makes sense as a notion of inclusion because a proof that A ⊆ B gives rise to a morphism from A to B. Cf. the accompanying Agda doc.


Last update: 2024 04
fun