A universe for syntax with binding

Generic programming has proven useful to define automatically operations acting on datatypes à la deriving in Haskell. However usual presentations do not allow the user to highlight some constructors as binding variables in the context. For every language with binders defined in such a universe, it is therefore necessary to define manually substitution and to prove its properties.
In this blog post, we strive to define a universe for syntaxes with binding for which substitution can be defined generically.

Operations such as maps, induction principles, equality tests, etc. acting on inductive datatypes can be defined generically over datatypes inhabiting a universe. Altenkirch and McBride [1] have shown how dependently-typed systems can accomodate these mechanisms as a set of libraries rather than e.g. a pre-processor.

# The universe

Our set of description constructors is basically identical to the one of McBride in his paper on ornaments [2] except for an extra constructor which allows us to introduce a new variable. It comes with a decoding function which, for this is the non dependent case, can be defined separately. We will however present each constructor together with its semantics given by the decoding function ⟦_⟧ in the pure tradition of Induction-Recursion [3]. This decoding function defines an endofunctor on List V → Set.

data Desc : Set₁ where ⟦_⟧ : ∀ {V} → Desc → (R : List V → Set) → List V → Set

Sigma types can either be used in a non-dependent fashion in order to store elements of a set inside the structure of the datatype e.g. when defining lists; or to give the user a set (indexed by A of different constructors she may use.

`σ[_]_ : (A : Set) → (da : (a : A) → Desc) → Desc ⟦ `σ[ A ] d ⟧ R Γ = Σ A (λ a → ⟦ d a ⟧ R Γ)

Recursive positions are the variable positions in the functor and are therefore interpreted by R which will be the fixpoint itself when tying the knot. They correspond to e.g. the tail of the list in the description of the _∷_ constructor.

`r_ : (d : Desc) → Desc ⟦ `r d ⟧ R Γ = R Γ × ⟦ d ⟧ R Γ

Units are used to state that a description is finished. In the indexed case, they will be tagged with an index restricting the type of the branch they belong to.

`1 : Desc ⟦ `1 ⟧ R Γ = ⊤

Binders expect the user to provide an identifier for the bound variable and then keep going through the reste of the definition with an extended context. In this post, we never use informative identifiers but we can easily imagine taking advantage of a set with decidable equality to alleviate the burden of calling a variable by providing a proof that it belongs to an environment.

`λ_ : (d : Desc) → Desc ⟦ `λ d ⟧ R Γ = Σ _ (λ v → ⟦ d ⟧ R (v ∷ Γ))

So far so good but... we have no way to actually use a variable! Sure, we can declare as many new ones as we fancy and have the joy of seeing them floating around in the context, but we might want to actually do something with them at some point.

# What is a variable?

In our universe with binders, we want to have a substitution operation generically defined on all the datatypes using binding. In other words, a variable's only purpose is to be substituted by a "proper" term. In a sense, a variable is just a placeholder for a "real" term, an excuse for an inhabitant which has not been crafted yet.
This is the reason why variables are treated separately: a whole subterm and a variable should be on an equal footing. This ascertainment induces us to enrich the meaning endofunctor with a position for variables right before building its fixpoint.

L⟦_⟧ : ∀ {V} → Desc → (R : List V → Set) → List V → Set L⟦ d ⟧ R Γ = Σ _ (λ v → v ∈ Γ) ⊎ ⟦ d ⟧ R Γ data `μ {V} (d : Desc) (Γ : List V) : Set where `⟨_⟩ : (t : L⟦ d ⟧ (`μ d) Γ) → `μ d Γ

Obviously Agda does not complain about our definition: the described functors are strictly positive by construction and accordingly have a fixpoint.

# Defining substitution

We call substitution for d from Γ to Δ, a function which provides for each variable in Γ a term of type `μ d using variables taken in Δ.

subst[_] : ∀ {V} → Desc → List V → List V → Set subst[ d ] Γ Δ = ∀ v → v ∈ Γ → `μ d Δ

As one would expect from the name of this function, given a substitution for d from Γ to Δ and a term in `μ d with free variables ranging in Γ, one can build a term in `μ d only using variables in Δ. The whole process corresponds to applying this substitution and is defined by two mutually defined functions.
A term is either a variable, in which case we can just apply the substitution, or it is an inhabitant of the functor applied to the fixpoint and we rely on the helper function to propagate calls to the substitution on recursive positions.

mutual _∶_⇇_ : ∀ {V} {Γ Δ : List V} d (t : `μ d Γ) (ρ : subst[ d ] Γ Δ) → `μ d Δ d ∶ `⟨ inj₁ (v , pr) ⟩ ⇇ ρ = ρ v pr d ∶ `⟨ inj₂ t ⟩ ⇇ ρ = `⟨ inj₂ (map⟦ d ⟧ t ρ) ⟩

The helper function is quite straightforward to define except for the case where we go under a binder. In this case, we extend the substitution with the identity on the first variable and weaken the terms corresponding to the other ones. The careful reader will notice the generalization applied to the type of the recursive positions to ensure that the definition goes through. In practice, all direct calls to map⟦_⟧ will be such that d = e.

map⟦_⟧ : ∀ {V} d {e} {Γ Δ : List V} (t : ⟦ d ⟧ (`μ e) Γ) (ρ : subst[ e ] Γ Δ) → ⟦ d ⟧ (`μ e) Δ map⟦ `σ[ A ] d ⟧ (a , t) ρ = a , map⟦ d a ⟧ t ρ map⟦ `r d ⟧ (r , t) ρ = _ ∶ r ⇇ ρ , map⟦ d ⟧ t ρ map⟦ `1 ⟧ t ρ = t map⟦ `λ d ⟧ (v , t) ρ = v , map⟦ d ⟧ t ρ' where ρ' = λ { .v z → `⟨ inj₁ (v , z) ⟩ ; w (s pr) → weaken _ _ (ρ w pr) }

Given an easy proof relying on weakening that the identity substitution exists, it is rather easy to define what's named β reduction in the lambda calculus.

βred : ∀ {V} d {Γ v} → `μ d (v ∷ Γ) → `μ d Γ → `μ d Γ βred d {Γ} {v} t u = d ∶ t ⇇ ρ where ρ = λ { .v z → u ; w (s pr) → subst-id d Γ w pr }

# Examples

When building examples, we introduce enumerations of labels corresponding to the different constructors of the language. We hope the definitions of the various Desc functors to be easier to understand this way.
Let's start with a very simple language: the untyped lambda calculus. Its grammar has two components: one can introduce a lambda abstraction (label lam) which binds a variable and contains a subterm (a recursive position `r) or an applications (label app) which contains two subterms. The set of lambda terms with variables in Γ is then given by `μ. The set of closed terms is therefore lc []

lc-grammar : Desc lc-grammar = `σ[ lc-labl ] λ { lam → `λ `r `1 ; app → `r `r `1 } lc : ∀ (Γ : List ⊤) → Set lc Γ = `μ lc-grammar Γ

Using Andjelkovic and Gundry's pattern synonyms [4], we can recover a nice syntax closer to the one we are used to. Our lambdas, variables and applications will desugar to the somewhat horrendous codes describing elements of the fixpoint of lc-grammar. We can, for instance, define the well-known looping function :Ω.

:δ :Ω : lc [] :δ = :λ :v z :$ :v z :Ω = :δ :$ :δ

We can now define a function fire : lc [] → Maybe (lc []) which explores a (closed) term looking for a redex and returns either nothing if the term is in weak-head normal form or just the term after having fired the first redex encountered. Iterating fire on a term gives rise to an execute function which produces a CoList of successive reducts.

As one would expect, running :Ω yields an infinitely long CoList of :Ωs.

lemma : execute :Ω ≣ repeat :Ω lemma = ∷ ♯ lemma

# What's next?

I cooked up the indexed version with which it is possible to get sexy well-typed terms for the simply-typed lambda calculus such as `K. But that is another story.

`K : ∀ {Γ σ τ} → Γ ⊢ σ `→ τ `→ σ `K = :λ :λ :v (s z)

# Footnotes

[1]

Generic Programming within Dependently Typed Programming

[2]

Ornamental algebras, algebraic ornaments (pdf)

[3]

Cf. e.g. Peter Dybjer's work.

[4]

Feature request on agda's bug tracker.

Last update: 2025 04