Generic programming has proven useful to define automatically operations
acting on datatypes à la deriving in Haskell. However usual presentations
do not allow the user to highlight some constructors as binding variables in the
context. For every language with binders defined in such a universe, it is therefore
necessary to define manually substitution and to prove its properties.
In this blog post, we strive to define a universe for syntaxes with binding
for which substitution can be defined generically.
Operations such as maps, induction principles, equality tests, etc. acting on inductive datatypes can be defined generically over datatypes inhabiting a universe. Altenkirch and McBride [1] have shown how dependently-typed systems can accomodate these mechanisms as a set of libraries rather than e.g. a pre-processor.
Our set of description constructors is basically identical to the one of McBride in his paper on ornaments [2] except for an extra constructor which allows us to introduce a new variable. It comes with a decoding function which, for this is the non dependent case, can be defined separately. We will however present each constructor together with its semantics given by the decoding function ⟦_⟧ in the pure tradition of Induction-Recursion [3]. This decoding function defines an endofunctor on List V → Set.
data Desc : Set₁ where ⟦_⟧ : ∀ {V} → Desc → (R : List V → Set) → List V → Set
Sigma types can either be used in a non-dependent fashion in order to store elements of a set inside the structure of the datatype e.g. when defining lists; or to give the user a set (indexed by A of different constructors she may use.
`σ[_]_ : (A : Set) → (da : (a : A) → Desc) → Desc ⟦ `σ[ A ] d ⟧ R Γ = Σ A (λ a → ⟦ d a ⟧ R Γ)
Recursive positions are the variable positions in the functor and are therefore interpreted by R which will be the fixpoint itself when tying the knot. They correspond to e.g. the tail of the list in the description of the _∷_ constructor.
`r_ : (d : Desc) → Desc ⟦ `r d ⟧ R Γ = R Γ × ⟦ d ⟧ R Γ
Units are used to state that a description is finished. In the indexed case, they will be tagged with an index restricting the type of the branch they belong to.
`1 : Desc ⟦ `1 ⟧ R Γ = ⊤
Binders expect the user to provide an identifier for the bound variable and then keep going through the reste of the definition with an extended context. In this post, we never use informative identifiers but we can easily imagine taking advantage of a set with decidable equality to alleviate the burden of calling a variable by providing a proof that it belongs to an environment.
`λ_ : (d : Desc) → Desc ⟦ `λ d ⟧ R Γ = Σ _ (λ v → ⟦ d ⟧ R (v ∷ Γ))
So far so good but... we have no way to actually use a variable! Sure, we can declare as many new ones as we fancy and have the joy of seeing them floating around in the context, but we might want to actually do something with them at some point.
In our universe with binders, we want to have a substitution operation
generically defined on all the datatypes using binding. In other words, a
variable's only purpose is to be substituted by a "proper" term. In a sense,
a variable is just a placeholder for a "real" term, an excuse for an inhabitant
which has not been crafted yet.
This is the reason why variables are treated separately: a whole subterm and
a variable should be on an equal footing. This ascertainment induces us to enrich
the meaning endofunctor with a position for variables right before building its
fixpoint.
L⟦_⟧ : ∀ {V} → Desc → (R : List V → Set) → List V → Set L⟦ d ⟧ R Γ = Σ _ (λ v → v ∈ Γ) ⊎ ⟦ d ⟧ R Γ data `μ {V} (d : Desc) (Γ : List V) : Set where `⟨_⟩ : (t : L⟦ d ⟧ (`μ d) Γ) → `μ d Γ
Obviously Agda does not complain about our definition: the described functors are strictly positive by construction and accordingly have a fixpoint.
We call substitution for d from Γ to Δ, a function which provides for each variable in Γ a term of type `μ d using variables taken in Δ.
subst[_] : ∀ {V} → Desc → List V → List V → Set subst[ d ] Γ Δ = ∀ v → v ∈ Γ → `μ d Δ
As one would expect from the name of this function, given a substitution for
d from Γ to Δ and a term in `μ d with free
variables ranging in Γ, one can build a term in `μ d only using
variables in Δ. The whole process corresponds to applying this substitution
and is defined by two mutually defined functions.
A term is either a variable, in which case we can just apply the substitution,
or it is an inhabitant of the functor applied to the fixpoint and we rely on the
helper function to propagate calls to the substitution on recursive positions.
mutual _∶_⇇_ : ∀ {V} {Γ Δ : List V} d (t : `μ d Γ) (ρ : subst[ d ] Γ Δ) → `μ d Δ d ∶ `⟨ inj₁ (v , pr) ⟩ ⇇ ρ = ρ v pr d ∶ `⟨ inj₂ t ⟩ ⇇ ρ = `⟨ inj₂ (map⟦ d ⟧ t ρ) ⟩
The helper function is quite straightforward to define except for the case where we go under a binder. In this case, we extend the substitution with the identity on the first variable and weaken the terms corresponding to the other ones. The careful reader will notice the generalization applied to the type of the recursive positions to ensure that the definition goes through. In practice, all direct calls to map⟦_⟧ will be such that d = e.
map⟦_⟧ : ∀ {V} d {e} {Γ Δ : List V} (t : ⟦ d ⟧ (`μ e) Γ) (ρ : subst[ e ] Γ Δ) → ⟦ d ⟧ (`μ e) Δ map⟦ `σ[ A ] d ⟧ (a , t) ρ = a , map⟦ d a ⟧ t ρ map⟦ `r d ⟧ (r , t) ρ = _ ∶ r ⇇ ρ , map⟦ d ⟧ t ρ map⟦ `1 ⟧ t ρ = t map⟦ `λ d ⟧ (v , t) ρ = v , map⟦ d ⟧ t ρ' where ρ' = λ { .v z → `⟨ inj₁ (v , z) ⟩ ; w (s pr) → weaken _ _ (ρ w pr) }
Given an easy proof relying on weakening that the identity substitution exists, it is rather easy to define what's named β reduction in the lambda calculus.
βred : ∀ {V} d {Γ v} → `μ d (v ∷ Γ) → `μ d Γ → `μ d Γ βred d {Γ} {v} t u = d ∶ t ⇇ ρ where ρ = λ { .v z → u ; w (s pr) → subst-id d Γ w pr }
When building examples, we introduce enumerations of labels corresponding to
the different constructors of the language. We hope the definitions of the various
Desc functors to be easier to understand this way.
Let's start with a very simple language: the untyped lambda calculus. Its
grammar has two components: one can introduce a lambda abstraction (label
lam) which binds a variable and contains a subterm (a recursive position
`r) or an applications (label app) which contains two subterms.
The set of lambda terms with variables in Γ is then given by `μ.
The set of closed terms is therefore lc []
lc-grammar : Desc lc-grammar = `σ[ lc-labl ] λ { lam → `λ `r `1 ; app → `r `r `1 } lc : ∀ (Γ : List ⊤) → Set lc Γ = `μ lc-grammar Γ
Using Andjelkovic and Gundry's pattern synonyms [4], we can recover a nice syntax closer to the one we are used to. Our lambdas, variables and applications will desugar to the somewhat horrendous codes describing elements of the fixpoint of lc-grammar. We can, for instance, define the well-known looping function :Ω.
:δ :Ω : lc [] :δ = :λ :v z :$ :v z :Ω = :δ :$ :δ
We can now define a function fire : lc [] → Maybe (lc []) which explores a (closed) term looking for a redex and returns either nothing if the term is in weak-head normal form or just the term after having fired the first redex encountered. Iterating fire on a term gives rise to an execute function which produces a CoList of successive reducts.
As one would expect, running :Ω yields an infinitely long CoList of :Ωs.
lemma : execute :Ω ≣ repeat :Ω lemma = ∷ ♯ lemma
I cooked up the indexed version with which it is possible to get sexy well-typed terms for the simply-typed lambda calculus such as `K. But that is another story.
`K : ∀ {Γ σ τ} → Γ ⊢ σ `→ τ `→ σ `K = :λ :λ :v (s z)