Phantom types are a well-known compile-time tool to ensure that illegal actions (such as attempting to write in a read-only file) are unrepresentable. In this blog post we introduce poltergeist types: phantom types whose parameters may still knock around at runtime.

# A Lightning Introduction to Phantom Types

A phantom type is a parametrised type whose parameters do not show up in the types of the (usually unique) constructor's arguments. They can be used to document whether the arguments are respecting some invariants. For instance the following definition declares a type of file handles parametrised by two booleans corresponding to whether the file has been opened with the ability to read and to write from it.

If we do not export the MkHandle constructor then the only way for a user to manufacture a file handle is to use the functions we provide. We can make sure that these functions enforce that the parameters are always set to reflect the file handle's status.

data Handle : (read, write : Bool) -> Type where MkHandle : AnyPtr -> Path r w

We can now make sure that we only attempt to get a line from a file we are allowed to read from by insisting that the read parameter of the handle we pass to fGetLine is True. Similarly we can restrict fPutStr to file handles where we statically know that we are allowed to write.

fGetLine : Handle True write -> IO String fPutStr : Handle read True -> String -> IO ()

Note that in both cases we only restrict the parameter of interest, leaving the other one as polymorphic as possible so that we may use fGetLine with both read-only and read-write files. If we need to we can enfore more constraints or no constraints whatsoever e.g. reversing a file will require the ability to read and write to it but closing a handle is completely agnostic with respect to the handle's capabilities.

The example we just saw is set in a dependently typed language but is not specific to it. In OCaml for instance, we could define a largely similar interface with the exception that the parameters of the Handle phantom type would be types themselves and not the boolean values we used. Apart from this small change, the code would be largely the same and the runtime behaviour virtually identical. In particular all of the parameters would be erased at runtime [1].

The dependently typed setting however allows us to ask whether we would sometimes want the parameters' values to be available at runtime.

# Poltergeist Types

For this case study we want to devise a high level representation of a directory tree. Up until now the only way to interact with the file system in Idris was to use some very low-level primitives: openDir would take a string and, provided that it represents a valid directory path, would return a pointer representing the directory. You could then repeatedly call dirEntry with that pointer to get your hands on directory entries one by one. Once you were done you would need to remember to call (exactly once!) closeDir to free the pointer. This leads to extremely imperative code with no separation of concerns: in any function the exploration of the directory tree is intimately interleaved with the function's own internal logic.

Our solution is to describe a directory tree as a record containing a list of file names and a list of directory names each paired with an IO computation delivering the tree corresponding to the subdirectory. We can provide a generic function building such a tree and then users can explore it without having to bother with manipulating pointers. The final decision is to pick how to represent these file names and directory names. We could use absolute paths but that would involve a lot of information duplication as all the names in the tree share a common prefix. We could alternatively use base names but that would force users to reconstruct absolute paths when exploring the tree if they actually need them which seems like an error-prone process. Instead we opt for a phantom type: a FileName is parametrised by its root but only stores the basename as a string.

data FileName : Path -> Type where MkFileName : String -> FileName root

And here is where we depart from classic phantom types: a user needing access to the full file path can call toFilePath on a FileName root which will, provided that the root is runtime-relevant, compute the appropriate answer.

toFilePath : {root : Path} -> FileName root -> String toFilePath file = show (root /> file)

We can finally give the definition of a Tree we only sketched earlier. A tree is a record anchored at a root and it comprises a list of file names as well as a list of subtrees both of which are anchored at that same root.

record Tree (root : Path) where constructor MkTree files : List (FileName root) subTrees : List (SubTree root)

The key definition is that of the notion of subtrees. It is a dependent pair storing a file name (the name of the directory) and an IO computation returning a tree anchored at the parent's root extended with the name of the subdirectory. This definition gives us the best of both worlds: the file names are only storing the base names, but the user never has to explicitly reconstruct the path when they explore a given tree. Indeed our definition guarantees that the path is automatically extended when entering a subdirectory, and that this extended path is used when calling toFilePath.

SubTree : Path -> Type SubTree root = (dir : FileName root ** IO (Tree (root /> dir)))

When writing a function consuming a Tree we can leave the parameter implicit (marking it erased or not depending on whether we actually need this information at runtime) and let Idris reconstruct the appropriate value by unification when performing a recursive call. We show below the type of two different traversals provided by the System.Directory.Tree library: print only uses base names to display the tree in the console and as such its root argument is guaranteed to be erased. The findFile function however looks for a file whose basename matches a given string and returns a full path. As such it needs to have access to the value of root and that is why it is explicitly introduced with an unrestricted modality.

print : Tree root -> IO () findFile : {root : Path} -> String -> Tree root -> IO (Maybe Path)

# Conclusion

We have seens that instead of storing potentially useless information in a data structure, or forcing users to carefully reconstruct it on the fly were they to need it, we can instead parametrise our data structure over this information in such a way that if it is needed then it will be automatically reconstructed by the type system for us. This gives us a new kind of types that are phantom in the sense that their payloads are not influenced by their parameters but whose parameters may yet manifest themselves physically at runtime. We call them poltergeists.

# Footnotes

In Idris the variables in a type that are not explicitly bound are implicitly universally quantified in a prenex position with quantity 0 i.e. they are runtime-irrelevant. In the type of fGetLine for instance, the boolean variable write is therefore universally quantified, and marked as erased. The same thing applies to the variable read in the type of fPutStr.


Last update: 2024 04
fun