Lecture 3.1: Program Semantics — Operational Semantics

In this part, we will see how to use Lean to specify the syntax and semantics of programming languages and to reason about the semantics

Motivation

A formal semantics helps specify and reason about the programming language itself, and about individual programs

It can form the basis of verified compilers, interpreters, verifiers, static analyses, type systems, etc.

Without formal proofs, semantics, compilers, interpreters, verifiers, static analyses, type systems, etc., are almost always wrong

In this area, proof assistants are widely used—e.g. every year, about 10-20% of POPL papers are partially or totally formalized:

Little machinery (background libraries, tactics) is needed to get started, beyond inductive types and predicates and recursive functions
The proofs tend to have lots of cases, which is a good match for computers
Proof assistants keep track of what needs to be changed when as extend the programming language with more features

The WHILE language

WHILE is a minimalistic imperative language with the following grammar:

p  ::=  skip                 -- no-op
     |  x := a               -- assignment
     |  p ; p                -- sequential composition
     |  if b then p else p   -- conditional statement
     |  while b do p         -- while loop

where p stands for a program (also called statement or command), x for a variable, a for an arithmetic expression, and b for a Boolean expression

Backronym: Weak Hypothetical Imperative Language Example

Deep vs. shallow embeddings

In our informal presentation, we deliberately leave the syntax of arithmetic and Boolean expressions unspecified

In Lean, we have the choice:

We could use a type such as aexp from lecture 1.2 and similarly for Boolean expressions
We could decide that an arithmetic expression is simply a function from states to natural numbers (e.g. σ → ℕ) and a Boolean expression is a predicate (e.g. σ → Prop or σ → bool)

This corresponds to the difference between deep and shallow embeddings:

A deep embedding of some syntax (expression, formula, program, ...) consists of an abstract syntax tree specified in the proof assistant (e.g. aexp) with a semantics
In contrast, a shallow embedding simply reuses the corresponding mechanisms from the logic (e.g. σ → ℕ, σ → Prop)

A deep embedding allows us to reason explictly about the syntax (and the semantics)

A shallow embedding is more leightweight, because we can use it directly, without having to define a semantics

We will use a deep embedding of programs (which we find interesting), except assignments, and shallow embeddings of assignments and Boolean expressions (which we find boring)

Operational semantics

An operational semantics correponds to an idealized, abstract interpreter (specified in a Prolog-like language)

Two main variants:

big-step semantics
small-step semantics

Big-step semantics

In a big-step semantics (also called natural semantics), judgments have the form ⟨p, s⟩ ⟹ s':

Starting in a state s, executing p terminates in the state s'

A state s is a function from variable names to values, e.g. string → ℕ

Example:

⟨x := x + y; y := 0,  [x ↦ 3, y ↦ 5]⟩  ⟹  [x ↦ 8, y ↦ 0]

Derivation rules:

——————————————— Skip
⟨skip, s⟩ ⟹ s


——————————————————————————— Asn
⟨x := e, s⟩ ⟹ s[x ↦ s(e)]


⟨p, s⟩ ⟹ s'   ⟨p', s'⟩ ⟹ s''
——————————————————————————————— Seq
⟨p ; p', s⟩ ⟹ s''


⟨p, s⟩ ⟹ s'
——————————————————————————————— If-True   if s(b) = true
⟨if b then p else p', s⟩ ⟹ s'


⟨p', s⟩ ⟹ s'
——————————————————————————————— If-False   if s(b) = false
⟨if b then p else p', s⟩ ⟹ s'


⟨p, s⟩ ⟹ s'   ⟨while b do p, s'⟩ ⟹ s''
—————————————————————————————————————————— While-True   if s(b) = true
⟨while b do p, s⟩ ⟹ s''


————————————————————————— While-False   if s(b) = false
⟨while b do p, s⟩ ⟹ s

Above, s(e) denotes the value of expression e in state s

In Lean, the judgment corresponds to an inductive predicate, and the derivation rules correspond to the predicate's introduction rules

Using an inductive predicate as opposed to a recursive function allows us to cope with nontermination (e.g. a diverging while) and nondeterminism (e.g. in the exercise)

Advantages and disadvantages of big steps

Equipped with a big-step semantics, we can

prove properties of the programming language, such as equivalence proofs between programs and determinism
reason about concrete programs, proving theorems relating final states s' with initial states s

On the other hand, a big-step semantics

does not let us reason about intermediate states
does not let us express nontermination or interleaving (for multithreading)

Small-step semantics

Small-step semantics (also called structural operational semantics, SOS) solve the above issues

A judgment has the form ⟨p, s⟩ ⇒ ⟨p', s'⟩:

Starting in a state s, executing one step of p leaves us in the state s', with the program p' remaining to be executed

An execution is a finite or infinite chain ⟨p₀, s₀⟩ ⇒ ⟨p₁, s₁⟩ ⇒ ...

A pair ⟨p, s⟩ is called a configuration; it is final if no transition of the form ⟨p, s⟩ ⇒ _ is possible

Example:

   ⟨[x ↦ 3, y ↦ 5],  x := x + y; y := 0⟩
⇒  ⟨[x ↦ 8, y ↦ 5],  y := 0⟩
⇒  ⟨[x ↦ 8, y ↦ 0],  skip⟩

Derivation rules:

————————————————————————————————— Asn
⟨x := e, s⟩ ⇒ ⟨skip, s[x ↦ s(e)]⟩


⟨p, s⟩ ⇒ ⟨p', s'⟩
————————————————————————— Seq-Step
⟨p ; q, s⟩ ⇒ ⟨p' ; q, s'⟩


—————————————————————— Seq-Skip
⟨skip ; p, s⟩ ⇒ ⟨p, s⟩


————————————————————————————————— If-True   if s(b) = true
⟨if b then p else p', s⟩ ⇒ ⟨p, s⟩


—————————————————————————————————— If-False   if s(b) = false
⟨if b then p else p', s⟩ ⇒ ⟨p', s⟩


————————————————————————————————————————————————————————————————— While
⟨while b do p, s⟩ ⟹ ⟨if b then (p ; while b do p) else skip, s⟩

Notice that there is no rule for skip (why?)

Advantages and disadvantages of small steps

Equipped with a small-step semantics, we can define a big-step semantics:

⟨p, s⟩ ⟹ s' if and only if ⟨p, s⟩ ⇒* ⟨skip, s'⟩

where r* denotes the reflexive transitive closure of a relation r

Alternatively, if we have already defined a big-step semantics, we can prove the above equivalence theorem to validate our definitions

We can prove that a configuration ⟨p, s⟩ is final if and only if p = skip; this ensures that we have not forgotten a derivation rule

The main disadvantage of small-step semantics is that we now have two relations, ⇒ and ⇒*, and the derivation rules and proofs tend to be more complicated

Advanced induction with tactics

Inductive predicates have often parameters which change in the induction

Pattern matching and cases handles this gracfully, but induction needs some care

Assume we have

inductive even : ℕ → Prop
| zero    : even 0
| add_two : ∀n, even n → even (n + 2)

If the goal has the form even n → P n, applying induction produces the following induction cases:

case even.zero: P 0
case even.add_two: ∀n, P n → P (n + 2)

The Problem is that when the argument to even is not a variable, induction loses the information what the argument was

Thus, applying induction on even (2 * n + 1) → false produces the goals

false
false → false

To solve this, we need to replace 2 * n + 1 by a variable:

∀x, even x → x = 2 * n + 1 → false

Now we get the goals

2 * n + 1 = 0 → false
∀x, (2 * n + 1 = x → false) → 2 * n + 1 = x + 2 → false

The second goal is still not provable; we need to generalize n, to be able to instantiate it in the induction hypothesis

So in addition to add the equality, we must explicitly quantify over n:

∀x, even x → ∀n, x = 2 * n + 1 → false

Now we get the goals:

∀n, 2 * n + 1 = 0 → false
∀x, (∀n, 2 * n + 1 = x → false) → ∀n, 2 * n + 1 = x + 2 → false

These details are often glossed over in informal proofs, but proof assistants requires us to be precise

Demo

31_lecture.lean

Newly introduced syntax

Declaration

Syntax	Description
`import`	loads other Lean files

Constants

Name	Description
`rfl`	Reflexivity `a = a`. Can also be used in a pattern to perform substitution against `x = t` or `t = x`

Tactics

Name	Description
`generalize`	replaces a term in the goal by a new variable and an equality assumption
`specialize`	replaces a hypothesis `h` by an instance `h x y z`

References

Concrete Semantics: With Isabelle/HOL by Tobias Nipkow and Gerwin Klein, Chapter 7

Acknowledgment

These notes are largely inspired by Concrete Semantics: With Isabelle/HOL by Tobias Nipkow and Gerwin Klein and their slides