3a – Two Language Kernels (Expressions and Memory Models)

while and sa-decl

Mark Armstrong, PhD Candidate, McMaster University

Fall, 2019

1 Preamble

1.1 Notable references

  • Concepts, Techniques, and Models of Computer Programming, Van Roy & Haridi
    • Chapter 2 – Declarative Computation Model
      • Section 2.2 – The single-assignment store
  • Principles of Programming Languages, Dowek
    • Chapter 1 – Imperative Core
      • Section 1.3.1 – The Concept of a State
      • Section 1.3.2 – Decomposition of the State
      • Section 1.3.3 – A Visual Representation of a State

1.2 Update history

Sept. 25
  • Original version (part a) posted

1.3 TODO Table of contents

2 Language kernels

In this approach, a practical language is translated into a kernel language that consists of a small number of programmer-significant elements. The rich set of abstractions and syntax is encoded into the small kernel language. This gives both programmer and student a clear insight into what the language does.

— Concepts, Techniques, and Models of Computer Programming

2.1 A meta-note about exams

Some of these kernel languages will be proper subsets of the languages we use in the course (especially Oz), but some will be theoretical languages.

In midterms and exams, you will generally not be expected to have memorised the syntax of such theoretical languages.

In fact (unless it is announced otherwise before the tests), in questions regarding these languages we will provide you (the relevant portions of) the CFG defining the language syntax.

2.2 Justifying the kernel

For a kernel language to be “useful”, we must convince ourselves that it includes

  • enough abstractions to be practical, but
  • not so many that it becomes difficult to reason about.

Almost any general purpose language serves as an example that violates the second condition.

An example of a language which violates the first is the “Goto” language. In this language,

  • the only type is ℕ (the natural numbers),
  • the only control structure is a conditional jump, and
  • the only way to modify variables is to increment or decrement them by 1.
⟨stmt⟩ ∷= [label] ⟨command⟩
       | ⟨stmt⟩ ; ⟨stmt⟩

⟨command⟩ ∷= skip
          | var ++
          | var --
          | if-zero var goto label

This language is useful for studying computability, but not so much for studying language design.

2.3 Extending the kernel

There are two ways we will extend our kernel languages:

  • syntactic sugar, and
  • linguistic abstractions.

Syntactic sugar

  • is a shortcut notation that can be translated into the kernel language.
  • For example, omitting keywords or rearranging expressions.
    • Such as the prefix control structures in Ruby.

Whereas a linguistic abstraction

  • is a new abstraction which can be translated into the kernel language.
    • Remember than an abstraction is a tool or device that solves a particular problem.
  • For example, adding functions to a language with only procedures, or for loops to a language with only while loops.

3 A common set of expressions – Expr₀

For simplicity's sake, we will (for the moment) restrict our attention to languages with

  • only integer and boolean expressions,
  • only integer and reference variables,

and we will not assume any type checking.

The abstract syntax of these expressions is given by the grammar

⟨expr⟩ ∷= ⟨bexpr⟩ | ⟨iexpr⟩
⟨bexpr⟩ ∷= true | false
        | ⟨expr⟩ == ⟨expr⟩ | ⟨expr⟩ \= ⟨expr⟩
        | ⟨expr⟩ =< ⟨expr⟩ | ⟨expr⟩ < ⟨expr⟩
        | ⟨expr⟩ >= ⟨expr⟩ | ⟨expr⟩ > ⟨expr⟩
⟨iexpr⟩ ∷= number | var
        | ⟨iexpr⟩ + ⟨iexpr⟩ | ⟨iexpr⟩ - ⟨iexpr⟩
        | ⟨iexpr⟩ * ⟨iexpr⟩ | ⟨iexpr⟩ div ⟨iexpr⟩ | ⟨iexpr⟩ mod ⟨iexpr⟩

3.1 Why subscript zero?

We will later introduce more basic types than just integer and boolean (as well as discussing type checking).

So we call this simple language of expressions Expr₀.

4 The memory models

We are going to define two kernel languages; one imperative, and one declarative.

Before we get to the statements of the languages, we need to explain the memory model underlying each.

  • The languages are built specifically for their memory models.

We will call these two memory models:

  • Explicit state (with reference types).
    • The memory model that applies to most imperative languages.
    • So, likely what you are familiar with.
  • Single-assignment store.
    • A memory model suitable for declarative languages.
    • Particularly useful for its simple support of concurrency.

We begin though by discussing a simpler model, which we call explicit state without reference types.

4.1 Assignment vs Binding, Identifier vs Variable

A binding

  • is an association between a variable and some entity.
    • For instance, a value, or an address/memory reference.
  • Generally, we use the term binding to mean a permanent binding; once created, such a binding never changes.
  • We instead use assignment to refer to non-permanant bindings.

In the simplest memory models, values are simply assigned to variables; memory is a mapping between variable names and values.

In general, things are more complex, and it is useful to distinguish the terms

  • identifier to refer to the names of variables as they appear in the program source, and
  • variable to refer to the abstract notion of a cell of memory.

4.2 States, environments

We will generally denote the contents of memory using the symbols σ (sigma) and τ (tau), possibly with subscripts (σ₁, τ₂, etc.) or primes (σ′, τ‴, etc.).

We use notation similar to substitution to update memory states.

  • σ[x ≔ v] is read as σ with x updated to be v.

We will also need to discuss an environment for variables and procedures/functions; we will use Σ (Sigma), adorned with subscripts or primes, to denote environments.

4.3 Explicit state without reference types

In the explicit state without reference types model, we can view

  • state as a mapping from identifiers to values.

So a state σ is a mapping between identifiers and values.

  • σ : Identifier → Value

This is a nice, simple mental model.

  • But it is too simple; most imperative languages include a reference type abstraction that cannot be represented by this model.

4.3.1 Visualisation of explicit state without reference types

explicit-state-without-references.png

If σ is this state, the update σ[x ≔ v] changes the arrow coming from x to instead point to value v.

4.4 Explicit state with reference types

When we add the concept of references or addresses to our memory model, we must update our notion of state; we now view

  • state as a mapping from references to values.
    • Rather than from identifiers to values.
  • We then need an environment to map identifiers to references.

The addition of references to our model provides the abstractions

  • constants, as identifiers mapped to directly to values,
  • reference type variables, as identifiers mapped to a reference which maps to a reference.

4.4.1 Visualisation of explicit state with reference types

explicit-state-with-references.png

In this memory configuration, x is a constant, y is a (simple) variable, and z is a reference variable.

Assignment changes the arrows leaving references; the arrows leaving identifiers are fixed.

So if this state is σ, σ[r₁ ≔ v] would be a new state where r₁ points to v instead of 5.

4.5 Single-assignment store

In a single-assignment store, variables can be bound to values at most once.

  • Once (totally) bound, a variable is indistinguishable from the value it is bound to.

We consider a single-assignment store model which allows partial bindings.

  • A value can involve variables.
    • In which case we call it a partial value.
    • If a value is just a variable, we call the binding a variable-variable binding.
  • The variables in a value may be bound later.
    • A variable bound to a partial value is called partially bound.
    • As opposed to a totally bound variable.

4.5.1 Visualisation of single assignment store

single-assignment-store.png

In this memory configuration,

  • variable x is (totally) bound to the value 5,
  • variable y is (partially) bound to the value 2 + z,
  • variable z and z′ are bound together, but neither is bound to a value.