Before we can discuss syntax and semantics, we must differentiate the notions of value and effect. Recall that, in mathematics, evaluating an expression produces a value; whereas, in the discussion of programming languages, executing code may also have effects. Usually we specifically use the term side effect for such effects. Sometimes we are interested in both the value and the effect.
A side effect of a program is any change to non-local state.
Examples of side effects: modifying a global variable, mutating a data structure passed by reference, performing input or output, and raising an exception.
> “The problem with Haskell is that it's a language built on lazy evaluation and nobody's actually called for it.”
While we are discussing side effects, let us further discuss pure code.
A unit of code is called pure if, in addition to having no side effects, the result of its evaluation is always the same for the same inputs.
We are usually interested in pure functions in particular.
Side effects and impurity may be considered a “necessary evil”.
A solution to this problem: isolate the impure portions of a program, keeping the remainder pure.
How “side effect free” can we be, exactly?
We've said before that, in order to facilitate running code on machines, programming languages must be formal. There are two portions to this requirement: a formal description of syntax, and a formal description of semantics.
Formal descriptions of semantics in particular are not always given!
Unfortunately, we cannot cleanly divide elements of programming languages into the categories of syntax and semantics.
We will see an additional reason to segregate such features later on.
The standard tools for describing programming language syntax are regular expressions and context-free grammars.
Regular expressions may be used to describe tokens, (categories of) the smallest syntactic units of a language.
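For instance, an integer-literal token might be described by a regular expression; a minimal Python sketch (the token is assumed, for illustration, to be an optional minus sign followed by digits):

```python
import re

# An integer literal: an optional minus sign followed by one or more digits.
INT_TOKEN = re.compile(r"-?[0-9]+")

print(INT_TOKEN.fullmatch("-42") is not None)  # True: a valid integer literal
print(INT_TOKEN.fullmatch("4x2") is not None)  # False: not all digits
```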
Context-free grammars are used to describe (most of) the form of programs.
Specifically, grammars in extended Backus-Naur form (EBNF) are usually used.
As a brief example of EBNF (which we'll dissect soon),
⟨digit⟩ ∷= 1 ∣ 2 ∣ 3 ∣ 4 ∣ 5 ∣ 6 ∣ 7 ∣ 8 ∣ 9 ∣ 0
⟨nat⟩ ∷= ⟨digit⟩ { ⟨digit⟩ }
⟨int⟩ ∷= [ - ] ⟨nat⟩
⟨exp⟩ ∷= ⟨int⟩ ∣ ⟨exp⟩ (+ ∣ *) ⟨exp⟩
The smallest syntactic units of a programming language are called lexemes; for example, while, if, int, +, some-variable-name, a-function-name, etc. Categories of lexemes are called tokens.
The first step in parsing a program is to convert it from plaintext to a list of tokens.
x = 0;
r = 1;
while (x < n) {
r = r * x;
x++;
}
⇓
id(x) eq lit(0) end_stmt
id(r) eq lit(1) end_stmt
while openbr id(x) op(<) id(n) closebr open_block
id(r) eq id(r) op(*) id(x) end_stmt
id(x) op(++) end_stmt
close_block
Disclaimer: this example is purely made up; it's not intended to be a completely accurate depiction of tokenising any particular language.
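A tokeniser along these lines can be sketched in Python. This is a toy matching the made-up token names above, not the lexer of any real language:

```python
import re

# Token categories, tried in order: keywords before identifiers,
# and longer operators (++) before shorter ones (+).
TOKEN_SPEC = [
    ("kw",          r"while\b"),              # keywords kept as themselves
    ("lit",         r"[0-9]+"),               # integer literals
    ("id",          r"[A-Za-z_][A-Za-z_0-9]*"),
    ("op",          r"\+\+|[<*+]"),
    ("eq",          r"="),
    ("end_stmt",    r";"),
    ("openbr",      r"\("),
    ("closebr",     r"\)"),
    ("open_block",  r"\{"),
    ("close_block", r"\}"),
    ("skip",        r"\s+"),                  # whitespace separates lexemes
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenise(source):
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        if kind == "skip":
            continue
        if kind == "kw":
            tokens.append(match.group())                # e.g. "while"
        elif kind in ("id", "lit", "op"):
            tokens.append(f"{kind}({match.group()})")   # token carries its lexeme
        else:
            tokens.append(kind)                         # punctuation has no payload
    return tokens

print(tokenise("x = 0;"))  # ['id(x)', 'eq', 'lit(0)', 'end_stmt']
```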
Consider again the example grammar using extended Backus-Naur form.
⟨digit⟩ ∷= 1 ∣ 2 ∣ 3 ∣ 4 ∣ 5 ∣ 6 ∣ 7 ∣ 8 ∣ 9 ∣ 0
⟨nat⟩ ∷= ⟨digit⟩ { ⟨digit⟩ }
⟨int⟩ ∷= [ - ] ⟨nat⟩
⟨exp⟩ ∷= ⟨int⟩ ∣ ⟨exp⟩ (+ ∣ *) ⟨exp⟩
- Nonterminals are enclosed in ⟨...⟩.
- ∷= is used to begin a list of productions, rather than → or ⟶.
- Braces { ... } indicate their contents may be repeated 0 or more times. Some notations also use { ... }⁺ to indicate 1 or more repetitions.
- Brackets [ ... ] indicate their contents are optional.
- ∣ may be used inside parentheses, (... ∣ ...), to indicate choice in part of a production.
- Notations differ!
Recall that parsing a string (or deriving a string) using a grammar gives rise to a parse tree or derivation tree.
It is desirable to have a single parse tree for every program; that is, our grammars should be unambiguous.
Tools for removing ambiguity include enforcing precedence and enforcing associativity through the structure of the grammar.
To enforce precedence using a grammar: introduce a separate nonterminal for each precedence level, so that higher-precedence operators are grouped lower in the parse tree. For example:
⟨exp⟩ ∷= ⟨term⟩ ∣ ⟨exp⟩ + ⟨term⟩
⟨term⟩ ∷= ⟨int⟩ ∣ ⟨term⟩ * ⟨int⟩
To enforce associativity using a grammar: use left-recursive productions for left-associative operators and right-recursive productions for right-associative ones. For example, a production such as ⟨exp⟩ ∷= ⟨exp⟩ + ⟨term⟩ makes + associate to the left.
Recall that addition is an associative operator.
So the choice of whether addition in a language associates to the right or to the left may seem arbitrary.
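To make precedence and associativity concrete, here is a small recursive-descent evaluator for expressions over integers, + and * (a sketch only; it assumes the conventional rules that * binds tighter than + and that both associate to the left):

```python
def evaluate(tokens):
    """Evaluate a token list like [2, '+', 3, '*', 4]."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_exp():          # exp ::= term { + term }
        nonlocal pos
        value = parse_term()
        while peek() == "+":  # the loop makes + left-associative
            pos += 1
            value = value + parse_term()
        return value

    def parse_term():         # term ::= int { * int }
        nonlocal pos
        value = parse_int()
        while peek() == "*":
            pos += 1
            value = value * parse_int()
        return value

    def parse_int():
        nonlocal pos
        value = tokens[pos]
        pos += 1
        return value

    return parse_exp()

print(evaluate([2, "+", 3, "*", 4]))  # 14: * is parsed below +, so it binds tighter
```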
“Simple”, ambiguous grammars do have a place in describing programming language syntax.
For most interesting languages, context-free grammars are not quite sufficient to describe well-formed programs.
Recall the Chomsky hierarchy of languages.
Regular ⊂ Context-free ⊂ Context-sensitive ⊂ Recursive ⊂ Recursively enumerable
Consider this simple grammar.
⟨S⟩ ∷= ⟨A⟩ ⟨B⟩ ⟨C⟩
⟨A⟩ ∷= ε ∣ a ⟨A⟩
⟨B⟩ ∷= ε ∣ b ⟨B⟩
⟨C⟩ ∷= ε ∣ c ⟨C⟩
Suppose we want to allow only strings of the form aⁿbⁿcⁿ.
There is no CFG that can produce exactly such strings.
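This claim can be justified with the pumping lemma for context-free languages; a proof sketch:

```latex
\text{Suppose } L = \{\, a^n b^n c^n \mid n \ge 0 \,\} \text{ were context-free,
with pumping length } p. \text{ Take } s = a^p b^p c^p, \text{ which factors as }
s = uvwxy \text{ with } |vwx| \le p, \; |vx| \ge 1, \text{ and } uv^iwx^iy \in L
\text{ for all } i \ge 0. \text{ Since } |vwx| \le p, \text{ the substring } vwx
\text{ touches at most two of the three letter blocks, so pumping to } uv^2wx^2y
\text{ increases at most two of the three letter counts; the result is not in } L,
\text{ a contradiction.}
```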
But we can enforce this condition using the above grammar
augmented with attributes.
⟨A⟩, ⟨B⟩ and ⟨C⟩ are each given an attribute length.
For each production with ⟨A⟩, ⟨B⟩ or ⟨C⟩ on the left side, we attach a rule to compute the length.
The production ⟨S⟩ ∷= ⟨A⟩ ⟨B⟩ ⟨C⟩ enforces the condition with a predicate.
⟨S⟩ ∷= ⟨A⟩ ⟨B⟩ ⟨C⟩
Predicate: ⟨A⟩.length = ⟨B⟩.length = ⟨C⟩.length
⟨A⟩ ∷= ε
Rule: ⟨A⟩.length ≔ 0
⟨A⟩₁ ∷= a ⟨A⟩₂
Rule: ⟨A⟩₁.length ≔ ⟨A⟩₂.length + 1
⟨B⟩ ∷= ε
Rule: ⟨B⟩.length ≔ 0
⟨B⟩₁ ∷= b ⟨B⟩₂
Rule: ⟨B⟩₁.length ≔ ⟨B⟩₂.length + 1
⟨C⟩ ∷= ε
Rule: ⟨C⟩.length ≔ 0
⟨C⟩₁ ∷= c ⟨C⟩₂
Rule: ⟨C⟩₁.length ≔ ⟨C⟩₂.length + 1
In productions with multiple occurrences of the same non-terminal, we number the occurrences so we can easily refer to them in the rules/predicates.
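The attribute computation can be mimicked in Python: the run lengths play the role of the synthesised length attributes, and the predicate is checked where the ⟨S⟩ production would apply (a sketch, not a general attribute-grammar engine):

```python
def accepts(s):
    """Check s = a^n b^n c^n by computing the 'length' attributes."""
    # Synthesise <A>.length, <B>.length, <C>.length by consuming
    # each run of identical letters, as the recursive rules would.
    lengths = []
    pos = 0
    for letter in "abc":
        length = 0
        while pos < len(s) and s[pos] == letter:
            pos += 1
            length += 1
        lengths.append(length)
    if pos != len(s):   # leftover characters: not even of the form a*b*c*
        return False
    # The predicate attached to <S> ::= <A> <B> <C>.
    return lengths[0] == lengths[1] == lengths[2]

print(accepts("aabbcc"))  # True
print(accepts("aabbc"))   # False: lengths 2, 2, 1 violate the predicate
```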
Unlike with syntax, there is not one universally used tool for describing programming language semantics.
In this course we will primarily consider operational semantics.
Additional approaches include wp, the “weakest-precondition” calculus.
The “kernel language” approach to semantics can be used for languages with many features and constructs.
We will return to the discussion of semantics later in the course.