#+Title: 1 ─ Introduction
#+Subtitle: “Principles of programming languages”
#+Author: Mark Armstrong, PhD Candidate, McMaster University
#+Date: Fall, 2019
#+Description: A brief overview of topics we will be discussing in this course.
#+Options: toc:nil
#+Reveal_root: http://cdn.jsdelivr.net/reveal.js/3.0.0/
#+Reveal_init_options: width:1600, height:900, controlsLayout:'edges',
#+Reveal_init_options: margin: 0.1, minScale:0.125, maxScale:5
#+Reveal_extra_css: local.css
#+html:
* LaTeX Header :ignore:
#+LaTeX_header: \usepackage{amsthm}
#+LaTeX_header: \theoremstyle{definition}
#+LaTeX_header: \newtheorem{definition}{Definition}[section]
#+LaTeX_header: \newenvironment{NOTES}{
#+LaTeX_header: \begin{center}
#+LaTeX_header: \begin{tabular}
#+LaTeX_header: {|p{0.9\textwidth}|}
#+LaTeX_header: \hline\\
#+LaTeX_header: \begin{scriptsize}
#+LaTeX_header: }{
#+LaTeX_header: \end{scriptsize}
#+LaTeX_header: \\\\\hline
#+LaTeX_header: \end{tabular}
#+LaTeX_header: \end{center}
#+LaTeX_header: }
* Preamble
** Notable references
- Concepts of Programming Languages, Sebesta, 10th ed.
- Section 1.3 - Language Evaluation Criteria
- Section 5.4 - The Concept of Binding
- Section 5.5 - Scope
- Concepts, Techniques, and Models of Computer Programming
- Preface
- Section 2.1.1 - Language syntax
** Update history
- Sept. 12 ::
- The description of the lifetime of stack dynamic variables
has been changed.
- From: “lifetime is the amount of runtime spent
in that scope or sub-scopes”.
- To: “lifetime is the portion of runtime between
when the unit of code containing the variable is entered
and when control returns to the invoker of the unit of code.”
** Table of contents
# The table of contents is added using org-reveal-manual-toc,
# and so must be updated upon changes or added last.
#+HTML:
#+begin_scriptsize
- [[Welcome!][Welcome!]]
- [[Instructor: Mark Armstrong][Instructor: Mark Armstrong]]
- [[Teaching assistant: Musa Alhassy][Teaching assistant: Musa Alhassy]]
- [[Teaching assistant: Maryam Kord][Teaching assistant: Maryam Kord]]
- [[About the notes and slides for this course][About the notes and slides for this course]]
- [[Emacs][Emacs]]
- [[Org mode][Org mode]]
- [[Literate programming][Literate programming]]
- [[Purpose and goals of this course][Purpose and goals of this course]]
- [[Informal objectives][Informal objectives]]
- [[Course preconditions][Course preconditions]]
- [[Course postconditions][Course postconditions]]
- [[Formal rubric for the course][Formal rubric for the course]]
- [[Principles of programming languages][Principles of programming languages]]
- [[A brief definition of “programming language”][A brief definition of “programming language”]]
- [[The two components of every programming language][The two components of every programming language]]
- [[Implementation strategy: another component?][Implementation strategy: another component?]]
- [[Paradigms: classifying programming languages][Paradigms: classifying programming languages]]
- [[The commonly discussed paradigms][The commonly discussed paradigms]]
- [[A taxonomy of programming paradigms][A taxonomy of programming paradigms]]
- [[One language, many paradigms][One language, many paradigms]]
- [[Exercise: On paradigms][Exercise: On paradigms]]
- [[Subjective criteria for comparing and evaluating languages][Subjective criteria for comparing and evaluating languages]]
- [[Readability][Readability]]
- [[Writability][Writability]]
- [[Reliability][Reliability]]
- [[Cost][Cost]]
- [[Key concepts and terminology][Key concepts and terminology]]
- [[Compiled, interpreted, and hybrid approaches][Compiled, interpreted, and hybrid approaches]]
- [[Compilation][Compilation]]
- [[Interpreters][Interpreters]]
- [[Hybrid methods][Hybrid methods]]
- [[Exercise: Researching implementations][Exercise: Researching implementations]]
- [[Abstraction][Abstraction]]
- [[Levels of abstraction][Levels of abstraction]]
- [[Static vs. dynamic][Static vs. dynamic]]
- [[Scope][Scope]]
- [[Exercise: On dynamic scoping][Exercise: On dynamic scoping]]
- [[Exercise: Entities which introduce scope][Exercise: Entities which introduce scope]]
- [[Variables and memory][Variables and memory]]
- [[The data segment, stack, and heap][The data segment, stack, and heap]]
- [[Kinds of memory allocation][Kinds of memory allocation]]
- [[Exercise: Memory allocation in C++][Exercise: Memory allocation in C++]]
- [[Lifetime][Lifetime]]
- [[Garbage collection][Garbage collection]]
- [[Exercises][Exercises]]
#+end_scriptsize
#+HTML:
* Welcome!
#+begin_center
#+attr_html: :style text-align:center
Welcome to the course!
#+end_center
** Instructor: Mark Armstrong
#+begin_quote
#+attr_org: :width 200px
#+attr_html: :width 200px
#+attr_latex: :width 200px
#+attr_html: :alt Mark Armstrong photo
[[./images/markarmstrong.jpg]]
#+end_quote
- Email: armstmp “at” mcmaster.ca
- Website: http://www.cas.mcmaster.ca/~armstmp/
- Office: ITB-229
# - Office hours by request
** Teaching assistant: Musa Alhassy
#+begin_quote
#+attr_org: :width 200px
#+attr_html: :width 200px
#+attr_latex: :width 200px
#+attr_html: :alt Musa Alhassy photo
[[./images/musa_coffee.jpg]]
#+end_quote
- Email: alhassm “at” mcmaster.ca
- Website: https://alhassy.github.io
- Office: ITB-229
# - Office hours TBD
** Teaching assistant: Maryam Kord
- Email: kordm “at” mcmaster.ca
# - Office: ...
# - Office hours TBD
* About the notes and slides for this course
- The notes/slides for this course are developed in Emacs using Org mode.
- Along with almost everything else for the course!
- Specifically,
- the slides are created by
exporting the Org files to HTML using
the [[https://github.com/yjwen/org-reveal][org-reveal]] package, and
- the document-style ~pdf~ files are created by
exporting the Org files to LaTeX.
- The slides are laid out in two dimensions.
- When viewing them in your browser,
press ~?~ to see a help page describing the possible interactions.
- Most importantly, press ~esc~ or ~o~ to see an overview of the slide deck.
** Emacs
Emacs is
#+begin_quote
An extensible, customizable, free/libre text editor — and more.
At its core is an interpreter for Emacs Lisp,
a dialect of the Lisp programming language
with extensions to support text editing.
— GNU Emacs [[https://www.gnu.org/software/emacs/emacs.html][homepage]]
#+end_quote
- The learning curve is fairly steep compared to most similar tools, but:
- Distributions (configurations) such as [[http://spacemacs.org/][Spacemacs]]
and [[https://github.com/hlissner/doom-emacs][doom-emacs]]
optionally ease the learning curve.
- As the longest lived extensible text editor,
Emacs has an unrivaled ecosystem of extensions.
- Emacs is best configured /by writing code/,
a natural exercise of your talents.
- Especially if you have not settled on an editor for general use,
check out Emacs!
** Org mode
Even if you do not use Emacs, I recommend checking out
Org mode.
#+begin_quote
Org mode is for keeping notes, maintaining TODO lists,
planning projects, and authoring documents
with a fast and effective plain-text system.
— Org mode [[https://orgmode.org/][homepage]]
#+end_quote
While Org mode is used in plain-text documents, it supports
exporting to various formats, including ~pdf~ (through LaTeX),
~html~, and ~odt~, all “out of the box”!
- Write once, export many.
Packages for Org mode are available for both VSCode and Atom.
** Literate programming
:Properties:
#+Link: litprog http://www.literateprogramming.com/
:End:
Org mode is especially useful in that it provides support
for /literate programming/.
#+begin_quote
Let us change our traditional attitude to the construction of programs:
Instead of imagining that our main task is to instruct a computer
what to do, let us concentrate rather on
explaining to human beings what we want a computer to do.
— Donald E. Knuth, “[[litprog:][Literate Programming]]”
#+end_quote
Regardless of the tools you choose to use,
in this course it will be expected that you follow this philosophy.
- The purpose of your programs here is to
/explain to the course staff what you want the computer to do/.
That is, documentation is not optional, and not to be an afterthought.
* Purpose and goals of this course
The undergraduate calendar description of this course is
#+begin_quote
Design space of programming languages;
abstraction and modularization concepts and mechanisms;
programming in non-procedural (functional and logic) paradigms;
introduction to programming language semantics.
#+end_quote
But we should be more specific.
** Informal objectives
- Expose you to several programming languages.
- A relatively shallow but comprehensive survey.
- Focussing on general-purpose languages.
- /Formally/ describe programming language syntax and semantics.
- An application of theory you have learned previously.
- Analyse the building blocks of languages.
- Programs can be /massive/ entities, but they are composed of
relatively few /small/ pieces.
- Provide you with criteria by which to judge languages.
- So you can identify what languages fit what tasks.
- Examine the origins of certain languages/groups of languages.
- Historical context provides insight into why languages
are designed the way they are.
** Course preconditions
Before beginning this course:
1. Students should know and understand:
1. Basic concepts about integers, sets, functions, & relations.
2. Induction and recursion.
3. First order logic, axiomatic theories & simple proof techniques.
4. Regular expressions & context-free grammars.
5. Programming in imperative languages.
6. Basic concepts of functional programming languages.
2. Students should be able to:
1. Produce proofs involving quantifiers and/or induction.
2. Understand the meaning of a given axiomatic theory.
3. Construct regular sets & context-free languages.
4. Produce small to medium scale programs in imperative languages.
5. Produce small scale programs in functional languages.
** Course postconditions
After completion of this course:
1. Students should know and understand:
1. The basics of several programming languages.
2. Formal definitions of syntax & semantics for various
simple programming languages.
3. Various abstraction & modularisation techniques
employed in programming languages.
2. Students should be able to:
1. Reason about the design space of programming languages,
in particular tradeoffs & design issues.
2. Produce formal descriptions of syntax & semantics
from informal descriptions, identifying ambiguities.
3. Select appropriate abstraction & modularisation techniques
for a given problem.
4. Produce (relatively simple) programs in various languages,
including languages from non-procedural paradigms.
** Formal rubric for the course
# This HTML is probably a bad hack... but it works as a hammer.
#+HTML:
#+begin_scriptsize
+--------------+------------+--------------+------------+------------+
|Topic | Below | Marginal | Meets | Exceeds |
+--------------+------------+--------------+------------+------------+
|Familiarity |Shows some |Shows |Achieves |Achieves |
|with various |competence |competence |competence |competence |
|programming |in |in |with the |with |
|languages |procedural |procedural |basic |intermediate|
|(PLs) |languages, |languages |usage of |usage of |
| |but not |and limited |various |various |
| |languages |competence |languages |languages |
| |from other |in | | |
| |paradigms |languages | | |
| | |from other | | |
| | |paradigms | | |
+--------------+------------+--------------+------------+------------+
|Ability to |Cannot |Identifies |Identifies |Identifies |
|identify and  |consistently|such          |such        |such        |
|make use of |identify |constructs, |constructs |constructs |
|abstraction, |such |but does not |and shows |and shows |
|modularisation|constructs |consistently |some ability|mastery of |
|constructs | |make use of |to make use |them when |
| | |them when |of them when|programming |
| | |programming |programming | |
+--------------+------------+--------------+------------+------------+
|Ability to |Unable or |Comprehends |Makes only |Consistently|
|comprehend and|rarely |given |minor |fully |
|produce formal|able to |grammars, |errors |understands |
|descriptions |comprehend |but |regarding |given |
|of PL syntax |given |produces |precedence |grammars and|
| |grammars; |grammars |or |produces |
| |does not |which are |ambiguity |correct |
| |identify |ambiguous |when |grammars. |
| |ambiguity |or which do |reading or | |
| |or |not |producing | |
| |precedence |correctly |grammars | |
| |rules |specify | | |
| | |precedence | | |
+--------------+------------+--------------+------------+------------+
|Ability to |Rarely or |Usually |Comprehends |Comprehends |
|comprehend and|never |comprehends |such |such |
|produce |comprehends |such semantic |semantic |semantic |
|operational |such |descriptions, |descriptions|descriptions|
|semantics for |semantic |but cannot |and produces|and produces|
|simple PLs |descriptions|consistently |them with |them without|
| | |produce them |only minor |errors |
| | | |errors | |
+--------------+------------+--------------+------------+------------+
|Ability to |Rarely or |Inconsistently|Consistently|Consistently|
|comprehend |never |comprehends |comprehends |comprehends |
|denotational |comprehends |such semantic |such |and can |
|and axiomatic |such |descriptions |semantic |produce some|
|semantics for |semantic | |descriptions|simple |
|simple PLs |descriptions| | |semantic |
| | | | |descriptions|
+--------------+------------+--------------+------------+------------+
#+end_scriptsize
#+HTML:
* Principles of programming languages
Before we begin in earnest, let us set the stage by answering
some basic questions.
- What are “programming languages”?
- What are their components?
- How can we compare, group or classify them?
** A brief definition of “programming language”
:Properties:
#+Link: C11-standard http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
:End:
- A finite, formal language for describing (potentially) infinite processes.
- Usually the language is actually quite small!
- A CFG for /all of C/ has only 91 nonterminals
(as constructed in the C11
[[C11-standard:][international standard]],
appendix 2).
- We will construct simple languages with far fewer than
100 /productions/.
- /Formality/ is required because we want our programs to be run on machines.
- Machines fare poorly with informal language.
- Considering just the set of commands executable by the processor,
modern computers are fairly simple machines.
** The two components of every programming language
Languages can be distinguished by two components:
- *Syntax*: the rules for the form of legal programs.
- *Semantics*: the rules for the meaning of legal programs.
#+BEGIN_NOTES
/Colourless green ideas sleep furiously/.
#+END_NOTES
Change either, and you have a different language.
*** Implementation strategy: another component?
Sometimes /implementation strategy/ is also considered
a component of a programming language.
- i.e., whether it is compiled or interpreted.
We do not take this view.
- Implementation strategy is external to the language.
- A language may have many compilers, interpreters,
runtime environments, etc.
** Paradigms: classifying programming languages
Classifications of programming languages based on
what kinds of abstractions they (natively) support.
- One language may support many paradigms.
- It is often possible to write programs following a paradigm
the language does not natively support.
- For instance, following object oriented principles
when writing C code.
- Native support is important!
- It frees the programmer from having to
“hold” the abstraction in their own mind.
- We will see time and time again that well designed languages
save the programmer work and mental effort!
- Supported abstractions should be /orthogonal/; separate from each other,
but able to be combined in sensible, predictable ways.
- Ideally there is a hierarchy of abstractions.
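The point above about following object-oriented principles when writing C code can be sketched concretely. C has no native class construct, but a struct can play the role of object state, and functions taking a pointer to it can play the role of methods (the ~Account~ type and its operations here are invented for illustration):

#+begin_src C
#include <assert.h>

/* "Class": state bundled in a struct. */
typedef struct {
  int balance;
} Account;

/* "Methods": functions taking a pointer to the object
   as their first argument. */
void account_deposit(Account *self, int amount) {
  self->balance += amount;
}

int account_balance(const Account *self) {
  return self->balance;
}

int main(void) {
  Account a = { 0 };               /* "construction" by initialisation */
  account_deposit(&a, 100);
  assert(account_balance(&a) == 100);
  return 0;
}
#+end_src

Nothing in C enforces that only the ~account_*~ functions touch the struct's fields; the programmer must hold that discipline in their own mind, which is exactly the burden native support would lift.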
*** The commonly discussed paradigms
It is common to consider only a small number of paradigms,
usually in this hierarchy.
+---------------------------------------+---------------------+
| Imperative/Procedural | Declarative |
+-----------------+---------------------+-------------+-------+
| Object-oriented | Non-Object Oriented | Functional | Logic |
+-----------------+---------------------+-------------+-------+
But this is far from a complete picture!
*** A taxonomy of programming paradigms
#+begin_quote
#+attr_org: :width 500px
#+attr_html: :width 800px
#+attr_latex: :width 400px
[[./images/vanroy-paradigm-chart.png]]
— Van Roy, P.
“Programming Paradigms for Dummies: What Every Programmer Should Know”
#+end_quote
*** One language, many paradigms
Different paradigms lend themselves to different tasks,
so it is beneficial for a language to support more than one.
But then, “More is not better (or worse), just different”!
#+begin_quote
All computation models have their place.
It is not true that models with more concepts are better or worse.
This is because a new concept is like a two-edged sword.
Adding a concept to a computation model
introduces new forms of expression,
making some programs simpler,
but it also makes reasoning about programs harder.
#+end_quote
#+BEGIN_NOTES
For example, adding mutable variables to functional languages
enables object-oriented programming techniques.
But reasoning in the presence of mutable variables is
notoriously difficult!
#+END_NOTES
*** Exercise: On paradigms
Pick a language you are familiar with,
and research what paradigms it supports.
(Don't expect to find a definite yes/no answer for every
paradigm from Van Roy's taxonomy).
Then try to determine paradigms which are not natively supported,
but are still feasible to follow when coding.
** Subjective criteria for comparing and evaluating languages
We can use a mixture of /objective/ and /subjective/
criteria to compare and evaluate languages we discuss.
We use four main subjective criteria. Note there is overlap!
- Readability
- Writability
- Reliability
- Cost
*** Readability
- How easily can code be interpreted by humans?
- How much “noise” is there?
- How easily can errors be detected by visual inspection?
- How many constructs must the reader be familiar with?
- How many ways can constructs be combined?
- Do the combinations behave in predictable ways?
The last two points deal with /orthogonality/.
#+begin_quote
Orthogonality in a programming language means
that a relatively small set of primitive constructs can be combined
in a relatively small number of ways to build the control
and data structures of the language.
— Concepts of Programming Languages 10th ed.
#+end_quote
*** Writability
- How easily can humans create code?
- How many abstractions are supported?
- How easily can abstract ideas be captured in the language?
- Of course this depends upon how well the idea fits the paradigm!
- How much /syntactic sugar/ is available?
- Syntactic sugar is syntax which does not add constructs/features,
only (presumably simpler) ways to
express existing constructs/features.
#+BEGIN_NOTES
For instance, the actor model of concurrency
naturally fits in an object oriented language,
but may not fit as well in a functional language.
But the functional reactive model of concurrency
will fit better in the functional paradigm.
#+END_NOTES
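As a small, concrete example of syntactic sugar, in C: array indexing adds no construct beyond pointer arithmetic; by definition, ~a[i]~ abbreviates ~*(a + i)~, and likewise ~x += 1~ abbreviates ~x = x + 1~.

#+begin_src C
#include <assert.h>

int main(void) {
  int a[3] = { 10, 20, 30 };
  assert(a[1] == *(a + 1));   /* a[i] is defined as *(a + i): both are 20 */

  int x = 4;
  x += 1;                     /* sugar for x = x + 1 */
  assert(x == 5);
  return 0;
}
#+end_src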
*** Reliability
- To what extent is /syntactically valid/ code guaranteed to /be/ valid?
- What types of errors are caught before runtime?
- What tools are provided to the programmer for
recovering from errors that do occur?
- Are “unsafe” features included in the language?
- What tools are provided/available to the programmer for
/specifying/ and /verifying/ code?
- Formal specification involves using mathematical language
to describe the meaning of programs.
- /Axiomatic semantics/.
- Sometimes, automated tools can check code meets formal
specifications.
- See CS3EA3: Software Specifications and Correctness.
*** Cost
- How much work is involved in
- Training programmers in a language?
- Writing new code?
- Writability!
- Maintaining a code base?
- Readability and reliability!
- Are good programming tools/environments available?
Do they have a cost?
- How efficient are compilers for the language
(if code is to be compiled)?
- How efficient is compiled code (or the interpreter,
if the code is to be interpreted)?
- What optimisation options are available for compiled code?
#+BEGIN_NOTES
There is a tradeoff between efficiency of the compiler
and the amount it optimises code.
- For ease of development, you want quick compilation,
and can sacrifice efficiency of the compiled code.
- For a good end product, you want efficient compiled code,
even if compilation takes longer.
#+END_NOTES
* Key concepts and terminology
Following are some key concepts and terms which
- we will either see repeatedly throughout the course, or
- which we should clarify before we move on.
(Some of this information simply has no better place to go.)
** Compiled, interpreted, and hybrid approaches
Languages exist in a hierarchy.
- We are focused on the “upper portion” of the hierarchy.
- High-level languages!
- We may use the term low-level for some languages,
C in particular, but this is relative.
- C is much higher level than assembly!
- We need to translate into machine code to run our programs.
- Compilation, interpretation and “hybrid approaches”
are strategies for doing this.
*** Compilation
Translate the whole program
(and any libraries or other code resources needed)
ahead of running it.
- High upfront cost (time), for increased efficiency at runtime.
- Not portable; machine code is machine dependent.
*** Interpreters
Translate the program /as we are running it/.
- No upfront cost, but less efficient.
- Portable; can be run on any machine with an interpreter.
- Alleviates some of the programmer's responsibility.
- One user (or group) writes the interpreter /once/
(per machine type);
it can be used by any number of users for any number of programs.
- Efficiency is improved by using *just-in-time compilation*.
- Store the result of interpretation so it can be used again.
- Can achieve better error reporting.
- The relationship between the original and translated code is known at runtime.
- This relationship is discarded when compiling code.
*** Hybrid methods
Compile into an intermediate language,
which is then translated into assembly.
- This intermediate language is usually similar to assembly.
- But targets a virtual machine, not actual hardware!
- Usually called /bytecode/.
- Greatly offsets efficiency cost of interpretation.
- More portable than compiled code; just need
a bytecode interpreter for each target machine.
*** Exercise: Researching implementations
Research for yourself what implementations are available
for languages you are familiar with.
Consider:
- Can you find different approaches (compiler, interpreter, hybrid)
for the language?
- Is one approach “the norm” or “preferred” for the language?
- What is the availability for each implementation?
- Who maintains them?
- Are they open source?
** Abstraction
The concept of an /abstraction/ is central to programming languages.
#+begin_quote
We define an abstraction loosely as a tool or device
that solves a particular problem.
— Concepts, Techniques, and Models of Computer Programming
#+end_quote
#+begin_quote
Briefly, abstraction means the ability to define and then use
complicated structures or operations in ways
that allow many of the details to be ignored.
— Concepts of Programming Languages 10th ed.
#+end_quote
*** Levels of abstraction
+----------------------------+
| Highly abstract |
| (User knows fewer details) |
+----------------------------+
| A writing instrument |
| ↓ |
| A pencil |
| ↓ |
| A mechanical pencil |
| ↓ |
| This pencil |
+----------------------------+
| Highly concrete |
| (User knows all details) |
+----------------------------+
Language ~A~ is “higher-level” than language ~B~ if it allows you
to work at a higher level of abstraction than language ~B~.
#+BEGIN_NOTES
For instance, C++ and C# are higher-level than C,
because as a modularisation construct,
classes hide more implementation details
than the C alternative of defining structs and procedures using them.
(Pure) functional languages are higher-level than imperative
languages, because the user need not be aware of the state of memory.
Logic languages are even higher-level than functional languages,
because the programmer does not concern themselves with
the algorithm, only the problem description.
#+END_NOTES
** Static vs. dynamic
We will see these terms in particular applied to many concepts.
- Static: set before runtime; fixed throughout the runtime.
- Sometimes the phrase “compile time” is used.
- But “static” may also mean “load time”.
- I.e., the time at which the program is loaded into memory.
- And if we are using an interpreter, there is no “compile time”!
- Dynamic: set during runtime; (potentially) changes throughout runtime.
** Scope
The /scope of an entity/ is the portion of the program
in which it is “visible”.
- Usually scope is statically determined.
- So the scope of entities can be determined by reading the code.
- With dynamic scoping, the scope of entities depends upon the
“execution trace”, i.e., the “path” through the program.
- Dynamic scoping is definitely the exception.
- The detriments strongly outweigh the benefits.
A /scope/ is a portion of the program to which the visibility
of entities may be limited.
- Design decision: what constructs introduce scopes?
- Almost certainly subroutines do.
- Do conditional branches? Loop bodies?
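With static scoping, we can determine which declaration each use of a name refers to just by reading the code. A minimal C sketch, using a block to introduce a scope:

#+begin_src C
#include <assert.h>

int x = 1;               /* file scope */

int main(void) {
  assert(x == 1);        /* the file-scope x is visible here */
  {
    int x = 2;           /* a new x; the outer x is hidden ("shadowed") */
    assert(x == 2);
  }
  assert(x == 1);        /* the inner x's scope has ended */
  return 0;
}
#+end_src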
*** Exercise: On dynamic scoping
- Historically, Lisps (languages directly descended from Lisp)
were dynamically scoped.
- Presently, the only widely used Lisp which uses dynamic scoping
by default is Emacs Lisp (eLisp).
Try out this snippet of Emacs Lisp code,
either in Emacs or using an
[[https://repl.it/languages/elisp][online REPL]].
#+begin_src emacs-lisp
(let ((x 2) (y 5)) ; "global" variables x and y
(progn
(defun mult-x-y ()
(* x y)) ; returns x * y
(defun A ()
(let ((x 3)) ; local variable x
(mult-x-y)))
(defun B ()
(let ((y 10)) ; local variable y
(A)))
(message "mult-x-y returns %d, A returns %d and B returns %d"
(mult-x-y) (A) (B))))
#+end_src
1) Ensure you understand the results.
2) Using this understanding, formulate some advantages and disadvantages
of dynamic scoping.
*** Exercise: Entities which introduce scope
In a few different programming languages, investigate whether
conditional branches and loop bodies introduce scopes.
I.e., test out code such as
#+begin_src text
if B then
int x = 0
endif
y = x % Is x still in scope?
#+end_src
Specifically, investigate languages which have /iterating loops/
(usually called ~for~ loops). What is the scope of an iterator
for such loops?
** Variables and memory
- A /variable/ is an abstraction of a memory cell.
- Come in many forms.
- Typically we use /variable/ to mean a named, mutable cell of memory.
- When possible, we use more specific words.
- Argument, parameter, constant, reference, etc.
- Or qualify the word variable.
*** The data segment, stack, and heap
In the context of running a program (or a thread of a program):
- The /data segment/ is a portion of memory used to store the
program itself and static/global variables.
- The /stack/ is an organised portion of memory allocated for the program.
- Blocks at the “top” of the stack are reserved when entering
units of code.
- Units of code include subroutines and/or unnamed blocks,
depending upon the language.
- They are /popped off/ the stack when leaving the unit of code.
- The /heap/ is an unorganised portion of memory allocated for the program.
- Allocation/deallocation may be independent of units of code.
- Disorganisation can lead to inefficiency and other problems.
- Cache misses, page faults, etc.
- Running out of memory or appropriately sized contiguous blocks.
- Garbage, dangling references, etc.
*** Kinds of memory allocation
We can distinguish four kinds of memory allocation.
- Static
- Allocation is done /before runtime/ (static), in the data segment.
- At load time, when the program is loaded into memory.
(Not compile time).
- Stack dynamic
- Allocation is dynamic, automatic, and on the stack.
- The amount of memory needed must be known when entering the variable's scope.
- This is “the usual method” in most imperative languages.
#+REVEAL: split:t
- Implicit heap dynamic
- Allocation is dynamic, automatic, and on the heap.
- Memory is simply allocated when the variable is assigned;
the programmer does not need to “do anything”.
- Explicit heap dynamic
- Allocation is dynamic, /manual/, and on the heap.
- The programmer must use a command to allocate memory
/before/ assignments can be made.
- E.g., ~malloc~ or ~new~.
- In fact these blocks of memory can be considered
/nameless variables/; we need other,
/reference/ variables to /point/ to their address.
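Three of these kinds can be sketched in C. (C has no implicit heap-dynamic allocation; that kind appears instead in languages such as Python or JavaScript, where assignment alone allocates.)

#+begin_src C
#include <stdlib.h>

int counter;            /* static: allocated in the data segment,
                           before runtime, for the whole run */

int f(void) {
  int local = 41;       /* stack dynamic: allocated on the stack when
                           f is entered, popped off when f returns */

  int *p = malloc(sizeof *p);  /* explicit heap dynamic: a nameless
                                  block of memory, reached only through
                                  the reference variable p */
  *p = local + 1;
  int result = *p;
  free(p);              /* ...and deallocated manually */
  return result;
}

int main(void) {
  return f() == 42 ? 0 : 1;
}
#+end_src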
*** Exercise: Memory allocation in C++
Investigate:
- Which of the four kinds of memory allocation are available in C++.
- How each kind of memory allocation is denoted.
*** Lifetime
The /lifetime/ of a variable is the portion of the runtime
(not the code) during which it has memory allocated for it.
- There may be several /instances/ of a single variable in the code,
each with its own lifetime.
- The lifetime of a variable depends upon the memory allocation
strategy used for it.
- Static: lifetime is the whole of runtime.
- Stack dynamic: lifetime is the portion of runtime between
when the unit of code containing the variable is entered
and when control returns to the invoker of the unit of code.
- In the case of functions/procedures, local variables
end their lifetime when the function/procedure returns.
- Note that a variable may go out of scope, but still be alive.
- Implicit heap-dynamic: lifetime begins when assigned.
The end of lifetime depends upon garbage collection.
- Explicit heap-dynamic: lifetime begins when the allocation
command is used. The end of lifetime may be
- when the programmer issues a deallocation command, or
- depend upon garbage collection.
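The point that a variable may be out of scope yet still alive can be seen with C's function-local ~static~ variables: the variable's scope is just the function body, but its lifetime is the whole of runtime, so its value survives between calls.

#+begin_src C
#include <assert.h>

int next_id(void) {
  static int id = 0;     /* scope: this function body only;
                            lifetime: the entire run of the program */
  return ++id;
}

int main(void) {
  int first = next_id();
  assert(next_id() == first + 1);  /* id kept its value between calls */
  /* id is not in scope here, but it is still alive in memory. */
  return 0;
}
#+end_src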
*** Garbage collection
#+begin_definition
/Garbage/: allocated memory which is no longer accessible.
#+end_definition
#+begin_definition
/Garbage collection/: deallocating garbage.
#+end_definition
- A complex activity.
- We will discuss the basics here, and then likely not
touch on it again.
- Has a large influence on the efficiency of
compiled/interpreted code.
- Two main categories of garbage collection algorithms.
- Tracing
- Which includes the “mark & sweep” algorithm.
- Reference counting
- Additionally, language implementations may include compile time
“escape analysis”.
- Memory may be allocated off the stack instead of the heap
/if/ no references to the variable are available outside its scope
(if it doesn't “escape”).
#+REVEAL: split:t
- Tracing
- Usually triggered when a certain memory threshold is reached.
- Starting from some set of base references
(usually those in the stack and data segment),
“trace” references, somehow marking those that are reachable.
- Starting from the set of base references,
“determine the transitive closure of reachability”.
- Once done, free memory that is not marked.
- “Naive” mark & sweep:
- Each object in memory has a bitflag for this “marking”,
kept clear except during garbage collection.
- When activated, traverse all reachable memory,
“marking” what's reached, then “sweep” away all unmarked memory
(and reset the flags of marked memory).
- Have to pause the program to do this.
- Better methods do not require pausing.
#+REVEAL: split:t
- Reference counting
- An “eager” approach.
- As the program runs, keep a count of the number of references
to every object in memory.
- Must be constantly monitoring!
- When the number of references reaches zero, can free the memory
for that object.
- Have to account for cycles!
- Which increases the overhead.
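A minimal sketch of reference counting in C (the ~RcInt~ type and its operations are invented for illustration): each object carries a count, incremented when a reference is taken and decremented when one is dropped; the memory is freed exactly when the count reaches zero.

#+begin_src C
#include <assert.h>
#include <stdlib.h>

typedef struct {
  int refcount;
  int value;
} RcInt;

RcInt *rc_new(int value) {
  RcInt *p = malloc(sizeof *p);
  p->refcount = 1;            /* the creator holds the first reference */
  p->value = value;
  return p;
}

RcInt *rc_retain(RcInt *p) {  /* taking another reference */
  p->refcount++;
  return p;
}

void rc_release(RcInt *p) {   /* dropping a reference */
  if (--p->refcount == 0)
    free(p);                  /* no references remain: collect */
}

int main(void) {
  RcInt *a = rc_new(7);
  RcInt *b = rc_retain(a);    /* two references to one object */
  rc_release(a);              /* count drops to 1; still alive */
  assert(b->refcount == 1 && b->value == 7);
  rc_release(b);              /* count reaches 0; memory is freed */
  return 0;
}
#+end_src

Note that a cycle of such objects would keep each other's counts above zero forever; this is exactly why cycles must be accounted for specially.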
* Exercises
- [[Exercise: On paradigms][On paradigms]]
- [[Exercise: Researching implementations][Researching implementations]]
- [[Exercise: On dynamic scoping][On dynamic scoping]]
- [[Exercise: Entities which introduce scope][Entities which introduce scope]]
- [[Exercise: Memory allocation in C++][Memory allocation in C++]]