#+Title: 4 – Introduction to Types
#+Author: Mark Armstrong, PhD Candidate, McMaster University
#+Date: Fall, 2019
#+Description: The most fundamental abstraction in programming: types
#+Options: toc:nil
#+Reveal_root: https://cdn.jsdelivr.net/reveal.js/3.0.0/
#+Reveal_init_options: width:1600, height:900, controlsLayout:'edges',
#+Reveal_init_options: margin: 0.1, minScale:0.125, maxScale:5
#+Reveal_extra_css: local.css
#+html:
* LaTeX Header :ignore:
#+LaTeX_header: \usepackage{amsthm}
#+LaTeX_header: \theoremstyle{definition}
#+LaTeX_header: \newtheorem{definition}{Definition}[section]
#+LaTeX_header: \newenvironment{NOTES}{
#+LaTeX_header: \begin{center}
#+LaTeX_header: \begin{tabular}
#+LaTeX_header: {|p{0.9\textwidth}|}
#+LaTeX_header: \hline\\
#+LaTeX_header: \begin{scriptsize}
#+LaTeX_header: }{
#+LaTeX_header: \end{scriptsize}
#+LaTeX_header: \\\\\hline
#+LaTeX_header: \end{tabular}
#+LaTeX_header: \end{center}
#+LaTeX_header: }
#+LaTeX_header: \usepackage{unicode-math}
#+LaTeX_header: \usepackage{unicode}
* Preamble
** Notable references
- Concepts of Programming Languages, Sebesta, 10th ed.
- Chapter 6 – Data Types
** Update history
- Nov. 12th ::
- An error was corrected regarding product types; they are
/heterogeneous/ collections, not /homogeneous/.
- Slight cleanup; added references and table of contents.
- Oct. 3rd ::
- Original version posted
** Table of contents
# The table of contents are added using org-reveal-manual-toc,
# and so must be updated upon changes or added last.
#+HTML:
#+begin_scriptsize
- [[Preamble][Preamble]]
- [[Notable references][Notable references]]
- [[Update history][Update history]]
- [[Table of contents][Table of contents]]
- [[What is a type?][What is a type?]]
- [[But what are types?][But what are types?]]
- [[The common methods to construct types][The common methods to construct types]]
- [[Exercise: tuples vs lists][Exercise: tuples vs lists]]
- [[Classifications of languages based on types][Classifications of languages based on types]]
- [[“Strong” and “weak” typing][“Strong” and “weak” typing]]
- [[Exercise][Exercise]]
- [[Explicit and implicit typing][Explicit and implicit typing]]
- [[Static and dynamic typing][Static and dynamic typing]]
- [[Subtyping and polymorphism][Subtyping and polymorphism]]
- [[Parametric polymorphism][Parametric polymorphism]]
- [[Duck typing][Duck typing]]
- [[Primitive types][Primitive types]]
- [[Uncommon basic types][Uncommon basic types]]
- [[Ordinal types][Ordinal types]]
- [[Sequence types][Sequence types]]
- [[Homogeneous or heterogeneous][Homogeneous or heterogeneous]]
- [[Array types][Array types]]
- [[List types][List types]]
- [[Associative array (hash, map, table) types][Associative array (hash, map, table) types]]
- [[Exercise: Objects as associative arrays][Exercise: Objects as associative arrays]]
- [[String types][String types]]
- [[Product types][Product types]]
- [[Exercise: how many elements in a product?][Exercise: how many elements in a product?]]
- [[Exercise: classes as a product type][Exercise: classes as a product type]]
- [[Union (sum) types][Union (sum) types]]
- [[Pointer and reference types][Pointer and reference types]]
- [[Where do we go from here?][Where do we go from here?]]
#+end_scriptsize
#+HTML:
* What is a type?
** Section introduction :ignore:
Types are the /most/ important abstraction in programming.
- Without types, the programmer deals entirely in bitstrings.
- I.e., the programmer must keep a mental map of representation of data.
#+begin_quote
Types are the central organising principle of
the theory of programming languages.
Language features are manifestations of type structure.
The syntax of a language is governed
by the constructs that define its types,
and its semantics is determined by the interactions among those constructs.
The soundess of a language design — the absence of ill-defined programs —
follows naturally.
— Robert Harper, /Practical Foundations for Programming Languages/
#+end_quote
** But what are types?
A /type/ is simply a collection of values.
Recall that in set theory, a set is a collection of values.
- In fact, we will see when we begin using Agda
that it uses the word ~Set~ instead of ~Type~.
You have seen previously many methods to construct sets:
- Listing their members (for finite sets).
- Giving the means of constructing elements,
for instance by an inductive definition.
- Combining other sets.
- Unions, /A ∪ B./
- /Disjoint/ or /tagged/ unions, /A ⊎ B/.
- Intersections, /A ∩ B/.
- Products, /A × B/ and /Aⁿ/.
- (Finite) sequences, /A*/, and infinite sequences /A^{ω}/.
- Functions, /A → B/.
- Others:
- Differences, /A/B/.
- Power sets, /℘ A/.
** The common methods to construct types
That list of methods for constructing sets contains a lot,
not all of which are necessarily practical in the context
of types in programming languages.
We will see that most languages include some
/primitive/, (also called /atomic/ or /basic/) types,
along with these ways to combine types
to form new types:
- Products
- Sequences
- Unions
- Functions
** Exercise: tuples vs lists
What is the difference between
- /Aⁿ/, the set of /n/-ary products of /A/ (i.e., /A × A × … × A/), and
- /A*/, the set of sequences of /A/?
Why would we want both types in our language?
* Classifications of languages based on types
Before we discuss types themselves further, lets discuss various
classifications of languages based on types.
** “Strong” and “weak” typing
These are comparative terms.
- We'll consider them a subjective criteria.
“Strongly typed”
- Languages are frequently called strongly typed.
- But less frequently do they state what they mean by that.
- The term is used inconsistently.
- “C is a strongly typed, weakly checked language”
– Dennis Ritchie, creator of C
- We will take it to mean “type clashes are restricted”.
- Not a good objective criteria, by that definition.
- What does restricted mean?
- Is it a warning or an error?
- Does type casting violate this?
- What qualifies as a type clash?
- Is implicit type casting allowed?
“Weakly typed” simply means not strongly typed.
*** Exercise
Try for yourself: pick a language “X”, and search for
#+begin_center
“is X strongly typed?”
#+end_center
Try to
- find arguments for and against it, and
- evaluate the quality of the arguments.
- Of course your ability to do this depends on your familiarity
with the language.
** Explicit and implicit typing
Languages may require annotations on variables and functions
(/explicit typing/) or allow them to be omitted (/implicit typing/).
- Implicit typing does not weaken the typing system in any way!
- A very common misconception.
- In general, type inference is an undecidable problem
(its not guaranteed that the compiler/interpreter can
determine the type).
- Most languages have relatively simple type systems,
and this is not a problem.
Some languages make type annotations a part of the name,
or annotate names with sigils to indicate type details.
- In older versions of Fortran, names beginning with
~i~, ~j~ or ~k~ were for integer variables,
and all variables were of floating point.
- In Perl, names beginning with the sigil
- ~$~ have scalar type,
- ~@~ have array type,
- ~%~ have hash type, and
- ~&~ have subroutine type.
** Static and dynamic typing
Are types checked before or at runtime?
- It's somewhat natural for interpreted languages to be dynamically typed.
- Consider the interpreted “scripting” languages;
Python, Ruby, Javascript, Lua, Perl, PHP, etc.
- But it's far from universally true!
- Haskell has an interpreter, and is definitely
statically typed.
“Dynamically typed” is a misnomer.
- It would be better to say “dynamically type checked”.
* Subtyping and polymorphism
Being strict about types (preventing type clashes)
introduces a (solveable) problem; it prohibits code reuse!
- Subroutines can only be used on a particular type of arguments.
Subtyping and polymorphism provide solutions to this.
- With *subtyping*, there is a sub/super relation between types.
- Consider the notion of /subsets/.
- Subtypes can be used in place of their supertypes.
- Sub-/classing/ is one instance of subtyping.
- As are subrange or, sometimes, enumeration types.
- With *polymorphism*, a subroutine can have several types.
One notion of polymorphism is *ad hoc* polymorphism,
also called *overloading*.
- Not actually a feature of a type system.
- Subroutine names can be reused as long as the types of arguments
differ (in some specified way).
- So, the programmer must define the “same” subroutine many times.
** Parametric polymorphism
With parametric polymorphism, subroutines have a
/most general/ type, based on the /shape/ or /form/ of their arguments.
- The subroutines behaviour can only be based on the form,
not the specific types.
- Commonly used in functional languages.
** Duck typing
More formally called /row polymorphism/.
- Types are not actually checked.
- We only check that an entity has the correct method defined for it.
- “If it walks like a duck, and quacks like a duck, it's a duck!”
* Primitive types
Most languages include some subset of these
/primitive/, (/atomic/, /basic/) types.
- *Integers*; ~int~
- Including possibly signed, unsigned, short, and/or long variants.
- *Floating point* numbers
- Including possibly single precision and double precision variants.
- *Characters*
- Sometimes an alternate name for the byte type (8-bit integers).
- *Booleans*
- *Unit* (the /singleton/ type)
- Sometimes called void, nil-type, null-type or none-type.
- In C like languages, you cannot store something of type ~void~.
- Commonly represented as the type of 0-ary tuples,
whose only element is ~()~.
- *Empty*
- Unlike nil, null or none type, which have a single value
(called ~nil~, ~null~ or ~none~), there is *nothing* in the empty type.
** Uncommon basic types
A few languages include these basic types.
- *Complex* numbers
- Especially for scientific computation.
- *Decimal* (representation of) numbers
- Especially for business (monetary) applications.
- There are decimal numbers that cannot be properly represented
using binary (e.g. 0.3 = 0.010011, repeating)
** Ordinal types
Many languages include a means of defining other /finite/ types.
Instances include
- Enumeration types
- Subset/Subrange types
* Sequence types
Recall from set theory that
- sets are collections of elements where
order and multiplicity “don't matter”,
- bags are collections of elements where
order “doesn't matter” (but multiplicity does), and
- sequences are collections of elements where
order and multiplicity both matter.
- (This is part of the /boom/ hierarchy).
There are multiple ways sequence types are represented
in programming languages.
** Homogeneous or heterogeneous
One design decision for any sequence
(or more generally, any collection) type
is whether it is /homogeneous/ or /heterogeneous/.
- Can it store elements of differing types?
- “Heterogeneous”
- Or only elements of the same type?
- “Homogeneous”
** Array types
Arrays are an abstraction of finite sequences of
adjacent memory cells.
- Programmers are guaranteed certain properties.
- ~O(1)~ access time for any element.
- May be computationally costly or impossible to modify length.
- There are different classifications of arrays.
- “Array lists” alleviate this problem.
- We'll discuss this more later in the course.
- The programmer may have to handle memory allocation.
** List types
Lists are simply an abstract notion of sequences.
- May be implemented by arrays or by structures such as linked lists.
- Implementation details and properties may not be guaranteed.
- Often not ~O(1)~ access time.
Lazily (non-strictly) constructed lists may even be “infinite”.
For instance, the infinite list of 1's in Haskell:
#+begin_src haskell
ones = 1 :: ones
#+end_src
** Associative array (hash, map, table) types
Associative arrays, also called /hashes/, /maps/ or sometimes /tables/,
are sets of key/value pairs.
- Abstracts away the ordering of the sequence somewhat.
- Though we could order the keys, and so impose
an order on the collection.
- Programmer can imagine they are lists of key/value pairs.
A quick side note about sets and bags
- It is notoriously difficult to represent
such unordered collections on computers,
- Computers are extremely ordered machines.
- When available, “set types” are usually
implemented using trees or hash tables.
*** Exercise: Objects as associative arrays
Consider:
- How can we represent objects as a homogeneous array?
(Not classes, objects).
** String types
Strings are simply sequences of characters.
- Often they are built in,
- but they could be excluded because programmer
can implement them using characters and other sequence types.
- Probably they should be built in, or at least part
of standard libraries, since they are so commonly used.
* Product types
A /heterogeneous/ collection of a /fixed/ number of elements.
- Implemented by, for instance:
- ~struct~'s
- Records
- Tuples
- Can be implemented as records with unnamed fields
- Classes
- In lower level languages, programmers may be concerned
with the alignment/packing of the data.
** Exercise: how many elements in a product?
Consider two types /A/ and /B/.
- How many elements are in the product type
/A × B/ (a product with one element of each).
** Exercise: classes as a product type
Consider:
- How can we represent classes as a record or tuple?
(Not objects, classes).
* Union (sum) types
Whereas an element of a product type contains
- a collection of elements of some types,
a /union/ type contains
- one element of a selection of types.
Unions can be /tagged/ or /untagged/.
- With an untagged union, we have no idea /which/ of the possible
types it is storing at any time.
- So it's type is dynamic! (Amongst the types it can store).
- This is unsafe; it allows for type clashes!
- Whereas a tagged union keeps a /tag/ identifying which type
of element it is currently storing.
Tagged unions are also known as
- /sum/, /option/ and /either/ types.
Note that union types are unnecessary in a dynamically typed language.
* Pointer and reference types
Pointer and reference types capture the notion of a
memory address.
- Not just alternate namings! They have very different properties.
- Addresses are an extremely low level construct.
- Pointers are fairly low level themselves.
- References are more abstract.
- Present numerous challenges.
- Aliasing
- Especially through pointer arithmetic; given a pointer
as an “address”, we can access “nearby” memory!
- /Dangling/ or /wild/ pointers/references
- Garbage
* Where do we go from here?
- We will continue to discuss types; specifically,
- the concept of an /abstract/ datatype,
- /inductive/ (or algebraic, or recursive) datatypes,
- along with some introductory /type theory/,
- more details about type implementations,
- especially arrays.