#+Title: 4 – Introduction to Types
#+Author: Mark Armstrong, PhD Candidate, McMaster University
#+Date: Fall, 2019
#+Description: The most fundamental abstraction in programming: types
#+Options: toc:nil
#+Reveal_root: https://cdn.jsdelivr.net/reveal.js/3.0.0/
#+Reveal_init_options: width:1600, height:900, controlsLayout:'edges',
#+Reveal_init_options: margin: 0.1, minScale:0.125, maxScale:5
#+Reveal_extra_css: local.css
#+html: <script src="https://cdnjs.cloudflare.com/ajax/libs/headjs/0.96/head.min.js"></script>

* LaTeX Header                                  :ignore:

#+LaTeX_header: \usepackage{amsthm}
#+LaTeX_header: \theoremstyle{definition}
#+LaTeX_header: \newtheorem{definition}{Definition}[section]

#+LaTeX_header: \newenvironment{NOTES}{
#+LaTeX_header:   \begin{center}
#+LaTeX_header:   \begin{tabular}
#+LaTeX_header:   {|p{0.9\textwidth}|}
#+LaTeX_header:   \hline\\
#+LaTeX_header:   \begin{scriptsize}
#+LaTeX_header: }{
#+LaTeX_header:   \end{scriptsize}
#+LaTeX_header:   \\\\\hline
#+LaTeX_header:   \end{tabular}
#+LaTeX_header:   \end{center}
#+LaTeX_header: }

#+LaTeX_header: \usepackage{unicode-math}
#+LaTeX_header: \usepackage{unicode}

* Preamble

** Notable references

- Concepts of Programming Languages, Sebesta, 10th ed.
  - Chapter 6 – Data Types

** Update history

- Nov. 12th ::
  - An error was corrected regarding product types; they are
    /heterogeneous/ collections, not /homogeneous/.
  - Slight cleanup; added references and table of contents.
- Oct. 3rd ::
  - Original version posted

** Table of contents

# The table of contents are added using org-reveal-manual-toc,
# and so must be updated upon changes or added last.

#+HTML: <font size="-1">
#+begin_scriptsize
- [[Preamble][Preamble]]
  - [[Notable references][Notable references]]
  - [[Update history][Update history]]
  - [[Table of contents][Table of contents]]
- [[What is a type?][What is a type?]]
  - [[But what are types?][But what are types?]]
  - [[The common methods to construct types][The common methods to construct types]]
  - [[Exercise: tuples vs lists][Exercise: tuples vs lists]]
- [[Classifications of languages based on types][Classifications of languages based on types]]
  - [[“Strong” and “weak” typing][“Strong” and “weak” typing]]
    - [[Exercise][Exercise]]
  - [[Explicit and implicit typing][Explicit and implicit typing]]
  - [[Static and dynamic typing][Static and dynamic typing]]
- [[Subtyping and polymorphism][Subtyping and polymorphism]]
  - [[Parametric polymorphism][Parametric polymorphism]]
  - [[Duck typing][Duck typing]]
- [[Primitive types][Primitive types]]
  - [[Uncommon basic types][Uncommon basic types]]
  - [[Ordinal types][Ordinal types]]
- [[Sequence types][Sequence types]]
  - [[Homogeneous or heterogeneous][Homogeneous or heterogeneous]]
  - [[Array types][Array types]]
  - [[List types][List types]]
  - [[Associative array (hash, map, table) types][Associative array (hash, map, table) types]]
    - [[Exercise: Objects as associative arrays][Exercise: Objects as associative arrays]]
  - [[String types][String types]]
- [[Product types][Product types]]
  - [[Exercise: how many elements in a product?][Exercise: how many elements in a product?]]
  - [[Exercise: classes as a product type][Exercise: classes as a product type]]
- [[Union (sum) types][Union (sum) types]]
- [[Pointer and reference types][Pointer and reference types]]
- [[Where do we go from here?][Where do we go from here?]]
#+end_scriptsize
#+HTML: </font>

* What is a type?

** Section introduction                        :ignore:

Types are the /most/ important abstraction in programming.
- Without types, the programmer deals entirely in bitstrings.
  - I.e., the programmer must keep a mental map of representation of data.

#+begin_quote
Types are the central organising principle of
the theory of programming languages.
Language features are manifestations of type structure.
The syntax of a language is governed
by the constructs that define its types,
and its semantics is determined by the interactions among those constructs.
The soundess of a language design — the absence of ill-defined programs —
follows naturally.

— Robert Harper, /Practical Foundations for Programming Languages/
#+end_quote

** But what are types?

A /type/ is simply a collection of values.
Recall that in set theory, a set is a collection of values.
- In fact, we will see when we begin using Agda
  that it uses the word ~Set~ instead of ~Type~.

You have seen previously many methods to construct sets:
- Listing their members (for finite sets).
- Giving the means of constructing elements,
  for instance by an inductive definition.
- Combining other sets.
  - Unions, /A ∪ B./
    - /Disjoint/ or /tagged/ unions, /A ⊎ B/.
  - Intersections, /A ∩ B/.
  - Products, /A × B/ and /Aⁿ/.
  - (Finite) sequences, /A*/, and infinite sequences /A^{ω}/.
  - Functions, /A → B/.
  - Others:
    - Differences, /A/B/.
    - Power sets, /℘ A/.

** The common methods to construct types

That list of methods for constructing sets contains a lot,
not all of which are necessarily practical in the context
of types in programming languages.

We will see that most languages include some
/primitive/, (also called /atomic/ or /basic/) types,
along with these ways to combine types
to form new types:
- Products
- Sequences
- Unions
- Functions

** Exercise: tuples vs lists

What is the difference between
- /Aⁿ/, the set of /n/-ary products of /A/ (i.e., /A × A × … × A/), and
- /A*/, the set of sequences of /A/?

Why would we want both types in our language?

* Classifications of languages based on types

Before we discuss types themselves further, lets discuss various
classifications of languages based on types.

** “Strong” and “weak” typing

These are comparative terms.
- We'll consider them a subjective criteria.

“Strongly typed”
- Languages are frequently called strongly typed.
  - But less frequently do they state what they mean by that.
  - The term is used inconsistently.
    - “C is a strongly typed, weakly checked language”
      – Dennis Ritchie, creator of C
- We will take it to mean “type clashes are restricted”.
- Not a good objective criteria, by that definition.
  - What does restricted mean?
    - Is it a warning or an error?
    - Does type casting violate this?
  - What qualifies as a type clash?
    - Is implicit type casting allowed?

“Weakly typed” simply means not strongly typed.

*** Exercise

Try for yourself: pick a language “X”, and search for
#+begin_center
“is X strongly typed?”
#+end_center

Try to
- find arguments for and against it, and
- evaluate the quality of the arguments.
  - Of course your ability to do this depends on your familiarity
    with the language.

** Explicit and implicit typing

Languages may require annotations on variables and functions
(/explicit typing/) or allow them to be omitted (/implicit typing/).
- Implicit typing does not weaken the typing system in any way!
  - A very common misconception.
- In general, type inference is an undecidable problem
  (its not guaranteed that the compiler/interpreter can
  determine the type).
  - Most languages have relatively simple type systems,
    and this is not a problem.

Some languages make type annotations a part of the name,
or annotate names with sigils to indicate type details.
- In older versions of Fortran, names beginning with
  ~i~, ~j~ or ~k~ were for integer variables,
  and all variables were of floating point.
- In Perl, names beginning with the sigil
  - ~$~ have scalar type,
  - ~@~ have array type,
  - ~%~ have hash type, and
  - ~&~ have subroutine type.

** Static and dynamic typing

Are types checked before or at runtime?
- It's somewhat natural for interpreted languages to be dynamically typed.
  - Consider the interpreted “scripting” languages;
    Python, Ruby, Javascript, Lua, Perl, PHP, etc.
  - But it's far from universally true!
    - Haskell has an interpreter, and is definitely
      statically typed.

“Dynamically typed” is a misnomer.
- It would be better to say “dynamically type checked”.

* Subtyping and polymorphism

Being strict about types (preventing type clashes)
introduces a (solveable) problem; it prohibits code reuse!
- Subroutines can only be used on a particular type of arguments.

Subtyping and polymorphism provide solutions to this.
- With *subtyping*, there is a sub/super relation between types.
  - Consider the notion of /subsets/.
  - Subtypes can be used in place of their supertypes.
  - Sub-/classing/ is one instance of subtyping.
  - As are subrange or, sometimes, enumeration types.
- With *polymorphism*, a subroutine can have several types.

One notion of polymorphism is *ad hoc* polymorphism,
also called *overloading*.
- Not actually a feature of a type system.
- Subroutine names can be reused as long as the types of arguments
  differ (in some specified way).
- So, the programmer must define the “same” subroutine many times.

** Parametric polymorphism

With parametric polymorphism, subroutines have a
/most general/ type, based on the /shape/ or /form/ of their arguments.
- The subroutines behaviour can only be based on the form,
  not the specific types.
- Commonly used in functional languages.

** Duck typing

More formally called /row polymorphism/.
- Types are not actually checked.
- We only check that an entity has the correct method defined for it.
- “If it walks like a duck, and quacks like a duck, it's a duck!”

* Primitive types

Most languages include some subset of these
/primitive/, (/atomic/, /basic/) types.
- *Integers*; ~int~
  - Including possibly signed, unsigned, short, and/or long variants.
- *Floating point* numbers
  - Including possibly single precision and double precision variants.
- *Characters*
  - Sometimes an alternate name for the byte type (8-bit integers).
- *Booleans*
- *Unit* (the /singleton/ type)
  - Sometimes called void, nil-type, null-type or none-type.
    - In C like languages, you cannot store something of type ~void~.
    - Commonly represented as the type of 0-ary tuples,
      whose only element is ~()~.
- *Empty*
  - Unlike nil, null or none type, which have a single value
    (called ~nil~, ~null~ or ~none~), there is *nothing* in the empty type.

** Uncommon basic types

A few languages include these basic types.
- *Complex* numbers
  - Especially for scientific computation.
- *Decimal* (representation of) numbers
  - Especially for business (monetary) applications.
  - There are decimal numbers that cannot be properly represented
    using binary (e.g. 0.3 = 0.010011, repeating)

** Ordinal types

Many languages include a means of defining other /finite/ types.
Instances include
- Enumeration types
- Subset/Subrange types

* Sequence types

Recall from set theory that
- sets are collections of elements where
  order and multiplicity “don't matter”,
- bags are collections of elements where
  order “doesn't matter” (but multiplicity does), and
- sequences are collections of elements where
  order and multiplicity both matter.
- (This is part of the /boom/ hierarchy).

There are multiple ways sequence types are represented
in programming languages.

** Homogeneous or heterogeneous

One design decision for any sequence
(or more generally, any collection) type
is whether it is /homogeneous/ or /heterogeneous/.
- Can it store elements of differing types?
  - “Heterogeneous”
- Or only elements of the same type?
  - “Homogeneous”

** Array types

Arrays are an abstraction of finite sequences of
adjacent memory cells.
- Programmers are guaranteed certain properties.
  - ~O(1)~ access time for any element.
  - May be computationally costly or impossible to modify length.
    - There are different classifications of arrays.
    - “Array lists” alleviate this problem.
    - We'll discuss this more later in the course.
  - The programmer may have to handle memory allocation.

** List types

Lists are simply an abstract notion of sequences.
- May be implemented by arrays or by structures such as linked lists.
- Implementation details and properties may not be guaranteed.
  - Often not ~O(1)~ access time.

Lazily (non-strictly) constructed lists may even be “infinite”.
For instance, the infinite list of 1's in Haskell:
#+begin_src haskell
ones = 1 :: ones
#+end_src

** Associative array (hash, map, table) types

Associative arrays, also called /hashes/, /maps/ or sometimes /tables/,
are sets of key/value pairs.
- Abstracts away the ordering of the sequence somewhat.
  - Though we could order the keys, and so impose
    an order on the collection.
- Programmer can imagine they are lists of key/value pairs.

A quick side note about sets and bags
- It is notoriously difficult to represent
  such unordered collections on computers,
  - Computers are extremely ordered machines.
- When available, “set types” are usually
  implemented using trees or hash tables.

*** Exercise: Objects as associative arrays

Consider:
- How can we represent objects as a homogeneous array?
  (Not classes, objects).

** String types

Strings are simply sequences of characters.
- Often they are built in,
  - but they could be excluded because programmer
    can implement them using characters and other sequence types.
- Probably they should be built in, or at least part
  of standard libraries, since they are so commonly used.

* Product types

A /heterogeneous/ collection of a /fixed/ number of elements.
- Implemented by, for instance:
  - ~struct~'s
  - Records
  - Tuples
    - Can be implemented as records with unnamed fields
  - Classes
- In lower level languages, programmers may be concerned
  with the alignment/packing of the data.

** Exercise: how many elements in a product?

Consider two types /A/ and /B/.
- How many elements are in the product type
  /A × B/ (a product with one element of each).

** Exercise: classes as a product type

Consider:
- How can we represent classes as a record or tuple?
  (Not objects, classes).

* Union (sum) types

Whereas an element of a product type contains
- a collection of elements of some types,
a /union/ type contains
- one element of a selection of types.

Unions can be /tagged/ or /untagged/.
- With an untagged union, we have no idea /which/ of the possible
  types it is storing at any time.
  - So it's type is dynamic! (Amongst the types it can store).
  - This is unsafe; it allows for type clashes!
- Whereas a tagged union keeps a /tag/ identifying which type
  of element it is currently storing.

Tagged unions are also known as
- /sum/, /option/ and /either/ types.

Note that union types are unnecessary in a dynamically typed language.

* Pointer and reference types

Pointer and reference types capture the notion of a
memory address.
- Not just alternate namings! They have very different properties.
- Addresses are an extremely low level construct.
  - Pointers are fairly low level themselves.
  - References are more abstract.
- Present numerous challenges.
  - Aliasing
    - Especially through pointer arithmetic; given a pointer
      as an “address”, we can access “nearby” memory!
  - /Dangling/ or /wild/ pointers/references
  - Garbage

* Where do we go from here?

- We will continue to discuss types; specifically,
  - the concept of an /abstract/ datatype,
  - /inductive/ (or algebraic, or recursive) datatypes,
    - along with some introductory /type theory/,
  - more details about type implementations,
    - especially arrays.