\documentclass[letterpaper,twoside,11pt]{article}
\usepackage{anysize}

% lhs2TeX --poly src/ExprParse.lhs > ExprParse.ltx; pdflatex ExprParse.ltx

\parindent0pt\parskip0.6ex % Better for literate code.

%include polycode.fmt

\title{CS 3EA3: Example Haskell Code for Recursive Descent Parsing}

\author{Wolfram Kahl}

\date{\today}

\usepackage{hyperref}
\begin{document}
\maketitle

The module |Data.Char| provides us with the functions |isDigit|, |isLetter|, and |ord :: Char -> Int|.

\begin{code}
module ExprParse where

import Data.Char
\end{code}

An extremely simple expression datatype (abstract syntax):

\begin{code}
data Expr
  = Var String
  | Num Integer
  | Add Expr Expr
  | Mul Expr Expr
\end{code}

For the concrete syntax, we encode the different precedence levels
into separate nonterminals to achieve a context-free grammar
that allows recursive descent parsing:

\begin{verbatim}
Expr    ::= Term + Expr     |   Term
Term    ::= Factor * Term   |   Factor
Factor  ::= Number          |   Identifier   |    ( Expr )
\end{verbatim}

The precedence levels can easil be accommodated by the unparsing function%
\footnote{Idiomatic Haskell would define the more efficient
|showsPrecExpr :: Int -> Expr -> String -> String|.
}%footnote
;
whether parentheses are added is decided depending on the precedence
argument, which stands for the precedence of the \emph{surrounding}
constructor.
Adding parentheses uses an auxiliary function |paren|:
\begin{code}
paren :: String -> String
paren s = '(' : s ++ ")"
\end{code}

\begin{code}
showPrecExpr :: Int -> Expr -> String
showPrecExpr p (Var s) = s
showPrecExpr p (Num k) = show k
showPrecExpr p (Add e1 e2) = (if p > 4 then paren else id)
   (showPrecExpr 4 e1 ++ " + " ++ showPrecExpr 4 e2)
showPrecExpr p (Mul e1 e2) = (if p > 5 then paren else id)
   (showPrecExpr 5 e1 ++ " * " ++ showPrecExpr 5 e2)
\end{code}

For installing this as standard |show| function for |Expr| values,
we choose lowest precedence by default:

\begin{code}
instance Show Expr where show = showPrecExpr 0
\end{code}

For recursive descent parsing
see also \url{http://en.wikipedia.org/wiki/Recursive_descent};
for an extremely ``low-tech'' Haskell version
we use a parser type that takes a |String| as argument,
and returns a pair consisting of a parsed |Expr|,
and the not-yet-parsed rest of the |String|:

\begin{code}
parseExpr, parseTerm, parseFactor :: String -> (Expr, String)
\end{code}

This is a very simple choice;
it forces us to use Haskell run-time errors
for reacting to parse errors.

The grammar rules can now straight-forwardly be translated into
Haskell, always continuing to work on the rest strings:

\begin{code}
parseExpr cs = let (t, cs') = parseTerm cs in case cs' of
   '+' : cs'' ->  let (e, cs''') = parseExpr cs'' in (Add t e, cs''')
   _ -> (t, cs')
\end{code}

\begin{code}
parseTerm cs = let (f, cs') = parseFactor cs in case cs' of
   '*' : cs'' ->  let (t, cs''') = parseTerm cs'' in (Mul f t, cs''')
   _ -> (f, cs')
\end{code}

\begin{code}
parseFactor ('(' : cs) = let (e, cs') = parseExpr cs in case cs' of
                            ')' : cs'' -> (e, cs'')
                            _ -> error "closing parenthesis expected"
parseFactor (c : cs)
  | isLetter c  =  let (ident, cs') = parseIdentRest [c] cs
                   in (Var ident, cs')
  | isDigit c   =  let (num, cs') = parseNumberRest (numFromDigit c) cs
                   in (Num num, cs')
parseFactor cs  =  error ("factor expected; " ++ show cs ++ " found")
\end{code}

Since we allow multi-letter dentifiers and multi-digit numbers,
we use auxiliary parser to complete them
after we recognised the first character:

\begin{code}
parseIdentRest :: String -> String -> (String, String)
parseIdentRest s (c : cs) | isLetter c  =  parseIdentRest (s ++ [c]) cs
parseIdentRest s cs = (s, cs)

parseNumberRest :: Integer -> String -> (Integer, String)
parseNumberRest n (c : cs)
   | isDigit c = parseNumberRest (10 * n + numFromDigit c) cs
parseNumberRest n cs = (n, cs)

numFromDigit :: Char -> Integer
numFromDigit c = toInteger (ord c - ord '0')
\end{code}

\end{document}