Deriving the Function `lines`

What can be Found in the Textbook?

In [Bird-Wadler-1988], the function

> lines :: String -> [String]

which decomposes text into lines not containing the newline character '\n' is specified as one-sided inverse to the function

> unlines :: [String] -> String
> unlines = foldr1 concLine

with the following auxiliary function:

> concLine :: String -> String -> String
> concLine xs ys = xs ++ '\n' : ys

From this specification, a constructive definition of lines is then derived.

Reality is Different

Most probably since nowadays it is considered bad style to end a file with anything but a newline character, the prelude definition of unlines now is the following:

> unlines    = concatMap (++ "\n")

In GHC and HBC this is implemented in the following way:

unlines [] = []
unlines (l:ls) = l ++ '\n' : unlines ls

This is equivalent to the following version that we shall stick to in the sequel:

> unlines   :: [String] -> String
> unlines = foldr concLine ""
> concLine xs ys = xs ++ '\n' : ys

This reduces the difference between the version of [Bird-Wadler-1988] and that of the prelude to the foldr1 having been replaced by a foldr, and thus corresponds to exercise 4.3.3 in [Bird-Wadler-1988], which we now solve: We start from the original specification for lines:

< lines (unlines xss) = xss              -- (spec)

and adhere to the assumption that we can express lines via foldr:

> lines :: String -> [String]
> lines = foldr f e

with suitably chosen auxiliary values

> e :: [String]
> f :: Char -> [String] -> [String]

We now just redo the development of [Bird-Wadler-1988], so we first derive the following laws about lines and unlines from the definition of foldr:

< unlines [] = []                                    -- (unlines.1)
< unlines (xs : xss) = concLine xs (unlines xss)     -- (unlines.2)

< lines [] = e                                       -- (lines.1)
< lines (x : xs) = f x (lines xs)                    -- (lines.2)

It is straightforward to derive e:

< e = lines []                                       -- (lines.1)
<   = lines (unlines [])                             -- (unlines.1)
<   = []                                             -- (spec)

For f, we initially obtain:

< f x xss = f x (lines (unlines xss))                -- (spec)
<         = lines (x : unlines xss)                  -- (lines.2)

Performing a case distinction on x we first get:

< f '\n' xss = lines ('\n' : unlines xss)
<            = lines ("" ++ '\n' : unlines xss)      -- (++)
<            = lines (concLine "" (unlines xss))     -- (concLine)
<            = lines (unlines ("" : xss))            -- (unlines.2)
<            = "" : xss                              -- (spec)

For the case that x /= '\n' we again have to distinguish two cases, let us first assume that xss is non-empty, i.e., that xss = ys:yss for some ys :: String and yss :: [String]:

< f x xss = lines (x : unlines (ys:yss))
<         = lines (x : concLine ys (unlines yss))         -- (unlines.2)
<         = lines (x : (ys ++ '\n' : unlines yss))        -- (concLine)
<         = lines ((x:ys) ++ '\n' : unlines yss)          -- (++)
<         = lines (concLine (x:ys) ++ '\n' : unlines yss) -- (concLine)
<         = lines (unlines ((x:ys):yss))                  -- (unlines.2)
<         = (x:ys) : yss                                  -- (spec)

In the case that xss is the empty list we are in a dilemma since non-empty strings not containing any newline characters are outside the range of unlines. We therefore have to decide on a reasonable totalisation, and we extend the specification with:

< lines xs = [xs]  , if  not (null xs) && '\n' `notElem` xs      -- (spec2)

This allows us to finish the calculation:

< f x xss = lines (x : unlines [])
<         = lines (x : [])                    -- (unlines.1)
<         = lines [x]
<         = [[x]]                             -- (spec2)

Altogether we have:

> lines = foldr f []
>  where
>   f '\n' xss = "" : xss
>   f x [] = [[x]]
>   f x (ys:yss) = (x:ys) : yss

Comparing this with the equivalent prelude definition:

> lines	  :: String -> [String]
> lines "" = []
> lines s  = let (l, s') = break (== '\n') s
>            in  l : case s' of
>                     []      -> []
>                     (_:s'') -> lines s''

we note that our newly derived definition for lines is not obviously more cryptic than the prelude definition; it even turns out to be faster.

However, as pointed out by Koen Claessen, it is also less lazy, in that only complete lines are passed on.

Here are

the best running times (out of ten runs) of interact (unlines . lines) on a 1.4Mb source file, with compiled Haskell, and
reduction counts for lines "Science\nis true.\nDon't be misled\nby facts." with Haskell interpreted by Hugs, and a term graph translation of a joint Haskell script animated by HOPS

There are also HOPS reduction sequences in PostScript for SpanLines2: and for NewLines:
The animation PostScript files are not designed for printing! Holding down the space key in GhostView might give you a usable animation.

Experiments have been conducted with the following variants of lines (those in italics are lazy within lines):

PrelLines: the prelude definition with built-in implementation
SpanLines: the prelude definition with break (== '\n') replaced by span (/= '\n')
SpanLines2: as before, but with span modified to use an irrefutable pattern
SpanLines3: as SpanLines2, but with span (/= '\n') replaced by span ('\n' /=)
SpanLines4: as SpanLines3, but with span (/= '\n') partially evaluated
NewLines: the definition derived above
NewLines2: our definition with foldr expanded
NewLines2a: as NewLines2, but lazified
NewLines2b: as NewLines2, but lazified without pairs
NewLines3: as NewLines2, additionally with f expanded
NewLines3a: as NewLines2a, additionally with f expanded
NewLines3b: as NewLines2b, additionally with f expanded
NewLines3c: as NewLines3b, but with character literal pattern reinstituted instead of equality test, yielding a more accurate expansion of NewLines2b, since character literal patterns are treated as constructors, at least by GHC
AccumLines: an accumulating variant provided by Simon Marlow (with correction by Koen Claessen)
AccumLines2: as AccumLines, but with functional accumulator

First the fully lazy variants:

              ghc-4.06 -O    | hbc-0.9999.5b -O |     HUGS     |  HOPS
              real    user   |   real    user   |  red. cells  |  red.  .ps   .ps.gz   .pdf
PrelLines:   1.813s  1.710s  |  2.630s  2.430s  |  1025  1579  |  479 
SpanLines:   1.837s  1.700s  |  2.610s  2.360s  |  1105  1701  |
SpanLines2:  1.827s  1.720s  |  2.435s  2.180s  |  1105  1701  |  307  7882kB  641kB  4035kB
SpanLines3:  1.853s  1.660s  |  2.659s  2.390s  |  1063  1655  | 
SpanLines4:  1.763s  1.650s  |  2.057s  1.920s  |  1101  1727  |
NewLines2a:  1.817s  1.660s  |  2.229s  2.060s  |   956  1543
NewLines2b:  1.745s  1.580s  |  2.166s  1.990s  |   917  1309
NewLines3a:  1.784s  1.640s  |  2.303s  2.050s  |   956  1582
NewLines3b:  1.700s  1.580s  |  2.024s  1.850s  |   959  1393
NewLines3c:  1.755s  1.620s  |  2.176s  2.020s  |   917  1348

And second the variants that are not lazy inside lines:

NewLines:    1.328s  1.210s  |  1.440s  1.280s  |   839  1231  |  257  5316kB  328kB  2912kB
NewLines2:   1.213s  1.070s  |  1.373s  1.230s  |   839  1231  |  255
NewLines3:   1.222s  1.080s  |  1.290s  1.130s  |   878  1312
AccumLines:  1.205s  1.060s  |  1.275s  1.120s  |   880  1316
AccumLines2: 1.196s  1.060s  |  1.520s  1.360s  |   841  1234

It then comes as no surprise that the function words, too, can be improved over the prelude version:

             ghc-4.06 -O    |  hbc-0.9999.5b -O
             real    user   |   real    user
PrelWords:  1.681s  1.550s  |  2.379s  2.120s
NewWords:   1.162s  1.110s  |  1.430s  1.290s
NewWords2:  1.216s  1.100s  |  1.418s  1.260s

Here (in NewWords) we have used the following directly programmed version:

> words' :: String -> [String]
> words' [] = []
> words' (x:xs) | Char.isSpace x  =  words1 xs
>               | otherwise       =  case words' xs of
>                                      []  ->  [[x]]
>                                      (ys:yss) -> (x:ys):yss
>  where
>   words1 [] = []
>   words1 xs@(y:ys) | Char.isSpace y  =  words1 ys
>                    | otherwise       =  [] : words' xs

1 May 2000
8 May 2000: added spanLines; new timings
10 May 2000: added SpanLines2, SpanLines3, SpanLines4, HUGS, and HOPS
11 May 2000: corrected typo in NewLines3, thanks to Ronny Wichers Schreur
added AccumLines from Simon Marlow
added AccumLines2, NewLines2a
12 May 2000: added NewLines2b, NewLines3b
16 May 2000: added NewLines3a, NewLines3c; all lazy lines retimed

Wolfram Kahl

Deriving the Function lines

What can be Found in the Textbook?

Reality is Different

Deriving the Function `lines`