`lines`

In [Bird-Wadler-1988], the function

> lines :: String -> [String]which decomposes text into lines not containing the newline character

`'\n'`

is specified as one-sided inverse to the function
> unlines :: [String] -> String > unlines = foldr1 concLinewith the following auxiliary function:

> concLine :: String -> String -> String > concLine xs ys = xs ++ '\n' : ysFrom this specification, a constructive definition of

`lines`

is then derived.
Most probably since nowadays it is considered bad style
to end a file with anything but a newline character,
the prelude definition of `unlines`

now is the following:

> unlines = concatMap (++ "\n")In GHC and HBC this is implemented in the following way:

unlines [] = [] unlines (l:ls) = l ++ '\n' : unlines lsThis is equivalent to the following version that we shall stick to in the sequel:

> unlines :: [String] -> String > unlines = foldr concLine "" > concLine xs ys = xs ++ '\n' : ysThis reduces the difference between the version of [Bird-Wadler-1988] and that of the prelude to the

`foldr1`

having been replaced by a `foldr`

,
and thus corresponds to exercise 4.3.3 in
[Bird-Wadler-1988], which we now solve:
We start from the original specification for `lines`

:
< lines (unlines xss) = xss -- (spec)and adhere to the assumption that we can express

`lines`

via `foldr`

:
> lines :: String -> [String] > lines = foldr f ewith suitably chosen auxiliary values

> e :: [String] > f :: Char -> [String] -> [String]We now just redo the development of [Bird-Wadler-1988], so we first derive the following laws about

`lines`

and
`unlines`

from the definition of `foldr`

:
< unlines [] = [] -- (unlines.1) < unlines (xs : xss) = concLine xs (unlines xss) -- (unlines.2) < lines [] = e -- (lines.1) < lines (x : xs) = f x (lines xs) -- (lines.2)It is straightforward to derive

`e`

:
< e = lines [] -- (lines.1) < = lines (unlines []) -- (unlines.1) < = [] -- (spec)For

`f`

, we initially obtain:
< f x xss = f x (lines (unlines xss)) -- (spec) < = lines (x : unlines xss) -- (lines.2)Performing a case distinction on

`x`

we first get:
< f '\n' xss = lines ('\n' : unlines xss) < = lines ("" ++ '\n' : unlines xss) -- (++) < = lines (concLine "" (unlines xss)) -- (concLine) < = lines (unlines ("" : xss)) -- (unlines.2) < = "" : xss -- (spec)For the case that

`x /= '\n'`

we again have to distinguish two cases,
let us first assume that `xss`

is non-empty,
i.e., that `xss = ys:yss`

for some `ys :: String`

and `yss :: [String]`

:
< f x xss = lines (x : unlines (ys:yss)) < = lines (x : concLine ys (unlines yss)) -- (unlines.2) < = lines (x : (ys ++ '\n' : unlines yss)) -- (concLine) < = lines ((x:ys) ++ '\n' : unlines yss) -- (++) < = lines (concLine (x:ys) ++ '\n' : unlines yss) -- (concLine) < = lines (unlines ((x:ys):yss)) -- (unlines.2) < = (x:ys) : yss -- (spec)In the case that

`xss`

is the empty list
we are in a dilemma since
non-empty strings not containing any newline characters
are outside the range of `unlines`

.
We therefore have to decide on a reasonable totalisation,
and we extend the specification with:
< lines xs = [xs] , if not (null xs) && '\n' `notElem` xs -- (spec2)This allows us to finish the calculation:

< f x xss = lines (x : unlines []) < = lines (x : []) -- (unlines.1) < = lines [x] < = [[x]] -- (spec2)Altogether we have:

> lines = foldr f [] > where > f '\n' xss = "" : xss > f x [] = [[x]] > f x (ys:yss) = (x:ys) : yssComparing this with the equivalent prelude definition:

> lines :: String -> [String] > lines "" = [] > lines s = let (l, s') = break (== '\n') s > in l : case s' of > [] -> [] > (_:s'') -> lines s''we note that our newly derived definition for

`lines`

is not obviously more cryptic than the prelude definition;
it even turns out to be faster.
However, as pointed out by Koen Claessen, it is also less lazy, in that only complete lines are passed on.

Here are

- the best running times (out of ten runs) of
`interact (unlines . lines)`

on a 1.4Mb source file, with compiled Haskell, and - reduction counts for
`lines "Science\nis true.\nDon't be misled\nby facts."`

with Haskell interpreted by Hugs, and a term graph translation of a joint Haskell script animated by HOPS

There are also HOPS reduction sequences in PostScript for *SpanLines2*:
and for
NewLines:

**The animation PostScript files are not designed for printing!**
Holding down the space key in GhostView might give you a usable animation.

Experiments have been conducted with the
following variants of `lines`

(those in italics are lazy within lines):

*PrelLines*: the prelude definition with built-in implementation*SpanLines*: the prelude definition with`break (== '\n')`

replaced by`span (/= '\n')`

*SpanLines2*: as before, but with`span`

modified to use an irrefutable pattern*SpanLines3*: as SpanLines2, but with`span (/= '\n')`

replaced by`span ('\n' /=)`

*SpanLines4*: as SpanLines3, but with`span (/= '\n')`

partially evaluated- NewLines: the definition derived above
- NewLines2: our definition with
`foldr`

expanded *NewLines2a*: as NewLines2, but lazified*NewLines2b*: as NewLines2, but lazified without pairs- NewLines3: as NewLines2, additionally with
`f`

expanded *NewLines3a*: as NewLines2a, additionally with`f`

expanded*NewLines3b*: as NewLines2b, additionally with`f`

expanded*NewLines3c*: as NewLines3b, but with character literal pattern reinstituted instead of equality test, yielding a more accurate expansion of NewLines2b, since character literal patterns are treated as constructors, at least by GHC- AccumLines: an accumulating variant provided by Simon Marlow (with correction by Koen Claessen)
- AccumLines2: as AccumLines, but with functional accumulator

ghc-4.06 -O | hbc-0.9999.5b -O | HUGS | HOPS real user | real user | red. cells | red. .ps .ps.gz .pdfAnd second the variants that are not lazy inside lines:PrelLines: 1.813s 1.710s | 2.630s 2.430s | 1025 1579 | 479SpanLines: 1.837s 1.700s | 2.610s 2.360s | 1105 1701 |SpanLines2: 1.827s 1.720s | 2.435s 2.180s | 1105 1701 | 307 7882kB 641kB 4035kBSpanLines3: 1.853s 1.660s | 2.659s 2.390s | 1063 1655 |SpanLines4: 1.763s 1.650s | 2.057s 1.920s | 1101 1727 |NewLines2a: 1.817s 1.660s | 2.229s 2.060s | 956 1543NewLines2b: 1.745s 1.580s | 2.166s 1.990s | 917 1309NewLines3a: 1.784s 1.640s | 2.303s 2.050s | 956 1582NewLines3b: 1.700s 1.580s | 2.024s 1.850s | 959 1393NewLines3c: 1.755s 1.620s | 2.176s 2.020s | 917 1348

NewLines: 1.328s 1.210s | 1.440s 1.280s | 839 1231 | 257 5316kB 328kB 2912kB NewLines2: 1.213s 1.070s | 1.373s 1.230s | 839 1231 | 255 NewLines3: 1.222s 1.080s | 1.290s 1.130s | 878 1312 AccumLines: 1.205s 1.060s | 1.275s 1.120s | 880 1316 AccumLines2: 1.196s 1.060s | 1.520s 1.360s | 841 1234

It then comes as no surprise that the function

`words`

, too,
can be improved over the prelude version:
ghc-4.06 -O | hbc-0.9999.5b -O real user | real user PrelWords: 1.681s 1.550s | 2.379s 2.120s NewWords: 1.162s 1.110s | 1.430s 1.290s NewWords2: 1.216s 1.100s | 1.418s 1.260sHere (in

`NewWords`

)
we have used the following directly programmed version:
> words' :: String -> [String] > words' [] = [] > words' (x:xs) | Char.isSpace x = words1 xs > | otherwise = case words' xs of > [] -> [[x]] > (ys:yss) -> (x:ys):yss > where > words1 [] = [] > words1 xs@(y:ys) | Char.isSpace y = words1 ys > | otherwise = [] : words' xs

1 May 2000

8 May 2000: added

`spanLines`

; new timings
10 May 2000: added

`SpanLines2`

, `SpanLines3`

, `SpanLines4`

, HUGS, and HOPS
11 May 2000: corrected typo in

`NewLines3`

, thanks to Ronny Wichers Schreur
added

`AccumLines`

from Simon Marlow
added

`AccumLines2`

, `NewLines2a`

12 May 2000: added

`NewLines2b`

, `NewLines3b`

16 May 2000: added

`NewLines3a`

, `NewLines3c`

;
all lazy `lines`

retimed
Wolfram Kahl