lines
In [Bird-Wadler-1988], the function
> lines :: String -> [String]which decomposes text into lines not containing the newline character
'\n'
is specified as one-sided inverse to the function
> unlines :: [String] -> String > unlines = foldr1 concLinewith the following auxiliary function:
> concLine :: String -> String -> String > concLine xs ys = xs ++ '\n' : ysFrom this specification, a constructive definition of
lines
is then derived.
Most probably since nowadays it is considered bad style
to end a file with anything but a newline character,
the prelude definition of unlines
now is the following:
> unlines = concatMap (++ "\n")In GHC and HBC this is implemented in the following way:
unlines [] = [] unlines (l:ls) = l ++ '\n' : unlines lsThis is equivalent to the following version that we shall stick to in the sequel:
> unlines :: [String] -> String > unlines = foldr concLine "" > concLine xs ys = xs ++ '\n' : ysThis reduces the difference between the version of [Bird-Wadler-1988] and that of the prelude to the
foldr1
having been replaced by a foldr
,
and thus corresponds to exercise 4.3.3 in
[Bird-Wadler-1988], which we now solve:
We start from the original specification for lines
:
< lines (unlines xss) = xss -- (spec)and adhere to the assumption that we can express
lines
via foldr
:
> lines :: String -> [String] > lines = foldr f ewith suitably chosen auxiliary values
> e :: [String] > f :: Char -> [String] -> [String]We now just redo the development of [Bird-Wadler-1988], so we first derive the following laws about
lines
and
unlines
from the definition of foldr
:
< unlines [] = [] -- (unlines.1) < unlines (xs : xss) = concLine xs (unlines xss) -- (unlines.2) < lines [] = e -- (lines.1) < lines (x : xs) = f x (lines xs) -- (lines.2)It is straightforward to derive
e
:
< e = lines [] -- (lines.1) < = lines (unlines []) -- (unlines.1) < = [] -- (spec)For
f
, we initially obtain:
< f x xss = f x (lines (unlines xss)) -- (spec) < = lines (x : unlines xss) -- (lines.2)Performing a case distinction on
x
we first get:
< f '\n' xss = lines ('\n' : unlines xss) < = lines ("" ++ '\n' : unlines xss) -- (++) < = lines (concLine "" (unlines xss)) -- (concLine) < = lines (unlines ("" : xss)) -- (unlines.2) < = "" : xss -- (spec)For the case that
x /= '\n'
we again have to distinguish two cases,
let us first assume that xss
is non-empty,
i.e., that xss = ys:yss
for some ys :: String
and yss :: [String]
:
< f x xss = lines (x : unlines (ys:yss)) < = lines (x : concLine ys (unlines yss)) -- (unlines.2) < = lines (x : (ys ++ '\n' : unlines yss)) -- (concLine) < = lines ((x:ys) ++ '\n' : unlines yss) -- (++) < = lines (concLine (x:ys) ++ '\n' : unlines yss) -- (concLine) < = lines (unlines ((x:ys):yss)) -- (unlines.2) < = (x:ys) : yss -- (spec)In the case that
xss
is the empty list
we are in a dilemma since
non-empty strings not containing any newline characters
are outside the range of unlines
.
We therefore have to decide on a reasonable totalisation,
and we extend the specification with:
< lines xs = [xs] , if not (null xs) && '\n' `notElem` xs -- (spec2)This allows us to finish the calculation:
< f x xss = lines (x : unlines []) < = lines (x : []) -- (unlines.1) < = lines [x] < = [[x]] -- (spec2)Altogether we have:
> lines = foldr f [] > where > f '\n' xss = "" : xss > f x [] = [[x]] > f x (ys:yss) = (x:ys) : yssComparing this with the equivalent prelude definition:
> lines :: String -> [String] > lines "" = [] > lines s = let (l, s') = break (== '\n') s > in l : case s' of > [] -> [] > (_:s'') -> lines s''we note that our newly derived definition for
lines
is not obviously more cryptic than the prelude definition;
it even turns out to be faster.
However, as pointed out by Koen Claessen, it is also less lazy, in that only complete lines are passed on.
Here are
interact (unlines . lines)
on a 1.4Mb source file,
with compiled Haskell, and
lines "Science\nis true.\nDon't be misled\nby facts."
with Haskell interpreted by Hugs,
and a term graph translation of a
joint Haskell script animated by HOPS
There are also HOPS reduction sequences in PostScript for SpanLines2:
and for
NewLines:
The animation PostScript files are not designed for printing!
Holding down the space key in GhostView might give you a usable animation.
Experiments have been conducted with the
following variants of lines
(those in italics are lazy within lines):
break (== '\n')
replaced by span (/= '\n')
span
modified to use an irrefutable pattern
span (/= '\n')
replaced by span ('\n' /=)
span (/= '\n')
partially evaluated
foldr
expanded
f
expanded
f
expanded
f
expanded
ghc-4.06 -O | hbc-0.9999.5b -O | HUGS | HOPS real user | real user | red. cells | red. .ps .ps.gz .pdf PrelLines: 1.813s 1.710s | 2.630s 2.430s | 1025 1579 | 479 SpanLines: 1.837s 1.700s | 2.610s 2.360s | 1105 1701 | SpanLines2: 1.827s 1.720s | 2.435s 2.180s | 1105 1701 | 307 7882kB 641kB 4035kB SpanLines3: 1.853s 1.660s | 2.659s 2.390s | 1063 1655 | SpanLines4: 1.763s 1.650s | 2.057s 1.920s | 1101 1727 | NewLines2a: 1.817s 1.660s | 2.229s 2.060s | 956 1543 NewLines2b: 1.745s 1.580s | 2.166s 1.990s | 917 1309 NewLines3a: 1.784s 1.640s | 2.303s 2.050s | 956 1582 NewLines3b: 1.700s 1.580s | 2.024s 1.850s | 959 1393 NewLines3c: 1.755s 1.620s | 2.176s 2.020s | 917 1348And second the variants that are not lazy inside lines:
NewLines: 1.328s 1.210s | 1.440s 1.280s | 839 1231 | 257 5316kB 328kB 2912kB NewLines2: 1.213s 1.070s | 1.373s 1.230s | 839 1231 | 255 NewLines3: 1.222s 1.080s | 1.290s 1.130s | 878 1312 AccumLines: 1.205s 1.060s | 1.275s 1.120s | 880 1316 AccumLines2: 1.196s 1.060s | 1.520s 1.360s | 841 1234
words
, too,
can be improved over the prelude version:
ghc-4.06 -O | hbc-0.9999.5b -O real user | real user PrelWords: 1.681s 1.550s | 2.379s 2.120s NewWords: 1.162s 1.110s | 1.430s 1.290s NewWords2: 1.216s 1.100s | 1.418s 1.260sHere (in
NewWords
)
we have used the following directly programmed version:
> words' :: String -> [String] > words' [] = [] > words' (x:xs) | Char.isSpace x = words1 xs > | otherwise = case words' xs of > [] -> [[x]] > (ys:yss) -> (x:ys):yss > where > words1 [] = [] > words1 xs@(y:ys) | Char.isSpace y = words1 ys > | otherwise = [] : words' xs
spanLines
; new timings
SpanLines2
, SpanLines3
, SpanLines4
, HUGS, and HOPS
NewLines3
, thanks to Ronny Wichers Schreur
AccumLines
from Simon Marlow
AccumLines2
, NewLines2a
NewLines2b
, NewLines3b
NewLines3a
, NewLines3c
;
all lazy lines
retimed