Notes on "Haskell Programming – from first principles"
From November, 13th 2017 to June, 9th 2018, a friend and I were working our way through the 1285 pages of “Haskell Programming – from first principles” by Christopher Allen and Julie Moronuki. That’s more than six pages per day! While reading and discussing, I took a few notes here and there, which I want to publish in this post. Some of the sentences are directly taken from the book, which I highly recommend to anyone who wants to learn Haskell, by the way.
Table of Contents
- Introduction
- Getting Started
- Strings
- Basic Data Types
- Types
- Typeclasses
- Functional Patterns
- Recursion
- Lists
- Folding Lists
- Algebraic Data Types
- Signaling Adversity
- Building Projects
- Testing
- Monoid and Semigroup
- Functor
- Applicative
- Monad
- Applying Structure
- Foldable
- Traversable
- Reader
- State
- Parser Combinators
- Composing Types
- Monad Transformers
- Nonstrictness
- Basic Libraries
- IO
- Error Handling
- Quotes
1 Introduction
A function maps from its domain to its image (which is a subset of the co-domain). Each input is invariably mapped to exactly one output.
In lambda calculus an abstraction is an anonymous function. It consists of head and body, for example λx.x. The head binds the parameter(s) to the body of the function.
The lambdas λx.x and λy.y are alpha equivalent.
Beta reduction is the process of replacing all occurrences of a parameter with a value or a function; for example (λx.x+x)1 becomes 1 or (λx.x)(λa.2a) turns into (λa.2a).
If a variable occurs in a function’s body, but not in the head, it is referred to as a free variable. Lambdas with multiple arguments such as λxy.xy are a shorthand for multiple nested lambdas λx.(λy.xy).
Combinators are lambda terms with no free variables.
Lambda terms can diverge if evaluation does not terminate. For example λx.xx diverges if applied to itself. Evaluation happens in normal order, i.e. outer-most and left-most terms get evaluated first.
Notes on syntax: λab.a(b) means that b will be applied to a on evaluation (if possible). However, (λa.λb.a)b evaluates to λb.b’.
2 Getting Started
Prelude is a library of standard types, classes, and functions, such as pi
, Bool
, Monad
, map
. Haskell files can be loaded to GHCi REPL using :load file.hs
. All compiler warnings can be enabled with -Wall
(or equivalently {-# OPTIONS_GHC -Wall #-}
).
An expression is in normal form, or irreducible, when there are no more evaluations steps that can be taken.
Every Haskell function is an expression that takes one argument. They always return a result. A definition may look like that: piTimesSquare x = pi * (x ^ 2)
. A function parameter stands for a value, while an argument is an actual value that is being passed on to the function. Functions are in prefix style by default.
Infix operators are functions that can be used in prefix fashion by wrapping them in parentheses: (+) 1 2
. The $
operator has the lowest possible precedence (0). The following example explains its usage: (5 *) $ 1 + 1
equals 5 * (1 + 1)
. The GHCi command info
provides signature and precedence information about functions.
An expression is a combination of symbols that conforms to syntactic rules and can be evaluated to some result.
A value is an expression that can not be evaluated any further. Haskell uses lazy evaluation, i.e. it only evaluates an expression when it is forced to by other terms which refer to the expression.
3 Strings
The GHCi command :type
prints the type of a variable / expression. a :: b
means that a
has the type b
.
String
is a type alias for [Char]
, i.e. a list of characters.
For outputting variables, print
can be used. putStr
and putStrLn
are also printing, however, they are restricted to the type String
.
The do
syntax allows for sequencing of actions, as shown below.
a :: String -- declaration with type
a = "a" -- value assignment
main :: IO ()
main = do
putStr a
putStrLn "b"
Strings can be concatenated with the infix operator ++
or the concat
function (e.g. concat ["a", "b"]
).
Functions and types can be defined globally (top level definitions) or locally (local definition); the scope is different. The were
and let
clauses are key to defining local functions or variables, as can be seen in the example below.
area d = pi * (r * r) -- top level
where r = d / 2 -- local
The :
operator builds a list: 'a' : "bc"
. The functions head
and tail
can be applied to strings in order to retrieve the first character (head) or everything but the first character (tail). A substring starting at index 0 can be retrieved using take
: take n string
. It will return a list containing the first n
elements of the list (which can be a String
). Contrary, drop
removes the first n
elements from a list.
-- sub list with length l of list x starting at index s
-- pointfree version: https://timodenk.com/blog/making-slice-pointfree/
slice :: Int -> Int -> [a] -> [a]
slice s l x = take l (drop s x)
4 Basic Data Types
A data type is a set of values with an abstract commonality. A data declaration defines a new data type. For example, the data type Bool
is defined with the following data declaration.
data Bool = False | True
Pattern matching is a feature of Haskell that allows multiple implementations of the same function. When calling the function, the implementation will be chosen depending on the argument. _
is called catch-all and will match any argument value.
Typeclasses is a polymorphic type that adds functionality (i.e. faculties or interfaces) to types that is reusable across all inheriting types. A type alias is a way of making a type available through a different name: type Name = Integer
.
The ordering typeclass Ord
enforces implementation of the following operators.
compare :: a -> a -> Ordering
(<) :: a -> a -> Bool
(<=) :: a -> a -> Bool
(>) :: a -> a -> Bool
(>=) :: a -> a -> Bool
max :: a -> a -> a
min :: a -> a -> a
Haskell’s inequality symbol is /=
. The equality typeclass Eq
requires the following.
(==) :: a -> a -> Bool
(/=) :: a -> a -> Bool
A typeclass constraint can be made for parameters with the following syntax (here for the function equality operator which requires both operands to implement Eq
): (==) :: Eq a => a -> a -> Bool
.
Variables in type signatures are commonly named according to the following rules: (1) Type variables are called a
, b
, …; (2) function variables are called f
, g
, … (3) Arguments to functions are often called x
, y
, and z
. (4) Lists of x
values are called xs
. (5) All names can also occur with numbers or the prime symbol appended to them, e.g. x1
or f'
.
4.1 Numbers
Numbers are inheriting from the typeclass Num
.
Int
. An integral number (aka. integer) with a fixed precision, that is it has upper and lower bound (size: 8 byte).GHC.Int
adds the integer typesInt8
,Int16
,Int32
, andInt64
, with the number indicating the number of bits used to store the value. The value range ofInt
is [-9223372036854775808, 9223372036854775807].Integer
. An integral number that supports arbitrarily large or small numbers.Float
. Single-precision floating point number (size: 4 byte).Double
. Double-precision floating point number (size: 8 byte).Rational
. Represents a fraction of two integer numbers. The data type wraps twoInteger
s and is hence arbitrarily precise.Scientific
. Floating point number with anInteger
base andInt
exponent. Therefore, the numbers can be arbitrarily large and precise. This data type is not part of GHC and must be installed separately (stack install scientific
).
The Integer
type should be preferred over Int
, and Scientific
and Rational
(typeclass Fractional
) should be preferred over Float
and Double
, unless computational efficiency matters.
4.2 Boolean
The boolean data type can either be True
or False
and is defined as data Bool = False | True
. Operators for booleans are &&
for and, ||
for or, and the function not
for inversion.
Haskell features if expressions with the following syntax: if <condition> then <a> else <b>
. The entire if expression evaluates to either <a>
or <b>
, depending on the condition.
4.3 Tuples
Tuples are types that store a fixed number n of constituents which may have different types themselves. n is referred to as arity (number of parameters that a function takes). A tuple can be created with its constructor, (,,) x1 x2 x3
, here with n=3. Tuples with n=1 must not exist, however, n=0 is possible and called unit ()
.
For convenience, the first element in a tuple can be accessed using fst :: (a, b) -> a
; snd
serves equally for the second value. Data.Tuple
contains the tuple manipulation functions curry
, uncurry
, and swap
.
A tuple can be unpacked when passed to a function with the following syntax: tupleSum (a, b) = a + b
4.4 Lists
The list data type stores n values of equal type, where n can be changed dynamically.
The n
-th element of a list can be accessed with the !!
operator (n
is zero based): "abc" !! n
.
5 Types
Type systems have been defined to enforce correctness. In Haskell, typing is static and typechecking occurs at compile time. A data type declaration defines a type constructor and data constructors. Haskell functions are created from the function type constructor ->
and the function is a value.
A function signature may have multiple typeclass constraints (Num a, Num b) => a -> b -> b
. In the example, a
could be an Integer
and both b
s could be Double
s. However, different types for the second argument and the return type would not be possible with this definition.
The =>
is called typeclass arrow. The right associative type constructor for functions ->
realizes currying: f :: a -> a -> a
is read as f :: a -> (a -> a)
. Due to currying, functions can be partially applied. Infix operators can be partially applied to a first or second parameter, e.g. (2^)
or (^2)
.
Polymorphism is the provision of a single interface to entities of different types. In Haskell it is either parametric or constrained (aka. bounded, ad-hoc). The former is polymorphism that accepts any type, whereas the latter accepts only some types. Multiple class constrains must be wrapped in parentheses: f :: (Eq a, Num b) => a -> b
. The opposite of polymorphism is monomorphism, in Haskell called concrete. Applied to variables, polymorphism is a property of variables which may refer to more than one concrete type.
Type inference is the process of determining a variables principle type by looking at the way it is being used. The principle type is the most generic type that can be assigned to a variable.
6 Typeclasses
Typeclasses generalize over a set of types in terms of consumption or usage in computation. After declaring a data type with the data Typename
keyword, typeclasses can be assigned with e.g. instance Typeclass Typename
. Typeclasses can inherit from a superclass (e.g. class Num a => Fractional a
).
A typeclass can be defined with the class
keyword. The Num
typeclass for example is defined as follows.
class Num a where
(+) :: a -> a -> a
(*) :: a -> a -> a
(-) :: a -> a -> a
negate :: a -> a
abs :: a -> a
signum :: a -> a
fromInteger :: Integer -> a
Types which implement the Integral
type are also required to implement Real
and Enum
.
class (Real a, Enum a) => Integral a
A typeclass is implemented for a type with instance
. The implementation is called instance and might look as follows.
data Suit = Spade | Diamond | Club | Heart
instance Eq Suit where
(==) Spade Spade = True
(==) Diamond Diamond = True
(==) Club Club = True
(==) Heart Heart = True
(==) _ _ = False
Typeclasses default to certain types. Num
defaults to Integer
for example default Num Integer
. This can be better explained given an example: When entering a 5
into GHCi, a show method must be called. :t 5
however gives Num
so it is left to Haskell to choose a show
method from the inheriting types. In this case Integer
is chosen by default.
Typeclass instances are unique pairings of a typeclass and a type.
Effects are observable actions programs may take, such as writing to a file or printing to the console. IO
is the type for values whose evaluation bears the possibility of causing side effects.
6.1 Derivable Typeclasses
The following typeclasses can be automatically derived. That means they can be automatically instantiated for a given type, based on how it is defined.
Bounded
. Types that have an upper and lower bound.Enum
. The type’s values can be enumerated. Provides methods such assucc
(successor; comparable to incrementing),pred
(predecessor),enumFromTo
, andenumFromThenTo
(which uses a step size based on the second argument).Eq
. The type’s values can be tested for equality.Ord
. The type’s values can be put into sequential order. ImpliesEq
and can be implemented by defining thecompare
method which returnsEQ
,LT
, orGT
.Read
. Values can be parsed from strings. It is often a partial function as it does not return a proper value for all possible inputs.Show
. Values can be converted to strings (e.g. for output). Enforces implementation ofshowsPrec
,show
, andshowList
. Printing things is possible in Haskell, even though it is purely functional, because theprint
method invokesIO
which has the side effect of outputting text. It returns the unit()
because it has no relevant return value.
6.2 Typeclass Inheritance
Inheritance structure of common typeclasses. Ord
inherits from Eq
. Real
inherits from Ord
and Num
. Fractional
inherits from Num
. Integral
inherits from Real
, Fractional
, and Enum
.
7 Functional Patterns
Inner variables can shadow outer variables, as can bee seen in the following function which always returns 5
: func x = let x = 5 in x
.
Anonymous functions (aka. lambdas) are functions which are not bound to an identifier and can be declared with this syntax: (\x -> x * 4) :: Num a => a -> a
. They are often used if a function is passed to another function with the former being needed only once. The signature can be omitted.
The signature of higher order functions contains functions itself. For example distributor :: (a -> b -> c) -> (a -> b) -> (a -> c)
takes two functions and returns a new one.
The guard syntax of Haskell allows to write compact functions with multiple outcomes depending on boolean conditions. Each line behind a pipe is called guard case. otherwise
is a constant that equals True
, i.e. the catch-all case.
clip :: (Num a, Ord a) => a -> a -> a -> a
clip min max x
| x < min = min
| x > max = max
| otherwise = x
Pointfree versions of functions drop arguments for the sake of readability and performance. For example, print a = (putStrLn . show) a
becomes print = putStrLn . show
.
Binding is the assignment of an argument to a parameter.
8 Recursion
A recursive function is defined in terms of itself. The base case ends the recursion, e.g. factorial of 0 is 1.
In Haskell, bottom is a non-value that is used to indicate that a function can not return a value. Possible reasons are errors, partial functions, or infinite recursion / loops.
An example for an elegantly formulated recursive function, performing an integral division:
dividedBy :: Integral a => a -> a -> (a, a)
dividedBy num denom = go num denom 0
where go n d count
| n < d = (count, n)
| otherwise = go (n - d) d (count + 1)
9 Lists
In Haskell, lists are (1) a collection of elements of the same type, or (2) an infinite series of values (i.e. stream).
The list definition is data [] a = [] | a : [a]
.
The initialization [1, 2, 3] ++ [4]
is syntactic sugar for (1 : 2 : 3 : []) ++ 4 : []
. The range syntax allows for the definition of sequences from n to m with [n..m]
and a step size of 1. It uses enumFromTo
behind the scenes; and enumFromThenTo
works for variable step sizes.
List comprehensions are a means of generating a new list from an existing list (or multiple lists). For instance [sqrt x | x <- [0..10], sqrt x < 3]
generates a list of square roots of the numbers from __ to 10, for the cases where the square root is smaller than 3. x <- [0..10]
is called generator. Multiple generators can be used to create a new list, e.g. [x*y | x <- [0..10], y <- [10..12]]
. In such a case, each element of the first list will be processed with every element of the second, and so forth.
In the case of a list, the spine is a linear succession of one cons cell wrapping another cons cell (1 : 2 : 3 : []
). Spines are evaluated independently of values. Here, the spine is the structure of a collection, i.e. not the values contained therein. Calling the length
function with a list does not necessarily lead to an evaluation of all values. The sprint
command (which is a GHCi feature, not part of the Haskell language) allows you to see how much of a value has been evaluated at this point (https://stackoverflow.com/a/35200329/3607984).
Values in Haskell get reduced to weak head normal form by default. Normal form means that an expression is fully evaluated. Weak head normal form means the expression is only evaluated as far as is necessary to reach a data constructor. "a" ++ "b"
is neither of both because the outermost component of the expression is a function.
9.1 List Utility Functions
!!
returns the _n_th elementtake
returns the first n elements of a list.take :: Int -> [a] -> [a]
drop
returns all but the first n elements of a list.drop :: Int -> [a] -> [a]
takeWhile
iterates over the list and returns all elements until the condition mismatches.takeWhile :: (a -> Bool) -> [a] -> [a]
dropWhile
iterates over the list and discards all elements until the condition mismatches.dropWhile :: (a -> Bool) -> [a] -> [a]
splitAt
returns a tuple containing the first n and the remaining elements of the list.splitAt :: Int -> [a] -> ([a], [a])
head
returns the first element of a list. If the list is empty, and exception is thrown.last
returns the last element of a list. Throws an exception if the list is empty.tail
returns all elements but the first (head). If the list is empty, an exception is thrown.init
returns all elements but the last. Throws an exception if the list is empty.elem
checks whether an element is in a list or not.elem :: (Eq a, Foldable t) => a -> t a -> Bool
map
applies a function to all elements.map :: (a -> b) -> [a] -> [b]
zip
creates a list of tuples out of two lists. It stops as soon as one list runs out of values.zip :: [a] -> [b] -> [(a, b)]
unzip
creates a tuple of two lists out of a list of tuples.unzip :: [(a, b)] -> ([a], [b])
zipWith
combines two lists into one by subsequently applying a function to two elements.zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
10 Folding Lists
Folding is the reduction of a structure. It happens at two stages, namely (1) traversal and (2) reduction. Folding, as a concept, is also refered to as catamorphism, that is the unique homomorphism (structure preserving map) from an initial algebra into some other algebra.
The right associative function fold right, foldr :: Foldable t => (a -> b -> b) -> b -> t a -> b
, applies a base value and the last value of a foldable type to a function, takes the result and recursively applies the function to a sequence of values, yielding one value as its final result. The function folds a foldable type with the function f :: a -> b -> b
.
When computing the product of all values of a foldable, the base value (identity) is 1; for sums it would be __. The identity is also returned, if the foldable data structure contains no value, e.g. an empty list []
.
The left fold is traversing the data structure in the same order as the right fold, however it is left associative. It is inappropriate to use in combinations with very long lists or impossible with infinite lists. foldl'
is the strict version of foldl
. The relationship between foldl
and foldr
is (for finite lists xs
) foldr f z xs = foldl (flip f) z (reverse xs)
.
Scans return a list of all intermediate values of a fold. scanr :: (a -> b -> b) -> b -> [a] -> [b]
and scanl
are the Haskell function for right fold and left fold respectively. scanl
can for example be used to create an infinite list of Fibonacci numbers: fibs = 1 : scanl (+) 1 fibs
.
11 Algebraic Data Types
Type constructors are used at the type level, in type signatures, typeclass declarations, and instances. They are static and resolved at compile time. Data constructors construct values and can be interacted with at runtime. Type and data constructors with no arguments are constants, for instance Bool
.
The arity of a constructor is the number of parameters it takes. A type or data constructor with no arguments are called nullary and are type constant. Data constructors that take exactly one argument are called unary, with more than one they are referred to as products.
A type constructor argument that does not occur alongside with any value constructor is called phantom. For example a
is a phantom in the declaration data Type a = Value
.
The record syntax allows for the definition of types, where the contained values have names. For example data Person = Person { name :: String, age :: Int }
. The values can then be accessed by e.g. name person
.
11.1 Kinds
Kinds are the types of types. They can be queried in GHCi with :kind
. For example the kind of []
is * -> *
because it needs to be applied to one type (in order to yield *
, which is fully applied).
Type constructing is referring to the application of a type to a type constructor.
As-patterns are a way of unpacking an argument, still keeping a reference to the entire argument. The @
-sign is used for that:
f t@(a, _) = do
print a
return t
11.2 Newtype
type
creates an alias (e.g. type TwoBool = (Bool, Bool)
), while data
creates arbitrary data structures. newtype
creates types with a single unary data constructor. Resulting from this, the cardinality of the new type equals the cardinality of the type it contains. A newtype
cannot be a product type, sum type, or contain a nullary value constructor. It has no runtime overhead, because it is reduced to the type it contains.
An example of usage for newtype
. The Int
in B
is wrapped and can therefore be processed differently by tooMany
.
class TooMany a where
tooMany :: a -> Bool
instance TooMany Int where
tooMany n = n > 42
newtype B = B Int deriving (Eq, Show)
instance TooMany B where
tooMany (B n) = n > 100
If B
shall fall back on the default tooMany
implementation, the deriving
keyword can be used in combination with a compiler pragma:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
class TooMany a where
tooMany :: a -> Bool
instance TooMany Int where
tooMany n = n > 42
newtype B = B Int deriving (Eq, Show, TooMany)
11.3 Cardinality
The cardinality of a type is the number of values it can possibly have. The cardinality |A| of a type A = A1 a11 ... a1n | ... | An an1 ... ann
is given by |A1|+…+|An|, where |Ai|=|ai1|×…×|ain|. For instance, the cardinality of Bool = False | True
is 1+1=2.
Sum types are or connections of multiple types; e.g. A = B | C
. Product types are and connections and have e.g. the following shape: A = B c d
. Here B
contains c
and d
.
The number of possible input-output-mappings of a function from a
to b
is computed by |b|^|a|. a -> b -> c
gives |c|^(|b|×|a|).
12 Signaling Adversity
In Haskell it is common to use so called smart constructors https://wiki.haskell.org/Smart_constructors. These constructors validate their arguments and return Maybe
, i.e. either the desired object or Nothing
(or throw an error). For more detailed information about the error, the return type may also be Either
, which holds a Left
and a Right
value. The former is commonly the error object.
Lifted and unlifted types have different kinds, namely *
and #
respectively. Lifted types are much more common and differ from unlifted types by their property of being able to be inhabited by bottom.
The type construction [Maybe]
is invalid, because [] :: * -> *
and Maybe
is not *
but * -> *
itself.
Opposed to folds, unfolds build up data structures from a single starting value (anamorphism). iterate :: (a -> a) -> a -> [a]
does that infinitely, unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
(in Data.List
) is the generalization which may terminate.
13 Building Projects
Haskell Cabal (Common Architecture for Building Applications and Libraries) is a package manager. A package is a program that may have dependencies.
Stack is a program for developing Haskell projects. It is built on top of Cabal. The command stack build
builds a project and stack setup
[…]. stack ghci
starts GHCi in the context of a program, where functions can be executed. stack new <project-name> simple
creates a new project using the template “simple”.
The .cabal file (located in a project’s root folder) contains information about the project. For example whether it is a library or an executable.
library | executable program-name
hs-source-dirs: src
[exposed-modules: Module1, Module2] -- for libraries
[main-is: Main.hs] -- for executables
default-language: Haskell2010
build-depends: base >= 4.7 && < 5
However, with Stack projects, there is also a stack.yaml
file, which contains dependencies. It should be preferred over the Cabal file.
A program can be started with stack exec <program-name>
from every directory. However, the program executable is only present if the .cabal
file that was built before contains the line executable <program-name>
and if the program was built.
By default a module exports all its content. This can be changed by adding a list of exported items:
module ModuleName
(function1, constant1)
where
-- implementation of function1, constant 1, and possibly more
The importing module can also choose what to import. This is dome similarly, through a list of items, e.g. import Data.Bool (bool)
. The other way around, certain things can be excluded from an import: import Database.SQLite.Simple hiding (close)
.
Qualified imports persist the fully qualified name of the imported items. That means with import qualified Data.Bool
the function bool
is only accessible through Data.Bool.bool
.
With an alias, e.g. import qualified Data.Bool as B
, bool
is accessible through B.bool
.
In GHCi the :browse <Module>
command lists all items exported by a module. Prelude
can be disabled with the command stack ghci --ghci-options -XNoImplicitPrelude
.
13.1 Read CSV File
Example snippet that reads the file “data.csv”:
module CSVReader (readCsv) where
import Data.List.Split (splitOn)
readCsv :: IO [[String]]
readCsv = do
raw <- readFile "data.csv"
return $ parseCsv raw
where
parseCsv :: String -> [[String]]
parseCsv s = map (splitOn ",") (lines s)
14 Testing
There are generally four recognized levels of tests:
- Unit testing tests small units of code, generally on function level, or in object-oriented programming environments on class level.
- Integration testing verifies the interfaces between components against design specifications. It ensures that the units (tested in 1.) are wired up properly.
- Component interface testing controls the data that is passed between units. The data is commonly logged. Unusual data values in an interface can help explain unexpected performance in the next unit.
- System testing (aka. end-to-end testing) tests a completely integrated system to verify that the system meets its requirements.
A property-based testing framework runs the same test over and over with generated input.
14.1 Hspec
Hspec (website) is a Haskell testing framework. In order to work with it, the dependency hspec
must be added or it can be installed manually.
cabal install hspec
Sample snippet
import Test.Hspec
main :: IO ()
main = hspec $ do
describe "Addition" $ do
it "1 + 1 is greater than 1" $ do
(1 + 1) > 1 `shouldBe` True
14.2 QuickCheck
QuickCheck (website) was the first library to offer what is today called property testing. The dependency is spelled QuickCheck
.
cabal install QuickCheck
Sample snippet which uses QuickCheck in combination with hspec. QuickCheck itself does not provide the describe
and it
methods.
import Test.Hspec
import Test.QuickCheck
main :: IO ()
main = hspec $ do
describe "Addition" $ do
it "x + 1 is always greater than x" $ do
property $ \x -> x + 1 > (x :: Int)
Another workflow is calling quickCheck (fn :: signature)
.
QuickCheck validates the property by plugging in random values and edge cases. These are generated in this manner: sample (arbitrary :: Gen Int)
.
Generators select values from a list, e.g. lowercase characters.
genChar :: Gen Char
genChar = elements ['a'..'z']
This generator is more sophisticated and capable of generating lists of n elements of type a
, where n is randomly chosen within an upper and lower bound.
arbitraryList :: (Arbitrary a) => (Int, Int) -> Gen [a]
arbitraryList = flip genList arbitrary
genList :: (Int, Int) -> Gen a -> Gen [a]
genList (minL, maxL) g = sized $ \n -> do
k <- choose (minL, min (max minL n) maxL)
sequence [ g | _ <- [1..k] ]
CoArbitrary
is used when random functions need to be generated.
15 Monoid and Semigroup
15.1 Monoid
A monoid is a binary associative operation with an identity. In other words, it is an operator that takes two arguments that follow the rules associativity and identity.
class Monoid a where
mempty :: a
mappend :: a -> a -> a
mconcat :: [a] -> a
{-# MINIMAL mempty, mappend #-}
Monoids are all types that let you join values together through the mappend
function, in accordance with associativity. A mempty
value exists for which the mappend
becomes the identity.
Much more extended functionality lies in the package Data.Monoid
. Opposed to many other Haskell typeclasses, monoids do often have multiple implementations per type. That is realized by wrapping the type with newtype
. For example the newtype
Sum
, which wraps Num
s and determines to use the addition monoid for the wrapped value. Calling mappend
with two Product
values, however, would multiply them. The resulting type wraps the sum or the product. The actual number can be retrieved through getSum
and getProduct
respectively. Similarly, the Bool
monoid is wrapped in either Any
(boolean disjuction) or All
(boolean conjunction).
mconcat
applies mappend
to an arbitrary number of values. For the empty list it returns mempty
, for a list with one entry it is the identity.
The Abelian monoid has the commutative property, i.e. for all x, y, mappend x y == mappend y x
holds.
An orphan instance is an instance that is defined for a datatype and a typeclass, but not in the same module as either of them. If neither typeclass nor datatype were defined manually, the best workaround is to create a newtype
which wraps the datatype.
15.2 Semigroup
A semigroup (Haskell package Data.Semigroup
) is a monoid without the identity property. That is an operation which takes two inputs and reduces them to one, and suffices the law of associativity. In code, that means the semigroup defines
class Semigroup a where
(<>) :: a -> a -> a
while satifying associativity, i.e. (a <> b) <> c == a <> (b <> c)
.
The NonEmpty
datatype resides in Data.List.NonEmpty
. It is a list that contains one or more elements.
16 Functor
A functor is a structure preserving mapping. Such a mapping requires a function that is applied to each of the values that the wrapping type encloses. A functor satisfies that for an identity mapping, the values remain the same, also the composition law fmap (f . g) == fmap f . fmap g
holds. The infix operator for fmap
is <$>
.
class Functor (f :: * -> *) where
fmap :: (a -> b) -> f a -> f b
Applying a function to a value that is inside of a structure is refered to as lifting.
For nested functor application, e.g. when applying a function to characters which are stored in a list of String
s, a syntax such as (fmap . fmap) strFn dataStruct
can be used.
In order to use a higher kinded Type, e.g. * -> * -> *
, as a Functor
, one of the type parameters has to be applied. This can either be done with a concrete type such as Integer
or with a type variable a
, and results in the kind * -> *
. Sample snippet:
data Two a b = Two a b deriving (Eq, Show)
instance Functor (Two a) where
fmap f (Two a b) = Two a (f b)
A natural transformation is changing the structure while preserving the content.
{-# LANGUAGE RankNTypes #-}
type Nat f g = forall a . f a -> g a
17 Applicative
An applicative is a monoidal functor. Opposed to fmap
, with <*>
the function (that is applied to the enclosed values) is inside a functor itself. Intuitively this can be understood as mapping a plurality of functions over a plurality of values. The type info is the following:
class Functor f => Applicative (f :: * -> *) where
pure :: a -> f a
(<*>) :: f (a -> b) -> f a -> f b
The function pure
can be though of as embedding a value into any structure (functor). For example pure 1 :: [Int]
gives [1]
.
An Applicative
satisfies the following four laws:
- Identity:
pure id <*> v = v
- Composition:
pure (.) <*> u <*> v <*> w = u <*> (v <*> w)
- Homomorphism (structure preserving):
pure f <*> pure x = pure (f x)
- Interchangeability:
u <*> pure y = pure ($ y) <*> u
17.1 Examples
Command | Result |
---|---|
(,) <$> [1, 2] <*> [3, 4] | [(1,3),(1,4),(2,3),(2,4)] |
(+) <$> [1, 2] <*> [3, 5] | [4,6,5,7] |
liftA2 (+) [1, 2] [3, 5] | [4,6,5,7] |
17.2 Testing
Validating whether a data structure satisfies the mentioned laws can be done with the checkers package. The following snippets validates an Applicative
. Note that the value is not actually being used. Its purpose is to indicate which types to validate.
module ApplicativeTests where
import Test.QuickCheck
import Test.QuickCheck.Checkers
import Test.QuickCheck.Classes
list = [("b", "w", 1)]
main = do
quickBatch $ applicative list
17.3 Maybe
Haskell Prelude implementation of Maybe
‘s Applicative
instance (source).
-- | @since 2.01
instance Applicative Maybe where
pure = Just
Just f <*> m = fmap f m
Nothing <*> _m = Nothing
liftA2 f (Just x) (Just y) = Just (f x y)
liftA2 _ _ _ = Nothing
Just _m1 *> m2 = m2
Nothing *> _m2 = Nothing
18 Monad
Monad is a typeclass reifying an abstraction that is commonly used in Haskell. Instead of an ordinary function of type a
to b
, it is functorially applying a function which produces more structure itself and using join to reduce the nested structure that results. In other words, it is the process of taking a function that converts a value of type a
into another type (b
), wrapped within a third type c
. This function is applied to a value (of type a
) wrapped within c
. The resulting structure is then reduced from c c b
to c b
.
class Applicative m => Monad (m :: * -> *) where
(>>=) :: m a -> (a -> m b) -> m b
(>>) :: m a -> m b -> m b
return :: a -> m a
fail :: String -> m a
{-# MINIMAL (>>=) #-}
>>=
is called bind operator. Intuitively it can be understood as given a couple of wrapped values and a function that can be applied to these, the bind operator applies the function to each of the values. Special about it is (compared to fmap
) that the argument order is flipped and the mapping function returns a monad itself which is joined to make sure the output is not nested. The application to the list monad clarifies what that means: (>>=) :: [a] -> (a -> [b]) -> [b]
.
*>
for Applicative
corresponds to >>
for Monad
. The do
syntax is converted into each line being concatenated with the following line using one of the two operators. Variable assignments <-
are converted to >>=
, for example
do
name <- getLine
putStrLn name
becomes the following:
getLine >>= \name -> putStrLn name
Control.Monad
contains a join
function. The book introduced it with the example join $ putStrLn <$> getLine
, which would, without join
, fail because of nested IO
s.
Example of using the do
syntax in combination with the List
monad:
twiceWhenEven :: [Integer] -> [Integer]
twiceWhenEven xs = do
x <- xs
if even x
then [x*x, x*x]
else [x*x]
The Monad laws are
- Right identity
m >>= return = m
. Applyingreturn
leaves the data untouched. - Left identity
return x >>= f = f x
. Applyingreturn
leaves the data untouched. - Associativity
(m >>= f) >>= g = m >>= (\x -> f x >>= g)
. Regrouping the functions should not have any impact on the final result.
Using Checkers (as in 17.2) with quickBatch (monad [(a, b, c)])
where a
, b
, and c
are three values which indicate the type to be used.
The Kleisli composition (fish operator: >=>
) is about composing two functions which both return monads. It can be imported with import Control.Monad ((>=>))
and has the following signature (in comparison to normal function composition):
(.) :: (b -> c) -> (a -> b) -> a -> c
(>=>) :: Monad m => (a -> m b) -> (b -> m c) -> a -> m c
Given a Monad
instance, the instances for Functor
and Applicative
can be implemented automatically, as shown in the following snippet.
instance Functor (State s) where
fmap = Control.Monad.liftM
instance Applicative (State s) where
pure = return
(<*>) = Control.Monad.ap
19 Applying Structure
The operators *>
, <*
, and >>
discard one of their arguments and are often used in combination with functions that emit side effects.
19.1 JSON Parsing Example
The data is nested inside of the Parser
monad so the value constructor Payload
needs to be lifted.
parseJSON :: Value -> Parser a
(.:) :: FromJSON a => Object -> Text -> Parser a
instance FromJSON Payload where
parseJSON (Object v) =
Payload <$> v .: "from"
<*> v .: "to"
<*> v .: "subject"
<*> v .: "body"
<*> v .: "offset_seconds"
parseJSON v = typeMismatch "Payload" v
20 Foldable
The foldable typeclass has the following definition:
class Foldable (t :: * -> *) where
Data.Foldable.fold :: Monoid m => t m -> m
foldMap :: Monoid m => (a -> m) -> t a -> m
foldr :: (a -> b -> b) -> b -> t a -> b
Data.Foldable.foldr' :: (a -> b -> b) -> b -> t a -> b
foldl :: (b -> a -> b) -> b -> t a -> b
Data.Foldable.foldl' :: (b -> a -> b) -> b -> t a -> b
foldr1 :: (a -> a -> a) -> t a -> a
foldl1 :: (a -> a -> a) -> t a -> a
Data.Foldable.toList :: t a -> [a]
null :: t a -> Bool
length :: t a -> Int
elem :: Eq a => a -> t a -> Bool
maximum :: Ord a => t a -> a
minimum :: Ord a => t a -> a
sum :: Num a => t a -> a
product :: Num a => t a -> a
{-# MINIMAL foldMap | foldr #-}
For its definition it is sufficient to provide an implementation for either foldMap
or foldr
. All other functions can be deduced from that.
Monoid
s are related to the functions foldr
and foldMap
. The former uses the monoid definitions of elements inside of the foldable structure to combine them. The latter converts the elements to monoidal values and folds subsequently. In both cases the default value is provided by the monoid identity.
The null
function returns True
if the data structure is empty. Note, that e.g. null (Left 1)
is True
, because Left
is considered empty whereas Right
is not.
Noteworthy are also toList
, length
, and elem
. All three ignore the non-monoid values, for instance length (1, 1)
is 1
.
Both, maximum
and minimum
, require the contained types to be Ord
and return the maximum and minimum value respectively. They cannot be applied to empty structures (otherwise an exception is thrown).
21 Traversable
Traversable
allows for the processing of values inside a data structure as if they were in sequencial order. Opposed to Functor
, where function applications happen semantically in parallel. Return values of later function applications of Traversable
can depend upon the earlier results. That can be seen as an accumulation of applicative contexts. The typeclass definition is the following:
class (Functor t, Foldable t) => Traversable (t :: * -> *) where
traverse :: Applicative f => (a -> f b) -> t a -> f (t b)
sequenceA :: Applicative f => t (f a) -> f (t a)
mapM :: Monad m => (a -> m b) -> t a -> m (t b)
sequence :: Monad m => t (m a) -> m (t a)
{-# MINIMAL traverse | sequenceA #-}
The typeclass satisfies the following rules:
- Naturality:
t . traverse f = traverse (t . f)
- Identity:
traverse Identity = Identity
. That isTraversable
instances cannot inject any additional structure. - Composition:
traverse (Compose . fmap g . f) = Compose . fmap (traverse g) . traverse f
. Multiple traversals can be collapsed into a single traversal usingCompose
(which combines structure).
traverse
and sequenceA
can be defined in terms of each other: traverse f = sequenceA . fmap f
and
sequenceA :: Applicative f => t (f a) -> f (t a)
sequenceA = traverse id
The function catMaybes
from Data.Maybe
converts a Traversable
of Maybe
values into a Traversable
with all Just
values. For instance catMaybes [Just 1, Just 2, Nothing]
is [1,2]
.
The function traverse
applies a function (a -> f b)
to values inside a data structure t a
and flips the result by returning f (t b)
.
Traverable
implementation for Either
:
instance Traversable (Either a) where
traverse _ (Left x) = pure (Left x)
traverse f (Right y) = Right <$> f y
Traversable
requires Foldable
because it is proven that any Traversable
can also implement Foldable
. The constraint is just there to enforce that there must be an instance (source).
22 Reader
The core problem that Reader
solves is the application of an argument to many functions. It is inconvenient to have a similar signature across many functions. The Reader
is wrapping a function which maps from r
to a
.
newtype Reader r a = Reader { runReader :: r -> a }
The Functor
instance of a function (-> r)
is composition. The Applicative
and Monad
instances allow for the mapping of a function that expects an a
over another function that expects an a
as well. The result of the first function is fed into the second together with an a
(see code snippet below).
(<*>) :: (r -> a -> b) -> (r -> a) -> (r -> b)
(<$->>) :: (a -> b) -> (r -> a) -> (r -> b)
(<$->>) = (<$>)
(<*->>) :: (r -> a -> b) -> (r -> a) -> (r -> b)
(<*->>) = (<*>)
Here is the Monad
instance with and without function:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
(>>=) :: (r -> a) -> (a -> r -> b) -> (r -> b)
The function fmap
can be used for function composition: Let f
and g
be two functions, than f . g = fmap f g
.
The extension {-# LANGUAGE InstanceSigs #-}
allows for the explicit definition of function signatures of typeclasses. It is not necessary for any compilation purpose because the compiler knows the signatures anyways. However, it can be helpful for the sake of clarification.
23 State
The state type is rather a state processor. It takes a state, outputs a new state, and thereby emits something. The newtype
definition is
newtype State s a = State { runState :: s -> (a, s) }
A random number generator serves as a good example of usage: It requires some seed in order to generate a number and outputs a new seed which may be fed into the generator the next time.
24 Parser Combinators
A parser converts textual input into some data structure output. The textual input must be in conformance with a set of rules. A parser combinator is a higher order function which composes multiple parsers to yield a single parser.
The chapter uses the parser module trifeca
(import Text.Trifecta
), however, it seems to be common to use Megaparsec. Links:
- Text.Trifecta.Parser.Char
- Text.Trifecta.Parser.Combinators contains e.g.
eof
(end of file)
trifecta
contains parsers and is used in combination with the parsers
library with defines common parser classes, abstracting over common things parsers do.
Parsers can thrown an error with the unexpected
function.
import Text.Trifecta
stop :: Parser a
stop = unexpected "stop"
The parser type is very similar to State
. It takes a string, parses it, and returns Nothing
in case of failure. If parsing was successful, it returns a data structure and the remainder of the String
.
type Parser a = String -> Maybe (a, String)
parseString :: Parser a -> Delta -> String -> Result a
is used to run a Trifecta parser. Delta
may be mempty
. The Attoparsec equivalent is parseOnly
. Parsing functions of the most common libraries.
trifP :: Show a => Parser a -> String -> IO ()
trifP p i = print $ parseString p mempty i
parsecP :: (Show a) => Parsec String () a -> String -> IO ()
parsecP = parseTest
attoP :: Show a => A.Parser a -> ByteString -> IO ()
attoP p i = print $ parseOnly p i
nobackParse :: (Monad f, CharParsing f) => f Char
nobackParse = (char '1' >> char '2') <|> char '3'
In the following simple example, a parser accepts only the character sequence ab
:
abAccepted = char 'a' >> char 'b'
abParser = parseString abAccepted mempty
abParser "ab"
-- Success 'b'
abParser "a"
-- Failure (ErrInfo {_errDoc = [1m(interactive)[0m:[1m1[0m:[1m2[0m: [91merror[0m: unexpected
-- EOF, expected: "b"
-- a[1m[94m<EOF>[0;1m[0m
-- [92m^[0m , _errDeltas = [Columns 1 1]})
The parser above does not necessarily consume all given input. For instance, for "abc"
it would return Success
as well. That can be changed using eof
: string "ab" >> eof
. Inputs such as "abc"
would then result in Failure [...] expected: end of input [...]
. eof = notFollowedBy anyChar
In most cases, a parser should not raise any error other than the parsing Failure
that it may return. Other exceptions should be prevented from happening. The following example catches a division by zero error using fail
.
parseFraction :: Parser Rational
parseFraction = do
numerator <- decimal
_ <- char '/'
denominator <- decimal
case denominator of
0 -> fail "Denominator most not be zero"
_ -> return (numerator % denominator)
Parser libraries in Haskell are parsec
, attoparsec
, megaparsec
, and trifecta
. aeson
parses JSON, cassava
parses CSV. Polymorphic parsers are written in a general manner an can be executed with any parser that implements the functions. The following is an example signature of a polymorphic parser.
parseFraction :: (Monad m, TokenParsing m) => m Rational
Example for a parser which parses a number or a string:
-- parse number or string
type NumberOrString = Either Integer [Char]
parseNos :: Parser NumberOrString
parseNos =
skipMany (oneOf "\n ") >>
(Left <$> integer) <|> (Right <$> some letter)
print $ parseString parseNos mempty "123"
print $ parseString parseNos mempty "\nabcdf"
The <?>
operator can be used to annotate branches of a parsing instruction. In the following example, Tried 12
will be printed if the 12
branch was chosen instead of the 3
:
tryAnnot :: (Monad f, CharParsing f) => f Char
tryAnnot = (try (char '1' >> char '2') <?> "Tried 12") <|> (char '3' <?> "Tried 3")
My BNF-Parser project (using Megaparsec).
25 Composing Types
The following newtype
constructs a datatype by composing datatype constructors. The kind is Compose :: (* -> *) -> (* -> *) -> * -> *
, comparable to function composition (.)
.
newtype Compose f g a = Compose { getCompose :: f (g a) }
deriving (Eq, Show)
The specialty of the Monad
type is that it cannot be implemented for the Compose
newtype
from above. That is, the following function does not exist in a generic manner: (>>=) :: Compose f g a -> (a -> Compose f g b) -> Compose f g b
, see (Mark P. Jones and Luc Duponcheel (1993), “Composing Monads”). This is where transformers come in, for instance IdentityT
newtype IdentityT f a = IdentityT { runIdentityT :: f a }
deriving (Eq, Show)
instance (Monad m) => Monad (IdentityT m) where
return = pure
(IdentityT ma) >>= f =
IdentityT $ ma >>= runIdentityT . f
where IdentityT [...] runIdentityT . f
is required because the used >>=
is the one of the Monad m
, so IdentityT
needs to be unpacked.
Transformers have additional information on how to do the unpacking and subsequent boxing that is specific to the given monads and cannot be generalized. Conversion from f (g (f b))
to f (f b)
in order to be able to bind and return g (f b)
. Binding means converting f (f b)
to f b
, which is possible since f
is a Monad
.m (T m b) -> m (m b) -> m b -> T m b
With two nested Functors, say a List
of Maybe
s, there is a guarantee that the nested data type is also a Functor
. The same applies to Applicative
. For Monad, this does not hold.
26 Monad Transformers
A monad transformer is a type constructor that takes a monad as an argument. This monad is wrapped around another contained data type. The transformers themselfes are not generically monads, but implement the Monad
typeclass logic. The inner type corresponds commonly to the transformer, i.e. the maybe-transformer MaybeT
is m (Maybe a)
.
The transformers library contains many implementations of transformers and should be preferred over own implementations, if possible.
Identity
can be used as a Monad
which converts transformers into their original types. For instance type Maybe a = MaybeT Identity a
.
The base monad is the outer-most monad. Here it would be m
: StateT { runStateT :: s -> m (a, s) }
Maybe Transformer MaybeT
newtype MaybeT m a = MaybeT { runMaybeT :: m (Maybe a) }
State Transformer StateT
{-# OPTIONS_GHC -Wall #-}
{-# LANGUAGE InstanceSigs #-}
module StateT where
import Control.Arrow (first)
newtype State s a = State { runState :: s -> (a, s) }
-- Structure: StateT function > Monad > Tuple
newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
instance (Functor m) => Functor (StateT s m) where
fmap f (StateT a) = StateT $ \s -> first f <$> a s
instance (Monad m) => Applicative (StateT s m) where
pure a = StateT $ \s -> pure (a, s)
StateT g <*> StateT h = StateT $ \s -> do
(f, s') <- g s
(x, s'') <- h s'
return (f x, s'')
instance (Monad m) => Monad (StateT s m) where
return = pure
StateT sma >>= f = StateT $ \s -> sma s >>= \(a, s') -> runStateT (f a) s'
Lifting functions exist with multiple different signatures but should always do the same thing. The functions exist only for historical reasons.
fmap :: Functor f => (a -> b) -> f a -> f b
liftA :: Applicative f => (a -> b) -> f a -> f b
liftM :: Monad m => (a -> r) -> m a -> m r
MonadTrans is a typeclass with a lift method. Lifting means embedding an expression in a larger context by adding structure that does not do anything.
class MonadTrans t where
lift :: (Monad m) => m a -> t m a
MonadIO
resides in Control.Monad.IO.Class
and is intended to keep lifting an IO action until it is lifted over all structure that surrounds the outermost IO type.
liftIO . return = return
liftIO (m >>= f) = liftIO m >>= (liftIO . f)
For streaming it is generally not recommended to use Writer
, WriterT
, or ListT
for performance reasons. Libraries such as pipes or conduit are better choices.
27 Nonstrictness
Evaluation can be strict or nonstrict. Haskell is nonstrict. Nonstrictness refers to semantics, that is how an expression is being evaluated. It means that expressions are evaluated outside-in. Lazy refers to operational behavior, the way code is executed on a real computer, namely as late as possible. SO answer on the differences.
In Haskell an expression does not need to be recomputed, once it is evaluated. “Don’t re-evaluate if you don’t have to.”
The evaluation of an expression can be forced using the function seq :: a -> b -> b
. It evaluates its first argument as soon as evaluation of the second argument is required.
The following snippet forces evaluation of x
by tying it to b
:
discriminatory :: Bool -> Int
discriminatory b =
let x = undefined
in case x `seq` b of
False -> 0
True -> 1
This translates to two nested case
blocks:
discriminatory =
\ b_a10D ->
let {
x_a10E
x_a10E = undefined } in
case
case x_a10E of _ {
__DEFAULT -> b_a10D
} of _ {
False -> I# 0;
True -> I# 1
}
This statement will not bottom out case undefined of { _ -> False}
whereas case undefined of { DEFAULT -> False }
cannot be simplified by the compiler and will result in an error.
There are some compiler options available that allow for observation of the interpretation of the Haskell code (core dump). For instance :set -ddump-simpl
or :set -dsuppress-all
.
Call strategies are call by value, i.e. evaluation of an argument before passing it to a function, and call by reference arguments are not necessarily evaluated. Call by need ensures that expressions are only evaluated once.
A thunk is used to reference suspended computations that might be performed or computed at a later point in a program.
The trace
function in Debug.Tace
allows for logging at places, where the IO
type is not present. That way, evaluation can be debugged. For instance let a = trace "a" 1
would log "a"
, as soon as a
is being evaluated.
Writing pointfree functions allows for caching, as can be seen in the following example:
import Debug.Trace
f :: Int -> Int
f = trace "f called" (+) 1
f' :: Int -> Int
f' b = trace "f' called" 1 + b
f 5
f 6
f' 5
f' 6
which outputs
f 5
f called
f' 5
f' called
f' 6
f' called
If f
was, however, defined with the signature f :: Num a => a -> a
, it would not be cached. That is because typeclass constraints will be resolved into additional arguments.
Forcing a value not to be shared can be done by applying unit to it, as if it was a function. let f x = (x ()) + (x ())
, where the signature is f'' :: (() -> Int) -> Int
.
let
can be used to force sharing. Here, (1 + 1) * (1 + 1)
, 1 + 1
would be computed twice. With let
only once: let x = 1 + 1 in x * x
.
A function’s pattern matching can be refutable (German: widerlegbar) or irrefutable. The former is for instance f True = ...
, the latter f _ = ...
or f x = ...
(always matching). The terminology is not about the function, but about a single pattern matching expression, i.e. one line.
Lazy pattern matches can be implemented using the ~
. The first function will fail, when invoked with undefined
, whereas the latter works. The latter also makes the pattern irrefutable though.
strictPattern :: (a, b) -> Integer
strictPattern (a,b) = const 1 a
lazyPattern :: (a, b) -> Integer
lazyPattern ~(a,b) = const 1 a
Bang patterns can be used to force evaluation of function parameters (see example below) or value contstructor parameters.
banging :: Bool -> Int
banging !b = 1
The extensions {-# LANGUAGE Strict #-}
and StrictData
force strictness for expressions in the particular source code file. Thereby, they avoid seq
, ~
, and !
being all over the place, if everything is supposed to be strict. The meaning of ~
is then inverted, i.e. it forces lazyness.
28 Basic Libraries
28.1 Benchmarking and Profiling
The library criterion
can be used for benchmarking.
- Import:
import Criterion.Main
- Compiler options:
stack ghc -- -O2 file.hs
(or without Stack:ghc -O2 file.hs
).-02
enables the highest level of optimization - Measures how long it takes (on average) to evaluate a certain expression.
whnf
(weak head normal form) evaluates until it reaches the first data constructor (used most of the time);nf
(normal form) evaluates everything.
The sample snippet
main :: IO ()
main = defaultMain
[ bench "test" $ whnf ([1..9999] !!) 9998 ]
could have this output:
benchmarking test
time 41.14 μs (37.59 μs .. 46.23 μs)
0.871 R² (0.739 R² .. 0.992 R²)
mean 39.91 μs (37.36 μs .. 46.33 μs)
std dev 13.83 μs (3.873 μs .. 25.74 μs)
variance introduced by outliers: 98% (severely inflated)
CAFs (constant applicative forms) are expressions that have no free variables and are held in memory (pointfree top-level declarations). For instance (in the file profile.hs)
memoizedFib :: Int -> Integer
memoizedFib = (map fib [0 ..] !!)
where fib 0 = 0
fib 1 = 1
fib n = memoizedFib (n-2) + memoizedFib (n-1)
main :: IO ()
main = putStrLn . show $ memoizedFib 1000
the profiling
stack ghc -- -prof -fprof-auto > -rtsopts -O2 profile.hs
./profile +RTS -P
cat profile.prof
outputs (excerpt):
COST CENTRE MODULE SRC %time %alloc ticks bytes
memoizedFib.fib Main profile.hs:(3,10)-(5,54) 100.0 48.2 7 124200
CAF GHC.IO.Handle.FD <entire-module> 0.0 13.5 0 34704
CAF GHC.IO.Encoding <entire-module> 0.0 1.1 0 2768
main Main profile.hs:8:1-41 0.0 8.5 0 21800
memoizedFib Main profile.hs:(2,1)-(5,54) 0.0 28.0 0 72120
28.2 Map
Package: Data.Map.Strict
Access to values through their keys, access time O(log n).
28.3 Set
Package: Data.Set
A set stores values (none must occur more than once). It can be seen as a map without values. Access time complexity: O(log n).
28.4 Sequence
Package: Data.Sequence
While appending to a normal Haskell list has O(n) complexity, appending to a sequence is as fast as prepending to a normal list, i.e. O(1).
28.5 Vector
Package: Data.Vector
(https://hackage.haskell.org/package/vector)
A vector wraps an array. Should be used when having high performance requirements, accessing elements by indexing with Int
, slicing (partitioning) is done, or uniform access time is needed.
28.6 Strings
String
: Standard, slow, list of characters, possibly infinite.Text
: Text encoded as UTF-16, more efficient thanString
in terms of storage.ByteString
: Internally represented as a vector ofWord8
values (i.e. bytes), can contain non-text data. Easy to use via theOverloadedStrings
extension.
29 IO
The IO Monad is just an instance of the ST monad, where the state is the real world.
The IO
:info
:
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
instance Applicative IO
instance Functor IO
instance Monad IO
instance Monoid a => Monoid (IO a)
IO
disables the reordering of operations.
An expression is referentially transparent, if it can be replaced with its value without changing the behavior of a program.
30 Error Handling
The Exception
class lives in GHC.Exception
and is defined as follows:
class (Typeable e, Show e) => Exception e where
toException :: e -> SomeException
fromException :: SomeException -> Maybe e
displayException :: e -> String
instance Exception SomeException
instance Exception ErrorCall
instance Exception ArithException
Some types that have an instance of the Exception
class are IOException
, ErrorCall
, AssertionFailed
, and ArithException
. The latter contains several values, namely Overflow
, Underflow
, LossOfPrecision
, DivideByZero
, Denormal
, and RatioZeroDenominator
.
Existential quantification allows for the definition of an exception type that represents a variety of values, some of which may have been unknown at the type the exception type was defined. SomeException
works that way and is defined as follows:
data SomeException where
SomeException :: Exception e => e -> SomeException
Exceptions occur most commonly in IO
, because the function calls depend on the outside world. For a simple writeFile
call, exception handling may look like that:
import Control.Exception
import Data.Typeable
handler :: SomeException -> IO ()
handler (SomeException e) = do
print (typeOf e)
putStrLn show e
main :: IO ()
main = writeFile "file.txt" "content" `catch` handler
A main function can be called with command line arguments from within REPL using the command :main -arg -arg2
.
Both, trow
and throwIO
, allow for raising an exception. Generally, throwIO
is being used.
A sum type is a convenient way of grouping several exceptions which can be caught collectively.
Asynchronous exceptions are exceptions raised in a thread, other than the one which will handle the exception.
Quotes
Some funny quotes from the book.
- As natural as any competitive bodybuilder:
data Nat = Zero | Succ Nat
- Do notation considered harmful! Just kidding.
- If that succeeded, let’s fire up a REEEEEEEPL and see if we can call sayHello.
- We’re going to return to the topic of natural transformations in the next chapter, so cool your jets for now.
- If this seems confusing, it’s because it is.
- And putStrLn takes a String argument, performs I/O, and returns nothing interesting — parents of children with an allowance can sympathize.
- Fail fast, like an overfunded startup
- This is how you learn to play type Tetris with the pros.
- The rest of the chapter will wait while you verify these things.
- Try it a couple of times to see what we mean. It seems unlikely that this will develop into a gambling addiction.
- In reality, a modern and mature parser design in Haskell will often look about as familiar to you as the alien hellscape underneath the frozen crust of one of the moons of Jupiter.
- In this chapter we will… …work through an
Identity
crisis. - Keep in mind what these are doing, follow the types, lift till you drop.
- We will… …live the Thunk Life
- We have measured time; now we shall measure space. Well, memory anyway; we’re not astrophysicists.
- Preserve context and try to make it so somebody could understand the problem you’re solving from the types. If necessary. On a desert island. With a lot of rum. And sea turtles.