I have been working on property testing for years now. My current conclusion is that property testing still has so much untapped potential that, with just some more effort, it can become an even greater tool for software development than it already is today. The concept is relatively young and the design space relatively unexplored. This post seeks to provide an overview of the different approaches and to outline a comparison between the most commonly used libraries for property-based testing in Haskell: the two popular libraries QuickCheck and Hedgehog, and a new approach called “Validity-based testing”.
There are many ways to try to make sure that code works. They each have their own trade-offs, and we should consider these trade-offs carefully. The first step is to consider how expensive breakage would be. The next step is to use the right tools: the right programming language, build system, etc., so that code is safe in certain ways automatically. Is it worth it to formally verify all of your code? Maybe, but probably not. When you have gone through these steps, you turn to testing, and most likely: unit testing.
By “unit test”, here, I mean a test that checks whether the
output of a function, for a single example, is equal to the
expected output. An example hspec snippet in
Haskell:
myFunction exampleInput `shouldBe` expectedOutput
There are two standard problems with unit tests: coverage (a unit test only checks the examples that you wrote down) and developer cost (every example has to be thought up and written out by hand).
Property testing is a different approach to testing. The idea is that you can make a test parametric in a certain argument, and the test should still pass for every argument that you pass in. The next question is then: where do you get these arguments? That depends on the kind of property test: exhaustive or randomised.
An exhaustive property test would be the most comprehensive, but
for argument types inhabited by a very large or infinite number of
values, like Int64 or String, this is
infeasible. To make the runtime of a property test reasonable, we
usually opt for a randomised property test. Randomised property
testing probabilistically solves the coverage problem of tests, but
it exacerbates the developer cost problem. It is generally harder
to come up with a general property of your code than an
input-output example. You could argue that one would never write as
many unit tests as property tests will generate for them, but the
point here is about the strict amount of thinking and typing that
the programmer has to do to get started. It is a trade-off.
In this post, we will use the following running example of a property test:
(\str -> reverse (reverse str) == str) :: String -> Bool
This property states that if you reverse a string twice, then
you get the original string. (For the purposes of this post, we
assume that data structures are finite. This is important because
testing frameworks require that data is Show-able in
order to print counterexamples.)
Randomised property testing requires a way to generate the random values. These generators are responsible for generating the arguments to the property tests. An appropriate generator can often be the difference between finding a bug, and not finding a bug.
For example, when only generating small lists, the following property test may seem to hold:
(\ls -> length ls < 100)
It is important to note that not all types allow easy sampling
of values. Generating a value of type Word8 is
relatively simple: We can just choose a value between
0x00 and 0xff uniformly at random.
However, generating a value from an unbounded type like
String is tricky. Indeed, there is no way to sample a
value uniformly from the type String because there are
infinitely many Strings, and there isn’t one obvious way to sample
from an infinite set. To solve this problem, generators often use
some way to guide how large of a value to generate. This is usually
called the size parameter. This size parameter then allows the
generator for Strings to generate a
String up to the given size.
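QuickCheck, for example, exposes this size parameter through its sized combinator. A minimal sketch of a size-aware String generator (the name genSizedString is mine, for illustration):

import Test.QuickCheck (Gen, arbitrary, choose, sized, vectorOf)

genSizedString :: Gen String
genSizedString = sized $ \n -> do
  len <- choose (0, n)   -- pick a length up to the current size parameter
  vectorOf len arbitrary -- generate that many random characters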
Generators can usually also be adapted to the exact test at hand. For example, generators can usually be composed to build bigger generators, or filtered to only produce values that satisfy a given predicate.
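For instance, with QuickCheck's combinators an existing generator can be composed and filtered (a sketch; the even-sum example is mine):

import Test.QuickCheck (Gen, arbitrary, suchThat)

-- Compose two Int generators into a pair generator, keeping only pairs with an even sum.
genEvenSumPair :: Gen (Int, Int)
genEvenSumPair =
  ((,) <$> arbitrary <*> arbitrary) `suchThat` (\(x, y) -> even (x + y))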
When a property test fails, it is likely that the value that made the test fail is large. This makes it hard to reason about why the test failed. A standard solution to this problem is to try and shrink the value that makes the test fail. The test is re-run with smaller values to make sure that the test still fails with the smaller value. After shrinking, the resulting value should be of a more manageable size.
Suppose we believe that the following property holds:
After lower-casing every character in a string, all the characters in this string will be lower-case.
\str -> all isLower (map toLower str)
When we run the property test without shrinking, we may get a counterexample like the following:
"As5Enu3auO4"
It is certainly not immediately obvious to me why this is a counterexample for the above property. However, when we rerun the property test with shrinking, the counterexample is shrunk to the following:
"1"
Now it becomes rather obvious why the test failed, if we start debugging as follows:
λ toLower '1'
'1'
λ isLower '1'
False
Indeed, applying toLower to the character
'1' returns '1', but it is also not
considered lower-case by isLower. In this case, the
property that we thought would hold just does not hold. We could
try to specialise it so that we get a property that does hold, but
that is beyond the scope of this example.
Note that the shrunk counterexample, "1", is not necessarily part of the original counterexample. This brings me to my next point:
It is important that the shrunk value does not just fail the test, it must also have been generatable by the generator that was used. For example, consider the following property test that you may mistakenly write:
For every number larger than six: the number is larger than five and odd. (pseudocode)
∀ n. (n > 6) => (n > 5 && odd n)
This property is designed to be tested with values larger than six. We can constrain the set from which random values are drawn to the set of integers larger than six, using a specialised generator.
However, if the shrinking method is not adapted as well, to
preserve this constraint on input set, then something unexpected
might occur. At some point, the generator might generate
8 as its random integer larger than 6,
and fail the property test. Before this value is reported, it is
shrunk to zero because this is the smallest integer, and shrinking
might as well try this first. (Note that shrinking is for humans,
which is why integers are shrunk to zero, (debatably) the simplest
integer, instead of minBound :: Int, the smallest
integer.) Zero still fails the property test, because zero is not
greater than five, so we report that this property test fails
because zero fails to make the property evaluate to
True. This will make the output look something like
the following:
failed: the following number larger than six does not pass the property test: 0
It is therefore important that the shrinking method preserves the intended invariants of the subset that one wants to test. A value should only ever be shrunk to a smaller value that is still within this subset.
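With QuickCheck, for example, one way to preserve such an invariant is to filter the shrink candidates (a small sketch; the function name is mine):

import Test.QuickCheck (shrink)

-- Keep only shrink candidates that still satisfy the generator's invariant,
-- so a failing value larger than six is never shrunk below seven.
shrinkGreaterThanSix :: Int -> [Int]
shrinkGreaterThanSix n = filter (> 6) (shrink n)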
The QuickCheck library was the pioneer of randomised property
testing in Haskell. QuickCheck makes use of a type class called
Arbitrary.
The Arbitrary type class contains two
functions:
- arbitrary :: Arbitrary a => Gen a to generate random values. arbitrary is the default generator to use to generate values of a given type.
- shrink :: Arbitrary a => a -> [a] to shrink a given value.

You can run a QuickCheck property test by passing something Testable to the quickCheck function. Something Testable is usually of the form FooBar -> Bool, where FooBar is a member of the Arbitrary type class. QuickCheck then generates random values of type FooBar and evaluates the property with them to check that the resulting boolean is True.
The example property of the reverse function is
written as follows, using QuickCheck:
λ> quickCheck $ \(str :: String) -> reverse (reverse str) == str
+++ OK, passed 100 tests.
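For a custom data type, a hand-written Arbitrary instance might look like the following (the Point type is made up for illustration):

import Test.QuickCheck

data Point = Point Int Int
  deriving (Show, Eq)

instance Arbitrary Point where
  -- Generate both coordinates with the default Int generator.
  arbitrary = Point <$> arbitrary <*> arbitrary
  -- Shrink the coordinates as a pair, reusing the shrinker for tuples.
  shrink (Point x y) = [Point x' y' | (x', y') <- shrink (x, y)]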
The QuickCheck approach has certain trade-offs, some of which we’ll discover and analyse below.
To test whether a given binary relation <|*|>
is reflexive, we can write a property test as follows:
\x -> x <|*|> x
If we want to test reflexivity of many binary relations, we may write a function that takes a binary relation and produces a property. Such a function is called a property combinator:
reflexive :: (Show a, Arbitrary a) => (a -> a -> Bool) -> Property
reflexive rel = property $ \x -> x `rel` x
Now we can use this property combinator to test reflexivity of
any binary relation for which the arguments are a member of
Arbitrary.
Note that there is an important limitation to this approach.
This property combinator can only test reflexivity of a given
binary relation if the relation is reflexive for ALL possible
values that may be generated by the arbitrary
generator.
This is already a problem with Rational (from
GHC.Real) and Eq, for example.
The Eq instance for Ratio a asssumes
that values are normalised:
> 2 :% 2 == 1 :% 1
False
If arbitrary :: Gen Rational ever generates
un-normalised values, then we can no longer test properties which
require that (==) works as if the values are actual
ratios, even if the property holds for all values of
Rational except un-normalised values. This brings me
to the next point.
Instantiating the Arbitrary type class

To continue with the previous example of Rational: how should we implement the Arbitrary instance for Rational? Should it generate 0 :% 0? 1 :% 0? 5 :% -1? How about -5 :% 1? There are some considerations here:
- If arbitrary :: Gen Rational never generates un-normalised values, then property tests about functions that deal with (==) assuming that values are actual ratios will never test what happens for this edge case. Maybe they work, maybe they do not.
- If arbitrary :: Gen Rational generates un-normalised values, then we cannot test properties which require that (==) works as if the values are actual ratios, even if the property holds for all values of Rational except un-normalised values.

There is a solution for problems like this in
QuickCheck: Using newtypes to guide generation. For
example, the Positive a newtype wrapper has a specialised generator to make sure that arbitrary :: Gen (Positive Rational) only generates positive Rational values. One could imagine a newtype wrapper like Normalised a that could have a specialised generator to make sure that arbitrary :: Gen (Normalised Rational) never generates un-normalised values, if arbitrary :: Gen Rational is implemented to sometimes generate un-normalised values.
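Such a Normalised wrapper is not part of QuickCheck; a sketch of what it could look like:

{-# LANGUAGE FlexibleInstances #-}

import Data.Ratio ((%))
import Test.QuickCheck

-- A hypothetical wrapper whose generator only produces normalised Rationals.
newtype Normalised a = Normalised a
  deriving (Show, Eq)

instance Arbitrary (Normalised Rational) where
  arbitrary = do
    n <- arbitrary
    d <- arbitrary `suchThat` (/= 0)
    -- (%) normalises its result, so the wrapped value is always normalised.
    pure (Normalised (n % d))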
However, this newtype approach has an important limitation as well. The property test to test for reflexivity would now look something like this:
reflexive (\(Normalised nn1) (Normalised nn2) -> nn1 == nn2)
(Note that you cannot just supply a custom generator if you want
to use property combinators that assume an Arbitrary
constraint.) This is rather cumbersome. It is a direct result of
the fact that values like 5 :% -5 are not considered
valid values. Who decided that these values are invalid even though
they are certainly members of their type? It is the programmer who
wrote the GHC.Real module and decided that
Ratio values should satisfy the invariant that they
are normalised. These invariants are not statically checked.
QuickCheck’s Arbitrary typeclass makes it cumbersome
to work with values for which not all values are considered valid
values.
The authors of QuickCheck chose to not have a default
implementation for Arbitrary. This is because there
are certain problems with the automatic generation of
Arbitrary instances.
The QuickCheck documentation mentions that:
There is no generic arbitrary implementation included because we don’t know how to make a high-quality one. If you want one, consider using the testing-feat or generic-random packages.
There are multiple separate libraries to allow programmers to
have GHC implement Arbitrary instances automatically
using Generic instances. However, using one of these
is not common practice.
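For reference, with generic-random the usage looks roughly like the following (a sketch based on that package's documentation; the Colour type is mine):

{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic)
import Generic.Random (genericArbitrary, uniform)
import Test.QuickCheck (Arbitrary (..))

data Colour = Red | Green | Blue
  deriving (Show, Generic)

-- Derive the generator from the Generic instance, choosing constructors uniformly.
instance Arbitrary Colour where
  arbitrary = genericArbitrary uniform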
As a result, there is no default implementation of
arbitrary and the default implementation for
shrink is const []. This means that by
default, shrink is never overridden with a more
sensible implementation and counterexamples are never shrunk.
Because programmers have to implement generators and shrinking
themselves, using QuickCheck (and Arbitrary in
particular) can be considered expensive (in developer time) and full of friction.
Where to write instances of Arbitrary

A minor problem, but an annoying one nonetheless, is the question of where to write the Arbitrary instances.
- If we write the Arbitrary instances next to the data types, then the entire library that we are writing must depend on QuickCheck.
- If we write the Arbitrary instances in the test suite, then they are orphan instances and we have to re-write them for every test suite.
- If we write the Arbitrary instances in a separate library, then we can depend on this library for the Arbitrary instances, but this adds maintenance overhead.

The most commonly chosen solution seems to be to put Arbitrary instances in the quickcheck-instances library, but this has another problem: now the test suite depends on the extra libraries for which quickcheck-instances provides instances, even though the test suite may not be using them at all.
Hedgehog is a newer way to do property testing.
The example property of the reverse function is
written as follows, using Hedgehog:
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
prop_reverse :: Property
prop_reverse =
  property $ do
    xs <- forAll $ Gen.list (Range.linear 0 100) Gen.alpha
    reverse (reverse xs) === xs
Hedgehog uses a different implementation of generators. In particular, hedgehog generators have built-in shrinking, so that every generated value can automatically be shrunk without any extra developer effort to implement a shrinking function. This built-in shrinking also automatically takes care to ensure that shrunk values obey the same invariants as the generator.
Whereas QuickCheck declares “shrink” as a function, in Hedgehog generating and shrinking are performed simultaneously. Instead of generating a single value, we can generate a rose tree of them, where the main output is at the root, and the other nodes carry shrunk versions of it, with the smallest ones at the leaves.
Generators are defined more generally, to allow for monadic
effects during generation. As such, generators are not separated
from properties entirely. Indeed, as you can see in the
prop_reverse example above, the generation of data and
testing of properties can be interleaved monadically.
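For example, a value generated earlier in the property can constrain a later generator (prop_boundedList is my own example, using the same imports as prop_reverse above):

prop_boundedList :: Property
prop_boundedList =
  property $ do
    -- The bound is generated first, then used to constrain the second generator.
    n  <- forAll $ Gen.int (Range.linear 0 10)
    xs <- forAll $ Gen.list (Range.linear 0 n) Gen.alpha
    assert (length xs <= n)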
Hedgehog explicitly makes a tradeoff when it comes to shrinking. The Hedgehog authors chose to make shrinking sub-optimal in order to build it into generators.
The following is an example of such a tradeoff. Here Hedgehog is generating two characters that are equal, by generating them independently and then keeping only the pairs of equal characters. It is a contrived example, admittedly, but it makes my point.
λ > let gen = Gen.enum 'a' 'b' in Gen.printTree (Gen.filter (uncurry (==)) ((,) <$> gen <*> gen))
=== Outcome ===
('b','b')
=== Shrinks ===
('a','a')
('a','a')
('a','a')
As you can see, Hedgehog correctly shrinks the generated value
('b', 'b') to tuples with equal characters ('a',
'a'), but it shrinks to the same value many times. This
problem grows as the generated data structure grows in size, and as the number of options to try while shrinking grows. This is the
main drawback with integrating shrinking with generating: It
requires some duplicated work during shrinking.
Side note: QuickCheck also handles this case incorrectly, but in a different way:
λ filter (uncurry (==)) $ QC.shrink ('b','b')
[]
Spoiler (see the rest of the post): genvalidity
handles this case correctly.
λ filter (uncurry (==)) $ shrinkUnchecked ('b', 'b')
[('a','a')]
Hedgehog does not use any type classes for generators like
QuickCheck does with Arbitrary. Instead, every
generator must be a manually-written value.
This means that Hedgehog does not have any problems with orphan instances or with a lack of laws or semantics. However, it also means that writing a generator for a value in Hedgehog now requires even more developer effort.
It is still possible to write property combinators for Hedgehog properties, but now the programmer will have to explicitly pass in a given generator:
associative :: Gen a -> (a -> a -> a) -> Property
This makes writing property combinators more cumbersome, and tests using property combinators more verbose.
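A sketch of what such a combinator could look like (the Show and Eq constraints are needed by forAll and ===):

import Hedgehog

-- Check that a binary operation is associative for values from the given generator.
associative :: (Show a, Eq a) => Gen a -> (a -> a -> a) -> Property
associative gen op =
  property $ do
    x <- forAll gen
    y <- forAll gen
    z <- forAll gen
    (x `op` y) `op` z === x `op` (y `op` z)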
Validity-based testing attempts to combine the advantages of QuickCheck and Hedgehog while making a different tradeoff. It focuses on making property testing cheap (in terms of developer time) and removing as much friction as possible.
The goals are to get generators and shrinking for free, to make invariants explicit in code, and to get generators and shrinking of valid values for free as well.
The example property of the reverse function is written as
follows, using the validity-based libraries:
forAllUnchecked $ \str -> reverse (reverse str) == (str :: String)
The GenUnchecked type class in the genvalidity library for free generators and free shrinking

The GenUnchecked type class defines both a generator, genUnchecked :: Gen a, and a shrinking function, shrinkUnchecked :: a -> [a], just like QuickCheck does with Arbitrary.
Unlike QuickCheck’s Arbitrary,
GenUnchecked has very clear semantics: Any value that
can possibly be represented by the type should be generatable
(excluding bottom values and infinite values). Because of these
very clear semantics, the implementations of both the
genUnchecked and shrinkUnchecked
functions are generated via the Generic instance by
default. This means that the programmer never has to write such a
generator themselves, and shrinking comes for free as well.
The validity library to define invariants explicitly

The next step for validity-based testing is to treat types with internal invariants as just as natural as plain types. Indeed, types constrain the possible values of a given variable using compile-time checks, whereas runtime invariants constrain the possible values of a given variable using programming-time cleverness. This will be important later, to further remove friction, but it requires a bit of extra up-front thought.
The validity library allows programmers to
explicitly define, in code (instead of documentation),
what it means for a value of a given type to be valid. Types that
have an instance of the Validity typeclass implement
the validate :: a -> Validation function to explain all the
reasons why a value may be invalid. The semantics of this type
class is that an invalid value of any type should never exist at
runtime, otherwise the programmer has made a mistake.
For example, the type Map k v has an internal
invariant: it should always have
Data.Map.valid :: Ord k => Map k v -> Bool
evaluate to True.
Its Validity instance defines this explicitly:
-- | A 'Map' of things is valid if all the keys and values are valid and the 'Map' itself
-- is valid.
instance (Ord k, Validity k, Validity v) => Validity (Map k v) where
  validate m =
    mconcat
      [ declare "The Map structure is valid." $ M.valid m
      , delve "Map elements" $ M.toList m
      ]
This implementation checks that the internal map structure is
valid: M.valid, and that the individual elements of
the map are valid values by themselves as well.
The validate function does not just check whether a
value is valid, but also reports all the reasons why a value may
not be valid. This makes for very useful error messages. For
example:
newtype Prime = Prime { unPrime :: Int } -- INVARIANT: isPrime
instance Validity Prime where
  validate (Prime p) = declare "The number is prime" $ isPrime p
λ let Left e = prettyValidate [Prime 2, Prime 3, Prime 5, Prime 6]
λ putStrLn e
The element at index 3 in the list
Violated: The number is prime -- Aha!
The default implementation of Validity uses a
type’s Generic instance to derive an implementation
that defines a value to be valid if all of its sub-parts are valid.
This makes it frictionless to instantiate Validity for
types of which the validity of its values depends only on its
sub-parts.
The GenValid type class in the genvalidity library for free generators and free shrinking of valid values

To generate valid values, genvalidity has the GenValid type class. This type class also has very clear semantics:
- The genValid :: Gen a generator must only ever generate valid values.
- The shrinkValid :: a -> [a] shrinking function must only ever shrink valid values to valid values.
- The genValid generator should be able to generate all possible valid values that can exist at runtime.

These semantics also allow for default implementations:
genValid :: GenValid a => Gen a
genValid = genUnchecked `suchThat` isValid
shrinkValid :: GenValid a => a -> [a]
shrinkValid = filter isValid . shrinkUnchecked
This makes it relatively frictionless to generate valid values as well.
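With these instances in place, the running example restricted to valid values looks just like the unchecked version, using the forAllValid counterpart of the forAllUnchecked combinator shown earlier (a sketch):

forAllValid $ \str -> reverse (reverse str) == (str :: String)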
Note that if there are not many valid values of a given type, or
if checking validity is expensive, the default
genValid implementation using suchThat
may be too slow, and a programmer may have to override it with a
faster implementation. This is the tradeoff that
genvalidity makes to remove friction, but there are
helper functions in genvalidity to write faster
generators too.
As a result, the following is all that’s necessary for a programmer to write a custom type with generators, and start writing property tests:
{-# LANGUAGE DeriveGeneric #-}
module Example where

import GHC.Generics (Generic)

import Data.GenValidity
import Data.Validity
data MyType = MyType
  { myBool :: Bool
  , myRational :: Rational
  } deriving (Show, Eq, Generic)
instance Validity MyType
instance GenUnchecked MyType
instance GenValid MyType
Overriding genValid to make it faster

While genValid is always correct by default, it can sometimes be too slow. In such cases we want to override genValid to make it faster, but then it is up to us to make sure that it is still correct. When overriding genValid, it is advisable to add a test that makes sure that genValid still only generates valid values (a sketch of such a test follows the example below). It is also important that genValid can generate any value that would be considered valid. We cannot test for this, so it is up to the programmer to maintain this property of the generator.
As an example, consider a newtype Prime = Prime Int
with a Validity instance that describes that the
integer inside is prime. The genUnchecked generator is
generated using the Generic instance, and it works
exactly as it should. The default implementation of
genValid is equivalent to the following:
genValid :: Gen Prime
genValid = genUnchecked `suchThat` isValid
There are two ways to speed up a generator that uses suchThat:
- Generate valid values directly, so that there is no need to filter with isValid afterward. This is particularly useful if the isValid predicate is expensive to compute. In the case of Prime: primality testing is relatively expensive.
- Generate values that are more likely to be valid than those from plain genUnchecked and then filter them with isValid. This is particularly useful if there aren't many valid values. In the case of Prime: the likelihood that an arbitrary number is prime goes down as the number grows larger.

In the case of the Prime example, we could speed up genValid by replacing genUnchecked with a generator that generates either 2 or an odd number, and then filtering. The result would look like this:

genValid :: Gen Prime
genValid = (Prime <$> oneof [pure 2, (\x -> 2 * x + 1) <$> genUnchecked]) `suchThat` isValid
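To guard against mistakes in such an override, one can add a hand-rolled check like the following (a sketch using hspec and QuickCheck's forAll; it assumes Prime has a Show instance):

spec :: Spec
spec =
  describe "genValid :: Gen Prime" $
    it "only generates valid Primes" $
      -- forAll reports any generated Prime for which isValid returns False.
      forAll (genValid :: Gen Prime) isValid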
genvalidity-property

Once these type classes have been implemented, a programmer can use the many property combinators from genvalidity-property and even test suite combinators from genvalidity-hspec.
-- # Property combinators
-- Test whether two functions are inverses
inverseFunctions :: (Show a, Eq a, GenUnchecked a) => (a -> b) -> (b -> a) -> Property
-- Test whether a binary relation is symmetric
symmetry :: (Show a, GenUnchecked a) => (a -> a -> Bool) -> Property
-- Test whether a binary operator is commutative:
commutative :: (Show a, Eq a, GenUnchecked a) => (a -> a -> a) -> Property
-- Test whether a function maintains validity
producesValidsOnValids :: (Show a, Show b, GenValid a, Validity b) => (a -> b) -> Property
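For example, with these combinators the running reverse example becomes a one-liner inside an hspec Spec (a sketch):

spec :: Spec
spec =
  describe "reverse" $
    it "is its own inverse for Strings" $
      inverseFunctions (reverse :: String -> String) reverse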
Each property combinator in genvalidity-property
comes in multiple variants:
- symmetry for unchecked values
- symmetryOnValid for values generated by GenValid
- symmetryOnArbitrary for values generated by Arbitrary (for compatibility with QuickCheck)
- symmetryOnGens for values generated by a given generator

genvalidity-hspec

With the TypeApplications extension that is new in GHC 8.0, we can now write entire test suites that are parametric in a type argument (without the use of proxies). These so-called test suite combinators are of the form testSuiteCombinator :: forall a. Spec.
The genvalidity-hspec library and its companion
libraries already contain many test suite combinators.
-- Test whether a custom Eq instance of a type makes sense
eqSpec :: forall a. (Show a, Eq a, Typeable a, GenUnchecked a) => Spec
-- Test that a Monad doesn't violate the monad laws
monadSpec :: forall (f :: * -> *). (Monad f, Eq (f Int), Show (f Int), Typeable f, GenUnchecked (f Int)) => Spec
-- Test whether a lens doesn't violate the lens laws.
lensSpec :: forall s b. (Show b, Eq b, GenUnchecked b, Validity b, Show s, Eq s, GenUnchecked s, Validity s) => Lens' s b -> Spec
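With TypeApplications, using such a combinator takes a single line per type (a sketch, reusing the MyType example from earlier):

{-# LANGUAGE TypeApplications #-}

spec :: Spec
spec = do
  -- Check that the Eq instances of these types behave lawfully.
  eqSpec @Int
  eqSpec @MyType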
Each test suite combinator in genvalidity-hspec comes in multiple variants:
- eqSpec for unchecked values
- eqSpecOnValid for values generated by GenValid
- eqSpecOnArbitrary for values generated by Arbitrary (for backward compatibility)
- eqSpecOnGens for values generated by a given generator

Property testing can be a great tool in your quality arsenal. This post has outlined the major similarities and differences between property testing tools. You should now be better equipped to make the necessary tradeoffs between these tools, but more important than trying to optimise your usage is that you give property testing a try.
Below are some exercises to get you started with Validity-based
testing. validity
and the related libraries can all be found on Stackage.
Objective: Generate valid Odd values using
genValid by completing the following piece of
code.
newtype Odd = Odd Int deriving (Show, Eq, Generic) -- INVARIANT: The Int must be odd
Solution:
instance Validity Odd where
  validate (Odd i) = declare "The Int is odd" $ odd i
instance GenUnchecked Odd
instance GenValid Odd
Objective: Test whether the JSON instance in the following piece
of code works, using genvalidity-hspec-aeson, by
completing the following piece of code:
data MyType = MyType
  { myBool :: Bool
  , myRational :: Rational
  } deriving (Show, Eq, Generic)

instance FromJSON MyType where
  parseJSON = withObject "MyType" $ \o -> MyType <$> o .: "bool" <*> o .: "Rational"

instance ToJSON MyType where
  toJSON mt = object ["bool" .= (myBool mt), "double" .= (myRational mt)]
Solution:
data MyType = MyType
  { myBool :: Bool
  , myRational :: Rational
  } deriving (Show, Eq, Generic)

instance FromJSON MyType where
  parseJSON = withObject "MyType" $ \o -> MyType <$> o .: "bool" <*> o .: "Rational"

instance ToJSON MyType where
  toJSON mt = object ["bool" .= (myBool mt), "double" .= (myRational mt)]

instance Validity MyType
instance GenUnchecked MyType
instance GenValid MyType

spec :: Spec
spec =
  describe "MyType" $
    jsonSpecOnValid @MyType