Randomized Testing

Property-Based Testing

One approach to software testing is unit testing, where we write individual input-output pairs which we expect to hold for our code. This approach is sometimes sufficient and can be good at leveraging programmer knowledge of corner cases. However, often times unit tests fail to capture a large enough percentage of program behaviour and can miss corner cases not anticipated by the programmer.

Another approach is totally randomized testing, where inputs are randomly generated and fed to a program in the hopes of finding inputs which cause the program to crash. This can often cover a far greater percentage of program behaviour and no longer relies on programmer knowledge to generate test cases. However, it is not able to identify software vulnerabilities that do not cause the program to crash (unless you include assert(property) for all such properties at all relevant points in the code).

Property-based testing falls somewhere in the middle of these two. Property-based testing allows programmers to write general properties that are expected of software, and then the tester automatically generates random tests which test that property. This approach strikes a nice balance between automation and programmer intuition.

QuickCheck for Simple Properties

QuickCheck is a property-based testing tool for Haskell created by Koen Claessen and John Hughes roughly 20 years ago.

In order to introduce QuickCheck, let’s first discuss some properties of lists.

reverse [x] = [x]
reverse (xs++ys) = reverse ys ++ reverse xs
reverse (reverse xs) = xs

Ok, let’s try to write these in Haskell QuickCheck syntax so that we can verify that these properties hold.

prop_revrev :: [Int] -> Bool
prop_revrev xs = reverse (reverse xs) == xs

Ok, let’s test this using the QuickCheck function quickcheck.

quick­Check :: (Testable prop) => prop -> IO ()

The type signature says that quickcheck takes a Testable property and returns an IO monad. For now, we can just assume the Testable typeclass defines functions which allow us to randomly sample the inputs of our property, in this case [Int].

> quick­Check prop_revrev
+++ OK, passed 100 tests.

Huh, cool, ok. quickcheck randomly tested the double reverse property for lists on 100 different inputs, and found that the property held for all randomly generated inputs.

Let’s try another one.

prop_revapp :: [Int] -> [Int] -> Bool
prop_revapp xs ys = reverse (xs++ys) == reverse xs ++ reverse ys

And let’s use quickcheck again.

> quick­Check prop_revapp
*** Failed! Fal­si­fi­able (after 4 tests and 7 shrinks):
[0]
[1]

Huh? Do you see the mistake?

Our property is slightly wrong. It should be

prop_revapp' :: [Int] -> [Int] -> Bool
prop_revapp' xs ys = reverse (xs++ys) == reverse ys ++ reverse xs

And now we try again.

> quick­Check prop_revapp'
+++ OK, passed 100 tests.

QuickCheck for Conditional Properties

Let’s suppose we have the following insertion function for lists.

insert :: a -> [a] -> [a]
insert x []                 = [x]
insert x (y:ys) | x < y     = x : y : ys
                | oth­er­wise = y : insert x ys

insert places an element into a list where the element to its left is either the beginning of the list or less than it, and the element to its right is either the end of the list or greater than it.

> insert 4 [1,3,5,7]
[1,3,4,5,7]

Ok, let’s try to verify that calling insert results in an ordered list. In order to do this, let’s define a predicated that checks if a list is ordered.

is­Ordered ::         (Ord a) => [a] -> Bool
is­Ordered (x1:x2:xs) = x1 <= x2 && is­Ordered (x2:xs)
is­Ordered _          = True

Ok, now let’s define our property.

prop_insert_ordered :: Int -> [Int] -> Bool
prop_insert_ordered x xs = is­Ordered (insert x xs)

Let’s see what happens when we check this.

> quick­Check prop_insert_ordered
*** Failed! Fal­si­fi­able (after 4 tests and 3 shrinks):
0
[0,-1]

What’s the problem this time?

Ah, I see, we only expect that the resulting list will be ordered if the original one is too.

prop_insert_ordered' :: Int -> [Int] -> Bool
prop_insert_ordered' x xs = (not $ is­Ordered xs) || is­Ordered (insert x xs)

And now let’s try that.

> quick­Check prop_insert_ordered'
+++ OK, passed 100 tests.

Great, it worked! ... or did it? Do we see any problems here?

How many of our randomly generated lists are actually ordered? Well, let’s see.

prop_insert_ordered'' :: Int -> [Int] -> Prop­erty
prop_insert_ordered'' x xs =
    clas­sify (is­Ordered xs) "ord" $
    (not $ is­Ordered xs) || is­Ordered (insert x xs)

classify will tell us how many inputs were actually ordered (reporting them as “ord”). Notice we had to change the return type from Bool to Property. This is due to the return type of classify. We will discuss in a moment what a Property is, but for now, think of it synonymously with Bool.

> quick­Check prop_insert_ordered''
+++ OK, passed 100 tests (9% ord).

Hmmm, of 100 tests, only 9 were actually ordered. That’s not great. We could just run quickCheck more times. Can we do something slightly better? Conditional properties to the rescue!

(==>) :: Testable prop => Bool -> prop -> Prop­erty

This function acts just like the implication arrow, except it guarantees that every test case reported satisfies the condition to the left of the arrow. The type Property, encodes this notion.

prop_insert_ordered''' :: Int -> [Int] -> Prop­erty
prop_insert_ordered''' x xs =
    clas­sify (is­Ordered xs) "ord" $
    is­Ordered xs ==> is­Ordered (insert x xs)

Ok, surely, this should now verify our property!

> quick­Check prop_insert_ordered'''
*** Gave up! Passed only 70 tests (100% ord).

Oh no! It only generated ordered inputs, but it couldn’t find enough ordered lists! This makes sense. Every zero or single element list is ordered. With two elements, only 50% of lists are sorted. The chances keep going down as we go higher. Let’s view this in action.

prop_insert_ordered'''' :: Int -> [Int] -> Prop­erty
prop_insert_ordered'''' x xs =
    col­lect (length xs) $
    is­Ordered xs ==> is­Ordered (insert x xs)

collect is similar to classify, and will report the number and percentage of tests of different length that quickcheck uses.

> quick­Check prop_insert_ordered''''
*** Gave up! Passed only 77 tests:
40% 0
39% 1
17% 2
 3% 3
 1% 4

Well, that’s not a very good distribution. 40% of the tests were just on the empty list...

Looks like we’re going to have to do something a bit more involved.

User-Defined Generators

A Haskell term that generates a random value of type a has the type Gen.

new­type Gen a = Mk­Gen { un­Gen :: Std­Gen -> Int -> a }

We can think of Haskell terms of type Gen taking a random number generator StdGen, and a seed Int, and returning a random term of type a.

QuickCheck now defines the Arbitrary typeclass, which are used to signify which types can be randomly generated.

class Arbi­trary a where
  arbi­trary :: Gen a

The Arbitrary typeclass has one function arbitrary which is a generator of a random value of type a as described above.

We can construct new instance of the typeclass to define more complex generators.

instance (Arbi­trary a, Arbi­trary b) => Arbi­trary (a,b) where
  arbi­trary = (,) <$> arbi­trary <*> arbi­trary

QuickCheck has built in a number of useful generator combinators which can help with defining new Arbitrary instances.

choose :: (Sys­tem.Ran­dom.Ran­dom a) => (a, a) -> Gen a
ele­ments :: [a] -> Gen a
oneof :: [Gen a] -> Gen a
fre­quency :: [(Int, Gen a)] -> Gen a

choose takes an interval and returns a random element from that interval. elements takes a list of elements and returns a random element from that list. oneof takes a list of generators and randomly selects one element from one of the generators. frequency generalizes oneof such that each generator in the list of generators is given an explicit weight.

sample takes a generator and generates example values from it.

> sam­ple $ choose (0,3)
3
3
3
1
2
0
0
0
3
3
2
> sam­ple $ ele­ments [10, 20..100]
20
60
20
80
40
100
100
10
70
70
50
> sam­ple $ oneof [ele­ments [10, 20..100], choose (0,3)]
90
10
100
3
3
0
50
0
0
0
1
> sam­ple $ fre­quency [(1, ele­ments [10, 20..100]), (9, choose (0,3))]
3
3
0
0
3
1
100
3
2
1
2

Ok, let’s combine all of this to define our generator for lists that we wanted!

gen­List ::  (Arbi­trary a) => Gen [a]
gen­List = do
  x  <- arbi­trary
  xs <- gen­List
  return (x:xs)

Does this work?

Well, sort of, but this only generates infinite lists. Let’s define one that uses one of our combinators from above to generate random lists of integers of at most length m.

gen­List' ::  Int -> Gen [Int]
gen­List' m = do
  xs  <- gen­List
  n   <- ele­ments [0..m]
  return $ take n xs

Let’s generate some random lists!

> sam­ple $ gen­List' 10
[0,0,0]
[-2,-2]
[2,4,-4,2,-3]
[-3,5,4,-2,4,4]
[0,-6,-3,6,-4]
[1,7,-8,-4,-1,-2,-3]
[-4]
[-9,-7,12,-14,13,-12]
[-16,7,-12]
[0,-13,16,12,-1,0,12]
[19,-9,-9,-6,6,15]

Cool! Recall, we wanted a generator for only sorted lists, so we have one more step.

gen­Ord­List :: Int -> Gen [Int]
gen­Ord­List m = sort <$> gen­List' m

So now we get the following.

> sam­ple $ gen­Ord­List 10
[0,0,0,0]
[-2,-1,0]
[2]
[-6,-4,-4,-4,-3,2,3,4]
[]
[-5,-3,0,2,3,6,8]
[-8,-8,-8,-4,-4,-2,0,3,3,4]
[-8,-7,-7,-6,-2,-1,3,4,5,7]
[-16,-13,-9,-6,-3,-1,5]
[-15,-5,2,2,12,14]
[-6,7,16]

We can now use the forall combinator to actually test our original property.

for­All :: (Show a, Testable prop)
       => Gen a -> (a -> prop) -> Prop­erty

forall takes a generator of a type a and a function which produces a Testable type, and returns a Property.

prop_insert :: Int -> Prop­erty
prop_insert x = for­All gen­Ord­List $
  \xs -> clas­sify (is­Ordered xs) $
         is­Ordered (insert x xs)

Now finally, let’s test it.

> quick­Check prop_insert
+++ OK, passed 100 tests (100% ord).

Woohoo!

A QuickCheck Case Study: Primality Testing

Consider the following two functions.

fac­tors :: Int -> [Int]
fac­tors n = [i | i <- [1..n], n `mod` i == 0]

is­Prime :: Int -> Bool
is­Prime n = fac­tors n == [1,n]

The first function finds all of the factors of a number n in ascending order. The second determines if a number is prime by checking if its only factors are 1 and itself.

Let’s consider an alternative solution, say the following.

is­Prime' :: Int -> Bool
is­Prime' n = length (fac­tors n) == 2

Let’s try to check if our new implementation is still correct (relative to the original implementation).

prop_is­Prime1 :: Int -> Prop­erty
prop_is­Prime1 n =
   clas­sify (n < 0) "neg" $
   is­Prime n == is­Prime' n

Let’s test it out.

> quick­Check prop_is­Prime1
+++ OK, passed 100 tests (50% neg).

That did ok, but half of the test cases were negative, which seems a bit silly. Let’s fix that.

prop_is­Prime2 :: Int -> Prop­erty
prop_is­Prime2 n = (n >= 0) ==> is­Prime n == is­Prime' n

Keep in mind when using ==> that you should be very careful to make sure your predicate to the left of the arrow is not overly prohibitive. Consider the following.

bad­Prime :: Int -> Bool
bad­Prime _ = True

prop_is­Prime3 :: Int -> Prop­erty
prop_is­Prime3 n = (is­Prime n) ==> is­Prime n == bad­Prime n

This property only checks if the two functions are equal on inputs that are prime, so the property will be satisfied by the silly bad prime finder.

> quick­Check prop_is­Prime3
+++ OK, passed 100 tests.

A Second QuickCheck Case Study: The Coloring Problem

Recall the graph coloring problem from HW1.

data Balkan = Alba­nia | Bul­garia | Bosnia­And­Herze­gov­ina
            | Kosovo | Mace­do­nia | Mon­tene­gro
  deriv­ing (Show, Eq)

data Color = Blue | Red | Green | Yel­low
  deriv­ing (Show, Eq)

balkans :: [Balkan]
balkans = [Alba­nia, Bul­garia, Bosnia­And­Herze­gov­ina, Kosovo, Mace­do­nia, Mon­tene­gro]

col­ors :: [Color]
col­ors = [Blue, Red, Green, Yel­low]

The isGoodColoring function takes a list of adjacent Balkans and a coloring of those Balkans, and returns true if the coloring was good and false otherwise. Here are two potential solutions to that problem.

is­Good­Col­or­ing1 :: [(Balkan, Balkan)] -> [(Balkan,Color)] -> Bool
is­Good­Col­or­ing1 adj col­or­ing
  = null [ (c1,c2) | (c1,c2) <- adj, lookup c1 col­or­ing == lookup c2 col­or­ing &&
                                     lookup c1 col­or­ing /= Noth­ing]

is­Good­Col­or­ing2 :: [(Balkan, Balkan)] -> [(Balkan,Color)] -> Bool
is­Good­Col­or­ing2 adj col­or­ing = null $ do
  (c1, c2) <- adj
  if ((lookup c1 col­or­ing)==(lookup c2 col­or­ing))&& (lookup c1 col­or­ing /= Noth­ing)
    then
    return (c1, c2)
    else
    []

Let’s try to test if they are equivalent. We will start by defining instances of Arbitrary for both Balkan and Color.

instance Arbi­trary Balkan where
  arbi­trary = ele­ments balkans

instance Arbi­trary Color where
  arbi­trary = ele­ments col­ors

Let’s create a property that we would like to test.

test­Col­or­ing1 :: [(Balkan, Balkan)] -> [(Balkan, Color)] -> Bool
test­Col­or­ing1 adj col =
  is­Good­Col­or­ing1 adj col == is­Good­Col­or­ing2 adj col

And now let’s run it.

> quick­Check test­Col­or­ing1
+++ OK, passed 100 tests.

Is this sufficient? Maybe not... these colorings are not guaranteed to be complete, i.e. have a coloring for each Balkan. Let’s fix that.

same­Elems :: Eq a => [a] -> [a] -> Bool
same­Elems xs ys =  length xs == length ys && all (`elem` ys) xs

com­plete :: [(Balkan, Color)] -> Bool
com­plete col­or­ing = balkans `same­Elems` col_balks
  where
    col_balks = map fst col­or­ing

test­Col­or­ing­Com­plete :: [(Balkan, Balkan)] -> [(Balkan, Color)] -> Prop­erty
test­Col­or­ing­Com­plete adj col =
  com­plete col ==>
  is­Good­Col­or­ing_teach adj col == is­Good­Col­or­ing_stud adj col

This is good, but will fail because it is unlikely to randomly generate a complete coloring. Let’s define a generator for complete colorings.

gen­Rand­Com­plete­Col­or­ing :: Gen [(Balkan, Color)]
gen­Rand­Com­plete­Col­or­ing = do
  c1 <- arbi­trary
  c2 <- arbi­trary
  c3 <- arbi­trary
  c4 <- arbi­trary
  c5 <- arbi­trary
  c6 <- arbi­trary
  return [(Alba­nia, c1), (Bul­garia, c2), (Bosnia­And­Herze­gov­ina, c3), (Kosovo, c4),
          (Mace­do­nia, c5), (Mon­tene­gro, c6)]

Finally, let’s make sure there are no repetitions in the adjacency list, nor are there any self adjacencies.

gen­Rand­Adja­cen­cies :: Gen [(Balkan, Balkan)]
gen­Rand­Adja­cen­cies = do
  adj <- arbi­trary
  return $ fil­ter (\(x,y) -> x /= y)
         $ nub­By (\(x1,y1) (x2,y2) -> (x1,y1) == (x2,y2) ||
                                      (x1,y1) == (y2, x2)) adj

So now we finally have the following.

test­Col­or­ing :: Prop­erty
test­Col­or­ing =
  for­All gen­Rand­Com­plete­Col­or­ing $ \col­or­ing ->
  for­All gen­Rand­Adja­cen­cies $ \adj ->
  clas­sify com­plete "comp" $
  is­Good­Col­or­ing1 adj col­or­ing == is­Good­Col­or­ing2 adj col­or­ing