Randomized Testing
Property-Based Testing
One approach to software testing is unit testing, where we write individual input-output pairs which we expect to hold for our code. This approach is sometimes sufficient and can be good at leveraging programmer knowledge of corner cases. However, often times unit tests fail to capture a large enough percentage of program behaviour and can miss corner cases not anticipated by the programmer.
Another approach is totally randomized testing, where inputs are randomly generated and fed to a program in the hopes of finding inputs which cause the program to crash. This can often cover a far greater percentage of program behaviour and no longer relies on programmer knowledge to generate test cases. However, it is not able to identify software vulnerabilities that do not cause the program to crash (unless you include assert(property) for all such properties at all relevant points in the code).
Property-based testing falls somewhere in the middle of these two. Property-based testing allows programmers to write general properties that are expected of software, and then the tester automatically generates random tests which test that property. This approach strikes a nice balance between automation and programmer intuition.
QuickCheck for Simple Properties
QuickCheck is a property-based testing tool for Haskell created by Koen Claessen and John Hughes roughly 20 years ago.
In order to introduce QuickCheck, let’s first discuss some properties of lists.
reverse [x] = [x] reverse (xs++ys) = reverse ys ++ reverse xs reverse (reverse xs) = xs
Ok, let’s try to write these in Haskell QuickCheck syntax so that we can verify that these properties hold.
prop_revrev :: [Int] -> Bool prop_revrev xs = reverse (reverse xs) == xs
Ok, let’s test this using the QuickCheck function quickcheck
.
quickCheck :: (Testable prop) => prop -> IO ()
The type signature says that quickcheck
takes a Testable
property and returns an IO
monad. For now, we can just assume the Testable
typeclass defines functions which allow us to randomly sample the inputs of our property, in this case [Int]
.
> quickCheck prop_revrev +++ OK, passed 100 tests.
Huh, cool, ok. quickcheck
randomly tested the double reverse property for lists on 100 different inputs, and found that the property held for all randomly generated inputs.
Let’s try another one.
prop_revapp :: [Int] -> [Int] -> Bool prop_revapp xs ys = reverse (xs++ys) == reverse xs ++ reverse ys
And let’s use quickcheck
again.
> quickCheck prop_revapp *** Failed! Falsifiable (after 4 tests and 7 shrinks): [0] [1]
Huh? Do you see the mistake?
Our property is slightly wrong. It should be
prop_revapp' :: [Int] -> [Int] -> Bool prop_revapp' xs ys = reverse (xs++ys) == reverse ys ++ reverse xs
And now we try again.
> quickCheck prop_revapp' +++ OK, passed 100 tests.
QuickCheck for Conditional Properties
Let’s suppose we have the following insertion function for lists.
insert :: a -> [a] -> [a] insert x [] = [x] insert x (y:ys) | x < y = x : y : ys | otherwise = y : insert x ys
insert
places an element into a list where the element to its left is either the beginning of the list or less than it, and the element to its right is either the end of the list or greater than it.
> insert 4 [1,3,5,7] [1,3,4,5,7]
Ok, let’s try to verify that calling insert
results in an ordered list. In order to do this, let’s define a predicated that checks if a list is ordered.
isOrdered :: (Ord a) => [a] -> Bool isOrdered (x1:x2:xs) = x1 <= x2 && isOrdered (x2:xs) isOrdered _ = True
Ok, now let’s define our property.
prop_insert_ordered :: Int -> [Int] -> Bool prop_insert_ordered x xs = isOrdered (insert x xs)
Let’s see what happens when we check this.
> quickCheck prop_insert_ordered *** Failed! Falsifiable (after 4 tests and 3 shrinks): 0 [0,-1]
What’s the problem this time?
Ah, I see, we only expect that the resulting list will be ordered if the original one is too.
prop_insert_ordered' :: Int -> [Int] -> Bool prop_insert_ordered' x xs = (not $ isOrdered xs) || isOrdered (insert x xs)
And now let’s try that.
> quickCheck prop_insert_ordered' +++ OK, passed 100 tests.
Great, it worked! ... or did it? Do we see any problems here?
How many of our randomly generated lists are actually ordered? Well, let’s see.
prop_insert_ordered'' :: Int -> [Int] -> Property prop_insert_ordered'' x xs = classify (isOrdered xs) "ord" $ (not $ isOrdered xs) || isOrdered (insert x xs)
classify
will tell us how many inputs were actually ordered (reporting them as “ord”). Notice we had to change the return type from Bool
to Property
. This is due to the return type of classify
. We will discuss in a moment what a Property
is, but for now, think of it synonymously with Bool
.
> quickCheck prop_insert_ordered'' +++ OK, passed 100 tests (9% ord).
Hmmm, of 100 tests, only 9 were actually ordered. That’s not great. We could just run quickCheck more times. Can we do something slightly better? Conditional properties to the rescue!
(==>) :: Testable prop => Bool -> prop -> Property
This function acts just like the implication arrow, except it guarantees that every test case reported satisfies the condition to the left of the arrow. The type Property
, encodes this notion.
prop_insert_ordered''' :: Int -> [Int] -> Property prop_insert_ordered''' x xs = classify (isOrdered xs) "ord" $ isOrdered xs ==> isOrdered (insert x xs)
Ok, surely, this should now verify our property!
> quickCheck prop_insert_ordered''' *** Gave up! Passed only 70 tests (100% ord).
Oh no! It only generated ordered inputs, but it couldn’t find enough ordered lists! This makes sense. Every zero or single element list is ordered. With two elements, only 50% of lists are sorted. The chances keep going down as we go higher. Let’s view this in action.
prop_insert_ordered'''' :: Int -> [Int] -> Property prop_insert_ordered'''' x xs = collect (length xs) $ isOrdered xs ==> isOrdered (insert x xs)
collect
is similar to classify
, and will report the number and percentage of tests of different length that quickcheck uses.
> quickCheck prop_insert_ordered'''' *** Gave up! Passed only 77 tests: 40% 0 39% 1 17% 2 3% 3 1% 4
Well, that’s not a very good distribution. 40% of the tests were just on the empty list...
Looks like we’re going to have to do something a bit more involved.
User-Defined Generators
A Haskell term that generates a random value of type a
has the type Gen
.
newtype Gen a = MkGen { unGen :: StdGen -> Int -> a }
We can think of Haskell terms of type Gen
taking a random number generator StdGen
, and a seed Int
, and returning a random term of type a
.
QuickCheck now defines the Arbitrary
typeclass, which are used to signify which types can be randomly generated.
class Arbitrary a where arbitrary :: Gen a
The Arbitrary
typeclass has one function arbitrary
which is a generator of a random value of type a
as described above.
We can construct new instance of the typeclass to define more complex generators.
instance (Arbitrary a, Arbitrary b) => Arbitrary (a,b) where arbitrary = (,) <$> arbitrary <*> arbitrary
QuickCheck has built in a number of useful generator combinators which can help with defining new Arbitrary
instances.
choose :: (System.Random.Random a) => (a, a) -> Gen a elements :: [a] -> Gen a oneof :: [Gen a] -> Gen a frequency :: [(Int, Gen a)] -> Gen a
choose
takes an interval and returns a random element from that interval. elements
takes a list of elements and returns a random element from that list. oneof
takes a list of generators and randomly selects one element from one of the generators. frequency
generalizes oneof
such that each generator in the list of generators is given an explicit weight.
sample
takes a generator and generates example values from it.
> sample $ choose (0,3) 3 3 3 1 2 0 0 0 3 3 2 > sample $ elements [10, 20..100] 20 60 20 80 40 100 100 10 70 70 50 > sample $ oneof [elements [10, 20..100], choose (0,3)] 90 10 100 3 3 0 50 0 0 0 1 > sample $ frequency [(1, elements [10, 20..100]), (9, choose (0,3))] 3 3 0 0 3 1 100 3 2 1 2
Ok, let’s combine all of this to define our generator for lists that we wanted!
genList :: (Arbitrary a) => Gen [a] genList = do x <- arbitrary xs <- genList return (x:xs)
Does this work?
Well, sort of, but this only generates infinite lists. Let’s define one that uses one of our combinators from above to generate random lists of integers of at most length m
.
genList' :: Int -> Gen [Int] genList' m = do xs <- genList n <- elements [0..m] return $ take n xs
Let’s generate some random lists!
> sample $ genList' 10 [0,0,0] [-2,-2] [2,4,-4,2,-3] [-3,5,4,-2,4,4] [0,-6,-3,6,-4] [1,7,-8,-4,-1,-2,-3] [-4] [-9,-7,12,-14,13,-12] [-16,7,-12] [0,-13,16,12,-1,0,12] [19,-9,-9,-6,6,15]
Cool! Recall, we wanted a generator for only sorted lists, so we have one more step.
genOrdList :: Int -> Gen [Int] genOrdList m = sort <$> genList' m
So now we get the following.
> sample $ genOrdList 10 [0,0,0,0] [-2,-1,0] [2] [-6,-4,-4,-4,-3,2,3,4] [] [-5,-3,0,2,3,6,8] [-8,-8,-8,-4,-4,-2,0,3,3,4] [-8,-7,-7,-6,-2,-1,3,4,5,7] [-16,-13,-9,-6,-3,-1,5] [-15,-5,2,2,12,14] [-6,7,16]
We can now use the forall
combinator to actually test our original property.
forAll :: (Show a, Testable prop) => Gen a -> (a -> prop) -> Property
forall
takes a generator of a type a
and a function which produces a Testable
type, and returns a Property
.
prop_insert :: Int -> Property prop_insert x = forAll genOrdList $ \xs -> classify (isOrdered xs) $ isOrdered (insert x xs)
Now finally, let’s test it.
> quickCheck prop_insert +++ OK, passed 100 tests (100% ord).
Woohoo!
A QuickCheck Case Study: Primality Testing
Consider the following two functions.
factors :: Int -> [Int] factors n = [i | i <- [1..n], n `mod` i == 0] isPrime :: Int -> Bool isPrime n = factors n == [1,n]
The first function finds all of the factors of a number n
in ascending order. The second determines if a number is prime by checking if its only factors are 1 and itself.
Let’s consider an alternative solution, say the following.
isPrime' :: Int -> Bool isPrime' n = length (factors n) == 2
Let’s try to check if our new implementation is still correct (relative to the original implementation).
prop_isPrime1 :: Int -> Property prop_isPrime1 n = classify (n < 0) "neg" $ isPrime n == isPrime' n
Let’s test it out.
> quickCheck prop_isPrime1 +++ OK, passed 100 tests (50% neg).
That did ok, but half of the test cases were negative, which seems a bit silly. Let’s fix that.
prop_isPrime2 :: Int -> Property prop_isPrime2 n = (n >= 0) ==> isPrime n == isPrime' n
Keep in mind when using ==>
that you should be very careful to make sure your predicate to the left of the arrow is not overly prohibitive. Consider the following.
badPrime :: Int -> Bool badPrime _ = True prop_isPrime3 :: Int -> Property prop_isPrime3 n = (isPrime n) ==> isPrime n == badPrime n
This property only checks if the two functions are equal on inputs that are prime, so the property will be satisfied by the silly bad prime finder.
> quickCheck prop_isPrime3 +++ OK, passed 100 tests.
A Second QuickCheck Case Study: The Coloring Problem
Recall the graph coloring problem from HW1.
data Balkan = Albania | Bulgaria | BosniaAndHerzegovina | Kosovo | Macedonia | Montenegro deriving (Show, Eq) data Color = Blue | Red | Green | Yellow deriving (Show, Eq) balkans :: [Balkan] balkans = [Albania, Bulgaria, BosniaAndHerzegovina, Kosovo, Macedonia, Montenegro] colors :: [Color] colors = [Blue, Red, Green, Yellow]
The isGoodColoring
function takes a list of adjacent Balkans and a coloring of those Balkans, and returns true if the coloring was good and false otherwise. Here are two potential solutions to that problem.
isGoodColoring1 :: [(Balkan, Balkan)] -> [(Balkan,Color)] -> Bool isGoodColoring1 adj coloring = null [ (c1,c2) | (c1,c2) <- adj, lookup c1 coloring == lookup c2 coloring && lookup c1 coloring /= Nothing] isGoodColoring2 :: [(Balkan, Balkan)] -> [(Balkan,Color)] -> Bool isGoodColoring2 adj coloring = null $ do (c1, c2) <- adj if ((lookup c1 coloring)==(lookup c2 coloring))&& (lookup c1 coloring /= Nothing) then return (c1, c2) else []
Let’s try to test if they are equivalent. We will start by defining instances of Arbitrary
for both Balkan
and Color
.
instance Arbitrary Balkan where arbitrary = elements balkans instance Arbitrary Color where arbitrary = elements colors
Let’s create a property that we would like to test.
testColoring1 :: [(Balkan, Balkan)] -> [(Balkan, Color)] -> Bool testColoring1 adj col = isGoodColoring1 adj col == isGoodColoring2 adj col
And now let’s run it.
> quickCheck testColoring1 +++ OK, passed 100 tests.
Is this sufficient? Maybe not... these colorings are not guaranteed to be complete, i.e. have a coloring for each Balkan. Let’s fix that.
sameElems :: Eq a => [a] -> [a] -> Bool sameElems xs ys = length xs == length ys && all (`elem` ys) xs complete :: [(Balkan, Color)] -> Bool complete coloring = balkans `sameElems` col_balks where col_balks = map fst coloring testColoringComplete :: [(Balkan, Balkan)] -> [(Balkan, Color)] -> Property testColoringComplete adj col = complete col ==> isGoodColoring_teach adj col == isGoodColoring_stud adj col
This is good, but will fail because it is unlikely to randomly generate a complete coloring. Let’s define a generator for complete colorings.
genRandCompleteColoring :: Gen [(Balkan, Color)] genRandCompleteColoring = do c1 <- arbitrary c2 <- arbitrary c3 <- arbitrary c4 <- arbitrary c5 <- arbitrary c6 <- arbitrary return [(Albania, c1), (Bulgaria, c2), (BosniaAndHerzegovina, c3), (Kosovo, c4), (Macedonia, c5), (Montenegro, c6)]
Finally, let’s make sure there are no repetitions in the adjacency list, nor are there any self adjacencies.
genRandAdjacencies :: Gen [(Balkan, Balkan)] genRandAdjacencies = do adj <- arbitrary return $ filter (\(x,y) -> x /= y) $ nubBy (\(x1,y1) (x2,y2) -> (x1,y1) == (x2,y2) || (x1,y1) == (y2, x2)) adj
So now we finally have the following.
testColoring :: Property testColoring = forAll genRandCompleteColoring $ \coloring -> forAll genRandAdjacencies $ \adj -> classify complete "comp" $ isGoodColoring1 adj coloring == isGoodColoring2 adj coloring