A woodsman was once asked, “What would you do if you had just five minutes to chop down a tree?” He answered, “I would spend the first two and a half minutes sharpening my axe.” Let us take a few minutes to sharpen our perspective.
Implementing research papers is tricky. Research is done over the long term as a running conversation between volunteers. Papers are small pieces of this conversation and it’s hard to know exactly how much of it any one paper covers. Oftentimes, information is left implicit in the treatment of a topic because the authors and their intended audience are so close in knowledge. This is great once you’re informed but a real challenge at the outset of a new project.
Implementing a piece of research means you have to get up to speed on the ongoing conversation, enough that you understand what the authors of your starting paper are talking about, as well as what the authors of their references are talking about, more or less. This is important for implementors: authors will often drop little asides in their writing that are intended to open up future avenues of research but are sometimes not pertinent to the project at hand. Being able to tell the difference means being cognizant of the broader conversation.
Before I put further time into a SmallCheck implementation for Rust I believe it will be valuable for this series of articles if I spend some effort bringing more of this background conversation to the fore. I think, hopefully, that it’ll make it easier to understand the choices I make in implementation and, besides, I’m sure to learn much in the elucidation.
“SmallCheck and Lazy SmallCheck” is one tiny piece in an ongoing investigation into software testing which does not rely on humans doing case-by-case analysis of software units. This sort of testing – where you, the engineer, choose inputs, apply them to isolated pieces of a software system and then check that the results of that application are ‘correct’ – is tedious, time-consuming and, worst of all, not particularly effective. The efficacy problem comes in when you consider that we mere people are not very diligent at performing tedious, repetitious tasks and that, moreover, if we had a hand in the construction of a system we’ll tend to be blind to its faults, as those faults are the result of our own systematic biases.
If you look at it squarely, case analysis testing is three things:
- discovery (through thinking) of ‘important’ inputs for some software system
- application of these ‘important’ inputs
- validation of a set of constraints against the result of application
Say you’re doing unit-testing on a piece of software that calculates the total charge on a delivery order:
```python
def order_charge(base_cost, tip):
    tax = base_cost * 0.30
    our_cut = base_cost * 0.10
    return base_cost + our_cut + tax + tip
```
Set aside the code-stink of the magic constants; this is basically what such a system looks like. You, the plucky unit-test-writing engineer, knock together some cases
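Something like the following, say — a sketch of the sort of hand-picked cases meant here; the specific figures are my own, not from any real billing system:

```python
# (repeated from above so this block runs standalone)
def order_charge(base_cost, tip):
    tax = base_cost * 0.30
    our_cut = base_cost * 0.10
    return base_cost + our_cut + tax + tip

# Hand-chosen, 'friendly' inputs: each total works out to
# 1.4 * base_cost + tip, compared with a small float tolerance.
assert abs(order_charge(10.00, 2.00) - 16.00) < 1e-9
assert abs(order_charge(20.00, 0.00) - 28.00) < 1e-9
assert abs(order_charge(0.00, 0.00) - 0.00) < 1e-9
```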
and are satisfied from these cases that the computed value is correct, implying the system under test is also correct. Maybe you work the figures by hand to confirm. But! there are implicit business constraints here that aren’t being tested for. The problem with case analysis testing is that it's easy to choose inputs which don't uncover these neglected, perhaps forgotten, constraints. It's easy, but wrong, to believe that correct function in some cases implies correct function in _all_ cases. In this implementation we've neglected to ensure that the result of `order_charge` is never negative, a fact which was not uncovered by our hand-chosen cases. Consider that a sufficiently negative `tip` charges negative money for an order which, if the billing system isn't carefully written, gives money to the customer. Oops. You, the plucky unit-test-writing engineer, then assert that `tip` is never negative, write a new test to assert that for a new class of cases the result of `order_charge` is never negative and call it a day. This is great, unless there’s a bug somewhere else that makes `base_cost` negative, in which case, oops, `order_charge` is again negative, maybe. Writing software with care is a perpetual cycle of whataboutism.
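Written out as an explicit property, the neglected constraint looks something like this — the concrete figures are mine, chosen only to exercise the failure:

```python
# (repeated so this block runs standalone)
def order_charge(base_cost, tip):
    tax = base_cost * 0.30
    our_cut = base_cost * 0.10
    return base_cost + our_cut + tax + tip

# The neglected business constraint, stated explicitly: an order
# should never charge negative money.
def non_negative_charge(base_cost, tip):
    return order_charge(base_cost, tip) >= 0

assert non_negative_charge(10.00, 2.00)        # a friendly case passes
assert not non_negative_charge(10.00, -20.00)  # a hostile tip violates it
```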
What case analysis testing is doing is a special case of a more general approach, in which:
- a method to compute or automatically search for ‘important’ inputs to some software system is specified
- these inputs are applied
- explicitly enumerated constraints are validated against the result of the application
In the above example, the valid inputs are all positive and negative numbers and the constraint on the sign of the return value of `order_charge` is just as we said. All we have to do is figure out some way to automatically search the domain of all positive and negative numbers in order to find ‘interesting’ inputs. That’s the real trick.
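The generalised recipe can be made mechanical even naively — the sketch below is my own illustration, not SmallCheck's actual algorithm: enumerate candidate inputs within a small bound and validate the explicit non-negativity constraint against each result.

```python
from itertools import product

# (repeated so this block runs standalone)
def order_charge(base_cost, tip):
    tax = base_cost * 0.30
    our_cut = base_cost * 0.10
    return base_cost + our_cut + tax + tip

# Naive bounded search: try every (base_cost, tip) pair drawn from a
# small grid of values and return the first pair whose charge
# violates the non-negativity constraint, or None if all pass.
def find_counterexample(depth):
    candidates = [x * 0.5 for x in range(-2 * depth, 2 * depth + 1)]
    for base_cost, tip in product(candidates, repeat=2):
        if order_charge(base_cost, tip) < 0:
            return (base_cost, tip)
    return None
```

Even at depth 1 the search turns up a violating input — a negative `base_cost` or `tip` — without anyone having to think of it in advance.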
The conversation that “SmallCheck” contributes to is the ongoing discovery of different ways of searching for ‘interesting’ inputs. QuickCheck’s approach is to choose values at random from the possible domain of input. SmallCheck enumerates cases from ‘simple’ on up. Where this gets especially tricky is when the inputs are subject to constraints. The original “SmallCheck” paper, and the previous post in this series, discuss an example where inputs are sorted arrays. Lazy SmallCheck uses Haskell’s lazy evaluation to avoid fully constructing, and then discarding, produced values which don’t meet the constraints. What if you want to find values that meet “any positive integer divisible by three” or “any valid lambda calculus expression”?
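To make the generate-then-discard cost concrete, here is a naive enumeration sketched in Python rather than the papers' Haskell — my own illustration of the strategy, not either paper's algorithm. It produces integers from ‘simple’ on up and throws away every candidate that fails the divisible-by-three constraint; this wasted generation is exactly what laziness helps avoid.

```python
def small_ints():
    """Yield integers from 'simple' on up: 0, 1, -1, 2, -2, ..."""
    yield 0
    n = 1
    while True:
        yield n
        yield -n
        n += 1

def first_matching(predicate, limit):
    """Collect the first `limit` enumerated values satisfying `predicate`,
    discarding every candidate that fails it along the way."""
    found = []
    for candidate in small_ints():
        if predicate(candidate):
            found.append(candidate)
            if len(found) == limit:
                return found
```

Asking for the first three positive multiples of three yields `[3, 6, 9]`, but only after generating and discarding many candidates that never stood a chance — a cost that grows sharply for sparser constraints like “any valid lambda calculus expression”.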
Over the next handful of posts I’ll dig into the references of “SmallCheck and Lazy SmallCheck: Automatic Exhaustive Testing for Small Values” and also “Advances in Lazy SmallCheck”. My aim is to place SmallCheck into its historical context but also to make clear the concepts the two papers use without explanation. The ambition is that this'll let me tease SmallCheck away from the Haskell features the paper assumes.