Sunday, July 13, 2014

Testing: How to get the data into the system

Even though the correct term for a lot of the “testing” going on would be verification, let's just stick with “testing” in the titles for the time being...

General verification workflow

The general way to verify that a piece of software does what it is meant to do seems quite simple:

  • Formulate the desired outcome for a defined series of actions
  • Put the system in a known state (or the sub-system or the “unit” – depending on your testing goal)
  • Execute the aforementioned defined actions
  • Verify that the desired outcome is actually achieved
  • [Optional] Clean up the systems [1]
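
The steps above can be sketched as a small unit test. This is only an illustration using Python's standard unittest module; the Inventory class and its methods are hypothetical stand-ins for the "unit" and are not taken from any real system.

```python
import unittest


class Inventory:
    """Hypothetical unit under test (an assumption made for this sketch)."""

    def __init__(self):
        self._stock = {}

    def add(self, item, quantity):
        self._stock[item] = self._stock.get(item, 0) + quantity

    def remove(self, item, quantity):
        if self._stock.get(item, 0) < quantity:
            raise ValueError("not enough stock")
        self._stock[item] -= quantity

    def count(self, item):
        return self._stock.get(item, 0)


class RemoveFromInventoryTest(unittest.TestCase):
    def setUp(self):
        # Put the unit into a known state before every single test.
        self.inventory = Inventory()
        self.inventory.add("widget", 5)

    def tearDown(self):
        # Optional clean-up; the next setUp does not rely on it.
        self.inventory = None

    def test_removing_items_reduces_stock(self):
        # Desired outcome: removing 2 of 5 widgets leaves 3 widgets.
        self.inventory.remove("widget", 2)                    # execute
        self.assertEqual(self.inventory.count("widget"), 3)   # verify
```

Saved to a file, this can be run with `python -m unittest`. Note that the known state is established in setUp, not left over from a previous tearDown.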

While this process sounds simple enough, there are enough pitfalls hidden in these few steps to have spawned a whole industry and produced dozens of books.

In this post I want to tackle a very specific aspect – the part where the system is put into a “known state”.

Putting the system into a known state might involve several – more or less complex – actions. Nowadays, with tools like Vagrant and Puppet that automate and orchestrate the whole creation and setup of machines, it is even possible to set up the whole environment programmatically.

You might not want to do that for each unit test, which brings us to the question of when to set up what, which I will try to address in some future post.

The problem with the data

However big or small the test-setup is, one thing that is very hard to avoid is providing data.

The state of the system (including data) is often called a fixture, and having those fixtures – known states of the system with reliable, known data – is a fundamental prerequisite for any kind of serious testing, be it manual or automated.

For any system of significant size, if there are no fixtures, there is no way to tell whether the system behaves as desired.

Getting the data into the system: Some options

In general, there are three ways to get the data into the system:

  • Save a known state of the data and import it into the system before the tests are run.
    In this scenario the important question is “which part of the data do I load at which time”, because the tests might of course interfere with each other and probably mess up the data – especially if they fail. Consider using this approach only in conjunction with proper setups before each test, amended by assertions and backed up by “on the fly” data-generation where necessary.
  • Create the data on the fly via the means of the system.
    Typically, for acceptance tests this means UI-interaction – probably not the way you want to go if you have to run hundreds of tests. Consider implementing an interface that can be accessed programmatically from outside the system and that uses the same internal mechanisms for data creation as the rest of the software.
  • Create the data on the fly directly (via the datastore layer).
    This approach has the tempting property that it can be extremely fast and can be implemented without designing the system under test specifically for testability. The huge problem with this approach is that it duplicates knowledge (or assumptions) about the system's internal structures and concepts – a thing that we usually try to avoid. Consider just not using this approach!
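
To make the difference between the second and the third option more tangible, here is a minimal sketch using Python's built-in sqlite3 module. The UserService class, the users table and the lower-casing rule are all assumptions invented for this example; they merely stand in for "the system" and one of its internal concepts.

```python
import sqlite3


class UserService:
    """Hypothetical system under test; names and schema are assumptions."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS users "
            "(email TEXT PRIMARY KEY, active INTEGER)"
        )

    def register(self, email):
        # Rule that lives inside the system: e-mails are stored lower-case.
        self._db.execute(
            "INSERT INTO users (email, active) VALUES (?, 1)",
            (email.strip().lower(),),
        )

    def is_registered(self, email):
        row = self._db.execute(
            "SELECT 1 FROM users WHERE email = ?",
            (email.strip().lower(),),
        ).fetchone()
        return row is not None


def create_user_via_system(service, email):
    # Second option: test data is created through the system's own mechanism.
    service.register(email)


def create_user_via_datastore(connection, email):
    # Third option: writes straight to the table, duplicating assumptions
    # about the schema and silently skipping the lower-casing rule.
    connection.execute(
        "INSERT INTO users (email, active) VALUES (?, 1)", (email,)
    )


connection = sqlite3.connect(":memory:")
service = UserService(connection)
create_user_via_system(service, "Alice@Example.com")
create_user_via_datastore(connection, "Bob@Example.com")

print(service.is_registered("alice@example.com"))  # created via the system
print(service.is_registered("bob@example.com"))    # created behind its back
```

In this sketch the first check prints True and the second prints False: the row written directly to the datastore does not follow the rule the system applies on its own, so the fixture no longer represents a state the system could actually have produced.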

So, do you actually have fixtures? And how do you get to your data?

’til next time
  Michael Mahlberg


[1]

One can either put the effort in after the test or in the setup of the test, or split the effort between the two places, but the effort to make sure that the system is in the correct state always has to go into the setup. Cleaning up after the test can help a lot in terms of performance and ramp-up time, but it cannot serve as a substitute for a thorough setup.