Cracking five hard problems in load testing

Load testing (or performance testing) is the practice of testing systems for adequate performance under load. Load testing used to be an arduous and expensive activity, typically done at the last possible moment in the software development process. That was before cloud infrastructure became readily available - since then, it has become much easier to generate enough load and to spin up realistically sized test environments without breaking the bank. Cloud-based load testing tools make it easy to simulate traffic on web applications and to capture and analyze load test results.

Still, load testing is not easy. Hard problems remain, and they have little tool support:

1. Realism (what to test?)

Recording or writing a test scenario is easy. Making sure that the scenario matches real user behavior is hard - without knowing how realistic a scenario is, load test results are not worth much. To measure realism, the behavior of simulated users needs to be compared to the behavior of real users. That's not as easy as it sounds (more about this in a separate post). I haven't seen any tools that can solve this problem - have you?
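
As an illustration of what such a comparison could look like, here is a minimal sketch in Python, using made-up traffic data and a hypothetical realism_gap metric: it compares the endpoint mix produced by a test scenario against the mix seen in production access logs. It captures only one dimension of realism (the request mix), but it yields a number that can be tracked over time.

    from collections import Counter

    def endpoint_mix(requests):
        """Relative frequency of each endpoint in a list of requested paths."""
        counts = Counter(requests)
        total = sum(counts.values())
        return {endpoint: count / total for endpoint, count in counts.items()}

    def realism_gap(production_requests, simulated_requests):
        """Total variation distance between the production and simulated
        endpoint mixes: 0.0 means identical mixes, 1.0 means no overlap."""
        prod = endpoint_mix(production_requests)
        sim = endpoint_mix(simulated_requests)
        endpoints = set(prod) | set(sim)
        return 0.5 * sum(abs(prod.get(e, 0.0) - sim.get(e, 0.0)) for e in endpoints)

    # Made-up example: endpoints from production access logs vs. the test scenario.
    production = ["/search"] * 70 + ["/product"] * 25 + ["/checkout"] * 5
    simulated = ["/search"] * 40 + ["/product"] * 40 + ["/checkout"] * 20
    print(f"realism gap: {realism_gap(production, simulated):.2f}")  # prints 0.30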

2. Environments (where to test?)

The goal in load testing is to learn how the system will behave in production. Strictly speaking, no non-production environment is a good target for such tests, since no environment can be 100% identical to production. But testing in production creates new problems:

  • managing the risk of crashing the environment (through careful scheduling, monitoring and emergency plans)
  • modifying real data, triggering real-world transactions and interfering with reporting (by designing systems to identify and handle test traffic)
  • distinguishing between simulated traffic and real traffic when checking test results (by measuring both types of traffic separately - see the sketch below)
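
One common mitigation for the last two points is to tag simulated traffic explicitly, so that both the application and the monitoring stack can tell it apart from real traffic. Below is a minimal sketch in Python, assuming a hypothetical X-Load-Test header and purely illustrative db/metrics interfaces.

    TEST_TRAFFIC_HEADER = "X-Load-Test"   # hypothetical header set by the load generator

    def is_test_traffic(headers: dict) -> bool:
        """True if the request was generated by a load test."""
        return headers.get(TEST_TRAFFIC_HEADER, "").lower() == "true"

    def handle_order(headers: dict, order: dict, db, metrics):
        """Route test orders to a shadow table and tag their metrics, so that
        business reporting and production dashboards stay clean.
        `db` and `metrics` are illustrative interfaces, not a specific library."""
        if is_test_traffic(headers):
            db.insert("orders_loadtest", order)   # keep test data out of the real tables
            metrics.increment("orders.created", tags={"traffic": "simulated"})
        else:
            db.insert("orders", order)
            metrics.increment("orders.created", tags={"traffic": "real"})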

3. Test data (which data to use?)

With the problem of environments comes the problem of test data. Load test scenarios need realistic test data, but data changes frequently and gets stale quickly. The problem, therefore, is how to keep the test data realistic and up-to-date. Of course, managing test data is also challenging in functional testing. But in functional testing, you can often get away with keeping a set of reliable test data in your test environment. When testing in production, keeping real data and test data in the same system creates some interesting problems.
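
One pragmatic approach when test data has to live next to real data is to make test records self-describing, so they can be filtered out of reports and cleaned up after a test run. The sketch below assumes a hypothetical naming convention and flag; it is an illustration, not a complete test data strategy.

    import uuid

    TEST_MARKER = "loadtest-"   # hypothetical naming convention for load test records

    def make_test_account():
        """Create an account that is clearly recognizable as test data, so it can be
        excluded from reporting and cleaned up after the test run."""
        run_id = uuid.uuid4().hex[:8]
        return {
            "username": f"{TEST_MARKER}{run_id}",
            "email": f"{TEST_MARKER}{run_id}@example.test",   # .test is a reserved TLD
            "created_by_load_test": True,                     # explicit flag for reporting filters
        }

    def exclude_test_accounts(accounts):
        """Keep only real accounts, e.g. when building business reports."""
        return [a for a in accounts if not a.get("created_by_load_test")]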

4. Dealing with results (when is performance good enough?)

Load testing is supposed to answer the question "is performance good enough?" Answering that question is harder than it sounds, even after running load tests: there can be a large amount of number-crunching involved in checking whether performance objectives are met, and not everyone is interested in statistics. A typical performance test report doesn't give a binary answer to "is it good enough?"; instead, it surfaces a large number of small issues, prompting follow-up questions: "do we need to be concerned about this degradation? did an infrastructure issue cause this spike? …"
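
To make the number-crunching concrete, here is a small sketch with made-up measurements for two hypothetical endpoints: it reduces raw response times to the usual summary figures (mean, p50, p95, p99, max), which still leave the interpretation wide open - is the 2,500 ms outlier on /search a problem, or a fluke?

    import statistics

    def summarize(samples_ms):
        """Crunch raw response times (in ms) into the figures a report typically shows."""
        ordered = sorted(samples_ms)
        def pct(p):
            return ordered[min(len(ordered) - 1, int(p / 100 * len(ordered)))]
        return {
            "count": len(ordered),
            "mean": round(statistics.mean(ordered), 1),
            "p50": pct(50),
            "p95": pct(95),
            "p99": pct(99),
            "max": ordered[-1],
        }

    # Made-up measurements for two endpoints of the system under test.
    results = {
        "/search":   [110, 130, 150, 160, 180, 210, 260, 340, 900, 2500],
        "/checkout": [300, 320, 350, 380, 420, 450, 500, 620, 700, 800],
    }
    for endpoint, samples in results.items():
        print(endpoint, summarize(samples))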

5. Ownership (who does the testing?)

In a monolithic development organization, load testing can be handled by a central QA team. That approach yields consistent test results, but has the drawback that test scenarios get stale and break as the underlying system keeps changing. In a microservices world, small teams can take more ownership of their components' performance and load testing. It's easier to keep code and tests in sync that way, but coordinating and consolidating load test results gets much harder, as multiple teams need to work together to test the whole system.

Cracking the problem

Providing an answer to the problem of ownership is the key to solving the others - it breaks the problem of load testing the whole system down into smaller parts. By organizing load testing as a bottom-up activity across teams, a small set of very difficult problems can be turned into a larger set of easier problems that each team can solve as needed. Tool support for each sub-problem then sounds much more achievable:

  • building modular test scenarios for individual components, instead of the whole system
  • automating quality gates for performance in the CI/CD pipeline
  • spinning up test instances of services on demand, and injecting or retrieving test data
  • evaluating test results based on per-endpoint SLIs (service level indicators) and giving binary answers to the question "is it good enough?" based on service level objectives (see the sketch after this list)
  • orchestrating testing activities between teams, and managing dependencies
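
As an illustration of the quality gate and SLO bullets above, here is a minimal sketch of a CI step that turns per-endpoint SLIs into a single pass/fail decision. The endpoints, SLO values and result format are all hypothetical; in a real pipeline, the measured values would come from the load testing tool's result export.

    import sys

    # Hypothetical per-endpoint SLOs: p95 latency limit and maximum error rate.
    SLOS = {
        "/search":   {"p95_ms": 300, "max_error_rate": 0.01},
        "/checkout": {"p95_ms": 800, "max_error_rate": 0.001},
    }

    def check_slos(measured):
        """Compare measured per-endpoint SLIs against their SLOs.
        `measured` maps endpoint -> {"p95_ms": ..., "error_rate": ...}."""
        violations = []
        for endpoint, slo in SLOS.items():
            sli = measured.get(endpoint)
            if sli is None:
                violations.append(f"{endpoint}: no measurements")
                continue
            if sli["p95_ms"] > slo["p95_ms"]:
                violations.append(f"{endpoint}: p95 {sli['p95_ms']} ms > {slo['p95_ms']} ms")
            if sli["error_rate"] > slo["max_error_rate"]:
                violations.append(f"{endpoint}: error rate {sli['error_rate']:.2%} > {slo['max_error_rate']:.2%}")
        return violations

    if __name__ == "__main__":
        # Made-up results; a CI job would load these from the tool's output.
        measured = {
            "/search":   {"p95_ms": 280, "error_rate": 0.002},
            "/checkout": {"p95_ms": 950, "error_rate": 0.0},
        }
        violations = check_slos(measured)
        for v in violations:
            print("SLO violation:", v)
        print("good enough:", not violations)
        sys.exit(1 if violations else 0)   # a non-zero exit code fails the pipeline stage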

The remaining hard problems are to coordinate those testing activities across teams, and to ensure that the tests faithfully simulate real user behavior. Tool vendors, here is your opportunity to streamline the testing process!