Testing applications is a maturing discipline with tools that help quality assurance teams develop and automate functional tests, run load and performance tests, perform static code analysis, wrap APIs with unit tests, and validate applications against known security issues. Teams practicing devops can implement continuous testing by including all or a subset of their automated tests in their CI/CD pipelines and use the results to determine whether a build should be delivered to the target environment.
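As a hedged illustration of that gate, the sketch below runs a pytest suite and blocks promotion on any failure; the test directory and promotion step are assumptions, and a real pipeline would express this logic in its CI tool's own configuration.

```python
# Minimal sketch of a CI/CD quality gate: run the automated test suite
# and only promote the build if everything passes. The tests/ directory
# and the promotion step are hypothetical placeholders.
import subprocess
import sys

def main() -> None:
    # pytest exits nonzero when any test fails
    result = subprocess.run(["pytest", "tests/", "-q"])
    if result.returncode != 0:
        # A failing suite stops delivery to the target environment.
        sys.exit("Tests failed; build will not be promoted.")
    print("All tests passed; promoting build to the target environment.")

if __name__ == "__main__":
    main()
```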
But all these testing capabilities can easily overlook one set of tests that is critical for any application that processes or presents data, analytics, or data visualizations.
Is the data accurate, and are the analytics valid? Are the data visualizations showing results that make sense to subject matter experts? Furthermore, as teams enhance data pipelines and databases, how do they ensure that changes don't harm a downstream application or dashboard?
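One lightweight guard against that last risk is a regression test that recomputes a key downstream metric after a pipeline change and compares it to a stored baseline. Here is a minimal sketch, assuming a SQLite orders table, a monthly revenue metric, and a JSON baseline file; the names, schema, and drift tolerance are all illustrative.

```python
# Hedged sketch of a pipeline regression check: recompute a dashboard
# metric and compare it to a stored baseline. The table, columns, file,
# and tolerance are illustrative assumptions, not a prescribed schema.
import json
import sqlite3

TOLERANCE = 0.01  # flag anything that drifts more than 1%

def monthly_revenue(conn: sqlite3.Connection) -> dict:
    """Aggregate the metric a downstream dashboard depends on."""
    rows = conn.execute(
        "SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) "
        "FROM orders GROUP BY month"
    )
    return {month: total for month, total in rows}

def check_against_baseline(conn, baseline_path="baseline_revenue.json"):
    """Fail loudly if the recomputed metric drifts from the baseline."""
    with open(baseline_path) as f:
        baseline = json.load(f)
    current = monthly_revenue(conn)
    for month, expected in baseline.items():
        actual = current.get(month, 0.0)
        drift = abs(actual - expected) / max(abs(expected), 1e-9)
        assert drift <= TOLERANCE, (
            f"{month}: expected {expected}, got {actual} ({drift:.1%} drift)"
        )
```

Run as part of the pipeline's test suite, a check like this turns "did we break the dashboard?" from a stakeholder complaint into a failed build.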
In my experience developing data- and analytics-rich applications, this type of testing and validation is often an afterthought compared to unit, functional, performance, and security testing. It's also a harder set of tests to implement, for several reasons:
- Validating data and analytics is hard for developers, testers, and data scientists, who usually are not the subject matter experts, especially on how dashboards and applications are used to develop insights or drive decision-making.
- Data by itself is imperfect, with known and often unknown data-quality issues.
- Capturing validation rules isn't trivial, because there are often common rules that apply to most of the data, followed by separate rules for different types of outliers. Capturing and coding these rules can be a difficult, complex proposition for applications and data visualizations that process large volumes of complex data sets (see the sketch after this list).
- Active data-driven organizations are loading new data sets and evolving data pipelines to improve analytics and decision-making.
- Data-processing systems are often complex, with different tools for integrating, managing, processing, modeling, and delivering results.
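To make the rules problem concrete, here is a minimal Python sketch of layered validation, assuming records arrive as dictionaries; the field names, regions, and the refund exception are all illustrative assumptions.

```python
# Minimal sketch of layered validation rules: common rules for most
# records, plus exception rules for known outlier types. Field names
# and thresholds are illustrative, not a real schema.
from typing import Callable

Rule = Callable[[dict], bool]

# Common rules that apply to most of the data.
common_rules: list[Rule] = [
    lambda r: r.get("amount") is not None and r["amount"] >= 0,
    lambda r: r.get("region") in {"NA", "EMEA", "APAC"},
]

# Exception rules for known outlier types, keyed by record type.
outlier_rules: dict[str, list[Rule]] = {
    # Refunds legitimately carry negative amounts.
    "refund": [lambda r: r.get("amount", 0) < 0],
}

def validate(record: dict) -> list:
    """Return a list of rule violations for one record."""
    rules = outlier_rules.get(record.get("type", ""), common_rules)
    return [f"rule {i} failed" for i, rule in enumerate(rules) if not rule(record)]
```

Even this toy example shows the tension: the refund exception directly contradicts the common non-negative rule, which is one reason a single flat rule set rarely survives contact with real data.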
The first time a team presents bad data or invalid analytics to stakeholders is usually its wake-up call that new practices and tools may be needed to test, diagnose, and resolve these data issues proactively.