Practical Tips for ETL Testing
By: Kaushal Amin
The main component (brain and heart) of the data warehouse system is data Extraction, Transformation, and Loading (ETL). The challenge is to extract the data, often from a variety of disparate systems, to transform that data, so that it is uniform in terms of format and content, and finally to load the data into a warehouse where it can serve as the basis for future business intelligence needs.
It's very important that the integrity of the data is maintained. It must be stored clearly and concisely without loss, and it must be accessible for analysts. In some industries, such as healthcare, there will be strict governance rules and potentially severe consequences for error. If data warehouses are to deliver value, they require careful ETL testing to ensure that processes are working as intended.
Unique challenges of ETL testing
Testing the data warehouse and verifying that ETL processes are working correctly is very different to traditional application testing. Here are some of the main barriers to overcome:
There is no user interface – In traditional application testing, testers can validate functionality via the input and output of data through the user interface. The user interface is expected to behave a certain way based on data input and actions taken. In data warehouse testing, instead of a user interface that a tester can generally grasp quite quickly, you have data and its relationships. The key testing skills needed here are the ability to look at data, validate data processing rules, and analyze final data output. Traditional manual testing skills are not enough; knowledge of a database query language, such as SQL, is essential for testers to do this accurately.
The volume of data is huge – We could be looking at millions of transactions per day. Testers need to verify that the extraction, transformation, and loading of that data is working as designed, in real time, as code is updated.
Data is from multiple sources – This is not a single system; it could be a long list of different systems feeding daily transactional data into the warehouse. Some of the data may even come from systems running in the cloud or hosted by a third party. The systems and their data won't necessarily agree with each other on format and content. How do you collate that data and make sure everything gets processed consistently and linked together correctly?
Incomplete data and exceptions – The information collected by the various source systems may not be complete and is in many cases full of exceptions. You may find that some systems collect more data than others. You may find the data is incomplete or imperfect.
Rules are not static – The source systems can change over time due to release upgrades or switching to another third-party vendor's solution. You must be able to cope with these changes without having to redesign the data warehouse.
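The incomplete-data challenge in the list above is one that a tester can probe directly with SQL. The following is a minimal sketch, not a definitive implementation: it uses Python's built-in sqlite3 module as a stand-in for a real source or staging database, and the table name patient_feed, its columns, and the helper count_missing_values are all hypothetical names chosen for illustration.

```python
import sqlite3


def count_missing_values(conn, table, required_columns):
    """Return {column: number of rows where the column is NULL or blank}."""
    missing = {}
    for column in required_columns:
        # Identifiers cannot be bound as query parameters, so this sketch
        # assumes `table` and `column` come from trusted test configuration.
        query = (
            f'SELECT COUNT(*) FROM "{table}" '
            f'WHERE "{column}" IS NULL OR TRIM("{column}") = \'\''
        )
        missing[column] = conn.execute(query).fetchone()[0]
    return missing


# Hypothetical feed from one source system, loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_feed (patient_id TEXT, birth_date TEXT, postcode TEXT)"
)
conn.executemany(
    "INSERT INTO patient_feed VALUES (?, ?, ?)",
    [
        ("P1", "1980-02-11", "AB1 2CD"),
        ("P2", None, "EF3 4GH"),      # missing birth date
        ("P3", "1975-07-30", "   "),  # blank postcode
        ("P4", None, None),           # missing both
    ],
)

result = count_missing_values(
    conn, "patient_feed", ["patient_id", "birth_date", "postcode"]
)
print(result)
```

Running the same check against each source system gives a quick picture of which feeds are incomplete before any transformation rules are applied.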
High level approaches for ETL testing
The entire process is about taking data from a source file, cleaning it up in a temporary staging area, and then pushing it into the data warehouse. It's vital to ensure that no data goes missing or gets incorrectly transformed. There are two high level approaches to ETL data validation that can be considered:
Approach I: Validate the data from the data sources directly in the data warehouse. This approach validates that all the data in the source appears in the data warehouse according to the business rules. This approach does not validate the intermediate staging area and transformation processes in between the source and data warehouse.
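As a rough illustration of Approach I, the sketch below compares a source table directly with a warehouse table and ignores everything in between. It is only a sketch under stated assumptions: sqlite3 stands in for both systems, the tables source_orders and dw_orders are hypothetical, and the business rule (the warehouse stores the source amount in cents as dollars) is invented for the example.

```python
import sqlite3


def reconcile(conn):
    """Compare source rows with warehouse rows, skipping the staging area."""
    # Source rows that never arrived in the warehouse.
    missing = [row[0] for row in conn.execute(
        "SELECT s.order_id FROM source_orders s "
        "LEFT JOIN dw_orders d ON d.order_id = s.order_id "
        "WHERE d.order_id IS NULL ORDER BY s.order_id"
    )]
    # Rows present in both that break the assumed business rule:
    # warehouse amount in dollars = source amount in cents / 100.
    mismatched = [row[0] for row in conn.execute(
        "SELECT s.order_id FROM source_orders s "
        "JOIN dw_orders d ON d.order_id = s.order_id "
        "WHERE ABS(d.amount_dollars - s.amount_cents / 100.0) > 0.005 "
        "ORDER BY s.order_id"
    )]
    return missing, mismatched


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source_orders (order_id INTEGER, amount_cents INTEGER)")
conn.execute("CREATE TABLE dw_orders (order_id INTEGER, amount_dollars REAL)")
conn.executemany(
    "INSERT INTO source_orders VALUES (?, ?)",
    [(1, 1250), (2, 999), (3, 5000), (4, 100)],
)
conn.executemany(
    "INSERT INTO dw_orders VALUES (?, ?)",
    [(1, 12.50), (2, 9.90), (3, 50.00)],  # order 2 is wrong, order 4 is absent
)

missing, mismatched = reconcile(conn)
print("missing from warehouse:", missing)
print("amount mismatches:", mismatched)
```

A check like this reports that something is wrong in the warehouse, but not which intermediate step caused it, which is the trade-off of this approach.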
Approach II: Validate the data from the data sources through each step of the extract and transformation process, including the final load into the data warehouse. Validate the data at each transformation. For example, in stage one you would take the data from the file and verify it's all there; in stage two, you would remove any junk information that you don't want to process; in stage three, you would translate the data to clean it up and make it uniform, and so on.
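The three stages described for Approach II can be sketched as a small pipeline with a check after each step. This is a minimal illustration only: the file layout (customer id, name, date), the junk rules (blank lines and lines starting with "#"), the two accepted date formats, and all function names are assumptions made for the example.

```python
from datetime import datetime


def extract(lines):
    """Stage one: take every line from the source file as-is."""
    return list(lines)


def remove_junk(lines):
    """Stage two: drop blank lines and comment lines."""
    return [ln for ln in lines if ln.strip() and not ln.lstrip().startswith("#")]


def to_iso(raw_date):
    """Convert either assumed source date format to ISO 8601."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(raw_date, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {raw_date!r}")


def transform(lines):
    """Stage three: make names and dates uniform."""
    records = []
    for line in lines:
        customer_id, name, raw_date = [field.strip() for field in line.split(",")]
        records.append(
            {"customer_id": int(customer_id), "name": name.title(), "date": to_iso(raw_date)}
        )
    return records


def run_with_checks(raw_lines, expected_line_count):
    stage1 = extract(raw_lines)
    assert len(stage1) == expected_line_count, "stage 1: lines lost during extract"

    stage2 = remove_junk(stage1)
    assert all(len(ln.split(",")) == 3 for ln in stage2), "stage 2: malformed line kept"

    stage3 = transform(stage2)
    assert len(stage3) == len(stage2), "stage 3: record count changed"
    ids = [rec["customer_id"] for rec in stage3]
    assert len(ids) == len(set(ids)), "stage 3: duplicate customer ids"
    return stage3


raw_lines = [
    "1001, alice ,2023-01-05",
    "",
    "# exported by source system",
    "1002,BOB,05/01/2023",
    "1003, Carol ,2023-01-07",
]
records = run_with_checks(raw_lines, expected_line_count=5)
for record in records:
    print(record)
```

Because each stage has its own check, a failure points at the step that introduced the problem rather than only at the final result.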