Test Data Virtualization


Create, Provide and Manage Custom Test Sets

Multiple Goals


The creation and management of safe, intelligent test data remains a vexing part of QA and development cycles. Manual and off-the-shelf approaches have proven time-consuming and costly, and their incomplete solutions have resulted in inadequate testing and missed SLA/delivery dates. Falling back on the latest, still-unmasked production data is simply unsafe.


By leveraging the long-proven SortCL program and the graphical facilities of Eclipse (IRI Workbench) that comprise the IRI Voracity data management platform, you can address multiple, complex test data requirements. Beyond those discussed in the other tabbed pages of this section involving realistic file or schema targets, these requirements may involve more customized, virtual test sets.


One of the lesser-known but inherent benefits of Voracity as a data integration and governance platform is its ability to combine static and streaming ETL with simultaneous data masking, data synthesis, data transformation, and custom formatting. These features, also available standalone in IRI FieldShield, RowGen, and CoSort, respectively, are how Voracity enables both ad hoc and automated capture, manipulation, and provisioning of virtual (ad hoc) and persistent test sets: test sets that reflect production data precisely without compromising any of its confidentiality or affecting any live systems.
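To make the idea of masking and synthesizing in a single pass concrete, here is a minimal Python sketch. It is not SortCL, FieldShield, or RowGen syntax, and it does not represent how Voracity implements these functions. The field names, the `SECRET_KEY`, the `FIRST_NAMES` lookup list, and the helper functions are all hypothetical and exist only for illustration.

```python
import hashlib
import hmac
import random

SECRET_KEY = b"demo-key-not-for-production"  # hypothetical key, illustration only
FIRST_NAMES = ["Alex", "Blake", "Casey", "Devon", "Emery"]  # hypothetical lookup set


def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically replace each ASCII digit, keeping separators and length."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    out = []
    used = 0
    for ch in value:
        if ch in "0123456789":
            # Derive the replacement digit from the keyed hash of the whole value.
            out.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            out.append(ch)
    return "".join(out)


def build_test_row(prod_row: dict, rng: random.Random) -> dict:
    """Mask one field, synthesize another, and reformat a third in one pass."""
    return {
        "ssn": mask_digits(prod_row["ssn"]),                # masking
        "first_name": rng.choice(FIRST_NAMES),              # synthesis
        "balance": f"{float(prod_row['balance']):.2f}",     # custom formatting
    }


def build_test_set(prod_rows, seed=42):
    """Build a repeatable test set from production-shaped rows."""
    rng = random.Random(seed)
    return [build_test_row(row, rng) for row in prod_rows]
```

Because the masking is keyed and deterministic, the same source value always maps to the same masked value, which helps keep joins between test tables consistent. The synthesized name never depends on the original.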

Multiple Options

First, consider the business rules driving your need for an ad hoc solution. IRI provides advice on weighing them in a series of test data management articles, and several facilities to help you discover the data you have to work with in sources ranging from files and databases to dark data documents.

Your test targets may need a combination of data masking and synthesis like this. Or, you may want to mask and thus produce realistic test data while:

Once you have decided on techniques, you can also choose how to design the job(s), how to modify and/or share them, and how and where to run them. Voracity supports multiple job design and runtime methods; see the IRI Workbench section on this page. And for every generation process, multiple differently formatted persistent and virtual targets can be defined simultaneously. Such efficiency and flexibility are especially valuable to DevOps teams who need to work in parallel.
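The idea of feeding several differently formatted targets from one generation pass can be sketched in plain Python. This is only an illustration under stated assumptions, not Voracity job syntax: `generate_rows` and `write_targets` are hypothetical helpers, the CSV target stands in for a persistent file, and the JSON Lines target stands in for a virtual (in-memory or piped) one.

```python
import csv
import io
import json


def generate_rows(n):
    """Hypothetical generator standing in for a masking or synthesis job."""
    for i in range(1, n + 1):
        yield {"id": i, "name": f"tester{i:03d}"}


def write_targets(rows, csv_target, json_target):
    """Send each generated row to two differently formatted targets in one pass."""
    writer = csv.DictWriter(csv_target, fieldnames=["id", "name"], lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)                         # delimited target
        json_target.write(json.dumps(row) + "\n")    # JSON Lines target
        count += 1
    return count


if __name__ == "__main__":
    csv_buf = io.StringIO()    # would be an open file for a persistent target
    json_buf = io.StringIO()   # would be a pipe or socket for a virtual target
    written = write_targets(generate_rows(3), csv_buf, json_buf)
    print(written, "rows written to each target")
```

The source rows are read only once, so every target sees the same data without the job being rerun per format.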

Unlike other virtual TDM solutions, with IRI you do not need to clone databases, set up a virtual TDM appliance, or do anything that complex (or expensive). Test data engineers can serve up as many persistent or virtual copies as they need, and immediately populate their testers' repositories as the test data is generated. However, if you do want a fully masked or synthetic database clone, IRI FieldShield and RowGen jobs can be run as scripts called simultaneously from Actifio, Commvault, and Windocks (virtualized container image) operations!

IRI subsetting, masking, and synthesis jobs for structured data are also supported in the Value Labs Test Data Hub web application, which produces data sets on demand into file, DB, and API targets. For TDM involving semi-structured data (e.g., HL7, JSON, and XML) and unstructured text or file data (e.g., PDF, MS Office, and images), you can make application or web services calls to the DarkShield API, which supports the same masking functions and an extended set of search methods to find and de-identify production data for test targets.

Finally, the governance of test data can be just as important as the governance of your production data. In addition to the inherent data security governance in Voracity's many static data masking functions, multiple data quality features allow you to validate and stabilize the collections. Workflow diagrams and automated batch file generation support graphical design of independent and dependent work chains. Multiple data and metadata lineage options are also supported so that you can track the changes to source data and your test data projects.