VLDB & HDFS Data Protection

Big Data Masking & Test Data Generation

IRI customers can discover, classify, and protect the sensitive and personally identifiable information (PII) in their big data environments with affordable software solutions front-ended in Eclipse.


IRI's data-centric security solutions support multiple legacy and modern data sources with multiple masking and encryption functions that run identically in your local file system (with the power of CoSort) and in Hadoop clusters for unlimited scalability.


Mask existing production data, or generate safe test data from scratch, using a powerful Eclipse GUI that includes or supports:

  • IRI FieldShield - for data masking before, during, and after big data integration, federation, MDM, BI, etc.
  • IRI RowGen - for smart, big test data generation
  • IRI Voracity - the total data management platform supporting both of the above, along with big data packaging and provisioning
  • IRI Chakra Max - to granularly and rapidly firewall high-traffic, very large database (VLDB) environments with database activity monitoring (DAM) and audit and protection (DAP) technology

Big Data Masking

Target each sensitive item with a data protection function chosen from 12 categories, according to your business and data privacy rules.

For example, choose format-preserving encryption or tokenization for credit card values, pseudonymization for names, randomization for ages, redaction for formulas, and character masking on national ID values.
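Two of the techniques above, character masking and pseudonymization, can be illustrated in a few lines. This is a simplified sketch of the concepts, not IRI FieldShield's actual functions or API; the helper names and the pseudonym pool are hypothetical.

```python
import hashlib

def char_mask(value: str, keep_last: int = 4, mask_char: str = "*") -> str:
    """Character masking: replace all but the last `keep_last`
    characters, preserving the original length and position."""
    if len(value) <= keep_last:
        return value
    return mask_char * (len(value) - keep_last) + value[-keep_last:]

def pseudonymize(name: str, pool: list) -> str:
    """Pseudonymization: deterministically map a real name to a
    replacement from a pool, so the same input always yields the
    same pseudonym (preserving referential consistency)."""
    digest = int(hashlib.sha256(name.encode()).hexdigest(), 16)
    return pool[digest % len(pool)]

print(char_mask("078-05-1120"))  # masks a national ID, keeps last 4: *******1120
print(pseudonymize("Alice", ["Pat", "Sam", "Lee"]))
```

Note that the pseudonym mapping is repeatable across runs, which matters when the same person appears in multiple tables or files that must still join after masking.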

For more information, see:

Solutions > Data Masking

What's "big data" about these?

The ability to profile (discover) and protect (mask) PII directly in newer Hadoop (Hive, etc.), NoSQL, and cloud/SaaS platform sources, as well as in massive structured data volumes (flat files and DB tables). Plus, the ability to search, extract, and structure PII values in unstructured data sources through the new data restructuring wizard in the IRI Workbench GUI, built on Eclipse™. IRI data masking jobs can leverage either the multi-threaded CoSort engine in traditional file systems, or the multi-node Hadoop engines (MapReduce 2, Spark, Spark Stream, Storm, and Tez) in HDFS.

Big Test Data

Generate and populate massive volumes of safe, realistic test data in file, table, and report targets.

Use production metadata - but not production data - to build structurally and referentially correct volumes that conform to the appearances, value ranges, frequency distributions, and layouts of real-world VLDB, EDW, and HDFS environments.
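The metadata-driven approach above can be sketched as follows: rows are built from a layout definition (field names, types, and value ranges) rather than copied from production data. This is an illustrative stand-in for what a tool like IRI RowGen automates; the field names, ranges, and value pools here are hypothetical, not drawn from any real schema.

```python
import random

# Hypothetical layout metadata: field name plus a generator rule.
LAYOUT = [
    ("cust_id", lambda r: r.randint(100000, 999999)),
    ("age",     lambda r: r.randint(18, 90)),
    ("state",   lambda r: r.choice(["FL", "NY", "TX", "CA"])),
    ("balance", lambda r: round(r.uniform(0, 25000), 2)),
]

def generate_rows(n: int, seed: int = 42):
    """Yield n structurally correct rows containing no production data.
    A seeded RNG makes the test set reproducible across runs."""
    rng = random.Random(seed)
    for _ in range(n):
        yield {name: rule(rng) for name, rule in LAYOUT}

for row in generate_rows(3):
    print(row)
```

Because only the layout and value ranges come from production metadata, the output conforms to real-world formats and distributions while remaining safe to share with developers and testers.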

For more information, see:

Solutions > Test Data