Big Data Protection

Mask & Test in VLDBs, HDFS, Dark Data

Protecting Big Data Assets


You can now discover, classify, and protect personally identifiable information (PII) and other sensitive data in big data environments with affordable IRI software front-ended in Eclipse.


IRI's proven data-centric security solutions support multiple legacy and modern data sources, and provide multiple masking and encryption functions that run identically on your local file system (with the power of CoSort) and in Hadoop clusters for unlimited scalability.


Mask existing production data or generate safe test data from scratch using a powerful Eclipse™ IDE that includes or supports:

  • IRI DarkShield - to discover, deliver, and delete PII in unstructured text files, PDFs, and other dark data repository formats
  • IRI FieldShield - to find and mask PII in large flat or JSON files, or very large databases, before, during, or after ETL, analytics, and similar operations
  • IRI Voracity - the total data management platform supporting both of the above, along with big data packaging and provisioning
  • IRI RowGen - for smart, big test data generation and virtualization

What's "Big Data" About These?

These tools can profile (discover) and protect (mask) PII directly in newer Hadoop Hive, S3, NoSQL, and cloud/SaaS platform sources, as well as in massive structured, semi-structured, and certain unstructured data volumes. Multiple data discovery and profiling wizards in the IRI Workbench IDE for Voracity, built on Eclipse™, allow you to find, classify, extract, and redact PII in structured and unstructured sources. IRI data masking jobs leverage proven redaction and big data processing engines on multi-core servers or in multi-node Hadoop environments.
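
To make the find-and-redact idea concrete, here is a minimal Python sketch of pattern-based PII discovery in unstructured text. It is not IRI code and does not use any IRI API; the pattern set, the `find_pii` and `redact` helpers, and the sample string are all hypothetical, and production-grade discovery would need many more data classes and validation steps.

```python
import re

# Hypothetical patterns for illustration only; real classifiers need far more care.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text):
    """Return (label, start, end, value) tuples sorted by start position."""
    hits = []
    for label, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


def redact(text, hits, mask_char="X"):
    """Replace each hit with a same-length run of mask_char."""
    out = []
    pos = 0
    for _label, start, end, _value in hits:
        if start < pos:
            continue  # skip a hit that overlaps one already redacted
        out.append(text[pos:start])
        out.append(mask_char * (end - start))
        pos = end
    out.append(text[pos:])
    return "".join(out)


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
hits = find_pii(sample)
redacted = redact(sample, hits)
print([(label, value) for label, _s, _e, value in hits])
print(redacted)
```

Separating discovery (`find_pii`) from protection (`redact`) mirrors the profile-then-mask split described above: the same list of hits could feed a report, a classification step, or a different masking function.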

Big Data Masking

Target each sensitive data item with a protection function from 12 categories, according to your business and data privacy rules.

For example, choose format-preserving encryption or tokenization for credit card values, pseudonymization for names, randomization for ages, redaction for formulas, and character masking on national ID values.
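
The per-field approach in this example can be sketched in Python as a simple rule table. This is an illustration under stated assumptions, not how FieldShield implements or names its functions: the key, the substitute name list, the field names, and every helper below are hypothetical. Format-preserving encryption and tokenization are deliberately left out because they call for a vetted cryptographic library or token vault rather than a few lines of ad hoc code.

```python
import hashlib
import hmac
import random

SECRET_KEY = b"demo-key-change-me"  # hypothetical; manage real keys securely
FAKE_NAMES = ["Alex Morgan", "Sam Rivera", "Jordan Lee", "Casey Kim"]  # hypothetical set


def pseudonymize_name(name):
    """Deterministically map a real name to a substitute from a fixed list."""
    digest = hmac.new(SECRET_KEY, name.encode("utf-8"), hashlib.sha256).digest()
    return FAKE_NAMES[int.from_bytes(digest[:4], "big") % len(FAKE_NAMES)]


def mask_national_id(value, visible=4, mask_char="*"):
    """Mask all but the last `visible` alphanumeric characters, keeping separators."""
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - visible, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def randomize_age(age, spread=3, rng=random):
    """Replace an age with a random value within +/- spread years (never below 0)."""
    return max(0, age + rng.randint(-spread, spread))


def redact(_value):
    """Remove the value entirely."""
    return "[REDACTED]"


# One protection function per sensitive field, chosen per privacy rule.
RULES = {
    "name": pseudonymize_name,
    "national_id": mask_national_id,
    "age": randomize_age,
    "formula": redact,
}


def mask_record(record):
    """Apply the rule for each field; fields without a rule pass through unchanged."""
    return {key: RULES[key](val) if key in RULES else val for key, val in record.items()}


row = {"name": "Jane Doe", "national_id": "123-45-6789", "age": 41,
       "formula": "C8H10N4O2", "city": "Austin"}
masked = mask_record(row)
print(masked)
```

Because the pseudonym is derived from a keyed hash, the same input name always maps to the same substitute, which helps keep joins across masked tables consistent. A list this small produces collisions, so a real substitution set would need to be much larger.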

For more information, see:

Solutions > Data Masking

Big Test Data

Generate and populate massive volumes of safe, realistic test data in file, table, and report targets.

Use production metadata - but not production data - to build structurally and referentially correct volumes that conform to the appearances, value ranges, frequency distributions, and layouts of real-world DB, DWH, and Hadoop environments.
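
As a rough illustration of metadata-driven generation, the Python sketch below builds two related tables from column rules alone. It is not RowGen syntax; the table layouts, value ranges, tier weights, and the `generate_rows` helper are all assumptions made for the example.

```python
import random

rng = random.Random(42)  # seeded so repeated runs produce the same rows

# Hypothetical metadata: column rules only, no production values.
CUSTOMER_SPEC = {
    "customer_id": lambda i: 1000 + i,      # sequential primary key
    "age": lambda i: rng.randint(18, 90),   # value range
    "tier": lambda i: rng.choices(          # frequency distribution
        ["basic", "plus", "pro"], weights=[70, 25, 5])[0],
}


def generate_rows(spec, count):
    """Build `count` rows by calling each column's generator with the row index."""
    return [{col: make(i) for col, make in spec.items()} for i in range(count)]


customers = generate_rows(CUSTOMER_SPEC, 100)
customer_ids = [c["customer_id"] for c in customers]

ORDER_SPEC = {
    "order_id": lambda i: 50000 + i,
    "customer_id": lambda i: rng.choice(customer_ids),  # foreign key drawn from parent keys
    "amount": lambda i: round(rng.uniform(5.0, 500.0), 2),
}

orders = generate_rows(ORDER_SPEC, 500)
print(customers[0])
print(orders[0])
```

Drawing each order's `customer_id` from the generated parent keys is what keeps the two tables referentially correct, and the weighted choice for `tier` is one simple way to approximate a frequency distribution observed in production metadata.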

For more information, see:

Solutions > Test Data