Data Discovery in Eclipse
Profile Database, Flat-File, and Dark Data Sources
Within its single pane of glass built on Eclipse™, IRI Voracity® offers multiple data discovery and metadata definition tools for searching and organizing your data sources on local and remote systems:
Data Classification
Define enterprise-wide data class libraries, automatically search your sources and catalog the data in them, and then apply transformation and protection rules that you matched to your classes.
Metadata Discovery
Connect to structured and semi-structured files and relational databases. Define or re-define column names, offsets, and data types so you can save, share, and re-use the metadata for your data sources in central data definition files (DDFs) that are compatible with every IRI software application.
Database Profiling
Compile statistics, check referential integrity, and search for lookup, string-, pattern-, and fuzzy-matching values in any JDBC-connected data source.
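As a rough illustration of what column statistics and a referential integrity check involve, here is a minimal Python sketch. It is not IRI's implementation: an in-memory SQLite database stands in for a JDBC-connected source, and the table names, column names, and the helpers `profile_column` and `orphaned_keys` are hypothetical.

```python
import sqlite3

def profile_column(conn, table, column):
    """Return basic statistics for one column (identifiers are trusted here)."""
    row = conn.execute(
        f'SELECT COUNT(*), COUNT("{column}"), COUNT(DISTINCT "{column}"), '
        f'MIN("{column}"), MAX("{column}") FROM "{table}"'
    ).fetchone()
    total, non_null, distinct, lowest, highest = row
    return {"rows": total, "nulls": total - non_null,
            "distinct": distinct, "min": lowest, "max": highest}

def orphaned_keys(conn, child, fk, parent, pk):
    """Return child foreign-key values that have no matching parent row."""
    sql = (f'SELECT DISTINCT c."{fk}" FROM "{child}" c '
           f'LEFT JOIN "{parent}" p ON c."{fk}" = p."{pk}" '
           f'WHERE c."{fk}" IS NOT NULL AND p."{pk}" IS NULL '
           f'ORDER BY c."{fk}"')
    return [r[0] for r in conn.execute(sql)]

# Hypothetical sample data: order 12 points at a customer that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 9, 5.5), (13, NULL, 7.0);
""")

stats = profile_column(conn, "orders", "customer_id")
orphans = orphaned_keys(conn, "orders", "customer_id", "customers", "id")
print(stats)
print(orphans)
```

The left join keeps every child row, so any row whose parent side comes back null is an orphan; null foreign keys are excluded because they do not violate referential integrity.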
Flat-File Profiling
Compile statistics, and search for lookup, string-, pattern-, and fuzzy-matching values in any sequential file format that IRI supports.
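To make the idea concrete, the following Python sketch profiles a small delimited file and runs a fuzzy search over one of its columns. It is only an approximation of the concept, not the Workbench wizard: the sample data, the helper names `profile_columns` and `fuzzy_matches`, and the 0.85 similarity cutoff are all assumptions chosen for illustration.

```python
import csv
import io
from difflib import SequenceMatcher

# Hypothetical flat file with a header row.
SAMPLE = (
    "id,name,city\n"
    "1,John Smith,Orlando\n"
    "2,Jon Smith,Orlando\n"
    "3,Maria Lopez,\n"
    "4,John Smith,Tampa\n"
)

def profile_columns(handle):
    """Count rows, blanks, distinct values, and the longest value per column."""
    stats = {}
    for row in csv.DictReader(handle):
        for name, value in row.items():
            s = stats.setdefault(
                name, {"rows": 0, "blank": 0, "distinct": set(), "max_len": 0})
            value = (value or "").strip()
            s["rows"] += 1
            if not value:
                s["blank"] += 1
            else:
                s["distinct"].add(value)
                s["max_len"] = max(s["max_len"], len(value))
    # Replace each set of distinct values with its size for reporting.
    return {n: {**s, "distinct": len(s["distinct"])} for n, s in stats.items()}

def fuzzy_matches(values, target, cutoff=0.85):
    """Return the distinct values whose similarity to target meets the cutoff."""
    return sorted({
        v for v in values
        if v and SequenceMatcher(None, v.lower(), target.lower()).ratio() >= cutoff
    })

profile = profile_columns(io.StringIO(SAMPLE))
names = [row["name"] for row in csv.DictReader(io.StringIO(SAMPLE))]
similar = fuzzy_matches(names, "John Smith")
print(profile["city"])
print(similar)
```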
ER Diagramming
Display the tables in a database schema, along with their keys and the relationships between them, in an entity-relationship (ER) diagram.
Directory Data Class Search
The Directory Data Class Search wizard in IRI Workbench (WB) matches data in structured files within one or more directories to configured data classes. The search process compares the matchers in each data class with the data in those files to determine the best match, if any. A matcher can be either a pattern or a set file lookup. If only a few selected structured files need to be searched, use the Data Class Library editor for faster results.
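The matching step described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the wizard's actual algorithm: the data classes, their regular expressions, the lookup set (standing in for a set file), the 0.75 threshold, and the helpers `match_ratio` and `best_class` are all hypothetical.

```python
import re

# Hypothetical data classes: each has one matcher, either a regex pattern
# or a set of lookup values (standing in for the contents of a set file).
DATA_CLASSES = {
    "EMAIL": {"pattern": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")},
    "US_SSN": {"pattern": re.compile(r"\d{3}-\d{2}-\d{4}")},
    "FIRST_NAME": {"lookup": {"ann", "bob", "carla", "deepak"}},
}

def match_ratio(matcher, values):
    """Return the fraction of non-empty values that satisfy the matcher."""
    values = [v for v in values if v]
    if not values:
        return 0.0
    if "pattern" in matcher:
        hits = sum(1 for v in values if matcher["pattern"].fullmatch(v))
    else:
        hits = sum(1 for v in values if v.lower() in matcher["lookup"])
    return hits / len(values)

def best_class(values, threshold=0.75):
    """Return the best-scoring data class, or None if nothing is close enough."""
    scored = {name: match_ratio(m, values) for name, m in DATA_CLASSES.items()}
    name, score = max(scored.items(), key=lambda item: item[1])
    return name if score >= threshold else None

# Hypothetical column samples taken from structured files.
columns = {
    "contact": ["ann@example.com", "bob@example.org", "carla@example.net"],
    "given": ["Ann", "Bob", "Zed", "Carla", "Deepak"],
    "notes": ["call back", "left voicemail"],
}
result = {column: best_class(values) for column, values in columns.items()}
print(result)
```

Scoring each class by its share of matching values, then applying a threshold, is one simple way to decide on "the best match, if any"; a column such as `notes` that fits no class is left unclassified.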
Schema Pattern Search
Find data from an entire schema using specific patterns such as credit cards or ID numbers. Automatically scan through every column in the schema rather than one table at a time. You can also associate these results with data classes this way.
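A schema-wide pattern scan of this kind might look like the Python sketch below. It is an assumption-laden illustration rather than the product's search engine: SQLite stands in for the database, the card-number regular expression is deliberately loose, a Luhn checksum is added to filter false positives, and `luhn_ok` and `scan_schema` are hypothetical helper names.

```python
import re
import sqlite3

# Loose pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")

def luhn_ok(number):
    """Validate a candidate card number with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_schema(conn):
    """Scan every text value in every column of every table for card numbers."""
    hits = {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        columns = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
        for column in columns:
            count = 0
            for (value,) in conn.execute(f'SELECT "{column}" FROM "{table}"'):
                if isinstance(value, str):
                    count += sum(1 for m in CARD_RE.finditer(value)
                                 if luhn_ok(m.group()))
            if count:
                hits[(table, column)] = count
    return hits

# Hypothetical schema; the card numbers are well-known test values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (id INTEGER, card TEXT, memo TEXT);
    CREATE TABLE tickets (id INTEGER, body TEXT);
    INSERT INTO payments VALUES (1, '4111 1111 1111 1111', 'renewal');
    INSERT INTO payments VALUES (2, '1234 5678 9012 3456', 'failed checksum');
    INSERT INTO tickets VALUES (1, 'Customer paid with 4012888888881881 yesterday');
    INSERT INTO tickets VALUES (2, 'no card here');
""")
hits = scan_schema(conn)
print(hits)
```

The result maps each (table, column) pair to its number of validated hits, which is the kind of finding that could then be associated with a data class.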
Dark Data Search
Find data that match patterns, or values in a lookup file, within MS Office and Outlook files, .pdf and .rtf documents, NoSQL DB collections, HTML, JSON, XML, or other text (log) files, plus images and faces, that "hide" on your computer or LAN. Extract that dark data and its associated metadata into flat, query-ready DDF files. Simultaneously mask that data with IRI DarkShield.
Schema Data Class Search
Find and leverage all data schema-wide that matches attributes of your data classes or data class groups. Automatically scan through every column in the schema rather than one table at a time. Use this in conjunction with the Data Class DB Masking wizard.
There is also a Directory Data Class Search (and corresponding Data Class File Masking) wizard to find and de-identify PII in one or more flat files distributed across a LAN.
Data Quality Assessment
Use pattern definition and computational validation scripts to locate and verify the formats and values of data you define in data classes or groups (catalogs) for the purposes of discovery and function-rule assignment (e.g., in Voracity cleansing, transformation, or masking jobs). You can also use SortCL field-level if-then-else logic and 'iscompare' functions to isolate null values and incorrect data formats in DB tables and flat files. Or, use outer joins to silo source values that do not conform to master (reference) data sets. Use data formatting templates and their date validation capabilities, for example, to check the correctness of input days and dates.
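Two of the checks mentioned above, date validation and isolating values that do not conform to a reference set, can be approximated in plain Python. The sketch below is a conceptual analogue only and does not use SortCL syntax; the record layout, the reference set of state codes, the `%Y-%m-%d` date format, and the helpers `valid_date` and `split_by_reference` are assumptions made for the example.

```python
from datetime import datetime

def valid_date(text, fmt="%Y-%m-%d"):
    """Return True if text parses as a real calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except (TypeError, ValueError):   # None, wrong format, or impossible date
        return False

def split_by_reference(rows, key, reference):
    """Separate rows whose key value is in the reference set from those not in it."""
    conforming, rejects = [], []
    for row in rows:
        (conforming if row.get(key) in reference else rejects).append(row)
    return conforming, rejects

# Hypothetical source records and master (reference) values.
STATES = {"FL", "NY", "CA"}
rows = [
    {"id": 1, "state": "FL", "shipped": "2024-02-29"},   # leap day, valid
    {"id": 2, "state": "ZZ", "shipped": "2024-03-15"},   # unknown state
    {"id": 3, "state": "NY", "shipped": "2023-02-29"},   # not a leap year
    {"id": 4, "state": "CA", "shipped": None},           # missing date
]

conforming, rejects = split_by_reference(rows, "state", STATES)
bad_dates = [row["id"] for row in rows if not valid_date(row["shipped"])]
print([row["id"] for row in rejects])
print(bad_dates)
```

Splitting on membership in the reference set gives the same outcome as keeping the unmatched side of an outer join: conforming records flow on, and the rejects are siloed for review.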