XML files

 

Convert, Transform, Mask, Report, Test

Challenges


Though XML is a popular file interchange format for SOA, database, and other applications, it has not been a practical format for carrying large data volumes. In addition, conversions between legacy index or flat files and XML typically rely on slower transformation technologies like XSLT, which cannot process the data in the same pass as the conversion.

 

Transforms using XQuery cannot turn high volumes of XML data into meaningful information in XML (or any other format) quickly, if at all. There has been no efficient way to rapidly convert, process, protect, or create huge XML files.

 

For example, you may need to:

  • Sort a huge XML file
  • Extract data from, or report on, an XML file
  • Convert a CSV, LDIF, or other file to XML
  • Convert XML to text, CSV, LDIF, ISAM, etc.
  • Mask, encrypt or otherwise de-identify fields in an XML file
  • Join data in an XML file with another XML or non-XML source
  • Load XML data to a spreadsheet or database
  • Create an XML file from a legacy or extract file
  • Generate test data in XML file formats

 

You may even need to perform more than one of these functions at the same time, against many massive source and target files.

Solutions

IRI delivers XML and other file conversion functionality in several products. Choose based on need:

Use the free Lite edition of IRI NextForm to convert huge, flat XML files* to other formats (like CSV, LDIF, COBOL, text, etc.), or from those other formats into XML. If your XML files are semi-structured or unstructured, you can use:

  1. the dark data discovery wizard to search for and structure the elements you need based on pattern matches
  2. Sonra's Flexter Data Liberator add-on for Voracity
  3. NextForm Legacy Edition to view your source files in the data source explorer using an XML driver

NextForm includes an XML file parser to automatically create the XDEF field layouts used in the file conversion scripts. NextForm also supports data type conversion at the field level, and the remapping of record layouts. NextForm job definitions also work in CoSort SortCL and Voracity if you later decide to upgrade!

Use the SortCL program in the IRI Voracity platform or IRI CoSort package to convert, transform, mask, report from, and create new XML files and other targets that represent structured data.

Declare one or more XML and non-XML files for input and output as part of any SortCL job involving data:

  • filtering (select, scrub, links to DQ tools)
  • transformation (sort, join, aggregate, calc, etc.)
  • conversion (data-type and file-format migrations)
  • reporting (CDC, detail and summary formats)
  • protection (field encryption, de-ID, masking)

SortCL makes all of these capabilities, one or more at a time, available to data architects who need to work with XML and other sources.

Use IRI FieldShield or IRI DarkShield to encrypt, redact or otherwise de-identify PII in XML files. Both tools share data classes and masking functions, and are supported in IRI Workbench and included in Voracity.
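As a rough illustration of what field-level de-identification in an XML file involves, here is a minimal Python sketch. It is not FieldShield or DarkShield code and does not reflect their masking functions. The function name `mask_elements`, its parameters, and the sample tags are all hypothetical. SHA-256 hashing stands in here for "otherwise de-identify", and redaction simply overwrites each character with an asterisk.

```python
import hashlib
import xml.etree.ElementTree as ET


def mask_elements(xml_text, tags_to_redact=(), tags_to_hash=()):
    """Return XML text with the named elements redacted or hashed.

    Hypothetical helper for illustration only; not an IRI API.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text is None:
            continue  # nothing to mask in an empty element
        if elem.tag in tags_to_redact:
            # Redaction: keep the length, hide every character.
            elem.text = "*" * len(elem.text)
        elif elem.tag in tags_to_hash:
            # One-way hash: a consistent but irreversible token.
            elem.text = hashlib.sha256(elem.text.encode("utf-8")).hexdigest()
    return ET.tostring(root, encoding="unicode")


# Assumed sample record; the tag names are made up for this sketch.
SAMPLE_PII = (
    "<person><ssn>123-45-6789</ssn>"
    "<email>a@b.c</email><city>Oslo</city></person>"
)

masked = mask_elements(
    SAMPLE_PII, tags_to_redact={"ssn"}, tags_to_hash={"email"}
)
```

Elements that are not named, such as `city` in the sample, pass through unchanged, which is the general idea behind masking only the fields classified as PII.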

Use IRI RowGen to build XML test files. RowGen uses the same layout metadata as CoSort, NextForm, and FieldShield, so you can easily move between test data generation and real data transformation.

* XML data elements must conform to a flattened structure, in which one element of a given name is extracted at each level. If multiple tags share the same name, IRI engines extract the last occurrence of that tag. Field names must be unique, and the elements must comprise a single record with no additional dependencies.
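The flattening rule in the note above can be pictured with a small Python sketch. It only imitates the described behavior (one value per tag name, with the last occurrence winning). It is not how the IRI engines are implemented, and the function name `flatten_record` and the sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def flatten_record(xml_text):
    """Flatten one XML record into a dict of leaf-element values.

    Hypothetical helper: when a tag name repeats, the later occurrence
    overwrites the earlier one, imitating the "last occurring tag" rule.
    """
    root = ET.fromstring(xml_text)
    record = {}
    for elem in root.iter():  # document order, so later tags win
        if len(elem) == 0:  # leaf element: it has no child elements
            record[elem.tag] = (elem.text or "").strip()
    return record


# Assumed sample record with a repeated <phone> tag.
SAMPLE = """<customer>
  <name>Ada</name>
  <phone>555-0100</phone>
  <phone>555-0199</phone>
</customer>"""

flat = flatten_record(SAMPLE)
print(flat)
```

In this sketch the first `phone` value is silently dropped. If repeated elements must all survive a conversion, the source needs restructuring first, for example by giving each repeated element a unique name.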