Text Files

Convert and Process Fixed and Delimited Formats

Challenges

Structured

Structured text files are fixed- or variable-length sequential (flat) files. They can be as small as one record or contain billions of archived rows from database extracts, web logs, transaction feeds, mainframe datasets, and other applications.
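To make the fixed versus variable distinction concrete, here is a minimal Python sketch that reads the same record in both styles. The `Field` class, the `parse_fixed` and `parse_delimited` helpers, and the `id`/`city`/`amount` layout are hypothetical illustrations, not the metadata syntax used by IRI tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical layout entry; start/length apply only to fixed-length records."""
    name: str
    start: int = 0   # 0-based column offset
    length: int = 0  # width in characters


def parse_fixed(record: str, layout: list[Field]) -> dict[str, str]:
    """Slice a fixed-length record by column position and strip the padding."""
    return {f.name: record[f.start:f.start + f.length].strip() for f in layout}


def parse_delimited(record: str, layout: list[Field], sep: str = "|") -> dict[str, str]:
    """Split a variable-length record on a separator; fields map by their order in the layout."""
    values = record.rstrip("\n").split(sep)
    return {f.name: v.strip() for f, v in zip(layout, values)}


# Hypothetical layout: 5-char id, 10-char city, 8-char right-aligned amount.
LAYOUT = [Field("id", 0, 5), Field("city", 5, 10), Field("amount", 15, 8)]

fixed = "00042" + "Chicago".ljust(10) + "129.50".rjust(8)  # always 23 characters
delimited = "00042|Chicago|129.50"                         # length varies with content

print(parse_fixed(fixed, LAYOUT))          # {'id': '00042', 'city': 'Chicago', 'amount': '129.50'}
print(parse_delimited(delimited, LAYOUT))  # {'id': '00042', 'city': 'Chicago', 'amount': '129.50'}
```

In the fixed-length form every record has the same width and fields are found by position; in the delimited form the record length varies and fields are found by separator, so the layout has to say which style applies.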

You may need to:

  • Sort a huge text file
  • Extract data or create a report from a text file
  • Convert between text and XML file formats
  • Convert a text file to another format
  • Encrypt or de-identify fields in a text file
  • Load text data to a spreadsheet or database
  • Reformat legacy or binary data as a text file

You may need to perform more than one of these functions at the same time, across many massive source and target files.
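Sorting a file larger than memory is usually done with an external merge sort. The sketch below assumes UTF-8 text with one record per line and sorts whole lines (or by a caller-supplied `key`). `external_sort` and its `chunk_lines` parameter are hypothetical names; this shows the generic technique, not how CoSort is implemented.

```python
import contextlib
import heapq
import os
import tempfile
from itertools import islice


def external_sort(src_path: str, dst_path: str, chunk_lines: int = 100_000, key=None) -> None:
    """Sort a line-oriented text file that may not fit in memory.

    Phase 1: read `chunk_lines` lines at a time, sort each chunk in memory,
             and write it to a temporary "run" file.
    Phase 2: stream a k-way merge of all sorted runs into the output.
    """
    run_paths = []
    try:
        with open(src_path, encoding="utf-8") as src:
            while chunk := list(islice(src, chunk_lines)):
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"  # the file's last line may lack a newline
                chunk.sort(key=key)
                fd, path = tempfile.mkstemp(suffix=".run")
                run_paths.append(path)
                with os.fdopen(fd, "w", encoding="utf-8") as run:
                    run.writelines(chunk)
        # Open every run and merge; heapq.merge holds about one line per run at a time.
        with contextlib.ExitStack() as stack, open(dst_path, "w", encoding="utf-8") as dst:
            runs = [stack.enter_context(open(p, encoding="utf-8")) for p in run_paths]
            dst.writelines(heapq.merge(*runs, key=key))
    finally:
        for path in run_paths:  # remove temporary runs even if an error occurred
            os.remove(path)
```

Peak memory is bounded by `chunk_lines` plus roughly one line per run during the merge. With very large inputs the number of runs can exceed the operating system's open-file limit, so a production sort would merge in several passes.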

Unstructured

Unstructured text sources can be converted, but the data within them cannot be readily extracted or used in the ways structured data can. These sources include files and repositories in the following formats:

  • ASN.1 TAP3
  • .DOC, .DOCX
  • .EML, .OST, .PST
  • .PDF, .RTF
  • .PPT, .PPTX
  • .TXT, .XML
  • .XLS, .XLSX

Solutions

Text File Conversion Only

Use IRI NextForm to convert structured text files to other formats (e.g., CSV, XML, or ODBC-accessible database tables), or from those formats to text. NextForm supports field-level data type conversion and record layout remapping. The NextForm 'Unstructured data' edition can parse and structure data in unstructured text files for the operations described on this page and throughout the IRI product stack.
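As a rough illustration of field-level type conversion and layout remapping, the following Python sketch turns fixed-length records into XML with the standard `xml.etree.ElementTree` module. The layouts, tag names, and `fixed_to_xml` function are hypothetical; this is not NextForm's syntax or API.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical source layout for fixed-length input: (name, start, length).
SOURCE_LAYOUT = [("id", 0, 5), ("city", 5, 10), ("amount", 15, 8)]

# Hypothetical target mapping: output order plus a type converter per field.
# Reordering entries remaps the record layout; changing a converter changes
# the field's data type (e.g., zero-padded text -> integer).
TARGET_MAPPING = [("city", str), ("id", int), ("amount", Decimal)]


def fixed_to_xml(lines, root_tag: str = "records", row_tag: str = "record") -> str:
    """Convert fixed-length text records into an XML document string."""
    root = ET.Element(root_tag)
    for line in lines:
        raw = {name: line[start:start + length].strip() for name, start, length in SOURCE_LAYOUT}
        row = ET.SubElement(root, row_tag)
        for name, convert in TARGET_MAPPING:
            # Parse into the target type, then serialize; ElementTree escapes &, <, >.
            ET.SubElement(row, name).text = str(convert(raw[name]))
    return ET.tostring(root, encoding="unicode")


record = "00042" + "Chicago".ljust(10) + "129.50".rjust(8)
print(fixed_to_xml([record]))
# <records><record><city>Chicago</city><id>42</id><amount>129.50</amount></record></records>
```

Converting `id` to an integer drops its leading zeros, which is exactly the kind of decision a field-level mapping makes explicit.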

NextForm file definitions also work in SortCL programs under IRI CoSort, so you can reuse the metadata if you upgrade to CoSort for fast data transformation and reporting.

Text Data Transformation & Conversion

Using a simple 4GL for layout and manipulation definitions, or a powerful, free GUI built on Eclipse, the SortCL program in CoSort can:

  • transform the data in text files (e.g., sort, join, aggregate, cross-calculate)
  • convert text files to other file formats, and create text files from those formats
  • report from text file sources
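For a sense of what a join plus aggregation over delimited text involves, here is a small in-memory Python sketch. The `id|city|amount` and `city|region` layouts and the `totals_by_region` function are hypothetical, and nothing here reflects SortCL syntax.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def totals_by_region(sales_text: str, regions_text: str) -> list[tuple[str, Decimal]]:
    """Join pipe-delimited sales (id|city|amount) to a city|region lookup,
    sum the amounts per region, and return the totals sorted by region."""
    region_of = {
        row[0]: row[1]
        for row in csv.reader(io.StringIO(regions_text), delimiter="|")
        if row  # skip blank lines
    }
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in csv.reader(io.StringIO(sales_text), delimiter="|"):
        if not row:
            continue
        _id, city, amount = row
        region = region_of.get(city)
        if region is not None:  # inner join: sales with no matching city are dropped
            totals[region] += Decimal(amount)
    return sorted(totals.items())


sales = "1|Chicago|10.50\n2|Boston|4.25\n3|Chicago|2.00\n4|Nowhere|99\n"
regions = "Chicago|Midwest\nBoston|Northeast\n"
print(totals_by_region(sales, regions))
# [('Midwest', Decimal('12.50')), ('Northeast', Decimal('4.25'))]
```

`Decimal` is used instead of `float` so currency sums stay exact; a tool built for very large files would stream and sort rather than hold the lookup and totals in memory as this sketch does.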

Map one or more text input files to and from other file formats. Create detail, summary, or delta (change data capture and slowly changing dimension) reports from text file sources. Hand off pre-sorted, filtered, and converted subsets to BI tools, database load utilities, or other applications.
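A delta report compares two extracts to find inserted, deleted, and changed rows, and when both files are already sorted on a unique key one pass over each is enough. This Python sketch assumes pipe-delimited rows keyed on their first field and sorted by that field as a string (so numeric IDs should be zero-padded); `sorted_delta` and the I/D/U change codes are illustrative, not CoSort output.

```python
from collections.abc import Iterable, Iterator


def sorted_delta(old: Iterable[str], new: Iterable[str], sep: str = "|") -> Iterator[tuple[str, str]]:
    """Compare two extracts sorted by their first field (a unique key) and yield
    change records: ("I", row) inserted, ("D", row) deleted, ("U", row) updated.
    Each input is read once, so neither extract has to fit in memory."""
    def key(row: str) -> str:
        return row.split(sep, 1)[0]

    old_it, new_it = iter(old), iter(new)
    o, n = next(old_it, None), next(new_it, None)
    while o is not None or n is not None:
        if n is None or (o is not None and key(o) < key(n)):
            yield ("D", o)  # key exists only in the old extract
            o = next(old_it, None)
        elif o is None or key(n) < key(o):
            yield ("I", n)  # key exists only in the new extract
            n = next(new_it, None)
        else:
            if o != n:
                yield ("U", n)  # same key, changed contents
            o, n = next(old_it, None), next(new_it, None)


old = ["001|Chicago|10", "002|Boston|5", "004|Denver|7"]
new = ["001|Chicago|10", "002|Boston|6", "003|Austin|1"]
for change in sorted_delta(old, new):
    print(change)
# ('U', '002|Boston|6')
# ('I', '003|Austin|1')
# ('D', '004|Denver|7')
```

Because the inputs can be any iterables of lines, open file objects (with trailing newlines stripped) work as well as lists.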

Text Data Masking & Test Data

Use IRI FieldShield to protect fields in structured text files with encryption, masking, etc.
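Two common de-identification techniques are partial redaction and keyed pseudonymization. The Python sketch below shows both on a hypothetical `name|ssn|card` layout using the standard `hmac` and `hashlib` modules. The function names and key are assumptions for illustration, not FieldShield's API; note that HMAC tokens are one-way, unlike encryption, which can be reversed with the key.

```python
import hashlib
import hmac


def mask_keep_last(value: str, keep: int = 4, fill: str = "*") -> str:
    """Redact all but the last `keep` characters, preserving the field's length."""
    if keep <= 0:
        return fill * len(value)
    return fill * max(len(value) - keep, 0) + value[-keep:]


def pseudonymize(value: str, secret: bytes, width: int = 16) -> str:
    """Replace a value with a truncated HMAC-SHA256 hex token. The same input and
    key always give the same token, so masked files can still be joined on it."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()[:width]


def deidentify(line: str, secret: bytes, sep: str = "|") -> str:
    """Mask one record of a hypothetical name|ssn|card layout."""
    name, ssn, card = line.rstrip("\n").split(sep)
    return sep.join([pseudonymize(name, secret), mask_keep_last(ssn), mask_keep_last(card)])


print(deidentify("Jane Doe|123-45-6789|4111111111111111", secret=b"demo-key"))
```

The output keeps the lengths of the SSN and card fields and replaces the name with a 16-character hex token. The secret key should be stored apart from the data, since anyone holding it can test guesses against the tokens.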

Use IRI RowGen if you need test data in text file formats. RowGen uses the same layout metadata as CoSort (SortCL) and NextForm, so you can easily move between test data generation and real data transformation.
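Synthetic test data is typically generated to match a target layout. Here is a minimal Python sketch, assuming a hypothetical `id|city|amount` layout and a seeded random generator so the same file can be regenerated on demand; `generate_rows` is illustrative and not RowGen's syntax.

```python
import random


def generate_rows(n: int, seed: int = 0) -> list[str]:
    """Generate n synthetic records in a hypothetical id|city|amount layout.
    A fixed seed makes the output reproducible from run to run."""
    rng = random.Random(seed)
    cities = ["Austin", "Boston", "Chicago", "Denver"]
    rows = []
    for i in range(1, n + 1):
        cents = rng.randint(0, 999_999)  # integer cents avoid float rounding
        city = rng.choice(cities)
        rows.append(f"{i:05d}|{city}|{cents // 100}.{cents % 100:02d}")
    return rows


for row in generate_rows(3, seed=42):
    print(row)
```

Using a private `random.Random` instance rather than the module-level functions keeps the generated file independent of any other code that draws random numbers.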