SortCL Functionality

 

Combine Data Transformation, Migration, Masking & Reporting

What Can SortCL Do?

 

The Sort Control Language (SortCL) program in the IRI CoSort product or IRI Voracity platform accepts multiple inputs, including:

  • sequential (delimited or fixed-position), COBOL index, and semi-structured (flat JSON/XML) files
  • pipes
  • relational (and some NoSQL) database tables (collections) via ODBC
  • URLs for static and streaming sources, including S3/GCP/AzureBlob, HTTP/S, FTP/S, HDFS, MongoDB, Kafka, and MQTT
  • user procedures

in multiple formats, processes them in many ways, and produces one or more targets in multiple formats -- as well as customized reports -- all at once. See the table below and this diagram in the context of CoSort, or the data integration, migration, governance, and analytic portions of this diagram in the broader context of Voracity.

 

Specifically, SortCL can, in one job script and I/O pass, rapidly perform and combine data transformation, conversion, protection, reporting, and related processes:
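As a conceptual illustration only (plain Python, not SortCL syntax, with made-up sample data), the sketch below combines record filtering, sorting, aggregation, and two simultaneous targets in a single pass over the input, which is the kind of consolidation a SortCL job script performs:

```python
from collections import defaultdict

# Hypothetical input rows (stand-ins for a delimited file or table).
rows = [
    {"region": "EAST", "amount": 120},
    {"region": "WEST", "amount": 80},
    {"region": "EAST", "amount": 50},
]

# Filter (record-level selection), then sort on one key.
selected = [r for r in rows if r["amount"] >= 60]
selected.sort(key=lambda r: r["region"])

# Aggregate (roll-up sums) while emitting a detail target in the same pass.
totals = defaultdict(int)
detail_target = []
for r in selected:
    totals[r["region"]] += r["amount"]
    detail_target.append((r["region"], r["amount"]))

summary_target = sorted(totals.items())
print(detail_target)   # detail rows
print(summary_target)  # summary (roll-up) rows
```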

 

  • Filter: At the byte, field, and record level, plus duplicate removal and saving
  • Segment: Conditional (include/omit) selection with if-then-else and else-if logic
  • Sort: On multiple keys, directions, and sequences
  • Merge: Two or more pre-sorted files
  • Join (Match): Two or more sorted or unsorted sources on many conditions, for ETL, file comparison, and change data capture (delta reporting) operations
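A merge of pre-sorted inputs can be pictured with the standard library's heapq.merge (a conceptual stand-in; SortCL expresses this declaratively in a job script):

```python
import heapq

# Two hypothetical inputs, each already sorted on the key.
file_a = [1, 4, 7, 10]
file_b = [2, 3, 8]

# heapq.merge streams the combined output in key order without re-sorting.
merged = list(heapq.merge(file_a, file_b))
print(merged)  # [1, 2, 3, 4, 7, 8, 10]
```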

  • Aggregate: Parallel roll-up and drill-down sum, min, max, average, and count values; accumulate (running totals); rank; lead and lag (sliding value windows)
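The lead/lag ("sliding value window") idea can be sketched as follows (illustrative Python, not SortCL; the function name is hypothetical):

```python
def lag(values, offset=1, fill=None):
    """Pair each value with the value `offset` rows earlier (None before that)."""
    return [(v, values[i - offset] if i >= offset else fill)
            for i, v in enumerate(values)]

prices = [10, 12, 11, 15]
print(lag(prices))  # [(10, None), (12, 10), (11, 12), (15, 11)]
```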

  • Check: Verify source data is pre-sorted prior to sort or join operations
  • Re-Map: Resize, reposition, and realign fields
  • Convert: Change data types (e.g., EBCDIC <> ASCII, packed <> numeric)
  • Re-format: Convert between file formats (e.g., Text <> XML <> VS <> RS <> ISAM <> Vision <> LDIF <> CSV <> JSON)
  • Pivot / Unpivot: De-normalize and normalize dimensional layouts
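Conceptually, unpivoting (normalizing) turns measure columns into rows, and pivoting reverses it. A minimal Python sketch with hypothetical column names (not SortCL syntax):

```python
# Hypothetical wide (de-normalized) record: one column per quarter.
wide = {"store": "S1", "Q1": 100, "Q2": 150}

# Unpivot: one (store, quarter, value) row per measure column.
long_rows = [(wide["store"], q, wide[q]) for q in ("Q1", "Q2")]

# Pivot back: fold the long rows into one wide record again.
rebuilt = {"store": long_rows[0][0]}
for _, q, v in long_rows:
    rebuilt[q] = v

print(long_rows)  # [('S1', 'Q1', 100), ('S1', 'Q2', 150)]
```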

  • Cleanse: De-duplicate, validate, homogenize, filter, find/replace, and re-structure
  • Enrich: Integrate and segment data to enhance row and column detail; create new data forms and layouts through conversions, calculations and expressions, and composites (templates)

  • Migrate DBs: Via remapping and replication of columns and tables
  • Calculate: Math and trig functions across detail and summary rows, plus internal and external statistical functions
  • Sub-string: Bit-level manipulations and Perl-compatible regular expression logic for pattern matching, etc.
  • Validate: Check that character and field attributes match their specifications (i.e., "iscompares", gap analysis)
  • Sequence: For custom indexing, reporting, and database load operations, plus UUID/GUID value insertion
  • Set Lookup: Discrete field substitutions, pseudonymization, etc., using "set" file field dimensions
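Set-file substitution can be pictured as a lookup table that consistently replaces each real value with a pseudonym drawn from a fixed set. The sketch below is illustrative Python only; the pseudonym values and function name are invented, not SortCL's set-file format:

```python
# A "set" of replacement values, as might be read from a set file.
pseudonyms = ["Alpha", "Bravo", "Charlie"]
_assigned = {}

def pseudonymize(name):
    """Map each distinct real value to a consistent pseudonym from the set."""
    if name not in _assigned:
        _assigned[name] = pseudonyms[len(_assigned) % len(pseudonyms)]
    return _assigned[name]

print([pseudonymize(n) for n in ["Smith", "Jones", "Smith"]])
# ['Alpha', 'Bravo', 'Alpha']
```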

  • Fuzzy Lookup: For slowly changing dimension (SCD) reporting and data quality
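Fuzzy matching for data quality can be approximated with the standard library's difflib (a conceptual stand-in for the idea, not SortCL's implementation; the reference list and cutoff are made up):

```python
import difflib

reference = ["New York", "Los Angeles", "Chicago"]

def fuzzy_lookup(value, cutoff=0.6):
    """Return the closest reference value, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(value, reference, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(fuzzy_lookup("New Yrok"))  # 'New York'
```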

  • Federate: Get discrete (lookup) values and virtualize results in reports and replicas
  • Mask (Protect): Encrypt and mask data at the field level and audit data security measures; also anonymization, de-identification, filtering, and pseudonymization
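Field-level protection ranges from simple redaction to one-way replacement; a minimal Python sketch of both ideas (illustrative only, and not SortCL's masking or encryption functions; the salt and helper names are hypothetical):

```python
import hashlib

def redact(card_number):
    """Keep the last four characters; mask the rest."""
    return "*" * (len(card_number) - 4) + card_number[-4:]

def pseudo_hash(value, salt="demo-salt"):
    """One-way, consistent replacement via salted SHA-256 (truncated for brevity)."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

print(redact("4111111111111111"))  # ************1111
```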

  • Mask (Format): Numeric and date layout masking to replace and customize new value formats
  • Lookup: Discrete or random draws from set files for use in ETL lookup transforms, pseudonymization, and test data generation
  • Synthesize: Create randomly generated or set-selected (safe) test data files (see RowGen)
  • Report: Custom-formatted, segmented detail and summary targets
  • Replicate: Copy, manipulate, and move data from one or more sources to one or more targets
  • Custom: Complex field-level user functions (e.g., third-party data quality libraries)

Beyond data staging, manipulation, and migration, use SortCL to report on changed data (inserts, updates, deletes), slowly changing dimensions, and trend line intersection.
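Change data reporting classifies rows as inserts, updates, or deletes by comparing two versions of a data set keyed on an identifier. A minimal Python sketch of the idea (with invented sample data, not SortCL syntax):

```python
# Two hypothetical snapshots keyed on an ID column.
old = {"1": "Alice", "2": "Bob", "3": "Carol"}
new = {"1": "Alice", "2": "Bobby", "4": "Dan"}

# Set operations on the keys classify each change.
inserts = {k: new[k] for k in new.keys() - old.keys()}
deletes = {k: old[k] for k in old.keys() - new.keys()}
updates = {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}

print(inserts)  # {'4': 'Dan'}
print(deletes)  # {'3': 'Carol'}
print(updates)  # {'2': ('Bob', 'Bobby')}
```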

Additional SortCL features support metadata and master data management, clickstream analytics (data webhousing), real-time and near-real-time processing, customer data integration and segmentation, data wrangling (data preparation for BI and analytics), and data governance objectives.