Sort / Merge Utility Operations
Fast Sorting and Merging of Big, Structured Data
Sorting remains a critical component of data processing. Ordering data is part of:
- database load, index, and query/search operations
- data warehouse sort, join, and aggregate transformations
- reporting, analytical, and testing environments
But as data source sizes increase, from hundreds of megabytes into terabyte levels and beyond, sorting can place a rapidly growing demand on computing resources.
Mainframe sort/merge utilities are expensive to operate, use cryptic JCL syntax, and are functionally limited. Sort functions in databases, ETL and BI reporting tools, operating systems, and compilers are just not designed for big data.
Robustness Issues
- sort speed and scalability in volume
- sorting and related functionality
- data and file type support
- GUI and/or parm syntax simplicity
- event monitoring and error handling
- metadata framework
- performance tuning and logging
Management Concerns
- pricing and licensing models
- plug-in compatibility or parm conversion accuracy
- technical support speed
- third-party hardware and software interoperability
- vendor capabilities and reputation
- implementation paradigm
- skills gap (e.g. Hadoop), maintenance costs
Solutions
As the volume of data grows, so grows the value of IRI CoSort. CoSort is the world's first commercial sort/merge package for open systems. It continues to sell as a robust, commercial-grade, cross-compatible:
- Unix file sort utility
- Windows sort program
- ETL, BI, and DB sort verb alternative
- Mainframe JCL sort/merge replacement
with state-of-the-art performance, industry-leading functionality, and the most familiar, intuitive user interfaces ... and without more hardware, Hadoop, in-memory DBs, or appliances.
CoSort will sort any number, size, and type of structured fields, keys, records, and files -- including mainframe binary, IP addresses, multi-byte Asian characters, Unicode, and so on. The CoSort engine scales linearly in volume, and allows granular tuning of CPU, memory, disk, and related resources. Multiple gigabytes sort in seconds on multi-CPU servers.
___________________________________________________________________________________
127,268,900 rows * 405 bytes/row = 51.5GB input file
CoSort total job time w/20-byte sort key @ 131 seconds = 2m:11s
Platform: x86 Linux development server using 32 of 64 cores
___________________________________________________________________________________
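As a quick check of the benchmark figures above, the arithmetic can be reproduced in a few lines of Python. This is only an illustrative sketch: it assumes decimal gigabytes (1 GB = 10^9 bytes) and treats the 131 seconds as end-to-end job time, so the derived throughput is an aggregate figure across all 32 cores rather than a per-core number.

```python
# Figures copied from the benchmark box; everything else is derived.
rows = 127_268_900
bytes_per_row = 405
job_seconds = 131

input_bytes = rows * bytes_per_row            # total input size in bytes
input_gb = input_bytes / 1e9                  # assumption: decimal gigabytes
throughput_mb_s = input_bytes / 1e6 / job_seconds
rows_per_second = rows / job_seconds
minutes, seconds = divmod(job_seconds, 60)

print(f"input size : {input_gb:.1f} GB")
print(f"job time   : {minutes}m:{seconds:02d}s")
print(f"throughput : {throughput_mb_s:.0f} MB/s, {rows_per_second:,.0f} rows/s")
```

Under those assumptions the job works out to roughly 393 MB of input sorted per second, or a little under one million rows per second.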
CoSort can also replace or convert third-party sort functions with proven libraries, tools, or services - saving time and money in batch operations and embedded applications. Ask about special incentives for migrating from a legacy sort product, and discounts for integrated distribution.
Sorting is Just the Beginning
CoSort also delivers the unique ability to simultaneously transform, migrate, report, and protect data at risk. The CoSort Sort Control Language (SortCL) program combines these functions in the same job script and I/O pass. Map multiple sources to multiple targets and formats while you sort.
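The idea of feeding several targets from one sorted pass can be illustrated with a small Python sketch. This is not SortCL syntax and says nothing about how CoSort is implemented; the sample data, field names, and the `read_rows` and `sort_and_fan_out` helpers are hypothetical, chosen only to show two sources being sorted once and then written to a detail target and a summary target in the same loop.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample sources; in practice these would be files or tables.
SOURCE_A = "id,region,amount\n3,west,40\n1,east,10\n"
SOURCE_B = "id,region,amount\n2,east,25\n4,west,5\n"


def read_rows(*sources):
    """Yield dict rows from every CSV source in turn."""
    for text in sources:
        yield from csv.DictReader(io.StringIO(text))


def sort_and_fan_out(rows):
    """Sort once by (region, id), then feed two targets in a single pass."""
    ordered = sorted(rows, key=lambda r: (r["region"], int(r["id"])))

    detail = io.StringIO()                 # target 1: sorted detail as CSV
    writer = csv.DictWriter(
        detail, fieldnames=["id", "region", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    totals = defaultdict(int)              # target 2: per-region totals

    for row in ordered:                    # single pass over the sorted rows
        writer.writerow(row)
        totals[row["region"]] += int(row["amount"])

    return detail.getvalue(), dict(totals)


detail_csv, region_totals = sort_and_fan_out(read_rows(SOURCE_A, SOURCE_B))
print(detail_csv)
print(region_totals)
```

The sketch sorts in memory for brevity; a production sort on large inputs would spill to disk and merge, which is the part a dedicated sort engine is meant to handle.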
SortCL is only one of several interfaces in the CoSort package available for standalone or integrated sort/merge operations. All sorting and transformation jobs can be scheduled, monitored, logged, audited, and otherwise managed in the IRI Workbench GUI, built on Eclipse™.
Beyond the CoSort package, these same SortCL-driven operations are also integral to the IRI Voracity data management platform, which includes CoSort and combines big data discovery, integration, migration, governance, and analytics. In Voracity, the CoSort sort engine is used automatically, and SortCL scripts are created automatically, for: ETL, change data capture, DB subsetting, pseudonymization, synthetic test data, data wrangling, and bulk DB loading operations.