Slowly Changing Dimensions


Use 'Fuzzy Logic' Look-ups for Discrete Reporting Solutions

Challenges

Slowly changing dimensions (SCD) refers to the data warehousing problem of tracking changes over time in the values (facts) of a datum. "Slowly" implies time, but not necessarily slow time: the concepts are the same whether changes occur in seconds or over centuries, and the interval between changes need not be consistent. Each search argument (or combination of arguments) must be unique, and the result is a single, discrete value.

Slowly changing dimensions are discussed on numerous sites, along with the established techniques for storing and accessing such data. Basically, the user can ignore changes, overwrite the existing fact, expand the stored record, or create additional records (tuple versioning) using surrogate keys. Implementing these techniques is often a complex process in ETL tools or SQL.
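The four techniques above correspond to what the data warehousing literature usually calls Type 0 (ignore), Type 1 (overwrite), Type 3 (expand the record with a prior-value column), and Type 2 (add a versioned row with a new surrogate key). The minimal in-memory Python sketch below illustrates those update rules on a toy customer-address dimension; it is not IRI's implementation, and the `Row` fields and `apply_type*` function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Row:
    customer_id: int                        # natural (business) key
    address: str                            # the tracked attribute
    valid_from: date                        # when this version took effect
    valid_to: Optional[date] = None         # None marks the current version
    surrogate_key: int = 0                  # warehouse-assigned row id
    previous_address: Optional[str] = None  # only populated by Type 3


def apply_type0(rows, cid, new_address, when):
    """Type 0: ignore the change and keep the original value."""
    return rows


def apply_type1(rows, cid, new_address, when):
    """Type 1: overwrite the existing value in place; history is lost."""
    for r in rows:
        if r.customer_id == cid:
            r.address = new_address
    return rows


def apply_type2(rows, cid, new_address, when):
    """Type 2: close the current version and append a new row (tuple versioning)."""
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    for r in rows:
        if r.customer_id == cid and r.valid_to is None:
            r.valid_to = when  # the old version stops being current here
    rows.append(Row(cid, new_address, valid_from=when, surrogate_key=next_key))
    return rows


def apply_type3(rows, cid, new_address, when):
    """Type 3: expand the record, keeping exactly one prior value in an extra column."""
    for r in rows:
        if r.customer_id == cid:
            r.previous_address = r.address
            r.address = new_address
    return rows
```

Type 2 is the only rule here that preserves full history, which is why it is the usual basis for point-in-time look-ups.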

Solutions

IRI took a fresh approach to reporting on slowly changing dimensions. The CoSort product's SortCL program uses a high-performance, fuzzy logic search for fact data in set files. 
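In a general-purpose language, a look-up file of this kind can be pictured as a small text file of search keys and results that is sorted once and then searched. The Python sketch below loads such a file, assuming a hypothetical tab-delimited "date, value" layout with "#" comment lines; this layout is an illustration only, not IRI's documented set file syntax. Sorting the keys and rejecting duplicates is what makes a later nearest-match search well defined.

```python
from datetime import date

# Hypothetical set-file-like text: one "YYYY-MM-DD<TAB>value" pair per line.
# Lines starting with "#" are treated as comments in this sketch only.
SAMPLE = (
    "# effective date, address\n"
    "2018-07-15\t40 Lake Dr\n"
    "2015-03-01\t12 Pine Rd\n"
    "2022-01-10\t7 Hill Ct\n"
)


def load_lookup(text):
    """Parse key/value lines into (date, value) pairs sorted by date.

    Raises ValueError on a repeated date, because each search argument
    must resolve to exactly one value.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        key, value = line.split("\t", 1)
        pairs.append((date.fromisoformat(key), value))
    pairs.sort(key=lambda pair: pair[0])  # sorted keys enable binary search later
    for (a, _), (b, _) in zip(pairs, pairs[1:]):
        if a == b:
            raise ValueError(f"duplicate search key: {a.isoformat()}")
    return pairs
```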

This core capability also powers a new visual SCD job creation wizard for types 1, 2, 3, 4, and 6, available at no charge to IRI Voracity users in the IRI Workbench GUI, built on Eclipse™.

You can query for discrete values based on changing information such as date and time. For example, given an arbitrary search date, you can find and display the address that was in effect before, on, or after that date.
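A point-in-time look-up like this reduces to a binary search over strictly increasing effective dates. The Python sketch below uses a hypothetical one-customer address history and a hypothetical `lookup` function; it adopts one reasonable reading of the three modes: "on" is the address in effect on the search date, "before" is the address in effect on the day before it, and "after" is the next address to take effect after it.

```python
import bisect
from datetime import date

# Hypothetical history for one customer: (effective date, address),
# sorted by strictly increasing effective date.
HISTORY = [
    (date(2015, 3, 1), "12 Pine Rd"),
    (date(2018, 7, 15), "40 Lake Dr"),
    (date(2022, 1, 10), "7 Hill Ct"),
]


def lookup(history, when, mode="on"):
    """Return the address relative to `when`, or None if no version qualifies."""
    dates = [d for d, _ in history]  # rebuilt per call for brevity; cache in practice
    if mode == "on":        # latest version that started on or before `when`
        i = bisect.bisect_right(dates, when) - 1
    elif mode == "before":  # latest version that started strictly before `when`
        i = bisect.bisect_left(dates, when) - 1
    elif mode == "after":   # earliest version that starts strictly after `when`
        i = bisect.bisect_right(dates, when)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return history[i][1] if 0 <= i < len(history) else None
```

For a search date of 2018-07-15, the three modes return "12 Pine Rd", "40 Lake Dr", and "7 Hill Ct" respectively.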

Because you are operating on field data that changes at different times, you can use more than one search argument to determine the returned value.
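One simple way to combine search arguments is a composite key: match one argument exactly and search the other by range. The Python sketch below assumes hypothetical rows unique on (customer ID, effective date); because Python compares tuples element by element, a single binary search over the sorted pairs handles both arguments.

```python
import bisect
from datetime import date

# Hypothetical rows: (customer_id, effective_date, address), unique on the first two fields.
ROWS = sorted([
    (101, date(2019, 1, 1), "5 Bay St"),
    (101, date(2021, 9, 1), "88 Ridge Rd"),
    (202, date(2020, 5, 20), "3 Mill Ln"),
])
KEYS = [(cid, d) for cid, d, _ in ROWS]


def address_on(customer_id, when):
    """Exact match on customer_id, then the latest effective date <= when."""
    i = bisect.bisect_right(KEYS, (customer_id, when)) - 1
    if i >= 0 and KEYS[i][0] == customer_id:  # guard against landing on another customer
        return ROWS[i][2]
    return None
```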

While fundamentally basic, IRI's file system approach to slowly changing dimensions offers opportunities for simplicity, reduced storage, speed, and increased capability. It enables:

  • very fast look-up performance
  • SCD types 0-6 update reporting
  • searches on any strictly increasing values
  • complex, multi-level search criteria
  • simple job script maintenance and sharing
  • new values to be quickly applied and integrated
  • support for built-in comments
  • the elimination of DB overhead, reorgs, etc.
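Because this kind of search depends only on ordering, the keys need not be dates: any strictly increasing values work. The Python sketch below, with hypothetical function names and data, validates that a key column is strictly increasing and then performs the same "largest key not exceeding the probe" look-up on integer quantity breaks.

```python
import bisect


def check_strictly_increasing(keys):
    """Raise ValueError unless every key is strictly greater than the one before it."""
    for prev, cur in zip(keys, keys[1:]):
        if not prev < cur:
            raise ValueError(f"keys not strictly increasing: {prev!r} then {cur!r}")


def step_lookup(keys, values, probe):
    """Return the value for the largest key <= probe, or None if probe precedes all keys.

    Works for any mutually comparable key type: dates, integers, strings, tuples.
    """
    i = bisect.bisect_right(keys, probe) - 1
    return values[i] if i >= 0 else None


# Hypothetical quantity breaks (integers, not dates) and the discount each one starts.
QTY_BREAKS = [1, 100, 500, 1000]
DISCOUNTS = [0.00, 0.05, 0.10, 0.15]
check_strictly_increasing(QTY_BREAKS)
```

Rejecting equal neighbors, not just decreasing ones, keeps every probe mapped to exactly one value.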

By using SortCL scripts or the Voracity wizard in Eclipse for SCD reporting, you can also integrate sorting, expression evaluation, aggregation, new formatting, encryption, etc., all in the same job script and I/O pass. See this blog series for more details.
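As a rough analogy in a general-purpose language (not SortCL syntax), the Python sketch below combines a point-in-time look-up with aggregation and a sorted report while reading the fact records once. The customer histories, sales rows, and function names are all hypothetical.

```python
import bisect
from collections import defaultdict
from datetime import date

# Hypothetical Type 2 history per customer: (effective_date, city), sorted by date.
HISTORY = {
    101: [(date(2019, 1, 1), "Boston"), (date(2021, 9, 1), "Denver")],
    202: [(date(2020, 5, 20), "Austin")],
}

# Hypothetical fact rows: (customer_id, sale_date, amount).
SALES = [
    (101, date(2020, 3, 4), 120.0),
    (101, date(2022, 2, 2), 80.0),
    (202, date(2021, 1, 9), 50.0),
    (101, date(2021, 9, 1), 30.0),
]


def city_on(customer_id, when):
    """City in effect for the customer on `when`, or None if none applies."""
    hist = HISTORY.get(customer_id, [])
    i = bisect.bisect_right([d for d, _ in hist], when) - 1
    return hist[i][1] if i >= 0 else None


def revenue_by_city(sales):
    """Single pass over the facts: look up, aggregate, then sort the totals."""
    totals = defaultdict(float)
    for cid, when, amount in sales:
        city = city_on(cid, when) or "UNKNOWN"  # point-in-time dimension look-up
        totals[city] += amount                  # aggregation
    return sorted(totals.items())               # sorted report of (city, total)
```

The sale on 2021-09-01 is attributed to Denver because that address took effect on that date.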