Snowflake ETL and PII Masking

 

Fast, Affordable Data Mapping & Governance

Challenges

 

You may face one or more of these time-consuming issues when working with Snowflake:

  • Data searches, profiling, and/or classification
  • Integrating or wrangling data for DW/BI ops
  • Data movement/migration to/from tables
  • Transforming or loading large tables
  • Change data capture or replication
  • Clustering or query performance
  • Generating smart, safe test data
  • Masking sensitive data

Diagnosing and tuning specific performance problems also takes time and may affect other users. Finally, stored SQL procedures may be programmed inefficiently, require optimization, and still take too long to run.

Solutions

IRI CoSort to pre-sort flat files for bulk loads and inserts, and to bypass slower in-database transformations like sorting, joining, and filtering by running the external CoSort SortCL data processing program against Snowflake data. This offloads that work from Snowflake whenever it needs to be done, improving the performance of clustering and commonly run queries.
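To illustrate the pre-sorting idea in general terms, here is a minimal Python sketch that orders the rows of a delimited flat file by a key column before it is handed to a bulk loader. It is not CoSort or SortCL syntax, and the function name `presort_csv`, the sample columns, and the in-memory approach are all assumptions made for this example; a production tool would sort files far larger than memory.

```python
import csv
import io


def presort_csv(text, key, numeric=False):
    """Return CSV text with its data rows sorted by `key`; the header stays first.

    Hypothetical helper for illustration only; it loads every row into memory.
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    convert = float if numeric else str
    rows.sort(key=lambda row: convert(row[key]))

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Made-up sample data: sort on the column the target table would be clustered by.
sample = "order_id,region\n3,west\n1,east\n2,north\n"
sorted_text = presort_csv(sample, "order_id", numeric=True)
print(sorted_text)
```

Loading rows that are already in key order means the database does not have to reorder them afterwards, which is the overhead the paragraph above describes moving out of Snowflake.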

IRI Voracity to leverage the multi-threaded, memory-optimized, and task-consolidating power of CoSort to perform ETL operations and act as a production analytics platform that can prepare, package, and even report on data simultaneously. For more information, see the tabs here.

IRI NextForm Database Edition to acquire, re-map, re-format, and build/populate new tables during migrations to and from Snowflake. You can also use NextForm or the SortCL program in CoSort to re-map and convert data in Snowflake, and to produce custom reports, copies, and federated views of that data.

IRI FieldShield to classify, find, and mask structured data (or IRI DarkShield for semi-structured and unstructured data) in Snowflake columns, such as personally identifiable information (PII) or protected health information (PHI). Apply redaction, encryption, pseudonymization, blurring, and other de-identifying functions to comply with privacy laws and standards like HIPAA, PCI DSS, FERPA, and GDPR, and to support DevOps. For structured data, see how you can connect to Snowflake here, and mask and map data in Snowflake here.
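As a rough illustration of two of the de-identifying functions named above, the following Python sketch shows partial redaction and keyed pseudonymization applied to one record. It does not represent FieldShield or DarkShield functions; the helper names `redact` and `pseudonymize`, the sample record, and the demo key are hypothetical and exist only for this example.

```python
import hashlib
import hmac


def redact(value, keep_last=4, mask_char="*"):
    """Mask all but the last `keep_last` characters; fully mask short values."""
    if keep_last <= 0 or len(value) <= keep_last:
        return mask_char * len(value)
    return mask_char * (len(value) - keep_last) + value[-keep_last:]


def pseudonymize(value, secret, length=12):
    """Keyed, deterministic pseudonym: the same input and key give the same token."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Made-up record and key, for illustration only.
row = {"name": "Ada Lovelace", "ssn": "123-45-6789"}
key = b"demo-key-not-for-production"

masked = {
    "name": pseudonymize(row["name"], key),
    "ssn": redact(row["ssn"]),
}
print(masked["ssn"])
```

Because the pseudonym is deterministic for a given key, the same source value maps to the same token in every table, so joins on the masked column still line up.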

IRI RowGen to populate Snowflake operations rapidly with safe test data. RowGen uses relational data models to generate realistic test data automatically for entire databases or Data Vault 2.0 models with referential (or business-key) integrity. IRI RowGen, FieldShield, and subsetting operations are also tightly integrated with the ValueLabs Test Data Hub for test data management (TDM) in Snowflake.