Dimension Modeling: Late Arriving Dimensions

Dinesh Shankar
3 min readMay 2, 2022

--

Overview

A data warehouse designed using the principles of dimensional modeling techniques will require all the dimensions loads to be completed before starting the fact table ETL load.In certain cases the fact records appear before the dimension data is loaded.

Example:Online ordering analytics data warehouse receives orders data from order processing ETL before new product data is loaded into products dimension table.

In this article I will cover the various design approaches to handle early arriving fact or late arriving dimension data.

Design Choices

Late arriving dimension data is common scenario in the data warehousing environment.This can happen in various scenarios like ETL job failures, systems going out of sync or replication delays. Below are the options in using which the late arriving dimension scenarios can be handled.

  1. Default Dimension Record
  2. HOLD Fact Record
  3. Infer Dimension Record

Option 1: Default Dimension Record

First option is to assign the “unknown” dimension value to the fact records.The unknown records will be maintained the backlog table which can be reprocessed with the actual dimension value once the dimension data is loaded.

Pros:

  • No delay in loading the fact record

Cons:

  • Late Dimension records are grouped in unknown bucket which will lead to inaccuracy in reporting.

Option 2: HOLD the FACT Record

Second option is to hold the fact record in the a backlog table until the dimension record arrives. Backlog table contains a flag which denotes the processed records. Every day ETL job checks for the unprocessed records against the dimensions and updates the flag accordingly.

Pros:

  • Easier to implement
  • Suitable for batch processing with nightly loads which are not sensitive for data delays

Cons:

  • Fact record is not available for reporting until the dimension record the loaded. Business teams might be impacted due to missing data.

Option 3: Infer the Dimension record

Third option is to insert the dimension record with a new surrogate key and use the same value to load the fact table. This option is less feasible when enough information about the dimension is not available in the fact table.

Summary

There are multiple ways to handle the late arriving date problem but choosing the right approach entirely depends on the business use cases ,stakeholder requirements and nature of the data. Impact of Data accuracy and missing data plays a vital role in the design choice.

Thanks for reading!

--

--