Implementing bi-temporal historization

In my last post, I explained about bi-temporal historization. Today, I’d like to shed some light into how we implemented this feature in MS SQL Server (the usual tool of my choice). For the implementation, there are a couple of criteria to consider:

  • one or many source tables
  • full load or delta load
  • are there gaps in the business history

For this post, we’ll jump right in, and do it the hard way. I’ll build a SQL statement, that can do delta loads, takes multiple source tables as input and even handles gaps. The strategy that we use is to first select the Business keys (remember, they consist of the natural Key-fields and the business valid from Attribute). In a second step we join all the other Attributes to the business keys. Finally, the data is insert (or updated) in the destination table.

 “So how do we get at our business keys?”

imageIf you look at the picture at the right, you see that there are two source tables, Customer and Address, that provide data for our DimCustomer destination table. The data starts January 1st 2010, and there are some changes in the data for the Address in 2014 and 2015. In 2015 there is a gap between January 1st and March 31st, wher we have no information about the wherabouts of our client. The gap will be handled, as unkown data (so the fields will be filled with NULL values). Combined with the changes in the other source table, this means that there are 6 time slices – or business keys – for this customer.

So actually, the code for the first part is pretty simple:

WITH SrcDates AS — CTE will be used later on
(SELECT [Cust_No], [ValidFrom] FROM [src].[Customer]
  UNION
  SELECT [Cust_No], [ValidFrom] FROM [src].[CustomerAddress]
  UNION
  SELECT [Cust_No], DATEADD(DAY,1,[ValidTo]) FROM [src].[Customer] WHERE [ValidTo] < ‘9999-12-31’
  UNION
  SELECT [Cust_No], DATEADD(DAY,1,[ValidTo]) FROM [src].[CustomerAddress] WHERE [ValidTo] < ‘9999-12-31’
)

Ifyou don’t have any gaps, you will not need the SELECT with the ValidTo, and I have asumed, that the source system fills the ValidTo field with the “infinity” date (December 31st 9999).

If ou only do Full loads, you are now allready done with your Business Key selection. But alas, we still have some work to do. Let’s suppose, that we have allready inserted the data from the initial load. In our next load, we see, that our client has moved. That means, that the old Addressis outdated (the valid from date is change from infinity to June 13th 2017) and a new Address record is created, that is valid form the day after. When you look at the picture below, you see that we three time slices, that might have to be updated or inserted.

image

So, here’s one way to pull this off in T-SQL:

;WITH CustomerTimeSlics AS — CTE will be used later on
(SELECT [Cust_No], [ValidFrom] FROM SrcDates
  UNION
  SELECT dc.[CustomerBK], [ValidFrom] = dc.[BusValidFrom]
  FROM [dest].[DimCustomer] dc
  INNER JOIN (SELECT [Cust_No], MinDate = MIN([ValidFrom])
             FROM SrcDates GROUP BY [Cust_No]) src
                ON  src.[Cust_No] = dc.CustomerBK
                AND dc.[BusValidTo] >= src.MinDate
)

This leaves us with the last part, selecting the Attribute information from the sources and/or the destination:

SELECT [CustomerBK] = cts.[Cust_No]
       ,[BusValidFrom] = ISNULL(cts.[ValidFrom], dim.[BusValidFrom])
      ,[Rating] = ISNULL(c.[Rating], dim.[Rating])
      ,[Address] = ISNULL(ca.[Address], dim.[Address])
       ,[PostCode] = ISNULL(ca.[PostCode], dim.[PostCode])
      ,[City] = ISNULL(ca.[City], dim.[City])
      ,[Country] = ISNULL(ca.[Country], dim.[Country])
      ,[BusValidTo] = LEAD (DATEADD(DAY,-1,cts.[ValidFrom]), 1, ‘9999-12-31’) OVER (PARTITION BY cts.[Cust_No] ORDER BY cts.[ValidFrom] ASC)
  FROM CustomerTimeSlice cts
  LEFT OUTER JOIN [src].[Customer] c
    ON c.[Cust_No] = cts.[Cust_No] AND cts.[ValidFrom] BETWEEN c.[ValidFrom] AND c.[ValidTo]
  LEFT OUTER JOIN [src].[CustomerAddress] ca
     ON ca.[Cust_No] = cts.[Cust_No] AND cts.[ValidFrom] BETWEEN ca.[ValidFrom] AND ca.[ValidTo]
  LEFT OUTER JOIN [dest].[DimCustomer] dim
    ON dim.[CustomerBK] = cts.Cust_No AND cts.[ValidFrom] BETWEEN dim.[BusValidFrom] AND dim.[BusValidTo]
— don’t forget to filter for the current SCD record in DimCustomer

If you are in a full load scenario, you can of course leave out the references to the destination DimCustomer. Notice also the calculation of the BusValidTo field with on of those helpful analytical functions. I’ll leave the gory implementation details like figuring out whether to Update, Insert or do nothing, the SCD2 handling etc. to you.

Below, you find the definition of the Tables I’ve used in my example, as well as some sample data for you to try things out.


CREATE TABLE [src].[Customer](
    [Cust_No] [varchar](50) NOT NULL,
    [Name] [varchar](50) NULL,
    [Rating] [varchar](10) NULL,
    [ValidFrom] [datetime2](7) NULL,
    [ValidTo] [datetime2](7) NULL
) ON [PRIMARY]

GO

CREATE TABLE [src].[CustomerAddress](
    [Cust_No] [varchar](50) NOT NULL,
    [Address] [varchar](50) NULL,
    [PostCode] [varchar](10) NULL,
    [City] [varchar](10) NULL,
    [Country] [varchar](10) NULL,
    [ValidFrom] [datetime2](7) NULL,
     [ValidTo] [datetime2](7) NULL
) ON [PRIMARY]

GO

CREATE TABLE [dest].[DimCustomer](
    [CustomerID] [int] IDENTITY(1,1) NOT NULL,
    [CustomerBK] [varchar](50) NOT NULL,
    [Rating] [varchar](10) NULL,
    [Address] [varchar](50) NULL,
    [PostCode] [varchar](10) NULL,
    [City] [varchar](10) NULL,
    [Country] [varchar](10) NULL,
    [BusValidFrom] [datetime2](7) NOT NULL,
    [BusValidTo] [datetime2](7) NULL,
    [SCDValidFrom] [datetime2](7) NULL
) ON [PRIMARY]

GO

–Insert Initial Load:
TRUNCATE TABLE [src].[Customer]
INSERT INTO [src].[Customer]
           ([Cust_No],[Name],[Rating],[ValidFrom],[ValidTo])
     VALUES
           (‘CH00001′,’ABC AG’,’A’,’2010-01-01′,’2014-09-30′),
           (‘CH00001′,’ABC AG’,’B’,’2014-10-01′,’2016-12-31′),
           (‘CH00001′,’ABC AG’,’B’,’2017-01-01′,’9999-12-31′),
           (‘CH00002′,’XYZ GmbH’,’B’,’2010-01-01′,’2013-05-19′),
           (‘CH00002′,’XYZ GmbH’,’C’,’2013-05-20′,’2015-07-31′),
           (‘CH00002′,’XYZ GmbH’,’B’,’2015-9-1′ ,’9999-12-31′)
GO

TRUNCATE TABLE [src].[CustomerAddress]
INSERT INTO [src].[CustomerAddress]
           ([Cust_No],[Address],[PostCode],[City],[Country],[ValidFrom],[ValidTo])
     VALUES
        (‘CH00001′,’Bahnhofstr. 1′,’8000′,’Zürich’,’CH’, ‘2010-01-01′,’2014-03-31’),
        (‘CH00001′,’Bahnhofstr. 10′,’8000′,’Zürich’,’CH’,’2014-01-01′,’2015-01-01′),
        (‘CH00001′,’Bahnhofstr. 10′,’5400′,’Baden’,’CH’, ‘2015-04-01′,’9999-12-31’),
        (‘CH00002′,’Seefesldstr. 900′,’8000′,’Zürich’,’CH’,’2010-01-01′,’2010-01-10′),
        (‘CH00002′,’Seefesldstr. 900′,’8008′,’Zürich’,’CH’,’2010-01-11′,’2015-9-30′),
        (‘CH00002′,’Seefesldstr. 900′,’8000′,’Zürich’,’CH’,’2015-10-1′,’9999-12-31′)

——————————————————————————-

— Insert Data for Delta load
TRUNCATE TABLE [src].[Customer]

TRUNCATE TABLE [src].[CustomerAddress]
INSERT INTO [src].[CustomerAddress]
           ([Cust_No],[Address],[PostCode],[City],[Country],[ValidFrom],[ValidTo])
     VALUES
         (‘CH00001′,’Bahnhofstr. 10′,’5400′,’Baden’,’CH’,’2015-04-01′,’2017-06-13′),
         (‘CH00001′,’Bahnhofstr. 20′,’5400′,’Baden’,’CH’,’2017-06-14′,’9999-12-31′)


Leave a Reply

Your email address will not be published. Required fields are marked *