SSWUG vConf – Incremental Data Warehouse Loads with MERGE and Change Data Capture


2419154427_04dd3da737Presenter: Matt Masson
http://blogs.msdn.com/mattm/

This talk describes the new MERGE statement and Change Data Capture (CDC) feature introduced in SQL Server 2008, and how to take advantage of them in SQL Server Integration Services. Well cover the traditional ways of doing incremental data loads, and see how these new key features make the process a whole lot easier. If you’re interested in learning best practices for this new SQL Server 2008 functionality, you won’t want to miss this session!

CDC and Merge are new in SQL Server 2008

Also 2008 has improved SCD processing

Options to Capture Change

  • Audit columns – created via triggers
  • Log Scraping
  • Full DB Diff compare

(New Options) Change Data Capture in SQL Server

  • Changes captured from the log asynchronously
    • Really just integrated log scraping
  • Tells you what changed at the source
  • Enabled per table or DB
  • CDC API’s provide access to change data

CDC Components

  • CDC Tables
  • Capture Job
  • Cleanup Job
  • API

Change Tracking

  • Captures that a row was changed, but not the changed data
  • Synchronous capture mechanism
  • Less overhead for applications that do not require the historical information

Turn CDC on:

  • sp_cdc_enable_db
  • sp_cdc_enable_table
  • LSN = Log Sequence Number
  • Sp_cdc_generate_wrapper_function
    • Creates a custom function DDL for reading CDC data

Loading SSIS Package

  • Source from the CDC Wrapper function
  • Split on CDC operation
    • Insert
    • Update
    • Delete

Lookup Changes in SQL 2008

  • Explicit cache modes
    • Full
    • Partial
    • None
  • Cache connection manager

External Reference Data

  • Joins can work but can be difficult to manage
  • Instead, use the SSIS Lookup transform.

The MERGE Statement

  • Single statement can deal with Inserts, Updates and Deletes all at once
  • Use Cases
    • Conditionally insert or update rows in a target table
    • Synchronize two tables
  • Built to perform

Using MERGE in SSIS (Execute SQL Tasks in SSIS)

  • Stage
    • Store data in a temp table
  • Optimize
    • Create indexes on join cols
    • Provides addtl perf benefits
  • Execute
    • Run the MERGE SQL statement

When doing the insert, you must drop the FK constraints in the destination table.

Taking Advantage of MERGE

  • Consume CDC data
  • Update Dimensions
  • Alternative to the Slowly Changing Dimension Wizard

Additional Resources


Popularity: 5% [?]

About Tom

Christ follower, husband, father, technology and photography enthusiast. Attempting to live life out as a light in this world and stumbling at times in this fallen world. Got a topic you want to have me look into? Did I miss something in a post? Let me know. Just add a comment below.