Tech4Him – Technology with Integrity

A Christian technology chaos wrangler and his thoughts

SSWUG vConf – Dimension Table Design 101

Posted by Tom On April - 23 - 2009

2247481690_1976f16fdePresenter: Erik Veerman
erik@solidq.com

Dimension Tables are one of the core components in a dimensional design and it is critical to design your dimension tables correctly in order to lay a solid foundation to a Business Intelligence system. This session dives into the dimension design techniques and considers the core components of a dimension table, surrogate keys, attributes, and hierarchies. In addition, we will consider advanced hierarchy types such as parent-child hierarchies, snowflake designs and unbalanced hierarchies. Finally, we will consider the best practices in tracking changes historically.

This session was intense so I just could not keep up with the slides. Here are just notes apart from the slides once I fell behind. Erik presents excellently and at a level appropriate for a 101 session without being condescending. Great job!

Dimensions (review from Dimensional Modeling 101 Session)

Dimensions – Qualitative
Facts (Measures) – Quantitative

  • Dimensions (Qualitative information)
    • Business perspective from which data is looked upon
    • Collection of text attributes that are highly correlated
    • e.g. Product, Store, Time, Manager of Store
  • Conformity (How Dimensions relate to each other)
    • Shared with multiple fact relationships
    • Provides data correlation
  • Attributes (many times is related to a table column, not always)
    • Descriptive characteristics of an entity
    • Building blocks of dimensions, describe each instance
    • Usually text fields, with discrete values
    • e.g., the flavor of a product, the size of a product
  • Hierarchies
    • drill-paths within the dimension
    • Allows top-down analysis
    • Natural groupings of data relationships
    • e.g., date hierarchy is most common. Year to qtr to month to day for example)
  • Dimension Keys
    • Surrogate Keys (Primary key, DW created)
    • Candidate Business Keys/Alternate key (Source system keys, uniqueness)
  • Dimension Granularity
    • Granularity in general is the level of detail of data contained in an entity (lowest level detail)
    • A dimensions granularity is the lowest level object which uniquely identifies a member
    • Typically the identifying name of a dimension

Name surrogate key with SK_
Dimension naming DIM_
Fact naming FACT_

http://www.codeplex.com/MSFTDBProdSamples to download the AdventureWorksDW example.

Hierarchies

ALL level is usually the top level

All -> Region -> Country -> City -> Office

Seeing one we could use would related to the various Bible translation project locations currently under way. Need to keep in mind project sensitivities.

A dimension can have multiple hierarchies.

Standard Hierarchies

  • All levels have values.

Ragged Hierarchies

  • Missing members at the mid-levels.
    • All -> Country -> State -> City and then you might be missing a state is the country is Israel since there are not states there.

Unbalanced Hierarchies

  • Multiple grains in the hierarchy
  • May not always be able to drill down to the lowest detail level since the branch may stop short. (Think Org Chart)

Demo: SSMS connected to a SSAS Cube

Product – Standard hierarchy

Sales Territory – example of unbalanced hierarchy

Employee – another example of an unbalanced hierarchy

Dimension Keys

Business Keys

  • Column(s) identify the unique instance of a business record form the source system.
  • You may have multiple records with the same Business Key. This is right as it allows historical tracking.
  • Used in the process that ties fact records with dimension members
    • Business key is used to find the right surrogate key reference

Surrogate Keys

    • Defined the dimension’s primary key
    • Usually an integer
      • Important to pick the right column width.
      • 2,4,8 byte
        • SmallInt – 2 bytes go for up to around 10k records
        • BigInt – 8 bytes, for REALLY large dimensions
    • Consolidate multi-value business keys
    • Allows tracking of dimension history
    • Standardizes dimension tables
      • All are structured in the same way
        • Business Key
        • Surrogate key

Don’t put attributes and business keys in your fact table.

Design Practices

  • Avoid smart keys
  • Avoid production keys
  • production may decide to reuse keys
  • the company may acquire a competitor and thereby change the key building rules changed record, but deliberately not changed key

Types of Dimensions

Different dimensions types should be used in different scenarios.

Basic Dimension Types

  • Standard Star dimension
    • Single table
    • Usually date
  • Snowflake
    • More than one table in a cascading
  • Parent Child (unbalanced hierarchy)
    • Self referencing surrogate to parent surrogate key relationship
    • Self referencing business key relationship
    • e.g. Org Chart, Chart of Accounts

Advanced Types

  • Degenerate (Seldom used. Be careful)
    • Dimensions business key with no corresponding dimension table
    • Are embedded in the fact table
    • Typically use only when your Dim record count is about the same as the fact table record count.
    • Usually in line item oriented tables like sales tables
  • Profile or Junk Dimensions
    • Convenient grouping of flags and attributes to get them out of the fact table into a useful dimensional framework.
    • Good for one-off lookups. Put them into a single dimension rather than a bunch of small dimensions
      • True/False type attributes
  • Role Playing Dimension
    • A single dimension used for multiple purposes
    • Typical example is the date dimension or the geography dimension (outrigger)
  • Time Dimension
    • Multiple calendars
      • Fiscal
      • Natural
    • Usually a single table

Tracking History

Industry best practices.

Changing Dimensions

  • Slowly changing Dimensions
    • o – No change
      • Birthdate
    • 1 – Not interested in history (Updating a row/record)
    • 2 – Slow changes. Adds new row/record
    • 3 – Fast changes. Adds new column
  • Rapidly changing Dimensions
    • Large dimensions
      • Limit type 2’s

Good example graphical slides of Type 1 versus Type 2

Getting There

  • Understand dimension hierarchies and drill-paths
  • Confirm historically tracked attribute req’s.
    • Don’t be afraid to push back a little
  • Check source system data integrity, cleanliness
  • Review current reports


Popularity: 6% [?]

Share and Enjoy:
  • Print
  • Digg
  • del.icio.us
  • Facebook
  • Google Bookmarks
  • email
  • LinkedIn
  • PDF
  • RSS
  • Slashdot
  • StumbleUpon
  • Technorati
  • Twitter

Leave a Reply