Tech4Him – Technology with Integrity

A Christian technology chaos wrangler and his thoughts

SSWUG vConf – Dimensional Modeling 101

Posted by Tom On April - 23 - 2009

Presenter: Erik Veerman
erik@solidq.com

This session focuses on the basic design patterns for building relational database structures for Business Intelligence applications on SQL Server. By laying the foundation to dimensional modeling, this session provides an overview of dimension modeling theory and the justification of dimension modeling by reviewing the core structures involved in data marts and data warehouses and contrasting these design techniques to other types of systems. If you are new to Business Intelligence or feel you need a refresher on dimensional modeling, this session is for you.

We are not talking about transactional design., this would be a poor design for BI.

Data Warehousing

  • A relational database repository that contains current and historical enterprise-wide business data structured in a way that is optimized for data retrieval and enables business decisions.

Data Mart

R. Kimball -”a data mart is a flexible set of data, ideally based on the most atomic (granular) data possible to extract from operational source, and presented in a symmetric (dimensional) model that is resilient when faced with unexpected user queries”

  • “in its most simplistic form a data mart represent data from a single business process” Business process= purchase order, store inventory, etc”

OLAP = On-line Analytical Processing

A reporting system designed to allow different flexible analysis in real time, on-line, with data structures designed for fast retrieval, with redundancy included to support performance.

Note: “On-line” doesn’t indicate data from on-line systems, rather on-the-fly

Business Intelligence

  • Forrester definition: A process of transforming data into information and making it available to users in time to make a difference.
    • Focus on delivery of information to the user
    • May or may not be created at a corporate level

Don’t approach this with tunnel vision. Ensure you REALLY understand what the business user needs.

Three Type of Business Intelligence

  • Strategic Business Intelligence (Strategic Decisions)
    • Who: strategic leaders
    • What: formulate strategy and monitor corporate performance
    • Examples: Balance scorecard, Strategic Planning
  • Analytical Business Intelligence (They want to know Why. Love Excel)
    • Who: analysts, knowledge worker, controller
    • What: ad-hoc analysis
    • Examples: Financial and Sales Analysis, Customer Segmentation, Click stream analysis
  • Operational Business Intelligence (Operational, parameterized reports)
    • Who: operational managers
    • What: execution of strategy against objectives
    • Examples: Budgeting, Sales forecasting

Example Questions to be answered for various BI types.

  • Strategic Questions
    • Is the overall product margin meeting the planned targets?
    • Are we meeting our quarterly revenue objectives and what is the trend?
    • Are our vendor backorders affecting overall product sales?
  • Analysis Questions
    • What factors affect order processing time?
    • How did each product line (or product) contribute to a district’s profit (or store) last quarter (or month, or year)?
    • Which products have the lowest Gross Margin Return on Inventory (GMROI)?
  • Operational Questions
    • When did that order ship?
    • What was the revenue for a sales district last quarter?
    • What was the average inventory level for a product last year?

Strategy map. Visually see how an organization and KPI’s look.

What is Dimensional Modeling?

  • The process and outcome of designing logical database schemas created to support OLAP and Data Warehousing solutions

Not about Normal Forms. Short, squatty tables, not tall, wide tables.

Transactional vs. Analytical

  • Production/Transactional supports (HR and Financial Systems)
    • Granular transactions
    • Real time production systems
    • Current, changing data
  • Business Intelligence/Data Warehousing supports (current and historical data)
    • Summarized queries
    • Consistent, heterogeneous data
    • Voluminous, historical, stable data
  • Transactional and DW applications require different design and storage

Transactional (OLTP) is about speed, efficiency, normal forms, reduced redundancy.

AdventureWorks transactional example

AdeventureWorksDW for data warehouse example

Reporting Challenges with OLTP

  • Schema doesn’t clearly call out subjects, objects, events, states…
    • Difficult to prepare reports and analysis views
    • Requires multiple joins
    • Indexes not optimized for reporting
  • Models business process, not information (IN not OUT)
  • Levels show only current state, history is not tracked

Data Warehousing, The Solution

  • Schema designed with reporting and analysis in mind
  • With redundant data, specially prepared for analysis, we can do more:
    • Prepare data over time
    • Prepare aggregates
    • Add data from other sources, not only OLTP
    • Sales value shows much more if we know also market capacity and our market share

You will create some redundant data. That’s okay and part of the point. Easy reporting and aggregation

Excellent introduction by Erik. I think I can see why he would be a good SQL mentor on the subject.

Business Intelligence Architecture

Dimensional Modeling

  • Used by most contemporary BI solutions
    • “Right” mix of normalization and denormalization often called Dimensional Normalization
    • Some use for full data warehouse design
    • Others use for data mart designs
  • Consists of two primary types of tables
    • Dimension tables
    • Fact tables

Getting the “right” mix of normalization and denormalization can be tricky and can be more art than science.

Dimensional Vs. Transactional

Dimensional Normalization Transactional Normalization
Logical design technique that presents data in an intuitive way allowing high-performance access Logical design technique to eliminate data redundancy, to keep data consistency, and storage efficiency
Targets decision support information Makes transactions simple and deterministic
Focused on easy user navigation and high performance design ER models for enterprise are usually complex often containing hundreds, or even thousands, of entities/tables

Dimension Tables

  • Contain attributes related to business entities
    • Customers, vendors, employees
    • Products, materials, even invoices (attributes!)
    • Dates and sometimes time (hours, minutes, etc.)
  • Often employ surrogate keys (newly created primary key of the table typically)
    • Defined within the dimensional model
    • Not the same as source system primary, alternate, or business keys
    • Usually an identity integer
  • Not uncommon to have many, many columns (60,70,80 are okay)

Fact Tables

(The numbers that describe to quantitive data. Also called Measures)

  • Contain numbers and other business metrics
    • Define the basic measures users want to analyze
    • Numbers are then aggregated according to related dimensions
    • (quantities, prices, counts)
  • Fact tables contain dimension keys
    • Defines relationship between measures and dimensions using surrogate keys
  • Typically narrow tables, but often very large (mostly numeric)

Star Schema Design

Entity diagram of the DW schema

  • Fact table holds measures for events, levels and states
    • Provides the relationship between dimension table
    • Highly normalized structure
  • Dimension tables track attributes such as subjects and objects
    • Star Schema dimension tables all connect directly to one or more fact tables
    • Star Schema dimension tables are highly denormalized to reduce joins

Snowflake Schema Design

  • Snowflake schema has normalized dimensions
    • Cascading hierarchy of tables for a single dimension with several 1-M relationships
    • More complicated schema
    • Allows dimension to be used in fact tables with different grains
    • Often easier management of attributes

SQL Server 2008 can optimize joins between dimension and fact tables.

Why Dimensional Modeling?

  • Logical model is easy to understand
  • Optimized for performance
  • Historical tracking of information


Popularity: 4% [?]

Share and Enjoy:
  • Print
  • Digg
  • del.icio.us
  • Facebook
  • Google Bookmarks
  • email
  • LinkedIn
  • PDF
  • RSS
  • Slashdot
  • StumbleUpon
  • Technorati
  • Twitter

Leave a Reply