Presenter: Erik Veerman
erik@solidq.com
In a Business Intelligence solution, the fact tables hold the core data that you are analyzing – facts (also called measures). Therefore, fact tables are a critical component to get right the first time. Poor fact table design will lead to poor performance and difficult calculations. This session dives into fact table and considers the basic column types, measure aggregation types, fact table types, and volume considerations.
Facts
- The fact itself
- The “measure” that is being tracked. The thing
- Quantity, count, amount, percent
- Most always numerical, continuous values
- e.g., price of a product, quantity sold, budget value, count of customers
- Facts (or measures) can be classified by…
- Numerical data type
- Aggregation type
- Additive nature
- Granularity – level of detail stored
- Fact tables
- Capture measures/facts
- Association with dimensions (using surrogate key as foreign key in fact table)
- No dimensions attributes!
- Some tracking information included
- Different types of fact tables
- Transactional – Additive facts tracking events over time (Star Schema)
- Most common type of fact table
- Track the occurrence of events, each detailed event is captured into a row in the fact table
- Measures are typically additive across all dimensions
- Common transactional fact table types
- Sales, Visits, Web-page hits, Account transactions
- Most common type of fact table
- Snapshot or inventory – Pictures in time of levels or balances
- Periodic
- Accumulating
- Known as inventory level fact tables
- Time dimension used to identify grain
- Non additive measures across time, but typically additive across all other dimensions
- Common transactional fact table types
- Inventory levels, Event booking levels, Chart of account balance levels
- Factless – Dimensionality relationships
- No measured facts!
- Are useful to describe events and coverage
- Information that something has or has not happened
- Often used to represent many-to-many relationships
- Contain only dimension keys
- Common factless fact tables:
- Class attendance, Event tracking, Coverage tables, Promotion or campaign facts
- No measured facts!
- Transactional – Additive facts tracking events over time (Star Schema)
- Fact Table Granularity
- Never mix the grain of the table!
- The level of detail of data contained in the fact table
- The description of a single instance (a record) of the fact table
- Typically includes a time level and a distinct combinations of other dimensions
- e.g. Daily item totals by product, by store, Weekly snapshot of store inventory by product
Measures – Additive Nature
- Additive: Facts that can be summed up/aggregated across all of the dimensions in the fact table
- e.g. discrete numerical measures of activity, i.e. quantity sold, dollars sold
- Semi-Additive: Facts that can be summed up for some of the dimensions in the fact table, but not the others
- e.g. numerical measures of intensity, i.e. account balance, inventory level, distinct counts
- Non-Additive: Facts that cannot be summed up for any of the dimensions present in the fact table.
- e.g. room temp
Aggregations
- Aggregation (Aggs): A summarization of base-level fact table records
- Common aggregation scenarios
- Category product by store by day
- District store by product by day
- Monthly sales by product by store
- Category product by store district by day
- Category product by store district by month
- Common aggregation scenarios
- Aggregations need to account for the additive nature of the measures
- Aggregations can be created on-the-fly or by the process of pre-aggregation
- Common aggregations
- Sum
- Count, Distinct Count
- Max, Min
- Average
- Semi-additive: Last Child, Last Non-empty Child
http://www.microsoft.com/fasttrack
SQL Server Fast Track Data Warehouse accelerates your data warehouse roadmap with new SQL Server 2008 Enterprise scalable reference architectures for HP, Dell and Bull. Reduce costs, save time and reduce risk with reliable, pre-tested hardware and best practices for warehousing. Read more.
Design with Additive in Mind
- Think dimensionally!
- Complex requirements don’t need to be designed with complex queries
- Many times new fact tables can be designed that can answer specific questions, such as date attributes and ranges
Getting Started
- Step 1: Identify high-value business process to model (orders, invoices, shipments, inventory)
- Confirm data source availability
- Understand value vs. complexity
- Step 2: Identify reporting grain of the business process
- The grains is the level of detail at which the data should be represented for analytics
- This may not be the same grain as the source!
- For snapshot facts, determine what time level will be captured for each snapshot (daily, weekly, monthly)
- Step 3: Identify dimensionality that will apply to each fact table
- Time, product, customer, store, etc.
- Some dimensions are not grain-identifying
- Validate the source can associate to the fact table
- Step 4: Identify measured facts that will populate fact table
- Validate the base measures are identifiable from the source
- Some measures may be derived
- Measure examples: product count, quantity sold, dollars sold, inventory quantity
- Identify business questions:
- How much total business did my newly remodeled stores do compared with the chain average?
- How did leather goods items costing less than $5 do with my most frequent shoppers?
- What was the revenue comparison of non-holiday weekend days to holiday weekend days?
- Analyze questions to assist design!
I really like these “Getting There” bullets. Erik does a good job summarize and giving you something practical to walk away with. Nice stuff.
Popularity: 3% [?]