<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Tech4Him - Technology with Integrity &#187; modeling</title>
	<atom:link href="http://blog.tech4him.com/tags/modeling/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.tech4him.com</link>
	<description>A Christian technology chaos wrangler and his thoughts</description>
	<lastBuildDate>Wed, 24 Mar 2010 23:15:39 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>SSWUG vConf &#8211; Intro to SQL Server Analysis Services</title>
		<link>http://blog.tech4him.com/2009/04/sswug-vconf-intro-to-sql-server-analysis-services/</link>
		<comments>http://blog.tech4him.com/2009/04/sswug-vconf-intro-to-sql-server-analysis-services/#comments</comments>
		<pubDate>Fri, 24 Apr 2009 15:19:58 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[modeling]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[sql server 2008]]></category>
		<category><![CDATA[SSAS]]></category>

		<guid isPermaLink="false">http://blog.tech4him.com/?p=552</guid>
		<description><![CDATA[Presenter: Brian Knight
bknight@pragmaticworks.com
Is your customer looking for drag and drop reports or capabilities inside of Excel? Then SQL Server Analysis Services (SSAS) is the answer for you. You&#8217;ll be amazed how quickly you can develop sophisticated reports after watching the basics of this session.
SSAS is its own server. It is not part of the SQL [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/jef_safi/312795799/" target="_blank"><img class="alignleft size-medium wp-image-555" style="border: 0pt none; margin: 10px;" title="312795799_42d968acec" src="http://blog.tech4him.com/wp-content/uploads/312795799_42d968acec-300x300.jpg" alt="312795799_42d968acec" width="300" height="300" /></a>Presenter: Brian Knight<br />
<a href="mailto:bknight@pragmaticworks.com">bknight@pragmaticworks.com</a></p>
<p>Is your customer looking for drag and drop reports or capabilities inside of Excel? Then SQL Server Analysis Services (SSAS) is the answer for you. You&#8217;ll be amazed how quickly you can develop sophisticated reports after watching the basics of this session.<span id="more-552"></span></p>
<p>SSAS is its own server. It is not part of the SQL Server service.</p>
<ul type="disc">
<li>IIS</li>
<li>XMLA      between server and client</li>
</ul>
<p>MDX &#8211; the SQL-like query language for cubes<br />
BIDS &#8211; for building the cubes<br />
SSMS &#8211; for managing SSAS</p>

<h3>Analysis Services</h3>
<p>What are cubes?</p>
<ul type="disc">
<li>What      are measures? (<a href="/2009/04/sswug-vconf-dimensional-modeling-101/">See      Erik Veerman Dimensional modeling session</a>)
<ul type="circle">
<li>The       value you are measuring, e.g. the count of users with blue eyes, from WA       state, etc.</li>
<li>The       question you are trying to ask</li>
</ul>
</li>
<li>What      are measure groups?
<ul type="circle">
<li>Groupings       of these measures within your cube</li>
<li>Typically       define them by your business problem (e.g. Sales, HR, etc.)</li>
</ul>
</li>
<li>Cubes      are a grouping of measure groups.</li>
</ul>
<h3>Analysis Services Dimensions</h3>
<p>What are dimensions and hierarchies? (<a href="/2009/04/sswug-vconf-dimensional-modeling-101/">See Erik Veerman Dimensional modeling session</a>)</p>
<ul type="disc">
<li>What      is a dimension?
<ul type="circle">
<li>It       is what you want to categorize or pivot against</li>
<li>e.g.       How many users are from WA state?</li>
<li>The       dimension is geography, &#8220;WA&#8221; is the member</li>
</ul>
</li>
<li>What is a hierarchy?
<ul type="circle">
<li>How       do you want to organize the members of your dimension</li>
<li>e.g.       Geography has numerous levels such as country, state, and city</li>
<li>Hierarchy       is the organization of those levels such as City -&gt;State -&gt;Country</li>
</ul>
</li>
</ul>
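<p>To make the dimension, member, and hierarchy terms above concrete for myself, here is a minimal Python sketch. It is not how SSAS stores anything; the sample rows and the names <code>USERS</code>, <code>GEOGRAPHY_HIERARCHY</code> and <code>count_by_level</code> are made up for illustration. The measure is a simple count of users, and the hierarchy decides which levels the count is grouped by.</p>

```python
from collections import Counter

# Hypothetical sample rows: one row per user with geography attributes.
USERS = [
    {"city": "Seattle", "state": "WA", "country": "USA"},
    {"city": "Spokane", "state": "WA", "country": "USA"},
    {"city": "Portland", "state": "OR", "country": "USA"},
    {"city": "Vancouver", "state": "BC", "country": "Canada"},
]

# The hierarchy orders the levels of the Geography dimension, top down.
GEOGRAPHY_HIERARCHY = ["country", "state", "city"]


def count_by_level(rows, level):
    """Count rows (the measure) for each member at one hierarchy level."""
    depth = GEOGRAPHY_HIERARCHY.index(level) + 1
    path_levels = GEOGRAPHY_HIERARCHY[:depth]
    return Counter(tuple(row[name] for name in path_levels) for row in rows)


state_counts = count_by_level(USERS, "state")
print(state_counts[("USA", "WA")])  # users whose state member is WA
```

<p>Each member is keyed by its full path from the top of the hierarchy, so two cities with the same name in different states would stay separate.</p>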
<p>Concrete examples like these are helpful for my learning and understanding. Thanks!</p>
<p><a rel="attachment wp-att-549" href="http://blog.tech4him.com/2009/04/sswug-vconf-intro-to-sql-server-analysis-services/introtossascube/"><img class="aligncenter size-medium wp-image-549" title="SSAS Cube" src="http://blog.tech4him.com/wp-content/uploads/introtossascube-300x224.png" alt="SSAS Cube" width="300" height="224" /></a></p>
<p><a rel="attachment wp-att-550" href="http://blog.tech4him.com/2009/04/sswug-vconf-intro-to-sql-server-analysis-services/introtossasdimmodel/"><img class="aligncenter size-medium wp-image-550" title="Dimensional Model" src="http://blog.tech4him.com/wp-content/uploads/introtossasdimmodel-300x225.png" alt="Dimensional Model" width="300" height="225" /></a></p>
<p><a rel="attachment wp-att-551" href="http://blog.tech4him.com/2009/04/sswug-vconf-intro-to-sql-server-analysis-services/introtossasmeasuregroup/"><img class="aligncenter size-medium wp-image-551" title="Measure Group" src="http://blog.tech4him.com/wp-content/uploads/introtossasmeasuregroup-300x224.png" alt="Measure Group" width="300" height="224" /></a></p>
<h3>User Interfaces</h3>
<ul type="disc">
<li>Query      language is MDX</li>
<li>BIDS      for developers</li>
<li>Excel or Reporting Services is adequate for most users</li>
<li>SharePoint for web users</li>
<li>3rd-party applications like ProClarity or Panorama</li>
<li>Controls      you can purchase and build into your own apps</li>
</ul>
<p>Create User defined hierarchies.</p>
<ul>
<li>Remember that in 2008 the hierarchy can be thought of as reversed.</li>
<li>Looks graphically like date -&gt; month -&gt; qtr -&gt; year
<ul>
<li>Means: to get to a date you must go through year, qtr, month, date</li>
</ul>
</li>
</ul>
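<p>A small Python sketch of the drill path described above, assuming a plain calendar hierarchy. The function name <code>calendar_path</code> is my own; it only shows the year, qtr, month, date levels you pass through to reach a single date.</p>

```python
from datetime import date


def calendar_path(day):
    """Return the levels you pass through to reach a date: year, qtr, month, date."""
    quarter = (day.month - 1) // 3 + 1
    return (day.year, f"Q{quarter}", f"{day.month:02d}", day.isoformat())


print(calendar_path(date(2009, 4, 24)))
```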
<p>Deal with problematic keys.</p>
<p>Most end users will use Excel. Excel 2007 provides additional features.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tech4him.com/2009/04/sswug-vconf-intro-to-sql-server-analysis-services/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SSWUG vConf &#8211; Loading a Data Warehouse in SSIS</title>
		<link>http://blog.tech4him.com/2009/04/sswug-vconf-loading-a-data-warehouse-in-ssis/</link>
		<comments>http://blog.tech4him.com/2009/04/sswug-vconf-loading-a-data-warehouse-in-ssis/#comments</comments>
		<pubDate>Thu, 23 Apr 2009 20:56:27 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[data warehouse]]></category>
		<category><![CDATA[modeling]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[SSIS]]></category>
		<category><![CDATA[sswug]]></category>

		<guid isPermaLink="false">http://blog.tech4him.com/?p=537</guid>
		<description><![CDATA[Presenter: Brian Knight
bknight@pragmaticworks.com
Owner, Pragmatic Works
In this session, you&#8217;ll learn how to load a typical data warehouse in SSIS efficiently. You&#8217;ll start by seeing some of the strengths and weaknesses of the Slowly Changing Dimension (SCD) Wizard in SSIS and how you can get around some of the weaknesses including your own home-brewed solution. You&#8217;ll then [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/brento/2089748072/" target="_blank"><img class="alignleft size-medium wp-image-539" style="border: 0pt none; margin: 10px;" title="2089748072_b60a211f97" src="http://blog.tech4him.com/wp-content/uploads/2089748072_b60a211f97-225x300.jpg" alt="2089748072_b60a211f97" width="225" height="300" /></a>Presenter: Brian Knight<br />
<a href="mailto:bknight@pragmaticworks.com">bknight@pragmaticworks.com</a><br />
Owner, Pragmatic Works</p>
<p>In this session, you&#8217;ll learn how to load a typical data warehouse in SSIS efficiently. You&#8217;ll start by seeing some of the strengths and weaknesses of the Slowly Changing Dimension (SCD) Wizard in SSIS and how you can get around some of the weaknesses including your own home-brewed solution. You&#8217;ll then see how to load a fact table using SSIS and how to make the common components scale.<span id="more-537"></span></p>
<h3>Dimensional Modeling (<a href="/2009/04/sswug-vconf-dimensional-modeling-101/">See Dimensional Modeling 101 Notes</a>)</h3>
<ul type="disc">
<li>Data      Separated into fact and dimension tables</li>
<li>Dimension      tables answer the pivot or where clause
<ul type="circle">
<li>Make       as wide and descriptive as possible</li>
<li>Surrogate       keys operate as unique ID for each row</li>
<li>Keep       surrogate keys as small as possible</li>
</ul>
</li>
<li>Fact      tables answer the what questions or select statement
<ul type="circle">
<li>Intersect       all dimension tables</li>
<li>Surrogate       keys from each dimension in this table</li>
<li>Measures       are the &#8220;what&#8221; like Price, Quantity, Duration</li>
</ul>
</li>
</ul>
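<p>Here is a rough sketch of the fact and dimension split above, using Python with an in-memory SQLite database rather than SQL Server. The table and column names (<code>DimProduct</code>, <code>DimStore</code>, <code>FactSales</code>) and the sample rows are assumptions of mine, not from the session. The dimensions feed the WHERE and GROUP BY, and the fact table supplies the measures in the SELECT.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (
    ProductSK INTEGER PRIMARY KEY,   -- surrogate key, kept small
    ProductCode TEXT,                -- business key from the source system
    ProductName TEXT,
    Category TEXT
);
CREATE TABLE DimStore (
    StoreSK INTEGER PRIMARY KEY,
    StoreName TEXT,
    State TEXT
);
CREATE TABLE FactSales (
    ProductSK INTEGER REFERENCES DimProduct (ProductSK),
    StoreSK INTEGER REFERENCES DimStore (StoreSK),
    Quantity INTEGER,                -- measures are the "what"
    Price REAL
);
INSERT INTO DimProduct VALUES
    (1, 'P-100', 'Trail Bike', 'Bikes'),
    (2, 'P-200', 'Helmet', 'Accessories');
INSERT INTO DimStore VALUES (1, 'Downtown', 'WA'), (2, 'Harbor', 'OR');
INSERT INTO FactSales VALUES (1, 1, 2, 500.0), (2, 1, 3, 40.0), (1, 2, 1, 500.0);
""")

# Dimensions answer the where/pivot; the fact table answers the what.
rows = conn.execute("""
    SELECT p.Category, SUM(f.Quantity), SUM(f.Quantity * f.Price)
    FROM FactSales AS f
    JOIN DimProduct AS p ON p.ProductSK = f.ProductSK
    JOIN DimStore AS s ON s.StoreSK = f.StoreSK
    WHERE s.State = 'WA'
    GROUP BY p.Category
    ORDER BY p.Category
""").fetchall()
print(rows)
```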
<p> The challenge is how you move the data from the OLTP (Relational) DB into the Data Warehouse.</p>
<p>Discusses Slowly Changing Dimension (SCD) types (<a href="/2009/04/sswug-vconf-dimension-table-design-101/">Already here</a>)</p>
<p>Don&#8217;t make everything Type 2 or your DB will bloat significantly. Your reports would also be more difficult to write.</p>
<p>You probably want to replace NULL values with something meaningful. Use a Derived Column transform to turn NULLs into a value such as 0 or &#8220;Unknown&#8221;.</p>
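<p>The NULL clean-up idea, sketched in Python rather than as an SSIS Derived Column expression. The column names and default values in <code>DEFAULTS</code> are hypothetical; columns without a default are left alone.</p>

```python
# Hypothetical per-column replacements for missing values.
DEFAULTS = {"region": "Unknown", "discount": 0}


def replace_nulls(row, defaults=DEFAULTS):
    """Swap None for the column default, leaving other values untouched."""
    return {
        col: (defaults.get(col, val) if val is None else val)
        for col, val in row.items()
    }


clean = replace_nulls({"region": None, "discount": None, "name": "Acme"})
print(clean)
```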
<p>SCD Wizard Strengths</p>
<ul type="disc">
<li>SSIS      transform that creates many other transforms conditionally</li>
<li>Reduces      design time of SCD load by 80%-90% to minutes per dimension</li>
<li>Can be      customized easily</li>
<li>Compares      differences between source and destination to find changes and new records</li>
<li>Outputs:
<ul type="circle">
<li>Type 0, 1, 2 updates</li>
<li>Inferred members</li>
<li>New rows</li>
<li>Duplicate rows</li>
</ul>
</li>
</ul>
<p>Historical Attribute Options &#8211; How do you want to set the expiration of historical records?</p>
<p>Problem with SCD Wizard is that any time you go back and change the configuration, all the output logic below it gets re-written. You lose what you created.</p>
<h3>SCD Wizard Weaknesses</h3>
<ul type="disc">
<li>Scalability &#8211; generally handles up to about 50,000 records into the transform, but this varies with the number of updates</li>
<li>Maintainability &#8211; after you customize, rerunning the wizard recreates all the transforms</li>
<li>Updates use OLE DB Command transforms, which run row by row. This creates a scalability issue when there are many updates.</li>
</ul>
<h3>Making Your Own SCD Wizard</h3>
<ul type="disc">
<li>Can      use a Merge Join or Lookup Transform
<ul type="circle">
<li>If       no match found, it is an insert (Ignore Errors)</li>
</ul>
</li>
<li>Lookup      Transform will scale better than Merge Join but lacks parameterization</li>
<li>Add a      Conditional Split transform after Lookup to direct to insert, duplicate or      update path</li>
</ul>
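<p>The Lookup plus Conditional Split pattern above can be sketched in plain Python. This is only my reading of the data flow, not SSIS code; <code>split_rows</code>, the <code>tracked</code> column list and the sample rows are assumptions for illustration.</p>

```python
def split_rows(source_rows, target_by_key, key, tracked):
    """Route each source row to the insert, update, or duplicate path."""
    inserts, updates, duplicates = [], [], []
    for row in source_rows:
        match = target_by_key.get(row[key])  # the Lookup step
        if match is None:
            inserts.append(row)  # no match found: new record
        elif any(row[col] != match[col] for col in tracked):
            updates.append(row)  # matched, but a tracked column changed
        else:
            duplicates.append(row)  # matched and unchanged
    return inserts, updates, duplicates


# Hypothetical target dimension, indexed by business key.
target = {
    "C-1": {"code": "C-1", "city": "Seattle"},
    "C-2": {"code": "C-2", "city": "Tacoma"},
}
source = [
    {"code": "C-1", "city": "Seattle"},
    {"code": "C-2", "city": "Olympia"},
    {"code": "C-3", "city": "Yakima"},
]
inserts, updates, duplicates = split_rows(source, target, "code", ["city"])
```

<p>Only the columns listed in <code>tracked</code> count as changes, which mirrors choosing which attributes the split should compare.</p>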
<h3>Additional Scalability</h3>
<ul type="disc">
<li>Watch      your Lookup Transformation for scalability issues (don&#8217;t cache too much!)
<ul type="circle">
<li>Potentially cache only the last year&#8217;s worth of data with Partial Caching</li>
<li>Only       cache columns needed</li>
</ul>
</li>
<li>Additional      scalability can be reached by landing updates into a staging table
<ul type="circle">
<li>Then       set-based update with an Execute SQL task.</li>
</ul>
</li>
<li>Checksum      Transform can be used to detect changes across many columns
<ul type="circle">
<li>Or       HASHBYTES T-SQL statement</li>
</ul>
</li>
</ul>
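<p>A sketch of the staging-table idea, assuming the changed rows have already been landed in a staging table. It uses Python with in-memory SQLite, so the UPDATE is written with a correlated subquery; in T-SQL you would more likely write an UPDATE with a join. Table names and rows are made up.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (
    CustomerSK INTEGER PRIMARY KEY,
    CustomerCode TEXT,
    City TEXT
);
CREATE TABLE StageCustomerUpdates (CustomerCode TEXT, City TEXT);
INSERT INTO DimCustomer VALUES
    (1, 'C-1', 'Seattle'), (2, 'C-2', 'Tacoma'), (3, 'C-3', 'Olympia');
INSERT INTO StageCustomerUpdates VALUES ('C-1', 'Spokane'), ('C-3', 'Yakima');
""")

# One set-based statement replaces one update command per changed row.
cursor = conn.execute("""
    UPDATE DimCustomer
    SET City = (
        SELECT s.City
        FROM StageCustomerUpdates AS s
        WHERE s.CustomerCode = DimCustomer.CustomerCode
    )
    WHERE CustomerCode IN (SELECT CustomerCode FROM StageCustomerUpdates)
""")
updated = cursor.rowcount
cities = conn.execute(
    "SELECT CustomerCode, City FROM DimCustomer ORDER BY CustomerSK"
).fetchall()
```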
<h3>Inferred Members</h3>
<ul type="disc">
<li>Created      during the fact load
<ul type="circle">
<li>A       new Dim record is created using the value of &#8220;unknown&#8221; or NULL as a       placeholder</li>
<li>The       record is flagged as an inferred member</li>
</ul>
</li>
</ul>
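<p>How I picture an inferred member being created during the fact load, as a Python sketch. The row layout and the function <code>lookup_or_infer</code> are hypothetical; the point is that a missing dimension row gets a flagged placeholder instead of failing the load.</p>

```python
def lookup_or_infer(dimension, business_key):
    """Return the surrogate key, adding a flagged placeholder row if missing."""
    for row in dimension:
        if row["business_key"] == business_key:
            return row["sk"]
    new_sk = max((row["sk"] for row in dimension), default=0) + 1
    dimension.append(
        {"sk": new_sk, "business_key": business_key, "name": "Unknown", "inferred": True}
    )
    return new_sk


dim_product = [{"sk": 1, "business_key": "P-100", "name": "Trail Bike", "inferred": False}]
known_sk = lookup_or_infer(dim_product, "P-100")
inferred_sk = lookup_or_infer(dim_product, "P-999")
```

<p>When the real dimension record arrives later, the dimension load can find the flagged row and fill in the placeholder attributes.</p>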
<h3>Slowly Changing Dimension Wizard</h3>
<ul type="disc">
<li>SSIS      transform that creates many other transforms conditionally</li>
<li>Handles:
<ul type="circle">
<li>Type 0 (fixed attribute)</li>
<li>Type 1 (changing attribute)</li>
<li>Type 2 (historical attribute)</li>
<li>Inferred members</li>
</ul>
</li>
<li>Typically      can address 80% of the business scenarios</li>
</ul>
<p>Lookup Transform &#8211; Look up the source against the target dimension table. Select ALL available lookup fields and alias them with a TARGET_ prefix so you can compare against them. Link by primary key. Ignore lookup failures (rows with no match).</p>
<p>Then use a Conditional Split &#8211; if the target PK is NULL it is a new record; otherwise it is an update.</p>
<p>&#8220;Pretty much a pain in the butt to write.&#8221; &#8211; Brian Knight in reference to the else UPDATE piece from the conditional split.</p>
<p>Ugh&#8230;no transcript and this info is not in the slides. ARGH! Can&#8217;t remember it all.</p>
<p>Brian compares a checksum of the source row against a checksum of the destination row to handle the UPDATE path referred to above, but checksums are not guaranteed to be unique.</p>
<p>HASHBYTES, on the other hand, is far less likely to collide. But it does not work on numeric fields directly; you need to cast them first.</p>
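<p>Since I could not capture the details, here is my own Python sketch of hash-based change detection. It uses <code>hashlib.sha256</code> as a stand-in for HASHBYTES, and the helper <code>row_hash</code> is hypothetical. Every value is cast to text before hashing, much like numeric columns need a cast on the SQL side.</p>

```python
import hashlib


def row_hash(row, columns):
    """Hash the chosen columns of a row so changes show up as a different digest."""
    # Cast every value to text first; None becomes an empty string.
    parts = ["" if row[col] is None else str(row[col]) for col in columns]
    # The separator keeps ("ab", "c") distinct from ("a", "bc").
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


COLUMNS = ["name", "city", "credit_limit"]
source_row = {"name": "Acme", "city": "Seattle", "credit_limit": 5000}
target_row = {"name": "Acme", "city": "Tacoma", "credit_limit": 5000}
changed = row_hash(source_row, COLUMNS) != row_hash(target_row, COLUMNS)
```

<p>Comparing one digest per row avoids a long column-by-column comparison in the Conditional Split. A separator that can also appear inside the data would weaken this, so pick one that cannot.</p>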
<h3>Fact Table Loads</h3>
<ul type="disc">
<li>Series      of Lookup Transforms
<ul type="circle">
<li>In Type 2 Dimensions add WHERE EndDate IS NULL so the lookup matches only the current row (assuming current rows carry a NULL end date)</li>
</ul>
</li>
<li>Measures created with Derived Column Transforms</li>
<li>Aggregate      transform to roll up the grain</li>
<li>Lookup      failure would create an inferred member or set to unknown</li>
</ul>
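<p>The fact load steps above, strung together as a Python sketch: look up surrogate keys, derive a measure, then aggregate to the grain. The lookup dictionaries, the <code>UNKNOWN_SK</code> value of -1 and the sample rows are all assumptions of mine, and a failed lookup here simply falls back to the unknown member.</p>

```python
from collections import defaultdict

# Hypothetical lookups from business key to surrogate key.
PRODUCT_SK = {"P-100": 1, "P-200": 2}
STORE_SK = {"S-1": 10}
UNKNOWN_SK = -1  # assumed key of an "unknown" dimension member

SOURCE = [
    {"product": "P-100", "store": "S-1", "qty": 2, "unit_price": 500.0},
    {"product": "P-100", "store": "S-1", "qty": 1, "unit_price": 500.0},
    {"product": "P-999", "store": "S-1", "qty": 4, "unit_price": 5.0},
]


def load_fact(rows):
    """Resolve keys, derive the amount measure, and roll up to product by store."""
    totals = defaultdict(lambda: {"qty": 0, "amount": 0.0})
    for row in rows:
        key = (
            PRODUCT_SK.get(row["product"], UNKNOWN_SK),  # Lookup transforms
            STORE_SK.get(row["store"], UNKNOWN_SK),
        )
        totals[key]["qty"] += row["qty"]
        totals[key]["amount"] += row["qty"] * row["unit_price"]  # derived measure
    return dict(totals)


fact = load_fact(SOURCE)
```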
]]></content:encoded>
			<wfw:commentRss>http://blog.tech4him.com/2009/04/sswug-vconf-loading-a-data-warehouse-in-ssis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SSWUG vConf &#8211; Fact Table Design 101</title>
		<link>http://blog.tech4him.com/2009/04/sswug-vconf-fact-table-design-101/</link>
		<comments>http://blog.tech4him.com/2009/04/sswug-vconf-fact-table-design-101/#comments</comments>
		<pubDate>Thu, 23 Apr 2009 16:24:21 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[modeling]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[sql server]]></category>
		<category><![CDATA[sswug]]></category>

		<guid isPermaLink="false">http://blog.tech4him.com/?p=505</guid>
		<description><![CDATA[Presenter: Erik Veerman
erik@solidq.com
In a Business Intelligence solution, the fact tables hold the core data that you are analyzing &#8211; facts (also called measures). Therefore, fact tables are a critical component to get right the first time. Poor fact table design will lead to poor performance and difficult calculations. This session dives into fact table and [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/filipx/2580276686/" target="_blank"><img class="alignright size-medium wp-image-506" style="border: 0pt none; margin: 10px;" title="2580276686_9e502d4b5e" src="http://blog.tech4him.com/wp-content/uploads/2580276686_9e502d4b5e-300x225.jpg" alt="2580276686_9e502d4b5e" width="300" height="225" /></a>Presenter: Erik Veerman<br />
<a href="mailto:erik@solidq.com">erik@solidq.com</a></p>
<p>In a Business Intelligence solution, the fact tables hold the core data that you are analyzing &#8211; facts (also called measures). Therefore, fact tables are a critical component to get right the first time. Poor fact table design will lead to poor performance and difficult calculations. This session dives into fact table and considers the basic column types, measure aggregation types, fact table types, and volume considerations.<span id="more-505"></span></p>
<h3>Facts</h3>
<ul type="disc">
<li>The      fact itself
<ul type="circle">
<li>The       &#8220;measure&#8221; that is being tracked. The thing</li>
<li>Quantity,       count, amount, percent</li>
<li>Almost always numerical, continuous values
<ul type="square">
<li>e.g.,        price of a product, quantity sold, budget value, count of customers</li>
</ul>
</li>
</ul>
</li>
<li>Facts      (or measures) can be classified by&#8230;
<ul type="circle">
<li>Numerical       data type</li>
<li>Aggregation       type</li>
<li>Additive       nature</li>
<li>Granularity       &#8211; level of detail stored</li>
</ul>
</li>
<li>Fact      tables
<ul type="circle">
<li>Capture       measures/facts</li>
<li>Association       with dimensions (using surrogate key as foreign key in fact table)
<ul type="square">
<li>No dimension attributes!</li>
</ul>
</li>
<li>Some       tracking information included</li>
</ul>
</li>
<li>Different      types of fact tables
<ul type="circle">
<li>Transactional       &#8211; Additive facts tracking events over time (Star Schema)
<ul type="square">
<li>Most        common type of fact table
<ul type="disc">
<li>Track         the occurrence of events, each detailed event is captured into a row in         the fact table</li>
<li>Measures         are typically additive across all dimensions</li>
<li>Common         transactional fact table types
<ul type="circle">
<li>Sales,          Visits, Web-page hits, Account transactions</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Snapshot       or inventory &#8211; Pictures in time of levels or balances
<ul type="square">
<li>Periodic</li>
<li>Accumulating</li>
<li>Known        as inventory level fact tables
<ul type="disc">
<li>Time         dimension used to identify grain</li>
<li>Non         additive measures across time, but typically additive across all other         dimensions</li>
<li>Common snapshot fact table types
<ul type="circle">
<li>Inventory          levels, Event booking levels, Chart of account balance levels</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Factless       &#8211; Dimensionality relationships
<ul type="square">
<li>No        measured facts!
<ul type="disc">
<li>Are         useful to describe events and coverage</li>
<li>Information         that something has or has not happened
<ul type="circle">
<li>Often          used to represent many-to-many relationships</li>
<li>Contain          only dimension keys</li>
<li>Common          factless fact tables:</li>
</ul>
</li>
<li>Class         attendance, Event tracking, Coverage tables, Promotion or campaign         facts</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li>Fact      Table Granularity
<ul type="circle">
<li>Never       mix the grain of the table!</li>
<li>The       level of detail of data contained in the fact table</li>
<li>The       description of a <strong><span style="text-decoration: underline;">single       instance</span></strong> (a record) of the fact table</li>
<li>Typically includes a time level and a distinct combination of other dimensions
<ul type="square">
<li>e.g.        Daily item totals by product, by store, Weekly snapshot of store        inventory by product</li>
</ul>
</li>
</ul>
</li>
</ul>
<p> Maybe include ETL load date/time in the fact table.</p>
<h3>Measures &#8211; Additive Nature</h3>
<ul type="disc">
<li>Additive:      Facts that can be summed up/aggregated across <strong><span style="text-decoration: underline;">all</span></strong> of the dimensions in the fact table
<ul type="circle">
<li>e.g.       discrete numerical measures of activity, i.e. quantity sold, dollars sold</li>
</ul>
</li>
<li>Semi-Additive:      Facts that can be summed up for <strong><span style="text-decoration: underline;">some</span></strong> of the dimensions in the fact table, but not the others
<ul type="circle">
<li>e.g.       numerical measures of intensity, i.e. account balance, inventory level,       distinct counts</li>
</ul>
</li>
<li>Non-Additive:      Facts that <strong><span style="text-decoration: underline;">cannot</span></strong> be      summed up <strong><span style="text-decoration: underline;">for any</span></strong> of the      dimensions present in the fact table.
<ul type="circle">
<li>e.g.       room temp</li>
</ul>
</li>
</ul>
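<p>A quick Python sketch of why a balance is only semi-additive. The snapshot rows are invented. Summing across accounts on one day is fine, but summing the same balances across days double counts, so across time you take the last snapshot instead.</p>

```python
# Hypothetical daily balance snapshots.
SNAPSHOTS = [
    {"day": 1, "account": "A", "balance": 100},
    {"day": 1, "account": "B", "balance": 50},
    {"day": 2, "account": "A", "balance": 120},
    {"day": 2, "account": "B", "balance": 30},
]


def total_on_day(rows, day):
    """Additive across the account dimension: sum balances for one day."""
    return sum(row["balance"] for row in rows if row["day"] == day)


def closing_total(rows):
    """Not additive across time: use the last day's snapshot instead of a sum."""
    last_day = max(row["day"] for row in rows)
    return total_on_day(rows, last_day)


naive_sum = sum(row["balance"] for row in SNAPSHOTS)  # mixes both days together
print(closing_total(SNAPSHOTS), naive_sum)
```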
<h3>Aggregations</h3>
<ul type="disc">
<li>Aggregation      (Aggs): A summarization of base-level fact table records
<ul type="circle">
<li>Common       aggregation scenarios
<ul type="square">
<li>Category        product by store by day</li>
<li>District        store by product by day</li>
<li>Monthly        sales by product by store</li>
<li>Category        product by store district by day</li>
<li>Category        product by store district by month</li>
</ul>
</li>
</ul>
</li>
<li>Aggregations      need to account for the additive nature of the measures
<ul type="circle">
<li>Aggregations       can be created on-the-fly or by the process of pre-aggregation</li>
<li>Common       aggregations
<ul type="square">
<li>Sum</li>
<li>Count,        Distinct Count</li>
<li>Max,        Min</li>
<li>Average</li>
<li>Semi-additive:        Last Child, Last Non-empty Child</li>
</ul>
</li>
</ul>
</li>
</ul>
<p><a href="http://www.microsoft.com/fasttrack">http://www.microsoft.com/fasttrack</a><br />
SQL Server Fast Track Data Warehouse accelerates your data warehouse roadmap with new SQL Server 2008 Enterprise scalable reference architectures for HP, Dell and Bull. Reduce costs, save time and reduce risk with reliable, pre-tested hardware and best practices for warehousing.</p>
<h3>Design with Additive in Mind</h3>
<ul type="disc">
<li>Think      dimensionally!</li>
<li>Complex      requirements don&#8217;t need to be designed with complex queries</li>
<li>Many      times new fact tables can be designed that can answer specific questions,      such as date attributes and ranges</li>
</ul>
<h3>Getting Started</h3>
<ul type="disc">
<li>Step      1: Identify high-value business process to model (orders, invoices,      shipments, inventory)
<ul type="circle">
<li>Confirm       data source availability</li>
<li>Understand       value vs. complexity</li>
</ul>
</li>
<li>Step      2: Identify reporting grain of the business process
<ul type="circle">
<li>The grain is the level of detail at which the data should be represented for analytics</li>
<li>This       may not be the same grain as the source!</li>
<li>For       snapshot facts, determine what time level will be captured for each       snapshot (daily, weekly, monthly)</li>
</ul>
</li>
<li>Step      3: Identify dimensionality that will apply to each fact table
<ul type="circle">
<li>Time,       product, customer, store, etc.</li>
<li>Some       dimensions are not grain-identifying</li>
<li>Validate       the source can associate to the fact table</li>
</ul>
</li>
<li>Step      4: Identify measured facts that will populate fact table
<ul type="circle">
<li>Validate       the base measures are identifiable from the source</li>
<li>Some       measures may be derived</li>
<li>Measure       examples: product count, quantity sold, dollars sold, inventory quantity</li>
</ul>
</li>
<li>Identify      business questions:
<ul type="circle">
<li>How       much total business did my newly remodeled stores do compared with the       chain average?</li>
<li>How       did leather goods items costing less than $5 do with my most frequent       shoppers?</li>
<li>What       was the revenue comparison of non-holiday weekend days to holiday weekend       days?</li>
</ul>
</li>
<li>Analyze      questions to assist design!</li>
</ul>
<p>I really like these &#8220;Getting Started&#8221; bullets. Erik does a good job of summarizing and giving you something practical to walk away with. Nice stuff.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.tech4him.com/2009/04/sswug-vconf-fact-table-design-101/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SSWUG vConf &#8211; Dimension Table Design 101</title>
		<link>http://blog.tech4him.com/2009/04/sswug-vconf-dimension-table-design-101/</link>
		<comments>http://blog.tech4him.com/2009/04/sswug-vconf-dimension-table-design-101/#comments</comments>
		<pubDate>Thu, 23 Apr 2009 15:28:22 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[modeling]]></category>
		<category><![CDATA[sql server]]></category>
		<category><![CDATA[SSAS]]></category>
		<category><![CDATA[sswug]]></category>

		<guid isPermaLink="false">http://blog.tech4him.com/?p=500</guid>
		<description><![CDATA[Presenter: Erik Veerman
erik@solidq.com
Dimension Tables are one of the core components in a dimensional design and it is critical to design your dimension tables correctly in order to lay a solid foundation to a Business Intelligence system. This session dives into the dimension design techniques and considers the core components of a dimension table, surrogate keys, [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.flickr.com/photos/wilkersonfurniture/2247481690/" target="_blank"><img class="alignleft size-medium wp-image-501" style="border: 0pt none; margin: 10px;" title="2247481690_1976f16fde" src="http://blog.tech4him.com/wp-content/uploads/2247481690_1976f16fde-300x203.jpg" alt="2247481690_1976f16fde" width="300" height="203" /></a>Presenter: Erik Veerman<br />
<a href="mailto:erik@solidq.com">erik@solidq.com</a></p>
<p>Dimension Tables are one of the core components in a dimensional design and it is critical to design your dimension tables correctly in order to lay a solid foundation to a Business Intelligence system. This session dives into the dimension design techniques and considers the core components of a dimension table, surrogate keys, attributes, and hierarchies. In addition, we will consider advanced hierarchy types such as parent-child hierarchies, snowflake designs and unbalanced hierarchies. Finally, we will consider the best practices in tracking changes historically.<span id="more-500"></span></p>
<p>This session was intense so I just could not keep up with the slides. Here are just notes apart from the slides once I fell behind. Erik presents excellently and at a level appropriate for a 101 session without being condescending. Great job!</p>
<h3>Dimensions (review from Dimensional Modeling 101 Session)</h3>
<p>Dimensions &#8211; Qualitative<br />
Facts (Measures) &#8211; Quantitative</p>
<ul type="disc">
<li>Dimensions      (Qualitative information)
<ul type="circle">
<li>Business       perspective from which data is looked upon</li>
<li>Collection       of text attributes that are highly correlated</li>
<li>e.g.       Product, Store, Time, Manager of Store</li>
</ul>
</li>
<li>Conformity      (How Dimensions relate to each other)
<ul type="circle">
<li>Shared       with multiple fact relationships</li>
<li>Provides       data correlation</li>
</ul>
</li>
<li>Attributes      (many times is related to a table column, not always)
<ul type="circle">
<li>Descriptive       characteristics of an entity</li>
<li>Building       blocks of dimensions, describe each instance</li>
<li>Usually       text fields, with discrete values</li>
<li>e.g.,       the flavor of a product, the size of a product</li>
</ul>
</li>
<li>Hierarchies
<ul type="circle">
<li>drill-paths       within the dimension</li>
<li>Allows       top-down analysis</li>
<li>Natural       groupings of data relationships</li>
<li>e.g., the date hierarchy is the most common: Year to qtr to month to day</li>
</ul>
</li>
<li>Dimension      Keys
<ul type="circle">
<li>Surrogate       Keys (Primary key, DW created)</li>
<li>Candidate       Business Keys/Alternate key (Source system keys, uniqueness)</li>
</ul>
</li>
<li>Dimension      Granularity
<ul type="circle">
<li>Granularity       in general is the level of detail of data contained in an entity (lowest       level detail)</li>
<li>A dimension&#8217;s granularity is the lowest level object which uniquely identifies a member</li>
<li>Typically       the identifying name of a dimension</li>
</ul>
</li>
</ul>

<p>Name surrogate key with SK_<br />
Dimension naming DIM_<br />
Fact naming FACT_</p>
<p><a href="http://www.codeplex.com/MSFTDBProdSamples">http://www.codeplex.com/MSFTDBProdSamples</a> to download the AdventureWorksDW example.</p>
<h3>Hierarchies</h3>
<p>ALL level is usually the top level</p>
<p>All -&gt; Region -&gt; Country -&gt; City -&gt; Office</p>
<p>One hierarchy we could use would relate to the various Bible translation project locations currently under way. Need to keep project sensitivities in mind.</p>
<p>A dimension can have multiple hierarchies.</p>
<p><em>Standard Hierarchies</em></p>
<ul type="disc">
<li>All      levels have values.</li>
</ul>
<p><em>Ragged Hierarchies </em></p>
<ul type="disc">
<li>Missing      members at the mid-levels.
<ul type="circle">
<li>All -&gt; Country -&gt; State -&gt; City, where the State level might be missing if the country is Israel, since there are no states there.</li>
</ul>
</li>
</ul>
<p><em>Unbalanced Hierarchies</em></p>
<ul type="disc">
<li>Multiple      <em><span style="text-decoration: underline;">grains</span></em> in the hierarchy</li>
<li>May      not always be able to drill down to the lowest detail level since the      branch may stop short. (Think Org Chart)</li>
</ul>
<p>Demo: SSMS connected to a SSAS Cube</p>
<p>Product &#8211; Standard hierarchy</p>
<p>Sales  Territory &#8211; example of unbalanced hierarchy</p>
<p>Employee &#8211; another example of an unbalanced hierarchy</p>
<h3>Dimension Keys</h3>
<p>Business Keys</p>
<ul type="disc">
<li>Column(s) that identify the unique instance of a business record from the source system.</li>
<li>You may have multiple records with the same Business Key. This is fine, as it allows historical tracking.</li>
<li>Used in      the process that ties fact records with dimension members
<ul type="circle">
<li>Business       key is used to find the right surrogate key reference</li>
</ul>
</li>
</ul>
<p>Surrogate Keys</p>
<ul type="disc">
<li>
<ul type="circle">
<li>Defined as the dimension&#8217;s primary key</li>
<li>Usually       an integer
<ul type="disc">
<li>Important        to pick the right column width.</li>
<li>2,4,8        byte
<ul type="disc">
<li>SmallInt &#8211; 2 bytes, good for up to around 10k records (the type itself tops out at 32,767)</li>
<li>Int &#8211; 4 bytes</li>
<li>BigInt &#8211; 8 bytes, for REALLY large dimensions</li>
</ul>
</li>
</ul>
</li>
<li>Consolidate       multi-value business keys</li>
<li>Allows       tracking of dimension history</li>
<li>Standardizes       dimension tables
<ul type="disc">
<li>All        are structured in the same way
<ul type="disc">
<li>Business         Key</li>
<li>Surrogate         key</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>Don&#8217;t put attributes and business keys in your fact table.</p>
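<p>The business key to surrogate key lookup described above can be sketched with Python and an in-memory SQLite database. The table names, column names, the <code>is_current</code> flag, and the sample customers are assumptions for illustration; the session used SQL Server and did not show this code.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_bk TEXT NOT NULL,                      -- business key from the source system
    city        TEXT NOT NULL,
    is_current  INTEGER NOT NULL                    -- 1 marks the latest version of the record
);
CREATE TABLE fact_sales (
    customer_sk INTEGER NOT NULL REFERENCES dim_customer (customer_sk),
    amount      REAL NOT NULL                       -- no business key or attributes stored here
);
""")

# Two rows share business key C-100: the older one is kept for history.
conn.executemany(
    "INSERT INTO dim_customer (customer_bk, city, is_current) VALUES (?, ?, ?)",
    [("C-100", "Dallas", 0), ("C-100", "Austin", 1), ("C-200", "Tulsa", 1)],
)


def lookup_surrogate_key(conn, business_key):
    """Return the surrogate key of the current dimension row, or None if unknown."""
    row = conn.execute(
        "SELECT customer_sk FROM dim_customer WHERE customer_bk = ? AND is_current = 1",
        (business_key,),
    ).fetchone()
    return row[0] if row else None


def load_fact(conn, business_key, amount):
    """Insert a fact row that references the dimension by surrogate key only."""
    surrogate_key = lookup_surrogate_key(conn, business_key)
    if surrogate_key is None:
        raise KeyError("no current dimension row for business key " + business_key)
    conn.execute(
        "INSERT INTO fact_sales (customer_sk, amount) VALUES (?, ?)",
        (surrogate_key, amount),
    )
    return surrogate_key


print(load_fact(conn, "C-100", 50.0))
```

<p>The fact row ends up pointing at the Austin version of customer C-100, while the Dallas row stays in the dimension for historical reporting.</p>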
<p>Design Practices</p>
<ul type="disc">
<li>Avoid      smart keys</li>
<li>Avoid      production keys</li>
<li>Production may decide to reuse keys</li>
<li>The company may acquire a competitor and thereby change the key-building rules</li>
<li>A record may change while its key is deliberately left unchanged</li>
</ul>
<h3>Types of Dimensions</h3>
<p>Different dimension types should be used in different scenarios.</p>
<p>Basic Dimension Types</p>
<ul type="disc">
<li>Standard      Star dimension
<ul type="circle">
<li>Single       table</li>
<li>Usually       date</li>
</ul>
</li>
<li>Snowflake
<ul type="circle">
<li>More than one table in a cascading hierarchy</li>
</ul>
</li>
<li>Parent      Child (unbalanced hierarchy)
<ul type="circle">
<li>Self       referencing surrogate to parent surrogate key relationship</li>
<li>Self       referencing business key relationship</li>
<li>e.g.       Org Chart, Chart of Accounts</li>
</ul>
</li>
</ul>
<p>Advanced Types</p>
<ul type="disc">
<li>Degenerate      (Seldom used. Be careful)
<ul type="circle">
<li>Dimension business key with no corresponding dimension table</li>
<li>Are       embedded in the fact table</li>
<li>Typically       use only when your Dim record count is about the same as the fact table       record count.</li>
<li>Usually       in line item oriented tables like sales tables</li>
</ul>
</li>
<li>Profile      or Junk Dimensions
<ul type="circle">
<li>Convenient       grouping of flags and attributes to get them out of the fact table into a       useful dimensional framework.</li>
<li>Good       for one-off lookups. Put them into a single dimension rather than a bunch       of small dimensions
<ul type="square">
<li>True/False        type attributes</li>
</ul>
</li>
</ul>
</li>
<li>Role      Playing Dimension
<ul type="circle">
<li>A single       dimension used for multiple purposes</li>
<li>Typical       example is the date dimension or the geography dimension (outrigger)</li>
</ul>
</li>
<li>Time      Dimension
<ul type="circle">
<li>Multiple       calendars
<ul type="square">
<li>Fiscal</li>
<li>Natural</li>
</ul>
</li>
<li>Usually       a single table</li>
</ul>
</li>
</ul>
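<p>As a rough illustration of a role-playing dimension, the sketch below joins one date dimension to a fact table twice, once as the order date and once as the ship date. It uses Python with an in-memory SQLite database; all table names, column names, and rows are invented for the example. The <code>order_id</code> column also happens to show a degenerate dimension: a business key that lives in the fact table with no dimension table of its own.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_sk   INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL
);
CREATE TABLE fact_order (
    order_id      INTEGER NOT NULL,  -- degenerate dimension, no lookup table
    order_date_sk INTEGER NOT NULL REFERENCES dim_date (date_sk),
    ship_date_sk  INTEGER NOT NULL REFERENCES dim_date (date_sk),
    amount        REAL NOT NULL
);
INSERT INTO dim_date VALUES (20090420, '2009-04-20');
INSERT INTO dim_date VALUES (20090423, '2009-04-23');
INSERT INTO dim_date VALUES (20090424, '2009-04-24');
INSERT INTO fact_order VALUES (1001, 20090420, 20090423, 250.0);
INSERT INTO fact_order VALUES (1002, 20090423, 20090424, 75.0);
""")


def orders_with_both_dates(conn):
    """Join the single dim_date table twice, once per role it plays."""
    return conn.execute("""
        SELECT f.order_id, order_date.full_date, ship_date.full_date
        FROM fact_order AS f
        JOIN dim_date AS order_date ON order_date.date_sk = f.order_date_sk
        JOIN dim_date AS ship_date ON ship_date.date_sk = f.ship_date_sk
        ORDER BY f.order_id
    """).fetchall()


for row in orders_with_both_dates(conn):
    print(row)
```

<p>Only one physical date table is maintained; the aliases give it a different meaning in each join.</p>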
<h3>Tracking History</h3>
<p>Industry best practices.</p>
<p>Changing Dimensions</p>
<ul class="unIndentedList">
<li> Slowly changing Dimensions
<ul>
<li>0 &#8211; No change
<ul>
<li>Birthdate</li>
</ul>
</li>
<li>1 &#8211; Not interested in history (Updating a row/record)</li>
<li>2 &#8211; Slow changes. Adds new row/record</li>
<li>3 &#8211; Fast changes. Adds new column</li>
</ul>
</li>
</ul>
<ul class="unIndentedList">
<li> Rapidly changing Dimensions
<ul>
<li> Large dimensions
<ul>
<li>Limit type 2&#8217;s</li>
</ul>
</li>
</ul>
</li>
</ul>
<p>The slides included good graphical examples of Type 1 versus Type 2.</p>
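<p>Since the slides are not reproduced here, this is a small Python sketch of the difference between a Type 1 and a Type 2 change. The row layout (<code>surrogate_key</code>, <code>business_key</code>, <code>start_date</code>, <code>end_date</code>) and both helper functions are assumptions made for illustration, not the presenter's implementation.</p>

```python
from datetime import date


def apply_type1(rows, business_key, attribute, new_value):
    """Type 1: overwrite the attribute in place, so no history is kept."""
    for row in rows:
        if row["business_key"] == business_key:
            row[attribute] = new_value


def apply_type2(rows, business_key, attribute, new_value, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    current = next(
        row for row in rows
        if row["business_key"] == business_key and row["end_date"] is None
    )
    next_key = max(row["surrogate_key"] for row in rows) + 1
    current["end_date"] = change_date
    new_row = dict(current)
    new_row.update(
        {
            "surrogate_key": next_key,
            attribute: new_value,
            "start_date": change_date,
            "end_date": None,
        }
    )
    rows.append(new_row)
    return new_row


dim_employee = [
    {
        "surrogate_key": 1,
        "business_key": "E-42",
        "last_name": "Smith",
        "office": "Dallas",
        "start_date": date(2008, 1, 1),
        "end_date": None,
    }
]

# A name correction is not worth tracking, so it is handled as Type 1.
apply_type1(dim_employee, "E-42", "last_name", "Smyth")
# An office move should be tracked over time, so it is handled as Type 2.
apply_type2(dim_employee, "E-42", "office", "Austin", date(2009, 4, 1))

for row in dim_employee:
    print(row["surrogate_key"], row["last_name"], row["office"], row["end_date"])
```

<p>After both changes the dimension holds two rows for the same business key. This is also why large, rapidly changing dimensions should limit Type 2 attributes: every tracked change adds a row.</p>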
<p>Getting There</p>
<ul type="disc">
<li>Understand      dimension hierarchies and drill-paths</li>
<li>Confirm      the requirements for historically tracked attributes.
<ul type="circle">
<li>Don&#8217;t       be afraid to push back a little</li>
</ul>
</li>
<li>Check      source system data integrity, cleanliness</li>
<li>Review      current reports</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.tech4him.com/2009/04/sswug-vconf-dimension-table-design-101/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SSWUG vConf &#8211; Dimensional Modeling 101</title>
		<link>http://blog.tech4him.com/2009/04/sswug-vconf-dimensional-modeling-101/</link>
		<comments>http://blog.tech4him.com/2009/04/sswug-vconf-dimensional-modeling-101/#comments</comments>
		<pubDate>Thu, 23 Apr 2009 14:07:36 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[business intelligence]]></category>
		<category><![CDATA[modeling]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[sswug]]></category>

		<guid isPermaLink="false">http://blog.tech4him.com/?p=495</guid>
		<description><![CDATA[Presenter: Erik Veerman
erik@solidq.com
This session focuses on the basic design patterns for building relational database structures for Business Intelligence applications on SQL Server. By laying the foundation to dimensional modeling, this session provides an overview of dimension modeling theory and the justification of dimension modeling by reviewing the core structures involved in data marts and data [...]]]></description>
			<content:encoded><![CDATA[<p>Presenter: Erik Veerman<br />
<a href="mailto:erik@solidq.com">erik@solidq.com</a></p>
<p>This session focuses on the basic design patterns for building relational database structures for Business Intelligence applications on SQL Server. By laying the foundation to dimensional modeling, this session provides an overview of dimension modeling theory and the justification of dimension modeling by reviewing the core structures involved in data marts and data warehouses and contrasting these design techniques to other types of systems. If you are new to Business Intelligence or feel you need a refresher on dimensional modeling, this session is for you.<span id="more-495"></span></p>
<p>We are not talking about transactional design; that would be a poor design for BI.</p>
<h3>Data Warehousing</h3>
<ul type="disc">
<li>A <span style="text-decoration: underline;">relational      database</span> repository that contains <span style="text-decoration: underline;">current and historical</span> <span style="text-decoration: underline;">enterprise-wide      business data</span> structured in a way that is optimized for <span style="text-decoration: underline;">data      retrieval</span> and <span style="text-decoration: underline;">enables business decisions</span>.</li>
</ul>
<h3>Data Mart</h3>
<p>R. Kimball &#8211; &#8220;a data mart is a flexible set of data, ideally based on the most atomic (granular) data possible to extract from operational source, and presented in a symmetric (dimensional) model that is resilient when faced with unexpected user queries&#8221;</p>
<ul>
<li>&#8220;In its most simplistic form a data mart represents data from a single business process.&#8221; Business process = purchase order, store inventory, etc.</li>
</ul>
<h3>OLAP = On-line Analytical Processing</h3>
<p>A reporting system designed to allow different flexible analysis in real time, on-line, with data structures designed for fast retrieval, with redundancy included to support performance.</p>
<p>Note: &#8220;On-line&#8221; doesn&#8217;t mean data from on-line systems; it means analysis done on the fly.</p>
<h3>Business Intelligence</h3>
<ul type="disc">
<li>Forrester      definition: A process of transforming data into information and making it      available to users <strong><em>in time to make a difference.</em></strong>
<ul type="circle">
<li>Focus       on delivery of information to the user</li>
<li>May       or may not be created at a corporate level</li>
</ul>
</li>
</ul>
<p>Don&#8217;t approach this with tunnel vision. Ensure you REALLY understand what the business user needs.</p>
<h3>Three Types of Business Intelligence</h3>
<ul type="disc">
<li>Strategic      Business Intelligence (Strategic Decisions)
<ul type="circle">
<li>Who:       strategic leaders</li>
<li>What:       formulate strategy and monitor corporate performance</li>
<li>Examples:       Balanced scorecard, Strategic Planning</li>
</ul>
</li>
<li>Analytical      Business Intelligence (They want to know Why. Love Excel)
<ul type="circle">
<li>Who:       analysts, knowledge workers, controllers</li>
<li>What:       ad-hoc analysis</li>
<li>Examples:       Financial and Sales Analysis, Customer Segmentation, Click stream analysis</li>
</ul>
</li>
<li>Operational      Business Intelligence (Operational, parameterized reports)
<ul type="circle">
<li>Who:       operational managers</li>
<li>What:       execution of strategy against objectives</li>
<li>Examples:       Budgeting, Sales forecasting</li>
</ul>
</li>
</ul>
<p>Example Questions to be answered for various BI types.</p>
<ul type="disc">
<li>Strategic      Questions
<ul type="circle">
<li>Is       the overall product margin meeting the planned targets?</li>
<li>Are       we meeting our quarterly revenue objectives and what is the trend?</li>
<li>Are       our vendor backorders affecting overall product sales?</li>
</ul>
</li>
<li>Analysis      Questions
<ul type="circle">
<li>What       factors affect order processing time?</li>
<li>How       did each product line (or product) contribute to a district&#8217;s (or       store&#8217;s) profit last quarter (or month, or year)?</li>
<li>Which       products have the lowest Gross Margin Return on Inventory (GMROI)?</li>
</ul>
</li>
<li>Operational      Questions
<ul type="circle">
<li>When       did that order ship?</li>
<li>What       was the revenue for a sales district last quarter?</li>
<li>What       was the average inventory level for a product last year?</li>
</ul>
</li>
</ul>
<p>Strategy map: a visual way to see how an organization and its KPIs fit together.</p>
<h3>What is Dimensional Modeling?</h3>
<ul type="disc">
<li>The      process and outcome of designing logical database schemas created to      support OLAP and Data Warehousing solutions</li>
</ul>
<p>Not about Normal Forms. Short, squatty tables, not tall, wide tables.</p>
<h3>Transactional vs. Analytical</h3>
<ul type="disc">
<li>Production/Transactional      supports (HR and Financial Systems)
<ul type="circle">
<li>Granular       transactions</li>
<li>Real       time production systems</li>
<li>Current,       changing data</li>
</ul>
</li>
<li>Business      Intelligence/Data Warehousing supports (current and historical data)
<ul type="circle">
<li>Summarized       queries</li>
<li>Consistent,       heterogeneous data</li>
<li>Voluminous,       historical, stable data</li>
</ul>
</li>
<li>Transactional      and DW applications require different design and storage</li>
</ul>
<p>Transactional (OLTP) is about speed, efficiency, normal forms, reduced redundancy.</p>
<p>AdventureWorks transactional example</p>
<p>AdventureWorksDW for data warehouse example</p>
<h3>Reporting Challenges with OLTP</h3>
<ul type="disc">
<li>Schema      doesn&#8217;t clearly call out subjects, objects, events, states&#8230;
<ul type="circle">
<li>Difficult       to prepare reports and analysis views</li>
<li>Requires       multiple joins</li>
<li>Indexes       not optimized for reporting</li>
</ul>
</li>
<li>Models      business process, not information (IN not OUT)</li>
<li>Levels      show only current state, history is not tracked</li>
</ul>
<h3>Data Warehousing, The Solution</h3>
<ul type="disc">
<li>Schema      designed with reporting and analysis in mind</li>
<li>With      redundant data, specially prepared for analysis, we can do more:
<ul type="circle">
<li>Prepare       data over time</li>
<li>Prepare       aggregates</li>
<li>Add       data from other sources, not only OLTP</li>
<li>Sales       value shows much more if we know also market capacity and our market       share</li>
</ul>
</li>
</ul>
<p>You will create some redundant data. That&#8217;s okay and part of the point: it makes reporting and aggregation easy.</p>
<p>Excellent introduction by Erik. I think I can see why he would be a good SQL mentor on the subject.</p>
<p style="text-align: center;"><a href="http://blog.tech4him.com/wp-content/uploads/biarchitecture.png"><img class="size-medium wp-image-496 aligncenter" style="border: 0pt none; margin: 10px;" title="Business Intelligence Architecture" src="http://blog.tech4him.com/wp-content/uploads/biarchitecture-300x208.png" alt="Business Intelligence Architecture" width="300" height="208" /></a></p>
<h3>Dimensional Modeling</h3>
<ul class="unIndentedList">
<li> Used by most contemporary BI solutions
<ul>
<li> &#8220;Right&#8221; mix of normalization and denormalization often called Dimensional Normalization</li>
<li> Some use for full data warehouse design</li>
<li> Others use for data mart designs</li>
</ul>
</li>
<li> Consists of two primary types of tables
<ul>
<li> Dimension tables</li>
<li> Fact tables</li>
</ul>
</li>
</ul>
<p>Getting the &#8220;right&#8221; mix of normalization and denormalization can be tricky and can be more art than science.</p>
<h3 style="text-align: center;">Dimensional Vs. Transactional</h3>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="295" valign="top"><strong>Dimensional Normalization</strong></td>
<td width="295" valign="top"><strong>Transactional Normalization</strong></td>
</tr>
<tr>
<td width="295" valign="top">Logical design technique that presents data in an   intuitive way allowing high-performance access</td>
<td width="295" valign="top">Logical design technique to eliminate data redundancy, maintain data consistency, and improve storage efficiency</td>
</tr>
<tr>
<td width="295" valign="top">Targets decision support information</td>
<td width="295" valign="top">Makes transactions simple and deterministic</td>
</tr>
<tr>
<td width="295" valign="top">Focused on easy user navigation and high performance design</td>
<td width="295" valign="top">ER models for an enterprise are usually complex, often containing hundreds, or even thousands, of entities/tables</td>
</tr>
</tbody>
</table>
<h3>Dimension Tables</h3>
<ul class="unIndentedList">
<li> Contain attributes related to business entities
<ul>
<li> Customers, vendors, employees</li>
<li> Products, materials, even invoices (attributes!)</li>
<li> Dates and sometimes time (hours, minutes, etc.)</li>
</ul>
</li>
<li> Often employ surrogate keys (typically a newly created primary key for the table)
<ul>
<li> Defined within the dimensional model</li>
<li> <strong>Not the same as source system primary, alternate, or business keys</strong></li>
<li> Usually an identity integer</li>
</ul>
</li>
<li> Not uncommon to have many, many columns (60, 70, or 80 are okay)</li>
</ul>
<h3>Fact Tables</h3>
<p>(The numbers that describe quantitative data. Also called measures.)</p>
<ul type="disc">
<li>Contain      numbers and other business metrics
<ul type="circle">
<li>Define       the basic measures users want to analyze</li>
<li>Numbers       are then aggregated according to related dimensions</li>
<li>(quantities,       prices, counts)</li>
</ul>
</li>
<li>Fact      tables contain dimension keys
<ul type="circle">
<li>Defines       relationship between measures and dimensions using surrogate keys</li>
</ul>
</li>
<li>Typically      narrow tables, but often very large (mostly numeric)</li>
</ul>
<h3>Star Schema Design</h3>
<p>Entity diagram of the DW schema</p>
<ul type="disc">
<li>Fact      table holds measures for events, levels and states
<ul type="circle">
<li>Provides       the relationship between dimension tables</li>
<li>Highly       normalized structure</li>
</ul>
</li>
<li>Dimension      tables track attributes such as subjects and objects
<ul type="circle">
<li>Star       Schema dimension tables all connect directly to one or more fact tables</li>
<li>Star       Schema dimension tables are highly denormalized to reduce joins</li>
</ul>
</li>
</ul>
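<p>A tiny star schema can be sketched with Python and an in-memory SQLite database to show how the measures in a fact table are aggregated by dimension attributes. The table names, columns, and sample rows are assumptions made up for this example; they are not taken from AdventureWorksDW or from the session.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimensions: category sits directly on the product row.
CREATE TABLE dim_product (
    product_sk   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_sk       INTEGER PRIMARY KEY,
    calendar_year INTEGER NOT NULL
);
-- Narrow fact table: surrogate keys plus numeric measures.
CREATE TABLE fact_sales (
    product_sk   INTEGER NOT NULL REFERENCES dim_product (product_sk),
    date_sk      INTEGER NOT NULL REFERENCES dim_date (date_sk),
    quantity     INTEGER NOT NULL,
    sales_amount REAL NOT NULL
);
INSERT INTO dim_product VALUES (1, 'Road Bike', 'Bikes');
INSERT INTO dim_product VALUES (2, 'Mountain Bike', 'Bikes');
INSERT INTO dim_product VALUES (3, 'Helmet', 'Accessories');
INSERT INTO dim_date VALUES (20080101, 2008);
INSERT INTO dim_date VALUES (20090101, 2009);
INSERT INTO fact_sales VALUES (1, 20080101, 2, 2000.0);
INSERT INTO fact_sales VALUES (2, 20080101, 1, 1500.0);
INSERT INTO fact_sales VALUES (3, 20080101, 5, 100.0);
INSERT INTO fact_sales VALUES (1, 20090101, 1, 1000.0);
""")


def sales_by_category_and_year(conn):
    """One join per dimension, then aggregate the measure."""
    return conn.execute("""
        SELECT p.category, d.calendar_year, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_sk = f.product_sk
        JOIN dim_date AS d ON d.date_sk = f.date_sk
        GROUP BY p.category, d.calendar_year
        ORDER BY p.category, d.calendar_year
    """).fetchall()


for row in sales_by_category_and_year(conn):
    print(row)
```

<p>Every dimension is one join away from the fact table, which is the property that keeps star schema queries simple.</p>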
<h3>Snowflake Schema Design</h3>
<ul type="disc">
<li>Snowflake      schema has normalized dimensions
<ul type="circle">
<li>Cascading       hierarchy of tables for a single dimension with several 1-M relationships</li>
<li>More       complicated schema</li>
<li>Allows       dimension to be used in fact tables with different grains</li>
<li>Often       easier management of attributes</li>
</ul>
</li>
</ul>
<p>SQL Server 2008 can optimize joins between dimension and fact tables.</p>
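<p>For comparison, here is a hedged sketch of a snowflaked product dimension, again using Python with an in-memory SQLite database. The product, subcategory, and category tables and the quota fact table are hypothetical names chosen for illustration. It shows the two points above: the extra joins a snowflake needs, and how a fact table at a coarser grain can attach to a higher level of the same dimension.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension normalized into a cascade of 1-M tables.
CREATE TABLE dim_category (
    category_sk   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
CREATE TABLE dim_subcategory (
    subcategory_sk   INTEGER PRIMARY KEY,
    subcategory_name TEXT NOT NULL,
    category_sk      INTEGER NOT NULL REFERENCES dim_category (category_sk)
);
CREATE TABLE dim_product (
    product_sk     INTEGER PRIMARY KEY,
    product_name   TEXT NOT NULL,
    subcategory_sk INTEGER NOT NULL REFERENCES dim_subcategory (subcategory_sk)
);
-- Sales are recorded per product; quotas only per subcategory (coarser grain).
CREATE TABLE fact_sales (product_sk INTEGER NOT NULL, sales_amount REAL NOT NULL);
CREATE TABLE fact_sales_quota (subcategory_sk INTEGER NOT NULL, quota_amount REAL NOT NULL);

INSERT INTO dim_category VALUES (1, 'Bikes');
INSERT INTO dim_category VALUES (2, 'Accessories');
INSERT INTO dim_subcategory VALUES (10, 'Road Bikes', 1);
INSERT INTO dim_subcategory VALUES (11, 'Mountain Bikes', 1);
INSERT INTO dim_subcategory VALUES (20, 'Helmets', 2);
INSERT INTO dim_product VALUES (100, 'Road-150', 10);
INSERT INTO dim_product VALUES (101, 'Mountain-200', 11);
INSERT INTO dim_product VALUES (102, 'Sport-100 Helmet', 20);
INSERT INTO fact_sales VALUES (100, 2000.0);
INSERT INTO fact_sales VALUES (101, 1500.0);
INSERT INTO fact_sales VALUES (102, 100.0);
INSERT INTO fact_sales VALUES (100, 500.0);
INSERT INTO fact_sales_quota VALUES (10, 3000.0);
INSERT INTO fact_sales_quota VALUES (11, 2000.0);
INSERT INTO fact_sales_quota VALUES (20, 150.0);
""")


def sales_by_category(conn):
    """Product-grain facts walk the whole cascade: three joins."""
    return conn.execute("""
        SELECT c.category_name, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_sk = f.product_sk
        JOIN dim_subcategory AS s ON s.subcategory_sk = p.subcategory_sk
        JOIN dim_category AS c ON c.category_sk = s.category_sk
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()


def quota_by_category(conn):
    """Subcategory-grain facts enter the same dimension one level higher."""
    return conn.execute("""
        SELECT c.category_name, SUM(q.quota_amount)
        FROM fact_sales_quota AS q
        JOIN dim_subcategory AS s ON s.subcategory_sk = q.subcategory_sk
        JOIN dim_category AS c ON c.category_sk = s.category_sk
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()


print(sales_by_category(conn))
print(quota_by_category(conn))
```

<p>In a pure star design the quota table would have nothing to join to, since subcategory would only exist as a column on the product row.</p>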
<h3>Why Dimensional Modeling?</h3>
<ul type="disc">
<li>Logical      model is easy to understand</li>
<li>Optimized      for performance</li>
<li>Historical      tracking of information</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.tech4him.com/2009/04/sswug-vconf-dimensional-modeling-101/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
