Tech4Him – Technology with Integrity

A Christian technology chaos wrangler and his thoughts

SSWUG vConf – Doing More with Less by Automating SSIS

Posted by Tom On April - 24 - 2009

386963956_e774f147e9Presenter: John Welch
http://agilebi.com/cs/blogs/jwelch/Default.aspx

SSIS is a great tool for transferring data from one data source to another, and for implementing complex ETL processes. However, for simple, straightforward data transfer tasks, creating SSIS packages by hand can be time-consuming and repetitious. By attending this session, you’ll learn how to automate data transfers from a source to a target using SSIS. You’ll see the steps necessary to create both the control flow and the data flow portions of an SSIS package using the API and C# code. You’ll also learn about altering existing packages programmatically. Specific attention will be given to the generation of data flows with the proper source and destination components. We’ll take a look at some of the open source solutions available for this, plus some “roll your own” options.

Session Specific blog entry: http://agilebi.com/cs/blogs/jwelch/archive/2009/03/27/sswug-business-intelligence-virtual-conference-and-sds.aspx

Common Scenarios for SSIS

  • Doing a simple import to or export from a SQL Server table
  • Moving data from server to server on a regular basis
  • Performing complex ETL to transform data

Should You Automate?

  • Does the process follow a consistent pattern?
  • Do you need to perform the same operation against many tables?
  • Do you have lots of packages that look exactly the same?
  • If so, the answer is YES!

Automation Benefits

  • DRY (Don’t Repeat Yourself) [why haven't I heard this TLA before?]
    • Save Time and Money
    • Reduces the chance of introducing bugs due to human error
  • Can help enforce consistency in package design
  • Speeds up repetitive, autopilot tasks

All these tools below are a revelation to me. This rocks!

SSIS API

  • Benefits
    • Complete access to all the features of SSIS
    • You are in complete control
    • Control Flow manipulation is straightforward
  • Drawbacks
    • Steep learning curve, but more examples and resources being published.
    • Data Flow manipulation is complex due to COM layer
    • Requires a lot of code to perform simple tasks

EzAPI

  • Open source project from SSIS team
  • Benefits
    • Vastly reduced amount of code to create packages
    • Automates much of the column mapping process
  • Drawbacks
    • Hides some features of SSIS
    • Not under active development (Sounds like death to me)

Package Generation Sample

  • Open source project from SSIS team
  • Benefits
    • Command line interface
    • Good learning code
  • Drawbacks
    • Only supports SQL Server, Excel, and flat files

MDDE Studio

  • Open source project originally from Microsoft
  • Benefits
    • Template based approach
    • Works with 2005
  • Drawbacks
    • Not under active development
    • Only works with 2005

Project Vulcan

  • Open source project from Microsoft
  • Benefits
    • Uses XML files to drive package generation
    • Generates SSIS and SSAS objects
    • Active development team
  • Drawbacks
    • Very complex, little to no documentation

Recommended Approach

  • Simple Generation
    • Use EzAPI to minimize coding
    • Use the SSIS API for things EzAPI doesn’t expose
  • Industrial Strength Generation
    • Evaluate MDDE and Vulcan to determine which one is closest to your requirements
    • Customize from there

Where To Start

  • Control Flow is easiest to learn
  • Data Flow is considerably more complex
    • Start with simple column mappings
  • Review existing packages to get details
    • View XML for package
    • Load through the API and iterate through the package

Be Aware

  • Many features in BIDS are implemented in the BIDS UI, not the SSIS object model
    • flat file metadata detection, automatic column mapping, etc.
  • Script Tasks and Components are especially problematic
    • Pre-compiling scripts.
    • Don’t use if you don’t have to.

Considerations

  • Should you automate?
    • Not a great choice if packages are rarely the same twice
    • Can you get the same results from templates?
  • One time automation vs. recurring generation
    • Producing a “first pass” package can be a good accelerator
    • Recurring generation may limit flexibility, but can be more powerful

SSIS Automation Resources


Popularity: 17% [?]

Share and Enjoy:
  • Print
  • Digg
  • del.icio.us
  • Facebook
  • Google Bookmarks
  • email
  • LinkedIn
  • PDF
  • RSS
  • Slashdot
  • StumbleUpon
  • Technorati
  • Twitter

Leave a Reply