Data Warehousing Fundamentals

Request info

Data Warehousing Training

Summary:A successful data warehousing project relies on a well-designed dimensional model that meets the organisations’ reporting requirements. If you are involved with designing a data warehouse from scratch or need to maintain an existing data warehouse, then understanding the dimensional modelling design process is critical. After discussing data warehousing terms on the first day, this course explores dimensional modelling using Ralph Kimball’s methodology. Delegates will also have the opportunity of participating in a data warehousing case study on the final day.
This dimensional modelling techniques course is designed to answer questions, such as the following:

  • What is data warehousing?
  • What is a data mart?
  • What are the data modelling options?
  • What is Extract, Transform and Load (ETL)?
  • What are the terms and concepts specific to data warehousing and OLAP design?
  • What are Slowly Changing Dimensions (SCDs)?
  • How do we approach the dimensional modelling design process?
  • What are star schemas vs. snowflakes?
  • What are the different fact and dimension table types?
  • How to plan and implement a data warehouse with high availability, simplified manageability and optimal performance
  • What are common statistics, analytic and OLAP SQL queries?
Objectives:Upon completion of this course, the participant should be able to design a data warehouse using Ralph Kimball’s dimensional modelling to create both star and snowflake schemas. And the delegate should understand the implication of such terms as cubes, dimensions, attributes, joins, hierarchies, measures, etc.
Audience:Would-be data warehouse architects, IT developers, database administrators or anyone responsible for a data warehouse or related discipline.
Prerequisites:At least six (6) months in a relational database environment.
Format:Lecture, discussion and case study

Core Topics

1. DATA WAREHOUSING OVERVIEW
  • What is Business Intelligence (BI)?
  • What is a Data Warehouse?
  • What is a Data Mart?
  • Operational vs. Historical Data
  • How do OLTP & DW Differ?
  • What is ER vs. Dimensional Modelling?
  • What is a Star Schema?
  • What is a Snowflake Schema?
  • Dimensional Modelling Life Cycle
  • What is Extract, Transform, Load (ETL)?
  • What is Metadata?
  • What Are Materialized Views (MVs)?
  • What is Online Analytic Processing (OLAP)?
  • What is Data Mining?
  • What is an Operational Data Store (ODS)?
  • ODS vs. DW
  • Advantages of an ODS
  • When to Consider a Separate Datastore
2. DIMENSIONAL MODELLING DESIGN CHARACTERISTICS
  • Data Warehouse Objects and Relationships
  • Some Fact Table Characteristics
  • Grain of a Fact Table
  • Four Fact Table Types
  • Some Dimension Table Characteristics
  • Slowly Changing Dimensions (SCD)
  • Surrogate Keys
  • Mini-dimensions
  • Outriggers
  • Role Playing Dimensions
  • Junk/Garbage Dimensions
  • Degenerate Dimensions
  • What are Dimension Hierarchies?
  • Dimensional Modelling Design Characteristics
3. DATA ANALYSIS & REPORTING CONCEPTS
  • BI User Types
  • Query & Reporting Concepts
  • BI Reporting Client Server Design
  • Multidimensional Analysis
  • Multidimensional Analysis Examples, e.g., Slice, Dice, Pivoting
4. OLAP & CUBE CONCEPTS
  • What is OLAP?
  • OLAP & Cubes
  • OLAP Architecture
  • OLAP Tools
5. DATA MODELLING PARADIGMS
  • A Data Warehouse per Bill Inmon vs. Ralph Kimball
  • Top-down vs. bottom-up
  • Hybrid Design Approach
  • Four Phases of Dimensional Modelling
  • Dimensional Model Types
  • Data Warehouse Architecture Alternatives
  • ODS within DW Architecture
  • What is Agile Data Warehousing?
  • Big Data & the Data Warehouse
6. DIMENSIONAL MODELLING – ESTABLISH BUSINESS PROCESSES
  • Enterprise Business Process List
  • Prioritising the Business Process
  • Identify Conformed Dimensions/Facts
  • Establish Data Sources
  • Requirements Gathering
  • Business Process Analysis Report
7. DIMENSIONAL MODELLING – DECLARE THE GRAIN
  • Fact Table Granularity
  • Multiple Granularities – How to Handle
  • Fact Table Types
8. DEVELOPING THE DIMENSION TABLES
  • Steps for Identifying the Dimensions
  • Dimensions Per Fact (Grain)
  • Degenerate Dimension Design
  • Surrogate Key Considerations
  • Dimension Table: Attributes & Hierarchies
  • Determine Dimension Table Granularities
  • Dimension Hierarchy Design
  • Date Dimension and Granularity
  • Role-Playing Dimensions
  • Slowly Changing Dimension Activities
  • ‘Fast’ Changing Dimensions
  • When to Consider Snowflaking
  • Garbage/Junk Dimensions
  • Bridge Table for Multi-Valued Dimension
  • Bridge Table for Unbalanced Hierarchy
9. IDENTIFIYING THE FACT TABLE(S)
  • How to Infer Facts/Fact Tables from an ER Model
  • Fact Table Design Procedures and Phases
  • Four Fact Table Types
  • Grain of a Fact Table
  • Fact Table Sizing Concepts
10. DESIGN VERIFICATION
  • Dimensional Model Verification
11. METADATA CONSIDERATIONS
  • Dimension Table Metadata
  • Fact Table Metadata
  • Physical Design Metadata
  • Metadata Example
12. E/R MODEL TO DIMENSIONAL MODEL CONVERSION
  • ER Model to Dimensional Model Steps
  • Converting ER Model to Dimensional Model Example
13. PHYSICAL DESIGN CONSIDERATIONS
  • Aggregations (AVG, SUM, etc.)
  • Indexing Considerations
  • Partitioning Considerations
14. ETL CONCEPTS
  • ETL Overview
  • ETL vs. ELT
  • Kimball’s 38 ETL Subsystem
  • Data Cleansing Techniques
  • Data Transformation
  • ETL Staging Considerations
  • Source-To-Destination Mapping Considerations
15. CASE STUDY
  • Project definition and scoping
  • Specify the requirements
  • Specify the grain (e.g., fact table types)
  • Specify the dimensions (e.g., handling slowly changing dimensions)
  • Specify the facts (e.g., conformed facts)

 

Duration: 3 Days

Cost: £1400 + VAT

SELECT wp_posts.*, wp_p2p.* FROM wp_posts INNER JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id ) INNER JOIN wp_p2p WHERE 1=1 AND ( ( wp_postmeta.meta_key = 'start_date' AND CAST(wp_postmeta.meta_value AS DATE) >= '2025-01-13' ) ) AND ((wp_posts.post_type = 'schedule' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'acf-disabled'))) AND (wp_p2p.p2p_type = 'schedule_to_courses' AND wp_posts.ID = wp_p2p.p2p_from AND wp_p2p.p2p_to IN (SELECT wp_posts.ID FROM wp_posts WHERE 1=1 AND wp_posts.ID IN (8898) AND ((wp_posts.post_type = 'courses' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'acf-disabled'))) ORDER BY wp_posts.post_date DESC )) GROUP BY wp_posts.ID ORDER BY CAST(wp_postmeta.meta_value AS DATE) ASC