Data Warehousing - SMU


Data Warehousing
Tanvi Madgavkar
CSE 7330, Fall 2009

What is Data Warehousing?

Ralph Kimball states that: A data warehouse is a copy of transaction data specifically structured for query and analysis.

Bill Inmon states that: A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process.

Advantages of Data Warehousing

  • A data warehouse provides a common data model for all data of interest regardless of the data's source.
  • Prior to loading data into the data warehouse, inconsistencies are identified and resolved.
  • The information in the warehouse can be stored safely for extended periods of time.

OLTP

OLTP is short for On-Line Transaction Processing. It refers to a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and information retrieval. It is characterized by a large number of short on-line transactions.

The main emphasis for OLTP systems is on very fast query processing in multi-access environments.

OLAP

OLAP is short for On-Line Analytical Processing. It is an approach to quickly answering multi-dimensional analytical queries.
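To make the two workloads concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `sales` table, its columns and all figures are hypothetical and chosen only for illustration. The loop stands in for OLTP-style work (many short transactions), while the final queries stand in for OLAP-style work (aggregation across dimensions).

```python
import sqlite3

# Hypothetical sales table; all names and figures are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

# OLTP-style work: many short transactions, each touching a single row.
rows = [
    ("East", 2008, 100.0),
    ("East", 2009, 150.0),
    ("West", 2008, 200.0),
    ("West", 2009, 250.0),
]
for region, year, amount in rows:
    with conn:  # each insert commits as its own short transaction
        conn.execute(
            "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
            (region, year, amount),
        )

# OLAP-style work: one query that aggregates over two dimensions (region, year).
olap_result = conn.execute(
    "SELECT region, year, SUM(amount) FROM sales "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()

# A coarser view of the same data: totals per region across all years.
totals_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)
print(olap_result)
print(totals_by_region)
```

In a real deployment the two kinds of query would normally run on separate systems; a single in-memory database is used here only to keep the sketch self-contained.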

The term OLAP was created as a slight modification of the traditional database term OLTP. OLAP is characterized by a relatively low volume of transactions.

OLAP v/s OLTP

In general, OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.

                    OLTP                                          OLAP
Source of data      OLTPs are the original source of data         Data comes from various OLTP databases
Purpose of data     To run fundamental transaction-related tasks  To help with planning and decision support
Queries             Standardized and simple queries               Complex queries involving aggregation
Processing speed    Very fast                                     Depends on the amount of data involved
Space requirements  Relatively small                              Larger due to existence of historical data

Types of OLAP

Multidimensional OLAP - MOLAP
This is the more traditional way of OLAP analysis. In MOLAP, data is not stored in the relational database but in a multidimensional cube.

Relational OLAP - ROLAP
ROLAP works directly with relational databases: the base data is stored as relational tables, and new tables are created to hold the aggregated information.
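The ROLAP idea of keeping base data in relational tables and creating new tables for aggregates can be sketched as follows. This is only an illustration using Python's sqlite3 module; the table names `sales_base` and `sales_by_item` and the sample rows are assumptions, not taken from any particular ROLAP product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Base data held as an ordinary relational table (hypothetical names and values).
conn.execute("CREATE TABLE sales_base (item TEXT, month TEXT, units INTEGER)")
conn.executemany(
    "INSERT INTO sales_base VALUES (?, ?, ?)",
    [("pen", "Jan", 10), ("pen", "Feb", 5), ("ink", "Jan", 3), ("ink", "Feb", 7)],
)

# A new table is created to hold the aggregated information.
conn.execute(
    "CREATE TABLE sales_by_item AS "
    "SELECT item, SUM(units) AS total_units FROM sales_base GROUP BY item"
)

# Analytical queries can now read the small aggregate table instead of the base data.
summary = conn.execute(
    "SELECT item, total_units FROM sales_by_item ORDER BY item"
).fetchall()
print(summary)
```

The aggregate table has to be refreshed when the base data changes; how and when that happens is a design decision this sketch leaves out.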

Hybrid OLAP - HOLAP
HOLAP attempts to combine the advantages of MOLAP and ROLAP. Here, a database divides data between relational storage, which holds the larger quantities of detailed data, and specialized storage, which holds the smaller quantities of less-detailed data.

OLAP Process

[Figure: steps in the OLAP creation process]

OLAP cube

OLAP systems are designed to give an overview analysis of what happened. Hence the data storage has to be set up differently. An OLAP cube is also called a multidimensional cube or a hypercube. OLAP cubes are not strictly cuboids; "cube" is the name given to the process of linking data from the different dimensions.
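One way to picture "linking data from the different dimensions" is a mapping from each combination of dimension values to an aggregated measure. The sketch below is a toy model in plain Python, not how any specific OLAP engine stores its cubes; the dimensions (time, item, location), the `roll_up` helper and the sample facts are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact records: (time, item, location) dimensions plus a measure.
facts = [
    ("Q1", "pen", "Dallas", 10),
    ("Q1", "ink", "Dallas", 4),
    ("Q2", "pen", "Austin", 6),
    ("Q2", "pen", "Dallas", 5),
]

# The "cube" maps each combination of dimension values to an aggregated measure.
cube = defaultdict(int)
for time, item, location, units in facts:
    cube[(time, item, location)] += units


def roll_up(cube, keep):
    """Aggregate the cube over every dimension whose index is not in `keep`."""
    result = defaultdict(int)
    for coords, value in cube.items():
        result[tuple(coords[i] for i in keep)] += value
    return dict(result)


# Roll up to the item dimension only (index 1), then to (time, item).
units_by_item = roll_up(cube, keep=[1])
units_by_time_item = roll_up(cube, keep=[0, 1])
print(units_by_item)
print(units_by_time_item)
```

Nothing in this structure limits it to three dimensions, which is why the term hypercube is also used.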

There can be a number of cubes, developed along units of dimensions, or one giant cube can be formed with all the dimensions. The OLAP cube is at the core of any OLAP system and consists of a number of tables arranged in a particular schema. The cube metadata is typically created from either a star schema or a snowflake schema of tables in a relational database.

Star Schema

The most common design is the star schema, so called because it resembles a star in shape. The star schema, also known as the star join schema, is the simplest style of data warehouse schema. It consists of a few fact tables (possibly only one, justifying the name) referencing a number of dimension tables.

Model of Star Schema

    -- Dimension tables: one row per member, identified by its key
    CREATE TABLE TIME (
        time_key    INTEGER PRIMARY KEY,
        day         VARCHAR(10),
        day_of_week VARCHAR(10),
        month       VARCHAR(10),
        quarter     VARCHAR(10),
        year        VARCHAR(10)
    );

    CREATE TABLE BRANCH (
        branch_key  INTEGER PRIMARY KEY,
        branch_name VARCHAR(10),
        branch_type VARCHAR(10)
    );

    -- Fact table: references the dimension tables
    -- (the ITEM and LOCATION dimensions are not shown here)
    CREATE TABLE FACT1 (
        time_key     INTEGER REFERENCES TIME (time_key),
        item_key     INTEGER,
        branch_key   INTEGER REFERENCES BRANCH (branch_key),
        location_key INTEGER,
        PRIMARY KEY (time_key, item_key, branch_key, location_key)
    );

Advantages:

  • Simplest DW schema.
  • Easy to understand.
  • Easy to navigate between the tables because fewer joins are needed.
  • Most suitable for query processing.

Disadvantages:

  • Occupies more space.
  • Highly denormalized.

Snowflake Schema

A snowflake schema is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake in shape. It is closely related to the star schema, being just a variation of it. The only difference is that in a snowflake schema the dimensions are normalized into multiple related tables, whereas in a star schema the dimensions are denormalized, with each dimension represented by a single table.

Model of Snowflake Schema

    -- Sub-dimension tables split out of ITEM and LOCATION
    CREATE TABLE SUPPLIER (
        supplier_key  INTEGER PRIMARY KEY,
        supplier_type VARCHAR(10)
    );

    CREATE TABLE CITY (
        city_key INTEGER PRIMARY KEY,
        city     VARCHAR(10),
        state    VARCHAR(10),
        country  VARCHAR(10)
    );

    -- Normalized dimension tables reference their sub-dimensions
    CREATE TABLE ITEM (
        item_key     INTEGER PRIMARY KEY,
        item_name    VARCHAR(10),
        brand        VARCHAR(10),
        type         VARCHAR(10),
        supplier_key INTEGER REFERENCES SUPPLIER (supplier_key)
    );

    CREATE TABLE LOCATION (
        location_key INTEGER PRIMARY KEY,
        street       VARCHAR(10),
        city_key     INTEGER REFERENCES CITY (city_key)
    );

    -- Fact table references the dimension tables
    CREATE TABLE FACT1 (
        time_key     INTEGER,
        item_key     INTEGER REFERENCES ITEM (item_key),
        branch_key   INTEGER,
        location_key INTEGER REFERENCES LOCATION (location_key),
        PRIMARY KEY (time_key, item_key, branch_key, location_key)
    );

Advantages:

  • These tables are easier to maintain.
  • Saves storage space.

Disadvantages:

  • Because of the large number of joins, the schema is complex to navigate.
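The effect of normalization on query complexity can be illustrated with a small sketch in Python's sqlite3 module. All table names here (`item_star`, `item_snow`, `supplier`, `fact`) and the sample rows are hypothetical stand-ins rather than the models above. The same question, units sold per supplier type, takes one join against a denormalized dimension and two joins against a normalized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Star: the item dimension is one denormalized table.
    CREATE TABLE item_star (item_key INTEGER PRIMARY KEY, item_name TEXT, supplier_type TEXT);
    -- Snowflake: supplier attributes are moved into their own table.
    CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
    CREATE TABLE item_snow (item_key INTEGER PRIMARY KEY, item_name TEXT,
                            supplier_key INTEGER REFERENCES supplier (supplier_key));
    CREATE TABLE fact (item_key INTEGER, units INTEGER);

    INSERT INTO item_star VALUES (1, 'pen', 'wholesale'), (2, 'ink', 'retail');
    INSERT INTO supplier VALUES (10, 'wholesale'), (20, 'retail');
    INSERT INTO item_snow VALUES (1, 'pen', 10), (2, 'ink', 20);
    INSERT INTO fact VALUES (1, 5), (1, 7), (2, 3);
    """
)

# Star schema: one join from the fact table to the dimension.
star_rows = conn.execute(
    "SELECT d.supplier_type, SUM(f.units) FROM fact f "
    "JOIN item_star d ON d.item_key = f.item_key "
    "GROUP BY d.supplier_type ORDER BY d.supplier_type"
).fetchall()

# Snowflake schema: the same question needs a second join through supplier.
snow_rows = conn.execute(
    "SELECT s.supplier_type, SUM(f.units) FROM fact f "
    "JOIN item_snow d ON d.item_key = f.item_key "
    "JOIN supplier s ON s.supplier_key = d.supplier_key "
    "GROUP BY s.supplier_type ORDER BY s.supplier_type"
).fetchall()
print(star_rows, snow_rows)
```

Both queries return the same totals; the difference lies in how many tables the user has to navigate, and in the supplier type being stored once per supplier rather than once per item.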

Star v/s Snowflake

The star schema is the better option from the user's point of view. It exposes users to the underlying table structures, and its queries are simpler. It is more likely to be used when the data warehouse is large.

Snowflake schemas are often better suited to more sophisticated query tools and smaller data warehouses. Although a snowflake schema is relatively easy to maintain, it suits environments with numerous queries with complex criteria, and hence its queries take more time to execute.

Questions?

Bibliography

W.H. Inmon. What is a Data Warehouse?, Prism, Volume 1, Number 1, 1995.
Ralph Kimball. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses.
Jun Yang. WareHouse Information Prototype at Stanford.
C. Caldeira. "Data Warehousing Concepts and Models".
RainMaker DataWarehousing. OLAP_vs_OLTP.pdf, http://www.rainmaker
Data Warehousing: A look at Business Intelligence and Data Warehouse, dataware housing/ molap-rolap.html
Hari Mailvaganam. Data Warehousing Review Introduction to OLAP, /Introduction_OLAP.html
Mri Sonam. What is the difference between star schema and snow flake schema?
Wikipedia, The Free Encyclopedia. Data Warehouse, OLAP, OLTP, Star Schema, Snowflake Schema.
