FIT5094 Chapter Notes - Chapter 10: Third Normal Form, Star Schema, Data Integration

58 views5 pages
Data Staging?
Data staging is essentially to save your data in a pre-load phase, so you can do all sorts of
manipulations before using it.
What is the difference between data stage and data warehousing?
Data warehousing incorporates data stores and conceptual, logical, and physical models to
support business goals and end-user information needs. A data warehouse (DW) is the
foundation for a successful BI program.
Data stage is the one of the ETL tool used in data warehousing. In data ware housing ETL is
the common process to extract, transformation and load the data to data ware house. in
market many tools are available to this. Data stage is one of the ETL tool from IBM which is
quite famous in market. Other tools are like Abinitio, Informatica, Talend, pentaho etc.
Why do we need staging area during ETL Load?
Few reasons why you can’t avoid a staging area:
Source systems are only available for extraction during a specific time slot
which is generally lesser than your overall data loading time. It’s a good idea
to extract and keep things at your end before you lose the connection to the
source systems.
You want to extract data based on some conditions which require you to join
two or more different systems together. E.g. you want to only extract those
customers who also exist in some other system. You will not be able to
perform an SQL query joining two tables from two physically different
databases.
Various source systems have different allotted timing for data extraction.
Data warehouse’s data loading frequency does not match with the refresh
frequencies of the source systems.
Extracted data from the same set of source systems are going to be used in
multiple places (data warehouse loading, ODS loading, third-party
applications etc.)
ETL process involves complex data transformations that require extra space to
temporarily stage the data.
There is specific data reconciliation / debugging requirement which warrants
the use of staging area for pre, during or post load data validations.
Clearly staging area gives lot flexibility during data loading.
Staging area increases latency that is the time required for a change in the source
system to take effect in the data warehouse. In lot of real time / near real time
applications, staging area is rather avoided.
Data in the staging area occupies extra space.
Data warehouse Architectures:
find more resources at oneclass.com
find more resources at oneclass.com
Unlock document

This preview shows pages 1-2 of the document.
Unlock all 5 pages and 3 million more documents.

Already have an account? Log in
Enterprise data warehouse
Independent data mart
Dependent data mart
Federated or ‘Bus’ architecture
1) Enterprise DW: Generic, single central data source that feeds information into decision support and
various analytic applications taking care of data management issues and facilitating the core
functionality of BI applications to focus on decision support. It follows 3NF ER approach but with
customised user views to simulate dimensional ‘experience’.
The big advantage of this enterprise arrangement is that it maximises the benefits of integration –
collect data coming from disparate sources, integrate them in a unified fashion for the users to do their
analytics.
The downside of the enterprise data warehouse approach is that because it is inherently large(in size)
and scope, it is inherently complex as well. The more source systems you get the data from, it adds on
to the complexity. Also, the more number of user groups it serves, the more complex it becomes to
handle. Complexity translates to large budgets, large timeline, large project teams and increased risks
which inevitably leads to high failure rates in projects using the data warehouse. As a result, people/
organisation turns to smaller scale, alternative solution instead.
Moreover, with the enterprise data warehouse, it takes a long time to actually deliver any decision
support to the end users.
The focus with the enterprise data warehouse in projects usually tends to be the problems of data
sourcing, fixing up data quality and management problem, ignoring the very requirement of delivering
business decision support to the end users.
//big bang approach…..complex….does not deliver benefits quickly(months and years)……….fail!!!
find more resources at oneclass.com
find more resources at oneclass.com
Unlock document

This preview shows pages 1-2 of the document.
Unlock all 5 pages and 3 million more documents.

Already have an account? Log in

Document Summary

Data staging is essentially to save your data in a pre-load phase, so you can do all sorts of manipulations before using it. Data warehousing incorporates data stores and conceptual, logical, and physical models to support business goals and end-user information needs. A data warehouse (dw) is the foundation for a successful bi program. Data stage is the one of the etl tool used in data warehousing. In data ware housing etl is the common process to extract, transformation and load the data to data ware house. in market many tools are available to this. Data stage is one of the etl tool from ibm which is quite famous in market. Other tools are like abinitio, informatica, talend, pentaho etc. Few reasons why you can"t avoid a staging area: Source systems are only available for extraction during a specific time slot which is generally lesser than your overall data loading time.

Get access

Grade+20% off
$8 USD/m$10 USD/m
Billed $96 USD annually
Grade+
Homework Help
Study Guides
Textbook Solutions
Class Notes
Textbook Notes
Booster Class
40 Verified Answers
Class+
$8 USD/m
Billed $96 USD annually
Class+
Homework Help
Study Guides
Textbook Solutions
Class Notes
Textbook Notes
Booster Class
30 Verified Answers

Related Documents