FIT5195 Study Guide - Final Guide: First-Order Logic, Predicate Variable, Query Language

1) Big Data; Data Warehouse
Big Data refers to data sets so large that they require specialised database management
systems to analyse them and extract useful insights. The analysis of this data and the
insights drawn from it are referred to as Big Data Analytics.
A data warehouse is a centralised repository of organisational data sourced from a number of
original data sources such as internal information systems, external data feeds and other
supplementary data as required. The purpose is to support organisational decision-making by
providing an integrated data source for business intelligence and other decision support
applications.
When we compare big data with a data warehouse, we find that a big data solution is a
technology, whereas data warehousing is an architecture. They are two very different things. A
technology is just that: a means to store and manage large amounts of data. A data warehouse
is a way of organising data so that it has corporate credibility and integrity. When someone
takes data from a data warehouse, that person knows that other people are using the same data
for other purposes.
2) Enterprise Data Warehouse; Independent Data Mart
An enterprise data warehouse is a unified database that holds all the business information of
an organisation and makes it accessible across the company. Architecturally, this is the
simplest of the data warehouse architectures: it follows a basic flow in which data from all
internal sources goes through an ETL process and is loaded into the data warehouse.
An independent data mart, on the other hand, is a decentralised, stand-alone database
built by drawing data directly from operational sources, external sources, or both. Each
data mart is a miniature data warehouse that supports the requirements of a particular user
group within the organisation.
In terms of complexity, both architectures are complex. An enterprise data warehouse
looks architecturally simple, but it requires a very large warehouse, and consolidating a
number of external sources into one repository makes it a complex development project.
Independent data marts, in contrast, involve multiple ETL processes (at least one for each
data mart), and managing data quality across them becomes very complex.
Nevertheless, the independent data mart is the most widely used architecture as well as the
most successful one.
3) Federated Data Mart Architecture; Dependent Data Mart
A dependent data mart is a centralised data warehouse architecture that combines the
enterprise data warehouse and independent data mart approaches. It extracts data from
external sources into a single repository and then divides that data into data marts, one
for each user group's needs.
A federated data mart architecture is similar to the dependent data mart architecture,
except that there is no physical warehouse in the layer between the external sources and the
individual data marts. Instead, there is a virtual, temporary storage area where data is held
after the ETL process before being passed on to the data mart layer.
Dependent and federated architectures serve the same function, but the dependent data
mart is more expensive because it incurs the cost of both an enterprise data warehouse and
the data marts, whereas the federated architecture avoids the cost of a physical warehouse.
Theoretically, dependent data marts are the most efficient architecture. However, if a
federated architecture is successfully implemented (considering the technical feasibility of
a virtual database), it is the architecture most recommended by Kimball.
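The difference between the two data flows can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the sample rows, the field names (dept, amount) and the function names etl, build_marts, dependent and federated are hypothetical, and plain in-memory lists stand in for real databases.

```python
# Hypothetical source extracts; the field names and values are invented.
internal_source = [{"dept": " Sales", "amount": "100"}, {"dept": "HR", "amount": "40"}]
external_source = [{"dept": "sales ", "amount": "60.5"}]


def etl(sources):
    """Extract rows from every source and apply one shared cleaning step."""
    return [
        {"dept": row["dept"].strip().lower(), "amount": float(row["amount"])}
        for source in sources
        for row in source
    ]


def build_marts(rows):
    """Split cleaned rows into one data mart per user group (here: department)."""
    marts = {}
    for row in rows:
        marts.setdefault(row["dept"], []).append(row)
    return marts


def dependent(sources):
    """Dependent: load a persistent central warehouse, then derive the marts from it."""
    warehouse = etl(sources)
    return warehouse, build_marts(warehouse)


def federated(sources):
    """Federated: hold rows in a temporary staging area; no warehouse is kept."""
    staging = etl(sources)
    marts = build_marts(staging)
    del staging  # the staging area is discarded once the marts are loaded
    return marts


warehouse, dependent_marts = dependent([internal_source, external_source])
federated_marts = federated([internal_source, external_source])
print(sorted(federated_marts))
```

In this sketch both approaches deliver identical marts; the only difference is that the dependent version also keeps (and pays for) the central warehouse.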
4) Subject-Oriented; Integrated
Integrated data warehouse: An integrated data warehouse combines data from multiple
sources, which is cleansed and integrated into a single, consistent form. Since the data
comes from several operational systems, all inconsistencies must be removed. These
inconsistencies can involve naming conventions, measurement of variables, encoding
structures, physical attributes of data and so forth. For example, data can be pulled from
the sales and marketing departments and put into the data warehouse in order to compute
total yearly revenue. There will be a single definition of revenue for all departments.
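As a rough illustration of integration, the Python sketch below maps two source extracts onto one schema before totalling revenue. The record layouts, the gender encodings and the revenue units are invented for this example and are not taken from any real system.

```python
# Hypothetical extracts from two operational systems. The field names,
# encodings and units are assumptions made for illustration only.
sales_rows = [
    {"cust_gender": "M", "revenue_usd": 1200.0},
    {"cust_gender": "F", "revenue_usd": 800.0},
]
marketing_rows = [
    {"gender": 1, "revenue_cents": 50000},  # assumed encoding: 1 = male, 0 = female
    {"gender": 0, "revenue_cents": 25000},
]

GENDER_FROM_MARKETING = {1: "M", 0: "F"}


def integrate(sales, marketing):
    """Map both sources onto one schema: gender as 'M'/'F', revenue in dollars."""
    unified = []
    for row in sales:
        unified.append(
            {"source": "sales", "gender": row["cust_gender"], "revenue": row["revenue_usd"]}
        )
    for row in marketing:
        unified.append(
            {
                "source": "marketing",
                "gender": GENDER_FROM_MARKETING[row["gender"]],
                "revenue": row["revenue_cents"] / 100,  # cents to dollars
            }
        )
    return unified


warehouse_rows = integrate(sales_rows, marketing_rows)
total_revenue = sum(row["revenue"] for row in warehouse_rows)
print(total_revenue)
```

Once every row uses the same encoding and unit, a question such as total yearly revenue has a single answer regardless of which department supplied the data.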
Subject-oriented data warehouse: This relates to the design of the data warehouse
schema. As tools for decision support, warehouses are designed to present data from the
perspective of the decision-making, managerial user. This is inherently a data- or
subject-oriented perspective focused on concepts like customers, products and suppliers.
This makes it easier to answer business questions like “How many customers do we
have?”
Just as a subject-oriented view necessarily cuts across artificial data schema
boundaries, it also requires an integrated view of data that cuts across system and
business unit boundaries.
5) Time Variant; Non-Volatile
Time variant: Most transaction databases are designed with little consideration given to
the temporal aspects of the data they contain. For example, if a customer changes
address, a transaction-processing database will often not keep track of the previous
address and only provides facilities for recording the new, current address. With a data
warehouse, however, there is often a requirement to analyse how data changes over time,
in other words to preserve the “time-variant” nature of the data. This led to the emergence
of the time-variant data warehouse. A key aspect of the ability to analyse the temporal
aspect of data is the capability to reproduce historically accurate reports. Overwriting data with