MIS 0855 Study Guide - Final Guide: Logical Data Model, Physical Data Model, Apache Hadoop

Quizzes
Quiz 8
Hadoop is a platform that: makes big data easier to manage
Bertolucci claims an ongoing problem with Hadoop for companies is that: managers and
executives don't really understand what it does
Hadoop is often paired with another piece of software called: MapReduce
Relational databases conform to a set of practices called: the rules of normalization
A basic rule of the Pivot Table data structure is that: all values of the same type need to be in
one column
Quiz 9
Which of the following is the ost appopiate tehiue to aalze the stateet Coke tastes
ette tha Pepsi? opaatie setiet aalsis
According to Feldman, the most common application of sentiment analysis is: reviews of
consumer products and services
The most detailed type of sentiment analysis is: aspect-based sentiment analysis
Which of the following is NOT an example of unstructured data? Stock prices
According to Hurwitz, Hadoop is capable of handling unstructured data: True
Aodig to Wohlse, Faeook’s eet study of its users revealed: exposure to fewer positive
messages led to fewer positive posts
Quiz 10
Aodig to Paie, a aalsis foud that a tea’s poailit of soig ieases as: the
string together more successful passes
According to Paine, people have been tracking soccer data for: over 60 years
According to Bertolucci, the simplest type of analytics is: descriptive
Aodig to Pek, people aaltis is: the appliatio of peditie aaltis to people’s
careers
According to Peck, the use of aaltis to deteie okes’ potetial is ost idel used i:
hourly work, where the jobs are standardized
Articles
Knowing Just Enough About Relational Databases (Rosenblum and Dorsey)
What akes a dataase elatioal?
o Designed to conform to the rules of normalization (a set of practices)
o E. Ogaize a ogaizatio’s dept ad eploee data ito sepaate tales 
employee number and name, and dept number and name
o Foreign keys special columns that link two tables
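The department/employee example above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample rows are invented here and do not come from the article.

```python
import sqlite3

# Hypothetical schema and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.executescript("""
    CREATE TABLE department (
        dept_no   INTEGER PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_no   INTEGER PRIMARY KEY,
        emp_name TEXT NOT NULL,
        dept_no  INTEGER NOT NULL REFERENCES department(dept_no)  -- foreign key
    );
""")

conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(10, "Accounting"), (20, "Marketing")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ana", 10), (2, "Ben", 20), (3, "Cy", 10)])

# The foreign key column links the two tables back together in a join.
rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM employee AS e
    JOIN department AS d ON e.dept_no = d.dept_no
    ORDER BY e.emp_no
""").fetchall()

# An employee pointing at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (4, 'Dee', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

print(rows)
print(fk_rejected)
```

Each department name is stored once, and the employee rows refer to it by number, which is the kind of separation the rules of normalization ask for.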
Understanding basic database terminology
o Database is built in two stages:
Logical data model: lays out the design of the database and how data will be
organized
Physical data model: sets up the actual tables and columns
o Logical/Relational
Entity, attribute, instance
o Logical/Object-Oriented
Class, attribute, object
o Physical Implementation
Table, column, row
o Definitions:
Entity: corresponds to something in the real world that is of interest and that
you want to store information about (ex. departments)
Instance: each specific department or employee in an entity
Attribute: represents information about an entity instance or an object that
will be tracked (ex. birth date, SS#)
Entities (classes), their attributes, and instances (objects):
Entities (classes) → tables
Attributes → columns
Instances (objects) → rows
o Primary key: identifies a specific instance of an entity (no two instances of an entity can
have the same primary key; values of primary keys must not be null) (ex. ID #s)
o Candidate keys: attributes that could be used as a primary key
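The two primary key rules (no duplicates, no nulls) can be checked with a short sqlite3 sketch. The employee table, its columns, and the sample values are assumptions made up for this example; the table stands for the entity, the columns for its attributes, and each row for one instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table = entity, columns = attributes, rows = instances.
# NOT NULL is written out because SQLite does not imply it for a TEXT primary key.
conn.execute("""
    CREATE TABLE employee (
        emp_id     TEXT PRIMARY KEY NOT NULL,
        name       TEXT,
        birth_date TEXT
    )
""")
conn.execute("INSERT INTO employee VALUES ('E001', 'Ana', '1990-04-12')")

def try_insert(row):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

new_key = try_insert(("E002", "Ben", "1988-11-03"))        # unused key: accepted
duplicate_key = try_insert(("E001", "Cy", "1995-01-20"))   # repeats E001: rejected
null_key = try_insert((None, "Dee", "1992-07-08"))         # null key: rejected

print(new_key, duplicate_key, null_key)
```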
How to Explain Hadoop to Non-Geeks (Bertolucci)
Focus on the benefits of Hadoop and big data
A tutorial video is used to explain it; Hadoop is called a platform that makes big data easier to manage
Understand how Hadoop stores files and how it processes data
o Hadoop lets you store files bigger than what can be stored on one node or server
o Lets you store many, many files
Maistea usiess uses do’t eed to ko ho Hadoop oks ut eed to udestad that
the constrain with storing and processing data is no longer with Hadoop
MapReduce is the second characteristic of Hadoop: ability to process data or provide a
framework of processing the data
o Rather than moving data to software, MR moves processing software to the data
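The idea of moving the processing to the data can be sketched in plain Python. This is a toy word count, not Hadoop's actual API: the node names and text chunks are invented, and a real cluster would run the map step on separate machines.

```python
from collections import Counter
from functools import reduce

# Hypothetical "nodes": each one holds its own slice of a large file.
node_chunks = {
    "node1": "big data big ideas",
    "node2": "data moves less",
    "node3": "big clusters process data",
}

def map_phase(text):
    """Runs where the chunk lives and emits a small per-node word count."""
    return Counter(text.split())

def reduce_phase(left, right):
    """Merges two partial counts into one."""
    return left + right

# Only the small partial counts travel between nodes, not the raw chunks.
partial_counts = [map_phase(chunk) for chunk in node_chunks.values()]
word_counts = reduce(reduce_phase, partial_counts, Counter())

print(word_counts["big"], word_counts["data"])
```

The map step never needs the whole file in one place, which is why the storage limit of a single node stops being the constraint.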
How to Structure Source Data for Excel Pivot Tables & Unpivot (Acampora)
Pivot tables are used to summarize and analyze large data sets
Data table structure:
o Fields: columns (column header describes data in field; ex. company, region, product)
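As a rough illustration of this structure, the sketch below "unpivots" a wide table so that all values of the same type end up in one column. The Q1/Q2 columns, the sales figures, and the unpivot helper are hypothetical and written for this example; Excel performs the equivalent step through its own tools.

```python
# Hypothetical wide table: sales values are spread across two quarter columns.
wide_rows = [
    {"company": "Acme", "region": "East", "Q1": 100, "Q2": 120},
    {"company": "Bolt", "region": "West", "Q1": 80, "Q2": 95},
]
id_fields = ["company", "region"]

def unpivot(rows, id_fields, name_field, value_field):
    """Turn every non-id column into its own row with a name and a value."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key in id_fields:
                continue
            record = {field: row[field] for field in id_fields}
            record[name_field] = key
            record[value_field] = value
            long_rows.append(record)
    return long_rows

# After unpivoting, every sales figure sits in the single "sales" column.
long_rows = unpivot(wide_rows, id_fields, "quarter", "sales")
total_sales = sum(r["sales"] for r in long_rows)

for r in long_rows:
    print(r)
```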

Document Summary

Stock prices. According to Hurwitz, Hadoop is capable of handling unstructured data: True. According to Wohlsen, Facebook's recent study of its users revealed: exposure to fewer positive messages led to fewer positive posts. Knowing Just Enough About Relational Databases (Rosenblum and Dorsey): what makes a database "relational"; designed to conform to the rules of normalization (a set of practices); ex. Instances (objects) → rows. Primary key: identifies a specific instance of an entity (no two instances of an entity can have the same primary key; values of primary keys must not be null) (ex. ID #s). Candidate keys: attributes used as a primary key. Identification of text to each specific entity that is mentioned (assigning correct text to entity); sarcasm, noisy texts, factual statements that carry sentiment (objective that identify as subjective). They're Watching You at Work (Peck): more than 98% of the world's info is now stored digitally; people analytics: application of predictive analytics to people's careers.