MIS 0855 Study Guide - Final Guide: Logical Data Model, Physical Data Model, Apache Hadoop
Quizzes
Quiz 8
• Hadoop is a platform that: makes big data easier to manage
• Bertolucci claims an ongoing problem with Hadoop for companies is that: managers and
executives don’t really understand what it does
• Hadoop is often paired with another piece of software called: MapReduce
• Relational databases follow a set of practices called: the rules of normalization
• A basic rule of the Pivot Table data structure is that: all values of the same type need to be in
one column
Quiz 9
• Which of the following is the most appropriate technique to analyze the statement “Coke tastes
better than Pepsi”? comparative sentiment analysis
• According to Feldman, the most common application of sentiment analysis is: reviews of
consumer products and services
• The most detailed type of sentiment analysis is: aspect-based sentiment analysis
• Which of the following is NOT an example of unstructured data? Stock prices
• According to Hurwitz, Hadoop is capable of handling unstructured data: True
• According to Wohlsen, Facebook’s recent study of its users revealed: exposure to fewer positive
messages led to fewer positive posts
Quiz 10
• According to Paine, an analysis found that a team’s probability of scoring increases as: they
string together more successful passes
• According to Paine, people have been tracking soccer data for: over 60 years
• According to Bertolucci, the simplest type of analytics is: descriptive
• According to Peck, people analytics is: the application of predictive analytics to people’s
careers
• According to Peck, the use of analytics to determine workers’ potential is most widely used in:
hourly work, where the jobs are standardized
Articles
Knowing Just Enough About Relational Databases (Rosenblum and Dorsey)
• What makes a database “relational”?
o Designed to conform to the rules of normalization (a set of practices)
o Ex. Organize an organization’s dept and employee data into separate tables:
employee number and name, and dept number and name
o Foreign keys – special columns that link two tables
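The dept/employee example above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample rows are assumptions made up for the sketch, not taken from the article.

```python
import sqlite3

# Hypothetical schema and data; names are illustrative, not from the article.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE dept (
    dept_no   INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
CREATE TABLE employee (
    emp_no   INTEGER PRIMARY KEY,
    emp_name TEXT NOT NULL,
    dept_no  INTEGER NOT NULL REFERENCES dept(dept_no)  -- foreign key linking the two tables
);
""")

conn.executemany("INSERT INTO dept VALUES (?, ?)", [(10, "Sales"), (20, "Research")])
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Avery", 10), (2, "Blake", 20), (3, "Casey", 10)],
)

# The department name is stored once; a join follows the foreign key to recover it.
rows = conn.execute(
    "SELECT e.emp_name, d.dept_name "
    "FROM employee e JOIN dept d ON d.dept_no = e.dept_no "
    "ORDER BY e.emp_no"
).fetchall()

# An employee pointing at a department that does not exist is rejected.
orphan_rejected = False
try:
    conn.execute("INSERT INTO employee VALUES (4, 'Devon', 99)")
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Because each department name lives in one row of dept, renaming a department means changing one value rather than every employee row that mentions it.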
find more resources at oneclass.com
• Understanding basic database terminology
o Database is built in two stages:
▪ Logical data model – lay out the design of database and how data will be
organized
▪ Physical data model – sets up actual tables and columns
o Logical/Relational
▪ Entity, attribute, instance
o Logical/Object-Oriented
▪ Class, attribute, object
o Physical Implementation
▪ Table, column, row
o Definitions:
▪ Entity – corresponds to something in the real world that is of interest and that
you want to store information about (ex. departments)
• Instance – each specific department or employee in entity
▪ Attribute – represents information about an entity instance or an object that
will be tracked (ex. birth date, SS#)
▪ Entities (classes), their attributes, and instances (objects)
• Entities (classes) – tables
• Attributes – columns
• Instances (objects) – rows
o Primary key – identifies a specific instance of an entity (no two instances of an entity can
have the same primary key; values of primary keys must not be null) (ex. ID #s)
o Candidate keys – attributes that could be used as a primary key
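The two primary key rules (no duplicates, no nulls) can be checked with a small sketch using Python's built-in sqlite3 module. The employee table, its columns, and the try_insert helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table. NOT NULL is spelled out because SQLite does not
# imply it for every PRIMARY KEY declaration.
conn.execute(
    "CREATE TABLE employee (emp_id TEXT PRIMARY KEY NOT NULL, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO employee VALUES ('E001', 'Avery')")

def try_insert(emp_id, name):
    """Return True if the row is accepted, False if a key rule rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_ok = try_insert("E001", "Blake")  # same primary key as an existing row
null_ok = try_insert(None, "Casey")         # null primary key
new_ok = try_insert("E002", "Devon")        # fresh, non-null key
```

Only the third insert should succeed, since it is the only one whose key is both present and unused.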
How to Explain Hadoop to Non-Geeks (Bertolucci)
• Focus on the benefits of Hadoop and big data
• Tutorial video used to explain; called a platform that makes big data easier to manage
• Understand how Hadoop stores files and how it processes data
o Hadoop lets you store files bigger than what can be stored on one node or server
o Lets you store many, many files
• Mainstream business users don’t need to know how Hadoop works but need to understand that,
with Hadoop, storing and processing data is no longer the constraint
• MapReduce is the second characteristic of Hadoop: the ability to process data, or to provide a
framework for processing the data
o Rather than moving data to software, MR moves processing software to the data
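The map-then-reduce idea can be sketched in plain Python. This is not Hadoop's API; it is a toy word count in a single process, where each inner list of the hypothetical nodes variable stands in for data stored on one server and the map step is applied to each list where it sits.

```python
from collections import defaultdict

# Hypothetical data: each inner list stands in for the lines stored on one node.
nodes = [
    ["big data", "data lake"],
    ["big cluster"],
]

def map_phase(lines):
    """Runs against one node's data: emit a (word, 1) pair for every word."""
    return [(word, 1) for line in lines for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key so each key is reduced in one place."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Combine each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}

# Apply the map step node by node, then group and reduce the combined output.
mapped = [pair for lines in nodes for pair in map_phase(lines)]
word_counts = reduce_phase(shuffle(mapped))
```

In a real cluster the map step would run on the machines that hold each block of data, and only the small (word, count) pairs would travel over the network.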
How to Structure Source Data for Excel Pivot Tables & Unpivot (Acampora)
• Pivot tables are used to summarize and analyze large data sets
• Data table structure:
o Fields → columns (column header describes data in field; ex. company, region, product)
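The rule that all values of the same type belong in one column can be sketched in plain Python. The sample rows, the field names, and the unpivot and summarize helpers below are assumptions made for illustration; Excel performs these steps through its own interface.

```python
from collections import defaultdict

# Hypothetical "wide" layout: sales values are spread across one column per quarter.
wide_rows = [
    {"region": "East", "product": "Widget", "Q1": 100, "Q2": 150},
    {"region": "West", "product": "Widget", "Q1": 80, "Q2": 120},
    {"region": "East", "product": "Gadget", "Q1": 50, "Q2": 70},
]

def unpivot(rows, id_fields, value_fields, var_name, value_name):
    """Turn each value column into its own row so all values share one column."""
    long_rows = []
    for row in rows:
        for field in value_fields:
            record = {name: row[name] for name in id_fields}
            record[var_name] = field
            record[value_name] = row[field]
            long_rows.append(record)
    return long_rows

def summarize(rows, group_field, value_field):
    """Total one value field per distinct value of a grouping field."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)

long_rows = unpivot(wide_rows, ["region", "product"], ["Q1", "Q2"], "quarter", "sales")
sales_by_region = summarize(long_rows, "region", "sales")
sales_by_quarter = summarize(long_rows, "quarter", "sales")
```

Once every sales figure sits in the single sales column, the same summarize call works for any field (region, product, or quarter), which is what a pivot table relies on.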
Document Summary
• Stock prices are NOT an example of unstructured data
• According to Hurwitz, Hadoop is capable of handling unstructured data: True
• According to Wohlsen, Facebook’s recent study of its users revealed: exposure to fewer positive
messages led to fewer positive posts
• Knowing Just Enough About Relational Databases (Rosenblum and Dorsey): what makes a database
“relational”? Designed to conform to the rules of normalization (a set of practices)
• Instances (objects) – rows
• Primary key – identifies a specific instance of an entity (no two instances of an entity can
have the same primary key; values of primary keys must not be null) (ex. ID #s)
• Candidate keys – attributes that could be used as a primary key
• Identification of text to each specific entity that is mentioned (assigning correct text to
entity); sarcasm; noisy texts; factual statements that carry sentiment (objective statements
that identify as subjective)
• They’re Watching You at Work (Peck): more than 98% of the world’s info is now stored digitally;
people analytics – application of predictive analytics to people’s careers