CS431 Chapter Notes - Chapter 2: MapReduce, Memory Management, Instruction Set


Document Summary

Hadoop is still lower-level than we would like. Example: find the top 10 most visited pages in each category. Pig is a higher-level language built on MapReduce; it is slower, but we can take advantage of Pig as glue code for scale-out plumbing. There is a need to design a data processing language "from scratch".

Data-parallel dataflow languages: Pig, DryadLINQ, FlumeJava, Spark. We want to apply a series of operations to compute some result. Assumption: a static collection of records (what's the limitation?).

However, map alone solves only some classes of problems. Problem: writing to HDFS is always slow, which is the price of resiliency. A map-map chain has to go through HDFS. Having to read the data back and feed it into reduce is the fundamental drawback of reduce-reduce. The size of the input is not known until runtime.

.NET constructs for combining imperative and declarative programming. The program is compiled into computations that run on Dryad.
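The "top 10 most visited pages in each category" example can be read as a short dataflow: count visits per page, join the counts with a page-to-category table, group by category, and keep the top k in each group. The following is a minimal single-machine sketch of that dataflow in plain Python, not Pig or Hadoop code. The function name, the `(user, url)` record layout, and the `page_category` lookup table are assumptions made for illustration; the notes do not specify an input format.

```python
from collections import Counter, defaultdict
import heapq


def top_pages_per_category(visits, page_category, k=10):
    """Return the k most visited pages for each category.

    visits: iterable of (user, url) records (assumed layout).
    page_category: dict mapping url -> category (assumed lookup table).
    """
    # Step 1: group by URL and count the visits to each page.
    counts = Counter(url for _user, url in visits)

    # Step 2: join the counts with the category table, then group by category.
    by_category = defaultdict(list)
    for url, n in counts.items():
        category = page_category.get(url)
        if category is not None:  # inner join: skip pages with no category
            by_category[category].append((url, n))

    # Step 3: keep the k pages with the highest counts in each category.
    # Ties on the count are broken by URL so the result is deterministic.
    return {
        category: heapq.nlargest(k, pages, key=lambda p: (p[1], p[0]))
        for category, pages in by_category.items()
    }


if __name__ == "__main__":
    visits = [
        ("amy", "cnn.com"), ("amy", "bbc.com"), ("fred", "cnn.com"),
        ("fred", "espn.com"), ("amy", "espn.com"), ("bob", "cnn.com"),
    ]
    page_category = {"cnn.com": "news", "bbc.com": "news", "espn.com": "sports"}
    print(top_pages_per_category(visits, page_category, k=1))
```

Each step here corresponds to one operation in a dataflow language. Written directly against MapReduce, the same computation would typically need several chained jobs, with intermediate results written to HDFS between them.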
