DATA3404 Lecture Notes - Lecture 9: Associative Entity, Vr Class Sm1, Mapreduce
Document Summary
Joining distributed tables: how to join tables that are distributed across a cluster of nodes. Approach 1: local (replicated) reference tables, so that joins against them can run locally on each node.

Updating distributed data: with data partitioning, updates are handled the same way as queries; they simply go to the sites that hold the affected data. With replication, an update has to be applied to every copy, which is damaging to performance and fault tolerance. Replication is therefore best for data where reads outnumber updates, since a read can use any copy.

Total replication: all databases have identical content; any read can be done locally; simple system design; performance may suffer because every database has to apply every update.

Partial replication: selectively replicate parts of the databases; need to manage information about replica locations; need to make choices about data placement; complicated system design; performance might improve slightly.

Complete tables: each database holds some of the tables in full; easy to decide whether a local copy exists for some data; easy to reuse a standard DBMS engine for query optimization and processing; relatively simple system design.
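
The difference between updating partitioned and replicated data, and the local join that replicated reference tables allow, can be illustrated with a small in-memory sketch. This is not code from the lecture: the `Node` and `Cluster` classes, the `orders` and `countries` tables, and the modulo placement rule are all assumptions chosen for illustration.

```python
class Node:
    """One site in the cluster (hypothetical, for illustration only)."""

    def __init__(self, name, reference_table):
        self.name = name
        # Partition of the large table: order_id -> (country_code, amount)
        self.orders = {}
        # Full local copy of the small reference table: code -> country name
        self.countries = dict(reference_table)

    def local_join(self):
        # Both join inputs are on this node, so no data is shipped between sites.
        return [
            (order_id, self.countries[code], amount)
            for order_id, (code, amount) in self.orders.items()
        ]


class Cluster:
    def __init__(self, node_count, reference_table):
        self.nodes = [Node(f"node{i}", reference_table) for i in range(node_count)]

    def site_for(self, order_id):
        # Assumed placement rule: partition orders by order_id modulo node count.
        return self.nodes[order_id % len(self.nodes)]

    def upsert_order(self, order_id, code, amount):
        # Partitioned data: the update goes only to the site holding the row.
        self.site_for(order_id).orders[order_id] = (code, amount)
        return 1  # number of sites written

    def update_country(self, code, name):
        # Replicated data: every copy has to be updated.
        for node in self.nodes:
            node.countries[code] = name
        return len(self.nodes)  # number of sites written

    def join_orders_with_countries(self):
        # Each node joins its own partition locally; results are concatenated.
        rows = []
        for node in self.nodes:
            rows.extend(node.local_join())
        return sorted(rows)
```

In this sketch, `upsert_order` writes to one site while `update_country` writes to all of them, which is why replication suits tables that are read much more often than they are updated.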