CSE 124 Lecture Notes - Lecture 22: Memcached
Quantifying Performance at Scale
Typical data centers ONLY run at 10-25% utilization!
- Minimize the workload
Availability Metrics
- mean time between failures (MTBF)
- how often do you get server failures (e.g. 1/year)
- mean time to repair (MTTR)
- time to get the device replaced AND back in service
- a LOT of factors:
- how long to figure out the problem
- figure out what to DO with the problem
- figure out the replacement
- etc...
- availability = (MTBF - MTTR) / MTBF
- deduct how much time your server is “out of service”
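A quick sketch of the availability formula above, in Python. The function name and the example numbers (one failure per year, 4 hours to repair) are made up for illustration and are not from the lecture.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time in service: (MTBF - MTTR) / MTBF."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return (mtbf_hours - mttr_hours) / mtbf_hours


# Hypothetical server: fails once a year, takes 4 hours to get back in service.
baseline = availability(HOURS_PER_YEAR, 4.0)

# Two ways to improve it: repair twice as fast, or fail half as often.
faster_repair = availability(HOURS_PER_YEAR, 2.0)
fewer_failures = availability(2 * HOURS_PER_YEAR, 4.0)

print(f"baseline:       {baseline:.5%}")
print(f"halve MTTR:     {faster_repair:.5%}")
print(f"double MTBF:    {fewer_failures:.5%}")
```

Under this formula availability is 1 - MTTR/MTBF, so halving MTTR gives the same availability as doubling MTBF.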
Some Jargon…
90% - 1 9’s reliability
99% - 2 9’s
99.999% - 5 9’s
- Guaranteed to be up for the ENTIRE year except for ~5 minutes
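To see where the "~5 minutes" comes from, here is a small sketch that converts a number of 9's into allowed downtime per year. The helper name is an assumption for illustration, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` 9's."""
    unavailability = 10.0 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in (1, 2, 5):
    minutes = downtime_minutes_per_year(n)
    print(f"{n} nine(s): {minutes:.2f} minutes/year ({minutes / 60 / 24:.3f} days)")
```

One 9 allows about 36.5 days of downtime per year, two 9's about 3.65 days, and five 9's a little over 5 minutes.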
Rather than trying to eliminate failures (i.e. “improving” MTBF), data centers focus on
reducing MTTR
Note: if I only require 6/10 servers to be “available”, then I can suffer 4 failures while STILL
retaining 100% availability!
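The 6-of-10 note can be made quantitative with a rough sketch. It assumes each server is up independently with the same probability p, which is a simplifying assumption and not something stated in the lecture (real failures are often correlated). The function name is hypothetical.

```python
from math import comb


def k_of_n_availability(k: int, n: int, p: float) -> float:
    """Probability that at least k of n servers are up.

    Assumes each server is up independently with probability p.
    """
    return sum(comb(n, up) * p**up * (1 - p) ** (n - up) for up in range(k, n + 1))


# Each server is only 90% available on its own (hypothetical figure).
p = 0.90
need_all = k_of_n_availability(10, 10, p)   # service needs every server
need_six = k_of_n_availability(6, 10, p)    # service tolerates 4 failures

print(f"need 10/10: {need_all:.4f}")
print(f"need  6/10: {need_six:.4f}")
```

Requiring only 6 of 10 makes the service far more available than any single server, while requiring all 10 makes it less available than a single server.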
Harvest and Yield
Yield - queries completed / queries offered
- Deals w/ transactions
Harvest - data available / complete data
- How MUCH of the database is “reflected” in each query?
- Amount of data that ANY server can “return” back to the client
Why do we care? We want to trade off one of these metrics for the other!
When the system ends up in a “degraded” state, we want to prioritize one of these metrics over
the other!
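A minimal sketch of the two metrics as ratios, in Python. The function names, the shard counts, and the query counts are hypothetical, and "data available" is approximated here as the fraction of equally sized shards that are reachable.

```python
def yield_fraction(queries_completed: int, queries_offered: int) -> float:
    """Yield = queries completed / queries offered."""
    return queries_completed / queries_offered


def harvest_fraction(data_available: int, complete_data: int) -> float:
    """Harvest = data available / complete data."""
    return data_available / complete_data


# Hypothetical degraded state: 2 of 10 equally sized shards are unreachable.
# Policy A: answer every query from the 8 shards that are left.
policy_a = (yield_fraction(1000, 1000), harvest_fraction(8, 10))

# Policy B: only answer with complete data, rejecting 200 of 1000 queries.
policy_b = (yield_fraction(800, 1000), harvest_fraction(10, 10))

print("policy A (yield, harvest):", policy_a)
print("policy B (yield, harvest):", policy_b)
```

Policy A keeps yield at 100% and gives up harvest; policy B keeps harvest at 100% and gives up yield. Which one is right depends on the application.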
DQ Principle
Data per query * queries per second → constant
- Trade off Harvest (data/query) for Yield (queries/second), or vice versa!
- result remains the SAME
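The trade-off can be sketched numerically. All numbers below (a DQ capacity of 1000, 10 units of data per full query, 100 offered queries per second, and a failure that removes half the capacity) are invented for illustration.

```python
def queries_per_second(dq_capacity: float, data_per_query: float) -> float:
    """With D * Q fixed, the sustainable Q is DQ / D."""
    return dq_capacity / data_per_query


FULL_DQ = 1000.0     # hypothetical total capacity (data units per second)
FULL_DATA = 10.0     # data per query at 100% harvest
OFFERED_QPS = 100.0  # offered load; exactly saturates the healthy system

# A failure removes half of the capacity.
degraded_dq = FULL_DQ / 2

# Option A: keep D (harvest) fixed, so Q (and therefore yield) drops.
qps_a = queries_per_second(degraded_dq, FULL_DATA)
yield_a, harvest_a = min(1.0, qps_a / OFFERED_QPS), 1.0

# Option B: keep Q (yield) fixed, so D (harvest) drops.
data_b = degraded_dq / OFFERED_QPS
yield_b, harvest_b = 1.0, data_b / FULL_DATA

print(f"option A: yield={yield_a:.0%}, harvest={harvest_a:.0%}")
print(f"option B: yield={yield_b:.0%}, harvest={harvest_b:.0%}")
```

Either way the product D * Q equals the degraded capacity; the only choice is which factor absorbs the loss.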