STATS 10 Lecture Notes - Lecture 3: Exploratory Data Analysis, List Of Statistical Packages, Pareto Chart

10 Jun 2018
Chapter 2: Picturing Variation with Graphs
Visualizing Numerical Data
Distributions: one of the most important/fundamental conceptual tools in stats
Distribution of a sample is a list that records the values that were observed in
the data and frequencies of those values
Organize data so we can make comparisons between groups, examine data for
errors, and understand the variation in data
Helps us recognize trends, patterns, & outliers in data
E.g. what is the typical # of goals scored by a soccer player? Is 19
unusually high?
Visualizing distributions w/ pictures often more effective way of
organizing data than a table
Dotplots
Histograms
Most flexible → most widely used, but
also have most details to watch out for
Stemplots
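As a minimal sketch of the definition above (a list of the observed values and how often each occurs), the Python below tallies a sample with `collections.Counter`. The goals data is made up for illustration, and `distribution` is a hypothetical helper name, not something from the course.

```python
from collections import Counter

def distribution(values):
    """Return the sample's distribution: sorted (value, frequency) pairs."""
    counts = Counter(values)        # tally how often each value occurs
    return sorted(counts.items())   # list values in increasing order

# Hypothetical data: goals scored by 10 soccer players (made-up numbers)
goals = [2, 5, 5, 3, 0, 5, 2, 19, 3, 5]
for value, freq in distribution(goals):
    print(value, freq)
```

In this made-up sample, 5 is the most common value and 19 sits far from the rest, which is the kind of "is this unusually high?" question a distribution helps answer.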
Dotplots
Dotplot: put a dot above a number line at each value that occurs in
the data; stack the dots if a value occurs multiple times; not great
for datasets w/ lots of observations
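A rough text version of the dotplot construction, assuming integer-valued data: every integer from the minimum to the maximum gets its own column, so the spacing follows the number line, and one `*` is stacked per occurrence. The function name `dotplot` and the sample are hypothetical.

```python
from collections import Counter

def dotplot(values):
    """Text dotplot for integer data: one column per integer on the number line,
    with a '*' stacked for each occurrence of that value."""
    counts = Counter(values)
    cols = range(min(values), max(values) + 1)    # every integer, so spacing is to scale
    width = max(len(str(x)) for x in cols)        # column width fits the widest label
    rows = []
    for level in range(max(counts.values()), 0, -1):  # top of tallest stack down to 1
        cells = ["*" if counts[x] >= level else "" for x in cols]
        rows.append(" ".join(c.rjust(width) for c in cells).rstrip())
    rows.append(" ".join(str(x).rjust(width) for x in cols))  # number line labels
    return "\n".join(rows)

print(dotplot([1, 2, 2, 3, 3, 3, 5]))
# prints:
#     *
#   * *
# * * *   *
# 1 2 3 4 5
```

With many observations the stacks grow tall and the columns crowd together, which is why dotplots are not great for large datasets.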
Histograms
A (frequency) histogram: plot that groups observations into
intervals (bins)
Divide number line into bins of equal width
Count how many observations fall into each bin
Draw a vertical bar over each bin that contains observations, where the height of
each bar is proportional to the # of observations inside the bin
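The three steps above can be sketched in Python as follows, assuming the first bin starts at a chosen `start` no larger than the smallest value. Bins are half-open, so a value on a boundary lands in the bin to its right (the right-hand rule). For plain-text output the bars are drawn sideways, with length equal to the count. `bin_counts`, `text_histogram`, and the data are hypothetical.

```python
import math
from collections import Counter

def bin_counts(values, start, width):
    """Count observations in equal-width bins [start + k*width, start + (k+1)*width).

    Half-open bins put a boundary value in the bin to its right (right-hand rule).
    Assumes start <= min(values).
    """
    counts = Counter(math.floor((x - start) / width) for x in values)
    return [counts[k] for k in range(max(counts) + 1)]  # empty bins get 0

def text_histogram(values, start, width):
    """One bar per bin, drawn sideways; bar length equals the bin's count."""
    lines = []
    for k, c in enumerate(bin_counts(values, start, width)):
        lo = start + k * width
        lines.append(f"{lo:>2}-{lo + width:<2} | {'#' * c}")
    return "\n".join(lines)

# Hypothetical sample (made-up numbers), bins of width 3 starting at 3
data = [4, 7, 9, 10, 12, 12, 13, 14, 16, 20]
print(text_histogram(data, 3, 3))
```

Both 12s are counted in the 12-15 bar, not the 9-12 bar.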
Example histogram with bin width 3 (figure not shown)
Right-hand rule: an observation that falls on the boundary between two bins goes into
the right-hand bin; e.g. value 12 goes into 12-15 rather than 9-12
Not inherently better than the left-hand rule (boundary value goes into the left-hand bin) → consistency = key
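To compare the two conventions, here is a small sketch (the function `bin_interval` is hypothetical) that returns the bin a value is assigned to. The right-hand rule uses bins of the form [low, high), the left-hand rule uses (low, high]. Only values sitting exactly on a boundary are affected.

```python
import math

def bin_interval(x, start, width, rule="right"):
    """Return the (low, high) edges of the bin that x is assigned to.

    rule="right": bins are [low, high), so a boundary value joins the bin on its right.
    rule="left":  bins are (low, high], so a boundary value joins the bin on its left.
    """
    if rule == "right":
        k = math.floor((x - start) / width)
    elif rule == "left":
        k = math.ceil((x - start) / width) - 1
    else:
        raise ValueError("rule must be 'right' or 'left'")
    low = start + k * width
    return (low, low + width)

print(bin_interval(12, 0, 3, "right"))  # (12, 15)
print(bin_interval(12, 0, 3, "left"))   # (9, 12)
print(bin_interval(13, 0, 3, "left"))   # (12, 15): non-boundary values agree under both rules
```

Whichever rule is chosen, applying it to every observation is what keeps the counts comparable.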
Bin size matters: changing width of bins changes histogram shape
Small bins=lots of detail, but might miss big picture
Wide bins: better at showing large-scale patterns, but might miss important/interesting details
Default bin width of statistical software usually good starting choice
Good idea to try different bin widths to get a better
understanding of the distribution of the sample (when doing
exploratory data analysis → visualizing and summarizing data)
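A quick way to try several bin widths is to recount the same sample at each width. The sketch below assumes equal-width bins starting at 0 with the right-hand rule. The sample is made up to contain two clusters, and `bin_counts` is a hypothetical helper.

```python
import math
from collections import Counter

def bin_counts(values, start, width):
    """Counts per equal-width bin, right-hand rule; assumes start <= min(values)."""
    counts = Counter(math.floor((x - start) / width) for x in values)
    return [counts[k] for k in range(max(counts) + 1)]

# Hypothetical sample (made-up numbers) with two clusters and one far value
data = [1, 2, 2, 3, 3, 3, 4, 4, 9, 10, 10, 11, 11, 11, 12, 18]
for width in (1, 5, 10):
    print(width, bin_counts(data, 0, width))
```

Width 1 shows every gap and bump. Width 5 still separates the two clusters and the value 18. Width 10 merges almost everything into two bars, so the gap between the clusters disappears.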
Histograms can be used to visualize frequencies or relative frequencies of
observations
Frequency histogram: displays frequency (counts) of observations in
each bin
Relative frequency histogram: displays relative frequency
(proportion) of observations in each bin
Relative frequencies do not change the shape of the histogram, but
give us a way to answer questions about proportions rather than raw #s
(good for comparing 2 samples w/ different numbers of observations)
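A minimal sketch of relative frequencies: each bin's count is divided by the sample size n. Because every bar is scaled by the same 1/n, the shape is unchanged, and samples of different sizes become comparable. The two samples and the helper `relative_freqs` are hypothetical. Both samples use the same bins (start 0, width 3, right-hand rule).

```python
import math
from collections import Counter

def relative_freqs(values, start, width, n_bins):
    """Proportion of observations in each of n_bins equal-width bins (right-hand rule)."""
    counts = Counter(math.floor((x - start) / width) for x in values)
    n = len(values)
    return [counts[k] / n for k in range(n_bins)]

# Two hypothetical samples of different sizes (made-up numbers)
small = [1, 4, 4, 7]              # n = 4, raw counts per bin: 1, 2, 1
large = [1, 2, 4, 5, 5, 7, 8, 8]  # n = 8, raw counts per bin: 2, 3, 3
print(relative_freqs(small, 0, 3, 3))  # [0.25, 0.5, 0.25]
print(relative_freqs(large, 0, 3, 3))  # [0.25, 0.375, 0.375]
```

The larger sample has more observations in every bin, but in proportion the smaller sample is more concentrated in the middle bin (3-6).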
Stemplots
find more resources at oneclass.com