STAT1008 Lecture Notes - Lecture 8: Interquartile Range, Quartile, Data General
STAT1008 Week 3 Lecture B
● Skewness and Center:
○ A distribution is left-skewed. Which measure of center would you expect to
be higher? The median. The mean is pulled down toward the skew (toward the
long left tail), so it ends up below the median
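A small check of the left-skew rule, as a sketch only: the dataset below is made up (an assumption, not from the lecture) and has most values bunched high with a few small values forming a long left tail. It uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical left-skewed sample: most values are high, a few small values form the left tail
data = [2, 15, 30, 41, 44, 46, 47, 48, 49, 50]

mean = statistics.mean(data)
median = statistics.median(data)

print(f"mean = {mean}, median = {median}")
# The small values in the left tail drag the mean below the median
print("median > mean:", median > mean)
```

Here the mean is 37.2 and the median is 45, matching the expectation that the median is the higher measure for left-skewed data.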
● Resistance:
○ A statistic is resistant if it is relatively unaffected by extreme values
○ The median is resistant while the mean is not
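To see resistance in action, a minimal sketch (with hypothetical numbers) replaces the largest value in a small sample by an extreme value and compares how far the mean and the median move.

```python
import statistics

# Hypothetical sample of 7 values
original = [10, 12, 13, 14, 15, 16, 18]
# Same sample with the largest value replaced by an extreme value
extreme = [10, 12, 13, 14, 15, 16, 180]

for label, values in [("original", original), ("with extreme value", extreme)]:
    print(f"{label}: mean = {statistics.mean(values):.2f}, "
          f"median = {statistics.median(values)}")

# How much each statistic changed because of the single extreme value
mean_shift = statistics.mean(extreme) - statistics.mean(original)
median_shift = statistics.median(extreme) - statistics.median(original)
print(f"mean moved by {mean_shift:.2f}, median moved by {median_shift}")
```

The mean jumps from 14 to about 37.14, while the median stays at 14: the median is resistant, the mean is not.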
● Outlier:
○ An outlier is an observed value that is notably distinct from the other
values in a dataset
○ When using statistics that are not resistant to outliers, stop and think about
whether the outlier is a mistake
○ If not, you have to decide whether the outlier is part of your population of
interest or not
○ Usually, for outliers that are not mistakes, it’s best to run the analysis
twice, once with the outlier(s) and once without, to see how much the
outlier(s) affect the results
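The "run it twice" advice can be sketched as follows. The data and the choice of 95 as a genuine (not mistaken) outlier are assumptions for illustration; the helper `summarize` is a hypothetical name, and the summary it reports is just the mean and the sample standard deviation.

```python
import statistics

def summarize(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical data; 95 is assumed to be a real observation, not a recording error
data = [21, 23, 24, 25, 25, 26, 27, 95]
without_outlier = [x for x in data if x != 95]

# Run the same analysis with and without the outlier
with_mean, with_sd = summarize(data)
without_mean, without_sd = summarize(without_outlier)

print(f"with outlier:    mean = {with_mean:.2f}, s = {with_sd:.2f}")
print(f"without outlier: mean = {without_mean:.2f}, s = {without_sd:.2f}")
```

Comparing the two lines shows how strongly one value drives these non-resistant statistics; if the conclusions change a lot, that is worth reporting.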
● Standard deviation:
○ Standard deviation for a quantitative variable measures the spread of the
data
○ Sample standard deviation: s
○ Population standard deviation: sigma
○ Gives a rough estimate of the typical distance of a data value from the
mean
○ The larger the standard deviation, the more variability there is in the data
and the more spread out the data are
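A sketch of the two versions of the standard deviation, using a made-up dataset. Python's `statistics.stdev` gives the sample value s (sum of squared deviations divided by n - 1) and `statistics.pstdev` gives the population value sigma (divided by n); the by-hand lines just reproduce s to show what is being computed.

```python
import math
import statistics

# Hypothetical data values
data = [4, 8, 6, 5, 3, 7]

# Sample standard deviation s (divides the sum of squared deviations by n - 1)
s = statistics.stdev(data)

# Population standard deviation sigma (divides by n; use when the data are the whole population)
sigma = statistics.pstdev(data)

# The same s computed by hand, to show what stdev does
mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]
s_by_hand = math.sqrt(sum(squared_deviations) / (len(data) - 1))

print(f"s = {s:.4f}, sigma = {sigma:.4f}, s by hand = {s_by_hand:.4f}")
```

For the same data, s is always a little larger than sigma because it divides by the smaller number n - 1.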
● 95% Rule:
○ If the distribution of the data is approximately bell-shaped, about 95% of the
data values should fall within two standard deviations of the mean
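One way to check the 95% rule is by simulation. This sketch assumes a normal (bell-shaped) distribution with an arbitrary mean of 100 and standard deviation of 15, generated with `random.gauss`; it counts the proportion of values within two sample standard deviations of the sample mean.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

# Simulated bell-shaped data (mean 100 and sd 15 are arbitrary choices)
data = [random.gauss(100, 15) for _ in range(100_000)]

xbar = statistics.mean(data)
s = statistics.stdev(data)

# Count how many values fall within two standard deviations of the mean
within = sum(1 for x in data if xbar - 2 * s <= x <= xbar + 2 * s)
proportion = within / len(data)

print(f"proportion within 2 SDs: {proportion:.3f}")
```

The printed proportion should land close to 0.95; for data that are far from bell-shaped, the rule need not hold.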
● Z score:
○ The z-score for a data value x is $z = \frac{x - \bar{x}}{s}$
○ For a population, $\bar{x}$ is replaced by $\mu$ and $s$ is replaced by
$\sigma$: $z = \frac{x - \mu}{\sigma}$
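○ Puts values on a common scale (number of standard deviations from the
mean), so values from different distributions can be compared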
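A sketch of using z-scores to compare values from two different distributions. The two exam summaries are hypothetical, and `z_score` is an illustrative helper: pass x-bar and s for a sample, or mu and sigma for a population.

```python
def z_score(x, center, spread):
    """How many standard deviations x lies above (positive) or below (negative) the center.

    For a sample, pass x-bar and s; for a population, pass mu and sigma.
    """
    return (x - center) / spread

# Hypothetical summaries for two different exams
exam_a = {"mean": 70, "sd": 5}
exam_b = {"mean": 80, "sd": 10}

z_a = z_score(80, exam_a["mean"], exam_a["sd"])  # a score of 80 on exam A
z_b = z_score(85, exam_b["mean"], exam_b["sd"])  # a score of 85 on exam B

print(f"z for 80 on exam A: {z_a:.2f}")
print(f"z for 85 on exam B: {z_b:.2f}")
```

The raw score of 80 on exam A is 2 standard deviations above its mean, while 85 on exam B is only 0.5 above its mean, so on the common z scale the lower raw score is the more unusual result.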