STAT1008 Lecture Notes - Lecture 8: Interquartile Range, Quartile, Data General

whitegoat396

30 May 2018

STAT1008 Week 3 Lecture B

● Skewness and Center:

○ A distribution is left-skewed. Which measure of center would you expect to be higher? The median. The mean is pulled down toward the long left tail, so it falls below the median
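
A minimal sketch of the skewness point, using made-up exam scores (the data and variable names are assumptions for illustration, not from the lecture). Most scores are high and a few trail off to the left, so the mean lands below the median:

```python
from statistics import mean, median

# Hypothetical left-skewed exam scores: most are high, a few trail off low.
scores = [35, 52, 78, 82, 85, 86, 88, 90, 91, 93]

score_mean = mean(scores)      # pulled down by the low values in the left tail
score_median = median(scores)  # middle of the sorted data, ignores how far the tail reaches

print("mean:", score_mean)      # 78
print("median:", score_median)  # 85.5
```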

● Resistance:

○ A statistic is resistant if it is relatively unaffected by extreme values

○ The median is resistant while the mean is not
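
To illustrate resistance, the sketch below changes one value in a small made-up dataset to an extreme value and measures how far each statistic moves. The income figures are hypothetical:

```python
from statistics import mean, median

# Hypothetical incomes (in thousands). The second list replaces the
# largest value with an extreme one; everything else is unchanged.
incomes = [30, 35, 40, 45, 50]
with_extreme = [30, 35, 40, 45, 500]

mean_shift = mean(with_extreme) - mean(incomes)        # the mean moves a lot
median_shift = median(with_extreme) - median(incomes)  # the median does not move

print("mean shift:", mean_shift)      # 90
print("median shift:", median_shift)  # 0
```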

● Outlier:

○ An outlier is an observed value that is notably distinct from the other values in a dataset

○ When using statistics that are not resistant to outliers, stop and think about

whether the outlier is a mistake

○ If not, you have to decide whether the outlier is part of your population of

interest or not

○ Usually, for outliers that are not a mistake, it's best to run the analysis twice, once with the outlier(s) and once without, to see how much the outlier(s) affect the results
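
One way to run an analysis twice is sketched below. The commute times, the choice of 98 as the suspected outlier, and the helper `summarize` are all assumptions made for this example; deciding which values count as outliers is a judgment call the code does not make for you:

```python
from statistics import mean, stdev

def summarize(values):
    """Return the sample size, mean, and sample standard deviation."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}

# Hypothetical commute times in minutes; 98 is the suspected outlier.
commute_minutes = [18, 20, 21, 22, 22, 23, 25, 98]
suspected = {98}

with_outlier = summarize(commute_minutes)
without_outlier = summarize([v for v in commute_minutes if v not in suspected])

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

Comparing the two summaries side by side shows how much of the result depends on the single extreme value.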

● Standard deviation:

○ Standard deviation for a quantitative variable measures the spread of the

data

○ Sample standard deviation: s

○ Population standard deviation: σ (sigma)

○ Gives a rough estimate of the typical distance of a data value from the

mean

○ The larger the standard deviation, the more variability there is in the data and the more spread out the data are
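
The notes name s and σ without giving formulas. The sketch below uses the standard definitions (divide the summed squared deviations by n - 1 for a sample, by n for a population) and checks them against Python's `statistics.stdev` and `statistics.pstdev`. The data values are made up:

```python
from math import sqrt
from statistics import mean, stdev, pstdev

# Hypothetical data values.
data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = mean(data)
squared_devs = sum((x - x_bar) ** 2 for x in data)  # total squared distance from the mean

s = sqrt(squared_devs / (len(data) - 1))  # sample standard deviation (divide by n - 1)
sigma = sqrt(squared_devs / len(data))    # population standard deviation (divide by n)

print("s:", s, "library:", stdev(data))
print("sigma:", sigma, "library:", pstdev(data))
```

Whether to use s or σ depends on whether the values are treated as a sample or as the whole population; s is always slightly larger for the same data.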

● 95% Rule:

○ If the distribution of the data is approximately bell-shaped, about 95% of the data should fall within two standard deviations of the mean
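
The 95% rule can be checked on simulated data. This sketch assumes bell-shaped values drawn from a normal distribution with an arbitrary mean of 100 and standard deviation of 15 (both numbers are illustrative), then counts the share of values inside the interval from two standard deviations below the mean to two above it:

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the simulation is repeatable

# Simulated, roughly bell-shaped data (parameters chosen only for illustration).
values = [random.gauss(100, 15) for _ in range(10_000)]

x_bar = mean(values)
s = stdev(values)
lower, upper = x_bar - 2 * s, x_bar + 2 * s

# Fraction of values that fall within two standard deviations of the mean.
share_within = sum(lower <= v <= upper for v in values) / len(values)

print(f"interval: ({lower:.1f}, {upper:.1f})")
print(f"share within two standard deviations: {share_within:.3f}")
```

The printed share should come out close to 0.95. For strongly skewed data the rule should not be expected to hold.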

● Z score:

○ The z-score for a data value, x, is z = (x - x̄) / s

○ For a population, x̄ is replaced by μ (mu) and s is replaced by σ (sigma): z = (x - μ) / σ

○ Puts values on a common scale
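
A short sketch of the z-score formula, applied to two made-up variables measured in different units. The helper `z_score` and both datasets are assumptions for illustration. Because z-scores are unitless, a height and a weight can be compared directly:

```python
from statistics import mean, stdev

def z_score(x, values):
    """Number of sample standard deviations that x lies above the sample mean."""
    return (x - mean(values)) / stdev(values)

# Hypothetical samples in different units.
heights_cm = [160, 165, 170, 175, 180]
weights_kg = [55, 60, 70, 80, 85]

z_height = z_score(180, heights_cm)  # how unusual is 180 cm in this sample?
z_weight = z_score(85, weights_kg)   # how unusual is 85 kg in this sample?

print(f"z for 180 cm: {z_height:.2f}")
print(f"z for 85 kg:  {z_weight:.2f}")
```

A value equal to the mean has a z-score of 0, and the larger of the two z-scores marks the value that is more unusual relative to its own sample.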
