STAT 2103 – Lecture 21 – Simple Linear Regression Cont’d. + Regression Part 2
Outliers and Influential Observations
• Outliers lie outside the overall pattern
• Influential observations markedly change the correlation and regression results when they
are removed
• Influential observations are often outliers in the x direction
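A minimal sketch of the last two points, using made-up data (the numbers below are hypothetical, not from the lecture): five points with no linear trend plus one point far out in the x direction. The helper name fit_line is our own. Fitting with and without the extreme point shows how much a single influential observation can move the slope and the correlation.

```python
from math import sqrt

def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r

# Hypothetical data: the last point (x = 20) is an outlier in the x direction.
xs = [1, 2, 3, 4, 5, 20]
ys = [3, 1, 4, 1, 3, 15]

with_point = fit_line(xs, ys)
without_point = fit_line(xs[:-1], ys[:-1])

print("with the x-outlier:    slope = %.3f, r = %.3f" % with_point[1:])
print("without the x-outlier: slope = %.3f, r = %.3f" % without_point[1:])
```

In this constructed example the strong correlation comes almost entirely from the one extreme point; removing it leaves essentially no linear relationship.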
Cautions
• Extrapolation
o Use of a regression line for prediction outside of the range of x values is often
inaccurate
• Using averaged data
o Correlations based on averages are usually higher than correlations based on
individuals
• Lurking variables
o A variable that is not included in a study may have an important effect on the
relationship of the variables studied
• Association is not causation
o Even if an association is very strong, this is not by itself good evidence that a change
in x will cause a change in y
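The averaged-data caution above can be illustrated with a small constructed example (hypothetical numbers, not from the lecture). Each of five groups has three individuals scattered symmetrically around the line y = 2x, so the group averages fall exactly on the line while the individuals do not. The helper name correlation is our own.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Hypothetical data: 5 groups, 3 individuals per group.
group_x = [1, 2, 3, 4, 5]
offsets = [-3, 0, 3]  # individual deviations from the line y = 2x

# Individual-level data: every individual in a group shares the group's x.
ind_x = [x for x in group_x for _ in offsets]
ind_y = [2 * x + d for x in group_x for d in offsets]

# Group-level data: one averaged y per group.
avg_y = [sum(2 * x + d for d in offsets) / len(offsets) for x in group_x]

r_individuals = correlation(ind_x, ind_y)
r_averages = correlation(group_x, avg_y)

print("r based on individuals: %.3f" % r_individuals)
print("r based on averages:    %.3f" % r_averages)
```

Averaging removes the within-group scatter, which is why the correlation computed from averages overstates how closely individuals follow the line.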
Extrapolation
Use of a regression line for prediction outside of the range of x values
(Slide 26)
Use the regression equation to predict the number of people living on farms in 2000.
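[Figure: scatterplot of Population (vertical axis, ticks at 10, 20, 30) against Year (horizontal axis, 1940 to 1980), with the fitted regression line]
Regression line: Y = 1166.93 - 0.59X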
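As a rough check of the year-2000 prediction, the sketch below plugs X = 2000 into the line Y = 1166.93 - 0.59X, taking the coefficients exactly as printed on the slide. The function name is our own. Because X is near 2000, a change of only 0.005 in the slope shifts the prediction by 10, so the numeric value should be treated as approximate.

```python
def predict_farm_population(year):
    """Prediction from the slide's line Y = 1166.93 - 0.59X (coefficients as printed)."""
    return 1166.93 - 0.59 * year

# 2000 lies outside the observed range of years (1940 to 1980): extrapolation.
prediction_2000 = predict_farm_population(2000)
print("Predicted farm population in 2000:", round(prediction_2000, 2))
```

With the printed coefficients the prediction is negative, which is impossible for a population count. The line describes the trend only within the range of years that were observed.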
Lurking Variables
• x = population density
• y = lack of indoor plumbing
• red points = public housing
Regression Part 2:
Output
Regression Analysis
The regression equation is
Cost = 2272 + 51.7 Number
Predictor      Coef    StDev      T      P
Constant     2272.1    243.3   9.34  0.000
Number       51.661    7.347   7.03  0.000
S = 198.6 R-Sq = 75.5% R-Sq(adj) = 74.0%
Analysis of Variance
Source           DF       SS       MS      F      P
Regression        1  1949616  1949616  49.44  0.000
Residual Error   16   630957    39435
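The entries in this output are tied together by standard formulas, and the sketch below recomputes them from the printed values as a consistency check: T = Coef / StDev, MS = SS / DF, F = MS(Regression) / MS(Residual Error), S = sqrt(MS(Residual Error)), and R-Sq = SS(Regression) / SS(Total). The total sum of squares is taken as the sum of the two SS rows shown; the variable names are our own.

```python
from math import sqrt

# Values copied from the regression output.
coef_const, sd_const = 2272.1, 243.3
coef_number, sd_number = 51.661, 7.347
ss_regression, df_regression = 1949616, 1
ss_error, df_error = 630957, 16

# t statistics: coefficient divided by its standard deviation.
t_const = coef_const / sd_const
t_number = coef_number / sd_number

# Mean squares and the F statistic from the ANOVA table.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_stat = ms_regression / ms_error

# S is the square root of the residual mean square.
s = sqrt(ms_error)

# R-Sq and adjusted R-Sq, with SS(Total) = SS(Regression) + SS(Residual Error).
ss_total = ss_regression + ss_error
df_total = df_regression + df_error
r_sq = ss_regression / ss_total
r_sq_adj = 1 - ms_error / (ss_total / df_total)

print("T (Constant) = %.2f, T (Number) = %.2f" % (t_const, t_number))
print("MS error = %.0f, F = %.2f, S = %.1f" % (ms_error, f_stat, s))
print("R-Sq = %.1f%%, R-Sq(adj) = %.1f%%" % (100 * r_sq, 100 * r_sq_adj))
```

With a single predictor, F equals the square of the slope's t statistic, which the printed values reflect up to rounding (7.03 squared is about 49.4).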