KHA350 Lecture Notes - Lecture 9: Multicollinearity, Dependent And Independent Variables, Analysis Of Variance

Research methods week 9: Multiple regression analysis
Last week:
- How much of a shared relationship there is
- Best fit for the relationship
- Use this line to predict scores
- One predictor variable trying to explain the degree of relationship:
  o More important to think about the amount of shared variation
  o Because correlation doesn't imply causation
- When you think about individuals:
  o Many things contribute to their behaviours and patterns of performance
  o A more realistic picture includes the multiple influences on behaviour, e.g. memory function
  o Need to take into account all the other things that have an influence on behaviour
Multiple regression:
- More than one predictor variable (IV)
  o Influence of multiple predictors
  o The relationship between these and the DV
- Often a better reflection of real-world influences on behaviour than using a single predictor variable
  o Can examine the importance of multiple predictors at the same time
  o And provides an assessment of the importance of each variable in the context of the other predictors
- This analysis works in the same way as simple linear regression (a minimal sketch follows below)
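The lecture works in SPSS, but as a rough illustration of the idea, here is a minimal Python sketch of a multiple regression with two predictors. The variable names and data are invented for the example, not from the lecture.

```python
# Minimal multiple regression sketch with two predictors (simulated data;
# the variable names are hypothetical, not from the lecture).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
sleep_hours = rng.normal(7, 1, n)                  # hypothetical IV 1
study_hours = rng.normal(3, 1, n)                  # hypothetical IV 2
memory_score = 10 + 2 * sleep_hours + 3 * study_hours + rng.normal(0, 2, n)  # DV

X = sm.add_constant(np.column_stack([sleep_hours, study_hours]))
model = sm.OLS(memory_score, X).fit()
print(model.summary())   # coefficients, t-tests for each predictor, R-squared
```

Each coefficient is estimated while holding the other predictor constant, which is the sense in which multiple regression assesses the importance of each variable in the context of the other predictors.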
Assumptions:
- If the data do not meet these assumptions, this can lead to under- or overestimation of the strength of the relationship
  o This may lead to misleading models
- USE THE CORRECT VARIABLE FOR THE TECHNIQUE
  o For linear models, you want interval-level data
    Equal distances between each point on the scale
- Independence of data, and of error terms
  o Each person should participate only once
- Sample size and normality: of variables and residuals
  o Larger samples improve reliability and dispersion, and reduce the influence of outliers
    More than 100 is preferable
    Larger samples reduce the effect of outliers
  o Under-dispersion reduces the size of correlations
  o Outliers may have a strong influence on models
  o Normality:
    Get plots of the residuals and plot them against the normal distribution
    Standardized solutions: SPSS does this (sketched below)
      Transforming data to the z distribution
      Mean = 0, SD = 1
      Probabilities under normality: 68% within 1 SD, 95% within 2 SDs, etc.
      Probability plot with the normal plot overlaid
    QQ plot: plots the expected cumulative probability of a score against the observed cumulative probability of that score (a sketch follows after this sub-section)
      Easier to read
      If normal: the points will follow the straight line
      Non-normal: large deviation from where we would expect the scores to be
    Influences: if your residuals are non-normal, this will throw off the t-tests of significance for each parameter
      The SD will be skewed
    Deal with this through other programs
      Use more robust techniques
      Account for the violation of normality
      Or manually transform the data
      Bootstrapping: takes the violation into account
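As a sketch of the two ideas above (standardizing to z-scores and reading a Q-Q plot), assuming you have a vector of residuals from a fitted model; the residuals here are simulated for illustration.

```python
# Sketch: standardize residuals to z-scores (mean 0, SD 1) and draw a
# Q-Q plot against the normal distribution. Simulated residuals stand in
# for the residuals a real model would produce.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
residuals = rng.normal(0, 2, 150)            # stand-in for model residuals

z = (residuals - residuals.mean()) / residuals.std(ddof=1)  # mean 0, SD 1
print((np.abs(z) < 1).mean())   # roughly 0.68 if the residuals are normal

stats.probplot(z, dist="norm", plot=plt)     # points should hug the line
plt.title("Q-Q plot of standardized residuals")
plt.show()
```

If the points bow away from the line, that is the cue to transform the data or fall back on the more robust/bootstrapped approaches noted above.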
  o Outliers:
    Too much dispersion isn't a good thing either
    Outside of the normal distribution
    Because the linear regression line is calculated by minimizing the sums of the squared differences between observed and predicted values, outliers can have a big influence
    Removing outliers:
      How can you justify getting rid of them?
      Some researchers take stronger approaches than others
    This problem is particularly bad for multiple regression
    Can be tested with advanced diagnostics (a sketch follows after this section):
      Leverage values
      Cook's distance
    Testing for outliers:
      Basic approach:
        o Any residual value >3 SDs from the mean
      Mahalanobis distance:
        o Distance of a case from the means of the predictors
        o Cut-off for a multivariate outlier (p < 0.001) depends on the number of predictors
        o People might be outliers on one variable but not on the others; this statistic captures multivariate outliers
        o Calculated automatically by SPSS
        o The cut-off changes with the number of predictors in your model:
          2 predictors: 13.82; 3: 16.27; 4: 18.47; 5: 20.52; 6: 22.46; 7: 24.32
      Cook's distance:
        o A measure of the influence of one case/participant on the model as a whole
        o Values >1 may be a concern
        o Automatically calculated by SPSS
        o Based on a number of different measures
      Leverage:
        o The influence of an observed outcome value on the predicted outcome values: similar in spirit to Cook's distance
        o Range 0-1
        o Values >3(k + 1)/n may be a concern
          k = number of predictors
          n = number of participants
      Standardized DFFit:
        o Would removing one case produce a substantial change to the parameters in the model?
        o Values >2 indicate a case is having a substantial impact on your model
    In SPSS:
      Tick the boxes for collinearity diagnostics
      For each case, examine the residuals
        Flag if outside 3 SDs
      Ask for each of the three influence measures:
        o Cook's distance
        o Mahalanobis distance
        o Standardized DFFit
      Output:
        o Tells you the smallest and biggest standardized residuals
        o Generates new columns in the data set for each measure
        o Use the min and max to work out if any cases are above the cut-offs
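The same diagnostics SPSS produces can be sketched outside SPSS; here is an illustration with statsmodels on simulated data, applying the cut-offs quoted in these notes (|standardized residual| > 3, Cook's distance > 1, leverage > 3(k+1)/n, |DFFit| > 2, Mahalanobis > 13.82 for two predictors at p < .001).

```python
# Sketch: outlier/influence diagnostics on simulated data, mirroring the
# SPSS output described above (statsmodels, not SPSS; data are invented).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import OLSInfluence

rng = np.random.default_rng(2)
n, k = 100, 2                                 # n participants, k predictors
X_raw = rng.normal(size=(n, k))
y = X_raw @ np.array([1.5, -0.8]) + rng.normal(size=n)

model = sm.OLS(y, sm.add_constant(X_raw)).fit()
infl = OLSInfluence(model)

std_resid = infl.resid_studentized_internal   # flag |value| > 3
cooks_d = infl.cooks_distance[0]              # flag values > 1
leverage = infl.hat_matrix_diag               # flag values > 3*(k+1)/n
dffits = infl.dffits[0]                       # standardized DFFit; flag |value| > 2

# Mahalanobis distance can be recovered from leverage as (n - 1)*(h - 1/n);
# compare with the cut-off quoted above (13.82 for 2 predictors, p < .001).
mahal = (n - 1) * (leverage - 1 / n)

print("Any |standardized residual| > 3:", bool((np.abs(std_resid) > 3).any()))
print("Any Cook's distance > 1:", bool((cooks_d > 1).any()))
print("Leverage cut-off 3(k+1)/n =", 3 * (k + 1) / n)
print("Any leverage over cut-off:", bool((leverage > 3 * (k + 1) / n).any()))
print("Any |DFFit| > 2:", bool((np.abs(dffits) > 2).any()))
print("Any Mahalanobis > 13.82:", bool((mahal > 13.82).any()))
```

As in SPSS, each diagnostic is a per-case column; the min/max trick above (checking whether any case exceeds a cut-off) is the same workflow as scanning the new columns SPSS generates.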
- Linearity: non-linear relationships can mistakenly produce zero correlations
  o As we found last week, trying to fit a straight line through a relationship that is actually curvilinear
  o Produces a poorer fit to the data
  o Underestimates the size of the relationship
  o Draw scatterplots that look at the relationship between each IV and the DV
  o If not linear, transform the data so that it is linear (a sketch follows below)
    Transformations are in the appendices of the workbook
  o Leptokurtic: an under-dispersed distribution; non-normal with a small range of scores
    Line of best fit: hard to see where the line should be
    A restricted range produces a small correlation
    At a greater range, even though you are measuring the same thing, the bigger spread makes it easier to see where to draw the line
    Correlation accuracy improves with a greater range of scores
    A smaller range of scores underestimates the size of the correlation between variables
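A small sketch with invented data illustrates both points above: a log transform straightens a curvilinear relationship (raising the correlation), and restricting the range of scores shrinks it.

```python
# Sketch: a straight line understates a curvilinear relationship; after a
# log transform the correlation is higher. Restricting the range of x also
# shrinks the correlation. Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 200)
y = np.exp(0.4 * x + rng.normal(0, 0.2, 200))   # curvilinear, always positive

print("r with raw y:        ", np.corrcoef(x, y)[0, 1])
print("r with log(y):       ", np.corrcoef(x, np.log(y))[0, 1])

# Restriction of range: the same relationship, measured on a narrow slice
# of x, produces a noticeably smaller correlation.
mask = (x > 4) & (x < 6)
print("r, restricted range: ", np.corrcoef(x[mask], np.log(y)[mask])[0, 1])
```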