STAT3012 Lecture Notes - Lecture 10: Semiparametric Regression, Box Plot, Google
Lecture 10 - Variable selection: Backward and forward
New concepts
✷Backward variable selection
✷The drop1 and update command
✷Forward variable selection
✷The add1 command
Applied Linear Models: Lecture 10 1
New topic – Variable selection
Motivation
✷Hypothesis testing:
Aims to test for redundancy/non-redundancy of
◦single explanatory variables and
◦groups of explanatory variables.
Reason: Wanting to test hypotheses about a given group of covariates.
✷Statistical learning:
What is the ‘best’ group of explanatory variables for describing and/or predicting
the response?
Theory – Possible subsets
✷Let m denote any subset of p_m distinct elements from {1, ..., p}.
Remark: Typically the intercept is forced to be part of the model.
✷Let M denote a set of linear regression models for the relationship between Y
and X.
Remark: Often M is reduced by preselection.
Example – Three explanatory variables (k = 3)
✷Counting the intercept, X has p = k + 1 = 4 columns, so there are 2^4 = 16
distinct subsets of {1, 2, 3, 4}: ∅, {1}, {2}, {1, 2}, {3}, ..., {1, 2, 3, 4}.
✷If the intercept is forced to be part of the model, then there are 2^(4−1) = 2^k = 8
possible subsets.
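The two counts in this example can be checked by brute-force enumeration. The sketch below is illustrative only and is not part of the lecture material: the helper `all_subsets` is a hypothetical name, and treating index 1 as the intercept column is an assumption, since the slides do not say which index the intercept occupies.

```python
from itertools import combinations


def all_subsets(indices):
    """Return every subset of `indices` as a sorted tuple, smallest subsets first."""
    indices = sorted(indices)
    return [
        subset
        for size in range(len(indices) + 1)
        for subset in combinations(indices, size)
    ]


k = 3          # number of explanatory variables
p = k + 1      # columns of X once the intercept is counted
columns = range(1, p + 1)

subsets = all_subsets(columns)

# Assumption: index 1 is the intercept column of X.
INTERCEPT = 1
with_intercept = [m for m in subsets if INTERCEPT in m]

print(len(subsets))         # 2**p = 16
print(len(with_intercept))  # 2**(p - 1) = 2**k = 8
```

Forcing the intercept into the model fixes one index, so only the remaining k indices are free to be in or out. That halves the count from 2^p to 2^k, and it also shows why exhaustive search becomes impractical as k grows.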
For simplicity we use m as an abbreviation for the linear regression model of Y on those columns of X indexed by m.
The linear regression model m is given by y_i = β_0 + x ...