RSM412H1 Lecture Notes - Lecture 8: Overfitting, Heteroscedasticity

18 Mar 2020
Document Summary

March 5, 2020: Decision trees

- Tree-based models are like playing 20 questions: a decision tree uses a set of binary rules to calculate a target value.
- Can be used for classification (categorical target variable) or regression (continuous target variable).
- Predictions are obtained by fitting a simpler model in each region, usually a constant such as the average response value in that region.
- Decision trees typically lack predictive performance compared to more complex algorithms like neural networks and MARS. Ensemble algorithms can overcome this weakness; they are constructed by combining many decision trees together, e.g. random forests and gradient boosting machines.
- Gini index: measures the probability of a randomly chosen element being wrongly classified. Varies between 0 and 1: 0 means all elements belong to a single class, 1 means elements are randomly distributed among the various classes. Choose the feature with the lowest Gini index as the root node.
- Pruning: incorporate a cost complexity parameter that penalizes the number of terminal nodes of the optimal subtree. A smaller penalty produces more complex models, thus larger trees.
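The Gini calculation above can be sketched in a few lines of plain Python. This is a minimal illustration, not the lecture's own code; the function names `gini_index` and `weighted_gini` are hypothetical. It shows why a pure node scores 0, a 50/50 binary node scores 0.5, and why a split is chosen by the lowest weighted Gini of its child nodes.

```python
from collections import Counter

def gini_index(labels):
    # Probability that a randomly chosen element would be misclassified
    # if labeled according to the node's class distribution:
    # G = 1 - sum(p_k^2) over classes k. 0 = pure node.
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(groups):
    # Gini of a candidate split: child-node impurities weighted by size.
    # The split (or root feature) with the lowest value is preferred.
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini_index(g) for g in groups)

pure = gini_index(["yes"] * 10)                    # 0.0: all one class
mixed = gini_index(["yes"] * 5 + ["no"] * 5)       # 0.5: evenly mixed
perfect_split = weighted_gini([["yes"] * 5, ["no"] * 5])  # 0.0
```

A perfect split sends each class to its own child node, so both children are pure and the weighted Gini is 0, which is why the feature with the least Gini index is placed at the root.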
