CEE 5244 Lecture Notes - Lecture 57: C4.5 Algorithm, Gini Coefficient


Document Summary

- Continuous attributes: computing the Gini index. The number of possible splitting values equals the number of distinct values of the attribute; each splitting value v has a count matrix associated with it, holding the class counts in each of the two partitions, a < v and a >= v.
- Simple method to choose the best v: for each v, scan the database to gather the count matrix and compute its Gini index. This repeats work for every candidate.
- Efficient computation: for each attribute, sort the records on the attribute's values, then linearly scan these values, each time updating the count matrix incrementally and computing the Gini index.
- Entropy measures the homogeneity of a node:
  - Maximum (log n_c) when records are equally distributed among all n_c classes, implying least information.
  - Minimum (0.0) when all records belong to one class, implying most information.
- Entropy-based computations are similar to the Gini index computations (examples for computing entropy; splitting based on INFO).
- Information gain:

  GAIN_split = Entropy(p) - sum_{i=1}^{k} (n_i / n) * Entropy(i)

  where the parent node p is split into k partitions and n_i is the number of records in partition i. Information gain measures the reduction in entropy achieved because of the split.
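The computations summarized above can be sketched in Python. This is a minimal illustration, not the notes' own code: the function names, the midpoint choice of candidate split values, and the use of Counter are assumptions made for the example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a list of class labels: -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent_labels, partitions):
    """GAIN_split = Entropy(parent) - sum_i (n_i / n) * Entropy(partition i)."""
    n = len(parent_labels)
    return entropy(parent_labels) - sum(
        (len(p) / n) * entropy(p) for p in partitions
    )

def gini(counts):
    """Gini index from a mapping of class -> count: 1 - sum_i p_i^2."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_continuous_split(values, labels):
    """Efficient scheme from the notes: sort on the attribute, then do a
    single linear scan, updating the two count matrices incrementally
    instead of rescanning the database for every candidate v."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    left, right = Counter(), Counter(labels)
    n = len(values)
    best_v, best_gini = None, float("inf")
    for j in range(n - 1):
        i = order[j]
        left[labels[i]] += 1   # record i moves from the right partition...
        right[labels[i]] -= 1  # ...into the left partition
        # Only positions where the attribute value changes are valid splits.
        if values[order[j]] == values[order[j + 1]]:
            continue
        v = (values[order[j]] + values[order[j + 1]]) / 2  # assumed midpoint
        n_left = j + 1
        weighted = (n_left / n) * gini(left) + ((n - n_left) / n) * gini(right)
        if weighted < best_gini:
            best_v, best_gini = v, weighted
    return best_v, best_gini
```

For example, with values [1, 2, 3, 4] and labels ["y", "y", "n", "n"], the scan finds the split v = 2.5 with weighted Gini 0.0, and the corresponding information gain of that split is 1.0 bit.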

