Computer Science 4442A/B Lecture Notes - Lecture 7: Linear Separability, Gradient Descent, Perceptron
Document Summary
The single sample perceptron rule, applied to a two-class problem:

- Preprocessing: add an extra feature (append a constant 1 to each sample) and normalize (negate the samples of the second class), so a correctly classified sample z satisfies a . z > 0.
- Initial weights: a(1) = [1 1 1], which corresponds to the line x1 + x2 + 1 = 0.
- Update rule: a(k+1) = a(k) + z_m, where z_m is a misclassified sample, using a fixed learning rate of 1.
- Non-linearly separable case: add an extra feature and normalize as above, then apply the single sample perceptron rule.

[Figure: candidate decision lines for the training data; a good line choice is shown in green.]
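The steps summarized above can be sketched as follows. This is a minimal illustration, not the lecture's exact code: the initial weights a(1) = [1 1 1] and the fixed learning rate of 1 come from the notes, while the sample data and function name are made up for demonstration.

```python
import numpy as np

def single_sample_perceptron(samples, labels, max_epochs=100):
    """Single sample perceptron rule with fixed learning rate 1.

    samples: (n, d) array; labels: +1 / -1.
    Each sample is augmented with a constant 1 (extra feature), and
    class -1 samples are negated (normalization), so every sample
    should satisfy a . z > 0 once correctly classified.
    """
    z = np.hstack([samples, np.ones((len(samples), 1))])
    z[labels == -1] *= -1              # normalization: negate class-2 samples
    a = np.ones(z.shape[1])            # initial weights a(1) = [1 1 1] (for d = 2)
    for _ in range(max_epochs):
        updated = False
        for zm in z:                   # cycle through samples one at a time
            if a @ zm <= 0:            # misclassified sample z_m
                a = a + zm             # update rule: a(k+1) = a(k) + z_m
                updated = True
        if not updated:                # all samples classified: converged
            break
    return a

# Illustrative, linearly separable toy data (assumed, not from the notes)
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [1.0, -2.0]])
y = np.array([1, 1, -1, -1])
a = single_sample_perceptron(X, y)
```

On linearly separable data the loop is guaranteed to terminate with a weight vector that classifies every (augmented, normalized) sample correctly; on non-separable data it simply stops after `max_epochs` passes.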