CSE 150 Lecture Notes - Lecture 12: Mixture Models and Bigram Language Models

Update rules
- When applied in parallel to all parameters p_i, the updates monotonically increase L = ∑_t log P(X_t | X'_t)
Markov Models of Language
Let w_l = l-th word in a sentence of length L
How to model P(w_1, ..., w_L)? Shorthand: w_1:L = (w_1, ..., w_L)
Model, P(w_1:L), and MLE:
- Unigram: P(w_1:L) = ∏_l P1(w_l),  MLE:  P1(w) = count(w) / (total word count)
- Bigram:  P(w_1:L) = P1(w_1) ∏_{l=2..L} P2(w_l | w_{l-1}),  MLE:  P2(w'|w) = count(w → w') / count(w → anything)
- Trigram: P(w_1:L) = P1(w_1) P2(w_2|w_1) ∏_{l=3..L} P3(w_l | w_{l-2}, w_{l-1}),  MLE:  P3(w''|w, w') = count(w → w' → w'') / count(w → w' → anything)
Train on corpus A: P1(w_1:L) <= P2(w_1:L) on corpus A, i.e. the bigram model fits the training corpus at least as well.
Test on corpus B: possibly P1(w_1:L) >= P2(w_1:L) = 0, since the bigram MLE assigns zero probability to any sentence containing a bigram unseen in corpus A.
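These count-based estimates can be sketched as follows (a minimal Python sketch, assuming the corpus is already tokenized into lists of words; the function and variable names here are hypothetical, not from the notes):

from collections import Counter, defaultdict

def train_unigram_bigram(sentences):
    # sentences: list of word lists, e.g. [["the", "dog", "barks"], ...]
    unigram_counts = Counter()
    bigram_counts = defaultdict(Counter)   # bigram_counts[w][w'] = count(w -> w')
    for sent in sentences:
        unigram_counts.update(sent)
        for w, w_next in zip(sent, sent[1:]):
            bigram_counts[w][w_next] += 1
    total = sum(unigram_counts.values())

    def p1(w):
        # unigram MLE: count(w) / total word count
        return unigram_counts[w] / total

    def p2(w_next, w_prev):
        # bigram MLE: count(w -> w') / count(w -> anything); 0 for unseen bigrams
        denom = sum(bigram_counts[w_prev].values())
        return bigram_counts[w_prev][w_next] / denom if denom else 0.0

    return p1, p2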
Linear Interpolation (mixture model)
P_M(w_l | w_{l-1}) = (1 - λ) P1(w_l) + λ P2(w_l | w_{l-1}),  with λ ∈ [0, 1]
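A minimal sketch of evaluating the interpolated model, assuming the hypothetical p1 and p2 from the sketch above and a given value lam:

def p_mix(w, w_prev, p1, p2, lam):
    # linear interpolation: (1 - lambda) * P1(w) + lambda * P2(w | w_prev)
    return (1.0 - lam) * p1(w) + lam * p2(w, w_prev)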
How to estimate λ?
Hidden Variable Model
Belief network: w_{l-1} → w_l, with a second (hidden) parent z → w_l, where z ∈ {1, 2}
P(w_l | w_{l-1}, z) = { P1(w_l)            if z = 1
                      { P2(w_l | w_{l-1})  if z = 2
P(z = 1) = 1 - λ,   P(z = 2) = λ
How to estimate λ? From the incomplete data {(w_{l-1}, w_l)} for l = 1, ..., L, in which z is never observed.
In this model:
P(w_l | w_{l-1}) = ∑_{z=1,2} P(w_l, z | w_{l-1})                  [marginalization]
                 = ∑_{z=1,2} P(z | w_{l-1}) P(w_l | z, w_{l-1})   [product rule]
                 = ∑_{z=1,2} P(z) P(w_l | z, w_{l-1})             [marginal independence of z and w_{l-1}]
                 = (1 - λ) P1(w_l) + λ P2(w_l | w_{l-1})          [reproduces the mixture model]
E-step: compute posterior probabilities
P(z | w_{l-1}, w_l) = P(w_l | z, w_{l-1}) P(z | w_{l-1}) / P(w_l | w_{l-1})   [Bayes rule]
                    = P(w_l | z, w_{l-1}) P(z) / P(w_l | w_{l-1})             [marginal independence]
In particular, for z = 2:
P(z = 2 | w_{l-1}, w_l) = λ P2(w_l | w_{l-1}) / [ (1 - λ) P1(w_l) + λ P2(w_l | w_{l-1}) ]
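A minimal sketch of this E-step over the observed word pairs (again using the hypothetical p1 and p2 from the earlier sketch):

def e_step(pairs, p1, p2, lam):
    # pairs: list of observed bigrams (w_prev, w)
    # returns P(z = 2 | w_prev, w) for each pair, computed by Bayes rule
    posteriors = []
    for w_prev, w in pairs:
        numer = lam * p2(w, w_prev)
        denom = (1.0 - lam) * p1(w) + lam * p2(w, w_prev)
        posteriors.append(numer / denom if denom > 0 else 0.0)
    return posteriors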
M-step of the EM algorithm:
- For root nodes: P(X = x) ← (1/T) ∑_t P(X = x | V^(t))
- For the mixture model: λ = P(z = 2) ← (1/L) ∑_l P(z = 2 | w_{l-1}, w_l)
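Putting the two steps together gives a sketch of the full EM loop for λ (the starting value and iteration count below are arbitrary choices, not from the notes):

def m_step(posteriors):
    # lambda <- (1/L) * sum_l P(z = 2 | w_{l-1}, w_l)
    return sum(posteriors) / len(posteriors)

def estimate_lambda(pairs, p1, p2, lam=0.5, n_iters=50):
    # alternate E- and M-steps; each iteration cannot decrease the log-likelihood
    for _ in range(n_iters):
        lam = m_step(e_step(pairs, p1, p2, lam))
    return lam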
Word-dependent λ(w): EM can also be used to estimate a separate λ(w) for each word in the corpus, i.e. as many mixture parameters as there are words.
Hidden Markov Models
S_t ∈ {1, 2, ..., m}: hidden state at time t
O_t: observation at time t (a partial, noisy reflection of the hidden state of the world)
Example: S_t ∈ {has to go, doesn't have to go, went},  O_t ∈ {wagging tail, barking, standing by door, acting guiltily, ...}
