CSE 150 Lecture Notes - Lecture 10: Posterior Probability, Bayes Estimator, Backtracking

Learning from Incomplete Data
Examples t = 1 to T
Visible Nodes V(t)
Hidden Nodes H(t)
How to learn CPTs? Maximize the log-likelihood of the “partially” observed data w/ the EM alg
EM Algorithm
1) Initialize CPTs w/ some values
2) Iterate until convergence
E-step: Inference
For EACH example t = 1 to T,
For EACH node Xi (i = 1 to n)
For EACH possible value x of Xi
For EACH possible configuration π of pa(Xi)
Compute posterior P(Xi = x, pai = π|V(t))
M-step: re-estimate/update CPTs
For nodes w/ parents:
P(Xi = x|pai = π) ← [∑t P(Xi = x, pai = π|V(t))] / [∑t P(pai = π|V(t))] ← numerator & denominator computed from current CPTs
The left-hand side is the NEW CPT value.
Why is it “←”? It’s just an assignment: a formula for getting NEW values (see the sketch below)
For root nodes:
P(Xi = x) ← (1/T) ∑t P(Xi = x|V(t))
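A minimal sketch of how the M-step might look in code, assuming the E-step posteriors have already been computed with the current CPTs and stored as arrays (the array layout and function names here are hypothetical, not from the lecture):

```python
import numpy as np

def m_step_update(joint_posteriors):
    """Re-estimate one CPT from E-step posteriors.

    joint_posteriors: array of shape (T, num_values_x, num_configs_pi), where
    entry [t, x, pi] = P(Xi = x, pai = pi | V(t)) computed with the CURRENT CPTs
    (hypothetical layout chosen for this sketch).
    """
    # Expected counts: sum the posteriors over the T examples
    expected_joint = joint_posteriors.sum(axis=0)    # [x, pi]: sum_t P(Xi = x, pai = pi | V(t))
    expected_parent = expected_joint.sum(axis=0)     # [pi]:    sum_t P(pai = pi | V(t))
    # New CPT = ratio of expected counts, one column per parent configuration
    return expected_joint / expected_parent          # P_new(Xi = x | pai = pi)

def m_step_update_root(node_posteriors):
    """Root-node case: average the posterior marginals over the examples.

    node_posteriors: array of shape (T, num_values_x) with P(Xi = x | V(t)).
    """
    return node_posteriors.mean(axis=0)              # P_new(Xi = x) = (1/T) sum_t P(Xi = x | V(t))
```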
Intuition
- Use the posterior distribution P(H|V(t)) to “fill in” the missing values of the hidden nodes!
- Expected counts under P(H|V) “substitute” for the observed counts of the COMPLETE data case
Key Properties of EM Algorithm
1) Monotonic convergence
- Each iteration of EM is always getting better! (i.e. improves log-likelihood)
L = ∑t=1 to T log P(V(t))
- If Lk is the log-likelihood at the kth iteration of EM, then Lk >= Lk-1 (useful for debugging! See the sketch after this list.)
2) NO tuning parameters, NO learning rates, NO backtracking!
3) Converges “in general” to local but NOT global maximum of L = ∑t log P(V(t))
- Final result of EM can depend on initialization
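Property 1 gives a cheap debugging check: assert that L never decreases between iterations. A sketch, with the model-specific em_step and log_likelihood routines left as hypothetical callables passed in by the caller:

```python
def run_em(cpts, examples, em_step, log_likelihood, max_iters=100, tol=1e-6):
    """Iterate EM until L = sum_t log P(V(t)) stops improving.

    em_step and log_likelihood are model-specific callables (hypothetical here).
    The assert exploits monotonic convergence: each EM iteration can only
    increase L, so a decrease signals a bug.
    """
    prev_L = log_likelihood(cpts, examples)
    for _ in range(max_iters):
        cpts = em_step(cpts, examples)           # one E-step + one M-step
        L = log_likelihood(cpts, examples)
        assert L >= prev_L - 1e-9, "bug: log-likelihood decreased"
        if L - prev_L < tol:                     # converged (to a LOCAL maximum)
            return cpts
        prev_L = L
    return cpts
```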
Example: A → B → C
Where A and C are visible, B is hidden
First consider inference:
Posterior P(B|A, C) = P(C|B, A) P(B|A) / P(C|A) [Bayes Rule]
= P(C|B) P(B|A) / P(C|A) [CI: C ⊥ A | B, so P(C|B, A) = P(C|B)]
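For concreteness, a small sketch of this posterior computation for binary A, B, C with made-up CPT numbers (the values are illustrative, not from the lecture):

```python
import numpy as np

# Hypothetical CPTs for the chain A -> B -> C (all nodes binary)
P_A = np.array([0.6, 0.4])                      # P(A = a)
P_B_given_A = np.array([[0.7, 0.3],             # P(B = b | A = a), rows indexed by a
                        [0.2, 0.8]])
P_C_given_B = np.array([[0.9, 0.1],             # P(C = c | B = b), rows indexed by b
                        [0.4, 0.6]])

def posterior_B(a, c):
    """P(B | A = a, C = c) = P(C = c | B) P(B | A = a) / P(C = c | A = a)."""
    joint = P_C_given_B[:, c] * P_B_given_A[a, :]   # P(B = b, C = c | A = a), one entry per b
    return joint / joint.sum()                      # normalize by P(C = c | A = a)

print(posterior_B(a=0, c=1))   # posterior over hidden B given the visible nodes
```

This posterior over B is exactly what the E-step computes for each training example before the M-step re-estimates the CPTs.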