next up previous
Next: Derivation of the Baum-Welch Up: Bayesian Learning 2: EM Previous: Revision of Lagrange Multipliers

The EM Algorithm

The EM Theorem:

Let X and Y be random variables with distributions P(X) and P(Y) respectively under some model $\theta$. Let P'(X) and P'(Y) be their corresponding distributions under a different model $\theta'$. Then

 \begin{displaymath}
\left[ \sum\limits_X P(X\vert Y)\log \frac{P'(X,Y)}{P(X,Y)} > 0 \right]
\Rightarrow P'(Y) > P(Y)
\end{displaymath} (15)

Interpretation: If we find a model $\theta'$ for which the first inequality holds, then the observed data Y will be more probable under $\theta'$ than under $\theta$.

Proof. Since $\sum_X P(X\vert Y) = 1$, $\log P'(Y)-\log P(Y) =$

\begin{eqnarray*}& & \sum_X P(X\vert Y) \log P'(Y) - \sum_X P(X\vert Y) \log P(Y...
...vert Y)}\\
&\ge& \sum_X P(X\vert Y) \log \frac{P'(X,Y)}{P(X,Y)}
\end{eqnarray*}


where the last step follows from Jensen's inequality proved earlier. If this quantity is at least zero, then so is $\log P'(Y) - \log P(Y)$ and consequently P'(Y) - P(Y). Q.E.D.



Anand Venkataraman
1999-09-16