Solution to problem 1: What is ? next up previous
Next: Solution to problem 2: Up: Markov Networks Previous: Hidden Markov Models (HMMs)

Solution to problem 1: What is $P(O\vert\lambda )$?

The most straightforward way of solving this is:

1.
Enumerate every possible sequence of states.
2.
For each sequence, find the probability of the observation sequence.
3.
These are disjoint events, so the total probability of the observation sequence is just the sum of the above.

i.e. For a given Q,


\begin{eqnarray*}P(O\vert Q,\lambda) &=& \prod_{t=1}^T P(O_t\vert q_t,\lambda) \\
&=& b_{q_1}(O_1) \cdot b_{q_2}(O_2) \cdots b_{q_T}(O_T) \\
\end{eqnarray*}


Also $P(Q\vert\lambda)=\pi_{q_1}a_{q_1q_2}a_{q_2q_3}\cdots a_{q_{T-1}q_T}$. So,


\begin{eqnarray*}P(O\vert\lambda) &=& \sum_Q P(O,Q\vert\lambda)\\
&=& \sum_Q P...
...ot
a_{q_1q_2}b_{q_2}(O_2) \cdots a_{q_{T-1}q_T}b_{q_T}(O_T) \\
\end{eqnarray*}


How many possible state sequences Qs are there for, say, N=5 and T=100?

The Forward Algorithm:

Discuss this:

If we are given the probabilies (of ending in each of the N states) for the sequence $O_1O_2\cdots O_t$, then we can easily calculate the probabilities of the sequence $O_1O_2\cdots O_{t+1}$ (ending in each of the N states).

If we introduce a forward variable $\alpha$ such that

\begin{displaymath}\alpha_t(i) = P(O_1O_2\cdots O_t, q_t = S_i\vert\lambda)\end{displaymath}

then

\begin{displaymath}\alpha_{t+1}(j) = \sum_{i=1}^N \alpha_t(i)a_{ij}b_j(O_{t+1})\end{displaymath}

There is a lattice/trellis analogy of looking at the above computation (common in literature).

Experiment with some simple HMMs and print out the trellises for a few short observation sequences using the HMM Mini-Toolkit.


\begin{displaymath}\alpha = \begin{array}{cccc}
\underset{\alpha_1(1)}{\bullet} ...
...let} & \cdots
& \underset{\alpha_T(N)}{\bullet}\\
\end{array}\end{displaymath}

In practice, $\alpha$ need only be a one-dimensional array most of the time. (Why?)

Here is an inductive statement of the forward algorithm:

\begin{displaymath}\begin{array}{lll}
Init: & \alpha_1(i) = \pi_ib_i(O_1) & \beg...
...ert\lambda) = \sum\limits_{i=1}^N \alpha_T(i) & \\
\end{array}\end{displaymath}

What is the saving in time by using this algorithm as opposed to the one that enumerates all Q?

Likewise we can also define a backward variable $\beta$ such that

\begin{displaymath}\beta_{t+1}(j) = P(O_{t+2}O_{t+3}\cdots O_T, q_{t+1} = S_j\vert\lambda)
\end{displaymath}

Then

\begin{displaymath}\beta_{t-1}(i) = \sum_{j=1}^N a_{ij}b_j(O_t)\beta_t(j)
\end{displaymath}

Like $\alpha$ we can also define $\beta$ inductively:

\begin{displaymath}\begin{array}{lll}
Init: & \beta_T(i) = 1 & \begin{array}{l} ...
...vert\lambda) = \sum\limits_{i=1}^N \beta_1(i) & \\
\end{array}\end{displaymath}

The backward variable $\beta$ is not used in the rest of this series of lectures, but you must understand it and work out other interesting quantities you can obtain using these two, eg. $P(O_1, \cdots, O_T, q_i
= S_i)$ etc.

We will use $\beta$ to illustrate the EM algorithm during the second set of lectures on Bayesian Learning.


next up previous
Next: Solution to problem 2: Up: Markov Networks Previous: Hidden Markov Models (HMMs)
Anand Venkataraman
1999-09-16