Solution to problem 2: The Viterbi Algorithm next up previous
Next: Information Theory Up: Markov Networks Previous: Solution to problem 1:

Solution to problem 2: The Viterbi Algorithm

If we define $\hat{Q} = \hat{q_1}\hat{q_2}\cdots \hat{q_T} =$ the most probable Q that generated O to be the single best state sequence, then

\begin{displaymath}\hat{Q} = \underset{{\rm all\ }Q}{\rm argmax}\quad P(Q\vert O,\lambda)\end{displaymath}

We can compute $\hat{Q}$ in a similar way as we did for $P(O\vert\lambda )$ earlier. Let's first look at computing the highest probability of any sequence of states producing O.

Let

\begin{displaymath}\delta = \begin{array}{cccc}
\underset{\delta_1(1)}{\bullet} ...
...let} & \cdots
& \underset{\delta_T(N)}{\bullet}\\
\end{array}\end{displaymath}

such that

\begin{displaymath}\delta_t(i) = \underset{q_1q_2\cdots q_{t-1}}{\rm max}\quad P(q_1q_2\cdots q_t=i,
O_1O_2\cdots O_t\vert\lambda)\end{displaymath}

then

\begin{displaymath}\delta_{t+1}(j) = \underset{i}{\rm max}\quad \delta_t(i)a_{ij}b_j(O_{t+1})\end{displaymath}

This is fine if we just want the highest probability itself. But often we are interested in the actual path with this probability. So we need to keep track of this. Let's introduce a variable $\psi$ to do this.


\begin{displaymath}\psi = \begin{array}{cccc}
\underset{\psi_1(1)}{\bullet} & \u...
...ullet} & \cdots
& \underset{\psi_T(N)}{\bullet}\\
\end{array}\end{displaymath}

This keeps track of the argument (state number) which maximized the probability $\delta_t(i)$ at each stage.

The Viterbi algorithm can be stated inductively as follows:


\begin{displaymath}\begin{array}{lll}
Init: & \delta_1(i) = \pi_ib_i(O_1) & \beg...
...t{1 \le i \le N}{\rm argmax}\quad \delta_t(i) & \\
\end{array}\end{displaymath}

Sometimes the algorithm is stated recursively:


\begin{displaymath}\begin{array}{lll}
Init: & \delta_1(i) = \pi_ib_i(O_1) & \beg...
...t{1 \le i \le N}{\rm argmax}\quad \delta_t(i) & \\
\end{array}\end{displaymath}

In your implementations you must be careful to only program the iterative version. The stack overhead for the recursive version is considerable in many cases.

Since the two dimensional ( $N \times T$) array $\psi$ stores the most probable state at each step, we can backtrack and find the most probable sequence $\hat{q}$ since

\begin{displaymath}\hat{q_t} =
\psi_{t+1}(\hat{q_{t+1}})\end{displaymath}

Point to ponder: Think about how you can modify the above algorithm to compute the n-best paths, rather than the 1-best.


next up previous
Next: Information Theory Up: Markov Networks Previous: Solution to problem 1:
Anand Venkataraman
1999-09-16