next up previous
Next: The EM Algorithm Up: Bayesian Learning 2: EM Previous: Bayesian Learning 2: EM

Revision of Lagrange Multipliers

We know how to find stationary points of functions that are not subject to constraints - Solving f' = 0 and determining the nature of the turning point by checking f''.

But what if we seek to find stationary points of f subject to some constraint say, g = 0?

The straightforward solution is to incorporate the constraint g into f. For example, to find a turning point of f(x,y) = x2 + y2 subject to the constraint g(x,y) = 2x+y = 0, we could substitute y=-2x in f and work with f(x) = 5x2, giving us a minimum at x = 0.

But substitution of variables like the above is not always possible and often leads to very complicated equations. For example, in the above case, what if $g(x,y) = \sin x + \cos y = 0$? Then we get $y =
\cos^{-1} (-\sin y)$ and a really nasty function

\begin{displaymath}f(x) = x^2 + \left(\cos^{-1}(\sin x)\right)^2
\end{displaymath}

to work with.

Lagrange Multipliers provide an easier way to address this. Consider a smooth scalar function of n independent variables $\phi = f(x_1,
x_2, \cdots, x_n)$. We know

\begin{displaymath}\nabla \phi = \sum\limits_{i=1}^n \frac{\partial \phi}{\partial x_i} \vec{x_i}
\end{displaymath}

where the $\vec{x_i}$ are unit vectors on xi axes.

Likewise if $\psi = g(x_1, x_2, \cdots, x_n) = 0$ is a constraint,

 \begin{displaymath}
\nabla \psi = \sum\limits_{i=1}^n \frac{\partial \psi}{\partial x_i} \vec{x_i} = 0
\end{displaymath} (11)

In order for $\phi$ to have a stationary point on the surface $\psi = 0$, its gradient, $\nabla \phi$, should be parallel to the gradient of the surface, $\nabla \psi$. Why? (Because the rate of change of $\phi$ on the level surface $\psi = 0$ must be zero)

Therefore at this point, each component of $\nabla \phi$ must be the $\alpha$ times the corresponding component of $\nabla \psi$, where $\alpha$ is some constant.


 \begin{displaymath}
\forall i,j: \frac{\ \frac{\partial \phi}{\partial x_i}\ }{\...
...{\partial x_j}\ }{\frac{\partial \psi}{\partial x_j}} = \alpha
\end{displaymath} (12)

Setting $\lambda = -\alpha$, (13) can be rewritten as:

 \begin{displaymath}
\forall i: \frac{\partial \phi}{\partial x_i} + \lambda\frac{\partial \psi}{\partial x_i} = 0
\end{displaymath} (13)

which is exactly the condition for a stationary point of the function

 \begin{displaymath}
\omega = \phi + \lambda \psi
\end{displaymath} (14)

And so we can try find a stationary point of $\omega$ instead subject to the constraint $\psi = 0$.

Steps for applying Lagrange Multipliers to find stationary points of $f(x_1, x_2, \cdots, x_n)$ subject to the constraint $g(x_1, x_2, \cdots,
x_n) = 0$

1.
Form the function $\omega = f(x_1, \cdots, x_n) + \lambda g(x_1,
\cdots, x_n)$

2.
Find values of $\lambda$ and the xi for which the following set of n+1 simultaneous equations have a solution.

\begin{eqnarray*}\underset{1 \le i \le n}{\forall i} : \frac{\partial f}{\partia...
...partial g}{\partial x_i} &=& 0\\
g(x_1, x_2, \cdots, x_n) &=& 0
\end{eqnarray*}


3.
Identify the nature of these stationary points if necessary.

In probability, the constraint function g is usually $\left[\sum_i P(x_i)\right] - 1 = 0$

We don't derive the following, but it is a straightforward extension of the above:

If there are m constraints, $g_1(x_i) = g_2(x_i) = \cdots = g_m(x_i) = 0$, then we use one multiplier for each, i.e. we get the equations

\begin{eqnarray*}\underset{1 \le i \le n}{\forall i} : \frac{\partial f}{\partia...
...set{1 \le j \le m}{\forall j} : g_j(x_1, x_2, \cdots, x_n) &=& 0
\end{eqnarray*}



next up previous
Next: The EM Algorithm Up: Bayesian Learning 2: EM Previous: Bayesian Learning 2: EM
Anand Venkataraman
1999-09-16