next up previous
Next: Gradient Descent in ANNs Up: Artificial Neural Nets Previous: The Perceptron Training Rule

Gradient Descent/Ascent

Gradient Descent/Ascent is a general framework for solving optimization problems where we want to maximize or minimize functions of continuous (differentiable) parameters.

We illustrate the technique here for maximization (ascent) in terms of a single free variable, but the generalization is quite simple.

Suppose we want to find a maximum of some function y=f(x). The standard procedure is to find f'(x), set it to zero and solve for x. But what if f'(x) = 0 is a difficult equation to solve?

We can still find the maximum algorithmically.

Let y=f(x) have a maximum at xmax. Pick an arbitrary value for x, say x1. Compute f'(x1). If the slope of y is positive at x1, i.e. f'(x1) > 0, then xmax > x1 lies to the right of x1. Likewise if f'(x1) < 0, then xmax < x1 lies to its left.

Thus we know the direction in which x1 should be updated in order to approach xmax. In fact this direction is given by f'(x1). So we can use the update rule:

\begin{displaymath}x_1 \leftarrow x_1 + \eta f'(x_1)
\end{displaymath}

where $\eta$ is a positive constant. If $\eta$ is sufficiently small, and there is indeed a maximum for f, the above update rule will converge to it after a finite number of iterations (when f'(x1) = 0).

Ponder: What happens if $\eta$ is not small enough?

This is the so-called gradient ascent technique to determine maxima when actual solution of f' = 0 is difficult. To find minima, the corresponding gradient descent rule can set up to update x1 away from xmax.

For $y = f(x_1, \cdots, x_n)$ the partial derivatives with respect to each xi indicate how rapidly y is changing along that axis and so the gradient, $\nabla y$, in fact denotes the precise direction which promises maximum increase of y.

Collectively these are called gradient techniques. Note that we could have used gradient ascent to maximize Baum's auxiliary function (EM Alg). But in that case solving for $\theta$ was not difficult.


next up previous
Next: Gradient Descent in ANNs Up: Artificial Neural Nets Previous: The Perceptron Training Rule
Anand Venkataraman
1999-09-16