Gradient Descent/Ascent is a general framework for solving optimization problems where we want to maximize or minimize functions of continuous (differentiable) parameters.
We illustrate the technique here for maximization (ascent) in terms of a single free variable, but the generalization is quite simple.
Suppose we want to find a maximum of some function y=f(x). The standard procedure is to find f'(x), set it to zero and solve for x. But what if f'(x) = 0 is a difficult equation to solve?
We can still find the maximum algorithmically.
Let y=f(x) have a maximum at xmax. Pick an arbitrary value for x, say x1. Compute f'(x1). If the slope of y is positive at x1, i.e. f'(x1) > 0, then xmax > x1 lies to the right of x1. Likewise if f'(x1) < 0, then xmax < x1 lies to its left.
Thus we know the direction in which x1 should be updated in order
to approach xmax. In fact this direction is given by f'(x1).
So we can use the update rule:
where
is a positive constant. If
is sufficiently small,
and there is indeed a maximum for f, the above update rule will
converge to it after a finite number of iterations (when
f'(x1) =
0).
Ponder: What happens if
is not small enough?
This is the so-called gradient ascent technique to determine maxima when actual solution of f' = 0 is difficult. To find minima, the corresponding gradient descent rule can set up to update x1 away from xmax.
For
the partial derivatives with respect to
each xi indicate how rapidly y is changing along that axis and so
the gradient,
,
in fact denotes the precise direction which
promises maximum increase of y.
Collectively these are called gradient techniques. Note that we
could have used gradient ascent to maximize Baum's auxiliary function
(EM Alg). But in that case solving for
was not difficult.