Motivation:
- Complex non-linear decision surfaces require a network of
cascaded units. (Why?)
- Unthresholded (linear) units are not suitable for cascading, because a
composition of linear functions is itself linear, so the resultant output
is still a linear function of the inputs. In fact, it is easy to show
that for any network of linear units we can find a weight vector
$\vec{w}$ such that a single linear unit with this weight vector is
equivalent to the given network.
- Unfortunately, the thresholding function of perceptrons is not
differentiable at 0.
- We want a unit that is differentiable (like linear units) but that
also provides the power of thresholded units.
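- The sigmoid function
$\sigma(y) = \frac{1}{1 + e^{-ky}}$
provides a smooth, continuous approximation to the perceptron's
thresholding function. We let k control the steepness of the threshold
and set $y = \vec{w} \cdot \vec{x}$, so that the unit's output is
$\sigma(\vec{w} \cdot \vec{x})$.
Usually k = 1.
Note the very useful fact that, for k = 1,
$\frac{d\sigma(y)}{dy} = \sigma(y)\,(1 - \sigma(y))$
(for general k the right-hand side is multiplied by k).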
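As a quick sanity check, the sigmoid $\sigma(y) = 1 / (1 + e^{-ky})$ and the derivative identity $\sigma'(y) = \sigma(y)(1 - \sigma(y))$ for k = 1 can be verified numerically. The following is a minimal sketch, not part of the original notes; the function names `sigmoid`, `sigmoid_grad` and `numeric_grad` are illustrative choices, and the finite-difference step `h` is an assumption chosen only for this check.

```python
import math

def sigmoid(y, k=1.0):
    """Sigmoid 1 / (1 + e^(-k*y)); k sets the steepness of the threshold."""
    return 1.0 / (1.0 + math.exp(-k * y))

def sigmoid_grad(y, k=1.0):
    """Analytic derivative k * s * (1 - s); equals s * (1 - s) when k = 1."""
    s = sigmoid(y, k)
    return k * s * (1.0 - s)

def numeric_grad(f, y, h=1e-6):
    """Central finite difference, used here only as a sanity check."""
    return (f(y + h) - f(y - h)) / (2.0 * h)

# The analytic derivative should agree with the finite-difference estimate.
for y in (-2.0, 0.0, 0.5, 3.0):
    assert abs(sigmoid_grad(y) - numeric_grad(sigmoid, y)) < 1e-8

# Larger k pushes the output at a fixed positive y closer to 1,
# i.e. the curve looks more like a hard threshold.
print([round(sigmoid(0.5, k), 3) for k in (1, 5, 50)])
```

Because the derivative is expressed in terms of the unit's own output, a training procedure that has already computed $\sigma(y)$ can obtain the gradient without evaluating another exponential.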
If we use the sigmoid function for thresholding, then we can use the
gradient descent rule to train cascaded thresholded units. The
backpropagation algorithm does just this.
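Returning to the motivation, the claim that any network of linear units is equivalent to a single linear unit can be illustrated on a small example. The sketch below assumes a two-layer network with two linear hidden units, one linear output unit, and no separate bias terms (a bias can be treated as a weight on a constant input); the weights and the helper names `matvec`, `linear_network` and `collapse` are hypothetical and chosen only for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def linear_network(W_hidden, w_out, x):
    """Two-layer network of linear (unthresholded) units with one output."""
    h = matvec(W_hidden, x)
    return sum(w * h_i for w, h_i in zip(w_out, h))

def collapse(W_hidden, w_out):
    """Equivalent single weight vector: w_eq[j] = sum_i w_out[i] * W_hidden[i][j]."""
    n_inputs = len(W_hidden[0])
    return [sum(w_out[i] * W_hidden[i][j] for i in range(len(w_out)))
            for j in range(n_inputs)]

# Illustrative weights: 3 inputs, 2 hidden units, 1 output unit.
W_hidden = [[0.5, -1.0, 2.0],
            [1.5, 0.25, -0.5]]
w_out = [2.0, -1.0]
w_eq = collapse(W_hidden, w_out)

# The cascaded network and the single linear unit agree on every input tried.
for x in ([1.0, 2.0, 3.0], [-0.5, 0.0, 4.0]):
    single_unit = sum(w * x_j for w, x_j in zip(w_eq, x))
    assert abs(linear_network(W_hidden, w_out, x) - single_unit) < 1e-12
```

Inserting a non-linear function such as the sigmoid between the layers is what prevents this collapse, which is why cascading only pays off with non-linear units.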
Anand Venkataraman
1999-09-16