
The Perceptron Training Rule

Assign initial weights randomly. Then, repeat the following until there are no more changes:

For each training example in the training set, update each weight $w_i$ according to

\begin{displaymath}w_i \leftarrow w_i + \Delta w_i\end{displaymath}

where

\begin{displaymath}\Delta w_i =
\eta(t-o) x_i\end{displaymath}

Here, $t$ is the target output, $o$ is the perceptron output (so $t-o$ is the error in the output), $x_i$ is the $i$-th input of the example, and $\eta$ is a positive constant called the learning rate.
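The procedure above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a reference implementation: the names `predict` and `train_perceptron`, the threshold at zero with a constant bias input $x_0 = 1$, the initial weight range, the `max_epochs` safety cap and the AND training set are all choices made here for the example.

```python
import random

def predict(weights, x):
    """Perceptron output: +1 if w . x > 0, otherwise -1 (threshold at 0 is an assumption)."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1 if s > 0 else -1

def train_perceptron(examples, eta=0.1, max_epochs=1000, seed=0):
    """Apply the perceptron training rule until a full pass changes no weight.

    examples: list of (x, t) pairs, x a tuple of inputs with x[0] = 1 as a
    bias input (assumption), t the target output in {-1, +1}.
    max_epochs is a hypothetical safety cap, not part of the rule itself.
    """
    rng = random.Random(seed)
    n = len(examples[0][0])
    # Assign initial weights randomly (the range is an arbitrary choice).
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    for _ in range(max_epochs):
        changed = False
        for x, t in examples:
            o = predict(weights, x)
            if t != o:
                # w_i <- w_i + eta * (t - o) * x_i
                for i in range(n):
                    weights[i] += eta * (t - o) * x[i]
                changed = True
        if not changed:  # no more changes: stop
            break
    return weights

# Illustrative training set: logical AND with inputs and targets in {-1, +1}.
AND_EXAMPLES = [
    ((1, -1, -1), -1),
    ((1, -1, 1), -1),
    ((1, 1, -1), -1),
    ((1, 1, 1), 1),
]

weights = train_perceptron(AND_EXAMPLES)
for x, t in AND_EXAMPLES:
    print(x[1:], "target", t, "output", predict(weights, x))
```

When $t = o$ the update $\eta(t-o)x_i$ is zero, so skipping correctly classified examples, as the sketch does, is equivalent to applying the rule to every example.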

Why should this work?

Consider the error $\epsilon = t-o$. With outputs in $\{-1, +1\}$, a correctly classified example gives $\epsilon = 0$ and leaves the weights unchanged; a misclassified example gives either $\epsilon = 2$ or $\epsilon = -2$. Consider first the case $\epsilon = 2$. Then $t = 1$ and $o = -1$, so the weights must change in a way that increases $\vec{w}\cdot\vec{x}$ and lets it cross the threshold from below. Since $\vec{x}$ is fixed, the weight for every $x_i > 0$ must be increased and the weight for every $x_i < 0$ must be decreased. This is precisely what the training rule does, since $\Delta w_i$ has the same sign as $x_i$. A similar line of reasoning shows that the rule is correct when $\epsilon = -2$ as well.
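The argument can be checked numerically on a single update. Summing the rule over all weights shows that one update changes $\vec{w}\cdot\vec{x}$ by $\eta\,\epsilon\,\vec{x}\cdot\vec{x}$, which has the same sign as $\epsilon$. The sketch below uses made-up values for $\vec{w}$ and $\vec{x}$ and assumes a threshold at zero; the helper names `dot` and `output` are introduced here only for illustration.

```python
eta = 0.1
x = (1.0, 2.0, -3.0)   # fixed input (illustrative values)
w = (0.0, 0.5, 1.0)    # current weights (illustrative values)
t = 1                  # target output

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def output(w, x):
    # Threshold at 0 is an assumption of this sketch.
    return 1 if dot(w, x) > 0 else -1

o = output(w, x)       # w . x = -2.0, so o = -1 and the example is misclassified
epsilon = t - o        # 2

# Delta w_i = eta * (t - o) * x_i has the same sign as x_i when epsilon = 2.
delta = tuple(eta * epsilon * xi for xi in x)
w_new = tuple(wi + di for wi, di in zip(w, delta))

before = dot(w, x)      # -2.0
after = dot(w_new, x)   # about 0.8: raised by eta * epsilon * (x . x) = 0.2 * 14

print("delta w:", delta)
print("w . x before:", before, "after:", after)
print("output after update:", output(w_new, x))
```

Here a single update happens to be enough to push $\vec{w}\cdot\vec{x}$ across the threshold; in general several updates may be needed, and the update also affects the outputs for the other training examples.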

Minsky and Papert (1969) show that the above procedure converges, after a finite number of updates, to weights that classify every training example correctly, provided the training examples are linearly separable.
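The role of linear separability can be illustrated with a small experiment. The sketch below is an assumption-laden illustration: the function name `run`, the cap of 100 epochs, the zero initial weights (used instead of random ones so that the run is reproducible) and the two data sets are all chosen here. OR is linearly separable; XOR is not.

```python
def run(examples, eta=0.1, max_epochs=100):
    """Return (converged, epochs_used) for the perceptron training rule.

    Weights start at zero here purely for reproducibility (an assumption of
    this sketch). max_epochs is a hypothetical cap so the loop always ends.
    """
    w = [0.0] * len(examples[0][0])
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in examples:
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if o != t:
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:
            return True, epoch
    return False, max_epochs

# Inputs are (bias, x1, x2) with x1, x2 and targets in {-1, +1}.
OR_EXAMPLES = [((1, -1, -1), -1), ((1, -1, 1), 1), ((1, 1, -1), 1), ((1, 1, 1), 1)]
XOR_EXAMPLES = [((1, -1, -1), -1), ((1, -1, 1), 1), ((1, 1, -1), 1), ((1, 1, 1), -1)]

or_converged, or_epochs = run(OR_EXAMPLES)
xor_converged, xor_epochs = run(XOR_EXAMPLES)
print("OR converged:", or_converged, "after", or_epochs, "epochs")
print("XOR converged:", xor_converged, "after", xor_epochs, "epochs")
```

For XOR no weight vector classifies all four examples correctly, so every pass changes at least one weight and the loop stops only because of the cap.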

However, if $\eta$ is too big, the system may end up in oscillation!

In practice, $\eta$ is often set to a small value such as 0.1 and is then reduced as training proceeds, as a function of the loop counter variable.
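The text does not say which function of the loop counter is used. One common choice, shown here purely as an assumption, is to divide the initial rate by one plus the pass number; the name `decayed_eta` and its parameters are hypothetical.

```python
def decayed_eta(epoch, eta0=0.1):
    """Learning rate for a given pass over the training set (epoch = 0, 1, 2, ...).

    The 1 / (1 + epoch) schedule is an illustrative assumption; the source
    only says that eta is reduced as a function of the loop counter.
    """
    return eta0 / (1.0 + epoch)

# Learning rates for the first five passes.
schedule = [decayed_eta(k) for k in range(5)]
print(schedule)
```

Inside a training loop, the value `decayed_eta(epoch)` would simply take the place of the constant $\eta$ in the update $\Delta w_i = \eta(t-o)x_i$.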


Anand Venkataraman
1999-09-16