Assign initial weights randomly. Then repeat the following until a full pass over the training set changes no weight:
For each training example in the training set, update each weight $w_i$ according to

$$w_i \leftarrow w_i + \Delta w_i$$

where

$$\Delta w_i = \eta \, (t - o) \, x_i$$
Here, $t$ is the target output, $o$ is the perceptron output (so $t - o$ is the error in the output), $x_i$ is the $i$-th input, and $\eta$ is a positive constant called the learning rate.
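The procedure above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the function names, the threshold at zero, the bias handled as a constant input `x[0] = 1.0`, the initial weight range, the fixed seed, and the `max_epochs` safety cap are all assumptions made for the example.

```python
import random


def predict(weights, x):
    """Thresholded output: +1 if w . x > 0, otherwise -1 (threshold at 0 is assumed)."""
    activation = sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else -1


def train_perceptron(examples, eta=0.1, max_epochs=1000, seed=0):
    """Train on a list of (x, t) pairs with t in {+1, -1}.

    Each x is assumed to carry a constant 1.0 as its first component,
    so that weights[0] plays the role of the (negated) threshold.
    """
    rng = random.Random(seed)
    n = len(examples[0][0])
    # Assign initial weights randomly (the range is an arbitrary choice).
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n)]

    for _ in range(max_epochs):  # cap added so the loop always terminates
        changed = False
        for x, t in examples:
            o = predict(weights, x)
            if t != o:
                # w_i <- w_i + eta * (t - o) * x_i
                for i in range(n):
                    weights[i] += eta * (t - o) * x[i]
                changed = True
        if not changed:  # a full pass with no changes: stop
            break
    return weights


# Illustrative data: logical AND encoded with +1 / -1, bias input first.
AND_EXAMPLES = [
    ([1.0, -1.0, -1.0], -1),
    ([1.0, -1.0, 1.0], -1),
    ([1.0, 1.0, -1.0], -1),
    ([1.0, 1.0, 1.0], 1),
]
```

Calling `train_perceptron(AND_EXAMPLES)` returns a weight list; because AND is linearly separable, `predict` applied with those weights should reproduce every target in `AND_EXAMPLES`.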
Why should this work?
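Consider the error $\epsilon = t - o$. Because $t$ and $o$ each take the value $1$ or $-1$, either $\epsilon = 0$, $\epsilon = 2$, or $\epsilon = -2$. When $\epsilon = 0$ the output is already correct and the rule leaves the weights unchanged.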
Consider first the case that $\epsilon = 2$, that is, $o = -1$ when $t = 1$. The weights must then be changed so that $\vec{w} \cdot \vec{x}$ increases and can cross the threshold from below. Since $\vec{x}$ is fixed, the weight for every $x_i > 0$ must be increased and the weight for every $x_i < 0$ must be decreased. This is precisely what the training rule does, since the weight change $\Delta w_i = 2 \eta x_i$ has the same sign as $x_i$. A similar line of reasoning shows that the rule is correct when $\epsilon = -2$ as well.
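The sign argument can also be checked in one line of algebra. The sketch below writes $\vec{w}'$ for the weight vector after a single update on example $\vec{x}$; that notation is introduced here only for illustration.

```latex
\begin{aligned}
\vec{w}' \cdot \vec{x}
  &= \sum_i \bigl( w_i + \eta \, (t - o) \, x_i \bigr) x_i \\
  &= \vec{w} \cdot \vec{x} + \eta \, (t - o) \sum_i x_i^2 \\
  &= \vec{w} \cdot \vec{x} + \eta \, \epsilon \, \lVert \vec{x} \rVert^2 .
\end{aligned}
```

Since $\eta > 0$ and $\lVert \vec{x} \rVert^2 \ge 0$, the weighted sum moves up when $\epsilon = 2$, moves down when $\epsilon = -2$, and stays put when $\epsilon = 0$. A single update shifts the sum in the right direction but is not guaranteed to carry it across the threshold, which is why the procedure loops.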
Minsky and Papert (1969) show that the above procedure converges to weights that classify every training example correctly, provided the training examples are linearly separable.
However, if $\eta$ is too big, the system may end up oscillating. In practice, $\eta$ is typically set to a small value such as 0.1 and is reduced as a function of the loop counter variable.
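The text does not say which function of the loop counter is used. As one possible illustration, the sketch below assumes an inverse-time schedule; the function name and the `decay` parameter are hypothetical, and other schedules (step-wise or exponential decay, for example) would fit the description equally well.

```python
def decayed_learning_rate(step, eta0=0.1, decay=0.01):
    """Assumed schedule: eta0 / (1 + decay * step).

    step  -- loop counter (0 for the first pass over the training set)
    eta0  -- initial learning rate (0.1, as suggested in the text)
    decay -- hypothetical constant controlling how fast the rate shrinks
    """
    return eta0 / (1.0 + decay * step)


# The rate starts at eta0 and shrinks monotonically as the counter grows.
rates = [decayed_learning_rate(k) for k in (0, 10, 100, 1000)]
```

In a training loop, the fixed rate would be replaced by a call such as `decayed_learning_rate(epoch)` at the start of each pass.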