 1.
 Propagates the inputs forward in the usual way, i.e.,
 all outputs are computed using sigmoid thresholding of the inner
product of the corresponding weight and input vectors, and
 all outputs at stage n are connected to all the inputs at
stage n+1.
 2.
 Propagates the errors backwards by apportioning them to each
unit according to the amount of this error the unit is responsible for.
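The forward pass in step 1 can be sketched as follows. This is a minimal illustration, assuming the network is represented as a list of weight matrices; the names `sigmoid` and `forward` are ours, not part of the algorithm statement:

```python
import math

def sigmoid(z):
    # Sigmoid thresholding: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def forward(layers, x):
    """Propagate input x forward through a list of weight matrices.

    layers[n][j][i] is the weight on the ith input of unit j at stage n;
    every output at stage n is connected to every unit at stage n+1.
    """
    for W in layers:
        # Each output is the sigmoid of the inner product of the
        # corresponding weight vector and the input vector.
        x = [sigmoid(sum(w_ji * x_i for w_ji, x_i in zip(w_j, x)))
             for w_j in W]
    return x
```

For example, a 2-2-1 network is two matrices, and every output lies in (0, 1) because of the sigmoid.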
We now derive the stochastic Backpropagation algorithm for the general
case. The derivation is simple, but unfortunately the bookkeeping is
a little messy.

 \vec{x}_{j} = input vector for unit j (x_{ji} =
ith input to the jth unit)
 \vec{w}_{j} = weight vector for unit j (w_{ji} = weight on
x_{ji})
 z_{j} = \sum_{i} w_{ji} x_{ji},
the weighted sum of inputs
for unit j
 o_{j} = output of unit j (o_{j} = \sigma(z_{j}), where
\sigma(z) = \frac{1}{1 + e^{-z}} is the sigmoid function)
 t_{j} = target for unit j
 Downstream(j) = set of units whose immediate inputs include
the output of j
 Outputs = set of output units in the final layer
Since we update after each training example, we can simplify
the notation somewhat by imagining that the training set consists of
exactly one example and so the error can simply be denoted by E.
We want to calculate \frac{\partial E}{\partial w_{ji}} for each input weight
w_{ji} for each output unit j. Note first that since z_{j} is a
function of w_{ji} regardless of where in the network unit j is
located,

\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial z_{j}} \frac{\partial z_{j}}{\partial w_{ji}} = \frac{\partial E}{\partial z_{j}} x_{ji}

Furthermore, \frac{\partial E}{\partial z_{j}} is the same regardless of which input
weight of unit j we are trying to update. So we denote this
quantity, with its sign flipped for convenience, by \delta_{j} = -\frac{\partial E}{\partial z_{j}}.
Consider the case when j \in Outputs.
We know

E = \frac{1}{2} \sum_{k \in Outputs} (t_{k} - o_{k})^{2}

Since the outputs of all units k \neq j
are independent of w_{ji},
we can drop the summation and consider just the contribution to E by
j.
Thus

\frac{\partial E}{\partial w_{ji}} = \frac{\partial}{\partial w_{ji}} \frac{1}{2} (t_{j} - o_{j})^{2}
 = -(t_{j} - o_{j}) \frac{\partial o_{j}}{\partial z_{j}} \frac{\partial z_{j}}{\partial w_{ji}}
 = -(t_{j} - o_{j}) \, o_{j}(1 - o_{j}) \, x_{ji}

so that

\delta_{j} = -\frac{\partial E}{\partial z_{j}} = o_{j}(1 - o_{j})(t_{j} - o_{j})    (17)
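The output-unit rule in equation (17), \delta_{j} = o_{j}(1 - o_{j})(t_{j} - o_{j}), can be sanity-checked numerically: for a single output unit it should equal -\frac{\partial E}{\partial z_{j}} as estimated by a central finite difference on the error. A minimal sketch (function names are ours):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def error(z, t):
    # Contribution to E by one output unit: (1/2)(t - o)^2 with o = sigma(z).
    o = sigmoid(z)
    return 0.5 * (t - o) ** 2

def delta_output(z, t):
    # Equation (17): delta = o (1 - o) (t - o), i.e. -dE/dz.
    o = sigmoid(z)
    return o * (1.0 - o) * (t - o)
```

A central difference -(E(z+h) - E(z-h)) / 2h agrees with `delta_output` to high precision for small h.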
Now consider the case when j is a hidden unit. As before, we make the
following two important observations.
 1.
 For each unit k downstream from j, z_{k} is a function of z_{j}
 2.
 The contribution to error by all units
in the same
layer as j is independent of w_{ji}
We want to calculate \frac{\partial E}{\partial w_{ji}} for each input weight
w_{ji} for each hidden unit j. Note that w_{ji} influences just
z_{j}, which influences o_{j}, which influences z_{k} for each
k \in Downstream(j), each of which influence E. So we can write

\frac{\partial E}{\partial w_{ji}} = \sum_{k \in Downstream(j)} \frac{\partial E}{\partial z_{k}} \frac{\partial z_{k}}{\partial o_{j}} \frac{\partial o_{j}}{\partial z_{j}} \frac{\partial z_{j}}{\partial w_{ji}}

Again note that all the terms except x_{ji} in the above product are
the same regardless of which input weight of unit j we are trying to
update. As before, we denote this common quantity by \delta_{j}.
Also note that \frac{\partial E}{\partial z_{k}} = -\delta_{k},
\frac{\partial z_{k}}{\partial o_{j}} = w_{kj},
and
\frac{\partial o_{j}}{\partial z_{j}} = o_{j}(1 - o_{j}).
Substituting,

\frac{\partial E}{\partial w_{ji}} = \sum_{k \in Downstream(j)} -\delta_{k} \, w_{kj} \, o_{j}(1 - o_{j}) \, x_{ji}

Thus,

\delta_{j} = o_{j}(1 - o_{j}) \sum_{k \in Downstream(j)} \delta_{k} w_{kj}    (18)
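The hidden-unit rule in equation (18) translates directly into code. A minimal sketch, assuming each downstream unit k is supplied as a (\delta_{k}, w_{kj}) pair (the name `delta_hidden` is ours):

```python
def delta_hidden(o_j, downstream):
    # Equation (18): delta_j = o_j (1 - o_j) * sum over k in Downstream(j)
    # of delta_k * w_kj.  `downstream` is a list of (delta_k, w_kj) pairs.
    return o_j * (1.0 - o_j) * sum(d_k * w_kj for d_k, w_kj in downstream)
```

Note that a hidden unit's delta is just its sigmoid derivative o_{j}(1 - o_{j}) times the weighted sum of the deltas it feeds into.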
We are now in a position to state the Backpropagation algorithm
formally.
Formal statement of the algorithm:
Stochastic Backpropagation(training examples, \eta,
n_{i}, n_{h}, n_{o})

Each training example is of the form \langle \vec{x}, \vec{t} \rangle, where
\vec{x} is the input vector and \vec{t} is the
target vector.
\eta is the learning rate (e.g., .05). n_{i}, n_{h}
and n_{o} are the number of input, hidden and output nodes
respectively. Input from unit i to unit j is denoted x_{ji} and
its weight is denoted by w_{ji}.
 Create a feedforward network with n_{i} inputs, n_{h} hidden
units, and n_{o} output units.
 Initialize all the weights to small random values (e.g., between
-.05 and .05).
 Until the termination condition is met, Do
 For each training example \langle \vec{x}, \vec{t} \rangle,
Do
 1.
 Input the instance \vec{x} and compute the output o_{u}
of every unit u.
 2.
 For each output unit k, calculate

\delta_{k} = o_{k}(1 - o_{k})(t_{k} - o_{k})

 3.
 For each hidden unit h, calculate

\delta_{h} = o_{h}(1 - o_{h}) \sum_{k \in Downstream(h)} w_{kh} \delta_{k}

 4.
 Update each network weight w_{ji} as follows:

w_{ji} \leftarrow w_{ji} + \Delta w_{ji}, where \Delta w_{ji} = \eta \, \delta_{j} \, x_{ji}
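The four steps above can be sketched in full as follows. This is a minimal illustration under stated simplifications, not a reference implementation: bias weights are omitted, the termination condition is a fixed epoch count, and all names (`train`, `predict`, etc.) are ours:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(net, x):
    """Forward pass: sigmoid of the weighted input sum at each layer."""
    W_h, W_o = net
    o_h = [sigmoid(sum(w * xi for w, xi in zip(w_j, x))) for w_j in W_h]
    return [sigmoid(sum(w * hi for w, hi in zip(w_k, o_h))) for w_k in W_o]

def train(examples, eta, n_i, n_h, n_o, epochs=5000, seed=0):
    rng = random.Random(seed)
    # Initialize all weights to small random values in [-.05, .05].
    # Bias weights are omitted in this sketch.
    W_h = [[rng.uniform(-0.05, 0.05) for _ in range(n_i)] for _ in range(n_h)]
    W_o = [[rng.uniform(-0.05, 0.05) for _ in range(n_h)] for _ in range(n_o)]
    for _ in range(epochs):            # termination: fixed number of epochs
        for x, t in examples:          # weights updated after each example
            # 1. Forward pass: compute the output of every unit.
            o_h = [sigmoid(sum(w * xi for w, xi in zip(w_j, x)))
                   for w_j in W_h]
            o_o = [sigmoid(sum(w * hi for w, hi in zip(w_k, o_h)))
                   for w_k in W_o]
            # 2. Output deltas: delta_k = o_k (1 - o_k) (t_k - o_k).
            d_o = [o * (1 - o) * (tk - o) for o, tk in zip(o_o, t)]
            # 3. Hidden deltas: delta_h = o_h (1 - o_h) sum_k w_kh delta_k.
            d_h = [o * (1 - o) * sum(W_o[k][h] * d_o[k] for k in range(n_o))
                   for h, o in enumerate(o_h)]
            # 4. Weight updates: w_ji <- w_ji + eta * delta_j * x_ji.
            for k in range(n_o):
                for h in range(n_h):
                    W_o[k][h] += eta * d_o[k] * o_h[h]
            for h in range(n_h):
                for i in range(n_i):
                    W_h[h][i] += eta * d_h[h] * x[i]
    return W_h, W_o
```

Trained on a single example, as in the derivation, the network output should approach the target.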
Anand Venkataraman
19990916