next up previous
Next: The Perceptron Training Rule Up: Artificial Neural Nets Previous: Artificial Neural Nets

Perceptrons

Perceptrons (popularized by Marvin Minsky) are the building blocks of a certain kind of ANNs.

Input to a perceptron consists of a vector of real values. Output is 1 if a linear combination of these inputs is greater than a threshold and -1 otherwise. Formally, if $\vec{x} = (x_1, \cdots, x_n)$ is the input to a perceptron, then its output $o(\vec{x})$ is given by

\begin{displaymath}o(\vec{x}) = \left\{ \begin{array}{rl}
1 & {\rm if\ } w_0 + ...
... > 0 \\
& \\
-1 & {\rm otherwise} \\
\end{array} \right.
\end{displaymath}

where each wi is a real valued constant that weights the corresponding xi. It is helpful to imagine an x0 = 1 and treat the wi also as components of a weight vector $\vec{w} = (w_0,
\cdots, w_n)$ so that the above can simply be written as

\begin{displaymath}o(\vec{x}) = \left\{ \begin{array}{rl}
\frac{\vec{w}\cdot \v...
...\
&\\
-1 & \vec{w}\cdot\vec{x} = 0\\
\end{array} \right.
\end{displaymath}

In practice, the case for $\vec{w}\cdot\vec{x} = 0$ is not always stated.

Learning a perceptron amounts to just choosing the best $\vec{w}$. The hypothesis space H for an n input perceptron is the set of all possible real valued weight vectors in n+1 dimensions.

\begin{displaymath}H = \{\vec{w} \vert \vec{w} \in \Re^(n+1)\}
\end{displaymath}

If $\vec{w}$ is given, then $\vec{w}\cdot\vec{x}$ partitions the (n+1)-dimensional input space into points where $\vec{w}\cdot\vec{x}
> 0$ and $\vec{w}\cdot\vec{x} \le 0$. In fact, $\vec{w}\cdot\vec{x} = 0$ is the equation for an (n+1)-dimensional hyperplane that we can interpret as a decision surface.

If we are asked to find some $\vec{w}$ such that $\vec{w}\cdot\vec{x}$ divides all the positive examples $x_\oplus$ from the negative examples $x_\ominus$, is this possible? Unfortunately, not always, because we can imagine linearly inseparable points.

Important observation: If we have m linearly separable training examples, note that we have a linear system of m simultaneous inequations in n unknowns (weights). We can solve this using standard linear programming techniques.

Representational power of perceptrons:

If we represent the boolean values True and False by 1 and -1 resp., all primitive boolean functions, AND, OR, NAND and NOR can be represented by perceptrons.

Each is a special case of the m-of-n functions, i.e. functions where at least m of n inputs must be true.

To implement an m-of-n function, set all input weights $w_1,
\cdots, w_n$ to some identical constant value. Then solve an inequality for the w0 that gives the desired function. Work out an example.

Unfortunately, XOR is a boolean function that cannot be represented by a single perceptron. The set of possible instances is linearly inseparable.

XOR can, however, be represented using combinations (networks) of perceptrons, because XOR can be expressed using just AND and OR.

So, in fact, every boolean function can be represented by some network of perceptrons only two levels deep.

Point to ponder: What is the easiest way to determine the change in weights required to convert an AND perceptron into an OR one?


next up previous
Next: The Perceptron Training Rule Up: Artificial Neural Nets Previous: Artificial Neural Nets
Anand Venkataraman
1999-09-16