A weighted majority classifier.
Read Sections 6.7 and 6.8 of Mitchell. The question is: should we classify a new instance using only the MAP hypothesis, or take a weighted sum over all hypotheses?
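Consider the example in Mitchell: let $H = \{h_1, h_2, h_3\}$, and suppose the posterior probabilities of these hypotheses given the training data $D$ are $P(h_1 \mid D) = 0.4$ and $P(h_2 \mid D) = P(h_3 \mid D) = 0.3$. Let $V = \{\oplus, \ominus\}$ be the set of possible classifications.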
Suppose a new instance $x$ is classified positive ($\oplus$) by $h_1$ and negative ($\ominus$) by $h_2$ and $h_3$.
If we take into account all hypotheses in $H$, then the probability that $x$ is positive ($\oplus$) is only 0.4, whereas the probability that it is negative ($\ominus$) is 0.3 + 0.3 = 0.6.
So the most probable classification (negative, $\ominus$) is different from the one predicted by the MAP hypothesis $h_1$ (positive, $\oplus$).
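The arithmetic of this example can be checked with a short sketch. It assumes each hypothesis deterministically assigns one label to $x$, and it uses the strings "+" and "-" as stand-ins for the two classes; the variable names are illustrative only.

```python
# Posterior probabilities P(h_i | D) from the example.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
# Label each hypothesis assigns to the new instance x ("+" / "-" are stand-ins).
predictions = {"h1": "+", "h2": "-", "h3": "-"}

# MAP approach: trust only the single most probable hypothesis.
h_map = max(posteriors, key=posteriors.get)
map_label = predictions[h_map]

# Weighted approach: add up the posterior mass behind each label.
label_prob = {}
for h, p in posteriors.items():
    label = predictions[h]
    label_prob[label] = label_prob.get(label, 0.0) + p
weighted_label = max(label_prob, key=label_prob.get)

print(h_map, map_label)  # h1 +
print(label_prob)        # {'+': 0.4, '-': 0.6}
print(weighted_label)    # -
```

The MAP route picks $h_1$ and answers "+", while pooling the votes of all hypotheses answers "-", matching the 0.4 versus 0.6 split above.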
If $V$ is the set of possible classifications, then the probability that a classification $v_j \in V$ is correct is:

$$P(v_j \mid D) = \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D)$$
The optimal classification is:

$$\arg\max_{v_j \in V} \sum_{h_i \in H} P(v_j \mid h_i)\, P(h_i \mid D) \qquad (4)$$
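A minimal sketch of the optimal classification rule, assuming the posteriors $P(h_i \mid D)$ and the per-hypothesis label probabilities $P(v_j \mid h_i)$ are already available as dictionaries. The function name `bayes_optimal_classify` and the dictionary layout are hypothetical choices for illustration. Deterministic hypotheses are encoded by setting $P(v_j \mid h_i)$ to 1 for the label they predict and 0 otherwise.

```python
from typing import Dict, Hashable, Iterable, Tuple


def bayes_optimal_classify(
    posteriors: Dict[Hashable, float],
    likelihoods: Dict[Hashable, Dict[Hashable, float]],
    labels: Iterable[Hashable],
) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Return (best_label, scores), where scores[v] = sum_h P(v | h) * P(h | D)."""
    scores = {
        v: sum(likelihoods[h].get(v, 0.0) * p_h for h, p_h in posteriors.items())
        for v in labels
    }
    # argmax over the labels of the posterior-weighted sum
    best = max(scores, key=scores.get)
    return best, scores


# Usage on the example above, with deterministic hypotheses.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
likelihoods = {
    "h1": {"+": 1.0, "-": 0.0},
    "h2": {"+": 0.0, "-": 1.0},
    "h3": {"+": 0.0, "-": 1.0},
}
best, scores = bayes_optimal_classify(posteriors, likelihoods, ["+", "-"])
print(best)  # -
```

Because the sum runs over every hypothesis in $H$, this direct form is only practical when $H$ is small enough to enumerate.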
A system that classifies according to Eqn 4 is called a Bayes optimal classifier or Bayes optimal learner.