
Bayes Optimal Classifier

A weighted majority classifier.

Read Sections 6.7 and 6.8 of Mitchell. The question is whether a new instance should be classified using the MAP hypothesis alone, or by a weighted vote over all hypotheses.

Consider the example in Mitchell: Let $H = \{h_1, h_2, h_3\}$ with posterior probabilities $P(h_1\vert D) = 0.4$ and $P(h_2\vert D) = P(h_3\vert D) = 0.3$, so that $h_1$ is the MAP hypothesis. Let $V = \{\oplus, \ominus\}$ be the set of possible classifications.

Suppose a new example is classified $\oplus$ by $h_1$ and $\ominus$ by $h_2$ and $h_3$.

If we take all hypotheses in $H$ into account, the probability that the example is $\oplus$ is only 0.4, while the probability that it is $\ominus$ is 0.3 + 0.3 = 0.6. So the most probable classification ($\ominus$) differs from the one predicted by the MAP hypothesis $h_1$ ($\oplus$).

If V is the space of possible classifications, then the probability of a classification $v \in V$ being correct is:


\begin{displaymath}P(v\vert D) = \sum\limits_{h_i \in H} P(v\vert h_i)P(h_i\vert D)
\end{displaymath}
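
As a quick check, the sum can be evaluated on the three-hypothesis example above. This assumes that each hypothesis classifies deterministically, so that $P(v\vert h_i)$ is 1 for the label $h_i$ predicts and 0 otherwise, and it takes 0.4, 0.3 and 0.3 as the weights $P(h_i\vert D)$:

\begin{displaymath}P(\oplus\vert D) = 1(0.4) + 0(0.3) + 0(0.3) = 0.4, \qquad P(\ominus\vert D) = 0(0.4) + 1(0.3) + 1(0.3) = 0.6
\end{displaymath}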

The optimal classification is:


 
\begin{displaymath}\hat v = \underset{v \in V}{\rm argmax}\quad P(v\vert D) = \underset{v \in V}{\rm argmax}\quad \sum\limits_{h_i \in H} P(v\vert h_i)P(h_i\vert D) \qquad (4)
\end{displaymath}

A system that classifies according to Eqn 4 is called a Bayes optimal classifier or Bayes optimal learner.
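
The weighted vote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `bayes_optimal_classify`, the dictionary layout, and the use of the strings "+" and "-" for $\oplus$ and $\ominus$ are hypothetical choices, not part of Mitchell's text. The numbers are those of the three-hypothesis example, with each hypothesis assumed to vote deterministically.

```python
def bayes_optimal_classify(posteriors, likelihoods, labels):
    """Return (best_label, scores), where
    scores[v] = sum over h of P(v|h) * P(h|D).

    posteriors:  dict mapping hypothesis name -> P(h|D)
    likelihoods: dict mapping hypothesis name -> {label: P(label|h)}
                 (labels missing from the inner dict count as 0)
    labels:      list of possible classifications V
    """
    scores = {v: 0.0 for v in labels}
    for h, p_h in posteriors.items():
        for v in labels:
            scores[v] += likelihoods[h].get(v, 0.0) * p_h
    # argmax over V of the weighted vote
    best = max(labels, key=lambda v: scores[v])
    return best, scores


# Three-hypothesis example: h1 votes "+", h2 and h3 vote "-".
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
likelihoods = {
    "h1": {"+": 1.0},  # assumed deterministic: P(+|h1) = 1
    "h2": {"-": 1.0},
    "h3": {"-": 1.0},
}
labels = ["+", "-"]

best, scores = bayes_optimal_classify(posteriors, likelihoods, labels)

# The MAP hypothesis is simply the one with the largest posterior.
map_h = max(posteriors, key=posteriors.get)
map_label = max(labels, key=lambda v: likelihoods[map_h].get(v, 0.0))

print("Bayes optimal label:", best, scores)
print("MAP hypothesis:", map_h, "predicts", map_label)
```

Here the MAP hypothesis `h1` predicts "+", while the weighted vote picks "-", matching the discussion above. The inner dictionaries allow probabilistic hypotheses as well: replacing `{"+": 1.0}` with, say, `{"+": 0.9, "-": 0.1}` uses the same sum without any change to the function.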


Anand Venkataraman
1999-09-16