- $P(\omega_i)$: the a priori probability that an arbitrary pattern belongs
  to class $\omega_i$.

- $P(\omega_i|\mathbf{x})$: the a posteriori conditional probability that a
  specific pattern $\mathbf{x}$ belongs to class $\omega_i$.

- $p(\mathbf{x})$: the density distribution of patterns in all classes.

- $p(\mathbf{x}|\omega_i)$: the conditional density distribution of all patterns
  belonging to class $\omega_i$.

Note that $p(\mathbf{x})$ is the weighted sum of all $p(\mathbf{x}|\omega_i)$ for
$i = 1, \dots, K$:

  $$p(\mathbf{x}) = \sum_{i=1}^{K} p(\mathbf{x}|\omega_i)\, P(\omega_i)$$

- Bayes' Theorem

The a posteriori probability can be obtained from the other three quantities by
Bayes' theorem:

  $$P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)\, P(\omega_i)}{p(\mathbf{x})}$$
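As a quick illustration with purely hypothetical numbers (not from the original notes),
suppose there are two classes with priors $P(\omega_1) = 0.6$ and $P(\omega_2) = 0.4$,
and at a particular pattern $\mathbf{x}$ the class-conditional densities are
$p(\mathbf{x}|\omega_1) = 0.5$ and $p(\mathbf{x}|\omega_2) = 2.0$. Then

  $$p(\mathbf{x}) = 0.5 \times 0.6 + 2.0 \times 0.4 = 1.1, \qquad
    P(\omega_1|\mathbf{x}) = \frac{0.5 \times 0.6}{1.1} \approx 0.27, \qquad
    P(\omega_2|\mathbf{x}) = \frac{2.0 \times 0.4}{1.1} \approx 0.73$$

so $\mathbf{x}$ is more likely to belong to $\omega_2$ even though $\omega_1$ has the
larger prior.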
- Training
The a priori probability $P(\omega_i)$ can be estimated from the training samples as

  $$P(\omega_i) \approx \frac{n_i}{n}$$

where $n_i$ is the number of training samples belonging to class $\omega_i$ and $n$ is
the total number of training samples, assuming the training samples are randomly chosen
from all the patterns.

We also need to estimate $p(\mathbf{x}|\omega_i)$. If we don't have any good reason
to believe otherwise, we will assume the density to be a normal distribution:

  $$p(\mathbf{x}|\omega_i) = \frac{1}{(2\pi)^{N/2}\,|\Sigma_i|^{1/2}}
    \exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T \Sigma_i^{-1} (\mathbf{x}-\mathbf{m}_i)\right]$$

where the mean vector $\mathbf{m}_i$ and the covariance matrix $\Sigma_i$
can be estimated from the training samples as shown before.
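As an illustration only (not part of the original notes), here is a minimal NumPy sketch
of this training step, assuming the training set is given as an array X of patterns (one
row per pattern) and a vector y of integer class labels; all names are hypothetical:

    import numpy as np

    def train_bayes_classifier(X, y):
        """Estimate the prior, mean vector, and covariance matrix of each class.

        X : (n_samples, n_features) array of training patterns (one row per pattern)
        y : (n_samples,) array of integer class labels
        """
        n_total = X.shape[0]
        params = {}
        for c in np.unique(y):
            Xc = X[y == c]                      # training samples of class c
            prior = Xc.shape[0] / n_total       # P(omega_c) ~ n_c / n
            mean = Xc.mean(axis=0)              # mean vector m_c
            cov = np.cov(Xc, rowvar=False)      # covariance matrix Sigma_c
            params[c] = (prior, mean, cov)
        return params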
 
- Classification
A given pattern $\mathbf{x}$ of unknown class is classified to $\omega_i$ if
it is most likely to belong to $\omega_i$ (optimal classifier), i.e.:

  $$\mathbf{x} \in \omega_i \;\;\mbox{if}\;\;
    P(\omega_i|\mathbf{x}) > P(\omega_j|\mathbf{x}) \;\;\mbox{for all}\;\; j \ne i$$

As shown above, the a posteriori probability $P(\omega_i|\mathbf{x})$ can be written as

  $$P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)\, P(\omega_i)}{p(\mathbf{x})}$$

and the denominator $p(\mathbf{x})$ can be dropped, as it is common to all
$P(\omega_i|\mathbf{x})$'s. Therefore a discriminant function

  $$d_i(\mathbf{x}) = p(\mathbf{x}|\omega_i)\, P(\omega_i)$$

can be used in the classification:

  $$\mathbf{x} \in \omega_i \;\;\mbox{if}\;\;
    d_i(\mathbf{x}) > d_j(\mathbf{x}) \;\;\mbox{for all}\;\; j \ne i$$

Given all $K$ discriminant functions, we can partition the $N$-D feature space into
$K$ regions $R_1, \dots, R_K$, with the boundary between any two neighboring regions
$R_i$ and $R_j$ given by $d_i(\mathbf{x}) = d_j(\mathbf{x})$.
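Continuing the hypothetical sketch above, the classification step evaluates
$d_i(\mathbf{x}) = p(\mathbf{x}|\omega_i)\, P(\omega_i)$ for every class, with the
Gaussian density written out as in the Training section, and assigns $\mathbf{x}$ to the
class with the largest value (in practice $\ln d_i(\mathbf{x})$ is often used instead to
avoid numerical underflow):

    import numpy as np

    def gaussian_density(x, mean, cov):
        """Multivariate normal density p(x|omega_i), as given in the Training section."""
        N = len(mean)                                        # feature-space dimension
        diff = x - mean
        exponent = -0.5 * diff @ np.linalg.solve(cov, diff)  # -(x-m)^T Sigma^{-1} (x-m)/2
        norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(cov))
        return np.exp(exponent) / norm

    def classify(x, params):
        """Assign x to the class with the largest discriminant d_i(x) = p(x|omega_i) P(omega_i)."""
        scores = {c: prior * gaussian_density(x, mean, cov)
                  for c, (prior, mean, cov) in params.items()}
        return max(scores, key=scores.get)

Here params is the dictionary returned by the training sketch, so each class boundary
$d_i(\mathbf{x}) = d_j(\mathbf{x})$ is where two of these scores tie.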