- $P(\omega_i)$: the a priori probability that an arbitrary pattern belongs
  to class $\omega_i$.

- $P(\omega_i|\mathbf{x})$: the a posteriori conditional probability that a
  specific pattern $\mathbf{x}$ belongs to class $\omega_i$.

- $p(\mathbf{x})$: the density distribution of patterns in all classes.

- $p(\mathbf{x}|\omega_i)$: the conditional density distribution of all patterns
  belonging to class $\omega_i$.

Note that $p(\mathbf{x})$ is the weighted sum of all $p(\mathbf{x}|\omega_i)$ for
$i = 1, \dots, K$:

  $$p(\mathbf{x}) = \sum_{i=1}^{K} p(\mathbf{x}|\omega_i)\, P(\omega_i)$$

- Bayes' Theorem

The a posteriori probability can be obtained from the other three quantities by
Bayes' theorem:

  $$P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)\, P(\omega_i)}{p(\mathbf{x})}$$
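As a quick illustration with purely hypothetical numbers (not from the original notes),
suppose there are two classes with priors $P(\omega_1) = 0.6$ and $P(\omega_2) = 0.4$,
and at a particular pattern $\mathbf{x}$ the class-conditional densities are
$p(\mathbf{x}|\omega_1) = 0.5$ and $p(\mathbf{x}|\omega_2) = 2.0$. Then

  $$p(\mathbf{x}) = 0.5 \times 0.6 + 2.0 \times 0.4 = 1.1, \qquad
    P(\omega_1|\mathbf{x}) = \frac{0.5 \times 0.6}{1.1} \approx 0.27, \qquad
    P(\omega_2|\mathbf{x}) = \frac{2.0 \times 0.4}{1.1} \approx 0.73$$

so $\mathbf{x}$ is more likely to belong to $\omega_2$ even though $\omega_1$ has the
larger prior.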
- Training
The a priori probability $P(\omega_i)$ can be estimated from the training samples as

  $$P(\omega_i) \approx \frac{n_i}{n}$$

where $n_i$ is the number of training samples belonging to class $\omega_i$ and $n$ is
the total number of training samples, assuming the training samples are randomly chosen
from all the patterns.

We also need to estimate $p(\mathbf{x}|\omega_i)$. If we don't have any good reason
to believe otherwise, we will assume the density to be a normal distribution:

  $$p(\mathbf{x}|\omega_i) = \frac{1}{(2\pi)^{N/2}\,|\Sigma_i|^{1/2}}
    \exp\left[-\frac{1}{2}(\mathbf{x}-\mathbf{m}_i)^T \Sigma_i^{-1} (\mathbf{x}-\mathbf{m}_i)\right]$$

where the mean vector $\mathbf{m}_i$ and the covariance matrix $\Sigma_i$
can be estimated from the training samples as shown before.
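As an illustration only (not part of the original notes), here is a minimal NumPy sketch
of this training step, assuming the training set is given as an array X of patterns (one
row per pattern) and a vector y of integer class labels; all names are hypothetical:

    import numpy as np

    def train_bayes_classifier(X, y):
        """Estimate the prior, mean vector, and covariance matrix of each class.

        X : (n_samples, n_features) array of training patterns (one row per pattern)
        y : (n_samples,) array of integer class labels
        """
        n_total = X.shape[0]
        params = {}
        for c in np.unique(y):
            Xc = X[y == c]                      # training samples of class c
            prior = Xc.shape[0] / n_total       # P(omega_c) ~ n_c / n
            mean = Xc.mean(axis=0)              # mean vector m_c
            cov = np.cov(Xc, rowvar=False)      # covariance matrix Sigma_c
            params[c] = (prior, mean, cov)
        return params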
 
- Classification
A given pattern $\mathbf{x}$ of unknown class is classified to $\omega_i$ if
it is most likely to belong to $\omega_i$ (optimal classifier), i.e.:

  $$\mathbf{x} \in \omega_i \;\;\mbox{if}\;\;
    P(\omega_i|\mathbf{x}) > P(\omega_j|\mathbf{x}) \;\;\mbox{for all}\;\; j \ne i$$

As shown above, the a posteriori probability $P(\omega_i|\mathbf{x})$ can be written as

  $$P(\omega_i|\mathbf{x}) = \frac{p(\mathbf{x}|\omega_i)\, P(\omega_i)}{p(\mathbf{x})}$$

and the denominator $p(\mathbf{x})$ can be dropped, as it is common to all
$P(\omega_i|\mathbf{x})$'s. Therefore a discriminant function

  $$d_i(\mathbf{x}) = p(\mathbf{x}|\omega_i)\, P(\omega_i)$$

can be used in the classification:

  $$\mathbf{x} \in \omega_i \;\;\mbox{if}\;\;
    d_i(\mathbf{x}) > d_j(\mathbf{x}) \;\;\mbox{for all}\;\; j \ne i$$

Given all $K$ discriminant functions, we can partition the $N$-D feature space into
$K$ regions $R_1, \dots, R_K$, with the boundary between any two neighboring regions
$R_i$ and $R_j$ given by $d_i(\mathbf{x}) = d_j(\mathbf{x})$.
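Continuing the hypothetical sketch above, the classification step evaluates
$d_i(\mathbf{x}) = p(\mathbf{x}|\omega_i)\, P(\omega_i)$ for every class, with the
Gaussian density written out as in the Training section, and assigns $\mathbf{x}$ to the
class with the largest value (in practice $\ln d_i(\mathbf{x})$ is often used instead to
avoid numerical underflow):

    import numpy as np

    def gaussian_density(x, mean, cov):
        """Multivariate normal density p(x|omega_i), as given in the Training section."""
        N = len(mean)                                        # feature-space dimension
        diff = x - mean
        exponent = -0.5 * diff @ np.linalg.solve(cov, diff)  # -(x-m)^T Sigma^{-1} (x-m)/2
        norm = (2 * np.pi) ** (N / 2) * np.sqrt(np.linalg.det(cov))
        return np.exp(exponent) / norm

    def classify(x, params):
        """Assign x to the class with the largest discriminant d_i(x) = p(x|omega_i) P(omega_i)."""
        scores = {c: prior * gaussian_density(x, mean, cov)
                  for c, (prior, mean, cov) in params.items()}
        return max(scores, key=scores.get)

Here params is the dictionary returned by the training sketch, so each class boundary
$d_i(\mathbf{x}) = d_j(\mathbf{x})$ is where two of these scores tie.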