We first consider binary classification based on the same linear model $y(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$ used in the linear regression considered before. Any test sample $\mathbf{x}$ is classified into one of the two classes $C_+$ and $C_-$ depending on whether $\mathbf{w}^T \mathbf{x}$ is greater or smaller than zero:

$$\mathbf{x} \in C_+ \;\;\text{if}\;\; \mathbf{w}^T \mathbf{x} > 0, \qquad \mathbf{x} \in C_- \;\;\text{if}\;\; \mathbf{w}^T \mathbf{x} < 0 \qquad (275)$$
While in the previously considered least-squares classification method we find the optimal $\mathbf{w}$ that minimizes the squared error $\sum_n (y_n - \mathbf{w}^T \mathbf{x}_n)^2$, here we find the optimal $\mathbf{w}$ based on a probabilistic model. Specifically, we now convert the linear function $\mathbf{w}^T \mathbf{x}$ into the probability for $\mathbf{x}$ to belong to either class, using either the logistic (sigmoid) function

$$p(C_+ \,|\, \mathbf{x}) = \sigma(\mathbf{w}^T \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{w}^T \mathbf{x}}} \qquad (276)$$

or the cumulative Gaussian (probit) function

$$p(C_+ \,|\, \mathbf{x}) = \Phi(\mathbf{w}^T \mathbf{x}) = \int_{-\infty}^{\mathbf{w}^T \mathbf{x}} \mathcal{N}(u \,|\, 0, 1)\, du \qquad (278)$$

with, in either case,

$$p(C_- \,|\, \mathbf{x}) = 1 - p(C_+ \,|\, \mathbf{x}) \qquad (279)$$
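As a concrete illustration, the following minimal sketch (assuming NumPy and SciPy are available; the helper name `class_probability` is ours, not from the text) converts the linear score $\mathbf{w}^T \mathbf{x}$ into $p(C_+ \,|\, \mathbf{x})$ through either link function:

```python
import numpy as np
from scipy.special import expit   # logistic sigmoid, sigma(z) = 1 / (1 + exp(-z))
from scipy.stats import norm      # standard Gaussian; norm.cdf is the probit link Phi(z)

def class_probability(w, x, link="logistic"):
    """Convert the linear score w^T x into p(C_+ | x) via the chosen link (eqs. 276, 278)."""
    z = np.dot(w, x)
    if link == "logistic":
        return expit(z)
    if link == "probit":
        return norm.cdf(z)
    raise ValueError("link must be 'logistic' or 'probit'")

# A score of zero lies on the decision boundary, so both links give p(C_+ | x) = 0.5 there.
w = np.array([1.0, -2.0])
x = np.array([2.0, 1.0])
print(class_probability(w, x, "logistic"), class_probability(w, x, "probit"))
```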
 
The binary classification problem can now be treated as a regression problem to find the model parameter $\mathbf{w}$ that best fits the data in the training set $\mathcal{D} = \{(\mathbf{x}_n, y_n),\; n = 1, \dots, N\}$. Such a regression problem is called logistic regression if the logistic function $\sigma(\cdot)$ is used, or probit regression if the cumulative Gaussian $\Phi(\cdot)$ is used.
Same as in the case of Bayesian regression, we assume the prior distribution of $\mathbf{w}$ to be a zero-mean Gaussian $p(\mathbf{w}) = \mathcal{N}(\mathbf{0}, \Sigma_p)$, and for simplicity we further assume $\Sigma_p = \sigma_w^2 \mathbf{I}$, and find the likelihood of $\mathbf{w}$ based on the linear model applied to the observed data set $\mathcal{D} = \{(\mathbf{x}_n, y_n),\; n = 1, \dots, N\}$ with labels $y_n \in \{+1, -1\}$:

$$p(\mathbf{y} \,|\, \mathbf{X}, \mathbf{w}) = \prod_{n=1}^N p(y_n \,|\, \mathbf{x}_n, \mathbf{w}) = \prod_{n=1}^N \sigma(y_n \mathbf{w}^T \mathbf{x}_n) \qquad (281)$$
Here the two cases $y_n = +1$ and $y_n = -1$ can be written as the single factor $\sigma(y_n \mathbf{w}^T \mathbf{x}_n)$, because $\sigma(-z) = 1 - \sigma(z)$.
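A minimal sketch of evaluating this likelihood in log form (assuming the labels are stored as $\pm 1$ in a NumPy array; the function name `log_likelihood` is illustrative, not from the text):

```python
import numpy as np

def log_likelihood(w, X, y):
    """log p(y | X, w) = sum_n log sigma(y_n w^T x_n), labels y_n in {-1, +1} (eq. 281)."""
    z = y * (X @ w)                      # element-wise y_n * (w^T x_n); X has shape (N, d)
    return -np.logaddexp(0.0, -z).sum()  # log sigma(z) = -log(1 + exp(-z)), computed stably
```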
The posterior of $\mathbf{w}$ can now be expressed in terms of the prior $p(\mathbf{w})$ and the likelihood $p(\mathbf{y} \,|\, \mathbf{X}, \mathbf{w})$:

$$p(\mathbf{w} \,|\, \mathbf{X}, \mathbf{y}) = \frac{p(\mathbf{y} \,|\, \mathbf{X}, \mathbf{w})\, p(\mathbf{w})}{p(\mathbf{y} \,|\, \mathbf{X})} \propto p(\mathbf{y} \,|\, \mathbf{X}, \mathbf{w})\, p(\mathbf{w})$$
The optimal $\mathbf{w}$ that best fits the training set $\mathcal{D}$ can now be found as the one that maximizes this posterior $p(\mathbf{w} \,|\, \mathbf{X}, \mathbf{y})$, or, equivalently, the log posterior $\log p(\mathbf{w} \,|\, \mathbf{X}, \mathbf{y})$, by setting its derivative with respect to $\mathbf{w}$ to zero and solving the resulting equation below by Newton's method or the conjugate gradient ascent method:

$$\frac{d}{d\mathbf{w}} \log p(\mathbf{w} \,|\, \mathbf{X}, \mathbf{y}) = \sum_{n=1}^N \left[1 - \sigma(y_n \mathbf{w}^T \mathbf{x}_n)\right] y_n \mathbf{x}_n - \Sigma_p^{-1} \mathbf{w} = \mathbf{0} \qquad (284)$$
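The Newton iteration for maximizing the log posterior can be sketched as follows (a sketch assuming the isotropic prior $\Sigma_p = \sigma_w^2 \mathbf{I}$ introduced above; the function name `map_logistic_newton` and the fixed iteration count are our illustrative choices, not prescribed by the text):

```python
import numpy as np
from scipy.special import expit

def map_logistic_newton(X, y, sigma2_w=10.0, n_iter=20):
    """MAP estimate of w by Newton's method on the log posterior.

    X : (N, d) data matrix, y : (N,) labels in {-1, +1};
    prior w ~ N(0, sigma2_w * I), i.e. the isotropic simplification of Sigma_p.
    """
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(n_iter):
        z = X @ w
        s = expit(y * z)                               # sigma(y_n w^T x_n)
        grad = X.T @ (y * (1.0 - s)) - w / sigma2_w    # gradient of the log posterior, eq. (284)
        p = expit(z)
        hess = -(X.T * (p * (1.0 - p))) @ X - np.eye(d) / sigma2_w  # Hessian of the log posterior
        w = w - np.linalg.solve(hess, grad)            # Newton step toward the maximum
    return w
```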
Having found the optimal $\mathbf{w}^*$, we can classify any test pattern $\mathbf{x}$ in terms of the posterior probability of its corresponding label $y$:

$$p(y = +1 \,|\, \mathbf{x}, \mathbf{w}^*) = \sigma({\mathbf{w}^*}^T \mathbf{x}), \qquad p(y = -1 \,|\, \mathbf{x}, \mathbf{w}^*) = 1 - \sigma({\mathbf{w}^*}^T \mathbf{x}) \qquad (285)$$
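Continuing the sketch above, a test pattern could then be classified as follows (`X_train`, `y_train`, and `x_test` are hypothetical placeholders for the training data and a test sample, not names from the text):

```python
# Classify a test pattern with the MAP estimate w* (eq. 285).
w_star = map_logistic_newton(X_train, y_train)   # from the sketch above
p_plus = expit(x_test @ w_star)                  # p(y = +1 | x, w*)
label = 1 if p_plus > 0.5 else -1                # decide C_+ when the probability exceeds 1/2
```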
In a multi-class case with $K > 2$ classes, we can still use a vector $\mathbf{w}_k$ to represent each class $C_k$ ($k = 1, \dots, K$), as the direction of that class with respect to the origin in the feature space; the inner product $\mathbf{w}_k^T \mathbf{x}$, proportional to the projection of $\mathbf{x}$ onto the vector $\mathbf{w}_k$, measures the extent to which $\mathbf{x}$ belongs to $C_k$. Similar to the logistic function used in the two-class case, here the softmax function defined below is used to convert $\mathbf{w}_k^T \mathbf{x}$ into the probability that $\mathbf{x}$ belongs to $C_k$:

$$p(C_k \,|\, \mathbf{x}) = \frac{e^{\mathbf{w}_k^T \mathbf{x}}}{\sum_{j=1}^K e^{\mathbf{w}_j^T \mathbf{x}}}, \qquad k = 1, \dots, K \qquad (286)$$
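A small sketch of this softmax conversion (assuming the class vectors $\mathbf{w}_k$ are stacked as the rows of a matrix `W`; the max-subtraction is only a standard numerical-stability device and does not change the result):

```python
import numpy as np

def softmax_probabilities(W, x):
    """p(C_k | x) = exp(w_k^T x) / sum_j exp(w_j^T x)  (eq. 286).

    W : (K, d) matrix whose rows are the class vectors w_k; x : (d,) feature vector.
    """
    z = W @ x
    z = z - z.max()   # subtracting the max leaves the ratios unchanged but avoids overflow
    e = np.exp(z)
    return e / e.sum()
```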