 
 
 
 
 
   
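The function $f({\bf x})=\sum_{h=1}^H w_h\phi_h({\bf x})=\phi({\bf x})^T{\bf w}$, evaluated at the $N$ input points in ${\bf X}=[{\bf x}_1,\cdots,{\bf x}_N]$, can be written as

$${\bf f}=\Phi^T{\bf w}$$

where ${\bf f}=[f({\bf x}_1),\cdots,f({\bf x}_N)]^T$ and $\Phi=[\phi({\bf x}_1),\cdots,\phi({\bf x}_N)]$ is the $H\times N$ matrix of basis function values. Assume the prior distribution of the weight vector ${\bf w}$ is zero-mean Gaussian:

$$p({\bf w})=N({\bf 0},\Sigma_p)$$

Then ${\bf f}$, as a linear function of ${\bf w}$, is also a zero-mean Gaussian:

$$p({\bf f})=N({\bf 0},\Sigma_f)$$

where $\Sigma_f$ is the covariance matrix of ${\bf f}$:

$$\Sigma_f=Cov({\bf f})=E({\bf f}{\bf f}^T)=\Phi^T E({\bf w}{\bf w}^T)\,\Phi=\Phi^T\Sigma_p\Phi$$

If the weights are independent with equal variance, $\Sigma_p=\sigma_p^2{\bf I}$, we get

$$\Sigma_f=\sigma_p^2\,\Phi^T\Phi$$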
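The relation $\Sigma_f=\Phi^T\Sigma_p\Phi$ can be checked numerically. The following is a minimal sketch, not part of the original notes: the radial basis functions, their centers, the input points and the value of `sigma_p` are all illustrative assumptions.

```python
import numpy as np

def rbf_features(x, centers, r=1.0):
    """Return the H x N matrix Phi with Phi[h, i] = exp(-(x_i - c_h)^2 / (2 r^2))."""
    return np.exp(-(x[None, :] - centers[:, None]) ** 2 / (2.0 * r ** 2))

rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 5)          # N = 5 input points (assumed)
centers = np.linspace(-3.0, 3.0, 7)    # H = 7 basis centers (assumed)
Phi = rbf_features(x, centers)         # shape (H, N)

sigma_p = 0.5                                    # assumed prior standard deviation
Sigma_p = sigma_p ** 2 * np.eye(len(centers))    # prior covariance of the weights w
Sigma_f = Phi.T @ Sigma_p @ Phi                  # analytic covariance of f

# Monte Carlo check: draw many weight vectors w ~ N(0, Sigma_p), form f = Phi^T w
W = sigma_p * rng.standard_normal((200_000, len(centers)))   # one w per row
F = W @ Phi                                      # one sample of f per row
Sigma_f_mc = F.T @ F / F.shape[0]                # sample estimate of E[f f^T]

max_abs_err = np.max(np.abs(Sigma_f_mc - Sigma_f))
print(max_abs_err)
```

The sample covariance should approach the analytic one as the number of draws grows; the printed value is the largest entrywise gap.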
 
The function $f({\bf x})$ is a Gaussian process, as the joint distribution $p({\bf f})$ of any $N$ of its values corresponding to $N$ input points is Gaussian. As the noise ${\bf n}$ is also Gaussian, $p({\bf n})=N({\bf 0},\sigma_n^2{\bf I})$, and independent of ${\bf f}$, the distribution of the $N$ values of the output ${\bf y}={\bf f}+{\bf n}$ is also Gaussian:

$$p({\bf y})=N({\bf 0},\Sigma_y)$$

where $\Sigma_y$ is the covariance matrix of ${\bf y}$:

$$\Sigma_y=Cov({\bf y})=\Sigma_f+\sigma_n^2{\bf I}$$
 
 
The result above can be generalized when $H\rightarrow\infty$, i.e., the function $f({\bf x})$ is expressed as the linear combination (integration rather than summation) of basis functions. For example, assume $d=1$ and the h-th basis is a radial function centered at $h$:

$$\phi_h(x)=\exp\left(-\frac{(x-h)^2}{2r^2}\right)$$

Assuming $\Sigma_p=\sigma_p^2{\bf I}$, the covariance $Cov(f(x),f(x'))=\sigma_p^2\sum_h\phi_h(x)\,\phi_h(x')$ becomes

$$\begin{array}{rcl}
Cov(f(x),f(x')) &=& S\int_{-\infty}^{\infty}\phi_h(x)\,\phi_h(x')\,dh \\
&=& S\int_{-\infty}^{\infty}\exp\left(-\frac{(x-h)^2}{2r^2}\right)\exp\left(-\frac{(x'-h)^2}{2r^2}\right)dh \\
&=& \sqrt{\pi}\,r\,S\,\exp\left(-\frac{(x-x')^2}{4r^2}\right)
\end{array}$$

where $S$ and $r$ are some scaling factors ($S$ including $\sigma_p^2$).
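As a sanity check on the integral of the product of two radial basis functions, the sketch below compares a Riemann sum over the centers $h$ with the closed form $\sqrt{\pi}\,r\,S\exp(-(x-x')^2/(4r^2))$. The function names, the integration range and the test values are illustrative assumptions.

```python
import numpy as np

def cov_closed_form(x1, x2, r=1.0, S=1.0):
    """Closed form sqrt(pi) * r * S * exp(-(x1 - x2)^2 / (4 r^2))."""
    return np.sqrt(np.pi) * r * S * np.exp(-(x1 - x2) ** 2 / (4.0 * r ** 2))

def cov_numeric(x1, x2, r=1.0, S=1.0, half_width=30.0, n=60001):
    """Riemann-sum approximation of S * integral of phi_h(x1) * phi_h(x2) dh."""
    h = np.linspace(min(x1, x2) - half_width, max(x1, x2) + half_width, n)
    dh = h[1] - h[0]
    phi1 = np.exp(-(x1 - h) ** 2 / (2.0 * r ** 2))   # basis centered at h, evaluated at x1
    phi2 = np.exp(-(x2 - h) ** 2 / (2.0 * r ** 2))   # same basis, evaluated at x2
    return S * np.sum(phi1 * phi2) * dh

x1, x2 = 0.3, 1.7
gap = abs(cov_numeric(x1, x2) - cov_closed_form(x1, x2))
print(gap)
```

The covariance depends only on the difference $x-x'$, which is why the resulting kernel is stationary.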
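More generally, when $d>1$, the covariance matrix of the $N$ function values ${\bf f}$ at the $N$ input points ${\bf X}=[{\bf x}_1,\cdots,{\bf x}_N]$ can be defined element by element as

$$Cov(f({\bf x}_i),f({\bf x}_j))=k({\bf x}_i,{\bf x}_j)=S\exp\left(-\frac{||{\bf x}_i-{\bf x}_j||^2}{4r^2}\right),\;\;\;\;i,j=1,\cdots,N$$

where $S$ and $r$ are scaling factors, with any constant factor absorbed into $S$.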
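A minimal sketch of how such a covariance matrix might be built for $d$-dimensional inputs, assuming the squared exponential form $k({\bf x},{\bf x}')=S\exp(-||{\bf x}-{\bf x}'||^2/(4r^2))$ and input points stored as the columns of ${\bf X}$. The helper name `kernel_matrix` and the random inputs are hypothetical.

```python
import numpy as np

def kernel_matrix(XA, XB, r=1.0, S=1.0):
    """K[i, j] = S * exp(-||a_i - b_j||^2 / (4 r^2)); columns of XA and XB are input points."""
    sq_norm_a = np.sum(XA ** 2, axis=0)[:, None]
    sq_norm_b = np.sum(XB ** 2, axis=0)[None, :]
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 against round-off
    sq_dist = np.maximum(sq_norm_a + sq_norm_b - 2.0 * XA.T @ XB, 0.0)
    return S * np.exp(-sq_dist / (4.0 * r ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(3, 6))        # d = 3, N = 6 (assumed)
X_star = rng.uniform(-1.0, 1.0, size=(3, 2))   # two additional points (assumed)

K = kernel_matrix(X, X)              # N x N covariance of the function values
K_star = kernel_matrix(X_star, X)    # cross covariance between the two point sets
min_eig = np.linalg.eigvalsh(K).min()
print(K.shape, K_star.shape, min_eig)
```

The smallest eigenvalue should be non-negative up to round-off, consistent with the requirement that a covariance matrix be positive semi-definite.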
 
Now the regression problem can be approached from a totally different
point of view. Instead of specifying the basis functions and some model
parameters (e.g. the weights ${\bf w}$), we can assume the $N$ function
values ${\bf f}$ to be a Gaussian process and construct its covariance
matrix directly (while always assuming a zero mean vector):
![\begin{displaymath}\Sigma_f={\bf Q}=Cov({\bf f})=\left[ \begin{array}{ccc}...&.....
......&...&...\end{array} \right]_{N\times N}
=K({\bf X},{\bf X}) \end{displaymath}](img91.png) 
 
 
 
 
Comments:

- Similar inputs (small distance $||{\bf x}-{\bf x}'||$) should give rise to similar predictions (large covariance or correlation $k({\bf x},{\bf x}')$).
- The covariance $k({\bf x},{\bf x}')$ can be considered as some kernel function of the two vector arguments, as used in various kernel-based algorithms such as support vector machines and kernel PCA. In either case, these kernels need to be constructed based on some prior knowledge of the problem.
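For the output ${\bf y}={\bf f}+{\bf n}$, with independent zero-mean Gaussian noise of variance $\sigma_n^2$, the covariance matrix becomes:

$$\Sigma_y=Cov({\bf y})=K({\bf X},{\bf X})+\sigma_n^2{\bf I}$$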
 
 
Samples from the prior distribution $p({\bf f})=N({\bf 0},\Sigma_f)$ can be drawn to get the $N$ values of the function $f(x)$, i.e., various curves in 1-D space.
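Drawing such curves from a zero-mean Gaussian with covariance $K({\bf X},{\bf X})$ can be sketched as follows. The kernel form $\exp(-(x-x')^2/(4r^2))$, the grid of inputs, and the small diagonal `jitter` added for numerical stability are assumptions made for illustration.

```python
import numpy as np

def kernel_1d(xa, xb, r=1.0):
    """Squared exponential kernel exp(-(x - x')^2 / (4 r^2)) for 1-D inputs."""
    diff = xa[:, None] - xb[None, :]
    return np.exp(-diff ** 2 / (4.0 * r ** 2))

rng = np.random.default_rng(2)
x = np.linspace(-5.0, 5.0, 50)      # N = 50 input points on a grid (assumed)
K = kernel_1d(x, x)

# A small diagonal term keeps the Cholesky factorization numerically stable
jitter = 1e-6 * np.eye(len(x))
L = np.linalg.cholesky(K + jitter)  # K + jitter = L L^T

# If z ~ N(0, I), then L z ~ N(0, K + jitter); each column is one sampled curve
samples = L @ rng.standard_normal((len(x), 3))
print(samples.shape)
```

Plotting each column of `samples` against `x` gives one smooth random curve per column; a smaller `r` produces more rapidly varying curves.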
 
Now consider the regression problem of finding the underlying function $f({\bf x})$ to fit the observed data

$$\{({\bf x}_i,\,y_i),\;\;i=1,\cdots,N\}$$

As the function values ${\bf f}$ as well as the output ${\bf y}$ are Gaussian processes, the conditional distribution of the values ${\bf f}^*$ at some new input points ${\bf X}^*$ given the known values ${\bf f}$ can be found as below (proof given in Appendix A).
We first consider finding the conditional distribution of function values $p({\bf f}^*|{\bf f})$. Set up the function vector containing both the known values ${\bf f}$ and the prediction ${\bf f}^*$:
![\begin{displaymath}\left[ \begin{array}{c} {\bf f} \\ {\bf f}^* \end{array} \right] \end{displaymath}](img107.png) 
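Its covariance matrix is constructed from the kernel evaluated at the known input points ${\bf X}$ and the new input points ${\bf X}^*$: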
![\begin{displaymath}\Sigma_f=Cov\left[ \begin{array}{c} {\bf f}  {\bf f}^* \end...
...\bf X}^*,{\bf X}) & K({\bf X}^*,{\bf X}^*) \end{array} \right] \end{displaymath}](img108.png) 
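The conditional distribution of ${\bf f}^*$ given ${\bf f}$ can then be found to be Gaussian, with mean and covariance

$$E({\bf f}^*|{\bf f})=K({\bf X}^*,{\bf X})\,K({\bf X},{\bf X})^{-1}\,{\bf f}$$

$$Cov({\bf f}^*|{\bf f})=K({\bf X}^*,{\bf X}^*)-K({\bf X}^*,{\bf X})\,K({\bf X},{\bf X})^{-1}\,K({\bf X},{\bf X}^*)$$

Note that as the covariance matrix is symmetric, $K({\bf X},{\bf X}^*)=K({\bf X}^*,{\bf X})^T$.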
If ${\bf f}$ above is replaced by ${\bf y}={\bf f}+{\bf n}$, with $K({\bf X},{\bf X})$ correspondingly replaced by $K({\bf X},{\bf X})+\sigma_n^2{\bf I}$, the discussion is also valid for output ${\bf y}$.
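The conditional mean $K({\bf X}^*,{\bf X})\,[K({\bf X},{\bf X})+\sigma_n^2{\bf I}]^{-1}{\bf y}$ and covariance $K({\bf X}^*,{\bf X}^*)-K({\bf X}^*,{\bf X})\,[K({\bf X},{\bf X})+\sigma_n^2{\bf I}]^{-1}K({\bf X},{\bf X}^*)$ can be sketched in a few lines. This is an illustration under stated assumptions, not the notes' own implementation: the kernel form $\exp(-(x-x')^2/(4r^2))$, the function name `gp_posterior`, and the toy data generated from a sine function are all hypothetical.

```python
import numpy as np

def kernel_1d(xa, xb, r=1.0):
    """Squared exponential kernel exp(-(x - x')^2 / (4 r^2)) for 1-D inputs."""
    diff = xa[:, None] - xb[None, :]
    return np.exp(-diff ** 2 / (4.0 * r ** 2))

def gp_posterior(x_train, y_train, x_test, noise_var=0.0, r=1.0):
    """Mean and covariance of the function values at x_test given the training data."""
    K = kernel_1d(x_train, x_train, r) + noise_var * np.eye(len(x_train))
    K_s = kernel_1d(x_test, x_train, r)    # K(X*, X)
    K_ss = kernel_1d(x_test, x_test, r)    # K(X*, X*)
    # Solve linear systems rather than forming the inverse explicitly
    mean = K_s @ np.linalg.solve(K, y_train)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

x_train = np.array([-2.0, -0.5, 1.0, 2.5])   # toy inputs (assumed)
y_train = np.sin(x_train)                    # toy noise-free observations (assumed)
x_test = np.array([-2.0, 1.0, 6.0])          # two training inputs and one far-away point

mean, cov = gp_posterior(x_train, y_train, x_test)
print(mean)
print(np.diag(cov))
```

With `noise_var=0.0` the posterior mean reproduces the observed values at the training inputs and the variance there collapses to zero, while far from the data the variance returns towards the prior value. Passing a positive `noise_var` gives the noisy-output case, where the fit no longer interpolates exactly.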
The samples drawn from this posterior distribution $p({\bf f}^*|{\bf f})$ are different curves that interpolate (fit) the observed data, the $N$ data points $\{({\bf x}_i,\,y_i),\;i=1,\cdots,N\}$, and they can also predict the outputs ${\bf f}^*$ at any input points ${\bf X}^*$.
 
In summary, the regression problem can be approached in two different ways:

- A set of $H$ discrete basis functions $\phi_h({\bf x})$ is specified (in the linear model these could be just the input points themselves, $\phi({\bf x})={\bf x}$), and a set of model parameters, the weights, is assumed. The covariance matrix of the function is $\Sigma_f=\Phi^T\Sigma_p\Phi$.
- The covariance $k({\bf x},{\bf x}')$ of the function $f({\bf x})$ (assumed a Gaussian process) can be constructed directly, so long as the resulting covariance matrix is positive definite.
 
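The two approaches are related by the eigen-expansion of the kernel (Mercer's theorem):

$$k({\bf x},{\bf x}')=\sum_i\lambda_i\,\phi_i({\bf x})\,\phi_i({\bf x}')$$

where $\lambda_i$ and $\phi_i({\bf x})$ are the eigenvalues and eigenfunctions respectively:

$$\int k({\bf x},{\bf x}')\,\phi_i({\bf x}')\,d{\bf x}'=\lambda_i\,\phi_i({\bf x})$$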
 
 
 
 
 
