
Appendix B: Kernels and Mercer's Theorem

Generally speaking, a kernel is a continuous function $k(x,y)$ that takes two arguments $x$ and $y$ (real numbers, functions, vectors, etc.) and maps them to a real value that is independent of the order of the arguments, i.e., $k(x,y)=k(y,x) \in {\cal R}$.

Examples: the linear kernel $k(x,y)=xy$, the polynomial kernel $k(x,y)=(xy+1)^d$, and the Gaussian kernel $k(x,y)=e^{-(x-y)^2/2\sigma^2}$; each is symmetric in its two arguments.
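As a quick numerical illustration (a minimal sketch in Python with numpy; the grid and parameter values are illustrative choices, not part of the original notes), the Gaussian kernel can be evaluated on a grid and its symmetry checked:

\begin{verbatim}
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Gaussian kernel k(x,y) = exp(-(x-y)^2 / (2 sigma^2))
    return np.exp(-(x - y)**2 / (2.0 * sigma**2))

x = np.linspace(-3, 3, 50)
K = gaussian_kernel(x[:, None], x[None, :])  # K[m,n] = k(x[m], x[n])
print(np.allclose(K, K.T))                   # symmetry: k(x,y) = k(y,x)
\end{verbatim}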

Mercer's theorem:

A kernel is a continuous function that takes two variables $x$ and $y$ and maps them to a real value such that $k(x,y)=k(y,x)$.

A kernel is non-negative definite iff, for any function $f(x)$:

\begin{displaymath}\int \int f(x) k(x,y) f(y) dx\;dy \ge 0 \end{displaymath}
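This condition can be illustrated numerically: discretizing both integrals on a uniform grid with spacing $\Delta x$ turns the double integral into the quadratic form $\sum_m\sum_n f(x_m)k(x_m,x_n)f(x_n)\Delta x^2$. The following sketch (assuming the Gaussian kernel on an arbitrary grid, for illustration only) tests this form for random choices of $f$:

\begin{verbatim}
import numpy as np

x = np.linspace(-3, 3, 200)
dx = x[1] - x[0]
K = np.exp(-(x[:, None] - x[None, :])**2 / 2.0)  # Gaussian kernel matrix

rng = np.random.default_rng(0)
for _ in range(5):
    f = rng.standard_normal(len(x))    # an arbitrary discretized f(x)
    q = f @ K @ f * dx**2              # approximates the double integral
    print(q >= -1e-10)                 # non-negative up to round-off
\end{verbatim}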

In association with a kernel $k$, we can define an integral operator $T_k$, which, when applied to a function $f(x)$, generates another function:

\begin{displaymath}T_k( f(x) )=\int k(x,y)f(y)dy=[T_k\; f](x) \end{displaymath}
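Numerically the operator can be applied only approximately, e.g., by a Riemann sum over a grid; the sketch below continues the assumed Gaussian-kernel setup:

\begin{verbatim}
import numpy as np

x = np.linspace(-3, 3, 200)
dx = x[1] - x[0]
K = np.exp(-(x[:, None] - x[None, :])**2 / 2.0)  # k(x_m, y_n) on the grid

f = np.sin(x)     # a sample input function f(y)
Tf = K @ f * dx   # [T_k f](x_m) ~ sum_n k(x_m, y_n) f(y_n) dy
\end{verbatim}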

The eigenvalues and their corresponding eigenfunctions of this operator are defined by:

\begin{displaymath}T_k( \phi_i(x) )=\int k(x,y)\phi_i(y)dy=\lambda_i \phi_i(x) \end{displaymath}

The eigenvalues $\lambda_i$ are non-negative and the eigenfunctions $\phi_i(x)$ are orthonormal:

\begin{displaymath}\int \phi_i(x) \phi_j(x) dx=\delta_{ij} \end{displaymath}
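Both facts can be checked numerically by a Nystrom-style discretization of $T_k$: eigen-decomposing the matrix $K\,\Delta x$ approximates the eigenvalues, and the rescaled eigenvectors approximate samples of the eigenfunctions. This is a sketch under the same assumed grid and kernel as above, not a computation from the original notes:

\begin{verbatim}
import numpy as np

x = np.linspace(-3, 3, 200)
dx = x[1] - x[0]
K = np.exp(-(x[:, None] - x[None, :])**2 / 2.0)

# (K*dx) u = lambda u discretizes the integral eigenvalue problem
lam, U = np.linalg.eigh(K * dx)         # eigh returns ascending eigenvalues
lam, U = lam[::-1], U[:, ::-1]          # reorder: largest first
print(np.all(lam > -1e-10))             # eigenvalues are non-negative

Phi = U / np.sqrt(dx)                   # eigenfunction samples phi_i(x_m)
G = Phi.T @ Phi * dx                    # G[i,j] ~ integral of phi_i(x) phi_j(x)
print(np.allclose(G, np.eye(len(x))))   # orthonormality
\end{verbatim}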

The eigenfunctions corresponding to the non-zero eigenvalues form a set of basis functions so that the kernel can be decomposed in terms of them:

\begin{displaymath}k(x,y)=\sum_{i=1}^\infty \lambda_i \phi_i(x) \phi_i(y) \end{displaymath}
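In the discretized sketch above, truncating this expansion after the $r$ leading eigenpairs already reconstructs the Gaussian kernel to high accuracy, because its eigenvalues decay rapidly ($r=20$ is an arbitrary illustrative choice):

\begin{verbatim}
import numpy as np

x = np.linspace(-3, 3, 200)
dx = x[1] - x[0]
K = np.exp(-(x[:, None] - x[None, :])**2 / 2.0)

lam, U = np.linalg.eigh(K * dx)              # discretized eigenpairs, ascending
lam, U = lam[::-1], U[:, ::-1]               # largest eigenvalues first
Phi = U / np.sqrt(dx)                        # eigenfunction samples

r = 20                                       # keep the r leading terms
K_r = (Phi[:, :r] * lam[:r]) @ Phi[:, :r].T  # sum_i lambda_i phi_i(x) phi_i(y)
print(np.max(np.abs(K - K_r)))               # small truncation error
\end{verbatim}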

The above discussion can be related to a non-negative definite $N\times N$ matrix

\begin{displaymath}{\bf A}=\left[ \begin{array}{ccc}
a[1,1]&\cdots&a[1,N]\\ \vdots&a[m,n]&\vdots\\ a[N,1]&\cdots&a[N,N]
\end{array} \right]_{N\times N} \end{displaymath}

A symmetric matrix ${\bf A}^T={\bf A}$ is non-negative definite iff

\begin{displaymath}\sum_{m=1}^N \sum_{n=1}^N x[m] a[m,n] x[n] \ge 0,
\;\;\;\mbox{or}\;\;\;\;{\bf x}^T {\bf A} {\bf x}\ge 0 \end{displaymath}

where ${\bf x}=[x[1],\cdots,x[N]]^T$ is any non-zero vector.
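A minimal numerical sketch of this quadratic-form condition (the test matrix ${\bf A}={\bf B}{\bf B}^T$ is an assumed construction that is non-negative definite by design):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T                     # symmetric, non-negative definite by construction

for _ in range(5):
    v = rng.standard_normal(5)
    print(v @ A @ v >= -1e-12)  # quadratic form v^T A v is non-negative
\end{verbatim}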

A matrix defines a linear operation, which, when applied to a vector ${\bf x}$, generates another vector ${\bf y}$:

\begin{displaymath}\sum_{m=1}^N a[m,n]x[m]=y[n],\;\;\;\;\mbox{or}\;\;\;\;
{\bf A}{\bf x}={\bf y} \end{displaymath}

The eigenvalues and their corresponding eigenvectors of this operator are defined by:

\begin{displaymath}\sum_{m=1}^N a[m,n] \phi_i[m] =\lambda_i \phi_i[n],
\;\;\;\;\mbox{or}\;\;\;\; {\bf A} \phi_i=\lambda_i \phi_i \end{displaymath}

The eigenvalues $\lambda_i$ are non-negative and the eigenvectors $\phi_i$ are orthonormal:

\begin{displaymath}\sum_{n=1}^N \phi_i[n] \phi_j[n] =\delta_{ij},\;\;\;\;\mbox{or}\;\;\;
\phi_i^T \phi_j=\delta_{ij} \end{displaymath}
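Both matrix properties can be verified directly with a symmetric eigen-solver (a sketch, reusing the assumed test matrix ${\bf A}={\bf B}{\bf B}^T$):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T                                 # non-negative definite test matrix

lam, Phi = np.linalg.eigh(A)                # solves A phi_i = lambda_i phi_i
print(np.all(lam >= -1e-12))                # non-negative eigenvalues
print(np.allclose(Phi.T @ Phi, np.eye(5)))  # orthonormal eigenvectors
\end{verbatim}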

The eigenvectors corresponding to the non-zero eigenvalues form a set of basis vectors so that the matrix ${\bf A}$ can be decomposed in terms of them:

\begin{displaymath}a[m,n]=\sum_{i=1}^N \lambda_i \phi_i[m] \phi_i[n] \end{displaymath}

i.e.,

\begin{displaymath}{\bf A}\Phi =\Phi \Lambda,\;\;\;\;\mbox{or}\;\;\;\;
{\bf A}=\Phi \Lambda \Phi^T \end{displaymath}

where $\Phi=[\phi_1,\cdots,\phi_N]$ and $\Lambda=\mbox{diag}[\lambda_1,\cdots,\lambda_N]$.
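As a closing sketch, the decomposition can be confirmed numerically by rebuilding ${\bf A}$ from $\Phi$ and $\Lambda$ (same assumed test matrix as above):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = B @ B.T

lam, Phi = np.linalg.eigh(A)        # eigenvalues and orthonormal eigenvectors
A_rec = Phi @ np.diag(lam) @ Phi.T  # Phi Lambda Phi^T
print(np.allclose(A, A_rec))        # reconstructs A
\end{verbatim}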


Ruye Wang 2006-11-14