In kernel PCA, the PCA method, which maps the dataset to a low-dimensional space, and the kernel method, which maps the dataset to a high-dimensional space, are combined to take advantage of both the high-dimensional space (e.g., better separability) and the low-dimensional space (e.g., computational efficiency).
Specifically, we assume the given dataset $\{\mathbf{x}_1,\cdots,\mathbf{x}_N\}$ in the original $d$-dimensional feature space is mapped into $\{\phi(\mathbf{x}_1),\cdots,\phi(\mathbf{x}_N)\}$ in a high-dimensional space. Correspondingly, the inner product $\mathbf{x}_m^T\mathbf{x}_n$ in the original space is mapped into an inner product in the high-dimensional space:

$$\phi(\mathbf{x}_m)^T\phi(\mathbf{x}_n)=k(\mathbf{x}_m,\mathbf{x}_n) \tag{108}$$

which can be evaluated by the kernel function $k(\cdot,\cdot)$ without explicitly carrying out the mapping $\phi$. The inner products of all $N^2$ pairs of data points form the $N\times N$ kernel matrix

$$\mathbf{K}=[K_{mn}]_{N\times N},\qquad K_{mn}=\phi(\mathbf{x}_m)^T\phi(\mathbf{x}_n)=k(\mathbf{x}_m,\mathbf{x}_n) \tag{109}$$
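As a concrete illustration (an example added here, not part of the original derivation), consider the polynomial kernel $k(\mathbf{x},\mathbf{y})=(\mathbf{x}^T\mathbf{y})^2$ for $d=2$. It corresponds to the explicit mapping

$$\phi(\mathbf{x})=\left[x_1^2,\;\sqrt{2}\,x_1x_2,\;x_2^2\right]^T,\qquad
\phi(\mathbf{x})^T\phi(\mathbf{y})=(x_1y_1+x_2y_2)^2=(\mathbf{x}^T\mathbf{y})^2$$

so the inner product in the 3-dimensional space can be evaluated directly from the 2-dimensional inputs, without ever forming $\phi(\mathbf{x})$.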
To carry out KLT in the high-dimensional space, we need to estimate the covariance matrix of the mapped data $\phi(\mathbf{x}_n)$, here assumed to have zero mean:

$$\mathbf{\Sigma}_\phi=\frac{1}{N}\sum_{n=1}^N\phi(\mathbf{x}_n)\phi(\mathbf{x}_n)^T \tag{110}$$

and solve its eigenvalue problem

$$\mathbf{\Sigma}_\phi\mathbf{v}=\lambda\mathbf{v} \tag{111}$$

Substituting Eq. (110) into Eq. (111), we get

$$\frac{1}{N}\sum_{n=1}^N\phi(\mathbf{x}_n)\left[\phi(\mathbf{x}_n)^T\mathbf{v}\right]=\lambda\mathbf{v} \tag{112}$$

which shows that every eigenvector $\mathbf{v}$ is a linear combination of the mapped data points:

$$\mathbf{v}=\sum_{n=1}^N a_n\,\phi(\mathbf{x}_n) \tag{113}$$

Substituting Eq. (113) back into Eq. (112) and premultiplying both sides by $\phi(\mathbf{x}_m)^T$ for $m=1,\cdots,N$, we obtain the eigenvalue problem of the kernel matrix:

$$\mathbf{K}\mathbf{a}=N\lambda\,\mathbf{a} \tag{114}$$

where $\mathbf{a}=[a_1,\cdots,a_N]^T$. The normalization condition $\mathbf{v}^T\mathbf{v}=1$ becomes

$$\mathbf{v}^T\mathbf{v}=\mathbf{a}^T\mathbf{K}\mathbf{a}=N\lambda\,\mathbf{a}^T\mathbf{a}=1 \tag{115}$$

i.e., $\mathbf{a}$ is scaled so that $||\mathbf{a}||=1/\sqrt{N\lambda}$. Finally, the projection of any mapped data point $\phi(\mathbf{x})$ onto an eigenvector $\mathbf{v}$ can again be expressed entirely in terms of the kernel function:

$$y=\mathbf{v}^T\phi(\mathbf{x})=\sum_{n=1}^N a_n\,\phi(\mathbf{x}_n)^T\phi(\mathbf{x})=\sum_{n=1}^N a_n\,k(\mathbf{x}_n,\mathbf{x}) \tag{116}$$
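As a quick numerical check (a sketch added here, reusing the explicit polynomial-kernel mapping from the example above), the nonzero eigenvalues of $\mathbf{K}/N$ in Eq. (114) should match the eigenvalues of $\mathbf{\Sigma}_\phi$ in Eq. (110):

rng(0); N=5; X=randn(2,N);                          % small random dataset
Phi=[X(1,:).^2; sqrt(2)*X(1,:).*X(2,:); X(2,:).^2]; % explicit phi(x), d=2 -> 3-D
S=Phi*Phi'/N;                                       % covariance matrix, Eq. (110)
K=(X'*X).^2;                                        % kernel matrix, K(m,n)=k(xm,xn)
disp(sort(eig(S),'descend')')                       % 3 eigenvalues of Sigma_phi
e=sort(eig(K/N),'descend'); disp(e(1:3)')           % agree with the 3 above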
The discussion above is based on the assumption that the data $\phi(\mathbf{x}_n)$ in the high-dimensional space have zero mean, $\bar{\boldsymbol\phi}=\frac{1}{N}\sum_{n=1}^N\phi(\mathbf{x}_n)=\mathbf{0}$. This cannot be achieved by subtracting the mean from the data to get $\tilde\phi(\mathbf{x}_n)=\phi(\mathbf{x}_n)-\bar{\boldsymbol\phi}$, as neither $\phi(\mathbf{x}_n)$ nor $\bar{\boldsymbol\phi}$ is available without carrying out the kernel mapping $\phi$. However, we can find the inner product of two zero-mean data points in the high-dimensional space as

$$\tilde K_{mn}=\tilde\phi(\mathbf{x}_m)^T\tilde\phi(\mathbf{x}_n)
=\left[\phi(\mathbf{x}_m)-\frac{1}{N}\sum_{i=1}^N\phi(\mathbf{x}_i)\right]^T
\left[\phi(\mathbf{x}_n)-\frac{1}{N}\sum_{j=1}^N\phi(\mathbf{x}_j)\right] \tag{118}$$

$$=K_{mn}-\frac{1}{N}\sum_{i=1}^N K_{in}-\frac{1}{N}\sum_{j=1}^N K_{mj}
+\frac{1}{N^2}\sum_{i=1}^N\sum_{j=1}^N K_{ij} \tag{119}$$

or, in matrix form,

$$\tilde{\mathbf{K}}=\mathbf{K}-\mathbf{1}_N\mathbf{K}-\mathbf{K}\mathbf{1}_N+\mathbf{1}_N\mathbf{K}\mathbf{1}_N \tag{120}$$

where $\mathbf{1}_N$ is the $N\times N$ matrix with all elements equal to $1/N$.
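A minimal check of Eq. (120) (a sketch added here, again using the polynomial kernel whose mapping $\phi$ is known in closed form, which is what makes explicit centering possible):

rng(0); N=5; X=randn(2,N);
Phi=[X(1,:).^2; sqrt(2)*X(1,:).*X(2,:); X(2,:).^2]; % explicit phi(x)
K=(X'*X).^2;                                        % uncentered kernel matrix
one=ones(N)/N;
Kt=K-one*K-K*one+one*K*one;                         % Eq. (120)
PhiC=Phi-repmat(mean(Phi,2),1,N);                   % explicitly centered features
disp(max(max(abs(Kt-PhiC'*PhiC))))                  % close to machine precision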
If we replace the kernel matrix $\mathbf{K}$ for the data $\phi(\mathbf{x}_n)$ with non-zero mean in the algorithm considered above by $\tilde{\mathbf{K}}$ for the data $\tilde\phi(\mathbf{x}_n)$ with zero mean, then the assumption that the mapped data have zero mean becomes valid. In summary, here are the steps of the kernel PCA algorithm (implemented by the Matlab function given below):

1. Compute the $N\times N$ kernel matrix $\mathbf{K}$ of the dataset, Eq. (109).
2. Center it to obtain $\tilde{\mathbf{K}}$, Eq. (120).
3. Solve the eigenvalue problem $\tilde{\mathbf{K}}\mathbf{a}=N\lambda\,\mathbf{a}$ of Eq. (114) and sort the eigenvalues in descending order.
4. Normalize each eigenvector $\mathbf{a}$ according to Eq. (115).
5. Project the data onto the eigenvectors, Eq. (116), to get the transformed dataset.
Note that in kernel PCA methods, the complexity of solving the eigenvalue problem of the $N\times N$ kernel matrix $\mathbf{K}$ is $O(N^3)$, instead of $O(d^3)$ for the linear PCA based on the $d\times d$ covariance matrix $\mathbf{\Sigma}$. For example, for the $N=150$ samples of the 4-D iris dataset below, kernel PCA solves a $150\times150$ eigenvalue problem while linear PCA solves only a $4\times4$ one.
Shown below is a Matlab function for the kernel PCA algorithm, which takes the dataset $\mathbf{X}=[\mathbf{x}_1,\cdots,\mathbf{x}_N]$ as input and generates the eigenvalues in d and the dataset in the transform space Y. The function Kernel(X,X) calculates the kernel matrix $\mathbf{K}$.
function [Y d]=KPCA(X)
    [D N]=size(X);
    K=Kernel(X,X);                     % kernel matrix of data X, Eq. (109)
    one=ones(N)/N;
    Kt=K-one*K-K*one+one*K*one;        % kernel matrix of zero-mean data, Eq. (120)
    [U D]=eig(Kt);                     % solve eigenvalue problem for Kt, Eq. (114)
    [d idx]=sort(diag(D),'descend');   % sort eigenvalues in descending order
    U=U(:,idx);                        % reorder eigenvectors accordingly
    for n=1:N
        if d(n)>eps
            U(:,n)=U(:,n)/sqrt(d(n));  % normalize so that v'*v=1, Eq. (115)
        end
    end
    Y=(Kt*U)';                         % project centered data, Eq. (116)
end
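The function Kernel itself is not listed in the text; shown below is one possible implementation, a sketch assuming an RBF (Gaussian) kernel as used in the examples that follow, with a hypothetical width parameter sigma not specified in the original:

function K=Kernel(X1,X2)
    sigma=1;                                     % assumed kernel width (hypothetical)
    N1=size(X1,2); N2=size(X2,2);
    D2=repmat(sum(X1.^2,1)',1,N2) ...
      +repmat(sum(X2.^2,1),N1,1)-2*(X1'*X2);     % squared pairwise distances
    K=exp(-D2/(2*sigma^2));                      % RBF kernel matrix
end

With this in place, [Y d]=KPCA(X) transforms a d-by-N dataset X, and plotting Y(1,:) against Y(2,:) gives 2-D kernel PCA visualizations like those in the examples below.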
Examples:
Visualization of a 2-D dataset of four classes, linear plot on the left and kernel PCA (based on RBF kernel) on the right:
Visualization of a 256-D dataset of 10 classes of hand-written digits:
Visualization of the 4-D iris dataset in 3-D space spanned by the first 3 principal components of the linear and kernel PCA transforms: