The goal of multidimensional scaling (MDS) is to reduce the dimensionality of a dataset representing a set of objects of interest, each described by a set of features and represented as a vector in a $d$-dimensional space, while the pairwise similarity relationships between these objects are preserved. MDS can be used to visualize datasets that live in a high-dimensional space. Specifically, here we consider classical MDS (cMDS), also known as principal coordinates analysis (PCoA), one of the methods in the MDS family. An example is the visualization of the US House of Representatives based on the voting patterns of its members.
Specifically, we represent the pairwise similarities of a set of objects $\mathbf{X}=[\mathbf{x}_1,\cdots,\mathbf{x}_N]$ by a similarity matrix $\mathbf{S}$, of which the $ij$-th component $s_{ij}$ is some measure of the similarity between $\mathbf{x}_i$ and $\mathbf{x}_j$. The goal is to map the dataset $\mathbf{X}$ in the $d$-dimensional space into $\mathbf{Y}=[\mathbf{y}_1,\cdots,\mathbf{y}_N]$ in a lower $d'$-dimensional space, in which the similarity matrix $\mathbf{S}_X$ of $\mathbf{X}$ is optimally approximated by the similarity matrix $\mathbf{S}_Y$ of $\mathbf{Y}$. When $d'\le 3$, the data points in $\mathbf{Y}$ can be visualized.
MDS can be considered a method for dimensionality reduction. In some cases only the pairwise similarities $s_{ij}$ are given, while the objects in $\mathbf{X}$ themselves are not explicitly given, and even the dimensionality $d$ may be unknown.
We assume the given pairwise similarity is the Euclidean distance between $\mathbf{x}_i$ and $\mathbf{x}_j$:

$$d_{ij}=\|\mathbf{x}_i-\mathbf{x}_j\| \tag{170}$$

$$d_{ij}^2=(\mathbf{x}_i-\mathbf{x}_j)^T(\mathbf{x}_i-\mathbf{x}_j)
=\mathbf{x}_i^T\mathbf{x}_i-2\,\mathbf{x}_i^T\mathbf{x}_j+\mathbf{x}_j^T\mathbf{x}_j \tag{171}$$

All squared distances form the $N\times N$ similarity matrix

$$\mathbf{D}=[d_{ij}^2] \tag{172}$$
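As an aside, the expansion in Eq. (171) allows the squared-distance matrix of Eq. (172) to be computed without explicit loops. The following is a minimal sketch, assuming the data matrix X is d-by-N with one data point per column (implicit expansion requires Matlab R2016b or later); the helper name sqDist is hypothetical:

    function D=sqDist(X)
        g=sum(X.^2,1);                 % 1-by-N row of squared norms x_i'*x_i
        D=g'+g-2*(X'*X);               % d_ij^2 = x_i'x_i + x_j'x_j - 2 x_i'x_j
        D=max(D,0);                    % clamp tiny negatives due to round-off
    end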
Our goal is to find $\mathbf{Y}=[\mathbf{y}_1,\cdots,\mathbf{y}_N]$ in a lower $d'$-dimensional space so that its pairwise similarity matrix $\mathbf{D}_Y$ matches $\mathbf{D}_X=\mathbf{D}$ optimally, in the sense that the following objective function is minimized:

$$\varepsilon=\|\mathbf{D}_X-\mathbf{D}_Y\|_F^2 \tag{173}$$
We note that such a solution is not unique, as any translated version of $\mathbf{Y}$ has the same pairwise similarity matrix and is also a solution. We therefore seek to find a unique solution in which all data points are centralized, i.e., the mean of all points in $\mathbf{Y}$ is zero:

$$\frac{1}{N}\sum_{n=1}^N\mathbf{y}_n=\mathbf{0} \tag{174}$$
To do so, we introduce an $N\times N$ symmetric centering matrix

$$\mathbf{C}=\mathbf{I}-\frac{1}{N}\mathbf{O}$$

where $\mathbf{O}$ is an $N\times N$ matrix with all components equal to 1. Premultiplying $\mathbf{C}$ to a column vector $\mathbf{x}=[x_1,\cdots,x_N]^T$ removes the mean $\bar{x}=\frac{1}{N}\sum_{n=1}^N x_n$ of $\mathbf{x}$ from each of its components:

$$\mathbf{C}\mathbf{x}=\left(\mathbf{I}-\frac{1}{N}\mathbf{O}\right)\mathbf{x}=\mathbf{x}-\bar{x}\,\mathbf{1} \tag{175}$$

where $\mathbf{1}=[1,\cdots,1]^T$ is an $N$-dimensional vector of ones, so that the result has zero mean:

$$\frac{1}{N}\sum_{n=1}^N(\mathbf{C}\mathbf{x})_n=0 \tag{176}$$

We can now postmultiply $\mathbf{C}$ to the data matrix $\mathbf{X}=[\mathbf{x}_1,\cdots,\mathbf{x}_N]$ to remove the mean of each row of $\mathbf{X}$:

$$\mathbf{X}\mathbf{C}=\mathbf{X}\left(\mathbf{I}-\frac{1}{N}\mathbf{O}\right)=\mathbf{X}-\frac{1}{N}\mathbf{X}\mathbf{O} \tag{177}$$

$$\frac{1}{N}\mathbf{X}\mathbf{O}=[\bar{\mathbf{x}},\cdots,\bar{\mathbf{x}}]=\bar{\mathbf{x}}\,\mathbf{1}^T,
\qquad\bar{\mathbf{x}}=\frac{1}{N}\sum_{n=1}^N\mathbf{x}_n \tag{178}$$

$$\mathbf{X}\mathbf{C}=[\mathbf{x}_1-\bar{\mathbf{x}},\cdots,\mathbf{x}_N-\bar{\mathbf{x}}] \tag{179}$$
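As a quick sanity check, the centering properties in Eqs. (175) through (179) can be verified numerically with a few lines of Matlab (random data, purely for illustration):

    N=5; C=eye(N)-ones(N)/N;           % centering matrix
    x=randn(N,1);                      % a column vector
    disp(mean(C*x))                    % ~0: mean removed from components of x
    X=randn(3,N);                      % 3-by-N data matrix, points as columns
    Xc=X*C;                            % postmultiply to center each row
    disp(max(abs(mean(Xc,2))))         % ~0: every row of Xc has zero mean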
We further apply double centering to $\mathbf{D}$ by pre- and post-multiplying it by $\mathbf{C}$, and get $-\frac{1}{2}\mathbf{C}\mathbf{D}\mathbf{C}$. As double centering removes the row and column means, the terms $\mathbf{x}_i^T\mathbf{x}_i$ and $\mathbf{x}_j^T\mathbf{x}_j$ in Eq. (171) are cancelled and only the inner products $\mathbf{x}_i^T\mathbf{x}_j$ survive. Now the similarity matrix of all centralized data points $\mathbf{X}_c=\mathbf{X}\mathbf{C}$ with zero mean can be represented by the Gram matrix of $\mathbf{X}_c$:

$$\mathbf{B}=-\frac{1}{2}\mathbf{C}\mathbf{D}\mathbf{C}=(\mathbf{X}\mathbf{C})^T(\mathbf{X}\mathbf{C})=\mathbf{X}_c^T\mathbf{X}_c \tag{180}$$
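The identity in Eq. (180) is also easy to confirm numerically; here is a minimal sketch, again assuming X is d-by-N with one data point per column:

    N=6; X=randn(3,N);
    C=eye(N)-ones(N)/N;                % centering matrix
    D=zeros(N);
    for i=1:N
        for j=1:N
            D(i,j)=norm(X(:,i)-X(:,j))^2;  % squared pairwise distances
        end
    end
    B=-C*D*C/2;                        % double centering
    Xc=X*C;                            % centralized data points
    disp(max(max(abs(B-Xc'*Xc))))      % ~0: B equals the Gram matrix of Xc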
We further assume all data points in $\mathbf{Y}$ are also centralized with zero mean, so that their similarity matrix is likewise represented by a Gram matrix $\mathbf{B}_Y=\mathbf{Y}^T\mathbf{Y}$; then the objective function can be expressed as:

$$\varepsilon=\|\mathbf{B}-\mathbf{Y}^T\mathbf{Y}\|^2 \tag{181}$$

As $\mathbf{B}$ is symmetric and positive semi-definite, it can be eigendecomposed as

$$\mathbf{B}=\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T
=(\boldsymbol{\Lambda}^{1/2}\mathbf{V}^T)^T(\boldsymbol{\Lambda}^{1/2}\mathbf{V}^T) \tag{182}$$

where the eigenvalues in $\boldsymbol{\Lambda}=\mathrm{diag}(\lambda_1,\cdots,\lambda_N)$ are sorted in descending order. The objective function is therefore minimized by keeping only the $d'$ largest eigenvalues and their eigenvectors:

$$\mathbf{Y}=\boldsymbol{\Lambda}'^{1/2}\mathbf{V}'^T \tag{183}$$

where $\boldsymbol{\Lambda}'$ contains the $d'$ largest eigenvalues and $\mathbf{V}'$ the corresponding eigenvectors.

In summary, here is the PCoA algorithm:

$$\mathbf{D}=[d_{ij}^2]\;\Longrightarrow\;
\mathbf{B}=-\frac{1}{2}\mathbf{C}\mathbf{D}\mathbf{C}=\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^T\;\Longrightarrow\;
\mathbf{Y}=\boldsymbol{\Lambda}'^{1/2}\mathbf{V}'^T \tag{184}$$
The Matlab functions for PCA and PCoA are listed below (note that in PCoA the sorted eigenvalues d, not the unsorted eigenvalue matrix returned by eig, must be used to scale the reordered eigenvectors):

function Y=PCA(X)
    [V D]=eig(cov(X'));                 % solve eigenequation for covariance of data
    [d idx]=sort(diag(D),'descend');    % sort eigenvalues in descending order
    V=V(:,idx);                         % reorder eigenvectors
    Y=V'*X;                             % carry out KLT
end

function Y=PCoA(X)
    N=size(X,2);
    D=zeros(N);                         % pairwise similarity matrix
    C=eye(N)-ones(N)/N;                 % centering matrix
    for i=1:N
        for j=1:N
            D(i,j)=(norm(X(:,i)-X(:,j)))^2;
        end
    end
    B=-C*D*C/2;                         % double centering
    [V D]=eig(B);                       % solve eigenequation for matrix B
    [d idx]=sort(diag(D),'descend');    % sort eigenvalues in descending order
    V=V(:,idx);                         % reorder eigenvectors
    Y=diag(sqrt(max(d,0)))*V';          % carry out PCoA (clamp round-off negatives)
end
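As a usage sketch (random data, purely for illustration), one can check that the principal coordinates returned by PCoA reproduce the pairwise distances of the original data when all $d'=d$ coordinates are kept:

    X=randn(3,100);                    % 3-D data, one point per column
    Y=PCoA(X);                         % full set of principal coordinates
    Y3=Y(1:3,:);                       % keep the first d'=3 coordinates
    e=abs(norm(X(:,1)-X(:,2))-norm(Y3(:,1)-Y3(:,2)));
    disp(e)                            % ~0: pairwise distance preserved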
Example 1:
In the figure below, the top panel shows the locations of some major cities in North America, and the bottom panel shows the reconstructed city locations based on the pairwise distances between these cities. Note that the reconstructed map is a rotated and centralized version of the original map: the PCoA solution is unique only up to rotation and reflection, as any orthogonal transform of $\mathbf{Y}$ also preserves the pairwise distances.
Example 2:
Both PCA and PCoA are applied to the Iris dataset of 150 4-D data points, and the figure below shows the data points in 2-D and 3-D spaces corresponding to the first 2 and 3 eigenvalues, respectively.
Example 3:
Both PCA and PCoA are applied to a dataset of handwritten digits containing 2240 256-D data points, and the figure below shows the data points in 2-D and 3-D spaces corresponding to the first 2 and 3 eigenvalues, respectively.