In classification problems, it is sometimes the case that if the d-dimensional data points are nonlinearly mapped into a higher-dimensional space, the different classes/clusters in the data can be better separated than in the original space. For example, suppose two groups of data points are given in a 2D space in which they form two concentric circles centered at the origin. Obviously it is impossible to linearly separate these two groups of points in the 2D space. However, if these 2D points are nonlinearly mapped into a 3D space, the two groups of points can be easily separated linearly along the third dimension, which corresponds to the radius of the circles.
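The circles example above can be sketched in code. A commonly used map for this illustration (an assumption here, not prescribed by the text) is phi(x, y) = (x, y, x^2 + y^2), whose third coordinate is the squared radius, so the two circles land on different horizontal planes in 3D and any plane between them separates the classes linearly:

```python
import math
import random

def phi(x, y):
    """Nonlinear map from 2D to 3D; the third coordinate is the
    squared radius, so concentric circles become separable planes."""
    return (x, y, x * x + y * y)

random.seed(0)

# Two concentric circles of radii 1 and 3, sampled at random angles.
angles = [random.uniform(0.0, 2.0 * math.pi) for _ in range(50)]
inner = [(math.cos(t), math.sin(t)) for t in angles]
outer = [(3.0 * math.cos(t), 3.0 * math.sin(t)) for t in angles]

# After mapping, inner points have third coordinate 1 and outer points 9,
# so the plane z = 5 is a linear separator in the 3D space.
threshold = 5.0
inner_ok = all(phi(x, y)[2] < threshold for x, y in inner)
outer_ok = all(phi(x, y)[2] > threshold for x, y in outer)
print(inner_ok and outer_ok)  # prints True
```

No plane (line) in the original 2D space can achieve this separation, since each circle surrounds the other's interior.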
The drawback of doing classification in a high-dimensional space is the increased computational cost. This difficulty can be overcome if the computation depends exclusively on the inner products of the vectors in that space, so that the costly computation can be avoided by the kernel trick, based on the kernel defined as: