The theoretical foundation of ICA is the central limit theorem, which states that the distribution of the sum (average or linear combination) of $n$ independent random variables approaches Gaussian as $n \rightarrow \infty$. For example, the face value of a single die has a uniform distribution from 1 to 6, but the distribution of the sum of a pair of dice is no longer uniform: it has a maximum probability at the mean of 7. As the number of dice increases, the distribution of the sum of the face values is better and better approximated by a Gaussian.
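As a quick numerical illustration of this effect (a sketch in NumPy, not part of the original derivation; the trial and dice counts are arbitrary), we can simulate sums of dice and watch the excess kurtosis, which is zero for a Gaussian, shrink toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000

for n_dice in (1, 2, 10):
    # Sum of the face values of n_dice independent fair dice
    sums = rng.integers(1, 7, size=(n_trials, n_dice)).sum(axis=1)
    z = (sums - sums.mean()) / sums.std()   # standardize
    excess_kurt = (z**4).mean() - 3.0       # 0 for a Gaussian
    print(f"{n_dice:2d} dice: mean={sums.mean():.2f}, "
          f"excess kurtosis={excess_kurt:+.3f}")
```

A single die gives an excess kurtosis of about $-1.27$; with ten dice the value is already close to zero.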
Let $x_1, \ldots, x_n$ be random variables independently drawn from an arbitrary distribution with mean $\mu$ and variance $\sigma^2$. Then the distribution of the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ approaches the Gaussian $N(\mu, \sigma^2/n)$ with mean $\mu$ and variance $\sigma^2/n$.
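A minimal check of this statement, here assuming an exponential source distribution purely for illustration (an exponential with scale $\mu$ has mean $\mu$ and variance $\mu^2$):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, n = 1.0, 50                 # exponential: mean mu, variance mu**2
means = rng.exponential(mu, size=(200_000, n)).mean(axis=1)

# Central limit theorem predicts mean mu and variance sigma**2 / n
print(f"empirical mean:     {means.mean():.4f}   (theory: {mu})")
print(f"empirical variance: {means.var():.5f}  (theory: {mu**2 / n})")
```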
To solve the BSS problem, we want to find a matrix $\mathbf{W}$ so that $\mathbf{y} = \mathbf{W}\mathbf{x}$ is as close to the independent sources $\mathbf{s}$ as possible. This can be seen as the reverse process of the central limit theorem above. Consider the $j$th component $y_j = \mathbf{w}_j^T \mathbf{x}$ of $\mathbf{y}$, where $\mathbf{w}_j^T$ is the $j$th row of $\mathbf{W}$. As a linear combination of all components of $\mathbf{x}$, each of which is in turn a linear combination of the sources, $y_j$ is necessarily more Gaussian than any of the source components $s_i$, unless $y_j$ is equal to one of them (i.e., $\mathbf{w}_j^T \mathbf{A}$ has only one non-zero component). Therefore, for $y_j$ to be an estimate of one of the sources, we desire to find the $\mathbf{w}_j$ that maximizes the non-Gaussianity of $y_j = \mathbf{w}_j^T \mathbf{x}$, i.e., makes it least Gaussian. This is the essence of all ICA methods. Obviously, if all source variables are Gaussian, the ICA method will not work.
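To make the argument concrete, the following sketch (an illustration only; the mixing matrix is an arbitrary choice) mixes two uniform sources and compares excess kurtosis: the mixtures land closer to the Gaussian value of zero than the sources do.

```python
import numpy as np

rng = np.random.default_rng(2)
# Two independent non-Gaussian (uniform) sources, zero mean, unit variance
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 100_000))
A = np.array([[0.8, 0.6],
              [0.4, 0.9]])      # arbitrary mixing matrix (illustrative)
x = A @ s                       # observed mixtures x = A s

def excess_kurtosis(v):
    u = (v - v.mean()) / v.std()
    return (u**4).mean() - 3.0  # zero for a Gaussian

print("sources: ", [round(excess_kurtosis(r), 3) for r in s])  # about -1.2 each
print("mixtures:", [round(excess_kurtosis(r), 3) for r in x])  # closer to 0
```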
Based on the above discussion, we get the following requirements and constraints for the ICA methods:

- the source components $s_1, \ldots, s_n$ must be statistically independent, i.e., $p(\mathbf{s}) = \prod_{i=1}^n p(s_i)$;  (206)
- at most one of the source components can be Gaussian;  (207)
- the mixing matrix $\mathbf{A}$ must be invertible, so that $\mathbf{W} = \mathbf{A}^{-1}$ exists.  (208)
Based on the same fundamental approach discussed above, all ICA algorithms can be considered as an optimization process that finds the matrix $\mathbf{W}$ maximizing some objective function that measures the non-Gaussianity of $\mathbf{y} = \mathbf{W}\mathbf{x}$, and thereby the independence of its components. In the following, we discuss some common objective functions.
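As a toy instance of this optimization view (a simplified sketch using $|$excess kurtosis$|$ as the objective function, not FastICA or any specific published algorithm), one can whiten the mixtures, which reduces the demixing step to a rotation, and then search over rotation angles:

```python
import numpy as np

rng = np.random.default_rng(3)
s = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(2, 100_000))  # uniform sources
x = np.array([[0.8, 0.6], [0.4, 0.9]]) @ s                       # mixtures

def excess_kurtosis(v):
    u = (v - v.mean()) / v.std()
    return (u**4).mean() - 3.0

# Whiten the mixtures; the remaining demixing step is then a pure rotation.
d, E = np.linalg.eigh(np.cov(x))
z = np.diag(d ** -0.5) @ E.T @ (x - x.mean(axis=1, keepdims=True))

# Objective: |excess kurtosis| of one projection; brute-force search over angles.
angles = np.linspace(0.0, np.pi / 2, 500)
best = max(angles,
           key=lambda t: abs(excess_kurtosis(np.cos(t) * z[0] + np.sin(t) * z[1])))
W = np.array([[ np.cos(best), np.sin(best)],
              [-np.sin(best), np.cos(best)]])
y = W @ z   # estimated sources, recovered up to order, sign, and scale
print([round(excess_kurtosis(r), 3) for r in y])  # both near the uniform's -1.2
```

The brute-force angle search works here only because the whitened two-source case leaves a single rotation parameter; practical ICA algorithms instead use gradient or fixed-point updates on such an objective.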