The mutual information $I(x,y)$ of two random variables $x$ and $y$ is defined as

$$I(x,y)=\int\!\!\int p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy=H(x)+H(y)-H(x,y) \tag{216}$$

Obviously, when $x$ and $y$ are independent, i.e., $p(x,y)=p(x)\,p(y)$ and $H(x,y)=H(x)+H(y)$, their mutual information $I(x,y)$ is zero.
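For example, for a zero-mean jointly Gaussian pair with unit variances and correlation coefficient $\rho$ (a standard closed-form case, used here only as an illustration), the entropies are $H(x)=H(y)=\frac{1}{2}\log(2\pi e)$ and $H(x,y)=\frac{1}{2}\log\left[(2\pi e)^2(1-\rho^2)\right]$, so equation (216) gives

$$I(x,y)=-\frac{1}{2}\log(1-\rho^2)$$

which is zero when $\rho=0$ and grows without bound as $|\rho|\to 1$.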
Similarly, the mutual information $I(y_1,\dots,y_n)$ of a set of $n$ variables $y_i$ ($i=1,\dots,n$) is defined as

$$I(y_1,\dots,y_n)=\sum_{i=1}^n H(y_i)-H(\mathbf{y}) \tag{217}$$

where $\mathbf{y}=[y_1,\dots,y_n]^T$.
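Equivalently, equation (217) is the Kullback-Leibler divergence between the joint density and the product of the marginals:

$$I(y_1,\dots,y_n)=\int p(\mathbf{y})\,\log\frac{p(\mathbf{y})}{\prod_{i=1}^n p(y_i)}\,d\mathbf{y}\;\ge\;0$$

with equality if and only if the $y_i$ are independent, which is why minimizing the mutual information drives the components toward independence.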
If the random vector $\mathbf{y}=[y_1,\dots,y_n]^T$ is a linear transform of another random vector $\mathbf{x}=[x_1,\dots,x_n]^T$:

$$\mathbf{y}=\mathbf{W}\mathbf{x} \qquad\text{or}\qquad y_i=\sum_{j=1}^n w_{ij}\,x_j \tag{218}$$

then the entropy of $\mathbf{y}$ is related to that of $\mathbf{x}$ by

$$H(\mathbf{y})=H(\mathbf{x})+\log|J| \tag{219}$$

where $J$ is the Jacobian of the above transformation:

$$J=\det\left[\frac{\partial(y_1,\dots,y_n)}{\partial(x_1,\dots,x_n)}\right]=\det\mathbf{W} \tag{220}$$
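Equation (219) is the change-of-variables formula for densities applied to this (assumed invertible) linear map: since $p_{\mathbf{y}}(\mathbf{y})=p_{\mathbf{x}}(\mathbf{x})/|\det\mathbf{W}|$,

$$H(\mathbf{y})=-E\left[\log p_{\mathbf{y}}(\mathbf{y})\right]=-E\left[\log p_{\mathbf{x}}(\mathbf{x})\right]+\log|\det\mathbf{W}|=H(\mathbf{x})+\log|\det\mathbf{W}|$$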
The mutual information above can then be written as

$$I(y_1,\dots,y_n)=\sum_{i=1}^n H(y_i)-H(\mathbf{x})-\log|\det\mathbf{W}| \tag{221}$$
We further assume the components of $\mathbf{y}$ to be uncorrelated and of unit variance, i.e., the covariance matrix of $\mathbf{y}$ is

$$\mathbf{\Sigma}_{\mathbf{y}}=E[\mathbf{y}\mathbf{y}^T]=\mathbf{W}\,E[\mathbf{x}\mathbf{x}^T]\,\mathbf{W}^T=\mathbf{I} \tag{222}$$

and its determinant is

$$\det\mathbf{I}=1=\det\left(\mathbf{W}\,E[\mathbf{x}\mathbf{x}^T]\,\mathbf{W}^T\right)=(\det\mathbf{W})^2\,\det\left(E[\mathbf{x}\mathbf{x}^T]\right) \tag{223}$$
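Solving equation (223) for the determinant of $\mathbf{W}$ makes the constancy explicit:

$$|\det\mathbf{W}|=\left[\det\left(E[\mathbf{x}\mathbf{x}^T]\right)\right]^{-1/2}$$

which depends only on the covariance of $\mathbf{x}$, not on the particular $\mathbf{W}$ chosen.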
This means $\det\mathbf{W}$ is a constant (the same for any $\mathbf{W}$ satisfying equation (222)). Also, as the second term $H(\mathbf{x})$ in the mutual information expression (221) is also a constant (invariant with respect to $\mathbf{W}$), we have

$$I(y_1,\dots,y_n)=\sum_{i=1}^n H(y_i)+\text{Constant} \tag{224}$$
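Comparing equation (224) with equation (221) identifies the constant explicitly:

$$\text{Constant}=-H(\mathbf{x})-\log|\det\mathbf{W}|$$

both terms being invariant with respect to $\mathbf{W}$ under the constraint (222).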
i.e., minimization of the mutual information $I(y_1,\dots,y_n)$ is achieved by minimizing the sum of entropies

$$\sum_{i=1}^n H(y_i) \tag{225}$$
As the Gaussian density has maximal entropy among all densities with the same variance, minimizing these entropies is equivalent to minimizing the Gaussianity of the $y_i$.
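This maximal-entropy property can be stated as a bound: for any density with variance $\sigma^2$,

$$H(y)\;\le\;\frac{1}{2}\log\left(2\pi e\,\sigma^2\right)$$

with equality if and only if $y$ is Gaussian.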
Moreover, since all $y_i$ have the same unit variance, their negentropy becomes

$$J(y_i)=H(y_{gauss})-H(y_i) \tag{226}$$

where $H(y_{gauss})=\frac{1}{2}\log(2\pi e)$ is the entropy of a Gaussian with unit variance, the same for all $y_i$. Substituting $H(y_i)=H(y_{gauss})-J(y_i)$ into the expression for the mutual information, and realizing that the other two terms $H(\mathbf{x})$ and $\log|\det\mathbf{W}|$ are both constant (the same for any $\mathbf{W}$), we get

$$I(y_1,\dots,y_n)=\sum_{i=1}^n\left[H(y_{gauss})-J(y_i)\right]-H(\mathbf{x})-\log|\det\mathbf{W}|=C-\sum_{i=1}^n J(y_i) \tag{227}$$
where $C$ is a constant (including all the terms $H(y_{gauss})$, $H(\mathbf{x})$, and $\log|\det\mathbf{W}|$) which is the same for any linear transform matrix $\mathbf{W}$. This is the fundamental relation between the mutual information and the negentropy of the variables $y_i$: if the mutual information of the set of variables is decreased (indicating the variables are less dependent), then the total negentropy is increased and the $y_i$ are less Gaussian. We want to find a linear transform matrix $\mathbf{W}$ that minimizes the mutual information $I(y_1,\dots,y_n)$, or, equivalently, maximizes the negentropy $\sum_{i=1}^n J(y_i)$ (under the assumption that the $y_i$ are uncorrelated).
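The goal can be written compactly as a constrained optimization (with $\mathbf{W}^{*}$ denoting the sought transform, a symbol introduced here only for convenience):

$$\mathbf{W}^{*}=\arg\min_{\mathbf{W}}\,I(y_1,\dots,y_n)=\arg\max_{\mathbf{W}}\,\sum_{i=1}^n J(y_i)\qquad\text{subject to}\qquad\mathbf{W}\,E[\mathbf{x}\mathbf{x}^T]\,\mathbf{W}^T=\mathbf{I}$$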