The mutual information of two random variables $x$ and $y$ is defined as

$\displaystyle I(x,y)=H(x)+H(y)-H(x,y)=H(x)-H(x\vert y)=H(y)-H(y\vert x)$  (216)
Obviously, when $x$ and $y$ are independent, i.e., $H(x\vert y)=H(x)$ and $H(y\vert x)=H(y)$, their mutual information $I(x,y)$ is zero.
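Definition (216) can be checked on a small discrete example; the joint distribution below is an arbitrary illustration, not data from the text:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array; zero entries are skipped."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical joint distribution of two binary variables x and y
pxy = np.array([[0.3, 0.2],
                [0.1, 0.4]])
px = pxy.sum(axis=1)   # marginal of x
py = pxy.sum(axis=0)   # marginal of y

# I(x,y) = H(x) + H(y) - H(x,y), as in (216)
I = entropy(px) + entropy(py) - entropy(pxy.flatten())
print(I)   # positive, since x and y are dependent here

# For an independent pair p(x,y) = p(x)p(y), the mutual information vanishes
p_indep = np.outer(px, py)
I0 = entropy(px) + entropy(py) - entropy(p_indep.flatten())
print(I0)  # zero up to floating-point error
```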
Similarly, the mutual information of a set of $n$ variables $y_i$ ($i=1,\cdots,n$) is defined as

$\displaystyle I(y_1,\cdots,y_n)=\sum_{i=1}^n H(y_i)-H(y_1,\cdots,y_n)$  (217)
If a random vector ${\bf y}=[y_1,\cdots,y_n]^T$ is a linear transform of another random vector ${\bf x}=[x_1,\cdots,x_n]^T$:

$\displaystyle y_i=\sum_{j=1}^n w_{ij}x_j,\;\;\;\;$ or $\;\;\;\;{\bf y=Wx}$  (218)
then the entropy of ${\bf y}$ is related to that of ${\bf x}$ by

$\displaystyle H({\bf y})=H({\bf x})+log\;\vert J\vert$  (219)

where $J$ is the Jacobian of the above transformation:
$\displaystyle J(x_1,\cdots,x_n)=\left\vert \begin{array}{ccc}
\frac{\partial y_1}{\partial x_1} &\cdots &\frac{\partial y_1}{\partial x_n}\\
\vdots &\ddots &\vdots\\
\frac{\partial y_n}{\partial x_1} &\cdots &\frac{\partial y_n}{\partial x_n}
\end{array} \right\vert =det\;{\bf W}$  (220)
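For a Gaussian the differential entropy has a closed form, which gives a quick numerical check that a linear map ${\bf y=Wx}$ shifts the entropy by exactly the log of the Jacobian in (220). The covariance and transform values below are illustrative assumptions:

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a zero-mean n-dimensional Gaussian:
    H = 0.5 * log((2*pi*e)^n * det(cov))."""
    n = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(cov))

# Illustrative covariance for x and an invertible transform W (assumed values)
Sx = np.array([[2.0, 0.5],
               [0.5, 1.0]])
W = np.array([[1.0, 2.0],
              [0.0, 3.0]])

Hx = gaussian_entropy(Sx)
Hy = gaussian_entropy(W @ Sx @ W.T)   # y = Wx has covariance W Sx W^T

# The entropy shift equals log|det W|, the log of the Jacobian in (220)
print(Hy - Hx, np.log(abs(np.linalg.det(W))))
```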
The mutual information above can be written as

$\displaystyle I(y_1,\cdots,y_n)=\sum_{i=1}^n H(y_i)-H({\bf y})
=\sum_{i=1}^n H(y_i)-H({\bf x})-log\;\vert det\;{\bf W}\vert$  (221)
We further assume the $y_i$ to be uncorrelated and of unit variance, i.e., the covariance matrix of ${\bf y}$ is

$\displaystyle E\{{\bf yy^T}\}={\bf W}E\{{\bf xx^T}\}{\bf W^T}={\bf I}$  (222)
and its determinant is

$\displaystyle det\;{\bf I}=1=(det\;{\bf W})\;(det\;E\{{\bf xx}^T\})\;(det\;{\bf W}^T)$  (223)
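The whitening constraint (222) and the determinant identity (223) can be verified numerically; the dimensions and random data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated samples of x (columns are observations; illustrative data)
A = rng.normal(size=(3, 3))
X = A @ rng.normal(size=(3, 50000))
Cx = np.cov(X)                       # sample estimate of E{xx^T}

# Symmetric whitening transform W = Cx^{-1/2}, so that E{yy^T} = W Cx W^T = I
vals, vecs = np.linalg.eigh(Cx)
W = vecs @ np.diag(vals ** -0.5) @ vecs.T
Cy = W @ Cx @ W.T

print(np.allclose(Cy, np.eye(3)))    # True: y is white, as in (222)
# det I = 1 = det(W) det(E{xx^T}) det(W^T)  =>  det(W)^2 = 1 / det(Cx)
print(np.isclose(np.linalg.det(W) ** 2, 1 / np.linalg.det(Cx)))
```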
This means $det\;{\bf W}$ is a constant (the same for any ${\bf W}$ satisfying the whitening constraint). Also, as the second term $H({\bf x})$ in the mutual information expression is also a constant (invariant with respect to ${\bf W}$), we have

$\displaystyle I(y_1,\cdots,y_n)=\sum_{i=1}^n H(y_i)+Constant$  (224)
i.e., minimization of the mutual information $I(y_1,\cdots,y_n)$ is achieved by minimizing the entropies

$\displaystyle H(y_i)=-\int p_i(y_i)\; log\;p_i(y_i)\; dy_i=-E\{ log\;p_i(y_i) \}$  (225)
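The entropy in (225) can be estimated from samples with a simple histogram plug-in estimator; this is only a rough sketch, and the bin count and sample size below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)

def entropy_estimate(y, bins=100):
    """Histogram (plug-in) estimate of H(y) = -E{log p(y)} in nats.
    A rough sketch: the result depends on the bin count and sample size."""
    counts, edges = np.histogram(y, bins=bins)
    width = edges[1] - edges[0]
    p = counts / counts.sum()
    p = p[p > 0]
    # -sum p log(p / width): p/width approximates the density in each bin
    return -np.sum(p * np.log(p / width))

y = rng.normal(size=200000)   # unit-variance Gaussian sample
# True differential entropy is 0.5*log(2*pi*e), about 1.419 nats
print(entropy_estimate(y))
```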
Since the Gaussian density has maximal entropy among all densities of the same variance, minimizing the entropy of each unit-variance $y_i$ is equivalent to maximizing its non-Gaussianity.
Moreover, since all $y_i$ have the same unit variance, their negentropy becomes

$\displaystyle J(y_i)=H(y_G)-H(y_i)=C-H(y_i)$  (226)
where $C=H(y_G)$ is the entropy of a Gaussian with unit variance, the same for all $y_i$. Substituting $H(y_i)=C-J(y_i)$ into the expression of the mutual information, and realizing that the other two terms $H({\bf x})$ and $log\;\vert det\;{\bf W}\vert$ are both constant (the same for any ${\bf W}$), we get

$\displaystyle I(y_1,\cdots,y_n)=Const-\sum_{i=1}^n J(y_i)$  (227)
where $Const$ is a constant (collecting the terms $nC$, $H({\bf x})$ and $log\;\vert det\;{\bf W}\vert$) which is the same for any linear transform matrix ${\bf W}$. This is the fundamental relation between the mutual information and the negentropy of the variables $y_i$: if the mutual information of the set of variables decreases (indicating the variables are less dependent), then the total negentropy increases, and the $y_i$ become less Gaussian. We therefore want to find a linear transform matrix ${\bf W}$ that minimizes the mutual information $I(y_1,\cdots,y_n)$, or, equivalently, maximizes the total negentropy $\sum_{i=1}^n J(y_i)$ (under the assumption that the $y_i$ are uncorrelated and of unit variance).
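The whole procedure — whiten, then search among the remaining transforms for the one that maximizes total negentropy, i.e. minimizes the mutual information in (227) — can be sketched in a toy two-dimensional case. Since rotations preserve whiteness, the search reduces to a single angle here. The kurtosis-based proxy for negentropy is a standard moment approximation from the ICA literature, and all numerical values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two independent, non-Gaussian (uniform) sources, linearly mixed (toy data)
S = rng.uniform(-1, 1, size=(2, 50000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
X = A @ S

# Step 1: whiten, so that any further rotation keeps E{yy^T} = I
C = np.cov(X)
vals, vecs = np.linalg.eigh(C)
Z = vecs @ np.diag(vals ** -0.5) @ vecs.T @ X

def total_negentropy(Y):
    """Kurtosis-squared proxy for the total negentropy sum_i J(y_i)
    (a moment-based approximation borrowed from the ICA literature)."""
    J = 0.0
    for y in Y:
        y = (y - y.mean()) / y.std()
        J += (np.mean(y ** 4) - 3.0) ** 2 / 48
    return J

def rotation(a):
    return np.array([[np.cos(a), -np.sin(a)],
                     [np.sin(a),  np.cos(a)]])

# Step 2: pick the rotation angle that maximizes the total negentropy,
# i.e. minimizes the mutual information (227)
best = max(np.linspace(0, np.pi / 2, 91),
           key=lambda a: total_negentropy(rotation(a) @ Z))
Y = rotation(best) @ Z   # estimated sources, up to permutation, sign, scale
print(total_negentropy(Z), total_negentropy(Y))
```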