
Minimization of Mutual Information

The mutual information $I(x,y)$ of two random variables $x$ and $y$ is defined as

\begin{displaymath}I(x,y)=H(x)+H(y)-H(x,y)=H(x)-H(x\vert y)=H(y)-H(y\vert x) \end{displaymath}

Obviously, when $x$ and $y$ are independent, i.e., $H(y\vert x)=H(y)$ and $H(x\vert y)=H(x)$, their mutual information $I(x,y)$ is zero.

[Figure: mutual_info.gif, an illustration of the mutual information $I(x,y)$]
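
As a quick numerical illustration of the definition above, the following sketch (assuming a small, made-up joint probability table for two discrete variables) computes $I(x,y)=H(x)+H(y)-H(x,y)$ and confirms that it vanishes when the joint distribution factorizes:

# A minimal numerical check of I(x,y) = H(x) + H(y) - H(x,y) for two
# discrete random variables given by a hypothetical joint probability table.
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array; zero entries are ignored."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Joint distribution p(x, y): rows index x, columns index y.
p_xy = np.array([[0.30, 0.10],
                 [0.20, 0.40]])

p_x = p_xy.sum(axis=1)          # marginal of x
p_y = p_xy.sum(axis=0)          # marginal of y

I = entropy(p_x) + entropy(p_y) - entropy(p_xy.ravel())
print(I)                        # > 0, since x and y are dependent here

# For an independent pair p(x,y) = p(x) p(y), the same formula gives 0:
p_indep = np.outer(p_x, p_y)
print(entropy(p_x) + entropy(p_y) - entropy(p_indep.ravel()))  # ~ 0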

Similarly the mutual information $I(y_1,\cdots,y_n)$ of a set of $n$ variables $y_i$ ($i=1,\cdots,n$) is defined as

\begin{displaymath}I(y_1,\cdots,y_n)=\sum_{i=1}^n H(y_i)-H(y_1,\cdots,y_n) \end{displaymath}

If a random vector ${\mathbf y}=[y_1,\cdots,y_n]^T$ is a linear transform of another random vector ${\mathbf x}=[x_1,\cdots,x_n]^T$:

\begin{displaymath}y_i=\sum_{j=1}^n w_{ij} x_j,\;\;\;\;\;\mbox{or}\;\;\;\;{\mathbf y=Wx} \end{displaymath}

then the entropy of ${\mathbf y}$ is related to that of ${\mathbf x}$ by

\begin{eqnarray*}
H(y_1,\cdots,y_n) & = & H(x_1,\cdots,x_n)+E\,\{\log\,\vert J(x_1,\cdots,x_n)\vert\} \\
 & = & H(x_1,\cdots,x_n)+\log\,\vert\det{\mathbf W}\vert
\end{eqnarray*}

where $J(x_1,\cdots,x_n)$ is the Jacobian of the above transformation:

\begin{displaymath}
J(x_1,\cdots,x_n)=\left\vert \begin{array}{ccc}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_n}{\partial x_1} & \cdots & \frac{\partial y_n}{\partial x_n}
\end{array} \right\vert
=\det{\mathbf W}
\end{displaymath}
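
The relation $H({\mathbf y})=H({\mathbf x})+\log\,\vert\det{\mathbf W}\vert$ can be checked numerically in a case where the differential entropy has a closed form. The sketch below (assuming ${\mathbf x}$ is a zero-mean Gaussian with an arbitrary example covariance) compares the entropy of ${\mathbf y}={\mathbf Wx}$ with $H({\mathbf x})+\log\,\vert\det{\mathbf W}\vert$:

# A small check, under a Gaussian assumption so that differential entropy has a
# closed form, that H(y) = H(x) + log|det W| when y = W x.
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (nats) of a zero-mean Gaussian with covariance cov."""
    n = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(cov))

rng = np.random.default_rng(0)
C_x = np.array([[2.0, 0.5],
                [0.5, 1.0]])            # covariance of x (arbitrary example)
W = rng.normal(size=(2, 2))             # an arbitrary (almost surely invertible) matrix

H_x = gaussian_entropy(C_x)
H_y = gaussian_entropy(W @ C_x @ W.T)   # covariance of y = W x is W C_x W^T

print(H_y, H_x + np.log(abs(np.linalg.det(W))))   # the two values agree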

The mutual information above can be written as

\begin{eqnarray*}
I(y_1,\cdots,y_n) & = & \sum_{i=1}^n H(y_i)-H(y_1,\cdots,y_n) \\
 & = & \sum_{i=1}^n H(y_i)-H(x_1,\cdots,x_n)-\log\,\vert\det{\mathbf W}\vert
\end{eqnarray*}

We further assume $y_i$ to be uncorrelated and of unit variance, i.e., the covariance matrix of ${\mathbf y}$ is

\begin{displaymath}
E\{{\mathbf yy^T}\}={\mathbf W}E\{{\mathbf xx^T}\}{\mathbf W^T}={\mathbf I}
\end{displaymath}

and its determinant is

\begin{displaymath}
\det{\mathbf I}=1=(\det{\mathbf W})\;(\det E\{{\mathbf xx}^T\})\;(\det{\mathbf W}^T)
\end{displaymath}
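
The following sketch (with an arbitrary example covariance for ${\mathbf x}$) verifies this numerically: any matrix ${\mathbf W}$ satisfying the whitening constraint, whether obtained from the eigendecomposition of $E\{{\mathbf xx}^T\}$ or from a further rotation of it, has the same value of $\vert\det{\mathbf W}\vert=(\det E\{{\mathbf xx}^T\})^{-1/2}$:

# A sketch showing that the whitening constraint W C_x W^T = I pins down
# |det W| = (det C_x)^(-1/2), the same for every admissible W; only a
# rotation of W remains free.
import numpy as np

C_x = np.array([[3.0, 1.0],
                [1.0, 2.0]])                     # covariance of x (arbitrary example)

# One whitening matrix from the eigendecomposition C_x = E D E^T.
d, E = np.linalg.eigh(C_x)
W1 = np.diag(d ** -0.5) @ E.T

# Any rotation of it is another valid whitening matrix.
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W2 = R @ W1

for W in (W1, W2):
    print(np.allclose(W @ C_x @ W.T, np.eye(2)),   # the constraint holds
          abs(np.linalg.det(W)),                   # same value for both W ...
          np.linalg.det(C_x) ** -0.5)              # ... equal to (det C_x)^(-1/2)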

This means $\vert\det{\mathbf W}\vert=(\det E\{{\mathbf xx}^T\})^{-1/2}$ is a constant, the same for any ${\mathbf W}$ satisfying the whitening constraint. As the term $H(x_1,\cdots,x_n)$ in the mutual information expression is also a constant (invariant with respect to ${\mathbf W}$), we have

\begin{displaymath}I(y_1,\cdots,y_n)=\sum_{i=1}^n H(y_i)+\mbox{Constant} \end{displaymath}

i.e., minimization of mutual information $I(y_1,\cdots,y_n)$ is achieved by minimizing the entropies

\begin{displaymath}H(y_i)=-\int p_i(y_i)\,\log p_i(y_i)\,dy_i=-E\,\{\log p_i(y_i)\} \end{displaymath}

As the Gaussian density has maximal entropy among all densities of a given variance, minimizing the entropy $H(y_i)$ is equivalent to maximizing the non-Gaussianity of $y_i$.
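
For instance, comparing the closed-form differential entropies of a few unit-variance densities shows the Gaussian attaining the maximum:

# Closed-form differential entropies (nats) of a few unit-variance densities,
# illustrating that the Gaussian attains the maximum.
import numpy as np

H_gauss   = 0.5 * np.log(2 * np.pi * np.e)   # Gaussian, ~ 1.419
H_uniform = np.log(2 * np.sqrt(3.0))         # uniform on [-sqrt(3), sqrt(3)], ~ 1.243
H_laplace = 1 + np.log(np.sqrt(2.0))         # Laplace with scale b = 1/sqrt(2): 1 + ln(2b), ~ 1.347

print(H_gauss, H_uniform, H_laplace)         # the Gaussian value is the largest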

Moreover, since all $y_i$ have the same unit variance, their negentropy becomes

\begin{displaymath}J(y_i)=H(y_G)-H(y_i)=C-H(y_i) \end{displaymath}

where $C=H(y_G)$ is the entropy of a Gaussian variable with unit variance, the same for all $y_i$. Substituting $H(y_i)=C-J(y_i)$ into the expression for the mutual information, and noting that the other two terms $H(x_1,\cdots,x_n)$ and $\log\,\vert\det{\mathbf W}\vert$ are both constant (the same for any ${\mathbf W}$), we get

\begin{displaymath}I(y_1,\cdots,y_n)=Const-\sum_{i=1}^n J(y_i) \end{displaymath}

where $Const$ is a constant (collecting the terms $C$, $H(x_1,\cdots,x_n)$ and $\log\,\vert\det{\mathbf W}\vert$) that is the same for any linear transform matrix ${\mathbf W}$. This is the fundamental relation between the mutual information and the negentropies of the variables $y_i$: decreasing the mutual information (making the variables less dependent) increases the total negentropy $\sum_{i=1}^n J(y_i)$, i.e., makes the $y_i$ less Gaussian. We therefore want to find a linear transform matrix ${\mathbf W}$ that minimizes the mutual information $I(y_1,\cdots,y_n)$, or, equivalently, maximizes the total negentropy (under the constraint that the $y_i$ are uncorrelated and of unit variance).
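
The sketch below illustrates this equivalence end to end on a toy two-source problem: the mixtures are whitened, and the remaining free rotation is chosen to maximize an approximate total negentropy. The kurtosis-based proxy for $J(y_i)$ is used here purely for illustration (it is not the only possible approximation), and the mixing matrix, sample size and rotation sweep are all arbitrary choices for the demonstration. Maximizing the total negentropy of the rotated outputs recovers the independent sources, i.e., minimizes their mutual information:

# A rough end-to-end sketch: after whitening, search over the remaining
# rotation for the one that maximizes an approximate total negentropy
# sum_i J(y_i), which by the relation above minimizes I(y_1, y_2).
import numpy as np

rng = np.random.default_rng(1)

# Two independent non-Gaussian (uniform, unit-variance) sources, linearly mixed.
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                      # mixing matrix (arbitrary example)
X = A @ S

# Whiten: z = V x with E{z z^T} = I.
C = np.cov(X)
d, E = np.linalg.eigh(C)
V = np.diag(d ** -0.5) @ E.T
Z = V @ X

def approx_negentropy(y):
    """Kurtosis-based proxy for the negentropy of a zero-mean, unit-variance signal."""
    kurt = np.mean(y ** 4) - 3.0
    return kurt ** 2 / 48.0

# Sweep the free rotation angle; the best angle maximizes the total negentropy.
best_theta, best_J = 0.0, -np.inf
for theta in np.linspace(0, np.pi / 2, 181):
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    Y = R @ Z
    J_total = sum(approx_negentropy(y) for y in Y)
    if J_total > best_J:
        best_theta, best_J = theta, J_total

W = np.array([[np.cos(best_theta), -np.sin(best_theta)],
              [np.sin(best_theta),  np.cos(best_theta)]]) @ V
print(W @ A)   # close to a signed permutation matrix: sources recovered up to order and sign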


Ruye Wang 2018-03-26