FastICA Algorithm

Summarizing the objective functions discussed above, we see that they share the common goal of maximizing a function of the form $\sum_i E\{ G(y_i) \}$, where $y_i={\bf w}_i^T {\bf x}$ is the $i$th component of ${\bf y}={\bf Wx}$:

$\displaystyle \sum_i E\{ G(y_i) \}=\sum_i E\{ G( {\bf w}_i^T {\bf x} ) \}$ (233)

where ${\bf w}_i^T$ is the $i$th row vector of the matrix ${\bf W}$. We first consider one particular component (with the subscript $i$ dropped). This is a constrained optimization problem that can be solved by the Lagrange multiplier method, with the objective function

$\displaystyle J({\bf w})=E\{ G( {\bf w}^T {\bf x} ) \}
-\beta( {\bf w}^T{\bf w}-1)/2$ (234)

The second term is the constraint representing the fact that the rows (and columns) of the orthogonal matrix ${\bf W}$ are normalized, i.e., ${\bf w}^T{\bf w}=1$. Setting the derivative of $J({\bf w})$ with respect to ${\bf w}$ to zero, we get

$\displaystyle F({\bf w})\stackrel{\triangle}{=}
\frac{\partial J({\bf w})}{ \partial {\bf w}}
=E\{ {\bf x}g( {\bf w}^T {\bf x} ) \}-\beta {\bf w}=0$ (235)

where $g(z)=dG(z)/dz$ is the derivative of the function $G(z)$. This system of algebraic equations can be solved iteratively by the Newton-Raphson method:

$\displaystyle {\bf w} \Leftarrow {\bf w}-J_F^{-1}({\bf w}) F({\bf w})$ (236)

where $J_F({\bf w})$ is the Jacobian of function $F({\bf w})$:

$\displaystyle J_F({\bf w})=\frac{\partial F}{\partial {\bf w}}=
E\{{\bf xx}^T g'({\bf w}^T{\bf x})\}-\beta{\bf I}$ (237)
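Before simplifying, it may help to see what an exact Newton step based on Eqs. (235) through (237) looks like. The following Python/NumPy fragment is a minimal sketch, not part of the original text: it estimates the expectations by sample averages over $N$ prewhitened data vectors, uses the illustrative contrast $g(z)=\tanh(z)$ with $g'(z)=1-\tanh^2(z)$, and estimates $\beta$ as $E\{{\bf w}^T{\bf x}\,g({\bf w}^T{\bf x})\}$, which follows from premultiplying Eq. (235) by ${\bf w}^T$ and using ${\bf w}^T{\bf w}=1$.

```python
import numpy as np

def newton_step(w, X, g=np.tanh, g_prime=lambda z: 1.0 - np.tanh(z) ** 2):
    """One exact Newton-Raphson step for F(w) = 0 in Eq. (235).

    X : (n, N) array of N prewhitened sample vectors (one per column).
    w : (n,) current weight vector with ||w|| = 1.
    """
    N = X.shape[1]
    u = w @ X                                  # projections w^T x, shape (N,)
    beta = np.mean(u * g(u))                   # beta = E{(w^T x) g(w^T x)}
    F = X @ g(u) / N - beta * w                # E{x g(w^T x)} - beta w, Eq. (235)
    JF = (X * g_prime(u)) @ X.T / N - beta * np.eye(len(w))   # Eq. (237)
    return w - np.linalg.solve(JF, F)          # Newton update, Eq. (236)
```

The matrix solve makes this step relatively expensive; the approximation below reduces the Jacobian to a scalar multiple of the identity, so that no matrix inversion is needed.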

As the data ${\bf x}$ are assumed to be prewhitened so that $E\{{\bf xx}^T\}={\bf I}$, the first term on the right-hand side of Eq. (237) can be approximated as

$\displaystyle E\{{\bf xx}^T g'({\bf w}^T{\bf x})\}
\approx E\{{\bf xx}^T\} E\{g'({\bf w}^T{\bf x})\}
=E\{g'({\bf w}^T{\bf x})\} {\bf I}$ (238)

and the Jacobian becomes diagonal (in fact a scalar multiple of the identity):

$\displaystyle J_F({\bf w})=[E\{g'({\bf w}^T{\bf x})\}-\beta] {\bf I}$ (239)

The Newton-Raphson iteration of Eq. (236) then becomes:

$\displaystyle {\bf w} \Leftarrow {\bf w}-\frac{1}{E\{g'({\bf w}^T{\bf x})\}-\beta}
[E\{ {\bf x}g( {\bf w}^T {\bf x} ) \}-\beta {\bf w}]$ (240)

Multiplying both sides by the scalar $\beta-E\{g'({\bf w}^T{\bf x})\}$, so that the unknown multiplier $\beta$ cancels, we get

$\displaystyle {\bf w} \Leftarrow E\{ {\bf x}g( {\bf w}^T {\bf x} ) \}
-E\{g'({\bf w}^T{\bf x})\} {\bf w}$ (241)

Note that we still use the same symbol ${\bf w}$ for the left-hand side, although its value has actually been multiplied by a scalar. This is taken care of by the renormalization in step 3 of the following FastICA algorithm:
  1. Choose an initial random guess for ${\bf w}$
  2. Iterate:

    $\displaystyle {\bf w} \Leftarrow E\{ {\bf x}g( {\bf w}^T {\bf x} ) \}
-E\{g'({\bf w}^T{\bf x})\} {\bf w}$ (242)

  3. Normalize:

    $\displaystyle {\bf w} \Leftarrow {\bf w}/\vert\vert{\bf w}\vert\vert$ (243)

  4. If not converged, go back to step 2.

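As a rough, self-contained illustration of the complete procedure (a sketch under assumed choices, not a reference implementation), the following Python/NumPy script generates a synthetic two-channel mixture, centers and whitens it so that $E\{{\bf xx}^T\}={\bf I}$, and then runs the one-unit iteration of Eqs. (242) and (243) until ${\bf w}$ no longer changes direction. The source signals, the contrast $g(z)=\tanh(z)$, and the convergence tolerance are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic mixtures: two non-Gaussian sources, linearly mixed.
N = 10_000
t = np.arange(N)
s = np.vstack([np.sign(np.sin(0.05 * t)),            # square wave
               rng.laplace(size=N)])                  # heavy-tailed noise
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                            # mixing matrix
x = A @ s

# Preprocessing: center, then whiten so that E{xx^T} = I.
x = x - x.mean(axis=1, keepdims=True)
d, V = np.linalg.eigh(np.cov(x))
x = V @ np.diag(d ** -0.5) @ V.T @ x

# One-unit FastICA iteration, Eqs. (242) and (243).
g = np.tanh
def g_prime(z):
    return 1.0 - np.tanh(z) ** 2

w = rng.standard_normal(2)
w /= np.linalg.norm(w)                                # step 1: random unit guess
for it in range(200):
    u = w @ x                                         # w^T x for every sample
    w_new = x @ g(u) / N - np.mean(g_prime(u)) * w    # step 2: Eq. (242)
    w_new /= np.linalg.norm(w_new)                    # step 3: Eq. (243)
    converged = abs(w_new @ w) > 1.0 - 1e-10          # direction fixed (up to sign)
    w = w_new
    if converged:                                     # step 4: otherwise iterate
        break

y = w @ x                                             # one recovered component
print(f"converged after {it + 1} iterations, w = {w}")
```

Further components can be extracted by repeating the iteration with ${\bf w}$ kept orthogonal to the rows already found (the usual deflation scheme), which realizes the full matrix ${\bf W}$.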