FastICA Algorithm

Summarizing the objective functions discussed above, we see that they share the common goal of maximizing a function of the form $\sum_i E\{ G(y_i) \}$, where $y_i={\bf w}_i^T {\bf x}$ is the $i$th component of ${\bf y}={\bf Wx}$:

$\displaystyle \sum_i E\{ G(y_i) \}=\sum_i E\{ G( {\bf w}_i^T {\bf x} ) \}$ (233)

where ${\bf w}_i^T$ is the $i$th row vector of the matrix ${\bf W}$. We first consider one particular component (with the subscript $i$ dropped). This is a constrained optimization problem that can be solved by the Lagrange multiplier method, with the objective function

$\displaystyle J({\bf w})=E\{ G( {\bf w}^T {\bf x} ) \}
-\beta( {\bf w}^T{\bf w}-1)/2$ (234)

The second term is the constraint representing the fact that the rows (and columns) of the orthogonal matrix ${\bf W}$ are normalized, i.e., ${\bf w}^T{\bf w}=1$. Setting the derivative of $J({\bf w})$ with respect to ${\bf w}$ to zero, we get

$\displaystyle F({\bf w})\stackrel{\triangle}{=}
\frac{\partial J({\bf w})}{ \partial {\bf w}}
=E\{ {\bf x}g( {\bf w}^T {\bf x} ) \}-\beta {\bf w}=0$ (235)

where $g(z)=dG(z)/dz$ is the derivative of the function $G(z)$. This system of algebraic equations can be solved iteratively by the Newton-Raphson method:

$\displaystyle {\bf w} \Leftarrow {\bf w}-J_F^{-1}({\bf w}) F({\bf w})$ (236)

where $J_F({\bf w})$ is the Jacobian of function $F({\bf w})$:

$\displaystyle J_F({\bf w})=\frac{\partial F}{\partial {\bf w}}=
E\{{\bf xx}^T g'({\bf w}^T{\bf x})\}-\beta{\bf I}$ (237)
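Before simplifying, it may help to see what an exact Newton step based on Eqs. (235) through (237) looks like. The following Python/NumPy fragment is a minimal sketch, not part of the original text: it estimates the expectations by sample averages over $N$ prewhitened data vectors, uses the illustrative contrast $g(z)=\tanh(z)$ with $g'(z)=1-\tanh^2(z)$, and estimates $\beta$ as $E\{{\bf w}^T{\bf x}\,g({\bf w}^T{\bf x})\}$, which follows from premultiplying Eq. (235) by ${\bf w}^T$ and using ${\bf w}^T{\bf w}=1$.

```python
import numpy as np

def newton_step(w, X, g=np.tanh, g_prime=lambda z: 1.0 - np.tanh(z) ** 2):
    """One exact Newton-Raphson step for F(w) = 0 in Eq. (235).

    X : (n, N) array of N prewhitened sample vectors (one per column).
    w : (n,) current weight vector with ||w|| = 1.
    """
    N = X.shape[1]
    u = w @ X                                  # projections w^T x, shape (N,)
    beta = np.mean(u * g(u))                   # beta = E{(w^T x) g(w^T x)}
    F = X @ g(u) / N - beta * w                # E{x g(w^T x)} - beta w, Eq. (235)
    JF = (X * g_prime(u)) @ X.T / N - beta * np.eye(len(w))   # Eq. (237)
    return w - np.linalg.solve(JF, F)          # Newton update, Eq. (236)
```

The matrix solve makes this step relatively expensive; the approximation below reduces the Jacobian to a scalar multiple of the identity, so that no matrix inversion is needed.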

As the data ${\bf x}$ are assumed to be prewhitened so that $E\{{\bf xx}^T\}={\bf I}$, the first term on the right-hand side of Eq. (237) can be approximated as

$\displaystyle E\{{\bf xx}^T g'({\bf w}^T{\bf x})\}
\approx E\{{\bf xx}^T\} E\{g'({\bf w}^T{\bf x})\}
=E\{g'({\bf w}^T{\bf x})\} {\bf I}$ (238)

and the Jacobian becomes diagonal (in fact a scalar multiple of the identity):

$\displaystyle J_F({\bf w})=[E\{g'({\bf w}^T{\bf x})\}-\beta] {\bf I}$ (239)

The Newton-Raphson iteration of Eq. (236) then becomes:

$\displaystyle {\bf w} \Leftarrow {\bf w}-\frac{1}{E\{g'({\bf w}^T{\bf x})\}-\beta}
[E\{ {\bf x}g( {\bf w}^T {\bf x} ) \}-\beta {\bf w}]$ (240)

Multiplying both sides by the scalar $\beta-E\{g'({\bf w}^T{\bf x})\}$, so that the unknown multiplier $\beta$ cancels, we get

$\displaystyle {\bf w} \Leftarrow E\{ {\bf x}g( {\bf w}^T {\bf x} ) \}
-E\{g'({\bf w}^T{\bf x})\} {\bf w}$ (241)

Note that we still use the same symbol ${\bf w}$ for the left-hand side, although its value has actually been multiplied by a scalar. This is taken care of by the renormalization in step 3 of the following FastICA algorithm:
  1. Choose an initial random guess for ${\bf w}$
  2. Iterate:

    $\displaystyle {\bf w} \Leftarrow E\{ {\bf x}g( {\bf w}^T {\bf x} ) \}
-E\{g'({\bf w}^T{\bf x})\} {\bf w}$ (242)

  3. Normalize:

    $\displaystyle {\bf w} \Leftarrow {\bf w}/\vert\vert{\bf w}\vert\vert$ (243)

  4. If not converged, go back to step 2.

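As a rough, self-contained illustration of the complete procedure (a sketch under assumed choices, not a reference implementation), the following Python/NumPy script generates a synthetic two-channel mixture, centers and whitens it so that $E\{{\bf xx}^T\}={\bf I}$, and then runs the one-unit iteration of Eqs. (242) and (243) until ${\bf w}$ no longer changes direction. The source signals, the contrast $g(z)=\tanh(z)$, and the convergence tolerance are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic mixtures: two non-Gaussian sources, linearly mixed.
N = 10_000
t = np.arange(N)
s = np.vstack([np.sign(np.sin(0.05 * t)),            # square wave
               rng.laplace(size=N)])                  # heavy-tailed noise
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                            # mixing matrix
x = A @ s

# Preprocessing: center, then whiten so that E{xx^T} = I.
x = x - x.mean(axis=1, keepdims=True)
d, V = np.linalg.eigh(np.cov(x))
x = V @ np.diag(d ** -0.5) @ V.T @ x

# One-unit FastICA iteration, Eqs. (242) and (243).
g = np.tanh
def g_prime(z):
    return 1.0 - np.tanh(z) ** 2

w = rng.standard_normal(2)
w /= np.linalg.norm(w)                                # step 1: random unit guess
for it in range(200):
    u = w @ x                                         # w^T x for every sample
    w_new = x @ g(u) / N - np.mean(g_prime(u)) * w    # step 2: Eq. (242)
    w_new /= np.linalg.norm(w_new)                    # step 3: Eq. (243)
    converged = abs(w_new @ w) > 1.0 - 1e-10          # direction fixed (up to sign)
    w = w_new
    if converged:                                     # step 4: otherwise iterate
        break

y = w @ x                                             # one recovered component
print(f"converged after {it + 1} iterations, w = {w}")
```

Further components can be extracted by repeating the iteration with ${\bf w}$ kept orthogonal to the rows already found (the usual deflation scheme), which realizes the full matrix ${\bf W}$.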