Summarizing the objective functions discussed above, we see a
common goal of maximizing a function $E\{G(y_i)\}$ of each component
$y_i={\bf w}_i^T{\bf x}$ of ${\bf y}={\bf W}{\bf x}$:

$$\sum_i E\{ G(y_i) \}=\sum_i E\{ G( {\bf w}_i^T {\bf x} ) \} \tag{233}$$

where ${\bf w}_i^T$ is the $i$th row vector in matrix ${\bf W}$.
We first consider one particular component (with the subscript $i$ dropped).
This is a constrained optimization problem which can be solved by the
Lagrange multiplier method with the objective function

$$J({\bf w})=E\{ G( {\bf w}^T {\bf x} ) \}-\beta( {\bf w}^T{\bf w}-1)/2 \tag{234}$$
The second term is the constraint representing the fact that the rows
and columns of the orthogonal matrix ${\bf W}$ are normalized, i.e.,
${\bf w}^T{\bf w}=\vert\vert{\bf w}\vert\vert^2=1$; the factor $1/2$ is
included so that the derivative $\partial({\bf w}^T{\bf w})/\partial{\bf w}
=2{\bf w}$ carries no extra constant. We set the derivative of $J({\bf w})$
with respect to ${\bf w}$ to zero and get
$$F({\bf w})\stackrel{\triangle}{=}
\frac{\partial J({\bf w})}{\partial {\bf w}}
=E\{ {\bf x}\,g( {\bf w}^T {\bf x} ) \}-\beta {\bf w}={\bf 0} \tag{235}$$

where $g$ is the derivative of function $G$.
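For concreteness, two contrast functions commonly used with FastICA,
together with the derivatives $g$ and $g'$ needed in the following steps,
are (these particular choices are illustrative; the derivation does not
depend on them):

$$G(u)=\log\cosh(u),\qquad g(u)=\tanh(u),\qquad g'(u)=1-\tanh^2(u)$$

$$G(u)=-e^{-u^2/2},\qquad g(u)=u\,e^{-u^2/2},\qquad g'(u)=(1-u^2)\,e^{-u^2/2}$$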
This system of algebraic equations can be solved iteratively by the
Newton-Raphson method:
$${\bf w} \Leftarrow {\bf w}-J_F^{-1}({\bf w})\, F({\bf w}) \tag{236}$$
where $J_F({\bf w})$ is the Jacobian of function $F({\bf w})$:
$$J_F({\bf w})=\frac{\partial F}{\partial {\bf w}}=
E\{{\bf x}{\bf x}^T g'({\bf w}^T{\bf x})\}-\beta{\bf I} \tag{237}$$
As the data ${\bf x}$ are assumed whitened, so that
$E\{{\bf x}{\bf x}^T\}={\bf I}$, the first term on the right can be
approximated by treating ${\bf x}{\bf x}^T$ and $g'({\bf w}^T{\bf x})$
as if they were uncorrelated:

$$E\{{\bf x}{\bf x}^T g'({\bf w}^T{\bf x})\}
\approx E\{{\bf x}{\bf x}^T\}\, E\{g'({\bf w}^T{\bf x})\}
=E\{g'({\bf w}^T{\bf x})\}\, {\bf I} \tag{238}$$
and the Jacobian becomes diagonal:

$$J_F({\bf w})=[E\{g'({\bf w}^T{\bf x})\}-\beta]\, {\bf I} \tag{239}$$
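As a quick sanity check of the separation used in Eq. (238), the following
minimal NumPy sketch (the toy data and variable names are our own
illustration, not part of the original text) compares the two sides on
whitened non-Gaussian data:

```python
import numpy as np

# Spot check of Eq. (238): for whitened data,
# E{x x^T g'(w^T x)} is approximately E{g'(w^T x)} I.
rng = np.random.default_rng(0)
n, m = 4, 100_000
x = rng.laplace(size=(n, m))             # non-Gaussian toy data
x -= x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
x = (E / np.sqrt(d)) @ E.T @ x           # whiten so that E{x x^T} = I

w = rng.standard_normal(n)
w /= np.linalg.norm(w)                   # random unit vector w
gp = 1.0 - np.tanh(w @ x) ** 2           # g'(w^T x) for g = tanh

lhs = (x * gp) @ x.T / m                 # sample mean of x x^T g'(w^T x)
rhs = gp.mean() * np.eye(n)              # right-hand side of Eq. (238)
print(np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs))  # relative error
```

The printed relative error should be well below one, supporting the
diagonal approximation of the Jacobian.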
The Newton-Raphson iteration of Eq. (236) then becomes:

$${\bf w} \Leftarrow {\bf w}-\frac{1}{E\{g'({\bf w}^T{\bf x})\}-\beta}
\,[\,E\{ {\bf x}\,g( {\bf w}^T {\bf x} ) \}-\beta {\bf w}\,] \tag{240}$$
Multiplying both sides by the scalar $\beta-E\{g'({\bf w}^T{\bf x})\}$,
we get

$${\bf w} \Leftarrow E\{ {\bf x}\,g( {\bf w}^T {\bf x} ) \}
-E\{g'({\bf w}^T{\bf x})\}\, {\bf w} \tag{241}$$
Note that we still use the same symbol ${\bf w}$ for the
left-hand side, although its value is actually multiplied by a scalar. This is
taken care of by renormalization, as shown in the following FastICA algorithm:
1. Choose an initial random guess for ${\bf w}$.
2. Iterate:

   $${\bf w} \Leftarrow E\{ {\bf x}\,g( {\bf w}^T {\bf x} ) \}
   -E\{g'({\bf w}^T{\bf x})\}\, {\bf w} \tag{242}$$

3. Normalize:

   $${\bf w} \Leftarrow {\bf w}/\vert\vert{\bf w}\vert\vert \tag{243}$$

4. If not converged, go back to step 2.
This is a demo
of the FastICA algorithm.
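To make the steps above concrete, here is a minimal NumPy sketch of the
one-unit FastICA iteration of Eqs. (242) and (243), using the
$g(u)=\tanh(u)$ contrast. The function name `fastica_one_unit` and the toy
mixing demo are our own illustration, not part of the original text; the
data must be centered and whitened beforehand, as assumed in Eq. (238):

```python
import numpy as np

def fastica_one_unit(X, max_iter=200, tol=1e-8, seed=0):
    """One-unit FastICA on whitened data X (n variables x m samples):
    iterate Eq. (242) and renormalize as in Eq. (243), with
    g(u) = tanh(u) and g'(u) = 1 - tanh(u)**2."""
    n, m = X.shape
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n)
    w /= np.linalg.norm(w)                       # step 1: random unit vector
    for _ in range(max_iter):
        wx = w @ X                               # projections w^T x
        g, g_prime = np.tanh(wx), 1.0 - np.tanh(wx) ** 2
        # Step 2, Eq. (242): expectations E{.} replaced by sample means
        w_new = (X * g).mean(axis=1) - g_prime.mean() * w
        w_new /= np.linalg.norm(w_new)           # step 3, Eq. (243)
        # Step 4: converged when the direction stops changing (up to sign)
        if abs(abs(w_new @ w) - 1.0) < tol:
            return w_new
        w = w_new
    return w

# Toy demo: two independent sources, mixed, then centered and whitened.
rng = np.random.default_rng(1)
m = 10_000
s = np.vstack([np.sign(rng.standard_normal(m)),  # binary source
               rng.uniform(-1.0, 1.0, m)])       # uniform source
x = np.array([[2.0, 1.0], [1.0, 1.0]]) @ s       # mixed observations
x -= x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
X = (E / np.sqrt(d)) @ E.T @ x                   # whitened: E{x x^T} = I

w = fastica_one_unit(X)
y = w @ X                                        # recovered component
print(np.corrcoef(np.vstack([y, s]))[0, 1:])     # correlation with sources
```

One of the printed correlations should be close to $\pm 1$, indicating that
the iteration has recovered one of the sources up to sign and scale. Note
that this sketch extracts a single component; extracting further rows of
${\bf W}$ would additionally require decorrelating each new ${\bf w}$
against those already found (e.g., by deflation).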