next up previous
Next: Performance Measurements Up: classify Previous: Distance Measurements

Separability Criteria for Feature Selection

In feature selection, we need to evaluate how separable a set of classes are in an M-dimensional feature space by some criteria.

classification.gif

We can show that ${\bf S}_T={\bf S}_W+{\bf S}_B$, i.e., the total scatteredness is the sum of within-class scatteredness and between-class scatteredness.
$\displaystyle {\bf S}_T$ $\textstyle =$ $\displaystyle \frac{1}{K}\sum_{i=1}^C\sum_{{\bf x}\in\omega_i}({\bf x}-{\bf m})...
...i}
({\bf x}-{\bf m}_i+{\bf m}_i-{\bf m})({\bf x}-{\bf m}_i+{\bf m}_i-{\bf m})^T$  
  $\textstyle =$ $\displaystyle \frac{1}{K}\sum_{i=1}^C \sum_{{\bf x} \in \omega_i}
[({\bf x}-{\b...
...bf m}_i-{\bf m})({\bf x}-{\bf m}_i)^T+({\bf m}_i-{\bf m})({\bf m}_i-{\bf m})^T]$  
  $\textstyle \stackrel{*}{=}$ $\displaystyle \frac{1}{K}\sum_{i=1}^C \sum_{{\bf x} \in \omega_i}
({\bf x}-{\b...
...m_{i=1}^C \sum_{{\bf x} \in \omega_i}
({\bf m}_i-{\bf m})({\bf m}_i-{\bf m})^T$  
  $\textstyle =$ $\displaystyle \sum_{i=1}^C \frac{K_i}{K}{\bf S}_i +
\sum_{i=1}^C \frac{K_i}{K}({\bf m}_i-{\bf m})({\bf m}_i-{\bf m})^T$  
  $\textstyle =$ $\displaystyle {\bf S}_W+{\bf S}_B$  

* Proof:

\begin{displaymath}
\sum_{{\bf x}\in\omega_i} ({\bf x}-{\bf m}_i) ({\bf m}_i-{\b...
...})^T
=K_i ( {\bf m}_i-{\bf m}_i)({\bf m}_i-{\bf m})^T ={\bf0}
\end{displaymath}

We can use the traces of these scatter matrices as some scalar criteria to measure the separability between the classes:
$\displaystyle J_B$ $\textstyle =$ $\displaystyle tr\;{\bf S}_B=tr\;\left[\sum_{i=1}^C P_i({\bf m}_i-{\bf m})({\bf m}_i-{\bf m})^T\right]$  
  $\textstyle =$ $\displaystyle \sum_{i=1}^C P_i\;tr\left[({\bf m}_i-{\bf m})({\bf m}_i-{\bf m})^T\right]
=\sum_{i=1}^C P_i\;\vert\vert{\bf m}_i-{\bf m}\vert\vert^2$  

and
$\displaystyle J_W$ $\textstyle =$ $\displaystyle tr\;{\bf S}_W
=tr\left[\sum_{i=1}^C P_i \sum_{{\bf x}\in\omega_i}({\bf x}-{\bf m}_i)({\bf x}-{\bf m}_i)^T \right]$  
  $\textstyle =$ $\displaystyle \sum_{i=1}^C P_i \sum_{{\bf x}\in\omega_i}tr\left[({\bf x}-{\bf m}_i)({\bf x}-{\bf m}_i)^T \right]$  
  $\textstyle =$ $\displaystyle \sum_{i=1}^C P_i \sum_{{\bf x}\in\omega_i}\vert\vert{\bf x}-{\bf m}_i\vert\vert^2$  

We see that $J_B$ is the weighted average of the Euclidean distances between ${\bf m}_i$ and ${\bf m}$ for all $C$ classes, and $J_W$ is the weighted average of the Euclidean distances between ${\bf m}_i$ and ${\bf m}$ for all $C$ classes.

It is obviously desirable to maximize $J_B$ while at the same time also minimize $J_W$, so that the classes are maximally separated for the classification to be most effectively carried out. We can therefore construct a new scalar criterion to be used in feature selection

\begin{displaymath}
J_{B/W}=tr({\bf S}_W^{-1}{\bf S}_B)=tr({\bf S}_B{\bf S}_W^{-1})
\end{displaymath}

or, equivalently, due to the relationship ${\bf S}_T={\bf S}_B+{\bf S}_W$,

\begin{displaymath}
J_{B/T}=tr({\bf S}_T^{-1}{\bf S}_B)=tr({\bf S}_B{\bf S}_T^{-1})
\end{displaymath}


next up previous
Next: Performance Measurements Up: classify Previous: Distance Measurements
Ruye Wang 2016-11-30