Next: Performance Measurements Up: classify Previous: Distance Measurements

Separability Criteria for Feature Selection

In feature selection, we need to evaluate how separable a set of classes are in an M-dimensional feature space by some criteria.

Total number of samples:

$\begin{displaymath}K=\sum_{i=1}^C K_i \end{displaymath}$

where is the number of samples in class $\omega_i$
Overall mean vector:

$\begin{displaymath} {\bf m}=\frac{1}{K} \sum_{\bf x} {\bf x} =\frac{1}{K} \su... ...{i=1}^C \frac{K_i}{K} {\bf m}_i =\sum_{i=1}^C P_i{\bf m}_i \end{displaymath}$

where is the a priori probability of class $\omega_i$
Scatter matrix of class $\omega_i$ (same as the covariance matrix of the class):

$\begin{displaymath}{\bf S}_i=\frac{1}{K_i} \sum_{{\bf x} \in \omega_i} ({\bf x}... ..._i)({\bf x}-{\bf m}_i)^T={\bf\Sigma}_i,\;\;\;\;\;(i=1,\cdots,C)\end{displaymath}$
Within-class scatter matrix:

$\begin{displaymath} {\bf S}_W=\sum_{i=1}^C P_i {\bf S}_i=\sum_{i=1}^C P_i {\bf\... ...m_{{\bf x}\in\omega_i}({\bf x}-{\bf m}_i)({\bf x}-{\bf m}_i)^T \end{displaymath}$
Between-class scatter matrix:

$\begin{displaymath}{\bf S}_B=\sum_{i=1}^C P_i({\bf m}_i-{\bf m})({\bf m}_i-{\bf m})^T \end{displaymath}$
Total scatter matrix:

$\begin{displaymath}{\bf S}_T=\frac{1}{K}\sum_{\bf x} ({\bf x}-{\bf m})({\bf x}-{... ...um_{{\bf x} \in \omega_i} ({\bf x}-{\bf m})({\bf x}-{\bf m})^T \end{displaymath}$

We can show that ${\bf S}_T={\bf S}_W+{\bf S}_B$ , i.e., the total scatteredness is the sum of within-class scatteredness and between-class scatteredness.

$\displaystyle {\bf S}_T$	$\textstyle =$	$\displaystyle \frac{1}{K}\sum_{i=1}^C\sum_{{\bf x}\in\omega_i}({\bf x}-{\bf m})... ...i} ({\bf x}-{\bf m}_i+{\bf m}_i-{\bf m})({\bf x}-{\bf m}_i+{\bf m}_i-{\bf m})^T$
	$\textstyle =$	$\displaystyle \frac{1}{K}\sum_{i=1}^C \sum_{{\bf x} \in \omega_i} [({\bf x}-{\b... ...bf m}_i-{\bf m})({\bf x}-{\bf m}_i)^T+({\bf m}_i-{\bf m})({\bf m}_i-{\bf m})^T]$
	$\textstyle \stackrel{*}{=}$	$\displaystyle \frac{1}{K}\sum_{i=1}^C \sum_{{\bf x} \in \omega_i} ({\bf x}-{\b... ...m_{i=1}^C \sum_{{\bf x} \in \omega_i} ({\bf m}_i-{\bf m})({\bf m}_i-{\bf m})^T$
	$\textstyle =$	$\displaystyle \sum_{i=1}^C \frac{K_i}{K}{\bf S}_i + \sum_{i=1}^C \frac{K_i}{K}({\bf m}_i-{\bf m})({\bf m}_i-{\bf m})^T$
	$\textstyle =$	$\displaystyle {\bf S}_W+{\bf S}_B$

* Proof:

$\begin{displaymath} \sum_{{\bf x}\in\omega_i} ({\bf x}-{\bf m}_i) ({\bf m}_i-{\b... ...})^T =K_i ( {\bf m}_i-{\bf m}_i)({\bf m}_i-{\bf m})^T ={\bf0} \end{displaymath}$

We can use the traces of these scatter matrices as some scalar criteria to measure the separability between the classes:

$\displaystyle J_B$	$\textstyle =$	$\displaystyle tr\;{\bf S}_B=tr\;\left[\sum_{i=1}^C P_i({\bf m}_i-{\bf m})({\bf m}_i-{\bf m})^T\right]$
	$\textstyle =$	$\displaystyle \sum_{i=1}^C P_i\;tr\left[({\bf m}_i-{\bf m})({\bf m}_i-{\bf m})^T\right] =\sum_{i=1}^C P_i\;\vert\vert{\bf m}_i-{\bf m}\vert\vert^2$

and

$\displaystyle J_W$	$\textstyle =$	$\displaystyle tr\;{\bf S}_W =tr\left[\sum_{i=1}^C P_i \sum_{{\bf x}\in\omega_i}({\bf x}-{\bf m}_i)({\bf x}-{\bf m}_i)^T \right]$
	$\textstyle =$	$\displaystyle \sum_{i=1}^C P_i \sum_{{\bf x}\in\omega_i}tr\left[({\bf x}-{\bf m}_i)({\bf x}-{\bf m}_i)^T \right]$
	$\textstyle =$	$\displaystyle \sum_{i=1}^C P_i \sum_{{\bf x}\in\omega_i}\vert\vert{\bf x}-{\bf m}_i\vert\vert^2$

We see that

is the weighted average of the Euclidean distances between ${\bf m}_i$ and ${\bf m}$ for all

classes, and

is the weighted average of the Euclidean distances between ${\bf m}_i$ and ${\bf m}$ for all

classes.

It is obviously desirable to maximize while at the same time also minimize , so that the classes are maximally separated for the classification to be most effectively carried out. We can therefore construct a new scalar criterion to be used in feature selection

$\begin{displaymath} J_{B/W}=tr({\bf S}_W^{-1}{\bf S}_B)=tr({\bf S}_B{\bf S}_W^{-1}) \end{displaymath}$

or, equivalently, due to the relationship ${\bf S}_T={\bf S}_B+{\bf S}_W$ ,

$\begin{displaymath} J_{B/T}=tr({\bf S}_T^{-1}{\bf S}_B)=tr({\bf S}_B{\bf S}_T^{-1}) \end{displaymath}$

Next: Performance Measurements Up: classify Previous: Distance Measurements

Ruye Wang 2016-11-30