
Suboptimal feature selection

When the number of features $n$ is large, solving the eigenvalue problem of the $n \times n$ matrix ${\bf S}_{B/W}^{(x)}$ may be very time-consuming. As a compromise, we can use another orthogonal transform, such as the discrete Fourier transform (DFT) or the Walsh-Hadamard transform (WHT), instead of the KLT for the transform ${\bf y}={\bf A}^T {\bf x}$.

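Unlike the KLT, the DFT and WHT do not depend on the feature selection criterion ${\bf S}_{B/W}^{(x)}$, so no eigenvalue problem needs to be solved. They can still replace the KLT because orthogonal transforms in general tend to decorrelate signals, concentrating the energy/information (here, the separability information) in a small number of components while the remaining components contain little. (This energy compaction is, however, suboptimal compared to that of the KLT.) Since ${\bf A}$ is orthogonal, $\sum_{i=1}^n {\bf a}_i^T{\bf S}_{B/W}^{(x)}{\bf a}_i = tr({\bf A}^T{\bf S}_{B/W}^{(x)}{\bf A}) = tr({\bf S}_{B/W}^{(x)})$, so the total separability is conserved and the individual values ${\bf a}_i^T{\bf S}_{B/W}^{(x)}{\bf a}_i$ show how it is distributed among the components. To achieve the best feature selection effect, we should therefore choose the $m$ rows ${\bf a}_i^T$ of the $n \times n$ DFT or WHT matrix ${\bf A}^T$ corresponding to the $m$ largest values of ${\bf a}_i^T{\bf S}_{B/W}^{(x)}{\bf a}_i$. (For the complex DFT, ${\bf a}_i^T$ is replaced by the conjugate transpose ${\bf a}_i^*$.)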
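The selection rule can be sketched in Python/NumPy for the WHT case. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes ${\bf S}_{B/W}^{(x)} = {\bf S}_W^{-1}{\bf S}_B$ built from the usual within-class and between-class scatter matrices, that $n$ is a power of 2, and that the Sylvester (natural-order) Hadamard matrix is used. The real WHT is chosen so that ${\bf a}_i^T$ needs no conjugation. The function names `walsh_hadamard_matrix`, `scatter_matrices`, and `select_wht_features` are hypothetical helpers introduced only for this example.

```python
import numpy as np


def walsh_hadamard_matrix(n):
    """Orthonormal n x n Hadamard matrix (Sylvester construction, natural order).

    Assumption: n is a power of 2. Row i is the basis vector a_i^T of A^T.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H / np.sqrt(n)  # scale so that the rows are orthonormal


def scatter_matrices(X, labels):
    """Within-class (S_W) and between-class (S_B) scatter of samples X (N x n)."""
    N, n = X.shape
    mean = X.mean(axis=0)
    S_W = np.zeros((n, n))
    S_B = np.zeros((n, n))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        D = Xc - mc                      # deviations from the class mean
        S_W += D.T @ D / N
        d = (mc - mean)[:, None]         # class mean minus overall mean
        S_B += (len(Xc) / N) * (d @ d.T)
    return S_W, S_B


def select_wht_features(X, labels, m):
    """Keep the m WHT basis vectors with the largest a_i^T S_{B/W} a_i."""
    S_W, S_B = scatter_matrices(X, labels)
    S = np.linalg.solve(S_W, S_B)        # S_{B/W} = S_W^{-1} S_B (assumed definition)
    A_T = walsh_hadamard_matrix(X.shape[1])
    # scores[i] = a_i^T S a_i, computed for all rows at once
    scores = np.einsum("ij,jk,ik->i", A_T, S, A_T)
    keep = np.argsort(scores)[::-1][:m]  # indices of the m largest scores
    return A_T[keep], scores[keep]


# Usage on synthetic data: two classes whose means differ along all 8 features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 8)), rng.normal(1.0, 1.0, (50, 8))])
labels = np.array([0] * 50 + [1] * 50)
W, s = select_wht_features(X, labels, m=3)
Y = X @ W.T                              # y = A^T x for each sample, keeping 3 components
```

In this synthetic example the class means differ along the all-ones direction, which coincides with the first (constant) Walsh basis vector, so that row should receive by far the largest score. Because the rows are orthonormal, the scores over all $n$ rows sum to $tr({\bf S}_{B/W}^{(x)})$; the selection only decides which $m$ components keep the largest share of it.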



Ruye Wang 2016-11-30