E186 Project 7 (due May 8)


This project is about image pattern classification. You are provided with training and testing data extracted from 10 handwritten digits (0 through 9) written by the students in the previous class. Each digit is represented by a 16 by 16 gray-level image. We will use all 256 = 16 x 16 pixels as 256 features for classification. The purpose of this project is to develop a program for both feature selection and pattern classification. As this problem is fairly complex, an incomplete program (/projects/p7/recognize_student.c) is provided to you. It is a working program with certain parts removed. You need to understand the program and then fill in the code wherever needed. Search for "FILL-IN" in your editor to find the places you need to fill in. All data files and programs are in the /e186/projects/p7 subdirectory.

(1) Feature selection is essential here, as it is too computationally costly to use all 256 features. A function called select() uses three different types of transform, (a) identity (no transform), (b) DFT, and (c) KLT (principal component transform), to reduce the dimension of the feature space from N=256 to a smaller number M to be used in the actual classification. The obvious requirement is that these M features retain most (e.g., 90 percent) of the separability information, measured by the diagonal elements of the between-class scatter matrix Sb. Try all three transforms and compare their performances, but always use KLT to select features for classification.
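As a concrete illustration, here is a minimal sketch of one way to choose M: sort the diagonal elements of Sb into descending order, then keep just enough of them to retain the desired fraction of the trace. The names choose_M and sb_diag are hypothetical and not necessarily those used in recognize_student.c.

    /* Hypothetical sketch, not the code from recognize_student.c:
     * given the diagonal elements of Sb sorted into descending
     * order, return the smallest M whose leading elements retain
     * the requested fraction of the total energy (trace of Sb). */
    int choose_M(double sb_diag[], int N, double fraction)
    {
        double total = 0.0, kept = 0.0;
        int m;

        for (m = 0; m < N; m++)      /* trace of Sb = total energy */
            total += sb_diag[m];

        for (m = 0; m < N; m++) {    /* accumulate until the fraction is reached */
            kept += sb_diag[m];
            if (kept >= fraction * total)
                return m + 1;        /* number of features to keep */
        }
        return N;
    }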

(2) Supervised classification is carried out after the dimension of the feature space has been reduced from N to M by the KLT. In the function called classify(), two methods are implemented. The first is the minimum distance classifier, which assigns a pattern (an M-dimensional vector) to the class whose mean is closest to the pattern. The second is the maximum likelihood classifier (also called the Bayes classifier), which uses the covariance matrix of each class as well as its mean vector. The details were discussed in class and are available in the handouts.
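For orientation, here is a minimal sketch of the minimum distance rule, assuming the class means have already been transformed into the reduced M-dimensional space. NCLASS, MAXM, mean, and min_distance_classify are illustrative names only. The Bayes classifier differs in that it replaces the squared Euclidean distance with the Mahalanobis distance to each class mean plus the log-determinant term of that class's covariance matrix.

    #include <float.h>

    #define NCLASS 10    /* ten digit classes, 0 through 9 */
    #define MAXM   256   /* upper bound on the reduced dimension M */

    /* Hypothetical sketch, not the code from recognize_student.c:
     * assign feature vector x[0..M-1] to the class whose mean is
     * nearest in squared Euclidean distance. */
    int min_distance_classify(double x[], double mean[NCLASS][MAXM], int M)
    {
        int c, i, best = 0;
        double best_d = DBL_MAX;

        for (c = 0; c < NCLASS; c++) {
            double d = 0.0;
            for (i = 0; i < M; i++) {
                double diff = x[i] - mean[c][i];
                d += diff * diff;
            }
            if (d < best_d) {        /* remember the closest class so far */
                best_d = d;
                best = c;
            }
        }
        return best;                 /* label of the nearest-mean class */
    }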

To do this project, you should first understand the feature selection and classification methods, then read through and thoroughly understand the program. Then fill in all the code necessary in the program. You should set the parameter N, the total number of features, to some small number (e.g., 30) for testing and debugging. Only when the program is thoroughly checked and free of bugs should you use all N=256 features. It will take the fastest HP station (Habu) a few minutes to solve the eigenvalue problem of a 256 by 256 matrix.

When running the program, pay attention to the following:

  1. The total energy, which represents the separability information here, is conserved. In the program, the total energy is measured by the trace of the between-class scatter matrix Sb, which is invariant under orthogonal transforms. (A sketch of how to check this appears after this list.)
  2. How the energy is re-distributed among the N=256 features after the different transforms (identity, DFT, KLT). The diagonal elements of the between-class scatter matrix Sb are printed after they are sorted into descending order. You should be able to observe that KLT is the best in terms of compacting most of the energy into a small number of features.
  3. How the parameter M, the number of features that must be kept to retain a certain percentage of the total energy (e.g., 90%), differs between the transforms. You should observe, again, that KLT requires the fewest features after the transform.
  4. How the two different classifiers generate different classification error rates. The Bayes classifier will in general have a lower error rate than the minimum distance classifier, because additional information (here, the covariance matrices of all the classes) is used.
  5. Finally, how the classification error rate improves as more energy (e.g., 99% of the total) is retained.
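Regarding item 1, here is a minimal sketch of a sanity check you might add while debugging: compute the trace of Sb before and after a transform and confirm that the two agree to within round-off. The names trace, Sb, and Sb_t are hypothetical.

    #include <math.h>
    #include <stdio.h>

    /* Hypothetical debugging aid, not part of recognize_student.c:
     * return the trace of an n x n matrix stored row-major in a[]. */
    double trace(double a[], int n)
    {
        double t = 0.0;
        int i;
        for (i = 0; i < n; i++)
            t += a[i * n + i];
        return t;
    }

    /* Usage, assuming Sb holds the scatter matrix before the
     * transform and Sb_t holds it afterwards:
     *
     *   if (fabs(trace(Sb, N) - trace(Sb_t, N)) > 1e-6 * trace(Sb, N))
     *       fprintf(stderr, "warning: total energy not conserved\n");
     */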

You should submit (a) the completed and working C program, named after your last name with the extension .c (I will compile and run your program to see what you have achieved), and (b) a text file describing your results in terms of the points listed above (90 and 99 percent of the total energy retained, the different transforms used, and the different classification methods used).

For your reference, an executable version of my complete program is available to you (projects/p7/recognize). It serves as a demo showing all the interesting points listed above and should also help you develop your own code. Run it and compare its results with those of your program.

Finally, as we are talking about pattern recognition and feature selection here, I can't help but think of some related stories showing how important it is to select the RIGHT features, and so on. Check them out.