Variable Selection

Any method that selects a subset of the variables in a dataset, thereby reducing its dimension, is called a variable selection method. These methods are especially useful for large datasets with hundreds to thousands of input variables, where loading, processing, and evaluating the data is difficult.

At the NIPS 2003 conference, there was a workshop with a competition called the NIPS 2003 Feature Selection Challenge: The NIPS 2003 challenge in feature selection is to find feature selection algorithms that significantly outperform methods using all features, using as benchmark ALL five datasets formatted for that purpose. Each dataset is split into training, validation, and test sets. Only the training labels are provided. During the development period, participants can return classification results on the validation set, even for a subset of the datasets. They will receive in return their validation set scores. At any time (but presumably after some development period) the participants can submit their final classification results on ALL the datasets (with a limit of five submissions per person). For more information, refer to the competition website.

I tested several algorithms for this challenge that are very simple to implement and computationally efficient. All of them are filter methods, including variable selection based on correlation and on single-variable-classifier ranking. My final submissions for the normal duration can be found under Collection1 and Collection2; the first is optimized for the best performance and the second for the best selected features. These methods were ranked 16th and 15th, respectively, by the challenge organizers, out of more than 75 methods.
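To make the two filter criteria mentioned above concrete, here is a minimal Python sketch of ranking variables by correlation with the labels and by a single-variable classifier. It is an illustration only, not the MATLAB code used for the submissions. The function names (pearson_scores, svc_scores, top_k), the use of +1/-1 labels, and the choice of a simple threshold rule as the single-variable classifier are all assumptions made for this example.

```python
import math

def pearson_scores(X, y):
    """Score each column of X by |Pearson correlation| with the labels y."""
    n = len(X)
    mean_y = sum(y) / n
    dev_y = [v - mean_y for v in y]
    norm_y = math.sqrt(sum(v * v for v in dev_y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean_x = sum(col) / n
        dev_x = [v - mean_x for v in col]
        norm_x = math.sqrt(sum(v * v for v in dev_x))
        if norm_x == 0 or norm_y == 0:
            scores.append(0.0)  # a constant variable gets the lowest score
        else:
            cov = sum(a * b for a, b in zip(dev_x, dev_y))
            scores.append(abs(cov / (norm_x * norm_y)))
    return scores

def svc_scores(X, y):
    """Score each variable by the best training accuracy of a threshold
    classifier that uses that variable alone (assumed labels: +1 / -1)."""
    n = len(X)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        best = 0.0
        for t in sorted(set(col)):
            # Rule: predict +1 when the value is >= t, otherwise -1.
            correct = sum(1 for v, label in zip(col, y)
                          if (1 if v >= t else -1) == label)
            acc = correct / n
            # The flipped rule (predict -1 when >= t) has accuracy 1 - acc.
            best = max(best, acc, 1 - acc)
        scores.append(best)
    return scores

def top_k(scores, k):
    """Return the indices of the k highest-scoring variables."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:k]

# Toy data: 4 samples, 3 variables (hypothetical values).
X = [[1.0, 5.0, 0.3],
     [2.0, 5.0, 0.9],
     [3.0, 5.0, 0.1],
     [4.0, 5.0, 0.8]]
y = [-1, -1, 1, 1]

print(top_k(pearson_scores(X, y), 2))
print(top_k(svc_scores(X, y), 2))
```

Both criteria score each variable independently of the others, which is what makes filter methods of this kind cheap: the cost grows linearly with the number of variables, and no classifier has to be trained on the full input space during selection.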

Below, you can find some useful MATLAB programs related to my methods. Feel free to use, modify, and redistribute them. There is also a short report sent to the workshop, which is useful for a quick review of my methods. In addition, a detailed paper is available, submitted for Feature Extraction: Foundations and Applications, edited by Isabelle Guyon, Steve Gunn, Masoud Nikravesh, and Lotfi Zadeh:

MATLAB programs.
Short report: NIPS Feature Selection Challenge: Details of Methods
Detailed paper: Variable Selection using Correlation and SVC Methods: Applications

One thought on “Variable Selection”

  1. Dear Sir or Madam,

    - I’m Dinh, Nguyen. I’m a Computer Science student at a university in Viet Nam. I am currently investigating “Feature Extraction” and applying it to “handwritten Vietnamese character recognition”.

    - I am also thinking about a new approach to “Feature Extraction”, and writing about “Related Works”. Therefore, could you send me the book “Feature Extraction: Foundations and Applications”?

    - Once my source code is complete and the final test is done, I may need to discuss this with you again.

    I hope you reply to me as soon as possible

    I look forward to your email

    Sincerely, Thanks

    Dinh, Nguyen
