_______________________________________________________________________________________________
 


L1-SVM based feature selection for noisy biological data

By

Dr Michael Wagner,
Bioinformatics group at Cincinnati Children's Hospital
and the University of Cincinnati, Cincinnati, USA

_______________________________________________________________________________________________

Abstract:
Biological data, e.g., from molecular profiling experiments with microarrays, is inherently noisy; both the measurement techniques and natural biological variability are major contributing factors.
Additionally, the number of features (e.g., gene or protein expression measurements) is very often several orders of magnitude larger than the number of samples. Together with the need for biologically interpretable and statistically significant classification models, this calls
for noise-insensitive, robust feature selection methods for such large-scale data sets.

In this talk we will discuss an L1-SVM-based feature selection algorithm that can handle correlated complex features. Statistics on the behaviour of the SVM weights under Gaussian perturbations are used to decide whether or not to retain each feature dimension. The underlying computational problem is a structured linear program, which can be solved very efficiently, and in parallel if necessary.
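The abstract does not spell out the exact formulation, so the following is only a minimal sketch under stated assumptions, not the speaker's algorithm. It uses the standard 1-norm (L1) SVM linear program, min ||w||_1 + C * sum(xi_i) subject to y_i (w . x_i + b) >= 1 - xi_i, solved with SciPy's `linprog`. It then applies a hypothetical stability rule: refit on copies of X perturbed by i.i.d. Gaussian noise of scale `sigma`, and keep feature j when |mean(w_j)| exceeds `z_min` times the standard deviation of w_j. The function names and the parameters `sigma`, `n_runs`, `C`, and `z_min` are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog


def l1_svm_lp(X, y, C=1.0):
    """Fit a linear 1-norm SVM as a linear program (illustrative sketch).

    Solves  min ||w||_1 + C * sum(xi)
            s.t. y_i (w . x_i + b) >= 1 - xi_i,  xi >= 0,
    splitting w = u - v with u, v >= 0 so the objective is linear.
    """
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Variable layout: [u (d), v (d), b (1), xi (n)]
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    yX = y[:, None] * X
    # Margin constraints as A_ub @ z <= b_ub:
    #   -y_i x_i.u + y_i x_i.v - y_i b - xi_i <= -1
    A_ub = np.hstack([-yX, yX, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    w = res.x[:d] - res.x[d:2 * d]
    w[np.abs(w) < 1e-9] = 0.0  # clean up solver round-off
    return w, res.x[2 * d]


def perturbation_feature_stats(X, y, sigma=0.1, n_runs=30, C=1.0,
                               z_min=3.0, seed=0):
    """Refit on Gaussian-perturbed copies of X and keep features whose
    weight is consistently far from zero (hypothetical criterion)."""
    rng = np.random.default_rng(seed)
    W = np.empty((n_runs, X.shape[1]))
    for r in range(n_runs):
        X_pert = X + sigma * rng.standard_normal(X.shape)
        W[r], _ = l1_svm_lp(X_pert, y, C)
    mean_w = W.mean(axis=0)
    std_w = W.std(axis=0)
    score = np.abs(mean_w) / (std_w + 1e-8)  # eps avoids 0/0
    return mean_w, std_w, np.flatnonzero(score > z_min)


# Synthetic example: only feature 0 carries class information.
rng = np.random.default_rng(1)
n, d = 40, 10
y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
X = 0.3 * rng.standard_normal((n, d))
X[:, 0] = 2.0 * y + 0.1 * rng.standard_normal(n)
mean_w, std_w, selected = perturbation_feature_stats(X, y)
print("selected features:", selected)
```

The L1 penalty makes the LP solution sparse at a vertex, so uninformative features tend to receive exactly zero weight; the perturbation loop then separates weights that persist under noise from ones that appear only sporadically.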

We will present benchmarks on various standard test problems as well as results on protein profiling data obtained from both lung and prostate cancer samples.

Speaker
Michael Wagner has been an Assistant Professor in the bioinformatics group at Cincinnati Children's Hospital and the University of Cincinnati since 2002. After obtaining a Ph.D. in Operations Research (mathematical optimization) from Cornell University in 2000, he spent two years on the faculty of the Department of Mathematics and Statistics at Old Dominion
University in Norfolk, VA. His main interests lie in applying optimization techniques to problems in biomedical informatics and in solving the resulting optimization problems efficiently.

 
_______________________________________________________________________________________________