Feature selection and nearest centroid classification for protein mass spectrometry
BACKGROUND: The use of mass spectrometry as a proteomics tool is poised to revolutionize early disease diagnosis and biomarker identification. Unfortunately, before standard supervised classification algorithms can be employed, the "curse of dimensionality" needs to be solved. Due to the sheer amount of information contained within the mass spectra, most standard machine learning techniques cannot be directly applied. Instead, feature selection techniques are used to first reduce the dimensionality of the input space and thus enable the subsequent use of classification algorithms. This paper examines feature selection techniques for proteomic mass spectrometry.
RESULTS: This study examines the performance of the nearest centroid classifier coupled with the following feature selection algorithms. The Student t-test, the Kolmogorov-Smirnov test, and the P-test are univariate statistics used for filter-based feature ranking. From the wrapper approaches, we tested sequential forward selection and a modified version of sequential backward selection. Embedded approaches included shrunken nearest centroid and a novel version of boosting-based feature selection that we developed. In addition, we tested several dimensionality reduction approaches, namely principal component analysis and principal component analysis coupled with linear discriminant analysis. To fairly assess each algorithm, evaluation was done using stratified cross-validation with an internal leave-one-out cross-validation loop for automated feature selection. Comprehensive experiments, conducted on five popular cancer data sets, revealed that the less advocated sequential forward selection and boosted feature selection algorithms produce the most consistent results across all data sets. In contrast, the state-of-the-art performance reported on isolated data sets for several of the studied algorithms does not hold across all data sets.
CONCLUSION: This study tested a number of popular feature selection methods using the nearest centroid classifier and found that several reportedly state-of-the-art algorithms in fact perform rather poorly when tested via stratified cross-validation. The revealed inconsistencies provide clear evidence that algorithm evaluation should be performed on several data sets using a consistent (i.e., non-randomized, stratified) cross-validation procedure in order for the conclusions to be statistically sound.
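The evaluation idea in the abstract above (filter-based feature ranking feeding a nearest centroid classifier, scored with non-randomized stratified cross-validation) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a two-class problem with labels 0 and 1, uses the Welch form of the t-statistic, and fixes the number of selected features instead of tuning it with an internal leave-one-out loop. All function names and the toy data are hypothetical.

```python
import math
from statistics import mean, variance

def t_scores(X, y):
    """Absolute Welch t-statistic per feature (assumes labels 0 and 1)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        diff = abs(mean(a) - mean(b))
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        if se > 0:
            scores.append(diff / se)
        else:
            # Zero variance in both classes: perfectly separating or useless.
            scores.append(math.inf if diff > 0 else 0.0)
    return scores

def select_top_k(X, y, k):
    """Indices of the k features with the largest t-scores."""
    scores = t_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

def fit_centroids(X, y, features):
    """Per-class mean vector over the selected features."""
    centroids = {}
    for lab in sorted(set(y)):
        rows = [row for row, l in zip(X, y) if l == lab]
        centroids[lab] = [mean(r[j] for r in rows) for j in features]
    return centroids

def predict(centroids, features, row):
    """Label of the centroid closest to the row in squared Euclidean distance."""
    x = [row[j] for j in features]
    def sq_dist(c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))

def stratified_folds(y, k):
    """Deterministic stratified split: deal each class's samples round-robin."""
    folds = [[] for _ in range(k)]
    for lab in sorted(set(y)):
        idx = [i for i, l in enumerate(y) if l == lab]
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def cv_accuracy(X, y, k, n_features):
    """Cross-validated accuracy; features are re-selected on each training
    split so the held-out samples never influence the ranking."""
    correct = 0
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        feats = select_top_k(X_tr, y_tr, n_features)
        cents = fit_centroids(X_tr, y_tr, feats)
        correct += sum(predict(cents, feats, X[i]) == y[i] for i in test_idx)
    return correct / len(y)

# Toy data (hypothetical): feature 0 separates the classes, the rest are noise.
X = [[1.0, 5.0, 0.3], [1.2, 4.0, 0.9], [0.9, 6.0, 0.5],
     [3.0, 5.5, 0.4], [3.1, 4.5, 0.8], [2.9, 5.0, 0.6]]
y = [0, 0, 0, 1, 1, 1]
accuracy = cv_accuracy(X, y, k=3, n_features=1)
```

Selecting features inside each training split, not once on the full data, is what keeps the accuracy estimate from being optimistically biased. The fold assignment is deterministic, in line with the abstract's call for a non-randomized, stratified procedure.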
Shape Detection, Analysis and Recognition
Technical report TR02-18. This paper surveys techniques for shape recognition and analysis, with an emphasis on robustness. Specifically, it reviews boundary shape analysis methods and then describes, with experimental results, a new technique based on Markov Shape Theory that may alleviate the problems classical methods for shape analysis and boundary extraction experience in the presence of various kinds of noise. | TRID-ID TR02-1
Proteomic Pattern Recognition
Technical report TR04-10. This report gives an overview of the Mass Spectrometry Data Classification and Feature Extraction problem. After reviewing previous research, it presents new classification and feature extraction techniques and evaluates them empirically on three data sets. One of the key points made in this work is that feature extraction techniques are composed of dimensionality reduction and feature selection methods, and that the two notions are quite different. The need for dimensionality reduction stems from the fact that classification algorithms cannot cope with a large number of input variables. Feature selection techniques, on the other hand, attempt to remove irrelevant and/or redundant features. Often, classification algorithms cannot handle both a large number of variables and irrelevant variables that are unnecessary or, even worse, misleading. To keep the approach tractable, we evaluate the dimensionality reduction and feature selection techniques using a simple classifier. The experiments indicate that feature selection algorithms tend to both reduce data dimensionality and increase classification accuracy, while the studied dimensionality reduction technique sacrifices performance as a result of lowering the number of features a learning algorithm needs to deal with. | TRID-ID TR04-1
Heterogeneous Stacking for Classification-Driven Watershed Segmentation
Marker-driven watershed segmentation attempts to extract seeds that indicate the presence of objects within an image. These markers are subsequently used to enforce regional minima within a topological surface used by the watershed algorithm. The classification-driven watershed segmentation (CDWS) algorithm improved the production of markers and the topological surface by employing two machine-learned pixel classifiers. The probability maps produced by the two classifiers were used to create markers, object boundaries, and the topological surface. This paper extends the CDWS algorithm by (i) enabling automated feature extraction via independent components analysis and (ii) improving segmentation accuracy by introducing heterogeneous stacking. Heterogeneous stacking, an extension of stacked generalization for object delineation, improves pixel labeling and segmentation by training base classifiers on multiple target concepts extracted from the original ground truth; their outputs are subsequently fused by a second set of classifiers. Experimental results demonstrate the effectiveness of the proposed system on real-world images and indicate a significant improvement in segmentation quality over the base system.
SHAPE DETECTION, ANALYSIS AND RECOGNITION
ABSTRACT. This paper surveys techniques for shape recognition and analysis, with an emphasis on robustness. Specifically, it reviews boundary shape analysis methods and then describes, with experimental results, a new technique based on Markov Shape Theory that may alleviate the problems classical methods for shape analysis and boundary extraction experience in the presence of various kinds of noise.
Feature selection and nearest centroid classification for protein mass spectrometry-0
Copyright information: Taken from "Feature selection and nearest centroid classification for protein mass spectrometry". BMC Bioinformatics 2005;6():68-68. Published online 23 Mar 2005. PMCID:PMC1274262. Copyright © 2005 Levner; licensee BioMed Central Ltd.
s grouped by feature extraction algorithm
