Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data
Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of several metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). With the exception of GA, this is the first time these algorithms have been applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers.
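The wrapper idea mentioned in this abstract (a search over feature subsets, scored by a classifier) can be sketched as follows. This is a minimal illustration, not the authors' method: it uses a plain greedy forward search in place of BF, GA, SS or VNS, and a nearest-centroid classifier scored on the training data in place of Naive Bayes or SVM. The names `forward_select`, `centroid_accuracy`, `TOY_X` and `TOY_Y` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def centroid_accuracy(X: Sequence[Sequence[float]], y: Sequence[int],
                      features: List[int]) -> float:
    """Training accuracy of a nearest-centroid classifier on the chosen features.

    Assumption: a stand-in scorer for illustration only; a real wrapper would
    use cross-validated accuracy of the classifier of interest.
    """
    if not features:
        return 0.0
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]

    def sq_dist(x: Sequence[float], c: int) -> float:
        return sum((x[f] - m) ** 2 for f, m in zip(features, centroids[c]))

    correct = 0
    for x, label in zip(X, y):
        predicted = min(classes, key=lambda c: sq_dist(x, c))
        correct += int(predicted == label)
    return correct / len(X)


def forward_select(n_features: int,
                   score: Callable[[List[int]], float],
                   max_features: int) -> Tuple[List[int], float]:
    """Greedy forward search: add the feature that most improves the score."""
    selected: List[int] = []
    best = score(selected)
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        top_score, top_feature = max((score(selected + [f]), f) for f in candidates)
        if top_score <= best:  # stop when no candidate strictly improves the score
            break
        selected.append(top_feature)
        best = top_score
    return selected, best


# Toy data (hypothetical): only column 1 separates the two classes.
TOY_X = [[0.5, 0.0, 0.3], [0.4, 0.1, 0.9], [0.6, 0.2, 0.1],
         [0.5, 1.0, 0.8], [0.4, 0.9, 0.2], [0.6, 1.1, 0.7]]
TOY_Y = [0, 0, 0, 1, 1, 1]

selected, best = forward_select(
    3, lambda feats: centroid_accuracy(TOY_X, TOY_Y, feats), max_features=3)
print(selected, best)
```

On this toy data the search keeps only the informative column and then stops, because adding either remaining column does not raise the score.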
Visual Integration of Data and Model Space in Ensemble Learning
Ensembles of classifier models typically deliver superior performance and can
outperform single classifier models given a dataset and classification task at
hand. However, the gain in performance comes with a loss of
comprehensibility, making it hard to understand how each model affects the
classification outputs and where the errors come from. We propose a tight
visual integration of the data and the model space for exploring and combining
classifier models. We introduce a workflow that builds upon the visual
integration and enables the effective exploration of classification outputs and
models. We then present a use case in which we start with an ensemble
automatically selected by a standard ensemble selection algorithm, and show how
we can manipulate models and alternative combinations.
Comment: 8 pages, 7 picture
Exploring the SDSS Dataset with Linked Scatter Plots: I. EMP, CEMP, and CV Stars
We present the results of a search for extremely metal-poor (EMP),
carbon-enhanced metal-poor (CEMP), and cataclysmic variable (CV) stars using a
new exploration tool based on linked scatter plots (LSPs). Our approach is
especially designed to work with very large spectrum data sets such as the
SDSS, LAMOST, RAVE, and Gaia data sets, and it can be applied to stellar,
galaxy, and quasar spectra. As a demonstration, we conduct our search using the
SDSS DR10 data set. We first created a 3326-dimensional phase space containing
nearly 2 billion measures of the strengths of over 1600 spectral features in
569,738 SDSS stars. These measures capture essentially all the stellar atomic
and molecular species visible at the resolution of SDSS spectra. We show how
LSPs can be used to quickly isolate and examine interesting portions of this
phase space. To illustrate, we use LSPs coupled with cuts in selected portions
of phase space to extract EMP stars, CEMP stars, and CV stars. We present
identifications for 59 previously unrecognized candidate EMP stars and 11
previously unrecognized candidate CEMP stars. We also call attention to 2
candidate He II emission CV stars found by the LSP approach that have not yet
been discussed in the literature.
Comment: Accepted by the Astrophysical Journal Supplement (February 2017
Effective Discriminative Feature Selection with Non-trivial Solutions
Feature selection and feature transformation, the two main ways to reduce
dimensionality, are often presented separately. In this paper, a feature
selection method is proposed by combining the popular transformation based
dimensionality reduction method Linear Discriminant Analysis (LDA) and sparsity
regularization. We impose row sparsity on the transformation matrix of LDA
through $\ell_{2,1}$-norm regularization to achieve feature selection, and
the resultant formulation optimizes for selecting the most discriminative
features and removing the redundant ones simultaneously. The formulation is
extended to the $\ell_{2,p}$-norm regularized case, which is more likely to
offer better sparsity when $0 < p < 1$. Thus the formulation is a better
approximation to the feature selection problem. An efficient algorithm is
developed to solve the $\ell_{2,p}$-norm based optimization problem and it is
proved that the algorithm converges when $0 < p \le 2$. Systematic experiments
are conducted to understand the behavior of the proposed method. Promising
experimental results on various types of real-world data sets demonstrate the
effectiveness of our algorithm.
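To make the row-sparsity idea in this abstract concrete, the sketch below computes a row-wise mixed norm of a transformation matrix and lists the rows (features) that survive. It assumes the regularizer belongs to the $\ell_{2,p}$ family, defined here as $\left(\sum_i \|w_i\|_2^p\right)^{1/p}$ over the rows $w_i$; some papers regularize with the $p$-th power of this quantity instead. The helper names `l2p_norm` and `selected_rows` are hypothetical.

```python
import math
from typing import List, Sequence


def l2p_norm(W: Sequence[Sequence[float]], p: float = 1.0) -> float:
    """Row-wise mixed norm: (sum_i ||w_i||_2 ** p) ** (1 / p).

    p = 1 gives the l2,1 norm (sum of row lengths);
    p = 2 gives the Frobenius norm.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    row_norms = [math.sqrt(sum(v * v for v in row)) for row in W]
    return sum(r ** p for r in row_norms) ** (1.0 / p)


def selected_rows(W: Sequence[Sequence[float]], tol: float = 1e-8) -> List[int]:
    """Indices of rows whose l2 length exceeds tol, i.e. the features kept."""
    return [i for i, row in enumerate(W)
            if math.sqrt(sum(v * v for v in row)) > tol]


# Toy transformation matrix: 3 features (rows) mapped to 2 dimensions (columns).
W = [[3.0, 4.0],
     [0.0, 0.0],   # an all-zero row means this feature is dropped
     [0.0, 5.0]]

print(l2p_norm(W, p=1.0))   # sum of the row lengths
print(selected_rows(W))     # rows with non-zero length
```

Penalizing this norm pushes whole rows toward zero rather than individual entries, which is why a zeroed row can be read as a discarded feature.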
Selection Effects, Biases, and Constraints in the Calan/Tololo Supernova Survey
We use Monte Carlo simulations of the Calan/Tololo photographic supernova
survey to show that a simple model of the survey's selection effects accounts
for the observed distributions of recession velocity, apparent magnitude,
angular offset, and projected radial distance between the supernova and the
host galaxy nucleus for this sample of Type Ia supernovae (SNe Ia). The model
includes biases due to the flux-limited nature of the survey, the different
light curve morphologies displayed by different SNe Ia, and the difficulty of
finding events projected near the central regions of the host galaxies. From
these simulations we estimate the bias in the zero-point and slope of the
absolute magnitude-decline rate relation used in SNe Ia distance measurements.
For an assumed intrinsic scatter of 0.15 mag about this relation, these
selection effects decrease the zero-point by 0.04 mag. The slope of the
relation is not significantly biased. We conclude that despite selection
effects in the survey, the shape and zero-point of the relation determined from
the Calan/Tololo sample are quite reliable. We estimate the degree of
incompleteness of the survey as a function of decline rate and estimate a
corrected luminosity function for SNe Ia in which the frequency of SNe appears
to increase with decline rate (the fainter SNe are more common). Finally, we
compute the integrated detection efficiency of the survey in order to infer the
rate of SNe Ia from the 31 events found. For a value of $H_0 = 65$ km/s/Mpc we
obtain a SN Ia rate of $0.21^{+0.30}_{-0.13}$ SNu. This is in good agreement with
the value $0.16 \pm 0.05$ SNu recently determined by Capellaro et al. (1997).
Comment: 36 pages, 19 figures as extra files, to appear in the A
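The flux-limit component of such a selection model can be illustrated with a small Monte Carlo experiment. This is only a sketch under simplifying assumptions, not the survey model from the abstract: sources are spread uniformly in Euclidean volume, absolute magnitudes are Gaussian, and detection is a sharp cut in apparent magnitude. Light curve shape and proximity to the host nucleus are ignored. The function name `flux_limit_bias` and all parameter values except the 0.15 mag scatter are hypothetical.

```python
import math
import random
from typing import Tuple


def flux_limit_bias(n: int = 200_000, mean_abs_mag: float = -19.0,
                    sigma: float = 0.15, mag_limit: float = 17.0,
                    d_max_mpc: float = 400.0, seed: int = 1) -> Tuple[float, int]:
    """Return (mean detected absolute magnitude minus true mean, number detected)."""
    rng = random.Random(seed)
    detected = []
    for _ in range(n):
        abs_mag = rng.gauss(mean_abs_mag, sigma)
        # Uniform in volume: the cumulative distribution of distance goes as d**3.
        # (1 - random()) lies in (0, 1], which keeps log10 well defined.
        distance = d_max_mpc * (1.0 - rng.random()) ** (1.0 / 3.0)
        # Distance modulus with distance in Mpc.
        app_mag = abs_mag + 5.0 * math.log10(distance) + 25.0
        if app_mag < mag_limit:  # brighter than the survey limit
            detected.append(abs_mag)
    bias = sum(detected) / len(detected) - mean_abs_mag
    return bias, len(detected)


bias, n_detected = flux_limit_bias()
print(f"detected {n_detected} events, mean magnitude shift {bias:+.3f} mag")
```

The shift comes out negative: intrinsically brighter events can be seen over a larger volume, so a flux-limited sample is brighter on average than the parent population.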
Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data
In recent years, the use of motion tracking systems for the acquisition of functional biomechanical gait data has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, which makes the dimensionality of the collected data far higher than the number of available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small sample size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for the selection of raw kinematic variables. Finally, we propose a novel integrated method which fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons are made using robust resampling techniques such as bootstrap and k-fold cross-validation. Results from experiments with lesion subjects suffering from pathological plantar hyperkeratosis show that the proposed method can lead to correct classification rates with less than 10% of the original features.
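The two resampling schemes named in this abstract, bootstrap and k-fold cross-validation, can be sketched as index generators. This is a generic illustration that assumes nothing about the paper's actual protocol; `k_fold_indices` and `bootstrap_indices` are hypothetical helper names.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def bootstrap_indices(n_samples: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Draw n_samples indices with replacement; the rest form the out-of-bag set."""
    rng = random.Random(seed)
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(train)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return train, out_of_bag


for train, test in k_fold_indices(10, k=5):
    print("train", sorted(train), "test", sorted(test))

boot_train, boot_oob = bootstrap_indices(10)
print("bootstrap", boot_train, "out-of-bag", boot_oob)
```

With few subjects, any feature selection or parameter tuning should be repeated inside each training split, so that the held-out samples never influence the choice of features.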
Musical instrument classification using non-negative matrix factorization algorithms and subset feature selection
In this paper, a class of algorithms for automatic classification of individual musical instrument sounds is presented. Several perceptual features used in sound classification applications, as well as MPEG-7 descriptors, were measured for 300 sound recordings consisting of 6 different musical instrument classes. Subsets of the feature set are selected using branch-and-bound search, obtaining the most suitable features for classification. A class of classifiers is developed based on non-negative matrix factorization (NMF). The standard NMF method is examined as well as its modifications: the local, the sparse, and the discriminant NMF. The experimental results compare feature subsets of varying sizes alongside the various NMF algorithms. It has been found that a subset containing the mean and the variance of the first mel-frequency cepstral coefficient and the AudioSpectrumFlatness descriptor, along with the means of the AudioSpectrumEnvelope and the AudioSpectrumSpread descriptors, yields an accuracy exceeding 95% when fed to a standard NMF classifier.
Incremental Training of a Detector Using Online Sparse Eigen-decomposition
The ability to efficiently and accurately detect objects plays a crucial
role in many computer vision tasks. Recently, offline object detectors have
shown tremendous success. However, one major drawback of offline techniques
is that a complete set of training data has to be collected beforehand. In
addition, once learned, an offline detector cannot make use of newly arriving
data. To alleviate these drawbacks, online learning has been adopted with the
following objectives: (1) the technique should be computationally and storage
efficient; (2) the updated classifier must maintain its high classification
accuracy. In this paper, we propose an effective and efficient framework for
learning an adaptive online greedy sparse linear discriminant analysis (GSLDA)
model. Unlike many existing online boosting detectors, which usually apply
exponential or logistic loss, our online algorithm makes use of LDA's learning
criterion that not only aims to maximize the class-separation criterion but
also incorporates the asymmetrical property of training data distributions. We
provide a better alternative for online boosting algorithms in the context of
training a visual object detector. We demonstrate the robustness and efficiency
of our methods on handwritten digit and face data sets. Our results confirm
that object detection tasks benefit significantly when trained in an online
manner.
Comment: 14 page
A niching memetic algorithm for simultaneous clustering and feature selection
Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data