85,339 research outputs found

Comparison of metaheuristic strategies for peakbin selection in proteomic mass spectrometry data

Author: Armañanzas Arnedillo Ruben
Bielza Lozoya Maria Concepcion
García Torres Miguel
Larrañaga Múgica Pedro
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

Mass spectrometry (MS) data provide a promising strategy for biomarker discovery. For this purpose, the detection of relevant peakbins in MS data is currently under intense research. Data from mass spectrometry are challenging to analyze because of their high dimensionality and the generally low number of samples available. To tackle this problem, the scientific community is becoming increasingly interested in applying feature subset selection techniques based on specialized machine learning algorithms. In this paper, we present a performance comparison of four metaheuristics: best first (BF), genetic algorithm (GA), scatter search (SS) and variable neighborhood search (VNS). With the exception of GA, this is the first time these algorithms have been applied to detect relevant peakbins in MS data. All these metaheuristic searches are embedded in two different filter and wrapper schemes coupled with Naive Bayes and SVM classifiers.
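To make the wrapper idea in this abstract concrete, here is a minimal sketch of subset search driven by a scoring function. It uses plain greedy forward selection, which is a simpler stand-in and not any of the four metaheuristics compared in the paper. The names `forward_select` and `toy_score` are hypothetical; in a real wrapper scheme the score would be a cross-validated classifier accuracy.

```python
from typing import Callable, FrozenSet, List


def forward_select(n_features: int,
                   score: Callable[[FrozenSet[int]], float],
                   max_size: int) -> List[int]:
    """Greedy forward selection (illustrative stand-in for a metaheuristic).

    At each step, add the single feature that most improves `score`;
    stop when no candidate improves it or `max_size` is reached.
    """
    selected: List[int] = []
    best = score(frozenset())
    while len(selected) < max_size:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Evaluate every one-feature extension of the current subset.
        scored = [(score(frozenset(selected + [f])), f) for f in candidates]
        top_score, top_feature = max(scored)
        if top_score <= best:
            break  # no extension improves the score
        selected.append(top_feature)
        best = top_score
    return selected


def toy_score(subset: FrozenSet[int]) -> float:
    """Hypothetical score: features 1 and 3 are informative, size is penalised."""
    return len(subset & {1, 3}) - 0.1 * len(subset)


if __name__ == "__main__":
    print(sorted(forward_select(5, toy_score, max_size=4)))
```

A filter scheme would differ only in the score: it would rank subsets by a statistic computed from the data, without training a classifier.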

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Visual Integration of Data and Model Space in Ensemble Learning

Author: Diehl Alexandra
Fuchs Johannes
Jäckle Dominik
Keim Daniel
Schneider Bruno
Stoffel Florian
Publication date: 01/01/2017

Ensembles of classifier models typically deliver superior performance and can outperform single classifier models given a dataset and classification task at hand. However, the gain in performance comes with a lack of comprehensibility, making it challenging to understand how each model affects the classification outputs and where the errors come from. We propose a tight visual integration of the data and the model space for exploring and combining classifier models. We introduce a workflow that builds upon the visual integration and enables the effective exploration of classification outputs and models. We then present a use case in which we start with an ensemble automatically selected by a standard ensemble selection algorithm, and show how we can manipulate models and alternative combinations. Comment: 8 pages, 7 pictures

arXiv.org e-Print Archive

Crossref

Exploring the SDSS Dataset with Linked Scatter Plots: I. EMP, CEMP, and CV Stars

Author: Carbon Duane F.
Henze Christopher
Nelson Bron C.
Publication venue: 'American Astronomical Society'
Publication date: 02/05/2017

We present the results of a search for extremely metal-poor (EMP), carbon-enhanced metal-poor (CEMP), and cataclysmic variable (CV) stars using a new exploration tool based on linked scatter plots (LSPs). Our approach is especially designed to work with very large spectrum data sets such as the SDSS, LAMOST, RAVE, and Gaia data sets, and it can be applied to stellar, galaxy, and quasar spectra. As a demonstration, we conduct our search using the SDSS DR10 data set. We first created a 3326-dimensional phase space containing nearly 2 billion measures of the strengths of over 1600 spectral features in 569,738 SDSS stars. These measures capture essentially all the stellar atomic and molecular species visible at the resolution of SDSS spectra. We show how LSPs can be used to quickly isolate and examine interesting portions of this phase space. To illustrate, we use LSPs coupled with cuts in selected portions of phase space to extract EMP stars, CEMP stars, and CV stars. We present identifications for 59 previously unrecognized candidate EMP stars and 11 previously unrecognized candidate CEMP stars. We also call attention to 2 candidate He II emission CV stars found by the LSP approach that have not yet been discussed in the literature. Comment: Accepted by the Astrophysical Journal Supplement (February 2017)

arXiv.org e-Print Archive

NASA Technical Reports Server

Effective Discriminative Feature Selection with Non-trivial Solutions

Author: Hou Chenping
Jiao Yuanyuan
Nie Feiping
Tao Hong
Yi Dongyun
Publication date: 21/04/2015

Feature selection and feature transformation, the two main ways to reduce dimensionality, are often presented separately. In this paper, a feature selection method is proposed by combining the popular transformation-based dimensionality reduction method Linear Discriminant Analysis (LDA) and sparsity regularization. We impose row sparsity on the transformation matrix of LDA through $\ell_{2,1}$-norm regularization to achieve feature selection, and the resultant formulation optimizes for selecting the most discriminative features and removing the redundant ones simultaneously. The formulation is extended to the $\ell_{2,p}$-norm regularized case, which is more likely to offer better sparsity when $0 < p < 1$. Thus the formulation is a better approximation to the feature selection problem. An efficient algorithm is developed to solve the $\ell_{2,p}$-norm based optimization problem, and it is proved that the algorithm converges when $0 < p \le 2$. Systematic experiments are conducted to understand how the proposed method works. Promising experimental results on various types of real-world data sets demonstrate the effectiveness of our algorithm.
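As a rough illustration of why row sparsity yields feature selection, the sketch below computes the $\ell_{2,p}$ value of a matrix under the usual definition, $\|W\|_{2,p} = \left(\sum_i \|w^i\|_2^p\right)^{1/p}$ with $w^i$ the $i$-th row, and ranks features by row norm. The function names are hypothetical, and this is not the optimization algorithm from the paper; it only shows the quantity being regularized, assuming each row of `W` corresponds to one input feature.

```python
import math
from typing import List, Sequence


def row_norms(W: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean (l2) norm of each row of W."""
    return [math.sqrt(sum(x * x for x in row)) for row in W]


def l2p_norm(W: Sequence[Sequence[float]], p: float = 1.0) -> float:
    """(sum_i ||w^i||_2^p)^(1/p); for 0 < p < 1 this is a quasi-norm."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(n ** p for n in row_norms(W)) ** (1.0 / p)


def rank_features(W: Sequence[Sequence[float]]) -> List[int]:
    """Feature indices ordered by decreasing row norm.

    A row driven to zero means that feature contributes nothing to the
    projection, so it ends up last in the ranking.
    """
    norms = row_norms(W)
    return sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)


if __name__ == "__main__":
    W = [[3.0, 4.0], [0.0, 0.0], [0.0, 1.0]]  # toy 3-feature, 2-dim projection
    print(l2p_norm(W, p=1.0), rank_features(W))
```

Smaller values of `p` penalise many small nonzero rows more heavily relative to a few large ones, which is the intuition behind the better sparsity claimed for $0 < p < 1$.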

arXiv.org e-Print Archive

CiteSeerX

Selection Effects, Biases, and Constraints in the Calan/Tololo Supernova Survey

Author: Capellaro E.
Mario Hamuy
Philip A. Pinto
Shaw R. L.
van den Berg S.
Publication venue: 'University of Chicago Press'
Publication date: 01/01/1998

We use Monte Carlo simulations of the Calan/Tololo photographic supernova survey to show that a simple model of the survey's selection effects accounts for the observed distributions of recession velocity, apparent magnitude, angular offset, and projected radial distance between the supernova and the host galaxy nucleus for this sample of Type Ia supernovae (SNe Ia). The model includes biases due to the flux-limited nature of the survey, the different light curve morphologies displayed by different SNe Ia, and the difficulty of finding events projected near the central regions of the host galaxies. From these simulations we estimate the bias in the zero-point and slope of the absolute magnitude-decline rate relation used in SNe Ia distance measurements. For an assumed intrinsic scatter of 0.15 mag about this relation, these selection effects decrease the zero-point by 0.04 mag. The slope of the relation is not significantly biased. We conclude that despite selection effects in the survey, the shape and zero-point of the relation determined from the Calan/Tololo sample are quite reliable. We estimate the degree of incompleteness of the survey as a function of decline rate and estimate a corrected luminosity function for SNe Ia in which the frequency of SNe appears to increase with decline rate (the fainter SNe are more common). Finally, we compute the integrated detection efficiency of the survey in order to infer the rate of SNe Ia from the 31 events found. For a value of H0 = 65 km/s/Mpc we obtain a SN Ia rate of 0.21 (+0.30, -0.13) SNu. This is in good agreement with the value 0.16 ± 0.05 SNu recently determined by Capellaro et al. (1997). Comment: 36 pages, 19 figures as extra files, to appear in the A

arXiv.org e-Print Archive

CiteSeerX

Crossref

Automated design of robust discriminant analysis classifier for foot pressure lesions using kinematic data

Author: Bowker P
Findlow AH
Goulermas JY
Howard D
Nester CJ
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

In recent years, the use of motion tracking systems for the acquisition of functional biomechanical gait data has received increasing interest due to the richness and accuracy of the measured kinematic information. However, costs frequently restrict the number of subjects employed, and this makes the dimensionality of the collected data far higher than the number of available samples. This paper applies discriminant analysis algorithms to the classification of patients with different types of foot lesions, in order to establish an association between foot motion and lesion formation. With primary attention to small sample size situations, we compare different types of Bayesian classifiers and evaluate their performance with various dimensionality reduction techniques for feature extraction, as well as search methods for selection of raw kinematic variables. Finally, we propose a novel integrated method which fine-tunes the classifier parameters and selects the most relevant kinematic variables simultaneously. Performance comparisons are made using robust resampling techniques such as the .632+ bootstrap and k-fold cross-validation. Results from experiments with lesion subjects suffering from pathological plantar hyperkeratosis show that the proposed method can lead to ~96% correct classification rates with less than 10% of the original features.
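The k-fold cross-validation mentioned in this abstract can be sketched as an index-splitting routine. This is a minimal illustration, not the authors' evaluation code: the function name `k_fold_indices` is hypothetical, folds are contiguous and unshuffled for simplicity, and the .632+ bootstrap is not covered here.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k (train, test) pairs.

    Each sample appears in exactly one test fold; fold sizes differ by
    at most one. No shuffling is done (an assumption of this sketch).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    splits: List[Tuple[List[int], List[int]]] = []
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, test))
        start += size
    return splits


if __name__ == "__main__":
    for train, test in k_fold_indices(10, 3):
        print(len(train), test)
```

With few subjects, as in the small-sample setting described above, each test fold is tiny, which is one reason resampling estimators are used alongside plain cross-validation.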

Keele Research Repository

University of Salford Institutional Repository

Crossref

Musical instrument classification using non-negative matrix factorization algorithms and subset feature selection

Author: Benetos E.
Kotropoulos C.
Kotti M.
Publication date: 01/01/2006

In this paper, a class of algorithms for automatic classification of individual musical instrument sounds is presented. Several perceptual features used in sound classification applications as well as MPEG-7 descriptors were measured for 300 sound recordings consisting of 6 different musical instrument classes. Subsets of the feature set are selected using branch-and-bound search, obtaining the most suitable features for classification. A class of classifiers is developed based on non-negative matrix factorization (NMF). The standard NMF method is examined as well as its modifications: the local, the sparse, and the discriminant NMF. The experimental results compare feature subsets of varying sizes alongside the various NMF algorithms. It has been found that a subset containing the mean and the variance of the first mel-frequency cepstral coefficient and the AudioSpectrumFlatness descriptor, along with the means of the AudioSpectrumEnvelope and the AudioSpectrumSpread descriptors, when fed to a standard NMF classifier, yields an accuracy exceeding 95%.

City Research Online

Crossref

Spiral - Imperial College Digital Repository

Incremental Training of a Detector Using Online Sparse Eigen-decomposition

Author: Paisitkriangkrai Sakrapee
Shen Chunhua
Zhang Jian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/05/2010

The ability to efficiently and accurately detect objects plays a crucial role in many computer vision tasks. Recently, offline object detectors have shown tremendous success. However, one major drawback of offline techniques is that a complete set of training data has to be collected beforehand. In addition, once learned, an offline detector cannot make use of newly arriving data. To alleviate these drawbacks, online learning has been adopted with the following objectives: (1) the technique should be computationally and storage efficient; (2) the updated classifier must maintain its high classification accuracy. In this paper, we propose an effective and efficient framework for learning an adaptive online greedy sparse linear discriminant analysis (GSLDA) model. Unlike many existing online boosting detectors, which usually apply exponential or logistic loss, our online algorithm makes use of LDA's learning criterion that not only aims to maximize the class-separation criterion but also incorporates the asymmetrical property of training data distributions. We provide a better alternative for online boosting algorithms in the context of training a visual object detector. We demonstrate the robustness and efficiency of our methods on handwritten digit and face data sets. Our results confirm that object detection tasks benefit significantly when trained in an online manner. Comment: 14 pages

arXiv.org e-Print Archive

Crossref

Adelaide Research & Scholarship

OPUS - University of Technology Sydney

The Australian National University

A niching memetic algorithm for simultaneous clustering and feature selection

Author: Fairhurst M
Liu X
Sheng W
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2008

Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data

Brunel University Research Archive