Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data
Background: High-throughput proteomics techniques, such as mass spectrometry
(MS)-based approaches, produce very high-dimensional data-sets. In a clinical
setting one is often interested in how mass spectra differ between patients of
different classes, for example spectra from healthy patients vs. spectra from
patients having a particular disease. Machine learning algorithms are needed to
(a) identify these discriminating features and (b) classify unknown spectra
based on this feature set. Since the acquired data is usually noisy, the
algorithms should be robust against noise and outliers, while the identified
feature set should be as small as possible.
Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based
on the theory of compressed sensing that allows us to identify a minimal
discriminating set of features from mass spectrometry data-sets. We show (1)
how our method performs on artificial and real-world data-sets, (2) that its
performance is competitive with standard (and widely used) algorithms for
analyzing proteomics data, and (3) that it is robust against random and
systematic noise. We further demonstrate the applicability of our algorithm to
two previously published clinical data-sets.
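The sparse feature-selection idea this abstract describes can be illustrated with plain L1-regularized least squares solved by iterative soft-thresholding (ISTA). This is a minimal sketch of the underlying principle, not the authors' SPA algorithm; all sizes, parameters, and names are invented for illustration.

```python
import numpy as np

# Minimal sketch of sparse feature selection via an L1 penalty (ISTA),
# NOT the authors' SPA algorithm; all sizes and parameters are invented.
rng = np.random.default_rng(0)
n, p, k = 80, 500, 5                      # samples, m/z bins, true features
X = rng.normal(size=(n, p))
true_idx = rng.choice(p, k, replace=False)
w_true = np.zeros(p)
w_true[true_idx] = 3.0
y = np.sign(X @ w_true + 0.1 * rng.normal(size=n))   # two-class labels (+/-1)

def ista(X, y, lam=0.2, steps=500):
    """Iterative soft-thresholding for min (1/2n)||Xw - y||^2 + lam*||w||_1."""
    w = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2 / len(y)    # Lipschitz constant of gradient
    for _ in range(steps):
        w -= X.T @ (X @ w - y) / (len(y) * L)                  # gradient step
        w = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)  # soft-threshold
    return w

w = ista(X, y)
selected = np.flatnonzero(np.abs(w) > 1e-6)
print(len(selected), "of", p, "features kept")
```

The L1 penalty drives most weights to exactly zero, leaving a small discriminating feature set, which is the behaviour the abstract asks of a proteomics feature selector.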
Implementation strategies for hyperspectral unmixing using Bayesian source separation
Bayesian Positive Source Separation (BPSS) is a useful unsupervised approach
for hyperspectral data unmixing, where numerical non-negativity of spectra and
abundances has to be ensured, as in remote sensing. Moreover, it is sensible
to impose a sum-to-one (full additivity) constraint to the estimated source
abundances in each pixel. Even though non-negativity and full additivity are
two necessary properties to get physically interpretable results, the use of
BPSS algorithms has been so far limited by high computation time and large
memory requirements due to the Markov chain Monte Carlo calculations. An
implementation strategy which allows one to apply these algorithms on a full
hyperspectral image, as typical in Earth and Planetary Science, is introduced.
Effects of pixel selection, the impact of such sampling on the relevance of the
estimated component spectra and abundance maps, as well as on the computation
times, are discussed. For that purpose, two different datasets have been used: a
synthetic one and a real hyperspectral image from Mars.
Comment: 10 pages, 6 figures, submitted to IEEE Transactions on Geoscience and Remote Sensing in the special issue on Hyperspectral Image and Signal Processing (WHISPERS).
Artificial Immune Systems - Models, algorithms and applications
Copyright © 2010 Academic Research Publishing Agency. This article has been made available through the Brunel Open Access Publishing Fund.
Artificial Immune Systems (AIS) are computational paradigms that belong to the computational intelligence family and are inspired by the biological immune system. During the past decade, they have attracted considerable interest from researchers aiming to develop immune-based models and techniques to solve complex computational or engineering problems. This work presents a survey of existing AIS models and algorithms, with a focus on the last five years.
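As a concrete taste of one classic AIS model covered by such surveys, here is a minimal negative-selection sketch: random detectors that match "self" samples are censored, and the survivors flag anomalous ("non-self") points. The data, radii, and dimensions are invented purely for illustration.

```python
import numpy as np

# Toy negative-selection sketch (one classic AIS model); data, radii and
# dimensions are invented for illustration only.
rng = np.random.default_rng(4)
self_set = rng.uniform(0.0, 0.5, size=(200, 2))     # "self": the normal region

def train_detectors(n_candidates=500, radius=0.1):
    """Keep random detectors that do NOT match any self sample."""
    cand = rng.uniform(0.0, 1.0, size=(n_candidates, 2))
    d = np.linalg.norm(cand[:, None, :] - self_set[None, :, :], axis=2).min(axis=1)
    return cand[d > radius]

def is_anomalous(x, detectors, radius=0.1):
    """A point matched by any surviving detector is flagged as non-self."""
    return bool((np.linalg.norm(detectors - x, axis=1) < radius).any())

D = train_detectors()
print(is_anomalous(np.array([0.9, 0.9]), D))    # far from self: flagged
print(is_anomalous(np.array([0.25, 0.25]), D))  # inside self region: not flagged
```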
Feature selection when there are many influential features
Recent discussion of the success of feature selection methods has argued that
focusing on a relatively small number of features has been counterproductive.
Instead, it is suggested, the number of significant features can be in the
thousands or tens of thousands, rather than (as is commonly supposed at
present) approximately in the range from five to fifty. This change, in orders
of magnitude, in the number of influential features, necessitates alterations
to the way in which we choose features and to the manner in which the success
of feature selection is assessed. In this paper, we suggest a general approach
that is suited to cases where the number of relevant features is very large,
and we consider particular versions of the approach in detail. We propose ways
of measuring performance, and we study both theoretical and numerical
properties of the proposed methodology.
Comment: Published at http://dx.doi.org/10.3150/13-BEJ536 in the Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
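To make the regime concrete, a toy sketch (not the paper's exact procedure): marginal two-sample t-statistics with a universal threshold, in a setting where thousands of features genuinely carry signal. The sample sizes, shift, and threshold are illustrative assumptions.

```python
import numpy as np

# Toy illustration (not the paper's procedure): marginal t-statistics with a
# universal threshold in a regime where thousands of features carry signal.
rng = np.random.default_rng(2)
n, p, k = 100, 10000, 2000            # samples, features, influential features
labels = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[labels == 1, :k] += 1.0             # weak shift on the first k features

m1, m0 = X[labels == 1].mean(axis=0), X[labels == 0].mean(axis=0)
se = np.sqrt(X[labels == 1].var(axis=0, ddof=1) / (n // 2)
             + X[labels == 0].var(axis=0, ddof=1) / (n // 2))
t = (m1 - m0) / se

thresh = np.sqrt(2 * np.log(p))       # universal threshold for p null features
selected = np.flatnonzero(np.abs(t) > thresh)
print(len(selected), "features selected")
```

With this setup the selected set runs into the hundreds or thousands, which is exactly the regime the abstract argues for, rather than the conventional "five to fifty".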
Fast Conical Hull Algorithms for Near-separable Non-negative Matrix Factorization
The separability assumption (Donoho & Stodden, 2003; Arora et al., 2012)
turns non-negative matrix factorization (NMF) into a tractable problem.
Recently, a new class of provably correct NMF algorithms has emerged under
this assumption. In this paper, we reformulate the separable NMF problem as
that of finding the extreme rays of the conical hull of a finite set of
vectors. From this geometric perspective, we derive new separable NMF
algorithms that are highly scalable and empirically noise robust, and have
several other favorable properties in relation to existing methods. A parallel
implementation of our algorithm demonstrates high scalability on shared- and
distributed-memory machines.Comment: 15 pages, 6 figure
Discrete curvature approximations and segmentation of polyhedral surfaces
The segmentation of digitized data to divide a free-form surface into patches is one of the key steps in the reverse engineering of an object. To this end, discrete curvature approximations are introduced as the basis of a segmentation process that leads to a decomposition of digitized data into areas that support the construction of parametric surface patches. The proposed approach relies on a polyhedral representation of the object built from the digitized input data. It is then shown how noise reduction, edge-swapping techniques and adapted remeshing schemes can contribute to different preparation phases that provide a geometry highlighting characteristics useful for segmentation. The segmentation is performed with various approximations of discrete curvatures evaluated on the polyhedron produced during the preparation phases, and involves two phases: the identification of characteristic polygonal lines and the identification of polyhedral areas useful for a patch-construction process. Discrete curvature criteria are adapted to each phase, and the concept of invariant evaluation of curvatures is introduced to generate criteria that are constant over equivalent meshes. A description of the segmentation procedure is provided, together with examples of results for free-form object surfaces.
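One concrete discrete curvature approximation of the kind used in such segmentation pipelines is the angle deficit at a vertex (2π minus the sum of incident triangle angles), a standard discrete proxy for integrated Gaussian curvature. The mesh below is a toy octahedron, not the paper's data.

```python
import numpy as np

# One concrete discrete curvature approximation: the angle deficit
# 2*pi - sum(incident triangle angles) at a vertex, a standard discrete
# proxy for integrated Gaussian curvature. The mesh is a toy octahedron.
def angle_deficit(vertices, triangles, v):
    total = 0.0
    for tri in triangles:
        if v in tri:
            i = tri.index(v)
            a = vertices[tri[i]]
            b = vertices[tri[(i + 1) % 3]]
            c = vertices[tri[(i + 2) % 3]]
            u, w = b - a, c - a
            cosang = u @ w / (np.linalg.norm(u) * np.linalg.norm(w))
            total += np.arccos(np.clip(cosang, -1.0, 1.0))
    return 2 * np.pi - total

V = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
              [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
T = [(0, 2, 4), (2, 1, 4), (1, 3, 4), (3, 0, 4),
     (2, 0, 5), (1, 2, 5), (3, 1, 5), (0, 3, 5)]
print(angle_deficit(V, T, 4))   # → 2*pi/3 ≈ 2.094 at every octahedron vertex
```

Summing the deficit over all six vertices gives 4π, as the discrete Gauss-Bonnet theorem requires for a sphere-topology mesh, which is one way such approximations can be sanity-checked before driving a segmentation.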