Sparse regression algorithm for activity estimation in spectrometry
We consider the counting rate estimation of an unknown radioactive source,
which emits photons at times modeled by a homogeneous Poisson process. A
spectrometer converts the energy of incoming photons into electrical pulses,
whose number provides a rough estimate of the intensity of the Poisson process.
When the activity of the source is high, a physical phenomenon known as pileup
effect distorts direct measurements, resulting in a significant bias to the
standard estimators of the source activities used so far in the field. We show
in this paper that the problem of counting rate estimation can be interpreted
as a sparse regression problem. We suggest a post-processed, non-negative,
version of the Least Absolute Shrinkage and Selection Operator (LASSO) to
estimate the photon arrival times. The main difficulty in this problem is that
no theoretical conditions can guarantee consistency in sparsity of LASSO,
because the dictionary is not ideal and the signal is sampled. We therefore
derive theoretical conditions and bounds which illustrate that the proposed
method can nonetheless provide a good, close to the best attainable, estimate
of the counting rate. The performance of the proposed approach is studied on
simulations and real datasets.
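To make the sparse-regression view concrete, here is a minimal sketch of a non-negative LASSO solved by projected proximal gradient (ISTA). It is not the authors' algorithm: the exponential pulse shape, the dictionary of shifted pulses, the threshold, and the helper names `shifted_pulse_dictionary`, `nonneg_lasso`, and `count_arrivals` are all assumptions made for illustration, and the post-processing step (merging adjacent active coefficients into one arrival) is only one plausible choice.

```python
import numpy as np

def shifted_pulse_dictionary(n_samples, pulse):
    """Column k holds the (assumed) pulse shape starting at sample k."""
    A = np.zeros((n_samples, n_samples))
    for k in range(n_samples):
        m = min(len(pulse), n_samples - k)
        A[k:k + m, k] = pulse[:m]
    return A

def nonneg_lasso(A, y, lam, n_iter=2000):
    """Minimise 0.5*||y - A x||^2 + lam*sum(x) subject to x >= 0."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)
        # Gradient step, then shrink by lam and project onto x >= 0.
        x = np.maximum(x - step * (grad + lam), 0.0)
    return x

def count_arrivals(x, threshold):
    """Hypothetical post-processing: count runs of coefficients above threshold."""
    active = x > threshold
    previous = np.concatenate(([False], active[:-1]))
    return int(np.sum(active & ~previous))

# Noise-free toy signal: three photons, two of them piled up (samples 40 and 43).
n = 80
pulse = np.exp(-np.arange(20) / 5.0)
A = shifted_pulse_dictionary(n, pulse)
x_true = np.zeros(n)
x_true[[10, 40, 43]] = [1.0, 0.8, 1.2]
y = A @ x_true

x_hat = nonneg_lasso(A, y, lam=1e-3, n_iter=2000)
n_arrivals = count_arrivals(x_hat, threshold=0.3)
```

Dividing `n_arrivals` by the observation time would then give a counting-rate estimate. On real, noisy data the regularisation weight and the threshold would need tuning.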
Hyperspectral Unmixing Overview: Geometrical, Statistical, and Sparse Regression-Based Approaches
Imaging spectrometers measure electromagnetic energy scattered in their
instantaneous field of view in hundreds or thousands of spectral channels with
higher spectral resolution than multispectral cameras. Imaging spectrometers
are therefore often referred to as hyperspectral cameras (HSCs). Higher
spectral resolution enables material identification via spectroscopic analysis,
which facilitates countless applications that require identifying materials in
scenarios unsuitable for classical spectroscopic analysis. Due to low spatial
resolution of HSCs, microscopic material mixing, and multiple scattering,
spectra measured by HSCs are mixtures of spectra of materials in a scene. Thus,
accurate estimation requires unmixing. Pixels are assumed to be mixtures of a
few materials, called endmembers. Unmixing involves estimating all or some of:
the number of endmembers, their spectral signatures, and their abundances at
each pixel. Unmixing is a challenging, ill-posed inverse problem because of
model inaccuracies, observation noise, environmental conditions, endmember
variability, and data set size. Researchers have devised and investigated many
models searching for robust, stable, tractable, and accurate unmixing
algorithms. This paper presents an overview of unmixing methods from the time
of Keshava and Mustard's unmixing tutorial [1] to the present. Mixing models
are first discussed. Signal-subspace, geometrical, statistical, sparsity-based,
and spatial-contextual unmixing algorithms are described. Mathematical problems
and potential solutions are described. Algorithm characteristics are
illustrated experimentally.
Comment: This work has been accepted for publication in IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing.
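As a small illustration of the abundance-estimation step described above, the following sketch assumes the linear mixing model (a pixel spectrum is a non-negative, sum-to-one combination of endmember spectra) and endmembers that are already known. It solves the constrained least-squares problem by projected gradient descent. The Gaussian-shaped endmember spectra and the function names `project_to_simplex` and `unmix_pixel` are hypothetical. The overview covers many other approaches, and this is only one simple instance.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def unmix_pixel(E, y, n_iter=500):
    """Estimate abundances a minimising ||y - E a||^2 on the simplex."""
    step = 1.0 / np.linalg.norm(E, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.full(E.shape[1], 1.0 / E.shape[1])  # start from uniform abundances
    for _ in range(n_iter):
        grad = E.T @ (E @ a - y)
        a = project_to_simplex(a - step * grad)
    return a

# Toy endmember matrix: 50 bands x 4 materials, Gaussian-shaped spectra (assumed).
bands = np.linspace(0.0, 1.0, 50)
centers = np.array([0.2, 0.4, 0.6, 0.8])
E = np.exp(-((bands[:, None] - centers[None, :]) ** 2) / (2 * 0.08 ** 2))

a_true = np.array([0.6, 0.3, 0.1, 0.0])
y = E @ a_true            # noise-free mixed pixel
a_hat = unmix_pixel(E, y)
```

The simplex projection enforces both physical constraints at once, which is why no separate normalisation step is needed. Estimating the number of endmembers or their signatures, as the abstract notes, is a separate and harder problem that this sketch does not address.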
Sparse Proteomics Analysis - A compressed sensing-based approach for feature selection and classification of high-dimensional proteomics mass spectrometry data
Background: High-throughput proteomics techniques, such as mass spectrometry
(MS)-based approaches, produce very high-dimensional data-sets. In a clinical
setting one is often interested in how mass spectra differ between patients of
different classes, for example spectra from healthy patients vs. spectra from
patients having a particular disease. Machine learning algorithms are needed to
(a) identify these discriminating features and (b) classify unknown spectra
based on this feature set. Since the acquired data is usually noisy, the
algorithms should be robust against noise and outliers, while the identified
feature set should be as small as possible.
Results: We present a new algorithm, Sparse Proteomics Analysis (SPA), based
on the theory of compressed sensing that allows us to identify a minimal
discriminating set of features from mass spectrometry data-sets. We show (1)
how our method performs on artificial and real-world data-sets, (2) that its
performance is competitive with standard (and widely used) algorithms for
analyzing proteomics data, and (3) that it is robust against random and
systematic noise. We further demonstrate the applicability of our algorithm to
two previously published clinical data-sets.
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
Investigation of Spatial and Temporal Aspects of Airborne Gamma Spectrometry: Final Report
A study has been conducted which demonstrates the reproducibility of Airborne Gamma-ray Spectrometry (AGS) and the effects of changes in survey parameters, particularly line spacing. This has involved analysis of new data collected from estuarine salt marsh and upland areas in West Cumbria and SW Scotland during three phases of field work, in which over 150000 spectra were recorded with a 16 litre NaI(Tl) detector. The shapes and inventories of radiometric features have been examined. It has been shown that features with dimensions that are large relative to the survey line spacing are very well reproduced with all line spacings, whereas smaller features show more variability. The AGS technique has been applied to measuring changes in the radiation environment over a range of time scales from a few days to several years using data collected during this and previous surveys of the area. Changes due to sedimentation and erosion of salt marshes, and hydrological transportation of upland activity have been observed.
Multilevel functional principal component analysis
The Sleep Heart Health Study (SHHS) is a comprehensive landmark study of
sleep and its impacts on health outcomes. A primary metric of the SHHS is the
in-home polysomnogram, which includes two electroencephalographic (EEG)
channels for each subject, at two visits. The volume and importance of these
data present enormous challenges for analysis. To address these challenges, we
introduce multilevel functional principal component analysis (MFPCA), a novel
statistical methodology designed to extract core intra- and inter-subject
geometric components of multilevel functional data. Though motivated by the
SHHS, the proposed methodology is generally applicable, with potential
relevance to many modern scientific studies of hierarchical or longitudinal
functional outcomes. Notably, using MFPCA, we identify and quantify
associations between EEG activity during sleep and adverse cardiovascular
outcomes.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/08-AOAS206 by the
Institute of Mathematical Statistics (http://www.imstat.org).
Latent protein trees
Unbiased, label-free proteomics is becoming a powerful technique for
measuring protein expression in almost any biological sample. The output of
these measurements after preprocessing is a collection of features and their
associated intensities for each sample. Subsets of features within the data are
from the same peptide, subsets of peptides are from the same protein, and
subsets of proteins are in the same biological pathways; therefore, there is
the potential for very complex and informative correlational structure inherent
in these data. Recent attempts to utilize these data often focus on the
identification of single features that are associated with a particular
phenotype that is relevant to the experiment. However, to date, there have been
no published approaches that directly model what we know to be multiple
different levels of correlation structure. Here we present a hierarchical
Bayesian model which is specifically designed to model such correlation
structure in unbiased, label-free proteomics. This model utilizes partial
identification information from peptide sequencing and database lookup as well
as the observed correlation in the data to appropriately compress features into
latent proteins and to estimate their correlation structure. We demonstrate the
effectiveness of the model using artificial/benchmark data and in the context
of a series of proteomics measurements of blood plasma from a collection of
volunteers who were infected with two different strains of viral influenza.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/13-AOAS639 by the
Institute of Mathematical Statistics (http://www.imstat.org).