Satellite-based precipitation estimation using watershed segmentation and growing hierarchical self-organizing map
This paper outlines the development of a multi-satellite precipitation estimation methodology that draws on techniques from machine learning and morphology to produce high-resolution, short-duration rainfall estimates in an automated fashion. First, cloud systems are identified from geostationary infrared imagery using a morphology-based watershed segmentation algorithm. Second, a novel pattern recognition technique, the growing hierarchical self-organizing map (GHSOM), is used to classify clouds into a number of clusters with a hierarchical architecture. Finally, each cloud cluster is associated with co-registered passive microwave rainfall observations through a cumulative histogram matching approach. The network was initially trained using remotely sensed geostationary infrared satellite imagery and hourly ground-radar data in lieu of a dense constellation of polar-orbiting spacecraft such as the proposed Global Precipitation Measurement (GPM) mission. Ground-radar and gauge rainfall measurements were used to evaluate this technique for both the warm season (June 2004) and the cold season (December 2004-February 2005) at various temporal (daily and monthly) and spatial (0.04° and 0.25°) scales. Significant improvements in estimation accuracy are found when classifying the clouds into hierarchical sub-layers rather than a single layer. Furthermore, two-year (2003-2004) satellite rainfall estimates generated by the current algorithm were compared with gauge-corrected Stage IV radar rainfall at various time scales over the continental United States. This study demonstrates the usefulness of watershed segmentation and the GHSOM in satellite-based rainfall estimation.
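As a rough illustration of the first step only (not the paper's implementation), the sketch below runs watershed segmentation on a synthetic infrared brightness-temperature field with SciPy. The temperature field, the 253 K cloud threshold, and the seed locations are all invented for this example; the paper derives its seeds morphologically from the imagery itself.

```python
import numpy as np
from scipy import ndimage

# Synthetic IR brightness-temperature field (K): two cold cloud cores.
# A purely illustrative stand-in for geostationary satellite imagery.
y, x = np.mgrid[0:100, 0:100].astype(float)
tb = 280.0 - 40.0 * np.exp(-((x - 30)**2 + (y - 40)**2) / 200.0) \
           - 35.0 * np.exp(-((x - 70)**2 + (y - 60)**2) / 150.0)

# Watershed flooding treats cold cores as basins; rescale to uint8 first,
# since scipy's watershed_ift requires an unsigned-integer image.
tb_u8 = np.round((tb - tb.min()) / (tb.max() - tb.min()) * 255).astype(np.uint8)

markers = np.zeros(tb.shape, dtype=np.int16)
markers[40, 30] = 1        # seed inside the first cold core (assumed known here)
markers[60, 70] = 2        # seed inside the second cold core
markers[tb > 253.0] = -1   # warm pixels marked as background (illustrative threshold)

segmented = ndimage.watershed_ift(tb_u8, markers)
n_clouds = len(np.unique(segmented[segmented > 0]))
print(n_clouds)  # number of segmented cloud systems
```

Each positively labelled region would then be handed to the GHSOM clustering and histogram-matching stages described above.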
Mondrian Forests for Large-Scale Regression when Uncertainty Matters
Many real-world regression problems demand a measure of the uncertainty
associated with each prediction. Standard decision forests deliver efficient
state-of-the-art predictive performance, but high-quality uncertainty estimates
are lacking. Gaussian processes (GPs) deliver uncertainty estimates, but
scaling GPs to large-scale data sets comes at the cost of approximating the
uncertainty estimates. We extend Mondrian forests, first proposed by
Lakshminarayanan et al. (2014) for classification problems, to the large-scale
non-parametric regression setting. Using a novel hierarchical Gaussian prior
that dovetails with the Mondrian forest framework, we obtain principled
uncertainty estimates, while still retaining the computational advantages of
decision forests. Through a combination of illustrative examples, real-world
large-scale datasets, and Bayesian optimization benchmarks, we demonstrate that
Mondrian forests outperform approximate GPs on large-scale regression tasks and
deliver better-calibrated uncertainty assessments than decision-forest-based
methods.
Comment: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS) 2016, Cadiz, Spain. JMLR: W&CP volume 51
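The core idea, that the spread of predictions across an ensemble of randomized regressors can serve as an uncertainty estimate, can be sketched without the Mondrian machinery. The piecewise-constant "tree" and bootstrap resampling below are simplified stand-ins (a Mondrian tree instead draws random axis-aligned splits from a Mondrian process, and the paper adds a hierarchical Gaussian prior on the leaves); all data and parameters here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data on [0, 5]: y = sin(x) plus noise.
X = rng.uniform(0, 5, size=200)
y = np.sin(X) + 0.1 * rng.normal(size=200)

def fit_piecewise(Xb, yb, n_bins=16, lo=0.0, hi=5.0):
    """Piecewise-constant regressor (per-bin means): a crude stand-in
    for a single Mondrian tree."""
    edges = np.linspace(lo, hi, n_bins + 1)
    idx = np.clip(np.digitize(Xb, edges) - 1, 0, n_bins - 1)
    means = np.array([yb[idx == b].mean() if np.any(idx == b) else 0.0
                      for b in range(n_bins)])
    return edges, means

def predict(model, Xq):
    edges, means = model
    idx = np.clip(np.digitize(Xq, edges) - 1, 0, len(means) - 1)
    return means[idx]

# Bootstrap ensemble: the per-point mean across members is the forest
# prediction; the per-point spread is a rough uncertainty estimate.
Xq = np.linspace(0.5, 4.5, 9)
preds = []
for _ in range(50):
    take = rng.integers(0, len(X), len(X))
    preds.append(predict(fit_piecewise(X[take], y[take]), Xq))
preds = np.stack(preds)
forest_mean, forest_std = preds.mean(axis=0), preds.std(axis=0)
```

Unlike this bootstrap heuristic, the hierarchical prior in the paper yields uncertainty estimates that grow in a principled way away from the training data.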
Discussion of "Calibrated Bayes, for Statistics in General, and Missing Data in Particular" by R. J. A. Little
Discussion of "Calibrated Bayes, for Statistics in General, and Missing Data in Particular" by R. Little [arXiv:1108.1917]
Comment: Published in Statistical Science (http://www.imstat.org/sts/) at http://dx.doi.org/10.1214/10-STS318B by the Institute of Mathematical Statistics (http://www.imstat.org)
A nested mixture model for protein identification using mass spectrometry
Mass spectrometry provides a high-throughput way to identify proteins in
biological samples. In a typical experiment, proteins in a sample are first
broken into their constituent peptides. The resulting mixture of peptides is
then subjected to mass spectrometry, which generates thousands of spectra, each
characteristic of its generating peptide. Here we consider the problem of
inferring, from these spectra, which proteins and peptides are present in the
sample. We develop a statistical approach to the problem, based on a nested
mixture model. In contrast to commonly used two-stage approaches, this model
provides a one-stage solution that simultaneously identifies which proteins are
present, and which peptides are correctly identified. In this way our model
incorporates the evidence feedback between proteins and their constituent
peptides. Using simulated data and a yeast data set, we compare and contrast
our method with existing widely used approaches (PeptideProphet/ProteinProphet)
and with a recently published new approach, HSM. For peptide identification,
our single-stage approach yields consistently more accurate results. For
protein identification the methods have similar accuracy in most settings,
although we exhibit some scenarios in which the existing methods perform
poorly.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/09-AOAS316 by the Institute of Mathematical Statistics (http://www.imstat.org)
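A stripped-down version of the mixture idea, fitting a two-component model to peptide-match scores so that the E-step posterior gives each peptide's probability of being a correct identification, can be sketched as follows. The score distributions, the fixed shared variance, and the independence used to combine peptides into a protein-level probability are all simplifications for illustration; the paper's nested model couples peptides to their parent proteins instead.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated peptide-match scores: "incorrect" matches around 0,
# "correct" matches around 3. Parameters are illustrative only.
scores = np.concatenate([rng.normal(0, 1, 300), rng.normal(3, 1, 100)])

# Two-component Gaussian mixture fit by EM (shared, fixed sd for brevity).
# gamma[i] is the posterior probability that peptide i is correct.
pi, mu0, mu1, sd = 0.5, scores.min(), scores.max(), scores.std()
for _ in range(100):
    p1 = pi * np.exp(-0.5 * ((scores - mu1) / sd) ** 2)        # E-step
    p0 = (1 - pi) * np.exp(-0.5 * ((scores - mu0) / sd) ** 2)
    gamma = p1 / (p1 + p0)
    pi = gamma.mean()                                          # M-step
    mu1 = (gamma * scores).sum() / gamma.sum()
    mu0 = ((1 - gamma) * scores).sum() / (1 - gamma).sum()

# Protein-level posterior: probability that at least one of its peptides
# is correct, under a naive independence assumption (for illustration).
protein_peptides = gamma[:5]
protein_prob = 1 - np.prod(1 - protein_peptides)
```

The one-stage model in the paper replaces the naive combination step with joint inference, so evidence flows in both directions between proteins and peptides.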
Definition of valid proteomic biomarkers: a Bayesian solution
Clinical proteomics has suffered from high hopes generated by reports of apparent biomarkers, most of which could not later be substantiated through validation. This has brought into focus the need for improved methods of finding a panel of clearly defined biomarkers. To examine this problem, urinary proteome data were collected from healthy adult males and females and analysed to find biomarkers that differentiate between the genders. We believe that models that incorporate sparsity in the variables are desirable for biomarker selection, as proteomics data typically contain a huge number of variables (peptides) and few samples, making the selection process potentially unstable. This suggests the application of a two-level hierarchical Bayesian probit regression model for variable selection, which assumes a prior that favours sparseness. The classification performance of this method is shown to improve on that of the Probabilistic K-Nearest Neighbour model.