6,639 research outputs found
Multimodal nested sampling: an efficient and robust alternative to MCMC methods for astronomical data analysis
In performing a Bayesian analysis of astronomical data, two difficult
problems often emerge. First, in estimating the parameters of some model for
the data, the resulting posterior distribution may be multimodal or exhibit
pronounced (curving) degeneracies, which can cause problems for traditional
MCMC sampling methods. Second, in selecting between a set of competing models,
calculation of the Bayesian evidence for each model is computationally
expensive. The nested sampling method introduced by Skilling (2004) has
greatly reduced the computational expense of calculating evidences and also
produces posterior inferences as a by-product. This method has been applied
successfully in cosmological applications by Mukherjee et al. (2006), but their
implementation was efficient only for unimodal distributions without pronounced
degeneracies. Shaw et al. (2007) recently introduced a clustered nested
sampling method which is significantly more efficient in sampling from
multimodal posteriors and also determines the expectation and variance of the
final evidence from a single run of the algorithm, hence providing a further
increase in efficiency. In this paper, we build on the work of Shaw et al. and
present three new methods for sampling and evidence evaluation from
distributions that may contain multiple modes and significant degeneracies; we
also present an even more efficient technique for estimating the uncertainty on
the evaluated evidence. These methods lead to a further substantial improvement
in sampling efficiency and robustness, and are applied to toy problems to
demonstrate the accuracy and economy of the evidence calculation and parameter
estimation. Finally, we discuss the use of these methods in performing Bayesian
object detection in astronomical datasets.
Comment: 14 pages, 11 figures, submitted to MNRAS, some major additions to the
previous version in response to the referee's comment
Identifying Mixtures of Mixtures Using Bayesian Estimation
The use of a finite mixture of normal distributions in model-based clustering
makes it possible to capture non-Gaussian data clusters. However, identifying the clusters
from the normal components is challenging and in general either achieved by
imposing constraints on the model or by using post-processing procedures.
Within the Bayesian framework we propose a different approach based on sparse
finite mixtures to achieve identifiability. We specify a hierarchical prior
where the hyperparameters are carefully selected such that they are reflective
of the cluster structure aimed at. In addition, this prior allows the model to
be estimated using standard MCMC sampling methods. In combination with a
post-processing approach which resolves the label switching issue and results
in an identified model, our approach makes it possible to simultaneously (1) determine the
number of clusters, (2) flexibly approximate the cluster distributions in a
semi-parametric way using finite mixtures of normals and (3) identify
cluster-specific parameters and classify observations. The proposed approach is
illustrated in two simulation studies and on benchmark data sets.
Comment: 49 page
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
Spatial Guilds in the Serengeti Food Web Revealed by a Bayesian Group Model
Food webs, networks of feeding relationships among organisms, provide
fundamental insights into mechanisms that determine ecosystem stability and
persistence. Despite long-standing interest in the compartmental structure of
food webs, past network analyses of food webs have been constrained by a
standard definition of compartments, or modules, that requires many links
within compartments and few links between them. Empirical analyses have been
further limited by low-resolution data for primary producers. In this paper, we
present a Bayesian computational method for identifying group structure in food
webs using a flexible definition of a group that can describe both functional
roles and standard compartments. The Serengeti ecosystem provides an
opportunity to examine structure in a newly compiled food web that includes
species-level resolution among plants, allowing us to address whether groups in
the food web correspond to tightly-connected compartments or functional groups,
and whether network structure reflects spatial or trophic organization, or a
combination of the two. We have compiled the major mammalian and plant
components of the Serengeti food web from published literature, and we infer
its group structure using our method. We find that network structure
corresponds to spatially distinct plant groups coupled at higher trophic levels
by groups of herbivores, which are in turn coupled by carnivore groups. Thus
the group structure of the Serengeti web represents a mixture of trophic guild
structure and spatial patterns, in contrast to the standard compartments
typically identified in ecological networks. From data consisting only of nodes
and links, the group structure that emerges supports recent ideas on spatial
coupling and energy channels in ecosystems that have been proposed as important
for persistence.
Comment: 28 pages, 6 figures (+ 3 supporting), 2 tables (+ 4 supporting