Bayesian Hierarchical Models for High-Dimensional Mediation Analysis with Coordinated Selection of Correlated Mediators
We consider Bayesian high-dimensional mediation analysis to identify among a
large set of correlated potential mediators the active ones that mediate the
effect from an exposure variable to an outcome of interest. Correlations among
mediators are commonly observed in modern data analysis; examples include the
activated voxels within connected regions in brain image data, regulatory
signals driven by gene networks in genome data and correlated exposure data
from the same source. When correlations are present among active mediators,
mediation analysis that fails to account for such correlation can be
sub-optimal and may lead to a loss of power in identifying active mediators.
Building upon a recent high-dimensional mediation analysis framework, we
propose two Bayesian hierarchical models, one with a Gaussian mixture prior
that enables correlated mediator selection and the other with a Potts mixture
prior that accounts for the correlation among active mediators in mediation
analysis. We develop efficient sampling algorithms for both methods. Various
simulations demonstrate that our methods enable effective identification of
correlated active mediators, which could be missed by using existing methods
that assume prior independence among active mediators. The proposed methods are
applied to the LIFECODES birth cohort and the Multi-Ethnic Study of
Atherosclerosis (MESA), where they identify new active mediators with important
biological implications.
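The abstract above builds on the classical single-mediator model, in which the exposure affects the mediator (path alpha) and the mediator affects the outcome (path beta), so the mediation ("indirect") effect is the product alpha * beta. A minimal sketch of that building block, with invented coefficients and least-squares estimation standing in for the paper's Bayesian hierarchical machinery:

```python
# Illustrative sketch of the single-mediator model underlying mediation
# analysis; coefficients and sample size are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
alpha, beta, gamma = 0.8, 0.5, 0.3     # true path coefficients

X = rng.normal(size=n)                              # exposure
M = alpha * X + rng.normal(size=n)                  # mediator model
Y = beta * M + gamma * X + rng.normal(size=n)       # outcome model

# Estimate the two paths by least squares and form the
# product-of-coefficients estimate of the mediation effect.
alpha_hat = np.linalg.lstsq(X[:, None], M, rcond=None)[0][0]
beta_hat = np.linalg.lstsq(np.column_stack([M, X]), Y, rcond=None)[0][0]
indirect = alpha_hat * beta_hat       # should recover alpha * beta = 0.4
```

In the high-dimensional setting of the paper there are many correlated mediators M, and the Gaussian-mixture or Potts priors decide which alpha_j * beta_j products are active; the sketch only shows the effect being estimated.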
EM Algorithms for Weighted-Data Clustering with Application to Audio-Visual Scene Analysis
Data clustering has received a lot of attention and numerous methods,
algorithms and software packages are available. Among these techniques,
parametric finite-mixture models play a central role due to their interesting
mathematical properties and to the existence of maximum-likelihood estimators
based on expectation-maximization (EM). In this paper we propose a new mixture
model that associates a weight with each observed point. We introduce the
weighted-data Gaussian mixture and we derive two EM algorithms. The first one
considers a fixed weight for each observation. The second one treats each
weight as a random variable following a gamma distribution. We propose a model
selection method based on a minimum message length criterion, provide a weight
initialization strategy, and validate the proposed algorithms by comparing them
with several state of the art parametric and non-parametric clustering
techniques. We also demonstrate the effectiveness and robustness of the
proposed clustering technique in the presence of heterogeneous data, namely
audio-visual scene analysis.
Comment: 14 pages, 4 figures, 4 tables
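A minimal sketch of the first, fixed-weight variant described above (not the paper's code): in a weighted-data Gaussian mixture, each observation's fixed weight simply scales its responsibilities in the M-step. The data and weights here are synthetic.

```python
# Sketch of one EM scheme for a 1-D Gaussian mixture in which each
# observation x_i carries a fixed weight w_i (assumed form, for illustration).
import numpy as np

def weighted_gmm_em(x, w, K=2, iters=50):
    # quantile-based initialization of the component means
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)
    var = np.full(K, np.var(x))
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities under the current parameters
        dens = (pi / np.sqrt(2 * np.pi * var)
                * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: each responsibility is scaled by the observation weight
        rw = r * w[:, None]
        Nk = rw.sum(axis=0)
        mu = (rw * x[:, None]).sum(axis=0) / Nk
        var = (rw * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
        pi = Nk / Nk.sum()
    return mu, var, pi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 1, 300), rng.normal(3, 1, 300)])
w = np.ones_like(x)          # uniform weights recover standard EM
mu, var, pi = weighted_gmm_em(x, w)
```

Down-weighting an observation (w_i < 1) shrinks its influence on all component updates, which is how unreliable audio or visual observations can be discounted; the paper's second variant instead treats each w_i as gamma-distributed and infers it.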
A Comparison of Nature Inspired Algorithms for Multi-threshold Image Segmentation
In the field of image analysis, segmentation is one of the most important
preprocessing steps. One way to achieve segmentation is by means of threshold
selection, where each pixel belonging to a given class is labeled
according to the selected threshold, yielding groups of pixels that share
visual characteristics in the image. Several methods have been proposed to
solve threshold selection problems; in this work, we use a method based on a
mixture of Gaussian functions to approximate the 1D histogram of a gray-level
image, with parameters estimated using three nature-inspired algorithms
(Particle Swarm Optimization, Artificial Bee Colony Optimization, and
Differential Evolution). Each Gaussian function approximates part of the
histogram, representing a pixel class and therefore a threshold point.
Experimental results are presented, comparing the three algorithms in both
quantitative and qualitative fashion and discussing the main advantages and
drawbacks of each when applied to the multi-threshold problem.
Comment: 16 pages; this is a draft of the final version of the article sent to
the Journal
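A toy sketch of the thresholding idea: the paper fits the mixture parameters with nature-inspired optimizers (PSO, ABC, DE); here plain EM stands in for the optimizer on synthetic gray levels, and the threshold between two pixel classes is taken where the weighted class densities cross.

```python
# Illustrative sketch: fit a 2-component Gaussian mixture to gray levels
# and read off a threshold at the crossing of the weighted densities.
# EM replaces the paper's PSO/ABC/DE optimizers; data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
# synthetic gray levels: a dark background class and a bright object class
pixels = np.clip(np.concatenate([rng.normal(60, 10, 4000),
                                 rng.normal(180, 15, 2000)]), 0, 255)

mu = np.array([50.0, 200.0])        # crude initial class means
var = np.array([100.0, 100.0])
pi = np.array([0.5, 0.5])
for _ in range(100):
    dens = (pi / np.sqrt(2 * np.pi * var)
            * np.exp(-0.5 * (pixels[:, None] - mu) ** 2 / var))
    r = dens / dens.sum(axis=1, keepdims=True)   # E-step
    Nk = r.sum(axis=0)                           # M-step
    mu = (r * pixels[:, None]).sum(axis=0) / Nk
    var = (r * (pixels[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / Nk.sum()

# Threshold: gray level between the two means where the weighted
# class densities are (approximately) equal.
grid = np.arange(int(min(mu)), int(max(mu)))
d = (pi / np.sqrt(2 * np.pi * var)
     * np.exp(-0.5 * (grid[:, None] - mu) ** 2 / var))
threshold = grid[np.argmin(np.abs(d[:, 0] - d[:, 1]))]
```

With K Gaussians the same crossing rule yields K-1 thresholds, one between each pair of adjacent pixel classes.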
MDL Denoising Revisited
We refine and extend an earlier MDL denoising criterion for wavelet-based
denoising. We start by showing that the denoising problem can be reformulated
as a clustering problem, where the goal is to obtain separate clusters for
informative and non-informative wavelet coefficients, respectively. This
suggests two refinements, adding a code-length for the model index, and
extending the model in order to account for subband-dependent coefficient
distributions. A third refinement is the derivation of a soft thresholding rule
inspired by predictive universal coding with weighted mixtures. We propose a
practical method incorporating all three refinements, which is shown to achieve
good performance and robustness in denoising both artificial and natural signals.
Comment: Submitted to IEEE Transactions on Information Theory, June 200
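For reference, soft thresholding is the operation ultimately applied to the wavelet coefficients: magnitudes are shrunk by the threshold, which zeroes out the "non-informative" cluster. A minimal sketch with an invented threshold value (the paper derives the threshold from the MDL criterion):

```python
# Soft thresholding of (wavelet) coefficients; the threshold value here
# is illustrative, not the MDL-derived one.
import numpy as np

def soft_threshold(c, t):
    # shrink magnitudes by t; anything with |c| < t is set to zero
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

coeffs = np.array([4.0, -0.3, 0.1, -2.5, 0.05])
den = soft_threshold(coeffs, t=0.5)
# → array([ 3.5, -0. ,  0. , -2. ,  0. ])
```

Unlike hard thresholding, surviving coefficients are also shrunk toward zero, which is what the weighted-mixture coding argument in the paper motivates.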
k-MLE: A fast algorithm for learning statistical mixture models
We describe k-MLE, a fast and efficient local search algorithm for learning
finite statistical mixtures of exponential families such as Gaussian mixture
models. Mixture models are traditionally learned using the
expectation-maximization (EM) soft clustering technique that monotonically
increases the incomplete (expected complete) likelihood. Given prescribed
mixture weights, the hard clustering k-MLE algorithm iteratively assigns data
to the most likely weighted component and updates the component models using
Maximum Likelihood Estimators (MLEs). Using the duality between exponential
families and Bregman divergences, we prove that the local convergence of the
complete likelihood of k-MLE follows directly from the convergence of a dual
additively weighted Bregman hard clustering. The inner loop of k-MLE can be
implemented using any k-means heuristic, such as the celebrated Lloyd's batched
or Hartigan's greedy swap updates. We then show how to update the mixture
weights by minimizing a cross-entropy criterion, which amounts to setting each
weight to the relative proportion of points in its cluster, and we iterate the
mixture parameter and mixture weight updates until convergence. Hard EM is
interpreted as a special case of k-MLE in which both the component update and
the weight update are performed successively in the inner loop. To initialize
k-MLE, we propose k-MLE++, a careful initialization of k-MLE that guarantees
probabilistically a global bound on the best possible complete likelihood.
Comment: 31 pages. Extends a preliminary paper presented at IEEE ICASSP 201
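A hedged sketch of the loop described above, for a 1-D Gaussian mixture: hard-assign each point to its most likely weighted component, refit each component by maximum likelihood on its points, then set the mixture weights to the clusters' relative proportions. The Bregman-duality machinery and the k-MLE++ initialization are omitted; a simple quantile seeding stands in.

```python
# Sketch of a hard-assignment mixture-learning loop in the spirit of
# k-MLE (Gaussian case); initialization and details are simplified.
import numpy as np

def k_mle(x, K=2, iters=30):
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)   # quantile seeding
    var = np.full(K, np.var(x))
    w = np.full(K, 1.0 / K)
    for _ in range(iters):
        # hard assignment to the most likely *weighted* component
        logd = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                - 0.5 * (x[:, None] - mu) ** 2 / var)
        z = np.argmax(logd, axis=1)
        # per-cluster MLE refit of each component
        for k in range(K):
            pts = x[z == k]
            if len(pts) > 1:
                mu[k], var[k] = pts.mean(), max(pts.var(), 1e-6)
        # weight update: relative proportion of points per cluster
        w = np.bincount(z, minlength=K) / len(x)
    return mu, var, w

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 400), rng.normal(6, 1, 400)])
mu, var, w = k_mle(x)
```

With w fixed and var shared, the assignment step reduces to nearest-mean, i.e. Lloyd's k-means; the weighted log-density is what makes this a mixture learner rather than a plain clusterer.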
On Quantifying Qualitative Geospatial Data: A Probabilistic Approach
Living in the era of data deluge, we have witnessed a web content explosion,
largely due to the massive availability of User-Generated Content (UGC). In
this work, we specifically consider the problem of geospatial information
extraction and representation, where one can exploit diverse sources of
information (such as image, audio, and text data), going beyond
traditional volunteered geographic information. Our ambition is to include
available narrative information in an effort to better explain geospatial
relationships: with spatial reasoning being a basic form of human cognition,
narratives expressing such experiences typically contain qualitative spatial
data, i.e., spatial objects and spatial relationships.
To this end, we formulate a quantitative approach for the representation of
qualitative spatial relations extracted from UGC in the form of texts. The
proposed method quantifies such relations based on multiple text observations.
Such observations provide distance and orientation features which are utilized
by a greedy Expectation Maximization-based (EM) algorithm to infer a
probability distribution over predefined spatial relationships; the latter
represent the quantified relationships under user-defined probabilistic
assumptions. We evaluate the applicability and quality of the proposed approach
using real UGC data originating from an actual travel blog text corpus. To
verify the quality of the result, we generate grid-based maps visualizing the
spatial extent of the various relations.
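A speculative sketch of the inference step: suppose each predefined spatial relation is modeled as a fixed template density over a distance feature, and EM updates only the mixing probabilities, i.e. the inferred distribution over relations. The templates, relation names, and observations below are invented for illustration; the paper's features also include orientation.

```python
# Sketch: EM over fixed relation templates, inferring only the
# probability distribution over predefined spatial relations.
# All names, templates, and observations are hypothetical.
import numpy as np

# fixed relation templates: (mean distance in km, std)
templates = {"next_to": (0.1, 0.1), "near": (1.0, 0.5), "far": (10.0, 5.0)}
mu = np.array([m for m, _ in templates.values()])
sd = np.array([s for _, s in templates.values()])

obs = np.array([0.2, 0.15, 0.9, 1.4, 0.05])   # distances from text mentions
p = np.full(len(mu), 1.0 / len(mu))           # uniform prior over relations

for _ in range(100):
    dens = (p / (sd * np.sqrt(2 * np.pi))
            * np.exp(-0.5 * ((obs[:, None] - mu) / sd) ** 2))
    r = dens / dens.sum(axis=1, keepdims=True)   # E-step: responsibilities
    p = r.mean(axis=0)                           # M-step: relation probs
```

Because the templates stay fixed, the M-step only re-estimates p, so the output is directly a probability distribution over the predefined relations, as in the abstract.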