48,560 research outputs found
A Bayesian alternative to mutual information for the hierarchical clustering of dependent random variables
The use of mutual information as a similarity measure in agglomerative
hierarchical clustering (AHC) raises an important issue: some correction needs
to be applied for the dimensionality of variables. In this work, we formulate
the decision of merging dependent multivariate normal variables in an AHC
procedure as a Bayesian model comparison. We found that the Bayesian
formulation naturally shrinks the empirical covariance matrix towards a matrix
set a priori (e.g., the identity), provides an automated stopping rule, and
corrects for dimensionality using a term that scales up the measure as a
function of the dimensionality of the variables. Also, the resulting log Bayes
factor is asymptotically proportional to the plug-in estimate of mutual
information, with an additive correction for dimensionality in agreement with
the Bayesian information criterion. We investigated the behavior of these
Bayesian alternatives (in exact and asymptotic forms) to mutual information on
simulated and real data. An encouraging result was first derived on
simulations: the hierarchical clustering based on the log Bayes factor
outperformed off-the-shelf clustering techniques as well as raw and normalized
mutual information in terms of classification accuracy. On a toy example, we
found that the Bayesian approaches led to results that were similar to those of
mutual information clustering techniques, with the advantage of an automated
thresholding. On real functional magnetic resonance imaging (fMRI) datasets
measuring brain activity, it identified clusters consistent with the
established outcome of standard procedures. On this application, normalized
mutual information had a highly atypical behavior, in the sense that it
systematically favored very large clusters. These initial experiments suggest
that the proposed Bayesian alternatives to mutual information are a useful new
tool for hierarchical clustering
A Geometric Approach to Pairwise Bayesian Alignment of Functional Data Using Importance Sampling
We present a Bayesian model for pairwise nonlinear registration of functional
data. We use the Riemannian geometry of the space of warping functions to
define appropriate prior distributions and sample from the posterior using
importance sampling. A simple square-root transformation is used to simplify
the geometry of the space of warping functions, which allows for computation of
sample statistics, such as the mean and median, and a fast implementation of a
-means clustering algorithm. These tools allow for efficient posterior
inference, where multiple modes of the posterior distribution corresponding to
multiple plausible alignments of the given functions are found. We also show
pointwise credible intervals to assess the uncertainty of the alignment
in different clusters. We validate this model using simulations and present
multiple examples on real data from different application domains including
biometrics and medicine
A Quasi-Bayesian Perspective to Online Clustering
When faced with high frequency streams of data, clustering raises theoretical
and algorithmic pitfalls. We introduce a new and adaptive online clustering
algorithm relying on a quasi-Bayesian approach, with a dynamic (i.e.,
time-dependent) estimation of the (unknown and changing) number of clusters. We
prove that our approach is supported by minimax regret bounds. We also provide
an RJMCMC-flavored implementation (called PACBO, see
https://cran.r-project.org/web/packages/PACBO/index.html) for which we give a
convergence guarantee. Finally, numerical experiments illustrate the potential
of our procedure
Outlier detection techniques for wireless sensor networks: A survey
In the field of wireless sensor networks, those measurements that significantly deviate from the normal pattern of sensed data are considered as outliers. The potential sources of outliers include noise and errors, events, and malicious attacks on the network. Traditional outlier detection techniques are not directly applicable to wireless sensor networks due to the nature of sensor data and specific requirements and limitations of the wireless sensor networks. This survey provides a comprehensive overview of existing outlier detection techniques specifically developed for the wireless sensor networks. Additionally, it presents a technique-based taxonomy and a comparative table to be used as a guideline to select a technique suitable for the application at hand based on characteristics such as data type, outlier type, outlier identity, and outlier degree
- …