Leveraging Edge Computing through Collaborative Machine Learning
The Internet of Things (IoT) offers the ability
to analyze and predict our surroundings through sensor
networks at the network edge. To facilitate this predictive
functionality, Edge Computing (EC) applications are developed
by considering: power consumption, network lifetime and
quality of context inference. The massive volumes of contextual data
produced by sensors allow data scientists to extract richer knowledge,
albeit at the expense of holistic data transfer that threatens network
feasibility and lifetime. To cope with this,
collaborative machine learning is applied to EC devices to (i)
extract the statistical relationships and (ii) construct regression
(predictive) models to maximize communication efficiency. In
this paper, we propose a learning methodology that improves
the prediction accuracy by quantizing the input space and
leveraging the local knowledge of the EC devices.
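The core idea of quantizing the input space so an edge device answers queries from local summaries, rather than forwarding raw sensor data, can be sketched as follows. This is an illustrative toy (per-bin mean predictors over a 1-D input), not the paper's method; all names are hypothetical.

```python
# Hypothetical sketch: quantize a 1-D input space into bins and keep one
# local statistic per bin, so an EC device predicts without shipping raw data.
import numpy as np

def fit_quantized_model(x, y, n_bins=4):
    """Partition the input range into bins; store the mean target per bin."""
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    means = np.array([y[idx == b].mean() for b in range(n_bins)])
    return edges, means

def predict(edges, means, x_new):
    """Look up the stored statistic for the bin each query falls into."""
    n_bins = len(means)
    idx = np.clip(np.digitize(x_new, edges[1:-1]), 0, n_bins - 1)
    return means[idx]

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)                  # simulated sensor readings
y = 2.0 * x + rng.normal(0, 0.1, 200)        # quantity to predict
edges, means = fit_quantized_model(x, y)
pred = predict(edges, means, np.array([1.0, 9.0]))
```

The communication saving comes from the summary size: only `n_bins` numbers per device need to leave the edge, regardless of how many raw samples were observed.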
Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data
We consider the estimation of Dirichlet Process Mixture Models (DPMMs) in
distributed environments, where data are distributed across multiple computing
nodes. A key advantage of Bayesian nonparametric models such as DPMMs is that
they allow new components to be introduced on the fly as needed. This, however,
poses an important challenge to distributed estimation -- how to handle new
components efficiently and consistently. To tackle this problem, we propose a
new estimation method, which allows new components to be created locally in
individual computing nodes. Components corresponding to the same cluster will
be identified and merged via a probabilistic consolidation scheme. In this way,
we can maintain the consistency of estimation with very low communication cost.
Experiments on large real-world data sets show that the proposed method can
achieve high scalability in distributed and asynchronous environments without
compromising the mixing performance.
Comment: Published at IJCAI 2017 (https://www.ijcai.org/proceedings/2017/64).
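The merge step described above (components created locally, then identified and consolidated across nodes) can be caricatured in a few lines. This is a deliberately simplified stand-in, not the paper's probabilistic consolidation scheme: components are (mean, count) pairs and "same cluster" is decided by a distance tolerance.

```python
# Illustrative sketch: each node summarizes its data as (mean, count)
# components created on the fly; a consolidation pass merges components
# whose means agree within a tolerance. Not the paper's actual scheme.
import numpy as np

def local_components(data, tol=1.0):
    comps = []  # list of [mean, count]
    for x in data:
        for c in comps:
            if abs(x - c[0]) < tol:            # assign to an existing component
                c[0] = (c[0] * c[1] + x) / (c[1] + 1)
                c[1] += 1
                break
        else:                                   # create a new component locally
            comps.append([x, 1])
    return comps

def consolidate(all_comps, tol=1.0):
    """Merge components from all nodes that describe the same cluster."""
    merged = []
    for m, n in sorted(all_comps):
        if merged and abs(m - merged[-1][0]) < tol:
            tot = merged[-1][1] + n             # count-weighted merge of means
            merged[-1][0] = (merged[-1][0] * merged[-1][1] + m * n) / tot
            merged[-1][1] = tot
        else:
            merged.append([m, n])
    return merged

node_a = local_components([0.1, 0.2, 5.0, 5.1])
node_b = local_components([0.15, 5.05])
global_comps = consolidate(node_a + node_b)     # two clusters survive merging
```

Only the small component summaries cross the network, which is what keeps the communication cost low in the distributed setting.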
False discovery rate analysis of brain diffusion direction maps
Diffusion tensor imaging (DTI) is a novel modality of magnetic resonance
imaging that allows noninvasive mapping of the brain's white matter. A
particular map derived from DTI measurements is a map of water principal
diffusion directions, which are proxies for neural fiber directions. We
consider a study in which diffusion direction maps were acquired for two groups
of subjects. The objective of the analysis is to find regions of the brain in
which the corresponding diffusion directions differ between the groups. This is
attained by first computing a test statistic for the difference in direction at
every brain location using a Watson model for directional data. Interesting
locations are subsequently selected with control of the false discovery rate.
More accurate modeling of the null distribution is obtained using an empirical
null density based on the empirical distribution of the test statistics across
the brain. Further, substantial improvements in power are achieved by local
spatial averaging of the test statistic map. Although the focus is on one
particular study and imaging technology, the proposed inference methods can be
applied to other large scale simultaneous hypothesis testing problems with a
continuous underlying spatial structure.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS133 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
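Selecting interesting locations "with control of the false discovery rate" typically means a step-up procedure over the per-location p-values. The standard Benjamini-Hochberg procedure is sketched below on synthetic p-values standing in for per-voxel test statistics; the empirical-null and spatial-averaging refinements from the abstract are not shown.

```python
# Generic Benjamini-Hochberg step-up procedure for FDR control across
# many simultaneous tests. The p-values are synthetic stand-ins.
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # Largest k with p_(k) <= q * k / m; reject the k smallest p-values.
    below = ranked <= q * (np.arange(1, m + 1) / m)
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.6, 0.9]
mask = benjamini_hochberg(pvals, q=0.05)
```

In the imaging setting each entry of `pvals` would come from the Watson-model test at one brain location, with the null distribution estimated empirically rather than assumed.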
Nonparametric Hierarchical Clustering of Functional Data
In this paper, we deal with the problem of curve clustering. We propose a
nonparametric method which partitions the curves into clusters and discretizes
the dimensions of the curve points into intervals. The cross-product of these
partitions forms a data-grid which is obtained using a Bayesian model selection
approach while making no assumptions regarding the curves. Finally, a
post-processing technique, aiming at reducing the number of clusters in order
to improve the interpretability of the clustering, is proposed. It consists
of optimally merging the clusters step by step, which corresponds to an
agglomerative hierarchical classification whose dissimilarity measure is the
variation of the criterion. Interestingly, this measure is none other than the
sum of the Kullback-Leibler divergences between clusters distributions before
and after the merges. The practical interest of the approach for functional
data exploratory analysis is presented and compared with an alternative
approach on an artificial and a real-world data set.
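The dissimilarity described above, a sum of KL divergences between cluster distributions before and after a merge, can be sketched directly. This is an illustrative toy with clusters summarized as discrete distributions over intervals, not the paper's full Bayesian data-grid model; the count-weighting is an assumption for the sketch.

```python
# Illustrative sketch: merge cost between two clusters, each summarized by
# a discrete distribution over intervals, as the count-weighted sum of KL
# divergences from each cluster's distribution to the merged distribution.
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def merge_cost(c1, c2):
    """Variation of the criterion when merging clusters c1 = (count, dist)."""
    n1, p1 = c1
    n2, p2 = c2
    pm = (n1 * np.asarray(p1) + n2 * np.asarray(p2)) / (n1 + n2)
    return n1 * kl(p1, pm) + n2 * kl(p2, pm)

clusters = [(10, [0.8, 0.1, 0.1]),
            (12, [0.7, 0.2, 0.1]),
            (8,  [0.1, 0.1, 0.8])]
# Merging similar clusters costs far less than merging dissimilar ones,
# so a greedy agglomerative pass merges the cheapest pair at each step.
c_similar = merge_cost(clusters[0], clusters[1])
c_dissimilar = merge_cost(clusters[0], clusters[2])
```

Repeatedly merging the cheapest pair yields the hierarchy, and the cumulative cost tracks how much interpretability is being traded for fewer clusters.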
Group invariance principles for causal generative models
The postulate of independence of cause and mechanism (ICM) has recently led
to several new causal discovery algorithms. The interpretation of independence
and the way it is utilized, however, varies across these methods. Our aim in
this paper is to propose a group theoretic framework for ICM to unify and
generalize these approaches. In our setting, the cause-mechanism relationship
is assessed by comparing it against a null hypothesis through the application
of random generic group transformations. We show that the group theoretic view
provides a very general tool to study the structure of data generating
mechanisms with direct applications to machine learning.
Comment: 16 pages, 6 figures.
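The idea of assessing a cause-mechanism relationship against a null hypothesis generated by random group transformations can be illustrated schematically. In this toy, the group is the permutation group acting on one factor and the statistic is absolute correlation; the paper's actual construction is more general.

```python
# Toy illustration: compare an observed cause-mechanism statistic against
# a null distribution generated by random group elements (permutations).
# Purely schematic; not the paper's framework.
import numpy as np

rng = np.random.default_rng(1)
cause = rng.normal(size=500)
effect = cause + 0.1 * rng.normal(size=500)   # mechanism tightly coupled to cause

def stat(a, b):
    """Dependence statistic: absolute Pearson correlation."""
    return abs(np.corrcoef(a, b)[0, 1])

observed = stat(cause, effect)
# Null: apply random (generic) group transformations to break the coupling.
null = [stat(cause, rng.permutation(effect)) for _ in range(200)]
p_value = float(np.mean([s >= observed for s in null]))
```

A small `p_value` indicates the observed relationship is not generic under the group action, which is the flavor of evidence the ICM postulate asks for.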