Knowledge is at the Edge! How to Search in Distributed Machine Learning Models
With the advent of the Internet of Things and Industry 4.0, an enormous amount
of data is produced at the edge of the network. Due to a lack of computing
power, this data is currently sent to the cloud, where centralized machine
learning models are trained to derive higher-level knowledge. With the recent
development of specialized machine learning hardware for mobile devices, a new
era of distributed learning is about to begin that raises a new research
question: How can we search in distributed machine learning models? Machine
learning at the edge of the network has many benefits, such as low-latency
inference and increased privacy. Such distributed machine learning models can
also be personalized to a human user, a specific context, or an application
scenario. As training data stays on the devices, control over possibly
sensitive data is preserved, since it is not shared with a third party. This new
form of distributed learning leads to the partitioning of knowledge between
many devices, which makes access difficult. In this paper, we tackle the problem
of finding specific knowledge by forwarding a search request (query) to the
device that can answer it best. To that end, we use an entropy-based quality
metric that takes the context of a query and the learning quality of a device
into account. We show that our forwarding strategy can achieve over 95%
accuracy in an urban mobility scenario, where we use data from 30,000 people
commuting in the city of Trento, Italy. Comment: Published in CoopIS 201
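The entropy-based forwarding idea can be sketched as follows: each device's local model returns class probabilities for a query, and the query is routed to the device whose prediction entropy is lowest. The device models, class counts, and the `forward_query` helper below are illustrative assumptions, not the paper's actual metric, which additionally weights the query context and each device's learning quality.

```python
import math

def prediction_entropy(probs):
    """Shannon entropy of a device's class-probability output
    (lower entropy = more confident local model)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def forward_query(query_context, devices):
    """Route the query to the device whose local model answers it
    most confidently. `devices` maps a device id to a callable that
    returns class probabilities for the given query context."""
    scores = {dev: prediction_entropy(model(query_context))
              for dev, model in devices.items()}
    return min(scores, key=scores.get)

# Two hypothetical devices: one confident, one uncertain.
devices = {
    "dev_a": lambda q: [0.9, 0.05, 0.05],  # sharp distribution
    "dev_b": lambda q: [0.4, 0.3, 0.3],    # flat distribution
}
print(forward_query("commute-query", devices))  # → dev_a
```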
Optimal model-free prediction from multivariate time series
Forecasting a time series from multivariate predictors constitutes a
challenging problem, especially using model-free approaches. Most techniques,
such as nearest-neighbor prediction, quickly suffer from the curse of
dimensionality and overfitting for more than a few predictors, which has limited
their application mostly to the univariate case. Therefore, selection
strategies are needed that harness the available information as efficiently as
possible. Since often the right combination of predictors matters, ideally all
subsets of possible predictors should be tested for their predictive power, but
the exponentially growing number of combinations makes such an approach
computationally prohibitive. Here, a prediction scheme that overcomes this
strong limitation is introduced, utilizing a causal pre-selection step that
drastically reduces the number of possible predictors to the most predictive
set of causal drivers, making a globally optimal search scheme tractable. The
information-theoretic optimality is derived and practical selection criteria
are discussed. As demonstrated for multivariate nonlinear stochastic delay
processes, the optimal scheme can even be less computationally expensive than
commonly used sub-optimal schemes like forward selection. The method suggests a
general framework to apply the optimal model-free approach to select variables
and subsequently fit a model to further improve a prediction or learn
statistical dependencies. The performance of this framework is illustrated on a
climatological index of the El Niño Southern Oscillation. Comment: 14 pages, 9 figures
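The two-stage procedure described in the abstract (causal pre-selection, then an exhaustive subset search with a model-free predictor) can be sketched as follows. This is a minimal stdlib illustration under simplifying assumptions: `candidates` stands in for the output of the causal pre-selection step, and leave-one-out k-nearest-neighbor error replaces the paper's information-theoretic criteria.

```python
import itertools
import random

def knn_forecast_error(X, y, subset, k=3):
    """Leave-one-out k-nearest-neighbor forecast error using only the
    predictor columns listed in `subset`."""
    errs = []
    for i in range(len(y)):
        dists = sorted(
            (sum((X[j][c] - X[i][c]) ** 2 for c in subset), j)
            for j in range(len(y)) if j != i
        )
        neighbors = [y[j] for _, j in dists[:k]]
        errs.append((sum(neighbors) / k - y[i]) ** 2)
    return sum(errs) / len(errs)

def best_subset(X, y, candidates):
    """Exhaustively score every non-empty subset of the (small, causally
    pre-selected) candidate predictors; tractable only because the
    pre-selection step keeps `candidates` small."""
    best_err, best_cols = None, None
    for r in range(1, len(candidates) + 1):
        for cols in itertools.combinations(candidates, r):
            err = knn_forecast_error(X, y, cols)
            if best_err is None or err < best_err:
                best_err, best_cols = err, cols
    return best_cols

random.seed(0)
X = [[random.random(), random.random()] for _ in range(40)]
y = [row[0] for row in X]           # target driven by column 0 only
chosen = best_subset(X, y, (0, 1))  # column 1 is a pure distractor
```

Testing all subsets of two pre-selected candidates means only three k-NN evaluations here; without pre-selection, the number of subsets grows exponentially in the number of raw predictors.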
Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
Conditional independence testing is a fundamental problem underlying causal
discovery and a particularly challenging task in the presence of nonlinear and
high-dimensional dependencies. Here a fully non-parametric test for continuous
data based on conditional mutual information combined with a local permutation
scheme is presented. Through a nearest-neighbor approach, the test also adapts
efficiently to the non-smooth distributions that arise from strongly nonlinear dependencies.
Numerical experiments demonstrate that the test reliably simulates the null
distribution even for small sample sizes and with high-dimensional conditioning
sets. The test is better calibrated than kernel-based tests utilizing an
analytical approximation of the null distribution, especially for non-smooth
densities, and reaches the same or higher power levels. Combining the local
permutation scheme with the kernel tests leads to better calibration, but
suffers in power. For smaller sample sizes and lower dimensions, the test is
faster than random Fourier feature-based kernel tests if the permutation scheme
is (embarrassingly) parallelized, but the runtime increases more sharply with
sample size and dimensionality. Thus, more theoretical research to analytically
approximate the null distribution and speed up the estimation for larger sample
sizes is desirable. Comment: 17 pages, 12 figures, 1 table
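The local permutation scheme at the heart of the test can be sketched as follows. This is a simplified illustration, not the paper's implementation: Pearson correlation stands in for the nearest-neighbor CMI estimator, the conditioning variable is one-dimensional, and the neighbor bookkeeping is a naive draw-without-replacement version of the scheme.

```python
import random

def corr(a, b):
    """Pearson correlation; a stand-in statistic for the paper's
    nearest-neighbor CMI estimator (an assumption for brevity)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sa = sum((v - ma) ** 2 for v in a) ** 0.5
    sb = sum((v - mb) ** 2 for v in b) ** 0.5
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (sa * sb)

def local_permutation(x, z, k=5):
    """Permute x only among samples with similar z, roughly preserving
    the x-z dependence while destroying any x-y link beyond z."""
    n = len(x)
    x_perm = list(x)
    used, order = set(), list(range(n))
    random.shuffle(order)
    for i in order:
        nbrs = sorted(range(n), key=lambda j: abs(z[j] - z[i]))[:k]
        free = [j for j in nbrs if j not in used]
        j = random.choice(free) if free else i
        x_perm[i] = x[j]
        used.add(j)
    return x_perm

def ci_test(x, y, z, n_perm=200):
    """p-value for H0: X independent of Y given Z, by comparing the
    observed statistic against locally permuted surrogates."""
    t0 = abs(corr(x, y))
    null = [abs(corr(local_permutation(x, z), y)) for _ in range(n_perm)]
    return (1 + sum(t >= t0 for t in null)) / (1 + n_perm)

random.seed(0)
z = [random.gauss(0, 1) for _ in range(200)]
x = [v + random.gauss(0, 0.3) for v in z]  # X depends on Z
y = [v + random.gauss(0, 0.3) for v in z]  # Y depends on Z only
p = ci_test(x, y, z)  # H0 holds, so p should usually be non-small
```

The permutation loop is embarrassingly parallel, which is what makes the parallelized runtime comparison in the abstract possible.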