Variable grouping in multivariate time series via correlation
The decomposition of high-dimensional multivariate time series (MTS) into a number of low-dimensional MTS is a useful but challenging task because the number of possible dependencies between variables is likely to be huge. This paper presents a systematic study of the “variable groupings” problem in MTS. In particular, we investigate different methods of utilizing the information regarding correlations among MTS variables. This type of method does not appear to have been studied before. In all, 15 methods are suggested and applied to six datasets where there are identifiable mixed groupings of MTS variables. This paper describes the general methodology, reports extensive experimental results, and concludes with useful insights on the strengths and weaknesses of this type of grouping method.
How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition
Data competitions rely on real-time leaderboards to rank competitor entries
and stimulate algorithm improvement. While such competitions have become quite
popular and prevalent, particularly in supervised learning formats, their
implementations by the host are highly variable. Without careful planning, a
supervised learning competition is vulnerable to overfitting, where the winning
solutions are so closely tuned to the particular set of provided data that they
cannot generalize to the underlying problem of interest to the host. This paper
outlines some important considerations for strategically designing relevant and
informative data sets to maximize the learning outcome from hosting a
competition based on our experience. It also describes a post-competition
analysis that enables robust and efficient assessment of the strengths and
weaknesses of solutions from different competitors, as well as greater
understanding of the regions of the input space that are well-solved. The
post-competition analysis, which complements the leaderboard, uses exploratory
data analysis and generalized linear models (GLMs). The GLMs not only expand
the range of results we can explore, they also provide more detailed analysis
of individual sub-questions including similarities and differences between
algorithms across different types of scenarios, universally easy or hard
regions of the input space, and different learning objectives. When coupled
with a strategically planned data generation approach, the methods provide
richer and more informative summaries to enhance the interpretation of results
beyond just the rankings on the leaderboard. The methods are illustrated with a
recently completed competition to evaluate algorithms capable of detecting,
identifying, and locating radioactive materials in an urban environment. Comment: 36 pages
Cross-modal Recurrent Models for Weight Objective Prediction from Multimodal Time-series Data
We analyse multimodal time-series data corresponding to weight, sleep and
steps measurements. We focus on predicting whether a user will successfully
achieve his/her weight objective. For this, we design several deep long
short-term memory (LSTM) architectures, including a novel cross-modal LSTM
(X-LSTM), and demonstrate their superiority over baseline approaches. The
X-LSTM improves parameter efficiency by processing each modality separately and
allowing for information flow between them by way of recurrent
cross-connections. We present a general hyperparameter optimisation technique
for X-LSTMs, which allows us to significantly improve on the LSTM and a prior
state-of-the-art cross-modal approach, using a comparable number of parameters.
Finally, we visualise the model's predictions, revealing implications about
latent variables in this task. Comment: To appear in NIPS ML4H 2017 and NIPS TSW 201
Decision Making for Rapid Information Acquisition in the Reconnaissance of Random Fields
Research into several aspects of robot-enabled reconnaissance of random
fields is reported. The work has two major components: the underlying theory of
information acquisition in the exploration of unknown fields and the results of
experiments on how humans use sensor-equipped robots to perform a simulated
reconnaissance exercise.
The theoretical framework reported herein extends work on robotic exploration
that has been reported by ourselves and others. Several new figures of merit
for evaluating exploration strategies are proposed and compared. Using concepts
from differential topology and information theory, we develop the theoretical
foundation of search strategies aimed at rapid discovery of topological
features (locations of critical points and critical level sets) of a priori
unknown differentiable random fields. The theory enables study of efficient
reconnaissance strategies in which the tradeoff between speed and accuracy can
be understood. The proposed approach to rapid discovery of topological features
has led in a natural way to the creation of parsimonious reconnaissance
routines that do not rely on any prior knowledge of the environment. The design
of topology-guided search protocols uses a mathematical framework that
quantifies the relationship between what is discovered and what remains to be
discovered. The quantification rests on an information-theory-inspired model
whose properties allow us to treat search as a problem in optimal information
acquisition. A central theme in this approach is that "conservative" and
"aggressive" search strategies can be precisely defined, and search decisions
regarding "exploration" vs. "exploitation" choices are informed by the rate at
which the information metric is changing. Comment: 34 pages, 20 figures
SkILL - a Stochastic Inductive Logic Learner
Probabilistic Inductive Logic Programming (PILP) is a relatively unexplored
area of Statistical Relational Learning which extends classic Inductive Logic
Programming (ILP). This work introduces SkILL, a Stochastic Inductive Logic
Learner, which takes probabilistic annotated data and produces First Order
Logic theories. Data in several domains such as medicine and bioinformatics
have an inherent degree of uncertainty, that can be used to produce models
closer to reality. SkILL can not only use this type of probabilistic data to
extract non-trivial knowledge from databases, but it also addresses
efficiency issues by introducing a novel, efficient and effective search
strategy to guide the search in PILP environments. The capabilities of SkILL
are demonstrated in three different datasets: (i) a synthetic toy example
used to validate the system, (ii) a probabilistic adaptation of a well-known
biological metabolism application, and (iii) a real world medical dataset in
the breast cancer domain. Results show that SkILL can perform as well as a
deterministic ILP learner, while also being able to incorporate probabilistic
knowledge that would otherwise not be considered
Generative Adversarial Networks for Financial Trading Strategies Fine-Tuning and Combination
Systematic trading strategies are algorithmic procedures that allocate assets
aiming to optimize a certain performance criterion. To obtain an edge in a
highly competitive environment, the analyst needs to properly fine-tune their
strategy, or discover how to combine weak signals in novel alpha-creating
ways. Both aspects, namely fine-tuning and combination, have been
manners. Both aspects, namely fine-tuning and combination, have been
extensively researched using several methods, but emerging techniques such as
Generative Adversarial Networks can have an impact on these aspects.
Therefore, our work proposes the use of Conditional Generative Adversarial
Networks (cGANs) for trading strategies calibration and aggregation. To this
purpose, we provide a full methodology on: (i) the training and selection of a
cGAN for time series data; (ii) how each sample is used for strategies
calibration; and (iii) how all generated samples can be used for ensemble
modelling. To provide evidence that our approach is well grounded, we have
designed an experiment with multiple trading strategies, encompassing 579
assets. We compared cGAN with an ensemble scheme and model validation methods,
both suited for time series. Our results suggest that cGANs are a suitable
alternative for strategies calibration and combination, providing
outperformance when the traditional techniques fail to generate any alpha