Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis
One of the main challenges in the field of embodied artificial intelligence
is the open-ended autonomous learning of complex behaviours. Our approach is to
use task-independent, information-driven intrinsic motivation(s) to support
task-dependent learning. The work presented here is a preliminary step in which
we investigate the predictive information (the mutual information of the past
and future of the sensor stream) as an intrinsic drive, ideally supporting any
kind of task acquisition. Previous experiments have shown that the predictive
information (PI) is a good candidate to support autonomous, open-ended learning
of complex behaviours, because a maximisation of the PI corresponds to an
exploration of morphology- and environment-dependent behavioural regularities.
The idea is that these regularities can then be exploited in order to solve any
given task. Three different experiments are presented and their results lead to
the conclusion that the linear combination of the one-step PI with an external
reward function is not generally recommended in an episodic policy gradient
setting. Only for hard tasks can a great speed-up be achieved, at the cost of a
loss in asymptotic performance.
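As an illustration of the quantities named in this abstract, the following is a minimal sketch of a plug-in estimate of the one-step predictive information, I(S_t; S_{t+1}), for a discretised sensor stream, together with a linear combination with an external reward. The function names, the discretisation, and the `weight` parameter are assumptions made here for illustration; the abstract does not specify the estimator or the weighting used in the paper.

```python
import math
from collections import Counter


def one_step_predictive_information(symbols):
    """Plug-in estimate (in bits) of I(S_t; S_{t+1}) for a discrete sequence.

    `symbols` is assumed to be an already discretised sensor stream.
    """
    pairs = list(zip(symbols[:-1], symbols[1:]))
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                    # counts of (s_t, s_{t+1})
    past = Counter(p for p, _ in pairs)       # counts of s_t
    future = Counter(f for _, f in pairs)     # counts of s_{t+1}
    mi = 0.0
    for (p, f), c in joint.items():
        # p(p, f) * log2( p(p, f) / (p(p) * p(f)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (past[p] * future[f]))
    return mi


def combined_return(external_reward, pi_estimate, weight):
    """Hypothetical linear combination of episode reward and PI estimate."""
    return external_reward + weight * pi_estimate


if __name__ == "__main__":
    alternating = [0, 1] * 4 + [0]   # next symbol fully determined by current one
    constant = [0] * 10              # nothing to predict
    print(one_step_predictive_information(alternating))
    print(one_step_predictive_information(constant))
    print(combined_return(2.0, one_step_predictive_information(alternating), 0.5))
```

A perfectly alternating binary stream carries one bit of one-step predictive information, while a constant stream carries none, which is why maximising PI favours behaviour that is both varied and predictable.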
Task Transfer by Preference-Based Cost Learning
The goal of task transfer in reinforcement learning is to migrate the action
policy of an agent from the source task to the target task. Given their
successes in robotic action planning, current methods mostly rely on two
requirements: exactly relevant expert demonstrations or an explicitly coded
cost function for the target task, both of which, however, are inconvenient to
obtain in practice. In this paper, we relax these two strong conditions by
developing a novel task transfer framework where the expert preference is
applied as guidance. In particular, we alternate between the following two
steps. First, experts apply pre-defined preference rules to select related
expert demonstrations for the target task. Second, based on the selection
result, we learn the target cost function and trajectory distribution
simultaneously via enhanced Adversarial MaxEnt IRL and generate more
trajectories from the learned target distribution for the next preference
selection. A theoretical analysis of the distribution learning and the
convergence of the proposed algorithm is provided. Extensive simulations on
several benchmarks have been conducted to further verify the effectiveness
of the proposed method.
Comment: Accepted to AAAI 2019. Mingxuan Jing and Xiaojian Ma contributed
equally to this work.
Learning in Feedforward Neural Networks Accelerated by Transfer Entropy
Current neural network architectures are often harder to train because of the increasing size and complexity of the datasets used. Our objective is to design more efficient training algorithms utilizing causal relationships inferred from neural networks. The transfer entropy (TE) was initially introduced as an information transfer measure used to quantify the statistical coherence between events (time series). Later, it was related to causality, even though the two are not the same. There are only a few papers reporting applications of causality or TE in neural networks. Our contribution is an information-theoretical method for analyzing information transfer between the nodes of feedforward neural networks. The information transfer is measured by the TE of feedback neural connections. Intuitively, TE measures the relevance of a connection in the network and the feedback amplifies this connection. We introduce a backpropagation-type training algorithm that uses TE feedback connections to improve its performance.
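To make the transfer entropy mentioned in this abstract concrete, here is a minimal sketch of the standard plug-in estimate of TE from a source series X to a target series Y with history length 1, TE = sum over (y', y, x) of p(y', y, x) * log2( p(y' | y, x) / p(y' | y) ). It assumes both series are already discretised; the function name is hypothetical, and this is not the paper's training algorithm, which the abstract does not describe in detail.

```python
import math
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of TE from `source` to `target`, history 1.

    Both inputs are assumed to be equal-length sequences of discrete symbols.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    # Each triple is (y_{t+1}, y_t, x_t).
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    if n == 0:
        return 0.0
    c_full = Counter(triples)                              # (y_next, y, x)
    c_yx = Counter((y, x) for _, y, x in triples)          # (y, x)
    c_next_y = Counter((yn, y) for yn, y, _ in triples)    # (y_next, y)
    c_y = Counter(y for _, y, _ in triples)                # y
    te = 0.0
    for (yn, y, x), c in c_full.items():
        cond_with_source = c / c_yx[(y, x)]            # p(y_next | y, x)
        cond_without = c_next_y[(yn, y)] / c_y[y]      # p(y_next | y)
        te += (c / n) * math.log2(cond_with_source / cond_without)
    return te


if __name__ == "__main__":
    x = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
    y = [0] + x[:-1]          # y copies x with a one-step delay
    print(transfer_entropy(x, y))         # positive: x helps predict y
    print(transfer_entropy(x, [0] * 16))  # zero: a constant target needs no help
```

When the target simply copies the source with a delay, knowing the source removes all remaining uncertainty about the target's next value, so the estimate is positive; for a constant target it is zero.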
Counterexample-Guided Data Augmentation
We present a novel framework for augmenting data sets for machine learning
based on counterexamples. Counterexamples are misclassified examples that have
important properties for retraining and improving the model. Key components of
our framework include a counterexample generator, which produces data items
that are misclassified by the model, and error tables, a novel data structure
that stores information pertaining to misclassifications. Error tables can be
used to explain the model's vulnerabilities and to efficiently
generate counterexamples for augmentation. We show the efficacy of the proposed
framework by comparing it to classical augmentation techniques on a case study
of object detection in autonomous driving based on deep neural networks.
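The two components described in this abstract can be pictured with a toy sketch: a generator samples scene parameters, a stand-in model is queried, and every misclassified sample is recorded as a row of an error table that can then be summarised. Everything here is hypothetical (the feature names `brightness` and `n_cars`, the toy model, and the helper functions); the abstract does not specify the paper's actual generator or table layout.

```python
import random
from collections import Counter


def toy_model_is_correct(sample):
    """Hypothetical stand-in for a detector: it fails on dark scenes."""
    return sample["brightness"] >= 0.3


def generate_counterexamples(model_is_correct, n_samples, seed=0):
    """Sample scene parameters and keep the misclassified ones as table rows."""
    rng = random.Random(seed)
    error_table = []
    for _ in range(n_samples):
        sample = {
            "brightness": round(rng.random(), 1),  # assumed scene feature
            "n_cars": rng.randint(1, 3),           # assumed scene feature
        }
        if not model_is_correct(sample):
            error_table.append(sample)
    return error_table


def value_counts(error_table, feature):
    """Summarise which values of a feature occur among the recorded errors."""
    return Counter(row[feature] for row in error_table).most_common()


if __name__ == "__main__":
    table = generate_counterexamples(toy_model_is_correct, n_samples=200)
    print(len(table), "counterexamples recorded")
    print(value_counts(table, "brightness"))
```

In this toy setting the summary shows that every recorded error has low brightness, which is the kind of explanation of a model's weak spots that could then guide where to sample further augmentation data.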