Framing Lifelong Learning as Autonomous Deployment: Tune Once Live Forever
Lifelong Learning in the context of Artificial Intelligence is a new paradigm that is still in its infancy. It refers to agents that are able to learn continuously, accumulating the knowledge learned in previous tasks and using it to help future learning. In this position paper we depart from the focus on learning new tasks and instead take a stance from the perspective of the life-cycle of intelligent software. We propose to focus lifelong learning research on autonomous intelligent systems that sustain their performance after deployment in production across time without the need for machine learning experts. This perspective is being applied to three European projects funded under the CHIST-ERA framework across several application domains.
Self-organization of action hierarchy and compositionality by reinforcement learning with recurrent neural networks
Recurrent neural networks (RNNs) for reinforcement learning (RL) have shown
distinct advantages, e.g., solving memory-dependent tasks and meta-learning.
However, little effort has been spent on improving RNN architectures and on
understanding the underlying neural mechanisms for performance gain. In this
paper, we propose a novel, multiple-timescale, stochastic RNN for RL. Empirical
results show that the network can autonomously learn to abstract sub-goals and
can self-develop an action hierarchy using internal dynamics in a challenging
continuous control task. Furthermore, we show that the self-developed
compositionality of the network enables faster re-learning when adapting to a
new task that is a re-composition of previously learned sub-goals than when
starting from scratch. We also found that improved performance can be achieved
when neural activities are subject to stochastic rather than deterministic
dynamics.
Transfer Value Iteration Networks
Value iteration networks (VINs) have been demonstrated to have a good
generalization ability for reinforcement learning tasks across similar domains.
However, based on our experiments, a policy learned by VINs still fails to
generalize well to a domain whose action space and feature space are not
identical to those of the domain where it was trained. In this paper, we propose
a transfer learning approach on top of VINs, termed Transfer VINs (TVINs), such
that a learned policy from a source domain can be generalized to a target
domain with only limited training data, even if the source domain and the
target domain have domain-specific actions and features. We empirically verify
that our proposed TVINs outperform VINs when the source and the target domains
have similar but not identical action and feature spaces. Furthermore, we show
that the performance improvement is consistent across different environments,
maze sizes, dataset sizes, and different values of hyperparameters such
as the number of iterations and kernel size.
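As background for the "number of iterations" hyperparameter mentioned above, here is a minimal sketch of plain tabular value iteration, the planning computation that VINs approximate with a fixed number of recurrent steps. It is not the TVIN method: the 1-D corridor, the reward of 1 for entering the goal, the discount factor, and the function name `value_iteration` are all assumptions made for illustration.

```python
GAMMA = 0.9  # assumed discount factor


def value_iteration(n_cells, goal, k):
    """Run k synchronous Bellman backups on a 1-D corridor (moves: left, right)."""
    values = [0.0] * n_cells
    for _ in range(k):
        new_values = []
        for s in range(n_cells):
            if s == goal:
                new_values.append(0.0)  # terminal state keeps value 0
                continue
            # Neighbouring cells; walls keep the agent in place.
            neighbours = (max(0, s - 1), min(n_cells - 1, s + 1))
            # Assumed reward: 1 for stepping into the goal, 0 otherwise.
            new_values.append(max(
                (1.0 if nxt == goal else 0.0) + GAMMA * values[nxt]
                for nxt in neighbours
            ))
        values = new_values
    return values


shallow = value_iteration(n_cells=6, goal=5, k=2)
deep = value_iteration(n_cells=6, goal=5, k=5)
```

In this toy setting each backup spreads reward information one cell further from the goal, so `shallow` still assigns zero value to the far end of the corridor while `deep` does not. This is one reason the iteration count would plausibly need to grow with maze size.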
Human-Inspired Framework to Accelerate Reinforcement Learning
While deep reinforcement learning (RL) is becoming an integral part of good
decision-making in data science, it is still plagued with sample inefficiency.
This can be challenging when applying deep-RL in real-world environments where
physical interactions are expensive and can risk system safety. To improve the
sample efficiency of RL algorithms, this paper proposes a novel human-inspired
framework that facilitates fast exploration and learning for difficult RL
tasks. The main idea is to first provide the learning agent with simpler but
similar tasks that gradually grow in difficulty and progress toward the main
task. The proposed method requires no pre-training phase. Specifically, each
simpler task is learned for only one iteration. The generated
knowledge can be used by any transfer learning method, including value transfer
and policy transfer, to reduce the sample complexity without adding to the
computational complexity. It can therefore be applied to any goal, environment,
and reinforcement learning algorithm: value-based and policy-based methods,
tabular and deep-RL methods alike. We have evaluated our
proposed framework both on a simple Random Walk, for illustration purposes, and
on more challenging optimal control problems with constraints. The experiments
show that the proposed framework improves the sample
efficiency of RL algorithms, especially when the main task is
difficult.
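The idea of learning simpler tasks first and transferring values can be sketched with tabular Q-learning on a small chain. This is an illustrative toy under stated assumptions, not the authors' algorithm: the chain environment, the choice of "start closer to the goal" as the notion of a simpler task, the reading of "one iteration" as one episode, and all names and hyperparameters are hypothetical.

```python
import random

N_STATES = 8                # assumed chain of states 0..7
GOAL = N_STATES - 1         # reaching state 7 ends the episode
LEFT, RIGHT = 0, 1
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # illustrative hyperparameters


def step(state, action):
    """Move one cell left or right; reward 1 only on reaching the goal."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    return nxt, (1.0 if nxt == GOAL else 0.0)


def choose(q, state, rng):
    """Epsilon-greedy action selection with random tie-breaking."""
    if rng.random() < EPSILON or q[state][LEFT] == q[state][RIGHT]:
        return rng.choice((LEFT, RIGHT))
    return LEFT if q[state][LEFT] > q[state][RIGHT] else RIGHT


def run_episode(q, start, rng, max_steps=10_000):
    """Run one Q-learning episode from `start`, updating `q` in place."""
    state, steps = start, 0
    while state != GOAL and steps < max_steps:
        action = choose(q, state, rng)
        nxt, reward = step(state, action)
        target = reward + GAMMA * max(q[nxt])
        q[state][action] += ALPHA * (target - q[state][action])
        state, steps = nxt, steps + 1
    return steps


def curriculum(rng):
    """Value transfer: one shared Q-table is carried through tasks of growing difficulty."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    # Easiest task starts next to the goal; the main task starts at state 0.
    for start in range(GOAL - 1, -1, -1):
        run_episode(q, start, rng)  # a single episode per task (assumption)
    return q


q_table = curriculum(random.Random(0))
```

Because each task reuses the Q-table left by the previous one, the agent already has a value estimate for the states between its new start and the goal. In this toy, that is where any saving in exploration would come from.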