Multi-Source Transfer Learning for Deep Model-Based Reinforcement Learning
A crucial challenge in reinforcement learning is to reduce the number of
interactions with the environment that an agent requires to master a given
task. Transfer learning proposes to address this issue by re-using knowledge
from previously learned tasks. However, determining which source task is best
suited for knowledge extraction, and choosing which algorithm components to
transfer, are severe obstacles to its application in reinforcement learning.
The goal of this paper is to alleviate these issues
with modular multi-source transfer learning techniques. Our proposed
methodologies automatically learn how to extract useful information from source
tasks, regardless of the difference in state-action space and reward function.
We support our claims with extensive and challenging cross-domain experiments
for visual control.

Comment: 15 pages, 6 figures, 8 tables. arXiv admin note: text overlap with arXiv:2108.0652
Sample Efficient On-line Learning of Optimal Dialogue Policies with Kalman Temporal Differences
Designing dialogue policies for voice-enabled interfaces is a tailoring job that is most often left to natural language processing experts. This job is generally redone for every new dialogue task because cross-domain transfer is not possible. For this reason, machine learning methods for dialogue policy optimization have been investigated over the last 15 years. In particular, reinforcement learning (RL) is now part of the state of the art in this domain. Standard RL methods require testing more or less random changes to the policy on users to assess them as improvements or degradations; this is called on-policy learning. However, it can result in system behaviors that are not acceptable to users. Learning algorithms should ideally infer an optimal strategy by observing interactions generated by a non-optimal but acceptable strategy, that is, learning off-policy. In this contribution, a sample-efficient, online and off-policy reinforcement learning algorithm is proposed to learn an optimal policy from a few hundred dialogues generated with a very simple handcrafted policy.
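The off-policy setting the abstract describes can be illustrated with a minimal sketch: batch Q-learning over transitions logged by a fixed handcrafted policy. The states, actions, and rewards below are toy stand-ins, not the paper's Kalman Temporal Differences machinery.

```python
import numpy as np

# Off-policy batch Q-learning from logged dialogue transitions.
# The log was produced by a non-optimal behavior policy, yet repeated
# TD(0) sweeps recover value estimates for the optimal policy.
def batch_q_learning(transitions, n_states, n_actions,
                     alpha=0.1, gamma=0.95, sweeps=200):
    Q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for s, a, r, s_next, done in transitions:
            target = r if done else r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])  # TD(0) update
    return Q

# Toy log: in state 0, action 1 reaches a rewarded terminal state.
log = [(0, 1, 1.0, 0, True), (0, 0, 0.0, 0, True)]
Q = batch_q_learning(log, n_states=1, n_actions=2)
```

Because the update bootstraps from the greedy `max` rather than the logged next action, it evaluates a better policy than the one that generated the data, which is the essence of off-policy learning.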
Learning from Peers: Deep Transfer Reinforcement Learning for Joint Radio and Cache Resource Allocation in 5G RAN Slicing
Radio access network (RAN) slicing is an important pillar in cross-domain
network slicing which covers RAN, edge, transport and core slicing. The
evolving network architecture requires the orchestration of multiple network
resources such as radio and cache resources. In recent years, machine learning
(ML) techniques have been widely applied for network management. However, most
existing works do not take advantage of the knowledge transfer capability in
ML. In this paper, we propose a deep transfer reinforcement learning (DTRL)
scheme for joint radio and cache resource allocation to serve 5G RAN slicing.
We first define a hierarchical architecture for the joint resource allocation.
Then we propose two DTRL algorithms: Q-value-based deep transfer reinforcement
learning (QDTRL) and action selection-based deep transfer reinforcement
learning (ADTRL). In the proposed schemes, learner agents utilize expert
agents' knowledge to improve their performance on target tasks. The proposed
algorithms are compared with both the model-free exploration bonus deep
Q-learning (EB-DQN) and the model-based priority proportional fairness and
time-to-live (PPF-TTL) algorithms. Compared with EB-DQN, our proposed DTRL
based method presents 21.4% lower delay for Ultra Reliable Low Latency
Communications (URLLC) slice and 22.4% higher throughput for enhanced Mobile
Broad Band (eMBB) slice, while achieving significantly faster convergence than
EB-DQN. Moreover, 40.8% lower URLLC delay and 59.8% higher eMBB throughput are
observed with respect to PPF-TTL.

Comment: Under review at IEEE Transactions on Cognitive Communications and Networking
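The Q-value-based transfer idea can be sketched in a tabular toy (this is an assumption-laden illustration, not the paper's deep QDTRL implementation): the learner blends an expert agent's Q-values with its own through a transfer weight that decays as the learner gathers experience on the target task.

```python
import numpy as np

# Hypothetical Q-value transfer: early on, the expert's knowledge
# dominates action selection; later, the learner's own estimates take
# over. `decay` controls how quickly expert influence fades.
def transferred_q(q_learner, q_expert, step, decay=0.01):
    w = np.exp(-decay * step)          # expert influence fades over time
    return w * q_expert + (1.0 - w) * q_learner

q_expert = np.array([0.0, 1.0])        # expert prefers action 1
q_learner = np.array([0.5, 0.2])       # inexperienced learner disagrees
early = transferred_q(q_learner, q_expert, step=0)
late = transferred_q(q_learner, q_expert, step=10_000)
```

Early in training the blended values follow the expert (action 1); after many steps they follow the learner, so transfer accelerates the start without constraining the final policy.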
Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition
Speech emotion recognition (SER) enables computers to understand and then
engage with people in an emotionally intelligent way. However, SER performance
in cross-corpus and real-world live data feed scenarios leaves substantial room
for improvement. One shortcoming of existing SER methods is their inability to
adapt a trained model to a new domain. To address this challenge, researchers
have developed domain adaptation techniques that transfer knowledge learnt by a
model across domains. Although existing domain adaptation techniques have
improved cross-domain performance, they can be extended to handle real-world
live data feed situations in which a model self-tunes while deployed. In this
paper, we present a deep reinforcement learning-based
strategy (RL-DA) for adapting a pre-trained model to a real-world live data
feed setting while interacting with the environment and collecting continual
feedback. RL-DA is evaluated on SER tasks, including cross-corpus and
cross-language domain adaptation schemes. Evaluation results show that in a live
data feed setting, RL-DA outperforms a baseline strategy by 11% and 14% in
cross-corpus and cross-language scenarios, respectively.
Task-aware Adaptive Learning for Cross-domain Few-shot Learning
Although existing few-shot learning works yield promising results for in-domain queries, they still suffer from weak cross-domain generalization. Limited support data requires effective knowledge transfer, but domain shift makes this harder. Towards this emerging challenge, researchers improved adaptation by introducing task-specific parameters, which are directly optimized and estimated for each task. However, adding a fixed number of additional parameters fails to consider the diverse domain shifts between target tasks and the source domain, limiting efficacy. In this paper, we first observe the dependence of task-specific parameter configuration on the target task. Abundant task-specific parameters may over-fit, and insufficient task-specific parameters may result in under-adaptation; the optimal task-specific configuration varies for different test tasks. Based on these findings, we propose the Task-aware Adaptive Network (TA2-Net), which is trained by reinforcement learning to adaptively estimate the optimal task-specific parameter configuration for each test task. It learns, for example, that tasks with significant domain shift usually have a larger need for task-specific parameters for adaptation. We evaluate our model on Meta-Dataset. Empirical results show that our model outperforms existing state-of-the-art methods.
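The core decision problem, choosing a task-specific parameter configuration per task under reward feedback, can be sketched as a toy bandit (hypothetical names and reward model, not TA2-Net's actual architecture): an epsilon-greedy policy learns which adapter size maximizes validation accuracy.

```python
import random

# Candidate numbers of task-specific adapter parameters (hypothetical).
CONFIGS = [8, 64, 512]

def pick_config(value, eps=0.1):
    """Epsilon-greedy choice over configuration indices."""
    if random.random() < eps:
        return random.randrange(len(CONFIGS))
    return max(range(len(CONFIGS)), key=lambda i: value[i])

def update(value, counts, arm, reward):
    """Incremental running mean of observed reward per configuration."""
    counts[arm] += 1
    value[arm] += (reward - value[arm]) / counts[arm]

random.seed(0)
value, counts = [0.0] * len(CONFIGS), [0] * len(CONFIGS)
for arm in range(len(CONFIGS)):      # try each configuration once
    update(value, counts, arm, 1.0 if arm == 2 else 0.3)
for _ in range(500):                 # toy tasks with large domain shift:
    arm = pick_config(value)         # the largest configuration pays off
    update(value, counts, arm, 1.0 if arm == 2 else 0.3)
```

With a stationary toy reward that favors the largest configuration, the policy converges on it, mirroring the abstract's observation that large domain shifts call for more task-specific parameters.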
Grounding Language for Transfer in Deep Reinforcement Learning
In this paper, we explore the utilization of natural language to drive
transfer for reinforcement learning (RL). Despite the wide-spread application
of deep RL techniques, learning generalized policy representations that work
across domains remains a challenging problem. We demonstrate that textual
descriptions of environments provide a compact intermediate channel to
facilitate effective policy transfer. Specifically, by learning to ground the
meaning of text to the dynamics of the environment such as transitions and
rewards, an autonomous agent can effectively bootstrap policy learning on a new
domain given its description. We employ a model-based RL approach consisting of
a differentiable planning module, a model-free component and a factorized state
representation to effectively use entity descriptions. Our model outperforms
prior work on both transfer and multi-task scenarios in a variety of different
environments. For instance, we achieve up to 14% and 11.5% absolute improvement
over previously existing models in terms of average and initial rewards,
respectively.

Comment: JAIR 201