Multi-Source Transfer Learning for Deep Model-Based Reinforcement Learning
A crucial challenge in reinforcement learning is to reduce the number of
interactions with the environment that an agent requires to master a given
task. Transfer learning proposes to address this issue by re-using knowledge
from previously learned tasks. However, determining which source task is best
suited for knowledge extraction, and choosing which algorithm components to
transfer, are severe obstacles to its application in reinforcement learning.
The goal of this paper is to alleviate these issues
with modular multi-source transfer learning techniques. Our proposed
methodologies automatically learn how to extract useful information from source
tasks, regardless of the difference in state-action space and reward function.
We support our claims with extensive and challenging cross-domain experiments
for visual control.

Comment: 15 pages, 6 figures, 8 tables. arXiv admin note: text overlap with arXiv:2108.0652
Sample Efficient On-line Learning of Optimal Dialogue Policies with Kalman Temporal Differences
Designing dialogue policies for voice-enabled interfaces is a tailoring job that is most often left to natural language processing experts. This job is generally redone for every new dialogue task because cross-domain transfer is not possible. For this reason, machine learning methods for dialogue policy optimization have been investigated over the last 15 years. In particular, reinforcement learning (RL) is now part of the state of the art in this domain. Standard RL methods require testing more or less random changes to the policy on users to assess them as improvements or degradations; this is called on-policy learning. However, it can result in system behaviors that are not acceptable to users. Learning algorithms should ideally infer an optimal strategy by observing interactions generated by a non-optimal but acceptable strategy, that is, learning off-policy. In this contribution, a sample-efficient, online and off-policy reinforcement learning algorithm is proposed to learn an optimal policy from a few hundred dialogues generated with a very simple handcrafted policy.
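The off-policy setting the abstract describes can be illustrated with a minimal sketch: batch Q-learning over transitions logged by a fixed handcrafted policy. The states, actions, and rewards below are toy stand-ins, not the paper's Kalman Temporal Differences machinery.

```python
import numpy as np

# Off-policy batch Q-learning from logged dialogue transitions.
# The log was produced by a non-optimal behavior policy, yet repeated
# TD(0) sweeps recover value estimates for the optimal policy.
def batch_q_learning(transitions, n_states, n_actions,
                     alpha=0.1, gamma=0.95, sweeps=200):
    Q = np.zeros((n_states, n_actions))
    for _ in range(sweeps):
        for s, a, r, s_next, done in transitions:
            target = r if done else r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])  # TD(0) update
    return Q

# Toy log: in state 0, action 1 reaches a rewarded terminal state.
log = [(0, 1, 1.0, 0, True), (0, 0, 0.0, 0, True)]
Q = batch_q_learning(log, n_states=1, n_actions=2)
```

Because the update bootstraps from the greedy `max` rather than the logged next action, it evaluates a better policy than the one that generated the data, which is the essence of off-policy learning.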
Learning from Peers: Deep Transfer Reinforcement Learning for Joint Radio and Cache Resource Allocation in 5G RAN Slicing
Radio access network (RAN) slicing is an important pillar in cross-domain
network slicing which covers RAN, edge, transport and core slicing. The
evolving network architecture requires the orchestration of multiple network
resources such as radio and cache resources. In recent years, machine learning
(ML) techniques have been widely applied for network management. However, most
existing works do not take advantage of the knowledge transfer capability in
ML. In this paper, we propose a deep transfer reinforcement learning (DTRL)
scheme for joint radio and cache resource allocation to serve 5G RAN slicing.
We first define a hierarchical architecture for the joint resource allocation.
Then we propose two DTRL algorithms: Q-value-based deep transfer reinforcement
learning (QDTRL) and action selection-based deep transfer reinforcement
learning (ADTRL). In the proposed schemes, learner agents utilize expert
agents' knowledge to improve their performance on target tasks. The proposed
algorithms are compared with both the model-free exploration bonus deep
Q-learning (EB-DQN) and the model-based priority proportional fairness and
time-to-live (PPF-TTL) algorithms. Compared with EB-DQN, our proposed DTRL
based method presents 21.4% lower delay for Ultra Reliable Low Latency
Communications (URLLC) slice and 22.4% higher throughput for enhanced Mobile
Broad Band (eMBB) slice, while achieving significantly faster convergence than
EB-DQN. Moreover, 40.8% lower URLLC delay and 59.8% higher eMBB throughput are
observed with respect to PPF-TTL.

Comment: Under review at IEEE Transactions on Cognitive Communications and Networking
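The Q-value-based transfer idea can be sketched in a tabular toy (this is an assumption-laden illustration, not the paper's deep QDTRL implementation): the learner blends an expert agent's Q-values with its own through a transfer weight that decays as the learner gathers experience on the target task.

```python
import numpy as np

# Hypothetical Q-value transfer: early on, the expert's knowledge
# dominates action selection; later, the learner's own estimates take
# over. `decay` controls how quickly expert influence fades.
def transferred_q(q_learner, q_expert, step, decay=0.01):
    w = np.exp(-decay * step)          # expert influence fades over time
    return w * q_expert + (1.0 - w) * q_learner

q_expert = np.array([0.0, 1.0])        # expert prefers action 1
q_learner = np.array([0.5, 0.2])       # inexperienced learner disagrees
early = transferred_q(q_learner, q_expert, step=0)
late = transferred_q(q_learner, q_expert, step=10_000)
```

Early in training the blended values follow the expert (action 1); after many steps they follow the learner, so transfer accelerates the start without constraining the final policy.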
Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition
Speech emotion recognition (SER) enables computers to understand and then
engage with people in an emotionally intelligent way. However, SER performance
in cross-corpus and real-world live data feed scenarios leaves substantial room
for improvement. One shortcoming of existing SER methods is their inability to
adapt a trained model to a new domain. To address this challenge, researchers
have developed domain adaptation techniques that transfer knowledge learnt by a
model across domains. Although existing domain adaptation techniques have
improved cross-domain performance, they can be extended to handle real-world
live data feed situations in which a model self-tunes while deployed. In this
paper, we present a deep reinforcement learning-based
strategy (RL-DA) for adapting a pre-trained model to a real-world live data
feed setting while interacting with the environment and collecting continual
feedback. RL-DA is evaluated on SER tasks, including cross-corpus and
cross-language domain adaptation schemes. Evaluation results show that in a live
data feed setting, RL-DA outperforms a baseline strategy by 11% and 14% in
cross-corpus and cross-language scenarios, respectively.
Task-aware Adaptive Learning for Cross-domain Few-shot Learning
Although existing few-shot learning works yield promising results for in-domain queries, they still suffer from weak cross-domain generalization. Limited support data requires effective knowledge transfer, but domain shift makes this harder. Towards this emerging challenge, researchers improved adaptation by introducing task-specific parameters, which are directly optimized and estimated for each task. However, adding a fixed number of additional parameters fails to consider the diverse domain shifts between target tasks and the source domain, limiting efficacy. In this paper, we first observe the dependence of task-specific parameter configuration on the target task. Abundant task-specific parameters may over-fit, and insufficient task-specific parameters may result in under-adaptation; the optimal task-specific configuration varies for different test tasks. Based on these findings, we propose the Task-aware Adaptive Network (TA2-Net), which is trained by reinforcement learning to adaptively estimate the optimal task-specific parameter configuration for each test task. It learns, for example, that tasks with significant domain shift usually have a larger need for task-specific parameters for adaptation. We evaluate our model on Meta-Dataset. Empirical results show that our model outperforms existing state-of-the-art methods.
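The core decision problem, choosing a task-specific parameter configuration per task under reward feedback, can be sketched as a toy bandit (hypothetical names and reward model, not TA2-Net's actual architecture): an epsilon-greedy policy learns which adapter size maximizes validation accuracy.

```python
import random

# Candidate numbers of task-specific adapter parameters (hypothetical).
CONFIGS = [8, 64, 512]

def pick_config(value, eps=0.1):
    """Epsilon-greedy choice over configuration indices."""
    if random.random() < eps:
        return random.randrange(len(CONFIGS))
    return max(range(len(CONFIGS)), key=lambda i: value[i])

def update(value, counts, arm, reward):
    """Incremental running mean of observed reward per configuration."""
    counts[arm] += 1
    value[arm] += (reward - value[arm]) / counts[arm]

random.seed(0)
value, counts = [0.0] * len(CONFIGS), [0] * len(CONFIGS)
for arm in range(len(CONFIGS)):      # try each configuration once
    update(value, counts, arm, 1.0 if arm == 2 else 0.3)
for _ in range(500):                 # toy tasks with large domain shift:
    arm = pick_config(value)         # the largest configuration pays off
    update(value, counts, arm, 1.0 if arm == 2 else 0.3)
```

With a stationary toy reward that favors the largest configuration, the policy converges on it, mirroring the abstract's observation that large domain shifts call for more task-specific parameters.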
Grounding Language for Transfer in Deep Reinforcement Learning
In this paper, we explore the utilization of natural language to drive
transfer for reinforcement learning (RL). Despite the wide-spread application
of deep RL techniques, learning generalized policy representations that work
across domains remains a challenging problem. We demonstrate that textual
descriptions of environments provide a compact intermediate channel to
facilitate effective policy transfer. Specifically, by learning to ground the
meaning of text to the dynamics of the environment such as transitions and
rewards, an autonomous agent can effectively bootstrap policy learning on a new
domain given its description. We employ a model-based RL approach consisting of
a differentiable planning module, a model-free component and a factorized state
representation to effectively use entity descriptions. Our model outperforms
prior work on both transfer and multi-task scenarios in a variety of different
environments. For instance, we achieve up to 14% and 11.5% absolute improvement
over previously existing models in terms of average and initial rewards,
respectively.

Comment: JAIR 201