Learning Adaptive Display Exposure for Real-Time Advertising
In E-commerce advertising, where product recommendations and product ads are
presented to users simultaneously, the traditional setting is to display ads at
fixed positions. However, under such a setting, the advertising system loses
the flexibility to control the number and positions of ads, resulting in
sub-optimal platform revenue and user experience. Consequently, major
e-commerce platforms (e.g., Taobao.com) have begun to consider more flexible
ways to display ads. In this paper, we investigate the problem of advertising
with adaptive exposure: can we dynamically determine the number and positions
of ads for each user visit under certain business constraints so that the
platform revenue can be increased? More specifically, we consider two types of
constraints: a request-level constraint ensures the user experience for each user
visit, and a platform-level constraint controls the overall platform monetization
rate. We model this problem as a Constrained Markov Decision Process with
per-state constraint (psCMDP) and propose a constrained two-level reinforcement
learning approach to decompose the original problem into two relatively
independent sub-problems. To accelerate policy learning, we also devise a
constrained hindsight experience replay mechanism. Experimental evaluations on
industry-scale real-world datasets demonstrate both the merits of our approach
in obtaining higher revenue under the constraints and the effectiveness of the
constrained hindsight experience replay mechanism.
Comment: accepted by CIKM 2019
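To make the constrained hindsight experience replay idea concrete, the sketch below shows one plausible form of it in Python. This is not the paper's implementation: the transition layout, the `achieved_level` heuristic, and the buffer API are all illustrative assumptions. The key move is relabeling a trajectory collected under one target constraint level with the level it actually achieved, so that off-target trajectories still provide valid training signal.

```python
import random
from collections import deque

class ConstrainedHER:
    """Minimal sketch of a constrained hindsight experience replay buffer
    (illustrative assumption, not the paper's implementation)."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store_episode(self, episode, target_level):
        # Store the original transitions, conditioned on the intended
        # constraint level (e.g., a target ad-exposure ratio).
        for (s, a, r, s_next) in episode:
            self.buffer.append((s, target_level, a, r, s_next))
        # Hindsight relabel: pretend the achieved constraint level was the
        # target all along, turning the episode into a valid sample for
        # that level.
        achieved = self.achieved_level(episode)
        for (s, a, r, s_next) in episode:
            self.buffer.append((s, achieved, a, r, s_next))

    def achieved_level(self, episode):
        # Assumed heuristic: the average per-step signal, standing in for
        # something like the fraction of slots actually filled with ads.
        return sum(r for (_, _, r, _) in episode) / max(len(episode), 1)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```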
Speeding up Reinforcement Learning with Learned Models
In this master's thesis, we address two of the most prominent reinforcement learning problems: sparse rewards and sample efficiency. We combine model-based reinforcement learning, Hindsight Experience Replay, and off-policy methods to tackle both.
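That combination can be sketched in a few lines. The Python fragment below is a hedged illustration rather than the thesis's code: `model.predict` and `policy.act` stand in for a learned dynamics model and an off-policy goal-conditioned actor, both assumed interfaces. Imagined rollouts from the model are relabeled with the state they actually reached, so even a sparse-reward task yields successful training trajectories.

```python
import numpy as np

def imagined_her_transitions(model, policy, start_state, goal, horizon=10):
    """Roll out a learned model and relabel the result HER-style.
    Illustrative sketch; `model` and `policy` are assumed interfaces."""
    transitions, state = [], start_state
    for _ in range(horizon):
        action = policy.act(state, goal)
        next_state = model.predict(state, action)  # learned dynamics model
        transitions.append((state, action, next_state))
        state = next_state
    # Hindsight: treat the final imagined state as the goal, so every
    # imagined rollout contains at least one rewarded transition.
    hindsight_goal = state
    return [
        (s, hindsight_goal, a, float(np.allclose(s2, hindsight_goal)), s2)
        for (s, a, s2) in transitions
    ]
```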
ETHER: Aligning Emergent Communication for Hindsight Experience Replay
Natural language instruction following is paramount to enabling collaboration
between artificial agents and human beings. Natural language-conditioned
reinforcement learning (RL) agents have shown how natural languages'
properties, such as compositionality, can provide a strong inductive bias to
learn complex policies. Previous architectures like HIGhER combine the benefit
of language-conditioning with Hindsight Experience Replay (HER) to deal with
sparse-reward environments. Yet, like HER, HIGhER relies on an oracle
predicate function to provide a feedback signal highlighting which linguistic
description is valid for which state. This reliance on an oracle limits its
application. Additionally, HIGhER only leverages the linguistic information
contained in successful RL trajectories, thus hurting its final performance and
data-efficiency. Without early successful trajectories, HIGhER is no better
than the DQN agent upon which it is built. In this paper, we propose the Emergent Textual
Hindsight Experience Replay (ETHER) agent, which builds on HIGhER and addresses
both of its limitations by means of (i) a discriminative visual referential
game, commonly studied in the subfield of Emergent Communication (EC), used
here as an unsupervised auxiliary task and (ii) a semantic grounding scheme to
align the emergent language with the natural language of the
instruction-following benchmark. We show that the referential game's agents
make an artificial language emerge that is aligned with the natural-like
language used to describe goals in the BabyAI benchmark and that it is
expressive enough to also describe unsuccessful RL trajectories and thus
provide feedback to the RL agent to leverage the linguistic, structured
information contained in all trajectories. Our work shows that EC is a viable
unsupervised auxiliary task for RL and provides missing pieces to make HER more
widely applicable.
Comment: work in progress
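As a rough illustration of how ETHER removes the oracle predicate, consider the sketch below. The names (`speaker.describe`, `grounder.to_natural_language`) are assumptions, not the paper's API: a speaker trained in the emergent-communication referential game describes the final state, a grounding module maps the emergent message to a natural-language instruction, and the failed trajectory is relabeled with that instruction, HER-style.

```python
def relabel_with_emergent_language(trajectory, speaker, grounder):
    """Relabel a (possibly failed) trajectory using a learned describer.
    Illustrative sketch; `speaker` and `grounder` are assumed interfaces
    standing in for the referential-game speaker and the semantic
    grounding scheme described in the abstract."""
    # Transitions are (state, instruction, action, reward, next_state).
    final_state = trajectory[-1][-1]
    emergent_msg = speaker.describe(final_state)              # EC speaker
    instruction = grounder.to_natural_language(emergent_msg)  # grounding
    # Every transition gets the generated instruction; only the terminal
    # step is rewarded, so unsuccessful episodes still yield signal.
    return [
        (s, instruction, a, 1.0 if i == len(trajectory) - 1 else 0.0, s2)
        for i, (s, _, a, _, s2) in enumerate(trajectory)
    ]
```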