
    Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning

    Reinforcement learning (RL) requires either manually specifying a reward function, which is often infeasible, or learning a reward model from a large amount of human feedback, which is often very expensive. We study a more sample-efficient alternative: using pretrained vision-language models (VLMs) as zero-shot reward models (RMs) to specify tasks via natural language. We propose a natural and general approach to using VLMs as reward models, which we call VLM-RMs. We use VLM-RMs based on CLIP to train a MuJoCo humanoid to learn complex tasks without a manually specified reward function, such as kneeling, doing the splits, and sitting in a lotus position. For each of these tasks, we provide only a single-sentence text prompt describing the desired task, with minimal prompt engineering. We provide videos of the trained agents at: https://sites.google.com/view/vlm-rm. We can improve performance by providing a second "baseline" prompt and projecting out parts of the CLIP embedding space that are irrelevant to distinguishing between goal and baseline. Further, we find a strong scaling effect for VLM-RMs: larger VLMs trained with more compute and data are better reward models. The failure modes of VLM-RMs we encountered all relate to known capability limitations of current VLMs, such as limited spatial reasoning ability or visually unrealistic environments that are far off-distribution for the VLM. We find that VLM-RMs are remarkably robust as long as the VLM is large enough. This suggests that future VLMs will become increasingly useful reward models for a wide range of RL applications.
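    The core recipe is simple enough to sketch: embed the rendered observation and the task prompt with CLIP, and score the state by their cosine similarity. The snippet below is a minimal illustration, assuming the open-source `clip` package and PyTorch; the model name, prompts, and `alpha` parameter are illustrative, and the projection step is a simplified rendition of the goal-baseline idea described in the abstract, not the paper's exact formulation.

```python
# Minimal sketch of a CLIP-based zero-shot reward model (VLM-RM).
# Assumes the open-source `clip` package (github.com/openai/CLIP) and torch;
# model choice, prompts, and alpha are illustrative, not the paper's settings.
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)

goal = clip.tokenize(["a humanoid robot kneeling"]).to(device)
baseline = clip.tokenize(["a humanoid robot"]).to(device)

with torch.no_grad():
    g = model.encode_text(goal).float()
    b = model.encode_text(baseline).float()
g, b = g / g.norm(), b / b.norm()

def reward(frame, alpha=0.5):
    """Score a rendered frame (a PIL image) against the goal prompt.

    Cosine similarity in CLIP space, after partially projecting the state
    embedding onto the goal-baseline direction so that variation irrelevant
    to the task contributes less to the reward (goal-baseline idea).
    """
    image = preprocess(frame).unsqueeze(0).to(device)
    with torch.no_grad():
        s = model.encode_image(image).float()
    s = s / s.norm()
    d = (g - b) / (g - b).norm()        # direction separating goal from baseline
    proj = (s @ d.T) * d                # component of the state along that direction
    s = alpha * proj + (1 - alpha) * s  # alpha=1 keeps only the task-relevant component
    return (s / s.norm() @ g.T).item()
```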

    Picking the low-hanging fruit: testing new physics at scale with active learning

    Since the discovery of the Higgs boson, testing the many possible extensions to the Standard Model has become a key challenge in particle physics. This paper discusses a new method for predicting the compatibility of new-physics theories with existing experimental data from particle colliders. Using machine learning, the technique achieves results comparable to previous methods (>90% precision and recall) with only a fraction of their computing resources (<10%). This makes it possible to test models that were previously impossible to probe, and allows for large-scale testing of new physics theories.
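    The compute saving rests on an active-learning loop: train a cheap classifier on a few expensively labeled parameter points, then spend the remaining labeling budget only on the points the classifier is least sure about. Below is a minimal sketch assuming scikit-learn; `run_expensive_scan`, the toy parameter space, and the uncertainty-sampling rule are hypothetical stand-ins for the actual collider-limit computation.

```python
# Illustrative active-learning loop for classifying new-physics parameter
# points as allowed or excluded. run_expensive_scan is a hypothetical,
# deliberately cheap stand-in for the costly collider-limit evaluation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
pool = rng.uniform(-1, 1, size=(100_000, 5))  # unlabeled parameter points

def run_expensive_scan(points):
    """Hypothetical oracle: True if a point is excluded by the data."""
    return (points ** 2).sum(axis=1) > 2.0    # toy decision boundary

# Seed with a small random labeled set, then iterate.
idx = rng.choice(len(pool), 100, replace=False)
X, y = pool[idx], run_expensive_scan(pool[idx])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
for _ in range(10):
    clf.fit(X, y)
    # Query the pool points the classifier is least certain about
    # (predicted probability closest to 0.5) and label only those.
    proba = clf.predict_proba(pool)[:, 1]
    query = np.argsort(np.abs(proba - 0.5))[:100]
    X = np.vstack([X, pool[query]])
    y = np.concatenate([y, run_expensive_scan(pool[query])])
```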