639 research outputs found

    Reinforcement Learning of Action and Query Policies with LTL Instructions under Uncertain Event Detector

    Reinforcement learning (RL) with linear temporal logic (LTL) objectives can allow robots to carry out symbolic event plans in unknown environments. Most existing methods assume that the event detector can accurately map environmental states to symbolic events; however, uncertainty is inevitable for real-world event detectors. Such uncertainty in an event detector generates multiple branching possibilities for the LTL instruction, confusing action decisions. Moreover, the queries to the uncertain event detector that are necessary for task progress may increase the uncertainty further. To cope with these issues, we propose an RL framework, Learning Action and Query over Belief LTL (LAQBL), which learns an agent that can consider the diversity of LTL instructions arising from uncertain event detection while avoiding task failure caused by unnecessary event-detection queries. Our framework simultaneously learns 1) an embedding of the belief LTL, i.e., the multiple branching possibilities over LTL instructions, using a graph neural network, 2) an action policy, and 3) a query policy that decides whether or not to query the event detector. Simulations in a 2D grid world and in image-input robotic inspection environments show that our method successfully learns actions that follow LTL instructions even with uncertain event detectors. Comment: 8 pages, accepted by Robotics and Automation Letters (RA-L).
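
    The abstract does not include implementation detail, but the core idea it describes (maintaining a belief over the possible progressions of an LTL instruction when event detection is noisy, and deciding when a further detector query is worthwhile) can be sketched in a few lines. The sketch below is a minimal illustration, not the authors' LAQBL implementation: the toy automaton, the detector accuracy, and the entropy-based query rule are all assumptions made for this example.

```python
# Minimal sketch, not the authors' LAQBL implementation: track a belief over the
# progression states of an LTL task when event detection is noisy, and decide
# whether to query the detector again based on how spread out the belief is.
# The task automaton, detector accuracy, and confidence level are illustrative.
import math

# Toy automaton for the LTL task "eventually a, and then eventually b":
# state 0 -(a)-> state 1 -(b)-> state 2 (accepting).
TRANSITIONS = {0: {"a": 1, "b": 0, "none": 0},
               1: {"a": 1, "b": 2, "none": 1},
               2: {"a": 2, "b": 2, "none": 2}}
EVENTS = ["a", "b", "none"]
DETECTOR_ACCURACY = 0.8  # assumed probability that the detector reports the true event


def update_belief(belief, detected_event):
    """Bayes-update the distribution over automaton states given a noisy detection."""
    new_belief = {s: 0.0 for s in belief}
    for state, prob in belief.items():
        for true_event in EVENTS:
            likelihood = (DETECTOR_ACCURACY if true_event == detected_event
                          else (1.0 - DETECTOR_ACCURACY) / (len(EVENTS) - 1))
            # Uniform prior over which event actually occurred (illustrative choice).
            new_belief[TRANSITIONS[state][true_event]] += prob * likelihood / len(EVENTS)
    total = sum(new_belief.values())
    return {s: p / total for s, p in new_belief.items()}


def should_query(belief, confidence=0.9):
    """Ask for another detection only while the belief is not yet concentrated."""
    entropy = -sum(p * math.log(p) for p in belief.values() if p > 0)
    return entropy / math.log(len(belief)) > 1.0 - confidence


belief = {0: 1.0, 1: 0.0, 2: 0.0}
for detection in ["a", "a", "b"]:
    belief = update_belief(belief, detection)
    print(detection, {s: round(p, 3) for s, p in belief.items()},
          "query again:", should_query(belief))
```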

    Noisy Symbolic Abstractions for Deep RL: A case study with Reward Machines

    Natural and formal languages provide an effective mechanism for humans to specify instructions and reward functions. We investigate how to generate policies via RL when reward functions are specified in a symbolic language captured by Reward Machines, an increasingly popular automaton-inspired structure. We are interested in the case where the mapping of environment state to the symbolic (here, Reward Machine) vocabulary -- commonly known as the labelling function -- is uncertain from the agent's perspective. We formulate the problem of policy learning in Reward Machines with noisy symbolic abstractions as a special class of POMDP optimization problem and investigate several methods to address it, building on existing and new techniques, the latter focused on predicting the Reward Machine state rather than on grounding individual symbols. We analyze these methods and evaluate them experimentally under varying degrees of uncertainty in the correct interpretation of the symbolic vocabulary. We verify the strength of our approach and the limitations of existing methods via an empirical investigation on both illustrative toy domains and partially observable deep RL domains. Comment: NeurIPS Deep Reinforcement Learning Workshop 202
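
    As a rough illustration of the setting rather than the paper's method, the sketch below pairs a toy Reward Machine with a noisy labelling function that returns a distribution over propositions, and propagates a belief over Reward Machine states; such a belief is the kind of quantity that could be given to a policy in place of a single grounded symbol. The propositions, transitions, and noise level are invented for the example.

```python
# Illustrative sketch only, not the paper's implementation: a tiny Reward Machine,
# a noisy labelling function that returns a distribution over propositions, and a
# belief over Reward Machine states propagated through that distribution. In a deep
# RL agent this belief vector could be concatenated with the environment observation.
import numpy as np

class RewardMachine:
    """Toy Reward Machine: first observe 'coffee', then 'office', for reward 1."""
    def __init__(self):
        self.n_states = 3  # 0: need coffee, 1: need office, 2: terminal
        self.transition = {(0, "coffee"): 1, (1, "office"): 2}
        self.reward = {(1, "office"): 1.0}

    def step(self, u, prop):
        """Deterministic RM transition used when the true proposition is known."""
        return self.transition.get((u, prop), u), self.reward.get((u, prop), 0.0)


def noisy_labelling(true_prop, props=("coffee", "office", "empty"), accuracy=0.7):
    """Return a distribution over propositions rather than a single ground-truth label."""
    dist = {p: (1.0 - accuracy) / (len(props) - 1) for p in props}
    dist[true_prop] = accuracy
    return dist


def propagate_belief(rm, belief, prop_dist):
    """Push the belief over RM states through every proposition the labeller might mean."""
    new_belief = np.zeros(rm.n_states)
    for u, b_u in enumerate(belief):
        for prop, p in prop_dist.items():
            new_belief[rm.transition.get((u, prop), u)] += b_u * p
    return new_belief


rm = RewardMachine()
belief = np.array([1.0, 0.0, 0.0])
for true_prop in ["empty", "coffee", "office"]:
    belief = propagate_belief(rm, belief, noisy_labelling(true_prop))
    print(true_prop, np.round(belief, 3))
```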

    A framework for simultaneous task allocation and planning under uncertainty

    We present novel techniques for simultaneous task allocation and planning in multi-robot systems operating under uncertainty. Because task allocation and planning are performed simultaneously, allocations are informed by individual robot behaviour, which creates more efficient team behaviour. We go beyond existing work by planning for task reallocation across the team, given a model of partial task satisfaction under potential robot failures and uncertain action outcomes. We model the problem using Markov decision processes, with tasks encoded in co-safe linear temporal logic, and optimise for the expected number of tasks completed by the team. To avoid the inherent complexity of joint models, we propose an alternative model that considers task allocation and planning together, but in a sequential fashion. We then build a joint policy from the sequential policy obtained from our model, thus allowing for concurrent policy execution. Furthermore, to enable adaptation in the case of robot failures, we consider replanning from failure states and propose an approach to preemptively replan in an anytime fashion, replanning for the more probable failure states first. Our method also allows us to quantify the performance of the team by providing an analysis of properties such as the expected number of completed tasks under concurrent policy execution. We implement and extensively evaluate our approach on a range of scenarios. We compare its performance to a state-of-the-art baseline in decoupled task allocation and planning: sequential single-item auctions. Our approach outperforms the baseline in terms of computation time and the number of times replanning is required upon robot failure.
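
    One concrete ingredient of the approach described above is preemptive, anytime replanning that handles the most probable failure states first. The sketch below shows that prioritisation with a priority queue under illustrative assumptions: the failure states, their reachability probabilities, and the replan_from stand-in are invented for the example and do not come from the paper.

```python
# Sketch of the preemptive, anytime replanning idea under illustrative assumptions:
# failure states and their reachability probabilities are given, and replan_from()
# stands in for whatever planner produces a contingency policy. Not the authors' code.
import heapq
import itertools

def preemptive_replanning(failure_states, reach_prob, replan_from, budget):
    """Replan for the most probable failure states first until the budget runs out."""
    counter = itertools.count()  # tie-breaker so the heap never compares states
    heap = [(-reach_prob[s], next(counter), s) for s in failure_states]
    heapq.heapify(heap)  # max-heap on probability via negated keys
    contingency_plans = {}
    while heap and budget > 0:
        _, _, state = heapq.heappop(heap)
        contingency_plans[state] = replan_from(state)
        budget -= 1
    return contingency_plans

# Toy usage: three hypothetical failure states with different reachability probabilities.
probs = {"robot1_fails_at_A": 0.40, "robot2_fails_at_B": 0.10, "robot1_fails_at_C": 0.25}
plans = preemptive_replanning(probs.keys(), probs,
                              replan_from=lambda s: f"reallocation-policy-after-{s}",
                              budget=2)
print(plans)  # only the two most probable failure states receive contingency plans
```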

    The shoe industry of Marikina City, Philippines: a developing country cluster in crisis

    I initiate the discussion with a few general remarks on industrial clusters and commodity chains. I then describe the main features of the shoe industry in the Philippines, the core of which is located in Marikina City in the northeast of the Manila Metropolitan Area. I provide a detailed account of the internal structure and changing fortunes of this cluster and pinpoint its deeply rooted failures since the early 1990s. I show that these can be directly related to the liberalization of the Filipino economy and the concomitant influx of Chinese-made shoes into domestic markets. Various private and public responses to the crisis are described and evaluated. I argue that, as helpful as many of these responses may be, their overall impact is likely to remain limited. I enumerate a series of possible policy options, but I also emphasize the high risks of failure. In particular, I try to provide a developmental scenario based on cluster upgrading and intensified export activity. Keywords: shoe industry, industrial districts, regional development, clustering, agglomeration

    Grounding Complex Natural Language Commands for Temporal Tasks in Unseen Environments

    Grounding navigational commands to linear temporal logic (LTL) leverages its unambiguous semantics for reasoning about long-horizon tasks and verifying the satisfaction of temporal constraints. Existing approaches require training data from the specific environments and landmarks that will be referred to in natural language in order to understand commands in those environments. We propose Lang2LTL, a modular system and software package that leverages large language models (LLMs) to ground temporal navigational commands to LTL specifications in environments without prior language data. We comprehensively evaluate Lang2LTL for five well-defined generalization behaviors. Lang2LTL demonstrates the state-of-the-art ability of a single model to ground navigational commands to diverse temporal specifications in 21 city-scaled environments. Finally, we demonstrate that a physical robot using Lang2LTL can follow 52 semantically diverse navigational commands in two indoor environments. Comment: Conference on Robot Learning 202
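
    As a hedged sketch of the kind of modular pipeline the abstract describes (not the released Lang2LTL package), the code below extracts landmark phrases, lifts them to placeholder symbols, translates the lifted command to an LTL template, and then grounds the placeholders to landmarks known in the target environment. The two llm_* functions are stand-ins for whatever language model such a system would call, and the example command and landmark database are made up.

```python
# Hedged sketch of a modular grounding pipeline in the spirit of the abstract, not the
# released Lang2LTL package. The llm_* functions are stand-ins for language-model calls;
# the command, landmark phrases, and landmark database are invented for the example.

def llm_extract_landmarks(command):
    """Stand-in for an LLM call that returns the landmark phrases in the command."""
    return ["the coffee shop", "the blue house"]

def llm_translate_lifted(lifted_command):
    """Stand-in for an LLM call that maps a lifted command to an LTL formula."""
    return "F (a & F b)"  # "eventually a, and after that eventually b"

def ground(command, environment_landmarks):
    """Lift landmark phrases to symbols, translate, then ground symbols in this environment."""
    phrases = llm_extract_landmarks(command)
    symbols = {phrase: chr(ord("a") + i) for i, phrase in enumerate(phrases)}
    lifted = command
    for phrase, symbol in symbols.items():
        lifted = lifted.replace(phrase, symbol)
    formula = llm_translate_lifted(lifted)
    # Resolve each placeholder to a concrete landmark in the target environment,
    # e.g. by embedding similarity; a trivial dictionary lookup stands in for that step.
    grounding = {symbol: environment_landmarks.get(phrase, phrase)
                 for phrase, symbol in symbols.items()}
    return formula, grounding

formula, grounding = ground(
    "visit the coffee shop, then go to the blue house",
    {"the coffee shop": "poi_42", "the blue house": "bldg_17"})
print(formula, grounding)  # F (a & F b) {'a': 'poi_42', 'b': 'bldg_17'}
```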

    Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning

    People often give instructions whose meaning is ambiguous without further context, expecting that their actions or goals will disambiguate their intentions. How can we build assistive agents that follow such instructions in a flexible, context-sensitive manner? This paper introduces cooperative language-guided inverse plan search (CLIPS), a Bayesian agent architecture for pragmatic instruction following and goal assistance. Our agent assists a human by modeling them as a cooperative planner who communicates joint plans to the assistant; it then performs multimodal Bayesian inference over the human's goal from actions and language, using large language models (LLMs) to evaluate the likelihood of an instruction given a hypothesized plan. Given this posterior, our assistant acts to minimize the expected cost of goal achievement, enabling it to pragmatically follow ambiguous instructions and provide effective assistance even when uncertain about the goal. We evaluate these capabilities in two cooperative planning domains (Doors, Keys & Gems and VirtualHome), finding that CLIPS significantly outperforms GPT-4V, LLM-based literal instruction following, and unimodal inverse planning in both accuracy and helpfulness, while closely matching the inferences and assistive judgments provided by human raters. Comment: Accepted to AAMAS 2024. 8 pages (excl. references), 5 figures/tables. (Appendix: 8 pages, 8 figures/tables). Code available at: https://github.com/probcomp/CLIPS.j
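
    The goal-inference step described above can be illustrated with a few lines of arithmetic: for each candidate goal, combine how well the goal explains the observed actions with how likely the ambiguous instruction is under a plan for that goal (the quantity an LLM scores in the actual system), then choose the assistive action with the lowest expected goal-achievement cost. Every number below is illustrative and not taken from the paper.

```python
# Illustrative arithmetic only, not the CLIPS implementation: a Bayesian update over
# candidate goals using stand-in likelihoods, followed by choosing the assistive action
# with the lowest expected goal-achievement cost. All numbers are made up.

goals = ["red_key", "blue_key"]
prior = {g: 0.5 for g in goals}

# Stand-in likelihoods; in the described system these would come from inverse planning
# over observed actions and from an LLM scoring the instruction given a hypothesized plan.
p_actions_given_goal = {"red_key": 0.7, "blue_key": 0.3}      # human walked toward the red door
p_instruction_given_goal = {"red_key": 0.6, "blue_key": 0.4}  # "can you grab that key?"

unnormalized = {g: prior[g] * p_actions_given_goal[g] * p_instruction_given_goal[g]
                for g in goals}
normalizer = sum(unnormalized.values())
posterior = {g: v / normalizer for g, v in unnormalized.items()}

# Expected cost of each assistive action under the goal posterior (illustrative costs).
cost = {"fetch_red_key": {"red_key": 1.0, "blue_key": 5.0},
        "fetch_blue_key": {"red_key": 5.0, "blue_key": 1.0},
        "ask_which_key": {"red_key": 2.0, "blue_key": 2.0}}
expected_cost = {a: sum(posterior[g] * c[g] for g in goals) for a, c in cost.items()}
best_action = min(expected_cost, key=expected_cost.get)
print(posterior)    # posterior is concentrated on red_key (about 0.78)
print(best_action)  # fetching the red key beats asking a clarifying question here
```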

    He is a very naughty translator. An analysis of English/Spanish humor translation devices in the comedy film Life of Brian

    Timeless comedy is the only truly successful comedy; an equivalent in spatial terms is impossible to find, and the so-called "universal comedy" is just an ideal. Monty Python's Life of Brian represents the apex of this type of humor, whose stillness challenges the necessary evolution of everything else, including language, the main Pythonesque cornerstone. This tension between opposing forces increases as we introduce the aforementioned notion of space in terms of culture. Hence, it is interesting to examine whether our culture is able to capture this motionlessness without fossilizing it. The present paper analyzes the different methods used in the translation of Life of Brian in order to assess how its characteristic features are affected by the Target Language.