Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here resembles
work in psychology, but differs considerably in the details and in the use of
the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file
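The first central issue listed above, trading off exploration and exploitation, can be illustrated with a minimal sketch: an epsilon-greedy agent on a Bernoulli multi-armed bandit. Epsilon-greedy is only one of many exploration strategies in the field and is not presented as the survey's recommended method; the function name, arm probabilities, and epsilon value are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(arm_probs, steps, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit; return value estimates and pull counts."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [0] * k
    values = [0.0] * k  # incremental sample-mean estimate of each arm's payoff
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: pick an arm uniformly at random
        else:
            arm = max(range(k), key=lambda a: values[a])  # exploit: best current estimate
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean update
    return values, counts

values, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8], steps=5000)
```

With a small epsilon, most pulls should end up on the arm with the highest payoff probability, while the occasional random pulls keep the estimates of the other arms from going stale.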
From Manifesta to Krypta: The Relevance of Categories for Trusting Others
In this paper we consider the special abilities agents need to assess trust through inference and reasoning. We analyze the case in which trust towards unknown counterparts can be inferred by reasoning on abstract classes, or categories, of agents defined in a concrete application domain. We present a scenario of interacting agents, together with a computational model implementing different strategies to assess trust. In a medical domain, categories capturing both the competencies and the dispositions of possible trustees are exploited to infer trust towards possibly unknown counterparts. The proposed approach to the cognitive assessment of trust relies on agents' abilities to analyze heterogeneous information sources along different dimensions. Trust is inferred from specific observable properties (Manifesta), namely explicitly readable signals that indicate the internal features (Krypta) regulating agents' behavior and effectiveness on specific tasks. Simulation experiments evaluate the performance of trusting agents that adopt different strategies to delegate tasks to possibly unknown trustees, and the results show the relevance of this kind of cognitive ability in open Multi-Agent Systems.
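The abstract does not specify the computational model, so the following is only a hypothetical illustration of the general idea of category-based trust: estimate a success rate per category from past delegations to known trustees, map an unknown trustee's observable signals (Manifesta) to a category, and reuse that category's score. The function names, the medical categories and signals, and the 0.5 prior for unrecognized trustees are all assumptions.

```python
from collections import defaultdict

def category_scores(history, category_of):
    """Average observed success rate per category of known trustees (hypothetical model)."""
    totals = defaultdict(lambda: [0, 0])  # category -> [successes, trials]
    for trustee, success in history:
        t = totals[category_of[trustee]]
        t[0] += success
        t[1] += 1
    return {cat: s / n for cat, (s, n) in totals.items()}

def trust_in_unknown(manifesta, category_rules, scores, prior=0.5):
    """Map observable signals to a category, then reuse that category's score."""
    for required_signals, category in category_rules:
        if required_signals <= manifesta:  # every required signal is observed
            return scores.get(category, prior)
    return prior  # no category recognized: fall back to an assumed prior

history = [("dr_a", 1), ("dr_a", 1), ("dr_b", 0), ("nurse_c", 1)]
category_of = {"dr_a": "cardiologist", "dr_b": "cardiologist", "nurse_c": "nurse"}
rules = [({"cardiology_board_cert"}, "cardiologist"), ({"nursing_license"}, "nurse")]
scores = category_scores(history, category_of)
```

Here a never-seen trustee showing a cardiology certification inherits the cardiologists' observed success rate (2/3), which is the kind of inference toward unknown counterparts the abstract describes, in a deliberately simplified form.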
Exploration in Gradient-Based Reinforcement Learning
Gradient-based policy search is an alternative to value-function-based methods for reinforcement learning in non-Markovian domains. One apparent drawback of policy search is its requirement that all actions be 'on-policy'; that is, that there be no explicit exploration. In this paper, we provide a method for using importance sampling to allow any well-behaved directed exploration policy during learning. We show both theoretically and experimentally that using this method can achieve dramatic performance improvements.
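The core idea, reweighting actions drawn from an exploration policy so they yield an estimate of the on-policy gradient, can be sketched for the simplest case: a one-step episode with a softmax policy. This is a generic importance-sampled REINFORCE estimator, not necessarily the paper's exact estimator; for multi-step episodes the weight would become a product of per-step ratios. Function names and the example numbers are assumptions, and "well-behaved" is taken here to mean the behavior policy gives nonzero probability to every action.

```python
import math
import random

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def exact_gradient(theta, rewards):
    """Analytic gradient of J = sum_a pi(a) r(a) for a softmax policy."""
    pi = softmax(theta)
    J = sum(p * r for p, r in zip(pi, rewards))
    return [pi[j] * (rewards[j] - J) for j in range(len(theta))]

def is_gradient_estimate(theta, rewards, behavior, n, seed=0):
    """Estimate the on-policy gradient from actions drawn by an exploration policy."""
    rng = random.Random(seed)
    pi = softmax(theta)
    k = len(theta)
    grad = [0.0] * k
    for _ in range(n):
        a = rng.choices(range(k), weights=behavior)[0]  # exploratory action
        w = pi[a] / behavior[a]  # importance weight; requires behavior[a] > 0
        for j in range(k):
            score = (1.0 if j == a else 0.0) - pi[j]  # d log pi(a) / d theta_j
            grad[j] += w * rewards[a] * score
    return [g / n for g in grad]

theta, rewards = [0.0, 0.5, -0.5], [1.0, 0.0, 2.0]
exact = exact_gradient(theta, rewards)
est = is_gradient_estimate(theta, rewards, [1 / 3, 1 / 3, 1 / 3], n=50000)
```

Because the expectation of the weighted term under the behavior policy equals the on-policy expectation, the estimate converges to the analytic gradient even though no action was sampled from the policy being optimized.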
Planning for perception and perceiving for decision: POMDP-like online optimization in large complex robotics missions
This ongoing PhD work aims to propose a unified framework that optimizes both perception and task planning using extended Partially Observable Markov Decision Processes (POMDPs). Targeted applications are large, complex aerial robotics missions where the problem is too large to be solved offline, and where acquiring information about the environment is as important as achieving some symbolic goals. Challenges of this work include: (1) optimizing a dual objective, i.e. environment perception and goal achievement, in a single decision-theoretic framework; (2) properly dealing with action preconditions on belief states in order to guarantee safety constraints or physical limitations, which is crucial in aerial robotics; (3) modeling the symbolic output of image processing algorithms as input to the POMDP's observation function; (4) optimizing and executing POMDP policies in parallel under time constraints. An overview of each of these topics is presented, together with some results from ongoing experiments.
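Any POMDP-based planner of this kind maintains a belief state, updated by the standard Bayes filter b'(s') ∝ O(o | s', a) Σ_s T(s' | s, a) b(s). The sketch below shows that update together with a hypothetical belief-based precondition check in the spirit of challenge (2). The thesis' extended POMDP model and its image-processing observation model are not reproduced; the two-zone sensing example, the 0.8 sensor accuracy, and the threshold rule are assumptions.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes filter: b'(s') is proportional to O[a][s'][o] * sum_s T[a][s][s'] * b(s)."""
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        predicted = sum(T[action][s][s_next] * belief[s] for s in range(n))
        unnormalized.append(O[action][s_next][observation] * predicted)
    z = sum(unnormalized)
    if z == 0:
        raise ValueError("observation has zero probability under this belief")
    return [u / z for u in unnormalized]

def action_allowed(belief, safe_states, threshold):
    """Hypothetical precondition on beliefs: act only if P(safe) >= threshold."""
    return sum(belief[s] for s in safe_states) >= threshold

# Two hidden states (target in zone 0 or zone 1); "look" does not move the target.
T = {"look": [[1.0, 0.0], [0.0, 1.0]]}
O = {"look": [[0.8, 0.2], [0.2, 0.8]]}  # assumed sensor reports the true zone 80% of the time
b1 = belief_update([0.5, 0.5], "look", 0, T, O)
b2 = belief_update(b1, "look", 0, T, O)
```

Each consistent observation sharpens the belief (0.5 to 0.8 to about 0.94 here), which is why information-gathering actions can be valuable in their own right in a perception-aware planner.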
Dispersive Elastodynamics of 1D Banded Materials and Structures: Design
Within periodic materials and structures, wave scattering and dispersion
occur across constituent material interfaces, leading to a banded frequency
response. In an earlier paper, the elastodynamics of one-dimensional periodic
materials and finite structures comprising these materials were examined with
an emphasis on their frequency-dependent characteristics. In this work, a novel
design paradigm is presented whereby periodic unit cells are designed for
desired frequency band properties, and with appropriate scaling, these cells
are used as building blocks for forming fully periodic or partially periodic
structures with related dynamical characteristics. Through this multiscale
dispersive design methodology, which is hierarchical and integrated, structures
can be devised for effective vibration or shock isolation without needing to
employ dissipative damping mechanisms. The speed of energy propagation in a
designed structure can also be dictated through synthesis of the unit cells.
Case studies are presented to demonstrate the effectiveness of the methodology
for several applications. Results are given from sensitivity analyses that
indicate a high level of robustness to geometric variation. Comment: 33 text pages, 27 figures
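For the simplest unit cell, two homogeneous layers carrying longitudinal waves, the banded response follows from the classical transfer-matrix dispersion relation cos(k a) = cos(ω d1/c1) cos(ω d2/c2) − ½ (Z1/Z2 + Z2/Z1) sin(ω d1/c1) sin(ω d2/c2), with a = d1 + d2 and Z the layer impedances; frequencies where the right-hand side exceeds 1 in magnitude lie in stop bands. The sketch below scans a frequency grid for such gaps. It is not the paper's design methodology, which covers more general cells and structures; the nondimensional layer values are assumptions.

```python
import math

def bilayer_rhs(omega, d1, c1, z1, d2, c2, z2):
    """Right-hand side of the bilayer dispersion relation cos(k*a) = rhs."""
    x1, x2 = omega * d1 / c1, omega * d2 / c2
    return math.cos(x1) * math.cos(x2) - 0.5 * (z1 / z2 + z2 / z1) * math.sin(x1) * math.sin(x2)

def band_gaps(omegas, layer1, layer2, tol=1e-12):
    """Return (first, last) sampled frequencies of each run where |rhs| > 1.

    Each layer is a (thickness, wave speed, impedance) tuple.
    """
    gaps, start, prev = [], None, None
    for w in omegas:
        in_gap = abs(bilayer_rhs(w, *layer1, *layer2)) > 1 + tol  # no real k exists
        if in_gap and start is None:
            start = w
        elif not in_gap and start is not None:
            gaps.append((start, prev))
            start = None
        prev = w
    if start is not None:
        gaps.append((start, prev))
    return gaps

omegas = [i * 0.01 for i in range(1, 2001)]
gaps = band_gaps(omegas, (0.5, 1.0, 1.0), (0.5, 2.0, 4.0))
uniform = band_gaps(omegas, (0.5, 1.0, 1.0), (0.5, 1.0, 1.0))
```

Two identical layers give no gaps, since the relation collapses to cos(k a) = cos(ω a/c), while an impedance contrast opens a gap around the first Bragg frequency. Because the relation depends on ω only through ω d/c, scaling both thicknesses by s moves every gap to 1/s times the original frequencies, one simple instance of reusing a designed cell through scaling.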
Text-based Adventures of the Golovin AI Agent
The domain of text-based adventure games has recently been established as a
new challenge: creating an agent that both understands natural
language and acts intelligently in text-described environments.
In this paper, we present our approach to this problem. Our agent,
named Golovin, takes advantage of the limited game domain. We use genre-related
corpora (including fantasy books and decompiled games) to create language
models suited to this domain. Moreover, we embed mechanisms that allow us to
specify, and separately handle, important tasks such as fighting opponents, managing
inventory, and navigating the game map.
We validated the usefulness of these mechanisms by measuring the agent's performance on
a set of 50 interactive fiction games. Finally, we show that our agent plays
at a level comparable to the winner of last year's Text-Based Adventure AI
Competition.
Local Goals Driven Hierarchical Reinforcement Learning
* This research was partially supported by the Latvian Science Foundation under grant No. 02-86d.
Efficient exploration is of fundamental importance for autonomous agents that learn to act. Previous
approaches to exploration in reinforcement learning usually assume that the environment is fully
observable. In contrast, the current paper, like the previous paper [Pch2003], studies the
case when the environment is only partially observable. One additional difficulty is considered: complex
temporal dependencies. To overcome this difficulty, a new hierarchical reinforcement learning
algorithm is proposed. The learning algorithm exploits a very simple learning principle, similar to Q-learning,
except that the lookup table has one more variable: the currently selected goal. Additionally, the algorithm uses the
idea of internal reward for achieving hard-to-reach states [Pch2003]. The proposed learning algorithm is
experimentally investigated in partially observable maze problems, where it shows a robust ability to learn a good
policy.
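The learning principle described above, Q-learning whose lookup table carries one extra variable for the currently selected goal, can be sketched as tabular Q-learning over (observation, goal, action) triples. How goals are selected is not shown, and treating the internal reward as a fixed bonus that also ends bootstrapping when the goal is reached is an assumption of this sketch, not a detail confirmed by the abstract; the class name and parameter values are hypothetical.

```python
import random
from collections import defaultdict

class GoalConditionedQ:
    """Tabular Q-learning over (observation, goal, action) triples (hypothetical sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, goal_bonus=1.0, seed=0):
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.goal_bonus = goal_bonus  # assumed internal reward for reaching the current goal
        self.q = defaultdict(float)   # missing table entries default to 0.0
        self.rng = random.Random(seed)

    def act(self, obs, goal):
        """Epsilon-greedy action for the current observation and selected goal."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(obs, goal, a)])

    def update(self, obs, goal, action, reward, next_obs, goal_reached):
        """One backup; reaching the goal adds the bonus and stops bootstrapping."""
        target = reward
        if goal_reached:
            target += self.goal_bonus
        else:
            target += self.gamma * max(self.q[(next_obs, goal, a)] for a in self.actions)
        key = (obs, goal, action)
        self.q[key] += self.alpha * (target - self.q[key])

learner = GoalConditionedQ(["left", "right"])
learner.update("o1", "g1", "left", 0.0, "o2", goal_reached=True)
learner.update("o0", "g1", "right", 0.0, "o1", goal_reached=False)
```

Because the goal is part of the key, the same observation can have different action values under different goals, which is one way an agent can disambiguate partially observable states that look identical but call for different behavior.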
Egocentric Planning for Scalable Embodied Task Achievement
Embodied agents face significant challenges when tasked with performing
actions in diverse environments, particularly in generalizing across object
types and executing suitable actions to accomplish tasks. Furthermore, agents
should exhibit robustness, minimizing the execution of illegal actions. In this
work, we present Egocentric Planning, an innovative approach that combines
symbolic planning and Object-oriented POMDPs to solve tasks in complex
environments, harnessing existing models for visual perception and natural
language processing. We evaluated our approach in ALFRED, a simulated
environment designed for domestic tasks, and demonstrated its high scalability,
achieving a 36.07% unseen success rate on the ALFRED benchmark and
winning the ALFRED challenge at the CVPR Embodied AI Workshop. Our method requires
reliable perception and the specification or learning of a symbolic description
of the preconditions and effects of the agent's actions, as well as what object
types reveal information about others. It is capable of naturally scaling to
solve new tasks beyond ALFRED, as long as they can be solved using the
available skills. This work offers a solid baseline for studying end-to-end and
hybrid methods that aim to generalize to new tasks, including recent approaches
relying on LLMs; such methods often struggle to scale to long sequences of actions or
to produce robust plans for novel tasks.
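The method relies on a symbolic description of each action's preconditions and effects. A minimal sketch of that ingredient is a STRIPS-style breadth-first planner over sets of true facts. The paper's combination with Object-oriented POMDPs, perception models, and exploration is not shown, and the action names below are hypothetical, not ALFRED's actual action set.

```python
from collections import deque

def plan(initial, goal, actions):
    """Breadth-first search over sets of true facts; returns action names or None."""
    start = frozenset(initial)
    frontier = deque([start])
    parents = {start: None}  # state -> (previous state, action name)
    while frontier:
        state = frontier.popleft()
        if goal <= state:  # all goal facts hold: rebuild the path
            steps = []
            while parents[state] is not None:
                state, name = parents[state]
                steps.append(name)
            return steps[::-1]
        for name, pre, add, delete in actions:
            if pre <= state:  # applicable only if every precondition holds
                nxt = frozenset((state - delete) | add)
                if nxt not in parents:
                    parents[nxt] = (state, name)
                    frontier.append(nxt)
    return None  # goal unreachable with the given actions

# (name, preconditions, add effects, delete effects)
ACTIONS = [
    ("goto_apple", set(), {"at_apple"}, {"at_sink", "at_fridge"}),
    ("pickup_apple", {"at_apple", "hand_empty"}, {"holding_apple"}, {"hand_empty"}),
    ("goto_sink", set(), {"at_sink"}, {"at_apple", "at_fridge"}),
    ("wash_apple", {"at_sink", "holding_apple"}, {"apple_clean"}, set()),
    ("goto_fridge", set(), {"at_fridge"}, {"at_apple", "at_sink"}),
    ("put_apple_in_fridge", {"at_fridge", "holding_apple"},
     {"apple_in_fridge", "hand_empty"}, {"holding_apple"}),
]
steps = plan({"hand_empty"}, {"apple_clean", "apple_in_fridge"}, ACTIONS)
```

Breadth-first search returns a shortest plan and never emits an action whose preconditions fail, which mirrors the abstract's emphasis on avoiding illegal actions; a realistic planner would need heuristics to scale to long action sequences.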