
    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
    Comment: See http://www.jair.org/ for any accompanying file

    From Manifesta to Krypta: The Relevance of Categories for Trusting Others

    In this paper we consider the special abilities needed by agents for assessing trust based on inference and reasoning. We analyze the case in which it is possible to infer trust towards unknown counterparts by reasoning on abstract classes or categories of agents shaped in a concrete application domain. We present a scenario of interacting agents providing a computational model implementing different strategies to assess trust. Assuming a medical domain, categories, including both competencies and dispositions of possible trustees, are exploited to infer trust towards possibly unknown counterparts. The proposed approach for the cognitive assessment of trust relies on agents' abilities to analyze heterogeneous information sources along different dimensions. Trust is inferred based on specific observable properties (Manifesta), namely explicitly readable signals indicating internal features (Krypta) regulating agents' behavior and effectiveness on specific tasks. Simulation experiments evaluate the performance of trusting agents that adopt different strategies to delegate tasks to possibly unknown trustees, and the experimental results show the relevance of this kind of cognitive ability in open multi-agent systems.

    Exploration in Gradient-Based Reinforcement Learning

    Gradient-based policy search is an alternative to value-function-based methods for reinforcement learning in non-Markovian domains. One apparent drawback of policy search is its requirement that all actions be 'on-policy'; that is, that there be no explicit exploration. In this paper, we provide a method for using importance sampling to allow any well-behaved directed exploration policy during learning. We show both theoretically and experimentally that using this method can achieve dramatic performance improvements.
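
The importance-sampling idea in the abstract above can be illustrated on a one-step problem. This is a minimal sketch, not the paper's algorithm: actions are drawn from a separate exploration (behavior) policy, and each reward is reweighted by the ratio of target to behavior probabilities so that the average estimates the target policy's expected reward. The function names, the softmax parameterization, and the reward values are all assumptions made for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def collect(behavior_probs, rewards, n, rng):
    """Draw n (action, reward) pairs using the exploration policy."""
    actions = rng.choices(range(len(rewards)), weights=behavior_probs, k=n)
    return [(a, rewards[a]) for a in actions]


def importance_weighted_return(samples, target_probs, behavior_probs):
    """Estimate the target policy's expected reward from off-policy samples.

    Each reward is scaled by target_probs[a] / behavior_probs[a]; this
    requires behavior_probs[a] > 0 wherever target_probs[a] > 0.
    """
    total = 0.0
    for action, reward in samples:
        total += (target_probs[action] / behavior_probs[action]) * reward
    return total / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    rewards = [1.0, 0.0, 5.0]              # hypothetical per-action rewards
    target = softmax([2.0, 0.0, -1.0])     # policy being evaluated
    behavior = [1 / 3, 1 / 3, 1 / 3]       # uniform exploration policy
    samples = collect(behavior, rewards, 20000, rng)
    print(importance_weighted_return(samples, target, behavior))
    print(sum(t * r for t, r in zip(target, rewards)))  # exact value
```

In a multi-step setting the weight would be a product of per-step ratios along the trajectory, which raises variance; the sketch leaves that out.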

    Planning for perception and perceiving for decision: POMDP-like online optimization in large complex robotics missions

    This ongoing PhD work aims to propose a unified framework to optimize both perception and task planning using extended Partially Observable Markov Decision Processes (POMDPs). Targeted applications are large, complex aerial robotics missions where the problem is too large to be solved off-line, and acquiring information about the environment is as important as achieving some symbolic goals. Challenges of this work include: (1) optimizing a dual objective in a single decision-theoretic framework, i.e. environment perception and goal achievement; (2) properly dealing with action preconditions on belief states in order to guarantee safety constraints or physical limitations, which is crucial in aerial robotics; (3) modeling the symbolic output of image processing algorithms as input of the POMDP's observation function; (4) parallel optimization and execution of POMDP policies in constrained time. A global view of each of these topics is presented, as well as some ongoing experimental results.
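
For readers unfamiliar with belief states, the standard discrete POMDP belief update can be sketched as follows. This is the textbook Bayes filter, not the extended framework the abstract proposes; the function and parameter names (`transition`, `observation_fn`) are hypothetical, and both models are assumed to be supplied as plain Python callables returning probabilities.

```python
def belief_update(belief, action, observation, transition, observation_fn):
    """Return the posterior belief after taking `action` and seeing `observation`.

    belief:         dict mapping state -> probability
    transition:     callable (s, a, s_next) -> P(s_next | s, a)
    observation_fn: callable (s_next, a, o) -> P(o | s_next, a)
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Prediction step: probability of reaching s_next under the old belief.
        predicted = sum(transition(s, action, s_next) * belief[s] for s in states)
        # Correction step: weight by how likely the observation is in s_next.
        unnormalized[s_next] = observation_fn(s_next, action, observation) * predicted
    norm = sum(unnormalized.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under the current belief")
    return {s: p / norm for s, p in unnormalized.items()}
```

A precondition on a belief state, as mentioned in challenge (2), could then be checked against the returned distribution, for example by requiring that the probability of an unsafe state stays below a chosen threshold; that check is an assumption here and is not specified in the abstract.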

    Dispersive Elastodynamics of 1D Banded Materials and Structures: Design

    Within periodic materials and structures, wave scattering and dispersion occur across constituent material interfaces leading to a banded frequency response. In an earlier paper, the elastodynamics of one-dimensional periodic materials and finite structures comprising these materials were examined with an emphasis on their frequency-dependent characteristics. In this work, a novel design paradigm is presented whereby periodic unit cells are designed for desired frequency band properties, and with appropriate scaling, these cells are used as building blocks for forming fully periodic or partially periodic structures with related dynamical characteristics. Through this multiscale dispersive design methodology, which is hierarchical and integrated, structures can be devised for effective vibration or shock isolation without needing to employ dissipative damping mechanisms. The speed of energy propagation in a designed structure can also be dictated through synthesis of the unit cells. Case studies are presented to demonstrate the effectiveness of the methodology for several applications. Results are given from sensitivity analyses that indicate a high level of robustness to geometric variation.
    Comment: 33 text pages, 27 figures

    Text-based Adventures of the Golovin AI Agent

    The domain of text-based adventure games has recently been established as a new challenge: creating an agent that both understands natural language and acts intelligently in text-described environments. In this paper, we present our approach to tackling the problem. Our agent, named Golovin, takes advantage of the limited game domain. We use genre-related corpora (including fantasy books and decompiled games) to create language models suited to this domain. Moreover, we embed mechanisms that allow us to specify, and separately handle, important tasks such as fighting opponents, managing inventory, and navigating the game map. We validated the usefulness of these mechanisms by measuring the agent's performance on a set of 50 interactive fiction games. Finally, we show that our agent plays at a level comparable to the winner of last year's Text-Based Adventure AI Competition.

    Local Goals Driven Hierarchical Reinforcement Learning

    Efficient exploration is of fundamental importance for autonomous agents that learn to act. Previous approaches to exploration in reinforcement learning usually address the case in which the environment is fully observable. In contrast, the current paper, like the previous paper [Pch2003], studies the case in which the environment is only partially observable. One additional difficulty is considered: complex temporal dependencies. To overcome this difficulty, a new hierarchical reinforcement learning algorithm is proposed. The learning algorithm exploits a very simple learning principle, similar to Q-learning, except that the lookup table has one more variable: the currently selected goal. Additionally, the algorithm uses the idea of internal reward for achieving hard-to-reach states [Pch2003]. The proposed learning algorithm is experimentally investigated in partially observable maze problems, where it shows a robust ability to learn a good policy.
    * This research was partially supported by the Latvian Science Foundation under grant No. 02-86d.
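
The "lookup table with one more variable" can be pictured as an ordinary tabular Q-learning update whose key includes the currently selected goal. The sketch below is an assumption-laden illustration rather than the paper's algorithm: how goals are chosen and how the internal reward is computed are not described in the abstract, so they are left to the caller, and all names (`q_update`, `greedy_action`, the `(obs, goal, action)` key layout) are hypothetical.

```python
from collections import defaultdict


def make_q_table():
    """Q-table keyed by (observation, goal, action); unseen entries default to 0."""
    return defaultdict(float)


def q_update(q, obs, goal, action, reward, next_obs, actions, alpha=0.5, gamma=0.9):
    """One Q-learning step, conditioned on the currently selected goal."""
    best_next = max(q[(next_obs, goal, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(obs, goal, action)] += alpha * (td_target - q[(obs, goal, action)])
    return q[(obs, goal, action)]


def greedy_action(q, obs, goal, actions):
    """Pick the action with the highest value for this observation and goal."""
    return max(actions, key=lambda a: q[(obs, goal, a)])
```

Because the goal is part of the key, the same observation can map to different preferred actions under different goals, which is one way a table-based learner can disambiguate situations that look identical in a partially observable maze.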

    Egocentric Planning for Scalable Embodied Task Achievement

    Embodied agents face significant challenges when tasked with performing actions in diverse environments, particularly in generalizing across object types and executing suitable actions to accomplish tasks. Furthermore, agents should exhibit robustness, minimizing the execution of illegal actions. In this work, we present Egocentric Planning, an approach that combines symbolic planning and Object-oriented POMDPs to solve tasks in complex environments, harnessing existing models for visual perception and natural language processing. We evaluated our approach in ALFRED, a simulated environment designed for domestic tasks, and demonstrated its high scalability, achieving a 36.07% unseen success rate in the ALFRED benchmark and winning the ALFRED challenge at the CVPR Embodied AI workshop. Our method requires reliable perception and the specification or learning of a symbolic description of the preconditions and effects of the agent's actions, as well as of which object types reveal information about others. It scales naturally to new tasks beyond ALFRED, as long as they can be solved using the available skills. This work offers a solid baseline for studying end-to-end and hybrid methods that aim to generalize to new tasks, including recent approaches that rely on LLMs but often struggle to scale to long sequences of actions or to produce robust plans for novel tasks.