72,279 research outputs found

    Reinforcement learning with restrictions on the action set

    Full text link
    Consider a 2-player normal-form game repeated over time. We introduce an adaptive learning procedure in which each player observes only their own realized payoff at each stage. We assume that agents do not know their own payoff function and have no information about the other player. Furthermore, we assume that their action sets are restricted: at each stage, each player's choice is limited to a subset of their action set. We prove that the empirical distributions of play converge to the set of Nash equilibria for zero-sum games, potential games, and games where one player has two actions.
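    A minimal sketch of one payoff-based scheme in this spirit: Exp3-style exponential weights with importance weighting, restricted at every stage to the currently available actions. The class name, step sizes, and the matching-pennies demo are illustrative assumptions, not the paper's actual procedure.

    import numpy as np

    rng = np.random.default_rng(0)

    class RestrictedPayoffLearner:
        """Bandit-feedback learner whose choice at each stage is limited
        to a subset of its action set (illustrative Exp3-style sketch,
        not the paper's procedure)."""

        def __init__(self, n_actions, eta=0.05, gamma=0.1):
            self.eta, self.gamma = eta, gamma
            self.score = np.zeros(n_actions)  # cumulative payoff estimates

        def act(self, available):
            # Exponential weights over the currently available actions only,
            # mixed with uniform exploration so probabilities stay bounded.
            s = self.score[available]
            w = np.exp(self.eta * (s - s.max()))
            p = (1 - self.gamma) * w / w.sum() + self.gamma / len(available)
            i = rng.choice(len(available), p=p)
            return available[i], p[i]

        def update(self, action, prob, payoff):
            # Importance weighting: only the realized payoff is observed.
            self.score[action] += payoff / prob

    # Matching pennies: empirical play should approach the (1/2, 1/2) equilibrium.
    payoff = lambda a, b: 1.0 if a == b else -1.0   # zero-sum, payoff to player 1
    p1, p2 = RestrictedPayoffLearner(2), RestrictedPayoffLearner(2)
    counts = np.zeros(2)
    for t in range(20000):
        avail = np.array([0, 1])        # full set here; any nonempty subset works
        a, pa = p1.act(avail)
        b, pb = p2.act(avail)
        p1.update(a, pa, payoff(a, b))
        p2.update(b, pb, -payoff(a, b))
        counts[a] += 1
    print(counts / counts.sum())        # roughly [0.5, 0.5]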

    Reward-Relevance-Filtered Linear Offline Reinforcement Learning

    Full text link
    This paper studies offline reinforcement learning with linear function approximation in a setting with decision-theoretic, but not estimation, sparsity. The structural restrictions on the data-generating process presume that the transitions factor into a sparse component that affects the reward and that could affect additional exogenous dynamics which do not affect the reward. Although the minimally sufficient adjustment set for estimating full-state transition properties depends on the whole state, the optimal policy, and therefore the state-action value function, depends only on the sparse component: we call this causal/decision-theoretic sparsity. We develop a method for reward-filtering the estimation of the state-action value function onto the sparse component via a modification of the thresholded lasso in least-squares policy evaluation. We provide theoretical guarantees for the resulting reward-filtered linear fitted-Q-iteration, with sample complexity depending only on the size of the sparse component.
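    A compact sketch of the filtering idea under simple assumptions: at each fitted-Q iteration, regress Bellman targets with the lasso, hard-threshold the coefficients to select the reward-relevant coordinates, then refit least squares on that support. The function name, threshold rule, and feature shapes are assumptions for illustration, not the paper's exact estimator.

    import numpy as np
    from sklearn.linear_model import Lasso, LinearRegression

    def reward_filtered_fqi(phi, phi_next, r, gamma=0.99, n_iters=50,
                            alpha=0.01, tau=0.05):
        """Fitted-Q iteration with a thresholded-lasso filtering step
        (illustrative sketch under assumed shapes).

        phi      : (n, d) features of observed state-action pairs
        phi_next : (n, A, d) features of the next state with each action
        r        : (n,) observed rewards
        """
        n, d = phi.shape
        w = np.zeros(d)
        for _ in range(n_iters):
            # Bellman targets under the greedy policy of the current estimate.
            y = r + gamma * (phi_next @ w).max(axis=1)
            # Lasso proposes a sparse, reward-relevant set of coordinates...
            coef = Lasso(alpha=alpha, fit_intercept=False).fit(phi, y).coef_
            support = np.abs(coef) > tau            # ...then hard-threshold it.
            w = np.zeros(d)
            if support.any():
                # Refit ordinary least squares on the selected support only.
                ols = LinearRegression(fit_intercept=False).fit(phi[:, support], y)
                w[support] = ols.coef_
        return w

    The point mirrors the abstract: regression may see the whole state, but the value estimate lives on the sparse reward-relevant coordinates, so sample complexity is governed by the support size rather than the ambient dimension.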

    Intelligent Scheduling Method for Bulk Cargo Terminal Loading Process Based on Deep Reinforcement Learning

    Get PDF
    Sea freight is one of the most important ways of transporting and distributing coal and other bulk cargo. This paper proposes a method for optimizing the scheduling efficiency of the bulk cargo loading process based on deep reinforcement learning. The process involves a large number of states and possible choices that need to be taken into account, which are currently handled by skillful scheduling engineers on site. In terms of modeling, we extracted important information from actual working data of the terminal to form the state space of the model; the yard information and the demand information of the ship are also considered. The scheduling output of each conveying path from the yard to the cabin is the action of the agent. To avoid conflicts from two tasks occupying one machine at the same time, certain restrictions are placed on whether an action can be executed. Based on Double DQN, an improved deep reinforcement learning method is proposed with a fully connected network structure, selecting action sets according to the value of the network and the occupancy status of the environment. To make the network converge more quickly, an improved epsilon-greedy exploration strategy is also proposed, which uses different exploration rates for completely random selection and feasible random selection of actions. After training, an improved scheduling result is obtained when tasks arrive randomly and the yard state is random. One contribution of this paper is to integrate the useful features of the terminal's working data into a state set, divide the scheduling process into discrete actions, and thereby reduce the scheduling problem to simple inputs and outputs. Another major contribution is the design of a reinforcement learning algorithm for the bulk cargo terminal scheduling problem with improved training efficiency, which provides a practical example of solving bulk cargo terminal scheduling problems using reinforcement learning.
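    The two-rate exploration rule is easy to sketch. Everything below (the rates, the mask semantics, the toy Q-values) is an assumption for illustration; in the paper, the Double DQN supplies the Q-values and the occupancy status supplies the mask.

    import numpy as np

    rng = np.random.default_rng(0)

    def masked_epsilon_greedy(q_values, feasible_mask,
                              eps_any=0.02, eps_feasible=0.10):
        """Pick an action with two exploration rates: completely random,
        random among feasible actions, else greedy over feasible actions
        (illustrative sketch; the rates are assumptions)."""
        u = rng.random()
        if u < eps_any:                          # completely random selection
            return int(rng.integers(len(q_values)))
        if u < eps_any + eps_feasible:           # feasible random selection
            return int(rng.choice(np.flatnonzero(feasible_mask)))
        q = np.where(feasible_mask, q_values, -np.inf)   # mask machine conflicts
        return int(q.argmax())

    # Example: five conveying paths, two blocked because a machine is occupied.
    q = np.array([0.3, 1.2, -0.5, 0.8, 0.1])
    mask = np.array([True, False, True, True, False])
    print(masked_epsilon_greedy(q, mask))        # usually 3, the best feasible path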

    Answer Set Programming for Non-Stationary Markov Decision Processes

    Full text link
    Non-stationary domains, where unforeseen changes happen, present a challenge for agents seeking an optimal policy for a sequential decision-making problem. This work investigates a solution that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, after which Reinforcement Learning is applied to learn the optimal policy for the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing non-stationary domains.
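    The division of labor can be sketched as an oracle for the actions ASP admits in each state, plus ordinary Q-learning restricted to that set. The toy chain MDP, the BLOCKED table, and all names below are hypothetical; in ASP(RL) the feasible set comes from answer sets of a logic program (e.g. solved with an ASP solver such as clingo), and updating that program is what accommodates non-stationarity.

    import random
    from collections import defaultdict

    ACTIONS = (-1, +1)                  # move left / right on a 6-state chain
    BLOCKED = {(3, -1)}                 # stand-in for what the ASP program forbids

    def asp_feasible_actions(state):
        # Hypothetical stand-in for the ASP solver call; when the domain
        # changes, only this table (the logic program) needs updating.
        return [a for a in ACTIONS if (state, a) not in BLOCKED]

    def step(state, action):
        # Toy chain MDP: reward 1 for reaching state 5, which ends the episode.
        s2 = max(0, min(5, state + action))
        return s2, float(s2 == 5), s2 == 5

    def q_learning(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
        Q = defaultdict(float)
        for _ in range(episodes):
            s = 0
            for _ in range(100):                     # step cap per episode
                acts = asp_feasible_actions(s)       # only ASP-admitted actions
                a = random.choice(acts) if random.random() < eps \
                    else max(acts, key=lambda x: Q[(s, x)])
                s2, r, done = step(s, a)
                target = r if done else r + gamma * max(
                    Q[(s2, x)] for x in asp_feasible_actions(s2))
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s2
                if done:
                    break
        return Q

    print(max(q_learning()[(0, a)] for a in asp_feasible_actions(0)))  # ~0.95**4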

    Developing social action capabilities in a humanoid robot using an interaction history architecture

    Get PDF
    DOI: 10.1109/ICHR.2008.4756013. We present experimental results for the humanoid robot Kaspar2 engaging in a simple “peekaboo” interaction game with a human partner. The robot develops the capability to engage in the game by using its history of interactions, coupled with audio and visual feedback from the interaction partner, to continually generate increasingly appropriate behaviour. The robot also uses facial expressions to feed back its level of reward to the partner. The results support the hypothesis that reinforcement of time-extended experiences through interaction allows a robot to act appropriately in an interaction.
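    A toy sketch of the interaction-history idea: match the current sensor window against stored experiences and replay the action whose experience earned the most partner feedback. Euclidean distance, the record fields, and the toy data are illustrative assumptions; the actual architecture operates on temporally extended experiences compared with an information-theoretic distance.

    import numpy as np

    def act_from_history(history, current, k=2):
        # Among the k stored experiences nearest to the current sensor
        # window, replay the action that earned the most reward.
        d = [np.linalg.norm(current - h["sensors"]) for h in history]
        nearest = np.argsort(d)[:k]
        best = max(nearest, key=lambda i: history[i]["reward"])
        return history[best]["action"]

    # Toy "peekaboo" history with rewards derived from partner feedback.
    history = [
        {"sensors": np.array([0.9, 0.1]), "action": "hide_face",   "reward": 0.8},
        {"sensors": np.array([0.1, 0.9]), "action": "reveal_face", "reward": 0.9},
        {"sensors": np.array([0.5, 0.5]), "action": "wave",        "reward": 0.2},
    ]
    print(act_from_history(history, np.array([0.2, 0.8])))   # -> reveal_face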