124 research outputs found

    Reinforcement learning in continuous state and action spaces

    Many traditional reinforcement-learning algorithms have been designed for problems with small finite state and action spaces. Learning in such discrete problems can be difficult, due to noise and delayed reinforcements. However, many real-world problems have continuous state or action spaces, which can make learning a good decision policy even more involved. In this chapter we discuss how to automatically find good decision policies in continuous domains. Because analytically computing a good policy from a continuous model can be infeasible, we mainly focus on methods that explicitly update a representation of a value function, a policy, or both. We discuss considerations in choosing an appropriate representation for these functions, along with gradient-based and gradient-free ways to update their parameters. We show how to apply these methods to reinforcement-learning problems and discuss many specific algorithms, including gradient-based temporal-difference learning, evolutionary strategies, policy-gradient algorithms, and actor-critic methods. We discuss the advantages of the different approaches and empirically compare the performance of a state-of-the-art actor-critic method and a state-of-the-art evolutionary strategy.
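    As a concrete illustration of the actor-critic idea the abstract mentions, below is a minimal sketch of a one-step actor-critic update with a linear critic and a Gaussian policy over a continuous action. The toy environment, feature map, and all parameter names are hypothetical stand-ins, not the chapter's own algorithm or benchmark.

```python
import numpy as np

# Minimal one-step actor-critic sketch for continuous states/actions.
# All names and the toy dynamics are illustrative assumptions.

rng = np.random.default_rng(0)

def features(state):
    """Simple polynomial features of a 1-D state (illustrative choice)."""
    return np.array([1.0, state, state**2])

theta = np.zeros(3)   # actor parameters: mean of the Gaussian policy
w = np.zeros(3)       # critic parameters: linear state-value estimate
sigma = 0.5           # fixed exploration noise
alpha_actor, alpha_critic, gamma = 1e-3, 1e-2, 0.99

def step_env(state, action):
    """Toy dynamics: staying near 0 is rewarded (stands in for a real env)."""
    next_state = state + 0.1 * action + 0.01 * rng.standard_normal()
    next_state = float(np.clip(next_state, -3.0, 3.0))  # keep the toy state bounded
    return next_state, -next_state**2

state = 1.0
for t in range(10_000):
    phi = features(state)
    mean = theta @ phi
    action = mean + sigma * rng.standard_normal()   # sample from the Gaussian policy
    next_state, reward = step_env(state, action)

    # TD error from the critic's linear value estimates
    delta = reward + gamma * (w @ features(next_state)) - (w @ phi)

    # Critic: semi-gradient TD(0) update
    w += alpha_critic * delta * phi
    # Actor: policy-gradient step; grad of log pi for a Gaussian mean
    theta += alpha_actor * delta * (action - mean) / sigma**2 * phi

    state = next_state
```

    The TD error does double duty here: it drives the critic's update and scales the actor's policy-gradient step, which is the defining coupling of actor-critic methods.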

    Reinforcement learning in large state action spaces

    Reinforcement learning (RL) is a promising framework for training intelligent agents that learn to optimize long-term utility by directly interacting with the environment. Creating RL methods that scale to large state-action spaces is a critical problem for the real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints such as decentralization, and a lack of guarantees about important properties such as performance, generalization, and robustness in potentially unseen scenarios. This thesis is motivated by bridging these gaps. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings: single- and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, and value-based and policy-based methods. In this work we present the first results on several different problems, e.g. tensorization of the Bellman equation, which allows exponential sample-efficiency gains (Chapter 4); provable suboptimality arising from structural constraints in MAS (Chapter 3); combinatorial generalization results in cooperative MAS (Chapter 5); generalization results on observation shifts (Chapter 7); and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on generalization aspects of the agents under different frameworks. These properties are driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory). In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.
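    For reference, the standard (untensorized) Bellman optimality backup that such methods build on can be written as a tabular value-iteration sweep. The sketch below is a generic illustration over a randomly generated toy MDP, not the tensorized formulation of Chapter 4; all names are assumptions.

```python
import numpy as np

# Tabular value iteration: the standard Bellman optimality backup
# that large-scale RL methods approximate. Problem data is random.

n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(1)

# Random MDP: P[s, a, s'] transition probabilities, R[s, a] rewards.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))

V = np.zeros(n_states)
for _ in range(500):
    # Bellman backup: V(s) <- max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V(s') ]
    Q = R + gamma * np.einsum("sap,p->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once the backup has converged
        V = V_new
        break
    V = V_new
```

    The einsum contracts the transition tensor against the current value estimate over next states; the cost of that contraction in large or structured state-action spaces is exactly the kind of bottleneck the thesis targets.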

    Approximate multi-agent planning in dynamic and uncertain environments

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, February 2012. "December 2011." Cataloged from PDF version of thesis. Includes bibliographical references (p. 120-131). Teams of autonomous mobile robotic agents will play an important role in the future of robotics. Efficient coordination of these agents within large, cooperative teams is an important characteristic of any system utilizing multiple autonomous vehicles. Applications of such cooperative technology stretch beyond multi-robot systems to include satellite formations, networked systems, traffic flow, and many others. The diversity of capabilities offered by a team, as opposed to an individual, has attracted the attention of both researchers and practitioners, in part due to the associated challenges, such as the combinatorial nature of joint action selection among interdependent agents. This thesis addresses the issues of scalability and adaptability within teams of such interdependent agents while planning, coordinating, and learning in a decentralized environment. In doing so, the first focus is the integration of learning and adaptation algorithms into a multi-agent planning architecture to enable online adaptation of planner parameters. A second focus is the development of approximation algorithms to reduce the computational complexity of decentralized multi-agent planning methods; such a reduction improves problem scalability and ultimately enables much larger robot teams. Finally, we are interested in implementing these algorithms in meaningful, real-world scenarios. As robots and unmanned systems continue to advance technologically, enabling an awareness of their physical state of health will become critical. In this context, the architecture and algorithms developed in this thesis are implemented in both hardware and software flight experiments under a class of cooperative multi-agent systems we call persistent health management scenarios. by Joshua David Redding. Ph.D.
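    As one illustration of how the combinatorial joint-assignment problem can be approximated in polynomial time, the following sketch implements a generic sequential greedy task allocation. The agents, tasks, and scoring function are hypothetical, and this is a standard approximation scheme rather than the thesis's specific planner.

```python
# Sequential greedy task allocation: a polynomial-time approximation
# to the combinatorial joint-assignment problem. All data is illustrative.

agents = ["uav_1", "uav_2", "uav_3"]
tasks = {"survey_A": (0, 0), "survey_B": (5, 5), "recharge": (2, 8)}
positions = {"uav_1": (0, 1), "uav_2": (4, 4), "uav_3": (3, 7)}

def score(agent, task):
    """Higher is better: negative Euclidean distance as a stand-in utility."""
    ax, ay = positions[agent]
    tx, ty = tasks[task]
    return -((ax - tx) ** 2 + (ay - ty) ** 2) ** 0.5

assignment = {}
unassigned = set(tasks)
for _ in range(min(len(agents), len(tasks))):
    # Greedily pick the best remaining (agent, task) pair.
    agent, task = max(
        ((a, t) for a in agents if a not in assignment for t in unassigned),
        key=lambda pair: score(*pair),
    )
    assignment[agent] = task
    unassigned.remove(task)

print(assignment)  # e.g. {'uav_1': 'survey_A', 'uav_2': 'survey_B', 'uav_3': 'recharge'}
```

    Greedy sequential assignment replaces a search over exponentially many joint assignments with a quadratic number of pairwise score evaluations, trading optimality for the kind of scalability decentralized teams require.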