
    Towards Control-Centric Representations in Reinforcement Learning from Images

    Image-based reinforcement learning is a practical yet challenging task. A major hurdle lies in extracting control-centric representations while disregarding irrelevant information. While approaches that follow the bisimulation principle show potential for learning state representations that address this issue, they still grapple with the limited expressive capacity of latent dynamics and poor adaptability to sparse-reward environments. To address these limitations, we introduce ReBis, which aims to capture control-centric information by integrating reward-free control information alongside reward-specific knowledge. ReBis utilizes a transformer architecture to implicitly model the dynamics and incorporates block-wise masking to eliminate spatiotemporal redundancy. Moreover, ReBis combines a bisimulation-based loss with an asymmetric reconstruction loss to prevent feature collapse in environments with sparse rewards. Empirical studies on two large benchmarks, Atari games and the DeepMind Control Suite, show that ReBis outperforms existing methods, demonstrating its effectiveness.
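To make the feature-collapse concern concrete, here is a minimal sketch of a generic bisimulation-style loss as it commonly appears in the bisimulation-metric literature. It is not the ReBis objective: the abstract does not give that formula, and ReBis models dynamics implicitly with a transformer. The function names, the diagonal-Gaussian form of the predicted next-state distributions, and the discount `gamma` are all assumptions made for illustration.

```python
import math

def l1(u, v):
    """L1 distance between two latent vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

def w2_diag_gauss(mu_i, sigma_i, mu_j, sigma_j):
    """Closed-form 2-Wasserstein distance between two diagonal Gaussians."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_i, mu_j))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_i, sigma_j))
    return math.sqrt(mean_term + std_term)

def bisim_loss(z_i, z_j, r_i, r_j, next_i, next_j, gamma=0.99):
    """Squared gap between latent distance and a bisimulation-style target.

    next_i and next_j are (mean, std) pairs for the predicted next latent state
    (an assumed parameterization, not taken from the abstract).
    """
    target = abs(r_i - r_j) + gamma * w2_diag_gauss(*next_i, *next_j)
    return (l1(z_i, z_j) - target) ** 2

# Same predicted dynamics for both states; only the rewards differ.
same_next = ([0.0, 0.0], [1.0, 1.0])
dense = bisim_loss([0.0, 0.0], [1.0, 0.0], 1.0, 0.0, same_next, same_next)
sparse = bisim_loss([0.0, 0.0], [1.0, 0.0], 0.0, 0.0, same_next, same_next)
```

With differing rewards the target distance is 1 and the two latents are already consistent with it, so `dense` is zero. With both rewards at zero the target distance is 0, so the loss pulls the two latents together. Under this kind of objective, a sparse-reward environment can therefore push many states toward the same embedding, which is the collapse that an auxiliary reconstruction term is meant to counteract.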

    Pretraining in Deep Reinforcement Learning: A Survey

    The past few years have seen rapid progress in combining reinforcement learning (RL) with deep learning. Various breakthroughs ranging from games to robotics have spurred interest in designing sophisticated RL algorithms and systems. However, the prevailing workflow in RL is to learn tabula rasa, which may incur computational inefficiency. This precludes continuous deployment of RL algorithms and potentially excludes researchers without large-scale computing resources. In many other areas of machine learning, the pretraining paradigm has been shown to be effective in acquiring transferable knowledge, which can be utilized for a variety of downstream tasks. Recently, there has been a surge of interest in pretraining for deep RL, with promising results. However, much of the research has been based on different experimental settings. Due to the nature of RL, pretraining in this field faces unique challenges and hence requires new design principles. In this survey, we seek to systematically review existing works in pretraining for deep reinforcement learning, provide a taxonomy of these methods, discuss each sub-field, and bring attention to open problems and future directions.

    Reinforcement learning in large state action spaces

    Reinforcement learning (RL) is a promising framework for training intelligent agents that learn to optimize long-term utility by interacting directly with the environment. Creating RL methods that scale to large state-action spaces is critical for real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints such as decentralization, and a lack of guarantees about important properties such as performance, generalization and robustness in potentially unseen scenarios. This thesis aims to bridge this gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings: single- and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, and value-based and policy-based methods. In this work we present the first results on several different problems, e.g. tensorization of the Bellman equation, which allows exponential sample efficiency gains (Chapter 4), provable suboptimality arising from structural constraints in MAS (Chapter 3), combinatorial generalization results in cooperative MAS (Chapter 5), generalization results on observation shifts (Chapter 7), and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on generalization aspects of the agents under different frameworks. These properties have been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.

    Formální komponentový model pro mobilní architektury (Formal Component Model for Mobile Architectures)

    In the thesis, we propose an approach to modelling component-based systems and formally describing their behaviour. The approach is based on a novel component model defined by a metamodel in a logical view and by a description in the pi-calculus in a process view. We show that the component model addresses the dynamic aspects of software architectures, including component mobility. Furthermore, we propose a method of behavioural modelling of service-oriented architectures that passes smoothly from the service level to the component level and describes the behaviour of a whole system, services and components, as a single pi-calculus process. Finally, we illustrate an application of our approach on a case study of an environment for functional testing of complex safety-critical systems. The support of dynamic architectures and the integration with service-oriented architectures comprise the main advantages of our approach. Department of Software Engineering, Faculty of Mathematics and Physics.

    Towards Coordination and Control of Multi-robot Systems


    An Intelligence-Aware Process Calculus for Multi-Agent System Modeling

    In this paper we propose an agent modeling language named CAML that provides a comprehensive framework for representing all relevant aspects of a multi-agent system: specifically, its configuration and the reasoning abilities of its constituent agents. The configuration modeling aspect of the language supports natural grouping and mobility, and the reasoning framework is inspired by an extension of the popular BDI theory for modeling the cognitive skills of agents. We present the motivation behind the development of the language, its syntax, and an informal semantics.

    Symbolic planning for heterogeneous robots through composition of their motion description languages

    This dissertation introduces a new formalism for defining compositions of interacting heterogeneous systems described by extended motion description languages (MDLes). The properties of the composed system are analyzed, and an automatic process for generating sequential atom plans is introduced. The novelty of the formalism lies in producing a composed system whose behavior can be a superset of the union of the behaviors of its generators. As robotic systems perform increasingly complex tasks, people resort increasingly to switching or hybrid control algorithms. A need arises for a formalism that composes different robotic behaviors to meet a final target. The significant work produced to date on various aspects of robotics arguably has not yet effectively captured the interaction between systems. Another problem in motion control is automating the planning process; it has been recognized that there is a gap between high-level planning algorithms and low-level motion control implementations. This dissertation is an attempt to address these problems. A new composition system is given and its properties are checked. We allow systems to have additional cooperative transitions that become active only when the systems are composed appropriately with other systems. We distinguish between events associated with transitions that a push-down automaton representing an MDLe can take autonomously and events that cannot initiate transitions. Among the latter, there can be events that, when synchronized with events of another push-down automaton, become active and do initiate transitions. We identify MDLes as recursive systems in some basic process algebra (BPA) written in Greibach Normal Form. By identifying MDLes as a subclass of BPAs, we are able to borrow the syntax and semantics of the BPA merge operator (instead of defining a new MDLe operator), and thus establish closure and decidability properties for MDLe compositions.
We introduce an instance of the sliding block puzzle as a multi-robot hybrid system. We automate the planning process and dictate how the behaviors are sequentially synthesized into plans that drive the system into a desired state. The decidability result gives us hope of abstracting the system to the point that some of the available model checkers can be used to construct motion plans. The new notion of system composition allows us to capture the interaction between systems, and we find that the whole system can do more than the sum of its parts. The framework can be used by groups of heterogeneous robotic systems to communicate and allocate tasks among themselves, and to sort through possible solutions to find a plan of action without human intervention or guidance.
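As a rough illustration of what a recursive system in Greibach Normal Form looks like operationally, the sketch below treats each rule as "variable performs one action, then continues as a sequence of variables", which behaves like a single-state push-down automaton. The grammar `RULES`, the action names, and the helper functions are hypothetical and are not taken from the dissertation. The sketch covers sequential behavior only, not the merge operator or MDLe composition.

```python
from typing import Dict, List, Tuple

# Hypothetical rules in Greibach Normal Form: each variable maps to
# (action, successor variables) pairs, i.e. X -> a.X.Y | b and Y -> c.
Rules = Dict[str, List[Tuple[str, Tuple[str, ...]]]]

RULES: Rules = {
    "X": [("a", ("X", "Y")), ("b", ())],
    "Y": [("c", ())],
}

def step(stack: Tuple[str, ...], action: str, rules: Rules) -> List[Tuple[str, ...]]:
    """Return every stack reachable from `stack` by performing `action`.

    Only the leftmost variable may act; it is replaced by its successors.
    """
    if not stack:
        return []  # a terminated process has no further transitions
    head, rest = stack[0], stack[1:]
    return [alpha + rest for (act, alpha) in rules[head] if act == action]

def accepts(word: str, start: str, rules: Rules) -> bool:
    """Check whether the action sequence `word` can run the process to termination."""
    frontier = {(start,)}
    for action in word:
        frontier = {nxt for s in frontier for nxt in step(s, action, rules)}
    return () in frontier
```

For these example rules, the terminating action sequences are n occurrences of `a`, one `b`, then n occurrences of `c`, so `accepts("aabcc", "X", RULES)` succeeds while `accepts("abcc", "X", RULES)` does not. Because every rule consumes exactly one action before continuing, each transition only inspects the top of the stack. That regularity is the kind of structure that makes decidability results for such systems plausible.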

    Proceedings of the 2012 Workshop on Ambient Intelligence Infrastructures (WAmIi)

    This is a technical report comprising the papers presented at the Workshop on Ambient Intelligence Infrastructures (WAmIi), which took place in conjunction with the International Joint Conference on Ambient Intelligence (AmI) in Pisa, Italy, on November 13, 2012. The motivation for organizing the workshop was the wish to learn from past experience with Ambient Intelligence systems and, in particular, from the lessons learned about the system architecture of such systems. A significant number of European projects and other research efforts have been carried out, often with the goal of developing AmI technology to showcase AmI scenarios. We believe that, for AmI to become more widely accepted, the system architecture is essential.