8 research outputs found

    Output-feedback online optimal control for a class of nonlinear systems

    Full text link
    In this paper, an output-feedback model-based reinforcement learning (MBRL) method is developed for a class of second-order nonlinear systems. The control technique uses exact model knowledge and integrates a dynamic state estimator within the MBRL framework to achieve output-feedback control. Simulation results demonstrate the efficacy of the developed method.
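    As a rough illustration of the observer-in-the-loop idea (not the paper's learning updates, which the abstract does not spell out), the sketch below runs a model-based observer alongside a placeholder feedback policy on a second-order system x1' = x2, x2' = f(x) + g(x)u with only x1 measured. The dynamics f and g, the observer gains, and the policy itself are illustrative assumptions.

    ```python
    import numpy as np

    # Hedged sketch: a model-based observer estimates the unmeasured velocity
    # of a second-order system, and the controller acts on the estimate.
    # f, g, the gains, and the feedback policy are illustrative assumptions.

    def f(x):                                # assumed known drift dynamics
        return -x[0] - 0.5 * np.sin(x[1])

    def g(x):                                # assumed known control effectiveness
        return 1.0

    dt, k1, k2 = 0.01, 20.0, 100.0           # step size and observer gains (assumed)
    x = np.array([1.0, 0.0])                 # true state: x1 measured, x2 hidden
    xh = np.zeros(2)                         # observer's state estimate

    for _ in range(2000):
        y = x[0]                             # only the output is measured
        u = -2.0 * xh[0] - 1.5 * xh[1]       # placeholder feedback policy
        e = y - xh[0]                        # output estimation error
        # Observer: a copy of the model driven by the output error.
        xh = xh + dt * np.array([xh[1] + k1 * e,
                                 f(xh) + g(xh) * u + k2 * e])
        # Plant: Euler integration with the same input.
        x = x + dt * np.array([x[1], f(x) + g(x) * u])

    print("state norm:", np.linalg.norm(x), "estimation error:", np.linalg.norm(x - xh))
    ```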

    Successor-Predecessor Intrinsic Exploration

    Full text link
    Exploration is essential in reinforcement learning, particularly in environments where external rewards are sparse. Here we focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards. Although the study of intrinsic rewards has a long history, existing methods focus on composing the intrinsic reward from measures of the future prospects of states, ignoring the information contained in the retrospective structure of transition sequences. Here we argue that the agent can utilise retrospective information to generate structure-aware exploratory behaviour, facilitating efficient exploration based on global rather than local information. We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information. We show that SPIE yields more efficient and ethologically plausible exploratory behaviour than competing methods in environments with sparse rewards and bottleneck states. We also implement SPIE in deep reinforcement learning agents, and show that the resulting agent achieves stronger empirical performance than existing methods on sparse-reward Atari games.
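    The abstract does not give the reward's exact form, so the sketch below is one plausible instantiation of the prospective-plus-retrospective idea rather than SPIE itself: a tabular successor representation M (discounted future occupancy) learned by TD, a predecessor representation P (discounted past occupancy) learned from a trace of visited states, and a bonus that is large where both occupancies are small. The state count, learning rates, and combination rule are assumptions.

    ```python
    import numpy as np

    # Hedged sketch of combining prospective and retrospective occupancy
    # statistics into an intrinsic reward (one plausible reading, not SPIE's
    # exact definition). All parameters below are illustrative assumptions.

    n_states, gamma, alpha = 10, 0.95, 0.1
    M = np.eye(n_states)          # successor representation (future occupancy)
    P = np.eye(n_states)          # predecessor representation (past occupancy)
    trace = np.zeros(n_states)    # discounted trace of recently visited states

    def intrinsic_reward(s):
        # Rarely reached from past or future -> large exploration bonus.
        return 1.0 / (M[s].sum() + P[s].sum())

    def update(s, s_next):
        global trace
        onehot = np.eye(n_states)[s]
        # TD update of the successor representation.
        M[s] += alpha * (onehot + gamma * M[s_next] - M[s])
        # Predecessor update from the discounted trace of past states.
        trace = gamma * trace
        trace[s] += 1.0
        P[s_next] += alpha * (trace - P[s_next])

    # Usage: random walk on a chain; r_int would augment the external reward.
    s = 0
    for _ in range(1000):
        s_next = int(np.clip(s + np.random.choice([-1, 1]), 0, n_states - 1))
        r_int = intrinsic_reward(s_next)
        update(s, s_next)
        s = s_next
    ```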

    Reinforcement learning in process control and optimization

    Full text link
    Data is becoming the prime resource of the 21st century. Learning from and processing all of this data surpasses human capability and capacity, so the use of machines is unavoidable. Among the many machine learning paradigms, reinforcement learning is of especial interest; however, information on how it fits into process control, its specifics, and its frameworks is not readily available. Within this thesis we researched and examined the theoretical basis of the paradigm and its various scenarios and problems, and tested and compared several working environments. The result is a placement of the paradigm within the field of process control and optimisation, together with an overview of machine learning in general. The main part presents the key building blocks and theoretical basis of the paradigm, with an overview of the main algorithms, their characteristics, typical use scenarios, and problems within the paradigm itself. Three publicly available open-source libraries and one web-based service are presented as working and development environments, and guidelines and starting points for further study and research are given. Even though reinforcement learning algorithms are slower compared to algorithms in other learning paradigms, they have a much wider scope of use and the potential to build better self-learning machines.
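    The abstract names neither the libraries nor a concrete case study, so as a self-contained illustration of the building blocks it surveys (environment, agent, reward, value updates) in a process-control setting, here is a hedged tabular Q-learning sketch for a setpoint-tracking task. The plant model, discretisation, and hyperparameters are assumptions, not the thesis's examples.

    ```python
    import numpy as np

    # Hedged sketch: tabular Q-learning steering a noisy first-order process
    # toward a setpoint. Plant, discretisation, and hyperparameters are
    # illustrative assumptions.

    n_bins, n_actions = 21, 3                # discretised level; {down, hold, up}
    Q = np.zeros((n_bins, n_actions))
    alpha, gamma, eps, setpoint = 0.1, 0.95, 0.1, 10

    def step(level, action):
        u = action - 1                       # map {0, 1, 2} -> {-1, 0, +1}
        noise = np.random.choice([-1, 0, 1], p=[0.1, 0.8, 0.1])
        nxt = int(np.clip(level + u + noise, 0, n_bins - 1))
        return nxt, -abs(nxt - setpoint)     # reward penalises setpoint deviation

    level = 0
    for _ in range(20000):
        a = np.random.randint(n_actions) if np.random.rand() < eps else int(Q[level].argmax())
        nxt, r = step(level, a)
        Q[level, a] += alpha * (r + gamma * Q[nxt].max() - Q[level, a])
        level = nxt

    print("greedy action at setpoint:", int(Q[setpoint].argmax()))  # expect 1 (hold)
    ```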