16 research outputs found

    Interpolation assisted deep reinforcement learning

    Reinforcement Learning is a field of Machine Learning that, in contrast to the other prominent paradigms such as supervised and unsupervised learning, generates its training data at runtime through direct interaction with an environment. A sample of this training data is called an experience and represents a state transition caused by an action, together with the corresponding reward signal. Experiences can be seen as a form of knowledge about the underlying dynamics of the environment, and a common technique in the field of Reinforcement Learning is the so-called Experience Replay, which stores and replays experiences that have been observed at some point in training. By doing so, sample efficiency can be increased, as experiences are used many times for training instead of being discarded after a single update. Because experiences are generated at runtime, the learner has to explore the state space and can only learn the dynamics of a specific region once it has visited it at some point in the past. As mentioned earlier, experiences can be seen as knowledge about the underlying problem, and it is possible to generate synthetic experiences for states that have not been visited yet, based on stored real experiences of neighbouring states. Such synthetic experiences can be generated by means of interpolation and can then be used to assist the learner with exploration. Sample efficiency is also increased further, as real experiences are reused to generate synthetic ones. In this work, two techniques are presented that make use of synthetic experiences to assist the learner. The first approach stores generated synthetic experiences in the buffer alongside real experiences; during training, real as well as synthetic experiences are drawn at random from the buffer. This mechanism is called Interpolated Experience Replay. The second approach leverages the architectural design of the Deep Q-Network and uses synthetic experiences to enable training updates that take the full action space into account. This second algorithm is called Full-Update DQN. As methods that combine interpolation with a replay buffer and model-free learning algorithms fit neither the definition of model-free nor that of model-based learning, the new class Semi-Model-Based is introduced to cover them.
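
    The buffer mechanism described for the first approach can be pictured as follows; this is only a minimal sketch with hypothetical class and method names, not the authors' implementation: synthetic transitions produced by an interpolation routine are stored next to real ones and both are sampled together during training.

    import random
    from collections import deque

    class InterpolatedExperienceReplay:
        """Replay buffer holding real and synthetic (interpolated) experiences."""

        def __init__(self, capacity=10_000):
            self.real = deque(maxlen=capacity)
            self.synthetic = deque(maxlen=capacity)

        def add_real(self, state, action, reward, next_state, done):
            self.real.append((state, action, reward, next_state, done))

        def add_synthetic(self, state, action, reward, next_state, done):
            # Synthetic transitions come from interpolating stored real ones,
            # e.g. for states the agent has not visited yet.
            self.synthetic.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Real as well as synthetic experiences are drawn at random
            # from the combined pool, as described above.
            pool = list(self.real) + list(self.synthetic)
            return random.sample(pool, min(batch_size, len(pool)))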

    Bootstrapping a DQN Replay Memory with Synthetic Experiences

    An important component of many Deep Reinforcement Learning algorithms is the Experience Replay, which serves as a storage mechanism, or memory, for the experiences the agent has made. These experiences are used for training and help the agent to stably find the optimal trajectory through the problem space. The classic Experience Replay, however, only makes use of the experiences that were actually observed, although the stored samples bear great potential in the form of knowledge about the problem that can be extracted. We present an algorithm that creates synthetic experiences in a non-deterministic discrete environment to assist the learner. The Interpolated Experience Replay is evaluated on the FrozenLake environment, and we show that it can help the agent to learn faster and even better than with the classic version.
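
    A sketch of how such synthetic experiences could be derived (a hypothetical helper, not the authors' exact code): transitions that share a (state, action) pair are grouped, their rewards are averaged, and the averaged reward is paired with every follow-up state observed for that pair, which suits a non-deterministic discrete environment such as FrozenLake.

    from collections import defaultdict

    def synthesize(real_transitions):
        """Build synthetic experiences from stored real (s, a, r, s', done) tuples."""
        groups = defaultdict(list)
        for state, action, reward, next_state, done in real_transitions:
            groups[(state, action)].append((reward, next_state, done))

        synthetic = []
        for (state, action), samples in groups.items():
            avg_reward = sum(r for r, _, _ in samples) / len(samples)
            for _, next_state, done in samples:
                synthetic.append((state, action, avg_reward, next_state, done))
        return synthetic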

    An Architectural Design for Measurement Uncertainty Evaluation in Cyber-Physical Systems

    Several use cases from the areas of manufacturing and the process industry require highly accurate sensor data. As sensors always have some degree of uncertainty, methods are needed to increase their reliability. The common approach is to regularly calibrate the devices to enable traceability according to national standards and Système international (SI) units, which involves costly processes. However, sensor networks can also be represented as Cyber-Physical Systems (CPS), and a single sensor can have a digital representation (Digital Twin) through which its data can be used further on. To propagate uncertainty through the network in a reliable way, we present a system architecture for communicating measurement uncertainties in sensor networks that utilizes the concept of Asset Administration Shells alongside methods from the domain of Organic Computing. The presented approach contains methods for uncertainty propagation as well as concepts from the Machine Learning domain that address the need for an accurate uncertainty estimation. The mathematical description of the metrological uncertainty of fused or propagated values can be seen as a first step towards the development of a harmonized approach for uncertainty in distributed CPSs in the context of Industrie 4.0. In this paper, we present basic use cases, conceptual ideas and an agenda of how to proceed further on.
    Comment: accepted at FedCSIS 202
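
    As a generic illustration of the kind of computation involved, the sketch below fuses independent sensor readings with an inverse-variance weighted mean and propagates their standard uncertainties; this is standard metrology practice, not the specific architecture or Asset Administration Shell interface of the paper, and the function name is made up for the example.

    import math

    def fuse_measurements(values, uncertainties):
        """Return the fused value and its combined standard uncertainty."""
        weights = [1.0 / (u * u) for u in uncertainties]
        fused = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        combined_u = 1.0 / math.sqrt(sum(weights))
        return fused, combined_u

    # Example: two temperature sensors reporting 20.1 °C ± 0.2 K and 19.8 °C ± 0.1 K
    print(fuse_measurements([20.1, 19.8], [0.2, 0.1]))  # -> (19.86, ~0.089)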

    Averaging rewards as a first approach towards interpolated experience replay

    Reinforcement learning, and especially deep reinforcement learning, are research areas that are receiving more and more attention. The mathematical method of interpolation is used to obtain information about data points in an area where only neighboring samples are known, and thus seems like a good extension for the experience replay, which is a major component of a variety of deep reinforcement learning methods. Interpolated experiences stored in the experience replay could speed up learning in the early phase and reduce the overall amount of exploration needed. A first approach that averages rewards in a setting with an unstable transition function and very low exploration is implemented and shows promising results that encourage further investigation.
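
    The reward-averaging idea itself reduces to an equally weighted mean; the toy numbers below are made up for illustration and are not results from the paper.

    # Three visits to the same (state, action) pair under a non-deterministic
    # transition function yield different rewards; their equally weighted
    # average serves as the interpolated reward of a synthetic experience.
    observed_rewards = [0.0, 0.0, 1.0]
    interpolated_reward = sum(observed_rewards) / len(observed_rewards)
    print(interpolated_reward)  # 0.333...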

    Synthetic experiences for accelerating DQN performance in discrete non-deterministic environments

    State-of-the-art Deep Reinforcement Learning algorithms such as DQN and DDPG use the concept of a replay buffer called Experience Replay. In its default usage, it contains only the experiences that have been gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach, we limit ourselves to discrete and non-deterministic environments and use a simple, equally weighted average of the reward in combination with the observed follow-up states. We demonstrate a significantly improved overall mean performance in comparison to a DQN with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.
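
    The evaluation setting can be sketched as follows; the environment id and the step/reset signatures follow the older OpenAI Gym interface implied by "FrozenLake8x8-v0" (newer gym/gymnasium releases use "FrozenLake8x8-v1" and a different API), and the random policy stands in for an epsilon-greedy DQN policy.

    import gym

    env = gym.make("FrozenLake8x8-v0")  # slippery, hence non-deterministic transitions
    buffer = []

    for episode in range(10):
        state = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()  # placeholder for the learned policy
            next_state, reward, done, info = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state = next_state

    # 'buffer' could now be extended with synthetic transitions built from
    # equally weighted reward averages before being used for DQN updates.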