16 research outputs found

    Interpolation assisted deep reinforcement learning

    Reinforcement Learning is a field of Machine Learning that, in contrast to the other prominent paradigms such as supervised and unsupervised learning, generates its training data at runtime through direct interaction with an environment. A sample of this training data is called an experience and represents a state transition caused by an action, together with the corresponding reward signal. Experiences can be seen as a form of knowledge about the underlying dynamics of the environment, and a common technique in the field of Reinforcement Learning is the so-called Experience Replay, which stores and replays experiences that have been observed at some point in training. By doing so, sample efficiency can be increased, as experiences are used many times for training instead of being discarded after a single update. Because experiences are generated at runtime, the learner has to explore the state space and can only learn the dynamics of a specific region once it has visited it at some point in the past. As mentioned earlier, experiences can be seen as knowledge about the underlying problem, and it is possible to generate synthetic experiences for states that have not been visited yet, based on stored real experiences of neighbouring states. Such synthetic experiences can be generated by means of interpolation and can then be used to assist the learner with exploration. Sample efficiency is also increased further, as real experiences are reused to generate synthetic ones. In this work, two techniques are presented that make use of synthetic experiences to assist the learner. The first approach stores generated synthetic experiences in the buffer alongside real experiences; during training, real as well as synthetic experiences are drawn at random from the buffer. This mechanism is called Interpolated Experience Replay. The second approach leverages the architectural design of the Deep Q-Network and uses synthetic experiences to enable training updates that take the full action space into account. This second algorithm is called Full-Update DQN. As methods that combine interpolation with a replay buffer and model-free learning algorithms fit neither the definition of model-free nor that of model-based learning, the new class Semi-Model-Based is introduced to cover them.
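
    The buffer mechanism described for the first approach can be pictured as follows; this is only a minimal sketch with hypothetical class and method names, not the authors' implementation: synthetic transitions produced by an interpolation routine are stored next to real ones and both are sampled together during training.

    import random
    from collections import deque

    class InterpolatedExperienceReplay:
        """Replay buffer holding real and synthetic (interpolated) experiences."""

        def __init__(self, capacity=10_000):
            self.real = deque(maxlen=capacity)
            self.synthetic = deque(maxlen=capacity)

        def add_real(self, state, action, reward, next_state, done):
            self.real.append((state, action, reward, next_state, done))

        def add_synthetic(self, state, action, reward, next_state, done):
            # Synthetic transitions come from interpolating stored real ones,
            # e.g. for states the agent has not visited yet.
            self.synthetic.append((state, action, reward, next_state, done))

        def sample(self, batch_size):
            # Real as well as synthetic experiences are drawn at random
            # from the combined pool, as described above.
            pool = list(self.real) + list(self.synthetic)
            return random.sample(pool, min(batch_size, len(pool)))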

    Bootstrapping a DQN Replay Memory with Synthetic Experiences

    An important component of many Deep Reinforcement Learning algorithms is the Experience Replay, which serves as a storage mechanism, or memory, for the experiences the agent has made. These experiences are used for training and help the agent to stably find the optimal trajectory through the problem space. The classic Experience Replay, however, only makes use of the experiences that were actually observed, although the stored samples bear great potential in the form of knowledge about the problem that can be extracted. We present an algorithm that creates synthetic experiences in a non-deterministic discrete environment to assist the learner. The Interpolated Experience Replay is evaluated on the FrozenLake environment, and we show that it can help the agent to learn faster and even better than with the classic version.
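
    A sketch of how such synthetic experiences could be derived (a hypothetical helper, not the authors' exact code): transitions that share a (state, action) pair are grouped, their rewards are averaged, and the averaged reward is paired with every follow-up state observed for that pair, which suits a non-deterministic discrete environment such as FrozenLake.

    from collections import defaultdict

    def synthesize(real_transitions):
        """Build synthetic experiences from stored real (s, a, r, s', done) tuples."""
        groups = defaultdict(list)
        for state, action, reward, next_state, done in real_transitions:
            groups[(state, action)].append((reward, next_state, done))

        synthetic = []
        for (state, action), samples in groups.items():
            avg_reward = sum(r for r, _, _ in samples) / len(samples)
            for _, next_state, done in samples:
                synthetic.append((state, action, avg_reward, next_state, done))
        return synthetic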

    An Architectural Design for Measurement Uncertainty Evaluation in Cyber-Physical Systems

    Several use cases from the areas of manufacturing and the process industry require highly accurate sensor data. As sensors always have some degree of uncertainty, methods are needed to increase their reliability. The common approach is to regularly calibrate the devices to enable traceability according to national standards and Système international (SI) units, which involves costly processes. However, sensor networks can also be represented as Cyber-Physical Systems (CPS), and a single sensor can have a digital representation (Digital Twin) through which its data can be used further on. To propagate uncertainty through the network in a reliable way, we present a system architecture for communicating measurement uncertainties in sensor networks that utilizes the concept of Asset Administration Shells alongside methods from the domain of Organic Computing. The presented approach contains methods for uncertainty propagation as well as concepts from the Machine Learning domain that address the need for an accurate uncertainty estimation. The mathematical description of the metrological uncertainty of fused or propagated values can be seen as a first step towards the development of a harmonized approach for uncertainty in distributed CPSs in the context of Industrie 4.0. In this paper, we present basic use cases, conceptual ideas and an agenda of how to proceed further on.
    Comment: accepted at FedCSIS 202
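
    As a generic illustration of the kind of computation involved, the sketch below fuses independent sensor readings with an inverse-variance weighted mean and propagates their standard uncertainties; this is standard metrology practice, not the specific architecture or Asset Administration Shell interface of the paper, and the function name is made up for the example.

    import math

    def fuse_measurements(values, uncertainties):
        """Return the fused value and its combined standard uncertainty."""
        weights = [1.0 / (u * u) for u in uncertainties]
        fused = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        combined_u = 1.0 / math.sqrt(sum(weights))
        return fused, combined_u

    # Example: two temperature sensors reporting 20.1 °C ± 0.2 K and 19.8 °C ± 0.1 K
    print(fuse_measurements([20.1, 19.8], [0.2, 0.1]))  # -> (19.86, ~0.089)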

    Averaging rewards as a first approach towards interpolated experience replay

    Reinforcement learning, and especially deep reinforcement learning, are research areas that are receiving more and more attention. The mathematical method of interpolation is used to obtain information about data points in an area where only neighboring samples are known, and thus seems like a good extension for the experience replay, which is a major component of a variety of deep reinforcement learning methods. Interpolated experiences stored in the experience replay could speed up learning in the early phase and reduce the overall amount of exploration needed. A first approach that averages rewards in a setting with an unstable transition function and very low exploration is implemented and shows promising results that encourage further investigation.
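
    The reward-averaging idea itself reduces to an equally weighted mean; the toy numbers below are made up for illustration and are not results from the paper.

    # Three visits to the same (state, action) pair under a non-deterministic
    # transition function yield different rewards; their equally weighted
    # average serves as the interpolated reward of a synthetic experience.
    observed_rewards = [0.0, 0.0, 1.0]
    interpolated_reward = sum(observed_rewards) / len(observed_rewards)
    print(interpolated_reward)  # 0.333...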

    Synthetic experiences for accelerating DQN performance in discrete non-deterministic environments

    State-of-the-art Deep Reinforcement Learning algorithms such as DQN and DDPG use the concept of a replay buffer called Experience Replay. In its default usage, it contains only the experiences that have been gathered over the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach, we limit ourselves to discrete and non-deterministic environments and use a simple, equally weighted average of the reward in combination with the observed follow-up states. We demonstrate a significantly improved overall mean performance in comparison to a DQN with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.
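
    The evaluation setting can be sketched as follows; the environment id and the step/reset signatures follow the older OpenAI Gym interface implied by "FrozenLake8x8-v0" (newer gym/gymnasium releases use "FrozenLake8x8-v1" and a different API), and the random policy stands in for an epsilon-greedy DQN policy.

    import gym

    env = gym.make("FrozenLake8x8-v0")  # slippery, hence non-deterministic transitions
    buffer = []

    for episode in range(10):
        state = env.reset()
        done = False
        while not done:
            action = env.action_space.sample()  # placeholder for the learned policy
            next_state, reward, done, info = env.step(action)
            buffer.append((state, action, reward, next_state, done))
            state = next_state

    # 'buffer' could now be extended with synthetic transitions built from
    # equally weighted reward averages before being used for DQN updates.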