Bootstrapping a DQN Replay Memory with Synthetic Experiences
An important component of many Deep Reinforcement Learning algorithms is the
Experience Replay, which serves as a storage mechanism, or memory, of gathered
experiences. These experiences are used for training and help the agent to
stably find an optimal trajectory through the problem space. The classic
Experience Replay, however, makes use only of the experiences the agent
actually gathered, even though the stored samples hold great potential in the
form of extractable knowledge about the problem. We present an algorithm that
creates synthetic experiences in a non-deterministic discrete environment to
assist the learner. The Interpolated Experience Replay is evaluated on the
FrozenLake environment, and we show that it can help the agent learn faster and
even better than the classic version.
Averaging rewards as a first approach towards interpolated experience replay
Reinforcement learning, and especially deep reinforcement learning, are research areas that are receiving more and more attention. The mathematical method of interpolation is used to estimate values at data points in a region where only neighboring samples are known, and thus seems like a natural extension of the experience replay, a major component of a variety of deep reinforcement learning methods. Interpolated experiences stored in the experience replay could speed up learning in the early phase and reduce the overall amount of exploration needed. A first approach, averaging rewards in a setting with an unstable transition function and very little exploration, is implemented and shows promising results that encourage further investigation.
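The reward-averaging idea described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation; the class and method names are assumptions.

```python
from collections import defaultdict


class AveragedRewardStore:
    """Keeps a running mean of rewards per (state, action) pair,
    as a basis for interpolated experiences in a discrete,
    non-deterministic environment."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def update(self, state, action, reward):
        """Fold an observed reward into the running mean."""
        key = (state, action)
        self.count[key] += 1
        # incremental mean: m_n = m_{n-1} + (r - m_{n-1}) / n
        self.mean[key] += (reward - self.mean[key]) / self.count[key]

    def interpolated_reward(self, state, action):
        """Return the averaged reward for a known (state, action) pair."""
        return self.mean[(state, action)]
```

The incremental update avoids storing every observed reward while yielding the same equally weighted mean.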
Synthetic experiences for accelerating DQN performance in discrete non-deterministic environments
State-of-the-art deep reinforcement learning algorithms such as DQN and DDPG use the concept of a replay buffer called Experience Replay. By default, it contains only the experiences that have been gathered over the course of the runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones to assist the learner. In this first approach, we limit ourselves to discrete and non-deterministic environments and use a simple equally weighted average of the reward in combination with observed follow-up states. We demonstrate a significantly improved overall mean reward in comparison to a DQN with a vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.
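A synthetic transition combining an equally weighted reward average with an observed follow-up state, as described above, might be constructed along these lines. This is a hedged sketch, assuming a tabular environment; the buffer structure, the `synthesize` helper, and the real/synthetic mixing ratio in `sample` are illustrative assumptions, not details from the paper.

```python
import random
from collections import defaultdict


class InterpolatedReplayBuffer:
    """Illustrative replay buffer that stores real transitions and
    synthesizes new ones from the rewards and follow-up states
    previously observed for the same (state, action) pair."""

    def __init__(self):
        self.real = []                        # real (s, a, r, s') tuples
        self.rewards = defaultdict(list)      # rewards seen per (s, a)
        self.next_states = defaultdict(list)  # follow-up states per (s, a)

    def store(self, s, a, r, s_next):
        self.real.append((s, a, r, s_next))
        self.rewards[(s, a)].append(r)
        self.next_states[(s, a)].append(s_next)

    def synthesize(self, s, a):
        """Build a synthetic transition for a known (s, a) pair:
        equally weighted reward average + one observed follow-up state."""
        rs = self.rewards[(s, a)]
        if not rs:
            return None
        r_avg = sum(rs) / len(rs)
        s_next = random.choice(self.next_states[(s, a)])
        return (s, a, r_avg, s_next)

    def sample(self, batch_size, synthetic_ratio=0.5):
        """Draw real transitions and top the batch up with synthetic ones
        (the mixing ratio here is an assumption for illustration)."""
        batch = random.sample(self.real, min(batch_size, len(self.real)))
        n_syn = int(len(batch) * synthetic_ratio)
        for s, a, _, _ in batch[:n_syn]:
            syn = self.synthesize(s, a)
            if syn is not None:
                batch.append(syn)
        return batch
```

Averaging the reward over repeated visits smooths out the stochastic transition noise of an environment like FrozenLake, which is why the synthetic samples can stabilize early learning.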
D2.2 Uncertainty-aware sensor fusion in sensor networks
Assigning uncertainties to measurement values is a common process for single sensors, but the procedure grows in complexity for sensor networks. Measured values are often processed further in such networks, and uncertainty must also be evaluated for the resulting virtual values. A simple example is the fusion of homogeneous values, where faulty or drifting sensors can harm the virtual value. We introduce a method from the field of key comparison into the domain of sensor fusion. The method is evaluated in three different scenarios within an agent framework.
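One common estimator for fusing homogeneous readings with stated uncertainties, also used for reference values in metrological key comparisons, is the inverse-variance weighted mean. Whether the deliverable uses exactly this estimator is an assumption; the sketch below only illustrates the general idea of propagating uncertainty into a fused virtual value.

```python
def fuse(values, uncertainties):
    """Fuse readings x_i with standard uncertainties u_i via the
    inverse-variance weighted mean. Returns the fused value and its
    combined standard uncertainty. (Illustrative, not the D2.2 method.)"""
    weights = [1.0 / u ** 2 for u in uncertainties]
    total = sum(weights)
    fused = sum(w * x for w, x in zip(weights, values)) / total
    u_fused = (1.0 / total) ** 0.5  # uncertainty of the weighted mean
    return fused, u_fused
```

Note that a faulty or drifting sensor with an optimistically small stated uncertainty receives a large weight and can pull the fused value off, which motivates consistency checks like those used in key comparisons.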