    Navigating a Robot through Big Visual Sensory Data

    This paper describes a reinforcement learning architecture that can incorporate a deeply learned feature representation of a robot's unknown working environment. An autoencoder with convolutional and pooling layers derives a reduced feature representation from a set of images taken by the agent. This representation is used to discover and learn the best route to a goal. The features are fed to an actor layer that learns from a value function computed by a second output layer. The policy is ε-greedy, and the effect is similar to an actor-critic architecture in which the temporal-difference error is backpropagated from the critic to the actor. This compact architecture reduces the overhead of setting up a fully fledged actor-critic architecture, which typically needs extra processing time; the model is therefore well suited to the large volumes of visual sensor data that require fast processing. Processing is done off board because of the limitations of the robot used, but the latency was compensated for by the fast processing. Adaptability to different data sizes, which is critical for big data processing, is achieved by shrinking or expanding the whole architecture to fit different deeply learned feature dimensions. This flexibility is crucial when setting up such a model, since the dimensionality of the space is not known before operating in the environment. Initial experimental results on a real robot show that the agent reached the goal with a good level of accuracy and efficacy.
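
    The compact design the abstract describes pairs a shared feature trunk with two output layers: an actor that scores actions and a critic that estimates the state value, with the temporal-difference error driving both updates under an ε-greedy policy. The following minimal PyTorch sketch illustrates that idea only; the layer sizes, optimizer interface, and the names CompactActorCritic, select_action, and td_update are illustrative assumptions, not details from the paper.

    import random
    import torch
    import torch.nn as nn

    class CompactActorCritic(nn.Module):
        # Shared trunk standing in for the autoencoder/convolutional
        # features; two heads give action scores and a state value.
        def __init__(self, feat_dim, n_actions, hidden=64):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
            self.actor = nn.Linear(hidden, n_actions)  # action preferences
            self.critic = nn.Linear(hidden, 1)         # state value V(s)

        def forward(self, feats):
            h = self.trunk(feats)
            return self.actor(h), self.critic(h)

    def select_action(model, feats, epsilon, n_actions):
        # ε-greedy policy over the actor's action scores.
        if random.random() < epsilon:
            return random.randrange(n_actions)
        with torch.no_grad():
            scores, _ = model(feats)
        return int(scores.argmax())

    def td_update(model, opt, feats, action, reward, next_feats, gamma=0.99):
        # One TD step: the TD error trains the critic and is also
        # propagated to the actor's score for the action taken.
        scores, value = model(feats)
        with torch.no_grad():
            _, next_value = model(next_feats)
            target = reward + gamma * next_value
        td_error = target - value
        critic_loss = td_error.pow(2).mean()
        actor_loss = -(td_error.detach() * scores[..., action]).mean()
        opt.zero_grad()
        (critic_loss + actor_loss).backward()
        opt.step()
        return float(td_error)

    Because the trunk is an ordinary module, matching a different learned feature dimension only means changing feat_dim and hidden, which mirrors the shrink-or-expand adaptability the abstract claims.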

    Self-reflective deep reinforcement learning

    © 2016 IEEE. In this paper we present a new concept of self-reflection learning to support a deep reinforcement learning model. The self-reflective process occurs offline, between episodes, to help the agent learn to navigate towards a goal location and to boost its online performance. In particular, the so-far-optimal experience is recalled and compared with other similar but suboptimal episodes to reemphasize worthy decisions and deemphasize unworthy ones using eligibility and learning traces. At the same time, relatively bad experience is forgotten to remove its confusing effect. We set up a layer-wise deep actor-critic architecture and apply the self-reflection process to help train it. The self-reflective model appears to work well, and initial experimental results on a real robot show that the agent achieved a good success rate in reaching a goal location.
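
    The between-episode self-reflection step could be sketched roughly as below, here over a simple tabular value store for brevity rather than the paper's layer-wise deep actor-critic. The Episode structure, the reflect function, and the forgetting threshold are assumed names and mechanics for illustration, not the authors' procedure.

    from dataclasses import dataclass

    @dataclass
    class Episode:
        transitions: list      # (state, action, reward) tuples in order
        total_return: float

    def reflect(episodes, values, alpha=0.1, lam=0.9, forget_below=0.0):
        # Offline, between episodes: replay the so-far-optimal episode to
        # reemphasize its decisions with decaying eligibility traces,
        # deemphasize decisions that worse episodes do not share with it,
        # and forget clearly bad episodes to remove their confusing effect.
        best = max(episodes, key=lambda e: e.total_return)
        trace = 1.0
        for state, action, _ in reversed(best.transitions):
            values[(state, action)] = values.get((state, action), 0.0) + alpha * trace
            trace *= lam    # eligibility decays with distance from the goal
        worthy = {(s, a) for s, a, _ in best.transitions}
        for ep in episodes:
            if ep is best:
                continue
            trace = 1.0
            for state, action, _ in reversed(ep.transitions):
                if (state, action) not in worthy:   # unworthy decision
                    values[(state, action)] = values.get((state, action), 0.0) - alpha * trace
                trace *= lam
        # Drop relatively bad experience entirely.
        episodes[:] = [e for e in episodes if e.total_return >= forget_below]
        return values

    In the paper the same reemphasis and deemphasis would act on the deep actor-critic's eligibility and learning traces rather than on a lookup table, but the offline, between-episode flow is the same.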