    End-to-End Navigation in Unknown Environments using Neural Networks

    We investigate how a neural network can learn perception-action loops for navigation in unknown environments. Specifically, we consider how to learn to navigate in environments populated with cul-de-sacs that represent convex local minima that the robot could fall into instead of finding a set of feasible actions that take it to the goal. Traditional methods rely on maintaining a global map to solve the problem of overcoming a long cul-de-sac. However, due to errors induced by local and global drift, it is highly challenging to maintain such a map for long periods of time. One way to mitigate this problem is to use learning techniques that do not rely on hand-engineered map representations and instead output appropriate control policies directly from their sensory input. We first demonstrate that such a problem cannot be solved directly by deep reinforcement learning due to the sparse reward structure of the environment. Further, we demonstrate that deep supervised learning also cannot be used directly to solve this problem. We then investigate network models that offer a combination of reinforcement learning and supervised learning and highlight the significance of adding fully differentiable memory units to such networks. We evaluate our networks on their ability to generalize to new environments and show that adding memory to such networks offers huge jumps in performance. Comment: Workshop on Learning Perception and Control for Autonomous Flight: Safety, Memory and Efficiency, Robotics Science and Systems 201

    Towards Better Interpretability in Deep Q-Networks

    Deep reinforcement learning techniques have demonstrated superior performance in a wide variety of environments. As improvements in training algorithms continue at a brisk pace, theoretical and empirical studies on understanding what these networks seem to learn are far behind. In this paper we propose an interpretable neural network architecture for Q-learning which provides a global explanation of the model's behavior using key-value memories, attention and reconstructible embeddings. With a directed exploration strategy, our model can reach training rewards comparable to the state-of-the-art deep Q-learning models. However, results suggest that the features extracted by the neural network are extremely shallow, and subsequent testing using out-of-sample examples shows that the agent can easily overfit to trajectories seen during training. Comment: Accepted at AAAI-19 (16 pages, 18 figures)

    Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy

    Proximal policy optimization and trust region policy optimization (PPO and TRPO) with actor and critic parametrized by neural networks achieve significant empirical success in deep reinforcement learning. However, due to nonconvexity, the global convergence of PPO and TRPO remains less understood, which separates theory from practice. In this paper, we prove that a variant of PPO and TRPO equipped with overparametrized neural networks converges to the globally optimal policy at a sublinear rate. The key to our analysis is the global convergence of infinite-dimensional mirror descent under a notion of one-point monotonicity, where the gradient and iterate are instantiated by neural networks. In particular, the desirable representation power and optimization geometry induced by the overparametrization of such neural networks allow them to accurately approximate the infinite-dimensional gradient and iterate. Comment: A short versio

    PSO-FNN-Based Vertical Handoff Decision Algorithm in Heterogeneous Wireless Networks

    Abstract: To address the problem that vertical handoff algorithms based on fuzzy logic and neural networks do not reasonably consider the load state in heterogeneous wireless networks, a PSO-FNN-based vertical handoff decision algorithm is proposed. The algorithm executes factor reinforcement learning for the fuzzy neural network (FNN), with the objective of equal blocking probability, to adapt to the load state dynamically. It is combined with the particle swarm optimization (PSO) algorithm, which has global optimization capability, to set the initial parameters and thereby improve the precision of parameter learning. The simulation results show that the PSO-FNN algorithm can balance the load of heterogeneous wireless networks effectively and decrease the blocking probability as well as the handoff call blocking probability compared to the sum-received signal strength (S-RSS) algorithm.