End-to-End Navigation in Unknown Environments using Neural Networks
We investigate how a neural network can learn perception-action loops for
navigation in unknown environments. Specifically, we consider how to learn to
navigate in environments populated with cul-de-sacs that represent convex local
minima that the robot could fall into instead of finding a set of feasible
actions that take it to the goal. Traditional methods rely on maintaining a
global map to solve the problem of overcoming a long cul-de-sac. However, due
to errors induced from local and global drift, it is highly challenging to
maintain such a map for long periods of time. One way to mitigate this problem
is by using learning techniques that do not rely on hand-engineered map
representations and instead output appropriate control policies directly from
their sensory input. We first demonstrate that such a problem cannot be solved
directly by deep reinforcement learning due to the sparse reward structure of
the environment. Further, we demonstrate that deep supervised learning also
cannot be used directly to solve this problem. We then investigate network
models that offer a combination of reinforcement learning and supervised
learning and highlight the significance of adding fully differentiable memory
units to such networks. We evaluate our networks on their ability to generalize
to new environments and show that adding memory to such networks offers huge
jumps in performance.
Comment: Workshop on Learning Perception and Control for Autonomous Flight:
Safety, Memory and Efficiency, Robotics Science and Systems 201
Towards Better Interpretability in Deep Q-Networks
Deep reinforcement learning techniques have demonstrated superior performance
in a wide variety of environments. As improvements in training algorithms
continue at a brisk pace, theoretical and empirical studies on understanding
what these networks seem to learn lag far behind. In this paper we propose an
interpretable neural network architecture for Q-learning which provides a
global explanation of the model's behavior using key-value memories, attention
and reconstructible embeddings. With a directed exploration strategy, our model
can reach training rewards comparable to the state-of-the-art deep Q-learning
models. However, results suggest that the features extracted by the neural
network are extremely shallow and subsequent testing using out-of-sample
examples shows that the agent can easily overfit to trajectories seen during
training.
Comment: Accepted at AAAI-19 (16 pages, 18 figures)
Neural Proximal/Trust Region Policy Optimization Attains Globally Optimal Policy
Proximal policy optimization and trust region policy optimization (PPO and
TRPO) with actor and critic parametrized by neural networks achieve significant
empirical success in deep reinforcement learning. However, due to nonconvexity,
the global convergence of PPO and TRPO remains less understood, which separates
theory from practice. In this paper, we prove that a variant of PPO and TRPO
equipped with overparametrized neural networks converges to the globally
optimal policy at a sublinear rate. The key to our analysis is the global
convergence of infinite-dimensional mirror descent under a notion of one-point
monotonicity, where the gradient and iterate are instantiated by neural
networks. In particular, the desirable representation power and optimization
geometry induced by the overparametrization of such neural networks allow them
to accurately approximate the infinite-dimensional gradient and iterate.
Comment: A short versio
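The mirror descent step named in the abstract above can be illustrated in a much simpler setting. The sketch below is not the paper's method: it assumes a single state, a finite action set, fixed action values, and the KL divergence as the Bregman divergence, in which case mirror ascent on expected reward reduces to a multiplicative-weights update. The function name `mirror_descent_step` and all numbers are hypothetical.

```python
import math


def mirror_descent_step(policy, action_values, step_size):
    """One KL-regularized mirror ascent step on the probability simplex.

    Each action's probability is reweighted by exp(step_size * value) and
    the result is renormalized. This is a finite, tabular toy analogue only.
    """
    weights = [p * math.exp(step_size * q) for p, q in zip(policy, action_values)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical toy problem: three actions with fixed values, uniform start.
policy = [1.0 / 3.0] * 3
action_values = [1.0, 0.5, 0.0]

for _ in range(50):
    policy = mirror_descent_step(policy, action_values, step_size=0.5)
```

With fixed action values, repeated steps concentrate probability on the highest-valued action, while the KL term keeps each step close to the previous policy.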
PSO-FNN-Based Vertical Handoff Decision Algorithm in Heterogeneous Wireless Networks
Existing vertical handoff algorithms based on fuzzy logic and neural networks do not adequately account for the load state in heterogeneous wireless networks. To address this, a PSO-FNN-based vertical handoff decision algorithm is proposed. The algorithm applies reinforcement learning to the fuzzy neural network (FNN), with the objective of equal blocking probability, so that it adapts dynamically to the load state. It is combined with the particle swarm optimization (PSO) algorithm, which has global optimization capability, to set the initial parameters and thereby improve the precision of parameter learning. Simulation results show that the PSO-FNN algorithm effectively balances the load of heterogeneous wireless networks and decreases both the blocking probability and the handoff call blocking probability compared with the sum-received signal strength (S-RSS) algorithm.
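For readers unfamiliar with PSO, the following is a minimal generic sketch of the optimizer, assuming a toy objective (the sphere function) in place of the FNN parameter-initialization problem described above. The function name `pso_minimize`, the coefficient values, and the bounds are illustrative assumptions and are not taken from the paper.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Generic particle swarm optimization over a box-constrained domain."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_val = [objective(p) for p in positions]
    g = min(range(n_particles), key=lambda i: personal_val[i])
    global_best, global_val = personal_best[g][:], personal_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own
                # best position, and pull toward the swarm's best position.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                # Move, then clamp to the search bounds.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            val = objective(positions[i])
            if val < personal_val[i]:
                personal_best[i], personal_val[i] = positions[i][:], val
                if val < global_val:
                    global_best, global_val = positions[i][:], val
    return global_best, global_val


def sphere(x):
    """Toy objective (an assumption for illustration): sum of squares."""
    return sum(v * v for v in x)


best_x, best_val = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a setting like the one the abstract describes, the objective would instead score a candidate set of initial FNN parameters. How the paper defines that score is not stated in the abstract.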