Agent Navigation based on Boundary Value Problem using Iterative Methods
This paper presents the simulation of numerical solutions to the navigational problem of an agent traveling safely in its environment. The approach is based on numerical solutions of the boundary value problem (BVP) that generate harmonic potential fields through a differential equation whose gradient represents navigation routes to the destination. Two methods, namely KSOR and KAOR, were tested to solve the BVP. KSOR and KAOR are variants of the standard SOR and AOR methods, respectively. In this work, the KSOR and KAOR methods were used to solve the BVP by applying Laplace's equation to obtain harmonic functions. The generated harmonic functions are then utilized by a searching algorithm to find a smooth navigational route for an agent to travel in its environment without colliding with any obstacles. The numerical results from the solutions of the BVP demonstrate that KAOR provides a faster execution time with fewer iterations compared to the KSOR method.
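The technique above can be sketched with the plain SOR method (the paper's KSOR and KAOR are variants of it): relax Laplace's equation on a grid whose boundary and obstacles are fixed at a high potential and whose goal is fixed at zero, then descend the resulting harmonic field. All names, parameters and the grid layout below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sor_potential(obstacles, goal, omega=1.8, tol=1e-6, max_iter=10000):
    """Relax Laplace's equation with SOR. obstacles: boolean mask (walls/obstacles
    fixed at 1.0, repulsive); goal: (i, j) cell fixed at 0.0 (attractive)."""
    n, m = obstacles.shape
    u = np.ones((n, m))
    u[goal] = 0.0
    fixed = obstacles.copy()
    fixed[goal] = True
    for it in range(max_iter):
        max_delta = 0.0
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                if fixed[i, j]:
                    continue
                avg = 0.25 * (u[i+1, j] + u[i-1, j] + u[i, j+1] + u[i, j-1])
                new = (1 - omega) * u[i, j] + omega * avg  # SOR update
                max_delta = max(max_delta, abs(new - u[i, j]))
                u[i, j] = new
        if max_delta < tol:
            return u, it + 1
    return u, max_iter

def descend(u, start, max_steps=1000):
    """Follow the negative gradient: step to the lowest-valued neighbour.
    A harmonic field has no spurious local minima, so this reaches the goal."""
    path, pos = [start], start
    for _ in range(max_steps):
        if u[pos] <= 0.0:
            break
        i, j = pos
        pos = min([(i+1, j), (i-1, j), (i, j+1), (i, j-1)], key=lambda p: u[p])
        path.append(pos)
    return path
```

The absence of interior extrema in harmonic functions is what makes the greedy descent safe: every non-goal cell equals the average of its neighbours, so at least one neighbour is strictly lower.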
The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming
In adaptive dynamic programming, neurocontrol and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimise a total cost function. In this paper we show that when discretized time is used to model the motion of the agent, it can be very important to do "clipping" on the motion of the agent in the final time step of the trajectory. By clipping we mean that the final time step of the trajectory is truncated such that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum; and when clipping is done properly, learning performance can improve significantly.
The clipping problem we describe affects algorithms which use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include Backpropagation Through Time for Control, and methods based on Dual Heuristic Dynamic Programming. However, the clipping problem does not significantly affect methods based on Heuristic Dynamic Programming, Temporal Differences or Policy Gradient Learning algorithms. Similarly, the clipping problem does not affect fixed-length finite-horizon problems.
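The clipping idea can be illustrated on a one-dimensional rollout: when a discretized step would cross the terminal boundary, truncate that step so the agent stops exactly at the boundary and is charged only the used fraction of the step cost. The dynamics and cost model below are a minimal sketch of my own, not the paper's benchmark.

```python
def rollout(x0, speed, dt, x_terminal, step_cost, clip=True):
    """Integrate x' = speed until x >= x_terminal; return (final_x, total_cost).
    With clip=True, the final partial step is truncated at the boundary and its
    cost is scaled by the fraction of the step actually used."""
    x, cost = x0, 0.0
    while x < x_terminal:
        x_next = x + speed * dt
        if clip and x_next >= x_terminal:
            frac = (x_terminal - x) / (x_next - x)  # fraction of the step used
            cost += step_cost * frac                # charge only the partial step
            x = x_terminal                          # stop exactly at the terminal state
        else:
            cost += step_cost
            x = x_next
    return x, cost
```

Without clipping the trajectory overshoots and the overshoot's cost (and, in gradient-based methods, its derivative) contaminates the learning signal.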
Model checking learning agent systems using Promela with embedded C code and abstraction
As autonomous systems become more prevalent, methods for their verification will become more widely used. Model checking is a formal verification technique that can help ensure the safety of autonomous systems, but in most cases it cannot be applied by novices, or in its straight "off-the-shelf" form. In order to be more widely applicable it is crucial that more sophisticated techniques are used, and are presented in a way that is reproducible by engineers and verifiers alike. In this paper we demonstrate in detail two techniques that are used to increase the power of model checking using the model checker SPIN. The first of these is the use of embedded C code within Promela specifications, in order to accurately reflect robot movement. The second is to use abstraction together with a simulation relation to allow us to verify multiple environments simultaneously. We apply these techniques to a fairly simple system in which a robot moves about a fixed circular environment and learns to avoid obstacles. The learning algorithm is inspired by the way that insects learn to avoid obstacles in response to pain signals received from their antennae. Crucially, we prove that our abstraction is sound for our example system – a step that is often omitted but is vital if formal verification is to be widely accepted as a useful and meaningful approach.
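The soundness condition behind such an abstraction can be sketched as a simulation-relation check: every concrete transition s → s' must be matched by an abstract transition α(s) → α(s'). The toy system below (a robot stepping around a circular track, abstracted into quadrants) is an illustrative assumption of mine, not the paper's Promela model.

```python
def is_simulation(concrete_trans, abstract_trans, alpha):
    """concrete_trans / abstract_trans: dicts mapping a state to its set of
    successors; alpha: dict mapping each concrete state to its abstract state.
    Returns True iff the abstraction simulates the concrete system."""
    return all(alpha[t] in abstract_trans.get(alpha[s], set())
               for s, succs in concrete_trans.items()
               for t in succs)

# Concrete system: robot on an 8-cell circular track, always stepping forward.
concrete = {i: {(i + 1) % 8} for i in range(8)}
# Abstraction: merge pairs of cells into 4 regions.
alpha = {i: i // 2 for i in range(8)}
# A sound abstract system lets the robot stay in a region or enter the next one.
abstract_ok = {r: {r, (r + 1) % 4} for r in range(4)}
# An unsound abstraction forbids staying, so some concrete steps are unmatched.
abstract_bad = {r: {(r + 1) % 4} for r in range(4)}
```

An exhaustive check like this is feasible only for finite toy systems; the paper instead proves soundness, which is exactly the step the abstract stresses is often omitted.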
Econometrics for Learning Agents
The main goal of this paper is to develop a theory of inference of player
valuations from observed data in the generalized second price auction without
relying on the Nash equilibrium assumption. Existing work in Economics on
inferring agent values from data relies on the assumption that all participant
strategies are best responses of the observed play of other players, i.e. they
constitute a Nash equilibrium. In this paper, we show how to perform inference
relying on a weaker assumption instead: assuming that players are using some
form of no-regret learning. Learning outcomes emerged in recent years as an
attractive alternative to Nash equilibrium in analyzing game outcomes, modeling
players who haven't reached a stable equilibrium, but rather use algorithmic
learning, aiming to learn the best way to play from previous observations. In
this paper we show how to infer values of players who use algorithmic learning
strategies. Such inference is an important first step before we move to testing
any learning theoretic behavioral model on auction data. We apply our
techniques to a dataset from Microsoft's sponsored search ad auction system
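The no-regret assumption can be sketched in a deliberately simplified single-slot second-price auction (not the paper's full GSP setting): a per-click value v "rationalizes" an observed bid sequence if no fixed alternative bid would have earned more than ε extra utility on average against the observed competing bids. All function names and the candidate grids are illustrative.

```python
def avg_utility(v, my_bids, others):
    """Average realized utility: win and pay the competing bid when we outbid it."""
    return sum(v - o for b, o in zip(my_bids, others) if b > o) / len(my_bids)

def max_regret(v, my_bids, others, alt_bids):
    """Best average gain from deviating to a single fixed bid, in hindsight."""
    observed = avg_utility(v, my_bids, others)
    return max(sum(v - o for o in others if b_alt > o) / len(others) - observed
               for b_alt in alt_bids)

def rationalizable_values(my_bids, others, candidates, alt_bids, eps=0.0):
    """Values under which the observed bids have at most eps average regret."""
    return [v for v in candidates
            if max_regret(v, my_bids, others, alt_bids) <= eps]
```

For example, a bidder who always bids 4 against competing bids 1, 2, 3 is consistent with value 4 (no deviation helps) but not with value 1 (bidding 0 would have been strictly better), so the inference excludes the low value.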
Fine-grained acceleration control for autonomous intersection management using deep reinforcement learning
Recent advances in combining deep learning and Reinforcement Learning have
shown a promising path for designing new control agents that can learn optimal
policies for challenging control tasks. These new methods address the main
limitations of conventional Reinforcement Learning methods such as customized
feature engineering and small action/state space dimension requirements. In
this paper, we leverage one of the state-of-the-art Reinforcement Learning
methods, known as Trust Region Policy Optimization, to tackle intersection
management for autonomous vehicles. We show that using this method, we can
perform fine-grained acceleration control of autonomous vehicles in a grid
street plan to achieve a global design objective.Comment: Accepted in IEEE Smart World Congress 201
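"Fine-grained acceleration control" means the policy emits a continuous acceleration command at every discrete time step, rather than choosing from a few fixed maneuvers. A full TRPO implementation is beyond a short sketch, so the learned policy is stood in for here by a simple proportional controller; only the environment interface is the point. All parameters and names are illustrative assumptions, not from the paper.

```python
def step(pos, vel, accel, dt=0.1, a_max=3.0):
    """One discrete time step of simple longitudinal vehicle kinematics."""
    a = max(-a_max, min(a_max, accel))  # actuator limits on acceleration
    vel = max(0.0, vel + a * dt)        # vehicles do not reverse
    return pos + vel * dt, vel

def policy(vel, v_target, gain=2.0):
    """Stand-in for the learned policy: continuous acceleration command."""
    return gain * (v_target - vel)

def simulate(v_target=10.0, steps=200):
    pos, vel = 0.0, 0.0
    for _ in range(steps):
        pos, vel = step(pos, vel, policy(vel, v_target))
    return pos, vel
```

A TRPO agent would replace `policy` with a neural network trained under a trust-region constraint on policy updates, but it would interact with the environment through exactly this kind of per-step continuous action.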
Hybrid modelling of individual movement and collective behaviour
Mathematical models of dispersal in biological systems are often written in terms of partial differential equations (PDEs) which describe the time evolution of population-level variables (concentrations, densities). A more detailed modelling approach is given by individual-based (agent-based) models which describe the behaviour of each organism. In recent years, an intermediate modelling methodology – hybrid modelling – has been applied to a number of biological systems. These hybrid models couple an individual-based description of cells/animals with a PDE model of their environment. In this chapter, we overview hybrid models in the literature with a focus on the mathematical challenges of this modelling approach. The detailed analysis is presented using the example of chemotaxis, where cells move according to extracellular chemicals that can be altered by the cells themselves. In this case, individual-based models of cells are coupled with PDEs for extracellular chemical signals. Travelling waves in these hybrid models are investigated. In particular, we show that, in contrast to the PDEs, hybrid chemotaxis models only develop a transient travelling wave.
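The coupling described above can be sketched in one dimension: individual cells perform a random walk biased up the gradient of a chemical, while the chemical obeys a discretized PDE (diffusion plus decay plus secretion by the cells). All parameter values and names below are illustrative assumptions, not the chapter's model.

```python
import numpy as np

def simulate(n_cells=50, n_grid=100, steps=500, D=1.0, decay=0.1,
             secretion=1.0, chi=5.0, dt=0.01, dx=1.0, seed=0):
    """Hybrid chemotaxis sketch: agent-based cells on a periodic 1D grid
    coupled to an explicit finite-difference scheme for the chemical c."""
    rng = np.random.default_rng(seed)
    cells = rng.integers(40, 60, size=n_cells)  # cell positions, start mid-domain
    c = np.zeros(n_grid)
    for _ in range(steps):
        # PDE step: diffusion + decay + point sources at the cell positions
        lap = np.roll(c, 1) + np.roll(c, -1) - 2 * c  # periodic Laplacian
        c += dt * (D * lap / dx**2 - decay * c)
        np.add.at(c, cells, dt * secretion)           # secretion (repeats accumulate)
        # agent step: random walk biased by the local chemical gradient
        grad = (c[(cells + 1) % n_grid] - c[(cells - 1) % n_grid]) / (2 * dx)
        p_right = np.clip(0.5 + chi * grad * dt, 0.0, 1.0)
        move = np.where(rng.random(n_cells) < p_right, 1, -1)
        cells = (cells + move) % n_grid
    return cells, c
```

The explicit diffusion step is stable here because D·dt/dx² = 0.01 is well below the usual 1/2 threshold; the two-way coupling (cells shape the field, the field biases the cells) is what makes the model "hybrid".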