
    Agent Navigation based on Boundary Value Problem using Iterative Methods

    This paper presents the simulation of numerical solutions to the navigational problem of an agent traveling safely in its environment. The approach is based on numerical solutions of the boundary value problem (BVP) that generate harmonic potential fields through a differential equation whose gradient represents navigation routes to the destination. Two methods, namely KSOR and KAOR, were tested to solve the BVP. KSOR and KAOR are variants of the standard SOR and AOR methods, respectively. In this work, the KSOR and KAOR methods were used to solve the BVP by applying Laplace's equation to obtain harmonic functions. The generated harmonic functions are then utilized by the searching algorithm to find a smooth navigational route for an agent to travel in its environment without colliding with any obstacles. The numerical results from the solutions of the BVP demonstrate that KAOR provides a faster execution time with fewer iterations compared to the KSOR method.
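    The potential-field construction this abstract describes can be sketched with the standard SOR iteration (the paper's KSOR and KAOR variants modify the relaxation step and are not reproduced here). A minimal sketch, assuming a square grid with walls fixed at potential 1, the goal fixed at 0, and illustrative names throughout:

    ```python
    import numpy as np

    def sor_laplace(u, mask, omega=1.8, tol=1e-6, max_iter=10000):
        """Solve Laplace's equation by successive over-relaxation (SOR).
        u    : initial potential; fixed cells hold boundary values (walls = 1.0, goal = 0.0)
        mask : True where the potential is free to be updated
        """
        u = u.copy()
        for it in range(max_iter):
            max_delta = 0.0
            for i in range(1, u.shape[0] - 1):
                for j in range(1, u.shape[1] - 1):
                    if not mask[i, j]:
                        continue
                    gs = 0.25 * (u[i + 1, j] + u[i - 1, j] + u[i, j + 1] + u[i, j - 1])
                    delta = omega * (gs - u[i, j])  # over-relaxed Gauss-Seidel step
                    u[i, j] += delta
                    max_delta = max(max_delta, abs(delta))
            if max_delta < tol:
                return u, it + 1
        return u, max_iter

    # A 10x10 room: outer walls at potential 1.0, one goal cell fixed at 0.0.
    n = 10
    u0 = np.ones((n, n))
    u0[1:-1, 1:-1] = 0.5
    mask = np.zeros((n, n), dtype=bool)
    mask[1:-1, 1:-1] = True
    goal = (8, 8)
    u0[goal] = 0.0
    mask[goal] = False

    u, iters = sor_laplace(u0, mask)

    # Route extraction: repeatedly step to the lowest-potential neighbour.
    # A harmonic field has no spurious interior minima, so descent reaches the goal.
    pos, path = (1, 1), [(1, 1)]
    for _ in range(n * n):
        if pos == goal:
            break
        i, j = pos
        nbrs = [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        pos = min(nbrs, key=lambda p: u[p])
        path.append(pos)
    ```

    The descent never steps onto a wall because interior potentials stay strictly below the wall value of 1.0; this absence of local minima is exactly why harmonic functions are attractive for navigation.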

    The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming

    In adaptive dynamic programming, neurocontrol and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimise a total cost function. In this paper we show that when discretized time is used to model the motion of the agent, it can be very important to do "clipping" on the motion of the agent in the final time step of the trajectory. By clipping we mean that the final time step of the trajectory is to be truncated such that the agent stops exactly at the first terminal state reached, and no distance further. We demonstrate that when clipping is omitted, learning performance can fail to reach the optimum; and when clipping is done properly, learning performance can improve significantly. The clipping problem we describe affects algorithms which use explicit derivatives of the model functions of the environment to calculate a learning gradient. These include Backpropagation Through Time for Control, and methods based on Dual Heuristic Dynamic Programming. However the clipping problem does not significantly affect methods based on Heuristic Dynamic Programming, Temporal Differences or Policy Gradient Learning algorithms. Similarly, the clipping problem does not affect fixed-length finite-horizon problems.
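    The clipping rule the abstract describes can be illustrated in one dimension. This is a toy sketch, not the paper's neurocontrol setting: the agent moves right at constant speed v, the terminal set is x >= L, and the running cost is 1 per unit time, so the exact cost-to-go from x0 is (L - x0) / v:

    ```python
    def rollout(x0, v, dt, L, clip):
        """Simulate until the terminal set x >= L is reached.
        Running cost is 1 per unit time, so the total cost equals elapsed time."""
        x, cost = x0, 0.0
        while x < L:
            step = v * dt
            if clip and x + step > L:
                frac = (L - x) / step  # fraction of the final step actually needed
                cost += frac * dt      # ...charge only that fraction of dt
                x = L                  # stop exactly on the terminal boundary
            else:
                cost += dt
                x += step
        return cost

    true_cost = (0.7 - 0.0) / 1.0                        # analytic time-to-go
    unclipped = rollout(0.0, 1.0, 0.2, 0.7, clip=False)  # ~0.8: a full dt is charged for the overshoot
    clipped = rollout(0.0, 1.0, 0.2, 0.7, clip=True)     # ~0.7: matches true_cost
    ```

    Without clipping, the rollout charges a full dt for the final, partly out-of-bounds step, biasing the total cost; the paper's concern is the effect of this bias on the derivatives that BPTT-for-Control and DHP-style methods compute through the trajectory.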

    Model checking learning agent systems using Promela with embedded C code and abstraction

    As autonomous systems become more prevalent, methods for their verification will become more widely used. Model checking is a formal verification technique that can help ensure the safety of autonomous systems, but in most cases it cannot be applied by novices, or in its straight "off-the-shelf" form. In order to be more widely applicable it is crucial that more sophisticated techniques are used, and are presented in a way that is reproducible by engineers and verifiers alike. In this paper we demonstrate in detail two techniques that are used to increase the power of model checking using the model checker SPIN. The first of these is the use of embedded C code within Promela specifications, in order to accurately reflect robot movement. The second is to use abstraction together with a simulation relation to allow us to verify multiple environments simultaneously. We apply these techniques to a fairly simple system in which a robot moves about a fixed circular environment and learns to avoid obstacles. The learning algorithm is inspired by the way that insects learn to avoid obstacles in response to pain signals received from their antennae. Crucially, we prove that our abstraction is sound for our example system – a step that is often omitted but is vital if formal verification is to be widely accepted as a useful and meaningful approach.
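    The simulation-relation soundness check mentioned above can be sketched abstractly. This is a toy Python illustration of the defining property, not the paper's Promela/SPIN machinery: every concrete transition must be matched by an abstract transition that keeps related states related. The robot/region example data are hypothetical:

    ```python
    def is_simulation(rel, concrete_trans, abstract_trans):
        """Check that rel (a set of (concrete, abstract) state pairs) is a
        simulation relation: for every related pair (s, t), each concrete
        transition s -> s2 must be matched by some abstract transition
        t -> t2 with (s2, t2) still in rel."""
        for (s, t) in rel:
            for s2 in concrete_trans.get(s, ()):
                if not any((s2, t2) in rel for t2 in abstract_trans.get(t, ())):
                    return False
        return True

    # Toy example: a robot cycling through 4 positions in a circular
    # environment, abstracted into 'far'/'near' obstacle regions.
    concrete = {0: [1], 1: [2], 2: [3], 3: [0]}
    abstract = {'far': ['far', 'near'], 'near': ['near', 'far']}
    rel = {(0, 'far'), (1, 'far'), (2, 'near'), (3, 'near')}
    ```

    If the abstraction dropped the 'near' -> 'far' transition, the check would fail: the concrete step 3 -> 0 would have no abstract match, which is the kind of unsoundness the paper's proof rules out.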

    Econometrics for Learning Agents

    The main goal of this paper is to develop a theory of inference of player valuations from observed data in the generalized second price auction without relying on the Nash equilibrium assumption. Existing work in Economics on inferring agent values from data relies on the assumption that all participant strategies are best responses to the observed play of other players, i.e. they constitute a Nash equilibrium. In this paper, we show how to perform inference relying on a weaker assumption instead: assuming that players are using some form of no-regret learning. Learning outcomes have emerged in recent years as an attractive alternative to Nash equilibrium in analyzing game outcomes, modeling players who haven't reached a stable equilibrium, but rather use algorithmic learning, aiming to learn the best way to play from previous observations. In this paper we show how to infer values of players who use algorithmic learning strategies. Such inference is an important first step before we move to testing any learning theoretic behavioral model on auction data. We apply our techniques to a dataset from Microsoft's sponsored search ad auction system.
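    The no-regret rationalizability idea can be sketched for a simplified single-item second-price auction (the paper treats the multi-slot generalized second price auction; all data and the deviation grid below are illustrative). A value v is consistent with no-regret play when no fixed alternative bid would have earned much more than the observed bids did:

    ```python
    def regret(v, my_bids, other_bids, alt_bids):
        """Average regret of an observed bid sequence for a bidder with value v
        in a single-item second-price auction: you win if you outbid the highest
        competing bid, and then pay that competing bid."""
        def util(bid, other):
            return (v - other) if bid > other else 0.0
        T = len(my_bids)
        observed = sum(util(b, o) for b, o in zip(my_bids, other_bids)) / T
        best_fixed = max(sum(util(a, o) for o in other_bids) / T for a in alt_bids)
        return best_fixed - observed

    # Observed play: the bidder always bids 0.6 against competing bids 0.3/0.5/0.7.
    others = [0.3, 0.5, 0.7]
    bids = [0.6, 0.6, 0.6]
    grid = [i / 10 for i in range(11)]  # candidate fixed deviations
    # v = 0.6 makes the observed bids regret-free (truthful bidding is optimal
    # in a second-price auction); v = 0.9 leaves positive regret, so it is not
    # rationalized by no-regret play.
    ```

    Inverting this check — collecting the set of values v whose regret is at most some epsilon — yields the rationalizable set that the paper's inference procedure characterizes.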

    Fine-grained acceleration control for autonomous intersection management using deep reinforcement learning

    Recent advances in combining deep learning and Reinforcement Learning have shown a promising path for designing new control agents that can learn optimal policies for challenging control tasks. These new methods address the main limitations of conventional Reinforcement Learning methods, such as customized feature engineering and small action/state space dimension requirements. In this paper, we leverage one of the state-of-the-art Reinforcement Learning methods, known as Trust Region Policy Optimization, to tackle intersection management for autonomous vehicles. We show that using this method, we can perform fine-grained acceleration control of autonomous vehicles in a grid street plan to achieve a global design objective.
    Comment: Accepted in IEEE Smart World Congress 201

    Hybrid modelling of individual movement and collective behaviour

    Mathematical models of dispersal in biological systems are often written in terms of partial differential equations (PDEs) which describe the time evolution of population-level variables (concentrations, densities). A more detailed modelling approach is given by individual-based (agent-based) models which describe the behaviour of each organism. In recent years, an intermediate modelling methodology – hybrid modelling – has been applied to a number of biological systems. These hybrid models couple an individual-based description of cells/animals with a PDE model of their environment. In this chapter, we overview hybrid models in the literature with the focus on the mathematical challenges of this modelling approach. The detailed analysis is presented using the example of chemotaxis, where cells move according to extracellular chemicals that can be altered by the cells themselves. In this case, individual-based models of cells are coupled with PDEs for extracellular chemical signals. Travelling waves in these hybrid models are investigated. In particular, we show that, in contrast to the PDEs, hybrid chemotaxis models only develop a transient travelling wave.
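    The coupling the abstract describes can be sketched in one dimension. This is a minimal hybrid sketch with illustrative parameter values, not the chapter's model: discrete cells perform a biased random walk up the gradient of a chemical field, while the field itself obeys a diffusion–decay–production PDE discretized with explicit finite differences:

    ```python
    import random

    def hybrid_step(cells, c, dx, dt, D=1.0, chi=5.0, prod=0.1, decay=0.01):
        """One step of a 1-D hybrid chemotaxis model.
        cells : list of cell positions (the individual-based part)
        c     : chemical concentration on a grid (the PDE part)
        """
        n = len(c)
        # 1) PDE update: dc/dt = D * d2c/dx2 - decay * c + prod * (cell density)
        density = [0.0] * n
        for x in cells:
            density[min(n - 1, max(0, int(x / dx)))] += 1.0
        c_new = c[:]
        for i in range(1, n - 1):
            lap = (c[i + 1] - 2 * c[i] + c[i - 1]) / dx**2
            c_new[i] = c[i] + dt * (D * lap - decay * c[i] + prod * density[i])
        # 2) Agent update: each cell steps left or right, biased up the local
        #    chemical gradient (bias clamped to keep a valid probability).
        new_cells = []
        for x in cells:
            i = min(n - 2, max(1, int(x / dx)))
            grad = (c_new[i + 1] - c_new[i - 1]) / (2 * dx)
            bias = 0.5 + max(-0.4, min(0.4, chi * grad * dt / dx))
            step = dx if random.random() < bias else -dx
            new_cells.append(min((n - 1) * dx, max(0.0, x + step)))
        return new_cells, c_new

    random.seed(0)
    cells = [0.5] * 20   # 20 cells start near the left of the domain [0, 4.9]
    c = [0.0] * 50       # no chemical initially; the cells produce it themselves
    for _ in range(100):
        cells, c = hybrid_step(cells, c, dx=0.1, dt=0.002)
    ```

    The explicit scheme requires the usual stability condition D * dt / dx**2 <= 1/2 (here 0.2); the feedback loop — cells produce the chemical that in turn biases their motion — is the mechanism behind the travelling-wave behaviour the chapter analyses.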