442 research outputs found

    Reinforcement Learning: A Survey

    Full text link
    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file.
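    The exploration/exploitation trade-off and learning from delayed reinforcement discussed above can be illustrated with a short sketch of tabular Q-learning under an epsilon-greedy policy. This is a generic illustration, not taken from the survey; the environment interface (reset, step, actions) is a hypothetical placeholder.

        # A minimal sketch of tabular Q-learning with epsilon-greedy exploration.
        # The environment interface (reset, step, actions) is a hypothetical placeholder.
        import random
        from collections import defaultdict

        def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
            Q = defaultdict(float)  # Q[(state, action)] -> value estimate
            for _ in range(episodes):
                state, done = env.reset(), False
                while not done:
                    # Exploration vs. exploitation: act randomly with probability epsilon
                    if random.random() < epsilon:
                        action = random.choice(env.actions(state))
                    else:
                        action = max(env.actions(state), key=lambda a: Q[(state, a)])
                    next_state, reward, done = env.step(action)
                    # Temporal-difference update propagates delayed reinforcement backwards
                    best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions(next_state))
                    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                    state = next_state
            return Q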

    Receding-horizon motion planning of quadrupedal robot locomotion

    Get PDF
    Quadrupedal robots are designed to offer efficient and robust mobility on uneven terrain. This thesis investigates combining numerical optimization and machine learning methods to achieve interpretable kinodynamic planning of natural and agile locomotion. The proposed algorithm, called Receding-Horizon Experience-Controlled Adaptive Legged Locomotion (RHECALL), uses nonlinear programming (NLP) with learned initialization to produce long-horizon, high-fidelity, terrain-aware, whole-body trajectories. RHECALL has been implemented and validated on the ANYbotics ANYmal B and C quadrupeds on complex terrain. The proposed optimal control problem formulation uses the single-rigid-body dynamics (SRBD) model and adopts a direct collocation transcription method, which enables the discovery of aperiodic contact sequences. To generate reliable trajectories, we propose fast-to-compute analytical costs that leverage the discretization and terrain-dependent kinematic constraints. To extend the formulation to receding-horizon planning, we propose a segmentation approach with asynchronous centre of mass (COM) and end-effector timings, and a heuristic initialization scheme which reuses the previous solution. We integrate real-time 2.5D perception data for online foothold selection. Additionally, we demonstrate that a learned stability criterion can be incorporated into the planning framework. To accelerate the convergence of the NLP solver to locally optimal solutions, we propose data-driven initialization schemes trained using supervised and unsupervised behaviour cloning. We demonstrate the computational advantage of the schemes and the ability to leverage the latent space to reconstruct dynamic segments of plans which are several seconds long. Finally, in order to apply RHECALL to quadrupeds with significant leg inertias, we derive the more accurate lump-leg single-rigid-body dynamics (LL-SRBD) and centroidal dynamics (CD) models and their first-order partial derivatives. To facilitate intuitive usage of costs, constraints and initializations, we parameterize these models by Euclidean-space variables. We show that these models have the ability to shape the rotational inertia of the robot, which offers the potential to further improve agility.
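    As a rough illustration of direct collocation on a simplified stand-in for the SRBD model, the sketch below transcribes a planar point-mass COM trajectory with trapezoidal collocation using CasADi and IPOPT. The horizon length, bounds, boundary states, and effort-like cost are illustrative assumptions, not the thesis' actual RHECALL formulation.

        # Hedged sketch: trapezoidal direct collocation for a planar point-mass COM,
        # a simplified stand-in for the single-rigid-body dynamics (SRBD) model.
        # All numbers (horizon, bounds, boundary states) are illustrative assumptions.
        import casadi as ca

        N, dt = 40, 0.05                                   # intervals and step size
        opti = ca.Opti()
        p = opti.variable(2, N + 1)                        # COM position (x, z)
        v = opti.variable(2, N + 1)                        # COM velocity
        f = opti.variable(2, N + 1)                        # net contact force per unit mass
        g = ca.DM([0.0, -9.81])                            # gravity

        for k in range(N):
            # Trapezoidal collocation: average the state derivative over each interval
            opti.subject_to(p[:, k + 1] == p[:, k] + 0.5 * dt * (v[:, k] + v[:, k + 1]))
            opti.subject_to(v[:, k + 1] == v[:, k] + 0.5 * dt * ((f[:, k] + g) + (f[:, k + 1] + g)))

        opti.subject_to(p[:, 0] == ca.DM([0.0, 0.5]))      # initial COM position
        opti.subject_to(p[:, N] == ca.DM([1.0, 0.5]))      # goal COM position
        opti.subject_to(opti.bounded(0.0, f[1, :], 30.0))  # unilateral vertical force
        opti.minimize(ca.sumsqr(f))                        # effort-like cost
        opti.solver("ipopt")
        sol = opti.solve()                                 # locally optimal trajectory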

    Multi-Robot Active Information Gathering Using Random Finite Sets

    Get PDF
    Many tasks in the modern world involve collecting information, such as infrastructure inspection, security and surveillance, environmental monitoring, and search and rescue. All of these tasks involve searching an environment to detect, localize, and track objects of interest, such as damage to roadways, suspicious packages, plant species, or victims of a natural disaster. In any of these tasks, the number of objects of interest is often not known at the onset of exploration. Teams of robots can automate these often dull, dirty, or dangerous tasks to decrease costs and improve speed and safety. This dissertation addresses the problem of automating data collection processes, so that a team of mobile sensor platforms is able to explore an environment to determine the number of objects of interest and their locations. In real-world scenarios, robots may fail to detect objects within the field of view, receive false-positive measurements from clutter, and be unable to disambiguate true objects. This makes data association, i.e., matching individual measurements to targets, difficult. To account for this, we utilize filtering algorithms based on random finite sets to simultaneously estimate the number of objects and their locations within the environment without the need to explicitly consider data association. Using the resulting estimates, the robots choose actions that maximize the mutual information between the set of targets and the binary events of receiving no detections. This effectively hedges against uninformative actions and leads to a closed-form equation for computing mutual information, allowing the robot team to plan over a long time horizon. The robots either communicate with a central agent, which performs the estimation and control computations, or act in a decentralized manner. Our extensive hardware and simulated experiments validate the unified estimation and control framework, using robots with a wide variety of mobility and sensing capabilities to showcase the broad applicability of the framework.
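    For intuition, the mutual information with a binary no-detection event can be written out in closed form for the much simpler case of a single target over a gridded belief; the sketch below is only illustrative, with a hypothetical per-cell detection model, whereas the dissertation's random-finite-set formulation handles an unknown number of targets.

        # Illustrative sketch: mutual information between a gridded single-target belief
        # and the binary event of receiving no detection from a candidate sensor pose.
        # The per-cell detection probabilities (pd) are a hypothetical placeholder.
        import numpy as np

        def binary_entropy(p):
            p = np.clip(p, 1e-12, 1.0 - 1e-12)
            return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

        def mutual_information_no_detection(belief, pd):
            """belief: per-cell target probabilities (sums to 1); pd: per-cell detection probability."""
            p_miss_given_cell = 1.0 - pd                       # P(no detection | target in cell)
            p_miss = np.sum(belief * p_miss_given_cell)        # marginal P(no detection)
            # I(X; Z) = H(Z) - E_x[H(Z | X = x)] for the binary event Z
            return binary_entropy(p_miss) - np.sum(belief * binary_entropy(p_miss_given_cell))

        # Toy usage: a sensor pose whose footprint covers half of the belief mass
        belief = np.full(10, 0.1)
        pd = np.array([0.9] * 5 + [0.0] * 5)                   # detects only cells 0..4
        print(mutual_information_no_detection(belief, pd))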

    Belief State Planning for Autonomous Driving: Planning with Interaction, Uncertain Prediction and Uncertain Perception

    Get PDF
    This thesis presents a behavior planning algorithm for automated driving in urban environments with an uncertain and dynamic nature. The uncertainty in the environment arises from the fact that the intentions as well as the future trajectories of the surrounding drivers cannot be measured directly but can only be estimated in a probabilistic fashion. Even the perception of objects is uncertain due to sensor noise or possible occlusions. When driving in such environments, the autonomous car must predict the behavior of the other drivers and plan safe, comfortable, and legal trajectories. Planning such trajectories requires robust decision making when several high-level options are available to the autonomous car. Current planning algorithms for automated driving split the problem into different subproblems, ranging from discrete, high-level decision making to prediction and continuous trajectory planning. This separation of one problem into several subproblems, combined with rule-based decision making, leads to sub-optimal behavior. This thesis presents a global, closed-loop formulation of the motion planning problem which intertwines action selection and the corresponding prediction of the other agents in one optimization problem. The global formulation allows the planning algorithm to make the decision for certain high-level options implicitly. Furthermore, the closed-loop manner of the algorithm optimizes the solution for various future scenarios concerning the behavior of the other agents. Formulating prediction and planning as an intertwined problem allows for modeling interaction, i.e., the future reaction of the other drivers to the behavior of the autonomous car. The problem is modeled as a partially observable Markov decision process (POMDP) with a discrete action space and a continuous state and observation space. The solution to the POMDP is a policy over belief states, which contains different reactive plans for possible future scenarios. Surrounding drivers are modeled with interactive, probabilistic agent models to account for their prediction uncertainty. The field of view of the autonomous car is simulated ahead over the whole planning horizon during the optimization of the policy. Simulating the possible corresponding future observations allows the algorithm to select actions that actively reduce the uncertainty of the world state. Depending on the scenario, the behavior of the autonomous car is optimized in the longitudinal direction alone or in the combined lateral and longitudinal directions. The algorithm is formulated in a generic way and solved online, which allows it to be applied to various road layouts and scenarios. While such a generic problem formulation is intractable to solve exactly, this thesis demonstrates how a sufficiently good approximation to the optimal policy can be found online. The problem is solved by combining state-of-the-art Monte Carlo tree search algorithms with near-optimal, domain-specific roll-outs. The algorithm is evaluated in scenarios such as the crossing of intersections under unknown intentions of other crossing vehicles, interactive lane changes in narrow gaps, and decision making at intersections with large occluded areas. It is shown that the behavior of the closed-loop planner is less conservative than that of comparable open-loop planners. More precisely, it is demonstrated that the policy enables the autonomous car to drive in a similar way to an omniscient planner with full knowledge of the scene. It is also demonstrated how the autonomous car executes actions to actively gather more information about its surroundings and to reduce the uncertainty of its belief state.
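    A flat Monte Carlo stand-in can convey the idea of evaluating discrete actions against sampled future scenarios drawn from the belief; the thesis itself uses full Monte Carlo tree search with near-optimal, domain-specific roll-outs. The generative model simulate(state, action) and the heuristic rollout_policy below are hypothetical placeholders.

        # Hedged sketch: flat Monte Carlo action selection over a particle belief,
        # a simplified stand-in for the belief-state MCTS planner described above.
        # simulate(state, action) -> (next_state, reward) and rollout_policy(state) -> action
        # are hypothetical placeholders for a domain-specific generative model and heuristic.
        import random

        def plan(belief_particles, actions, simulate, rollout_policy,
                 depth=20, samples=50, gamma=0.95):
            """Return the discrete action with the highest estimated belief-state value."""
            def rollout(state):
                total, discount = 0.0, 1.0
                for _ in range(depth):
                    a = rollout_policy(state)                 # domain-specific heuristic
                    state, reward = simulate(state, a)
                    total += discount * reward
                    discount *= gamma
                return total

            best_action, best_value = None, float("-inf")
            for a in actions:
                value = 0.0
                for _ in range(samples):
                    s = random.choice(belief_particles)       # sample a scenario from the belief
                    s, r = simulate(s, a)                     # apply the candidate first action
                    value += r + gamma * rollout(s)
                value /= samples
                if value > best_value:
                    best_action, best_value = a, value
            return best_action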

    Formal Methods for Autonomous Systems

    Full text link
    Formal methods refer to rigorous, mathematical approaches to system development and have played a key role in establishing the correctness of safety-critical systems. The main building blocks of formal methods are models and specifications, which are analogous to behaviors and requirements in system design and give us the means to verify and synthesize system behaviors with formal guarantees. This monograph provides a survey of the current state of the art on applications of formal methods in the autonomous systems domain. We consider correct-by-construction synthesis under various formulations, including closed systems, reactive, and probabilistic settings. Beyond synthesizing systems in known environments, we address the concept of uncertainty and bound the behavior of systems that employ learning using formal methods. Further, we examine the synthesis of systems with monitoring, a mitigation technique for ensuring that once a system deviates from expected behavior, it knows a way of returning to normalcy. We also show how to overcome some limitations of formal methods themselves with learning. We conclude with future directions for formal methods in reinforcement learning, uncertainty, privacy, explainability of formal methods, and regulation and certification.
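    The "models and specifications" building blocks can be illustrated with the simplest verification task: checking a safety specification (a bad state is never reachable) on a finite transition system by explicit-state reachability. The toy model below is an assumption for illustration; the monograph covers far richer reactive, probabilistic, and learning-enabled settings.

        # Illustrative sketch: verifying a safety specification on a finite transition
        # system by breadth-first reachability. The toy model is a hypothetical example.
        from collections import deque

        def violates_safety(initial, transitions, bad):
            """Return a counterexample path to a bad state, or None if the safety spec holds."""
            frontier = deque((s, [s]) for s in initial)
            visited = set(initial)
            while frontier:
                state, path = frontier.popleft()
                if state in bad:
                    return path                               # counterexample trace
                for nxt in transitions.get(state, []):
                    if nxt not in visited:
                        visited.add(nxt)
                        frontier.append((nxt, path + [nxt]))
            return None                                       # "bad never reached" holds

        # Toy model: states with nondeterministic transitions
        transitions = {"idle": ["moving"], "moving": ["idle", "fault"], "fault": []}
        print(violates_safety({"idle"}, transitions, bad={"fault"}))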

    Belief State Planning for Autonomous Driving: Planning with Interaction, Uncertain Prediction and Uncertain Perception

    Get PDF
    This work presents a behavior planning algorithm for automated driving in urban environments with an uncertain and dynamic nature. The algorithm explicitly considers prediction uncertainty (e.g., different intentions), perception uncertainty (e.g., occlusions), and the uncertain interactive behavior of the other agents. Simulating the most likely future scenarios allows an optimal policy to be found online that enables non-conservative planning under uncertainty.