Multi-agent persistent surveillance under temporal logic constraints
This thesis proposes algorithms for the deployment of multiple autonomous agents for persistent surveillance missions requiring repeated, periodic visits to regions of interest. Such problems arise in a variety of domains, such as monitoring ocean conditions like temperature and algae content, performing crowd security during public events, tracking wildlife in remote or dangerous areas, or watching traffic patterns and road conditions. Using robots for surveillance is an attractive solution to scenarios in which fixed sensors are not sufficient to maintain situational awareness. Multi-agent solutions are particularly promising, because they allow for improved spatial and temporal resolution of sensor information.
In this work, we consider persistent monitoring by teams of agents that are tasked with satisfying missions specified using temporal logic formulas. Such formulas allow rich, complex tasks to be specified, such as "visit regions A and B infinitely often, and if region C is visited then go to region D, and always avoid obstacles." The agents must determine how to satisfy such missions according to fuel, communication, and other constraints. Such problems are inherently difficult due to the typically infinite horizon, state space explosion from planning for multiple agents, communication constraints, and other issues. Therefore, computing an optimal solution to these problems is often infeasible. Instead, a balance must be struck between computational complexity and optimality.
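The example mission quoted above can be formalized as an LTL formula. The following reconstruction is purely illustrative (the thesis itself may use different propositions), with G read as "always" and F as "eventually":

```latex
\varphi \;=\; \mathbf{G}\mathbf{F}\,A \;\wedge\; \mathbf{G}\mathbf{F}\,B \;\wedge\; \mathbf{G}\bigl(C \rightarrow \mathbf{F}\,D\bigr) \;\wedge\; \mathbf{G}\,\neg\mathit{obs}
```

Here the conjuncts GF A and GF B require visiting regions A and B infinitely often, G(C → F D) requires that every visit to C is eventually followed by a visit to D, and G ¬obs forbids obstacle states at all times.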
This thesis describes solution methods for two main classes of multi-agent persistent surveillance problems. First, it considers the class of problems in which persistent surveillance goals are captured entirely by temporal logic (TL) constraints. Such problems require agents to repeatedly visit a set of surveillance regions in order to satisfy their mission. We present results for agents solving such missions with charging constraints, with noisy observations, and in the presence of adversaries. The second class of problems includes an additional optimality criterion, such as minimizing uncertainty about the location of a target or maximizing sensor information among the team of agents. We present solution methods and results for such missions with a variety of optimality criteria based on information metrics. For both classes of problems, the proposed algorithms are implemented and evaluated via simulation, experiments with robots in a motion capture environment, or both.
Reinforcement Learning and Planning for Preference Balancing Tasks
Robots are often highly non-linear dynamical systems with many degrees of freedom, making motion problems computationally challenging to solve. One solution has been reinforcement learning (RL), which learns through experimentation to automatically perform the near-optimal motions that complete a task. However, high-dimensional problems and task formulation often prove challenging for RL. We address these problems with PrEference Appraisal Reinforcement Learning (PEARL), which solves Preference Balancing Tasks (PBTs). PBTs define a problem as a set of preferences that the system must balance to achieve a goal. The method is appropriate for acceleration-controlled systems with continuous state spaces, either discrete or continuous action spaces, and unknown system dynamics. We show that PEARL learns a sub-optimal policy on a subset of states and actions, and transfers the policy to the expanded domain to produce a more refined plan on a class of robotic problems. We establish convergence to task goal conditions, and even when preconditions are not verifiable, show that this is a valuable method to use before other more expensive approaches. Evaluation is done on several robotic problems, such as Aerial Cargo Delivery, Multi-Agent Pursuit, Rendezvous, and Inverted Flying Pendulum, both in simulation and experimentally. Additionally, PEARL is leveraged outside of robotics as an array sorting agent. The results demonstrate high accuracy and fast learning times on a large set of practical applications.
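As a rough illustration of the preference-balancing idea, a policy for an acceleration-controlled system can score candidate accelerations by a weighted sum of preference features and pick the best. This is a sketch under assumed names, features, and dynamics, not PEARL's actual algorithm:

```python
import math

# Hypothetical preference features: each scores a state; learned weights
# balance them. All names here are illustrative, not PEARL's API.

def distance_to_goal(state, goal):
    return math.dist(state, goal)

def speed_penalty(velocity):
    return sum(v * v for v in velocity)

def preference_score(state, velocity, goal, weights):
    # Negative weights on quantities the system should minimize.
    features = [distance_to_goal(state, goal), speed_penalty(velocity)]
    return sum(w * f for w, f in zip(weights, features))

def greedy_action(state, velocity, goal, weights, actions, dt=0.1):
    """Pick the acceleration maximizing the preference score of the
    predicted next state (acceleration-controlled system)."""
    def next_score(a):
        v2 = tuple(v + ai * dt for v, ai in zip(velocity, a))
        s2 = tuple(s + vi * dt for s, vi in zip(state, v2))
        return preference_score(s2, v2, goal, weights)
    return max(actions, key=next_score)
```

With a weight vector penalizing distance and speed, the greedy action accelerates toward the goal; learning in PEARL amounts to tuning such weights from experience rather than fixing them by hand.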
Game-Theoretic Safety Assurance for Human-Centered Robotic Systems
In order for autonomous systems like robots, drones, and self-driving cars to be reliably introduced into our society, they must have the ability to actively account for safety during their operation. While safety analysis has traditionally been conducted offline for controlled environments like cages on factory floors, the much higher complexity of open, human-populated spaces like our homes, cities, and roads makes it unviable to rely on common design-time assumptions, since these may be violated once the system is deployed. Instead, the next generation of robotic technologies will need to reason about safety online, constructing high-confidence assurances informed by ongoing observations of the environment and other agents, even though models of these are necessarily fallible.
This dissertation aims to lay down the necessary foundations to enable autonomous systems to ensure their own safety in complex, changing, and uncertain environments by explicitly reasoning about the gap between their models and the real world. It first introduces a suite of novel robust optimal control formulations and algorithmic tools that permit tractable safety analysis in time-varying, multi-agent systems, as well as safe real-time robotic navigation in partially unknown environments; these approaches are demonstrated on large-scale unmanned air traffic simulation and physical quadrotor platforms. It then draws on Bayesian machine learning methods to translate model-based guarantees into high-confidence assurances, monitoring the reliability of predictive models in light of changing evidence about the physical system and surrounding agents. This principle is first applied to a general safety framework allowing the use of learning-based control (e.g. reinforcement learning) for safety-critical robotic systems such as drones, and then combined with insights from cognitive science and dynamic game theory to enable safe human-centered navigation and interaction; these techniques are showcased on physical quadrotors (flying in unmodeled wind and among human pedestrians) and simulated highway driving. The dissertation ends with a discussion of challenges and opportunities ahead, including the bridging of safety analysis and reinforcement learning and the need to "close the loop" around learning and adaptation in order to deploy increasingly advanced autonomous systems with confidence.
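The idea of a safety framework wrapped around learning-based control can be sketched as a supervisory filter: the learned policy runs freely, and a safe controller overrides it only when the predicted next state would leave a computed safe set. The 1-D braking example and all names below are illustrative assumptions, not the dissertation's formulation:

```python
# Least-restrictive supervisory control sketch for a 1-D vehicle that must
# stop before pos_limit. The "safe set" is the set of states from which
# maximal braking still avoids the limit.

def safe_to_continue(pos, vel, pos_limit, max_brake):
    # Can the vehicle still brake to a stop before pos_limit?
    stopping_distance = vel * vel / (2.0 * max_brake) if vel > 0 else 0.0
    return pos + stopping_distance < pos_limit

def supervised_control(pos, vel, learned_action, pos_limit=10.0, max_brake=2.0):
    """Apply the learned action unless it would leave the safe set;
    otherwise fall back to the safe (braking) action."""
    dt = 0.1  # one-step prediction horizon
    vel_next = vel + learned_action * dt
    pos_next = pos + vel_next * dt
    if safe_to_continue(pos_next, vel_next, pos_limit, max_brake):
        return learned_action
    return -max_brake  # safety override: brake
```

The dissertation's reachability tools play the role of `safe_to_continue` here, computing the safe set for far richer dynamics; the filter structure, however, is the same least-restrictive pattern.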
Optimal temporal logic control of autonomous vehicles
Thesis (Ph.D.), Boston University.
Temporal logics, such as Linear Temporal Logic (LTL) and Computation Tree Logic (CTL), are extensions of propositional logic that can capture temporal relations. Even though temporal logics have been used in model checking of finite systems for quite some time, they have gained popularity as a means for specifying complex mission requirements in path planning and control synthesis problems only recently. This dissertation proposes and evaluates methods and algorithms for optimal path planning and control synthesis for autonomous vehicles where a high-level mission specification expressed in LTL (or a fragment of LTL) must be satisfied. In summary, after obtaining a discrete representation of the overall system, ideas and tools from formal verification and graph theory are leveraged to synthesize provably correct and optimal control strategies.
The first part of this dissertation focuses on automatic planning of optimal paths for a group of robots that must satisfy a common high-level mission specification. The effect of slight deviations in traveling times on the behavior of the team is analyzed, and methods that are robust to bounded non-determinism in traveling times are proposed. The second part focuses on the case where a controllable agent is required to satisfy a high-level mission specification in the presence of other probabilistic agents that cannot be controlled. Efficient methods to synthesize control policies that maximize the probability of satisfaction of the mission specification are presented. The focus of the third part is the problem where an autonomous vehicle is required to satisfy a rich mission specification over service requests occurring at the regions of a partitioned environment. A receding horizon control strategy that makes use of the local information provided by the sensors on the vehicle, in addition to the a priori information about the environment, is presented. For all of the automatic planning and control synthesis problems that are considered, the proposed algorithms are implemented, evaluated, and validated through experiments and/or simulations.
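The common recipe behind these methods (discretize the system, then leverage formal methods and graph search) can be sketched on a toy example: a small motion graph is composed with a finite automaton tracking progress toward the specification "eventually visit A, then eventually visit B", and a shortest satisfying run is found by breadth-first search on the product. Everything below is an illustrative toy, not the dissertation's algorithms:

```python
from collections import deque

motion = {                       # adjacency list of environment regions
    "start": ["A", "C"],
    "A": ["C", "B"],
    "C": ["A", "B"],
    "B": [],
}

def automaton_step(q, region):
    # q = 0: haven't seen A; q = 1: saw A, waiting for B; q = 2: accepting.
    if q == 0 and region == "A":
        return 1
    if q == 1 and region == "B":
        return 2
    return q

def plan(initial="start"):
    """BFS on the product (region, automaton state) graph; returns a
    shortest satisfying run, or None if the spec cannot be met."""
    start = (initial, automaton_step(0, initial))
    queue, parents = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        region, q = node
        if q == 2:  # accepting product state reached: reconstruct the run
            path = []
            while node:
                path.append(node[0])
                node = parents[node]
            return path[::-1]
        for nxt in motion[region]:
            child = (nxt, automaton_step(q, nxt))
            if child not in parents:
                parents[child] = node
                queue.append(child)
    return None
```

Replacing BFS with a weighted shortest-path or cycle search over the same product construction is what yields the optimal (and, for infinite-horizon LTL missions, optimal-cycle) strategies the dissertation studies.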
Learning multi-robot coordination from demonstrations
This paper develops a Distributed Differentiable Dynamic Game (DDDG) framework, which enables learning multi-robot coordination from demonstrations. We represent multi-robot coordination as a dynamic game, in which the behavior of a robot is dictated by its own dynamics and an objective that also depends on the behavior of others. The coordination can thus be adapted by tuning the objective and dynamics of each robot. The proposed DDDG enables each robot to automatically tune its individual dynamics and objectives in a distributed manner by minimizing the mismatch between its trajectory and the demonstrations. This process requires a new distributed design of the forward pass, where all robots collaboratively seek Nash equilibrium behavior, and a backward pass, where gradients are propagated via the communication graph. We test the DDDG in simulation with a team of quadrotors given different task configurations. The results demonstrate the capability of DDDG for learning multi-robot coordination from demonstrations.
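The forward pass described above, in which robots collaboratively seek Nash equilibrium behavior, can be illustrated on a toy two-player quadratic game solved by iterated best responses. The closed-form best responses and costs below are illustrative stand-ins, not the DDDG formulation:

```python
# Two robots choose scalar actions u1, u2; each minimizes a quadratic cost
# coupling its action to the other's. Best responses are derived by setting
# each cost's derivative to zero:
#   cost1 = (u1 - 1)^2 + 0.5*(u1 - u2)^2  ->  u1 = (2 + u2) / 3
#   cost2 = (u2 + 1)^2 + 0.5*(u2 - u1)^2  ->  u2 = (-2 + u1) / 3

def best_response_u1(u2):
    return (2.0 + u2) / 3.0

def best_response_u2(u1):
    return (-2.0 + u1) / 3.0

def nash_via_best_response(iters=100):
    """Iterate best responses; for this contraction the joint action
    converges to the unique Nash equilibrium (0.5, -0.5)."""
    u1, u2 = 0.0, 0.0
    for _ in range(iters):
        u1 = best_response_u1(u2)
        u2 = best_response_u2(u1)
    return u1, u2
```

In DDDG the equilibrium is sought over full trajectories and the costs are themselves parameterized; the backward pass then differentiates through this equilibrium to update each robot's parameters from demonstration error.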
Reachability-based Trajectory Design
Autonomous mobile robots have the potential to increase the availability and accessibility of goods and services throughout society. However, to enable public trust in such systems, it is critical to certify that they are safe. This requires formally specifying safety, and designing motion planning methods that can guarantee safe operation (note, this work is only concerned with planning, not perception).
The typical paradigm to attempt to ensure safety is receding-horizon planning, wherein a robot creates a short plan, then executes it while creating its next short plan in an iterative fashion, allowing the robot to incorporate new sensor information over time. However, this requires the robot to plan in real time. Therefore, the key challenge in making safety guarantees lies in balancing performance (how quickly a robot can plan) and conservatism (how cautiously a robot behaves). Existing methods suffer from a tradeoff between performance and conservatism, which is rooted in the choice of model used to describe a robot; accuracy typically comes at the price of computation speed.
To address this challenge, this dissertation proposes Reachability-based Trajectory Design (RTD), which performs real-time, receding-horizon planning with a simplified planning model, and ensures safety by describing the model error using a reachable set of the robot.
RTD begins with the offline design of a continuum of parameterized trajectories for the planning model; each trajectory ends with a fail-safe maneuver such as braking to a stop. RTD then computes the robot's Forward Reachable Set (FRS), which contains all points in workspace reachable by the robot for each parameterized trajectory. Importantly, the FRS also contains the error model, since a robot can typically never track planned trajectories perfectly. Online (at runtime), the robot intersects the FRS with sensed obstacles to provably determine which trajectory plans could cause collisions. Then, the robot performs trajectory optimization over the remaining safe trajectories. If no new safe plan can be found, the robot can execute its previously-found fail-safe maneuver, enabling perpetual safety.
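The online step can be sketched under strong simplifying assumptions: each trajectory parameter's FRS is over-approximated here by an axis-aligned box, obstacles are finite point sets, and the cost is the distance of the trajectory endpoint to a waypoint. All names and numbers are illustrative, not RTD's actual representations:

```python
# Toy RTD online loop: discard unsafe trajectory parameters by intersecting
# their (box-approximated) FRS with obstacle points, then optimize over the
# safe remainder; None signals "execute the fail-safe maneuver".

def box_contains(box, point):
    xmin, xmax, ymin, ymax = box
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax

def choose_trajectory(frs_boxes, endpoints, obstacle_points, waypoint):
    safe = [
        k for k, box in frs_boxes.items()
        if not any(box_contains(box, p) for p in obstacle_points)
    ]
    if not safe:
        return None  # no safe plan: fall back to the fail-safe maneuver
    return min(safe, key=lambda k: (endpoints[k][0] - waypoint[0]) ** 2
                                   + (endpoints[k][1] - waypoint[1]) ** 2)
```

RTD's actual FRS representations (SOS polynomials, zonotopes, rotatotopes) replace the boxes above, and the discrete point-set obstacle representation is exactly what the dissertation proves sufficient for safety.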
This dissertation begins by presenting RTD as a theoretical framework, then presents three representations of a robot's FRS, using (1) sums-of-squares (SOS) polynomial programming, (2) zonotopes (a special type of convex polytope), and (3) rotatotopes (a generalization of zonotopes that enables representing a robot's swept volume). To enable real-time planning, this work also develops an obstacle representation that enables provable safety while treating obstacles as discrete, finite sets of points. The practicality of RTD is demonstrated on four different wheeled robots (using the SOS FRS), two quadrotor aerial robots (using the zonotope FRS), and one manipulator robot (using the rotatotope FRS). Over thousands of simulations and dozens of hardware trials, RTD performs safe, real-time planning in arbitrary and challenging environments.
In summary, this dissertation proposes RTD as a general-purpose, practical framework for provably safe, real-time robot motion planning.
Ph.D., Mechanical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies.
http://deepblue.lib.umich.edu/bitstream/2027.42/162884/1/skousik_1.pd
Autonomous Drone Racing with Deep Reinforcement Learning
In many robotic tasks, such as drone racing, the goal is to travel through a set of waypoints as fast as possible. A key challenge for this task is planning the minimum-time trajectory, which is typically solved by assuming perfect knowledge of the waypoints to pass in advance. The resulting solutions are either highly specialized for a single-track layout or suboptimal due to simplifying assumptions about the platform dynamics. In this work, a new approach to minimum-time trajectory generation for quadrotors is presented. Leveraging deep reinforcement learning and relative gate observations, this approach can adaptively compute near-time-optimal trajectories for random track layouts. Our method exhibits a significant computational advantage over approaches based on trajectory optimization for non-trivial track configurations. The proposed approach is evaluated on a set of race tracks in simulation and the real world, achieving speeds of up to 17 m/s with a physical quadrotor.
Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning
Finding feasible, collision-free paths for multiagent systems can be challenging, particularly in non-communicating scenarios where each agent's intent (e.g. goal) is unobservable to the others. In particular, finding time-efficient paths often requires anticipating interaction with neighboring agents, a process that can be computationally prohibitive. This work presents a decentralized multiagent collision avoidance algorithm based on a novel application of deep reinforcement learning, which effectively offloads the online computation (for predicting interaction patterns) to an offline learning procedure. Specifically, the proposed approach develops a value network that encodes the estimated time to the goal given an agent's joint configuration (positions and velocities) with its neighbors. Use of the value network not only admits efficient (i.e., real-time implementable) queries for finding a collision-free velocity vector, but also accounts for the uncertainty in the other agents' motion. Simulation results show more than 26% improvement in path quality (i.e., time to reach the goal) when compared with optimal reciprocal collision avoidance (ORCA), a state-of-the-art collision avoidance strategy.
Ford Motor Company
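The value-network query described above can be sketched as follows: among candidate velocities, pick the one whose resulting joint configuration has the highest value, skipping candidates that collide with other agents. The linear "value" stand-in below replaces the trained network purely for illustration, and all names are assumptions:

```python
import math

def value(joint_config):
    # Stand-in for the learned value network: prefer states near the goal.
    px, py, gx, gy = joint_config[:4]
    return -math.hypot(gx - px, gy - py)

def choose_velocity(pos, goal, others, candidates, dt=0.2, radius=0.1):
    """Return the highest-value collision-free candidate velocity,
    or None if every candidate collides."""
    best, best_score = None, -float("inf")
    for v in candidates:
        nxt = (pos[0] + v[0] * dt, pos[1] + v[1] * dt)
        # Collision check against the other agents' predicted positions.
        if any(math.dist(nxt, o) < 2 * radius for o in others):
            continue
        score = value((nxt[0], nxt[1], goal[0], goal[1]))
        if score > best_score:
            best, best_score = v, score
    return best
```

The real method replaces the stand-in with a trained network over the full joint configuration (positions and velocities of the agent and its neighbors), which is what encodes anticipated interaction without online prediction.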