
    Double-oracle sampling method for Stackelberg Equilibrium approximation in general-sum extensive-form games

    The paper presents a new method for approximating Strong Stackelberg Equilibrium in general-sum sequential games with imperfect information and perfect recall. The proposed approach is generic, as it does not rely on any specific properties of a particular game model. The method iteratively interleaves two phases: (1) guided Monte Carlo Tree Search sampling of the Follower's strategy space and (2) building the Leader's behavior strategy tree for which the sampled Follower's strategy is an optimal response. This solution scheme is evaluated with respect to the Leader's expected utility and time requirements on three sets of interception games with variable characteristics, played on graphs. A comparison with three state-of-the-art MILP/LP-based methods shows that in the vast majority of test cases the proposed simulation-based approach leads to optimal Leader's strategies, while outperforming the competing methods in time scalability and memory requirements.

    Efficient Mission Planning for Robot Networks in Communication Constrained Environments

    Many robotic systems today are remotely operated and require an uninterrupted connection and safe mission planning. Such systems are commonly found in military drones, search and rescue operations, mining robotics, agriculture, and environmental monitoring. Different robotic systems may employ disparate communication modalities such as radio networks, visible light communication, satellite, infrared, and Wi-Fi. However, in an autonomous mission where the robots are expected to be interconnected, communication-constrained environments frequently arise due to out-of-range problems or signal unavailability. Furthermore, several automated projects (building construction, assembly lines) do not guarantee uninterrupted communication, and a safe project plan is required that optimizes collision risks, cost, and duration. In this thesis, we propose a four-pronged approach to alleviate some of these issues: 1) communication-aware world mapping; 2) communication preservation using Line-of-Sight (LoS); 3) communication-aware safe planning; and 4) multi-objective motion planning for navigation. First, we focus on developing a communication-aware world map that integrates traditional world models with the planning of multi-robot placement. Our proposed communication map selects the optimal placement of a chain of intermediate relay vehicles in order to maximize communication quality to a remote unit. We also propose an algorithm to build a min-arborescence tree when there are multiple remote units to be served. Second, in communication-denied environments, we use Line-of-Sight (LoS) to establish communication between mobile robots, control their movements, and relay information to other autonomous units. We formulate and study the complexity of a multi-robot relay network positioning problem and propose approximation algorithms that restore visibility-based connectivity through the relocation of one or more robots. 
Third, we develop a framework to quantify the safety score of a fully automated robotic mission where the coexistence of humans and robots may pose a collision risk. A number of alternative mission plans are analyzed using motion planning algorithms to select the safest one. Finally, an efficient multi-objective optimization-based path planning method for the robots is developed to deal with several Pareto-optimal cost attributes.

    Formulation of control strategies for requirement definition of multi-agent surveillance systems

    In a multi-agent system (MAS), the overall performance is greatly influenced by both the design and the control of the agents. The physical design determines the agent capabilities, and the control strategies drive the agents to pursue their objectives using the available capabilities. The objective of this thesis is to incorporate control strategies in the early conceptual design of an MAS. As such, this thesis proposes a methodology that mainly explores the interdependency between the design variables of the agents and the control strategies used by the agents. The output of the proposed methodology, i.e., the interdependency between the design variables and the control strategies, can be utilized in the requirement analysis as well as in the later design stages to optimize the overall system through higher-fidelity analyses. In this thesis, the proposed methodology is applied to a persistent multi-UAV surveillance problem, whose objective is to increase the situational awareness of a base that receives instantaneous monitoring information from a group of UAVs. Each UAV has a limited energy capacity and a limited communication range. Accordingly, the connectivity of the communication network becomes essential for the information flow from the UAVs to the base. In long-run missions, the UAVs need to return to the base for refueling at frequencies that depend on their endurance. Whenever a UAV leaves the surveillance area, the remaining UAVs may need relocation to mitigate the impact of its absence. In the control part of this thesis, a set of energy-aware control strategies is developed for efficient multi-UAV surveillance operations. To this end, this thesis first proposes a decentralized strategy to recover the connectivity of the communication network. Second, it presents two return policies for UAVs to achieve energy-aware persistent surveillance. 
In the design part of this thesis, a design space exploration is performed to investigate the overall performance by varying a set of design variables and the candidate control strategies. Overall, it is shown that a control strategy used by an MAS affects the influence of the design variables on the mission performance. Furthermore, the proposed methodology identifies the preferable pairs of design variables and control strategies through low-fidelity analysis in the early design stages. Ph.D.

    ONLINE LEARNING WITH BANDITS FOR COVERAGE

    With the rapid growth in velocity and volume, streaming data compels decision support systems to predict, in due time, a small number of unique data points that can represent a massive amount of correlated data without much loss of precision. In this work, we formulate this problem as the online set coverage problem and propose its solution for recommendation systems and the patrol assignment problem. We propose a novel online reinforcement learning algorithm inspired by the Multi-Armed Bandit problem to solve the online recommendation system problem. We introduce a graph-based mechanism to improve the user coverage by recommended items and show that the mechanism can facilitate the coordination between bandits and therefore reduce the overall complexity. Our graph-based bandit algorithm can select a much smaller set of items to cover a vast variety of users' choices for recommendation systems. We present our experimental results in a partially observable real-world environment. We also study the patrol assignment as an online set coverage problem, which presents an additional level of difficulty. Along with covering the susceptible routes by learning the diversity of attacks, unlike in recommendation systems, our technique needs to make choices against actively engaging adversarial opponents. We assume that attacks over those routes are posed by intelligent entities, capable of reacting with their best responses. Therefore, to model such attacks, we use the Stackelberg Security Game. We augment our graph-based bandit defenders with adaptive adjustment of the reward coming from this game to perplex the attackers and gradually prevail over them by maximizing the confrontation. We found that our graph bandits can outperform other Multi-Armed Bandit algorithms when simulated annealing-based scheduling is incorporated to adjust the balance between exploration and exploitation.
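The closing remark of the abstract above, about an annealing-style schedule that shifts a bandit from exploration to exploitation, can be illustrated with a minimal sketch. This is not the authors' graph-based algorithm: it is a plain epsilon-greedy bandit, and the function name `annealed_epsilon_greedy`, the exponential cooling rule, and the parameters `t0` and `cooling` are assumptions chosen only for illustration.

```python
import math
import random


def annealed_epsilon_greedy(reward_fn, n_arms, rounds, t0=1.0, cooling=0.01, seed=0):
    """Epsilon-greedy bandit whose exploration rate cools over time.

    Assumed (hypothetical) schedule: epsilon_t = t0 * exp(-cooling * t),
    so early rounds explore a lot and later rounds mostly exploit.
    """
    rng = random.Random(seed)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    for t in range(rounds):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        epsilon = t0 * math.exp(-cooling * t)
        if untried:
            arm = untried[0]                      # pull every arm once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)           # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = reward_fn(arm)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


# Toy usage with fixed (deterministic) rewards, purely for illustration.
FIXED_REWARDS = [0.2, 0.8, 0.5]
estimates, counts = annealed_epsilon_greedy(
    lambda arm: FIXED_REWARDS[arm], n_arms=3, rounds=200
)
best_arm = max(range(3), key=lambda a: estimates[a])
```

With a larger `cooling` value the bandit commits to its current best arm sooner; a smaller value keeps it exploring longer, which is the balance the abstract refers to.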

    A Dynamical System Approach for Resource-Constrained Mobile Robotics

    The revolution of autonomous vehicles has led to the development of robots with abundant sensors, actuators with many degrees of freedom, high-performance computing capabilities, and high-speed communication devices. These robots use a large volume of information from sensors to solve diverse problems. However, this usually leads to a significant modeling burden as well as excessive cost and computational requirements. Furthermore, in some scenarios, sophisticated sensors may not work precisely, the real-time processing power of a robot may be inadequate, the communication among robots may be impeded by natural or adversarial conditions, or the actuation control in a robot may be insubstantial. In these cases, we have to rely on simple robots with limited sensing and actuation, minimal onboard processing, moderate communication, and insufficient memory capacity. This reality motivates us to model simple robots, such as bouncing and underactuated robots, using dynamical systems techniques. In this dissertation, we propose a four-pronged approach for solving tasks in resource-constrained scenarios: 1) Combinatorial filters for bouncing robot localization; 2) Bouncing robot navigation and coverage; 3) Stochastic multi-robot patrolling; and 4) Deployment and planning of underactuated aquatic robots. First, we present a global localization method for a bouncing robot equipped with only a clock and contact sensors. Space-efficient and finite automata-based combinatorial filters are synthesized to solve the localization task by determining the robot's pose (position and orientation) in its environment. Second, we propose a solution for navigation and coverage tasks using single or multiple bouncing robots. The proposed solution finds a navigation plan for a single bouncing robot from the robot's initial pose to its goal pose with limited sensing. 
Probabilistic paths from several policies of the robot are combined so that the actual coverage distribution becomes as close as possible to a target coverage distribution. A joint trajectory for multiple bouncing robots to visit all the locations of an environment is incrementally generated. Third, a scalable method is proposed to find stochastic strategies for multi-robot patrolling in an adversarial and communication-constrained environment. Then, we evaluate the vulnerability of our patrolling policies by finding the probability of capturing an adversary at a location in our proposed patrolling scenarios. Finally, a data-driven deployment and planning approach is presented for underactuated aquatic robots called drifters; it creates the generalized flow pattern of the water, develops a Markov chain-based motion model, and studies the long-term behavior of a marine environment from a flow point of view. In summary, our dynamical systems approach is a unique solution to typical robotic tasks and opens a new paradigm for the modeling of simple robotic systems.

    Effective Cooperation and Scalability in Multi-Robot Teams for Automatic Patrolling of Infrastructures

    Doctoral thesis in Electrical and Computer Engineering, presented to the Departamento de Engenharia Electrotécnica e de Computadores, Faculdade de Ciências e Tecnologia, Universidade de Coimbra. In the digital era that we live in, advances in technology have proliferated throughout our society, quickening the completion of tasks that were painful in the old days, improving solutions to the everyday problems that we face, and generally assisting human beings both in their professional and personal life. Robotics is a clear example of a broad technological field that evolves every day. In fact, scientists predict that in the upcoming few decades, robots will naturally interact and coexist alongside human beings. While it is true that robots already have a strong presence in industrial environments, e.g., robotic arms for manufacturing, the average person still looks upon robots with suspicion, not being acquainted with this type of technology. In this thesis, the author deploys teams of mobile robots in indoor scenarios to cooperatively perform patrolling missions, which represents an effort to bring robots closer to humans and assist them in monotonous or repetitive tasks, such as supervising and monitoring indoor infrastructures or simply cooperatively cleaning floors. In this context, the team of robots should be able to sense the environment, localize and navigate autonomously between waypoints while avoiding obstacles, incorporate any number of robots, communicate actions in a distributed way, and be robust not only to agent failures but also to communication failures, so as to effectively coordinate to achieve optimal collective performance. These capabilities are evidence that such systems can only prove their reliability in real-world environments if robots are endowed with intelligence and autonomy. 
Thus, the author follows a line of research where patrolling units have the necessary tools for intelligent decision-making, according to the information of the mission, the environment and teammates' actions, using distributed coordination architectures. An incremental approach is followed. Firstly, the problem is presented and the literature is studied in depth in order to identify potential weaknesses and research opportunities, backing up the objectives and contributions proposed in this thesis. Then, problem fundamentals are described and benchmarking of multi-robot patrolling algorithms in realistic conditions is conducted. In these earlier stages, the role of different parameters of the problem, like environment connectivity, team size and strategy philosophy, becomes evident through extensive empirical results and statistical analysis. In addition, scalability is analyzed in depth and tied to the inter-robot interference and coordination imposed by each patrolling strategy. After gaining familiarity with the problem, preliminary models for multi-robot patrol with special focus on real-world application are presented, using a Bayesian-inspired formalism. Based on these, distributed strategies that lead to superior team performance are described. Interference between autonomous agents is explicitly dealt with, and the approaches are shown to scale to large teams of robots. Additionally, the robustness to agent and communication failures is demonstrated, as well as the flexibility of the model proposed. In fact, by later generalizing the model with learning agents and maintaining memory of past events, it is then shown that these capabilities can be inherited, while at the same time increasing team performance even further and fostering adaptability. This is verified in simulation experiments and real-world results in a large indoor scenario. 
Furthermore, since the issue of team scalability is highly in focus in this thesis, a method for estimating the optimal team size in a patrolling mission, according to the environment topology, is proposed. Upper bounds for team performance prior to the mission start are provided, supporting the choice of the number of robots to be used so that temporal constraints can be satisfied. All methods developed in this thesis are tested and corroborated by experimental results, showing the usefulness of employing cooperative teams of robots in real-world environments and the potential for similar systems to emerge in our society. FCT - SFRH/BD/64426/200

    Monitoring using Heterogeneous Autonomous Agents.

    This dissertation studies problems involving different types of autonomous agents observing objects of interest in an area. Three types of agents are considered: mobile agents, stationary agents, and marsupial agents, i.e., agents capable of deploying other agents or being deployed themselves. Objects can be mobile or stationary. The problem of a mobile agent without fuel constraints revisiting stationary objects is formulated. Visits to objects are dictated by revisit deadlines, i.e., the maximum time that can elapse between two visits to the same object. The problem is shown to be NP-complete and heuristics are provided to generate paths for the agent. Almost periodic paths are proven to exist. The efficacy of the heuristics is shown through simulation. A variant of the problem where the agent has a finite fuel capacity and purchases fuel is treated. Almost periodic solutions to this problem are also shown to exist and an algorithm to compute the minimal cost path is provided. A problem where mobile and stationary agents cooperate to track a mobile object is formulated, shown to be NP-hard, and a heuristic is given to compute paths for the mobile agents. Optimal configurations for the stationary agents are then studied. Several methods are provided to optimally place the stationary agents; these methods are the maximization of Fisher information, the minimization of the probability of misclassification, and the minimization of the penalty incurred by the placement. A method to compute optimal revisit deadlines for the stationary agents is given. The placement methods are compared and their effectiveness shown using numerical results. The problem of two marsupial agents, one carrier and one passenger, performing a general monitoring task using a constrained optimization formulation is stated. 
Necessary conditions for optimal paths are provided for cases accounting for constrained release of the passenger, termination conditions for the task, as well as retrieval and constrained retrieval of the passenger. A problem involving two marsupial agents collecting information about a stationary object while avoiding detection is then formulated. Necessary conditions for optimal paths are provided and rectilinear motion is demonstrated to be optimal for both agents. PhD, Aerospace Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111439/1/jfargeas_1.pd
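The revisit-deadline constraint defined in the dissertation abstract above (the maximum time that may elapse between two visits to the same object) can be checked for a given periodic path with a short sketch. The function names, the representation of a path as a repeating cycle with per-leg travel times, and the assumption of zero time spent at each object are all hypothetical and are not taken from the dissertation.

```python
def max_revisit_gaps(cycle, leg_times):
    """Longest time between consecutive visits to each object on a periodic path.

    cycle: objects in visiting order; the path repeats this cycle forever.
    leg_times[i]: travel time from cycle[i] to cycle[(i + 1) % len(cycle)].
    Assumes zero dwell time at each object (an illustrative simplification).
    """
    period = sum(leg_times)
    arrival = []
    clock = 0.0
    for leg in leg_times:
        arrival.append(clock)
        clock += leg
    visits = {}
    for obj, t in zip(cycle, arrival):
        visits.setdefault(obj, []).append(t)
    gaps = {}
    for obj, times in visits.items():
        # Wrap around: include the gap from the last visit to the first
        # visit of the next repetition of the cycle.
        wrapped = times + [times[0] + period]
        gaps[obj] = max(b - a for a, b in zip(wrapped, wrapped[1:]))
    return gaps


def meets_deadlines(cycle, leg_times, deadlines):
    """True if every object with a deadline is revisited in time."""
    gaps = max_revisit_gaps(cycle, leg_times)
    return all(obj in gaps and gaps[obj] <= deadlines[obj] for obj in deadlines)


# Toy path: A -> B -> A -> C -> (back to A), with travel times per leg.
cycle = ["A", "B", "A", "C"]
leg_times = [1, 1, 2, 2]
gaps = max_revisit_gaps(cycle, leg_times)
ok = meets_deadlines(cycle, leg_times, {"A": 4, "B": 6, "C": 6})
too_tight = meets_deadlines(cycle, leg_times, {"A": 3, "B": 6, "C": 6})
```

Checking a candidate path is cheap; the hardness result stated in the abstract concerns finding a path that satisfies all deadlines, not verifying one.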

    Selected Topics in Network Optimization: Aligning Binary Decision Diagrams for a Facility Location Problem and a Search Method for Dynamic Shortest Path Interdiction

    This work deals with three different combinatorial optimization problems: minimizing the total size of a pair of binary decision diagrams (BDDs) under a certain structural property, a variant of the facility location problem, and a dynamic version of the Shortest-Path Interdiction (DSPI) problem. However, these problems all have the following core idea in common: They all stem from representing an optimization problem as a decision diagram. We begin from cases in which such a diagram representation of reasonable size might exist, but finding a small diagram is difficult to achieve. The first problem develops a heuristic for enforcing a structural property for a collection of BDDs, which allows them to be merged into a single one efficiently. In the second problem, we consider a specific combinatorial problem that allows for a natural representation by a pair of BDDs. We use the previous result and ideas developed earlier in the literature to reformulate this problem as a linear program over a single BDD. This approach enables us to obtain sensitivity information, while often enjoying runtimes comparable to a mixed integer program solved with a commercial solver, after we pay the computational overhead of building the diagram (e.g., when re-solving the problem using different costs, but the same graph topology). In the last part, we examine DSPI, for which building the full decision diagram is generally impractical. We formalize the concept of a game tree for the DSPI and design a heuristic based on the idea of building only selected parts of this exponentially-sized decision diagram (which is not binary any more). We use a Monte Carlo Tree Search framework to establish policies that are near optimal. To mitigate the size of the game tree, we leverage previously derived bounds for the DSPI and employ an alpha–beta pruning technique for minimax optimization. We highlight the practicality of these ideas in a series of numerical experiments
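The alpha-beta pruning technique mentioned at the end of the abstract above can be sketched on a toy game tree. This is the generic textbook procedure, not the thesis's DSPI-specific search or its bounds; the nested-list tree encoding and the function name `alphabeta` are assumptions made for illustration.

```python
def alphabeta(node, maximizing=True, alpha=float("-inf"), beta=float("inf")):
    """Minimax value of a game tree with alpha-beta pruning.

    node: either a number (leaf payoff for the maximizer) or a list of
    child nodes. Players alternate levels, starting with the maximizer.
    """
    if not isinstance(node, list):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer above will never allow this branch
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer above already has a better option
    return value


# Toy tree: the maximizer picks a branch, then the minimizer picks a leaf.
tree = [[3, 5], [2, 9], [0, 7]]
root_value = alphabeta(tree)
```

In this toy tree the leaves 9 and 7 are never evaluated, because the first leaf of each later branch already shows that branch cannot beat the value found in the first one.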

    Long-term Informative Path Planning with Autonomous Soaring

    The ability of UAVs to cover large areas efficiently is valuable for information gathering missions. For long-term information gathering, a UAV may extend its endurance by accessing energy sources present in the atmosphere. Thermals are a favourable source of wind energy and thermal soaring is adopted in this thesis to enable long-term information gathering. This thesis proposes energy-constrained path planning algorithms for a gliding UAV to maximise information gain given a mission time that greatly exceeds the UAV's endurance. This thesis is motivated by the problem of probabilistic target-search performed by an energy-constrained UAV, which is tasked to simultaneously search for a lost ground target and explore for thermals to regain energy. This problem is termed informative soaring (IFS) and combines informative path planning (IPP) with energy constraints. IFS is shown to be NP-hard by showing that it has a similar problem structure to the weight-constrained shortest path problem with replenishments. While an optimal solution may not be computable in polynomial time, this thesis proposes path planning algorithms based on informed tree search to find high-quality plans with low computational cost. This thesis addresses complex probabilistic belief maps and three primary contributions are presented:
    • First, IFS is formulated as a graph search problem by observing that any feasible long-term plan must alternate between 1) information gathering between thermals and 2) replenishing energy within thermals. This is a first step to reducing the large search state space.
    • The second contribution is observing that a complex belief map can be viewed as a collection of information clusters and using a divide-and-conquer approach, cluster tree search (CTS), to efficiently find high-quality plans in the large search state space. In CTS, near-greedy tree search is used to find locally optimal plans and two global planning versions are proposed to combine local plans into a full plan. Monte Carlo simulation studies show that CTS produces similar plans to variations of exhaustive search, but runs five to 20 times faster. The more computationally efficient version, CTSDP, uses dynamic programming (DP) to optimally combine local plans. CTSDP is executed in real time on board a UAV to demonstrate computational feasibility.
    • The third contribution is an extension of CTS to unknown drifting thermals. A thermal exploration map is created to detect new thermals that will eventually intercept clusters, and therefore be valuable to the mission. Time windows are computed for known thermals and an optimal cluster visit schedule is formed. A tree search algorithm called CTSDrift combines CTS and thermal exploration. Using 2400 Monte Carlo simulations, CTSDrift is evaluated against a Full Knowledge method that has full knowledge of the thermal field and a Greedy method. On average, CTSDrift outperforms Greedy in one-third of trials, and achieves similar performance to Full Knowledge when environmental conditions are favourable.

    Efficient Environment Sensing and Learning for Mobile Robots

    Data-driven learning is becoming an integral part of many robotic systems. Robots can be used as mobile sensors to learn about the environment in which they operate. Robots can also seek to learn essential skills, such as navigation, within the environment. A critical challenge in both types of learning is sample efficiency. Acquiring samples with physical robots can be prohibitively time-consuming. As a result, when applying learning techniques in robotics that require physical interaction with the environment, minimizing the number of such interactions becomes key. The question we seek to answer is: How do we make robots learn efficiently with a minimal amount of physical interaction? We approach this question along two fronts: extrinsic learning and intrinsic learning. In extrinsic learning, we want the robot to learn about the external environment in which it is operating. In intrinsic learning, our focus is on the robot learning a skill, such as navigating in an environment, using reinforcement learning (RL). In this dissertation, we develop algorithms that carefully plan where the robots obtain samples in order to efficiently perform intrinsic and extrinsic learning. In particular, we exploit the structural properties of Gaussian Process (GP) regression to design efficient sampling algorithms. We study two types of problems under extrinsic learning. We start with the problem of efficiently learning a spatially varying field modeled by a GP. Our goal is to ensure that the GP posterior variance, which is also the mean square error between the learned and actual fields, is below a predefined value. By exploiting the underlying properties of GPs, we present a series of constant-factor approximation algorithms for minimizing the number of stationary sensors to place, minimizing the total time taken by a single robot, and minimizing the time taken by a team of robots to learn the field. Here, we assume that the GP hyperparameters are known. 
We then study a variant where our goal is to identify the hotspot in an environment. Here, we do not assume that the hyperparameters are known. For this problem, we present Upper Confidence Bound (UCB) and Monte Carlo Tree Search (MCTS) based algorithms for a single robot and later extend them to decentralized multi-robot teams. We also validate their performance on real-world datasets. For intrinsic learning, our aim is to reduce the number of physical interactions by leveraging simulations, an approach often known as Multi-Fidelity Reinforcement Learning (MFRL). In the MFRL framework, an agent uses multiple simulators of the real environment to perform actions. We present two MFRL framework versions, model-based and model-free, that leverage GPs to learn the optimal policy in a real-world environment. By incorporating GPs in the MFRL framework, we empirically observe a significant reduction in the number of samples for model-based and model-free learning.