
    Multiagent Cooperative Learning Strategies for Pursuit-Evasion Games

    This study examines the pursuit-evasion problem of coordinating multiple robotic pursuers to locate and track a nonadversarial mobile evader in a dynamic environment. Two kinds of pursuit strategies are proposed: one for agents that cooperate with each other and one for agents that operate independently. The work further employs probability theory to analyze the uncertain state information about the pursuers and the evader, and uses case-based reasoning to equip agents with memory and learning abilities. Following the concepts of assimilation and accommodation, both positive-angle and bevel-angle strategies are developed to help agents adapt to their environment effectively. The case study uses the Recursive Porous Agent Simulation Toolkit (REPAST) to implement a multiagent system and demonstrates the superior performance of the proposed approaches in the pursuit-evasion game.
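
    The contrast between cooperative and independent pursuers lends itself to a small simulation. The sketch below is a minimal stand-in rather than the study's REPAST implementation: it replaces the probabilistic and case-based machinery with a plain set-based belief over evader locations, and the grid size, sensing radius, and evader drift are invented for illustration. It shows the core effect, namely that pursuers sharing one belief rule out candidate locations faster than pursuers reasoning alone.

        import random

        GRID = 10  # 10x10 world, a stand-in for the study's dynamic environment

        def move_toward(pos, target):
            """Step one cell toward the target (diagonal moves allowed)."""
            return tuple(p + (t > p) - (t < p) for p, t in zip(pos, target))

        def pursue(cooperative, steps=200):
            evader = (random.randrange(GRID), random.randrange(GRID))
            pursuers = [(0, 0), (GRID - 1, GRID - 1)]
            full = {(x, y) for x in range(GRID) for y in range(GRID)}
            # One shared belief for cooperative agents, a private copy each otherwise.
            beliefs = [set(full)] if cooperative else [set(full) for _ in pursuers]
            for t in range(steps):
                for i, p in enumerate(pursuers):
                    b = beliefs[0] if cooperative else beliefs[i]
                    in_range = {c for c in full
                                if abs(c[0] - p[0]) <= 2 and abs(c[1] - p[1]) <= 2}
                    if evader in in_range:
                        return t                  # evader located by direct sensing
                    b -= in_range                 # rule out the observed empty cells
                    if not b:                     # evader slipped away; reset belief
                        b |= full
                    target = min(b, key=lambda c: abs(c[0] - p[0]) + abs(c[1] - p[1]))
                    pursuers[i] = move_toward(p, target)
                # The nonadversarial evader drifts randomly.
                evader = tuple(min(GRID - 1, max(0, v + random.choice((-1, 0, 1))))
                               for v in evader)
            return steps

        random.seed(0)
        for mode in (True, False):
            mean = sum(pursue(mode) for _ in range(100)) / 100
            print("cooperative" if mode else "independent", "mean time to locate:", mean)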

    Model-Predictive Strategy Generation for Multi-Agent Pursuit-Evasion Games

    Multi-agent pursuit-evasion games can be used to model a variety of real-world problems, including surveillance, search-and-rescue, and defense-related scenarios. However, many pursuit-evasion problems are computationally difficult, which can be problematic for domains with complex geometry or large numbers of agents. To compound matters, practical applications often require planning methods to operate under high levels of uncertainty or to meet strict running-time requirements. These challenges strongly suggest that heuristic methods are needed to address pursuit-evasion problems in the real world. In this dissertation I present heuristic planning techniques for three related problem domains: visibility-based pursuit-evasion, target following with differential motion constraints, and distributed asset guarding with unmanned sea-surface vehicles. For these domains, I demonstrate that heuristic techniques based on problem relaxation and model-predictive simulation can be used to efficiently perform low-level control action selection, motion goal selection, and high-level task allocation. In particular, I introduce a polynomial-time algorithm for control action selection in visibility-based pursuit-evasion games, where a team of pursuers must minimize uncertainty about the location of an evader. The algorithm uses problem relaxation to estimate future states of the game. I also show how to incorporate into the algorithm a probabilistic opponent model learned from interaction traces of prior games. I verify experimentally that by performing Monte Carlo sampling over the learned model to estimate the location of the evader, the algorithm performs better than existing planning approaches based on worst-case analysis. Next, I introduce an algorithm for motion goal selection in pursuit-evasion scenarios with unmanned boats. I show how a probabilistic model accounting for differential motion constraints can be used to project the future positions of the target boat; motion goals for the pursuer boat can then be selected based on those projections. I verify experimentally that motion goals selected with this technique are better optimized for travel time and proximity to the target boat than motion goals selected based on the current position of the target boat. Finally, I introduce a task-allocation technique for a team of unmanned sea-surface vehicles (USVs) responsible for guarding a high-value asset. The team of USVs must intercept and block a set of hostile intruder boats before they reach the asset. The algorithm uses model-predictive simulation to estimate the value of high-level task assignments, which are then realized by a set of learned low-level behaviors. I show experimentally that using model-predictive simulations based on Monte Carlo sampling is more effective than hand-coded evaluation heuristics.
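
    The action-selection loop described here can be illustrated compactly. Below is a minimal sketch, not the dissertation's algorithm: the five-action grid world, the hand-written (position, weight) opponent model, and the expected-distance objective are invented stand-ins for the learned model and the uncertainty-minimization objective, but the Monte Carlo structure (sample evader states from the model, then score each candidate action against the samples) is the same.

        import random

        ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0), "stay": (0, 0)}

        def sample_evader(model, n=100):
            """Draw candidate evader positions from the opponent model; here the
            'model' is a list of (position, weight) pairs, standing in for a
            distribution fitted to interaction traces of prior games."""
            positions, weights = zip(*model)
            return random.choices(positions, weights=weights, k=n)

        def select_action(pursuer, model, n=100):
            """Pick the control action minimizing the expected distance to the
            evader, estimated by Monte Carlo sampling over the opponent model."""
            samples = sample_evader(model, n)

            def expected_dist(a):
                nx, ny = pursuer[0] + ACTIONS[a][0], pursuer[1] + ACTIONS[a][1]
                return sum(abs(nx - ex) + abs(ny - ey) for ex, ey in samples) / n

            return min(ACTIONS, key=expected_dist)

        random.seed(1)
        # Hypothetical learned model: the evader favours the north-east corridor.
        model = [((8, 9), 0.5), ((9, 8), 0.3), ((2, 1), 0.2)]
        print(select_action((5, 5), model))  # moves the pursuer toward the likely region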

    Robot Planning in Adversarial Environments Using Tree Search Techniques

    One of the main advantages of robots is that they can be used in environments that are dangerous for humans. Robots can be used not only for tasks in known, safe areas but also in environments that may contain adversaries. When planning the robot's actions in such scenarios, we have to consider the outcomes of the robot's actions given the actions taken by the adversary, as well as the information available to the robot and the adversary. The goal of this dissertation is to design planning strategies that improve the robot's performance in adversarial environments. Specifically, we study how the availability of information affects the planning process and the outcome. We also study how to improve computational efficiency by exploiting the structural properties of the underlying setting. We adopt a game-theoretic formulation and study two scenarios: adversarial active target tracking and reconnaissance in environments with adversaries. A conservative approach is to plan the robot's actions assuming a worst-case adversary with complete knowledge of the robot's state and objective. We start with such a "symmetric" information game for the adversarial target tracking scenario with noisy sensing. Using the properties of the Kalman filter, we design a pruning strategy that improves the efficiency of a tree search algorithm. We also investigate the performance limits of the asymmetric version, in which the adversary can inject false sensing data. We then study a reconnaissance scenario where the robot and the adversary have symmetric information, and design an algorithm that allows a robot to scan more area while avoiding detection by the adversary. The symmetric adversarial model may yield overly conservative plans when the adversary does not have the same information as the robot; furthermore, the information available to the adversary may change during execution. We therefore investigate the dynamic version of this asymmetric information game, in which the information available to the adversary changes during execution, and show how much the robot can exploit the asymmetry in information using tree search techniques. We devise a new algorithm for this asymmetric information game with theoretical performance guarantees and evaluate these approaches through experiments. We use qualitative examples to show how the new algorithm can outperform symmetric minimax, and quantitative experiments to measure how large the improvement is.
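
    The tree-search-with-pruning idea can be sketched in one dimension. The following is illustrative only: the dissertation's pruning exploits properties of the Kalman filter, whereas this sketch substitutes plain alpha-beta pruning, and the scalar filter, move set, and range-dependent measurement noise are invented. The robot (minimizing tracking uncertainty) and the adversarial target (maximizing it) alternate moves, with the leaf value being the final covariance.

        def kalman_update(P, r):
            """Scalar Kalman covariance update with process noise q = 1 and
            measurement noise r (H = 1): predict, then correct."""
            P = P + 1.0                    # predict
            return P - P * P / (P + r)     # correct

        MOVES = (-1, 0, 1)

        def minimax(robot, target, P, depth, alpha=-1e9, beta=1e9, robot_turn=True):
            """Minimax over robot and adversary moves with alpha-beta pruning,
            an illustrative stand-in for the Kalman-based pruning strategy."""
            if depth == 0:
                return P
            if robot_turn:
                best = 1e9
                for m in MOVES:
                    r_noise = 0.5 + abs((robot + m) - target)  # noise grows with range
                    v = minimax(robot + m, target, kalman_update(P, r_noise),
                                depth, alpha, beta, robot_turn=False)
                    best = min(best, v)
                    beta = min(beta, v)
                    if beta <= alpha:
                        break                                  # prune this subtree
                return best
            best = -1e9
            for m in MOVES:
                v = minimax(robot, target + m, P, depth - 1, alpha, beta,
                            robot_turn=True)
                best = max(best, v)
                alpha = max(alpha, v)
                if beta <= alpha:
                    break
            return best

        print(round(minimax(robot=0, target=3, P=5.0, depth=3), 3))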

    Evolving Effective Micro Behaviors for Real-Time Strategy Games

    Real-time strategy games have become a new frontier of artificial intelligence research. As with chess and checkers before, advances in real-time strategy game AI will significantly advance the state of the art in AI research. This thesis investigates the use of heuristic search algorithms to generate effective micro behaviors in combat scenarios for real-time strategy games. Macro and micro management are two key aspects of real-time strategy games: good macro helps a player collect more resources and build more units, while good micro helps a player win skirmishes against equal numbers of opponent units, or win even when outnumbered. In this research, we use influence maps and potential fields as a base representation to evolve micro behaviors. We first compare genetic algorithms against two types of hill climbers for generating competitive unit micro management. Second, we investigate the use of case-injected genetic algorithms to quickly and reliably generate high-quality micro behaviors. Third, we compactly encode micro behaviors, including influence maps, potential fields, and reactive control, into fourteen parameters and use genetic algorithms to search for a complete micro bot, ECSLBot. We compare the performance of ECSLBot with two state-of-the-art bots, UAlbertaBot and Nova, on several skirmish scenarios in the popular real-time strategy game StarCraft. The results show that the ECSLBot tuned by genetic algorithms outperforms UAlbertaBot and Nova in kiting efficiency, target selection, and fleeing. In addition, the same approach creates competitive micro behaviors in another game, SeaCraft. Using parallelized genetic algorithms to evolve parameters in SeaCraft, we are able to speed up the evolutionary process from twenty-one hours to nine minutes. We believe this work provides evidence that genetic algorithms and our representation are a viable approach to creating effective micro behaviors for winning skirmishes in real-time strategy games.
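
    The evolutionary search itself is standard and easy to sketch. The snippet below mirrors the fourteen-parameter encoding, but the fitness function is a placeholder: in the thesis it would be a StarCraft or SeaCraft skirmish score covering kiting, target selection, and fleeing, which cannot be reproduced here, so a made-up function stands in to keep the example runnable.

        import random

        N_PARAMS = 14          # matches the thesis's compact micro-behavior encoding

        def fitness(params):
            """Placeholder for a skirmish simulation score; a simple unimodal
            function is used here so the example runs without a game engine."""
            return -sum((p - 0.6) ** 2 for p in params)

        def evolve(pop_size=50, generations=40, mut_sigma=0.1):
            pop = [[random.random() for _ in range(N_PARAMS)] for _ in range(pop_size)]
            for _ in range(generations):
                pop.sort(key=fitness, reverse=True)
                parents = pop[: pop_size // 2]              # truncation selection
                children = []
                while len(children) < pop_size - len(parents):
                    a, b = random.sample(parents, 2)
                    cut = random.randrange(1, N_PARAMS)     # one-point crossover
                    child = a[:cut] + b[cut:]
                    child = [min(1.0, max(0.0, g + random.gauss(0, mut_sigma)))
                             for g in child]                # Gaussian mutation
                    children.append(child)
                pop = parents + children
            return max(pop, key=fitness)

        random.seed(2)
        best = evolve()
        print("best fitness:", round(fitness(best), 4))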

    Multi-agent persistent surveillance under temporal logic constraints

    This thesis proposes algorithms for deploying multiple autonomous agents on persistent surveillance missions requiring repeated, periodic visits to regions of interest. Such problems arise in a variety of domains, such as monitoring ocean conditions like temperature and algae content, providing crowd security during public events, tracking wildlife in remote or dangerous areas, or watching traffic patterns and road conditions. Using robots for surveillance is an attractive solution for scenarios in which fixed sensors are not sufficient to maintain situational awareness. Multi-agent solutions are particularly promising because they allow for improved spatial and temporal resolution of sensor information. In this work, we consider persistent monitoring by teams of agents that are tasked with satisfying missions specified using temporal logic (TL) formulas. Such formulas allow rich, complex tasks to be specified, such as "visit regions A and B infinitely often, and if region C is visited then go to region D, and always avoid obstacles." The agents must determine how to satisfy such missions subject to fuel, communication, and other constraints. These problems are inherently difficult due to the typically infinite horizon, the state-space explosion from planning for multiple agents, communication constraints, and other issues. Computing an optimal solution is therefore often infeasible, and a balance must be struck between computational complexity and optimality. This thesis describes solution methods for two main classes of multi-agent persistent surveillance problems. The first is the class of problems in which the persistent surveillance goals are captured entirely by TL constraints; such problems require agents to repeatedly visit a set of surveillance regions in order to satisfy their mission. We present results for agents solving such missions with charging constraints, with noisy observations, and in the presence of adversaries. The second class of problems includes an additional optimality criterion, such as minimizing uncertainty about the location of a target or maximizing sensor information among the team of agents. We present solution methods and results for such missions under a variety of optimality criteria based on information metrics. For both classes of problems, the proposed algorithms are implemented and evaluated via simulation, via experiments with robots in a motion capture environment, or both.
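
    The "infinitely often" surveillance obligation can be made concrete with a toy monitor. The sketch below is an invented single-agent illustration, far simpler than the thesis's multi-agent machinery: the formula GF A and GF B ("visit regions A and B infinitely often") is tracked by a small acceptance monitor that resets once both regions have been seen, and a greedy agent cycles between made-up region locations to keep discharging the obligation.

        REGIONS = {"A": (0, 0), "B": (8, 5)}

        def monitor():
            """Tracks the obligation of GF A and GF B: yields the regions still
            owed a visit, and resets the obligation once both have been seen,
            which produces the 'infinitely often' behaviour."""
            pending = set(REGIONS)
            while True:
                visited = yield pending
                pending.discard(visited)
                if not pending:
                    pending = set(REGIONS) - {visited}

        def step(pos, target):
            """Move one cell toward the target (diagonal moves allowed)."""
            return tuple(p + (t > p) - (t < p) for p, t in zip(pos, target))

        mon = monitor()
        pending = next(mon)
        pos, trace = (4, 4), []
        for _ in range(40):
            goal = min(pending,
                       key=lambda r: sum(abs(a - b) for a, b in zip(pos, REGIONS[r])))
            pos = step(pos, REGIONS[goal])
            if pos == REGIONS[goal]:
                trace.append(goal)
                pending = mon.send(goal)
        print(trace)  # ['B', 'A', 'B', 'A', 'B']: both regions are revisited forever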

    Affinity-Based Reinforcement Learning: A New Paradigm for Agent Interpretability

    The steady increase in the complexity of reinforcement learning (RL) algorithms is accompanied by a corresponding increase in opacity that obfuscates insights into their devised strategies. Methods in explainable artificial intelligence seek to mitigate this opacity by either creating transparent algorithms or extracting explanations post hoc. A third category exists that allows the developer to affect what agents learn: constrained RL has been used in safety-critical applications and prohibits agents from visiting certain states, while preference-based RL agents have been used in robotics applications and learn state-action preferences instead of traditional reward functions. We propose a new affinity-based RL paradigm in which agents learn strategies that are partially decoupled from reward functions. Unlike entropy regularisation, we regularise the objective function with a distinct action distribution that represents a desired behaviour; we encourage the agent to act according to a prior while learning to maximise rewards. The result is an inherently interpretable agent that solves problems with an intrinsic affinity for certain actions. We demonstrate the utility of our method in a financial application: we learn continuous time-variant compositions of prototypical policies, each interpretable by its action affinities, that are globally interpretable according to customers' financial personalities. Our method combines advantages from both constrained RL and preference-based RL: it retains the reward function but generalises the policy to match a defined behaviour, thus avoiding problems such as reward shaping and hacking. Unlike Boolean task composition, our method is a fuzzy superposition of different prototypical strategies that arrives at a more complex, yet interpretable, strategy.
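
    The regularised objective is easy to state concretely. As a minimal sketch, assuming a three-armed bandit with invented rewards, an invented affinity prior, and an invented penalty weight lam, the snippet below runs REINFORCE with an added KL(pi || prior) term, so the learned policy compromises between maximising reward and matching the desired behaviour; the paper itself applies the idea to continuous financial policies rather than this toy problem.

        import numpy as np

        rng = np.random.default_rng(0)
        mean_reward = np.array([1.0, 1.2, 0.2])    # arm 1 pays best
        affinity = np.array([0.7, 0.2, 0.1])       # desired behaviour favours arm 0
        lam = 0.5                                  # strength of the affinity term

        theta = np.zeros(3)                        # softmax policy logits
        for _ in range(5000):
            pi = np.exp(theta - theta.max()); pi /= pi.sum()
            a = rng.choice(3, p=pi)
            r = mean_reward[a] + rng.normal(0, 0.1)
            # REINFORCE gradient of E[r] minus lam * gradient of KL(pi || affinity),
            # both taken with respect to the softmax logits.
            grad_logp = -pi.copy(); grad_logp[a] += 1.0
            log_ratio = np.log(pi / affinity)
            kl_grad = pi * (log_ratio - (pi * log_ratio).sum())
            theta += 0.05 * (r * grad_logp - lam * kl_grad)

        pi = np.exp(theta - theta.max()); pi /= pi.sum()
        print("learned policy:", pi.round(2))  # compromise between reward and affinity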

    A Methodology for Technology-Tuned Decision Behavior Algorithms for Tactics Exploration

    In 2016, the USAF found that current development and acquisition methods may be inadequate to achieve air superiority in 2030. The airspace is expected to be highly contested by 2030 due to the Anti-Access/Area Denial strategies being employed by adversaries, and capability gaps must be addressed in order to maintain air superiority. The USAF identified new development and acquisition paradigms as the number one non-material capability development area. The idea of a new development and acquisition paradigm is not new: such a shift occurred in the transition from threat-based acquisition during the Cold War to capability-based acquisition during the war on terror. Investigation into current US development and acquisition methods found several notional methodologies; Effectiveness-Based Design and Technology Identification, Evaluation, and Selection for Systems-of-Systems have been proposed as notional solutions. Both methodologies seek to evaluate the means (the technologies used to perform a mission) and the ways (the tactics used to complete a mission) of the technology design space. Proper evaluation of the ways would provide critical information to the decision-maker during technology selection. These findings suggest that a new paradigm focused on effectiveness-based acquisition is needed to improve current development and acquisition methods. To evaluate the ways design space, current methods must move away from a fixed or constrained mission model to one that is minimally defined and capable of exploring tactics for each unique technology. The proposed Technology-tuned Decision Behavior Algorithms for Tactics Exploration (Tech-DEBATE) methodology enables the exploration of the ways, or more formally, the mission action design space. The methodology enables further exploration of the technology design space by improving the quantification of mission effectiveness through deep reinforcement learning in a minimally defined mission environment. The resulting data are grounded in traceable tactical alternatives, which increases confidence in the measures of effectiveness for each technology-tactic alternative. The methodology thus enables more informed decisions for technology investment, reducing risk in the development and acquisition of new technologies; the reduction in risk in turn reduces the costs and development time associated with investment in new technologies. The Tech-DEBATE methodology provides a new approach to technology evaluation through its emphasis on quantifying mission effectiveness in a minimally defined mission to inform technology investment decisions.
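
    The core evaluation loop can be caricatured in a few lines. Everything below is invented for illustration (the corridor mission, the threat cell, the "speed" technology knob, and tabular Q-learning in place of deep RL), but the shape matches the proposal: for each technology alternative, learn tactics in a minimally defined mission, then report the learned policy's mean return as its measure of effectiveness.

        import random

        def run_mission(speed, policy=None, q=None, episodes=2000, eps=0.2):
            """Tabular Q-learning on a 10-cell corridor. 'speed' is the technology
            knob (cells covered by the fast action). Reaching cell 9 scores +10;
            stepping on the threat at cell 5 costs -5; every other step costs -1."""
            if q is None:
                q = {(s, a): 0.0 for s in range(10) for a in (1, speed)}
            total = 0.0
            for _ in range(episodes):
                s, ret = 0, 0.0
                while s < 9:
                    if policy is None and random.random() < eps:
                        a = random.choice((1, speed))       # explore while learning
                    else:
                        a = max((1, speed), key=lambda x: q[(s, x)])
                    s2 = min(9, s + a)
                    r = 10.0 if s2 == 9 else (-5.0 if s2 == 5 else -1.0)
                    if policy is None:                      # learning phase only
                        q[(s, a)] += 0.1 * (r + max(q[(s2, b)] for b in (1, speed))
                                            - q[(s, a)])
                    s, ret = s2, ret + r
                total += ret
            return q, total / episodes

        random.seed(3)
        for speed in (2, 3):        # two hypothetical technology alternatives
            q, _ = run_mission(speed)                               # learn tactics
            _, eff = run_mission(speed, policy="greedy", q=q, episodes=200)
            print("speed", speed, "-> mission effectiveness:", round(eff, 2))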