4,184 research outputs found

    Approximate Dynamic Programming for Military Resource Allocation

    This research considers the optimal allocation of weapons to a collection of targets with the objective of maximizing the value of destroyed targets. The weapon-target assignment (WTA) problem is a classic non-linear combinatorial optimization problem with an extensive history in the operations research literature. The dynamic weapon-target assignment (DWTA) problem aims to assign weapons optimally over time, using the information gained to improve the outcome of their engagements. This research investigates various formulations of the DWTA problem and develops algorithms for their solution. An embedded optimization problem is also introduced in which optimization of the multi-stage DWTA is used to determine optimal weaponeering of aircraft. Approximate dynamic programming is applied to the various formulations of the WTA problem. Like many problems in combinatorial optimization, the DWTA problem suffers from the curses of dimensionality, and exact solutions are often computationally intractable. As such, approximations are developed which exploit the special structure of the problem and allow for efficient convergence to high-quality local optima. Finally, a genetic algorithm solution framework is developed to test the embedded optimization problem for aircraft weaponeering.
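To make the objective concrete, here is a minimal sketch of the textbook static WTA objective (expected value of destroyed targets) together with a simple greedy marginal-return heuristic. It is an illustration only and not the approximate dynamic programming or genetic algorithm methods of the work above. The function names, the independence of weapon effects, and the sample data are all assumptions made for this example.

```python
def expected_destroyed_value(assignment, values, kill_prob):
    """Expected destroyed value for a given assignment.

    assignment[w] is the target index chosen for weapon w.
    values[t] is the value of target t.
    kill_prob[w][t] is the (assumed independent) probability that
    weapon w destroys target t.
    """
    survive = [1.0] * len(values)
    for weapon, target in enumerate(assignment):
        survive[target] *= 1.0 - kill_prob[weapon][target]
    return sum(v * (1.0 - s) for v, s in zip(values, survive))


def greedy_assignment(values, kill_prob):
    """Assign weapons one at a time to the target with the largest
    marginal gain in expected destroyed value (a heuristic, not optimal)."""
    survive = [1.0] * len(values)
    assignment = []
    for weapon_probs in kill_prob:
        # Marginal gain of firing this weapon at each target.
        gains = [v * s * p for v, s, p in zip(values, survive, weapon_probs)]
        target = max(range(len(values)), key=gains.__getitem__)
        survive[target] *= 1.0 - weapon_probs[target]
        assignment.append(target)
    return assignment


if __name__ == "__main__":
    values = [10.0, 4.0]            # hypothetical target values
    kill_prob = [[0.5, 0.5]] * 3    # three identical hypothetical weapons
    plan = greedy_assignment(values, kill_prob)
    print(plan, expected_destroyed_value(plan, values, kill_prob))
```

The product over surviving probabilities is what makes the objective non-linear: each additional weapon fired at the same target yields a smaller marginal gain, which is why the greedy rule eventually switches to the lower-value target.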

    Real-Time Heuristics and Metaheuristics for Static and Dynamic Weapon Target Assignments

    The problem of targeting and engaging individual missiles (targets) with an arsenal of interceptors (weapons) is known as the weapon target assignment problem. This problem has been well researched since the seminal work in 1958. There are two distinct categories of the weapon target assignment problem: static and dynamic. The static weapon target assignment problem considers a single instance in which a known number of incoming missiles is to be engaged with a finite number of interceptors. By contrast, the dynamic weapon target assignment problem considers either follow-on engagement(s) should the first engagement(s) fail, a subsequent salvo of incoming missiles, or both. This research seeks to define and solve a realistic dynamic model. First, assignment heuristics and metaheuristics are developed to provide rapid near-optimal solutions to the static weapon target assignment problem. Next, a technique is developed that uses approximate dynamic programming to determine how many of each interceptor type to reserve for a second salvo. Lastly, a model is developed that realistically considers erratic flight paths of incoming missiles and determines assignments and firing sequences of interceptors within a simulation to minimize the number of hits to a protected asset. Additionally, the first contemporary survey of the weapon target assignment problem since 1985 is presented. Collectively, this work extends missile defense research further into practical application than is currently found in the literature.

    A Hybrid Multiobjective Discrete Particle Swarm Optimization Algorithm for Cooperative Air Combat DWTA


    Deep Reinforcement Learning for Weapons to Targets Assignment in a Hypersonic strike

    We use deep reinforcement learning (RL) to optimize a weapon-to-target assignment (WTA) policy for a multi-vehicle hypersonic strike against multiple targets. The objective is to maximize the total value of destroyed targets in each episode. Each randomly generated episode varies the number and initial conditions of the hypersonic strike weapons (HSW) and targets, the value distribution of the targets, and the probability of an HSW being intercepted. We compare the performance of this WTA policy to that of a benchmark WTA policy derived using non-linear integer programming (NLIP), and find that the RL WTA policy gives near-optimal performance with a 1000x speedup in computation time, allowing real-time operation that facilitates autonomous decision making in the mission end game.

    Designing a Framework for Target-Site Assignment in Naval Combat Management

    In this study, operational research techniques are used to build a model that assesses battlefield threats, prioritises aggressive targets, evaluates the capability of own sites and the risks of conflict with the targets, defines conflict scenarios, and finally selects the best scenario using an assignment model. These steps are added to the ‘threat assessment’ and ‘weapon-target assignment’ phases of the naval combat management system as an intermediate phase of target-site assignment, called ‘deciding the best conflict scenario’. For each own site, the data collected from the environment, together with the input of expert panels, are shown in a two-dimensional matrix whose four areas represent the conflict scenarios. Because the study was carried out in a simulated environment, expert verification and the convergence of the Monte Carlo results were used to validate the research. The proposed model can offer optimised decisions to the operational commander by predicting the battlefield and managing the sites’ capacity and their interaction during combat.

    Determination of Fire Control Policies via Approximate Dynamic Programming

    Given the ubiquitous nature of both offensive and defensive missile systems, the catastrophe-causing potential they represent, and the limited resources available to countries for missile defense, optimizing the defensive response to a missile attack is a necessary endeavor. For a single salvo of offensive missiles launched at a set of targets, a missile defense system protecting those targets must decide how many interceptors to fire at each incoming missile. Since such missile engagements often involve the firing of more than one attack salvo, we develop a Markov decision process (MDP) model to examine the optimal fire control policy for the defender. Because exact methods are computationally intractable for all but the smallest problem instances, we use an approximate dynamic programming (ADP) approach to explore the efficacy of applying approximate methods to the problem. We obtain policy insights by analyzing subsets of the state space that reflect a range of possible defender interceptor inventories. Testing of four scenarios demonstrates that the ADP policy provides high-quality decisions for a majority of the state space, achieving a 7.74% mean optimality gap in the baseline scenario. Moreover, the ADP algorithm requires only a few minutes of computation versus 12 hours for the exact dynamic programming algorithm, providing a method to address more complex and realistically sized instances.

    Techniques for the allocation of resources under uncertainty

    Resource allocation is a ubiquitous problem that arises whenever limited resources have to be distributed among multiple autonomous entities (e.g., people, companies, robots). The standard approaches to determining the optimal resource allocation are computationally prohibitive. The goal of this thesis is to propose computationally efficient algorithms for allocating consumable and non-consumable resources among autonomous agents whose preferences for these resources are induced by a stochastic process. Towards this end, we have developed new models of planning problems, based on the framework of Markov Decision Processes (MDPs), where the action sets are explicitly parameterized by the available resources. Given these models, we have designed algorithms based on dynamic programming and real-time heuristic search to generate resource allocations for agents acting in stochastic environments. In particular, we have used the acyclic property of task creation to decompose the problem of resource allocation. We have also proposed an approximate decomposition strategy, where the agents consider positive and negative interactions as well as simultaneous actions among the agents managing the resources. However, the main contribution of this thesis is the adoption of stochastic real-time heuristic search for resource allocation. To this end, we have developed an approach based on distributed Q-values with tight bounds that drastically reduces the planning time needed to formulate the optimal policy. These tight bounds make it possible to prune the action space for the agents. We show analytically and empirically that our proposed approaches lead to drastic (in many cases, exponential) improvements in computational efficiency over standard planning methods. Finally, we have tested real-time heuristic search in the SADM simulator, a resource allocation simulator for a frigate.

    Applications of agent architectures to decision support in distributed simulation and training systems

    This work develops the approach and presents the results of a new model for applying intelligent agents to complex distributed interactive simulation for command and control. In the framework of tactical command, control, communications, computers and intelligence (C4I), software agents provide a novel approach for efficient decision support and distributed interactive mission training. An agent-based architecture for decision support is designed, implemented, and applied in a distributed interactive simulation to significantly enhance command and control training during simulated exercises. The architecture is based on monitoring, evaluation, and advice agents, which cooperate to provide alternatives to the decision-maker in a time- and resource-constrained environment. The architecture is implemented and tested within the context of an AWACS Weapons Director (WD) trainer tool. The foundation of the work required a wide range of preliminary research topics to be covered, including real-time systems, resource allocation, agent-based computing, decision support systems, and distributed interactive simulations. The major contribution of our work is the construction of a multi-agent architecture and its application to an operational decision support system for command and control interactive simulation. The architectural design for the multi-agent system was drafted in the first stage of the work. In the next stage, rules of engagement and objective and cost functions were determined in the AWACS (Air Force command and control) decision support domain. Finally, the multi-agent architecture was implemented and evaluated inside a distributed interactive simulation test-bed for AWACS WDs. The evaluation process combined individual and team use of the decision support system to improve the performance results of WD trainees. The decision support system is designed and implemented as a distributed architecture for performance-oriented management of software agents. The approach provides new agent interaction protocols and uses agent performance monitoring and remote synchronization mechanisms. This multi-agent architecture enables direct and indirect agent communication as well as dynamic hierarchical agent coordination. Inter-agent communications use predefined interfaces, protocols, and open channels with specified ontology and semantics. Services can be requested, and responses with results received, over such communication modes. Both traditional (functional) parameters and nonfunctional requirements (e.g., QoS, deadline) are captured in service requests.

    Optimal Policy for Sequential Stochastic Resource Allocation

    A gambler in possession of R chips/coins is allowed N (> R) pulls/trials at a slot machine. Upon pulling the arm, the slot machine realizes a random state i ∈ {1, ..., M} with probability p(i), and the corresponding positive monetary reward g(i) is presented to the gambler. The gambler can accept the reward by inserting a coin in the machine. However, the dilemma facing the gambler is whether to spend the coin or keep it in reserve, hoping to pick up a greater reward in the future. We assume that the gambler has full knowledge of the reward distribution function. We are interested in the optimal gambling strategy that results in the maximal cumulative reward. The problem is naturally posed as a Stochastic Dynamic Program whose solution yields the optimal policy and expected cumulative reward. We show that the optimal strategy is a threshold policy, wherein a coin is spent if and only if the number of coins r exceeds a state- and stage/trial-dependent threshold value. We illustrate the utility of the result on a military operational scenario.
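The stochastic dynamic program described in this abstract can be sketched in a few lines. The recursion below is one natural reading of the problem statement, not code from the paper: V[n][r] is the maximal expected cumulative reward with n pulls remaining and r coins in hand, and the names `solve_gambler` and `should_spend`, along with the sample numbers, are assumptions made for illustration.

```python
def solve_gambler(num_pulls, num_coins, probs, rewards):
    """Backward induction for the coin-spending problem.

    V[n][r] = sum_i p(i) * max(g(i) + V[n-1][r-1], V[n-1][r]),
    with V[0][r] = 0 (no pulls left) and V[n][0] = 0 (no coins left).
    """
    V = [[0.0] * (num_coins + 1) for _ in range(num_pulls + 1)]
    for n in range(1, num_pulls + 1):
        for r in range(1, num_coins + 1):
            keep = V[n - 1][r]        # value if the coin is held back
            spend = V[n - 1][r - 1]   # future value after spending a coin
            V[n][r] = sum(p * max(g + spend, keep)
                          for p, g in zip(probs, rewards))
    return V


def should_spend(V, pulls_left, coins, reward):
    """Spend a coin iff the offered reward is at least the marginal
    value of keeping that coin for the remaining pulls."""
    if coins == 0:
        return False
    return reward + V[pulls_left - 1][coins - 1] >= V[pulls_left - 1][coins]


if __name__ == "__main__":
    # Hypothetical instance: two equally likely rewards, 2 pulls, 1 coin.
    V = solve_gambler(num_pulls=2, num_coins=1,
                      probs=[0.5, 0.5], rewards=[1.0, 3.0])
    print(V[2][1])                        # expected reward under the optimal policy
    print(should_spend(V, 2, 1, 1.0))     # low reward on the first pull
    print(should_spend(V, 2, 1, 3.0))     # high reward on the first pull
```

In this small instance the gambler declines the low reward on the first pull because the expected reward of the final pull is higher, and accepts anything on the last pull since an unspent coin is worth nothing afterwards.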