1,005 research outputs found

    Deep Reinforcement Learning for Swarm Systems

    Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions, and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies. Comment: 31 pages, 12 figures, version 3 (published in JMLR Volume 20
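
    The mean-embedding idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar agent states, the function names `rbf_features` and `mean_embedding`, and the choice of RBF centers and width are all assumptions made for illustration. The sketch averages radial basis function features over the observed agents, so the result has a fixed length and does not depend on agent ordering or swarm size.

```python
import math


def rbf_features(x, centers, width=1.0):
    """Map one scalar agent state to a vector of RBF activations.

    `centers` and `width` are illustrative assumptions, not values from the paper.
    """
    return [math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def mean_embedding(agent_states, centers, width=1.0):
    """Empirical mean embedding: average the feature vectors of all agents."""
    if not agent_states:
        raise ValueError("at least one agent state is required")
    feats = [rbf_features(x, centers, width) for x in agent_states]
    n = len(feats)
    # zip(*feats) groups the k-th feature of every agent together.
    return [sum(column) / n for column in zip(*feats)]


if __name__ == "__main__":
    centers = [-1.0, 0.0, 1.0]
    swarm = [0.2, -0.7, 1.3, 0.0]
    # The embedding length equals len(centers), whatever the swarm size.
    print(mean_embedding(swarm, centers))
```

    Because the features are averaged rather than concatenated, a policy that consumes this vector sees the same input size for 4 agents or 400, which is the scaling property the abstract highlights.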

    Optimal pilot decisions and flight trajectories in air combat

    The thesis concerns the analysis and synthesis of pilot decision-making and the design of optimal flight trajectories. In the synthesis framework, the methodology of influence diagrams is applied for modeling and simulating the maneuvering decision process of the pilot in one-on-one air combat. The influence diagram representations describing the maneuvering decision in a one-sided optimization setting and in a game setting are constructed. The synthesis of team decision-making in multiplayer air combat is tackled by formulating a decision-theoretical information prioritization approach based on a value function and interval analysis. It gives the team-optimal sequence of tactical data that is transmitted between cooperating air units to improve the situation awareness of the friendly pilots in the best possible way. In the optimal trajectory planning framework, an approach towards the interactive automated solution of deterministic aircraft trajectory optimization problems is presented. It offers design principles for trajectory optimization software that can be operated automatically by a nonexpert user. In addition, the representation of preferences and uncertainties in trajectory optimization is considered by developing a multistage influence diagram that describes a series of maneuvering decisions in a one-on-one air combat setting. This influence diagram representation as well as the synthesis elaborations provide seminal ways to treat uncertainties in air combat modeling. The work on influence diagrams can also be seen as an extension of the methodology to dynamically evolving decision situations involving possibly multiple actors with conflicting objectives. From the practical point of view, all the synthesis models can be utilized in decision-making systems of air combat simulators. The information prioritization approach can also be implemented in an onboard data link system.

    Dagstuhl Reports : Volume 1, Issue 2, February 2011

    Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061): Simone Fischer-Hübner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer
    Self-Repairing Programs (Dagstuhl Seminar 11062): Mauro Pezzé, Martin C. Rinard, Westley Weimer and Andreas Zeller
    Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071): Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos
    Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081): Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka
    Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091): Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Youn

    Partially Observable Stochastic Games with Neural Perception Mechanisms

    Stochastic games are a well-established model for multi-agent sequential decision making under uncertainty. In reality, though, agents have only partial observability of their environment, which makes the problem computationally challenging, even in the single-agent setting of partially observable Markov decision processes. Furthermore, in practice, agents increasingly perceive their environment using data-driven approaches such as neural networks trained on continuous data. To tackle this problem, we propose the model of neuro-symbolic partially-observable stochastic games (NS-POSGs), a variant of continuous-space concurrent stochastic games that explicitly incorporates perception mechanisms. We focus on a one-sided setting, comprising a partially-informed agent with discrete, data-driven observations and a fully-informed agent with continuous observations. We present a new point-based method, called one-sided NS-HSVI, for approximating values of one-sided NS-POSGs and implement it based on the popular particle-based beliefs, showing that it has closed forms for computing values of interest. We provide experimental results to demonstrate the practical applicability of our method for neural networks whose preimage is in polyhedral form. Comment: 41 pages, 5 figure

    Accelerated K-Serial Stable Coalition for Dynamic Capture and Resource Defense

    Coalition is an important means by which multi-robot systems collaborate on common tasks. An effective and adaptive coalition strategy is essential for online performance in dynamic and unknown environments. In this work, the problem of territory defense by large-scale heterogeneous robotic teams is considered. The tasks include exploration, capture of dynamic targets, and perimeter defense over valuable resources. Since each robot can choose among many tasks, it remains a challenging problem to jointly coordinate these robots such that the overall utility is maximized. This work proposes a generic coalition strategy called the K-serial stable coalition algorithm (KS-COAL). Unlike centralized approaches, it is distributed and complete, meaning that only local communication is required and a K-serial stable solution is ensured. Furthermore, to accelerate adaptation to dynamic targets and resource distributions that are only perceived online, a heterogeneous graph attention network (HGAN)-based heuristic is learned to select more appropriate parameters and promising initial solutions during local optimization. Compared with manual heuristics or end-to-end predictors, it is shown to both improve online adaptability and retain the quality guarantee. The proposed methods are validated rigorously via large-scale simulations with 170 robots and hardware experiments with 13 robots, against several strong baselines including GreedyNE and FastMaxSum. Comment: 8 pages, 10 figures, 1 tabl

    Optimal and Robust Neural Network Controllers for Proximal Spacecraft Maneuvers

    Recent successes in machine learning research, buoyed by advances in computational power, have revitalized interest in neural networks and demonstrated their potential in solving complex controls problems. In this research, the reinforcement learning framework is combined with traditional direct shooting methods to generate optimal proximal spacecraft maneuvers. Open-loop and closed-loop feedback controllers, parameterized by multi-layer feed-forward artificial neural networks, are developed with evolutionary and gradient-based optimization algorithms. Utilizing Clohessy-Wiltshire relative motion dynamics, terminally constrained fixed-time, fuel-optimal trajectories are solved for intercept, rendezvous, and natural motion circumnavigation transfer maneuvers using three different thrust models: impulsive, finite, and continuous. In addition to optimality, the neurocontroller performance robustness to parametric uncertainty and bounded initial conditions is assessed. By bridging the gap between existing optimal and nonlinear control techniques, this research demonstrates that neurocontrollers offer a flexible and robust alternative approach to the solution of complex controls problems in the space domain and present a promising path forward to more capable, autonomous spacecraft

    Multi-Agent System Concepts Theory and Application Phases


    A Methodology to Evolve Cooperation in Pursuit Domain using Genetic Network Programming

    The design of strategies to devise teamwork and cooperation among agents is a central research issue in the field of multi-agent systems (MAS). The complexity of cooperative strategy design can rise rapidly with an increasing number of agents and their behavioral sophistication. The field of cooperative multi-agent learning promises solutions to such problems by attempting to discover agent behaviors as well as suggesting new approaches by applying machine learning techniques. Due to the difficulty of specifying a priori an effective algorithm for multiple interacting agents, and the inherent adaptability of artificially evolved agents, the use of evolutionary computation as a machine learning technique and a design process has recently received much attention. In this thesis, we design a methodology using an evolutionary computation technique called Genetic Network Programming (GNP) to automatically evolve teamwork and cooperation among agents in the pursuit domain. Simulation results show that our proposed methodology was effective in evolving teamwork and cooperation among agents. Compared with Genetic Programming approaches, its performance is significantly superior, its computation cost is lower, and its learning speed is faster. We also provide some analytical results of the proposed approach

    Discovering the potential of utilizing artificial intelligence in tax procedures : AI-powered artifact as a knowledge creation instrument

    Artificial intelligence, machine learning, and deep learning have become ubiquitous concepts. Interest in their utilization opportunities in many sectors has grown exponentially during recent decades, partly due to the exponential growth of computing power and the increased availability of data, allowing for more powerful and sophisticated information technology solutions. Technological maturity has lowered the threshold, and various open-source libraries and active communities enable the utilization of algorithms such as neural networks in practice. This thesis set out to find whether deep learning algorithms could be utilized in a value-adding way in the procedure for handling tax claims of limited liability companies in the case organization, the Finnish Tax Administration. Additionally, the creation and deployment of artificial intelligence solutions should consider legal and ethical matters as restrictive key concerns. The research was carried out according to the action design research method, in which the focus of the research is concurrently building a suitable artifact for the organization and learning (design principles) from the creation and intervention itself. The research method was chosen due to its inclination towards authenticity in the organization and organizational centricity. As a result, the project team consisting of three members created two functional artifacts: one based on neural networks and another based on self-organizing maps. The case organization provided data fueling the deep learning algorithms. Data consisted of financial information of anonymous limited liability companies in Finland. The artifacts were limited to function only as knowledge creation instruments due to legal and ethical limitations present in the context. Knowledge creation in this research context refers to the artifact's ability to distinguish customers not filing (defaulting on) their income tax returns from others.
The created artifacts functioned sufficiently, and their ability to distinguish defaulting customers from others was promising. Results suggest that it is advisable to approach problems with more than one artifact solution, and focused roles in the project team are recommended. Artificial intelligence-based artifacts are seen as value-adding since the knowledge created by them can potentially save time, free up resources and expedite processes. However, finalized artifacts were not created, and testing was limited to a simulated environment. The design principles that emerged from the artifact creation focused on addressing the legal and ethical challenges associated with artificial intelligence in taxation to secure sustainable artifact creation and usage. Design principles were divided into three levels: trustworthiness through accuracy, legal and ethical restrictions and limitations of use, and justification of use. An organization-defined performance threshold needs to be reached by an artifact. An artifact must be transparent and regulated to fulfill context-specified legal and ethical limitations. Lastly, a preliminary inspection of artificial intelligence usage in a case organization is required. Consequently, the preliminary results of this research should be validated by applying the concept in a case organization, followed by an analysis of the results in an end-user setting. Artificial intelligence, machine learning, and deep learning have become ubiquitous concepts. Interest in their potential uses across many sectors has grown over recent decades. The exponential growth of computing power and of available data makes it possible to create more powerful and more complex solutions. Technological maturity has lowered the threshold, and open software libraries and active communities make it possible to use algorithms such as neural networks in practice.
The purpose of this thesis was to study whether using deep learning algorithms adds value in the tax adjustment procedure for limited liability companies at the Finnish Tax Administration. Creating and deploying lawful and ethical artificial intelligence applications was identified as a restrictive and central factor. The research followed the action design research method, in which the aim is to simultaneously create an artifact suited to the target organization and to learn (design principles) from creating the artifact and from the intervention in the organization. The research method was chosen for its organization-centricity and organization-specific authenticity. As a result of applying the method, a three-person project team created two working artifacts, one based on neural networks and the other on self-organizing maps. The target organization supplied the data needed by the deep learning algorithms. The data consisted of financial information on anonymized Finnish limited liability companies. The artifacts were restricted to functioning only as so-called knowledge-producing tools because of legal and ethical constraints. In the research context, knowledge production refers to the artifact's ability to identify customers who do not fulfil their obligation to file an income tax return. The created artifacts performed at a sufficient level. Their ability to identify the target customer group was promising. Based on the results, it is advisable to approach problems by creating several different artificial intelligence applications. In addition, attention to focused roles within the project team is recommended. Artificial intelligence-based artifacts are seen as value-adding. The knowledge they produce makes it possible to save time, free up resources, and speed up processes. Finalized artifacts released into the organization were not created. The design principles that emerged from creating and testing the artifacts focused on addressing the legal and ethical constraints associated with using artificial intelligence in taxation.
This makes it possible to ensure a sustainable way of creating and deploying artifacts. The design principles were divided into three levels: trust through accuracy, legal and ethical constraints on use, and justification of artificial intelligence use. An artifact must exceed an organization-specific performance threshold. An artifact must be transparent and regulated so that it complies with the constraints of its target environment. A preliminary study of where artificial intelligence could be used in the organization is advisable. It is advisable to validate the preliminary results of this work in the target organization, followed by an analysis of the results among end users