35 research outputs found

    General-Sum Multi-Agent Continuous Inverse Optimal Control

    Get PDF
    IEEE Modelling possible future outcomes of robot-human interactions is of importance in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behaviour of a human agent is advantageous for modelling the behaviour with Markov Decision Processes (MDPs). However, learning the rewards that determine the observed actions from data is complicated by interactions. We present a novel inverse reinforcement learning(IRL) algorithm that can infer the reward function in multi-agent interactive scenarios. In particular, the agents may act boundedly rational (i.e., sub-optimal), a characteristic that is typical for human decision making. Additionally, every agent optimizes its own reward function which makes it possible to address non-cooperative setups. In contrast to other methods, the algorithm does not rely on reinforcement learning during inference of the parameters of the reward function. We demonstrate that our proposed method accurately infers the ground truth reward function in two-agent interactive experiments

    Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning

    Full text link
    Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and building a communication link. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.Comment: 13 pages, 4 figures, version 2, accepted at ANTS 201

    Bayesian RL in factored POMDPs

    Get PDF
    Robust decision-making agents in any non-trivial system must reason over uncertainty of various types such as action outcomes, the agent's current state and the dynamics of the environment. The outcome and state un- certainty are elegantly captured by the Partially Observable Markov Decision Processes (POMDP) framework [1], which enable reasoning in stochastic, par- tially observable environments. POMDP solution methods, however, typically assume complete access to the system dynamics, which unfortunately are often not available. When such a model is not available, model-based Bayesian Re- inforcement Learning (BRL) methods explicitly maintain a posterior over the possible models of the environment, and use this knowledge to select actions that, theoretically, trade o_ exploration and exploitation optimally. However, few of the BRL methods are applicable to partial observable settings, and those that are, have limited scaling properties. The Bayes-Adaptive POMDP (BA- POMDP) [4], for example, models the environment in a tabular fashion, which poses a bottleneck for scalability. Here, we describe previous work [3] that pro- poses a method to overcome this bottleneck by representing the dynamics with Bayes Network, an approach that exploits structure in the form of independence between state and observation features.Interactive Intelligenc

    Multiagent Sequential Decision Making (MSDM)

    Get PDF

    Interspecific Germline Transmission of Cultured Primordial Germ Cells

    Get PDF
    In birds, the primordial germ cell (PGC) lineage separates from the soma within 24 h following fertilization. Here we show that the endogenous population of about 200 PGCs from a single chicken embryo can be expanded one million fold in culture. When cultured PGCs are injected into a xenogeneic embryo at an equivalent stage of development, they colonize the testis. At sexual maturity, these donor PGCs undergo spermatogenesis in the xenogeneic host and become functional sperm. Insemination of semen from the xenogeneic host into females from the donor species produces normal offspring from the donor species. In our model system, the donor species is chicken (Gallus domesticus) and the recipient species is guinea fowl (Numida meleagris), a member of a different avian family, suggesting that the mechanisms controlling proliferation of the germline are highly conserved within birds. From a pragmatic perspective, these data are the basis of a novel strategy to produce endangered species of birds using domesticated hosts that are both tractable and fecund

    Development and application of genomic control methods for genome-wide association studies using non-additive models

    Get PDF
    Genome-wide association studies (GWAS) comprise a powerful tool for mapping genes of complex traits. However, an inflation of the test statistic can occur because of population substructure or cryptic relatedness, which could cause spurious associations. If information on a large number of genetic markers is available, adjusting the analysis results by using the method of genomic control (GC) is possible. GC was originally proposed to correct the Cochran-Armitage additive trend test. For non-additive models, correction has been shown to depend on allele frequencies. Therefore, usage of GC is limited to situations where allele frequencies of null markers and candidate markers are matched. In this work, we extended the capabilities of the GC method for non-additive models, which allows us to use null markers with arbitrary allele frequencies for GC. Analytical expressions for the inflation of a test statistic describing its dependency on allele frequency and several population parameters were obtained for recessive, dominant, and over-dominant models of inheritance. We proposed a method to estimate these required population parameters. Furthermore, we suggested a GC method based on approximation of the correction coefficient by a polynomial of allele frequency and described procedures to correct the genotypic (two degrees of freedom) test for cases when the model of inheritance is unknown. Statistical properties of the described methods were investigated using simulated and real data. We demonstrated that all considered methods were effective in controlling type 1 error in the presence of genetic substructure. The proposed GC methods can be applied to statistical tests for GWAS with various models of inheritance. All methods developed and tested in this work were implemented using R language as a part of the GenABEL package
    corecore