981 research outputs found

    Layered controller synthesis for dynamic multi-agent systems

    In this paper we present a layered approach to the multi-agent control problem, decomposed into three stages, each building upon the results of the previous one. First, a high-level plan for a coarse abstraction of the system is computed, relying on parametric timed automata augmented with stopwatches, as these allow simplified dynamics of such systems to be modeled efficiently. In the second stage, an SMT formulation refines the high-level plan, which mainly handles the combinatorial aspects of the problem, into a more dynamically accurate solution. These first two stages are collectively referred to as the SWA-SMT solver. They are correct by construction but lack a crucial feature: they cannot be executed in real time. To overcome this, we use SWA-SMT solutions as the initial training dataset for our last stage, which aims at obtaining a neural network control policy. We use reinforcement learning to train the policy, and show that the initial dataset is crucial for the overall success of the method.

    Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration

    Deep Reinforcement Learning has been successfully applied to learn robotic control. However, the corresponding algorithms struggle when applied to problems where the agent is only rewarded after achieving a complex task. In this context, using demonstrations can significantly speed up the learning process, but demonstrations can be costly to acquire. In this paper, we propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration. To do so, our method learns a goal-conditioned policy to control a system between successive low-dimensional goals. This sequential goal-reaching approach raises a problem of compatibility between successive goals: we need to ensure that the state resulting from reaching a goal is compatible with the achievement of the following goals. To tackle this problem, we present a new algorithm called DCIL-II. We show that DCIL-II can solve challenging simulated tasks, such as humanoid locomotion and stand-up, as well as fast running with a simulated Cassie robot, with unprecedented sample efficiency. Our method, leveraging sequentiality, is a step towards the resolution of complex robotic tasks under minimal specification effort, a key feature for the next generation of autonomous robots.
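    The sequential goal-reaching structure described in this abstract can be illustrated with a minimal sketch. This is not DCIL-II itself: the learned goal-conditioned policy is replaced by a hand-coded proportional controller on a 2-D point mass, and the goals are hypothetical waypoints, purely to show how a single policy is chained across successive low-dimensional goals.

    ```python
    import numpy as np

    # Toy stand-in for a goal-conditioned policy: a proportional controller
    # driving a 2-D point mass toward the current goal. In DCIL-II the policy
    # is learned; here it is hand-coded to illustrate the sequential structure.
    def policy(state, goal, gain=0.5):
        return gain * (goal - state)

    # Low-dimensional goals, standing in for goals extracted from a demonstration.
    goals = [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 1.0])]

    state = np.zeros(2)
    trajectory = [state.copy()]
    for goal in goals:                      # chain the goals sequentially
        for _ in range(50):                 # step budget per goal
            state = state + policy(state, goal)
            trajectory.append(state.copy())
            if np.linalg.norm(state - goal) < 1e-2:   # goal reached
                break
    ```

    The compatibility problem the abstract raises shows up even here: each inner loop starts from wherever the previous goal left the system, so a goal is only useful if the state it produces lets the next goal be reached.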

    The Quality-Diversity Transformer: Generating Behavior-Conditioned Trajectories with Decision Transformers

    In the context of neuroevolution, Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies by relying on the definition of a behavior space. A natural goal induced by the creation of such a repertoire is trying to achieve behaviors on demand, which can be done by running the corresponding policy from the repertoire. However, in uncertain environments, two problems arise. First, policies can lack robustness and repeatability, meaning that multiple episodes under slightly different conditions often result in very different behaviors. Second, due to the discrete nature of the repertoire, solutions vary discontinuously. Here we present a new approach to achieve behavior-conditioned trajectory generation based on two mechanisms: first, MAP-Elites Low-Spread (ME-LS), which constrains the selection of solutions to those that are the most consistent in the behavior space; second, the Quality-Diversity Transformer (QDT), a Transformer-based model conditioned on continuous behavior descriptors, which trains on a dataset generated by policies from a ME-LS repertoire and learns to autoregressively generate sequences of actions that achieve target behaviors. Results show that ME-LS produces consistent and robust policies, and that its combination with the QDT yields a single policy capable of achieving diverse behaviors on demand with high accuracy.

    Assessing Quality-Diversity Neuro-Evolution Algorithms Performance in Hard Exploration Problems

    A fascinating aspect of nature lies in its ability to produce a collection of organisms that are all high-performing in their niche. Quality-Diversity (QD) methods are evolutionary algorithms inspired by this observation that have obtained great results in many applications, from wing design to robot adaptation. Recently, several works demonstrated that these methods can be applied to perform neuro-evolution to solve control problems in large search spaces. In such problems, diversity can be a target in itself. Diversity can also be a way to enhance exploration in tasks exhibiting deceptive reward signals. While the first aspect has been studied in depth in the QD community, the latter remains scarcer in the literature. Exploration is at the heart of several domains trying to solve control problems, such as Reinforcement Learning, and QD methods are promising candidates to overcome the associated challenges. Therefore, we believe that standardized benchmarks exhibiting high-dimensional control problems with exploration difficulties are of interest to the QD community. In this paper, we highlight three candidate benchmarks and explain why they appear relevant for systematic evaluation of QD algorithms. We also provide open-source implementations in Jax allowing practitioners to run fast and numerous experiments on limited compute resources. (GECCO 2022 Workshop on Quality Diversity Algorithm Benchmarks)
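    As a rough illustration of the QD methods this benchmark paper targets, here is a minimal MAP-Elites loop on a toy domain. Everything below is invented for the sketch (the 2-D genome, the fitness, and the behavior descriptor); real implementations, such as the Jax ones the paper provides, are far more elaborate.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N_CELLS = 10  # 1-D behavior space discretized into 10 cells

    # Toy domain: genome x in [-1, 1]^2, fitness = -||x||^2,
    # behavior descriptor = x[0] mapped to [0, 1] and binned into a cell.
    def evaluate(x):
        fitness = -float(np.sum(x**2))
        descriptor = (x[0] + 1.0) / 2.0
        cell = min(int(descriptor * N_CELLS), N_CELLS - 1)
        return fitness, cell

    archive = {}  # cell index -> (fitness, genome): one elite per cell

    # Random initialization of the archive
    for _ in range(100):
        x = rng.uniform(-1, 1, size=2)
        f, c = evaluate(x)
        if c not in archive or f > archive[c][0]:
            archive[c] = (f, x)

    # Main MAP-Elites loop: pick an elite, mutate it, compete in its cell
    for _ in range(2000):
        parent = archive[rng.choice(list(archive))][1]
        child = np.clip(parent + rng.normal(0, 0.1, size=2), -1, 1)
        f, c = evaluate(child)
        if c not in archive or f > archive[c][0]:
            archive[c] = (f, child)
    ```

    The archive ends up holding one high-fitness solution per behavior cell, which is exactly the repertoire structure QD benchmarks evaluate: coverage of the behavior space together with per-cell quality.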

    Extensive degeneracy, Coulomb phase and magnetic monopoles in an artificial realization of the square ice model

    Artificial spin ice systems have been introduced as a possible means to investigate frustration effects in a well-controlled manner by fabricating lithographically-patterned two-dimensional arrangements of interacting magnetic nanostructures. This approach offers the opportunity to visualize unconventional states of matter directly in real space, and has triggered a wealth of studies at the frontier between nanomagnetism, statistical thermodynamics and condensed matter physics. Despite strong efforts over the last ten years to provide an artificial realization of the celebrated square ice model, no simple geometry based on arrays of nanomagnets has succeeded in capturing the macroscopically degenerate ground-state manifold of the corresponding model. Instead, in all works reported so far, square lattices of nanomagnets are characterized by a magnetically ordered ground state consisting of local flux-closure configurations with alternating chirality. Here, we show experimentally and theoretically that all the characteristics of the square ice model can be observed if the artificial square lattice is properly designed. The spin configurations we image after demagnetizing our arrays reveal unambiguous signatures of an algebraic spin liquid state characterized by the presence of pinch points in the associated magnetic structure factor. Local excitations, i.e. classical analogues of magnetic monopoles, are found to be free to evolve in a massively degenerate, divergence-free vacuum. We thus provide the first lab-on-chip platform allowing the investigation of collective phenomena, including Coulomb phases and ice-like physics.

    Risk Portfolio Optimization Using the Markowitz MVO Model in Relation to Human Limitations in Predicting the Future from the Perspective of the Al-Qur'an

    Risk portfolio management in modern finance has become increasingly technical, requiring the use of sophisticated mathematical tools in both research and practice. Since companies cannot insure themselves completely against risk, owing to human inability to predict the future precisely, as written in Al-Quran surah Luqman verse 34, they have to manage it to yield an optimal portfolio. The objective here is to minimize the variance among all portfolios that attain at least a certain expected return, or alternatively, to maximize the expected return among all portfolios that do not exceed a certain variance. Furthermore, this study focuses on optimizing the risk portfolio using the Markowitz MVO (Mean-Variance Optimization) model. The theoretical frameworks for the analysis are the arithmetic mean, geometric mean, variance, covariance, linear programming, and quadratic programming. Finding a minimum-variance portfolio leads to a convex quadratic program: minimize the objective function xᵀCx subject to the constraints μᵀx ≥ r and Ax = b. The outcome of this research is the optimal risk portfolio for a set of investments, computed with the MATLAB R2007b software together with graphical analysis.
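    The quadratic program described in this abstract can be sketched numerically. The asset data below is hypothetical, and SciPy's SLSQP solver stands in for the MATLAB tooling the study used; the constraints follow the formulation above (a target expected return and a full-investment budget), plus a no-short-selling bound added as an extra assumption.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical data: expected returns mu and covariance matrix C for 3 assets.
    mu = np.array([0.10, 0.12, 0.07])
    C = np.array([[0.10, 0.02, 0.01],
                  [0.02, 0.12, 0.03],
                  [0.01, 0.03, 0.08]])
    r_min = 0.10  # required expected return

    # Minimize portfolio variance x^T C x subject to:
    #   mu^T x >= r_min   (target return)
    #   sum(x) = 1        (fully invested)
    #   x >= 0            (no short selling, an added assumption)
    res = minimize(
        fun=lambda x: x @ C @ x,
        x0=np.full(3, 1 / 3),
        jac=lambda x: 2 * C @ x,
        constraints=[
            {"type": "ineq", "fun": lambda x: mu @ x - r_min},
            {"type": "eq", "fun": lambda x: np.sum(x) - 1.0},
        ],
        bounds=[(0, None)] * 3,
        method="SLSQP",
    )
    weights = res.x
    ```

    The resulting `weights` vector is the minimum-variance portfolio meeting the return target; sweeping `r_min` and re-solving traces out the efficient frontier.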

    Search for narrow resonances in dilepton mass spectra in proton-proton collisions at √s = 13 TeV and combination with 8 TeV data

    Peer reviewed

    Search for new physics with dijet angular distributions in proton-proton collisions at √s = 13 TeV

    Peer reviewed

    Search for light bosons in decays of the 125 GeV Higgs boson in proton-proton collisions at √s = 8 TeV

    Peer reviewed

    Search for supersymmetry in events with photons and missing transverse energy in pp collisions at 13 TeV

    Peer reviewed