
    On Backdoors to Tractable Constraint Languages

    In the context of CSPs, a strong backdoor is a subset of variables such that every complete assignment yields a residual instance guaranteed to have a specified property. If the property allows efficient solving, then a small strong backdoor provides a reasonable decomposition of the original instance into easy instances. An important challenge is the design of algorithms that can quickly find a small strong backdoor if one exists. We present a systematic study of the parameterized complexity of backdoor detection when the target property is a restricted type of constraint language defined by means of a family of polymorphisms. In particular, we show that under the weak assumption that the polymorphisms are idempotent, the problem is unlikely to be FPT when the parameter is either r (the constraint arity) or k (the size of the backdoor), unless P = NP or FPT = W[2]. When the parameter is k + r, however, we are able to identify large classes of languages for which the problem of finding a small backdoor is FPT.
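    A minimal brute-force sketch of the strong-backdoor check defined above, assuming a CSP given as domains plus constraints of the form (scope, set of allowed tuples); the helper `restrict` and the oracle `has_property` are illustrative placeholders, not the paper's algorithm:

        from itertools import product

        def restrict(constraints, assignment):
            """Residual instance after fixing `assignment`: drop assigned variables
            from each scope and keep only the compatible tuples."""
            residual = []
            for scope, tuples in constraints:
                keep = [i for i, v in enumerate(scope) if v not in assignment]
                allowed = {tuple(t[i] for i in keep) for t in tuples
                           if all(t[i] == assignment[v]
                                  for i, v in enumerate(scope) if v in assignment)}
                if not keep and not allowed:
                    return None  # a fully assigned constraint is violated
                residual.append((tuple(scope[i] for i in keep), allowed))
            return residual

        def is_strong_backdoor(domains, constraints, candidate, has_property):
            """Every complete assignment to `candidate` must yield a residual
            instance with the target property (or an outright contradiction)."""
            cand = sorted(candidate)
            for values in product(*(domains[v] for v in cand)):
                residual = restrict(constraints, dict(zip(cand, values)))
                if residual is not None and not has_property(residual):
                    return False
            return True

        # Toy usage: the 'property' here is just that every residual constraint
        # has arity at most 2, a stand-in for a real tractable-language test.
        doms = {"x": [0, 1], "y": [0, 1], "z": [0, 1]}
        cons = [(("x", "y", "z"), {(0, 0, 0), (1, 1, 1)})]
        print(is_strong_backdoor(doms, cons, {"x"},
                                 lambda inst: all(len(s) <= 2 for s, _ in inst)))

    Note the d^k factor from enumerating assignments to the candidate set: this is why detection is naturally parameterized by the backdoor size k.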

    On the reduction of the CSP dichotomy conjecture to digraphs

    It is well known that the constraint satisfaction problem over general relational structures can be reduced in polynomial time to digraphs. We present a simple variant of such a reduction and use it to show that the algebraic dichotomy conjecture is equivalent to its restriction to digraphs and that the polynomial reduction can be made in logspace. We also show that our reduction preserves the bounded width property, i.e., solvability by local consistency methods. We discuss further algorithmic properties that are preserved and related open problems. Comment: 34 pages. Article is to appear in CP2013. This version includes two appendices with proofs of claims omitted from the main article.
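    For intuition about the digraph setting (this is not the paper's reduction itself): the CSP over a fixed digraph template H is exactly the H-homomorphism problem, as in this minimal brute-force sketch:

        from itertools import product

        def homomorphism_exists(G_vertices, G_edges, H_vertices, H_edges):
            """Is there a map f from G to H with (f(u), f(v)) an edge of H for
            every edge (u, v) of G? CSP with template H is exactly this problem."""
            H_edges = set(H_edges)
            G_vertices = list(G_vertices)
            for image in product(H_vertices, repeat=len(G_vertices)):
                f = dict(zip(G_vertices, image))
                if all((f[u], f[v]) in H_edges for u, v in G_edges):
                    return True
            return False

        # A directed 3-cycle maps onto a single looped vertex:
        print(homomorphism_exists([0, 1, 2], [(0, 1), (1, 2), (2, 0)],
                                  ["a"], [("a", "a")]))  # True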

    Artificial Neural Network-based error compensation procedure for low-cost encoders

    An Artificial Neural Network-based error compensation method is proposed for improving the accuracy of resolver-based 16-bit encoders by compensating for their respective systematic error profiles. The error compensation procedure for a particular encoder involves obtaining its error profile by calibrating it on a precision rotary table, training the neural network on part of this data, and then determining the corrected encoder angle by subtracting the ANN-predicted error from the measured value of the encoder angle. Since it is not guaranteed that all the resolvers will have identical error profiles, owing to inherent microscale differences in their construction, the ANN has been trained on one error profile at a time, and the corresponding weight file is then used only for compensating the systematic error of that particular encoder. The systematic nature of the error profile for each of the encoders has also been validated by repeated calibration of the encoders over a period of time; the error profiles of a particular encoder recorded at different epochs show nearly reproducible behavior. The ANN-based error compensation procedure has been implemented for 4 encoders by training the ANN with their respective error profiles, and the results indicate that the accuracy of the encoders can be improved by nearly an order of magnitude, from quoted values of ~6 arc-min to ~0.65 arc-min, when their corresponding ANN-generated weight files are used for determining the corrected encoder angle. Comment: 16 pages, 4 figures. Accepted for publication in Measurement Science and Technology (MST).
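    A minimal sketch of the described compensation pipeline on a synthetic error profile; the sinusoidal error model, network size, and 70/30 training split are illustrative assumptions, not the paper's calibration data:

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(0)

        # Stand-in for a calibrated systematic error profile (arc-min vs. degrees).
        angle = np.linspace(0.0, 360.0, 2000)
        error = 5.0 * np.sin(np.radians(2 * angle)) + 1.5 * np.sin(np.radians(7 * angle + 30))

        # Encode the angle as (sin, cos) so the network sees its periodicity.
        features = np.column_stack([np.sin(np.radians(angle)), np.cos(np.radians(angle))])
        train = rng.random(angle.size) < 0.7  # train on part of the profile

        net = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=5000, random_state=0)
        net.fit(features[train], error[train])

        # Corrected angle = measured angle - ANN-predicted error (arc-min -> deg).
        predicted = net.predict(features)
        corrected = angle - predicted / 60.0
        print(f"max residual on held-out points: "
              f"{np.abs(error[~train] - predicted[~train]).max():.2f} arc-min")

    Per the abstract, one network would be trained per encoder, since the profiles are systematic but not identical across units.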

    Deep Reinforcement Learning: An Overview

    In recent years, a specific machine learning method called deep learning has gained huge attention, as it has obtained astonishing results in broad applications such as pattern recognition, speech recognition, computer vision, and natural language processing. Recent research has also shown that deep learning techniques can be combined with reinforcement learning methods to learn useful representations for problems with high-dimensional raw data input. This chapter reviews the recent advances in deep reinforcement learning, with a focus on the most used deep architectures, such as autoencoders, convolutional neural networks, and recurrent neural networks, which have been successfully combined with the reinforcement learning framework. Comment: Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016.
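    As one concrete instance of that combination, here is a minimal DQN-style sketch (a neural network as Q-function approximator trained on bootstrapped TD targets); the dimensions, hyperparameters, and random batch are placeholders for a real environment and replay buffer:

        import torch
        import torch.nn as nn

        obs_dim, n_actions, gamma = 4, 2, 0.99
        qnet = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

        def td_step(s, a, r, s_next, done):
            """One gradient step on a batch of (s, a, r, s', done) transitions."""
            q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q(s, a)
            with torch.no_grad():                                 # bootstrapped target
                target = r + gamma * (1 - done) * qnet(s_next).max(dim=1).values
            loss = nn.functional.mse_loss(q, target)
            opt.zero_grad()
            loss.backward()
            opt.step()
            return loss.item()

        B = 32  # dummy batch standing in for replay-buffer samples
        print(td_step(torch.randn(B, obs_dim), torch.randint(n_actions, (B,)),
                      torch.randn(B), torch.randn(B, obs_dim), torch.zeros(B)))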

    Tractability in Constraint Satisfaction Problems: A Survey

    Even though the Constraint Satisfaction Problem (CSP) is NP-complete, many tractable classes of CSP instances have been identified. After discussing different forms and uses of tractability, we describe some landmark tractable classes and survey recent theoretical results. Although we concentrate on the classical CSP, we also cover its important extensions to infinite domains and optimisation, as well as #CSP and QCSP.
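    As one example of the local-consistency methods behind several landmark tractable classes, a minimal AC-3 sketch for binary CSPs (the toy instance is purely illustrative):

        from collections import deque

        def ac3(domains, constraints):
            """AC-3: `constraints[(x, y)]` is the set of allowed (vx, vy) pairs.
            Prunes domains in place; returns False if a domain empties."""
            arcs = deque(constraints)
            while arcs:
                x, y = arcs.popleft()
                allowed = constraints[(x, y)]
                supported = {vx for vx in domains[x]
                             if any((vx, vy) in allowed for vy in domains[y])}
                if supported != domains[x]:
                    domains[x] = supported
                    if not supported:
                        return False
                    # Revisit arcs pointing into x, since its domain shrank.
                    arcs.extend((z, w) for (z, w) in constraints if w == x)
            return True

        # Toy instance x < y over {1, 2, 3}: AC-3 prunes 3 from x and 1 from y.
        doms = {"x": {1, 2, 3}, "y": {1, 2, 3}}
        lt = {(a, b) for a in range(1, 4) for b in range(1, 4) if a < b}
        cons = {("x", "y"): lt, ("y", "x"): {(b, a) for (a, b) in lt}}
        print(ac3(doms, cons), doms)  # True {'x': {1, 2}, 'y': {2, 3}}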

    Approximate policy iteration: A survey and some new methods

    We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy evaluation is couched in general terms and aims to unify the available methods in the light of recent research developments and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods, such as least-squares temporal difference (LSTD), and iterative methods, such as least-squares policy evaluation (LSPE) and TD(λ), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation, when done by the projected equation/TD approach, may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds.
    Funding: National Science Foundation (U.S.) grant ECCS-0801549; Los Alamos National Laboratory, Information Science and Technology Institute; United States Air Force grant FA9550-10-1-0412.
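    A minimal simulation-based LSTD(0) sketch in the projected-equation spirit, with a small ridge term nodding to the near-singularity issue raised above; the random-walk task and tabular features are toy assumptions:

        import numpy as np

        def lstd(transitions, phi, n_features, gamma=0.95, reg=1e-6):
            """Accumulate A = sum phi(s)(phi(s) - gamma*phi(s'))^T and
            b = sum phi(s)*r over sampled transitions, then solve A theta = b;
            `reg` regularizes a nearly singular A."""
            A = np.zeros((n_features, n_features))
            b = np.zeros(n_features)
            for s, r, s_next in transitions:
                f, f_next = phi(s), phi(s_next)
                A += np.outer(f, f - gamma * f_next)
                b += r * f
            return np.linalg.solve(A + reg * np.eye(n_features), b)

        # Toy 5-state random walk, reward 1 on reaching the right end.
        rng = np.random.default_rng(0)
        phi = lambda s: np.eye(5)[s]
        transitions, s = [], 2
        for _ in range(5000):
            s_next = min(max(s + rng.choice([-1, 1]), 0), 4)
            transitions.append((s, float(s_next == 4), s_next))
            s = 2 if s_next in (0, 4) else s_next  # restart episodes at the middle
        theta = lstd(transitions, phi, 5)
        print(np.round(theta, 2))  # V(s) ~= phi(s) @ theta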

    Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail

    Changes of synaptic connections between neurons are thought to be the physiological basis of learning. These changes can be gated by neuromodulators that encode the presence of reward. We study a family of reward-modulated synaptic learning rules for spiking neurons on a learning task in continuous space inspired by the Morris water maze. The synaptic update rule modifies the release probability of synaptic transmission and depends on the timing of presynaptic spike arrival, postsynaptic action potentials, as well as the membrane potential of the postsynaptic neuron. The family of learning rules includes an optimal rule derived from policy gradient methods as well as reward-modulated Hebbian learning. The synaptic update rule is implemented in a population of spiking neurons using a network architecture that combines feedforward input with lateral connections. Actions are represented by a population of hypothetical action cells with strong Mexican-hat connectivity and are read out at theta frequency. We show that in this architecture, a standard policy gradient rule fails to solve the Morris water maze task, whereas a variant with a Hebbian bias can learn the task within 20 trials, consistent with experiments. This result does not depend on implementation details such as the size of the neuronal populations. Our theoretical approach shows how learning new behaviors can be linked to reward-modulated plasticity at the level of single synapses and makes predictions about the voltage and spike-timing dependence of synaptic plasticity and the influence of neuromodulators such as dopamine. It is an important step towards connecting formal theories of reinforcement learning with neuronal and synaptic properties.
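    A rate-based caricature of such a reward-modulated Hebbian rule with an eligibility trace (the three-factor structure: presynaptic activity x postsynaptic activity x neuromodulatory reward); it deliberately abstracts away the paper's spiking, voltage, and spike-timing dependence, and the toy task is an assumption:

        import numpy as np

        rng = np.random.default_rng(1)
        n_pre, n_post = 20, 2
        w = rng.normal(0.0, 0.1, (n_post, n_pre))
        eta, tau_e, baseline = 0.05, 0.9, 0.0

        def trial(stimulus):
            """One trial: noisy postsynaptic rates build a Hebbian eligibility
            trace; the reward then gates the weight change as a third factor."""
            global w, baseline
            e = np.zeros_like(w)
            for _ in range(50):  # 50 time steps per trial
                post = 1.0 / (1.0 + np.exp(-(w @ stimulus + rng.normal(0, 0.5, n_post))))
                e = tau_e * e + np.outer(post - 0.5, stimulus)  # pre x post coactivity
            action = int(np.argmax(post))                        # population readout
            # Toy task: respond to whichever half of the input is stronger.
            reward = float(action == int(stimulus[:n_pre // 2].sum() > stimulus[n_pre // 2:].sum()))
            w += eta * (reward - baseline) * e                   # reward-gated update
            baseline += 0.1 * (reward - baseline)                # running reward baseline
            return reward

        rewards = [trial(rng.random(n_pre)) for _ in range(500)]
        print(f"mean reward, last 100 trials: {np.mean(rewards[-100:]):.2f}")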

    An Imperfect Dopaminergic Error Signal Can Drive Temporal-Difference Learning

    An open problem in the field of computational neuroscience is how to link synaptic plasticity to system-level learning. A promising framework in this context is temporal-difference (TD) learning. Experimental evidence that supports the hypothesis that the mammalian brain performs temporal-difference learning includes the resemblance of the phasic activity of the midbrain dopaminergic neurons to the TD error and the discovery that cortico-striatal synaptic plasticity is modulated by dopamine. However, as the phasic dopaminergic signal does not reproduce all the properties of the theoretical TD error, it is unclear whether it is capable of driving behavior adaptation in complex tasks. Here, we present a spiking temporal-difference learning model based on the actor-critic architecture. The model dynamically generates a dopaminergic signal with realistic firing rates and exploits this signal to modulate the plasticity of synapses as a third factor. The predictions of our proposed plasticity dynamics are in good agreement with experimental results with respect to dopamine, pre- and post-synaptic activity. An analytical mapping from the parameters of our proposed plasticity dynamics to those of the classical discrete-time TD algorithm reveals that the biological constraints of the dopaminergic signal entail a modified TD algorithm with self-adapting learning parameters and an adapting offset. We show that the neuronal network is able to learn a task with sparse positive rewards as fast as the corresponding classical discrete-time TD algorithm. However, the performance of the neuronal network is impaired with respect to the traditional algorithm on a task with both positive and negative rewards and breaks down entirely on a task with purely negative rewards. Our model demonstrates that the asymmetry of a realistic dopaminergic signal enables TD learning when learning is driven by positive rewards but not when driven by negative rewards.
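    For reference, the classical discrete-time TD actor-critic that the spiking model maps onto, as a minimal tabular sketch on a toy chain task; the TD error delta plays the role of the dopamine-like third factor:

        import numpy as np

        rng = np.random.default_rng(0)
        n_states, n_actions = 6, 2               # chain; reward only at the right end
        gamma, alpha_v, alpha_p = 0.9, 0.1, 0.1
        V = np.zeros(n_states)                   # critic: state values
        prefs = np.zeros((n_states, n_actions))  # actor: action preferences

        def softmax(x):
            z = np.exp(x - x.max())
            return z / z.sum()

        for episode in range(500):
            s = 0
            while s != n_states - 1:
                pi = softmax(prefs[s])
                a = rng.choice(n_actions, p=pi)
                s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
                r = float(s_next == n_states - 1)
                delta = r + gamma * V[s_next] - V[s]   # TD error ("dopamine")
                V[s] += alpha_v * delta                                    # critic
                prefs[s] += alpha_p * delta * (np.eye(n_actions)[a] - pi)  # actor
                s = s_next

        print(np.round(V, 2))  # learned values increase toward the rewarded end

    This sketch uses sparse positive rewards, the regime in which, per the abstract, the realistic asymmetric dopaminergic signal still supports TD learning.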

    Differential Allocation of Constitutive and Induced Chemical Defenses in Pine Tree Juveniles: A Test of the Optimal Defense Theory

    Optimal defense theory (ODT) predicts that the within-plant quantitative allocation of defenses is not random, but driven by the potential relative contribution of particular plant tissues to overall fitness. These predictions have been poorly tested on long-lived woody plants. We explored the allocation of constitutive and methyl-jasmonate (MJ) inducible chemical defenses in six half-sib families of Pinus radiata juveniles. Specifically, we studied the quantitative allocation of resin and polyphenolics (the two major secondary chemicals in pine trees) to tissues with contrasting fitness value (stem phloem, stem xylem and needles) across three parts of the plants (basal, middle and apical upper part), using nitrogen concentration as a proxy of tissue value. Concentration of nitrogen in the phloem, xylem and needles was found to be greater higher up the plant. As predicted by the ODT, the same pattern was found for the concentration of non-volatile resin in the stem. However, in leaf tissues the concentrations of both resin and total phenolics were greater towards the base of the plant. Two weeks after MJ application, the concentrations of nitrogen in the phloem, resin in the stem and total phenolics in the needles increased by roughly 25% compared with the control plants; inducibility was similar across all plant parts; and families differed in the inducibility of resin compounds in the stem. In contrast, no significant changes were observed either for phenolics in the stems, or for resin in the needles after MJ application. Concentration of resin in the phloem was double that in the xylem and MJ-inducible, with inducibility being greater towards the base of the stem. In contrast, resin in the xylem was not MJ-inducible and increased in concentration higher up the plant. The pattern of inducibility by MJ-signaling in juvenile P. radiata is tissue, chemical-defense and plant-part specific, and is genetically variable.