
    Gradient-free Policy Architecture Search and Adaptation

    We develop a method for policy architecture search and adaptation via gradient-free optimization that learns to perform autonomous driving tasks. By learning from both demonstration and environmental reward, we develop a model that learns with relatively few early catastrophic failures. We first learn an architecture of appropriate complexity to perceive the aspects of world state relevant to the expert demonstration, and then mitigate the effect of domain shift during deployment by adapting a policy demonstrated in a source domain to rewards obtained in a target environment. We show that our approach allows safer learning than baseline methods, offering a reduced cumulative crash metric over the agent's lifetime as it learns to drive in a realistic simulated environment. Comment: Accepted at the Conference on Robot Learning, 201
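    A minimal sketch of the gradient-free optimization idea in this abstract: a (1+1) evolution strategy that perturbs a policy parameter and keeps the perturbation only if episode reward improves. The toy "lane keeping" task, linear policy, and dynamics below are illustrative stand-ins, not the paper's architecture-search method.

```python
import random

# Hypothetical toy stand-in for a driving task: a 1-D "lane keeping"
# problem where a linear policy a = w * x should steer the state
# toward zero. Reward penalizes distance from the lane center.
def episode_reward(w, steps=50):
    x, total = 1.0, 0.0
    for _ in range(steps):
        a = w * x                # linear policy (illustrative)
        x = x - 0.1 * a          # simple deterministic dynamics
        total += -abs(x)         # penalize distance from center
    return total

def gradient_free_search(iterations=200, sigma=0.3, seed=0):
    """(1+1) evolution strategy: perturb the parameter, keep if better."""
    rng = random.Random(seed)
    w = 0.0
    best = episode_reward(w)
    for _ in range(iterations):
        cand = w + rng.gauss(0.0, sigma)
        r = episode_reward(cand)
        if r > best:             # greedy acceptance, no gradients used
            w, best = cand, r
    return w, best
```

    The same accept-if-better loop extends to searching over discrete architecture choices, where no gradient is available at all.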

    Clustering Without Knowing How To: Application and Evaluation

    Crowdsourcing allows running simple human intelligence tasks on a large crowd of workers, enabling the solution of problems for which it is difficult to formulate an algorithm or train a machine learning model in reasonable time. One such problem is data clustering by an under-specified criterion that is simple for humans but difficult for machines. In this demonstration paper, we build a crowdsourced system for image clustering and release its code under a free license at https://github.com/Toloka/crowdclustering. Our experiments on two different image datasets, dresses from Zalando's FEIDEGGER and shoes from the Toloka Shoes Dataset, confirm that one can obtain meaningful clusters purely with crowdsourcing, with no machine learning algorithms. Comment: accepted at the ECIR 2023 Demonstration Track
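    One common aggregation step for crowdsourced clustering, sketched under assumptions (this is a generic scheme, not necessarily the released system's pipeline): workers vote on whether pairs of items belong together, majority-voted "same" pairs are merged with union-find, and the connected components become the clusters.

```python
from collections import defaultdict

# Illustrative vote aggregation: majority-voted "same cluster" pairs
# are merged; connected components become the final clusters.
def cluster_from_votes(items, votes):
    """votes: dict mapping (a, b) -> list of worker ballots (bools)."""
    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for (a, b), ballots in votes.items():
        if sum(ballots) * 2 > len(ballots):  # strict majority says "same"
            union(a, b)

    clusters = defaultdict(list)
    for x in items:
        clusters[find(x)].append(x)
    return sorted(sorted(c) for c in clusters.values())
```

    With three workers voting per pair, two "same" ballots suffice to merge a pair; items never voted together stay in separate clusters.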

    Differentiable Programming Tensor Networks

    Differentiable programming is an emerging programming paradigm that composes parameterized algorithmic components and trains them using automatic differentiation (AD). The concept emerges from deep learning but is not limited to training neural networks. We present theory and practice of programming tensor network algorithms in a fully differentiable way. By formulating the tensor network algorithm as a computation graph, one can compute higher-order derivatives of the program accurately and efficiently using AD. We present essential techniques to differentiate through tensor network contractions, including stable AD for tensor decomposition and efficient backpropagation through fixed-point iterations. As a demonstration, we compute the specific heat of the Ising model directly by taking the second-order derivative of the free energy obtained in the tensor renormalization group calculation. Next, we perform gradient-based variational optimization of infinite projected entangled pair states for the quantum antiferromagnetic Heisenberg model and obtain state-of-the-art variational energy and magnetization with moderate effort. Differentiable programming removes the laborious human effort in deriving and implementing analytical gradients for tensor network programs, which opens the door to more innovations in tensor network algorithms and applications. Comment: Typos corrected, discussion and refs added; revised version accepted for publication in PRX. Source code available at https://github.com/wangleiphy/tensorgra
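    The "specific heat from the second derivative of the free energy" demonstration can be sketched in miniature. Below, a hand-rolled second-order forward-mode AD (jets carrying value, first, and second derivative) differentiates the exactly solvable 1-D Ising free energy per site, ln Z = ln(2 cosh β), twice; the paper's actual calculation uses reverse-mode AD through a 2-D tensor renormalization group program, which this toy does not attempt.

```python
import math
from dataclasses import dataclass

# Second-order forward-mode AD: each Jet carries (f, f', f'').
@dataclass
class Jet:
    f: float   # value
    d: float   # first derivative
    dd: float  # second derivative

    def __mul__(self, o):
        # Product rule through second order.
        return Jet(self.f * o.f,
                   self.f * o.d + self.d * o.f,
                   self.f * o.dd + 2 * self.d * o.d + self.dd * o.f)

def jcosh(x):
    # (cosh u)'' = u'' sinh u + (u')^2 cosh u
    return Jet(math.cosh(x.f),
               x.d * math.sinh(x.f),
               x.dd * math.sinh(x.f) + x.d ** 2 * math.cosh(x.f))

def jlog(x):
    # (log u)'' = u''/u - (u'/u)^2
    return Jet(math.log(x.f),
               x.d / x.f,
               x.dd / x.f - (x.d / x.f) ** 2)

def free_energy_density(beta):
    # ln Z per site for the 1-D Ising chain (J = 1): ln(2 cosh beta).
    two = Jet(2.0, 0.0, 0.0)
    return jlog(two * jcosh(beta))

def specific_heat(beta):
    # Seed beta with derivative 1 and read off the second derivative:
    # C = beta^2 d^2(ln Z)/dbeta^2.
    lnz = free_energy_density(Jet(beta, 1.0, 0.0))
    return beta ** 2 * lnz.dd
```

    The program computes only ln Z; the derivatives needed for the specific heat come out of the AD rules, matching the analytic result β² sech² β.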

    The Double-edged Sword of Pedagogy: Modeling the Effect of Pedagogical Contexts on Preschoolers’ Exploratory Play

    How does explicit instruction affect exploratory play and learning? We present a model that captures pedagogical assumptions (adapted from Shafto and Goodman, 2008) and test the model with a novel experiment looking at 4-year-olds’ exploratory play in pedagogical and non-pedagogical contexts. Our findings are consistent with the model predictions: preschool children limit their exploration in pedagogical contexts, spending most of their free play performing only the demonstrated action. By contrast, children explore broadly both at baseline and after an accidental demonstration. Thus pedagogy constrains children’s exploration for better and for worse; children learn the demonstrated causal relationship but are less likely than children in non-pedagogical contexts to discover and learn other causal relationships. Funding: American Psychological Foundation (Elizabeth Munsterberg Koppitz Fellowship), Templeton Foundation, James S. McDonnell Foundation
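    The core inference in pedagogical-sampling models can be sketched as a small Bayesian calculation. The priors and likelihoods below are made-up numbers for illustration, not the paper's fitted model: a knowledgeable teacher would demonstrate every function a toy has, so demonstrating just one action is strong evidence there is only one, whereas an accidental demonstration carries no such implication.

```python
# Illustrative Bayes update in the spirit of pedagogical sampling
# (Shafto & Goodman, 2008); all numbers here are hypothetical.
def posterior_one_function(context, prior_one=0.5, eps=0.05):
    """P(toy has only the demonstrated function | one action shown)."""
    if context == "pedagogical":
        # A helpful teacher omits a second function only by error.
        like_one, like_two = 1.0, eps
    else:  # accidental demonstration
        # A single accident is equally likely under either hypothesis.
        like_one, like_two = 1.0, 1.0
    num = like_one * prior_one
    return num / (num + like_two * (1.0 - prior_one))
```

    Under these toy numbers the pedagogical posterior strongly favors "one function," predicting narrow exploration, while the accidental posterior stays at the prior, predicting broad exploration.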

    Forget Demonstrations, Focus on Learning from Textual Instructions

    This work studies a challenging yet more realistic setting for zero-shot cross-task generalization: demonstration-free learning from textual instructions, presuming the existence of a paragraph-style task definition but no demonstrations. To better learn the task supervision from the definition, we propose two strategies: first, automatically identifying the critical sentences in the definition; second, a ranking objective that forces the model to generate the gold outputs with higher probabilities when those critical parts are highlighted in the definition. Together, the two strategies yield state-of-the-art performance on the challenging benchmark. Our code will be released in the final version of the paper. Comment: Preprint
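    A ranking objective of the kind described can be sketched as a margin hinge loss: the gold output's log-probability given the definition with critical sentences highlighted should beat the plain-definition log-probability by at least a margin. The function below is a generic sketch with stand-in log-probabilities, not the paper's exact objective.

```python
# Hinge-style ranking objective (illustrative): reward the model for
# assigning higher gold-output probability when the definition's
# critical sentences are highlighted than when they are not.
def ranking_loss(logp_highlighted, logp_plain, margin=0.1):
    """Zero once the highlighted variant wins by at least `margin`."""
    return max(0.0, margin - (logp_highlighted - logp_plain))
```

    In training this term would be added to the usual generation loss, so the model is pushed both to produce the gold output and to rely on the highlighted critical sentences.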

    Realizing a deep reinforcement learning agent discovering real-time feedback control strategies for a quantum system

    To realize the full potential of quantum technologies, finding good strategies to control quantum information processing devices in real time becomes increasingly important. Usually these strategies require a precise understanding of the device itself, which is generally not available. Model-free reinforcement learning circumvents this need by discovering control strategies from scratch without relying on an accurate description of the quantum system. Furthermore, important tasks like state preparation, gate teleportation, and error correction need feedback at time scales much shorter than the coherence time, which for superconducting circuits is in the microsecond range. Developing and training a deep reinforcement learning agent able to operate in this real-time feedback regime has been an open challenge. Here, we have implemented such an agent in the form of a latency-optimized deep neural network on a field-programmable gate array (FPGA). We demonstrate its use to efficiently initialize a superconducting qubit into a target state. To train the agent, we use model-free reinforcement learning that is based solely on measurement data. We study the agent’s performance for strong and weak measurements, and for three-level readout, and compare with simple strategies based on thresholding. This demonstration motivates further research towards adoption of reinforcement learning for real-time feedback control of quantum devices and, more generally, any physical system requiring learnable low-latency feedback control.
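    The measurement-then-act loop for qubit initialization can be caricatured with tabular Q-learning. Below, a noiseless measurement returns the state (0 = ground, 1 = excited) and the agent chooses "wait" or a state-flipping "pi-pulse", earning reward for reaching the ground state; this toy is vastly simpler than the paper's latency-optimized deep agent on an FPGA and ignores weak measurement and readout noise entirely.

```python
import random

# Toy stand-in for feedback-based qubit reset: tabular, model-free,
# learned only from (measurement, action, reward) data.
def train_reset_agent(episodes=500, alpha=0.5, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in ("wait", "pulse")}
    for _ in range(episodes):
        s = rng.choice((0, 1))                       # measured state
        if rng.random() < eps:                       # epsilon-greedy
            a = rng.choice(("wait", "pulse"))
        else:
            a = max(("wait", "pulse"), key=lambda act: q[(s, act)])
        s2 = s if a == "wait" else 1 - s             # pi-pulse flips state
        r = 1.0 if s2 == 0 else 0.0                  # reward: ground state
        q[(s, a)] += alpha * (r - q[(s, a)])         # one-step update
    return q
```

    After training, the learned values encode the obvious feedback policy: pulse when the measurement says "excited," wait when it says "ground."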

    Bayesian Neural Networks for Fast SUSY Predictions

    One of the goals of current particle physics research is to obtain evidence for new physics, that is, physics beyond the Standard Model (BSM), at accelerators such as the Large Hadron Collider (LHC) at CERN. The searches for new physics are often guided by BSM theories that depend on many unknown parameters, which, in some cases, makes testing their predictions difficult. In this paper, machine learning is used to model the mapping from the parameter space of the phenomenological Minimal Supersymmetric Standard Model (pMSSM), a BSM theory with 19 free parameters, to some of its predictions. Bayesian neural networks are used to predict cross sections for arbitrary pMSSM parameter points, the mass of the associated lightest neutral Higgs boson, and the theoretical viability of the parameter points. All three quantities are modeled with average percent errors of 3.34% or less and in a time significantly shorter than is possible with the supersymmetry codes from which the results are derived. These results are a further demonstration of the potential for machine learning to accurately model the mapping from the high-dimensional parameter spaces of BSM theories to their predictions.
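    The surrogate-model interface described, a fast prediction plus an uncertainty for an arbitrary parameter point, can be sketched with a small ensemble of randomly initialized fits standing in for a Bayesian neural network; the ensemble spread plays the role of the posterior predictive uncertainty. Everything below is a generic illustration, not the paper's model.

```python
import random

# Illustrative uncertainty-aware surrogate: an ensemble of linear
# models fit by SGD from different random initializations; the
# ensemble spread stands in for Bayesian predictive uncertainty.
def fit_linear(xs, ys, lr=0.01, steps=2000, seed=0):
    rng = random.Random(seed)
    w, b = rng.gauss(0, 1), rng.gauss(0, 1)
    for _ in range(steps):
        i = rng.randrange(len(xs))
        err = (w * xs[i] + b) - ys[i]    # residual on one sample
        w -= lr * err * xs[i]            # SGD on squared error
        b -= lr * err
    return w, b

def predict_with_uncertainty(models, x):
    preds = [w * x + b for w, b in models]
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / len(preds)
    return mean, var ** 0.5              # prediction and spread
```

    On clean training data the ensemble members agree and the spread is small; far from the training points the members diverge, flagging parameter regions where the slow exact code should be consulted instead.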