Gradient-free Policy Architecture Search and Adaptation
We develop a method for policy architecture search and adaptation via
gradient-free optimization which can learn to perform autonomous driving tasks.
By learning from both demonstration and environmental reward we develop a model
that can learn with relatively few early catastrophic failures. We first learn
an architecture of appropriate complexity to perceive aspects of world state
relevant to the expert demonstration, and then mitigate the effect of
domain-shift during deployment by adapting a policy demonstrated in a source
domain to rewards obtained in a target environment. We show that our approach
allows safer learning than baseline methods, offering a reduced cumulative
crash metric over the agent's lifetime as it learns to drive in a realistic
simulated environment.
Comment: Accepted at the Conference on Robot Learning, 201
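The search loop described above can be sketched as a simple gradient-free (evolutionary) optimizer over policy parameters: perturb the current best policy, evaluate each candidate by episodic return, and keep the winner. Everything below is an illustrative stand-in, not the paper's actual environment, architecture, or hyperparameters; `rollout_return` is a hypothetical placeholder for a driving-simulator rollout.

```python
import numpy as np

def rollout_return(params, episodes=5, rng=None):
    """Toy stand-in for an environment rollout: return is higher the
    closer the parameters are to a hidden optimum (here the zero vector),
    with a little evaluation noise averaged over a few episodes."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(scale=0.1, size=episodes)
    return float(-np.sum(params ** 2) + noise.mean())

def gradient_free_search(dim=8, population=32, iters=50, sigma=0.5, seed=0):
    """(1+lambda)-style evolutionary search: sample a population of
    perturbations of the current best policy and keep the candidate with
    the highest episodic return. No gradients are ever computed."""
    rng = np.random.default_rng(seed)
    best = rng.normal(size=dim)
    best_score = rollout_return(best, rng=rng)
    for _ in range(iters):
        candidates = best + sigma * rng.normal(size=(population, dim))
        scores = [rollout_return(c, rng=rng) for c in candidates]
        i = int(np.argmax(scores))
        if scores[i] > best_score:
            best, best_score = candidates[i], scores[i]
    return best, best_score
```

Because only returns (not gradients) are used, the same loop can score discrete architecture choices, which is what makes architecture search tractable here.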
Clustering Without Knowing How To: Application and Evaluation
Crowdsourcing allows running simple human intelligence tasks on a large crowd
of workers, enabling solving problems for which it is difficult to formulate an
algorithm or train a machine learning model in reasonable time. One such
problem is data clustering by an under-specified criterion that is simple for
humans, but difficult for machines. In this demonstration paper, we build a
crowdsourced system for image clustering and release its code under a free
license at https://github.com/Toloka/crowdclustering. Our experiments on two
different image datasets, dresses from Zalando's FEIDEGGER and shoes from the
Toloka Shoes Dataset, confirm that one can obtain meaningful clusters with no
machine learning algorithms, purely with crowdsourcing.
Comment: accepted at ECIR 2023 Demonstration Track
Differentiable Programming Tensor Networks
Differentiable programming is an emerging programming paradigm which composes
parameterized algorithmic components and trains them using automatic
differentiation (AD). The concept emerged from deep learning but is not
limited to training neural networks. We present theory and practice of
programming tensor network algorithms in a fully differentiable way. By
formulating the tensor network algorithm as a computation graph, one can
compute higher order derivatives of the program accurately and efficiently
using AD. We present essential techniques to differentiate through tensor
network contractions, including stable AD for tensor decomposition and
efficient backpropagation through fixed point iterations. As a demonstration,
we compute the specific heat of the Ising model directly by taking the second
order derivative of the free energy obtained in the tensor renormalization
group calculation. Next, we perform gradient-based variational optimization of
infinite projected entangled pair states for the quantum antiferromagnetic
Heisenberg model and obtain state-of-the-art variational energy and
magnetization with moderate effort. Differentiable programming removes
laborious human efforts in deriving and implementing analytical gradients for
tensor network programs, which opens the door to more innovations in tensor
network algorithms and applications.
Comment: Typos corrected, discussion and refs added; revised version accepted
for publication in PRX. Source code available at
https://github.com/wangleiphy/tensorgra
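The specific-heat computation follows from thermodynamics: C = -T d²f/dT², so differentiating the free energy twice in temperature gives the specific heat directly. The sketch below illustrates this on the exactly solvable 1D Ising chain, using the closed-form free energy as a stand-in for a tensor-renormalization-group contraction and central finite differences as a stand-in for AD; the paper itself uses true automatic differentiation on the 2D model.

```python
import numpy as np

J = 1.0  # coupling constant (illustrative units)

def free_energy(T):
    """Exact free energy per spin of the 1D Ising chain,
    f(T) = -T ln(2 cosh(J/T)); a stand-in for a TRG contraction."""
    return -T * np.log(2.0 * np.cosh(J / T))

def specific_heat(T, h=1e-4):
    """C = -T d^2 f / dT^2, here via a central second difference
    as a simple numerical stand-in for automatic differentiation."""
    d2f = (free_energy(T + h) - 2.0 * free_energy(T) + free_energy(T - h)) / h**2
    return -T * d2f

def specific_heat_exact(T):
    """Closed-form check: C = (J/T)^2 sech^2(J/T) for the 1D chain."""
    return (J / T) ** 2 / np.cosh(J / T) ** 2
```

At T = 1.5 the differentiated and closed-form values agree to numerical precision, which is exactly the consistency check one would run before trusting second derivatives of a larger tensor-network free energy.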
The Double-edged Sword of Pedagogy: Modeling the Effect of Pedagogical Contexts on Preschoolers’ Exploratory Play
How does explicit instruction affect exploratory play and learning? We present a model that captures pedagogical assumptions (adapted from Shafto and Goodman, 2008) and test the model with a novel experiment looking at 4-year-olds’ exploratory play in pedagogical and non-pedagogical contexts. Our findings are consistent with the model predictions: preschool children limit their exploration in pedagogical contexts, spending most of their free play performing only the demonstrated action. By contrast, children explore broadly both at baseline and after an accidental demonstration. Thus pedagogy constrains children’s exploration for better and for worse; children learn the demonstrated causal relationship but are less likely than children in non-pedagogical contexts to discover and learn other causal relationships.
Funding: American Psychological Foundation (Elizabeth Munsterberg Koppitz Fellowship); Templeton Foundation; James S. McDonnell Foundation
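The pedagogical-sampling intuition can be sketched as a toy Bayesian update: a knowledgeable teacher demonstrates every relevant function of a toy, so stopping after one demonstration is strong evidence there is only one, whereas an accidental demonstration carries no such signal. This is a loose illustration of the framework, not the paper's actual model; the hypothesis space and the `eps` noise parameter are assumptions.

```python
def posterior_over_functions(context, n_max=4, eps=0.05):
    """Posterior over 'how many functions does the toy have?' after one
    function is demonstrated. Pedagogical context: the teacher would have
    shown all functions, so k=1 gets likelihood (1-eps) and k>1 only eps.
    Accidental context: the demonstration is uninformative about k."""
    prior = {k: 1.0 / n_max for k in range(1, n_max + 1)}
    if context == "pedagogical":
        like = {k: (1.0 - eps) if k == 1 else eps for k in prior}
    else:  # accidental: triggering one function says little about the rest
        like = {k: 1.0 for k in prior}
    post = {k: prior[k] * like[k] for k in prior}
    z = sum(post.values())
    return {k: v / z for k, v in post.items()}
```

Under these assumptions the pedagogical posterior concentrates on "one function" (so further exploration looks unrewarding), while the accidental posterior stays uniform, mirroring the broad exploration observed in the non-pedagogical conditions.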
Forget Demonstrations, Focus on Learning from Textual Instructions
This work studies a challenging yet more realistic setting for zero-shot
cross-task generalization: demonstration-free learning from textual
instructions, assuming a paragraph-style task definition is available but no
demonstrations are. To better learn the task supervision from the
definition, we propose two strategies: first, automatically identifying the
critical sentences in the definition; second, a ranking objective that forces
the model to generate the gold outputs with higher probability when those
critical parts are highlighted in the definition. Together, the two
strategies yield state-of-the-art performance on the challenging benchmark. Our
code will be released in the final version of the paper.
Comment: Preprint
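The two strategies can be sketched concretely. Critical-sentence identification is shown here as a leave-one-out ablation (a common proxy: a sentence is critical if dropping it lowers the gold output's log-probability), and the ranking objective as a hinge loss on the gap between highlighted and plain definitions. Both function names, the `logp_gold_given` scorer, and the margin value are hypothetical illustrations, not the paper's exact formulation.

```python
def critical_sentences(sentences, logp_gold_given, top_k=2):
    """Rank definition sentences by how much removing each one drops
    log P(gold output), per the hypothetical scorer `logp_gold_given`
    (definition sentences -> log-probability of the gold output)."""
    full = logp_gold_given(sentences)
    drops = [(full - logp_gold_given(sentences[:i] + sentences[i + 1:]), i)
             for i in range(len(sentences))]
    drops.sort(reverse=True)  # biggest drop = most critical
    return [sentences[i] for _, i in drops[:top_k]]

def ranking_loss(logp_gold_highlighted, logp_gold_plain, margin=0.1):
    """Hinge-style ranking objective: zero loss once the gold output is
    at least `margin` more probable (in log space) under the definition
    with critical sentences highlighted than under the plain one."""
    return max(0.0, margin - (logp_gold_highlighted - logp_gold_plain))
```

In training, the ranking term would be added to the usual generation loss so that highlighting the critical sentences actively steers probability mass toward the gold outputs.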
Realizing a deep reinforcement learning agent discovering real-time feedback control strategies for a quantum system
To realize the full potential of quantum technologies, finding good strategies to control quantum information processing devices in real time becomes increasingly important. Usually these strategies require a precise understanding of the device itself, which is generally not available. Model-free reinforcement learning circumvents this need by discovering control strategies from scratch without relying on an accurate description of the quantum system. Furthermore, important tasks like state preparation, gate teleportation and error correction need feedback at time scales much shorter than the coherence time, which for superconducting circuits is in the microsecond range. Developing and training a deep reinforcement learning agent able to operate in this real-time feedback regime has been an open challenge. Here, we have implemented such an agent in the form of a latency-optimized deep neural network on a field-programmable gate array (FPGA). We demonstrate its use to efficiently initialize a superconducting qubit into a target state. To train the agent, we use model-free reinforcement learning that is based solely on measurement data. We study the agent’s performance for strong and weak measurements, and for three-level readout, and compare with simple strategies based on thresholding. This demonstration motivates further research towards adoption of reinforcement learning for real-time feedback control of quantum devices and more generally any physical system requiring learnable low-latency feedback control.
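The thresholding baseline the agent is compared against can be sketched in simulation: measure the qubit, and if the (noisy) readout reports the excited state, apply a bit-flip (pi pulse) before the next round. All probabilities below are illustrative, not the paper's hardware values, and the function name is hypothetical.

```python
import numpy as np

def threshold_feedback_init(p1=0.5, readout_fidelity=0.95,
                            rounds=3, trials=10_000, seed=0):
    """Monte Carlo sketch of threshold-based feedback initialization of a
    qubit into |0>. Each round: noisy projective readout (correct with
    probability `readout_fidelity`), then a conditional pi pulse whenever
    the readout says |1>. Returns the final fraction in the target |0>."""
    rng = np.random.default_rng(seed)
    state = (rng.random(trials) < p1).astype(int)        # 1 = excited
    for _ in range(rounds):
        readout_err = rng.random(trials) > readout_fidelity
        outcome = np.where(readout_err, 1 - state, state)  # noisy readout
        state = np.where(outcome == 1, 1 - state, state)   # feedback flip
    return float(np.mean(state == 0))
```

With these toy numbers the residual excited-state population settles near the readout error rate (about 5%), which illustrates why a learned agent that exploits weak-measurement records can beat simple thresholding.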
Bayesian Neural Networks for Fast SUSY Predictions
One of the goals of current particle physics research is to obtain evidence
for new physics, that is, physics beyond the Standard Model (BSM), at
accelerators such as the Large Hadron Collider (LHC) at CERN. The searches for
new physics are often guided by BSM theories that depend on many unknown
parameters, which, in some cases, makes testing their predictions difficult. In
this paper, machine learning is used to model the mapping from the parameter
space of the phenomenological Minimal Supersymmetric Standard Model (pMSSM), a
BSM theory with 19 free parameters, to some of its predictions. Bayesian neural
networks are used to predict cross sections for arbitrary pMSSM parameter
points, the mass of the associated lightest neutral Higgs boson, and the
theoretical viability of the parameter points. All three quantities are modeled
with average percent errors of 3.34% or less and in a time significantly
shorter than is possible with the supersymmetry codes from which the results
are derived. These results are a further demonstration of the potential for
machine learning to model accurately the mapping from the high dimensional
spaces of BSM theories to their predictions.
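The surrogate-with-uncertainty idea can be sketched cheaply: fit several randomized regressors mapping theory parameters to a predicted observable and use the spread of their predictions as an uncertainty estimate. The ensemble of random-feature ridge regressions below is an illustration of that pattern, not the paper's Bayesian neural network, and the synthetic target stands in for a cross-section computed by supersymmetry codes.

```python
import numpy as np

def fit_surrogate_ensemble(X, y, n_models=20, n_features=64, seed=0):
    """Fit an ensemble of random-cosine-feature ridge regressions.
    Each member draws its own random features, so the members disagree
    slightly; that disagreement serves as a crude uncertainty estimate."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        W = rng.normal(size=(X.shape[1], n_features))
        b = rng.uniform(0.0, 2.0 * np.pi, n_features)
        Phi = np.cos(X @ W + b)                         # random features
        w = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(n_features),
                            Phi.T @ y)                  # ridge solution
        models.append((W, b, w))
    return models

def predict(models, X):
    """Return the ensemble mean (the prediction) and the ensemble
    standard deviation (the uncertainty) at each input point."""
    preds = np.stack([np.cos(X @ W + b) @ w for W, b, w in models])
    return preds.mean(axis=0), preds.std(axis=0)
```

Once fitted, evaluating the surrogate is a handful of matrix products, which is the source of the speedup over rerunning the full supersymmetry codes for every parameter point.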