Learning Task Specifications from Demonstrations
Real-world applications often naturally decompose into several sub-tasks. In
many settings (e.g., robotics) demonstrations provide a natural way to specify
the sub-tasks. However, most methods for learning from demonstrations either do
not provide guarantees that the artifacts learned for the sub-tasks can be
safely recombined or limit the types of composition available. Motivated by
this deficit, we consider the problem of inferring Boolean non-Markovian
rewards (also known as logical trace properties or specifications) from
demonstrations provided by an agent operating in an uncertain, stochastic
environment. Crucially, specifications admit well-defined composition rules
that are typically easy to interpret. In this paper, we formulate the
specification inference task as a maximum a posteriori (MAP) probability
inference problem, apply the principle of maximum entropy to derive an analytic
demonstration likelihood model and give an efficient approach to search for the
most likely specification in a large candidate pool of specifications. In our
experiments, we demonstrate how learning specifications can help avoid common
problems that often arise due to ad-hoc reward composition. Comment: NIPS 201
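The MAP search described above can be caricatured in a few lines: score each candidate Boolean specification by a maxent-style demonstration likelihood, normalized by how often unguided behavior would satisfy it, and keep the best-scoring candidate. Everything below (the traces, the candidate pool, the rationality parameter `beta`) is an illustrative assumption, not the paper's actual model.

```python
import math
import random

random.seed(0)

# Toy demonstrations: each trace is a sequence of observed symbols.
demos = [("a", "b", "goal"), ("a", "goal"), ("b", "a", "goal")]

# Hypothetical candidate specifications: Boolean predicates over traces.
candidates = {
    "eventually_goal": lambda t: "goal" in t,
    "a_then_goal": lambda t: "a" in t and "goal" in t
                   and t.index("a") < t.index("goal"),
    "never_b": lambda t: "b" not in t,
}

def sat_rate(spec, alphabet=("a", "b", "goal"), length=3, n_mc=5000):
    """Monte Carlo estimate of how often unguided behavior satisfies spec."""
    return sum(
        spec(tuple(random.choice(alphabet) for _ in range(length)))
        for _ in range(n_mc)
    ) / n_mc

def log_likelihood(spec, demos, beta=5.0):
    """Maxent-style likelihood: an agent pursuing `spec` samples traces
    with weight exp(beta * satisfied).  Restrictive specs that the demos
    nonetheless satisfy concentrate probability mass and score highest."""
    z = sat_rate(spec)
    log_Z = math.log(z * math.exp(beta) + (1.0 - z))
    return sum(beta * spec(t) for t in demos) - len(demos) * log_Z

best = max(candidates, key=lambda k: log_likelihood(candidates[k], demos))
print(best)  # a_then_goal
```

Note that the permissive `eventually_goal` is also satisfied by every demonstration, but the normalizer penalizes it: specifications that are rarely satisfied by chance yet always satisfied in the demonstrations carry the most evidence.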
Measuring Arbitrary Physical Properties in Analog Quantum Simulation
A central challenge in analog quantum simulation is to characterize desirable
physical properties of quantum states produced in experiments. However, in
conventional approaches, the extraction of arbitrary information requires
performing measurements in many different bases, which necessitates a high
level of control that present-day quantum devices may not have. Here, we
propose and analyze a scalable protocol that leverages the ergodic nature of
generic quantum dynamics, enabling the efficient extraction of many physical
properties. The protocol does not require sophisticated controls and can be
generically implemented in analog quantum simulation platforms today. Our
protocol involves introducing ancillary degrees of freedom in a predetermined
state to a system of interest, quenching the joint system under Hamiltonian
dynamics native to the particular experimental platform, and then measuring
globally in a single, fixed basis. We show that arbitrary information of the
original quantum state is contained within such measurement data, and can be
extracted using a classical data-processing procedure. We numerically
demonstrate our approach with a number of examples, including measurements of
entanglement entropy, the many-body Chern number, and various superconducting
orders, in systems of neutral-atom arrays and of bosonic and fermionic
particles on optical lattices, respectively, assuming only existing
technological capabilities. Our protocol thus promises to overcome limited
controllability and thereby enhance the versatility and utility of near-term
quantum technologies.
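For a single system qubit, the three steps above (append an ancilla in a fixed state, quench, measure once in a fixed global basis) can be simulated directly, and the classical post-processing reduces to a small linear inversion, because the outcome distribution is linear in the unknown state. This is a toy illustration under our own assumptions (one qubit, a dense random Hamiltonian standing in for "generic ergodic dynamics"), not the paper's many-body protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pauli basis for the single system qubit.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
paulis = [I2, X, Y, Z]

# Unknown system state rho (here a random pure state, for testing).
v = rng.normal(size=2) + 1j * rng.normal(size=2)
v /= np.linalg.norm(v)
rho = np.outer(v, v.conj())

# Step 1: append an ancilla qubit in the fixed state |0><0|.
anc = np.array([[1, 0], [0, 0]], dtype=complex)
joint = np.kron(rho, anc)

# Step 2: quench under a fixed, generic Hamiltonian (U = exp(-iHt)).
H = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = (H + H.conj().T) / 2
w, P = np.linalg.eigh(H)
U = P @ np.diag(np.exp(-1j * w * 3.0)) @ P.conj().T

# Step 3: measure globally in the single fixed computational basis.
probs = np.real(np.diag(U @ joint @ U.conj().T))

# Classical post-processing: probs is linear in rho's Bloch components,
# so with U known the map can be inverted to recover rho entirely.
rows = [np.real(np.diag(U @ np.kron(p, anc) @ U.conj().T)) / 2 for p in paulis]
A = np.array(rows).T                                  # A[k, mu]
coeffs = np.linalg.lstsq(A, probs, rcond=None)[0]     # [1, <X>, <Y>, <Z>]
rho_rec = sum(c * p for c, p in zip(coeffs, paulis)) / 2

print(np.allclose(rho_rec, rho, atol=1e-8))  # True
```

In the actual many-body setting one would not attempt full tomography; the point of the simulation is only that a single fixed-basis measurement after a generic quench retains complete information about the pre-quench state.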
Structures and Fragmentations of Small Silicon Oxide Clusters by ab Initio Calculations
The structures, energies, and fragmentation stabilities of silicon oxide clusters Si_mO_n, with m = 1−5 and n = 1 to 2m + 1, are studied systematically by ab initio calculations. New structures for nine clusters are found to be energetically more favorable than previously proposed structures. Using the ground-state structures and energies obtained from our calculations, we have also studied fragmentation pathways and dissociation energies of the clusters. Our computational results show that the dissociation energy is strongly correlated with the O/Si ratio. Oxygen-rich clusters tend to have larger dissociation energies, as well as larger HOMO−LUMO gaps. Our calculations also show that SiO is the most abundant species in the fragmentation products.
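The fragmentation analysis reduces to comparing total energies: for a channel parent → frag1 + frag2, the dissociation energy is D = E(frag1) + E(frag2) − E(parent), and the lowest-D channel is the preferred pathway. A minimal sketch with placeholder energies (the numbers below are hypothetical, not the paper's computed values):

```python
# Hypothetical total energies in hartree; placeholders, not the paper's data.
E = {"Si2O3": -803.60, "SiO2": -439.42, "SiO": -364.02}

def dissociation_energy(parent, frag1, frag2, energies):
    """D = E(frag1) + E(frag2) - E(parent); positive D means bound parent."""
    return energies[frag1] + energies[frag2] - energies[parent]

# Example channel Si2O3 -> SiO + SiO2; comparing D across candidate
# channels of one parent identifies its preferred fragmentation pathway.
d = dissociation_energy("Si2O3", "SiO", "SiO2", E)
print(round(d, 2))  # 0.16
```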
Humans decompose tasks by trading off utility and computational cost
Human behavior emerges from planning over elaborate decompositions of tasks
into goals, subgoals, and low-level actions. How are these decompositions
created and used? Here, we propose and evaluate a normative framework for task
decomposition based on the simple idea that people decompose tasks to reduce
the overall cost of planning while maintaining task performance. Analyzing
11,117 distinct graph-structured planning tasks, we find that our framework
justifies several existing heuristics for task decomposition and makes
predictions that can be distinguished from two alternative normative accounts.
We report a behavioral study of task decomposition () that uses 30
randomly sampled graphs, a larger and more diverse set than that of any
previous behavioral study on this topic. We find that human responses are more
consistent with our framework for task decomposition than alternative normative
accounts and are most consistent with a heuristic -- betweenness centrality --
that is justified by our approach. Taken together, our results provide new
theoretical insight into the computational principles underlying the
intelligent structuring of goal-directed behavior.
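Betweenness centrality, the heuristic the framework justifies, counts the fraction of shortest paths passing through each node; the highest-scoring node is a natural subgoal candidate. A self-contained sketch using Brandes' algorithm on a hypothetical task graph (the graph below is ours, not one of the study's stimuli):

```python
from collections import deque, defaultdict

def betweenness(adj):
    """Brandes' algorithm for betweenness centrality on unweighted graphs."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist = {s: 0}
        sigma = defaultdict(float); sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in reverse BFS order.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc  # undirected graphs: each pair is counted twice; fine for ranking

# Toy task graph: node "c" bridges two clusters, so it should emerge
# as the natural subgoal under the betweenness heuristic.
adj = {
    "a": ["b", "c"], "b": ["a", "c"],
    "c": ["a", "b", "d", "e"],
    "d": ["c", "e"], "e": ["c", "d"],
}
bc = betweenness(adj)
subgoal = max(bc, key=bc.get)
print(subgoal)  # c
```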
Learning Rewards from Linguistic Feedback
We explore unconstrained natural language feedback as a learning signal for
artificial agents. Humans use rich and varied language to teach, yet most prior
work on interactive learning from language assumes a particular form of input
(e.g., commands). We propose a general framework which does not make this
assumption, using aspect-based sentiment analysis to decompose feedback into
sentiment about the features of a Markov decision process. We then perform an
analogue of inverse reinforcement learning, regressing the sentiment on the
features to infer the teacher's latent reward function. To evaluate our
approach, we first collect a corpus of teaching behavior in a cooperative task
where both teacher and learner are human. We implement three artificial
learners: sentiment-based "literal" and "pragmatic" models, and an inference
network trained end-to-end to predict latent rewards. We then repeat our
initial experiment and pair them with human teachers. All three successfully
learn from interactive human feedback. The sentiment models outperform the
inference network, with the "pragmatic" model approaching human performance.
Our work thus provides insight into the information structure of naturalistic
linguistic feedback as well as methods to leverage it for reinforcement
learning. Comment: 9 pages, 4 figures. AAAI '2
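The regression step can be sketched in a few lines: treat the teacher's aggregated per-feature sentiment as a linear function of MDP features and recover the latent reward weights by least squares. The features and weights below are invented for illustration, and the sketch omits the aspect-based sentiment extraction from raw language that precedes this step.

```python
import numpy as np

# Hypothetical MDP feature counts for four episodes (rows), e.g.
# (goal-reached, lava-stepped, detour-taken), and the teacher's
# aspect-level sentiment aggregated per episode.
features = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [2.0, 0.0, 0.0],
])
true_w = np.array([3.0, -5.0, -0.5])   # latent teacher reward (hidden)
sentiment = features @ true_w          # noiseless feedback signal, for testing

# Inverse-RL-style step: regress sentiment on features to infer the reward.
w_hat, *_ = np.linalg.lstsq(features, sentiment, rcond=None)
print(np.round(w_hat, 2))  # recovers approximately [3, -5, -0.5]
```

With real feedback, the sentiment signal is noisy and sparse, so the regression yields an estimate rather than the exact weights; the noiseless setup here only makes the mechanics visible.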
Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners
Successful teaching requires an assumption of how the learner learns - how
the learner uses experiences from the world to update their internal states. We
investigate what expectations people have about a learner when they teach them
in an online manner using rewards and punishment. We focus on a common
reinforcement learning method, Q-learning, and examine what assumptions people
have using a behavioral experiment. To do so, we first establish a normative
standard, by formulating the problem as a machine teaching optimization
problem. To solve the machine teaching optimization problem, we use a deep
learning approximation method which simulates learners in the environment and
learns to predict how feedback affects the learner's internal states. What do
people assume about a learner's learning and discount rates when they teach
them an idealized exploration-exploitation task? In a behavioral experiment, we
find that people can teach the task to Q-learners in a relatively efficient and
effective manner when the learner uses a small value for its discounting rate
and a large value for its learning rate. However, they are still suboptimal. We
also find that providing people with real-time updates of how possible feedback
would affect the Q-learner's internal states helps them teach, though only
weakly. Our
results reveal how people teach using evaluative feedback and provide guidance
for how engineers should design machine agents in a manner that is intuitive
for people. Comment: 21 pages, 4 figure
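The learner model in question is tabular Q-learning, whose update moves Q(s, a) toward the bootstrapped target r + γ·max_b Q(s′, b). A minimal sketch follows; the corridor task and the ±1 feedback scheme are our own toy stand-ins, with the small discount and large learning rate the abstract identifies as teachable defaults:

```python
import random

random.seed(0)

def update(q, s, a, r, s2, actions, alpha=0.9, gamma=0.1):
    """Core Q-learning update.  Teaching with evaluative feedback works
    best when the learner uses a small discount (gamma) and a large
    learning rate (alpha), hence these defaults."""
    best_next = max(q.get((s2, b), 0.0) for b in actions)
    q[(s, a)] = q.get((s, a), 0.0) + alpha * (
        r + gamma * best_next - q.get((s, a), 0.0))

# Toy 1-D corridor: states 0..4, goal at 4.  A hypothetical teacher gives
# +1 for a step toward the goal and -1 for a step away -- evaluative
# feedback standing in for environment reward.
actions = (-1, +1)
q = {}
for _ in range(200):
    s = random.randrange(4)
    a = random.choice(actions)
    s2 = min(max(s + a, 0), 4)
    r = 1.0 if a == +1 else -1.0
    update(q, s, a, r, s2, actions)

# The taught greedy policy should prefer moving right in every state.
policy = {s: max(actions, key=lambda a: q.get((s, a), 0.0)) for s in range(4)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}
```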
Exploring the hierarchical structure of human plans via program generation
Human behavior is inherently hierarchical, resulting from the decomposition
of a task into subtasks or an abstract action into concrete actions. However,
behavior is typically measured as a sequence of actions, which makes it
difficult to infer its hierarchical structure. In this paper, we explore how
people form hierarchically-structured plans, using an experimental paradigm
that makes hierarchical representations observable: participants create
programs that produce sequences of actions in a language with explicit
hierarchical structure. This task lets us test two well-established principles
of human behavior: utility maximization (i.e. using fewer actions) and minimum
description length (MDL; i.e. having a shorter program). We find that humans
are sensitive to both metrics, but that both accounts fail to predict a
qualitative feature of human-created programs, namely that people prefer
programs with reuse over and above the predictions of MDL. We formalize this
preference for reuse by extending the MDL account into a generative model over
programs, modeling hierarchy choice as the induction of a grammar over actions.
Our account can explain the preference for reuse and provides the best
prediction of human behavior, going beyond simple accounts of compressibility
to highlight a principle that guides hierarchical planning.
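The MDL account can be made concrete with a toy symbol-counting code: a program's description length is the size of its main routine plus its subroutine definitions. The coding scheme below is our own simplification, and it also illustrates the key observation above: the two programs tie in length, yet people reliably prefer the one with reuse.

```python
def description_length(program):
    """Toy MDL: symbols in the main routine plus each subroutine body,
    with one extra symbol of overhead per definition."""
    main, subs = program
    return len(main) + sum(len(body) + 1 for body in subs.values())

# Two programs producing the same action sequence A B A B A B.
flat = (["A", "B", "A", "B", "A", "B"], {})
reuse = (["F", "F", "F"], {"F": ["A", "B"]})   # F defined once, called 3x

print(description_length(flat), description_length(reuse))  # 6 6
```

Since a pure MDL account is indifferent between these two programs, the observed preference for `reuse` is exactly the residual that the grammar-based generative model over programs is built to capture.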