
    Learning Task Specifications from Demonstrations

    Real-world applications often naturally decompose into several sub-tasks. In many settings (e.g., robotics), demonstrations provide a natural way to specify the sub-tasks. However, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the sub-tasks can be safely recombined or limit the types of composition available. Motivated by this deficit, we consider the problem of inferring Boolean non-Markovian rewards (also known as logical trace properties or specifications) from demonstrations provided by an agent operating in an uncertain, stochastic environment. Crucially, specifications admit well-defined composition rules that are typically easy to interpret. In this paper, we formulate the specification inference task as a maximum a posteriori (MAP) probability inference problem, apply the principle of maximum entropy to derive an analytic demonstration likelihood model, and give an efficient approach to search for the most likely specification in a large candidate pool of specifications. In our experiments, we demonstrate how learning specifications can help avoid common problems that often arise due to ad-hoc reward composition. Comment: NIPS 201

    Measuring Arbitrary Physical Properties in Analog Quantum Simulation

    A central challenge in analog quantum simulation is to characterize desirable physical properties of quantum states produced in experiments. However, in conventional approaches, the extraction of arbitrary information requires performing measurements in many different bases, which necessitates a high level of control that present-day quantum devices may not have. Here, we propose and analyze a scalable protocol that leverages the ergodic nature of generic quantum dynamics, enabling the efficient extraction of many physical properties. The protocol does not require sophisticated controls and can be generically implemented in analog quantum simulation platforms today. Our protocol involves introducing ancillary degrees of freedom in a predetermined state to a system of interest, quenching the joint system under Hamiltonian dynamics native to the particular experimental platform, and then measuring globally in a single, fixed basis. We show that arbitrary information about the original quantum state is contained within such measurement data and can be extracted using a classical data-processing procedure. We numerically demonstrate our approach with a number of examples, including measurements of entanglement entropy, many-body Chern number, and various superconducting orders in systems of neutral atom arrays and of bosonic and fermionic particles on optical lattices, respectively, assuming only existing technological capabilities. Our protocol promises to overcome limited controllability and thus enhance the versatility and utility of near-term quantum technologies.

    Structures and Fragmentations of Small Silicon Oxide Clusters by ab Initio Calculations

    The structures, energies, and fragmentation stabilities of silicon oxide clusters Si_mO_n, with m = 1−5, n = 1, 2m + 1, are studied systematically by ab initio calculations. New structures for nine clusters are found to be energetically more favorable than previously proposed structures. Using the ground state structures and energies obtained from our calculations, we have also studied fragmentation pathways and dissociation energies of the clusters. Our computational results show that the dissociation energy is strongly correlated with the O/Si ratio. Oxygen-rich clusters tend to have larger dissociation energies, as well as larger HOMO−LUMO gaps. Our calculations also show that SiO is the most abundant species in the fragmentation products.

    Humans decompose tasks by trading off utility and computational cost

    Human behavior emerges from planning over elaborate decompositions of tasks into goals, subgoals, and low-level actions. How are these decompositions created and used? Here, we propose and evaluate a normative framework for task decomposition based on the simple idea that people decompose tasks to reduce the overall cost of planning while maintaining task performance. Analyzing 11,117 distinct graph-structured planning tasks, we find that our framework justifies several existing heuristics for task decomposition and makes predictions that can be distinguished from two alternative normative accounts. We report a behavioral study of task decomposition (N=806) that uses 30 randomly sampled graphs, a larger and more diverse set than that of any previous behavioral study on this topic. We find that human responses are more consistent with our framework for task decomposition than alternative normative accounts and are most consistent with a heuristic -- betweenness centrality -- that is justified by our approach. Taken together, our results provide new theoretical insight into the computational principles underlying the intelligent structuring of goal-directed behavior.
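
The betweenness-centrality heuristic named in this abstract can be illustrated with a small sketch. It is not the authors' code: the graph (two triangular "rooms" joined through a doorway node `d`) and the function name are hypothetical, and the computation is the standard Brandes algorithm for an unweighted, undirected graph. Under this heuristic, the node with the highest score would be the natural candidate subgoal.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    `graph` maps each node to a list of its neighbours. The score of a node
    is the number of unordered node pairs whose shortest paths pass through
    it, with ties split evenly across equally short paths.
    """
    centrality = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        visit_order = []
        predecessors = {v: [] for v in graph}
        num_paths = {v: 0 for v in graph}
        distance = {v: -1 for v in graph}
        num_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in graph[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    num_paths[w] += num_paths[v]
                    predecessors[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        dependency = {v: 0.0 for v in graph}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                dependency[v] += num_paths[v] / num_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: score / 2 for v, score in centrality.items()}


# Hypothetical task graph: rooms {a, b, c} and {e, f, g} joined via doorway d.
task_graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f", "g"], "f": ["e", "g"], "g": ["e", "f"],
}
scores = betweenness_centrality(task_graph)
subgoal = max(scores, key=scores.get)
```

In this toy graph every path between the two rooms crosses `d`, so `d` receives the highest score and is selected as the subgoal.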

    Learning Rewards from Linguistic Feedback

    We explore unconstrained natural language feedback as a learning signal for artificial agents. Humans use rich and varied language to teach, yet most prior work on interactive learning from language assumes a particular form of input (e.g., commands). We propose a general framework which does not make this assumption, using aspect-based sentiment analysis to decompose feedback into sentiment about the features of a Markov decision process. We then perform an analogue of inverse reinforcement learning, regressing the sentiment on the features to infer the teacher's latent reward function. To evaluate our approach, we first collect a corpus of teaching behavior in a cooperative task where both teacher and learner are human. We implement three artificial learners: sentiment-based "literal" and "pragmatic" models, and an inference network trained end-to-end to predict latent rewards. We then repeat our initial experiment and pair them with human teachers. All three successfully learn from interactive human feedback. The sentiment models outperform the inference network, with the "pragmatic" model approaching human performance. Our work thus provides insight into the information structure of naturalistic linguistic feedback as well as methods to leverage it for reinforcement learning. Comment: 9 pages, 4 figures. AAAI '2
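
The step of "regressing the sentiment on the features" can be sketched as ordinary least squares. This is only an illustration under stated assumptions, not the paper's model: the feature counts, sentiment scores, and function names below are hypothetical, the sentiment scores are assumed to have been extracted already, and the feature matrix is assumed to have full column rank.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting.

    Assumes the matrix is square and non-singular.
    """
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def regress_sentiment_on_features(features, sentiments):
    """Least-squares reward weights via the normal equations (X^T X) w = X^T y."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * s for row, s in zip(features, sentiments))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data: each row counts two MDP features present in the behavior
# a piece of feedback refers to; each sentiment is that feedback's score.
features = [[1, 0], [0, 1], [1, 1], [2, 0]]
sentiments = [1.0, -0.5, 0.5, 2.0]
reward_weights = regress_sentiment_on_features(features, sentiments)
```

Here the recovered weights are approximately 1.0 for the first feature and -0.5 for the second, i.e. the teacher is inferred to like the first feature and mildly dislike the second.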

    Using Machine Teaching to Investigate Human Assumptions when Teaching Reinforcement Learners

    Successful teaching requires an assumption about how the learner learns - how the learner uses experiences from the world to update their internal states. We investigate what expectations people have about a learner when they teach them in an online manner using rewards and punishment. We focus on a common reinforcement learning method, Q-learning, and examine what assumptions people have using a behavioral experiment. To do so, we first establish a normative standard by formulating the problem as a machine teaching optimization problem. To solve the machine teaching optimization problem, we use a deep learning approximation method which simulates learners in the environment and learns to predict how feedback affects the learner's internal states. What do people assume about a learner's learning and discount rates when they teach them an idealized exploration-exploitation task? In a behavioral experiment, we find that people can teach the task to Q-learners in a relatively efficient and effective manner when the learner uses a small value for its discounting rate and a large value for its learning rate. However, they are still suboptimal. We also find that providing people with real-time updates of how possible feedback would affect the Q-learner's internal states weakly helps them teach. Our results reveal how people teach using evaluative feedback and provide guidance for how engineers should design machine agents in a manner that is intuitive for people. Comment: 21 pages, 4 figures
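
For readers unfamiliar with the learner being taught, the textbook tabular Q-learning update is sketched below. It is a minimal sketch, not the study's implementation: the state and action names and the parameter values are hypothetical, with the teacher's evaluative feedback playing the role of the reward. The example uses a large learning rate and a small discount rate, the regime in which the abstract reports that people teach most effectively.

```python
def q_update(q_table, state, action, reward, next_state, learning_rate, discount):
    """Apply one tabular Q-learning update and return the new Q-value.

    Q(s, a) <- Q(s, a) + learning_rate * (reward + discount * max_a' Q(s', a') - Q(s, a))
    """
    next_values = q_table.get(next_state, {})
    # Unknown or terminal next states contribute no future value.
    best_next = max(next_values.values()) if next_values else 0.0
    td_target = reward + discount * best_next
    q_table[state][action] += learning_rate * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical two-state example; the reward stands in for teacher feedback.
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
new_value = q_update(q_table, "s0", "right", reward=1.0, next_state="s1",
                     learning_rate=0.9, discount=0.1)
```

With these values the target is 1.0 + 0.1 * 2.0 = 1.2, so the updated entry is about 0.9 * 1.2 = 1.08: a single piece of feedback moves the learner most of the way to the target, and future value contributes little.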

    Exploring the hierarchical structure of human plans via program generation

    Human behavior is inherently hierarchical, resulting from the decomposition of a task into subtasks or an abstract action into concrete actions. However, behavior is typically measured as a sequence of actions, which makes it difficult to infer its hierarchical structure. In this paper, we explore how people form hierarchically structured plans, using an experimental paradigm that makes hierarchical representations observable: participants create programs that produce sequences of actions in a language with explicit hierarchical structure. This task lets us test two well-established principles of human behavior: utility maximization (i.e., using fewer actions) and minimum description length (MDL; i.e., having a shorter program). We find that humans are sensitive to both metrics, but that both accounts fail to predict a qualitative feature of human-created programs, namely that people prefer programs with reuse over and above the predictions of MDL. We formalize this preference for reuse by extending the MDL account into a generative model over programs, modeling hierarchy choice as the induction of a grammar over actions. Our account can explain the preference for reuse and provides the best prediction of human behavior, going beyond simple accounts of compressibility to highlight a principle that guides hierarchical planning.