15,576 research outputs found

    The Assistive Multi-Armed Bandit

    Full text link
    Learning preferences implicit in the choices humans make is a well studied problem in both economics and computer science. However, most work makes the assumption that humans are acting (noisily) optimally with respect to their preferences. Such approaches can fail when people are themselves learning about what they want. In this work, we introduce the assistive multi-armed bandit, where a robot assists a human playing a bandit task to maximize cumulative reward. In this problem, the human does not know the reward function but can learn it through the rewards received from arm pulls; the robot only observes which arms the human pulls but not the reward associated with each pull. We offer sufficient and necessary conditions for successfully assisting the human in this framework. Surprisingly, better human performance in isolation does not necessarily lead to better performance when assisted by the robot: a human policy can do better by effectively communicating its observed rewards to the robot. We conduct proof-of-concept experiments that support these results. We see this work as contributing towards a theory behind algorithms for human-robot interaction.Comment: Accepted to HRI 201

    Identifying Sources and Sinks in the Presence of Multiple Agents with Gaussian Process Vector Calculus

    Full text link
    In systems of multiple agents, identifying the cause of observed agent dynamics is challenging. Often, these agents operate in diverse, non-stationary environments, where models rely on hand-crafted environment-specific features to infer influential regions in the system's surroundings. To overcome the limitations of these inflexible models, we present GP-LAPLACE, a technique for locating sources and sinks from trajectories in time-varying fields. Using Gaussian processes, we jointly infer a spatio-temporal vector field, as well as canonical vector calculus operations on that field. Notably, we do this from only agent trajectories without requiring knowledge of the environment, and also obtain a metric for denoting the significance of inferred causal features in the environment by exploiting our probabilistic method. To evaluate our approach, we apply it to both synthetic and real-world GPS data, demonstrating the applicability of our technique in the presence of multiple agents, as well as its superiority over existing methods.Comment: KDD '18 Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Pages 1254-1262, 9 pages, 5 figures, conference submission, University of Oxford. arXiv admin note: text overlap with arXiv:1709.0235

    Learning Task Specifications from Demonstrations

    Full text link
    Real world applications often naturally decompose into several sub-tasks. In many settings (e.g., robotics) demonstrations provide a natural way to specify the sub-tasks. However, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the sub-tasks can be safely recombined or limit the types of composition available. Motivated by this deficit, we consider the problem of inferring Boolean non-Markovian rewards (also known as logical trace properties or specifications) from demonstrations provided by an agent operating in an uncertain, stochastic environment. Crucially, specifications admit well-defined composition rules that are typically easy to interpret. In this paper, we formulate the specification inference task as a maximum a posteriori (MAP) probability inference problem, apply the principle of maximum entropy to derive an analytic demonstration likelihood model and give an efficient approach to search for the most likely specification in a large candidate pool of specifications. In our experiments, we demonstrate how learning specifications can help avoid common problems that often arise due to ad-hoc reward composition.Comment: NIPS 201

    ConXsense - Automated Context Classification for Context-Aware Access Control

    Full text link
    We present ConXsense, the first framework for context-aware access control on mobile devices based on context classification. Previous context-aware access control systems often require users to laboriously specify detailed policies or they rely on pre-defined policies not adequately reflecting the true preferences of users. We present the design and implementation of a context-aware framework that uses a probabilistic approach to overcome these deficiencies. The framework utilizes context sensing and machine learning to automatically classify contexts according to their security and privacy-related properties. We apply the framework to two important smartphone-related use cases: protection against device misuse using a dynamic device lock and protection against sensory malware. We ground our analysis on a sociological survey examining the perceptions and concerns of users related to contextual smartphone security and analyze the effectiveness of our approach with real-world context data. We also demonstrate the integration of our framework with the FlaskDroid architecture for fine-grained access control enforcement on the Android platform.Comment: Recipient of the Best Paper Awar
    corecore