7 research outputs found

    Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs

    Acting in domains where an agent must plan several steps ahead to achieve a goal can be a challenging task, especially if the agentʼs sensors provide only noisy or partial information. In this setting, Partially Observable Markov Decision Processes (POMDPs) provide a planning framework that optimally trades off between actions that contribute to the agentʼs knowledge and actions that increase the agentʼs immediate reward. However, the task of specifying the POMDPʼs parameters is often onerous. In particular, setting the immediate rewards to achieve a desired balance between information-gathering and acting is often not intuitive. In this work, we propose an approximation based on minimizing the immediate Bayes risk for choosing actions when transition, observation, and reward models are uncertain. The Bayes-risk criterion avoids the computational intractability of solving a POMDP with a multi-dimensional continuous state space; we show that it performs well in a variety of problems. We use policy queries—in which we ask an expert for the correct action—to infer the consequences of a potential pitfall without experiencing its effects. More importantly for human–robot interaction settings, policy queries allow the agent to learn the reward model without the reward values ever being specified.
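    The action-selection rule the abstract describes can be pictured with a minimal Python sketch. Everything here is an assumed interface for illustration (model_posterior, q_value, RISK_THRESHOLD are hypothetical names, not the paper's implementation): the agent acts greedily when the immediate Bayes risk is small and issues a policy query otherwise.

```python
# Minimal sketch of Bayes-risk action selection under model uncertainty.
# model_posterior, q_value, and RISK_THRESHOLD are illustrative assumptions.
import numpy as np

RISK_THRESHOLD = 0.1      # assumed tolerance before asking the expert
N_MODEL_SAMPLES = 50      # posterior samples over POMDP parameters

def bayes_risk_action(belief, model_posterior, actions, q_value, rng):
    """Pick the action minimizing immediate Bayes risk; fall back to a
    policy query when even the best action is too risky."""
    # Sample candidate POMDPs from the posterior over (T, O, R) parameters.
    models = [model_posterior.sample(rng) for _ in range(N_MODEL_SAMPLES)]
    # Expected value of each action, averaged over model uncertainty.
    mean_q = {a: np.mean([q_value(m, belief, a) for m in models])
              for a in actions}
    best = max(mean_q, key=mean_q.get)
    # Bayes risk of committing to `best`: expected regret versus acting
    # optimally in each sampled model.
    risk = np.mean([max(q_value(m, belief, a) for a in actions)
                    - q_value(m, belief, best) for m in models])
    if risk > RISK_THRESHOLD:
        return "ASK_EXPERT"   # policy query: the expert supplies the action
    return best
```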

    A Bayesian nonparametric approach to modeling battery health

    The batteries of many consumer products are both a substantial portion of the product's cost and commonly a first point of failure. Accurately predicting remaining battery life can lower costs by reducing unnecessary battery replacements. Unfortunately, battery dynamics are extremely complex, and we often lack the domain knowledge required to construct a model by hand. In this work, we take a data-driven approach and aim to learn a model of battery time-to-death from training data. Using a Dirichlet process prior over mixture weights, we learn an infinite mixture model for battery health. The Bayesian aspect of our model helps to avoid over-fitting, while the nonparametric nature of the model allows the data to control the size of the model, preventing under-fitting. We demonstrate our model's effectiveness by making time-to-death predictions using real data from nickel-metal hydride battery packs.
    Sponsors: United States. Army Research Office (Nostra Project STTR W911NF-08-C-0066); iRobot
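    The nonparametric idea translates into a short sketch, here substituting scikit-learn's truncated Dirichlet-process mixture for the paper's model; the battery data array below is hypothetical, standing in for discharge features plus time-to-death.

```python
# Sketch only: scikit-learn's truncated DP mixture stands in for the
# paper's infinite mixture; the data array below is hypothetical.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Hypothetical rows: [discharge-rate stats..., time_to_death] per battery pack.
X = rng.normal(size=(200, 4))

dpgmm = BayesianGaussianMixture(
    n_components=20,        # truncation level, not the final model size
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full",
    random_state=0,
).fit(X)

# The DP prior drives unused components to near-zero weight, so the data
# controls the effective model size (guarding against under-fitting) while
# the Bayesian averaging guards against over-fitting.
print("effective components:", int(np.sum(dpgmm.weights_ > 1e-2)))
```

    A time-to-death prediction would then condition the fitted joint mixture on the observed features of a new battery pack.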

    Bayesian Nonparametric Methods for Partially-Observable Reinforcement Learning

    Making intelligent decisions from incomplete information is critical in many applications: for example, robots must choose actions based on imperfect sensors, and speech-based interfaces must infer a user’s needs from noisy microphone inputs. What makes these tasks hard is that we often lack a natural representation with which to model the domain and choose actions; we must learn about the domain’s properties while simultaneously performing the task. Learning a representation also involves trade-offs between modeling the data that we have seen previously and being able to make predictions about new data. This article explores learning representations of stochastic systems using Bayesian nonparametric statistics. Bayesian nonparametric methods allow the sophistication of a representation to scale gracefully with the complexity in the data. Our main contribution is a careful empirical evaluation of how representations learned using Bayesian nonparametric methods compare to other standard learning approaches, especially in support of planning and control. We show that the Bayesian aspects of the methods achieve state-of-the-art decision-making performance from relatively few samples, while the nonparametric aspects often result in fewer computations. These results hold across a variety of different techniques for choosing actions given a representation.
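    The "scale gracefully" claim is easiest to see in a toy Chinese-restaurant-process simulation (a generic illustration of the principle, not code from the article): the number of latent components grows slowly with the amount of data instead of being fixed in advance.

```python
# Toy CRP sketch: component count grows with the data, not fixed a priori.
import numpy as np

def crp_assignments(n_points, alpha, rng):
    counts = []                       # points per component ("table")
    for _ in range(n_points):
        probs = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(probs), p=probs / probs.sum())
        if k == len(counts):
            counts.append(1)          # open a new component
        else:
            counts[k] += 1
    return len(counts)

rng = np.random.default_rng(1)
for n in (10, 100, 1000):
    print(n, "points ->", crp_assignments(n, alpha=1.0, rng=rng), "components")
```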

    Nonparametric Bayesian Policy Priors for Reinforcement Learning

    We consider reinforcement learning in partially observable domains where the agent can query an expert for demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning.
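    A minimal sketch of how such priors could combine (the scoring interfaces are assumptions, not the paper's algorithm): each candidate model is weighted by its fit to the agent's own experience, the fit of its induced policy to the expert trajectories, and complexity penalties playing the role of the simplicity priors.

```python
# Sketch: posterior weights over candidate models, biased toward simple
# representations and simple induced policies. All interfaces are assumed.
import numpy as np

def posterior_weights(models, agent_data, expert_trajs,
                      model_loglik, policy_loglik, complexity):
    scores = []
    for m in models:
        policy = m.solve()                              # policy induced by model m
        scores.append(model_loglik(m, agent_data)       # independent exploration
                      + policy_loglik(policy, expert_trajs)  # expert demonstrations
                      - complexity(m)                   # prior: simple model
                      - complexity(policy))             # prior: simple policy
    w = np.exp(np.array(scores) - max(scores))          # numerically stable softmax
    return w / w.sum()
```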

    Infinite dynamic Bayesian networks

    We present the infinite dynamic Bayesian network model (iDBN), a nonparametric, factored state-space model that generalizes dynamic Bayesian networks (DBNs). The iDBN can infer every aspect of a DBN: the number of hidden factors, the number of values each factor can take, and (arbitrarily complex) connections and conditionals between factors and observations. In this way, the iDBN generalizes other nonparametric state-space models, which until now have generally focused on binary hidden nodes and more restricted connection structures. We show how this new prior allows us to find interesting structure in benchmark tests and on two real-world datasets involving weather data and neural information flow networks.
    Sponsors: Massachusetts Institute of Technology (Hugh Hampton Young Memorial Fund Fellowship); United States. Air Force Office of Scientific Research (AFOSR FA9550-07-1-0075)
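    The "infinite" part of such a prior can be pictured with a stick-breaking construction (a generic illustration, not the iDBN inference code): weights over a potentially unbounded set of hidden factors come from repeatedly breaking off pieces of a unit stick, so only finitely many factors carry non-negligible mass.

```python
# Toy stick-breaking sketch for an unbounded set of hidden factors.
import numpy as np

def stick_breaking(alpha, rng, tol=1e-3):
    weights, remaining = [], 1.0
    while remaining > tol:            # truncate once leftover mass is tiny
        frac = rng.beta(1.0, alpha)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    return np.array(weights)

rng = np.random.default_rng(2)
w = stick_breaking(alpha=3.0, rng=rng)
print(len(w), "factors with non-negligible weight:", w.round(3))
```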

    A Bayesian Nonparametric Approach to Modeling Motion Patterns

    The most difficult—and often most essential—aspect of many interception and tracking tasks is constructing motion models of the targets to be found. Experts can often provide only partial information, and fitting parameters for complex motion patterns can require large amounts of training data. Specifying how to parameterize complex motion patterns is in itself a difficult task. In contrast, nonparametric models are very flexible and generalize well with relatively little training data. We propose modeling target motion patterns as a mixture of Gaussian processes (GP) with a Dirichlet process (DP) prior over mixture weights. The GP provides a flexible representation for each individual motion pattern, while the DP assigns observed trajectories to particular motion patterns. Both automatically adjust the complexity of the motion model based on the available data. Our approach outperforms several parametric models on a helicopter-based car-tracking task, using data collected from the greater Boston area.
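    A minimal sketch of the model's two ingredients (the interfaces and kernel choices are assumptions, not the paper's code): each motion pattern is a GP mapping position to a velocity component, and a CRP-style rule assigns a new trajectory to the pattern whose GP predicts it best, or opens a new pattern.

```python
# Sketch: CRP assignment of a trajectory to a GP motion pattern.
# Interfaces and kernel choices are assumptions for illustration.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def assign_trajectory(traj_xy, traj_v, clusters, alpha=1.0):
    """traj_xy: (n, 2) positions; traj_v: (n,) velocity component.
    clusters: list of (X, y, count) holding each pattern's past data."""
    scores = [np.log(alpha)]                     # option 0: open a new pattern
    for X, y, count in clusters:
        gp = GaussianProcessRegressor(kernel=RBF(5.0) + WhiteKernel(0.1)).fit(X, y)
        mu, sd = gp.predict(traj_xy, return_std=True)
        fit = norm.logpdf(traj_v, mu, sd).sum()  # GP predictive density
        scores.append(np.log(count) + fit)       # CRP weight times likelihood
    k = int(np.argmax(scores))
    return None if k == 0 else k - 1             # None -> start a new pattern
```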

    Does the GDPR Help or Hinder Fair Algorithmic Decision-Making?
