
    Risk-sensitive Inverse Reinforcement Learning via Semi- and Non-Parametric Methods

    The literature on Inverse Reinforcement Learning (IRL) typically assumes that humans take actions in order to minimize the expected value of a cost function, i.e., that humans are risk neutral. Yet, in practice, humans are often far from being risk neutral. To fill this gap, the objective of this paper is to devise a framework for risk-sensitive IRL in order to explicitly account for a human's risk sensitivity. To this end, we propose a flexible class of models based on coherent risk measures, which allow us to capture an entire spectrum of risk preferences from risk-neutral to worst-case. We propose efficient non-parametric algorithms based on linear programming and semi-parametric algorithms based on maximum likelihood for inferring a human's underlying risk measure and cost function for a rich class of static and dynamic decision-making settings. The resulting approach is demonstrated on a simulated driving game with ten human participants. Our method is able to infer and mimic a wide range of qualitatively different driving styles from highly risk-averse to risk-neutral in a data-efficient manner. Moreover, comparisons of the Risk-Sensitive (RS) IRL approach with a risk-neutral model show that the RS-IRL framework more accurately captures observed participant behavior both qualitatively and quantitatively, especially in scenarios where catastrophic outcomes such as collisions can occur.
    Comment: Submitted to International Journal of Robotics Research; Revision 1: (i) Clarified minor technical points; (ii) Revised proof for Theorem 3 to hold under weaker assumptions; (iii) Added additional figures and expanded discussions to improve readability.
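    The coherent risk measures referenced above include Conditional Value-at-Risk (CVaR). The following numpy sketch is only a toy illustration of the risk-neutral-to-worst-case spectrum the abstract mentions, not the paper's inference algorithm; the cost distribution and all names are assumptions:

```python
# Illustrative only: CVaR of a cost distribution interpolates between the
# risk-neutral expectation (alpha -> 0) and the worst case (alpha -> 1).
import numpy as np

def cvar(costs: np.ndarray, alpha: float) -> float:
    """Average of the worst (1 - alpha) fraction of sampled costs."""
    if alpha <= 0.0:
        return float(costs.mean())       # risk-neutral limit
    var = np.quantile(costs, alpha)      # value-at-risk threshold
    tail = costs[costs >= var]
    return float(tail.mean())            # expected cost within the tail

rng = np.random.default_rng(0)
costs = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # heavy-tailed costs

for alpha in (0.0, 0.5, 0.9, 0.99):
    print(f"alpha={alpha:4.2f}  CVaR={cvar(costs, alpha):7.3f}")
# As alpha grows, CVaR moves from E[cost] toward the worst observed outcome,
# tracing the spectrum of risk preferences described in the abstract.
```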

    Disentangled Variational Auto-Encoder for Semi-supervised Learning

    Semi-supervised learning is attracting increasing attention because datasets in many domains lack enough labeled data. The Variational Auto-Encoder (VAE), in particular, has demonstrated the benefits of semi-supervised learning. The majority of existing semi-supervised VAEs use a classifier to exploit label information, introducing the classifier's parameters into the VAE. Given the limited labeled data, learning the parameters of such a classifier may not be an optimal way to exploit label information. In this paper, we therefore develop a novel approach to semi-supervised VAEs that requires no classifier. Specifically, we propose a new model, the Semi-supervised Disentangled VAE (SDVAE), which encodes the input data into a disentangled representation and a non-interpretable representation; the category information is then used directly to regularize the disentangled representation via an equality constraint. To further enhance the feature-learning ability of the proposed VAE, we incorporate reinforcement learning to relieve the lack of data. The framework can deal with both image and text data by using the corresponding encoder and decoder networks. Extensive experiments on image and text datasets demonstrate the effectiveness of the proposed framework.
    Comment: 6 figures, 10 pages, Information Sciences 201
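    To make the latent split concrete, here is a minimal PyTorch sketch under stated assumptions: the encoder output is split into a class-sized "disentangled" block and a free block, and on labeled batches a cross-entropy penalty stands in for the paper's equality constraint. Layer sizes, names, and the penalty weight are illustrative, and the reinforcement-learning component is omitted:

```python
# A minimal sketch of the SDVAE idea, not the authors' architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitLatentVAE(nn.Module):
    def __init__(self, x_dim=784, n_classes=10, z_dim=20):
        super().__init__()
        self.n_classes = n_classes
        latent = n_classes + z_dim                 # [disentangled | free]
        self.enc = nn.Sequential(nn.Linear(x_dim, 400), nn.ReLU())
        self.mu = nn.Linear(400, latent)
        self.logvar = nn.Linear(400, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 400), nn.ReLU(),
                                 nn.Linear(400, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        return self.dec(z), mu, logvar, z

def loss_fn(model, x, y=None, lam=1.0):
    x_hat, mu, logvar, z = model(x)
    recon = F.binary_cross_entropy_with_logits(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon + kl
    if y is not None:
        # Labeled batch: a cross-entropy penalty (a relaxation of the paper's
        # equality constraint) ties the first n_classes latent dims to y.
        loss = loss + lam * F.cross_entropy(z[:, :model.n_classes], y,
                                            reduction="sum")
    return loss
```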

    Q-learning: flexible learning about useful utilities

    Dynamic treatment regimes are fast becoming an important part of medicine, with the corresponding change in emphasis from treatment of the disease to treatment of the individual patient. Because of the limited number of trials to evaluate personally tailored treatment sequences, inferring optimal treatment regimes from observational data has increased in importance. Q-learning is a popular method for estimating the optimal treatment regime, originally in randomized trials but more recently also in observational data. Previous applications of Q-learning have largely been restricted to continuous utility end-points with linear relationships. This paper is the first attempt both to extend the framework to discrete utilities and to move the modelling of covariates from linear models to more flexible modelling using the generalized additive model (GAM) framework. Simulated data results show that the GAM-adapted Q-learning typically outperforms Q-learning with linear models and other frequently used methods based on propensity scores in terms of coverage and bias/MSE. This represents a promising step toward a more fully general Q-learning approach to estimating optimal dynamic treatment regimes.
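    For readers unfamiliar with the mechanics, a minimal sketch of two-stage Q-learning with backward induction follows. The paper fits GAMs; here a gradient-boosted regressor from scikit-learn stands in as the flexible covariate model, and the data layout and names are assumptions:

```python
# A minimal two-stage Q-learning sketch; flexible regressor as GAM stand-in.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_q_learning(X1, A1, X2, A2, Y):
    """X_t: stage-t covariates, A_t: stage-t treatments (0/1), Y: utility."""
    # Stage 2: regress the observed utility on stage-2 history and treatment.
    H2 = np.column_stack([X1, A1, X2, A2])
    q2 = GradientBoostingRegressor().fit(H2, Y)

    # Pseudo-outcome: value of the *optimal* stage-2 action per subject.
    def q2_value(a2):
        return q2.predict(np.column_stack([X1, A1, X2, np.full(len(Y), a2)]))
    v2 = np.maximum(q2_value(0), q2_value(1))

    # Stage 1: regress the pseudo-outcome on stage-1 history and treatment.
    H1 = np.column_stack([X1, A1])
    q1 = GradientBoostingRegressor().fit(H1, v2)
    return q1, q2

# The estimated regime treats at stage 1 iff q1 predicts a higher
# pseudo-outcome under A1=1 than under A1=0, and analogously at stage 2.
```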

    Human-Machine Collaborative Optimization via Apprenticeship Scheduling

    Coordinating agents to complete a set of tasks with intercoupled temporal and resource constraints is computationally challenging, yet human domain experts can solve these difficult scheduling problems using paradigms learned through years of apprenticeship. A process for manually codifying this domain knowledge within a computational framework is necessary to scale beyond the "single-expert, single-trainee" apprenticeship model. However, human domain experts often have difficulty describing their decision-making processes, making this codification laborious. We propose a new approach for capturing domain-expert heuristics through a pairwise ranking formulation. Our approach is model-free and does not require enumerating or iterating through a large state space. We empirically demonstrate that this approach accurately learns multifaceted heuristics on a synthetic data set incorporating job-shop scheduling and vehicle routing problems, as well as on two real-world data sets consisting of demonstrations of experts solving a weapon-to-target assignment problem and a hospital resource allocation problem. We also demonstrate that policies learned from human scheduling demonstrations via apprenticeship learning can substantially improve the efficiency of a branch-and-bound search for an optimal schedule. We employ this human-machine collaborative optimization technique on a variant of the weapon-to-target assignment problem. We demonstrate that this technique generates solutions substantially superior to those produced by human domain experts, at a rate up to 9.5 times faster than an optimization approach, and that it can be applied to optimally solve problems twice as complex as those solved by a human demonstrator.
    Comment: Portions of this paper were published in the Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI) in 2016 and in the Proceedings of Robotics: Science and Systems (RSS) in 2016. The paper consists of 50 pages with 11 figures and 4 tables.
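    A minimal sketch of a pairwise ranking formulation of this general kind (not the authors' exact features or model; the feature vectors are placeholders): each expert decision yields comparisons between the task the expert scheduled and every unscheduled alternative, and a classifier is trained on feature differences:

```python
# Illustrative pairwise-ranking apprenticeship sketch with assumed features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def to_pairwise(demonstrations):
    """demonstrations: list of (chosen_task_features, other_task_features[])."""
    X, y = [], []
    for chosen, others in demonstrations:
        for other in others:
            X.append(chosen - other); y.append(1)  # expert preferred 'chosen'
            X.append(other - chosen); y.append(0)  # symmetric negative pair
    return np.asarray(X), np.asarray(y)

def learn_heuristic(demonstrations):
    X, y = to_pairwise(demonstrations)
    return LogisticRegression().fit(X, y)

def pick_next(model, candidate_features):
    """Schedule the candidate the learned heuristic ranks highest; the linear
    score w . f ranks candidates because preference is monotone in it."""
    scores = candidate_features @ model.coef_.ravel()
    return int(np.argmax(scores))
```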

    New Statistical Learning Methods for Evaluating Dynamic Treatment Regimes and Optimal Dosing

    Dynamic treatment regimes (DTRs) have gained increasing interest in the field of personalized health care in the last two decades, as they provide a sequence of individualized decision rules for treating patients over time. In a DTR, treatment is adapted in response to changes in an individual's disease progression and health care history. However, specific challenges emerge when applying current DTR methods in practice. For example, a treatment decision often happens after a medical test, and is thus nested within the decision of whether a test is needed or not. Such nested test-and-treat strategies are attractive for improving cost-effectiveness. In the first project of this dissertation, we develop a Step-adjusted Tree-based Learning (SAT-Learning) method to estimate the optimal DTR within such a step-nested, multiple-stage, multiple-treatment dynamic decision framework using test-and-treat observational data. At each step within each stage, we combine a doubly robust semiparametric estimator via Augmented Inverse Probability Weighting with a tree-based reinforcement learning procedure to achieve the counterfactual optimization. SAT-Learning is robust and easy to interpret for strategies of disease screening and subsequent treatment when necessary. We applied our method to a Johns Hopkins University prostate cancer active surveillance dataset to evaluate the necessity of prostate biopsy and identify the optimal test-and-treatment regimes for prostate cancer patients. Our second project is motivated by scenarios in medical practice where one needs to decide on a patient's radiation or drug doses over time. Due to the complexity of continuous dose scales, few existing studies have extended multi-treatment decision-making methods to estimate the optimal DTR with continuous doses. We develop a new method, Kernel-Involved-Dosage-Decision learning (KIDD-Learning), which combines a kernel estimation of the dose-response function with a tree-based dose-search algorithm in a multiple-stage setting. At each stage, KIDD-Learning recursively estimates a personalized dose-response function using kernel regression and then identifies an interpretable optimal dosage regime by growing a decision tree. The application of KIDD-Learning is illustrated by evaluating dynamic dosage regimes for adaptive radiation therapy using a Michigan Medicine liver cancer dataset. In KIDD-Learning, our algorithm splits each node of a tree-based decision rule from the root node to the terminal nodes. This heuristic algorithm may fail to identify the optimal decision rule when critical tailoring variables are hidden at a seemingly uninformative parent node. Therefore, in the third project, we propose an important modification of KIDD-Learning, Stochastic Spline-Involved Tree Search (SSITS), to estimate a more robust optimal dosage regime. This new method uses a simulated annealing algorithm to stochastically search the space of tree-based decision rules. For each visited decision rule, a non-parametric smooth coefficient model is applied to estimate the dose-response function. We further implement backward induction to estimate the optimal regime from the final stage backward through the previous treatment stages. We apply SSITS to determine the optimal dosing strategy for patients treated with Warfarin, using data from the International Warfarin Pharmacogenetics Consortium.
    PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/163090/1/mingtang_1.pd
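    To illustrate the kernel step that KIDD-Learning builds on, here is a minimal Nadaraya-Watson sketch. It is an assumption-laden toy (Gaussian kernel, fixed bandwidth, synthetic data); the tree-based search over covariates and the multi-stage recursion are omitted:

```python
# Toy kernel dose-response estimate followed by a grid search for the dose
# that maximizes the estimated response.
import numpy as np

def kernel_dose_response(doses, outcomes, grid, bandwidth=0.5):
    """Nadaraya-Watson estimate of E[outcome | dose] on a dose grid."""
    d = (grid[:, None] - doses[None, :]) / bandwidth
    w = np.exp(-0.5 * d**2)                   # Gaussian kernel weights
    return (w @ outcomes) / w.sum(axis=1)     # weighted average per grid dose

rng = np.random.default_rng(0)
doses = rng.uniform(0, 10, 500)
outcomes = -(doses - 6.0) ** 2 + rng.normal(0, 2, 500)  # true peak near 6

grid = np.linspace(0, 10, 201)
curve = kernel_dose_response(doses, outcomes, grid)
print("estimated optimal dose:", grid[np.argmax(curve)])
```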