
    A robust test for the stationarity assumption in sequential decision making

    Reinforcement learning (RL) is a powerful technique that allows an autonomous agent to learn an optimal policy that maximizes the expected return. The optimality of various RL algorithms relies on the stationarity assumption, which requires time-invariant state transition and reward functions. However, deviations from stationarity over extended periods often occur in real-world applications such as robotics control, health care and digital marketing, so policies learned under the stationarity assumption can be suboptimal. In this paper, we propose a model-based doubly robust procedure for testing the stationarity assumption and detecting change points in offline RL settings with a certain degree of homogeneity. Our proposed testing procedure is robust to model misspecification and effectively controls the type-I error while achieving high statistical power, especially in high-dimensional settings. Extensive comparative simulations and a real-world interventional mobile health example illustrate the advantages of our method in detecting change points and optimizing long-term rewards in high-dimensional, non-stationary environments.
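    The abstract does not spell out the doubly robust test statistic, so the sketch below shows only a much simpler, classical idea in the same spirit: scanning candidate change points in a sequence of observed rewards with a CUSUM-type mean-shift statistic. The function name `cusum_change_point` and the single-mean-shift setting are assumptions for illustration; the statistic is not variance-standardized and ignores states and actions, unlike the paper's model-based procedure.

```python
import math


def cusum_change_point(rewards):
    """Locate the most likely single mean shift in a reward sequence.

    Illustrative CUSUM scan (not the paper's doubly robust test): for every
    split k (1 <= k < n) it computes
    sqrt(k * (n - k) / n) * |mean(rewards[:k]) - mean(rewards[k:])|
    and returns the split with the largest value together with that value.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two rewards")
    total = sum(rewards)
    prefix = 0.0
    best_k, best_stat = None, -1.0
    for k in range(1, n):
        prefix += rewards[k - 1]  # running sum of rewards[:k]
        left_mean = prefix / k
        right_mean = (total - prefix) / (n - k)
        stat = math.sqrt(k * (n - k) / n) * abs(left_mean - right_mean)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat


# Rewards jump from 0 to 1 halfway through the sequence.
print(cusum_change_point([0.0] * 50 + [1.0] * 50))  # (50, 5.0)
```

    In practice the maximal statistic would be compared with a critical value (for example from a bootstrap) before declaring a change point; that calibration step is omitted here.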

    On testing conditional qualitative treatment effects

    Precision medicine is an emerging medical paradigm that focuses on finding the most effective treatment strategy tailored to individual patients. In the literature, most existing works have focused on estimating the optimal treatment regime, while less attention has been devoted to hypothesis testing regarding the optimal treatment regime. In this paper, we first introduce the notion of conditional qualitative treatment effects (CQTE) of a set of variables given another set of variables and provide a class of equivalent representations for the null hypothesis of no CQTE. The proposed definition of CQTE does not assume any parametric form for the optimal treatment rule and plays an important role in assessing the incremental value of a set of new variables in optimal treatment decision making, conditional on an existing set of prescriptive variables. We then propose novel testing procedures for no CQTE based on kernel estimation of the conditional contrast functions. We show that our test statistics have asymptotically correct size and nonnegligible power against some nonstandard local alternatives. The empirical performance of the proposed tests is evaluated through simulations and an application to an AIDS data set.
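    The CQTE tests rely on kernel estimates of conditional contrast functions. As a rough illustration only, the sketch below estimates the contrast tau(x) = E[Y | X = x, A = 1] - E[Y | X = x, A = 0] at a point with Nadaraya-Watson smoothing, for a single covariate and a binary treatment. The Gaussian kernel, the bandwidth of 0.5 and all function names are assumptions, and the test statistic itself is not implemented. A sign change of the estimated contrast across x is what a qualitative treatment effect looks like.

```python
import math


def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys around the point x0."""
    weights = [gaussian_kernel((x - x0) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations near x0; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total


def contrast_estimate(x0, X, A, Y, bandwidth=0.5):
    """Estimate E[Y | X=x0, A=1] - E[Y | X=x0, A=0] by smoothing each arm separately."""
    arm1 = [(x, y) for x, a, y in zip(X, A, Y) if a == 1]
    arm0 = [(x, y) for x, a, y in zip(X, A, Y) if a == 0]
    m1 = nadaraya_watson(x0, [x for x, _ in arm1], [y for _, y in arm1], bandwidth)
    m0 = nadaraya_watson(x0, [x for x, _ in arm0], [y for _, y in arm0], bandwidth)
    return m1 - m0


# Toy design: treatment helps when x > 0 and harms when x < 0.
X = [i / 10 for i in range(-20, 21)]
A = [1 if i % 2 == 0 else 0 for i in range(-20, 21)]
Y = [x if a == 1 else 0.0 for x, a in zip(X, A)]
print(contrast_estimate(-1.0, X, A, Y) < 0 < contrast_estimate(1.0, X, A, Y))  # True
```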

    A Review of Off-Policy Evaluation in Reinforcement Learning

    Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has recently been applied to solve a number of challenging problems. In this paper, we primarily focus on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We discuss the efficiency bound of OPE, some existing state-of-the-art OPE methods and their statistical properties, and other related research directions that are currently being actively explored. Comment: Still under revision
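    As a concrete reference point for the OPE methods surveyed here, a minimal sketch of the classical trajectory-wise importance sampling estimator follows: each logged trajectory's discounted return is reweighted by the product of target-to-behavior action probability ratios. The function names, the (state, action, reward) tuple layout and the assumption that behavior probabilities are known and positive are illustrative choices; the state-of-the-art estimators discussed in the review are not implemented here.

```python
def importance_sampling_value(trajectories, target_prob, behavior_prob, gamma=1.0):
    """Trajectory-wise importance sampling estimate of a target policy's value.

    trajectories: list of trajectories, each a list of (state, action, reward).
    target_prob(a, s), behavior_prob(a, s): action probabilities under each policy.
    """
    estimates = []
    for trajectory in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for state, action, reward in trajectory:
            weight *= target_prob(action, state) / behavior_prob(action, state)
            ret += discount * reward
            discount *= gamma
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)


# Two-step toy problem: the behavior policy picks action 0 or 1 uniformly,
# the target policy always picks 1, and the reward equals the action taken.
def behavior(action, state):
    return 0.5


def target(action, state):
    return 1.0 if action == 1 else 0.0


logged = [
    [(0, a1, float(a1)), (1, a2, float(a2))]
    for a1 in (0, 1)
    for a2 in (0, 1)
]
print(importance_sampling_value(logged, target, behavior))  # 2.0
```

    The estimate is unbiased when the behavior probabilities are correct, but the product of ratios makes its variance grow quickly with the horizon, which is one motivation for the more refined estimators the review discusses.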

    Concordance and value information criteria for optimal treatment decision

    Personalized medicine is a medical approach that receives considerable scientific and commercial attention. The goal of personalized medicine is to assign the optimal treatment regime to each individual patient according to his/her personal prognostic information. When there are a large number of pretreatment variables, it is crucial to identify the important variables that are necessary for treatment decision making. In this paper, we study two information criteria, the concordance and value information criteria, for variable selection in optimal treatment decision making. We consider both fixed-p and high-dimensional settings and show that our information criteria are consistent in model/tuning parameter selection. We further apply our information criteria to four estimation approaches, namely robust learning, concordance-assisted learning, penalized A-learning and sparse concordance-assisted learning, and demonstrate the empirical performance of our methods through simulations.

    Sure Screening for Transelliptical Graphical Models

    We propose a sure screening approach for recovering the structure of a transelliptical graphical model in the high-dimensional setting. We estimate the partial correlation graph by thresholding the elements of an estimator of the sample correlation matrix obtained using Kendall's tau statistic. Under a simple assumption on the relationship between the correlation and partial correlation graphs, we show that, with high probability, the estimated edge set contains the true edge set and the size of the estimated edge set is controlled. We develop a threshold value that allows for control of the expected false positive rate. In simulations and on an equities data set, we show that transelliptical graphical sure screening performs competitively with more computationally demanding techniques for graph estimation. Comment: The paper won the David Byar travel award at the Joint Statistical Meetings (JSM) 201
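    A minimal sketch of the screening step follows. It assumes the common rank-based estimator for transelliptical models, in which each pairwise Kendall's tau is mapped to a latent correlation via sin(pi/2 * tau), and it keeps every pair whose absolute estimate exceeds a user-chosen threshold. The threshold value, the tie-free tau-a formula and the function names are assumptions; the paper's rule for choosing the threshold to control the expected false positive rate is not reproduced.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a (no tie correction) between two equal-length samples."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
    return 2.0 * s / (n * (n - 1))


def latent_correlation(columns):
    """Rank-based correlation matrix: R[j][k] = sin(pi / 2 * tau_jk)."""
    p = len(columns)
    R = [[1.0] * p for _ in range(p)]
    for j, k in combinations(range(p), 2):
        R[j][k] = R[k][j] = math.sin(math.pi / 2 * kendall_tau(columns[j], columns[k]))
    return R


def screen_edges(R, threshold):
    """Keep every pair (j, k) whose estimated correlation exceeds the threshold in absolute value."""
    return {(j, k) for j, k in combinations(range(len(R)), 2) if abs(R[j][k]) > threshold}


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # monotone in x
z = [5, 4, 3, 2, 1]   # reversed x
w = [3, 1, 5, 2, 4]   # weakly related to x
print(sorted(screen_edges(latent_correlation([x, y, z, w]), threshold=0.5)))
# [(0, 1), (0, 2), (1, 2)]
```

    Because Kendall's tau depends only on ranks, the estimate is unchanged by monotone transformations of each variable, which is why it suits transelliptical (copula-type) models.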

    Blessing from Experts: Super Reinforcement Learning in Confounded Environments

    We introduce super reinforcement learning in the batch setting, which takes the observed action as an input for enhanced policy learning. In the presence of unmeasured confounders, the recommendations of human experts recorded in the observed data allow us to recover certain unobserved information. By including this information in the policy search, the proposed super reinforcement learning yields a super-policy that is guaranteed to outperform both the standard optimal policy and the behavior policy (e.g., the expert's recommendation). Furthermore, to address the issue of unmeasured confounding in finding super-policies, we establish a number of nonparametric identification results. Finally, we develop two super-policy learning algorithms and derive their corresponding finite-sample regret guarantees.

    Breaking the curse of nonregularity with subagging: inference of the mean outcome under optimal treatment regimes

    Precision medicine is an emerging medical approach that allows physicians to select treatment options based on individual patient information. The goal of precision medicine is to identify the optimal treatment regime (OTR) that yields the most favorable clinical outcome. Before adopting any OTR in clinical practice, it is crucial to know the impact of implementing such a policy. Although considerable research has been devoted to estimating the OTR, less attention has been paid to statistical inference for the OTR. Challenges arise in nonregular cases where the OTR is not uniquely defined. To deal with nonregularity, we develop a novel inference method for the mean outcome under an OTR (the optimal value function) based on subsample aggregating (subagging). The proposed method can be applied to multi-stage studies where treatments are sequentially assigned over time. Bootstrap aggregating (bagging) and subagging have been recognized as effective variance reduction techniques for improving unstable estimators or classifiers (Bühlmann and Yu, 2002). However, it remains unknown whether these approaches can yield valid inference. We show that the proposed confidence interval (CI) for the optimal value function achieves nominal coverage. In addition, owing to the variance reduction effect of subagging, our method enjoys certain statistical optimality. Specifically, we show that the mean squared error of the proposed value estimator is strictly smaller than that of the simple sample-splitting estimator in nonregular cases. Moreover, under certain conditions, our proposed CI is shown to be shorter on average than CIs constructed using the existing state-of-the-art method (Luedtke and van der Laan, 2016) and the “oracle” method, which works as well as if an OTR were known. Extensive numerical studies are conducted to support our theoretical findings.
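    Subagging itself is simple: the estimator is recomputed on many subsamples drawn without replacement and the results are averaged. The sketch below shows only that aggregation step for a generic estimator; the subsample size, the number of subsamples, the seed and the function names are assumptions, and the paper's confidence interval for the optimal value function is not reproduced. The toy functional max(0, mean) is a hypothetical stand-in for a nonregular target, chosen because it has a kink at zero.

```python
import random


def subagging(data, estimator, subsample_size, n_subsamples=200, seed=0):
    """Average `estimator` over subsamples drawn without replacement."""
    if not 0 < subsample_size <= len(data):
        raise ValueError("subsample_size must be between 1 and len(data)")
    rng = random.Random(seed)
    estimates = [estimator(rng.sample(data, subsample_size)) for _ in range(n_subsamples)]
    return sum(estimates) / len(estimates)


def positive_part_of_mean(sample):
    """Hypothetical nonregular functional: max(0, mean), kinked at zero."""
    return max(0.0, sum(sample) / len(sample))


data = [-1.0, 1.0] * 50  # the true mean is exactly zero, the nonregular point
print(positive_part_of_mean(data))  # 0.0 (single full-sample fit)
print(subagging(data, positive_part_of_mean, subsample_size=50))
```

    Averaging over subsamples smooths the hard kink of the single-fit estimator; the sketch makes no claim about coverage or about the paper's choice of subsample size.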

    Conformal off-policy prediction

    Off-policy evaluation is critical in a number of applications where new policies need to be evaluated offline before online deployment. Most existing methods focus on the expected return, define the target parameter through averaging and provide only a point estimator. In this paper, we develop a novel procedure to produce reliable interval estimators for a target policy’s return starting from any initial state. Our proposal accounts for the variability of the return around its expectation, focuses on the individual effect and offers valid uncertainty quantification. Our main idea lies in designing a pseudo policy that generates subsamples as if they were sampled from the target policy, so that existing conformal prediction algorithms become applicable to prediction interval construction. Our methods are supported by theory, synthetic data and real data from short-video platforms.
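    The paper's key step is a pseudo policy that makes conformal prediction applicable despite the shift from the behavior policy to the target policy; that construction is not reproduced here. As background, a minimal sketch of standard split conformal prediction with absolute-residual scores is shown, assuming exchangeable calibration data and an already available point prediction; the function name and argument layout are hypothetical.

```python
import math


def split_conformal_interval(calib_preds, calib_targets, new_pred, alpha=0.1):
    """Split conformal prediction interval with absolute-residual scores.

    Under exchangeability, the interval covers a new target with
    probability at least 1 - alpha.
    """
    scores = sorted(abs(y - f) for f, y in zip(calib_preds, calib_targets))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return (-math.inf, math.inf)  # too few calibration points for this alpha
    q = scores[k - 1]
    return (new_pred - q, new_pred + q)


# 19 calibration residuals 1, 2, ..., 19; alpha = 0.1 selects the 18th smallest.
print(split_conformal_interval([0.0] * 19, list(range(1, 20)), new_pred=5.0))
# (-13.0, 23.0)
```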

    A sparse random projection-based test for overall qualitative treatment effects

    In contrast to the classical “one-size-fits-all” approach, precision medicine customizes individualized treatment regimes to account for patients’ heterogeneity in response to treatments. Most existing works in the literature have focused on estimating optimal individualized treatment regimes, while less attention has been devoted to testing for the existence of overall qualitative treatment effects, especially when there are a large number of prognostic covariates. When covariates do not have qualitative treatment effects, the optimal treatment regime assigns the same treatment to all patients regardless of their covariate values. In this article, we consider testing the overall qualitative treatment effects of patients’ prognostic covariates in a high-dimensional setting. We propose a sample splitting method to construct the test statistic, based on a nonparametric estimator of the contrast function. When the dimension of the covariates is large, we construct the test based on sparse random projections of the covariates into a low-dimensional space. We prove the consistency of our test statistic. In the regular cases, we show that the asymptotic power function of our test statistic is the same as that of the “oracle” test statistic, which is constructed based on the “optimal” projection matrix. Simulation studies and real data applications validate our theoretical findings. Supplementary materials for this article are available online.
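    The abstract does not specify the distribution of the sparse random projections. The sketch below assumes the widely used sparse construction popularized by Achlioptas, in which each entry equals +sqrt(s), 0 or -sqrt(s) with probabilities 1/(2s), 1 - 1/s and 1/(2s); the sparsity parameter s = 3, the seed and the function names are assumptions, and the test statistic built on the projected covariates is not implemented.

```python
import math
import random


def sparse_projection_matrix(p, d, s=3.0, seed=0):
    """d x p matrix with i.i.d. entries sqrt(s) * {+1, 0, -1} w.p. {1/(2s), 1 - 1/s, 1/(2s)}.

    Each entry has mean 0 and variance 1, while most entries are exactly zero.
    """
    rng = random.Random(seed)
    scale = math.sqrt(s)
    matrix = []
    for _ in range(d):
        row = []
        for _ in range(p):
            u = rng.random()
            if u < 1 / (2 * s):
                row.append(scale)
            elif u < 1 / s:
                row.append(-scale)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix


def project(matrix, x):
    """Map a p-dimensional covariate vector to d dimensions."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


P = sparse_projection_matrix(p=100, d=3)
z = project(P, [1.0] * 100)
print(len(z))  # 3
```

    Sparsity keeps each projected coordinate a combination of only a few covariates, which makes the projection cheap to compute when p is large.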

    A massive data framework for M-estimators with cubic-rate

    The divide-and-conquer method is a common strategy for handling massive data. In this article, we study the divide-and-conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing the asymptotic distribution of the aggregated M-estimators using a weighted average with weights depending on the subgroup sample sizes. Under certain conditions on the growth rate of the number of subgroups, the resulting aggregated estimators are shown to have a faster convergence rate and an asymptotic normal distribution, which makes them more tractable in both computation and inference than the original M-estimators based on pooled data. Our theory applies to a wide class of M-estimators with a cube-root convergence rate, including the location estimator, the maximum score estimator and the value search estimator. Simulations and a real data application also validate our theoretical findings. Supplementary materials for this article are available online.
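    The aggregation step described above can be sketched as follows. The abstract says only that the weights depend on the subgroup sample sizes, so the weights n_j / N used here are an assumption, as are the function names. As the per-subgroup estimator, the sketch uses a mode-type location estimator (the center of the fixed-width window containing the most observations), a standard example of a cube-root-rate M-estimator; whether it matches the paper's location estimator is also an assumption.

```python
import bisect


def window_location(sample, half_width=1.0):
    """Center of the window [t - h, t + h] that contains the most observations."""
    xs = sorted(sample)
    best_count, best_center = -1, None
    for i, left in enumerate(xs):
        # Observations in [left, left + 2h]; with tied values the first index
        # gives the largest count and is visited first.
        count = bisect.bisect_right(xs, left + 2 * half_width) - i
        if count > best_count:
            best_count, best_center = count, left + half_width
    return best_center


def divide_and_conquer(subgroups, estimator):
    """Weighted average of subgroup estimates with (assumed) weights n_j / N."""
    sizes = [len(group) for group in subgroups]
    total = sum(sizes)
    return sum(n / total * estimator(group) for n, group in zip(sizes, subgroups))


groups = [[0.0, 0.5, 1.0, 5.0], [2.0, 2.5, 3.0]]
print([window_location(g) for g in groups])  # [1.0, 3.0]
print(divide_and_conquer(groups, window_location))  # about 1.857 (= 13/7)
```

    Each subgroup estimate needs only its own data, so the subgroup fits can run in parallel and only one number per subgroup has to be communicated for aggregation.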