446,037 research outputs found

    A Multi-objective Exploratory Procedure for Regression Model Selection

    Full text link
    Variable selection is recognized as one of the most critical steps in statistical modeling. The problems encountered in engineering and social sciences are commonly characterized by over-abundance of explanatory variables, non-linearities and unknown interdependencies between the regressors. An added difficulty is that the analysts may have little or no prior knowledge on the relative importance of the variables. To provide a robust method for model selection, this paper introduces the Multi-objective Genetic Algorithm for Variable Selection (MOGA-VS) that provides the user with an optimal set of regression models for a given data-set. The algorithm considers the regression problem as a two objective task, and explores the Pareto-optimal (best subset) models by preferring those models over the other which have less number of regression coefficients and better goodness of fit. The model exploration can be performed based on in-sample or generalization error minimization. The model selection is proposed to be performed in two steps. First, we generate the frontier of Pareto-optimal regression models by eliminating the dominated models without any user intervention. Second, a decision making process is executed which allows the user to choose the most preferred model using visualisations and simple metrics. The method has been evaluated on a recently published real dataset on Communities and Crime within United States.Comment: in Journal of Computational and Graphical Statistics, Vol. 24, Iss. 1, 201

    Variable Selection for Estimating the Optimal Treatment Regimes in the Presence of a Large Number of Covariate

    Get PDF
    Most of existing methods for optimal treatment regimes, with few exceptions, focus on estimation and are not designed for variable selection with the objective of optimizing treatment decisions. In clinical trials and observational studies, often numerous baseline variables are collected and variable selection is essential for deriving reliable optimal treatment regimes. Although many variable selection methods exist, they mostly focus on selecting variables that are important for prediction (predictive variables) instead of variables that have a qualitative interaction with treatment (prescriptive variables) and hence are important for making treatment decisions. We propose a variable selection method within a general classification framework to select prescriptive variables and estimate the optimal treatment regime simultaneously. In this framework, an optimal treatment regime is equivalently defined as the one that minimizes a weighted misclassification error rate and the proposed method forward sequentially select prescriptive variables by minimizing this weighted misclassification error. A main advantage of this method is that it specifically targets selection of prescriptive variables and in the meantime is able to exploit predictive variables to improve performance. The method can be applied to both single- and multiple- decision point setting. The performance of the proposed method is evaluated by simulation studies and application to an clinical trial

    Bandit Problems with Side Observations

    Full text link
    An extension of the traditional two-armed bandit problem is considered, in which the decision maker has access to some side information before deciding which arm to pull. At each time t, before making a selection, the decision maker is able to observe a random variable X_t that provides some information on the rewards to be obtained. The focus is on finding uniformly good rules (that minimize the growth rate of the inferior sampling time) and on quantifying how much the additional information helps. Various settings are considered and for each setting, lower bounds on the achievable inferior sampling time are developed and asymptotically optimal adaptive schemes achieving these lower bounds are constructed.Comment: 16 pages, 3 figures. To be published in the IEEE Transactions on Automatic Contro

    Concordance and value information criteria for optimal treatment decision

    Get PDF
    Personalized medicine is a medical procedure that receives considerable scientific and commercial attention. The goal of personalized medicine is to assign the optimal treatment regime for each individual patient, according to his/her personal prognostic information. When there are a large number of pretreatment variables, it is crucial to identify those important variables that are necessary for treatment decision making. In this paper, we study two information criteria: the concordance and value information criteria, for variable selection in optimal treatment decision making. We consider both fixedp and high dimensional settings, and show our information criteria are consistent in model/tuning parameter selection. We further apply our information criteria to four estimation approaches, including robust learning, concordance-assisted learning, penalized A-learning, and sparse concordance-assisted learning, and demonstrate the empirical performance of our methods by simulations

    Optimal randomized and non-randomized procedures for multinomial selection problems

    Get PDF
    Multinomial selection problem procedures are ranking and selection techniques that aim to select the best (most probable) alternative based upon a sequence of multinomial observations. The classical formulation of the procedure design problem is to find a decision rule for terminating sampling. The decision rule should minimize the expected number of observations taken while achieving a specified indifference zone requirement on the prior probability of making a correct selection when the alternative configurations are in a particular subset of the probability space called the preference zone. We study the constrained version of the design problem in which there is a given maximum number of allowed observations. Numerous procedures have been proposed over the past 50 years, all of them suboptimal. In this thesis, we find via linear programming the optimal selection procedure for any given probability configuration. The optimal procedure turns out to be necessarily randomized in many cases. We also find via mixed integer programming the optimal non-randomized procedure. We demonstrate the performance of the methodology on a number of examples. We then reformulate the mathematical programs to make them more efficient to implement, thereby significantly expanding the range of computationally feasible problems. We prove that there exists an optimal policy which has at most one randomized decision point and we develop a procedure for finding such a policy. We also extend our formulation to replicate existing procedures. Next, we show that there is very little difference between the relative performances of the optimal randomized and non-randomized procedures. Additionally, we compare existing procedures using the optimal procedure as a benchmark, and produce updated tables for a number of those procedures. Then, we develop a methodology that guarantees the optimal randomized and non-randomized procedures for a broad class of variable observation cost functions -- the first of its kind. We examine procedure performance under a variety of cost functions, demonstrating that incorrect assumptions regarding marginal observation costs may lead to increased total costs. Finally, we investigate and challenge key assumptions concerning the indifference zone parameter and the conditional probability of correct selection, revealing some interesting implications.PhDCommittee Co-Chair: Goldsman, David; Committee Co-Chair: Tovey, Craig; Committee Member: Alexopoulos, Christos; Committee Member: Kleywegt, Anton; Committee Member: Sanchez, Susa

    Variable selection for optimal treatment decision

    Get PDF
    In decision-making on optimal treatment strategies, it is of great importance to identify variables that are involved in the decision rule, i.e. those interacting with the treatment. Effective variable selection helps to improve the prediction accuracy and enhance the interpretability of the decision rule. We propose a new penalized regression framework which can simultaneously estimate the optimal treatment strategy and identify important variables. The advantages of the new approach include: (i) it does not require the estimation of the baseline mean function of the response, which greatly improves the robustness of the estimator; (ii) the convenient loss-based framework makes it easier to adopt shrinkage methods for variable selection, which greatly facilitates implementation and statistical inferences for the estimator. The new procedure can be easily implemented by existing state-of-art software packages like LARS. Theoretical properties of the new estimator are studied. Its empirical performance is evaluated using simulation studies and further illustrated with an application to an AIDS clinical trial

    High-dimensional A-learning for optimal dynamic treatment regimes

    Get PDF
    Precision medicine is a medical paradigm that focuses on finding the most effective treatment decision based on individual patient information. For many complex diseases, such as cancer, treatment decisions need to be tailored over time according to patients' responses to previous treatments. Such an adaptive strategy is referred as a dynamic treatment regime. A major challenge in deriving an optimal dynamic treatment regime arises when an extraordinary large number of prognostic factors, such as patient's genetic information, demographic characteristics, medical history and clinical measurements over time are available, but not all of them are necessary for making treatment decision. This makes variable selection an emerging need in precision medicine. In this paper, we propose a penalized multi-stage A-learning for deriving the optimal dynamic treatment regime when the number of covariates is of the nonpolynomial (NP) order of the sample size. To preserve the double robustness property of the A-learning method, we adopt the Dantzig selector, which directly penalizes the A-leaning estimating equations. Oracle inequalities of the proposed estimators for the parameters in the optimal dynamic treatment regime and error bounds on the difference between the value functions of the estimated optimal dynamic treatment regime and the true optimal dynamic treatment regime are established. Empirical performance of the proposed approach is evaluated by simulations and illustrated with an application to data from the STAR∗D study

    Optimal Design of Experiments for Dual-Response Systems

    Get PDF
    abstract: The majority of research in experimental design has, to date, been focused on designs when there is only one type of response variable under consideration. In a decision-making process, however, relying on only one objective or criterion can lead to oversimplified, sub-optimal decisions that ignore important considerations. Incorporating multiple, and likely competing, objectives is critical during the decision-making process in order to balance the tradeoffs of all potential solutions. Consequently, the problem of constructing a design for an experiment when multiple types of responses are of interest does not have a clear answer, particularly when the response variables have different distributions. Responses with different distributions have different requirements of the design. Computer-generated optimal designs are popular design choices for less standard scenarios where classical designs are not ideal. This work presents a new approach to experimental designs for dual-response systems. The normal, binomial, and Poisson distributions are considered for the potential responses. Using the D-criterion for the linear model and the Bayesian D-criterion for the nonlinear models, a weighted criterion is implemented in a coordinate-exchange algorithm. The designs are evaluated and compared across different weights. The sensitivity of the designs to the priors supplied in the Bayesian D-criterion is explored in the third chapter of this work. The final section of this work presents a method for a decision-making process involving multiple objectives. There are situations where a decision-maker is interested in several optimal solutions, not just one. These types of decision processes fall into one of two scenarios: 1) wanting to identify the best N solutions to accomplish a goal or specific task, or 2) evaluating a decision based on several primary quantitative objectives along with secondary qualitative priorities. Design of experiment selection often involves the second scenario where the goal is to identify several contending solutions using the primary quantitative objectives, and then use the secondary qualitative objectives to guide the final decision. Layered Pareto Fronts can help identify a richer class of contenders to examine more closely. The method is illustrated with a supersaturated screening design example.Dissertation/ThesisDoctoral Dissertation Industrial Engineering 201
    • …
    corecore