
    Efficient regularized isotonic regression with application to gene--gene interaction search

    Isotonic regression is a nonparametric approach for fitting monotonic models to data that has been widely studied from both theoretical and practical perspectives. However, this approach encounters computational difficulties and statistical overfitting in higher dimensions. To address both concerns, we present an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic regression based on recursively partitioning the covariate space through the solution of progressively smaller "best cut" subproblems. This creates a regularized sequence of isotonic models of increasing model complexity that converges to the global isotonic regression solution. The models along the sequence are often more accurate than the unregularized isotonic regression model because of the complexity control they offer. We quantify this complexity control through estimation of degrees of freedom along the path. The success of the regularized models in prediction and IRP's favorable computational properties are demonstrated through a series of simulated and real data experiments. We discuss the application of IRP to the problem of searching for gene--gene interactions and epistasis, and demonstrate it on data from genome-wide association studies of three common diseases. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/11-AOAS504.
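As background for the abstract above, the following is a minimal sketch of ordinary one-dimensional isotonic regression using the classical pool-adjacent-violators approach. It is not the IRP algorithm from the paper, which targets multi-dimensional covariates; the function name `pava` and its arguments are illustrative assumptions.

```python
def pava(y, w=None):
    """Weighted least-squares non-decreasing fit to the sequence y.

    Illustrative 1-D sketch only (pool-adjacent-violators), not the
    paper's Isotonic Recursive Partitioning method.
    """
    n = len(y)
    if w is None:
        w = [1.0] * n
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand the blocks back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit
```

For example, `pava([1, 3, 2, 4])` pools the out-of-order pair `3, 2` into their mean and leaves the other points unchanged.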

    Extension of CART using multiple splits under order restrictions

    CART was introduced by Breiman et al. (1984) as a classification tool. It recursively divides the whole sample into two subpopulations by finding the best possible split with respect to an optimisation criterion. This method, restricted to date to binary splits, is extended in this paper to allow multiple splits as well. The main problem with this extension is determining the optimal number of splits and the location of the corresponding cutpoints. To reduce the computational effort and enhance parsimony, reduced isotonic regression was used to solve this problem. The extended CART method was tested in a simulation study and compared with the classical approach in an epidemiological study. In both studies the extended CART turned out to be a useful and reliable alternative.
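To make the classical binary step concrete, here is a minimal sketch of the exhaustive search for a single best cutpoint on one covariate. The squared-error criterion and the name `best_binary_split` are assumptions chosen for illustration; the abstract does not state which optimisation criterion the authors use, and this sketch does not cover the paper's multiple-split extension.

```python
def best_binary_split(x, y):
    """Return (threshold, sse) for the single cut on x with the lowest
    total within-group sum of squared errors (an assumed criterion)."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total_sum = sum(v for _, v in pairs)
    total_sq = sum(v * v for _, v in pairs)
    best = (None, float("inf"))
    left_sum = left_sq = 0.0
    for i in range(n - 1):
        xi, yi = pairs[i]
        left_sum += yi
        left_sq += yi * yi
        if xi == pairs[i + 1][0]:
            continue  # no cut is possible between equal x values
        n_left, n_right = i + 1, n - i - 1
        right_sum = total_sum - left_sum
        right_sq = total_sq - left_sq
        # SSE of a group equals sum(v^2) - (sum(v))^2 / size.
        sse = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
        if sse < best[1]:
            # Place the threshold midway between neighbouring x values.
            best = ((xi + pairs[i + 1][0]) / 2, sse)
    return best
```

Applying the function recursively to each resulting subgroup gives the usual binary tree; a multiple-split variant would need to choose several cutpoints at once, which is the problem the paper addresses.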

    Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint

    In certain genetic studies, clinicians and genetic counselors are interested in estimating the cumulative risk of a disease for individuals with and without a rare deleterious mutation. Estimating the cumulative risk is difficult, however, when the estimates are based on family history data. Often, the genetic mutation status of many family members is unknown; instead, only estimated probabilities of a patient having a certain mutation status are available. Also, ages of disease onset are subject to right censoring. Existing methods to estimate the cumulative risk using such family-based data only provide estimation at individual time points, and are not guaranteed to be monotonic or nonnegative. In this paper, we develop a novel method that combines Expectation-Maximization and isotonic regression to estimate the cumulative risk across the entire support. Our estimator is monotonic, satisfies self-consistent estimating equations and has high power in detecting differences between the cumulative risks of different populations. Application of our estimator to a Parkinson's disease (PD) study provides the age-at-onset distribution of PD in PARK2 mutation carriers and noncarriers, and reveals a significant difference between the distributions in compound heterozygous carriers and noncarriers, but not between heterozygous carriers and noncarriers. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/14-AOAS730.

    Market Segmentation Trees

    We seek to provide an interpretable framework for segmenting users in a population for personalized decision-making. The standard approach is to perform market segmentation by clustering users according to similarities in their contextual features, after which a "response model" is fit to each segment to model how users respond to personalized decisions. However, this methodology is not ideal for personalization, since two users could in theory have similar features but different response behaviors. We propose a general methodology, Market Segmentation Trees (MSTs), for learning interpretable market segmentations explicitly driven by identifying differences in user response patterns. To demonstrate the versatility of our methodology, we design two new, specialized MST algorithms: (i) Choice Model Trees (CMTs), which can be used to predict a user's choice amongst multiple options, and (ii) Isotonic Regression Trees (IRTs), which can be used to solve the bid landscape forecasting problem. We provide a customizable, open-source code base for training MSTs in Python, which employs several strategies for scalability, including parallel processing and warm starts. We provide a theoretical analysis of the asymptotic running time of our training method, validating its computational tractability on large datasets. We assess the practical performance of MSTs on several synthetic and real-world datasets, showing that our method reliably finds market segmentations which accurately model response behavior. Further, when applying MSTs to historical bidding data from a leading demand-side platform (DSP), we show that MSTs consistently achieve a 5-29% improvement in bid landscape forecasting accuracy over the DSP's current model. Our findings indicate that integrating market segmentation with response modeling consistently leads to improvements in response prediction accuracy, thereby aiding personalization.