Efficient regularized isotonic regression with application to gene--gene interaction search
Isotonic regression is a nonparametric approach for fitting monotonic models
to data that has been widely studied from both theoretical and practical
perspectives. However, this approach encounters computational and statistical
overfitting issues in higher dimensions. To address both concerns, we present
an algorithm, which we term Isotonic Recursive Partitioning (IRP), for isotonic
regression based on recursively partitioning the covariate space through
solution of progressively smaller "best cut" subproblems. This creates a
regularized sequence of isotonic models of increasing model complexity that
converges to the global isotonic regression solution. The models along the
sequence are often more accurate than the unregularized isotonic regression
model because of the complexity control they offer. We quantify this complexity
control through estimation of degrees of freedom along the path. Success of the
regularized models in prediction and IRP's favorable computational properties
are demonstrated through a series of simulated and real data experiments. We
discuss application of IRP to the problem of searching for gene--gene
interactions and epistasis, and demonstrate it on data from genome-wide
association studies of three common diseases.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS504 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
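For readers unfamiliar with the underlying fitting problem, the following is a minimal sketch of one-dimensional isotonic regression using the classical pool-adjacent-violators approach. It is not the IRP algorithm described in the abstract, which partitions a multi-dimensional covariate space; the function name `pava` and the optional weight argument are illustrative assumptions.

```python
def pava(y, w=None):
    """Least-squares fit of a nondecreasing sequence to y (1-D case only).

    Illustrative sketch of pool-adjacent-violators; `pava` and `w` are
    hypothetical names, not taken from the paper.
    """
    n = len(y)
    if w is None:
        w = [1.0] * n
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand each block mean back to one fitted value per observation.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


if __name__ == "__main__":
    print(pava([1, 3, 2, 4]))
```

Adjacent observations that break the ordering are pooled into a single block whose value is their weighted mean, so the fitted sequence is piecewise constant and nondecreasing.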
Extension of CART using multiple splits under order restrictions
CART was introduced by Breiman et al. (1984) as a classification tool. It recursively divides the sample into two subpopulations by finding the best possible split with respect to an optimisation criterion. This paper extends the method, so far restricted to binary splits, to allow multiple splits as well. The main difficulty with this extension lies in choosing the optimal number of splits and the location of the corresponding cutpoints. To reduce the computational effort and enhance parsimony, reduced isotonic regression is used to solve this problem. The extended CART method was tested in a simulation study and compared with the classical approach in an epidemiological study. In both studies the extended CART turned out to be a useful and reliable alternative.
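The classical binary step that this abstract starts from can be sketched as an exhaustive search over candidate cutpoints of a single numeric covariate. This is only an illustration of the binary case, not the multiple-split extension proposed in the paper. The abstract does not name the optimisation criterion, so Gini impurity is assumed here, and the names `gini` and `best_binary_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (assumed criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(x, y):
    """Return (threshold, weighted impurity) of the best split x <= threshold.

    Returns (None, impurity of the unsplit sample) if no split improves it.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, gini(list(y)))
    for i in range(1, n):
        # Only cut between distinct covariate values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            # Place the cutpoint midway between neighbouring values.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


if __name__ == "__main__":
    print(best_binary_split([1, 2, 3, 4], [0, 0, 1, 1]))
```

Applying this search recursively to each resulting subpopulation yields a binary tree; a multiple-split variant would instead have to choose several cutpoints at once, which is the problem the abstract addresses.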
Combining isotonic regression and EM algorithm to predict genetic risk under monotonicity constraint
In certain genetic studies, clinicians and genetic counselors are interested
in estimating the cumulative risk of a disease for individuals with and without
a rare deleterious mutation. Estimating the cumulative risk is difficult,
however, when the estimates are based on family history data. Often, the
genetic mutation status in many family members is unknown; instead, only
estimated probabilities of a patient having a certain mutation status are
available. Also, ages of disease-onset are subject to right censoring. Existing
methods to estimate the cumulative risk using such family-based data only
provide estimation at individual time points, and are not guaranteed to be
monotonic or nonnegative. In this paper, we develop a novel method that
combines Expectation-Maximization and isotonic regression to estimate the
cumulative risk across the entire support. Our estimator is monotonic,
satisfies self-consistent estimating equations and has high power in detecting
differences between the cumulative risks of different populations. Application
of our estimator to a Parkinson's disease (PD) study provides the age-at-onset
distribution of PD in PARK2 mutation carriers and noncarriers, and reveals a
significant difference between the distribution in compound heterozygous
carriers compared to noncarriers, but not between heterozygous carriers and
noncarriers.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS730 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Market Segmentation Trees
We seek to provide an interpretable framework for segmenting users in a
population for personalized decision-making. The standard approach is to
perform market segmentation by clustering users according to similarities in
their contextual features, after which a "response model" is fit to each
segment to model how users respond to personalized decisions. However, this
methodology is not ideal for personalization, since two users could in theory
have similar features but different response behaviors. We propose a general
methodology, Market Segmentation Trees (MSTs), for learning interpretable
market segmentations explicitly driven by identifying differences in user
response patterns. To demonstrate the versatility of our methodology, we design
two new, specialized MST algorithms: (i) Choice Model Trees (CMTs) which can be
used to predict a user's choice amongst multiple options, and (ii) Isotonic
Regression Trees (IRTs) which can be used to solve the bid landscape
forecasting problem. We provide a customizable, open-source code base for
training MSTs in Python which employs several strategies for scalability,
including parallel processing and warm starts. We provide a theoretical
analysis of the asymptotic running time of our training method validating its
computational tractability on large datasets. We assess the practical
performance of MSTs on several synthetic and real-world datasets, showing our
method reliably finds market segmentations which accurately model response
behavior. Further, when applying MSTs to historical bidding data from a leading
demand-side platform (DSP), we show that MSTs consistently achieve a 5-29%
improvement in bid landscape forecasting accuracy over the DSP's current model.
Our findings indicate that integrating market segmentation with response
modeling consistently leads to improvements in response prediction accuracy,
thereby aiding personalization.