Regression Trees for Longitudinal Data
While studying response trajectories, the population of interest is often
diverse enough that distinct subgroups exist within it, and the longitudinal
change in response may not be uniform across these subgroups. That is, the
time slope and/or the influence of covariates on the longitudinal profile may
vary among
these different subgroups. For example, Raudenbush (2001) used depression as an
example to argue that it is incorrect to assume that all the people in a given
population would be experiencing either increasing or decreasing levels of
depression. In such cases, the traditional linear mixed-effects model (which
assumes a common parametric form for covariates and time) is not directly
applicable to the entire population, as a group-averaged trajectory can mask
important subgroup differences. Our aim is to identify and characterize
longitudinally homogeneous subgroups based on combinations of baseline
covariates in the most parsimonious way. This goal can be achieved by
constructing a regression tree for longitudinal data with baseline covariates
as partitioning variables.
We propose the LongCART algorithm to construct a regression tree for
longitudinal data. At each node, LongCART determines the need for further
splitting (i.e., whether any parameter of the longitudinal profile is
influenced by a baseline attribute) via parameter instability tests, so the
decision to split further is type-I error controlled. We derive asymptotic
results for the proposed instability test and examine the finite-sample
behavior of the whole algorithm through simulation studies. Finally, we apply
the LongCART algorithm to study longitudinal changes in choline level among
HIV patients.
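To make the node-splitting idea concrete, here is a minimal Python sketch. It is not the paper's score-process instability test; as a simplified, type-I-error-controlled proxy it estimates a slope per subject by OLS and applies a Welch t-test across a candidate binary split of one baseline covariate. The function names and the binary-split form are illustrative assumptions.

```python
# Simplified proxy for LongCART-style splitting (NOT the paper's actual test):
# split a node only if subject-level slopes differ significantly across a
# candidate binary split of a baseline covariate.
import numpy as np
from scipy import stats

def subject_slopes(times, responses):
    """OLS slope of response on time, one per subject."""
    return np.array([np.polyfit(t, y, deg=1)[0] for t, y in zip(times, responses)])

def split_is_needed(slopes, baseline_x, threshold, alpha=0.05):
    """Should this node split on baseline_x <= threshold? Type-I error held at alpha."""
    left = slopes[baseline_x <= threshold]
    right = slopes[baseline_x > threshold]
    if min(len(left), len(right)) < 2:            # too few subjects to test
        return False, 1.0
    _, p = stats.ttest_ind(left, right, equal_var=False)   # Welch's t-test
    return p < alpha, p                           # split only if instability is significant
```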
Particle Gibbs for Bayesian Additive Regression Trees
Additive regression trees are flexible non-parametric models and popular
off-the-shelf tools for real-world non-linear regression. In application
domains such as bioinformatics, where there is also demand for probabilistic
predictions with measures of uncertainty, the Bayesian additive regression
trees (BART) model, introduced by Chipman et al. (2010), is increasingly
popular. As data sets have grown in size, however, the standard
Metropolis-Hastings algorithms used to perform inference in BART are proving
inadequate. In particular, these Markov chains make local changes to the trees
and suffer from slow mixing when the data are high-dimensional or the
best-fitting trees are more than a few layers deep. We present a novel sampler for
BART based on the Particle Gibbs (PG) algorithm (Andrieu et al., 2010) and a
top-down particle filtering algorithm for Bayesian decision trees
(Lakshminarayanan et al., 2013). Rather than making local changes to individual
trees, the PG sampler proposes a complete tree to fit the residual. Experiments
show that the PG sampler outperforms existing samplers in many settings.
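As a rough illustration of the whole-tree-per-residual structure (not the authors' PG sampler), the Python sketch below runs one Bayesian-backfitting sweep: each tree's contribution is removed, the partial residual is formed, and a complete replacement fit is proposed. A greedily fitted one-split stump stands in for the top-down particle-filter proposal; a faithful PG step would instead sample the tree from its conditional posterior. All names here are hypothetical.

```python
# Sketch of the backfitting loop the PG sampler plugs into: propose a COMPLETE
# new fit for each tree's leave-one-out residual, rather than a local
# grow/prune/change move. The stump is a deterministic stand-in for the
# particle-filter tree proposal.
import numpy as np

def fit_stump(X, r):
    """Greedy one-split regression stump on residual r (proposal stand-in)."""
    best_sse, best_pred = np.inf, None
    for j in range(X.shape[1]):
        for s in np.quantile(X[:, j], [0.25, 0.5, 0.75]):
            mask = X[:, j] <= s
            if mask.all() or not mask.any():
                continue
            pred = np.where(mask, r[mask].mean(), r[~mask].mean())
            sse = ((r - pred) ** 2).sum()
            if sse < best_sse:
                best_sse, best_pred = sse, pred
    if best_pred is None:                         # no valid split found
        return np.full_like(r, r.mean())
    return best_pred

def backfit_sweep(X, y, tree_preds):
    """One sweep over an (n_trees, n_samples) array of per-tree predictions."""
    total = tree_preds.sum(axis=0)
    for j in range(len(tree_preds)):
        residual = y - (total - tree_preds[j])    # leave tree j out
        new_pred = fit_stump(X, residual)         # whole-tree proposal
        total += new_pred - tree_preds[j]
        tree_preds[j] = new_pred
    return tree_preds
```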
Differential Performance Debugging with Discriminant Regression Trees
Differential performance debugging is a technique to find performance
problems. It applies in situations where the performance of a program is
(unexpectedly) different for different classes of inputs. The task is to
explain the differences in asymptotic performance among various input classes
in terms of program internals. We propose a data-driven technique based on
discriminant regression tree (DRT) learning, where the goal is to
discriminate among different classes of inputs. We present a new algorithm for
DRT learning that first clusters the data into functional clusters, capturing
different asymptotic performance classes, and then invokes off-the-shelf
decision tree learning algorithms to explain these clusters. We focus on linear
functional clusters and adapt classical clustering algorithms (K-means and
spectral) to produce them. For the K-means algorithm, we generalize the notion
of the cluster centroid from a point to a linear function. We adapt spectral
clustering by defining a novel kernel function to capture the notion of linear
similarity between two data points. We evaluate our approach on benchmarks
consisting of Java programs where we are interested in debugging performance.
We show that our algorithm significantly outperforms other well-known
regression tree learning algorithms in terms of running time and accuracy of
classification.
Comment: To appear in AAAI 2018
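To illustrate the centroid generalization described above (a sketch under stated assumptions, not the authors' implementation), the Python snippet below runs K-means where each cluster "centroid" is a line y ≈ a·x + b: the assignment step sends each point to the line with the smallest absolute residual, and the update step refits each cluster's line by least squares. Initialization, names, and the residual metric are illustrative choices; the spectral variant with the linear-similarity kernel is omitted.

```python
# K-means with linear-function centroids: each cluster is a line rather than
# a point, capturing a linear functional (asymptotic performance) class.
import numpy as np

def linear_kmeans(x, y, k, iters=50, seed=0):
    """Rows of `lines` are (slope, intercept); labels assign points to lines."""
    rng = np.random.default_rng(seed)
    lines = rng.normal(size=(k, 2))               # random initial lines (assumption)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Assignment: distance from point i to line c = absolute residual.
        resid = np.abs(y[None, :] - (lines[:, :1] * x[None, :] + lines[:, 1:]))
        labels = resid.argmin(axis=0)
        # Update: refit each sufficiently large cluster's line by least squares.
        for c in range(k):
            if (labels == c).sum() >= 2:
                lines[c] = np.polyfit(x[labels == c], y[labels == c], deg=1)
    return lines, labels
```

With x as input sizes and y as measured running times, the recovered slopes would separate roughly linear performance classes, which a decision tree learner can then explain in terms of program inputs.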