Regression Trees for Longitudinal Data
While studying response trajectories, the population of interest is often
diverse enough that distinct subgroups exist within it, and the longitudinal
change in response may not be uniform across these subgroups. That is, the
time slope and/or the influence of covariates on the longitudinal profile may
vary among these different subgroups. For example, Raudenbush (2001) used
depression as an example to argue that it is incorrect to assume that all the
people in a given population experience either increasing or decreasing levels
of depression. In such cases, a traditional linear mixed-effects model
(assuming a common parametric form for covariates and time) is not directly
applicable to the entire population, as a group-averaged trajectory can mask
important subgroup differences. Our aim is to identify and characterize
longitudinally homogeneous subgroups based on combinations of baseline
covariates in the most parsimonious way. This goal can be achieved by
constructing a regression tree for longitudinal data using baseline covariates
as partitioning variables.
We propose the LongCART algorithm to construct such a regression tree for
longitudinal data. At each node, LongCART determines the need for further
splitting (i.e., whether any parameter of the longitudinal profile is
influenced by a baseline attribute) via parameter instability tests, so the
decision to split further is type-I-error controlled. We obtain asymptotic
results for the proposed instability test and examine the finite-sample
behavior of the whole algorithm through simulation studies. Finally, we apply
the LongCART algorithm to study longitudinal changes in choline level among
HIV patients.
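To make the node-level splitting decision concrete, here is a minimal toy
sketch of the idea. It is not the LongCART algorithm itself (which uses a
score-process-based instability test inside a linear mixed-effects model);
it substitutes a simple OLS interaction F-test, and all data and variable
names are illustrative.

```python
# Toy illustration of deciding whether to split a node on a binary
# baseline covariate: do the time slopes differ across its levels?
# This uses an OLS interaction F-test as a crude stand-in for
# LongCART's parameter instability test.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def slope_heterogeneity_pvalue(df, covariate):
    """p-value for whether the time slope varies with `covariate`."""
    full = smf.ols(f"y ~ time * {covariate}", data=df).fit()
    reduced = smf.ols(f"y ~ time + {covariate}", data=df).fit()
    return full.compare_f_test(reduced)[1]

# Simulated data: subgroup g=1 declines twice as fast as g=0.
rng = np.random.default_rng(0)
n, t = 200, 5
g = rng.integers(0, 2, n).repeat(t)
time = np.tile(np.arange(t), n)
y = 10 - (1 + g) * time + rng.normal(0, 1, n * t)
df = pd.DataFrame({"y": y, "time": time, "g": g})

# A small p-value suggests splitting this node on g, mimicking the
# type-I-error-controlled splitting decision described above.
print(f"p-value for slope heterogeneity: {slope_heterogeneity_pvalue(df, 'g'):.2e}")
```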
On Theory for BART
Ensemble learning is a statistical paradigm built on the premise that many
weak learners can perform exceptionally well when deployed collectively. The
BART method of Chipman et al. (2010) is a prominent example of Bayesian
ensemble learning, where each learner is a tree. Due to its impressive
performance, BART has received a lot of attention from practitioners. Despite
its wide popularity, however, theoretical studies of BART have begun emerging
only very recently. Laying the foundations for the theoretical analysis of
Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior
concentration under conditionally uniform tree priors. These priors deviate
from the actual priors implemented in BART. Here, we study the exact BART prior
and propose a simple modification so that it also enjoys optimality properties.
To this end, we dive into branching process theory. We obtain tail bounds for
the distribution of total progeny under heterogeneous Galton-Watson (GW)
processes, exploiting their connection to random walks. We conclude with a
result stating the optimal rate of posterior convergence for BART.
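The quantity the tail bounds concern can be seen directly by simulation. The
sketch below estimates the tail of total progeny for a subcritical GW
process; the Poisson offspring law is a common textbook choice for
illustration, not the BART tree prior itself.

```python
# Monte Carlo estimate of the tail of total progeny of a
# Galton-Watson process (subcritical: mean offspring < 1, so total
# progeny is finite a.s. with exponential-type tails).
import numpy as np

rng = np.random.default_rng(1)

def total_progeny(offspring_mean, max_total=10_000):
    """Simulate one GW process; return total nodes ever born."""
    alive, total = 1, 1
    while alive > 0 and total < max_total:
        children = rng.poisson(offspring_mean, size=alive).sum()
        total += children
        alive = children
    return total

samples = np.array([total_progeny(0.6) for _ in range(20_000)])
for k in (5, 10, 20, 40):
    print(f"P(total progeny > {k:>2}) ~ {(samples > k).mean():.4f}")
```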
BET: Bayesian Ensemble Trees for Clustering and Prediction in Heterogeneous Data
We propose a novel "tree-averaging" model that utilizes an ensemble of
classification and regression trees (CART). Each constituent tree is estimated
on a subset of similar data. We treat this grouping of subsets as Bayesian
ensemble trees (BET) and model it with a Dirichlet process infinite mixture.
We show that BET adapts to data heterogeneity and accurately estimates each
component. Compared with the bootstrap-aggregating approach, BET shows improved
prediction performance with fewer trees. We develop an efficient estimation
procedure with improved sampling strategies for both the CART and mixture
models. We demonstrate these advantages of BET with simulations, classification
of breast cancer data, and regression of lung-function measurements from cystic
fibrosis patients.
Keywords: Bayesian CART; Dirichlet Process; Ensemble Approach; Heterogeneity;
Mixture of Trees
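A crude stand-in for the tree-averaging idea is sketched below: it groups
similar rows, fits one CART per group, and averages tree predictions with
soft group weights. This is not BET's Dirichlet process mixture (the number
of groups is fixed here, not inferred); all names and data are illustrative.

```python
# Stand-in for tree averaging over groups of similar data: cluster,
# fit one CART per cluster, average predictions weighted by each
# query point's affinity to the cluster centroids.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(600, 3))
# Heterogeneous data: two regimes with different response surfaces.
regime = (X[:, 0] > 0).astype(int)
y = np.where(regime == 1, X[:, 1] ** 2, -X[:, 2]) + rng.normal(0, 0.1, 600)

k = 2
grouper = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
trees = [
    DecisionTreeRegressor(max_depth=4).fit(
        X[grouper.labels_ == j], y[grouper.labels_ == j]
    )
    for j in range(k)
]

def predict(Xq):
    # Soft weights from distance to each group centroid.
    d = np.linalg.norm(Xq[:, None, :] - grouper.cluster_centers_, axis=2)
    w = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
    preds = np.column_stack([t.predict(Xq) for t in trees])
    return (w * preds).sum(axis=1)

print(predict(X[:5]).round(2), y[:5].round(2))
```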
Causal Tree Estimation of Heterogeneous Household Response to Time-Of-Use Electricity Pricing Schemes
We examine the household-specific effects of the introduction of Time-of-Use
(TOU) electricity pricing schemes. Using a causal forest (Athey and Imbens,
2016; Wager and Athey, 2018; Athey et al., 2019), we relate past consumption
and survey variables to the effect of TOU pricing on household electricity
demand. We describe the heterogeneity in household
variables across quartiles of estimated demand response and utilise variable
importance measures.
Household-specific estimates produced by a causal forest exhibit reasonable
associations with covariates. For example, households that are younger, more
educated, and that consume more electricity, are predicted to respond more to a
new pricing scheme. In addition, variable importance measures suggest that some
aspects of past consumption information may be more useful than survey
information in producing these estimates.
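For readers who want to reproduce this kind of household-specific estimate,
here is a minimal usage sketch assuming the `econml` package (the paper
builds on the grf line of work of Athey, Imbens, and Wager; the data and
variable names below are simulated and illustrative).

```python
# Causal forest via econml: estimate household-specific effects of a
# binary treatment (TOU pricing adoption) on electricity demand.
import numpy as np
from econml.dml import CausalForestDML
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(3)
n = 2000
X = rng.normal(size=(n, 5))                  # survey + past-consumption features
T = rng.integers(0, 2, n)                    # 1 if household is on TOU pricing
tau = 0.5 * X[:, 0]                          # true heterogeneous effect
Y = X[:, 1] + tau * T + rng.normal(0, 1, n)  # electricity demand

cf = CausalForestDML(
    model_y=RandomForestRegressor(),
    model_t=RandomForestClassifier(),
    discrete_treatment=True,
    random_state=0,
)
cf.fit(Y, T, X=X)
hte = cf.effect(X)                           # household-specific estimates
print("corr(estimated, true effect):", np.corrcoef(hte, tau)[0, 1].round(2))
```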
Interpretable Clustering via Optimal Trees
State-of-the-art clustering algorithms use heuristics to partition the
feature space and provide little insight into the rationale for cluster
membership, limiting their interpretability. In healthcare applications, the
latter poses a barrier to the adoption of these methods since medical
researchers are required to provide detailed explanations of their decisions in
order to gain patient trust and limit liability. We present a new unsupervised
learning algorithm that leverages Mixed Integer Optimization techniques to
generate interpretable tree-based clustering models. Utilizing the flexible
framework of Optimal Trees, our method approximates the globally optimal
solution leading to high quality partitions of the feature space. Our
algorithm can incorporate various internal validation metrics, naturally
determines the optimal number of clusters, and is able to account for mixed
numeric and categorical data. It achieves comparable or superior performance on
both synthetic and real-world datasets when compared to K-Means, while offering
significantly higher interpretability.
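The flavor of the output (rules rather than opaque assignments) can be
conveyed with a much simpler two-step heuristic, sketched below. This is not
the paper's Mixed Integer Optimization method; it merely clusters first and
then fits a shallow tree whose leaves explain cluster membership.

```python
# Heuristic interpretable clustering: cluster with K-Means, then fit
# a shallow decision tree to the cluster labels so each cluster gets
# a short human-readable rule.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(iris.data)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, labels)
print(export_text(tree, feature_names=iris.feature_names))
print("rule/cluster agreement:", (tree.predict(iris.data) == labels).mean().round(3))
```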
Integrating Economic Knowledge in Data Mining Algorithms
The assessment of knowledge derived from databases depends on many factors. Decision makers often need to convince others about the correctness and effectiveness of knowledge induced from data. Current data mining techniques do not contribute much to this process of persuasion. Part of this limitation can be removed by integrating knowledge from experts in the field, encoded in some accessible way, with knowledge derived from patterns stored in the database. In this paper we discuss, in particular, methods for implementing monotonicity constraints in economic decision problems. This prior knowledge is combined with data mining algorithms based on decision trees and neural networks. The method is illustrated with a hedonic price model.
Keywords: knowledge; neural network; data mining; decision trees
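Monotonicity constraints of this kind are now available off the shelf in tree
learners. The sketch below shows the idea on a toy hedonic price model using
scikit-learn's gradient boosting (not the paper's decision-tree and
neural-network methods); the data are simulated and illustrative.

```python
# Encoding economic prior knowledge as monotonicity constraints:
# house price should not decrease in floor area and not increase
# in building age.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(4)
n = 1000
area = rng.uniform(30, 200, n)        # floor area in m^2
age = rng.uniform(0, 80, n)           # building age in years
price = 2 * area - 0.5 * age + rng.normal(0, 20, n)
X = np.column_stack([area, age])

# monotonic_cst: +1 forces non-decreasing in area, -1 non-increasing in age.
model = HistGradientBoostingRegressor(monotonic_cst=[1, -1]).fit(X, price)

grid = np.column_stack([np.linspace(30, 200, 5), np.full(5, 40.0)])
print(model.predict(grid).round(1))   # predictions rise with area
```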
Confidence Decision Trees via Online and Active Learning for Streaming (BIG) Data
Decision tree classifiers are a widely used tool in data stream mining. The
use of confidence intervals to estimate the gain associated with each split
leads to very effective methods, like the popular Hoeffding tree algorithm.
From a statistical viewpoint, the analysis of decision tree classifiers in a
streaming setting requires knowing when enough new information has been
collected to justify splitting a leaf. Although some of the issues in the
statistical analysis of Hoeffding trees have already been clarified, a general
and rigorous study of confidence intervals for splitting criteria is missing.
We fill this gap by deriving accurate confidence intervals to estimate the
splitting gain in decision tree learning with respect to three criteria:
entropy, Gini index, and a third index proposed by Kearns and Mansour. Our
confidence intervals depend in a more detailed way on the tree parameters. We
also extend our confidence analysis to a selective sampling setting, in which
the decision tree learner adaptively decides which labels to query in the
stream. We furnish theoretical guarantee bounding the probability that the
classification is non-optimal learning the decision tree via our selective
sampling strategy. Experiments on real and synthetic data in a streaming
setting show that our trees are indeed more accurate than trees with the same
number of leaves generated by other techniques and our active learning module
permits to save labeling cost. In addition, comparing our labeling strategy
with recent methods, we show that our approach is more robust and consistent
respect all the other techniques applied to incremental decision trees
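The classical split test that this line of work refines is easy to state in
code. The sketch below implements the standard Hoeffding-bound decision rule
(the baseline the paper sharpens with criterion-specific intervals); the
numbers in the example are made up for illustration.

```python
# Hoeffding-bound split test used by Hoeffding trees: split a leaf
# only when the gap between the two best candidate splits exceeds
# the confidence radius for the current sample size.
import math

def hoeffding_bound(value_range, delta, n):
    """Radius eps such that an empirical mean of n observations in a
    range of width value_range is within eps of the true mean with
    probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    # If the observed gap beats the radius, the leading split is the
    # truly best one with probability at least 1 - delta.
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# Entropy-based gains on binary labels lie in [0, 1].
print(should_split(0.32, 0.25, value_range=1.0, delta=1e-6, n=5000))  # True
print(should_split(0.32, 0.30, value_range=1.0, delta=1e-6, n=500))   # False
```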
Deep Embedding Forest: Forest-based Serving with Deep Embedding Features
Deep Neural Networks (DNN) have demonstrated superior ability to extract high
level embedding vectors from low level features. Despite the success, the
serving time is still the bottleneck due to expensive run-time computation of
multiple layers of dense matrices. GPGPU-, FPGA-, or ASIC-based serving systems
require additional hardware that is not part of the mainstream design of most
commercial applications. In contrast, tree- or forest-based models are widely
adopted because of low serving cost, but heavily depend on carefully engineered
features. This work proposes a Deep Embedding Forest model that benefits from
the best of both worlds. The model consists of a number of embedding layers and
a forest/tree layer. The former maps high-dimensional (hundreds of thousands to
millions of dimensions) heterogeneous low-level features to lower-dimensional
(thousands of dimensions) vectors, and the latter ensures fast serving.
Built on top of a representative DNN model called Deep Crossing, and two
forest/tree-based models including XGBoost and LightGBM, a two-step Deep
Embedding Forest algorithm is demonstrated to achieve on-par or slightly better
performance as compared with the DNN counterpart, with only a fraction of
serving time on conventional hardware. After comparison with a joint
optimization algorithm called partial fuzzification, also proposed in this
paper, it is concluded that the two-step Deep Embedding Forest achieves
near-optimal performance. Experiments based on large-scale data sets (up to 1
billion samples) from a major sponsored search engine prove the efficacy of
the proposed model.
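The two-step recipe is simple to sketch end to end. The toy below trains a
small network, reuses its penultimate layer as the embedding, and fits a
gradient-boosted forest on those embeddings for serving. It is illustrative
only, not the paper's Deep Crossing + XGBoost/LightGBM pipeline, and the
data are simulated.

```python
# Two-step "deep embedding forest" sketch: (1) train a DNN and treat
# everything before the last layer as an embedding; (2) freeze the
# embedding and serve a boosted forest on top of it.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingClassifier

torch.manual_seed(0)
rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 50)).astype(np.float32)
y = (X[:, :5].sum(axis=1) > 0).astype(np.float32)

embed = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 8), nn.ReLU())
head = nn.Linear(8, 1)
opt = torch.optim.Adam(list(embed.parameters()) + list(head.parameters()), lr=1e-2)
Xt, yt = torch.from_numpy(X), torch.from_numpy(y)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(
        head(embed(Xt)).squeeze(1), yt
    )
    loss.backward()
    opt.step()

# Step 2: extract embeddings, then fit the forest used at serving time.
with torch.no_grad():
    Z = embed(Xt).numpy()
forest = GradientBoostingClassifier(n_estimators=100).fit(Z, y.astype(int))
print("train accuracy:", forest.score(Z, y.astype(int)).round(3))
```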
Some methods for heterogeneous treatment effect estimation in high-dimensions
When devising a course of treatment for a patient, doctors often have little
quantitative evidence on which to base their decisions, beyond their medical
education and published clinical trials. Stanford Health Care alone has
millions of electronic medical records (EMRs) that are only just recently being
leveraged to inform better treatment recommendations. These data present a
unique challenge because they are high-dimensional and observational. Our goal
is to make personalized treatment recommendations based on the outcomes for
past patients similar to a new patient. We propose and analyze three methods
for estimating heterogeneous treatment effects using observational data. Our
methods perform well in simulations using a wide variety of treatment effect
functions, and we present results from applying the two most promising methods
to data from the SPRINT Data Analysis Challenge, which is based on a large
randomized trial of a treatment for high blood pressure.
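The paper proposes three specific estimators; as a generic baseline
illustration of heterogeneous-treatment-effect estimation, the sketch below
implements the common "T-learner": fit one outcome model per treatment arm
and subtract. The data are simulated, so it is not the paper's method or
results.

```python
# T-learner baseline for heterogeneous treatment effects: fit
# separate outcome models on treated and control patients, then
# estimate each patient's effect as the difference in predictions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
n = 4000
X = rng.normal(size=(n, 10))
T = rng.integers(0, 2, n)
tau = np.maximum(X[:, 0], 0)                 # true effect, varies by patient
Y = X[:, 1] + tau * T + rng.normal(0, 1, n)

mu1 = RandomForestRegressor(random_state=0).fit(X[T == 1], Y[T == 1])
mu0 = RandomForestRegressor(random_state=0).fit(X[T == 0], Y[T == 0])
tau_hat = mu1.predict(X) - mu0.predict(X)    # personalized effect estimate

print("corr(tau_hat, tau):", np.corrcoef(tau_hat, tau)[0, 1].round(2))
```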
Algorithm Runtime Prediction: Methods & Evaluation
Perhaps surprisingly, it is possible to predict how long an algorithm will
take to run on a previously unseen input, using machine learning techniques to
build a model of the algorithm's runtime as a function of problem-specific
instance features. Such models have important applications to algorithm
analysis, portfolio-based algorithm selection, and the automatic configuration
of parameterized algorithms. Over the past decade, a wide variety of techniques
have been studied for building such models. Here, we describe extensions and
improvements of existing models, new families of models, and -- perhaps most
importantly -- a much more thorough treatment of algorithm parameters as model
inputs. We also comprehensively describe new and existing features for
predicting algorithm runtime for propositional satisfiability (SAT), travelling
salesperson (TSP) and mixed integer programming (MIP) problems. We evaluate
these innovations through the largest empirical analysis of its kind, comparing
to a wide range of runtime modelling techniques from the literature. Our
experiments consider 11 algorithms and 35 instance distributions; they also
span a very wide range of SAT, MIP, and TSP instances, with the least
structured having been generated uniformly at random and the most structured
having emerged from real industrial applications. Overall, we demonstrate that
our new models yield substantially better runtime predictions than previous
approaches in terms of their generalization to new problem instances, to new
algorithms from a parameterized space, and to both simultaneously.
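The basic shape of such an empirical performance model can be sketched in a
few lines: regress log runtime on instance features and evaluate on held-out
instances. The features and data below are simulated stand-ins, far simpler
than the paper's SAT/TSP/MIP feature sets.

```python
# Minimal empirical performance model: predict log runtime of an
# algorithm from instance features with a random forest, then check
# generalization to unseen instances.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 3000
feats = rng.uniform(size=(n, 8))     # e.g. instance size, clause/variable ratio
log_rt = 3 * feats[:, 0] + feats[:, 1] ** 2 + rng.normal(0, 0.3, n)

Xtr, Xte, ytr, yte = train_test_split(feats, log_rt, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(Xtr, ytr)

# Exponentiate log-runtime predictions to recover seconds.
print("held-out R^2:", model.score(Xte, yte).round(3))
print("predicted runtimes (s):", np.exp(model.predict(Xte[:3])).round(2))
```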