Axiomatic Interpretability for Multiclass Additive Models
Generalized additive models (GAMs) are favored in many regression and binary
classification problems because they are able to fit complex, nonlinear
functions while still remaining interpretable. In the first part of this paper,
we generalize a state-of-the-art GAM learning algorithm based on boosted trees
to the multiclass setting, and show that this multiclass algorithm outperforms
existing GAM learning algorithms and sometimes matches the performance of full
complexity models such as gradient boosted trees.
In the second part, we turn our attention to the interpretability of GAMs in
the multiclass setting. Surprisingly, the natural interpretability of GAMs
breaks down when there are more than two classes. Naive interpretation of
multiclass GAMs can lead to false conclusions. Inspired by binary GAMs, we
identify two axioms that any additive model must satisfy in order to not be
visually misleading. We then develop a technique called Additive
Post-Processing for Interpretability (API), that provably transforms a
pre-trained additive model to satisfy the interpretability axioms without
sacrificing accuracy. The technique works not just on models trained with our
learning algorithm, but on any multiclass additive model, including multiclass
linear and logistic regression. We demonstrate the effectiveness of API on a
12-class infant mortality dataset. Comment: KDD 201
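One reason post-processing can leave accuracy untouched is that softmax probabilities do not change when the same function of a feature is added to that feature's contribution in every class. The sketch below illustrates only this invariance; it is not the API procedure from the paper. The shape functions, the `shift_feature` helper, and the 3-class, 2-feature setup are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical shape functions: shape[k][j] is the contribution of
# feature j to the logit of class k (3 classes, 2 features).
shape = [
    [lambda x: 0.5 * x, lambda x: x * x],
    [lambda x: -0.2 * x, lambda x: 1.0 - x],
    [lambda x: 0.1 * x, lambda x: 0.3 * x],
]

def class_logits(shape_fns, x):
    """Additive model: each class logit is a sum of per-feature terms."""
    return [sum(f(xj) for f, xj in zip(fns, x)) for fns in shape_fns]

def add_fn(f, g):
    """Return the pointwise sum of two one-argument functions."""
    return lambda x: f(x) + g(x)

def shift_feature(shape_fns, j, g):
    """Add the same function g to feature j's shape function in every class."""
    shifted = []
    for fns in shape_fns:
        new_fns = list(fns)
        new_fns[j] = add_fn(fns[j], g)
        shifted.append(new_fns)
    return shifted

x = [1.5, -0.7]  # one hypothetical input
p_before = softmax(class_logits(shape, x))
shifted = shift_feature(shape, 0, lambda v: 3.0 * v - 2.0)
p_after = softmax(class_logits(shifted, x))

# The individual shape functions now look different, but the predicted
# class probabilities agree up to floating-point error.
print(max(abs(a - b) for a, b in zip(p_before, p_after)))
```

Because many different sets of shape functions yield identical predictions, a plot of any one set can mislead, which is the kind of ambiguity the interpretability axioms are meant to resolve.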
An investigation into machine learning approaches for forecasting spatio-temporal demand in ride-hailing service
In this paper, we present machine learning approaches for characterizing and
forecasting the short-term demand for on-demand ride-hailing services. We
propose the spatio-temporal estimation of the demand that is a function of
variable effects related to traffic, pricing and weather conditions. With
respect to the methodology, a single decision tree, bootstrap-aggregated
(bagged) decision trees, random forest, boosted decision trees, and an artificial
neural network for regression have been adapted and systematically compared
using various statistics, e.g. R-square, Root Mean Square Error (RMSE), and
slope. To better assess the quality of the models, they have been tested on a
real case study using the data of DiDi Chuxing, the main on-demand ride-hailing
service provider in China. In the current study, 199,584 time slots describing
the spatio-temporal ride-hailing demand have been extracted with an
aggregation interval of 10 minutes. All the methods are trained and validated
on the basis of two independent samples from this dataset. The results revealed
that boosted decision trees provide the best prediction accuracy (RMSE=16.41),
while avoiding the risk of over-fitting, followed by artificial neural network
(20.09), random forest (23.50), bagged decision trees (24.29) and single
decision tree (33.55). Comment: Currently under review for journal publicatio
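The three comparison statistics named in the abstract can be sketched as follows. The abstract does not define them, so the definitions here are assumptions: R-square is taken as the coefficient of determination, and slope as the least-squares slope of predicted on observed values. The data are invented toy numbers, not values from the DiDi Chuxing study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def slope(y_true, y_pred):
    """Least-squares slope of predicted regressed on observed (assumed definition)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var = sum((t - mean_t) ** 2 for t in y_true)
    return cov / var

# Invented toy demand counts per time slot, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(rmse(observed, predicted))
print(r_squared(observed, predicted))
print(slope(observed, predicted))
```

Under these definitions a perfect model has RMSE 0, R-square 1, and slope 1, so the three numbers together indicate both the size of the errors and any systematic under- or over-prediction.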
Localized Regression
The main problem with localized discriminant techniques is the curse of dimensionality, which seems to restrict their use to the case of few variables. This restriction does not hold if localization is combined with a reduction of dimension. In particular, it is shown that localization yields powerful classifiers even in higher dimensions if it is combined with locally adaptive selection of predictors. A robust localized logistic regression (LLR) method is developed for which all tuning parameters are chosen data-adaptively. In an extended simulation study we evaluate the potential of the proposed procedure for various types of data and compare it to other classification procedures. In addition, we demonstrate that automatic choice of localization, predictor selection and penalty parameters based on cross-validation works well. Finally, the method is applied to real data sets and its real-world performance is compared to alternative procedures.
Variable Selection and Model Choice in Structured Survival Models
In many situations, medical applications call for flexible survival models that extend the classical Cox model via the
inclusion of time-varying and nonparametric effects. These structured survival models are very flexible, but additional
difficulties arise when model choice and variable selection are desired. In particular, it has to be decided which covariates
should be assigned time-varying effects or whether parametric modeling is sufficient for a given covariate. Component-wise
boosting provides a means of likelihood-based model fitting that enables simultaneous variable selection and model choice. We
introduce a component-wise likelihood-based boosting algorithm for survival data that permits the inclusion of both parametric
and nonparametric time-varying effects as well as nonparametric effects of continuous covariates utilizing penalized splines as
the main modeling technique. Its properties
and performance are investigated in simulation studies.
The new modeling approach is used to build a flexible survival model for
intensive care patients suffering from severe sepsis.
A software implementation is available to the interested reader.
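The way component-wise boosting performs variable selection can be sketched with a deliberately simplified stand-in: squared-error loss and one linear base learner per covariate, rather than the likelihood-based survival algorithm with penalized splines described in the abstract. The function name `componentwise_l2_boost`, the step size `nu`, and the toy data are all hypothetical.

```python
def componentwise_l2_boost(X, y, n_steps=200, nu=0.1):
    """Component-wise boosting with squared-error loss (simplified sketch).

    Each step fits every covariate separately to the current residuals,
    keeps only the best-fitting one, and adds a shrunken update for it.
    Covariates that are never selected keep a coefficient of exactly 0.
    """
    n, p = len(y), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centered
    y_mean = sum(y) / n
    coef = [0.0] * p
    resid = [yi - y_mean for yi in y]

    for _ in range(n_steps):
        best_j, best_beta, best_gain = None, 0.0, 0.0
        for j in range(p):
            sxx = sum(r[j] ** 2 for r in Xc)
            if sxx == 0.0:
                continue  # constant covariate, nothing to fit
            sxr = sum(r[j] * e for r, e in zip(Xc, resid))
            beta = sxr / sxx          # least-squares fit on covariate j alone
            gain = beta * sxr         # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_beta, best_gain = j, beta, gain
        if best_j is None:
            break  # no covariate improves the fit any further
        coef[best_j] += nu * best_beta
        resid = [e - nu * best_beta * r[best_j] for r, e in zip(Xc, resid)]

    intercept = y_mean - sum(c * m for c, m in zip(coef, means))
    return intercept, coef

# Toy data: the response depends on the first covariate only.
X = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0], [5.0, 0.0]]
y = [2.0 * row[0] + 1.0 for row in X]
intercept, coef = componentwise_l2_boost(X, y)
print(intercept, coef)
```

In this toy run the second covariate is never selected, so its coefficient stays at zero. Model choice in the full method works in the same spirit, with separate base learners competing for each covariate (for example a parametric and a time-varying version).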
Mining large-scale human mobility data for long-term crime prediction
Traditional crime prediction models based on census data are limited, as they
fail to capture the complexity and dynamics of human activity. With the rise of
ubiquitous computing, there is the opportunity to improve such models with data
that make for better proxies of human presence in cities. In this paper, we
leverage large human mobility data to craft an extensive set of features for
crime prediction, as informed by theories in criminology and urban studies. We
employ averaging and boosting ensemble techniques from machine learning to
investigate their power in predicting yearly counts for different types of
crimes occurring in New York City at census tract level. Our study shows that
spatial and spatio-temporal features derived from Foursquare venues and
checkins, subway rides, and taxi rides, improve the baseline models relying on
census and POI data. The proposed models achieve absolute R^2 metrics of up to
65% (on a geographical out-of-sample test set) and up to 89% (on a temporal
out-of-sample test set). This proves that, next to the residential population
of an area, the ambient population there is strongly predictive of the area's
crime levels. We examine the main crime categories in detail, and find that the
predictive gain of the human dynamics features varies across crime types: such
features bring the biggest boost in case of grand larcenies, whereas assaults
are already well predicted by the census features. Furthermore, we identify and
discuss top predictive features for the main crime categories. These results
offer valuable insights for those responsible for urban policy or law
enforcement.