Random forests with random projections of the output space for high dimensional multi-label classification
We adapt the idea of random projections applied to the output space, so as to
enhance tree-based ensemble methods in the context of multi-label
classification. We show how learning time complexity can be reduced without
affecting the computational complexity or the accuracy of predictions. We also show
that random output space projections may be used to reach different
bias-variance tradeoffs over a broad panel of benchmark problems, and that
this may lead to improved accuracy while significantly reducing the
computational burden of the learning stage.
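The core idea can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Gaussian projection matrix with entries drawn from N(0, 1/m), split quality scored by variance reduction on the projected outputs only, and leaf predictions averaged in the original label space (one possible variant; decoding predictions back from the projected space is the alternative). All function names are hypothetical.

```python
import math
import random


def gaussian_projection(n_labels, m, seed=0):
    """Hypothetical helper: an m x n_labels matrix with i.i.d. N(0, 1/m) entries."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(m)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_labels)] for _ in range(m)]


def project(Y, P):
    """Map each label vector y (length n_labels) to z = P y (length m)."""
    return [[sum(p * v for p, v in zip(row, y)) for row in P] for y in Y]


def sum_sq_dev(Z):
    """Sum of squared deviations from the node mean, summed over all output dimensions."""
    if not Z:
        return 0.0
    n = len(Z)
    total = 0.0
    for col in zip(*Z):
        mean = sum(col) / n
        total += sum((v - mean) ** 2 for v in col)
    return total


def best_threshold(x, Z):
    """Threshold on one feature maximising variance reduction, scored on projected outputs Z only."""
    parent = sum_sq_dev(Z)
    best_t, best_gain = None, 0.0
    for t in sorted(set(x))[:-1]:
        left = [z for xi, z in zip(x, Z) if xi <= t]
        right = [z for xi, z in zip(x, Z) if xi > t]
        gain = parent - sum_sq_dev(left) - sum_sq_dev(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


def leaf_prediction(Y):
    """Assumed variant: a leaf averages the ORIGINAL label vectors, so no decoding from Z is needed."""
    n = len(Y)
    return [sum(col) / n for col in zip(*Y)]


# Usage: 4 samples, 1 feature, 3 binary labels projected down to m = 2 dimensions.
x = [0.1, 0.2, 0.8, 0.9]
Y = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 1, 1]]
P = gaussian_projection(n_labels=3, m=2, seed=42)
Z = project(Y, P)
threshold, gain = best_threshold(x, Z)
print(threshold, leaf_prediction(Y[:2]), leaf_prediction(Y[2:]))
```

Because the split search only ever touches the m-dimensional vectors in `Z`, its cost scales with m rather than with the number of labels, which is where the learning-time savings come from; with distinct projected label patterns, the chosen threshold separates the two label groups exactly.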
Encrypted statistical machine learning: new privacy preserving methods
We present two new statistical machine learning methods designed to learn on
fully homomorphically encrypted (FHE) data. The introduction of FHE schemes
following Gentry (2009) opens up the prospect of privacy-preserving statistical
machine learning analysis and modelling of encrypted data without compromising
security constraints. We propose tailored algorithms for two methods: extremely
random forests, using a new cryptographic stochastic fraction estimator,
and naïve Bayes, using a semi-parametric model for the class decision
boundary. We show how both can be used to learn from and predict on encrypted
data, demonstrate that these techniques perform competitively on a variety
of classification data sets, and provide detailed information about the
computational practicalities of these and other FHE methods.
Comment: 39 pages
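Why tree ensembles need tailoring for FHE can be illustrated on plaintext. FHE schemes evaluate additions and multiplications on ciphertexts but cannot branch on encrypted values, so a tree whose structure is drawn at random rather than learned from the data can be evaluated with ring operations alone. The sketch below assumes binary features and one randomly chosen feature per tree level, and computes per-leaf class counts as sums of products of 0/1 indicators. It performs no encryption and omits the paper's cryptographic stochastic fraction estimator entirely; turning counts into fractions needs division, which is not a native FHE operation, so that step is done in plaintext here. All names are hypothetical.

```python
import itertools
import random


def random_tree_features(n_features, depth, seed=0):
    """Hypothetical: pick one feature per level at random, independently of the (encrypted) data."""
    rng = random.Random(seed)
    return [rng.randrange(n_features) for _ in range(depth)]


def leaf_indicator(x, features, leaf):
    """1 if binary vector x reaches `leaf`, else 0, computed with + and * only (no comparisons)."""
    ind = 1
    for f, bit in zip(features, leaf):
        # bit == 1: path requires x[f] == 1; bit == 0: path requires x[f] == 0.
        ind *= x[f] if bit == 1 else 1 - x[f]
    return ind


def leaf_class_counts(X, Y_onehot, features):
    """Per-leaf class counts as sums of indicator * one-hot label products."""
    n_classes = len(Y_onehot[0])
    counts = {}
    for leaf in itertools.product((0, 1), repeat=len(features)):
        counts[leaf] = [
            sum(leaf_indicator(x, features, leaf) * y[c] for x, y in zip(X, Y_onehot))
            for c in range(n_classes)
        ]
    return counts


def predict_counts(x, features, counts):
    """Class counts at the leaf x reaches, as a sum over ALL leaves of indicator * counts."""
    n_classes = len(next(iter(counts.values())))
    return [
        sum(leaf_indicator(x, features, leaf) * c[k] for leaf, c in counts.items())
        for k in range(n_classes)
    ]


# Usage on plaintext bits; under FHE, X, Y_onehot and the counts would stay encrypted.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y_onehot = [[1, 0], [1, 0], [0, 1], [0, 1]]  # classes 0, 0, 1, 1
features = [0]  # fixed for reproducibility; random_tree_features(2, 1) would draw it at random
counts = leaf_class_counts(X, Y_onehot, features)
votes = predict_counts([1, 0], features, counts)
fractions = [v / sum(votes) for v in votes]  # division done in plaintext, not under FHE
```

Prediction sums over every leaf instead of following a single path, which is what keeps the evaluation free of data-dependent branching.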
Mining large-scale human mobility data for long-term crime prediction
Traditional crime prediction models based on census data are limited, as they
fail to capture the complexity and dynamics of human activity. With the rise of
ubiquitous computing, there is the opportunity to improve such models with data
that make for better proxies of human presence in cities. In this paper, we
leverage large-scale human mobility data to craft an extensive set of features for
crime prediction, informed by theories in criminology and urban studies. We
employ averaging and boosting ensemble techniques from machine learning to
investigate their power in predicting yearly counts for different types of
crimes occurring in New York City at census tract level. Our study shows that
spatial and spatio-temporal features derived from Foursquare venues and
checkins, subway rides, and taxi rides, improve the baseline models relying on
census and POI data. The proposed models achieve absolute R² values of up to
65% (on a geographical out-of-sample test set) and up to 89% (on a temporal
out-of-sample test set). This shows that, next to the residential population
of an area, the ambient population there is strongly predictive of the area's
crime levels. We deep-dive into the main crime categories, and find that the
predictive gain of the human dynamics features varies across crime types: such
features bring the biggest boost in the case of grand larcenies, whereas assaults
are already well predicted by the census features. Furthermore, we identify and
discuss top predictive features for the main crime categories. These results
offer valuable insights for those responsible for urban policy or law
enforcement.
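The two evaluation settings and the R² metric can be made concrete. The sketch below assumes a hypothetical record layout (tract identifier, year, feature vector, crime count); the abstract does not say how tracts or years were held out, so the split rules here (a chosen set of tracts for the geographical split, a cutoff year for the temporal split) are illustrative assumptions. R² is computed as 1 - SS_res / SS_tot on the held-out set, so an R² of 65% corresponds to 0.65; out of sample it can even be negative when a model does worse than predicting the test-set mean.

```python
from typing import List, NamedTuple, Sequence, Set


class Record(NamedTuple):
    tract: str          # census tract identifier (hypothetical field)
    year: int
    features: List[float]
    count: float        # yearly crime count for one crime type


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot, with SS_tot around the test-set mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def geographical_split(records: List[Record], test_tracts: Set[str]):
    """Out-of-sample in space: held-out tracts never appear in training."""
    train = [r for r in records if r.tract not in test_tracts]
    test = [r for r in records if r.tract in test_tracts]
    return train, test


def temporal_split(records: List[Record], first_test_year: int):
    """Out-of-sample in time: train on earlier years, test on later ones."""
    train = [r for r in records if r.year < first_test_year]
    test = [r for r in records if r.year >= first_test_year]
    return train, test


# Usage with toy records (invented values, for illustration only).
records = [
    Record("A", 2012, [1.0], 10.0),
    Record("A", 2013, [1.0], 12.0),
    Record("B", 2012, [2.0], 20.0),
    Record("B", 2013, [2.0], 22.0),
]
t_train, t_test = temporal_split(records, first_test_year=2013)
g_train, g_test = geographical_split(records, test_tracts={"B"})
```

The geographical split checks whether features transfer to unseen places, while the temporal split checks whether they transfer to unseen years for places already seen in training, which is one plausible reason the two settings yield different scores.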