Mining large-scale human mobility data for long-term crime prediction
Traditional crime prediction models based on census data are limited, as they
fail to capture the complexity and dynamics of human activity. With the rise of
ubiquitous computing, there is the opportunity to improve such models with data
that make for better proxies of human presence in cities. In this paper, we
leverage large human mobility data to craft an extensive set of features for
crime prediction, as informed by theories in criminology and urban studies. We
employ averaging and boosting ensemble techniques from machine learning to
investigate their power in predicting yearly counts for different types of
crimes occurring in New York City at census tract level. Our study shows that
spatial and spatio-temporal features derived from Foursquare venues and
check-ins, subway rides, and taxi rides improve on the baseline models relying on
census and POI data. The proposed models achieve absolute R^2 metrics of up to
65% (on a geographical out-of-sample test set) and up to 89% (on a temporal
out-of-sample test set). This proves that, next to the residential population
of an area, the ambient population there is strongly predictive of the area's
crime levels. We deep-dive into the main crime categories, and find that the
predictive gain of the human dynamics features varies across crime types: such
features bring the biggest boost in case of grand larcenies, whereas assaults
are already well predicted by the census features. Furthermore, we identify and
discuss top predictive features for the main crime categories. These results
offer valuable insights for those responsible for urban policy or law
enforcement.
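As a rough illustration of the R^2 figures quoted above, the sketch below computes an out-of-sample coefficient of determination in plain Python. It assumes the usual definition, 1 - SS_res / SS_tot, with the total sum of squares taken around the mean of the held-out targets; the abstract does not say which convention the authors used, and the function name r_squared and the toy counts are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot, on a held-out set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the held-out targets around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical yearly crime counts for four held-out census tracts.
observed = [12, 30, 7, 51]
predicted = [15, 26, 9, 47]
print(f"out-of-sample R^2 = {r_squared(observed, predicted):.3f}")
```

A value of 1 means the held-out counts are reproduced exactly, and 0 means the model does no better than predicting the mean of the held-out counts.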
An Extended Laplace Approximation Method for Bayesian Inference of Self-Exciting Spatial-Temporal Models of Count Data
Self-Exciting models are statistical models of count data where the
probability of an event occurring is influenced by the history of the process.
In particular, self-exciting spatio-temporal models allow for spatial
dependence as well as temporal self-excitation. For large spatial or temporal
regions, however, the model leads to an intractable likelihood. An increasingly
common method for dealing with large spatio-temporal models is by using Laplace
approximations (LA). This method is convenient as it can easily be applied and
is quickly implemented. However, as we will demonstrate in this manuscript,
when applied to self-exciting Poisson spatial-temporal models, Laplace
Approximations result in a significant bias in estimating some parameters. Due
to this bias, we propose using up to sixth-order corrections to the LA for
fitting these models. We will demonstrate how to do this in a Bayesian setting
for Self-Exciting Spatio-Temporal models. We will further show there is a
limited parameter space where the extended LA method still has bias. In these
uncommon instances, we will demonstrate how a more computationally intensive,
fully Bayesian approach using the Stan software program can be used instead.
The performance of the extended LA method is illustrated with
both simulated and real-world data.
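To make the idea of correcting a Laplace approximation concrete, here is a toy sketch in Python. It is not the paper's sixth-order scheme or its spatio-temporal model: it applies the Laplace approximation to the one-dimensional integral of x^n e^(-x) over (0, inf), whose exact value is n!, and then adds the first higher-order correction term. The function names are hypothetical.

```python
import math


def laplace_gamma_integral(n):
    """Laplace approximation to the integral of x**n * exp(-x) over (0, inf).

    With h(x) = n*log(x) - x, the mode is x = n and h''(n) = -1/n, so the
    approximation is exp(h(n)) * sqrt(2*pi*n), i.e. Stirling's formula.
    """
    return math.exp(n * math.log(n) - n) * math.sqrt(2.0 * math.pi * n)


def corrected_laplace_gamma_integral(n):
    """Same approximation times the first higher-order correction, 1 + 1/(12 n)."""
    return laplace_gamma_integral(n) * (1.0 + 1.0 / (12.0 * n))


n = 10
exact = math.factorial(n)  # the integral equals n! for integer n
for name, value in [
    ("plain", laplace_gamma_integral(n)),
    ("corrected", corrected_laplace_gamma_integral(n)),
]:
    print(f"{name:9s} relative error = {abs(value / exact - 1.0):.2e}")
```

In this toy case the plain approximation underestimates the integral by a factor of roughly 1/(12n), and the single correction term removes most of that bias. This is the same kind of effect the abstract describes, although there the corrections are carried to higher order.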
Assessing the Impact of Game Day Schedule and Opponents on Travel Patterns and Route Choice using Big Data Analytics
The transportation system is crucial for moving people and goods from point A to point B. However, its reliability can be reduced by congestion resulting from planned special events. For example, sporting events draw large crowds to specific venues on game days and disrupt normal traffic patterns.
The goal of this study was to understand issues related to road traffic management during major sporting events by using widely available INRIX data to compare travel patterns and behaviors on game days against those on normal days. A comprehensive analysis was conducted on the impact of all Nebraska Cornhuskers football games over five years on traffic congestion on five major routes in Nebraska. We attempted to identify hotspots, that is, unusually high-risk zones of traffic congestion in spatiotemporal space that occur on almost all game days. For hotspot detection, we used a method called Multi-EigenSpot, which is able to detect multiple hotspots in a spatiotemporal space. With this algorithm, we were able to detect traffic hotspot clusters on the five chosen routes in Nebraska. After detecting the hotspots, we identified the factors affecting the sizes of hotspots and other parameters. The start time of the game and the Cornhuskers’ opponent for a given game are two important factors affecting the number of people coming to Lincoln, Nebraska, on game days. Finally, the Dynamic Bayesian Networks (DBN) approach was applied to forecast the start times and locations of hotspot clusters in 2018 with a weighted mean absolute percentage error (WMAPE) of 13.8%.
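For readers unfamiliar with the error metric quoted above, the following Python sketch computes a WMAPE. It assumes the common volume-weighted form, the sum of absolute errors divided by the sum of absolute actual values; the abstract does not spell out the weighting the authors used, and the function name wmape and the sample numbers are hypothetical.

```python
def wmape(actual, forecast):
    """Weighted mean absolute percentage error: sum|a - f| / sum|a|."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    total_abs_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WMAPE is undefined when all actual values are zero")
    return total_abs_error / total_actual


# Hypothetical hotspot start times, in minutes before kickoff.
observed_start = [100, 200, 300]
forecast_start = [110, 190, 330]
print(f"WMAPE = {wmape(observed_start, forecast_start):.1%}")
```

Unlike the plain MAPE, this form weights each error by the size of the observed value, so small observations do not dominate the score.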
Modeling and estimation of multi-source clustering in crime and security data
While the presence of clustering in crime and security event data is well
established, the mechanism(s) by which clustering arises is not fully
understood. Both contagion models and history independent correlation models
are applied, but not simultaneously. In an attempt to disentangle contagion
from other types of correlation, we consider a Hawkes process with background
rate driven by a log Gaussian Cox process. Our inference methodology is an
efficient Metropolis adjusted Langevin algorithm for filtering of the intensity
and estimation of the model parameters. We apply the methodology to property
and violent crime data from Chicago, terrorist attack data from Northern
Ireland and Israel, and civilian casualty data from Iraq. For each data set we
quantify the uncertainty in the levels of contagion vs. history independent
correlation.
Comment: Published at http://dx.doi.org/10.1214/13-AOAS647 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
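The split between contagion and history-independent correlation can be seen in the conditional intensity of a Hawkes process, sketched below in Python. This is a simplified illustration under stated assumptions: the triggering kernel is taken to be exponential, and the background rate is a plain deterministic function standing in for the log Gaussian Cox process used in the paper. The names hawkes_intensity, alpha, and beta are hypothetical.

```python
import math


def hawkes_intensity(t, past_events, background, alpha, beta):
    """Conditional intensity mu(t) + sum_i alpha * beta * exp(-beta * (t - t_i)).

    background: callable giving the history-independent rate mu(t)
    alpha:      expected number of events directly triggered by one event
    beta:       decay rate of the (assumed exponential) triggering kernel
    """
    # Contagion term: each earlier event raises the rate, with decaying effect.
    excitation = sum(
        alpha * beta * math.exp(-beta * (t - t_i))
        for t_i in past_events
        if t_i < t
    )
    return background(t) + excitation


# Hypothetical example: constant background rate and two earlier events.
rate = hawkes_intensity(
    t=3.0,
    past_events=[1.0, 2.5],
    background=lambda t: 0.5,
    alpha=0.8,
    beta=1.0,
)
print(f"intensity at t = 3.0: {rate:.4f}")
```

Clustering can come from either term: a background rate that varies over time produces history-independent correlation, while a large alpha produces contagion.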
How robust are the estimated effects of air pollution on health? Accounting for model uncertainty using Bayesian model averaging
The long-term impact of air pollution on human health can be estimated from small-area ecological studies in which the health outcome is regressed against air pollution concentrations and other covariates, such as socio-economic deprivation. Socio-economic deprivation is multi-factorial and difficult to measure, and includes aspects of income, education, and housing as well as others. However, these variables are potentially highly correlated, meaning one can either create an overall deprivation index or use the individual characteristics, and these choices can lead to a variety of estimated pollution-health effects. Other aspects of model choice may also affect the pollution-health estimate, such as how pollution is estimated and which spatial autocorrelation model is used. Therefore, we propose a Bayesian model averaging approach to combine the results from multiple statistical models to produce a more robust representation of the overall pollution-health effect. We investigate the relationship between nitrogen dioxide concentrations and cardio-respiratory mortality in West Central Scotland between 2006 and 2012.
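A minimal sketch of how model averaging combines estimates is given below in Python. It assumes posterior model weights are approximated from BIC values, with weight proportional to exp(-BIC / 2), which is one common shortcut; the abstract does not say how the authors compute their weights. The function names and the numbers are hypothetical.

```python
import math


def bma_weights_from_bic(bics):
    """Approximate posterior model probabilities, proportional to exp(-BIC / 2)."""
    best = min(bics)
    # Subtract the smallest BIC before exponentiating, for numerical stability.
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def bma_estimate(estimates, weights):
    """Model-averaged point estimate: weighted sum of per-model estimates."""
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical pollution-health effect estimates from three candidate models.
effects = [1.02, 1.05, 1.01]
bics = [2310.4, 2308.9, 2315.2]
weights = bma_weights_from_bic(bics)
print("weights:", [round(w, 3) for w in weights])
print(f"averaged effect: {bma_estimate(effects, weights):.4f}")
```

The averaged estimate leans toward the models the data support most, rather than relying on a single choice of deprivation measure or spatial model.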