19,019 research outputs found

    Mining large-scale human mobility data for long-term crime prediction

    Full text link
    Traditional crime prediction models based on census data are limited, as they fail to capture the complexity and dynamics of human activity. With the rise of ubiquitous computing, there is the opportunity to improve such models with data that make for better proxies of human presence in cities. In this paper, we leverage large human mobility data to craft an extensive set of features for crime prediction, as informed by theories in criminology and urban studies. We employ averaging and boosting ensemble techniques from machine learning, to investigate their power in predicting yearly counts for different types of crimes occurring in New York City at census tract level. Our study shows that spatial and spatio-temporal features derived from Foursquare venues and checkins, subway rides, and taxi rides, improve the baseline models relying on census and POI data. The proposed models achieve absolute R^2 metrics of up to 65% (on a geographical out-of-sample test set) and up to 89% (on a temporal out-of-sample test set). This proves that, next to the residential population of an area, the ambient population there is strongly predictive of the area's crime levels. We deep-dive into the main crime categories, and find that the predictive gain of the human dynamics features varies across crime types: such features bring the biggest boost in case of grand larcenies, whereas assaults are already well predicted by the census features. Furthermore, we identify and discuss top predictive features for the main crime categories. These results offer valuable insights for those responsible for urban policy or law enforcement

    An Extended Laplace Approximation Method for Bayesian Inference of Self-Exciting Spatial-Temporal Models of Count Data

    Full text link
    Self-Exciting models are statistical models of count data where the probability of an event occurring is influenced by the history of the process. In particular, self-exciting spatio-temporal models allow for spatial dependence as well as temporal self-excitation. For large spatial or temporal regions, however, the model leads to an intractable likelihood. An increasingly common method for dealing with large spatio-temporal models is by using Laplace approximations (LA). This method is convenient as it can easily be applied and is quickly implemented. However, as we will demonstrate in this manuscript, when applied to self-exciting Poisson spatial-temporal models, Laplace Approximations result in a significant bias in estimating some parameters. Due to this bias, we propose using up to sixth-order corrections to the LA for fitting these models. We will demonstrate how to do this in a Bayesian setting for Self-Exciting Spatio-Temporal models. We will further show there is a limited parameter space where the extended LA method still has bias. In these uncommon instances we will demonstrate how a more computationally intensive fully Bayesian approach using the Stan software program is possible in those rare instances. The performance of the extended LA method is illustrated with both simulation and real-world data

    Assessing the Impact of Game Day Schedule and Opponents on Travel Patterns and Route Choice using Big Data Analytics

    Get PDF
    The transportation system is crucial for transferring people and goods from point A to point B. However, its reliability can be decreased by unanticipated congestion resulting from planned special events. For example, sporting events collect large crowds of people at specific venues on game days and disrupt normal traffic patterns. The goal of this study was to understand issues related to road traffic management during major sporting events by using widely available INRIX data to compare travel patterns and behaviors on game days against those on normal days. A comprehensive analysis was conducted on the impact of all Nebraska Cornhuskers football games over five years on traffic congestion on five major routes in Nebraska. We attempted to identify hotspots, the unusually high-risk zones in a spatiotemporal space containing traffic congestion that occur on almost all game days. For hotspot detection, we utilized a method called Multi-EigenSpot, which is able to detect multiple hotspots in a spatiotemporal space. With this algorithm, we were able to detect traffic hotspot clusters on the five chosen routes in Nebraska. After detecting the hotspots, we identified the factors affecting the sizes of hotspots and other parameters. The start time of the game and the Cornhuskers’ opponent for a given game are two important factors affecting the number of people coming to Lincoln, Nebraska, on game days. Finally, the Dynamic Bayesian Networks (DBN) approach was applied to forecast the start times and locations of hotspot clusters in 2018 with a weighted mean absolute percentage error (WMAPE) of 13.8%

    Modeling and estimation of multi-source clustering in crime and security data

    Full text link
    While the presence of clustering in crime and security event data is well established, the mechanism(s) by which clustering arises is not fully understood. Both contagion models and history independent correlation models are applied, but not simultaneously. In an attempt to disentangle contagion from other types of correlation, we consider a Hawkes process with background rate driven by a log Gaussian Cox process. Our inference methodology is an efficient Metropolis adjusted Langevin algorithm for filtering of the intensity and estimation of the model parameters. We apply the methodology to property and violent crime data from Chicago, terrorist attack data from Northern Ireland and Israel, and civilian casualty data from Iraq. For each data set we quantify the uncertainty in the levels of contagion vs. history independent correlation.Comment: Published in at http://dx.doi.org/10.1214/13-AOAS647 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    How robust are the estimated effects of air pollution on health? Accounting for model uncertainty using Bayesian model averaging

    Get PDF
    The long-term impact of air pollution on human health can be estimated from small-area ecological studies in which the health outcome is regressed against air pollution concentrations and other covariates, such as socio-economic deprivation. Socio-economic deprivation is multi-factorial and difficult to measure, and includes aspects of income, education, and housing as well as others. However, these variables are potentially highly correlated, meaning one can either create an overall deprivation index, or use the individual characteristics, which can result in a variety of pollution-health effects. Other aspects of model choice may affect the pollution-health estimate, such as the estimation of pollution, and spatial autocorrelation model. Therefore, we propose a Bayesian model averaging approach to combine the results from multiple statistical models to produce a more robust representation of the overall pollution-health effect. We investigate the relationship between nitrogen dioxide concentrations and cardio-respiratory mortality in West Central Scotland between 2006 and 2012
    • …
    corecore