5 research outputs found

    Multi-Output Gaussian Processes for Crowdsourced Traffic Data Imputation

    Traffic speed data imputation is a fundamental challenge for data-driven transport analysis. In recent years, with the ubiquity of GPS-enabled devices and the widespread use of crowdsourcing alternatives for the collection of traffic data, transportation professionals increasingly look to such user-generated data for many analysis, planning, and decision support applications. However, due to the mechanics of the data collection process, crowdsourced traffic data such as probe-vehicle data is highly prone to missing observations, making accurate imputation crucial for the success of any application that makes use of that type of data. In this article, we propose the use of multi-output Gaussian processes (GPs) to model the complex spatial and temporal patterns in crowdsourced traffic data. While the Bayesian nonparametric formalism of GPs allows us to model observation uncertainty, the multi-output extension based on convolution processes effectively enables us to capture complex spatial dependencies between nearby road segments. Using 6 months of crowdsourced traffic speed data or "probe vehicle data" for several locations in Copenhagen, the proposed approach is empirically shown to significantly outperform popular state-of-the-art imputation methods. Comment: 10 pages, IEEE Transactions on Intelligent Transportation Systems, 201

    Exploring Statistical and Machine Learning-Based Missing Data Imputation Methods to Improve Crash Frequency Prediction Models for Highway-Rail Grade Crossings

    Highway-rail grade crossings (HRGCs) are critical locations for transportation safety because crashes at HRGCs are often catastrophic, potentially causing multiple injuries and fatalities. Every year in the United States, a significant number of crashes occur at these crossings, prompting local and state organizations to engage in safety analysis and estimate crash frequency prediction models for resource allocation. These models provide valuable insights into safety and risk mitigation strategies for HRGCs. Furthermore, the estimation of these models is based on inventory details of HRGCs, and the quality of those details is crucial for reliable crash predictions. However, many of these models exclude crossings with missing inventory details, which can adversely affect the precision of these models. In this study, a random sample of inventory details of 2000 HRGCs was taken from the Federal Railroad Administration’s HRGCs inventory database. Data filters were applied to retain only those crossings in the data that were at-grade, public and operational (N=1096). Missing values were imputed using various statistical and machine learning methods, including Mean, Median and Mode (MMM) imputation, Last Observation Carried Forward (LOCF) imputation, K-Nearest Neighbors (KNN) imputation, Expectation-Maximization (EM) imputation, Support Vector Machine (SVM) imputation, and Random Forest (RF) imputation. The results indicated that the crash frequency models based on machine learning imputation methods yielded better-fitted models (lower AIC and BIC values). The findings underscore the importance of obtaining complete inventory data through machine learning imputation methods when developing crash frequency models for HRGCs. This approach can substantially enhance the precision of these models, improving their predictive capabilities, and ultimately saving valuable human lives.
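
To make the two simplest imputation families named in this abstract concrete, here is a minimal sketch of Mean/Median/Mode and LOCF imputation using only the Python standard library. The function names and the `daily_trains` values are hypothetical illustrations, not taken from the study or from the Federal Railroad Administration database.

```python
from statistics import mean, median, mode


def impute_constant(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]


def impute_locf(values):
    """Last Observation Carried Forward: reuse the most recent observed value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None until the first observation appears
    return filled


# Hypothetical inventory column with two missing entries (None).
daily_trains = [4, None, 6, 4, None, 10]

for strategy in ("mean", "median", "mode"):
    print(strategy, impute_constant(daily_trains, strategy))
print("locf", impute_locf(daily_trains))
```

Constant-fill strategies ignore the ordering of records, whereas LOCF depends on it, so LOCF is only meaningful when neighbouring rows are related (for example, sorted by time or location).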

    How to Provide Accurate and Robust Traffic Forecasts Practically?


    Efficient least angle regression for identification of linear-in-the-parameters models

    Least angle regression, a promising model selection method, differs from conventional stepwise and stagewise methods in that it is neither too greedy nor too slow. It is closely related to L1 norm optimization, which accepts some additional model bias in exchange for lower prediction variance in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes. Direct matrix inversion is thereby avoided. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach where the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm.

    Spatio-temporal forecasting of network data

    In the digital age, data are collected in unprecedented volumes on a plethora of networks. These data provide opportunities to develop our understanding of network processes by allowing data to drive method, revealing new and often unexpected insights. To date, there has been extensive research into the structure and function of complex networks, but there is scope for improvement in modelling the spatio-temporal evolution of network processes in order to forecast future conditions. This thesis focusses on forecasting using data collected on road networks. Road traffic congestion is a serious and persistent problem in most major cities around the world, and it is the task of researchers and traffic engineers to make use of voluminous traffic data to help alleviate congestion. Recently, spatio-temporal models have been applied to traffic data, showing improvements over time series methods. Although progress has been made, challenges remain. Firstly, most existing methods perform well under typical conditions, but less well under atypical conditions. Secondly, existing spatio-temporal models have been applied to traffic data with high spatial resolution, and there has been little research into how to incorporate spatial information on spatially sparse sensor networks, where the dependency relationships between locations are uncertain. Thirdly, traffic data is characterised by high missing rates, and existing methods are generally poorly equipped to deal with this in a real time setting. In this thesis, a local online kernel ridge regression model is developed that addresses these three issues, with application to forecasting of travel times collected by automatic number plate recognition on London’s road network. The model parameters can vary spatially and temporally, allowing it to better model the time varying characteristics of traffic data, and to deal with abnormal traffic situations. Methods are defined for linking the spatially sparse sensor network to the physical road network, providing an improved representation of the spatial relationship between sensor locations. The incorporation of the spatio-temporal neighbourhood enables the model to forecast effectively under missing data. The proposed model outperforms a range of benchmark models at forecasting under normal conditions, and under various missing data scenarios.
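
As a rough illustration of the building block named in this abstract, the following is a minimal sketch of plain (batch) kernel ridge regression with a Gaussian kernel, written with the Python standard library only. It is not the thesis's local online model: the kernel choice, the regularisation value `lam`, the helper names, and the travel-time numbers are all assumptions made for the example.

```python
import math


def rbf(a, b, length_scale=1.0):
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_krr(xs, ys, lam=1e-3, length_scale=1.0):
    """Return the dual weights alpha solving (K + lam * I) alpha = y."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j], length_scale) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, ys)


def predict_krr(x_new, xs, alpha, length_scale=1.0):
    """Predict at x_new as a kernel-weighted sum over the training inputs."""
    return sum(a * rbf(x_new, x, length_scale) for a, x in zip(alpha, xs))


# Hypothetical travel times (minutes) at time steps 0..5; step 3 is missing.
times = [0, 1, 2, 4, 5]
travel = [10.0, 11.0, 13.0, 12.0, 10.5]
alpha = fit_krr(times, travel)
print("estimate at missing step 3:", predict_krr(3, times, alpha))
```

Because the prediction only needs kernel values against the observed points, a missing time step simply drops out of the training set, which is one reason kernel methods are a natural fit for gappy sensor data.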