Accurately modeling traffic speeds is a fundamental part of efficient
intelligent transportation systems. Nowadays, with the widespread deployment of
GPS-enabled devices, it has become possible to crowdsource the collection of
speed information to road users (e.g. through mobile applications or dedicated
in-vehicle devices). Despite its wide spatial coverage, crowdsourced speed
data also poses significant challenges, such as highly variable measurement
noise arising from diverse driving behaviors and varying sample sizes.
When not properly accounted for, this noise can severely compromise any
application that relies on accurate traffic data. In this article, we propose
the use of heteroscedastic Gaussian processes (HGP) to model the time-varying
uncertainty in large-scale crowdsourced traffic data. Furthermore, we develop
an HGP conditioned on sample size and traffic regime (SRC-HGP), which uses
sample size information (probe vehicles per minute) as well as previously
observed speeds to model the uncertainty in observed speeds more
accurately. Using 6 months of crowdsourced traffic data from Copenhagen, we
empirically show that the proposed heteroscedastic models produce significantly
better predictive distributions when compared to current state-of-the-art
methods for both speed imputation and short-term forecasting tasks.

Comment: 22 pages, Transportation Research Part C: Emerging Technologies
(Elsevier)