Heteroscedastic Gaussian processes for uncertainty modeling in large-scale crowdsourced traffic data
Accurately modeling traffic speeds is a fundamental part of efficient
intelligent transportation systems. Nowadays, with the widespread deployment of
GPS-enabled devices, it has become possible to crowdsource the collection of
speed information to road users (e.g. through mobile applications or dedicated
in-vehicle devices). Despite its rather wide spatial coverage, crowdsourced
speed data also brings very important challenges, such as the highly variable
measurement noise in the data due to a variety of driving behaviors and sample
sizes. When not properly accounted for, this noise can severely compromise any
application that relies on accurate traffic data. In this article, we propose
the use of heteroscedastic Gaussian processes (HGP) to model the time-varying
uncertainty in large-scale crowdsourced traffic data. Furthermore, we develop a
HGP conditioned on sample size and traffic regime (SRC-HGP), which makes use of
sample size information (probe vehicles per minute) as well as previous
observed speeds, in order to more accurately model the uncertainty in observed
speeds. Using 6 months of crowdsourced traffic data from Copenhagen, we
empirically show that the proposed heteroscedastic models produce significantly
better predictive distributions when compared to current state-of-the-art
methods for both speed imputation and short-term forecasting tasks.
Comment: 22 pages, Transportation Research Part C: Emerging Technologies (Elsevier)
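The core modeling idea above, a noise variance that differs per observation rather than a single shared one, can be sketched with a plain NumPy GP posterior. This is a simplified stand-in, not the SRC-HGP of the paper: the RBF kernel, its lengthscale, and the probe-count-to-noise mapping below are all illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def hgp_posterior(x_train, y_train, noise_var, x_test):
    """GP posterior mean/variance with a *vector* of per-observation
    noise variances -- the heteroscedastic part."""
    K = rbf_kernel(x_train, x_train) + np.diag(noise_var)
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, var

# Toy speeds: noisy where few probe vehicles report, precise where many do.
x = np.linspace(0, 10, 50)
probes_per_min = np.where(x < 5, 2.0, 20.0)   # sample-size proxy (assumed)
noise_var = 4.0 / probes_per_min              # fewer probes -> more noise
rng = np.random.default_rng(0)
y = np.sin(x) + rng.normal(0, np.sqrt(noise_var))
mean, var = hgp_posterior(x, y, noise_var, x)
```

The posterior variance ends up larger over the low-probe-count half of the road, which is exactly the behavior a homoscedastic GP (one shared noise level) cannot express.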
Multi-Output Gaussian Processes for Crowdsourced Traffic Data Imputation
Traffic speed data imputation is a fundamental challenge for data-driven
transport analysis. In recent years, with the ubiquity of GPS-enabled devices
and the widespread use of crowdsourcing alternatives for the collection of
traffic data, transportation professionals increasingly look to such
user-generated data for many analysis, planning, and decision support
applications. However, due to the mechanics of the data collection process,
crowdsourced traffic data such as probe-vehicle data is highly prone to missing
observations, making accurate imputation crucial for the success of any
application that makes use of that type of data. In this article, we propose
the use of multi-output Gaussian processes (GPs) to model the complex spatial
and temporal patterns in crowdsourced traffic data. While the Bayesian
nonparametric formalism of GPs allows us to model observation uncertainty, the
multi-output extension based on convolution processes effectively enables us to
capture complex spatial dependencies between nearby road segments. Using 6
months of crowdsourced traffic speed data or "probe vehicle data" for several
locations in Copenhagen, the proposed approach is empirically shown to
significantly outperform popular state-of-the-art imputation methods.
Comment: 10 pages, IEEE Transactions on Intelligent Transportation Systems, 201
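The paper builds its multi-output GP from convolution processes; a simpler construction that illustrates the same imputation mechanics is the intrinsic coregionalization model, where the full covariance is a Kronecker product of a between-output correlation matrix B and a shared input kernel. The sketch below (kernel, lengthscale, B, and the two toy "segments" are all assumptions) fills a missing block on one segment using its correlation with a neighboring one:

```python
import numpy as np

def rbf(a, b, ell=1.5):
    """Squared-exponential covariance between 1-D input sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

def icm_impute(x, Y, mask, B, noise=0.05):
    """Fill missing entries of Y (n_points x n_outputs) under an intrinsic
    coregionalization model: full covariance = B kron k(x, x)."""
    n, p = Y.shape
    K = np.kron(B, rbf(x, x))            # (n*p) x (n*p) multi-output covariance
    y = Y.T.ravel()                      # stack outputs: [output0..., output1...]
    obs = mask.T.ravel()
    K_oo = K[np.ix_(obs, obs)] + noise * np.eye(int(obs.sum()))
    K_mo = K[np.ix_(~obs, obs)]
    y_filled = y.copy()
    y_filled[~obs] = K_mo @ np.linalg.solve(K_oo, y[obs])
    return y_filled.reshape(p, n).T

# Two correlated "road segments"; a block of segment-2 readings is missing.
x = np.linspace(0, 6, 30)
f = np.sin(x)
Y = np.column_stack([f, 0.8 * f])
mask = np.ones_like(Y, dtype=bool)
mask[10:20, 1] = False                   # missing block on segment 2
B = np.array([[1.0, 0.8], [0.8, 1.0]])   # assumed between-output correlation
Y_hat = icm_impute(x, np.where(mask, Y, 0.0), mask, B)
```

Because segment 1 is observed throughout the gap, the cross-output covariance pulls the imputed segment-2 values close to their true shape, which is the spatial-dependency effect the abstract describes.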
Two Approaches to Imputation and Adjustment of Air Quality Data from a Composite Monitoring Network
An analysis of air quality data is provided for the municipal area of Taranto, which is characterized by high environmental risk due to the massive presence of industrial sites with activities of elevated environmental impact. The present study focuses on particulate matter as measured by PM10 concentrations. Preliminary analysis involved addressing several data problems, mainly: (i) imputation techniques were considered to cope with the large number of missing data, due both to different working periods for groups of monitoring stations and to occasional malfunction of PM10 sensors; (ii) since each of the three monitoring networks uses different validation techniques, a calibration procedure was devised to allow for data comparability. Missing data imputation and calibration were addressed by three alternative procedures sharing a leave-one-out type mechanism and based on {\it ad hoc} exploratory tools and on the recursive Bayesian estimation and prediction of spatial linear mixed effects models. The three procedures are introduced by motivating issues and compared in terms of performance.
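The shared leave-one-out mechanism can be illustrated without the paper's Bayesian spatial mixed models: hold out each station in turn, predict its value from the remaining stations, and inspect the errors. The sketch below uses inverse-distance weighting purely as a stand-in spatial predictor; the station layout and PM10 values are made up for illustration.

```python
import numpy as np

def idw_predict(coords, values, target, power=2.0):
    """Inverse-distance-weighted prediction at `target` from station values."""
    d = np.linalg.norm(coords - target, axis=1)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return np.sum(w * values) / np.sum(w)

def loo_errors(coords, values):
    """Leave-one-out check: drop each station in turn and predict it
    from the rest, mimicking the validation mechanism in the text."""
    errs = []
    for i in range(len(values)):
        keep = np.arange(len(values)) != i
        pred = idw_predict(coords[keep], values[keep], coords[i])
        errs.append(values[i] - pred)
    return np.array(errs)

# Toy network: PM10 roughly follows a smooth spatial gradient plus noise.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(15, 2))
pm10 = 30 + 2.0 * coords[:, 0] + rng.normal(0, 1.0, 15)
errs = loo_errors(coords, pm10)
```

The same loop structure applies whichever predictor sits inside it; swapping the IDW stand-in for a spatial mixed-effects model recovers the comparison the paper performs.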
A review of RCTs in four medical journals to assess the use of imputation to overcome missing data in quality of life outcomes
Background: Randomised controlled trials (RCTs) are perceived as the gold-standard method for evaluating healthcare interventions, and increasingly include quality of life (QoL) measures. The observed results are susceptible to bias if a substantial proportion of outcome data are missing. The review aimed to determine whether imputation was used to deal with missing QoL outcomes. Methods: A random selection of 285 RCTs published during 2005/6 in the British Medical Journal, Lancet, New England Journal of Medicine and Journal of the American Medical Association were identified. Results: QoL outcomes were reported in 61 (21%) trials. Six (10%) reported having no missing data, 20 (33%) reported ≤ 10% missing, eleven (18%) reported 11%–20% missing, and eleven (18%) reported >20% missing. Missingness was unclear in 13 (21%). Missing data were imputed in 19 (31%) of the 61 trials. Imputation was part of the primary analysis in 13 trials, but a sensitivity analysis in six. Last value carried forward was used in 12 trials and multiple imputation in two. Following imputation, the most common analysis method was analysis of covariance (10 trials). Conclusion: The majority of studies did not impute missing data and carried out a complete-case analysis. Those studies that did impute missing data tended to prefer simpler methods of imputation, despite more sophisticated methods being available.
The Health Services Research Unit is funded by the Chief Scientist Office of the Scottish Government Health Directorate. Shona Fielding is also currently funded by the Chief Scientist Office on a Research Training Fellowship (CZF/1/31).
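Last value carried forward, the method the review found most often, is simple enough to state in a few lines: each missing QoL measurement is replaced by that participant's most recent observed value. A minimal sketch (the toy score series is an assumption, not data from the review):

```python
import numpy as np

def locf(series):
    """Last-observation-carried-forward: replace each NaN with the most
    recent non-missing value (leading NaNs are left untouched)."""
    out = np.array(series, dtype=float)
    last = np.nan
    for i, v in enumerate(out):
        if np.isnan(v):
            out[i] = last
        else:
            last = v
    return out

qol = [7.0, np.nan, np.nan, 5.0, np.nan, 6.0]
print(locf(qol))  # [7. 7. 7. 5. 5. 6.]
```

Its simplicity is also its weakness: it assumes scores stay flat after dropout, which is one reason the review's conclusion treats it as a less sophisticated choice than multiple imputation.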
Parameter estimation in Cox models with missing failure indicators and the OPPERA study
In a prospective cohort study, examining all participants for incidence of
the condition of interest may be prohibitively expensive. For example, the
"gold standard" for diagnosing temporomandibular disorder (TMD) is a physical
examination by a trained clinician. In large studies, examining all
participants in this manner is infeasible. Instead, it is common to use
questionnaires to screen for incidence of TMD and perform the "gold standard"
examination only on participants who screen positively. Unfortunately, some
participants may leave the study before receiving the "gold standard"
examination. Within the framework of survival analysis, this results in missing
failure indicators. Motivated by the Orofacial Pain: Prospective Evaluation and
Risk Assessment (OPPERA) study, a large cohort study of TMD, we propose a
method for parameter estimation in survival models with missing failure
indicators. We estimate the probability of being an incident case for those
lacking a "gold standard" examination using logistic regression. These
estimated probabilities are used to generate multiple imputations of case
status for each missing examination that are combined with observed data in
appropriate regression models. The variance introduced by the procedure is
estimated using multiple imputation. The method can be used to estimate both
regression coefficients in Cox proportional hazard models as well as incidence
rates using Poisson regression. We simulate data with missing failure
indicators and show that our method performs as well as or better than
competing methods. Finally, we apply the proposed method to data from the
OPPERA study.
Comment: Version 4: 23 pages, 0 figures
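The imputation-and-pooling step described above can be sketched directly: draw case status for each unexamined participant from its logistic-regression probability, compute the incidence rate in each completed dataset, and combine across imputations with Rubin's rules. All numbers below (dropout count, person-time, the probabilities themselves) are hypothetical, and the rate variance uses the standard Poisson-rate formula rather than the paper's exact models.

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Rubin's rules: pool a scalar estimate across M imputed datasets."""
    m = len(estimates)
    q_bar = np.mean(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    return q_bar, within + (1 + 1 / m) * between

rng = np.random.default_rng(2)

# Hypothetical setting: 40 screen-positive participants left before the
# "gold standard" exam; a fitted logistic model gave each a case probability.
p_case = rng.uniform(0.1, 0.4, size=40)
observed_cases, observed_time = 25, 1000.0   # fully ascertained participants
missing_time = 400.0                         # person-time of the 40 dropouts

est, var = [], []
for _ in range(20):                          # M = 20 imputations
    imputed = rng.binomial(1, p_case).sum()  # draw case status per dropout
    cases = observed_cases + imputed
    total_time = observed_time + missing_time
    est.append(cases / total_time)           # incidence rate per person-unit
    var.append(cases / total_time**2)        # variance of a Poisson rate
rate_hat, rate_var = rubin_combine(np.array(est), np.array(var))
```

The between-imputation term in `rubin_combine` is what carries the extra uncertainty introduced by the missing failure indicators, matching the abstract's statement that the procedure's variance is estimated via multiple imputation.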