
    Heteroscedastic Gaussian processes for uncertainty modeling in large-scale crowdsourced traffic data

    Accurately modeling traffic speeds is a fundamental part of efficient intelligent transportation systems. Nowadays, with the widespread deployment of GPS-enabled devices, it has become possible to crowdsource the collection of speed information to road users (e.g. through mobile applications or dedicated in-vehicle devices). Despite its wide spatial coverage, crowdsourced speed data also brings important challenges, such as highly variable measurement noise caused by the variety of driving behaviors and sample sizes. When not properly accounted for, this noise can severely compromise any application that relies on accurate traffic data. In this article, we propose the use of heteroscedastic Gaussian processes (HGP) to model the time-varying uncertainty in large-scale crowdsourced traffic data. Furthermore, we develop an HGP conditioned on sample size and traffic regime (SRC-HGP), which uses sample size information (probe vehicles per minute) as well as previously observed speeds to model the uncertainty in observed speeds more accurately. Using 6 months of crowdsourced traffic data from Copenhagen, we empirically show that the proposed heteroscedastic models produce significantly better predictive distributions than current state-of-the-art methods for both speed imputation and short-term forecasting tasks.
    Comment: 22 pages, Transportation Research Part C: Emerging Technologies (Elsevier)
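    The core difference between a homoscedastic and a heteroscedastic GP is whether every observation shares one noise variance or has its own. The sketch below is a minimal illustration of that idea, not the SRC-HGP from the abstract (which learns the noise model): it assumes the noise variance of a one-minute speed reading is known and shrinks as 1/n with the number of probe vehicles n, as it would for the mean of n independent readings. The toy speeds, probe counts, kernel settings, and the names `gp_predict` and `base_noise_var` are all hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_predict(x_train, y_train, noise_var, x_test, lengthscale=1.0, variance=1.0):
    """Zero-mean GP posterior of the latent function with per-point noise.

    noise_var[i] is the noise variance of y_train[i]. A homoscedastic GP
    passes the same value for every i; a heteroscedastic one does not.
    """
    K = rbf_kernel(x_train, x_train, lengthscale, variance) + np.diag(noise_var)
    K_s = rbf_kernel(x_train, x_test, lengthscale, variance)
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v ** 2, axis=0)              # latent-function variance
    return mean, var


# Hypothetical toy data: time (minutes) and observed speeds (km/h).
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
speed = np.array([52.0, 50.0, 31.0, 47.0, 49.0])
# Hypothetical probe-vehicle counts per minute; minute 2 has a single probe.
n_probes = np.array([12.0, 10.0, 1.0, 9.0, 11.0])

# Assumption: noise variance scales as 1/n (variance of a mean of n readings).
base_noise_var = 25.0
noise_het = base_noise_var / n_probes
# Homoscedastic baseline that treats every minute as well sampled.
noise_hom = np.full_like(t, noise_het.min())

y = speed - speed.mean()                                 # centre for the zero-mean prior
mean_het, var_het = gp_predict(t, y, noise_het, t, lengthscale=1.5, variance=100.0)
mean_hom, var_hom = gp_predict(t, y, noise_hom, t, lengthscale=1.5, variance=100.0)
mean_het += speed.mean()
mean_hom += speed.mean()
```

    Because the heteroscedastic model assigns the single-probe minute a larger noise variance, its posterior uncertainty at minute 2 is wider than the homoscedastic baseline's, which is the behavior the abstract argues a good predictive distribution should have.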

    Large-scale Heteroscedastic Regression via Gaussian Process

    Heteroscedastic regression, which accounts for varying noise levels among observations, has many applications in fields such as machine learning and statistics. Here we focus on heteroscedastic Gaussian process (HGP) regression, which integrates the latent function and the noise function in a unified non-parametric Bayesian framework. Though it shows remarkable performance, HGP suffers from cubic time complexity in the number of training points, which severely limits its application to big data. To improve scalability, we first develop a variational sparse inference algorithm, named VSHGP, to handle large-scale datasets. Furthermore, two variants are developed to improve the scalability and capability of VSHGP. The first is stochastic VSHGP (SVSHGP), which derives a factorized evidence lower bound, thus enabling efficient stochastic variational inference. The second is distributed VSHGP (DVSHGP), which (i) follows the Bayesian committee machine formalism to distribute computations over multiple local VSHGP experts with many inducing points; and (ii) adopts hybrid parameters for experts to guard against over-fitting and capture local variety. The superiority of DVSHGP and SVSHGP over existing scalable heteroscedastic/homoscedastic GPs is then extensively verified on various datasets.
    Comment: 14 pages, 15 figures
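    The Bayesian committee machine (BCM) mentioned for DVSHGP combines Gaussian predictions from several local experts into one. The sketch below implements the standard BCM rule for a zero-mean prior: sum the expert precisions, subtract the prior precision counted M - 1 extra times, and weight each expert's mean by its precision. DVSHGP's exact aggregation may differ from this textbook form; the function name `bcm_aggregate` and the expert values are hypothetical.

```python
import numpy as np


def bcm_aggregate(means, variances, prior_var):
    """Combine M experts' Gaussian predictions with the Bayesian committee machine.

    means, variances: arrays of shape (M, n_test) holding each expert's
    predictive mean and variance of the latent function at the test points.
    prior_var: prior variance of the latent function (zero prior mean assumed).
    As long as every expert variance is <= prior_var, the precision stays positive.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = means.shape[0]
    # Sum expert precisions, then remove the prior precision counted M - 1 extra times.
    precision = np.sum(1.0 / variances, axis=0) - (m - 1) / prior_var
    var = 1.0 / precision
    mean = var * np.sum(means / variances, axis=0)
    return mean, var


# Two hypothetical experts predicting at a single test point.
mean, var = bcm_aggregate([[1.0], [3.0]], [[1.0], [1.0]], prior_var=2.0)
# precision = 1 + 1 - 1/2 = 1.5, so var = 2/3 and mean = (2/3) * (1 + 3) = 8/3
```

    Each expert only ever factorizes its own local kernel matrix, which is what lets a distributed scheme sidestep the cubic cost of one global GP.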

    Retrieval of maize leaf area index using hyperspectral and multispectral data

    Field spectra acquired with a handheld spectroradiometer and Sentinel-2 image spectra were used to investigate the applicability of hyperspectral and multispectral data for retrieving maize leaf area index in low-input crop systems with high spatial and intra-annual variability and low yield in southern Mozambique over three years. Seventeen vegetation indices, comprising two- and three-band indices, and nine machine learning regression algorithms (MLRA) were tested for the statistical approach, while five cost functions were tested for the look-up table (LUT) inversion approach. Three-band vegetation indices were selected: the modified difference index (mDId: 725; 715; 565) for the hyperspectral dataset of field spectra, the modified simple ratio (mSRc: 740; 705; 865) for the multispectral dataset of field spectra, and the three-band spectral index (TBSIb: 665; 865; 783) for the Sentinel-2 dataset. The relevance vector machine was the selected MLRA for the two datasets of field spectra (multispectral and hyperspectral), while the support vector machine was selected for the Sentinel-2 data. For the LUT inversion technique, the minimum contrast estimation and Bhattacharyya divergence cost functions performed best. The vegetation indices outperformed the other two approaches, with TBSIb the most accurate index (RMSE = 0.35). At the field scale, spectral data from Sentinel-2 can accurately retrieve maize leaf area index in the study area.
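    LUT inversion works by comparing a measured spectrum against a table of simulated spectra with known leaf area index (LAI) and returning the LAI of the best-matching entries under some cost function. The sketch below uses plain RMSE as a stand-in cost; it is not the minimum contrast estimation or Bhattacharyya divergence cost that the study found best. The three-band reflectances, LAI values, and the name `lut_invert` are hypothetical; a real LUT would be generated by a radiative transfer model.

```python
import numpy as np


def lut_invert(measured, lut_spectra, lut_lai, n_best=1):
    """Estimate LAI by matching a measured spectrum against a look-up table.

    measured: reflectance vector of shape (n_bands,).
    lut_spectra: simulated reflectances of shape (n_entries, n_bands).
    lut_lai: LAI value for each LUT entry, shape (n_entries,).
    n_best: number of lowest-cost entries whose LAI values are averaged.
    """
    # RMSE cost (a stand-in; other cost functions can be swapped in here).
    cost = np.sqrt(np.mean((lut_spectra - measured) ** 2, axis=1))
    best = np.argsort(cost)[:n_best]
    return float(np.mean(lut_lai[best]))


# Hypothetical four-entry LUT over three bands.
lut_spectra = np.array([
    [0.10, 0.30, 0.20],
    [0.08, 0.35, 0.22],
    [0.06, 0.40, 0.24],
    [0.05, 0.45, 0.26],
])
lut_lai = np.array([0.5, 1.0, 1.5, 2.0])
measured = np.array([0.06, 0.40, 0.24])

lai_1 = lut_invert(measured, lut_spectra, lut_lai, n_best=1)
lai_2 = lut_invert(measured, lut_spectra, lut_lai, n_best=2)
```

    Averaging over several best matches, rather than taking only the single closest entry, is a common way to reduce sensitivity to noise in the measured spectrum and to the ill-posedness of the inversion.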

    A Survey on Gaussian Processes for Earth-Observation Data Analysis: A Comprehensive Investigation

    Gaussian processes (GPs) have experienced tremendous success in biogeophysical parameter retrieval in the last few years. GPs constitute a solid Bayesian framework to consistently formulate many function approximation problems. This article reviews the main theoretical GP developments in the field, considering new algorithms that respect signal and noise characteristics, that extract knowledge via automatic relevance kernels to yield feature rankings automatically, that use the associated uncertainty intervals to transport GP models in space and time, that can be used to uncover causal relations between variables, and that can encode physically meaningful prior knowledge via radiative transfer model (RTM) emulation. The important issue of computational efficiency is also addressed. These developments are illustrated in the field of geosciences and remote sensing at local and global scales through a set of examples. In particular, important problems for land, ocean, and atmosphere monitoring are considered, from accurately estimating oceanic chlorophyll content and pigments to retrieving vegetation properties from multi- and hyperspectral sensors, as well as estimating atmospheric parameters (e.g., temperature, moisture, and ozone) from infrared sounders.
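    Automatic relevance determination (ARD) kernels give each input feature its own lengthscale; after fitting, a short lengthscale means the output varies quickly with that feature, so inverse lengthscales can serve as a feature ranking. The sketch below shows the ARD squared-exponential kernel and that ranking step. It assumes the lengthscales have already been learned (for example by maximizing the marginal likelihood); the lengthscale values, band names, and function names are hypothetical.

```python
import numpy as np


def ard_rbf_kernel(X1, X2, lengthscales, variance=1.0):
    """ARD squared-exponential kernel with one lengthscale per input feature."""
    Z1 = X1 / lengthscales
    Z2 = X2 / lengthscales
    sq = (np.sum(Z1 ** 2, axis=1)[:, None] + np.sum(Z2 ** 2, axis=1)[None, :]
          - 2.0 * Z1 @ Z2.T)
    # Clamp tiny negative values caused by floating-point cancellation.
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0))


def rank_features(lengthscales, names):
    """Rank features from most to least relevant by inverse lengthscale."""
    relevance = 1.0 / np.asarray(lengthscales, dtype=float)
    order = np.argsort(-relevance)
    return [names[i] for i in order]


# Hypothetical fitted lengthscales for three spectral bands.
lengthscales = np.array([0.5, 5.0, 1.2])
names = ["band_red", "band_swir", "band_nir"]
ranking = rank_features(lengthscales, names)

# Unit steps along the first two features: the short-lengthscale feature
# decorrelates the outputs much more than the long-lengthscale one.
X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
K = ard_rbf_kernel(X, X, lengthscales)
```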