382 research outputs found

    Arriving on time: estimating travel time distributions on large-scale road networks

    Most optimal routing problems focus on minimizing travel time or distance traveled. Often, a more useful objective is to maximize the probability of on-time arrival, which requires statistical distributions of travel times rather than just mean values. We propose a method to estimate travel time distributions on large-scale road networks using probe vehicle data collected from GPS. We present a framework that handles large volumes of input data and scales linearly with the size of the network. Leveraging the planar topology of the graph, the method efficiently computes the travel time correlations between neighboring streets. First, raw probe vehicle traces are compressed into pairs of travel times and numbers of stops for each traversed road segment, using a `stop-and-go' algorithm developed for this work. The compressed data are then used as input for training a path travel time model that couples a Markov model with a Gaussian Markov random field (GMRF). Finally, scalable inference algorithms are developed for obtaining path travel time distributions from the composite MM-GMRF model. We illustrate the accuracy and scalability of our model on a 505,000-link road network spanning the San Francisco Bay Area.
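
    To make the inference goal concrete, here is a minimal sketch of estimating an on-time arrival probability by sampling correlated segment travel times from a GMRF. The mean vector mu, precision matrix Q, the 4-segment path, and the 170 s deadline are illustrative placeholders, not the paper's trained model.

```python
import numpy as np

# Hypothetical GMRF over per-segment travel times on a 4-segment path:
# mu holds mean travel times (seconds); the sparse precision matrix Q
# encodes correlations between neighboring road segments.
mu = np.array([30.0, 45.0, 20.0, 60.0])
Q = np.array([
    [ 2.0, -0.5,  0.0,  0.0],
    [-0.5,  2.0, -0.5,  0.0],
    [ 0.0, -0.5,  2.0, -0.5],
    [ 0.0,  0.0, -0.5,  2.0],
])

# Draw joint segment times: x = mu + L^{-T} z with Q = L L^T (Cholesky),
# the standard way to sample a GMRF parameterized by its precision.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(Q)
z = rng.standard_normal((4, 10_000))
samples = mu[:, None] + np.linalg.solve(L.T, z)

# Path travel time = sum of segment times; estimate P(arrive within 170 s),
# the on-time arrival probability the routing objective maximizes.
path_times = samples.sum(axis=0)
print("P(on-time) ~", np.mean(path_times <= 170.0))
```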

    On dimension reduction in Gaussian filters

    A priori dimension reduction is a widely adopted technique for reducing the computational complexity of stationary inverse problems. In this setting, the solution of an inverse problem is parameterized by a low-dimensional basis that is often obtained from the truncated Karhunen-Loève expansion of the prior distribution. For high-dimensional inverse problems equipped with smoothing priors, this technique can lead to drastic reductions in parameter dimension and significant computational savings. In this paper, we extend the concept of a priori dimension reduction to non-stationary inverse problems, in which the goal is to sequentially infer the state of a dynamical system. Our approach proceeds in an offline-online fashion. We first identify a low-dimensional subspace in the state space before solving the inverse problem (the offline phase), using either the method of "snapshots" or regularized covariance estimation. This subspace is then used to reduce the computational complexity of various filtering algorithms - including the Kalman filter, extended Kalman filter, and ensemble Kalman filter - within a novel subspace-constrained Bayesian prediction-and-update procedure (the online phase). We demonstrate the performance of our new dimension reduction approach on various numerical examples. In some test cases, our approach reduces the dimensionality of the original problem by orders of magnitude and yields up to two orders of magnitude in computational savings.
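
    A hedged sketch of the offline-online split: a snapshot-based basis V is computed once offline, then a standard Kalman recursion runs in the r-dimensional reduced coordinates. The model matrices A and H, the noise levels, and the dimensions are illustrative placeholders, not the paper's test cases or its exact subspace-constrained update.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, m = 200, 10, 20               # full state dim, subspace dim, obs dim

# Offline phase: method of "snapshots" -- use the leading left singular
# vectors of a matrix of simulated states as the low-dimensional basis V.
snapshots = rng.standard_normal((n, 100))
V = np.linalg.svd(snapshots, full_matrices=False)[0][:, :r]

# Full-dimensional linear-Gaussian model (illustrative placeholders).
A = 0.95 * np.eye(n)                # state transition
H = rng.standard_normal((m, n))     # observation operator
Qn = 0.01 * np.eye(n)               # process noise covariance
Rn = 0.1 * np.eye(m)                # observation noise covariance

# Online phase: run the Kalman recursion in the reduced coordinates
# a = V^T x, so all covariances are r x r instead of n x n.
Ar, Qr, Hr = V.T @ A @ V, V.T @ Qn @ V, H @ V
a, P = np.zeros(r), np.eye(r)
for _ in range(50):
    a, P = Ar @ a, Ar @ P @ Ar.T + Qr                           # predict
    y = H @ (V @ a) + rng.multivariate_normal(np.zeros(m), Rn)  # synthetic obs
    S = Hr @ P @ Hr.T + Rn
    K = P @ Hr.T @ np.linalg.inv(S)
    a, P = a + K @ (y - Hr @ a), (np.eye(r) - K @ Hr) @ P       # update

x_est = V @ a                       # lift the filtered estimate back
```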

    Foundational principles for large scale inference: Illustrations through correlation mining

    When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics, the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far smaller than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data"; sample complexity, however, has received relatively less attention, especially in the setting where the sample size n is fixed and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime, where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime, where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime, where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche, but only the last applies to exa-scale data dimensions. We illustrate this high dimensional framework for the problem of correlation mining, where the object of interest is the matrix of pairwise and partial correlations among the variables. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
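
    A small illustration of the sample-starved regime the paper analyzes: with n fixed and p large, correlation mining amounts to screening the sample correlation matrix for large entries. The synthetic data and the threshold rho below are illustrative, not the paper's framework or its learning-rate results.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 2000                   # sample-starved: n << p

# Synthetic variable-rich data (rows = samples, columns = variables).
X = rng.standard_normal((n, p))

# Correlation mining: standardize columns, form the p x p sample
# correlation matrix, and screen for entries exceeding a threshold rho.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
R = (Xs.T @ Xs) / n
rho = 0.8
hits = np.argwhere(np.triu(np.abs(R) > rho, k=1))
print(f"{len(hits)} variable pairs with |correlation| > {rho}")

# Here all variables are independent, so every hit is a false discovery;
# how the count of such spurious correlations scales with fixed n and
# growing p is exactly what a sample-complexity analysis quantifies.
```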

    Distributed Convergence Verification for Gaussian Belief Propagation

    Gaussian belief propagation (BP) is a computationally efficient method to approximate marginal distributions and has been widely used for inference with high dimensional data as well as for distributed estimation in large-scale networks. However, the convergence of Gaussian BP is still an open issue. Though sufficient convergence conditions have been studied in the literature, verifying these conditions requires gathering all the information over the whole network, which defeats the main advantage of distributed computing with Gaussian BP. In this paper, we propose a novel sufficient convergence condition for Gaussian BP that applies both to the pairwise linear Gaussian model and to Gaussian Markov random fields. We show analytically that this sufficient convergence condition can be easily verified in a distributed way that satisfies the network topology constraint.
    Comment: Accepted by the Asilomar Conference on Signals, Systems, and Computers, 2017, Pacific Grove, CA. arXiv admin note: text overlap with arXiv:1706.0407
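
    The paper's own condition is not reproduced here. As a stand-in, the sketch below runs synchronous Gaussian BP on a small loopy GMRF and has each node verify diagonal dominance of its own row of the precision matrix, a classical sufficient condition for convergence that, like the paper's condition, is checkable without gathering network-wide information. The matrix J, potential h, and schedule are illustrative.

```python
import numpy as np

# Illustrative GMRF: x ~ N(J^{-1} h, J^{-1}) on a 4-node cycle.
J = np.array([
    [ 3.0, -1.0,  0.0, -0.5],
    [-1.0,  3.0, -1.0,  0.0],
    [ 0.0, -1.0,  3.0, -1.0],
    [-0.5,  0.0, -1.0,  3.0],
])
h = np.array([1.0, 0.0, 2.0, -1.0])
p = len(h)
nbrs = [[j for j in range(p) if j != i and J[i, j] != 0] for i in range(p)]

# Local check: each node inspects only its own row of J (its incident
# edges), so no network-wide information needs to be collected.
for i in range(p):
    assert sum(abs(J[i, j]) for j in nbrs[i]) < J[i, i], f"node {i} fails"

# Synchronous Gaussian BP: each message is a (precision, potential) pair.
dJ = {(i, j): 0.0 for i in range(p) for j in nbrs[i]}
dh = dict(dJ)
for _ in range(100):
    new_dJ, new_dh = {}, {}
    for i in range(p):
        for j in nbrs[i]:
            Jij = J[i, i] + sum(dJ[(k, i)] for k in nbrs[i] if k != j)
            hij = h[i] + sum(dh[(k, i)] for k in nbrs[i] if k != j)
            new_dJ[(i, j)] = -J[i, j] ** 2 / Jij
            new_dh[(i, j)] = -J[i, j] * hij / Jij
    dJ, dh = new_dJ, new_dh

# On convergence, the BP marginal means match the exact solution J^{-1} h.
means = [(h[i] + sum(dh[(k, i)] for k in nbrs[i])) /
         (J[i, i] + sum(dJ[(k, i)] for k in nbrs[i])) for i in range(p)]
print(np.allclose(means, np.linalg.solve(J, h), atol=1e-6))
```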

    Meta-analysis of functional neuroimaging data using Bayesian nonparametric binary regression

    In this work we perform a meta-analysis of neuroimaging data, consisting of locations of peak activations identified in 162 separate studies on emotion. Neuroimaging meta-analyses are typically performed using kernel-based methods. However, these methods require the width of the kernel to be set a priori and to be constant across the brain. To address these issues, we propose a fully Bayesian nonparametric binary regression method for neuroimaging meta-analyses. In our method, each location (or voxel) has a probability of being a peak activation, and the corresponding probability function is based on a spatially adaptive Gaussian Markov random field (GMRF). We also include parameters in the model to robustify the procedure against miscoding of the voxel response. Posterior inference is implemented using efficient MCMC algorithms extended from those introduced in Holmes and Held [Bayesian Anal. 1 (2006) 145--168]. Our method allows the probability function to be locally adaptive with respect to the covariates, that is, to be smooth in one region of the covariate space and wiggly or even discontinuous in another. Posterior miscoding probabilities for each of the identified voxels can also be obtained, identifying voxels that may have been falsely classified as activated. Simulation studies and application to the emotion neuroimaging data indicate that our method is superior to standard kernel-based methods.
    Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/11-AOAS523 by the Institute of Mathematical Statistics (http://www.imstat.org).
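
    For intuition about the auxiliary-variable MCMC, here is a hedged sketch of the simpler Albert-Chib probit Gibbs sampler (the idea behind the Holmes-Held schemes the paper extends) with a random-walk GMRF prior on a 1-D voxel grid. It omits the paper's spatial adaptivity and miscoding robustification; all data and hyperparameters are synthetic placeholders.

```python
import numpy as np
from scipy.stats import norm, truncnorm

rng = np.random.default_rng(3)

# Toy "brain": a 1-D grid of 50 voxels; y[v] = 1 if voxel v was reported
# as a peak activation (synthetic data with a smooth bump of activity).
V = 50
truth = norm.cdf(2.5 * np.exp(-0.5 * ((np.arange(V) - 25) / 4) ** 2) - 1.5)
y = (rng.random(V) < truth).astype(float)

# First-order random-walk GMRF precision on the latent field f, made
# proper with a small ridge; the factor 5 controls spatial smoothness.
Q = 2.0 * np.eye(V)
Q[np.arange(V - 1), np.arange(1, V)] = -1.0
Q[np.arange(1, V), np.arange(V - 1)] = -1.0
Q[0, 0] = Q[-1, -1] = 1.0
Q = 5.0 * Q + 1e-3 * np.eye(V)

# Gibbs sampler: alternate truncated-normal latents z | f, y (probit
# augmentation) with the Gaussian GMRF draw f | z ~ N(P^{-1} z, P^{-1}).
P = Q + np.eye(V)
L = np.linalg.cholesky(P)
f, probs = np.zeros(V), []
for it in range(3000):
    lo = np.where(y == 1, -f, -np.inf)   # z > 0 where y = 1
    hi = np.where(y == 1, np.inf, -f)    # z < 0 where y = 0
    z = f + truncnorm.rvs(lo, hi, random_state=rng)
    f = np.linalg.solve(P, z) + np.linalg.solve(L.T, rng.standard_normal(V))
    if it >= 500:
        probs.append(norm.cdf(f))        # P(peak activation) per voxel

print(np.round(np.mean(probs, axis=0)[20:30], 2))  # highest near the bump
```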