Arriving on time: estimating travel time distributions on large-scale road networks
Most optimal routing problems focus on minimizing travel time or distance
traveled. Oftentimes, a more useful objective is to maximize the probability of
on-time arrival, which requires statistical distributions of travel times,
rather than just mean values. We propose a method to estimate travel time
distributions on large-scale road networks, using probe vehicle data collected
from GPS. We present a framework that handles large volumes of input data and
scales linearly with the size of the network. Leveraging the planar topology of
the graph, the method efficiently computes the time correlations between
neighboring streets. First, raw probe vehicle traces are compressed into pairs
of travel times and number of stops for each traversed road segment using a
'stop-and-go' algorithm developed for this work. The compressed data is then
used as input for training a path travel time model, which couples a Markov
model with a Gaussian Markov random field. Finally, scalable inference
algorithms are developed for obtaining path travel time distributions from the
composite MM-GMRF model. We illustrate the accuracy and scalability of our
model on a 505,000 road link network spanning the San Francisco Bay Area.
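
To illustrate why on-time arrival needs distributions rather than mean values, here is a minimal sketch. It assumes link travel times are jointly Gaussian with a known covariance, which is a simplification made for this example and not the MM-GMRF model of the abstract above. The function name, route numbers, and time budget are all hypothetical.

```python
import math

def path_on_time_probability(means, cov, budget):
    """P(total path travel time <= budget), assuming jointly Gaussian link times.

    means: list of mean link travel times along the path (minutes)
    cov:   covariance matrix of the link travel times (list of lists)
    """
    n = len(means)
    mu = sum(means)
    # Variance of a sum = sum of all covariance entries (includes correlations).
    var = sum(cov[i][j] for i in range(n) for j in range(n))
    z = (budget - mu) / math.sqrt(var)
    # Standard normal CDF expressed with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical route A: lower mean, but variable and positively correlated links.
route_a_means = [10.0, 10.0]
route_a_cov = [[16.0, 8.0], [8.0, 16.0]]

# Hypothetical route B: higher mean, but reliable and uncorrelated links.
route_b_means = [11.0, 11.0]
route_b_cov = [[1.0, 0.0], [0.0, 1.0]]

budget = 25.0  # minutes available for the trip
p_a = path_on_time_probability(route_a_means, route_a_cov, budget)
p_b = path_on_time_probability(route_b_means, route_b_cov, budget)
print(f"route A: mean {sum(route_a_means):.0f} min, P(on time) = {p_a:.3f}")
print(f"route B: mean {sum(route_b_means):.0f} min, P(on time) = {p_b:.3f}")
```

In this toy setting the route with the larger mean travel time has the higher on-time probability, because the correlations between neighboring links inflate the variance of the faster route.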
On dimension reduction in Gaussian filters
A priori dimension reduction is a widely adopted technique for reducing the
computational complexity of stationary inverse problems. In this setting, the
solution of an inverse problem is parameterized by a low-dimensional basis that
is often obtained from the truncated Karhunen-Loeve expansion of the prior
distribution. For high-dimensional inverse problems equipped with smoothing
priors, this technique can lead to drastic reductions in parameter dimension
and significant computational savings.
In this paper, we extend the concept of a priori dimension reduction to
non-stationary inverse problems, in which the goal is to sequentially infer the
state of a dynamical system. Our approach proceeds in an offline-online
fashion. We first identify a low-dimensional subspace in the state space before
solving the inverse problem (the offline phase), using either the method of
"snapshots" or regularized covariance estimation. Then this subspace is used to
reduce the computational complexity of various filtering algorithms - including
the Kalman filter, extended Kalman filter, and ensemble Kalman filter - within
a novel subspace-constrained Bayesian prediction-and-update procedure (the
online phase). We demonstrate the performance of our new dimension reduction
approach on various numerical examples. In some test cases, our approach
reduces the dimensionality of the original problem by orders of magnitude and
yields up to two orders of magnitude in computational savings.
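
The offline-online idea can be sketched as follows. This is a generic projected Kalman filter built on a snapshot (SVD) basis, under the assumption of a linear system whose state lies in a low-dimensional subspace. It is not necessarily the subspace-constrained procedure of the paper, and the test system, dimensions, and noise levels are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, m, steps = 50, 3, 5, 200  # full dim, reduced dim, observations, time steps

# Hypothetical test system: the state lives in an r-dimensional subspace.
U_true, _ = np.linalg.qr(rng.standard_normal((n, r)))
A_s = 0.95 * np.eye(r)           # stable latent dynamics
H = rng.standard_normal((m, n))  # observation operator
q, s = 0.1, 0.05                 # process and observation noise std

def simulate(num_steps):
    """Simulate full states and noisy observations of the test system."""
    z = rng.standard_normal(r)
    xs, ys = [], []
    for _ in range(num_steps):
        z = A_s @ z + q * rng.standard_normal(r)
        x = U_true @ z
        xs.append(x)
        ys.append(H @ x + s * rng.standard_normal(m))
    return np.array(xs), np.array(ys)

# Offline phase: method of snapshots (leading left singular vectors).
snapshots, _ = simulate(steps)
U_svd, _, _ = np.linalg.svd(snapshots.T, full_matrices=False)
V = U_svd[:, :r]                 # orthonormal basis of the reduced subspace

# Project the model operators onto the subspace.
A_full = U_true @ A_s @ U_true.T
A_r = V.T @ A_full @ V
H_r = H @ V
Q_r = q**2 * (V.T @ U_true @ U_true.T @ V)
R = s**2 * np.eye(m)

def reduced_kalman(ys):
    """Online phase: Kalman filter on the r-dimensional coordinates."""
    z, P = np.zeros(r), np.eye(r)
    estimates = []
    for y in ys:
        # Prediction step in reduced coordinates.
        z = A_r @ z
        P = A_r @ P @ A_r.T + Q_r
        # Update step: K = P H_r^T S^{-1}, using symmetry of P and S.
        S = H_r @ P @ H_r.T + R
        K = np.linalg.solve(S, H_r @ P).T
        z = z + K @ (y - H_r @ z)
        P = (np.eye(r) - K @ H_r) @ P
        estimates.append(V @ z)  # lift back to the full state space
    return np.array(estimates)

xs_test, ys_test = simulate(steps)
x_hat = reduced_kalman(ys_test)
rmse = np.sqrt(np.mean((x_hat - xs_test) ** 2))
print("filter RMSE:", rmse)
```

Every covariance in the online loop is r-by-r rather than n-by-n, which is where the computational savings come from in this sketch.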
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics,
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity,
however, has received relatively less attention, especially in the setting where
the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last regime applies to exa-scale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
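
The sample-starved regime can be illustrated with a small simulation. This is only a sketch of the phenomenon, not the paper's framework: it draws mutually independent Gaussian variables, so every true correlation is zero, and reports the largest sample correlation found. The function name and the sample and variable counts are hypothetical.

```python
import numpy as np

def max_abs_offdiag_correlation(n_samples, n_vars, seed=0):
    """Largest |sample correlation| among independent Gaussian variables."""
    rng = np.random.default_rng(seed)
    # Columns are independent, so all true pairwise correlations are zero.
    X = rng.standard_normal((n_samples, n_vars))
    R = np.corrcoef(X, rowvar=False)  # n_vars x n_vars sample correlation matrix
    np.fill_diagonal(R, 0.0)          # ignore the trivial self-correlations
    return float(np.max(np.abs(R)))

starved = max_abs_offdiag_correlation(n_samples=10, n_vars=500)
rich = max_abs_offdiag_correlation(n_samples=1000, n_vars=500)
print("max |r| with   10 samples:", starved)
print("max |r| with 1000 samples:", rich)
```

With few samples and many variables, some pairs look strongly correlated purely by chance, which is why sample complexity matters for correlation mining.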
Distributed Convergence Verification for Gaussian Belief Propagation
Gaussian belief propagation (BP) is a computationally efficient method to
approximate the marginal distribution and has been widely used for inference
with high dimensional data as well as distributed estimation in large-scale
networks. However, the convergence of Gaussian BP is still an open issue.
Though sufficient convergence conditions have been studied in the literature,
verifying these conditions requires gathering all the information over the
whole network, which defeats the main advantage of Gaussian BP: distributed
computation. In this paper, we propose a novel sufficient convergence
condition for Gaussian BP that applies to both the pairwise linear Gaussian
model and to Gaussian Markov random fields. We show analytically that this
sufficient convergence condition can be easily verified in a distributed way
that satisfies the network topology constraint.
Comment: accepted by Asilomar Conference on Signals, Systems, and Computers,
2017, Asilomar, Pacific Grove, CA. arXiv admin note: text overlap with
arXiv:1706.0407
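
As a minimal sketch of Gaussian BP on a Gaussian Markov random field in information form, p(x) proportional to exp(-x'Jx/2 + h'x), consider the code below. The convergence check is diagonal dominance, a well-known sufficient condition that each node can test from its own row. It stands in for illustration only and is not the novel condition proposed in the abstract above. The function names and the 4-cycle example are hypothetical.

```python
import numpy as np

def is_diagonally_dominant(J):
    """Row-wise check: |J_ii| > sum_{j != i} |J_ij| (locally verifiable)."""
    diag = np.abs(np.diag(J))
    off = np.sum(np.abs(J), axis=1) - diag
    return bool(np.all(diag > off))

def gaussian_bp(J, h, iters=100):
    """Synchronous Gaussian BP; returns approximate marginal means and variances."""
    n = len(h)
    dJ = np.zeros((n, n))  # dJ[i, j]: precision message from node i to node j
    dh = np.zeros((n, n))  # dh[i, j]: potential message from node i to node j
    nbrs = [[k for k in range(n) if k != i and J[i, k] != 0.0] for i in range(n)]
    for _ in range(iters):
        new_dJ = np.zeros((n, n))
        new_dh = np.zeros((n, n))
        for i in range(n):
            for j in nbrs[i]:
                # Combine local terms with all incoming messages except from j.
                alpha = J[i, i] + sum(dJ[k, i] for k in nbrs[i] if k != j)
                beta = h[i] + sum(dh[k, i] for k in nbrs[i] if k != j)
                new_dJ[i, j] = -J[j, i] * J[i, j] / alpha
                new_dh[i, j] = -J[j, i] * beta / alpha
        dJ, dh = new_dJ, new_dh
    # Beliefs: local terms plus all incoming messages (column sums).
    J_hat = np.diag(J) + dJ.sum(axis=0)
    h_hat = h + dh.sum(axis=0)
    return h_hat / J_hat, 1.0 / J_hat

# Hypothetical example: a 4-node cycle (a loopy graph).
J = np.array([[ 4.0, -1.0,  0.0, -1.0],
              [-1.0,  4.0, -1.0,  0.0],
              [ 0.0, -1.0,  4.0, -1.0],
              [-1.0,  0.0, -1.0,  4.0]])
h = np.array([1.0, 2.0, 3.0, 4.0])

dominant = is_diagonally_dominant(J)
bp_mean, bp_var = gaussian_bp(J, h)
exact_mean = np.linalg.solve(J, h)
print("diagonally dominant:", dominant)
print("BP means:   ", bp_mean)
print("exact means:", exact_mean)
```

When Gaussian BP converges, the means are exact, while the variances on a loopy graph are generally only approximations.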
Meta-analysis of functional neuroimaging data using Bayesian nonparametric binary regression
In this work we perform a meta-analysis of neuroimaging data, consisting of
locations of peak activations identified in 162 separate studies on emotion.
Neuroimaging meta-analyses are typically performed using kernel-based methods.
However, these methods require the width of the kernel to be set a priori and
to be constant across the brain. To address these issues, we propose a fully
Bayesian nonparametric binary regression method to perform neuroimaging
meta-analyses. In our method, each location (or voxel) has a probability of
being a peak activation, and the corresponding probability function is based on
a spatially adaptive Gaussian Markov random field (GMRF). We also include
parameters in the model to robustify the procedure against miscoding of the
voxel response. Posterior inference is implemented using efficient MCMC
algorithms extended from those introduced in Holmes and Held [Bayesian Anal. 1
(2006) 145--168]. Our method allows the probability function to be locally
adaptive with respect to the covariates, that is, to be smooth in one region of
the covariate space and wiggly or even discontinuous in another. Posterior
miscoding probabilities for each of the identified voxels can also be obtained,
identifying voxels that may have been falsely classified as being activated.
Simulation studies and application to the emotion neuroimaging data indicate
that our method is superior to standard kernel-based methods.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS523 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org