1,862 research outputs found
Random projections for Bayesian regression
This article deals with random projections applied as a data reduction
technique for Bayesian regression analysis. We show sufficient conditions under
which the entire -dimensional distribution is approximately preserved under
random projections by reducing the number of data points from to in the case . Under mild
assumptions, we prove that evaluating a Gaussian likelihood function based on
the projected data instead of the original data yields a
-approximation in terms of the Wasserstein
distance. Our main result shows that the posterior distribution of Bayesian
linear regression is approximated up to a small error depending on only an
-fraction of its defining parameters. This holds when using
arbitrary Gaussian priors or the degenerate case of uniform distributions over
for . Our empirical evaluations involve different
simulated settings of Bayesian linear regression. Our experiments underline
that the proposed method is able to recover the regression model up to small
error while considerably reducing the total running time
Scalable Facility Location for Massive Graphs on Pregel-like Systems
We propose a new scalable algorithm for facility location. Facility location
is a classic problem, where the goal is to select a subset of facilities to
open, from a set of candidate facilities F , in order to serve a set of clients
C. The objective is to minimize the total cost of opening facilities plus the
cost of serving each client from the facility it is assigned to. In this work,
we are interested in the graph setting, where the cost of serving a client from
a facility is represented by the shortest-path distance on the graph. This
setting allows to model natural problems arising in the Web and in social media
applications. It also allows to leverage the inherent sparsity of such graphs,
as the input is much smaller than the full pairwise distances between all
vertices.
To obtain truly scalable performance, we design a parallel algorithm that
operates on clusters of shared-nothing machines. In particular, we target
modern Pregel-like architectures, and we implement our algorithm on Apache
Giraph. Our solution makes use of a recent result to build sketches for massive
graphs, and of a fast parallel algorithm to find maximal independent sets, as
building blocks. In so doing, we show how these problems can be solved on a
Pregel-like architecture, and we investigate the properties of these
algorithms. Extensive experimental results show that our algorithm scales
gracefully to graphs with billions of edges, while obtaining values of the
objective function that are competitive with a state-of-the-art sequential
algorithm
- …