Input Sparsity and Hardness for Robust Subspace Approximation
In the subspace approximation problem, we seek a k-dimensional subspace F of
R^d that minimizes the sum of p-th powers of Euclidean distances to a given set
of n points a_1, ..., a_n in R^d, for p >= 1. More generally than minimizing
sum_i dist(a_i,F)^p, we may wish to minimize sum_i M(dist(a_i,F)) for some loss
function M(), for example, M-Estimators, which include the Huber and Tukey loss
functions. Such subspaces provide alternatives to the singular value
decomposition (SVD), which is the p=2 case, finding such an F that minimizes
the sum of squares of distances. For p in [1,2), and for typical M-Estimators,
the minimizing F gives a solution that is more robust to outliers than that
provided by the SVD. We give several algorithmic and hardness results for these
robust subspace approximation problems.
We think of the n points as forming an n x d matrix A, and let nnz(A)
denote the number of non-zero entries of A. Our results hold for p in [1,2). We
use poly(n) to denote n^{O(1)} as n -> infinity. We obtain: (1) For minimizing
sum_i dist(a_i,F)^p, we give an algorithm running in O(nnz(A) +
(n+d)poly(k/eps) + exp(poly(k/eps))) time. (2) We show that the problem of
minimizing sum_i dist(a_i, F)^p is NP-hard, even to output a
(1+1/poly(d))-approximation, answering a question of Kannan and Vempala, and
complementing prior results which held for p > 2. (3) For loss functions for a
wide class of M-Estimators, we give a problem-size reduction: for a parameter
K=(log n)^{O(log k)}, our reduction takes O(nnz(A) log n + (n+d) poly(K/eps))
time to reduce the problem to a constrained version involving matrices whose
dimensions are poly(K eps^{-1} log n). We also give bicriteria solutions. (4)
Our techniques lead to the first O(nnz(A) + poly(d/eps)) time algorithms for
(1+eps)-approximate regression for a wide class of convex M-Estimators.
Comment: paper appeared in FOCS, 201
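The objective described in this abstract can be made concrete with a small sketch. It only evaluates sum_i M(dist(a_i,F)) for a given subspace F; it does not implement the paper's algorithms. The function names, the toy points, and the Huber threshold `tau` are illustrative assumptions, and the Huber form used is one common parameterization.

```python
import math

def dist_to_subspace(a, basis):
    """Euclidean distance from point a to span(basis).

    Assumes the rows of `basis` are orthonormal, so the squared distance is
    |a|^2 minus the squared length of the projection onto the subspace.
    """
    proj_sq = sum(sum(x * u for x, u in zip(a, row)) ** 2 for row in basis)
    norm_sq = sum(x * x for x in a)
    return math.sqrt(max(norm_sq - proj_sq, 0.0))

def huber(t, tau=1.0):
    """Huber loss (assumed form): quadratic up to tau, linear beyond it."""
    return t * t / (2 * tau) if abs(t) <= tau else abs(t) - tau / 2

def subspace_cost(points, basis, loss):
    """Evaluate sum_i loss(dist(a_i, F)) where F = span(basis)."""
    return sum(loss(dist_to_subspace(a, basis)) for a in points)

# Toy data: three points near the x-axis and one far outlier.
points = [(1.0, 0.1), (2.0, -0.1), (3.0, 0.0), (0.0, 10.0)]
x_axis = [(1.0, 0.0)]  # a 1-dimensional subspace of R^2

cost_p1 = subspace_cost(points, x_axis, lambda t: t)       # p = 1
cost_p2 = subspace_cost(points, x_axis, lambda t: t * t)   # p = 2 (SVD objective)
cost_huber = subspace_cost(points, x_axis, huber)          # an M-estimator
```

On this toy input the single outlier accounts for almost all of the p = 2 cost, while its contribution grows only linearly under p = 1 and the Huber loss, which is the robustness property the abstract refers to.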
Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality
We study the tradeoff between the statistical error and communication cost of
distributed statistical estimation problems in high dimensions. In the
distributed sparse Gaussian mean estimation problem, each of the $m$ machines
receives $n$ data points from a $d$-dimensional Gaussian distribution with
unknown mean $\theta$ which is promised to be $k$-sparse. The machines
communicate by message passing and aim to estimate the mean $\theta$. We
provide a tight (up to logarithmic factors) tradeoff between the estimation
error and the number of bits communicated between the machines. This directly
leads to a lower bound for the distributed \textit{sparse linear regression}
problem: to achieve the statistical minimax error, the total communication is
at least $\Omega(\min\{n,d\}m)$, where $n$ is the number of observations that
each machine receives and $d$ is the ambient dimension. These lower bounds
improve upon [Sha14,SD'14] by allowing a multi-round iterative communication
model. We also give the first optimal simultaneous protocol in the dense case
for mean estimation.
As our main technique, we prove a \textit{distributed data processing
inequality}, as a generalization of usual data processing inequalities, which
might be of independent interest and useful for other problems.
Comment: To appear at STOC 2016. Fixed typos in theorem 4.5 and incorporated
reviewers' suggestion
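To make the problem setting tangible, here is a minimal sketch of a naive one-round baseline: each machine sends a quantized local mean and a coordinator averages the messages and keeps the largest coordinates. This is not the protocol or the lower-bound argument from the paper. All names, the clipping radius, the bit budget, and the toy parameters are assumptions chosen for illustration.

```python
import random

def quantize(x, bits, radius):
    """Round x to one of 2**bits - 1 evenly spaced steps on [-radius, radius]."""
    levels = (1 << bits) - 1
    clipped = max(-radius, min(radius, x))
    step = 2 * radius / levels
    return -radius + round((clipped + radius) / step) * step

def local_message(samples, bits, radius):
    """Message of one machine: its coordinate-wise sample mean, quantized."""
    d = len(samples[0])
    means = [sum(s[j] for s in samples) / len(samples) for j in range(d)]
    return [quantize(v, bits, radius) for v in means]

def estimate_sparse_mean(messages, k):
    """Coordinator: average the messages, keep the k largest coordinates."""
    d = len(messages[0])
    avg = [sum(msg[j] for msg in messages) / len(messages) for j in range(d)]
    keep = set(sorted(range(d), key=lambda j: abs(avg[j]), reverse=True)[:k])
    return [avg[j] if j in keep else 0.0 for j in range(d)]

# Toy instance (all values are illustrative assumptions).
rng = random.Random(1)
m, n, d, k, bits, radius = 8, 25, 20, 2, 8, 8.0
theta = [0.0] * d
theta[3], theta[11] = 5.0, -5.0  # a 2-sparse mean

messages = []
for _ in range(m):
    samples = [[rng.gauss(theta[j], 1.0) for j in range(d)] for _ in range(n)]
    messages.append(local_message(samples, bits, radius))

estimate = estimate_sparse_mean(messages, k)
total_bits = m * d * bits  # communication cost of this naive baseline
support = {j for j, v in enumerate(estimate) if v != 0.0}
```

The baseline spends the same number of bits on every coordinate regardless of sparsity. The tradeoff studied in the abstract concerns how far any protocol, including multi-round ones, can reduce that cost for a given estimation error.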
Las Vegas Academy Jazz Band III: A Celebration of Black History Month
Program listing performers and works performed
Computing confidence intervals on solution costs for stochastic grid generation expansion problems.
A range of core operations and planning problems for the national electrical grid are naturally formulated and solved as stochastic programming problems, which minimize expected costs subject to a range of uncertain outcomes relating to, for example, uncertain demands or generator output. A critical decision issue relating to such stochastic programs is: How many scenarios are required to ensure a specific error bound on the solution cost? Scenarios are the key mechanism used to sample from the uncertainty space, and the number of scenarios drives computational difficulty. We explore this question in the context of a long-term grid generation expansion problem, using a bounding procedure introduced by Mak, Morton, and Wood. We discuss experimental results using problem formulations independently minimizing expected cost and down-side risk. Our results indicate that we can use a surprisingly small number of scenarios to yield tight error bounds in the case of expected cost minimization, which has key practical implications. In contrast, error bounds in the case of risk minimization are significantly larger, suggesting more research is required in this area in order to achieve rigorous solutions for decision makers.
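The idea behind such a bounding procedure can be sketched on a toy problem. The code below is an illustration in the spirit of sample-average bounding, not the paper's grid expansion model and not a faithful reproduction of the Mak, Morton, and Wood procedure. It uses a newsvendor problem as a stand-in, a normal quantile `z` instead of a t quantile, and independent samples for the two bounds; all names and parameters are assumptions.

```python
import math
import random
import statistics

def cost(q, d, c=1.0, r=2.0):
    """Newsvendor cost for order q and demand d (assumes 0 < c < r)."""
    return c * q - r * min(q, d)

def solve_saa(demands):
    """Solve the sample-average problem exactly.

    The sample-average cost is piecewise linear and convex in q with
    breakpoints at the sampled demands, so one of them is optimal.
    """
    best_q = min(demands, key=lambda q: statistics.fmean(cost(q, d) for d in demands))
    best_value = statistics.fmean(cost(best_q, d) for d in demands)
    return best_q, best_value

def optimality_gap(candidate_q, sample_demand, n_scenarios, n_batches, n_eval, z=1.645):
    """Estimate the optimality gap of candidate_q and an approximate upper limit."""
    # Lower bound: the expected optimal value of a sampled problem is no
    # larger than the true optimal value, so average it over batches.
    lower_values = [
        solve_saa([sample_demand() for _ in range(n_scenarios)])[1]
        for _ in range(n_batches)
    ]
    # Upper bound: expected cost of the candidate on independent scenarios.
    upper_values = [cost(candidate_q, sample_demand()) for _ in range(n_eval)]
    gap = max(statistics.fmean(upper_values) - statistics.fmean(lower_values), 0.0)
    sampling_error = z * (
        statistics.stdev(lower_values) / math.sqrt(n_batches)
        + statistics.stdev(upper_values) / math.sqrt(n_eval)
    )
    return gap, gap + sampling_error

rng = random.Random(0)
sample_demand = lambda: rng.uniform(50.0, 150.0)  # assumed demand distribution

# Candidate solution from a single 30-scenario problem.
candidate_q, _ = solve_saa([sample_demand() for _ in range(30)])
gap, gap_limit = optimality_gap(candidate_q, sample_demand,
                                n_scenarios=30, n_batches=20, n_eval=2000)
```

Increasing `n_scenarios` tightens the lower bound, which is how a study of this kind can ask how many scenarios are needed for a target error bound.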
The stochastic vehicle routing problem : a literature review, part II : solution methods
Building on the work of Gendreau et al. (Oper Res 44(3):469–477, 1996), and complementing the first part of this survey, we review the solution methods used for the past 20 years in the scientific literature on stochastic vehicle routing problems (SVRP). We describe the methods and indicate how they are used when dealing with stochastic vehicle routing problems. Keywords: vehicle routing (VRP), stochastic programming, SVRP
The stochastic vehicle routing problem : a literature review, part I : models
Building on the work of Gendreau et al. (Eur J Oper Res 88(1):3–12; 1996), we review the past 20 years of scientific literature on stochastic vehicle routing problems. The numerous variants of the problem that have been studied in the literature are described and categorized. Keywords: vehicle routing (VRP), stochastic programming, SVRP
On Deterministic Sketching and Streaming for Sparse Recovery and Norm Estimation
We study classic streaming and sparse recovery problems using deterministic linear sketches, including ℓ1/ℓ1 and ℓ∞/ℓ1 sparse recovery problems (the latter also being known as ℓ1-heavy hitters), norm estimation, and approximate inner product. We focus on devising a fixed matrix A ∈ R^{m×n} and a deterministic recovery/estimation procedure which work for all possible input vectors simultaneously. Our results improve upon existing work, the following being our main contributions:
• A proof that ℓ∞/ℓ1 sparse recovery and inner product estimation are equivalent, and that incoherent matrices can be used to solve both problems. Our upper bound for the number of measurements is m = O(ε^{-2} min{log n, (log n / log(1/ε))^2}). We can also obtain fast sketching and recovery algorithms by making use of the Fast Johnson–Lindenstrauss transform. Both our running times and number of measurements improve upon previous work. We can also obtain better error guarantees than previous work in terms of a smaller tail of the input vector.
• A new lower bound for the number of linear measurements required to solve ℓ1/ℓ1 sparse recovery. We show Ω(k/ε^2 + k log(n/k)/ε) measurements are required to recover an x′ with ‖x − x′‖_1 ≤ (1+ε)‖x_{tail(k)}‖_1, where x_{tail(k)} is x projected onto all but its largest k coordinates in magnitude.
• A tight bound of m = Θ(ε^{-2} log(ε^2 n)) on the number of measurements required to solve deterministic norm estimation, i.e., to recover ‖x‖_2 ± ε‖x‖_1.
For all the problems we study, tight bounds are already known for the randomized complexity from previous work, except in the case of ℓ1/ℓ1 sparse recovery, where a nearly tight bound is known. Our work thus aims to study the deterministic complexities of these problems. We remark that some of the matrices used in our algorithms, although known to exist, currently are not yet explicit in the sense that deterministic polynomial time constructions are not yet known, although in all cases polynomial time Monte Carlo algorithms are known.
Engineering and Applied Science
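The link between incoherent matrices and inner product estimation mentioned in this abstract can be sketched in a few lines. If every column of A has unit norm and distinct columns have inner product at most eps in magnitude, then expanding the product gives |<Ax,Ay> - <x,y>| <= eps * ‖x‖_1 * ‖y‖_1 for all x and y at once. The random sign matrix below is a Monte Carlo stand-in rather than one of the paper's constructions, and the dimensions, function names, and test vectors are illustrative assumptions.

```python
import math
import random

def sign_matrix(m, n, rng):
    """m x n matrix with entries +-1/sqrt(m), so every column has unit norm."""
    scale = 1.0 / math.sqrt(m)
    return [[rng.choice((-scale, scale)) for _ in range(n)] for _ in range(m)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def coherence(A):
    """Largest |<A_i, A_j>| over pairs of distinct columns of A."""
    n = len(A[0])
    cols = [[row[j] for row in A] for j in range(n)]
    return max(abs(dot(cols[i], cols[j])) for i in range(n) for j in range(i + 1, n))

def sketch(A, x):
    """Linear sketch Ax."""
    return [dot(row, x) for row in A]

rng = random.Random(0)
m, n = 64, 40
A = sign_matrix(m, n, rng)
eps = coherence(A)  # measured, not a theoretical guarantee

# Two arbitrary input vectors; the bound holds for every pair.
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
y = [rng.uniform(-1.0, 1.0) for _ in range(n)]

estimate = dot(sketch(A, x), sketch(A, y))
exact = dot(x, y)
bound = eps * sum(abs(v) for v in x) * sum(abs(v) for v in y)
```

Because the guarantee depends only on the coherence of the fixed matrix, it is deterministic once A is in hand; the randomness here is used only to produce A.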