Subsampling Mathematical Relaxations and Average-case Complexity
We initiate a study of when the value of mathematical relaxations such as
linear and semidefinite programs for constraint satisfaction problems (CSPs) is
approximately preserved when restricting the instance to a sub-instance induced
by a small random subsample of the variables. Let $\mathcal{P}$ be a family of CSPs such
as 3SAT, Max-Cut, etc., and let $\Pi$ be a relaxation for $\mathcal{P}$, in the sense
that for every instance $P \in \mathcal{P}$, $\Pi(P)$ is an upper bound on the maximum
fraction of satisfiable constraints of $P$. Loosely speaking, we say that
subsampling holds for $\mathcal{P}$ and $\Pi$ if for every sufficiently dense instance
$P \in \mathcal{P}$ and every $\epsilon > 0$, if we let $P'$ be the instance obtained by
restricting $P$ to a sufficiently large constant number of variables, then
$\Pi(P') \in (1 \pm \epsilon)\,\Pi(P)$. We say that weak subsampling holds if the
above guarantee is replaced with $\Pi(P') = 1 - \Theta(\gamma)$ whenever
$\Pi(P) = 1 - \gamma$. We show:
1. Subsampling holds for the BasicLP and BasicSDP programs. BasicSDP is a variant
of the relaxation considered by Raghavendra (2008), who showed that it gives an
optimal approximation factor for every CSP under the unique games conjecture.
BasicLP is the linear programming analog of BasicSDP.
2. For tighter versions of BasicSDP obtained by adding constraints from the
Lasserre hierarchy, weak subsampling holds for CSPs of unique games type.
3. There are non-unique CSPs for which even weak subsampling fails for these
tighter semidefinite programs. There are also unique CSPs for which subsampling
fails for the Sherali-Adams linear programming hierarchy.
As a corollary of our weak subsampling result for strong semidefinite programs, we
obtain a polynomial-time algorithm to certify that random geometric graphs (of
the type considered by Feige and Schechtman, 2002) of max-cut value $1 - \gamma$
have a cut value at most $1 - \gamma/10$.
Comment: Includes several more general results that subsume the previous
version of the paper
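To make the subsampling operation concrete, here is a minimal sketch under stated assumptions: it uses the exact max-cut value (computed by brute force) as a stand-in for the relaxation value $\Pi$, because BasicLP and BasicSDP would need an LP/SDP solver. The helper names `max_cut_fraction` and `induced_subinstance` and the random dense graph are hypothetical illustrations, not taken from the paper, and at such small sizes the two values agree only roughly.

```python
import itertools
import random


def max_cut_fraction(n, edges):
    """Exact max-cut value of a graph on vertices 0..n-1, returned as the
    fraction of edges cut. Brute force over all 2^n sides, so small n only."""
    if not edges:
        return 0.0
    best = 0
    for side in itertools.product((0, 1), repeat=n):
        cut = sum(1 for u, v in edges if side[u] != side[v])
        best = max(best, cut)
    return best / len(edges)


def induced_subinstance(edges, vertices):
    """Restrict the instance to the edges induced by `vertices`,
    relabelling the kept vertices as 0..k-1."""
    index = {v: i for i, v in enumerate(vertices)}
    return [(index[u], index[v]) for u, v in edges if u in index and v in index]


# Hypothetical dense instance: a random graph with edge probability 1/2.
rng = random.Random(0)
n, k = 14, 8
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.5]

full_value = max_cut_fraction(n, edges)
sample = sorted(rng.sample(range(n), k))
sub_value = max_cut_fraction(k, induced_subinstance(edges, sample))
print(f"full instance: {full_value:.3f}, subsample of {k} vertices: {sub_value:.3f}")
```

Swapping `max_cut_fraction` for a call to an actual relaxation solver would turn this into an empirical check of the subsampling property itself.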
Subsampling Algorithms for Semidefinite Programming
We derive a stochastic gradient algorithm for semidefinite optimization using
randomization techniques. The algorithm uses subsampling to reduce the
computational cost of each iteration and the subsampling ratio explicitly
controls granularity, i.e. the tradeoff between cost per iteration and total
number of iterations. Furthermore, the total computational cost is directly
proportional to the complexity (i.e. rank) of the solution. We study numerical
performance on some large-scale problems arising in statistical learning.
Comment: Final version, to appear in Stochastic System
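The role of the subsampling ratio as a knob between cost per iteration and accuracy can be illustrated with a generic randomized matrix-vector product. This is a minimal sketch of that primitive only, not the authors' stochastic gradient algorithm; the function name `subsampled_matvec` and the choice of uniform column sampling are assumptions made for illustration.

```python
import random


def matvec(A, x):
    """Exact product A @ x for a dense matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def subsampled_matvec(A, x, ratio, rng):
    """Unbiased estimate of A @ x that reads only a `ratio` fraction of the
    columns: sample s of n columns uniformly without replacement, rescale by n/s.
    Cost per call scales with s, variance shrinks as s grows."""
    n = len(x)
    s = max(1, round(ratio * n))
    cols = rng.sample(range(n), s)
    scale = n / s
    return [scale * sum(row[j] * x[j] for j in cols) for row in A]


rng = random.Random(1)
A = [[rng.gauss(0, 1) for _ in range(200)] for _ in range(5)]
x = [rng.gauss(0, 1) for _ in range(200)]
exact = matvec(A, x)
for ratio in (0.1, 0.5, 1.0):
    est = subsampled_matvec(A, x, ratio, rng)
    err = max(abs(e - t) for e, t in zip(est, exact))
    print(f"ratio={ratio:.1f}  max abs error={err:.3f}")
```

Because each estimate is unbiased, a stochastic gradient method can tolerate the extra noise in exchange for cheaper iterations, which is the trade-off the abstract describes.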
Hierarchies of Relaxations for Online Prediction Problems with Evolving Constraints
We study online prediction where regret of the algorithm is measured against
a benchmark defined via evolving constraints. This framework captures online
prediction on graphs, as well as other prediction problems with combinatorial
structure. A key aspect here is that finding the optimal benchmark predictor
(even in hindsight, given all the data) might be computationally hard due to
the combinatorial nature of the constraints. Despite this, we provide
polynomial-time \emph{prediction} algorithms that achieve low regret against
combinatorial benchmark sets. We do so by building improper learning algorithms
based on two ideas that work together. The first is to alleviate part of the
computational burden through random playout, and the second is to employ
Lasserre semidefinite hierarchies to approximate the resulting integer program.
Interestingly, for our prediction algorithms, we only need to compute the
values of the semidefinite programs and not the rounded solutions. However, the
integrality gap for the Lasserre hierarchy \emph{does} enter the generic regret
bound in terms of Rademacher complexity of the benchmark set. This establishes
a trade-off between the computation time and the regret bound of the algorithm.
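The regret bound above is stated in terms of the Rademacher complexity of the benchmark set. Assuming a finite benchmark set whose members are represented as vectors of predictions $f = (f_1, \dots, f_n)$, the empirical quantity $\mathbb{E}_\sigma \max_f \frac{1}{n}\sum_t \sigma_t f_t$ can be estimated by Monte Carlo. This sketch only illustrates the quantity; it is not the paper's prediction algorithm, and `empirical_rademacher` is a hypothetical name.

```python
import itertools
import random


def empirical_rademacher(benchmarks, trials, rng):
    """Monte Carlo estimate of E_sigma[max_f (1/n) * sum_t sigma_t * f_t] for a
    finite benchmark set, given as equal-length lists of real-valued predictions."""
    n = len(benchmarks[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(n)]  # Rademacher signs
        total += max(sum(s * f for s, f in zip(sigma, f_vec)) / n for f_vec in benchmarks)
    return total / trials


rng = random.Random(0)
n = 8
# A single fixed predictor: complexity 0 in expectation.
single = [[1] * n]
# Every +/-1 labelling of n points: the richest set, complexity exactly 1.
all_labellings = [list(v) for v in itertools.product((-1, 1), repeat=n)]
print(empirical_rademacher(single, 2000, rng))
print(empirical_rademacher(all_labellings, 200, rng))
```

The two extremes show why a richer benchmark set yields a weaker regret guarantee.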
Playing with Duality: An Overview of Recent Primal-Dual Approaches for Solving Large-Scale Optimization Problems
Optimization methods are at the core of many problems in signal/image
processing, computer vision, and machine learning. For a long time, it has been
recognized that looking at the dual of an optimization problem may drastically
simplify its solution. Deriving efficient strategies that jointly bring into
play the primal and the dual problems is, however, a more recent idea, which has
generated many important new contributions in recent years. These novel
developments are grounded on recent advances in convex analysis, discrete
optimization, parallel processing, and non-smooth optimization with emphasis on
sparsity issues. In this paper, we aim at presenting the principles of
primal-dual approaches, while giving an overview of numerical methods which
have been proposed in different contexts. We show the benefits which can be
drawn from primal-dual algorithms both for solving large-scale convex
optimization problems and discrete ones, and we provide various application
examples to illustrate their usefulness.
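As one concrete instance of the primal-dual methods surveyed, here is a sketch of the Chambolle-Pock primal-dual hybrid gradient iteration applied to 1D total-variation denoising, $\min_x \tfrac12\|x-b\|_2^2 + \lambda\|Dx\|_1$, where $D$ is the forward-difference operator. The choice of problem, the step sizes (which must satisfy $\tau\sigma\|D\|^2 < 1$, with $\|D\|^2 \le 4$), and the name `tv_denoise_pdhg` are illustrative assumptions.

```python
def tv_denoise_pdhg(b, lam, iters=500, tau=0.45, sigma=0.45):
    """Chambolle-Pock PDHG for min_x 0.5*||x - b||^2 + lam*||D x||_1,
    where (D x)_i = x[i+1] - x[i]. Requires tau*sigma*||D||^2 < 1; ||D||^2 <= 4."""
    n = len(b)
    x = list(b)
    x_bar = list(b)
    y = [0.0] * (n - 1)  # dual variable, one entry per difference
    for _ in range(iters):
        # Dual step: ascent on y, then projection onto the box |y_i| <= lam,
        # which is the prox of the conjugate of lam*||.||_1.
        y = [min(lam, max(-lam, y[i] + sigma * (x_bar[i + 1] - x_bar[i])))
             for i in range(n - 1)]
        # D^T y: adjoint of the forward difference, (D^T y)_j = y_{j-1} - y_j.
        dty = [0.0] * n
        for i in range(n - 1):
            dty[i] -= y[i]
            dty[i + 1] += y[i]
        # Primal step: prox of tau*0.5*||x - b||^2 maps v to (v + tau*b) / (1 + tau).
        x_old = x
        x = [(x[i] - tau * dty[i] + tau * b[i]) / (1 + tau) for i in range(n)]
        # Extrapolation with theta = 1.
        x_bar = [2 * x[i] - x_old[i] for i in range(n)]
    return x


noisy = [0.0, 0.1, -0.1, 0.05, 1.0, 0.9, 1.1, 0.95]
print([round(v, 3) for v in tv_denoise_pdhg(noisy, lam=0.2)])
```

Neither subproblem is hard on its own: the primal prox is a closed-form average and the dual prox is a clip, which is the kind of splitting that makes primal-dual schemes attractive at large scale.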
VerdictDB: Universalizing Approximate Query Processing
Despite 25 years of research in academia, approximate query processing (AQP)
has had little industrial adoption. One of the major causes of this slow
adoption is the reluctance of traditional vendors to make radical changes to
their legacy codebases, and the preoccupation of newer vendors (e.g.,
SQL-on-Hadoop products) with implementing standard features. Additionally, the
few AQP engines that are available are each tied to a specific platform and
require users to completely abandon their existing databases, an unrealistic
expectation given the infancy of the AQP technology. Therefore, we argue that a
universal solution is needed: a database-agnostic approximation engine that
will widen the reach of this emerging technology across various platforms.
Our proposal, called VerdictDB, uses a middleware architecture that requires
no changes to the backend database, and thus, can work with all off-the-shelf
engines. Operating at the driver level, VerdictDB intercepts analytical queries
issued to the database and rewrites them into another query that, if executed
by any standard relational engine, will yield sufficient information for
computing an approximate answer. VerdictDB uses the returned result set to
compute an approximate answer and error estimates, which are then passed on to
the user or application. However, lack of access to the query execution layer
introduces significant challenges in terms of generality, correctness, and
efficiency. This paper shows how VerdictDB overcomes these challenges and
delivers up to 171× speedup (18.45× on average) for a variety of
existing engines, such as Impala, Spark SQL, and Amazon Redshift, while
incurring less than 2.6% relative error. VerdictDB is open-sourced under the Apache
License.
Comment: Extended technical report of the paper that appeared in Proceedings
of the 2018 International Conference on Management of Data, pp. 1461-1476.
ACM, 201
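The basic idea of returning an approximate answer together with an error estimate can be sketched with a generic uniform-sampling estimator for AVG and a CLT-based 95% confidence interval. This is an assumption-laden illustration, not VerdictDB's query rewriting or its error-estimation machinery; `approx_avg` and the synthetic column are hypothetical.

```python
import math
import random
import statistics


def approx_avg(values, sample_size, rng, z=1.96):
    """Estimate the mean of `values` from a uniform random sample drawn without
    replacement; return (estimate, half-width of an approximate 95% CI)."""
    sample = rng.sample(values, sample_size)
    est = statistics.mean(sample)
    n, N = sample_size, len(values)
    # Standard error of the sample mean with the finite-population correction.
    se = statistics.stdev(sample) / math.sqrt(n) * math.sqrt((N - n) / (N - 1))
    return est, z * se


rng = random.Random(42)
table = [rng.expovariate(1 / 50) for _ in range(100_000)]  # hypothetical column
est, err = approx_avg(table, 1_000, rng)
print(f"approx AVG = {est:.2f} +/- {err:.2f}   (exact {statistics.mean(table):.2f})")
```

Reading 1% of the rows yields an answer with a quantified error bar; a middleware design has to produce the same kind of result using only queries the backend already understands.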