Coresets for Wasserstein Distributionally Robust Optimization Problems
Wasserstein distributionally robust optimization (\textsf{WDRO}) is a popular
model to enhance the robustness of machine learning with ambiguous data.
However, the complexity of \textsf{WDRO} can be prohibitive in practice since
solving its ``minimax'' formulation requires a great amount of computation.
Recently, several fast \textsf{WDRO} training algorithms for some specific
machine learning tasks (e.g., logistic regression) have been developed.
However, the research on designing efficient algorithms for general large-scale
\textsf{WDRO}s is still quite limited, to the best of our knowledge.
A \textit{coreset} is an important tool for compressing large datasets, and thus
it has been widely applied to reduce the computational complexities for many
optimization problems. In this paper, we introduce a unified framework to
construct the $\epsilon$-coreset for the general \textsf{WDRO} problems. Though
it is challenging to obtain a conventional coreset for \textsf{WDRO} due to the
uncertainty issue of ambiguous data, we show that we can compute a ``dual
coreset'' by using the strong duality property of \textsf{WDRO}. Also, the
error introduced by the dual coreset can be theoretically guaranteed for the
original \textsf{WDRO} objective. To construct the dual coreset, we propose a
novel grid sampling approach that is particularly suitable for the dual
formulation of \textsf{WDRO}. Finally, we implement our coreset approach and
illustrate its effectiveness for several \textsf{WDRO} problems in the
experiments.
New Frameworks for Offline and Streaming Coreset Constructions
A coreset for a set of points is a small subset of weighted points that
approximately preserves important properties of the original set. Specifically,
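if $P$ is a set of points, $Q$ is a set of queries, and $f: P \times Q \to \mathbb{R}$ is a cost function, then a set $S \subseteq P$ with weights
$w: P \to [0, \infty)$ is an $\epsilon$-coreset for some parameter $\epsilon > 0$ if
$\sum_{s \in S} w(s) f(s, q)$ is a $(1+\epsilon)$ multiplicative approximation to
$\sum_{p \in P} f(p, q)$ for all $q \in Q$. Coresets are used to solve fundamental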
problems in machine learning under various big data models of computation. Many
of the coresets suggested in the past decade used, or could have used, a
general framework for constructing coresets whose size depends quadratically on
what is known as total sensitivity $t$.
In this paper we improve this bound from $O(t^2)$ to $O(t \log t)$. Thus our
results imply more space efficient solutions to a number of problems, including
projective clustering, $k$-line clustering, and subspace approximation.
Moreover, we generalize the notion of sensitivity sampling for sup-sampling
that supports non-multiplicative approximations, negative cost functions and
more. The main technical result is a generic reduction to the sample complexity
of learning a class of functions with bounded VC dimension. We show that
obtaining an $(\nu, \alpha)$-sample for this class of functions with appropriate
parameters $\nu$ and $\alpha$ suffices to achieve space efficient
$\epsilon$-coresets.
Our result implies more efficient coreset constructions for a number of
interesting problems in machine learning; we show applications to
$k$-median/$k$-means, $k$-line clustering, $j$-subspace approximation, and the
integer $(j,k)$-projective clustering problem.
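To make the coreset definition in the abstract above concrete, here is a minimal Python sketch. It is not the construction from the paper. It assumes one-dimensional points, a squared-distance cost, and plain uniform sampling with weights n/m (sensitivity sampling would use non-uniform probabilities instead). All function names are hypothetical. The sketch measures the largest relative gap between the weighted cost on the sample and the full cost over a finite grid of queries.

```python
import random

def cost(p, q):
    """Hypothetical cost function: squared distance from point p to query q."""
    return (p - q) ** 2

def full_cost(points, q):
    """Total cost of query q on the whole point set."""
    return sum(cost(p, q) for p in points)

def weighted_cost(sample, weights, q):
    """Weighted cost of query q on the candidate coreset."""
    return sum(w * cost(s, q) for s, w in zip(sample, weights))

def uniform_sample(points, m, rng):
    """Draw m points uniformly with replacement; each gets weight n/m."""
    n = len(points)
    sample = [rng.choice(points) for _ in range(m)]
    weights = [n / m] * m
    return sample, weights

def max_relative_error(points, sample, weights, queries):
    """Largest relative error of the weighted cost over the given queries."""
    worst = 0.0
    for q in queries:
        exact = full_cost(points, q)
        if exact == 0:
            continue  # degenerate query: relative error is undefined, skip it
        approx = weighted_cost(sample, weights, q)
        worst = max(worst, abs(approx - exact) / exact)
    return worst

rng = random.Random(0)
points = [rng.gauss(0.0, 1.0) for _ in range(5000)]
queries = [x / 2 for x in range(-6, 7)]  # finite grid standing in for the query set
sample, weights = uniform_sample(points, 200, rng)
eps = max_relative_error(points, sample, weights, queries)
print(f"empirical eps over {len(queries)} queries: {eps:.3f}")
```

Checking a finite grid of queries only gives an empirical estimate; the definition requires the bound to hold for every query in the query set.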
Determinantal Point Processes for Coresets
When one is faced with a dataset too large to be used all at once, an obvious solution is to retain only part of it. In practice this takes a wide variety of different forms, but among them "coresets" are especially appealing. A coreset is a (small) weighted sample of the original data that comes with a guarantee: that a cost function can be evaluated on the smaller set instead of the larger one, with low relative error. For some classes of problems, and via a careful choice of sampling distribution, iid random sampling has turned out to be one of the most successful methods to build coresets efficiently. However, independent samples are sometimes overly redundant, and one could hope that enforcing diversity would lead to better performance. The difficulty lies in proving coreset properties in non-iid samples. We show that the coreset property holds for samples formed with determinantal point processes (DPP). DPPs are interesting because they are a rare example of repulsive point processes with tractable theoretical properties, enabling us to construct general coreset theorems. We apply our results to the k-means problem, and give empirical evidence of the superior performance of DPP samples over state-of-the-art methods.
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization
We introduce efficient $(1+\varepsilon)$-approximation algorithms for the
binary matrix factorization (BMF) problem, where the inputs are a matrix
$\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, as well as an
accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$
as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and
$\mathbf{V}\in\{0,1\}^{k\times d}$. Equivalently, we want to find $\mathbf{U}$
and $\mathbf{V}$ that minimize the Frobenius loss $\|\mathbf{U}\mathbf{V} - \mathbf{A}\|_F^2$. Before this work, the state-of-the-art for this problem was
the approximation algorithm of Kumar et al. [ICML 2019], which achieves a
$C$-approximation for some constant $C\ge 576$. We give the first
$(1+\varepsilon)$-approximation algorithm using running time singly exponential
in $k$, where $k$ is typically a small integer. Our techniques generalize to
other common variants of the BMF problem, admitting bicriteria
$(1+\varepsilon)$-approximation algorithms for $L_p$ loss functions and the
setting where matrix operations are performed in $\mathbb{F}_2$. Our approach
can be implemented in standard big data models, such as the streaming or
distributed models. Comment: ICML 202
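As a small illustration of the BMF objective described in the abstract above, the sketch below finds the binary factors that minimize the Frobenius loss by exhaustive search. This is not the approximation algorithm from the paper: brute force is exponential in the size of both factors and is only usable on toy inputs. The function names and the example matrix are assumptions made for illustration.

```python
from itertools import product

def matmul(U, V):
    """Ordinary integer product of an n x k matrix and a k x d matrix."""
    return [[sum(U[i][t] * V[t][j] for t in range(len(V)))
             for j in range(len(V[0]))] for i in range(len(U))]

def frobenius_loss(A, U, V):
    """Squared Frobenius norm of UV - A."""
    UV = matmul(U, V)
    return sum((UV[i][j] - A[i][j]) ** 2
               for i in range(len(A)) for j in range(len(A[0])))

def binary_matrices(rows, cols):
    """Yield every rows x cols matrix with entries in {0, 1}."""
    for bits in product((0, 1), repeat=rows * cols):
        yield [list(bits[r * cols:(r + 1) * cols]) for r in range(rows)]

def brute_force_bmf(A, k):
    """Return (loss, U, V) minimizing the loss over all binary factors of rank k >= 1."""
    n, d = len(A), len(A[0])
    best = None
    for U in binary_matrices(n, k):
        for V in binary_matrices(k, d):
            loss = frobenius_loss(A, U, V)
            if best is None or loss < best[0]:
                best = (loss, U, V)
    return best

# Toy input (hypothetical): two identical rows plus one row with disjoint support.
A = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
loss1, U1, V1 = brute_force_bmf(A, 1)
loss2, U2, V2 = brute_force_bmf(A, 2)
print("best rank-1 loss:", loss1)
print("best rank-2 loss:", loss2)
```

On this input a single binary rank-1 term cannot cover both blocks, while two terms reproduce the matrix exactly, which is why the rank parameter matters even for tiny matrices.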
High-Dimensional Geometric Streaming in Polynomial Space
Many existing algorithms for streaming geometric data analysis have been
plagued by exponential dependencies in the space complexity, which are
undesirable for processing high-dimensional data sets. In particular, once
$d\geq\log n$, there are no known non-trivial streaming algorithms for problems
such as maintaining convex hulls and L\"owner-John ellipsoids of $n$ points,
despite a long line of work in streaming computational geometry since [AHV04].
We simultaneously improve these results to $\mathrm{poly}(d,\log n)$ bits of
space by trading off with a $\mathrm{poly}(d,\log n)$ factor distortion. We
achieve these results in a unified manner, by designing the first streaming
algorithm for maintaining a coreset for $\ell_\infty$ subspace embeddings with
$\mathrm{poly}(d,\log n)$ space and $\mathrm{poly}(d,\log n)$ distortion. Our
algorithm also gives similar guarantees in the \emph{online coreset} model.
Along the way, we sharpen results for online numerical linear algebra by
replacing a log condition number dependence with a $\log n$ dependence,
answering a question of [BDM+20]. Our techniques provide a novel connection
between leverage scores, a fundamental object in numerical linear algebra, and
computational geometry.
For $\ell_p$ subspace embeddings, we give nearly optimal trade-offs between
space and distortion for one-pass streaming algorithms. For instance, we give a
deterministic coreset using $O(d^2\log n)$ space and $O((d\log n)^{1/2-1/p})$
distortion for $p>2$, whereas previous deterministic algorithms incurred a
$\mathrm{poly}(n)$ factor in the space or the distortion [CDW18].
Our techniques have implications in the offline setting, where we give
optimal trade-offs between the space complexity and distortion of subspace
sketch data structures. To do this, we give an elementary proof of a "change of
density" theorem of [LT80] and make it algorithmic. Comment: Abstract shortened to meet arXiv limits; v2 fix statements concerning
online condition number
A Novel Sequential Coreset Method for Gradient Descent Algorithms
A wide range of optimization problems arising in machine learning can be
solved by gradient descent algorithms, and a central question in this area is
how to efficiently compress a large-scale dataset so as to reduce the
computational complexity. {\em Coreset} is a popular data compression technique
that has been extensively studied before. However, most existing coreset
methods are problem-dependent and cannot be used as a general tool for a
broader range of applications. A key obstacle is that they often rely on
pseudo-dimension and total sensitivity bounds that can be very high or hard to
obtain. In this paper, based on the ``locality'' property of gradient descent
algorithms, we propose a new framework, termed ``sequential coreset'', which
effectively avoids these obstacles. Moreover, our method is particularly
suitable for sparse optimization, where the coreset size can be further reduced
to depend only poly-logarithmically on the dimension. In practice, the
experimental results suggest that our method can save a large amount of running
time compared with the baseline algorithms.