
    Coresets for Wasserstein Distributionally Robust Optimization Problems

    Wasserstein distributionally robust optimization (\textsf{WDRO}) is a popular model to enhance the robustness of machine learning with ambiguous data. However, the complexity of \textsf{WDRO} can be prohibitive in practice since solving its ``minimax'' formulation requires a great amount of computation. Recently, several fast \textsf{WDRO} training algorithms for some specific machine learning tasks (e.g., logistic regression) have been developed. However, to the best of our knowledge, research on designing efficient algorithms for general large-scale \textsf{WDRO} problems is still quite limited. A \textit{coreset} is an important tool for compressing large datasets, and it has been widely applied to reduce the computational complexity of many optimization problems. In this paper, we introduce a unified framework to construct an $\epsilon$-coreset for general \textsf{WDRO} problems. Though it is challenging to obtain a conventional coreset for \textsf{WDRO} due to the uncertainty of the ambiguous data, we show that a ``dual coreset'' can be computed by using the strong duality property of \textsf{WDRO}, and the error introduced by the dual coreset can be theoretically bounded with respect to the original \textsf{WDRO} objective. To construct the dual coreset, we propose a novel grid sampling approach that is particularly suitable for the dual formulation of \textsf{WDRO}. Finally, we implement our coreset approach and illustrate its effectiveness for several \textsf{WDRO} problems in experiments.
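
    The duality the abstract relies on has a particularly transparent special case: for linear logistic regression over a 1-Wasserstein ball with an $\ell_2$ ground cost on the features (labels left unperturbed), the worst-case risk collapses to the empirical risk plus an $\epsilon$-scaled norm penalty, so a weighted sample only needs to preserve the empirical term. The sketch below uses that special case purely for illustration; the uniform subsample stands in for the paper's grid-sampling construction, and the helper names are hypothetical.

```python
import numpy as np

def logistic_loss(theta, X, y):
    """Per-sample logistic loss of a linear model; labels y in {-1, +1}."""
    return np.log1p(np.exp(-y * (X @ theta)))

def wdro_logreg_dual(theta, X, y, eps, w=None):
    """Dual form of 1-Wasserstein DRO logistic regression with an l2 ground cost
    on the features and labels left unperturbed: the worst-case risk equals the
    (weighted) empirical risk plus eps * ||theta||_2.  Illustration only."""
    if w is None:
        w = np.ones(len(y))
    return np.average(logistic_loss(theta, X, y), weights=w) + eps * np.linalg.norm(theta)

rng = np.random.default_rng(0)
X = rng.normal(size=(20_000, 10))
y = np.sign(X @ rng.normal(size=10) + 0.3 * rng.normal(size=20_000))
theta = rng.normal(size=10)

full = wdro_logreg_dual(theta, X, y, eps=0.1)
idx = rng.choice(len(y), size=500, replace=False)        # placeholder for the paper's grid sampling
small = wdro_logreg_dual(theta, X[idx], y[idx], eps=0.1)  # uniform weights on the subsample
print(f"dual objective on full data: {full:.4f}   on subsample: {small:.4f}")
```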

    New Frameworks for Offline and Streaming Coreset Constructions

    A coreset for a set of points is a small subset of weighted points that approximately preserves important properties of the original set. Specifically, if $P$ is a set of points, $Q$ is a set of queries, and $f:P\times Q\to\mathbb{R}$ is a cost function, then a set $S\subseteq P$ with weights $w:P\to[0,\infty)$ is an $\epsilon$-coreset for some parameter $\epsilon>0$ if $\sum_{s\in S}w(s)f(s,q)$ is a $(1+\epsilon)$ multiplicative approximation to $\sum_{p\in P}f(p,q)$ for all $q\in Q$. Coresets are used to solve fundamental problems in machine learning under various big data models of computation. Many of the coresets suggested in the recent decade used, or could have used, a general framework for constructing coresets whose size depends quadratically on what is known as the total sensitivity $t$. In this paper we improve this bound from $O(t^2)$ to $O(t\log t)$. Thus our results imply more space-efficient solutions to a number of problems, including projective clustering, $k$-line clustering, and subspace approximation. Moreover, we generalize the notion of sensitivity sampling to sup-sampling, which supports non-multiplicative approximations, negative cost functions, and more. The main technical result is a generic reduction to the sample complexity of learning a class of functions with bounded VC dimension. We show that obtaining a $(\nu,\alpha)$-sample for this class of functions with appropriate parameters $\nu$ and $\alpha$ suffices to achieve space-efficient $\epsilon$-coresets. Our result implies more efficient coreset constructions for a number of interesting problems in machine learning; we show applications to $k$-median/$k$-means, $k$-line clustering, $j$-subspace approximation, and the integer $(j,k)$-projective clustering problem.
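
    The sensitivity framework referenced above is easy to see in code for the 1-means cost $f(p,q)=\|p-q\|^2$, where per-point sensitivities admit a simple closed-form upper bound. The sketch below shows the classical scheme (sample proportionally to sensitivity bounds, reweight by inverse sampling probability), not the improved $O(t\log t)$ construction of the paper; the data and parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(10_000, 5))                      # point set P in R^5

# Sensitivity upper bounds for the 1-means cost f(p, q) = ||p - q||^2:
# using ||p - q||^2 <= 2||p - mu||^2 + 2||mu - q||^2 and
# sum_j ||p_j - q||^2 = sum_j ||p_j - mu||^2 + n ||mu - q||^2,
# one gets s(p) <= 2||p - mu||^2 / sum_j ||p_j - mu||^2 + 2/n.
mu = P.mean(axis=0)
sq = ((P - mu) ** 2).sum(axis=1)
s = 2 * sq / sq.sum() + 2 / len(P)
t = s.sum()                                           # total sensitivity (upper bound)

# Importance-sample m points with probability proportional to s; weight by 1/(m * prob).
m = 500
prob = s / t
idx = rng.choice(len(P), size=m, p=prob)
S, w = P[idx], 1.0 / (m * prob[idx])

# Check the coreset property on a few random queries q.
for q in rng.normal(size=(3, 5)):
    full = ((P - q) ** 2).sum()
    core = (w * ((S - q) ** 2).sum(axis=1)).sum()
    print(f"full={full:.1f}  coreset={core:.1f}  rel.err={abs(core - full) / full:.3f}")
```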

    Determinantal Point Processes for Coresets

    When one is faced with a dataset too large to be used all at once, an obvious solution is to retain only part of it. In practice this takes a wide variety of different forms, but among them "coresets" are especially appealing. A coreset is a (small) weighted sample of the original data that comes with a guarantee: a cost function can be evaluated on the smaller set instead of the larger one, with low relative error. For some classes of problems, and via a careful choice of sampling distribution, i.i.d. random sampling has turned out to be one of the most successful methods for building coresets efficiently. However, independent samples are sometimes overly redundant, and one could hope that enforcing diversity would lead to better performance. The difficulty lies in proving coreset properties for non-i.i.d. samples. We show that the coreset property holds for samples drawn from determinantal point processes (DPPs). DPPs are interesting because they are a rare example of repulsive point processes with tractable theoretical properties, enabling us to prove general coreset theorems. We apply our results to the k-means problem and give empirical evidence of the superior performance of DPP samples over state-of-the-art methods.
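
    A minimal sketch of the idea, assuming the standard spectral sampler for finite DPPs and inverse-inclusion-probability reweighting: the resulting estimate of the k-means cost is unbiased by construction. The Gaussian similarity kernel and its scaling below are arbitrary placeholders, not the kernel design analyzed in the paper.

```python
import numpy as np

def sample_dpp(L, rng):
    """Exact spectral sampler for the finite DPP with L-ensemble kernel L."""
    lam, V = np.linalg.eigh(L)
    keep = rng.random(len(lam)) < lam / (1.0 + lam)   # pick eigenvectors independently
    V = V[:, keep]
    sample = []
    while V.shape[1] > 0:
        p = (V ** 2).sum(axis=1)                      # P(add item i | current subspace)
        p = p / p.sum()
        i = rng.choice(len(p), p=p)
        sample.append(i)
        j = np.argmax(np.abs(V[i, :]))                # a column with V[i, j] != 0
        V = V - np.outer(V[:, j], V[i, :] / V[i, j])  # zero out coordinate i in all columns
        V = np.delete(V, j, axis=1)
        if V.shape[1] > 0:
            V, _ = np.linalg.qr(V)                    # re-orthonormalize the remaining columns
    return np.array(sample, dtype=int)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))

# A simple (placeholder) L-ensemble: scaled Gaussian similarity kernel.
D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
L = 0.2 * np.exp(-D / 0.5)

S = sample_dpp(L, rng)
K = L @ np.linalg.inv(np.eye(len(X)) + L)             # marginal kernel, pi_i = K_ii
w = 1.0 / np.diag(K)[S]                               # inverse-inclusion-probability weights

# Unbiased estimate of the k-means cost for a candidate center set C.
C = rng.normal(size=(3, 2))
d2_full = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).min(axis=1)
d2_samp = ((X[S][:, None, :] - C[None, :, :]) ** 2).sum(-1).min(axis=1)
print("full cost:", d2_full.sum(), "  DPP estimate:", (w * d2_samp).sum())
```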

    Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization

    We introduce efficient $(1+\varepsilon)$-approximation algorithms for the binary matrix factorization (BMF) problem, where the inputs are a matrix $\mathbf{A}\in\{0,1\}^{n\times d}$, a rank parameter $k>0$, and an accuracy parameter $\varepsilon>0$, and the goal is to approximate $\mathbf{A}$ as a product of low-rank factors $\mathbf{U}\in\{0,1\}^{n\times k}$ and $\mathbf{V}\in\{0,1\}^{k\times d}$. Equivalently, we want to find $\mathbf{U}$ and $\mathbf{V}$ that minimize the Frobenius loss $\|\mathbf{U}\mathbf{V} - \mathbf{A}\|_F^2$. Before this work, the state of the art for this problem was the approximation algorithm of Kumar et al. [ICML 2019], which achieves a $C$-approximation for some constant $C\ge 576$. We give the first $(1+\varepsilon)$-approximation algorithm whose running time is singly exponential in $k$, where $k$ is typically a small integer. Our techniques generalize to other common variants of the BMF problem, admitting bicriteria $(1+\varepsilon)$-approximation algorithms for $L_p$ loss functions and the setting where matrix operations are performed over $\mathbb{F}_2$. Our approach can be implemented in standard big data models, such as the streaming or distributed models. Comment: ICML 202
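
    The objective and the source of the exponential-in-$k$ dependence can be illustrated with a naive alternating heuristic: once one binary factor is fixed, each column of the other factor can be chosen optimally by enumerating all $2^k$ binary vectors. The sketch below is only that heuristic, not the $(1+\varepsilon)$-approximation algorithm of the paper; the instance is synthetic.

```python
import itertools
import numpy as np

def best_binary_columns(U, A):
    """Given binary U (n x k), pick each column of V from {0,1}^k to minimize
    ||U V - A||_F^2.  Enumerating the 2^k candidates per column is where the
    exponential-in-k cost shows up."""
    k = U.shape[1]
    candidates = np.array(list(itertools.product([0, 1], repeat=k)))   # (2^k, k)
    errs = ((U @ candidates.T)[:, :, None] - A[:, None, :]) ** 2       # (n, 2^k, d)
    best = errs.sum(axis=0).argmin(axis=0)                             # best candidate per column of A
    return candidates[best].T                                          # V, shape (k, d)

rng = np.random.default_rng(0)
n, d, k = 60, 40, 3
U0 = (rng.random((n, k)) < 0.5).astype(int)
V0 = (rng.random((k, d)) < 0.5).astype(int)
A = np.minimum(U0 @ V0, 1)                         # a binary matrix with planted structure

U = (rng.random((n, k)) < 0.5).astype(int)         # crude initialization
for _ in range(10):                                # alternate exact column updates of V and U
    V = best_binary_columns(U, A)
    U = best_binary_columns(V.T, A.T).T            # same routine applied to the transposed problem
print("Frobenius loss:", ((U @ V - A) ** 2).sum())
```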

    High-Dimensional Geometric Streaming in Polynomial Space

    Many existing algorithms for streaming geometric data analysis have been plagued by exponential dependencies in the space complexity, which are undesirable for processing high-dimensional data sets. In particular, once $d\geq\log n$, there are no known non-trivial streaming algorithms for problems such as maintaining convex hulls and Löwner-John ellipsoids of $n$ points, despite a long line of work in streaming computational geometry since [AHV04]. We simultaneously improve these results to $\mathrm{poly}(d,\log n)$ bits of space by trading off with a $\mathrm{poly}(d,\log n)$ factor distortion. We achieve these results in a unified manner, by designing the first streaming algorithm for maintaining a coreset for $\ell_\infty$ subspace embeddings with $\mathrm{poly}(d,\log n)$ space and $\mathrm{poly}(d,\log n)$ distortion. Our algorithm also gives similar guarantees in the \emph{online coreset} model. Along the way, we sharpen results for online numerical linear algebra by replacing a log condition number dependence with a $\log n$ dependence, answering a question of [BDM+20]. Our techniques provide a novel connection between leverage scores, a fundamental object in numerical linear algebra, and computational geometry. For $\ell_p$ subspace embeddings, we give nearly optimal trade-offs between space and distortion for one-pass streaming algorithms. For instance, we give a deterministic coreset using $O(d^2\log n)$ space and $O((d\log n)^{1/2-1/p})$ distortion for $p>2$, whereas previous deterministic algorithms incurred a $\mathrm{poly}(n)$ factor in the space or the distortion [CDW18]. Our techniques have implications in the offline setting, where we give optimal trade-offs between the space complexity and distortion of subspace sketch data structures. To do this, we give an elementary proof of a "change of density" theorem of [LT80] and make it algorithmic. Comment: Abstract shortened to meet arXiv limits; v2 fixes statements concerning the online condition number.
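
    The leverage-score connection is easiest to see in the much simpler $\ell_2$ setting. The sketch below is a generic online leverage-score row-sampling routine in the online coreset spirit: keep each incoming row with probability proportional to its leverage against the rows kept so far, and rescale it. It is an illustration only, not the paper's $\ell_\infty$ or deterministic $\ell_p$ constructions, and the oversampling constant and ridge term are ad hoc.

```python
import numpy as np

def online_row_sampling(rows, c=10.0, ridge=1e-6):
    """Online leverage-score row sampling: keep row a with probability
    ~ c * a^T (M + ridge*I)^{-1} a, where M is the Gram matrix of the (rescaled)
    rows kept so far, and rescale kept rows by 1/sqrt(p)."""
    rng = np.random.default_rng(0)
    d = rows.shape[1]
    M = ridge * np.eye(d)
    kept = []
    for a in rows:
        tau = float(a @ np.linalg.solve(M, a))   # (approximate) online leverage score
        p = min(1.0, c * tau)
        if rng.random() < p:
            kept.append(a / np.sqrt(p))
            M += np.outer(a, a) / p
    return np.array(kept)

rng = np.random.default_rng(1)
A = rng.normal(size=(20_000, 8))
C = online_row_sampling(A)
print("kept", len(C), "of", len(A), "rows")

# The kept, rescaled rows approximately preserve quadratic forms ||Ax||.
for x in rng.normal(size=(3, 8)):
    print(f"||Ax|| = {np.linalg.norm(A @ x):.1f}   ||Cx|| = {np.linalg.norm(C @ x):.1f}")
```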

    A Novel Sequential Coreset Method for Gradient Descent Algorithms

    A wide range of optimization problems arising in machine learning can be solved by gradient descent algorithms, and a central question in this area is how to efficiently compress a large-scale dataset so as to reduce the computational complexity. {\em Coreset} is a popular data compression technique that has been extensively studied before. However, most existing coreset methods are problem-dependent and cannot be used as a general tool for a broader range of applications. A key obstacle is that they often rely on the pseudo-dimension and total sensitivity bounds, which can be very high or hard to obtain. In this paper, based on the ``locality'' property of gradient descent algorithms, we propose a new framework, termed ``sequential coreset'', which effectively avoids these obstacles. Moreover, our method is particularly suitable for sparse optimization, where the coreset size can be further reduced to be only poly-logarithmically dependent on the dimension. In practice, the experimental results suggest that our method can save a large amount of running time compared with the baseline algorithms.
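
    A hedged sketch of the general idea of coreset-driven gradient descent: periodically rebuild a small weighted sample using information at the current iterate, then take several gradient steps on that sample. The residual-proportional sampling used below is a placeholder exploiting the same ``locality'' intuition, not the paper's sequential coreset construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50_000, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)   # least-squares instance

def grad_weighted(theta, Xs, ys, w):
    """Gradient of the weighted least-squares loss sum_i w_i (x_i^T theta - y_i)^2."""
    r = Xs @ theta - ys
    return 2 * Xs.T @ (w * r)

theta = np.zeros(d)
m, lr = 1_000, 0.25 / n          # step size scaled for the full-data (sum) objective
for epoch in range(20):
    # Rebuild a small weighted sample around the current iterate: importance-sample
    # points by their current residual magnitude (a stand-in for sensitivity at theta).
    r = np.abs(X @ theta - y) + 1e-8
    p = r / r.sum()
    idx = rng.choice(n, size=m, p=p)
    w = 1.0 / (m * p[idx])        # weights make the sampled gradient unbiased
    for _ in range(50):           # inner gradient steps on the weighted sample only
        theta -= lr * grad_weighted(theta, X[idx], y[idx], w)

print("full-data loss after coreset-driven GD:", ((X @ theta - y) ** 2).sum())
```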