16 research outputs found

    Testing Properties of Multiple Distributions with Few Samples

    We propose a new setting for testing properties of distributions in which we receive samples from several distributions, but only a few samples per distribution. Given samples from $s$ distributions $p_1, p_2, \ldots, p_s$, we design testers for the following problems: (1) Uniformity Testing: testing whether all the $p_i$'s are uniform or $\epsilon$-far from uniform in $\ell_1$-distance; (2) Identity Testing: testing whether all the $p_i$'s are equal to an explicitly given distribution $q$ or $\epsilon$-far from $q$ in $\ell_1$-distance; and (3) Closeness Testing: testing whether all the $p_i$'s are equal to a distribution $q$ to which we have sample access, or $\epsilon$-far from $q$ in $\ell_1$-distance. Under an additional natural condition on the source distributions, we provide sample-optimal testers for all of these problems. Comment: ITCS 202
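    The flavor of pooling evidence across many sparsely sampled sources can be sketched with a simple collision statistic aggregated over all sources; the statistic and the threshold constant below are illustrative assumptions, not the paper's actual testers:

    ```python
    import random
    from itertools import combinations

    def collision_statistic(sample_sets, domain_size):
        """Aggregate within-source collision rate across all sources.

        Under the uniform distribution the pairwise collision probability
        is 1/domain_size; distributions far from uniform collide more often.
        """
        collisions = 0
        pairs = 0
        for s in sample_sets:
            for a, b in combinations(s, 2):
                pairs += 1
                collisions += (a == b)
        return collisions / pairs

    def test_uniformity(sample_sets, domain_size, epsilon):
        # Accept iff the aggregated collision rate is close to the uniform
        # rate 1/domain_size; the (1 + epsilon^2) threshold is a heuristic.
        threshold = (1 + epsilon ** 2) / domain_size
        return collision_statistic(sample_sets, domain_size) <= threshold
    ```

    With many sources of only 10 samples each, no single source yields enough pairs to test on its own, but the pooled rate separates uniform sources from sources concentrated on half the domain.
    
    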

    A Concentration Inequality for the Facility Location Problem

    We give a concentration inequality for a stochastic version of the facility location problem on the plane. We show that the objective $C_n(X) = \min_{F \subseteq [0,1]^2} |F| + \sum_{x \in X} \min_{f \in F} \|x - f\|$ is concentrated in an interval of length $O(n^{1/6})$ and that $\mathbb{E}[C_n] = \Theta(n^{2/3})$ if the input $X$ consists of $n$ i.i.d. uniform points in the unit square. Our main tool is to use a suitable geometric quantity, previously used in the design of approximation algorithms for the facility location problem, to analyze a martingale process. Comment: 6 pages, 1 figure
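    A quick way to see why $\Theta(n^{2/3})$ is the right scale is to evaluate the objective at the feasible solution that opens one facility per cell of a $g \times g$ grid with $g \approx n^{1/3}$, balancing the $g^2$ opening cost against the $n \cdot O(1/g)$ connection cost. This is only an upper-bound heuristic, not the paper's martingale argument:

    ```python
    import math
    import random

    def grid_facility_cost(points, cells_per_side):
        """Objective |F| + sum of distances for F = centers of a uniform
        g x g grid over the unit square (a feasible, not optimal, F)."""
        g = cells_per_side
        cost = float(g * g)  # facility opening cost, one per cell
        for (x, y) in points:
            # nearest facility is the center of the cell containing (x, y)
            fx = (min(int(x * g), g - 1) + 0.5) / g
            fy = (min(int(y * g), g - 1) + 0.5) / g
            cost += math.hypot(x - fx, y - fy)
        return cost
    ```

    For $n$ uniform points and $g = \lfloor n^{1/3} \rceil$, both terms are $\Theta(n^{2/3})$, so the total lands within a small constant factor of $n^{2/3}$.
    
    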

    Property Testing of LP-Type Problems

    Given query access to a set of constraints $S$, we wish to quickly check if some objective function $\varphi$ subject to these constraints is at most a given value $k$. We approach this problem using the framework of property testing, where our goal is to distinguish the case $\varphi(S) \le k$ from the case that at least an $\epsilon$ fraction of the constraints in $S$ need to be removed for $\varphi(S) \le k$ to hold. We restrict our attention to the case where $(S, \varphi)$ is an LP-Type problem, a rich family of combinatorial optimization problems with an inherent geometric structure. By utilizing a simple sampling procedure which has been used previously to study these problems, we are able to create property testers for any LP-Type problem whose query complexities are independent of the number of constraints. To the best of our knowledge, this is the first work that connects the area of LP-Type problems and property testing in a systematic way. Among our results are property testers for a variety of LP-Type problems that are new, as well as for problems that have been studied previously, such as a tight upper bound on the query complexity of testing clusterability with one cluster considered by Alon, Dar, Parnas, and Ron (FOCS 2000). We also supply a corresponding tight lower bound for this problem and other LP-Type problems using geometric constructions.
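    The sampling procedure described above can be sketched on a toy LP-Type problem: the smallest enclosing interval in one dimension (combinatorial dimension 2). Solve the problem on a small random sample and accept iff the sampled optimum is at most $k$; the sample-size constant below is an assumption for illustration, not the paper's bound:

    ```python
    import random

    def interval_width_tester(points, k, epsilon):
        """One-sided tester sketch: sample O(d / epsilon) constraints,
        solve the LP-Type problem on the sample, accept iff the sampled
        optimum is at most k.

        Completeness is perfect: any sample of a feasible instance is
        feasible. If an epsilon fraction of constraints must be removed,
        the sample likely contains a witness pair of width > k.
        """
        d = 2  # combinatorial dimension of smallest enclosing interval
        m = max(1, round(10 * d / epsilon))  # constant 10 is illustrative
        sample = random.choices(points, k=min(m, len(points)))
        return max(sample) - min(sample) <= k
    ```

    The query complexity depends only on $\epsilon$ (and the combinatorial dimension), not on $|S|$, which is the hallmark of the testers above.
    
    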

    Smoothed Analysis of the Condition Number Under Low-Rank Perturbations

    Let $M$ be an arbitrary $n \times n$ matrix of rank $n-k$. We study the condition number of $M$ plus a low-rank perturbation $UV^T$, where $U, V$ are $n \times k$ random Gaussian matrices. Under some necessary assumptions, it is shown that $M + UV^T$ is unlikely to have a large condition number. The main advantages of this kind of perturbation over the well-studied dense Gaussian perturbation, where every entry is independently perturbed, are the $O(nk)$ cost to store $U, V$ and the $O(nk)$ increase in time complexity for performing the matrix-vector multiplication $(M + UV^T)x$. This improves on the $\Omega(n^2)$ space and time complexity increase required by a dense perturbation, which is especially burdensome if $M$ is originally sparse. Our results also extend to the case where $U$ and $V$ have rank larger than $k$, and to symmetric and complex settings. We also give an application to solving linear systems and perform some numerical experiments. Lastly, we discuss barriers to applying low-rank noise to other problems studied in the smoothed analysis framework.
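    The phenomenon is easy to probe numerically: take a rank-deficient $M$ (infinite condition number) and check that a small rank-$k$ Gaussian perturbation makes the condition number finite and moderate. The $1/\sqrt{n}$ scaling of the noise below is an assumption chosen for illustration, not the paper's parameterization:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 200, 3

    # Build a rank-(n - k) matrix M by zeroing the k smallest singular
    # values of a random Gaussian matrix; cond(M) is infinite in exact
    # arithmetic (and astronomically large in floating point).
    A = rng.standard_normal((n, n))
    Uo, s, Vt = np.linalg.svd(A)
    s[-k:] = 0.0
    M = Uo @ np.diag(s) @ Vt

    # Low-rank Gaussian perturbation: only O(nk) numbers to store, and
    # (M + U V^T) x costs only O(nk) extra time via x -> U (V^T x).
    U = rng.standard_normal((n, k)) / np.sqrt(n)
    V = rng.standard_normal((n, k)) / np.sqrt(n)
    cond = np.linalg.cond(M + U @ V.T)
    ```

    In this run the perturbed condition number is finite and many orders of magnitude below the floating-point condition number of $M$ itself, matching the qualitative claim.
    
    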

    Faster Linear Algebra for Distance Matrices

    The distance matrix of a dataset $X$ of $n$ points with respect to a distance function $f$ represents all pairwise distances between points in $X$ induced by $f$. Due to their wide applicability, distance matrices and related families of matrices have been the focus of many recent algorithmic works. We continue this line of research and take a broad view of algorithm design for distance matrices, with the goal of designing fast algorithms for fundamental linear-algebraic primitives which are specifically tailored to distance matrices. Our results include efficient algorithms for computing matrix-vector products for a wide class of distance matrices, such as the $\ell_1$ metric, for which we get a linear runtime, as well as an $\Omega(n^2)$ lower bound for any algorithm which computes a matrix-vector product in the $\ell_\infty$ case, showing a separation between the $\ell_1$ and $\ell_\infty$ metrics. Our upper bound results, in conjunction with recent works on the matrix-vector query model, have many further downstream applications, including the fastest algorithm for computing a relative-error low-rank approximation of the distance matrix induced by the $\ell_1$ and $\ell_2^2$ functions and the fastest algorithm for computing an additive-error low-rank approximation for the $\ell_2$ metric, in addition to applications to fast matrix multiplication, among others. We also give algorithms for constructing distance matrices, and show that one can construct an approximate $\ell_2$ distance matrix in time faster than the bound implied by the Johnson-Lindenstrauss lemma. Comment: Selected as Oral for NeurIPS 202
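    For points on a line, the $\ell_1$ distance-matrix matvec can be computed in $O(n \log n)$ time via sorting and prefix sums rather than the naive $O(n^2)$; since the $\ell_1$ metric decomposes coordinate-wise, this 1-D pass is the natural building block. This is a sketch in that spirit, not necessarily the paper's algorithm:

    ```python
    def l1_matvec_1d(x, v):
        """Compute D @ v where D[i][j] = |x[i] - x[j]|, without forming D.

        In sorted order, sum_j |x_i - x_j| v_j splits into a prefix part
        (points at or below x_i) and a suffix part, each expressible with
        running sums of v and of x * v.
        """
        n = len(x)
        order = sorted(range(n), key=lambda i: x[i])
        xs = [x[i] for i in order]
        vs = [v[i] for i in order]
        pv, pxv = [0.0], [0.0]  # prefix sums of v and x*v in sorted order
        for xi, vi in zip(xs, vs):
            pv.append(pv[-1] + vi)
            pxv.append(pxv[-1] + xi * vi)
        total_v, total_xv = pv[-1], pxv[-1]
        out = [0.0] * n
        for r, i in enumerate(order):
            left = xs[r] * pv[r + 1] - pxv[r + 1]
            right = (total_xv - pxv[r + 1]) - xs[r] * (total_v - pv[r + 1])
            out[i] = left + right
        return out
    ```

    Each of the $d$ coordinates gets one such pass, giving an $O(dn \log n)$ matvec for the $d$-dimensional $\ell_1$ distance matrix.
    
    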

    Randomized Dimensionality Reduction for Facility Location and Single-Linkage Clustering

    Random dimensionality reduction is a versatile tool for speeding up algorithms for high-dimensional problems. We study its application to two clustering problems: the facility location problem, and single-linkage hierarchical clustering, which is equivalent to computing the minimum spanning tree. We show that if we project the input point set $X$ onto a random $d = O(d_X)$-dimensional subspace, where $d_X$ is the doubling dimension of $X$, then the optimum facility location cost in the projected space approximates the original cost up to a constant factor. We show an analogous statement for the minimum spanning tree, but with the dimension $d$ carrying an extra $\log \log n$ term and the approximation factor being arbitrarily close to $1$. Furthermore, we extend these results to approximating the solutions themselves instead of just their costs. Lastly, we provide experimental results to validate the quality of the solutions and the speedup due to the dimensionality reduction. Unlike several previous papers studying this approach in the context of $k$-means and $k$-medians, our dimension bound does not depend on the number of clusters but only on the intrinsic dimensionality of $X$. Comment: 25 pages. Published as a conference paper in ICML 202
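    The cost-preservation claim for the minimum spanning tree is easy to probe empirically: project with a Gaussian map and compare MST costs before and after. The target dimension below is an arbitrary illustrative choice, not the $d = O(d_X)$ bound from the result:

    ```python
    import numpy as np

    def mst_cost(P):
        """Total weight of the Euclidean MST via Prim's algorithm, O(n^2)."""
        n = len(P)
        in_tree = np.zeros(n, dtype=bool)
        dist = np.full(n, np.inf)
        dist[0] = 0.0
        total = 0.0
        for _ in range(n):
            # pick the cheapest vertex not yet in the tree
            i = np.argmin(np.where(in_tree, np.inf, dist))
            in_tree[i] = True
            total += dist[i]
            # relax edges from the new tree vertex
            d = np.linalg.norm(P - P[i], axis=1)
            dist = np.minimum(dist, np.where(in_tree, np.inf, d))
        return total

    rng = np.random.default_rng(1)
    X = rng.standard_normal((60, 40))           # 60 points in 40 dimensions
    G = rng.standard_normal((40, 12)) / np.sqrt(12)  # Gaussian projection
    ratio = mst_cost(X @ G) / mst_cost(X)
    ```

    Even at such a small target dimension the MST cost ratio stays close to $1$, consistent with the approximation factor being arbitrarily close to $1$ at the dimension prescribed by the theorem.
    
    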