    Self-improving Algorithms for Coordinate-wise Maxima

    Computing the coordinate-wise maxima of a planar point set is a classic and well-studied problem in computational geometry. We give an algorithm for this problem in the \emph{self-improving setting}. We have $n$ (unknown) independent distributions $\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_n$ of planar points. An input point set $(p_1, p_2, \ldots, p_n)$ is generated by taking an independent sample $p_i$ from each $\mathcal{D}_i$, so the input distribution $\mathcal{D}$ is the product $\prod_i \mathcal{D}_i$. A self-improving algorithm repeatedly gets input sets from the distribution $\mathcal{D}$ (which is \emph{a priori} unknown) and tries to optimize its running time for $\mathcal{D}$. Our algorithm uses the first few inputs to learn salient features of the distribution, and then becomes an optimal algorithm for distribution $\mathcal{D}$. Let $\mathrm{OPT}_{\mathcal{D}}$ denote the expected depth of an \emph{optimal} linear comparison tree computing the maxima for distribution $\mathcal{D}$. Our algorithm eventually has an expected running time of $O(\mathrm{OPT}_{\mathcal{D}} + n)$, even though it did not know $\mathcal{D}$ to begin with. Our result requires new tools to understand linear comparison trees for computing maxima. We show how to convert general linear comparison trees to very restricted versions, which can then be related to the running time of our algorithm. An interesting feature of our algorithm is an interleaved search, where the algorithm tries to determine the likeliest point to be maximal with minimal computation. This allows the running time to be truly optimal for the distribution $\mathcal{D}$.
    Comment: To appear in Symposium on Computational Geometry 2012 (17 pages, 2 figures).
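    The problem being sped up is the classic "staircase" computation. For reference (and emphatically not the paper's self-improving algorithm), a minimal Python sketch of the standard $O(n \log n)$ sort-and-scan baseline that the learned algorithm improves upon:

```python
# Classic coordinate-wise maxima ("staircase") of a planar point set.
# A point is maximal iff no other point dominates it in both coordinates.
def coordinate_wise_maxima(points):
    maxima = []
    best_y = float("-inf")
    # Scan from largest x to smallest: a point is maximal iff its y
    # strictly exceeds the y of every point with a larger x.
    for x, y in sorted(points, reverse=True):
        if y > best_y:
            maxima.append((x, y))
            best_y = y
    return maxima

print(coordinate_wise_maxima([(1, 5), (2, 3), (3, 4), (4, 1)]))
# -> [(4, 1), (3, 4), (1, 5)]
```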

    Minimum Coresets for Maxima Representation of Multidimensional Data

    Coresets are succinct summaries of large datasets such that, for a given problem, the solution obtained from a coreset is provably competitive with the solution obtained from the full dataset. As such, coreset-based data summarization techniques have been successfully applied to various problems, e.g., geometric optimization, clustering, and approximate query processing, for scaling them up to massive data. In this paper, we study coresets for the maxima representation of multidimensional data: Given a set P of points in $\mathbb{R}^d$, where d is a small constant, and an error parameter ε ∈ (0, 1), a subset Q ⊆ P is an ε-coreset for the maxima representation of P iff the maximum of Q is an ε-approximation of the maximum of P for any vector u ∈ $\mathbb{R}^d$, where the maximum is taken over the inner products between the set of points (P or Q) and u. We define a novel minimum ε-coreset problem that asks for an ε-coreset of the smallest size for the maxima representation of a point set. For the two-dimensional case, we develop an optimal polynomial-time algorithm for the minimum ε-coreset problem by transforming it into the shortest-cycle problem in a directed graph. Then, we prove that this problem is NP-hard in three or higher dimensions and present polynomial-time approximation algorithms in an arbitrary fixed dimension. Finally, we provide extensive experimental results on both real and synthetic datasets to demonstrate the superior performance of our proposed algorithms.
    Peer reviewed.
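    A hedged sketch of the ε-coreset condition as stated in the abstract: the check below samples random unit directions u and compares the directional maxima of Q against those of P. It assumes a relative error guarantee and nonnegative maxima (the abstract pins down neither), and random sampling only gives evidence of the property rather than a certificate.

```python
import math
import random

# Empirically test whether Q looks like an eps-coreset of P for maxima
# representation: for each sampled unit direction u, the directional maximum
# over Q should be within a (1 - eps) factor of the maximum over P.
# Relative error and nonnegative maxima are assumptions, not facts from the paper.
def looks_like_maxima_coreset(P, Q, eps, d, trials=10_000):
    for _ in range(trials):
        u = [random.gauss(0.0, 1.0) for _ in range(d)]  # random direction
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
        max_p = max(sum(pi * ui for pi, ui in zip(p, u)) for p in P)
        max_q = max(sum(qi * ui for qi, ui in zip(q, u)) for q in Q)
        if max_q < (1 - eps) * max_p:
            return False  # witnessed a direction where Q fails
    return True
```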

    A simple and efficient preprocessing step for convex hull problem

    The present paper is concerned with a recursive algorithm that serves as a preprocessing step for finding the convex hull of $n$ random points uniformly distributed in the plane. For such a set of points, it is shown that all but $O(\log n)$ of the points can be eliminated while preserving the convex hull of the input set. Finally, it is shown that the running time of the algorithm is $O(n
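    The abstract does not spell out the paper's recursive elimination step; the classic Akl-Toussaint "throw-away" heuristic, sketched below, illustrates the general idea of such preprocessing: points strictly inside the quadrilateral spanned by the four axis-extreme points cannot appear on the convex hull and can be discarded up front.

```python
import math

# Akl-Toussaint throw-away preprocessing (a classic heuristic, not the
# paper's algorithm): drop every point strictly inside the quadrilateral
# of the four extreme points, since such points cannot be hull vertices.
def throw_away_filter(points):
    quad = [min(points), max(points),                 # extremes in x
            min(points, key=lambda p: p[1]),          # extremes in y
            max(points, key=lambda p: p[1])]
    # Order the quadrilateral's vertices counter-clockwise around the centroid.
    cx = sum(p[0] for p in quad) / 4.0
    cy = sum(p[1] for p in quad) / 4.0
    quad.sort(key=lambda p: math.atan2(p[1] - cy, p[0] - cx))

    def strictly_inside(p):
        # Strictly inside iff strictly left of every CCW edge (cross product > 0).
        return all((bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax) > 0
                   for (ax, ay), (bx, by) in zip(quad, quad[1:] + quad[:1]))

    return [p for p in points if not strictly_inside(p)]
```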

    Online Multivariate Changepoint Detection: Leveraging Links With Computational Geometry

    The increasing volume of data streams poses significant computational challenges for detecting changepoints online. Likelihood-based methods are effective, but their straightforward implementation becomes impractical online. We develop two online algorithms that exactly calculate the likelihood-ratio test for a single changepoint in $p$-dimensional data streams by leveraging fascinating connections with computational geometry. Our first algorithm is straightforward and empirically quasi-linear. The second is more complex but provably quasi-linear: $\mathcal{O}(n \log(n)^{p+1})$ for $n$ data points. Through simulations, we illustrate that they are fast and allow us to process millions of points within a matter of minutes for dimensions up to $p = 5$.
    Comment: 31 pages, 15 figures.
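    For context, a hedged baseline sketch (not the paper's geometry-based method): the exact likelihood-ratio statistic for a single change in the mean of a $p$-dimensional Gaussian stream, assuming unit variance in each coordinate (an assumption not stated in the abstract). Rescanning all candidate changepoints with prefix sums costs $O(n)$ per new point, hence $O(n^2)$ overall, which is the cost the paper's quasi-linear algorithms avoid.

```python
# 2 * log-likelihood ratio for a mean change at time tau (1 <= tau < n),
# assuming i.i.d. Gaussian noise with unit variance in each coordinate.
def lr_statistic(prefix, n, tau, p):
    scale = tau * (n - tau) / n
    left = [prefix[tau][j] / tau for j in range(p)]
    right = [(prefix[n][j] - prefix[tau][j]) / (n - tau) for j in range(p)]
    return scale * sum((l - r) ** 2 for l, r in zip(left, right))

# Naive online scan: after each new point, maximize the statistic over all
# candidate changepoints. O(n) work per point -- the baseline the paper beats.
def online_scan(stream, p):
    prefix = [[0.0] * p]   # prefix[i][j] = sum of coordinate j over first i points
    stats = []
    for x in stream:
        prefix.append([s + v for s, v in zip(prefix[-1], x)])
        n = len(prefix) - 1
        best = max((lr_statistic(prefix, n, t, p) for t in range(1, n)),
                   default=0.0)
        stats.append(best)  # compare to a threshold to flag a change
    return stats
```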