Self-improving Algorithms for Coordinate-wise Maxima
Computing the coordinate-wise maxima of a planar point set is a classic and
well-studied problem in computational geometry. We give an algorithm for this
problem in the \emph{self-improving setting}. We have n (unknown) independent
distributions \cD_1, \cD_2, ..., \cD_n of planar points. An input point set
(p_1, p_2, ..., p_n) is generated by taking an independent sample p_i from
each \cD_i, so the input distribution \cD is the product \prod_i \cD_i. A
self-improving algorithm repeatedly gets input sets from the distribution \cD
(which is \emph{a priori} unknown) and tries to optimize its running time for
\cD. Our algorithm uses the first few inputs to learn salient features of the
distribution, and then becomes an optimal algorithm for distribution \cD. Let
\text{OPT}_\cD denote the expected depth of an \emph{optimal} linear comparison
tree computing the maxima for distribution \cD. Our algorithm eventually has
an expected running time of O(\text{OPT}_\cD + n), even though it did not
know \cD to begin with.
Our result requires new tools to understand linear comparison trees for
computing maxima. We show how to convert general linear comparison trees to
very restricted versions, which can then be related to the running time of our
algorithm. An interesting feature of our algorithm is an interleaved search,
where the algorithm tries to determine the likeliest point to be maximal with
minimal computation. This allows the running time to be truly optimal for the
distribution \cD.
Comment: To appear in Symposium on Computational Geometry 2012 (17 pages, 2
figures)
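For orientation, the problem this abstract addresses can be illustrated with the classic sort-and-sweep baseline. This is not the self-improving algorithm from the paper; it is a minimal sketch that assumes weak dominance (a point is dropped if another distinct point is at least as large in both coordinates), and the function name `planar_maxima` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def planar_maxima(points: List[Point]) -> List[Point]:
    """Return the coordinate-wise maxima of a planar point set.

    Baseline O(n log n) sweep, assuming weak dominance: a point is
    maximal if no other distinct point has both x and y at least as large.
    """
    # Drop exact duplicates, then sort by x descending (ties: y descending).
    ordered = sorted(set(points), key=lambda p: (-p[0], -p[1]))
    maxima: List[Point] = []
    best_y = float("-inf")  # largest y seen among points with larger-or-equal x
    for x, y in ordered:
        # A point survives only if it is strictly higher than everything
        # already seen to its right (or at the same x).
        if y > best_y:
            maxima.append((x, y))
            best_y = y
    return maxima


if __name__ == "__main__":
    print(planar_maxima([(1, 3), (2, 2), (3, 1), (1, 1), (2, 1)]))
```

The sweep always pays for a full sort, whereas the abstract describes an expected running time of O(\text{OPT}_\cD + n) once the distribution has been learned.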
Minimum Coresets for Maxima Representation of Multidimensional Data
Coresets are succinct summaries of large datasets such that, for a given problem, the solution obtained from a coreset is provably competitive with the solution obtained from the full dataset. As such, coreset-based data summarization techniques have been successfully applied to various problems, e.g., geometric optimization, clustering, and approximate query processing, for scaling them up to massive data. In this paper, we study coresets for the maxima representation of multidimensional data: Given a set P of points in R^d, where d is a small constant, and an error parameter ε ∈ (0, 1), a subset Q ⊆ P is an ε-coreset for the maxima representation of P iff the maximum of Q is an ε-approximation of the maximum of P for any vector u ∈ R^d, where the maximum is taken over the inner products between the set of points (P or Q) and u. We define a novel minimum ε-coreset problem that asks for an ε-coreset of the smallest size for the maxima representation of a point set. For the two-dimensional case, we develop an optimal polynomial-time algorithm for the minimum ε-coreset problem by transforming it into the shortest-cycle problem in a directed graph. Then, we prove that this problem is NP-hard in three or higher dimensions and present polynomial-time approximation algorithms in an arbitrary fixed dimension. Finally, we provide extensive experimental results on both real and synthetic datasets to demonstrate the superior performance of our proposed algorithms.
Peer reviewed
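The ε-coreset condition in this abstract can be probed numerically in two dimensions. The sketch below is not the paper's algorithm. It assumes that "ε-approximation" means max over Q of ⟨q, u⟩ ≥ (1 − ε) · max over P of ⟨p, u⟩, and that the origin lies inside the convex hull of P so that every directional maximum is positive. It tests only a finite sample of unit directions, so a `True` result is evidence rather than a proof. The names `max_inner` and `looks_like_coreset` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def max_inner(points: Sequence[Point], u: Point) -> float:
    """Largest inner product between any point in `points` and direction u."""
    return max(p[0] * u[0] + p[1] * u[1] for p in points)


def looks_like_coreset(P: Sequence[Point], Q: Sequence[Point],
                       eps: float, num_directions: int = 360) -> bool:
    """Check the assumed eps-coreset inequality on sampled unit directions.

    Returns False as soon as one sampled direction violates
    max_Q <q, u> >= (1 - eps) * max_P <p, u>; returns True otherwise.
    """
    for k in range(num_directions):
        theta = 2.0 * math.pi * k / num_directions
        u = (math.cos(theta), math.sin(theta))
        # Small tolerance guards against floating-point rounding.
        if max_inner(Q, u) < (1.0 - eps) * max_inner(P, u) - 1e-12:
            return False
    return True


if __name__ == "__main__":
    P = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (0.5, 0.5)]
    Q = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
    print(looks_like_coreset(P, Q, eps=0.1))
```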
A simple and efficient preprocessing step for convex hull problem
The present paper is concerned with a recursive algorithm as a preprocessing
step to find the convex hull of random points uniformly distributed in the
plane. For such a set of points, it is shown that eliminating all but of points can derive the same convex hull as the input set. Finally it will
be shown that the running time of the algorithm is $O(n
Online Multivariate Changepoint Detection: Leveraging Links With Computational Geometry
The increasing volume of data streams poses significant computational
challenges for detecting changepoints online. Likelihood-based methods are
effective, but their straightforward implementation becomes impractical online.
We develop two online algorithms that exactly calculate the likelihood ratio
test for a single changepoint in p-dimensional data streams by leveraging
fascinating connections with computational geometry. Our first algorithm is
straightforward and empirically quasi-linear. The second is more complex but
provably quasi-linear: for data points.
Through simulations, we illustrate that they are fast and allow us to process
millions of points within a matter of minutes up to .
Comment: 31 pages, 15 figures
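To make the single-changepoint likelihood ratio test concrete, here is a naive offline baseline. It is not either of the paper's online algorithms. The abstract does not state a model here, so the sketch assumes Gaussian observations with identity covariance and unknown means before and after the change. Under that assumption, twice the log-likelihood ratio for a change after index τ is ‖S_τ‖²/τ + ‖S_n − S_τ‖²/(n − τ) − ‖S_n‖²/n, where S_k is the sum of the first k observations. The function name `best_single_changepoint` is hypothetical.

```python
from typing import List, Sequence, Tuple


def best_single_changepoint(data: Sequence[Sequence[float]]) -> Tuple[int, float]:
    """Return (tau, statistic) maximizing the assumed Gaussian LR statistic.

    `tau` is the number of observations before the change (1 <= tau < n).
    Runs in O(n * p) time for n observations of dimension p.
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    p = len(data[0])
    # Column sums of the whole series, S_n.
    total: List[float] = [sum(row[j] for row in data) for j in range(p)]
    base = sum(s * s for s in total) / n
    prefix = [0.0] * p  # running sum S_tau
    best_tau, best_stat = 1, float("-inf")
    for tau in range(1, n):
        for j in range(p):
            prefix[j] += data[tau - 1][j]
        pre = sum(s * s for s in prefix) / tau
        post = sum((total[j] - prefix[j]) ** 2 for j in range(p)) / (n - tau)
        stat = pre + post - base
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    return best_tau, best_stat


if __name__ == "__main__":
    series = [(0.0, 0.0)] * 5 + [(3.0, 3.0)] * 5
    print(best_single_changepoint(series))
```

Rerunning this scan after every new observation costs time linear in the stream length per update, which is the kind of cost an exact online method needs to avoid.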