20,949 research outputs found

    Geometric Approximation Algorithms in the Online and Data Stream Models

    Get PDF
    The online and data stream models of computation have recently attracted considerable research attention due to many real-world applications in various areas such as data mining, machine learning, distributed computing, and robotics. In both these models, input items arrive one at a time, and the algorithms must decide based on the partial data received so far, without any secure information about the data that will arrive in the future. In this thesis, we investigate efficient algorithms for a number of fundamental geometric optimization problems in the online and data stream models. The problems studied in this thesis can be divided into two major categories: geometric clustering and computing various extent measures of a set of points. In the online setting, we show that the basic unit clustering problem admits non-trivial algorithms even in the simplest one-dimensional case: we show that the naive upper bounds on the competitive ratio of algorithms for this problem can be beaten using randomization. In the data stream model, we propose a new streaming algorithm for maintaining "core-sets" of a set of points in fixed dimensions, and also, introduce a new simple framework for transforming a class of offline algorithms to their equivalents in the data stream model. These results together lead to improved streaming approximation algorithms for a wide variety of geometric optimization problems in fixed dimensions, including diameter, width, k-center, smallest enclosing ball, minimum-volume bounding box, minimum enclosing cylinder, minimum-width enclosing spherical shell/annulus, etc. In high-dimensional data streams, where the dimension is not a constant, we propose a simple streaming algorithm for the minimum enclosing ball (the 1-center) problem with an improved approximation factor

    Deterministic Sampling and Range Counting in Geometric Data Streams

    Get PDF
    We present memory-efficient deterministic algorithms for constructing epsilon-nets and epsilon-approximations of streams of geometric data. Unlike probabilistic approaches, these deterministic samples provide guaranteed bounds on their approximation factors. We show how our deterministic samples can be used to answer approximate online iceberg geometric queries on data streams. We use these techniques to approximate several robust statistics of geometric data streams, including Tukey depth, simplicial depth, regression depth, the Thiel-Sen estimator, and the least median of squares. Our algorithms use only a polylogarithmic amount of memory, provided the desired approximation factors are inverse-polylogarithmic. We also include a lower bound for non-iceberg geometric queries.Comment: 12 pages, 1 figur

    Improved Algorithms for Time Decay Streams

    Get PDF
    In the time-decay model for data streams, elements of an underlying data set arrive sequentially with the recently arrived elements being more important. A common approach for handling large data sets is to maintain a coreset, a succinct summary of the processed data that allows approximate recovery of a predetermined query. We provide a general framework that takes any offline-coreset and gives a time-decay coreset for polynomial time decay functions. We also consider the exponential time decay model for k-median clustering, where we provide a constant factor approximation algorithm that utilizes the online facility location algorithm. Our algorithm stores O(k log(h Delta)+h) points where h is the half-life of the decay function and Delta is the aspect ratio of the dataset. Our techniques extend to k-means clustering and M-estimators as well

    Coresets-Methods and History: A Theoreticians Design Pattern for Approximation and Streaming Algorithms

    Get PDF
    We present a technical survey on the state of the art approaches in data reduction and the coreset framework. These include geometric decompositions, gradient methods, random sampling, sketching and random projections. We further outline their importance for the design of streaming algorithms and give a brief overview on lower bounding techniques
    • …
    corecore