    Graph Coloring via Degeneracy in Streaming and Other Space-Conscious Models

    We study the problem of coloring a given graph using a small number of colors in several well-established models of computation for big data. These include the data streaming model, the general graph query model, the massively parallel computation (MPC) model, and the CONGESTED-CLIQUE and LOCAL models of distributed computation. On the one hand, we give algorithms with sublinear complexity, for the appropriate notion of complexity in each of these models. Our algorithms color a graph $G$ using about $\kappa(G)$ colors, where $\kappa(G)$ is the degeneracy of $G$; this parameter is closely related to the arboricity $\alpha(G)$. As a function of $\kappa(G)$ alone, our results are close to the best possible, since the optimal number of colors is $\kappa(G)+1$. On the other hand, we establish lower bounds indicating that sublinear algorithms probably cannot go much further. In particular, we prove that any randomized coloring algorithm that uses $\kappa(G)+1$ colors would require $\Omega(n^2)$ storage in the one-pass streaming model and $\Omega(n^2)$ queries in the general graph query model, where $n$ is the number of vertices in the graph. These lower bounds hold even when the value of $\kappa(G)$ is known in advance; at the same time, our upper bounds do not require $\kappa(G)$ to be given in advance. Comment: 26 pages
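    Why $\kappa(G)+1$ colors always suffice can be seen from the classical offline greedy argument: repeatedly delete a vertex of minimum remaining degree, then color the vertices in reverse deletion order, so that each vertex sees at most $\kappa(G)$ already-colored neighbors. The Python sketch below illustrates only this offline baseline, assuming the whole graph fits in memory as symmetric adjacency sets; it is not the paper's streaming, query, MPC, or distributed algorithm, and the names `degeneracy_ordering` and `degeneracy_coloring` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as adjacency sets (assumed symmetric)


def degeneracy_ordering(adj: Graph) -> Tuple[List[int], int]:
    """Repeatedly remove a vertex of minimum remaining degree.

    Returns the removal order and the degeneracy kappa, i.e. the largest
    remaining degree observed at the moment a vertex was removed.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    removed: Set[int] = set()
    order: List[int] = []
    kappa = 0
    for _ in range(len(adj)):
        # Linear scan keeps the sketch short (O(n^2) overall); bucket queues give O(n + m).
        v = min((u for u in adj if u not in removed), key=lambda u: degree[u])
        kappa = max(kappa, degree[v])
        removed.add(v)
        order.append(v)
        for w in adj[v]:
            if w not in removed:
                degree[w] -= 1
    return order, kappa


def degeneracy_coloring(adj: Graph) -> Tuple[Dict[int, int], int]:
    """Greedy coloring in reverse removal order; uses at most kappa + 1 colors."""
    order, kappa = degeneracy_ordering(adj)
    color: Dict[int, int] = {}
    for v in reversed(order):
        # Already-colored neighbors are exactly those removed after v: at most kappa of them.
        used = {color[w] for w in adj[v] if w in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color, kappa


# Usage: a 5-cycle has degeneracy 2, so at most 3 colors are used.
cycle5 = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
colors, kappa = degeneracy_coloring(cycle5)
```

    The complete graph on $\kappa+1$ vertices has degeneracy $\kappa$ and needs all $\kappa+1$ colors, which is why no bound of the form "fewer than $\kappa(G)+1$ colors" can hold for every graph.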

    Almost-Smooth Histograms and Sliding-Window Graph Algorithms

    We study algorithms for the sliding-window model, an important variant of the data-stream model, in which the goal is to compute some function of a fixed-length suffix of the stream. We extend the smooth-histogram framework of Braverman and Ostrovsky (FOCS 2007) to almost-smooth functions, a class that includes all subadditive functions. Specifically, we show that if a subadditive function can be $(1+\epsilon)$-approximated in the insertion-only streaming model, then it can also be $(2+\epsilon)$-approximated in the sliding-window model, with space complexity larger by a factor of $O(\epsilon^{-1}\log w)$, where $w$ is the window size. We demonstrate how our framework yields new approximation algorithms with relatively little effort for a variety of problems that do not admit the smooth-histogram technique. For example, in the frequency-vector model, a symmetric norm is subadditive, and thus we obtain a sliding-window $(2+\epsilon)$-approximation algorithm for it. Another example is for streaming matrices, where we derive a new sliding-window $(\sqrt{2}+\epsilon)$-approximation algorithm for the Schatten $4$-norm. We then consider graph streams and show that many graph problems are subadditive, including maximum submodular matching, minimum vertex cover, and maximum $k$-cover, thereby deriving sliding-window $O(1)$-approximation algorithms for them almost for free (using known insertion-only algorithms). Finally, we design, for every $d\in (1,2]$, an artificial function, based on the maximum-matching size, whose almost-smoothness parameter is exactly $d$.
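    To make the histogram idea concrete, here is a minimal sketch of the original smooth-histogram technique of Braverman and Ostrovsky, not the almost-smooth extension described above, instantiated for the simplest case: the sum of the last $w$ nonnegative integers, with each "insertion-only algorithm" replaced by an exact running sum. The class and parameter names (`SmoothHistogramSum`, `beta`) are hypothetical. An instance is started at every arrival, and instances are pruned so that, between any two retained start times that are not consecutive, the later instance's value is at least $(1-\beta)$ times the earlier one's; the oldest retained instance then overestimates the window sum by at most a factor of $1/(1-\beta)$.

```python
from typing import List


class SmoothHistogramSum:
    """Sliding-window sum of nonnegative integers via a smooth histogram (sketch)."""

    def __init__(self, window: int, beta: float) -> None:
        if window < 1 or not 0 < beta < 1:
            raise ValueError("need window >= 1 and 0 < beta < 1")
        self.window = window
        self.beta = beta
        self.time = 0  # number of items seen so far
        # Each bucket is [start_time, sum of items from start_time to now];
        # start times increase, so values are non-increasing along the list.
        self.buckets: List[List[int]] = []

    def insert(self, x: int) -> None:
        if x < 0:
            raise ValueError("items must be nonnegative")
        self.time += 1
        self.buckets.append([self.time, 0])  # start a new instance at this arrival
        for bucket in self.buckets:
            bucket[1] += x  # feed the item to every running instance
        self._prune()
        self._expire()

    def _prune(self) -> None:
        kept = []
        i, n = 0, len(self.buckets)
        while i < n:
            kept.append(self.buckets[i])
            threshold = (1 - self.beta) * self.buckets[i][1]
            nxt = i + 1
            # Jump to the last later bucket still within a (1 - beta) factor of bucket i,
            # dropping every bucket strictly in between.
            for j in range(i + 1, n):
                if self.buckets[j][1] >= threshold:
                    nxt = j
            i = nxt
        self.buckets = kept

    def _expire(self) -> None:
        # Keep at most one bucket that starts at or before the window start.
        window_start = self.time - self.window + 1
        while len(self.buckets) >= 2 and self.buckets[1][0] <= window_start:
            self.buckets.pop(0)

    def query(self) -> int:
        # Covers the whole window (and possibly a bit more): an overestimate.
        return self.buckets[0][1]


# Usage: window of the last 3 items, beta = 0.25.
sh = SmoothHistogramSum(window=3, beta=0.25)
for x in [4, 0, 1, 5, 2]:
    sh.insert(x)
estimate = sh.query()  # 8 here, the exact window sum 1 + 5 + 2
```

    Adding the same amount to every instance preserves any inequality of the form value(later) >= (1 - beta) * value(earlier), so a pruning decision never becomes invalid; this persistence is the smoothness property the framework relies on, and it is what keeps the number of retained instances logarithmic rather than linear in the window size.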

    New Algorithms for Large Datasets and Distributions

    In this dissertation, we make progress on certain algorithmic problems in two broad computational models: the streaming model for large datasets and the distribution testing model for large probability distributions. First we consider the streaming model, where a large sequence of data items arrives one by one. The computer must make one pass over this sequence, processing each item quickly and using limited space. In Chapter 2, motivated by a bioinformatics application, we consider the problem of estimating the number of low-frequency items in a stream, which has so far received only limited theoretical attention. We give an efficient streaming algorithm for this problem and show that its complexity is almost optimal. In Chapter 3 we consider a distributed variation of the streaming model, where each item of the data sequence arrives at an arbitrary one of a set of computers, which together need to compute certain functions over the entire stream. In such scenarios, collecting all the data at a single computer is infeasible due to the large communication overhead. We give the first algorithm for k-median clustering in this model. Moreover, we give new algorithms for frequency moments and clustering functions in the distributed sliding window model, where, as items arrive, the computation is restricted to the most recent W items. In Chapter 5, we consider the identity testing problem: given two distributions P (unknown, accessible only through samples) and Q (known) over a common sample space of exponential size, distinguish P = Q (output ‘yes’) from P being far from Q (output ‘no’). Without further assumptions, this problem requires an exponential number of samples. To circumvent this lower bound, the problem was recently studied under certain structural assumptions; in particular, optimally efficient testers were given assuming P and Q are product distributions. For such product distributions, we give the first tolerant testers, which output ‘yes’ not only when P = Q but also when P is close to Q. Likewise, we study the tolerant closeness testing problem for product distributions, where Q too is accessible only through samples. Adviser: Vinodchandran N. Variya
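    For the low-frequency estimation problem of Chapter 2, a simple baseline (not the dissertation's algorithm, whose details are not given here) is hash-based sampling of distinct items: each distinct item is kept with probability p, decided by a hash so that all of its occurrences are treated alike, and its frequency is then counted exactly. Treating the hash as a random function, scaling the number of sampled items with frequency at most k by 1/p gives an unbiased estimate of the number of such items in the whole stream. The names `LowFrequencyEstimator`, `unit_hash`, `k`, `p`, and the use of SHA-256 as the hash are illustrative assumptions.

```python
import hashlib
from collections import Counter
from typing import Hashable


def unit_hash(item: Hashable, seed: int) -> float:
    """Map an item to [0, 1) deterministically (SHA-256 stands in for a random hash)."""
    digest = hashlib.sha256(f"{seed}:{item!r}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # 53 bits, always < 1


class LowFrequencyEstimator:
    """Estimate how many distinct items occur between 1 and k times in a stream."""

    def __init__(self, k: int, p: float, seed: int = 0) -> None:
        if not 0 < p <= 1:
            raise ValueError("sampling probability p must be in (0, 1]")
        self.k = k
        self.p = p
        self.seed = seed
        self.counts: Counter = Counter()  # exact counts, but only for sampled items

    def update(self, item: Hashable) -> None:
        # The same item always hashes the same way, so a sampled item has all
        # of its occurrences counted and an unsampled one is ignored entirely.
        if unit_hash(item, self.seed) < self.p:
            self.counts[item] += 1

    def estimate(self) -> float:
        sampled_low = sum(1 for c in self.counts.values() if c <= self.k)
        return sampled_low / self.p


# Usage: p = 1 keeps every item, so the count is exact.
est = LowFrequencyEstimator(k=2, p=1.0)
for item in ["a", "b", "a", "c", "c", "c", "d"]:
    est.update(item)
low = est.estimate()  # 3.0: a, b and d occur at most twice
```

    The expected memory is p times the number of distinct items, so this sketch saves space only by a constant factor; it does not attempt to reach the near-optimal complexity stated in the abstract.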

    Current Directions in the Development of Information and Communication Technologies and Control Tools. Volume 2

    This collection presents the abstracts of papers from the tenth international scientific and technical conference "Current Directions in the Development of Information and Communication Technologies and Control Tools". The topics covered fall into the following areas: theoretical and applied aspects of decision-making systems, and of the optimization and control of systems and processes; computer methods and tools of information and communication technologies and control; methods for fast and reliable data processing in computer systems and networks; information technologies in civil security; modern information and measurement systems; information technologies in mechanical engineering.

    Models of computation for big data