Streaming Kernelization
Kernelization is a formalization of preprocessing for combinatorially hard
problems. We modify the standard definition for kernelization, which allows any
polynomial-time algorithm for the preprocessing, by requiring instead that the
preprocessing runs in a streaming setting and uses O(poly(k) log |x|)
bits of memory on instances (x, k). We obtain
several results in this new setting, depending on the number of passes over the
input that such a streaming kernelization is allowed to make. Edge Dominating
Set turns out as an interesting example because it has no single-pass
kernelization but two passes over the input suffice to match the bounds of the
best standard kernelization.
Parameterized Streaming Algorithms for Vertex Cover
As graphs continue to grow in size, we seek ways to effectively process such
data at scale. The model of streaming graph processing, in which a compact
summary is maintained as each edge insertion/deletion is observed, is an
attractive one. However, few results are known for optimization problems over
such dynamic graph streams.
In this paper, we introduce a new approach to handling graph streams, by
instead seeking solutions for the parameterized versions of these problems
where we are given a parameter k and the objective is to decide whether there
is a solution bounded by k. By combining kernelization techniques with
randomized sketch structures, we obtain the first streaming algorithms for the
parameterized versions of the Vertex Cover problem. We consider the following
three models for a graph stream on n nodes:
1. The insertion-only model where the edges can only be added.
2. The dynamic model where edges can be both inserted and deleted.
3. The promised dynamic model where we are guaranteed that at each
timestamp there is a solution of size at most k.
In each of these three models we are able to design parameterized streaming
algorithms for the Vertex Cover problem. We are also able to show matching
lower bounds for the space complexity of our algorithms.
(Due to the arXiv limit of 1920 characters for abstract field, please see the
abstract in the paper for a detailed description of our results)
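The insertion-only model can be illustrated with a minimal sketch that combines a greedy matching with a Buss-style degree cap. This is a textbook-style construction written for illustration, not necessarily the algorithm of the paper; the names stream_vc_kernel and has_vertex_cover are hypothetical, and the input is assumed to be a simple graph. The stream is read once and at most O(k^2) edges are stored.

```python
from typing import Iterable, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def stream_vc_kernel(edges: Iterable[Edge], k: int) -> Optional[Set[Edge]]:
    """One pass over an insertion-only edge stream (assumed simple graph).

    Returns a set of O(k^2) stored edges that has a vertex cover of size
    at most k iff the whole stream does, or None once a matching with
    more than k edges shows that no such cover exists.
    """
    matched: Set[int] = set()   # endpoints of the greedy matching
    matching_size = 0
    degree = {}                 # degree of each vertex among stored edges
    kernel: Set[Edge] = set()
    for u, v in edges:
        e = (min(u, v), max(u, v))
        if e in kernel:
            continue            # ignore a repeated edge
        if u not in matched and v not in matched:
            matched.update(e)   # extend the greedy matching
            matching_size += 1
            if matching_size > k:
                return None     # k vertices cannot cover k+1 disjoint edges
        elif degree.get(u, 0) > k or degree.get(v, 0) > k:
            # An endpoint already has k+1 stored edges, so it belongs to
            # every cover of size <= k and covers this edge as well.
            continue
        kernel.add(e)
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return kernel


def has_vertex_cover(edges: List[Edge], k: int) -> bool:
    """Bounded search tree on the stored edges: branch on an endpoint."""
    if not edges:
        return True
    if k == 0:
        return False
    return any(
        has_vertex_cover([e for e in edges if w not in e], k - 1)
        for w in edges[0]
    )
```

Every stored edge touches one of at most 2k matched vertices, and no vertex keeps more than k + 2 stored edges, which is where the O(k^2) bound in this sketch comes from. Edge deletions are not handled here; the dynamic models require different tools such as the sketch structures mentioned in the abstract.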
Parameterized Streaming Algorithms for Min-Ones d-SAT
In this work, we initiate the study of the Min-Ones d-SAT problem in the parameterized streaming model. An instance of the problem consists of a d-CNF formula F and an integer k, and the objective is to determine if F has a satisfying assignment which sets at most k variables to 1. In the parameterized streaming model, input is provided as a stream, just as in the usual streaming model. A key difference is that the bound on the read-write memory available to the algorithm is O(f(k) log n) (f: N -> N, a computable function) as opposed to the O(log n) bound of the usual streaming model. The other important difference is that the number of passes the algorithm makes over its input must be a (preferably small) function of k.
We design a (k + 1)-pass parameterized streaming algorithm that solves Min-Ones d-SAT (d >= 2) using space O((kd^(ck) + k^d)log n) (c > 0, a constant) and a (d + 1)^k-pass algorithm that uses space O(k log n). We also design a streaming kernelization for Min-Ones 2-SAT that makes (k + 2) passes and uses space O(k^6 log n) to produce a kernel with O(k^6) clauses.
To complement these positive results, we show that any k-pass algorithm for Min-Ones d-SAT (d >= 2) requires space Omega(max{n^(1/k) / 2^k, log(n / k)}) on instances (F, k). This is achieved via a reduction from the streaming problem POT Pointer Chasing (Guha and McGregor [ICALP 2008]), which might be of independent interest. Given this, our (k + 1)-pass parameterized streaming algorithm is the best possible, inasmuch as the number of passes is concerned.
In contrast to the results of Fafianie and Kratsch [MFCS 2014] and Chitnis et al. [SODA 2015], who independently showed that there are 1-pass parameterized streaming algorithms for Vertex Cover (a restriction of Min-Ones 2-SAT), we show using lower bounds from Communication Complexity that for any d >= 1, a 1-pass streaming algorithm for Min-Ones d-SAT requires space Omega(n). This excludes the possibility of a 1-pass parameterized streaming algorithm for the problem. Additionally, we show that any p-pass algorithm for the problem requires space Omega(n/p).
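The trade-off between passes and memory can be illustrated with a minimal sketch of a bounded search tree for Min-Ones d-SAT that spends one pass per search-tree node and remembers only the at most k variables currently set to 1. This is a standard branching argument written for illustration, not necessarily the paper's algorithm, and its pass count is not claimed to match the bounds stated above. The name min_ones_sat, the make_pass callable (each call starts a fresh pass over the clause stream), and the signed-integer literal encoding (v for a variable, -v for its negation) are assumptions.

```python
from typing import Callable, FrozenSet, Iterable, Sequence

Clause = Sequence[int]  # nonzero ints: v means x_v, -v means NOT x_v


def min_ones_sat(
    make_pass: Callable[[], Iterable[Clause]],
    k: int,
    ones: FrozenSet[int] = frozenset(),
) -> bool:
    """Decide whether some satisfying assignment sets at most k variables to 1.

    `ones` holds the variables currently set to 1; all others are 0.
    Each call makes exactly one pass over the clause stream.
    """
    def satisfied(clause: Clause) -> bool:
        # A positive literal is true iff its variable is in `ones`;
        # a negative literal is true iff its variable is not.
        return any((lit > 0) == (abs(lit) in ones) for lit in clause)

    # One pass: stop at the first clause violated by the current assignment.
    violated = next((c for c in make_pass() if not satisfied(c)), None)
    if violated is None:
        return True
    if len(ones) == k:
        return False  # budget exhausted
    # Variables are never reset to 0, so only a positive literal of the
    # violated clause can fix it: branch over at most d choices.
    return any(
        min_ones_sat(make_pass, k, ones | {lit})
        for lit in violated
        if lit > 0
    )
```

For example, Vertex Cover on a triangle becomes the clauses (1, 2), (2, 3), (1, 3); calling min_ones_sat(lambda: iter(clauses), 2) explores a search tree of depth at most 2 and makes one pass per node.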
Towards a Theory of Parameterized Streaming Algorithms
Parameterized complexity attempts to give a more fine-grained analysis of the complexity of problems: instead of measuring the running time as a function of only the input size, we analyze the running time with respect to additional parameters. This approach has proven to be highly successful in delineating our understanding of NP-hard problems. Given this success with the TIME resource, it seems but natural to use this approach for dealing with the SPACE resource. First attempts in this direction have considered a few individual problems, with some success: Fafianie and Kratsch [MFCS'14] and Chitnis et al. [SODA'15] introduced the notions of streaming kernels and parameterized streaming algorithms respectively. For example, the latter shows how to refine the Omega(n^2) bit lower bound for finding a minimum Vertex Cover (VC) in the streaming setting by designing an algorithm for the parameterized k-VC problem which uses O(k^2 log n) bits.
In this paper, we initiate a systematic study of graph problems from the paradigm of parameterized streaming algorithms. We first define a natural hierarchy of space complexity classes of FPS, SubPS, SemiPS, SupPS and BrutePS, and then obtain tight classifications for several well-studied graph problems such as Longest Path, Feedback Vertex Set, Dominating Set, Girth, Treewidth, etc. into this hierarchy (see Figure 1 and Table 1). On the algorithmic side, our parameterized streaming algorithms use techniques from the FPT world such as bidimensionality, iterative compression and bounded-depth search trees. On the hardness side, we obtain lower bounds for the parameterized streaming complexity of various problems via novel reductions from problems in communication complexity. We also show a general (unconditional) lower bound for space complexity of parameterized streaming algorithms for a large class of problems inspired by the recently developed frameworks for showing (conditional) kernelization lower bounds.
Parameterized algorithms and streaming algorithms are approaches to cope with TIME and SPACE intractability respectively. It is our hope that this work on parameterized streaming algorithms leads to a two-way flow of ideas between these two previously separated areas of theoretical computer science.
FPT-space Graph Kernelizations
Let n be the size of a parametrized problem and k the parameter. We
present a full kernel for Path Contraction and Cluster Editing/Deletion as well
as a kernel for Feedback Vertex Set whose sizes are all polynomial in k, that
are computable in polynomial time, and use O(poly(k) log n) bits. By
first executing the new kernelizations and subsequently the best known
polynomial-time kernelizations for the problem under consideration, we obtain
the best known kernels in polynomial time with O(poly(k) log n) bits.
A Taxonomy of Big Data for Optimal Predictive Machine Learning and Data Mining
Big data comes in various ways, types, shapes, forms and sizes. Indeed,
almost all areas of science, technology, medicine, public health, economics,
business, linguistics and social science are bombarded by ever increasing flows
of data begging to be analyzed efficiently and effectively. In this paper, we
propose a rough idea of a possible taxonomy of big data, along with some of the
most commonly used tools for handling each particular category of bigness. The
dimensionality p of the input space and the sample size n are usually the main
ingredients in the characterization of data bigness. The specific statistical
machine learning technique used to handle a particular big data set will depend
on which category it falls in within the bigness taxonomy. Large p small n data
sets for instance require a different set of tools from the large n small p
variety. Among other tools, we discuss Preprocessing, Standardization,
Imputation, Projection, Regularization, Penalization, Compression, Reduction,
Selection, Kernelization, Hybridization, Parallelization, Aggregation,
Randomization, Replication, Sequentialization. Indeed, it is important to
emphasize right away that the so-called no free lunch theorem applies here, in
the sense that there is no universally superior method that outperforms all
other methods on all categories of bigness. It is also important to stress the
fact that simplicity in the sense of Ockham's razor non plurality principle of
parsimony tends to reign supreme when it comes to massive data. We conclude
with a comparison of the predictive performance of some of the most commonly
used methods on a few data sets.
Comment: 18 pages, 2 figures, 3 tables
Dynamic Parameterized Problems and Algorithms
Fixed-parameter algorithms and kernelization are two powerful methods to solve NP-hard problems. Yet, so far those algorithms have been largely restricted to static inputs. In this paper we provide fixed-parameter algorithms and kernelizations for fundamental NP-hard problems with dynamic inputs. We consider a variety of parameterized graph and hitting set problems which are known to have f(k)n^{1+o(1)} time algorithms on inputs of size n, and we consider the question of whether there is a data structure that supports small updates (such as edge/vertex/set/element insertions and deletions) with an update time of g(k)n^{o(1)}; such an update time would be essentially optimal. Update and query times independent of n are particularly desirable. Among many other results, we show that Feedback Vertex Set and k-Path admit dynamic algorithms with f(k) log^{O(1)} n update and query times for some function f depending on the solution size k only.
We complement our positive results by several conditional and unconditional lower bounds. For example, we show that unlike their undirected counterparts, Directed Feedback Vertex Set and Directed k-Path do not admit dynamic algorithms with n^{o(1)} update and query times even for constant solution sizes k <= 3, assuming popular hardness hypotheses. We also show that unconditionally, in the cell probe model, Directed Feedback Vertex Set cannot be solved with update time that is purely a function of k.