Linear Time Subgraph Counting, Graph Degeneracy, and the Chasm at Size Six
We consider the problem of counting all k-vertex subgraphs in an input graph, for any constant k. This problem (denoted SUB-CNT_k) has been studied extensively in both theory and practice. In a classic result, Chiba and Nishizeki (SICOMP 85) gave linear-time algorithms for clique and 4-cycle counting in bounded-degeneracy graphs. This is a rich class of sparse graphs that contains, for example, all minor-free families and preferential attachment graphs. The techniques from this result have inspired a number of recent practical algorithms for SUB-CNT_k. Towards a better understanding of the limits of these techniques, we ask: for what values of k can SUB-CNT_k be solved in linear time?
We discover a chasm at k = 6. Specifically, we prove that for k < 6, SUB-CNT_k can be solved in linear time. Assuming a standard conjecture in fine-grained complexity, we prove that for all k ≥ 6, SUB-CNT_k cannot be solved even in near-linear time.
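To make the degeneracy idea concrete, here is a minimal Python sketch for the simplest case, triangles (k = 3), in the spirit of Chiba and Nishizeki's approach: order the vertices by repeatedly removing a minimum-degree vertex, orient each edge toward the later endpoint so that every out-degree is at most the degeneracy κ, and count triangles by intersecting out-neighborhoods. The function names are hypothetical and this is not the paper's SUB-CNT_k algorithm; with Python sets the counting step does O(m·κ) intersection work after an O(m log n) heap-based ordering.

```python
import heapq
from collections import defaultdict


def degeneracy_order(adj):
    """Smallest-last order: repeatedly remove a vertex of minimum remaining degree."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    heap = [(d, v) for v, d in deg.items()]
    heapq.heapify(heap)
    removed, order = set(), []
    while heap:
        d, v = heapq.heappop(heap)
        if v in removed or d != deg[v]:
            continue  # stale entry left over from an earlier, larger degree
        removed.add(v)
        order.append(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
                heapq.heappush(heap, (deg[u], u))
    return order


def count_triangles(edges):
    """Count triangles after orienting every edge forward in a degeneracy order."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; duplicate edges collapse in the sets
            adj[u].add(v)
            adj[v].add(u)
    rank = {v: i for i, v in enumerate(degeneracy_order(adj))}
    # Out-neighbors are the later neighbors; each out-degree is at most the degeneracy.
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}
    # A triangle a < b < c (by rank) is found exactly once: from the arc a -> b, via c.
    return sum(len(out[v] & out[u]) for v in adj for u in out[v])
```

For example, count_triangles on the six edges of K4 returns 4, and on a 5-cycle it returns 0.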
Streaming Verification of Graph Computations via Graph Structure
We give new algorithms in the annotated data streaming setting (also known as verifiable data stream computation) for certain graph problems. This setting is meant to model outsourced computation, where a space-bounded verifier limited to sequential data access seeks to overcome its computational limitations by engaging a powerful prover, without needing to trust the prover. As is well established, several problems that admit no sublinear-space algorithms under traditional streaming do allow protocols using a sublinear amount of prover/verifier communication and sublinear-space verification. We give algorithms for many well-studied graph problems, including triangle counting, its generalization to subgraph counting, maximum matching, problems about the existence (or not) of short paths, finding the shortest path between two vertices, and testing for an independent set. While some of these problems have been studied before, our results achieve new tradeoffs between space and communication costs that were hitherto unknown. In particular, two of our results disprove explicit conjectures of Thaler (ICALP, 2016) by giving triangle counting and maximum matching algorithms for n-vertex graphs, using o(n) space and o(n^2) communication.
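As an illustration of the setting only, the sketch below shows the polynomial fingerprint that many annotated-stream protocols build on: the verifier evaluates prod (r - x) mod P over a multiset at a secret random point r, which needs O(1) words of space, so it can check that a prover's annotation is a permutation of the stream without storing either one. Here a hypothetical prover sends the sorted list of edge endpoints and the verifier reads off the maximum degree. This toy protocol is not one of the paper's: its annotation is linear in m, whereas the results above achieve sublinear communication, and P = 2^61 - 1 is an arbitrary choice of prime.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; all fingerprint arithmetic is in the field Z_P


def verify_max_degree(edge_stream, prover_claim, rng=random):
    """Toy verifier: one pass over the edges, one pass over the prover's annotation.

    edge_stream: iterable of (u, v) edges with integer endpoints in [0, P).
    prover_claim: the prover's list of all edge endpoints, claimed to be sorted.
    Returns the maximum degree if the annotation is accepted, else None.
    """
    r = rng.randrange(P)  # secret evaluation point, unknown to the prover
    # Pass 1: fingerprint the multiset of endpoints as prod (r - x) mod P.
    stream_fp = 1
    for u, v in edge_stream:
        stream_fp = stream_fp * ((r - u) % P) % P
        stream_fp = stream_fp * ((r - v) % P) % P
    # Pass 2: check sortedness, track the longest run, fingerprint the annotation.
    claim_fp, prev, run, best = 1, None, 0, 0
    for x in prover_claim:
        if prev is not None and x < prev:
            return None  # annotation is not sorted: reject
        run = run + 1 if x == prev else 1
        best = max(best, run)
        prev = x
        claim_fp = claim_fp * ((r - x) % P) % P
    # Distinct multisets give distinct polynomials in r, which agree at a random
    # point with probability at most (their degree) / P.
    return best if claim_fp == stream_fp else None
```

With an honest annotation the verifier recovers the true maximum degree; an annotation that alters any endpoint is rejected with overwhelming probability.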
When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks
We introduce a framework for the modeling of sequential data capturing
pathways of varying lengths observed in a network. Such data are important,
e.g., when studying click streams in information networks, travel patterns in
transportation systems, information cascades in social networks, biological
pathways or time-stamped social interactions. While it is common to apply graph
analytics and network analysis to such data, recent works have shown that
temporal correlations can invalidate the results of such methods. This raises a
fundamental question: when is a network abstraction of sequential data
justified? Addressing this open question, we propose a framework which combines
Markov chains of multiple, higher orders into a multi-layer graphical model
that captures temporal correlations in pathways at multiple length scales
simultaneously. We develop a model selection technique to infer the optimal
number of layers of such a model and show that it outperforms previously used
Markov order detection techniques. An application to eight real-world data sets
on pathways and temporal networks shows that it allows us to infer graphical
models which capture both topological and temporal characteristics of such
data. Our work highlights fallacies of network abstractions and provides a
principled answer to the open question of when they are justified. Generalizing
network representations to multi-order graphical models, it opens perspectives
for new data mining and knowledge discovery algorithms.
Comment: 10 pages, 4 figures, 1 table, companion Python package pathpy
available on GitHub
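A minimal Python sketch of the model-selection idea, assuming a simplified multi-order model in which position i of a path is predicted by a Markov chain of order min(i, K), so that path prefixes are handled by the lower-order layers. It picks the number of layers K with the Akaike information criterion, a stand-in chosen here to avoid a chi-square dependency (the paper's own selection technique may differ), and it counts only observed transitions as free parameters. Both are simplifying assumptions, and none of this is the pathpy API.

```python
import math
from collections import Counter, defaultdict


def multi_order_fit(paths, K):
    """Maximum-likelihood fit of a multi-order model with layers 0..K.

    Position i of a path is predicted from its previous min(i, K) nodes,
    so the first nodes of each path use the lower-order layers.
    Returns (log-likelihood, number of free parameters).
    """
    counts = defaultdict(Counter)  # (layer, context) -> Counter of next nodes
    for path in paths:
        for i, node in enumerate(path):
            k = min(i, K)
            counts[(k, tuple(path[i - k:i]))][node] += 1
    loglik, params = 0.0, 0
    for successors in counts.values():
        total = sum(successors.values())
        loglik += sum(c * math.log(c / total) for c in successors.values())
        params += len(successors) - 1  # free probabilities for this context
    return loglik, params


def select_order(paths, max_order=3):
    """Pick the number of layers K with the lowest AIC = 2 * params - 2 * loglik."""
    best_k, best_aic = 0, math.inf
    for K in range(max_order + 1):
        loglik, params = multi_order_fit(paths, K)
        aic = 2 * params - 2 * loglik
        if aic < best_aic:  # ties keep the smaller, simpler model
            best_k, best_aic = K, aic
    return best_k
```

On 50 copies of the path a, c, d and 50 of b, c, e, the step after c depends on where the path came from, and select_order returns 2; if both groups end in d instead, a first-order network abstraction suffices and it returns 1.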
Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution
The degree distribution is one of the most fundamental graph properties of
interest for real-world graphs. It has been widely observed in numerous domains
that graphs typically have a tailed or scale-free degree distribution. While
the average degree is usually quite small, the variance is quite high and there
are vertices with degrees at all scales. We focus on the problem of
approximating the degree distribution of a large streaming graph, with small
storage. We design an algorithm headtail, whose main novelty is a new estimator
of infrequent degrees using truncated geometric random variables. We give a
mathematical analysis of headtail and show that it has excellent behavior in
practice. We can process streams with millions of edges with storage less than
1% and get extremely accurate approximations for all scales in the degree
distribution.
We also introduce a new notion of Relative Hausdorff distance between tailed
histograms. Existing notions of distances between distributions are not
suitable, since they ignore infrequent degrees in the tail. The Relative
Hausdorff distance measures deviations at all scales, and is a more suitable
distance for comparing degree distributions. By tracking this new measure, we
are able to give strong empirical evidence of the convergence of headtail.
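The claim that standard distances ignore the tail is easy to check numerically. The short Python example below uses two hypothetical degree histograms over 10,000 vertices that differ only in where their single highest-degree vertex sits, and shows that the total variation distance between them is about 0.0001 even though their maximum degrees differ by a factor of 100. The Relative Hausdorff distance itself is not implemented here, since the abstract does not give its definition.

```python
def total_variation(hist_a, hist_b):
    """TV distance between two degree histograms (degree -> number of vertices)."""
    na, nb = sum(hist_a.values()), sum(hist_b.values())
    degrees = set(hist_a) | set(hist_b)
    return 0.5 * sum(abs(hist_a.get(d, 0) / na - hist_b.get(d, 0) / nb) for d in degrees)


# Hypothetical histograms: in G one vertex has degree 1000,
# in H that same vertex has degree 10 instead; everything else is identical.
G = {1: 9000, 2: 900, 10: 99, 1000: 1}
H = {1: 9000, 2: 900, 10: 100}

print(f"TV distance: {total_variation(G, H):.6f}")
print("max degree:", max(G), "vs", max(H))
```

The head of the distribution dominates the sum, so a measure of this kind barely registers the change in the tail.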
Linear-Time Superbubble Identification Algorithm for Genome Assembly
DNA sequencing is the process of determining the exact order of the
nucleotide bases of an individual's genome in order to catalogue sequence
variation and understand its biological implications. Whole-genome sequencing
techniques produce masses of data in the form of short sequences known as
reads. Assembling these reads into a whole genome constitutes a major
algorithmic challenge. Most assembly algorithms utilize de Bruijn graphs
constructed from reads for this purpose. A critical step of these algorithms is
to detect typical motif structures in the graph caused by sequencing errors and
genome repeats, and filter them out; one such complex subgraph class is a
so-called superbubble. In this paper, we propose an O(n+m)-time algorithm to
detect all superbubbles in a directed acyclic graph with n nodes and m
(directed) edges, improving the best-known O(m log m)-time algorithm by Sung et
al.
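For contrast with the linear-time result, a straightforward per-entrance search can be sketched in Python, assuming the commonly used definition: (s, t) is a superbubble when t is reachable from s, the vertices reachable from s without passing through t are exactly those that reach t without passing through s, this set induces an acyclic subgraph, and no other vertex in it forms such a pair with s. The search grows the region from s, advancing into a vertex only once all of its parents have been visited, rejects dead ends and edges back to s, and stops when exactly one frontier vertex remains. Function names are hypothetical; trying every vertex as an entrance costs O(nm) in the worst case, so this is not the paper's O(n+m) algorithm.

```python
from collections import defaultdict


def find_superbubble_exit(succ, pred, s):
    """Return t such that (s, t) is a superbubble, or None if s is not an entrance."""
    stack = [s]
    visited, seen = set(), set()  # seen: discovered but not yet visited vertices
    while stack:
        v = stack.pop()
        visited.add(v)
        seen.discard(v)
        children = succ.get(v, [])
        if not children:
            return None  # a tip: the region has a dead end
        for u in children:
            if u == s:
                return None  # a cycle leads back to the entrance
            seen.add(u)
            if all(p in visited for p in pred.get(u, [])):
                stack.append(u)  # every parent of u lies inside the region
        if len(stack) == 1 and seen == {stack[0]}:
            t = stack.pop()  # the frontier has collapsed to a single exit
            return t if s not in succ.get(t, []) else None
    return None


def all_superbubbles(edges):
    """Try every vertex as an entrance; O(nm) overall in the worst case."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    found = []
    for s in sorted(set(succ) | set(pred)):
        t = find_superbubble_exit(succ, pred, s)
        if t is not None:
            found.append((s, t))
    return found
```

On the DAG with edges 1->2, 1->3, 2->4, 3->4, 4->5, all_superbubbles returns [(1, 4), (4, 5)]; the second pair is a single edge whose tail has out-degree 1 and whose head has in-degree 1, which this check reports as a trivial superbubble.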
Massive Graph Analysis in the Data Stream Model
Graphs have become an abstraction of choice in modeling highly structured data. The need to compute graph-theoretic properties of datasets arises in many applications that involve entities and pairwise relations between them. However, in practice the datasets in question can be too large to be stored in main memory, distributed across many machines, or changing over time. Moreover, in an increasing number of applications the algorithm has to make real-time decisions as the data arrives, which puts further limitations on the time and space that can realistically be used. These characteristics render classical algorithmic approaches obsolete and necessitate the development of new techniques. The streaming model of computation takes these challenges into account, providing a trade-off between the resources used by the algorithm and its accuracy. A graph stream is defined by a sequence of edge insertions (and sometimes deletions) into an initially empty graph. The objective is to compute a certain property of the graph at the end of the stream while minimizing the amount of space the algorithm uses. In this model, we explore fundamental graph-theoretic problems that also serve as important primitives in massive graph analysis. Our results can be divided into three main categories.
Finding large matchings and related problems. We describe two optimal algorithms for finding large matchings in dynamic (insert-delete) graph streams: an approximation of an arbitrary maximum matching, and an exact algorithm under the assumption that the matching is of a certain size. We also show how the techniques developed in these algorithms can be used to solve a variety of related problems, such as vertex cover and hitting set in hypergraphs. We then concentrate on estimating just the size of the matching and present a series of sublinear results for the class of low-arboricity graphs.
Counting the number of cycles. We fully resolve in which settings there exist algorithms that approximate the number of fixed-length cycles without storing the entire graph. For cycles of length five or greater, we show that no such algorithms exist. For triangles and four-cycles, we describe several counting results and a few lower bounds for the insert-only model, considering such parameters as the number of passes taken over the stream and its ordering.
Vertex ordering problems in directed graphs. We consider such fundamental problems as topologically sorting a directed acyclic graph (DAG), checking whether the input is in fact a DAG, and finding a minimum feedback arc set. It can be shown that when the input graph is arbitrary, these problems have high space complexity in the streaming model. Thus, we concentrate on designing algorithms for tournaments and a certain family of random graphs. Together, these results complement the much more mature body of work on algorithms for undirected graph streams.
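Tournaments illustrate why restricting the input helps. A tournament is acyclic exactly when its in-degrees are 0, 1, ..., n-1, and in that case sorting the vertices by in-degree gives the unique topological order. The Python sketch below uses this fact to test acyclicity and topologically sort in one pass while keeping only n counters. It assumes the vertex count n is known in advance and that the stream is a valid tournament (exactly one arc per pair); the function name is hypothetical, and this is an illustration rather than one of the algorithms described above.

```python
def stream_tournament_order(n, arc_stream):
    """One pass over the arcs (u, v), meaning u -> v, of a tournament on 0..n-1.

    Returns the topological order if the tournament is acyclic, else None.
    """
    indeg = [0] * n  # the only state kept during the pass: n counters
    for _, v in arc_stream:
        indeg[v] += 1
    # Acyclic (transitive) tournament <=> in-degrees are exactly 0, 1, ..., n-1.
    if sorted(indeg) != list(range(n)):
        return None
    # The vertex with in-degree i sits at position i of the topological order.
    return sorted(range(n), key=lambda v: indeg[v])
```

For the order 2, 0, 3, 1 streamed as its six forward arcs, the function returns [2, 0, 3, 1]; for the 3-cycle (0, 1), (1, 2), (2, 0) every in-degree is 1 and it returns None.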