3,382 research outputs found

Linear Time Subgraph Counting, Graph Degeneracy, and the Chasm at Size Six

Author: Bera Suman K.
Pashanasangi Noujan
Seshadhri C.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 11th Innovations in Theoretical Computer Science Conference (ITCS 2020)
Publication date: 27/11/2019

We consider the problem of counting all k-vertex subgraphs in an input graph, for any constant k. This problem (denoted SUB-CNT_k) has been studied extensively in both theory and practice. In a classic result, Chiba and Nishizeki (SICOMP 85) gave linear time algorithms for clique and 4-cycle counting for bounded degeneracy graphs. This is a rich class of sparse graphs that contains, for example, all minor-free families and preferential attachment graphs. The techniques from this result have inspired a number of recent practical algorithms for SUB-CNT_k. Towards a better understanding of the limits of these techniques, we ask: for what values of k can SUB-CNT_k be solved in linear time? We discover a chasm at k=6. Specifically, we prove that for k < 6, SUB-CNT_k can be solved in linear time. Assuming a standard conjecture in fine-grained complexity, we prove that for all k ≥ 6, SUB-CNT_k cannot be solved even in near-linear time.
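The abstract above builds on degeneracy-based counting. As a rough illustration only (not the paper's algorithm), the sketch below computes a degeneracy ordering by repeatedly removing a minimum-degree vertex, orients every edge along that ordering, and counts triangles (3-cliques) by intersecting out-neighbourhoods. The function names `build_adjacency`, `degeneracy_order` and `count_triangles` are hypothetical, and the bucket scan is kept simple rather than tuned to strict linear time.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from an iterable of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degeneracy_order(adj):
    """Return (removal order, degeneracy) by peeling minimum-degree vertices."""
    degree = {v: len(ns) for v, ns in adj.items()}
    max_deg = max(degree.values(), default=0)
    buckets = [set() for _ in range(max_deg + 1)]
    for v, d in degree.items():
        buckets[d].add(v)
    removed, order, degeneracy = set(), [], 0
    for _ in range(len(adj)):
        # Simplification: scan for the first non-empty bucket.
        d = next(i for i, bucket in enumerate(buckets) if bucket)
        degeneracy = max(degeneracy, d)
        v = buckets[d].pop()
        removed.add(v)
        order.append(v)
        for u in adj[v]:
            if u not in removed:
                buckets[degree[u]].remove(u)
                degree[u] -= 1
                buckets[degree[u]].add(u)
    return order, degeneracy


def count_triangles(adj):
    """Count triangles using the acyclic orientation induced by the ordering."""
    order, _ = degeneracy_order(adj)
    rank = {v: i for i, v in enumerate(order)}
    # Each out-neighbourhood has size at most the degeneracy.
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}
    return sum(len(out[v] & out[u]) for v in adj for u in out[v])


k4 = build_adjacency([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
print(degeneracy_order(k4)[1], count_triangles(k4))
```

Each triangle is counted exactly once, at its earliest-removed vertex, because the other two vertices both lie in that vertex's out-neighbourhood.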

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Streaming Verification of Graph Computations via Graph Structure

Author: Chakrabarti Amit
Ghosh Prantar
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

We give new algorithms in the annotated data streaming setting - also known as verifiable data stream computation - for certain graph problems. This setting is meant to model outsourced computation, where a space-bounded verifier limited to sequential data access seeks to overcome its computational limitations by engaging a powerful prover, without needing to trust the prover. As is well established, several problems that admit no sublinear-space algorithms under traditional streaming do allow protocols using a sublinear amount of prover/verifier communication and sublinear-space verification. We give algorithms for many well-studied graph problems including triangle counting, its generalization to subgraph counting, maximum matching, problems about the existence (or not) of short paths, finding the shortest path between two vertices, and testing for an independent set. While some of these problems have been studied before, our results achieve new tradeoffs between space and communication costs that were hitherto unknown. In particular, two of our results disprove explicit conjectures of Thaler (ICALP, 2016) by giving triangle counting and maximum matching algorithms for n-vertex graphs, using o(n) space and o(n^2) communication.

Dagstuhl Research Online Publication Server

When is a Network a Network? Multi-Order Graphical Model Selection in Pathways and Temporal Networks

Author: Costa Alceu Ferraz
de Bruijn N. G.
Zweig Katharina A
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/03/2017

We introduce a framework for the modeling of sequential data capturing pathways of varying lengths observed in a network. Such data are important, e.g., when studying click streams in information networks, travel patterns in transportation systems, information cascades in social networks, biological pathways or time-stamped social interactions. While it is common to apply graph analytics and network analysis to such data, recent works have shown that temporal correlations can invalidate the results of such methods. This raises a fundamental question: when is a network abstraction of sequential data justified? Addressing this open question, we propose a framework which combines Markov chains of multiple, higher orders into a multi-layer graphical model that captures temporal correlations in pathways at multiple length scales simultaneously. We develop a model selection technique to infer the optimal number of layers of such a model and show that it outperforms previously used Markov order detection techniques. An application to eight real-world data sets on pathways and temporal networks shows that it allows the inference of graphical models which capture both topological and temporal characteristics of such data. Our work highlights fallacies of network abstractions and provides a principled answer to the open question of when they are justified. Generalizing network representations to multi-order graphical models, it opens perspectives for new data mining and knowledge discovery algorithms. Comment: 10 pages, 4 figures, 1 table, companion Python package pathpy available on GitHub.
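To make the idea of higher-order Markov chains over pathways concrete, here is a minimal sketch that fits maximum-likelihood transition probabilities of a single fixed order from a list of observed paths. It is not the multi-order model or the model selection technique of the paper, and it does not use the pathpy API; the function name `fit_markov_chain` and the toy paths are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def fit_markov_chain(paths, order):
    """Estimate P(next node | last `order` nodes) from observed paths."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(len(path) - order):
            history = tuple(path[i:i + order])
            counts[history][path[i + order]] += 1
    return {
        history: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for history, nexts in counts.items()
    }


# Toy pathways: everything passes through "c", but where a path goes next
# depends on where it came from.
paths = [("a", "c", "d"), ("b", "c", "e")]
first_order = fit_markov_chain(paths, order=1)
second_order = fit_markov_chain(paths, order=2)
print(first_order[("c",)])        # {'d': 0.5, 'e': 0.5}
print(second_order[("a", "c")])   # {'d': 1.0}
```

In the toy data a first-order model (a plain network abstraction) suggests that a path from "a" through "c" reaches "e" half the time, whereas the second-order model shows that this never happens. This is the kind of temporal correlation the abstract refers to.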

arXiv.org e-Print Archive

Crossref

Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution

Author: McGregor Andrew
Seshadhri C.
Simpson Olivia
Publication date: 25/11/2015

The degree distribution is one of the most fundamental graph properties of interest for real-world graphs. It has been widely observed in numerous domains that graphs typically have a tailed or scale-free degree distribution. While the average degree is usually quite small, the variance is quite high and there are vertices with degrees at all scales. We focus on the problem of approximating the degree distribution of a large streaming graph, with small storage. We design an algorithm headtail, whose main novelty is a new estimator of infrequent degrees using truncated geometric random variables. We give a mathematical analysis of headtail and show that it has excellent behavior in practice. We can process streams with millions of edges with storage less than 1% and get extremely accurate approximations for all scales in the degree distribution. We also introduce a new notion of Relative Hausdorff distance between tailed histograms. Existing notions of distances between distributions are not suitable, since they ignore infrequent degrees in the tail. The Relative Hausdorff distance measures deviations at all scales, and is a more suitable distance for comparing degree distributions. By tracking this new measure, we are able to give strong empirical evidence of the convergence of headtail.

arXiv.org e-Print Archive

Crossref

Linear-Time Superbubble Identification Algorithm for Genome Assembly

Author: Brankovic Ljiljana
Iliopoulos Costas S.
Kundu Ritu
Mohamed Manal
Pissis Solon P.
Vayani Fatima
Publication date: 17/09/2015

DNA sequencing is the process of determining the exact order of the nucleotide bases of an individual's genome in order to catalogue sequence variation and understand its biological implications. Whole-genome sequencing techniques produce masses of data in the form of short sequences known as reads. Assembling these reads into a whole genome constitutes a major algorithmic challenge. Most assembly algorithms utilize de Bruijn graphs constructed from reads for this purpose. A critical step of these algorithms is to detect typical motif structures in the graph caused by sequencing errors and genome repeats, and filter them out; one such complex subgraph class is a so-called superbubble. In this paper, we propose an O(n+m)-time algorithm to detect all superbubbles in a directed acyclic graph with n nodes and m (directed) edges, improving the best-known O(m log m)-time algorithm by Sung et al.
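For readers unfamiliar with the input to such algorithms, the following sketch builds a small de Bruijn graph from reads, using (k-1)-mers as nodes and one directed edge per observed k-mer. It only illustrates how a single-base difference between reads produces a bubble-like structure and does not implement superbubble detection; the function name `de_bruijn_graph` and the toy reads are assumptions.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])  # prefix node -> suffix node
    return dict(graph)


# Two reads that differ in one base (G vs. C at the third position).
graph = de_bruijn_graph(["ACGTT", "ACCTT"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Here the node `AC` branches to `CG` and `CC`, and both branches rejoin at `TT`, giving two parallel paths between the same entrance and exit.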

arXiv.org e-Print Archive

University of Newcastle's Digital Repository

Elsevier - Publisher Connector

King's Research Portal

Massive Graph Analysis in the Data Stream Model

Author: Vorotnikova Sofya
Publication venue: ScholarWorks@UMass Amherst
Publication date: 02/07/2019

Graphs have become an abstraction of choice in modeling highly-structured data. The need to compute graph-theoretic properties of datasets arises in many applications that involve entities and pairwise relations between them. However, in practice the datasets in question can be too large to be stored in main memory, distributed across many machines, or changing over time. Moreover, in an increasing number of applications the algorithm has to make real time decisions as the data arrives, which puts further limitations on the time and space that can realistically be used. These characteristics render classical algorithmic approaches obsolete and necessitate the development of new techniques. The streaming model of computation takes these challenges into account, providing a trade-off between the resources used by the algorithm and its accuracy. A graph stream is defined by a sequence of edge insertions (and sometimes deletions) into an initially empty graph. The objective is to compute a certain property of the graph at the end of the stream while minimizing the amount of space the algorithm uses. In this model, we explore fundamental graph-theoretic problems that also serve as important primitives in massive graph analysis. Our results can be divided into three main categories. (1) Finding large matchings and related problems: we describe two optimal algorithms for finding large matchings in dynamic (insert-delete) graph streams: an approximation of an arbitrary maximum matching and an exact algorithm under the assumption that the matching is of certain size. We also show how the techniques developed in these algorithms can be used to solve a variety of related problems such as vertex cover and hitting set in hypergraphs. We then concentrate on estimating just the size of the matching and present a series of sublinear results for the class of low arboricity graphs. (2) Counting the number of cycles: we fully resolve in which settings there exist algorithms approximating the number of fixed length cycles that do not store the entire graph. For cycles of length five or greater, we show that no such algorithms exist. For triangles and four-cycles, we describe several counting results and a few lower bounds for the insert-only model, considering such parameters as the number of passes taken over the stream and its ordering. (3) Vertex ordering problems in directed graphs: we consider such fundamental problems as topologically sorting a directed acyclic graph (DAG), checking whether the input is in fact a DAG, and finding a minimum feedback arc set. It can be shown that when the input graph is arbitrary, these problems have high space complexity in the streaming model. Thus, we concentrate on designing algorithms for tournaments and a certain family of random graphs. Together, these results complement the much more mature body of work on algorithms for undirected graph streams.
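As a point of reference for the streaming model described in this abstract, the sketch below shows the textbook one-pass greedy algorithm for maximal matching in an insert-only edge stream. It is not one of the thesis's algorithms (which target dynamic insert-delete streams). It stores only the set of matched vertices, and the matching it returns has at least half the size of a maximum matching. The function name `greedy_streaming_matching` is an assumption.

```python
def greedy_streaming_matching(edge_stream):
    """One pass over an insert-only stream; keep an edge if both endpoints are free."""
    matched = set()   # vertices already covered by the matching
    matching = []
    for u, v in edge_stream:
        if u != v and u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching


# The result depends on arrival order: the path 1-2-3-4 has a maximum
# matching of size 2, but a bad order leaves the greedy pass with size 1.
print(greedy_streaming_matching([(1, 2), (2, 3), (3, 4)]))  # [(1, 2), (3, 4)]
print(greedy_streaming_matching([(2, 3), (1, 2), (3, 4)]))  # [(2, 3)]
```

The second call illustrates the factor-2 gap. Handling edge deletions requires different techniques, since a greedy choice cannot be undone without remembering the discarded edges.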

ScholarWorks@UMass Amherst