Counting Hypergraphs in Data Streams
We present the first streaming algorithm for counting an arbitrary hypergraph
of constant size in a massive hypergraph. Our algorithm can handle both
edge-insertions and edge-deletions, and is applicable for the distributed
setting. Moreover, our approach provides the first family of graph polynomials
for the hypergraph counting problem. Because of the close relationship between
hypergraphs and set systems, our approach may have applications in studying
similar problems.
The Sketching Complexity of Graph and Hypergraph Counting
Subgraph counting is a fundamental primitive in graph processing, with
applications in social network analysis (e.g., estimating the clustering
coefficient of a graph), database processing and other areas. The space
complexity of subgraph counting has been studied extensively in the literature,
but many natural settings are still not well understood. In this paper we
revisit the subgraph (and hypergraph) counting problem in the sketching model,
where the algorithm's state as it processes a stream of updates to the graph is
a linear function of the stream. This model has recently received a lot of
attention in the literature, and has become a standard model for solving
dynamic graph streaming problems.
In this paper we give a tight bound on the sketching complexity of counting
the number of occurrences of a small subgraph in a bounded degree graph
presented as a stream of edge updates. Specifically, we show that the space
complexity of the problem is governed by the fractional vertex cover number of
the small subgraph being counted. Our subgraph counting algorithm implements a
natural vertex sampling approach, with sampling probabilities governed by the
vertex cover of that subgraph. Our main technical contribution lies in a new set
of Fourier analytic
tools that we develop to analyze multiplayer communication protocols in the
simultaneous communication model, allowing us to prove a tight lower bound. We
believe that our techniques are likely to find applications in other settings.
Besides giving tight bounds for all such subgraphs, both our algorithm and lower
bounds extend to the hypergraph setting, albeit with some loss in space
complexity.
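The vertex sampling idea mentioned in the abstract above can be illustrated with a minimal sketch for the simplest pattern, a triangle. This is not the paper's algorithm: it assumes a single uniform sampling probability `p` (rather than probabilities derived from a vertex cover), an insertion-only stream, and explicit storage of the surviving edges instead of a linear sketch. The function names `count_triangles` and `estimate_triangles` are hypothetical.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Exactly count triangles in an undirected simple graph given as an edge list."""
    adj = {}
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        unique_edges.add(frozenset((u, v)))
    # Each triangle is seen once per edge, as a common neighbour of its endpoints.
    total = 0
    for edge in unique_edges:
        u, v = tuple(edge)
        total += len(adj[u] & adj[v])
    return total // 3


def estimate_triangles(edge_stream, p, seed=0):
    """Estimate the triangle count by keeping each vertex independently with
    probability p (an assumed uniform rate) and rescaling by 1 / p**3."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    keep = {}

    def sampled(v):
        # Decide once per vertex, the first time it appears in the stream.
        if v not in keep:
            keep[v] = rng.random() < p
        return keep[v]

    # Only edges with both endpoints sampled are stored.
    kept = [(u, v) for u, v in edge_stream if sampled(u) and sampled(v)]
    # A triangle survives with probability p**3, so rescale the surviving count.
    return count_triangles(kept) / p ** 3
```

A triangle survives only if all three of its vertices are sampled, which is why the surviving count is divided by `p ** 3`; with `p = 1.0` the estimate coincides with the exact count.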
Sketching Cuts in Graphs and Hypergraphs
Sketching and streaming algorithms are in the forefront of current research
directions for cut problems in graphs. In the streaming model, we show that
-approximation for Max-Cut must use space;
moreover, beating -approximation requires polynomial space. For the
sketching model, we show that -uniform hypergraphs admit a
-cut-sparsifier (i.e., a weighted subhypergraph that
approximately preserves all the cuts) with
edges. We also make first steps towards sketching general CSPs (Constraint
Satisfaction Problems).
Counting and Sampling Small Structures in Graph and Hypergraph Data Streams
In this thesis, we explore the problem of approximating the number of elementary substructures called simplices in large k-uniform hypergraphs. The hypergraphs are assumed to be too large to be stored in memory, so we adopt a data stream model, where the hypergraph is defined by a sequence of hyperedges.
First, we propose an algorithm that (ε, δ)-estimates the number of simplices using O(m^{1+1/k} / T) bits of space. In addition, we prove that no constant-pass streaming algorithm can (ε, δ)-approximate the number of simplices using fewer than O(m^{1+1/k} / T) bits of space. Thus we resolve the space complexity of the simplex counting problem by providing an algorithm that matches the lower bound.
Second, we examine the triangle counting question, that is, simplex counting in a hypergraph where k = 2. We develop and analyze an almost optimal O(n + m^{3/2} / T) triangle-counting algorithm based on ideas introduced in [KMPT12]. The proposed algorithm is subsequently used to establish a method for uniformly sampling triangles in a graph stream using O(m^{3/2} / T) bits of space, which beats the state-of-the-art O(mn / T) algorithm given by [PTTW13].
Counting Simplices in Hypergraph Streams
We consider the problem of space-efficiently estimating the number of
simplices in a hypergraph stream. This is the most natural hypergraph
generalization of the highly-studied problem of estimating the number of
triangles in a graph stream. Our input is a -uniform hypergraph with
vertices and hyperedges. A -simplex in is a subhypergraph on
vertices such that all possible hyperedges among exist in .
The goal is to process a stream of hyperedges of and compute a good
estimate of , the number of -simplices in .
We design a suite of algorithms for this problem. Under a promise that
, our algorithms use at most four passes and together imply a
space bound of for each fixed , in order to
guarantee an estimate within with probability at least
. We also give a simpler -pass algorithm that achieves
space, where (respectively, ) denotes
the maximum number of -simplices that share a hyperedge (respectively, a
vertex). We complement these algorithmic results with space lower bounds of the
form , , and
for multi-pass algorithms and
for -pass algorithms, which show that some of the dependencies on parameters
in our upper bounds are nearly tight. Our techniques extend and generalize
several different ideas previously developed for triangle counting in graphs,
using appropriate innovations to handle the more complicated combinatorics of
hypergraphs.
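To make the object being counted concrete, here is a minimal brute-force sketch. It assumes the standard reading of the definition: in a k-uniform hypergraph, a simplex is a set of k + 1 vertices all of whose k-element subsets are hyperedges, so for k = 2 it is an ordinary triangle. It stores the whole hypergraph and enumerates every candidate vertex set, so it is an exact reference counter for small inputs, not one of the space-efficient streaming algorithms described above. The name `count_simplices` is hypothetical.

```python
from itertools import combinations


def count_simplices(hyperedges, k):
    """Count vertex sets of size k + 1 whose k-element subsets are all hyperedges.

    Brute force over all candidate vertex sets; intended only as a reference
    for small k-uniform hypergraphs.
    """
    edges = {frozenset(e) for e in hyperedges}
    if any(len(e) != k for e in edges):
        raise ValueError("every hyperedge must contain exactly k distinct vertices")
    vertices = list(set().union(*edges)) if edges else []
    count = 0
    for candidate in combinations(vertices, k + 1):
        # The candidate is a simplex if each of its k + 1 faces is present.
        if all(frozenset(face) in edges for face in combinations(candidate, k)):
            count += 1
    return count
```

For example, with k = 3 the four triples on the vertex set {0, 1, 2, 3} form exactly one simplex, and with k = 2 the function reduces to triangle counting.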
Hypergraph Motifs and Their Extensions Beyond Binary
Hypergraphs naturally represent group interactions, which are omnipresent in
many domains: collaborations of researchers, co-purchases of items, and joint
interactions of proteins, to name a few. In this work, we propose tools for
answering the following questions: (Q1) what are the structural design
principles of real-world hypergraphs? (Q2) how can we compare local structures
of hypergraphs of different sizes? (Q3) how can we identify the domains that
hypergraphs come from? We first define hypergraph motifs (h-motifs), which describe
the overlapping patterns of three connected hyperedges. Then, we define the
significance of each h-motif in a hypergraph as its occurrences relative to
those in properly randomized hypergraphs. Lastly, we define the characteristic
profile (CP) as the vector of the normalized significance of every h-motif.
Regarding Q1, we find that h-motifs' occurrences in 11 real-world hypergraphs
from 5 domains are clearly distinguished from those of randomized hypergraphs.
Then, we demonstrate that CPs capture local structural patterns unique to each
domain, and thus comparing CPs of hypergraphs addresses Q2 and Q3. The concept
of CP is extended to represent the connectivity pattern of each node or
hyperedge as a vector, which proves useful in node classification and hyperedge
prediction. Our algorithmic contribution is to propose MoCHy, a family of
parallel algorithms for counting h-motifs' occurrences in a hypergraph. We
theoretically analyze their speed and accuracy and show empirically that the
advanced approximate version MoCHy-A+ is more accurate and faster than the
basic approximate and exact versions, respectively. Furthermore, we explore
ternary hypergraph motifs that extend h-motifs by taking into account not only
the presence but also the cardinality of intersections among hyperedges. This
extension proves beneficial for all previously mentioned applications.
Comment: Extended version of VLDB 2020 paper arXiv:2003.0185