Space-Query Tradeoffs in Range Subgraph Counting and Listing
This paper initiates the study of range subgraph counting and range subgraph listing, both motivated by the significant practical demand to perform graph analytics on subgraphs pertinent to only selected, as opposed to all, vertices. In the first problem, there is an undirected graph G where each vertex carries a real-valued attribute. Given an interval q and a pattern Q, a query counts the number of occurrences of Q in the subgraph of G induced by the vertices whose attributes fall in q. The second problem has the same setup except that a query needs to enumerate (rather than count) those occurrences with a small delay. In both problems, our goal is to understand the tradeoff between space usage and query cost, or more specifically: (i) given a target on query efficiency, how much pre-computed information about G must we store? (ii) Or conversely, given a budget on space usage, what is the best query time we can hope for? We establish a suite of upper- and lower-bound results on such tradeoffs for various query patterns.
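As a baseline at one extreme of this tradeoff (no precomputation, hence minimal space but slow queries), a range query can simply materialize the induced subgraph and count pattern occurrences directly. The sketch below is hypothetical, with the pattern Q fixed to a triangle; it only illustrates the query semantics, not the paper's data structures.

```python
from itertools import combinations

def count_triangles_in_range(adj, attr, lo, hi):
    """Count triangles in the subgraph induced by vertices whose
    attribute lies in the query interval [lo, hi]."""
    keep = {v for v in adj if lo <= attr[v] <= hi}
    count = 0
    for u, v, w in combinations(sorted(keep), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:
            count += 1
    return count

# 4-cycle with a chord: triangles (0,1,2) and (0,2,3)
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
attr = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
print(count_triangles_in_range(adj, attr, 1.0, 3.0))  # 1 (triangle 0-1-2)
```

The paper's question is precisely how much precomputed information lets one beat this naive per-query cost.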
Borel versions of the Local Lemma and LOCAL algorithms for graphs of finite asymptotic separation index
Asymptotic separation index is a parameter that measures how easily a Borel
graph can be approximated by its subgraphs with finite components. In contrast
to the more classical notion of hyperfiniteness, asymptotic separation index is
well-suited for combinatorial applications in the Borel setting. The main
result of this paper is a Borel version of the Lov\'asz Local Lemma -- a
powerful general-purpose tool in probabilistic combinatorics -- under a finite
asymptotic separation index assumption. As a consequence, we show that locally
checkable labeling problems that are solvable by efficient randomized
distributed algorithms admit Borel solutions on bounded degree Borel graphs
with finite asymptotic separation index. From this we derive a number of
corollaries, for example a Borel version of Brooks's theorem for graphs with
finite asymptotic separation index.
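For reference, the classical symmetric form of the Lov\'asz Local Lemma, whose Borel analogue is the paper's main subject, can be stated as:

```latex
% Symmetric Lovász Local Lemma (classical form):
% if events A_1, \dots, A_n each have probability at most p, each A_i is
% independent of all but at most d of the others, and e\,p\,(d+1) \le 1, then
\Pr\Bigl[\,\bigcap_{i=1}^{n} \overline{A_i}\,\Bigr] > 0 .
```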
Mining Butterflies in Streaming Graphs
This thesis introduces two main-memory systems, sGrapp and sGradd, for performing the fundamental analytic tasks of biclique counting and concept drift detection over a streaming graph. A data-driven heuristic is used to architect the systems. To this end, initially, the growth patterns of bipartite streaming graphs are mined and the emergence principles of streaming motifs are discovered. Next, the discovered principles are (a) explained by a graph generator called sGrow; and (b) utilized to establish the requirements for efficient, effective, explainable, and interpretable management and processing of streams. sGrow is used to benchmark stream analytics, particularly in the case of concept drift detection.
sGrow displays robust realization of streaming growth patterns independent of initial conditions, scale, temporal characteristics, and model configurations. Extensive evaluations confirm the simultaneous effectiveness and efficiency of sGrapp and sGradd. sGrapp achieves a mean absolute percentage error of at most 0.05/0.14 for the cumulative butterfly count in streaming graphs with uniform/non-uniform temporal distribution, and a processing throughput of 1.5 million data records per second. sGrapp's throughput is 160x that of baselines and its estimation error 0.02x theirs. sGradd demonstrates improving performance over time, achieves zero false detection rates when there is no drift and when a drift has already been detected, and detects sequential drifts within zero to a few seconds of their occurrence, regardless of drift intervals.
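For context, the butterfly (i.e., (2,2)-biclique) count that sGrapp approximates can be computed exactly on a static bipartite graph by co-neighbor counting. The following batch baseline is a hypothetical sketch, unrelated to sGrapp's streaming estimator:

```python
from collections import defaultdict
from itertools import combinations
from math import comb

def count_butterflies(edges):
    """Exact butterfly ((2,2)-biclique) count of a bipartite graph,
    computed by counting left vertices that share pairs of right vertices."""
    neighbors = defaultdict(set)
    for u, v in edges:          # u on the left side, v on the right side
        neighbors[u].add(v)
    pair_common = defaultdict(int)
    for u, nbrs in neighbors.items():
        for v, w in combinations(sorted(nbrs), 2):
            pair_common[(v, w)] += 1
    # c left vertices sharing a right pair contribute C(c, 2) butterflies
    return sum(comb(c, 2) for c in pair_common.values())

edges = [("a", 1), ("a", 2), ("b", 1), ("b", 2), ("c", 2)]
print(count_butterflies(edges))  # 1: the biclique {a, b} x {1, 2}
```

Streaming systems like sGrapp approximate this quantity because enumerating all wedges, as done here, is infeasible at high edge rates.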
Foundations of Node Representation Learning
Low-dimensional node representations, also called node embeddings, are a cornerstone in the modeling and analysis of complex networks. In recent years, advances in deep learning have spurred development of novel neural network-inspired methods for learning node representations which have largely surpassed classical 'spectral' embeddings in performance. Yet little work asks the central questions of this thesis: Why do these novel deep methods outperform their classical predecessors, and what are their limitations?
We pursue several paths to answering these questions. To further our understanding of deep embedding methods, we explore their relationship with spectral methods, which are better understood, and show that some popular deep methods are equivalent to spectral methods in a certain natural limit. We also introduce the problem of inverting node embeddings in order to probe what information they contain. Further, we propose a simple, non-deep method for node representation learning, and find it to often be competitive with modern deep graph networks in downstream performance.
To better understand the limitations of node embeddings, we prove upper and lower bounds on their capabilities. Most notably, we prove that node embeddings are capable of exact low-dimensional representation of networks with bounded maximum degree or arboricity, and we further show that a simple algorithm can find such exact embeddings for real-world networks. By contrast, we also prove inherent limits on the ability of random graph models, including those derived from node embeddings, to capture key structural properties of networks without simply memorizing a given graph.
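As a minimal illustration of the classical side of this comparison, a spectral node embedding can be computed directly from the adjacency spectrum. This is a generic sketch, not any specific method from the thesis:

```python
import numpy as np

def spectral_embedding(adj_matrix, dim):
    """Embed nodes using the top-`dim` eigenvectors of the adjacency
    matrix, scaled by the square roots of the absolute eigenvalues."""
    vals, vecs = np.linalg.eigh(adj_matrix)   # symmetric input -> eigh
    order = np.argsort(-np.abs(vals))[:dim]   # largest |eigenvalue| first
    return vecs[:, order] * np.sqrt(np.abs(vals[order]))

# Two triangles joined by a bridge edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
X = spectral_embedding(A, 2)
print(X.shape)  # (6, 2): one 2-dimensional vector per node
```

Deep methods replace this fixed linear-algebraic recipe with learned nonlinear encoders; the thesis asks when that replacement actually helps.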
Enumerating Subgraphs of Constant Sizes in External Memory
We present an indivisible I/O-efficient algorithm for subgraph enumeration, where the objective is to list all the subgraphs of a massive graph G := (V, E) that are isomorphic to a pattern graph Q having k = O(1) vertices. Our algorithm performs O(|E|^{k/2}/(M^{k/2-1} B) · log_{M/B}(|E|/B) + |E|^ρ/(M^{ρ-1} B)) I/Os with high probability, where ρ is the fractional edge covering number of Q (it always holds that ρ ≥ k/2, regardless of Q), M is the number of words in (internal) memory, and B is the number of words in a disk block. Our solution is optimal in the class of indivisible algorithms for all pattern graphs with ρ > k/2. When ρ = k/2, our algorithm is still optimal as long as M/B ≥ (|E|/B)^δ for any constant δ > 0.
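Setting the I/O model aside, the enumeration objective itself can be illustrated by a naive in-memory baseline. This is a hypothetical sketch, not the paper's algorithm: it lists ordered embeddings of a pattern Q by brute force.

```python
from itertools import permutations

def enumerate_occurrences(graph_adj, pattern_edges, k):
    """List all ordered embeddings of a k-vertex pattern into a graph.
    Each pattern edge (a, b) must map to an edge of the graph."""
    nodes = list(graph_adj)
    out = []
    for mapping in permutations(nodes, k):
        if all(mapping[a] in graph_adj[mapping[b]] for a, b in pattern_edges):
            out.append(mapping)
    return out

# Triangles in K4: each of the 4 triangles appears as 6 ordered embeddings.
k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
triangle = [(0, 1), (1, 2), (0, 2)]
print(len(enumerate_occurrences(k4, triangle, 3)) // 6)  # 4
```

The paper's contribution is performing this listing with provably few disk I/Os when G far exceeds memory, which this in-memory loop ignores entirely.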
Constructing disjoint Steiner trees in Sierpi\'{n}ski graphs
Let $G$ be a graph and $S \subseteq V(G)$ with $|S| \ge 2$. Then trees $T_1, T_2, \ldots, T_r$
in $G$ are \emph{internally disjoint Steiner trees}
connecting $S$ (or $S$-Steiner trees) if $E(T_i) \cap E(T_j) = \emptyset$ and $V(T_i) \cap V(T_j) = S$
for every pair of distinct integers $i, j$ with $1 \le i, j \le r$. Similarly, if we only have the condition $E(T_i) \cap E(T_j) = \emptyset$ but without the condition $V(T_i) \cap V(T_j) = S$, then they are
\emph{edge-disjoint Steiner trees}. The \emph{generalized $k$-connectivity},
denoted by $\kappa_k(G)$, of a graph $G$, is defined as
$\kappa_k(G) = \min\{\kappa_G(S) : S \subseteq V(G), |S| = k\}$,
where $\kappa_G(S)$ is the maximum number of internally disjoint $S$-Steiner
trees. The \emph{generalized local edge-connectivity} $\lambda_G(S)$ is the
maximum number of edge-disjoint Steiner trees connecting $S$ in $G$. The {\it
generalized $k$-edge-connectivity} $\lambda_k(G)$ of $G$ is defined as
$\lambda_k(G) = \min\{\lambda_G(S) : S \subseteq V(G), |S| = k\}$. These
measures are generalizations of the concepts of connectivity and
edge-connectivity, and they can be used as measures of the vulnerability of
networks. It is, in general, difficult to compute these generalized
connectivities. However, there are precise results for some special classes of
graphs. In this paper, we obtain the exact value of the generalized
$k$-connectivity and of the generalized $k$-edge-connectivity of the
Sierpi\'{n}ski graphs for certain ranges of $k$. As a direct consequence, these
graphs provide additional interesting examples in this setting. We also study
some network properties of Sierpi\'{n}ski graphs.
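As one example of such precise results, the generalized connectivity of the complete graph $K_n$ is known exactly; a well-known formula states that for $2 \le k \le n$,

```latex
\kappa_k(K_n) \;=\; n - \left\lceil \tfrac{k}{2} \right\rceil .
```

For $k = 2$ this recovers the ordinary connectivity $\kappa(K_n) = n - 1$.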
Efficient parameterized algorithms on structured graphs
In classical complexity theory, the worst-case running times of algorithms depend solely on the size of the input. In parameterized complexity, the goal is to refine the analysis of the running time of an algorithm by additionally considering a parameter that measures some kind of structure in the input. A parameterized algorithm then utilizes the structure described by the parameter and achieves a running time that is faster than the best general (unparameterized) algorithm for instances of low parameter value.
In the first part of this thesis, we carry forward in this direction and investigate the influence of several parameters on the running times of well-known tractable problems.
Several of the presented algorithms are adaptive algorithms, meaning that they match the running time of a best unparameterized algorithm for worst-case parameter values. Thus, an adaptive parameterized algorithm is asymptotically never worse than the best unparameterized algorithm, while it already outperforms the best general algorithm for slightly non-trivial parameter values.
As illustrated in the first part of this thesis, for many problems there exist efficient parameterized algorithms regarding multiple parameters, each describing a different kind of structure.
In the second part of this thesis, we explore how to combine such homogeneous structures into more general and heterogeneous structures.
Using algebraic expressions, we define new combined graph classes of heterogeneous structure in a clean and robust way, and we showcase this for the heterogeneous merge of the parameters tree-depth and modular-width by presenting parameterized algorithms on such heterogeneous graph classes whose running times match the homogeneous cases throughout.
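The thesis concerns polynomially solvable problems, but the flavor of a parameterized running time can be conveyed by the textbook branching algorithm for vertex cover, whose cost of roughly 2^k · |E| is fast whenever the structural parameter k is small. This is an illustrative sketch, not taken from the thesis:

```python
def vertex_cover(edges, k):
    """Decide whether a graph has a vertex cover of size <= k by
    branching on an endpoint of an uncovered edge: O(2^k * |E|) time."""
    if not edges:
        return True      # every edge is covered
    if k == 0:
        return False     # edges remain but no budget left
    u, v = edges[0]
    # Any cover must contain u or v; try both choices recursively.
    return (vertex_cover([e for e in edges if u not in e], k - 1)
            or vertex_cover([e for e in edges if v not in e], k - 1))

path = [(0, 1), (1, 2), (2, 3)]  # path on four vertices
print(vertex_cover(path, 1), vertex_cover(path, 2))  # False True
```

The exponential factor depends only on the parameter k, not on the input size, which is exactly the separation that parameterized analysis makes visible.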
Massively Parallel Algorithms for the Stochastic Block Model
Learning the community structure of a large-scale graph is a fundamental
problem in machine learning, computer science and statistics. We study the
problem of exactly recovering the communities in a graph generated from the
Stochastic Block Model (SBM) in the Massively Parallel Computation (MPC) model.
Specifically, given $n$ vertices that are partitioned into $k$ equal-sized
clusters (i.e., each has size $n/k$), a graph on these vertices is randomly
generated such that each pair of vertices is connected with probability $p$ if
they are in the same cluster and with probability $q$ if not, where $p > q$. We
give MPC algorithms for the SBM in the (very general) \emph{$s$-space MPC
model}, where each machine has memory $s$. Under a separation condition on $p$
and $q$, our first algorithm exactly recovers all the clusters, with a tradeoff
between the number of rounds and the total space used. Under a stronger
separation condition, our second algorithm achieves an improved round and total
space complexity. Both algorithms significantly improve upon a recent result of
Cohen-Addad et al. [PODC'22], who gave algorithms that only work in the
\emph{sublinear space MPC model}, where each machine has local memory
$n^{\delta}$ for some constant $\delta < 1$, and that require a much stronger
condition on the separation between $p$ and $q$.
Our algorithms are based on collecting, for each vertex, its neighborhood
within a small number of steps and comparing statistics derived from these
local neighborhoods for each pair of vertices. To implement the clustering
algorithms in parallel, we present efficient approaches for implementing some
basic graph operations in the $s$-space MPC model.
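The recovery principle above (comparing statistics of local neighborhoods) can be illustrated by a single-machine sketch. The function names and the simple majority-overlap rule below are illustrative assumptions, not the paper's algorithm:

```python
import random

def generate_sbm(n, k, p, q, rng):
    """Sample a graph from the SBM: n vertices, k equal-sized clusters,
    intra-cluster edge probability p, inter-cluster probability q."""
    label = [i * k // n for i in range(n)]
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < (p if label[i] == label[j] else q):
                adj[i].add(j)
                adj[j].add(i)
    return adj, label

def recover_by_neighborhoods(adj, k):
    """Greedy recovery: put a vertex into an existing cluster when it
    shares a majority of its neighbors with that cluster's first member."""
    clusters = []
    for v in range(len(adj)):
        for c in clusters:
            if len(adj[v] & adj[c[0]]) > len(adj[v]) // 2:
                c.append(v)
                break
        else:
            clusters.append([v])
    return clusters

rng = random.Random(0)
adj, label = generate_sbm(60, 2, 0.9, 0.05, rng)
clusters = recover_by_neighborhoods(adj, 2)
print(len(clusters))
```

With p well separated from q, same-cluster vertices share most of their neighbors while cross-cluster pairs share almost none, which is the statistical signal the MPC algorithms aggregate in parallel.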