Computing complexity measures of degenerate graphs
We show that the VC-dimension of a graph can be computed in time $n^{O(d)}$, where $d$ is the degeneracy
of the input graph. The core idea
of our algorithm is a data structure to efficiently query the number of
vertices that see a specific subset of vertices inside of a (small) query set.
The construction of this data structure takes time $O(d 2^d n)$; afterwards,
queries can be computed efficiently using fast M\"obius inversion.
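As a self-contained illustration of the M\"obius step, the following is our own minimal sketch with hypothetical names (`adj`, `Q`, `superset_zeta`), assuming open neighbourhood traces; it is not the paper's degeneracy-aware data structure, whose purpose is precisely to supply the superset counts below without scanning all vertices.

```python
def superset_zeta(f, k):
    """g[S] = sum of f[T] over all supersets T of S; O(k * 2^k) time."""
    g = list(f)
    for i in range(k):
        for S in range(1 << k):
            if not (S >> i) & 1:
                g[S] += g[S | (1 << i)]
    return g

def superset_moebius(g, k):
    """Inverse of superset_zeta: recovers exact-trace counts from superset counts."""
    f = list(g)
    for i in range(k):
        for S in range(1 << k):
            if not (S >> i) & 1:
                f[S] -= f[S | (1 << i)]
    return f

def is_shattered(adj, Q):
    """True iff every subset of Q occurs as N(v) intersected with Q for some v."""
    k = len(Q)
    idx = {u: i for i, u in enumerate(Q)}
    exact = [0] * (1 << k)
    for v, nbrs in adj.items():      # full scan over all vertices; the paper's
        mask = 0                     # data structure exists to avoid this pass
        for u in nbrs:
            if u in idx:
                mask |= 1 << idx[u]
        exact[mask] += 1
    sup = superset_zeta(exact, k)    # "how many vertices see at least S"
    return all(c > 0 for c in superset_moebius(sup, k))   # back to exact counts

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}   # a 4-cycle
print(is_shattered(adj, [0, 3]))  # False: no vertex sees exactly one of {0, 3}
```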
This data structure turns out to be useful for a range of tasks, especially
for finding bipartite patterns in degenerate graphs, and we outline an
efficient algorithm for counting the number of times specific patterns occur
in a graph. The largest factor in the running time of this algorithm is
$O(n^{c})$, where $c$ is a parameter of the pattern we call its left covering
number.
Concrete applications of this algorithm include counting the number of
(non-induced) bicliques in linear time, the number of co-matchings in quadratic
time, as well as a constant-factor approximation of the ladder index in linear
time.
Finally, we supplement our theoretical results with several implementations
and run experiments on more than 200 real-world datasets -- the largest of
which has 8 million edges -- where we obtain interesting insights into the
VC-dimension of real-world networks.
Comment: Accepted for publication in the 18th International Symposium on Parameterized and Exact Computation (IPEC 2023).
Space-Query Tradeoffs in Range Subgraph Counting and Listing
This paper initiates the study of range subgraph counting and range subgraph listing, both of which are motivated by the significant demands in practice to perform graph analytics on subgraphs pertinent to only selected, as opposed to all, vertices. In the first problem, there is an undirected graph G where each vertex carries a real-valued attribute. Given an interval q and a pattern Q, a query counts the number of occurrences of Q in the subgraph of G induced by the vertices whose attributes fall in q. The second problem has the same setup except that a query needs to enumerate (rather than count) those occurrences with a small delay. In both problems, our goal is to understand the tradeoff between space usage and query cost, or more specifically: (i) Given a target query efficiency, how much pre-computed information about G must we store? (ii) Conversely, given a budget on space usage, what is the best query time we can hope for? We establish a suite of upper- and lower-bound results on such tradeoffs for various query patterns.
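To make the query semantics concrete, here is a minimal zero-space baseline for one instantiation of range subgraph counting, with Q a triangle; names such as `adj` and `attr` are our own, and this is the naive strategy whose query cost the space/query tradeoffs are measured against.

```python
from itertools import combinations

def range_triangle_count(adj, attr, lo, hi):
    """Count triangles in the subgraph of G induced by vertices whose
    attribute lies in [lo, hi]; zero pre-computation, so the query cost
    grows with the size of the induced subgraph."""
    keep = {v for v in adj if lo <= attr[v] <= hi}
    count = 0
    for v in keep:
        nbrs = [u for u in adj[v] if u in keep]
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                count += 1
    return count // 3   # every triangle is seen once from each of its corners
```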
Borel versions of the Local Lemma and LOCAL algorithms for graphs of finite asymptotic separation index
Asymptotic separation index is a parameter that measures how easily a Borel
graph can be approximated by its subgraphs with finite components. In contrast
to the more classical notion of hyperfiniteness, asymptotic separation index is
well-suited for combinatorial applications in the Borel setting. The main
result of this paper is a Borel version of the Lov\'asz Local Lemma -- a
powerful general-purpose tool in probabilistic combinatorics -- under a finite
asymptotic separation index assumption. As a consequence, we show that locally
checkable labeling problems that are solvable by efficient randomized
distributed algorithms admit Borel solutions on bounded degree Borel graphs
with finite asymptotic separation index. From this we derive a number of
corollaries, for example a Borel version of Brooks's theorem for graphs with
finite asymptotic separation index.
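For reference, the symmetric form of the Local Lemma being transferred (a standard statement, not quoted from the paper): if each of the events $A_1, \dots, A_n$ has probability at most $p$ and each depends on at most $d$ of the others, then

$e \, p \, (d+1) \le 1 \quad \Longrightarrow \quad \Pr\big[\,\overline{A_1} \cap \dots \cap \overline{A_n}\,\big] > 0.$

The Borel version, informally, asks for a Borel-measurable outcome avoiding every bad event, rather than mere positive probability.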
Finding Small Complete Subgraphs Efficiently
(I) We revisit the algorithmic problem of finding all triangles in a graph
with $n$ vertices and $m$ edges. According to a result of Chiba and
Nishizeki (1985), this task can be achieved by a combinatorial algorithm
running in $O(m\alpha)$ time, where $\alpha = \alpha(G)$ is the
graph arboricity. We provide a new, very simple combinatorial algorithm for
finding all triangles in a graph and show that it is amenable to the same running
time analysis. We derive these worst-case bounds from first principles and with
very simple proofs that do not rely on classic results due to Nash-Williams
from the 1960s.
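The flavour of such algorithms can be conveyed in a few lines. Below is our own sketch of the classic degree-ordered approach (not necessarily the paper's new algorithm): orient each edge from lower to higher degree-rank and intersect out-neighbourhoods.

```python
def list_triangles(adj):
    """List each triangle exactly once. Every edge is oriented from lower to
    higher degree-rank, so a triangle {u, v, w} is reported only at its
    lowest-ranked vertex."""
    rank = {v: i for i, v in enumerate(sorted(adj, key=lambda v: len(adj[v])))}
    out = {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}
    triangles = []
    for u in adj:
        for v in out[u]:
            for w in out[u] & out[v]:   # common higher-ranked neighbours
                triangles.append((u, v, w))
    return triangles
```

Replacing the degree order by a degeneracy ordering bounds every out-degree by the degeneracy, which is within a factor of two of the arboricity; this is the source of the $O(m\alpha)$-style analysis.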
(II) We extend our arguments to the problem of finding all small complete
subgraphs of a given fixed size. We show that the dependency on $m$ and $\alpha$
in the running time $O(\alpha^{k-2} \cdot m)$ of the algorithm of
Chiba and Nishizeki for listing all copies of $K_k$, where $k \ge 3$, is
asymptotically tight.
(III) We give improved arboricity-sensitive running times for counting and/or
detection of copies of $K_k$, for small $k \ge 4$. A key ingredient in
our algorithms is, once again, the algorithm of Chiba and Nishizeki. Our new
algorithms are faster than all previous algorithms in certain high-range
arboricity intervals for every $k \ge 7$.
Comment: 14 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:2105.0126
Mining Butterflies in Streaming Graphs
This thesis introduces two main-memory systems, sGrapp and sGradd, for performing the fundamental analytic tasks of biclique counting and concept drift detection over a streaming graph. A data-driven heuristic is used to architect the systems. To this end, the growth patterns of bipartite streaming graphs are first mined and the emergence principles of streaming motifs are discovered. Next, the discovered principles are (a) explained by a graph generator called sGrow; and (b) utilized to establish the requirements for efficient, effective, explainable, and interpretable management and processing of streams. sGrow is used to benchmark stream analytics, particularly in the case of concept drift detection.
sGrow displays robust realization of streaming growth patterns independent of initial conditions, scale and temporal characteristics, and model configurations. Extensive evaluations confirm the simultaneous effectiveness and efficiency of sGrapp and sGradd. sGrapp achieves a mean absolute percentage error of at most 0.05/0.14 for the cumulative butterfly count in streaming graphs with uniform/non-uniform temporal distribution, and a processing throughput of 1.5 million data records per second. Its throughput is 160x that of baselines, and its estimation error is 0.02x theirs. sGradd demonstrates improving performance over time, achieves zero false detection rates when no drift is present and when a drift has already been detected, and detects sequential drifts within zero to a few seconds of their occurrence, regardless of drift intervals.
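For intuition about the quantity sGrapp estimates (our own illustration, not sGrapp's streaming algorithm): a butterfly is a 2x2 biclique, and in the static setting butterflies can be counted exactly from per-pair wedge counts.

```python
from collections import Counter
from math import comb

def count_butterflies(adj_left):
    """Exact butterfly (2x2 biclique) count of a bipartite graph, given as
    adjacency lists of the left side. Two right vertices with w common left
    neighbours contribute C(w, 2) butterflies."""
    wedges = Counter()
    for u, nbrs in adj_left.items():
        nbrs = sorted(nbrs)
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                wedges[(nbrs[i], nbrs[j])] += 1   # wedge through left vertex u
    return sum(comb(w, 2) for w in wedges.values())
```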
Foundations of Node Representation Learning
Low-dimensional node representations, also called node embeddings, are a cornerstone in the modeling and analysis of complex networks. In recent years, advances in deep learning have spurred development of novel neural network-inspired methods for learning node representations which have largely surpassed classical 'spectral' embeddings in performance. Yet little work asks the central questions of this thesis: Why do these novel deep methods outperform their classical predecessors, and what are their limitations?
We pursue several paths to answering these questions. To further our understanding of deep embedding methods, we explore their relationship with spectral methods, which are better understood, and show that some popular deep methods are equivalent to spectral methods in a certain natural limit. We also introduce the problem of inverting node embeddings in order to probe what information they contain. Further, we propose a simple, non-deep method for node representation learning, and find it to often be competitive with modern deep graph networks in downstream performance.
To better understand the limitations of node embeddings, we prove some upper and lower bounds on their capabilities. Most notably, we prove that node embeddings are capable of exact low-dimensional representation of networks with bounded max degree or arboricity, and we further show that a simple algorithm can find such exact embeddings for real-world networks. By contrast, we also prove inherent bounds on the ability of random graph models, including those derived from node embeddings, to capture key structural properties of networks without simply memorizing a given graph.
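As a concrete example of the classical baseline referenced above (a generic adjacency-spectral embedding under our own conventions, not the thesis's specific construction):

```python
import numpy as np

def spectral_embedding(A, d):
    """Embed nodes via the d eigenvectors of the symmetric adjacency matrix
    with largest-magnitude eigenvalues; row i is node i's vector."""
    vals, vecs = np.linalg.eigh(A)          # eigenvalues in ascending order
    top = np.argsort(-np.abs(vals))[:d]     # largest |eigenvalue| first
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

# Example: path graph on 4 nodes, 2-dimensional embedding
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(spectral_embedding(A, 2))
```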
Enumerating Subgraphs of Constant Sizes in External Memory
We present an indivisible I/O-efficient algorithm for subgraph enumeration, where the objective is to list all the subgraphs of a massive graph G := (V, E) that are isomorphic to a pattern graph Q having k = O(1) vertices. Our algorithm performs O((|E|^{k/2})/(M^{k/2 - 1} B) log_{M/B}(|E|/B) + (|E|^ρ)/(M^{ρ-1} B)) I/Os with high probability, where ρ is the fractional edge covering number of Q (it always holds that ρ ≥ k/2, regardless of Q), M is the number of words in (internal) memory, and B is the number of words in a disk block. Our solution is optimal in the class of indivisible algorithms for all pattern graphs with ρ > k/2. When ρ = k/2, our algorithm is still optimal as long as M/B ≥ (|E|/B)^ε for any constant ε > 0.
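As a concrete instance (standard facts about the triangle pattern, not taken from the paper): for Q = K_3 we have k = 3, and placing weight 1/2 on each of its three edges covers every vertex, so ρ = 3/2 = k/2. The bound above then collapses to O((|E|^{3/2})/(M^{1/2} B) log_{M/B}(|E|/B)), matching, up to the logarithmic factor, the known I/O complexity of triangle enumeration.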
Constructing disjoint Steiner trees in Sierpi\'{n}ski graphs
Let $G$ be a graph and $S \subseteq V(G)$ with $|S| \ge 2$. Then the trees
$T_1, T_2, \ldots, T_r$ in $G$ are \emph{internally disjoint Steiner trees}
connecting $S$ (or $S$-Steiner trees) if $E(T_i) \cap E(T_j) = \emptyset$ and
$V(T_i) \cap V(T_j) = S$ for every pair of distinct integers $i, j$ with $1 \le i, j \le r$. Similarly, if we only have the condition $E(T_i) \cap E(T_j) = \emptyset$ but without the condition $V(T_i) \cap V(T_j) = S$, then they are
\emph{edge-disjoint Steiner trees}. The \emph{generalized $k$-connectivity},
denoted by $\kappa_k(G)$, of a graph $G$, is defined as
$\kappa_k(G) = \min\{\kappa_G(S) : S \subseteq V(G), |S| = k\}$,
where $\kappa_G(S)$ is the maximum number of internally disjoint $S$-Steiner
trees. The \emph{generalized local edge-connectivity} $\lambda_G(S)$ is the
maximum number of edge-disjoint Steiner trees connecting $S$ in $G$. The {\it
generalized $k$-edge-connectivity} $\lambda_k(G)$ of $G$ is defined as
$\lambda_k(G) = \min\{\lambda_G(S) : S \subseteq V(G), |S| = k\}$. These
measures are generalizations of the concepts of connectivity and
edge-connectivity, and they can be used as measures of vulnerability of
networks. It is, in general, difficult to compute these generalized
connectivities. However, there are precise results for some special classes of
graphs. In this paper, we obtain the exact value of $\kappa_3(S(n,k))$ and the
exact value of $\lambda_3(S(n,k))$, where $S(n,k)$ is the Sierpi\'{n}ski graph
with order $k^n$. As a direct consequence, these graphs provide additional
interesting examples where the generalized 3-connectivity and the generalized
3-edge-connectivity differ. We also study some network properties of
Sierpi\'{n}ski graphs.
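A small worked example of the definitions (our own illustration, not from the paper): in $G = K_4$ with $S = \{a, b, c\}$ and fourth vertex $d$, the path $a$-$b$-$c$ (edges $ab, bc$) and the star at $d$ (edges $da, db, dc$) are internally disjoint $S$-Steiner trees: they intersect exactly in $S$ and share no edge. Together they use five of the six edges of $K_4$, and any $S$-Steiner tree needs at least two edges, so no third pairwise edge-disjoint tree exists and $\kappa_G(S) = \lambda_G(S) = 2$.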
Efficient parameterized algorithms on structured graphs
In classical complexity theory, the worst-case running times of algorithms depend solely on the size of the input. In parameterized complexity the goal is to refine the analysis of the running time of an algorithm by additionally considering a parameter that measures some kind of structure in the input. A parameterized algorithm then utilizes the structure described by the parameter and achieves a running time that is faster than the best general (unparameterized) algorithm for instances of low parameter value.
In the first part of this thesis, we carry forward in this direction and investigate the influence of several parameters on the running times of well-known tractable problems.
Several of the presented algorithms are adaptive, meaning that they match the running time of the best unparameterized algorithm for worst-case parameter values. Thus, an adaptive parameterized algorithm is asymptotically never worse than the best unparameterized algorithm, while it outperforms the best general algorithm already for slightly non-trivial parameter values.
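A schematic numeric example of adaptivity (our illustration, not a result from the thesis): if the best general algorithm for a problem runs in time O(n^2) and a parameterized algorithm runs in time O(k * n) for a structure parameter k <= n, then the parameterized bound matches O(n^2) at the worst-case value k = Θ(n), is never worse for smaller k, and is strictly better whenever k = o(n).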
As illustrated in the first part of this thesis, for many problems there exist efficient parameterized algorithms regarding multiple parameters, each describing a different kind of structure.
In the second part of this thesis, we explore how to combine such homogeneous structures to more general and heterogeneous structures.
Using algebraic expressions, we define new combined graph classes of heterogeneous structure in a clean and robust way, and we showcase this for the heterogeneous merge of the parameters tree-depth and modular-width by presenting parameterized algorithms on such heterogeneous graph classes, obtaining running times that match the homogeneous cases throughout.