Transfer matrix for spanning trees, webs and colored forests
We use the transfer matrix formalism for dimers proposed by Lieb, and
generalize it to address the corresponding problem for arrow configurations (or
trees) associated to dimer configurations through Temperley's correspondence.
On a cylinder, the arrow configurations can be partitioned into sectors
according to the number of non-contractible loops they contain. We show how
Lieb's transfer matrix can be adapted in order to disentangle the various
sectors and to compute the corresponding partition functions. In order to
address the issue of Jordan cells, we introduce a new, extended transfer
matrix, which not only keeps track of the positions of the dimers, but also
propagates colors along the branches of the associated trees. We argue that
this new matrix contains Jordan cells. Comment: 29 pages, 7 figures
Fast approximation of centrality and distances in hyperbolic graphs
We show that the eccentricities (and thus the centrality indices) of all
vertices of a $\delta$-hyperbolic graph $G=(V,E)$ can be computed in linear
time with an additive one-sided error of at most $c\delta$, i.e., after a
linear time preprocessing, for every vertex $v$ of $G$ one can compute in
$O(1)$ time an estimate $\hat{e}(v)$ of its eccentricity $ecc_G(v)$ such that
$ecc_G(v) \leq \hat{e}(v) \leq ecc_G(v) + c\delta$ for a small constant $c$. We
prove that every $\delta$-hyperbolic graph $G$ has a shortest path tree $T$,
constructible in linear time, such that for every vertex $v$ of $G$,
$ecc_G(v) \leq ecc_T(v) \leq ecc_G(v) + c\delta$. These results are based on an
interesting monotonicity property of the eccentricity function of hyperbolic
graphs: the closer a vertex is to the center of $G$, the smaller its
eccentricity is. We also show that the distance matrix of $G$ with an additive
one-sided error of at most $c'\delta$ can be computed in $O(|V|^2\log^2|V|)$
time, where $c'$ is a small constant. Recent empirical studies show that
many real-world graphs (including Internet application networks, web networks,
collaboration networks, social networks, biological networks, and others) have
small hyperbolicity. So, we analyze the performance of our algorithms for
approximating centrality and distance matrix on a number of real-world
networks. Our experimental results show that the obtained estimates are even
better than the theoretical bounds. Comment: arXiv admin note: text overlap with arXiv:1506.01799 by other authors
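The one-sided nature of a tree-based eccentricity estimate can be illustrated with a minimal sketch. It assumes an unweighted, connected graph given as an adjacency dictionary, and it uses a plain BFS tree from an arbitrary root rather than the specific tree constructed in the paper. The function names are hypothetical. Because distances in any spanning tree are at least the distances in the graph, the tree eccentricity never underestimates the true one; how large the overestimate can be depends on the tree and on the hyperbolicity, which this sketch does not measure.

```python
from collections import deque

def bfs_dist(adj, src):
    """Hop distances from src to every reachable vertex."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def eccentricities(adj):
    """Exact eccentricity of every vertex (one BFS per vertex)."""
    return {v: max(bfs_dist(adj, v).values()) for v in adj}

def bfs_tree(adj, root):
    """Adjacency dictionary of a BFS (shortest path) tree rooted at root."""
    tree = {v: [] for v in adj}
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                tree[u].append(w)
                tree[w].append(u)
                queue.append(w)
    return tree

# Example: a 6-cycle, where every vertex has eccentricity 3.
n = 6
cycle = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
ecc_graph = eccentricities(cycle)
ecc_tree = eccentricities(bfs_tree(cycle, 0))
# The tree estimate is never smaller than the true eccentricity.
assert all(ecc_tree[v] >= ecc_graph[v] for v in cycle)
```

A cycle is a poor case for tree-based estimates (its hyperbolicity grows with its length), which is why some tree eccentricities here exceed the true value noticeably.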
Tree-based Coarsening and Partitioning of Complex Networks
Many applications produce massive complex networks whose analysis would
benefit from parallel processing. Parallel algorithms, in turn, often require a
suitable network partition. For solving optimization tasks such as graph
partitioning on large networks, multilevel methods are preferred in practice.
Yet, complex networks pose challenges to established multilevel algorithms, in
particular to their coarsening phase.
One way to specify a (recursive) coarsening of a graph is to rate its edges
and then contract the edges as prioritized by the rating. In this paper we (i)
define weights for the edges of a network that express the edges' importance
for connectivity, (ii) compute a minimum weight spanning tree with
respect to these weights, and (iii) rate the network edges based on the
conductance values of the tree's fundamental cuts. To this end, we also (iv)
develop the first optimal linear-time algorithm to compute the conductance
values of all fundamental cuts of a given spanning tree. We integrate
the new edge rating into a leading multilevel graph partitioner and equip the
latter with a new greedy postprocessing for optimizing the maximum
communication volume (MCV). Experiments on bipartitioning frequently used
benchmark networks show that the postprocessing already reduces MCV by 11.3%.
Our new edge rating further reduces MCV by 10.3% compared to the previously
best rating with the postprocessing in place for both ratings. In total, with a
modest increase in running time, our new approach reduces the MCV of complex
network partitions by 20.4%.
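To make the notion of a fundamental cut's conductance concrete, here is a deliberately naive sketch. It is not the linear-time algorithm from the abstract: it recomputes each cut from scratch, so it takes time proportional to the number of tree edges times the number of graph edges. It assumes the usual definition of conductance (cut weight divided by the smaller of the two side volumes, where volume is the sum of weighted degrees), and the function name and input layout are hypothetical.

```python
def fundamental_cut_conductances(n, edges, tree_edges):
    """Conductance of the fundamental cut of every tree edge.

    n          -- number of vertices, labelled 0..n-1
    edges      -- list of (u, v, weight) for the whole graph
    tree_edges -- list of (u, v) forming a spanning tree of the graph
    """
    degree = [0.0] * n
    for u, v, w in edges:
        degree[u] += w
        degree[v] += w
    total_volume = sum(degree)

    tree_adj = [[] for _ in range(n)]
    for u, v in tree_edges:
        tree_adj[u].append(v)
        tree_adj[v].append(u)

    result = {}
    for a, b in tree_edges:
        # Vertices on a's side once the tree edge (a, b) is removed.
        side = {a}
        stack = [a]
        while stack:
            x = stack.pop()
            for y in tree_adj[x]:
                if y not in side and {x, y} != {a, b}:
                    side.add(y)
                    stack.append(y)
        # Graph edges with exactly one endpoint on a's side form the cut.
        cut = sum(w for u, v, w in edges if (u in side) != (v in side))
        volume = sum(degree[x] for x in side)
        result[(a, b)] = cut / min(volume, total_volume - volume)
    return result

# Example: two triangles joined by the bridge (2, 3), unit weights.
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (2, 3, 1),
         (3, 4, 1), (4, 5, 1), (3, 5, 1)]
tree = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
conductance = fundamental_cut_conductances(6, edges, tree)
# The bridge has the lowest conductance, so it separates the two clusters.
best_edge = min(conductance, key=conductance.get)
```

In this toy graph the bridge's cut has the smallest conductance. An edge rating built on such values would tend to keep the two triangles apart during coarsening.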
Approximating the Smallest Spanning Subgraph for 2-Edge-Connectivity in Directed Graphs
Let $G$ be a strongly connected directed graph. We consider the following
three problems, where we wish to compute the smallest strongly connected
spanning subgraph of $G$ that maintains, respectively: the $2$-edge-connected
blocks of $G$ (\textsf{2EC-B}); the $2$-edge-connected components of $G$
(\textsf{2EC-C}); both the $2$-edge-connected blocks and the $2$-edge-connected
components of $G$ (\textsf{2EC-B-C}). All three problems are NP-hard, and thus
we are interested in efficient approximation algorithms. For \textsf{2EC-C} we
can obtain a $3$-approximation by combining previously known results. For
\textsf{2EC-B} and \textsf{2EC-B-C}, we present new $4$-approximation
algorithms that run in linear time. We also propose various heuristics to
improve the size of the computed subgraphs in practice, and conduct a thorough
experimental study to assess their merits in practical scenarios.
Join-Reachability Problems in Directed Graphs
For a given collection G of directed graphs we define the join-reachability
graph of G, denoted by J(G), as the directed graph that, for any pair of
vertices a and b, contains a path from a to b if and only if such a path exists
in all graphs of G. Our goal is to compute an efficient representation of J(G).
In particular, we consider two versions of this problem. In the explicit
version we wish to construct the smallest join-reachability graph for G. In the
implicit version we wish to build an efficient data structure (in terms of
space and query time) such that we can quickly report the set of vertices that
reach a query vertex in all graphs of G. This problem is related to the
well-studied reachability problem and is motivated by emerging applications of
graph-structured databases and graph algorithms. We consider the construction
of join-reachability structures for two graphs and develop techniques that can
be applied to both the explicit and the implicit problem. First we present
optimal and near-optimal structures for paths and trees. Then, based on these
results, we provide efficient structures for planar graphs and general directed
graphs.
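The definition of join-reachability can be illustrated with a brute-force sketch: compute the reachable set of every vertex in each graph and intersect the results. This is only a baseline for intuition, not one of the compact structures the abstract describes. It stores the full transitive closure, which is exactly what an efficient representation tries to avoid. The function names are hypothetical, and a vertex is not counted as reaching itself.

```python
from collections import deque

def reachable_from(adj, src):
    """Vertices reachable from src by a non-empty path search (src excluded)."""
    seen = {src}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    seen.discard(src)
    return seen

def join_reachability(graphs, vertices):
    """Map each vertex a to the set of b reachable from a in every graph."""
    join = {}
    for a in vertices:
        common = None
        for adj in graphs:
            reach = reachable_from(adj, a)
            common = reach if common is None else common & reach
        join[a] = common if common is not None else set()
    return join

def reaching_in_all(join, query):
    """Implicit-version query: vertices that reach `query` in all graphs."""
    return {a for a, targets in join.items() if query in targets}

# Example with two directed graphs on the vertices 0..3.
g1 = {0: [1], 1: [2], 2: [3]}
g2 = {0: [2], 2: [1], 1: [3]}
join = join_reachability([g1, g2], range(4))
# Vertex 1 reaches 2 in g1 only, so only vertex 0 reaches 2 in both graphs.
```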
Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff
The relative ease of collaborative data science and analysis has led to a
proliferation of many thousands or millions of versions of the same datasets
in many scientific and commercial domains, acquired or constructed at various
stages of data analysis across many users, and often over long periods of time.
Managing, storing, and recreating these dataset versions is a non-trivial task.
The fundamental challenge here is the storage-recreation trade-off: the more
storage we use, the faster it is to recreate or retrieve versions, while the
less storage we use, the slower it is to recreate or retrieve versions. Despite
the fundamental nature of this problem, there has been surprisingly little
work on it. In this paper, we study this trade-off in a principled
manner: we formulate six problems under various settings, trading off these
quantities in various ways, demonstrate that most of the problems are
intractable, and propose a suite of inexpensive heuristics, drawing from
techniques in the delay-constrained scheduling and spanning tree literature, to
solve these problems. We have built a prototype version management system that
aims to serve as a foundation for our DATAHUB system for facilitating
collaborative data science. We demonstrate, via extensive experiments, that our
proposed heuristics provide efficient solutions in practical dataset versioning
scenarios.
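The trade-off between storage and recreation cost can be made concrete with a small sketch. It rests on simplifying assumptions that are not taken from the paper: each version is either stored in full or as a delta against exactly one parent version, and the cost of recreating a version equals the number of bytes read along its chain. The names `evaluate_plan`, `full_size` and `delta_size` are hypothetical.

```python
def evaluate_plan(plan, full_size, delta_size):
    """Return (total storage, recreation cost per version) for a storage plan.

    plan[v] is None when version v is stored in full, otherwise the parent
    version against which v is stored as a delta. The plan must be acyclic.
    """
    storage = 0
    for v, parent in plan.items():
        storage += full_size[v] if parent is None else delta_size[(parent, v)]

    def recreation(v):
        parent = plan[v]
        if parent is None:
            return full_size[v]
        # Recreate the parent first, then apply the delta on top of it.
        return recreation(parent) + delta_size[(parent, v)]

    return storage, {v: recreation(v) for v in plan}

full_size = {"v1": 100, "v2": 100, "v3": 100}
delta_size = {("v1", "v2"): 10, ("v2", "v3"): 10}

# Plan A: store every version in full (fast recreation, most storage).
plan_a = {"v1": None, "v2": None, "v3": None}
# Plan B: store one version in full and chain deltas (least storage here).
plan_b = {"v1": None, "v2": "v1", "v3": "v2"}

storage_a, recreation_a = evaluate_plan(plan_a, full_size, delta_size)
storage_b, recreation_b = evaluate_plan(plan_b, full_size, delta_size)
```

Under these assumptions plan A uses 300 units of storage with a recreation cost of 100 per version, while plan B uses 120 units but the last version costs 120 to recreate. Viewing a plan as a tree over the versions suggests why spanning tree techniques are relevant to choosing one.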
Motif counting beyond five nodes
Counting graphlets is a well-studied problem in graph mining and social network analysis. Recently, several papers explored very simple and natural algorithms based on Monte Carlo sampling of Markov Chains (MC), and reported encouraging results. We show, perhaps surprisingly, that such algorithms are outperformed by color coding (CC) [2], a sophisticated algorithmic technique that we extend to the case of graphlet sampling and for which we prove strong statistical guarantees. Our computational experiments on graphs with millions of nodes show CC to be more accurate than MC; furthermore, we formally show that the mixing time of the MC approach is too high in general, even when the input graph has high conductance. All this comes at a price, however. While MC is very efficient in terms of space, CC’s memory requirements become demanding when the size of the input graph and that of the graphlets grow. And yet, our experiments show that CC can push the limits of the state-of-the-art, both in terms of the size of the input graph and of that of the graphlets.
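The core idea of color coding can be sketched on the simplest pattern, a simple path on k vertices. This is only an illustration of the general technique, not the graphlet sampler from the abstract. Vertices receive random colors from k colors, a dynamic program over color subsets counts the paths whose vertices all have distinct colors, and the count is rescaled by the probability k!/k^k that a fixed path is colorful. The table indexed by (vertex, color subset) also hints at why memory grows quickly with k. All function names are hypothetical, and k is assumed to be at least 2.

```python
import math
import random

def count_colorful_paths(adj, colors, k):
    """Count undirected simple paths on k >= 2 vertices with distinct colors.

    adj    -- adjacency dictionary of an undirected graph
    colors -- mapping (or list) giving each vertex a color in 0..k-1
    """
    # table[(v, mask)] = number of colorful paths that end at v and use
    # exactly the colors in the bitmask `mask`.
    table = {(v, 1 << colors[v]): 1 for v in adj}
    for _ in range(k - 1):
        extended = {}
        for (v, mask), count in table.items():
            for w in adj[v]:
                bit = 1 << colors[w]
                if not mask & bit:  # w's color is new, so w is a new vertex
                    key = (w, mask | bit)
                    extended[key] = extended.get(key, 0) + count
        table = extended
    # Every undirected path was counted once from each of its two ends.
    return sum(table.values()) // 2

def estimate_paths(adj, k, trials, seed=0):
    """Unbiased estimate of the number of k-vertex paths via random colorings."""
    rng = random.Random(seed)
    p_colorful = math.factorial(k) / k ** k
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, colors, k)
    return total / (trials * p_colorful)

# Example: a triangle contains three paths on 3 vertices.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
exact_when_rainbow = count_colorful_paths(triangle, [0, 1, 2], 3)
estimate = estimate_paths(triangle, 3, trials=200)
```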