0-1 Integer Linear Programming with a Linear Number of Constraints
We give an exact algorithm for the 0-1 Integer Linear Programming problem
with a linear number of constraints that improves over exhaustive search by an
exponential factor. Specifically, our algorithm runs in time $2^{(1-\mathrm{poly}(1/c))n}$,
where $n$ is the number of variables and $cn$ is the
number of constraints. The key idea behind the algorithm is a reduction to the
Vector Domination problem, together with a new algorithm for that subproblem.
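To make the target of the reduction concrete, here is a minimal brute-force statement of the Vector Domination problem (given two sets of vectors, does some vector in the first set dominate some vector in the second componentwise?). This quadratic baseline is only an illustration of the problem; the paper's contribution is an algorithm that beats it.

```python
from itertools import product

def dominates(a, b):
    """True if vector a dominates b componentwise (a[i] >= b[i] for all i)."""
    return all(x >= y for x, y in zip(a, b))

def vector_domination(A, B):
    """Brute-force Vector Domination: is there an a in A and a b in B with
    a >= b componentwise?  O(|A| * |B| * dimension) time; the paper gives a
    faster algorithm for this subproblem."""
    return any(dominates(a, b) for a, b in product(A, B))
```

For example, `vector_domination([(1, 2), (0, 0)], [(0, 2), (3, 3)])` is true because (1, 2) dominates (0, 2).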
Decoding Hidden Markov Models Faster Than Viterbi Via Online Matrix-Vector (max, +)-Multiplication
In this paper, we present a novel algorithm for the maximum a posteriori
decoding (MAPD) of time-homogeneous Hidden Markov Models (HMMs), improving the
worst-case running time of the classical Viterbi algorithm by a logarithmic
factor. In our approach, we interpret the Viterbi algorithm as a repeated
computation of matrix-vector $(\max, +)$-multiplications. On time-homogeneous
HMMs, this computation is online: a matrix, known in advance, has to be
multiplied with several vectors revealed one at a time. Our main contribution
is an algorithm solving this version of matrix-vector $(\max, +)$-multiplication
in subquadratic time, by performing a polynomial preprocessing of the matrix.
Employing this fast multiplication algorithm, we solve the MAPD problem in
subquadratic time per observation for any time-homogeneous HMM of size $n$ and
observation sequence of length $m$, with an extra polynomial preprocessing cost
negligible for sufficiently long observation sequences. To the best of our
knowledge, this is the first algorithm for the MAPD problem requiring
subquadratic time per observation, under the only assumption -- usually
verified in practice -- that the transition probability matrix does not change
with time.

Comment: AAAI 2016, to appear.
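The operation the paper speeds up can be stated in a few lines. This is the naive quadratic $(\max, +)$ matrix-vector product that one Viterbi step performs per observation; it is a baseline sketch, not the paper's subquadratic algorithm.

```python
def max_plus_mv(M, v):
    """Naive (max, +) matrix-vector product: out[i] = max_j (M[i][j] + v[j]).
    One such product per observation is the core step of Viterbi decoding
    (in log-space, entries of M are log transition*emission probabilities);
    the paper preprocesses the fixed matrix M to beat this O(n^2) baseline."""
    return [max(m_ij + v_j for m_ij, v_j in zip(row, v)) for row in M]
```

For instance, `max_plus_mv([[0, -1], [2, 0]], [1, 3])` evaluates to `[2, 3]`.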
Faster all-pairs shortest paths via circuit complexity
We present a new randomized method for computing the min-plus product
(a.k.a., tropical product) of two $n \times n$ matrices, yielding a faster
algorithm for solving the all-pairs shortest path problem (APSP) in dense
$n$-node directed graphs with arbitrary edge weights. On the real RAM, where
additions and comparisons of reals are unit cost (but all other operations have
typical logarithmic cost), the algorithm runs in time $n^3 / 2^{\Omega(\sqrt{\log n})}$
and is correct with high probability.
On the word RAM, the algorithm runs in $n^3 / 2^{\Omega(\sqrt{\log n})} + n^{2+o(1)} \log M$
time for edge weights in $([0, M] \cap \mathbb{Z}) \cup \{\infty\}$. Prior algorithms used either
$n^3 / (\log^c n)$ time for various $c \leq 2$, or $O(M^a n^b)$ time for various $a$ and $b$.
The new algorithm applies a tool from circuit complexity, namely the
Razborov-Smolensky polynomials for approximately representing $\mathrm{AC}^0[2]$
circuits, to efficiently reduce a matrix product over the $(\min, +)$ algebra to
a relatively small number of rectangular matrix products over $\mathbb{F}_2$,
each of which are computable using a particularly efficient method due to
Coppersmith. We also give a deterministic version of the algorithm running in
$n^3 / 2^{\log^{\delta} n}$ time for some $\delta > 0$, which utilizes the
Yao-Beigel-Tarui translation of $\mathrm{ACC}$ circuits into "nice" depth-two
circuits.

Comment: 24 pages. Updated version now has slightly faster running time. To
appear in ACM Symposium on Theory of Computing (STOC), 2014.
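For reference, the product being accelerated is easy to state. The following cubic-time sketch is the textbook baseline that the paper improves by a $2^{\Omega(\sqrt{\log n})}$ factor; it is not the paper's algorithm.

```python
def min_plus(A, B):
    """Naive cubic (min, +) product: C[i][j] = min_k (A[i][k] + B[k][j]).
    Repeatedly squaring a graph's weight matrix under this product
    (O(log n) squarings) yields all-pairs shortest path distances."""
    n = len(A)
    return [[min(A[i][k] + B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

On a weight matrix, `min_plus(A, A)` gives shortest distances using at most two edges; for the 2-node example `A = [[0, 2], [5, 0]]` the matrix is already closed, so `min_plus(A, A) == A`.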
Data Structures and Dynamic Algorithms for Planar Graphs
Obtaining provably efficient algorithms for the most basic graph problems like finding (shortest)
paths or computing maximum matchings, fast enough to handle real-world-scale graphs (i.e.,
consisting of millions of vertices and edges), is a very challenging task. For example, in a very
general regime of strongly-polynomial algorithms (see, e.g., [65]), we still do not know how
to compute shortest paths in a real-weighted sparse directed graph significantly faster than in
quadratic time, using the classical, but somewhat simple-minded, Bellman-Ford method.
One way to circumvent this problem is to consider more restricted computation models for
graph algorithms. If, for example, we restrict ourselves to graphs with integral edge weights, we
can improve upon the Bellman-Ford algorithm [14, 31]. Although these results are very deep
algorithmically, their theoretical efficiency is still very far from the only known trivial linear
lower bound on the actual time complexity of the negatively-weighted shortest path problem.
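The Bellman-Ford baseline referred to above can be sketched in a few lines; this is the standard textbook version (not code from the thesis), and its O(nm) running time is quadratic on sparse graphs with m = O(n).

```python
def bellman_ford(n, edges, s):
    """Classical Bellman-Ford: distances from source s in a directed graph
    with real (possibly negative) edge weights, assuming no negative cycle.
    edges is a list of (u, v, w) triples; runs in O(n * m) time."""
    INF = float('inf')
    dist = [INF] * n
    dist[s] = 0
    for _ in range(n - 1):          # n-1 relaxation rounds suffice
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```

For example, `bellman_ford(3, [(0, 1, 4), (1, 2, -2), (0, 2, 5)], 0)` returns `[0, 4, 2]`.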
Another approach is to develop algorithms specialized for certain graph classes that appear
in practice. Planar graphs constitute one of the most important and well-studied such classes.
Many of the real-world networks can be drawn on a plane with no or few edge crossings. The
examples include not very complex road networks and graphs considered in the domain of VLSI
design. Complex road networks, although far from being planar, share with planar graphs some
useful properties, like the existence of small separators [20]. Special cases of planar graphs, such
as grids, appear often in the area of image processing (e.g., [7]).
And indeed, if we restrict ourselves to planar graphs, many of the classical polynomial-time
graph problems, in particular computing shortest paths [35, 58] and maximum flows [4, 5, 21]
in real-weighted graphs, can be solved either optimally or in nearly-linear time. The very
rich combinatorial structure of planar graphs often allows breaking barriers that appear in
the respective problems for general graphs by using techniques from computational geometry
(e.g., [27]), or by applying sophisticated data structures, such as dynamic trees [4, 10, 21, 66].
In this thesis, we focus on the data-structural aspect of planar graph algorithmics. By this,
we mean that rather than concentrating on particular planar graph problems, we study more
abstract, "low-level" problems. Efficient algorithms for these problems can be used in a black-box manner to design algorithms for multiple specific problems at once. Such an approach
allows us to improve upon many known complexity upper bounds for different planar graph
problems simultaneously, without going into the specifics of these problems.
We also study dynamic algorithms for planar graphs, i.e., algorithms that maintain certain
information about a dynamically changing graph (such as "is the graph connected?") much more
efficiently than by recomputing this information from scratch after each update. We consider
the edge-update model where the input graph can be modified only by adding or removing
single edges. A graph algorithm is called fully-dynamic if it supports both edge insertions and
edge deletions, and partially dynamic if it supports either only edge insertions (then we call it
incremental) or only edge deletions (then it is called decremental).
When designing dynamic graph algorithms, we care about the update time, i.e., the time
needed by the algorithm to adapt to an elementary change of the graph, and query time, i.e., the
time needed by the algorithm to recompute the requested portion of the maintained information.
Sometimes, especially in partially dynamic settings, it is more convenient to measure the total
update time, i.e., the total time needed by the algorithm to process any possible sequence
of updates. For some dynamic problems, it is worth focusing on a more restricted explicit
maintenance model where the entire maintained information is explicitly updated (so that the
user is notified about the update) after each change. In this model the query procedure is trivial
and thus we only care about the update time.
Note that there is actually no clear distinction between dynamic graph algorithms and graph
data structures, since dynamic algorithms are often used as black-boxes to obtain efficient static
algorithms (e.g., [26]). For example, the incremental connectivity problem, where one needs to
process queries about the existence of a path between given vertices, while the input undirected
graph undergoes edge insertions, is actually equivalent to the disjoint-set data structure problem,
also called the union-find data structure problem (see, e.g., [15]).
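The union-find structure mentioned above is standard; a minimal sketch (path compression plus union by size, the usual textbook variant) shows how edge insertions map to union operations and connectivity queries to find operations.

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by size; in the
    incremental-connectivity view, union(x, y) inserts edge xy and
    connected(x, y) asks whether a path between x and y exists."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx              # attach smaller tree under larger
        self.size[rx] += self.size[ry]

    def connected(self, x, y):
        return self.find(x) == self.find(y)
```

On general graphs this supports updates and queries in near-constant amortized time; Gustedt's result cited below makes it optimal when the unions follow the edges of a planar graph.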
We concentrate mostly on the decremental model and obtain very efficient decremental
algorithms for problems on unweighted planar graphs related to reachability and connectivity.
We also apply our dynamic algorithms to static problems, thus confirming once again the data-structural character of these results.
In the following, let G = (V, E) denote the input planar graph with n vertices. For clarity
of this summary, assume G is a simple graph. Then, by planarity, it has O(n) edges. When we
talk about general graphs, we denote by m the number of edges of the graph.
2 Contracting a Planar Graph
The first part of the thesis is devoted to the data-structural aspect of contracting edges in planar
graphs. Edge contraction is one of the fundamental graph operations. Given an undirected
graph and its edge e, contracting the edge e consists in removing it from the graph and merging
its endpoints. The notion of contraction has been used to describe a number of prominent graph
algorithms, including Edmonds' algorithm for computing maximum matchings [19], or Karger's
minimum cut algorithm [44].
Edge contractions are of particular interest in planar graphs, as a number of planar graph
properties can be described using contractions. For example, it is well-known that a graph
is planar precisely when it cannot be transformed into K5 or K3,3 by contracting edges, or
removing vertices or edges (see e.g., [17]). Moreover, contracting an edge preserves planarity.
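As a point of reference for the operation being discussed, here is a simple-minded contraction on an adjacency-set representation. It is an illustration only (in particular, sets silently collapse the parallel edges that the thesis's data structure explicitly reports), and each call can cost up to O(deg(v) log n) with standard structures.

```python
def contract(adj, u, v):
    """Contract edge uv in an undirected graph given as a dict mapping each
    vertex to its set of neighbors: delete the edge and merge v into u.
    Parallel edges and self-loops disappear in this set representation."""
    adj[u].discard(v)
    adj[v].discard(u)
    for w in adj.pop(v):          # redirect v's remaining edges to u
        adj[w].discard(v)
        if w != u:
            adj[w].add(u)
            adj[u].add(w)
    return adj
```

For a triangle on vertices {0, 1, 2}, contracting edge 01 leaves a single edge between the merged vertex 0 and vertex 2.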
We would like to have at our disposal a data structure that performs contractions on the
input planar graph and still provides access to the most basic information about our graph,
such as the sizes of neighbor sets of individual vertices and the adjacency relation. While
contraction operation is conceptually very simple, its efficient implementation is challenging.
This is because it is not clear how to represent individual vertices' adjacency lists so that
adjacency list merges, adjacency queries, and neighborhood size queries are all efficient. By
using standard data structures (e.g., balanced binary search trees), one can maintain adjacency
lists of a graph subject to contractions in polylogarithmic amortized time. However, in many
planar graph algorithms this becomes a bottleneck.
As an example, consider the problem of computing a 5-coloring of a planar graph. There
exists a very simple algorithm based on contractions [53] that only relies on a folklore fact that
a planar graph has a vertex of degree no more than 5. However, linear-time algorithms solving
this problem use some more involved planar graph properties [23, 53, 60]. For example, the
algorithm by Matula et al. [53] uses the fact that every planar graph has either a vertex of
degree at most 4 or a vertex of degree 5 adjacent to at least four vertices, each having degree
at most 11. Similarly, although there exists a very simple algorithm for computing a minimum
spanning tree of a planar graph based on edge contractions, various different methods have been
used to implement it efficiently [23, 51, 52].
The problem of maintaining a planar graph under contractions has been studied before. In
their book, Klein and Mozes [46] showed that there exists a (slightly more general) data structure
maintaining a planar graph under edge contractions and deletions, and answering adjacency
queries in O(1) worst-case time. The update time is O(log n). This result is based on the work
of Brodal and Fagerberg [8], who showed how to maintain a bounded-outdegree orientation of
a dynamic planar graph so that the edge set updates are supported in O(log n) amortized time.
Gustedt [32] showed an optimal solution to the union-find problem in the case when at any
time the actual subsets form disjoint and connected subgraphs of a given planar graph G. In
other words, in this problem the allowed unions correspond to the edges of a planar graph and
the execution of a union operation can be seen as a contraction of the respective edge.
Our Results
We show a data structure that can efficiently maintain a planar graph subject to edge contractions in linear total time, assuming the standard word-RAM model with word size Ω(log n). It
can report groups of parallel edges and self-loops that emerge. It also supports constant-time
adjacency queries and maintains the neighbor lists and degrees explicitly. The data structure
can be used as a black-box to implement planar graph algorithms that use contractions.
As an example, our data structure can be used to give clean and conceptually simple linear-time implementations of algorithms for computing a 5-coloring or a minimum spanning tree.
More importantly, by using our data structure, we give improved algorithms for a few
problems in planar graphs. In particular, we obtain optimal algorithms for decremental 2-edge-connectivity (see, e.g., [30]), finding a unique perfect matching [26], and computing maximal
3-edge-connected subgraphs [12].
In order to obtain our result, we first partition the graph into small pieces of roughly logarithmic size (using so-called r-divisions [24]). Then we solve our problem recursively for each
of the pieces, and separately using a simple-minded approach for the subgraph induced by o(n)
vertices contained in multiple pieces (the so-called boundary vertices). Such an approach proved
successful in obtaining optimal data structures for the planar union-find problem [32] and decremental connectivity [50]. In fact, our data-structural problem can be seen as a generalization
of the former problem. However, maintaining the status of each edge e of the initial graph G
(i.e., whether e has become a self-loop or a parallel edge) subject to edge contractions, and
supporting constant-time adjacency queries without resorting to randomization, turn out to be
serious technical challenges. Overcoming these difficulties is the main contribution of this part
of the thesis.
3 Decremental Reachability
The second part of this thesis is devoted to dynamic reachability problems in planar graphs. In
the dynamic reachability problem we are given a (directed) graph G subject to edge updates and
the goal is to design a data structure that would allow answering queries about the existence of
a path between a pair of query vertices u, v ∈ V.
Two variants of dynamic reachability are studied most often. In the all-pairs variant, our
data structure has to support queries between arbitrary pairs of vertices. This variant is also
called the dynamic transitive closure problem, since a path u → v exists in G if uv is an edge
of the transitive closure of G.
In the single-source reachability problem, a source vertex s ∈ V is fixed from the very
beginning and the only allowed queries are about the existence of a path s → v, where v ∈ V.
If we work with undirected graphs, the dynamic reachability problem is called the dynamic
connectivity problem. Note that in the undirected case a path u → v exists in G if and only if
a path v → u exists in G.
State of the Art
Dynamic reachability in general directed graphs turns out to be a very challenging problem.
First of all, it is computationally much more demanding than its undirected counterpart. For
undirected graphs, fully-dynamic all-pairs algorithms with polylogarithmic amortized update
and query bounds are known [36, 38, 71]. For directed graphs, on the other hand, in most
settings (either single-source or all-pairs, either incremental, decremental or fully-dynamic) the
best known algorithm has either polynomial update time or polynomial query time. The only
exception is the incremental single-source reachability problem, for which a trivial extension of
depth-first search [68] achieves O(1) amortized update time.
One of the possible reasons behind such a big gap between the undirected and directed
settings is that one needs only linear time to compute the connected components of an undirected
graph, and thus there exists an O(n)-space static data structure that can answer connectivity
queries in undirected graphs in O(1) time. On the other hand, the best known algorithm for
computing the transitive closure runs in Õ(min(n^ω, nm))¹ time [11, 59].
So far, the best known bounds for fully-dynamic reachability are as follows. For dynamic
transitive closure, there exist a number of algorithms with O(n²) update time and O(1) query
time [16, 61, 64]. These algorithms, in fact, maintain the transitive closure explicitly. There also
exist a few fully-dynamic algorithms that are better for sparse graphs, each of which has Ω(n)
amortized update time and query time which is o(n) but still polynomial in n [62, 63, 64]. For
the single-source variant, the only known non-trivial (i.e., other than recompute-from-scratch)
algorithm has O(n^1.53) update time and O(1) query time [64].
Algorithms with O(nm) total update time are known for both incremental [39] and decremental [48, 62] transitive closure. Note that for sparse graphs this bound is only poly-logarithmic
factors away from the best known static transitive closure upper bound [11].
All the known partially-dynamic single-source reachability algorithms work in the explicit
maintenance model. As mentioned before, for incremental single-source reachability, an optimal
(in the amortized sense) algorithm is known. Interestingly, the first algorithms with O(mn^(1-ε))
total update time (where ε > 0) have been obtained only recently [33, 34]. The best known
algorithm to date has Õ(m√n) total update time and is due to Chechik et al. [13].
Dynamic reachability has also been previously studied for planar graphs. Diks and Sankowski
[18] showed a fully-dynamic transitive closure algorithm with Õ(√n) update and query times,
which works under the assumption that the graph is plane embedded and the inserted edges
can only connect vertices sharing some adjacent face. Łącki [48] showed that one can maintain
the strongly connected components of a planar graph under edge deletions in O(n√n) total
time. By known reductions, it follows that there exists a decremental single-source reachability
algorithm for planar graphs with O(n√n) total update time. Note that this bound matches the
recent best known bound for general graphs [13] up to polylogarithmic factors.
¹ We denote by Õ(f(n)) the order O(f(n) polylog n).
Systems and Algorithms for Dynamic Graph Processing
Data generated from human and systems interactions can be naturally represented as graph data. Several emerging applications rely on graph data, such as the semantic web, social networks, bioinformatics, finance, and trading, among others. These applications require graph querying capabilities, which are often implemented in graph database management systems (GDBMSs). Many GDBMSs can evaluate one-time versions of recursive or subgraph queries over static graphs, i.e., graphs that do not change, or a single snapshot of a changing graph. They generally do not support incrementally maintaining query results as graphs change. However, most applications that employ graphs are dynamic in nature, resulting in graphs that change over time, also known as dynamic graphs.
This thesis investigates how to build a generic and scalable incremental computation solution that is oblivious to graph workloads. It focuses on two fundamental computations performed by many applications: recursive queries and subgraph queries. Specifically, for subgraph queries, this thesis presents the first approach that (i) performs joins with worst-case optimal computation and communication costs; and (ii) maintains a total memory footprint almost linear in the number of input edges. For recursive queries, this thesis studies optimizations for using differential computation (DC). DC is a general incremental computation technique that can maintain the output of a recursive dataflow computation upon changes. However, it requires a prohibitively large amount of memory because it maintains differences that track changes in queries' inputs and outputs. The thesis proposes a suite of optimizations based on reducing the number of these differences and recomputing them when necessary. The techniques and optimizations in this thesis, for subgraph and recursive computations, represent a proposal for how to build a state-of-the-art generic and scalable GDBMS for dynamic graph data management.
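The worst-case optimal join idea mentioned above can be illustrated on the triangle query: instead of first materializing a pairwise join (which can produce Θ(m²) intermediate tuples), attributes are bound one at a time and candidate sets are intersected. This is a generic illustration of the technique, not the thesis's implementation.

```python
from collections import defaultdict

def count_triangles(edges):
    """Attribute-at-a-time triangle counting, the core idea of worst-case
    optimal joins: bind vertices a < b < c in order, intersecting neighbor
    sets instead of materializing the pairwise edge-edge join."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    count = 0
    for a in list(adj):
        for b in adj[a]:
            if b > a:
                # candidates for c: common neighbors of a and b, beyond b
                count += sum(1 for c in adj[a] & adj[b] if c > b)
    return count
```

For example, a triangle plus one pendant edge, `count_triangles([(0, 1), (1, 2), (0, 2), (2, 3)])`, yields 1.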
Graph-based Analysis of Dynamic Systems
The analysis of dynamic systems provides insights into their time-dependent characteristics. This enables us to monitor, evaluate, and improve systems from various areas. They are often represented as graphs that model the system's components and their relations. The analysis of the resulting dynamic graphs yields great insights into the system's underlying structure, its characteristics, as well as properties of single components. The interpretation of these results can help us understand how a system works and how parameters influence its performance. This knowledge supports the design of new systems and the improvement of existing ones.
The main issue in this scenario is the performance of analyzing the dynamic graph to obtain relevant properties. While various approaches have been developed to analyze dynamic graphs, it is not always clear which one performs best for the analysis of a specific graph. The runtime also depends on many other factors, including the size and topology of the graph, the frequency of changes, and the data structures used to represent the graph in memory. While the benefits and drawbacks of many data structures are well-known, their runtime is hard to predict when used for the representation of dynamic graphs. Hence, tools are required to benchmark and compare different algorithms for the computation of graph properties and data structures for the representation of dynamic graphs in memory. Based on deeper insights into their performance, new algorithms can be developed and efficient data structures can be selected.
In this thesis, we present four contributions to tackle these problems: A benchmarking framework for dynamic graph analysis, novel algorithms for the efficient analysis of dynamic graphs, an approach for the parallelization of dynamic graph analysis, and a novel paradigm to select and adapt graph data structures. In addition, we present three use cases from the areas of social, computer, and biological networks to illustrate the great insights provided by their graph-based analysis.
We present a new benchmarking framework for the analysis of dynamic graphs, the Dynamic Network Analyzer (DNA). It provides tools to benchmark and compare different algorithms for the analysis of dynamic graphs as well as the data structures used to represent them in memory. DNA supports the development of new algorithms and the automatic verification of their results. Its visualization component provides different ways to represent dynamic graphs and the results of their analysis.
We introduce three new stream-based algorithms for the analysis of dynamic graphs. We evaluate their performance on synthetic as well as real-world dynamic graphs and compare their runtimes to snapshot-based algorithms. Our results show great performance gains for all three algorithms. The new stream-based algorithm StreaM_k, which counts the frequencies of k-vertex motifs, achieves speedups of up to 19,043x for synthetic and 2,882x for real-world datasets.
We present a novel approach for the distributed processing of dynamic graphs, called parallel Dynamic Graph Analysis (pDNA). To analyze a dynamic graph, the work is distributed by a partitioner that creates subgraphs and assigns them to workers. They compute the properties of their respective subgraph using standard algorithms. Their results are used by the collator component to merge them to the properties of the original graph. We evaluate the performance of pDNA for the computation of five graph properties on two real-world dynamic graphs with up to 32 workers. Our approach achieves great speedups, especially for the analysis of complex graph measures.
We introduce two novel approaches for the selection of efficient graph data structures. The compile-time approach estimates the workload of an analysis after an initial profiling phase and recommends efficient data structures based on benchmarking results. It achieves speedups of up to 5.4x over baseline data structure configurations for the analysis of real-world dynamic graphs. The run-time approach monitors the workload during analysis and exchanges the graph representation if it finds a configuration that promises to be more efficient for the current workload. Compared to baseline configurations, it achieves speedups of up to 7.3x for the analysis of a synthetic workload.
Our contributions provide novel approaches for the efficient analysis of dynamic graphs and tools to further investigate the trade-offs between different factors that influence the performance.
1 Introduction
2 Notation and Terminology
3 Related Work
4 DNA - Dynamic Network Analyzer
5 Algorithms
6 Parallel Dynamic Network Analysis
7 Selection of Efficient Graph Data Structures
8 Use Cases
9 Conclusion
A DNA - Dynamic Network Analyzer
B Algorithms
C Selection of Efficient Graph Data Structures
D Parallel Dynamic Network Analysis
E Graph-based Intrusion Detection System
F Molecular Dynamic