Certified Context-Free Parsing: A formalisation of Valiant's Algorithm in Agda
Valiant (1975) developed an algorithm for the recognition of context-free
languages. To this day, it remains the algorithm with the best asymptotic
complexity for this purpose. In this paper, we present an algebraic
specification, implementation, and proof of correctness of a generalisation
of Valiant's algorithm. The generalisation can be used for recognition,
parsing, or generic computation of the transitive closure of upper
triangular matrices. The proof is certified by the Agda proof assistant. The
certification is representative of state-of-the-art methods for
specification and proof in proof assistants based on type theory. As such,
this paper can also be read as a tutorial for the Agda system.
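The recursion at the core of the algorithm can be sketched in plain Python (Boolean matrices as nested lists; `closure` and `bool_matmul` are illustrative names, not the paper's Agda definitions). The asymptotic gain in Valiant's actual algorithm comes from delegating the two products to fast matrix multiplication, which this sketch does not do:

```python
def bool_matmul(X, Y):
    """Boolean matrix product."""
    m, p = len(Y), len(Y[0])
    return [[any(X[i][k] and Y[k][j] for k in range(m)) for j in range(p)]
            for i in range(len(X))]

def closure(M):
    """Reflexive-transitive closure of an upper triangular Boolean
    matrix by the divide-and-conquer recursion underlying Valiant's
    algorithm: close the two diagonal blocks recursively, then push
    paths through the off-diagonal block with two matrix products."""
    n = len(M)
    if n == 1:
        return [[True]]              # a node always reaches itself
    h = n // 2
    A = [row[:h] for row in M[:h]]   # top-left block
    B = [row[h:] for row in M[:h]]   # top-right block
    D = [row[h:] for row in M[h:]]   # bottom-right block
    Ac, Dc = closure(A), closure(D)
    Bc = bool_matmul(bool_matmul(Ac, B), Dc)   # B* = A* B D*
    return ([Ac[i] + Bc[i] for i in range(h)] +
            [[False] * h + Dc[i] for i in range(n - h)])
```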
Algorithmic Complexity of Power Law Networks
It has been experimentally observed that the majority of real-world networks
follow a power law degree distribution. The aim of this paper is to study the
algorithmic complexity of such "typical" networks. The contribution of this
work is twofold.
First, we define a deterministic condition for checking whether a graph has a
power law degree distribution and experimentally validate it on real-world
networks. This definition allows us to derive interesting properties of power
law networks. We observe that for exponents of the degree distribution in a
certain range, such networks exhibit the double power law phenomenon that has
been observed for several real-world networks. Our observation indicates that
this phenomenon could be explained by purely graph-theoretical properties.
The second aim of our work is to give a novel theoretical explanation of why
many algorithms run faster on real-world data than predicted by algorithmic
worst-case analysis. We show how to exploit the power law degree
distribution to design faster algorithms for a number of classical
polynomial-time problems, including transitive closure, maximum matching,
determinant, PageRank, and matrix inverse. Moreover, we deal with the
problems of counting triangles and finding a maximum clique. Previously, it
had only been shown that these problems can be solved very efficiently on
power law graphs when these graphs are random, e.g., drawn at random from
some distribution. However, it is unclear how to relate such a theoretical
analysis to real-world graphs, which are fixed. Instead, we show that the
randomness assumption can be replaced with a simple condition on the degrees
of adjacent vertices, which can be used to obtain similar results. As a
result, in some range of power law exponents, we are able to solve the
maximum clique problem in polynomial time, although on general power law
networks the problem is NP-complete.
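The shape of a degree distribution can be illustrated with a crude diagnostic: a least-squares fit on the log-log degree histogram (the hypothetical `degree_exponent` helper below is not the paper's deterministic condition, which is a different and stronger test, and maximum-likelihood estimators are generally preferred in practice):

```python
import math
from collections import Counter

def degree_exponent(degrees):
    """Estimate the exponent beta of a power law degree distribution
    count(d) ~ d**(-beta) by least-squares regression on the log-log
    degree histogram. A rough diagnostic only."""
    counts = Counter(d for d in degrees if d > 0)
    xs = [math.log(d) for d in counts]
    ys = [math.log(c) for c in counts.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope   # negate: the histogram decays with slope -beta
```

On a synthetic degree sequence with count(d) proportional to d^(-2), the estimate comes out close to 2.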
Fully Dynamic Single-Source Reachability in Practice: An Experimental Study
Given a directed graph and a source vertex, the fully dynamic single-source
reachability problem is to maintain the set of vertices that are reachable from
the given vertex, subject to edge deletions and insertions. It is one of the
most fundamental problems on graphs and appears directly or indirectly in many
and varied applications. While there has been theoretical work on this problem,
showing both linear conditional lower bounds for the fully dynamic problem and
insertions-only and deletions-only upper bounds beating these conditional lower
bounds, there has been no experimental study that compares the performance of
fully dynamic reachability algorithms in practice. Previous experimental
studies in this area concentrated only on the more general all-pairs
reachability or transitive closure problem and did not use real-world dynamic
graphs.
In this paper, we bridge this gap by empirically studying an extensive set of
algorithms for the single-source reachability problem in the fully dynamic
setting. In particular, we design several fully dynamic variants of well-known
approaches to obtain and maintain reachability information with respect to a
distinguished source. Moreover, we extend the existing insertions-only or
deletions-only upper bounds into fully dynamic algorithms. Even though the
worst-case time per operation of all the fully dynamic algorithms we evaluate
is at least linear in the number of edges in the graph (as is to be expected
given the conditional lower bounds), we show in our extensive experimental
evaluation that their performance differs greatly, both on generated as well as
on real-world instances.
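As a point of reference for what the evaluated algorithms improve on, here is a minimal sketch of the simplest fully dynamic baseline (a hypothetical `DynamicReachability` class, not one of the paper's implementations): insertions extend the reachable set incrementally, deletions conservatively recompute it with a BFS, giving the worst-case linear time per operation mentioned above.

```python
from collections import deque

class DynamicReachability:
    """Naive fully dynamic single-source reachability baseline."""

    def __init__(self, n, source):
        self.n, self.source = n, source
        self.adj = [set() for _ in range(n)]
        self.reach = {source}

    def _bfs(self):
        """Recompute the reachable set from scratch."""
        seen, q = {self.source}, deque([self.source])
        while q:
            u = q.popleft()
            for v in self.adj[u]:
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        self.reach = seen

    def insert(self, u, v):
        self.adj[u].add(v)
        if u in self.reach and v not in self.reach:
            # only the part newly reachable via v needs exploring
            self.reach.add(v)
            q = deque([v])
            while q:
                x = q.popleft()
                for y in self.adj[x]:
                    if y not in self.reach:
                        self.reach.add(y)
                        q.append(y)

    def delete(self, u, v):
        self.adj[u].discard(v)
        if u in self.reach and v in self.reach:
            self._bfs()   # conservative full recomputation

    def reachable(self, v):
        return v in self.reach
```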
Accelerating transitive closure of large-scale sparse graphs
Finding the transitive closure of a graph is a fundamental graph problem in which another graph is obtained such that an edge exists between two nodes if and only if there is a path in the original graph from one node to the other; the reachability matrix of a graph is its transitive closure. This thesis describes a novel approach that uses anti-sections to obtain the transitive closure of a graph. It also examines the advantages of this approach when implemented in parallel on a GPU using the Hornet graph data structure.
Graph representations of real-world systems are typically sparse in nature due to the limited connectivity between nodes. The anti-section approach is designed specifically to improve performance on large-scale sparse graphs. An NVIDIA Titan V GPU is used for the execution of the parallel anti-section implementations. The Dual-Round and Hash-Based implementations of the anti-section transitive closure approach provide a significant speedup over several parallel and sequential implementations.
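For sparse graphs, the common sequential reference point that such parallel approaches are measured against is one BFS per source, O(n(n + m)) time overall. A minimal sketch (this is not the thesis's anti-section technique, and the convention of marking each node as reaching itself varies):

```python
from collections import deque

def transitive_closure(n, edges):
    """Reachability matrix of a sparse directed graph, computed by
    running one BFS from every source node."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    closure = [[False] * n for _ in range(n)]
    for s in range(n):
        closure[s][s] = True     # reflexive convention
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if not closure[s][v]:
                    closure[s][v] = True
                    q.append(v)
    return closure
```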
Compressing and Performing Algorithms on Massively Large Networks
Networks are represented as a set of nodes (vertices) and the arcs (links) connecting them. Such networks can model various real-world structures such as social networks (e.g., Facebook), information networks (e.g., citation networks), technological networks (e.g., the Internet), and biological networks (e.g., gene-phenotype network). Analysis of such structures is a heavily studied area with many applications. However, in this era of big data, we find ourselves with networks so massive that the space requirements inhibit network analysis.
Since many of these networks have nodes and arcs on the order of billions to trillions, even basic data structures such as adjacency lists could cost petabytes to zettabytes of storage. Storing these networks in secondary memory would require I/O access (i.e., disk access) during analysis, thus drastically slowing analysis time. To perform analysis efficiently on such extensive data, we either need enough main memory for the data structures and algorithms, or we need to develop compressions that require much less space while still being able to answer queries efficiently.
In this dissertation, we develop several compression techniques that succinctly represent these real-world networks while still being able to efficiently query the network (e.g., check if an arc exists between two nodes). Furthermore, since many of these networks continue to grow over time, our compression techniques also support the ability to add and remove nodes and edges directly on the compressed structure. We also provide a way to compress the data quickly without any intermediate structure, thus giving minimal memory overhead. We provide detailed analysis and prove that our compression is indeed succinct (i.e., achieves the information-theoretic lower bound). Also, we empirically show that our compression rates outperform or are equal to existing compression algorithms on many benchmark datasets.
We also extend our technique to time-evolving networks. That is, we store the entire state of the network at each time frame. Studying time-evolving networks allows us to find patterns over time that would not be visible in regular, static network analysis. A succinct representation for time-evolving networks is arguably even more important than for static graphs, since the extra dimension inflates the space requirements of basic data structures even further. Again, we manage to achieve succinctness while also providing fast encoding, minimal memory overhead during encoding, fast queries, and fast, direct modification. We also compare against several benchmarks and empirically show that we achieve compression rates better than or equal to those of the best-performing benchmark for each dataset.
Finally, we also develop both static and time-evolving algorithms that run directly on our compressed structures. Using our static graph compression combined with our differential technique, we find that we can speed up matrix-vector multiplication by reusing previously computed products. We compare our results against a similar technique using the WebGraph framework and find that not only are our base query speeds faster, but we also gain a more significant speed-up from reusing products. We then use our time-evolving compression to solve the earliest arrival paths problem and time-evolving transitive closure. Not only are we the first to run such algorithms directly on compressed data, but our technique is particularly efficient at doing so.
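The generic idea behind reusing previously computed products can be illustrated in a few lines (hypothetical `spmv` and `update_product` helpers; the dissertation's differential technique operates directly on its compressed representation, which this sketch does not model): if y = Ax is already known and A changes by a sparse delta, only the affected rows need recomputing.

```python
def spmv(rows, x):
    """Sparse matrix-vector product; rows[i] maps column -> value."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]

def update_product(y, delta, x):
    """Reuse a previously computed product y = A @ x: apply a sparse
    change to A (delta maps row index -> {column: change}) by
    touching only the affected rows, instead of recomputing y."""
    y = list(y)
    for i, row in delta.items():
        y[i] += sum(v * x[j] for j, v in row.items())
    return y
```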
Conditional Lower Bounds for Space/Time Tradeoffs
In recent years much effort has been concentrated towards achieving
polynomial time lower bounds on algorithms for solving various well-known
problems. A useful technique for showing such lower bounds is to prove them
conditionally based on well-studied hardness assumptions such as 3SUM, APSP,
SETH, etc. This line of research helps to obtain a better understanding of the
complexity inside P.
A related question asks for conditional space lower bounds on data
structures that are constructed to solve certain algorithmic tasks after an
initial preprocessing stage. This question has received little attention in
previous research even though it has potentially strong impact.
In this paper we address this question and show that surprisingly many of the
well-studied hard problems that are known to have conditional polynomial time
lower bounds are also hard when concerning space. This hardness is shown as a
tradeoff between the space consumed by the data structure and the time needed
to answer queries. The tradeoff may be either smooth or admit one or more
singularity points.
We reveal interesting connections between different space hardness
conjectures and present matching upper bounds. We also apply these hardness
conjectures to both static and dynamic problems and prove their conditional
space hardness.
We believe that this novel framework of polynomial space conjectures can play
an important role in expressing polynomial space lower bounds of many important
algorithmic problems. Moreover, it seems that it can also help in achieving a
better understanding of the hardness of their corresponding problems in terms
of time.
Efficient parallel computation on multiprocessors with optical interconnection networks
This dissertation studies optical interconnection networks: their architecture, addressing schemes, and computation and communication capabilities. We focus on a simple but powerful optical interconnection network model, the Linear Array with Reconfigurable pipelined Bus System (LARPBS). We extend the LARPBS model to a simplified higher-dimensional LARPBS and provide a set of basic computation operations. We then study the following two groups of parallel computation problems on both one-dimensional and multi-dimensional LARPBSs: parallel comparison problems, including sorting, merging, and selection; and Boolean matrix multiplication, transitive closure, and their applications to connected component problems. We implement an optimal sorting algorithm on an n-processor LARPBS. With this optimal sorting algorithm at our disposal, we study the sorting problem for higher-dimensional LARPBSs and obtain the following results:
• An optimal basic Columnsort algorithm on a 2D LARPBS.
• Two optimal two-way merge sort algorithms on a 2D LARPBS.
• An optimal multi-way merge sort algorithm on a 2D LARPBS.
• An optimal generalized column sort algorithm on a 2D LARPBS.
• An optimal generalized column sort algorithm on a 3D LARPBS.
• An optimal 5-phase sorting algorithm on a 3D LARPBS.
Results for selection problems are as follows:
• A constant-time maximum-finding algorithm on an LARPBS.
• An optimal maximum-finding algorithm on an LARPBS.
• An O((log log n)^2)-time parallel selection algorithm on an LARPBS.
• An O(k(log log n)^2)-time parallel multi-selection algorithm on an LARPBS.
While studying the computation and communication properties of the LARPBS model, we find that Boolean matrix multiplication and its applications to graph problems are another set of problems that can be solved efficiently on the LARPBS. The following is a list of results we have obtained in this area:
• A constant-time Boolean matrix multiplication algorithm.
• An O(log n)-time transitive closure algorithm.
• An O(log n)-time connected components algorithm.
• An O(log n)-time strongly connected components algorithm.
The results provided in this dissertation demonstrate the strong computation and communication power of optical interconnection networks.
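The standard way an O(log n)-round transitive closure follows from constant-time Boolean matrix multiplication is repeated squaring: ceil(log2 n) squarings of (M or I) capture paths of every length. A sequential sketch of that structure (illustrative function name; on the LARPBS each squaring is the constant-time product listed above):

```python
def transitive_closure_by_squaring(M):
    """Transitive closure via repeated Boolean squaring of (M or I).
    After k squarings, entry (i, j) records every path of length
    at most 2**k, so ceil(log2 n) squarings suffice."""
    n = len(M)
    # start from M with the identity added, so shorter paths survive
    R = [[M[i][j] or i == j for j in range(n)] for i in range(n)]
    steps = 1
    while steps < n:
        R = [[any(R[i][k] and R[k][j] for k in range(n))
              for j in range(n)] for i in range(n)]
        steps *= 2
    return R
```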