I/O efficient bisimulation partitioning on very large directed acyclic graphs
In this paper we introduce the first efficient external-memory algorithm to
compute the bisimilarity equivalence classes of a directed acyclic graph (DAG).
DAGs are commonly used to model data in a wide variety of practical
applications, ranging from XML documents and data provenance models, to web
taxonomies and scientific workflows. In the study of efficient reasoning over
massive graphs, the notion of node bisimilarity plays a central role. For
example, grouping together bisimilar nodes in an XML data set is the first step
in many sophisticated approaches to building indexing data structures for
efficient XPath query evaluation. To date, however, only internal-memory
bisimulation algorithms have been investigated. As the size of real-world DAG
data sets often exceeds available main memory, storage in external memory
becomes necessary. Hence, there is a practical need for an efficient approach
to computing bisimulation in external memory.
Our general algorithm has a worst-case IO-complexity of O(Sort(|N| + |E|)),
where |N| and |E| are the numbers of nodes and edges, resp., in the data graph
and Sort(n) is the number of accesses to external memory needed to sort an
input of size n. We also study specializations of this algorithm to common
variations of bisimulation for tree-structured XML data sets. We empirically
verify efficient performance of the algorithms on graphs and XML documents
having billions of nodes and edges, and find that the algorithms can process
such graphs efficiently even when very limited internal memory is available.
The proposed algorithms are simple enough for practical implementation and use,
and open the door for further study of external-memory bisimulation algorithms.
To this end, the full open-source C++ implementation has been made freely
available.
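As a purely in-memory illustration of what "bisimilarity equivalence classes of a DAG" means, the following minimal sketch assigns each node a class from its label and the set of classes of its children, visiting children before parents. It is not the external-memory algorithm described in the abstract; the function name `bisimulation_classes` and the dictionary-based graph encoding are assumptions made for this example.

```python
from typing import Dict, Hashable, List


def bisimulation_classes(
    labels: Dict[Hashable, str],
    children: Dict[Hashable, List[Hashable]],
) -> Dict[Hashable, int]:
    """Return node -> class id; equal ids mean the nodes are bisimilar.

    Assumes the graph is acyclic. Two nodes share a class exactly when
    they have the same label and the same set of child classes.
    """
    class_of: Dict[Hashable, int] = {}
    signature_ids: Dict[tuple, int] = {}

    for root in labels:
        if root in class_of:
            continue
        # Iterative post-order DFS: a node is classified only after
        # all of its children have been classified.
        stack = [(root, False)]
        while stack:
            node, expanded = stack.pop()
            if node in class_of:
                continue
            kids = children.get(node, ())
            if expanded:
                signature = (labels[node], frozenset(class_of[c] for c in kids))
                class_of[node] = signature_ids.setdefault(
                    signature, len(signature_ids)
                )
            else:
                stack.append((node, True))
                for child in kids:
                    if child not in class_of:
                        stack.append((child, False))
    return class_of


# Hypothetical example graph: nodes 2 and 3 have the same label and
# children in the same class, so they end up bisimilar.
example_labels = {1: "a", 2: "b", 3: "b", 4: "c", 5: "c", 6: "b"}
example_children = {1: [2, 3], 2: [4], 3: [5]}
example_classes = bisimulation_classes(example_labels, example_children)
```

For reference, in the standard external-memory model with internal memory size M and block size B (symbols not used in the abstract itself), Sort(n) = Θ((n/B) log_{M/B}(n/B)).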
I/O efficient Core Graph Decomposition at web scale.
Core decomposition is a fundamental graph problem with a large number of
applications. Most existing approaches for core decomposition assume that the
graph is kept in memory of a machine. Nevertheless, many real-world graphs are
big and may not reside in memory. In the literature, there is only one work for
I/O efficient core decomposition that avoids loading the whole graph in memory.
However, this approach is not scalable to handle big graphs because it cannot
bound the memory size and may load most parts of the graph in memory. In
addition, this approach can hardly handle graph updates. In this paper, we
study I/O efficient core decomposition following a semi-external model, which
only allows node information to be loaded in memory. This model works well in
many web-scale graphs. We propose a semi-external algorithm and two optimized
algorithms for I/O efficient core decomposition using very simple structures
and a simple data access model. To handle dynamic graph updates, we show that our
algorithm can be naturally extended to handle edge deletion. We also propose an
I/O efficient core maintenance algorithm to handle edge insertion, and an
improved algorithm to further reduce I/O and CPU cost by investigating some new
graph properties. We conduct extensive experiments on 12 real large graphs. Our
optimal algorithm significantly outperforms the existing I/O efficient algorithm
in terms of both processing time and memory consumption. In many
memory-resident graphs, our algorithms for both core decomposition and
maintenance can even outperform the in-memory algorithm due to the simple
structures and data access model used. Our algorithms are very scalable to
handle web-scale graphs. As an example, we are the first to handle a web graph
with 978.5 million nodes and 42.6 billion edges using less than 4.2 GB memory.
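The semi-external model mentioned above (only per-node information in memory, edges streamed from disk) can be illustrated with a minimal sketch. It uses a well-known locality-based refinement: each node's core estimate starts at its degree and is repeatedly lowered to the largest k such that at least k neighbours have an estimate of at least k. This is an assumption-laden illustration, not necessarily the algorithm of the paper; `scan_adjacency` is a hypothetical callback standing in for a sequential pass over the on-disk adjacency lists.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

AdjacencyScan = Callable[[], Iterable[Tuple[int, Sequence[int]]]]


def local_core(cap: int, neighbor_bounds: Iterable[int]) -> int:
    """Largest k <= cap such that at least k neighbours have bound >= k."""
    counts = [0] * (cap + 1)
    for bound in neighbor_bounds:
        counts[min(bound, cap)] += 1
    at_least = 0
    for k in range(cap, 0, -1):
        at_least += counts[k]
        if at_least >= k:
            return k
    return 0


def semi_external_core_numbers(num_nodes: int, scan_adjacency: AdjacencyScan) -> List[int]:
    """Compute core numbers while holding only one integer per node in memory."""
    core = [0] * num_nodes
    # First pass: initialise each estimate with the node's degree.
    for node, neighbors in scan_adjacency():
        core[node] = len(neighbors)
    # Further passes: lower the estimates until nothing changes.
    changed = True
    while changed:
        changed = False
        for node, neighbors in scan_adjacency():
            k = local_core(core[node], (core[u] for u in neighbors))
            if k < core[node]:
                core[node] = k
                changed = True
    return core


# Hypothetical example: a triangle (0, 1, 2), a pendant node 3, an isolated node 4.
example_graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2], 4: []}
example_cores = semi_external_core_numbers(5, lambda: example_graph.items())
```

In a real semi-external setting the callback would read adjacency lists sequentially from disk, so the number of full passes dominates the I/O cost.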
Efficient external-memory bisimulation on DAGs
In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models, to web taxonomies and scientific workflows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building indexing data structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. As the size of real-world DAG data sets often exceeds available main memory, storage in external memory becomes necessary. Hence, there is a practical need for an efficient approach to computing bisimulation in external memory. Our general algorithm has a worst-case IO-complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, resp., in the data graph and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically verify efficient performance of the algorithms on graphs and XML documents having billions of nodes and edges, and find that the algorithms can process such graphs efficiently even when very limited internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and open the door for further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available.
Space efficient algorithms for string processing
The suffix array (SA), which is an array containing the suffixes of a string sorted into lexicographical order, was introduced in the late eighties as a space efficient alternative to the suffix tree. It has since emerged as a useful data structure in string processing problems such as pattern matching, pattern discovery, and data compression. The SA is often coupled with the longest-common-prefix (LCP) array that contains the length of the longest common prefixes between consecutive suffixes in the SA. When enhanced with the LCP array, the SA can provide efficient solutions to the above applications including a problem called pattern mining. To date, all the mining algorithms lie at either extreme of the efficiency spectrum: they are either fast and use enormous amounts of space, or they are compact and orders of magnitude slower. We present a mining algorithm that achieves the best of both these extremes, having runtime comparable to the fastest published algorithms while using less space than the most space efficient. In all these applications, the construction of the SA --- also known as suffix sorting --- is one of the main computational bottlenecks. Most papers describing the SA assume the SA fits in RAM, limiting their applications. The fastest algorithms in this large memory suffix sorting category use powerful pointer copying heuristics to expedite suffix sorting. Several space efficient algorithms have emerged in the last five years, where the trend is to use as little RAM as possible. They do so by finding a clever way to trade runtime for space, or by using slow compressed data structures, or by using external memory (disk), or some combination of these techniques. In this thesis, we focus on improving the runtime of a space efficient algorithm due to Kärkkäinen by adapting the heuristics from large memory suffix sorting to a semi-external setting. Also, pointer copying has been heavily used to speed up the construction of the SA, but not the LCP array.
We also discuss our attempts at combining the pointer copying heuristics with an efficient LCP construction algorithm due to Kärkkäinen, Manzini and Puglisi. The Burrows-Wheeler transform (BWT) was discovered independently of the SA, but it is now known that the two data structures are deeply linked. The BWT is central to practical compression tools such as szip and bzip2. Many papers have been published on constructing the BWT either in RAM or in external memory but few on inverting the BWT to obtain the original string --- in fact none in external memory. For larger datasets, the existing traditional approaches cannot be used to invert the BWT. In such cases, we have to use disk. We close the gap between theory and practice by examining the problem of inverting the BWT efficiently on disk. We provide a practical implementation of the only theoretical proposal for the problem by Ferragina, Gagie and Manzini. We also provide new, faster solutions to the problem based on simple scanning and compression techniques.
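To make the three structures named in this abstract concrete, here is a small in-memory sketch of the suffix array, the LCP array (computed with the linear-time method of Kasai et al.), and the BWT with its inversion via the LF mapping. None of this is the thesis's space-efficient or on-disk machinery: the suffix array is built by naive sorting, and the function names and the `"\0"` sentinel are assumptions chosen for the example.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Naive construction: sort suffix start positions by the suffixes."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: List[int]) -> List[int]:
    """lcp[i] = longest common prefix of suffixes sa[i-1] and sa[i]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for position, start in enumerate(sa):
        rank[start] = position
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 characters
    return lcp


def bwt(text: str, sentinel: str = "\0") -> str:
    """BWT via the suffix array; the sentinel must be absent from text
    and smaller than every character in it."""
    t = text + sentinel
    return "".join(t[i - 1] for i in suffix_array(t))


def inverse_bwt(last: str) -> str:
    """Invert the BWT produced by bwt() using the LF mapping."""
    n = len(last)
    # A stable sort of positions by character yields the first column.
    order = sorted(range(n), key=lambda i: last[i])
    lf = [0] * n
    for first_row, last_row in enumerate(order):
        lf[last_row] = first_row
    row = 0  # row 0 is the rotation that starts with the sentinel
    reversed_text = []
    for _ in range(n - 1):
        reversed_text.append(last[row])
        row = lf[row]
    return "".join(reversed(reversed_text))


example_sa = suffix_array("banana")
example_lcp = lcp_array("banana", example_sa)
example_roundtrip = inverse_bwt(bwt("banana"))
```

The inversion above needs random access to the whole LF array, which is precisely what becomes impractical once the data no longer fits in RAM.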
A Bulk-Parallel Priority Queue in External Memory with STXXL
We propose the design and an implementation of a bulk-parallel external
memory priority queue to take advantage of both shared-memory parallelism and
high external memory transfer speeds to parallel disks. To achieve higher
performance by decoupling item insertions and extractions, we offer two
parallelization interfaces: one using "bulk" sequences, the other by defining
"limit" items. In the design, we discuss how to parallelize insertions using
multiple heaps, and how to calculate a dynamic prediction sequence to prefetch
blocks and apply parallel multiway merge for extraction. Our experimental
results show that in the selected benchmarks the priority queue reaches 75% of
the full parallel I/O bandwidth of rotational disks and 65% of SSDs, or the
speed of sorting in external memory when bounded by computation. Comment: extended version of SEA'15 conference paper
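The two interfaces mentioned in this abstract, bulk insertion and extraction up to a "limit" item, can be illustrated with a toy in-memory sketch. It is not the STXXL API and involves no parallelism or disk I/O; the class `BulkPriorityQueue` and its method names are hypothetical. Each bulk insert becomes one sorted run, and extraction merges the qualifying prefixes of all runs.

```python
import heapq
from bisect import bisect_right
from typing import Iterable, List


class BulkPriorityQueue:
    """Toy min-priority queue that stores each bulk insert as a sorted run."""

    def __init__(self) -> None:
        self._runs: List[List[int]] = []

    def bulk_push(self, items: Iterable[int]) -> None:
        """Insert a whole batch at once by sorting it into a new run."""
        run = sorted(items)
        if run:
            self._runs.append(run)

    def bulk_pop_until(self, limit: int) -> List[int]:
        """Remove and return, in ascending order, every item <= limit."""
        prefixes = []
        remaining = []
        for run in self._runs:
            cut = bisect_right(run, limit)
            if cut:
                prefixes.append(run[:cut])
            if cut < len(run):
                remaining.append(run[cut:])
        self._runs = remaining
        # Multiway merge of the sorted prefixes taken from each run.
        return list(heapq.merge(*prefixes))

    def __len__(self) -> int:
        return sum(len(run) for run in self._runs)


queue = BulkPriorityQueue()
queue.bulk_push([5, 1, 9])
queue.bulk_push([4, 8, 2])
first_batch = queue.bulk_pop_until(5)
left_after_first = len(queue)
second_batch = queue.bulk_pop_until(100)
```

Decoupling insertion from extraction in this way is what lets a batch be sorted and merged as a unit rather than item by item.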