    I/O efficient bisimulation partitioning on very large directed acyclic graphs

    In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models to web taxonomies and scientific workflows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building indexing data structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. As the size of real-world DAG data sets often exceeds available main memory, storage in external memory becomes necessary. Hence, there is a practical need for an efficient approach to computing bisimulation in external memory. Our general algorithm has a worst-case I/O complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, respectively, in the data graph and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically verify efficient performance of the algorithms on graphs and XML documents having billions of nodes and edges, and find that the algorithms can process such graphs efficiently even when very limited internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and open the door for further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available.
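To make the notion of bisimilarity classes concrete, the following is a minimal in-memory sketch, not the external-memory algorithm of the paper. It assumes node-labeled DAGs without edge labels, and treats two nodes as bisimilar when they carry the same label and their children fall into the same set of classes. The function name `bisimulation_classes` and the input layout are hypothetical choices made for illustration.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def bisimulation_classes(labels, edges):
    """Assign a bisimilarity class id to every node of a DAG.

    labels: dict mapping node -> label (every node must appear here)
    edges:  dict mapping node -> iterable of child nodes
    """
    # TopologicalSorter expects node -> predecessors. Passing the children
    # as "predecessors" makes static_order() yield children before parents.
    graph = {node: set(edges.get(node, ())) for node in labels}
    order = TopologicalSorter(graph).static_order()

    class_of_signature = {}  # (label, child classes) -> class id
    class_of_node = {}
    for node in order:
        child_classes = frozenset(class_of_node[c] for c in edges.get(node, ()))
        signature = (labels[node], child_classes)
        # Reuse the class id if this signature was seen before.
        class_of_node[node] = class_of_signature.setdefault(
            signature, len(class_of_signature)
        )
    return class_of_node


if __name__ == "__main__":
    labels = {"a": "r", "b": "x", "c": "x", "d": "y", "e": "y"}
    edges = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
    print(bisimulation_classes(labels, edges))
```

Because a DAG has no cycles, processing nodes bottom-up means every child already has its class when its parent is visited, so a single pass suffices. An external-memory variant would have to replace the dictionaries and the random child lookups with sorted, sequentially scanned data.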

    PatTrieSort - External String Sorting based on Patricia Tries

    External merge sort is among the most efficient and widely used algorithms for sorting big data: as much data as fits in main memory is sorted there and then written to external storage as a so-called initial run. After all the data has been sorted block-wise in this way, the initial runs are merged in a merging phase to produce the final sorted run containing the completely sorted original data. Patricia tries are one of the most space-efficient ways to store strings, especially those with common prefixes. Hence, we propose to use Patricia tries for initial run generation in an external merge sort variant, so that initial runs can become larger than in traditional external merge sort using the same main memory size. Furthermore, we store the initial runs as Patricia tries instead of lists of sorted strings. As we will show in this paper, Patricia tries can be merged efficiently, with superior performance in comparison to merging runs of sorted strings. We complete our discussion with a complexity analysis as well as a comprehensive performance evaluation, in which our new approach outperforms traditional external merge sort by a factor of 4 when sorting over 4 billion strings of real-world data.
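The two phases of traditional external merge sort described above (run generation, then merging) can be sketched as follows. This is the baseline scheme with runs stored as plain sorted text files, not PatTrieSort itself. The function name `external_sort`, the `run_size` parameter (standing in for the main-memory budget), and the assumption that strings contain no newline characters are all illustrative.

```python
import heapq
import os
import tempfile


def external_sort(strings, run_size, workdir):
    """Yield the input strings in sorted order using sorted runs on disk.

    run_size is the number of strings assumed to fit in main memory.
    Strings must not contain newline characters (one string per line).
    """
    run_paths = []
    buffer = []

    def flush():
        # Sort the in-memory buffer and write it out as one initial run.
        if not buffer:
            return
        buffer.sort()
        fd, path = tempfile.mkstemp(dir=workdir, suffix=".run", text=True)
        with os.fdopen(fd, "w", encoding="utf-8") as run_file:
            run_file.writelines(s + "\n" for s in buffer)
        run_paths.append(path)
        buffer.clear()

    # Phase 1: run generation.
    for s in strings:
        buffer.append(s)
        if len(buffer) >= run_size:
            flush()
    flush()

    # Phase 2: k-way merge of all runs, reading each run sequentially.
    files = [open(path, encoding="utf-8") for path in run_paths]
    try:
        # Compare lines without their trailing newline so that the order
        # matches the order of the original strings.
        for line in heapq.merge(*files, key=lambda ln: ln.rstrip("\n")):
            yield line.rstrip("\n")
    finally:
        for f in files:
            f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        data = ["pear", "apple", "fig", "banana"]
        print(list(external_sort(data, run_size=2, workdir=workdir)))
```

In this baseline, the length of an initial run is bounded by how many strings fit in the buffer. The abstract's proposal targets exactly that bound: a more compact in-memory representation lets each run cover more input for the same memory size.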

    External Batched Range Minimum Queries and LCP Construction

    This work deals with I/O-optimal suffix array (SA) and longest common prefix (LCP) array construction in external memory. For this purpose, the I/O-optimal DC3 algorithm is enhanced by LCP construction and adapted accordingly to the external memory model. In this context we present a method to answer the required range minimum queries (RMQs) efficiently in external memory. The core of this work is a description and implementation of the resulting external DC3-LCP algorithm using the Stxxl, the C++ Standard Template Library for Extra Large Data Sets. Experimental results based on realistic, real-world instances round off this work.
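The connection between LCP arrays and range minimum queries can be illustrated with a small in-memory sketch. It does not use DC3 or the Stxxl: the suffix array is built naively, the LCP array comes from the well-known linear-time algorithm of Kasai et al., and RMQs are answered with a sparse table. All function names are hypothetical.

```python
def suffix_array(text):
    """Naive suffix array: sort suffix start positions by suffix."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text, sa):
    """Kasai et al.: lcp[i] is the LCP of suffixes sa[i-1] and sa[i]; lcp[0] = 0."""
    n = len(text)
    rank = [0] * n
    for position, start in enumerate(sa):
        rank[start] = position
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 characters
    return lcp


def build_sparse_table(values):
    """table[k][i] holds min(values[i : i + 2**k])."""
    table = [list(values)]
    span = 1
    while 2 * span <= len(values):
        prev = table[-1]
        table.append(
            [min(prev[i], prev[i + span]) for i in range(len(values) - 2 * span + 1)]
        )
        span *= 2
    return table


def range_min(table, lo, hi):
    """Minimum of values[lo:hi]; requires lo < hi."""
    level = (hi - lo).bit_length() - 1
    return min(table[level][lo], table[level][hi - (1 << level)])


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    lcp = lcp_array(text, sa)
    table = build_sparse_table(lcp)
    # LCP of the suffixes at SA ranks a < b is min(lcp[a + 1 : b + 1]).
    print(sa, lcp, range_min(table, 1, 3))
```

The last line shows why RMQs matter here: the LCP of any two suffixes equals the minimum of the LCP array over the range between their ranks. Processing many such queries together, rather than one at a time with random access, is what a batched formulation makes possible.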

    Efficient external-memory bisimulation on DAGs

    Tackling Latency Using FG

    Applications that operate on datasets which are too big to fit in main memory, known in the literature as external-memory or out-of-core applications, store their data on one or more disks. Several of these applications make multiple passes over the data, where each pass reads data from disk, operates on it, and writes data back to disk. Compared with an in-memory operation, a disk-I/O operation takes orders of magnitude (approx. 100,000 times) longer; that is, disk-I/O is a high-latency operation. Out-of-core algorithms often run on a distributed-memory cluster to take advantage of a cluster's computing power, memory, disk space, and bandwidth. By doing so, however, they introduce another high-latency operation: interprocessor communication. Efficient implementations of these algorithms access data in blocks to amortize the cost of a single data transfer over the disk or the network, and they introduce asynchrony to overlap high-latency operations and computations. FG, short for Asynchronous Buffered Computation Design and Engineering Framework Generator, is a programming framework that helps to mitigate latency in out-of-core programs that run on distributed-memory clusters. An FG program is composed of a pipeline of stages operating on buffers. FG runs the stages asynchronously so that stages performing high-latency operations can overlap their work with other stages. FG supplies the code to create a pipeline, synchronize the stages, and manage data buffers; the user provides a straightforward function, containing only synchronous calls, for each stage. In this thesis, we use FG to tackle latency and exploit the available parallelism in out-of-core and distributed-memory programs.
    We show how FG helps us design out-of-core programs and think about parallel computing in general using three instances: an out-of-core, distribution-based sorting program; an implementation of external-memory suffix arrays; and a scientific-computing application called the fast Gauss transform. FG's interaction with these real-world programs is symbiotic: FG enables efficient implementations of these programs, and the design of the first two of these programs pointed us toward further extensions for FG. Today's era of multicore machines compels us to harness all opportunities for parallelism that are available in a program, and so in the latter two applications, we combine FG's multithreading capabilities with the routines that OpenMP offers for in-core parallelism. In the fast Gauss transform application, we use this strategy to realize an up to 20-fold performance improvement compared with an alternate fast Gauss transform implementation. In addition, we use our experience with designing programs in FG to provide some suggestions for the next version of FG.
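The idea of a pipeline of stages that exchange buffers and run asynchronously can be sketched with threads and bounded queues. This is not FG's API; the function `run_pipeline`, the `buffers_in_flight` parameter, and the sentinel-based shutdown are assumptions made purely for illustration, and the sketch omits error handling (a stage that raises would stall the pipeline).

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the buffer stream


def run_pipeline(source, stages, buffers_in_flight=4):
    """Pass each buffer from source through every stage, one thread per stage.

    Each stage is an ordinary synchronous function buffer -> buffer. The
    framework code below supplies the threads, queues, and shutdown logic.
    """
    queues = [queue.Queue(maxsize=buffers_in_flight) for _ in range(len(stages) + 1)]

    def worker(stage, inbox, outbox):
        while True:
            buf = inbox.get()
            if buf is _DONE:
                outbox.put(_DONE)  # propagate shutdown downstream
                return
            outbox.put(stage(buf))

    def feed():
        for buf in source:
            queues[0].put(buf)
        queues[0].put(_DONE)

    threads = [
        threading.Thread(target=worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    # Drain the last queue in the calling thread so bounded queues never fill up.
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    buffers = [[3, 1, 2], [9, 8]]
    doubled = run_pipeline(buffers, [sorted, lambda b: [x * 2 for x in b]])
    print(doubled)
```

While one stage waits on a slow operation for one buffer, the other stages can keep working on different buffers, which is the overlap the abstract describes. The bounded queues cap how many buffers are in flight, so memory use stays fixed regardless of input size.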

    Autonomic visualisation

    This thesis introduces the concept of autonomic visualisation, where principles of autonomic systems are brought to the field of visualisation infrastructure. Problems in visualisation have a specific set of requirements which are not always met by existing systems. The first half of this thesis explores a specific problem for large-scale visualisation: that of data management. Visualisation algorithms have somewhat different requirements to other external memory problems, because they often require access to all, or a large subset, of the data in a way that is highly dependent on the view. This thesis proposes a knowledge-based approach to pre-fetching in this context, and presents evidence that such an approach yields good performance. The knowledge-based approach is incorporated into a five-layer model, which provides a systematic way of categorising and designing out-of-core, or external memory, systems. This model is demonstrated with two example implementations, one in the local and one in the remote context. The second half explores autonomic visualisation in the more general case. A simulation tool, created for the purpose of designing autonomic visualisation infrastructure, is presented. This tool, SimEAC, provides a way of facilitating the development of techniques for managing large-scale visualisation systems. The abstract design of the simulation system, as well as details of the implementation, are presented. The architecture of the simulator is explored, and then the system is evaluated in a number of case studies indicating some of the ways in which it can be used. The simulator provides a framework for experimentation and rapid prototyping of large-scale autonomic systems.