
    Answering Shortest Path Distance Queries in Large Complex Networks

    The distance query problem is to find the shortest-path distance between an arbitrary pair of vertices in a graph. It is considered a fundamental problem in graph theory. Despite a tremendous amount of research on the subject, there is still no satisfactory solution that scales to large complex networks, which may have billions of vertices and edges. Furthermore, many real-world complex networks, such as social networks and web graphs, are dynamic: their topological structure undergoes discrete changes, such as edge insertions and deletions, over time. Thus, there is also a pressing need to address the distance query problem on dynamic networks. The goal of this thesis is to address the distance query problem on large static and dynamic complex networks. Labelling-based methods are well known for fast response times to distance queries; however, existing labelling-based methods can only construct distance labellings for moderately large graphs with millions of vertices and edges, and cannot scale to graphs with billions of vertices and edges because of their prohibitively large space requirements and unbearably long pre-processing times. This thesis proposes a scalable approach that quickly constructs a distance labelling of limited size, containing only the distances from all vertices in a graph to some "important" vertices (not all), called landmarks. Such a labelling is a partial distance labelling, in contrast to a full distance labelling, which contains distance information for all pairs of vertices in a graph. We then combine a partial distance labelling, computed offline, with online search to leverage the advantages of both: the small partial labelling provides a good approximation (an upper bound) that bounds the online search and thereby accelerates query processing.
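A minimal Python sketch of how a partial labelling and a bounded online search can combine into an exact answer, assuming an unweighted, undirected graph stored as adjacency lists and a landmark set chosen in advance. The names `build_partial_labelling` and `query` are hypothetical, a plain unidirectional BFS is used for brevity, and the thesis's actual construction and search strategy may differ. The reasoning it encodes: a shortest path either passes through some landmark, in which case the labels already give its length, or avoids every landmark, in which case a BFS over the graph with landmarks removed finds it, and that BFS can stop as soon as it cannot beat the label-derived upper bound.

```python
from collections import deque
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]  # unweighted, undirected adjacency lists
INF = float("inf")


def bfs_distances(graph: Graph, source: Hashable) -> Dict[Hashable, int]:
    """Plain BFS: hop distance from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_partial_labelling(graph: Graph, landmarks: Set[Hashable]):
    """Offline step: label[v][r] = d(v, r) for each landmark r reachable from v."""
    label = {v: {} for v in graph}
    for r in landmarks:
        for v, d in bfs_distances(graph, r).items():
            label[v][r] = d
    return label


def query(graph: Graph, landmarks: Set[Hashable], label, s, t) -> float:
    """Online step: exact d(s, t) from the labels plus a bounded landmark-free BFS."""
    if s == t:
        return 0
    # Upper bound from the labelling: the best path that passes through a landmark.
    upper = min(
        (label[s][r] + label[t][r] for r in landmarks if r in label[s] and r in label[t]),
        default=INF,
    )
    if s in landmarks or t in landmarks:
        return upper  # label[s][s] == 0, so the bound is already exact
    # Any strictly shorter path must avoid every landmark: BFS in G minus landmarks,
    # cut off once no remaining vertex can improve on the upper bound.
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if dist[u] + 1 >= upper:
            break  # BFS order: every remaining queued vertex is at least this far
        for v in graph[u]:
            if v in landmarks or v in dist:
                continue
            dist[v] = dist[u] + 1
            if v == t:
                return dist[v]  # strictly below the upper bound
            queue.append(v)
    return upper
```

The labelling costs one BFS per landmark and stores only |V| × |landmarks| entries, which is what keeps it small compared with a full all-pairs labelling; unreachable pairs come back as `inf`.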
The proposed method can efficiently construct a distance labelling for a graph with billions of vertices and edges and enables fast distance computation, e.g., on the order of milliseconds. Because real-world graphs are dynamic and undergo changes such as edge insertions or deletions in their topological structure, existing labelling-based methods still suffer greatly from poor scalability on dynamic graphs and can hardly update a distance labelling efficiently. In this thesis, we propose a fully dynamic method that efficiently reflects graph changes (i.e., single edge insertions or deletions) by dynamically maintaining a distance labelling, in order to answer distance queries on dynamic graphs. At its core, the proposed method incorporates two building blocks: (i) an incremental algorithm for handling incremental update operations, i.e., edge insertions, and (ii) a decremental algorithm for handling decremental update operations, i.e., edge deletions. Moreover, this thesis introduces a batch-dynamic method that efficiently processes batches of updates (i.e., batches of edge insertions and deletions) to further improve the performance of answering distance queries on graphs whose topological structure changes rapidly. The proposed batch-dynamic method unifies edge insertions and deletions, avoids unnecessary and repeated computations, and exploits parallelism; as a result, it is much more efficient than processing graph changes separately, one by one. In this thesis, we have conducted extensive experiments on 15-17 real-world networks from a variety of application domains to test the scalability, efficiency, and robustness of the proposed static and dynamic methods against existing state-of-the-art static and dynamic methods.
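The two building blocks can be illustrated on a landmark labelling with a minimal Python sketch, assuming an unweighted, undirected graph held as adjacency sets and labels `label[v][r] = d(v, r)`. Insertion can only shorten distances, so it pushes decreases outward from whichever endpoint improved. For deletion, an edge whose endpoints are equidistant from a landmark lies on no shortest path from it, so that landmark's labels are untouched; otherwise this sketch simply recomputes that landmark's BFS. That recomputation is a deliberately naive stand-in, not the thesis's decremental algorithm, and all function names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]  # unweighted, undirected adjacency sets
Labels = Dict[Hashable, Dict[Hashable, int]]  # label[v][r] = d(v, r)
INF = float("inf")


def bfs_distances(graph: Graph, source: Hashable) -> Dict[Hashable, int]:
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def build_labelling(graph: Graph, landmarks: Iterable[Hashable]) -> Labels:
    label: Labels = {v: {} for v in graph}
    for r in landmarks:
        for v, d in bfs_distances(graph, r).items():
            label[v][r] = d
    return label


def insert_edge(graph: Graph, label: Labels, landmarks, a, b) -> None:
    """Incremental update: distances can only shrink, so propagate decreases."""
    graph[a].add(b)
    graph[b].add(a)
    for r in landmarks:
        for x, y in ((a, b), (b, a)):
            if label[x].get(r, INF) + 1 < label[y].get(r, INF):
                label[y][r] = label[x][r] + 1
                queue = deque([y])
                while queue:  # relax neighbours until no label improves
                    u = queue.popleft()
                    for w in graph[u]:
                        if label[u][r] + 1 < label[w].get(r, INF):
                            label[w][r] = label[u][r] + 1
                            queue.append(w)


def delete_edge(graph: Graph, label: Labels, landmarks, a, b) -> None:
    """Decremental update: naive per-landmark recomputation when (a, b) may matter."""
    graph[a].discard(b)
    graph[b].discard(a)
    for r in landmarks:
        if label[a].get(r, INF) == label[b].get(r, INF):
            continue  # equidistant endpoints: the edge is on no shortest path from r
        for v in graph:
            label[v].pop(r, None)  # distances may grow or become unreachable
        for v, d in bfs_distances(graph, r).items():
            label[v][r] = d
```

After any sequence of `insert_edge` / `delete_edge` calls, `label` should equal a fresh `build_labelling` on the current graph, which is the invariant a dynamic labelling has to preserve.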

    An exploration of graph algorithms and graph databases

    With data growing in volume, the need for complex, efficient algorithms to solve computationally complex problems has become greater. In this thesis we evaluate a selection of graph algorithms: we provide a novel algorithm for solving and approximating the Longest Simple Cycle problem, as well as novel implementations of other graph algorithms in graph database systems. The first area of exploration is the problem of finding the longest simple cycle in a graph. We propose two methods. The first is an exact approach based on a flow-based Integer Linear Program. The second is a multi-start local search heuristic that uses a simple depth-first search to construct an initial cycle and improves it with four perturbation operators. Secondly, we focus on implementing the Minimum Dominating Set problem in graph database systems. An unoptimised greedy heuristic for the Minimum Dominating Set problem is implemented in a client-server system using a declarative query language, in an embedded database system using an imperative query language, and, as a direct comparison, in a high-level language. The performance of the graph back-end on the database systems is evaluated, and the expressiveness of the query languages is also explored. We identify limitations of the query methods of the database system and propose a function that increases the functionality of the queries.
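As a reference point for the database implementations, a greedy Minimum Dominating Set heuristic can be written in a few lines of a high-level language. This is a minimal Python sketch assuming the common greedy rule (repeatedly pick the vertex whose closed neighbourhood covers the most not-yet-dominated vertices); the abstract does not state the thesis's exact rule or tie-breaking, and `greedy_dominating_set` is a hypothetical name.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected adjacency sets


def greedy_dominating_set(graph: Graph) -> List[Hashable]:
    """Greedy heuristic: take the vertex whose closed neighbourhood {v} | N(v)
    covers the most still-undominated vertices. Vertices are assumed sortable so
    that ties break deterministically (max keeps the first, i.e. smallest, one)."""
    undominated = set(graph)
    chosen: List[Hashable] = []
    while undominated:
        # Any undominated vertex covers at least itself, so progress is guaranteed.
        best = max(sorted(graph), key=lambda v: len(({v} | graph[v]) & undominated))
        chosen.append(best)
        undominated -= {best} | graph[best]
    return chosen
```

On the path 0-1-2-3-4 it picks 1 (covering 0, 1, 2) and then 3, returning `[1, 3]`. Like any greedy rule, it gives a dominating set but not necessarily a minimum one.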

    BlockTag: Design and applications of a tagging system for blockchain analysis

    Annotating blockchains with auxiliary data is useful for many applications. For example, e-crime investigations of illegal Tor hidden services, such as Silk Road, often involve linking Bitcoin addresses, from which money is sent or received, to user accounts and related online activities. We present BlockTag, an open-source tagging system for blockchains that facilitates such tasks. We describe BlockTag's design and present three analyses that illustrate its capabilities in the context of privacy research and law enforcement.