
    Loom: Query-aware Partitioning of Online Graphs

    As with general graph processing systems, partitioning data over a cluster of machines improves the scalability of graph database management systems. However, such systems incur additional network cost during the execution of a query workload, due to inter-partition traversals. Workload-agnostic partitioning algorithms typically minimise the likelihood of any edge crossing partition boundaries, yet they are sub-optimal for many query workloads, which may traverse specific subsets of inter-partition edges far more frequently than others. Furthermore, they are largely unsuited to operating incrementally on dynamic, growing graphs. We present a new graph partitioning algorithm, Loom, that operates on a stream of graph updates and continuously allocates new vertices and edges to partitions, taking into account a query workload of graph pattern expressions along with their relative frequencies. First, we capture the most common patterns of edge traversals that occur when executing queries. We then compare sub-graphs, which present themselves incrementally in the graph update stream, against these common patterns. Finally, we attempt to allocate each match to a single partition, reducing the number of inter-partition edges within frequently traversed sub-graphs and improving average query performance. Loom is extensively evaluated over several large test graphs with realistic query workloads and various orderings of the graph updates. We demonstrate that, given a workload, our prototype produces partitionings of significantly better quality than the existing streaming graph partitioning algorithms Fennel and LDG.
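    The allocation step described above lends itself to a simple illustration. The sketch below is not the Loom implementation: it reduces workload patterns to a set of frequently traversed label pairs (an assumed simplification) and greedily co-locates the endpoints of any streamed edge that matches one, falling back to a load-aware placement otherwise. All names (StreamingPartitioner, hot_patterns, etc.) are hypothetical.

```python
# Illustrative sketch only, not the Loom algorithm itself. Workload "patterns"
# are simplified to a set of hot (source label, target label) pairs assumed to
# have been mined from the query workload beforehand.
from collections import defaultdict

class StreamingPartitioner:
    def __init__(self, num_partitions, hot_patterns, capacity):
        self.k = num_partitions
        self.hot = set(hot_patterns)      # e.g. {("author", "paper"), ...}
        self.capacity = capacity          # soft per-partition vertex budget
        self.part_of = {}                 # vertex id -> partition id
        self.sizes = defaultdict(int)     # partition id -> vertex count

    def _least_loaded(self):
        return min(range(self.k), key=lambda p: self.sizes[p])

    def _place(self, v, p):
        # Vertices are never moved once placed (streaming setting).
        if v not in self.part_of:
            self.part_of[v] = p
            self.sizes[p] += 1

    def add_edge(self, u, v, u_label, v_label):
        """Allocate the endpoints of a newly streamed edge."""
        pu, pv = self.part_of.get(u), self.part_of.get(v)
        if (u_label, v_label) in self.hot:
            # Edge matches a frequently traversed pattern: try to keep both
            # endpoints in the same partition so the traversal stays local.
            target = pu if pu is not None else pv
            if target is None or self.sizes[target] >= self.capacity:
                target = self._least_loaded()
            self._place(u, target)
            self._place(v, target)
        else:
            # Otherwise place any unplaced endpoint greedily by load,
            # roughly in the spirit of streaming heuristics such as LDG.
            for w, pw in ((u, pu), (v, pv)):
                if pw is None:
                    self._place(w, self._least_loaded())
```

    A full implementation would match whole pattern sub-graphs rather than single edges and balance partitions more carefully; the sketch only shows the co-location idea.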

    TAPER: query-aware, partition-enhancement for large, heterogeneous graphs

    Graph partitioning has long been seen as a viable approach to addressing Graph DBMS scalability. A partitioning, however, may introduce extra query processing latency unless it is sensitive to a specific query workload and optimised to minimise inter-partition traversals for that workload. Additionally, it should be possible to incrementally adjust the partitioning in reaction to changes in the graph topology, the query workload, or both. Because of their complexity, current partitioning algorithms fall short of one or both of these requirements, as they are designed for offline use and as one-off operations. The TAPER system aims to address both requirements whilst leveraging existing partitioning algorithms. TAPER takes any given initial partitioning as a starting point and iteratively adjusts it by swapping chosen vertices across partitions, heuristically reducing the probability of inter-partition traversals for a given workload of pattern matching queries. Iterations are inexpensive thanks to time and space optimisations in the underlying support data structures. We evaluate TAPER on two different large test graphs and over realistic query workloads. Our results indicate that, given a hash-based partitioning, TAPER reduces the number of inter-partition traversals by around 80%; given an unweighted METIS partitioning, by around 30%. These reductions are achieved within 8 iterations, with the additional advantage of being workload-aware and usable online.
    Comment: 12 pages, 11 figures, unpublished
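    To make the vertex-swapping idea concrete, the minimal sketch below (not TAPER's actual heuristic or data structures) performs gain-driven moves: each vertex is moved to the neighbouring partition towards which it has the largest workload-derived edge weight. The weights standing in for traversal probabilities, and the function and parameter names, are assumptions for illustration; partition balance is deliberately ignored to keep the idea visible.

```python
# Minimal sketch of workload-aware partition refinement; not TAPER itself.
from collections import defaultdict

def refine(adj, weight, part_of, max_iters=8):
    """adj: vertex -> iterable of neighbours (undirected view of the graph)
    weight: frozenset({u, v}) -> traversal weight derived from the workload
    part_of: vertex -> initial partition id (e.g. hash-based); updated in place."""
    for _ in range(max_iters):
        moved = 0
        for v, neighbours in adj.items():
            # Total edge weight from v into each partition.
            pull = defaultdict(float)
            for u in neighbours:
                pull[part_of[u]] += weight.get(frozenset((u, v)), 1.0)
            if not pull:
                continue
            best = max(pull, key=pull.get)
            # Move v if another partition attracts strictly more weight.
            if best != part_of[v] and pull[best] > pull[part_of[v]]:
                part_of[v] = best
                moved += 1
        if moved == 0:   # converged: no vertex improved its placement
            break
    return part_of
```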

    An information-theoretic view of network management

    We present an information-theoretic framework for network management, specifically for recovery from nonergodic link failures. Building on recent work in the field of network coding, we describe the input-output relations of network nodes in terms of network codes. This very general concept of network behavior as a code provides a way to quantify essential management information as that needed to switch among different codes (behaviors) for different failure scenarios. We compare two types of recovery schemes, receiver-based and network-wide, and consider two formulations for quantifying network management. The first is a centralized formulation, where network behavior is described by an overall code determining the behavior of every node, and the management requirement is taken as the logarithm of the number of such codes that the network may switch among. For this formulation, we give bounds, many of which are tight, on management requirements for various network connection problems in terms of basic parameters such as the number of source processes and the number of links in a minimum source-receiver cut. Our results include a lower bound for arbitrary connections and an upper bound for multitransmitter multicast connections, for linear receiver-based and network-wide recovery from all single link failures. The second is a node-based formulation, where the management requirement is taken as the sum over all nodes of the logarithm of the number of different behaviors for each node. We show that the minimum node-based requirement for failures of links adjacent to a single receiver is achieved with receiver-based schemes.
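    With notation introduced here purely for illustration (the symbols are not taken from the paper), the two formulations described above can be written as follows, where \mathcal{C} is the set of overall network codes the network may switch among and \mathcal{B}_v is the set of distinct behaviors of node v:

```latex
% Notation is illustrative only: C is the set of overall network codes the
% network may switch among, B_v the set of distinct behaviors of node v.
\[
  M_{\text{centralized}} = \log \lvert \mathcal{C} \rvert ,
  \qquad
  M_{\text{node-based}} = \sum_{v \in V} \log \lvert \mathcal{B}_v \rvert .
\]
```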