
    Distributed Graph Storage And Querying System

    Graph databases offer an efficient way to store and access interconnected data. However, querying large graphs that no longer fit in memory requires multiple trips to the storage device to filter and gather the data a query needs. I/O accesses are expensive operations that dramatically slow query response times and prevent us from fully exploiting the graph-specific benefits that graph databases offer. The storage models of most existing graph database systems view graphs as indivisible structures and hence do not allow a hierarchical layering of the graph. This hurts query performance on large graphs, because there is no way to filter the graph at a higher level without reading the entire structure from disk. Distributing the storage and processing is one way to extract better performance, but current distributed solutions are not entirely effective, again because of the indivisible representation of graphs adopted in the storage format, which causes unnecessary latency from increased inter-processor communication. In this dissertation, we propose an optimized distributed graph storage system for scalable and faster querying of big graph data. We start with our unique physical storage model, in which the graph is decomposed into three levels of abstraction, each with a different storage hierarchy. We use a hybrid storage model to store the most critical component and restrict I/O trips to when they are absolutely necessary. This lets us actively apply multi-level filters while querying, without the need for comprehensive indexes. Our results show that our system outperforms established graph databases for several classes of queries. We show that this separation also eases the difficulties of distributing graph data, and we go on to propose a more efficient distributed model for querying general-purpose graph data using the Spark framework.
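
    The multi-level filtering idea can be illustrated with a small sketch. The Python fragment below is a minimal, hypothetical illustration (all names are ours, not the system's actual API): a small in-memory summary level records which disk-resident partitions contain each edge label, so a query can skip partitions that cannot match instead of scanning the whole graph.

        # Hypothetical sketch of a two-level storage model: an in-memory
        # summary index filters which "on-disk" partitions a query must read.
        from collections import defaultdict

        class TwoLevelGraphStore:
            def __init__(self):
                # Level 1 (in memory): label -> set of partition ids containing it.
                self.label_index = defaultdict(set)
                # Level 2 (stands in for on-disk partitions): partition id -> edges.
                self.partitions = defaultdict(list)

            def add_edge(self, src, dst, label, partition_id):
                self.partitions[partition_id].append((src, dst))
                self.label_index[label].add(partition_id)

            def edges_with_label(self, label):
                # Filter at the summary level first; only "touch disk" for
                # partitions the in-memory index says are relevant.
                for pid in self.label_index.get(label, ()):
                    yield from self.partitions[pid]  # simulated I/O trip

        store = TwoLevelGraphStore()
        store.add_edge("a", "b", "follows", partition_id=0)
        store.add_edge("b", "c", "likes", partition_id=1)
        print(list(store.edges_with_label("follows")))  # reads partition 0 only

    The same principle generalizes to the coarser levels of abstraction the dissertation describes: each level answers "can this part of the graph possibly match?" before any lower, more expensive level is consulted.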

    Scalable Parallel Algorithms for Massive Scale-free Graphs

    Efficiently storing and processing massive graph data sets is a challenging problem as researchers seek to leverage “Big Data” to answer next-generation scientific questions. New techniques are required to process large scale-free graphs in shared, distributed, and external memory. This dissertation develops new techniques to parallelize the storage, computation, and communication for scale-free graphs with high-degree vertices. Our work facilitates the processing of large real-world graph datasets through the development of parallel algorithms and tools that scale to large computational and memory resources, overcoming challenges not addressed by existing techniques. Our aim is to scale to trillions of edges, and our research targets leadership-class supercomputers, clusters with local non-volatile memory, and shared-memory systems. We present three novel techniques to address scaling challenges in processing large scale-free graphs. First, we apply an asynchronous graph traversal technique using prioritized visitor queues that tolerates data latencies to both the external graph storage media and message-passing communication. Second, to accommodate large high-degree vertices, we present an edge-list partitioning technique that evenly partitions graphs containing high-degree vertices. Finally, we propose a technique we call distributed delegates that distributes and parallelizes the storage, computation, and communication when processing high-degree vertices. The edges of high-degree vertices are distributed, providing additional opportunities for parallelism not present in existing methods. We apply our techniques to multiple graph algorithms: Breadth-First Search, Single-Source Shortest Path, Connected Components, K-Core decomposition, Triangle Counting, and PageRank. Our experimental study of these algorithms demonstrates excellent scalability on supercomputers, clusters with non-volatile memory, and shared-memory systems. Our study includes multiple synthetic scale-free graph models, the largest of which has a trillion edges, as well as real-world input graphs. On a supercomputer, we demonstrate scalability up to 131K processors and improve the best-known Graph500 results for IBM BG/P Intrepid by 15%.
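
    The delegate idea can be sketched in a few lines. The following Python fragment is a hypothetical single-process simulation (our names, not the dissertation's implementation): the edge list of one high-degree "hub" vertex is split evenly across ranks, each rank's delegate computes a partial result over its share, and a controller reduces the partials, here counting the hub's already-visited neighbors as a frontier-expansion step in BFS might.

        # Hypothetical sketch of distributed delegates for one hub vertex.
        def partition_edges(edges, num_ranks):
            """Evenly split one vertex's edge list across ranks (delegates)."""
            return [edges[r::num_ranks] for r in range(num_ranks)]

        hub_edges = list(range(20))      # neighbor ids of one high-degree vertex
        visited = {1, 4, 9, 16}

        delegates = partition_edges(hub_edges, num_ranks=4)
        # Each delegate scans only its shard of the hub's edges in parallel.
        partials = [sum(1 for v in shard if v in visited) for shard in delegates]
        print(sum(partials))  # controller reduces per-delegate counts -> 4

    Because every rank holds only a fraction of the hub's edges, no single rank becomes a storage or communication bottleneck, which is the parallelism opportunity the abstract refers to.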

    Steering the Random Surfer on Directed Webgraphs
