6 research outputs found

    Distributed block formation and layout for disk-based management of large-scale graphs

    We are witnessing an enormous growth in social networks as well as in the volume of data generated by them. An important portion of this data is in the form of graphs. In recent years, several graph processing and management systems emerged to handle large-scale graphs. The primary goal of these systems is to run graph algorithms and queries in an efficient and scalable manner. Unlike relational data, graphs are semi-structured in nature. Thus, storing and accessing graph data using secondary storage requires new solutions that can provide locality of access for graph processing workloads. In this work, we propose a scalable block formation and layout technique for graphs, which aims at reducing the I/O cost of disk-based graph processing algorithms. To achieve this, we designed a scalable MapReduce-style method called ICBL, which can divide the graph into a series of disk blocks that contain sub-graphs with high locality. Furthermore, ICBL can order the resulting blocks on disk to further reduce non-local accesses. We experimentally evaluated ICBL to showcase its scalability, layout quality, as well as the effectiveness of automatic parameter tuning for ICBL. We deployed the graph layouts generated by ICBL on the Neo4j open source graph database management system (http://www.neo4j.org/, 2015). Our results show that the layout generated by ICBL reduces query running times on Neo4j by more than 2× compared to the default layout. © 2017, Springer Science+Business Media New York

    Disk-based management of interaction graphs

    In our increasingly connected and instrumented world, live data recording the interactions between people, systems, and the environment is available in various domains, such as telecommunications and social media. This data often takes the form of a temporally evolving graph, where entities are the vertices and the interactions between them are the edges. An important feature of this graph is that the number of edges it has grows continuously, as new interactions take place. We call such graphs interaction graphs. In this paper we study the problem of storing interaction graphs such that temporal queries on them can be answered efficiently. Since interaction graphs are append-only and edges are added continuously, traditional graph layout and storage algorithms that are batch based cannot be applied directly. We present the design and implementation of a system that caches recent interactions in memory, while quickly placing the expired interactions into disk blocks such that those edges that are likely to be accessed together are placed together. We develop live block formation algorithms that are fast, yet can take advantage of temporal and spatial locality among the edges to optimize the storage layout with the goal of improving query performance. We evaluate the system on synthetic as well as real-world interaction graphs, and show that our block formation algorithms are effective for answering temporal neighborhood queries on the graph. Such queries form a foundation for building more complex online and offline temporal analytics on interaction graphs. © 1989-2012 IEEE

    Disk-Based Management of Interaction Graphs
