1,030 research outputs found

    Fast Exact Shortest-Path Distance Queries on Large Networks by Pruned Landmark Labeling

    Full text link
    We propose a new exact method for shortest-path distance queries on large-scale networks. Our method precomputes distance labels for vertices by performing a breadth-first search from every vertex. Seemingly too obvious and too inefficient at first glance, the key ingredient introduced here is pruning during breadth-first searches. While we can still answer the correct distance for any pair of vertices from the labels, it surprisingly reduces the search space and sizes of labels. Moreover, we show that we can perform 32 or 64 breadth-first searches simultaneously exploiting bitwise operations. We experimentally demonstrate that the combination of these two techniques is efficient and robust on various kinds of large-scale real-world networks. In particular, our method can handle social networks and web graphs with hundreds of millions of edges, which are two orders of magnitude larger than the limits of previous exact methods, with comparable query time to those of previous methods.Comment: To appear in SIGMOD 201

    Subgraph Pattern Matching over Uncertain Graphs with Identity Linkage Uncertainty

    Get PDF
    There is a growing need for methods which can capture uncertainties and answer queries over graph-structured data. Two common types of uncertainty are uncertainty over the attribute values of nodes and uncertainty over the existence of edges. In this paper, we combine those with identity uncertainty. Identity uncertainty represents uncertainty over the mapping from objects mentioned in the data, or references, to the underlying real-world entities. We propose the notion of a probabilistic entity graph (PEG), a probabilistic graph model that defines a distribution over possible graphs at the entity level. The model takes into account node attribute uncertainty, edge existence uncertainty, and identity uncertainty, and thus enables us to systematically reason about all three types of uncertainties in a uniform manner. We introduce a general framework for constructing a PEG given uncertain data at the reference level and develop highly efficient algorithms to answer subgraph pattern matching queries in this setting. Our algorithms are based on two novel ideas: context-aware path indexing and reduction by join-candidates, which drastically reduce the query search space. A comprehensive experimental evaluation shows that our approach outperforms baseline implementations by orders of magnitude

    Subjectively interesting connecting trees and forests

    Get PDF
    Consider a large graph or network, and a user-provided set of query vertices between which the user wishes to explore relations. For example, a researcher may want to connect research papers in a citation network, an analyst may wish to connect organized crime suspects in a communication network, or an internet user may want to organize their bookmarks given their location in the world wide web. A natural way to do this is to connect the vertices in the form of a tree structure that is present in the graph. However, in sufficiently dense graphs, most such trees will be large or somehow trivial (e.g. involving high degree vertices) and thus not insightful. Extending previous research, we define and investigate the new problem of mining subjectively interesting trees connecting a set of query vertices in a graph, i.e., trees that are highly surprising to the specific user at hand. Using information theoretic principles, we formalize the notion of interestingness of such trees mathematically, taking in account certain prior beliefs the user has specified about the graph. A remaining problem is efficiently fitting a prior belief model. We show how this can be done for a large class of prior beliefs. Given a specified prior belief model, we then propose heuristic algorithms to find the best trees efficiently. An empirical validation of our methods on a large real graphs evaluates the different heuristics and validates the interestingness of the given trees

    K-Reach: Who is in Your Small World

    Full text link
    We study the problem of answering k-hop reachability queries in a directed graph, i.e., whether there exists a directed path of length k, from a source query vertex to a target query vertex in the input graph. The problem of k-hop reachability is a general problem of the classic reachability (where k=infinity). Existing indexes for processing classic reachability queries, as well as for processing shortest path queries, are not applicable or not efficient for processing k-hop reachability queries. We propose an index for processing k-hop reachability queries, which is simple in design and efficient to construct. Our experimental results on a wide range of real datasets show that our index is more efficient than the state-of-the-art indexes even for processing classic reachability queries, for which these indexes are primarily designed. We also show that our index is efficient in answering k-hop reachability queries.Comment: VLDB201

    Efficient Snapshot Retrieval over Historical Graph Data

    Full text link
    We address the problem of managing historical data for large evolving information networks like social networks or citation networks, with the goal to enable temporal and evolutionary queries and analysis. We present the design and architecture of a distributed graph database system that stores the entire history of a network and provides support for efficient retrieval of multiple graphs from arbitrary time points in the past, in addition to maintaining the current state for ongoing updates. Our system exposes a general programmatic API to process and analyze the retrieved snapshots. We introduce DeltaGraph, a novel, extensible, highly tunable, and distributed hierarchical index structure that enables compactly recording the historical information, and that supports efficient retrieval of historical graph snapshots for single-site or parallel processing. Along with the original graph data, DeltaGraph can also maintain and index auxiliary information; this functionality can be used to extend the structure to efficiently execute queries like subgraph pattern matching over historical data. We develop analytical models for both the storage space needed and the snapshot retrieval times to aid in choosing the right parameters for a specific scenario. In addition, we present strategies for materializing portions of the historical graph state in memory to further speed up the retrieval process. Secondly, we present an in-memory graph data structure called GraphPool that can maintain hundreds of historical graph instances in main memory in a non-redundant manner. We present a comprehensive experimental evaluation that illustrates the effectiveness of our proposed techniques at managing historical graph information
    • …
    corecore