145 research outputs found

    High-Performance Reachability Query Processing under Index Size Restrictions

    In this paper, we propose a scalable and highly efficient index structure for the reachability problem over graphs. We build on the well-known node interval labeling scheme, where the set of vertices reachable from a particular node is compactly encoded as a collection of node identifier ranges. We impose an explicit bound on the size of the index and flexibly assign approximate reachability ranges to nodes of the graph so that the number of index probes needed to answer a query is minimized. The resulting tunable index structure generates a better range labeling as the space budget increases, providing direct control over the trade-off between index size and query processing performance. Using a fast recursive querying method in conjunction with our index structure, we show that in practice, reachability queries can be answered in the order of microseconds on an off-the-shelf computer, even for massive-scale real-world graphs. Our claims are supported by an extensive set of experimental results using a multitude of benchmark and real-world web-scale graph datasets. Comment: 30 pages
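
    The classical interval-labeling scheme this paper builds on is easiest to see on a tree. Below is a minimal, illustrative Python sketch (all names are ours, not the paper's): a DFS assigns each node an interval [start, end], and u reaches v exactly when v's start falls inside u's interval. The paper's actual contribution, approximate size-bounded ranges for general graphs, is not shown here.

```python
# Minimal sketch of interval labeling on a tree/forest (the classical
# scheme the paper builds on); the paper's approximate, size-bounded
# ranges for general graphs are more involved. All names are illustrative.

def label_intervals(tree, roots):
    """Assign [start, end] DFS intervals; in a tree, u reaches v
    iff start[u] <= start[v] <= end[u]."""
    start, end, clock = {}, {}, 0
    def dfs(u):
        nonlocal clock
        start[u] = clock
        clock += 1
        for v in tree.get(u, []):
            dfs(v)
        end[u] = clock
        clock += 1
    for r in roots:
        dfs(r)
    return start, end

def reaches(u, v, start, end):
    return start[u] <= start[v] <= end[u]

tree = {"a": ["b", "c"], "b": ["d"]}
start, end = label_intervals(tree, ["a"])
assert reaches("a", "d", start, end)      # a -> b -> d
assert not reaches("c", "d", start, end)  # no path
```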

    Extended high dimensional indexing approach for reachability queries on very large graphs

    Given a directed acyclic graph G = (V,A) and two vertices u, v ∈ V, the reachability problem is to answer whether there is a path from u to v in the graph. In the context of very large graphs, with millions of vertices and a series of queries to be answered, it is not practical to search the graph for each query. On the other hand, storing the full transitive closure of the graph is also impractical due to its O(|V|^2) size. Scalable approaches aim to create indices that prune the search during its execution. Negative indices may be able to determine (in constant time) that a query has a negative answer, while positive indices may determine (again in constant time) that a query has a positive answer. In this paper we propose a novel scalable approach called LYNX that uses a large number of topological sorts of G as a negative cut index without degrading the query time. A similar strategy is applied to a positive cut index. In addition, LYNX offers a user-defined index size that lets the user control the ratio between negative and positive cuts depending on the expected query pattern. We show by computational experiments that LYNX consistently outperforms the state-of-the-art approach in terms of query time using the same index size for graphs with a high reachability ratio. In intelligent computer systems that rely on frequent tests of connectivity in graphs, LYNX can reduce the time delay experienced by end users through a reduced query time. This comes at the expense of an increased setup time whenever the underlying graph is updated. Keywords: directed acyclic graphs, topological sorts, reachability queries, graph indexing
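
    A hedged sketch of the negative-cut idea described in the abstract (LYNX's positive cuts and index-size tuning are omitted; all names are illustrative): in a DAG, if u reaches v then u precedes v in every topological order, so any precomputed order that places v before u refutes the query in constant time. LYNX stores many such orders; for brevity this sketch stores one.

```python
# Negative-cut sketch: an order placing v before u proves "unreachable"
# without searching; otherwise fall back to DFS. LYNX would use many
# (randomized) orders and also positive cuts; graphlib gives one here.
from graphlib import TopologicalSorter

def topo_positions(succ):
    """One topological order as a node -> position map; graphlib
    expects node -> predecessors, so we invert the successor lists."""
    pred = {u: set() for u in succ}
    for u, vs in succ.items():
        for v in vs:
            pred.setdefault(v, set()).add(u)
    return {v: i for i, v in enumerate(TopologicalSorter(pred).static_order())}

def reachable(succ, u, v, position_maps):
    if any(pos[v] < pos[u] for pos in position_maps):
        return False               # negative cut: refuted without search
    stack, seen = [u], {u}         # fallback: plain DFS
    while stack:
        x = stack.pop()
        if x == v:
            return True
        for y in succ.get(x, []):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False

dag = {"a": ["b"], "b": ["c"], "d": ["c"]}
maps = [topo_positions(dag)]
assert reachable(dag, "a", "c", maps) and not reachable(dag, "c", "a", maps)
```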

    Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable

    There has been significant recent interest in parallel graph processing due to the need to quickly analyze the large graphs available today. Many graph codes have been designed for distributed memory or external memory. However, today even the largest publicly-available real-world graph (the Hyperlink Web graph with over 3.5 billion vertices and 128 billion edges) can fit in the memory of a single commodity multicore server. Nevertheless, most experimental work in the literature reports results on much smaller graphs, and the studies that do use the Hyperlink graph rely on distributed or external memory. Therefore, it is natural to ask whether we can efficiently solve a broad class of graph problems on this graph in memory. This paper shows that theoretically-efficient parallel graph algorithms can scale to the largest publicly-available graphs using a single machine with a terabyte of RAM, processing them in minutes. We give implementations of theoretically-efficient parallel algorithms for 20 important graph problems. We also present the optimizations and techniques used in our implementations, which were crucial in enabling us to process these large graphs quickly. We show that the running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs. For many of the problems we consider, this is the first time they have been solved on graphs at this scale. We have made the implementations developed in this work publicly available as the Graph-Based Benchmark Suite (GBBS). Comment: This is the full version of the paper appearing in the ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2018
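
    For orientation, here is a sketch (not GBBS's actual C++ API; names are ours) of the frontier-based pattern that many theoretically-efficient parallel graph algorithms, including those in GBBS, are built on. BFS is written so that each frontier expansion is an independent bulk step, which is the part a parallel runtime would distribute across cores.

```python
# Level-synchronous BFS written in the frontier/"edgeMap" style that
# parallel graph frameworks exploit. This sequential Python version is
# purely illustrative of the structure, not of GBBS's implementation.

def bfs_levels(graph, source):
    """Return node -> distance from source."""
    dist = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        next_frontier = []
        # In a parallel framework, this loop over the frontier's edges
        # is the data-parallel step, executed across all cores at once.
        for u in frontier:
            for v in graph.get(u, []):
                if v not in dist:
                    dist[v] = level
                    next_frontier.append(v)
        frontier = next_frontier
    return dist

graph = {"s": ["a", "b"], "a": ["c"], "b": ["c"]}
assert bfs_levels(graph, "s")["c"] == 2
```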

    RDF Querying

    Reactive Web systems, Web services, and Web-based publish/subscribe systems communicate events as XML messages, and in many cases require composite event detection: it is not sufficient to react to single event messages, but events have to be considered in relation to other events that are received over time. Emphasizing language design and formal semantics, we describe the rule-based query language XChangeEQ for detecting composite events. XChangeEQ is designed to completely cover and integrate the four complementary querying dimensions: event data, event composition, temporal relationships, and event accumulation. Semantics are provided as model and fixpoint theories; while this is an established approach for rule languages, it has not previously been applied to event queries
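
    Illustration only: XChangeEQ is a rule language with its own syntax and formal (model/fixpoint) semantics, none of which is reproduced here. The sketch below, with entirely hypothetical names, merely shows the kind of condition composite event queries express: an "order" event followed by a matching "payment" event within a time window, touching the event-data, composition, and temporal-relationship dimensions.

```python
# Toy composite-event detector: a `then` event with the same correlation
# key must follow a `first` event within `window` time units.
from dataclasses import dataclass

@dataclass
class Event:
    kind: str     # event data carried by the message
    key: str      # correlation key, e.g. an order id
    time: float   # arrival timestamp

def follows(stream, first, then, window):
    """Yield (e1, e2) pairs matching the composite condition."""
    pending = {}  # key -> earlier event, accumulated as messages arrive
    for e in sorted(stream, key=lambda e: e.time):
        if e.kind == first:
            pending[e.key] = e
        elif e.kind == then and e.key in pending:
            e1 = pending[e.key]
            if e.time - e1.time <= window:   # temporal relationship
                yield (e1, e)

events = [Event("order", "42", 0.0), Event("payment", "42", 3.5)]
assert list(follows(events, "order", "payment", window=10.0))
```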

    Indexing collections of XML documents with arbitrary links

    In recent years, the popularity of XML has increased significantly. XML is the extensible markup language of the World Wide Web Consortium (W3C). XML is used to represent data in many areas, such as traditional database management systems, e-business environments, and the World Wide Web. XML data, unlike relational and object-oriented data, has no fixed schema known in advance; the schema is stored separately from the data. XML data is self-describing and can model heterogeneity more naturally than relational or object-oriented data models. Moreover, XML data usually has XLinks or XPointers to data in other documents (global links). In addition to XLink or XPointer links, the XML standard allows internal links between different elements in the same XML document using ID/IDREF attributes. The rise in popularity of XML has generated much interest in query processing over graph-structured data. To facilitate efficient evaluation of path expressions, structured indexes have been proposed. However, most variants of structured indexes ignore global or internal document references: they assume a tree-like structure of XML documents that does not contain such global and internal links. Extending these indexes to work with large XML graphs containing global or internal document links requires, firstly, a lot of computing power for the creation process and, secondly, a great deal of space in which to store the indexes. As the latter demonstrates, the efficient evaluation of ancestor-descendant queries over arbitrary graphs with long paths is indeed a complex issue. This thesis proposes the HID index (2-Hop cover path Index based on DAG), which is based on the concept of a two-hop cover for a directed graph. The algorithms proposed for creating the HID index, in effect, scale down the original graph size substantially, yielding a directed acyclic graph (DAG) with a smaller number of nodes and edges. This reduces the number of computing steps required for building the index, and reduces computing time and space as well. The index also permits efficient evaluation of ancestor-descendant relationships. Moreover, the proposed index has an advantage over other comparable indexes: it is optimized for descendants-or-self queries on arbitrary graphs with link relationships, a task that would stress any index structure. Our experiments with real-life XML data show that the HID index provides better performance than other indexes
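
    A minimal sketch of the 2-hop cover idea underlying the HID index (the thesis's DAG reduction and construction algorithms are not shown; the toy labels below are hand-built): each node u stores a set L_out(u) of hop nodes reachable from u and a set L_in(u) of hop nodes that reach u, and u reaches v exactly when the two sets intersect.

```python
# Hand-built 2-hop labels for the toy graph a -> b -> c; a real index
# would compute a small cover so that every reachable pair shares a hop.
L_out = {"a": {"a", "b"}, "b": {"b"}, "c": {"c"}}
L_in  = {"a": {"a"},      "b": {"b"}, "c": {"b", "c"}}

def reaches(u, v):
    """Set intersection instead of graph search."""
    return bool(L_out[u] & L_in[v])

assert reaches("a", "c")      # a -> b -> c: {"a","b"} & {"b","c"} = {"b"}
assert not reaches("c", "a")  # {"c"} & {"a"} is empty
```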

    A Hybrid Approach to Logic Evaluation

    In this thesis, we contribute the hybrid approach: a means of combining the practical advantages of feature-rich logic evaluation in the cloud with the performance benefits of hand-written, optimized, efficient native code. In the first part of our hybrid approach, we introduce a cloud-based distribution for logic programs, which may be deployed as a service, in standard cloud environments, across cheap commodity hardware. Modern systems are in the cloud; while distributed logic solvers exist, these systems are highly specialized, requiring expensive, resource-intensive hardware infrastructures. Our original technique achieves a fully automatic synthesis of cloud infrastructure for logic programs, and includes a range of practical features not present in existing distributed logic solvers. We show that an implementation of the distribution scales effectively within real-world cloud environments, compared with a distribution over the cores of a single machine. We show that our multi-node distribution may be effectively combined with existing multi-threaded techniques to mitigate the network communication cost incurred by distribution. In the second part of our hybrid approach, we introduce extra-logical algorithms to achieve performance for logic programs that would not be possible within a bottom-up logic evaluation. Modern systems must deliver high performance on big data; however, even the most powerful logic engines, distributed or otherwise, can be beaten by hand-written code on particular problems. We give a novel implementation of a system for the high-impact problem of sink-reachability, designed such that its algorithms may be used in logic programs. A thorough empirical evaluation, across a range of large-scale, real-world datasets, shows that our system outperforms the current state of the art for the sink-reachability problem in all cases. Our hybrid approach addresses the two major deficiencies of modern logic systems, providing a practical means of evaluating logic in distributed cloud-based environments, while offering performance gains for specific high-impact problems that would not be possible using logic programming alone
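
    To pin down the problem statement only (this is a naive baseline, not the thesis's optimized native system), sink-reachability asks, for each node of a DAG, which sinks (nodes with no outgoing edges) it can reach. Sink sets can be propagated bottom-up with memoization:

```python
# Naive memoized sink-reachability over a DAG; illustrative names only.
from functools import lru_cache

def sink_reachability(dag):
    nodes = set(dag) | {v for vs in dag.values() for v in vs}

    @lru_cache(maxsize=None)
    def sinks_from(u):
        succs = dag.get(u, [])
        if not succs:
            return frozenset({u})      # u is itself a sink
        out = frozenset()
        for v in succs:                # union over successors' sink sets
            out |= sinks_from(v)
        return out

    return {u: sinks_from(u) for u in nodes}

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
assert sink_reachability(dag)["a"] == frozenset({"d"})
```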

    Computer Aided Verification

    This open access two-volume set, LNCS 11561 and 11562, constitutes the refereed proceedings of the 31st International Conference on Computer Aided Verification, CAV 2019, held in New York City, USA, in July 2019. The 52 full papers, presented together with 13 tool papers and 2 case studies, were carefully reviewed and selected from 258 submissions. The papers were organized in the following topical sections: Part I: automata and timed systems; security and hyperproperties; synthesis; model checking; cyber-physical systems and machine learning; probabilistic systems; runtime techniques; dynamical, hybrid, and reactive systems; Part II: logics, decision procedures, and solvers; numerical programs; verification; distributed systems and networks; verification and invariants; and concurrency