9,445 research outputs found

    GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

    Full text link
    From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and complicated programming model. To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques such as automatic join rewrites to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves comparable performance as specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use

    Loom: Query-aware Partitioning of Online Graphs

    Full text link
    As with general graph processing systems, partitioning data over a cluster of machines improves the scalability of graph database management systems. However, these systems will incur additional network cost during the execution of a query workload, due to inter-partition traversals. Workload-agnostic partitioning algorithms typically minimise the likelihood of any edge crossing partition boundaries. However, these partitioners are sub-optimal with respect to many workloads, especially queries, which may require more frequent traversal of specific subsets of inter-partition edges. Furthermore, they largely unsuited to operating incrementally on dynamic, growing graphs. We present a new graph partitioning algorithm, Loom, that operates on a stream of graph updates and continuously allocates the new vertices and edges to partitions, taking into account a query workload of graph pattern expressions along with their relative frequencies. First we capture the most common patterns of edge traversals which occur when executing queries. We then compare sub-graphs, which present themselves incrementally in the graph update stream, against these common patterns. Finally we attempt to allocate each match to single partitions, reducing the number of inter-partition edges within frequently traversed sub-graphs and improving average query performance. Loom is extensively evaluated over several large test graphs with realistic query workloads and various orderings of the graph updates. We demonstrate that, given a workload, our prototype produces partitionings of significantly better quality than existing streaming graph partitioning algorithms Fennel and LDG

    The First-Order Theory of Sets with Cardinality Constraints is Decidable

    Full text link
    We show that the decidability of the first-order theory of the language that combines Boolean algebras of sets of uninterpreted elements with Presburger arithmetic operations. We thereby disprove a recent conjecture that this theory is undecidable. Our language allows relating the cardinalities of sets to the values of integer variables, and can distinguish finite and infinite sets. We use quantifier elimination to show the decidability and obtain an elementary upper bound on the complexity. Precise program analyses can use our decidability result to verify representation invariants of data structures that use an integer field to represent the number of stored elements.Comment: 18 page

    Real root finding for equivariant semi-algebraic systems

    Get PDF
    Let RR be a real closed field. We consider basic semi-algebraic sets defined by nn-variate equations/inequalities of ss symmetric polynomials and an equivariant family of polynomials, all of them of degree bounded by 2d<n2d < n. Such a semi-algebraic set is invariant by the action of the symmetric group. We show that such a set is either empty or it contains a point with at most 2d12d-1 distinct coordinates. Combining this geometric result with efficient algorithms for real root finding (based on the critical point method), one can decide the emptiness of basic semi-algebraic sets defined by ss polynomials of degree dd in time (sn)O(d)(sn)^{O(d)}. This improves the state-of-the-art which is exponential in nn. When the variables x1,,xnx_1, \ldots, x_n are quantified and the coefficients of the input system depend on parameters y1,,yty_1, \ldots, y_t, one also demonstrates that the corresponding one-block quantifier elimination problem can be solved in time (sn)O(dt)(sn)^{O(dt)}
    corecore