9,445 research outputs found
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance
of graph data has driven the development of numerous new graph-parallel systems
(e.g., Pregel, GraphLab). By restricting the computation that can be expressed
and introducing new techniques to partition and distribute the graph, these
systems can efficiently execute iterative graph algorithms orders of magnitude
faster than more general data-parallel systems. However, the same restrictions
that enable the performance gains also make it difficult to express many of the
important stages in a typical graph-analytics pipeline: constructing the graph,
modifying its structure, or expressing computation that spans multiple graphs.
As a consequence, existing graph analytics pipelines compose graph-parallel and
data-parallel systems using external storage systems, leading to extensive data
movement and complicated programming model.
To address these challenges we introduce GraphX, a distributed graph
computation framework that unifies graph-parallel and data-parallel
computation. GraphX provides a small, core set of graph-parallel operators
expressive enough to implement the Pregel and PowerGraph abstractions, yet
simple enough to be cast in relational algebra. GraphX uses a collection of
query optimization techniques such as automatic join rewrites to efficiently
implement these graph-parallel operators. We evaluate GraphX on real-world
graphs and workloads and demonstrate that GraphX achieves comparable
performance as specialized graph computation systems, while outperforming them
in end-to-end graph pipelines. Moreover, GraphX achieves a balance between
expressiveness, performance, and ease of use
Loom: Query-aware Partitioning of Online Graphs
As with general graph processing systems, partitioning data over a cluster of
machines improves the scalability of graph database management systems.
However, these systems will incur additional network cost during the execution
of a query workload, due to inter-partition traversals. Workload-agnostic
partitioning algorithms typically minimise the likelihood of any edge crossing
partition boundaries. However, these partitioners are sub-optimal with respect
to many workloads, especially queries, which may require more frequent
traversal of specific subsets of inter-partition edges. Furthermore, they
largely unsuited to operating incrementally on dynamic, growing graphs.
We present a new graph partitioning algorithm, Loom, that operates on a
stream of graph updates and continuously allocates the new vertices and edges
to partitions, taking into account a query workload of graph pattern
expressions along with their relative frequencies.
First we capture the most common patterns of edge traversals which occur when
executing queries. We then compare sub-graphs, which present themselves
incrementally in the graph update stream, against these common patterns.
Finally we attempt to allocate each match to single partitions, reducing the
number of inter-partition edges within frequently traversed sub-graphs and
improving average query performance.
Loom is extensively evaluated over several large test graphs with realistic
query workloads and various orderings of the graph updates. We demonstrate
that, given a workload, our prototype produces partitionings of significantly
better quality than existing streaming graph partitioning algorithms Fennel and
LDG
The First-Order Theory of Sets with Cardinality Constraints is Decidable
We show that the decidability of the first-order theory of the language that
combines Boolean algebras of sets of uninterpreted elements with Presburger
arithmetic operations. We thereby disprove a recent conjecture that this theory
is undecidable. Our language allows relating the cardinalities of sets to the
values of integer variables, and can distinguish finite and infinite sets. We
use quantifier elimination to show the decidability and obtain an elementary
upper bound on the complexity.
Precise program analyses can use our decidability result to verify
representation invariants of data structures that use an integer field to
represent the number of stored elements.Comment: 18 page
Real root finding for equivariant semi-algebraic systems
Let be a real closed field. We consider basic semi-algebraic sets defined
by -variate equations/inequalities of symmetric polynomials and an
equivariant family of polynomials, all of them of degree bounded by .
Such a semi-algebraic set is invariant by the action of the symmetric group. We
show that such a set is either empty or it contains a point with at most
distinct coordinates. Combining this geometric result with efficient algorithms
for real root finding (based on the critical point method), one can decide the
emptiness of basic semi-algebraic sets defined by polynomials of degree
in time . This improves the state-of-the-art which is exponential
in . When the variables are quantified and the
coefficients of the input system depend on parameters , one
also demonstrates that the corresponding one-block quantifier elimination
problem can be solved in time
- …