3,791 research outputs found
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance
of graph data has driven the development of numerous new graph-parallel systems
(e.g., Pregel, GraphLab). By restricting the computation that can be expressed
and introducing new techniques to partition and distribute the graph, these
systems can efficiently execute iterative graph algorithms orders of magnitude
faster than more general data-parallel systems. However, the same restrictions
that enable the performance gains also make it difficult to express many of the
important stages in a typical graph-analytics pipeline: constructing the graph,
modifying its structure, or expressing computation that spans multiple graphs.
As a consequence, existing graph analytics pipelines compose graph-parallel and
data-parallel systems using external storage systems, leading to extensive data
movement and complicated programming model.
To address these challenges we introduce GraphX, a distributed graph
computation framework that unifies graph-parallel and data-parallel
computation. GraphX provides a small, core set of graph-parallel operators
expressive enough to implement the Pregel and PowerGraph abstractions, yet
simple enough to be cast in relational algebra. GraphX uses a collection of
query optimization techniques such as automatic join rewrites to efficiently
implement these graph-parallel operators. We evaluate GraphX on real-world
graphs and workloads and demonstrate that GraphX achieves comparable
performance as specialized graph computation systems, while outperforming them
in end-to-end graph pipelines. Moreover, GraphX achieves a balance between
expressiveness, performance, and ease of use
The Complexity of Reasoning with FODD and GFODD
Recent work introduced Generalized First Order Decision Diagrams (GFODD) as a
knowledge representation that is useful in mechanizing decision theoretic
planning in relational domains. GFODDs generalize function-free first order
logic and include numerical values and numerical generalizations of existential
and universal quantification. Previous work presented heuristic inference
algorithms for GFODDs and implemented these heuristics in systems for decision
theoretic planning. In this paper, we study the complexity of the computational
problems addressed by such implementations. In particular, we study the
evaluation problem, the satisfiability problem, and the equivalence problem for
GFODDs under the assumption that the size of the intended model is given with
the problem, a restriction that guarantees decidability. Our results provide a
complete characterization placing these problems within the polynomial
hierarchy. The same characterization applies to the corresponding restriction
of problems in first order logic, giving an interesting new avenue for
efficient inference when the number of objects is bounded. Our results show
that for formulas, and for corresponding GFODDs, evaluation and
satisfiability are complete, and equivalence is
complete. For formulas evaluation is complete, satisfiability
is one level higher and is complete, and equivalence is
complete.Comment: A short version of this paper appears in AAAI 2014. Version 2
includes a reorganization and some expanded proof
The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey
Graph processing is becoming increasingly prevalent across many application
domains. In spite of this prevalence, there is little research about how graphs
are actually used in practice. We performed an extensive study that consisted
of an online survey of 89 users, a review of the mailing lists, source
repositories, and whitepapers of a large suite of graph software products, and
in-person interviews with 6 users and 2 developers of these products. Our
online survey aimed at understanding: (i) the types of graphs users have; (ii)
the graph computations users run; (iii) the types of graph software users use;
and (iv) the major challenges users face when processing their graphs. We
describe the participants' responses to our questions highlighting common
patterns and challenges. Based on our interviews and survey of the rest of our
sources, we were able to answer some new questions that were raised by
participants' responses to our online survey and understand the specific
applications that use graph data and software. Our study revealed surprising
facts about graph processing in practice. In particular, real-world graphs
represent a very diverse range of entities and are often very large,
scalability and visualization are undeniably the most pressing challenges faced
by participants, and data integration, recommendations, and fraud detection are
very popular applications supported by existing graph software. We hope these
findings can guide future research
Accelerating Backward Aggregation in GCN Training with Execution Path Preparing on GPUs
The emerging Graph Convolutional Network (GCN) has now been widely used in
many domains, and it is challenging to improve the efficiencies of applications
by accelerating the GCN trainings. For the sparsity nature and exploding scales
of input real-world graphs, state-of-the-art GCN training systems (e.g.,
GNNAdvisor) employ graph processing techniques to accelerate the message
exchanging (i.e. aggregations) among the graph vertices. Nevertheless, these
systems treat both the aggregation stages of forward and backward propagation
phases as all-active graph processing procedures that indiscriminately conduct
computation on all vertices of an input graph.
In this paper, we first point out that in a GCN training problem with a given
training set, the aggregation stages of its backward propagation phase (called
as backward aggregations in this paper) can be converted to partially-active
graph processing procedures, which conduct computation on only partial vertices
of the input graph. By leveraging such a finding, we propose an execution path
preparing method that collects and coalesces the data used during backward
propagations of GCN training conducted on GPUs. The experimental results show
that compared with GNNAdvisor, our approach improves the performance of the
backward aggregation of GCN trainings on typical real-world graphs by
1.48x~5.65x. Moreover, the execution path preparing can be conducted either
before the training (during preprocessing) or on-the-fly with the training.
When used during preprocessing, our approach improves the overall GCN training
by 1.05x~1.37x. And when used on-the-fly, our approach improves the overall GCN
training by 1.03x~1.35x
- …