448 research outputs found
Palgol: A High-Level DSL for Vertex-Centric Graph Processing with Remote Data Access
Pregel is a popular distributed computing model for dealing with large-scale
graphs. However, it can be tricky to implement graph algorithms correctly and
efficiently in Pregel's vertex-centric model, especially when the algorithm has
multiple computation stages, complicated data dependencies, or even
communication over dynamic internal data structures. Some domain-specific
languages (DSLs) have been proposed to provide more intuitive ways to implement
graph algorithms, but due to the lack of support for remote access --- reading
or writing attributes of other vertices through references --- they cannot
handle the above mentioned dynamic communication, causing a class of Pregel
algorithms with fast convergence impossible to implement.
To address this problem, we design and implement Palgol, a more declarative
and powerful DSL which supports remote access. In particular, programmers can
use a more declarative syntax called chain access to naturally specify dynamic
communication as if directly reading data on arbitrary remote vertices. By
analyzing the logic patterns of chain access, we provide a novel algorithm for
compiling Palgol programs to efficient Pregel code. We demonstrate the power of
Palgol by using it to implement several practical Pregel algorithms, and the
evaluation result shows that the efficiency of Palgol is comparable with that
of hand-written code.Comment: 12 pages, 10 figures, extended version of APLAS 2017 pape
GiViP: A Visual Profiler for Distributed Graph Processing Systems
Analyzing large-scale graphs provides valuable insights in different
application scenarios. While many graph processing systems working on top of
distributed infrastructures have been proposed to deal with big graphs, the
tasks of profiling and debugging their massive computations remain time
consuming and error-prone. This paper presents GiViP, a visual profiler for
distributed graph processing systems based on a Pregel-like computation model.
GiViP captures the huge amount of messages exchanged throughout a computation
and provides an interactive user interface for the visual analysis of the
collected data. We show how to take advantage of GiViP to detect anomalies
related to the computation and to the infrastructure, such as slow computing
units and anomalous message patterns.Comment: Appears in the Proceedings of the 25th International Symposium on
Graph Drawing and Network Visualization (GD 2017
Scalable Facility Location for Massive Graphs on Pregel-like Systems
We propose a new scalable algorithm for facility location. Facility location
is a classic problem, where the goal is to select a subset of facilities to
open, from a set of candidate facilities F , in order to serve a set of clients
C. The objective is to minimize the total cost of opening facilities plus the
cost of serving each client from the facility it is assigned to. In this work,
we are interested in the graph setting, where the cost of serving a client from
a facility is represented by the shortest-path distance on the graph. This
setting allows to model natural problems arising in the Web and in social media
applications. It also allows to leverage the inherent sparsity of such graphs,
as the input is much smaller than the full pairwise distances between all
vertices.
To obtain truly scalable performance, we design a parallel algorithm that
operates on clusters of shared-nothing machines. In particular, we target
modern Pregel-like architectures, and we implement our algorithm on Apache
Giraph. Our solution makes use of a recent result to build sketches for massive
graphs, and of a fast parallel algorithm to find maximal independent sets, as
building blocks. In so doing, we show how these problems can be solved on a
Pregel-like architecture, and we investigate the properties of these
algorithms. Extensive experimental results show that our algorithm scales
gracefully to graphs with billions of edges, while obtaining values of the
objective function that are competitive with a state-of-the-art sequential
algorithm
- …