Distributed GraphLab: A Framework for Machine Learning in the Cloud
While high-level data parallel frameworks, like MapReduce, simplify the
design and implementation of large-scale data processing systems, they do not
naturally or efficiently support many important data mining and machine
learning algorithms and can lead to inefficient learning systems. To help fill
this critical void, we introduced the GraphLab abstraction which naturally
expresses asynchronous, dynamic, graph-parallel computation while ensuring data
consistency and achieving a high degree of parallel performance in the
shared-memory setting. In this paper, we extend the GraphLab framework to the
substantially more challenging distributed setting while preserving strong data
consistency guarantees. We develop graph-based extensions to pipelined locking
and data versioning to reduce network congestion and mitigate the effect of
network latency. We also introduce fault tolerance to the GraphLab abstraction
using the classic Chandy-Lamport snapshot algorithm and demonstrate how it can
be easily implemented by exploiting the GraphLab abstraction itself. Finally,
we evaluate our distributed implementation of the GraphLab abstraction on a
large Amazon EC2 deployment and show 1-2 orders of magnitude performance gains
over Hadoop-based implementations. Comment: VLDB201
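The abstract describes GraphLab's asynchronous, dynamic, graph-parallel model only at a high level. The following is a minimal, single-threaded Python sketch of that idea, assuming a PageRank update function and a simple FIFO scheduler: each vertex update gathers the latest values of its in-neighbors and reschedules its out-neighbors only when its own value changed by more than a tolerance. The function dynamic_pagerank and its parameters are hypothetical; this is not the GraphLab API, and it omits the distributed locking, data versioning, and Chandy-Lamport snapshotting that the paper adds.

```python
from collections import deque


def dynamic_pagerank(out_edges, damping=0.85, tol=1e-6):
    """Sequential sketch of dynamically scheduled PageRank (hypothetical helper).

    out_edges maps every vertex to the list of vertices it links to; every
    vertex must appear as a key, even if its list is empty.
    """
    # Build the reverse adjacency so each update can gather from in-neighbors.
    in_edges = {v: [] for v in out_edges}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)

    rank = {v: 1.0 for v in out_edges}
    queue = deque(out_edges)   # FIFO scheduler: every vertex starts active
    queued = set(out_edges)    # avoids scheduling the same vertex twice

    while queue:
        v = queue.popleft()
        queued.discard(v)
        # Update function: recompute v from the latest in-neighbor values.
        new_rank = (1 - damping) + damping * sum(
            rank[u] / len(out_edges[u]) for u in in_edges[v]
        )
        change = abs(new_rank - rank[v])
        rank[v] = new_rank
        # Dynamic scheduling: wake out-neighbors only if v changed noticeably.
        if change > tol:
            for w in out_edges[v]:
                if w not in queued:
                    queue.append(w)
                    queued.add(w)
    return rank
```

Because each update reads the most recent neighbor values instead of waiting for a global barrier, the sketch behaves like an asynchronous (Gauss-Seidel-style) sweep. Running updates of neighboring vertices concurrently would require the kind of data-consistency guarantees the paper provides; this single-threaded version sidesteps that question.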
Gunrock: A High-Performance Graph Processing Library on the GPU
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs have been two
significant challenges for developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We evaluate Gunrock on five key graph
primitives and show that Gunrock has on average at least an order of magnitude
speedup over Boost and PowerGraph, comparable performance to the fastest GPU
hardwired primitives, and better performance than any other GPU high-level
graph library. Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the
previous version v5)
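To illustrate the frontier-centric, bulk-synchronous style the abstract describes, here is a CPU-only Python sketch of breadth-first search built from two frontier operators, assumed here to be an "advance" that expands a frontier along outgoing edges and a "filter" that prunes the result. The names advance, filter_frontier and bfs are hypothetical stand-ins, not Gunrock's C++/CUDA interface; on a GPU each operator would run as a data-parallel kernel and the visited check would need atomic operations.

```python
def advance(graph, frontier, visit):
    """Expand each frontier vertex along its out-edges.

    Emits dst for every edge (src, dst) where visit(src, dst) is True.
    Duplicates are allowed; the filter step removes them.
    """
    return [dst for src in frontier for dst in graph[src] if visit(src, dst)]


def filter_frontier(frontier, keep):
    """Keep only the vertices for which keep(v) returns True."""
    return [v for v in frontier if keep(v)]


def bfs(graph, source):
    """Level-synchronous BFS built from advance and filter (hypothetical API)."""
    depth = {v: -1 for v in graph}   # -1 marks "not yet visited"
    depth[source] = 0
    frontier = [source]
    level = 0
    while frontier:                  # one iteration = one bulk-synchronous step
        level += 1
        # Advance: candidate vertices are unvisited neighbors of the frontier.
        candidates = advance(graph, frontier, lambda src, dst: depth[dst] == -1)

        def claim(v, level=level):
            # First occurrence claims the vertex; later duplicates are rejected.
            if depth[v] == -1:
                depth[v] = level
                return True
            return False

        # Filter: label newly reached vertices and drop duplicates.
        frontier = filter_frontier(candidates, claim)
    return depth
```

Each pass of the while loop is one bulk-synchronous step: the whole frontier is advanced, then filtered, before the next level begins. New graph primitives in this style differ mainly in the visit and keep functions passed to the two operators.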
i2MapReduce: Incremental MapReduce for Mining Evolving Big Data
As new data and updates are constantly arriving, the results of data mining
applications become stale and obsolete over time. Incremental processing is a
promising approach to refreshing mining results. It utilizes previously saved
states to avoid the expense of re-computation from scratch.
In this paper, we propose i2MapReduce, a novel incremental processing
extension to MapReduce, the most widely used framework for mining big data.
Compared with the state-of-the-art work on Incoop, i2MapReduce (i) performs
key-value pair level incremental processing rather than task level
re-computation, (ii) supports not only one-step computation but also more
sophisticated iterative computation, which is widely used in data mining
applications, and (iii) incorporates a set of novel techniques to reduce I/O
overhead for accessing preserved fine-grained computation states. We evaluate
i2MapReduce using a one-step algorithm and three iterative algorithms with
diverse computation characteristics. Experimental results on Amazon EC2 show
significant performance improvements of i2MapReduce compared to both plain and
iterative MapReduce implementations that perform re-computation.
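Key-value pair level incremental processing can be illustrated with a small word-count example. The Python sketch below is a hypothetical design, not i2MapReduce's implementation: it preserves each record's map output and the per-key grouped values between runs, so that a batch of inserted, updated, or deleted records re-runs reduce only for the keys those records touch. It covers only the one-step case; the iterative computation and the I/O-reduction techniques mentioned in the abstract are not modeled, and the class and method names are illustrative.

```python
from collections import defaultdict


class IncrementalWordCount:
    """Key-value-level incremental word count (hypothetical sketch).

    Preserved state between runs:
      map_out[rid]  -> {word: count} emitted by record rid
      grouped[word] -> {rid: count} for every record contributing to word
      result[word]  -> last reduce output for word
    """

    def __init__(self):
        self.map_out = {}
        self.grouped = defaultdict(dict)
        self.result = {}

    @staticmethod
    def map_fn(text):
        counts = defaultdict(int)
        for word in text.split():
            counts[word] += 1
        return dict(counts)

    @staticmethod
    def reduce_fn(values):
        return sum(values)

    def _retract(self, rid, dirty):
        # Remove a record's preserved map output and mark its keys dirty.
        for key in self.map_out.pop(rid, {}):
            del self.grouped[key][rid]
            dirty.add(key)

    def apply_delta(self, inserted=None, deleted=()):
        """Apply inserted/updated and deleted records; return the re-reduced keys."""
        dirty = set()
        for rid in deleted:
            self._retract(rid, dirty)
        for rid, text in (inserted or {}).items():
            self._retract(rid, dirty)  # an updated record replaces its old output
            pairs = self.map_fn(text)
            self.map_out[rid] = pairs
            for key, value in pairs.items():
                self.grouped[key][rid] = value
                dirty.add(key)
        # Re-run reduce only for keys whose grouped values changed.
        for key in dirty:
            if self.grouped[key]:
                self.result[key] = self.reduce_fn(self.grouped[key].values())
            else:
                self.result.pop(key, None)
                del self.grouped[key]
        return dirty
```

For example, after loading records 1 ("a b a") and 2 ("b c"), inserting record 3 ("c d") re-reduces only the keys c and d, while a and b keep their preserved results. A task-level approach would instead re-run whole map or reduce tasks whose input splits changed.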
- …