Search CORE

656 research outputs found

GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

Author: Crankshaw Daniel
Dave Ankur
Franklin Michael J.
Gonzalez Joseph E.
Stoica Ion
Xin Reynold S.
Publication venue
Publication date: 11/02/2014
Field of study

From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and complicated programming model. To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques such as automatic join rewrites to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves comparable performance as specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use

arXiv.org e-Print Archive

CiteSeerX

Recommended from our members

Monitoring Data Integrity in Big Data Analytics Services

Author: Kloukinas C.
Mantzoukas K.
Spanoudakis G.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

Enabled by advances in Cloud technologies, Big Data Analytics Services (BDAS) can improve many processes and identify extra information from previously untapped data sources. As our experience with BDAS and its benefits grows and technology for obtaining even more data improves, BDAS becomes ever more important for many different domains and for our daily lives. Most efforts in improving BDAS technologies have focused on scaling and efficiency issues. However, an equally important property is that of security, especially as we increasingly use public Cloud infrastructures instead of private ones. In this paper we present our approach for strengthening BDAS security by modifying the popular Spark infrastructure so as to monitor at run-time the integrity of data manipulated. In this way, we can ensure that the results obtained by the complex and resource-intensive computations performed on the Cloud are based on correct data and not data that have been tampered with or modified through faults in one of the many and complex subsystems of the overall system

City Research Online

Crossref

Hadoop-based solutions for variant calling and variant analysis

Author: Decap Dries
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2017
Field of study

Ghent University Academic Bibliography