Search CORE

40,509 research outputs found

LINVIEW: Incremental View Maintenance for Complex Analytical Queries

Author: Abadi D.
Arasu A.
Deng L.
Grama A.
Kamvar S.
Kraska T.
McSherry F.
Motwani R.
Press W.
Seeger M.
Stonebraker M.
Stonebraker M.
Venkataraman S.
Whaley C.
Zaharia M.
Zhang Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/05/2014
Field of study

Many analytics tasks and machine learning problems can be naturally expressed by iterative linear algebra programs. In this paper, we study the incremental view maintenance problem for such complex analytical queries. We develop a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost. Linear algebra operations tend to cause an avalanche effect where even very local changes to the input matrices spread out and infect all of the intermediate results and the final view, causing incremental view maintenance to lose its performance benefit over re-evaluation. We develop techniques based on matrix factorizations to contain such epidemics of change. As a consequence, our techniques make incremental view maintenance of linear algebra practical and usually substantially cheaper than re-evaluation. We show, both analytically and experimentally, the usefulness of these techniques when applied to standard analytics tasks. Our evaluation demonstrates the efficiency of LINVIEW in generating parallel incremental programs that outperform re-evaluation techniques by more than an order of magnitude.Comment: 14 pages, SIGMO

arXiv.org e-Print Archive

Crossref

GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

Author: Crankshaw Daniel
Dave Ankur
Franklin Michael J.
Gonzalez Joseph E.
Stoica Ion
Xin Reynold S.
Publication venue
Publication date: 11/02/2014
Field of study

From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and complicated programming model. To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques such as automatic join rewrites to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves comparable performance as specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use

arXiv.org e-Print Archive

CiteSeerX

Recommended from our members

Incremental evolution strategy for function optimization

Author: Guan SU
Mo W
Publication venue: 'IOS Press'
Publication date: 01/01/2006
Field of study

This paper presents a novel evolutionary approach for function optimization Incremental Evolution Strategy (IES). Two strategies are proposed. One is to evolve the input variables incrementally. The whole evolution consists of several phases and one more variable is focused in each phase. The number of phases is equal to the number of variables in maximum. Each phase is composed of two stages: in the single-variable evolution (SVE) stage, evolution is taken on one independent variable in a series of cutting planes; in the multi-variable evolving (MVE) stage, the initial population is formed by integrating the populations obtained by the SVE and the MVE in the last phase. And the evolution is taken on the incremented variable set. The other strategy is a hybrid of particle swarm optimization (PSO) and evolution strategy (ES). PSO is applied to adjust the cutting planes/hyper-planes (in SVEs/MVEs) while (1+1)-ES is applied to searching optima in the cutting planes/hyper-planes. The results of experiments show that the performance of IES is generally better than that of three other evolutionary algorithms, improved normal GA, PSO and SADE_CERAF, in the sense that IES finds solutions closer to the true optima and with more optimal objective values

Brunel University Research Archive