40,509 research outputs found
LINVIEW: Incremental View Maintenance for Complex Analytical Queries
Many analytics tasks and machine learning problems can be naturally expressed
by iterative linear algebra programs. In this paper, we study the incremental
view maintenance problem for such complex analytical queries. We develop a
framework, called LINVIEW, for capturing deltas of linear algebra programs and
understanding their computational cost. Linear algebra operations tend to cause
an avalanche effect where even very local changes to the input matrices spread
out and infect all of the intermediate results and the final view, causing
incremental view maintenance to lose its performance benefit over
re-evaluation. We develop techniques based on matrix factorizations to contain
such epidemics of change. As a consequence, our techniques make incremental
view maintenance of linear algebra practical and usually substantially cheaper
than re-evaluation. We show, both analytically and experimentally, the
usefulness of these techniques when applied to standard analytics tasks. Our
evaluation demonstrates the efficiency of LINVIEW in generating parallel
incremental programs that outperform re-evaluation techniques by more than an
order of magnitude.Comment: 14 pages, SIGMO
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance
of graph data has driven the development of numerous new graph-parallel systems
(e.g., Pregel, GraphLab). By restricting the computation that can be expressed
and introducing new techniques to partition and distribute the graph, these
systems can efficiently execute iterative graph algorithms orders of magnitude
faster than more general data-parallel systems. However, the same restrictions
that enable the performance gains also make it difficult to express many of the
important stages in a typical graph-analytics pipeline: constructing the graph,
modifying its structure, or expressing computation that spans multiple graphs.
As a consequence, existing graph analytics pipelines compose graph-parallel and
data-parallel systems using external storage systems, leading to extensive data
movement and complicated programming model.
To address these challenges we introduce GraphX, a distributed graph
computation framework that unifies graph-parallel and data-parallel
computation. GraphX provides a small, core set of graph-parallel operators
expressive enough to implement the Pregel and PowerGraph abstractions, yet
simple enough to be cast in relational algebra. GraphX uses a collection of
query optimization techniques such as automatic join rewrites to efficiently
implement these graph-parallel operators. We evaluate GraphX on real-world
graphs and workloads and demonstrate that GraphX achieves comparable
performance as specialized graph computation systems, while outperforming them
in end-to-end graph pipelines. Moreover, GraphX achieves a balance between
expressiveness, performance, and ease of use
Recommended from our members
Incremental evolution strategy for function optimization
This paper presents a novel evolutionary approach for function optimization Incremental Evolution Strategy (IES). Two strategies are proposed. One is to evolve the input variables incrementally. The whole evolution consists of several phases and one more variable is focused in each phase. The number of phases is equal to the number of variables in maximum. Each phase is composed of two stages: in the single-variable evolution (SVE) stage, evolution is taken on one independent variable in a series of cutting planes; in the multi-variable evolving (MVE) stage, the initial population is formed by integrating the populations obtained by the SVE and the MVE in the last phase. And the evolution is taken on the incremented variable set. The other strategy is a hybrid of particle swarm optimization (PSO) and evolution strategy (ES). PSO is applied to adjust the cutting planes/hyper-planes (in SVEs/MVEs) while (1+1)-ES is applied to searching optima in the cutting planes/hyper-planes. The results of experiments show that the performance of IES is generally better than that of three other evolutionary algorithms, improved normal GA, PSO and SADE_CERAF, in the sense that IES finds solutions closer to the true optima and with more optimal objective values
- …