9,113 research outputs found
A Compiler and Runtime Infrastructure for Automatic Program Distribution
This paper presents the design and the implementation of a compiler and runtime infrastructure for automatic program distribution. We are building a research infrastructure that enables experimentation with various program partitioning and mapping strategies and the study of automatic distribution's effect on resource consumption (e.g., CPU, memory, communication). Since many optimization techniques are faced with conflicting optimization targets (e.g., memory and communication), we believe that it is important to be able to study their interaction.
We present a set of techniques that enable flexible resource modeling and program distribution. These are: dependence analysis, weighted graph partitioning, code and communication generation, and profiling. We have developed these ideas in the context of the Java language. We present in detail the design and implementation of each of the techniques as part of our compiler and runtime infrastructure. Then, we evaluate our design and present preliminary experimental data for each component, as well as for the entire system
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
From social networks to language modeling, the growing scale and importance
of graph data has driven the development of numerous new graph-parallel systems
(e.g., Pregel, GraphLab). By restricting the computation that can be expressed
and introducing new techniques to partition and distribute the graph, these
systems can efficiently execute iterative graph algorithms orders of magnitude
faster than more general data-parallel systems. However, the same restrictions
that enable the performance gains also make it difficult to express many of the
important stages in a typical graph-analytics pipeline: constructing the graph,
modifying its structure, or expressing computation that spans multiple graphs.
As a consequence, existing graph analytics pipelines compose graph-parallel and
data-parallel systems using external storage systems, leading to extensive data
movement and complicated programming model.
To address these challenges we introduce GraphX, a distributed graph
computation framework that unifies graph-parallel and data-parallel
computation. GraphX provides a small, core set of graph-parallel operators
expressive enough to implement the Pregel and PowerGraph abstractions, yet
simple enough to be cast in relational algebra. GraphX uses a collection of
query optimization techniques such as automatic join rewrites to efficiently
implement these graph-parallel operators. We evaluate GraphX on real-world
graphs and workloads and demonstrate that GraphX achieves comparable
performance as specialized graph computation systems, while outperforming them
in end-to-end graph pipelines. Moreover, GraphX achieves a balance between
expressiveness, performance, and ease of use
Group Communication Patterns for High Performance Computing in Scala
We developed a Functional object-oriented Parallel framework (FooPar) for
high-level high-performance computing in Scala. Central to this framework are
Distributed Memory Parallel Data structures (DPDs), i.e., collections of data
distributed in a shared nothing system together with parallel operations on
these data. In this paper, we first present FooPar's architecture and the idea
of DPDs and group communications. Then, we show how DPDs can be implemented
elegantly and efficiently in Scala based on the Traversable/Builder pattern,
unifying Functional and Object-Oriented Programming. We prove the correctness
and safety of one communication algorithm and show how specification testing
(via ScalaCheck) can be used to bridge the gap between proof and
implementation. Furthermore, we show that the group communication operations of
FooPar outperform those of the MPJ Express open source MPI-bindings for Java,
both asymptotically and empirically. FooPar has already been shown to be
capable of achieving close-to-optimal performance for dense matrix-matrix
multiplication via JNI. In this article, we present results on a parallel
implementation of the Floyd-Warshall algorithm in FooPar, achieving more than
94 % efficiency compared to the serial version on a cluster using 100 cores for
matrices of dimension 38000 x 38000
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited
A Test Suite for High-Performance Parallel Java
The Java programming language has a number of features that make it attractive for writing high-quality, portable parallel programs. A pure object formulation, strong typing and the exception model make programs easier to create, debug, and maintain. The elegant threading provides a simple route to parallelism on shared-memory machines. Anticipating great improvements in numerical performance, this paper presents a suite of simple programs that indicate how a pure Java Navier-Stokes solver might perform. The suite includes a parallel Euler solver. We present results from a 32-processor Hewlett-Packard machine and a 4-processor Sun server. While speedup is excellent on both machines, indicating a high-quality thread scheduler, the single-processor performance needs much improvement
- âŠ