Performance analysis of asynchronous parallel Jacobi
The directed acyclic graph (DAG) associated with a parallel algorithm captures the partial order in which separate local computations are completed and how their outputs are subsequently used in further computations. Unlike in a synchronous parallel algorithm, the DAG associated with an asynchronous parallel algorithm is not predetermined. Instead, it is a product of the asynchronous timing dynamics of the machine and cannot be known in advance; as such, it is best thought of as a pseudorandom variable. In this paper, we present a formalism for analyzing the performance of asynchronous parallel Jacobi's method in terms of its DAG. We use this approach to prove error bounds and bounds on the rate of convergence. The rate-of-convergence bound is based on the statistical properties of the DAG and is valid for systems with a non-negative iteration matrix. We support our theoretical results with a suite of numerical examples, in which we compare the performance of synchronous and asynchronous parallel Jacobi against certain statistical properties of the DAGs associated with the computations. We also present examples of small matrices with elements of mixed sign, which demonstrate that determining whether a system will converge under asynchronous iteration in this more general setting is a far more difficult problem.
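The contrast the abstract draws can be illustrated with a small numerical sketch. The following is a toy model only, not the paper's formalism: the "asynchronous" run updates one randomly chosen component at a time from a stale snapshot of the iterate, standing in for the machine-dependent DAG; the system, update rule, and staleness schedule are all illustrative assumptions.

```python
import numpy as np

# Toy comparison of synchronous vs. asynchronous-style Jacobi on a small
# strictly diagonally dominant system Ax = b (so both variants converge).
rng = np.random.default_rng(0)
n = 6
A = np.eye(n) * 4 + rng.uniform(0, 0.3, (n, n))  # strictly diagonally dominant
b = rng.uniform(-1, 1, n)
D = np.diag(A)
x_true = np.linalg.solve(A, b)

def jacobi_update(i, snapshot):
    # Jacobi update of component i from a (possibly stale) snapshot of x:
    # x_i <- (b_i - sum_{j != i} A_ij * snapshot_j) / A_ii
    return (b[i] - A[i] @ snapshot + A[i, i] * snapshot[i]) / D[i]

# Synchronous Jacobi: every component reads the same iterate each sweep.
x = np.zeros(n)
for _ in range(50):
    x = np.array([jacobi_update(i, x) for i in range(n)])

# Asynchronous-style Jacobi: components update one at a time in a random
# order, reading snapshots of bounded staleness (a crude stand-in for the
# pseudorandom DAG induced by machine timing).
y = np.zeros(n)
history = [y.copy()]
for _ in range(50 * n):
    i = rng.integers(n)                                       # next component to update
    stale = history[-1 - rng.integers(min(3, len(history)))]  # bounded-staleness read
    y[i] = jacobi_update(i, stale)
    history.append(y.copy())

print(np.linalg.norm(x - x_true), np.linalg.norm(y - x_true))
```

For a diagonally dominant (hence non-negative-spectral-radius-contractive) system both runs converge; the interesting cases the abstract points to, mixed-sign matrices under general asynchronous orderings, are exactly where such toy schedules can fail to converge.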
The finite element machine: An experiment in parallel processing
The finite element machine is a prototype computer designed to support parallel solutions to structural analysis problems. The hardware architecture and support software for the machine, initial solution algorithms and test applications, and preliminary results are described.
Convergence-Optimal Quantizer Design of Distributed Contraction-based Iterative Algorithms with Quantized Message Passing
In this paper, we study the convergence behavior of distributed iterative
algorithms with quantized message passing. We first introduce general iterative
function evaluation algorithms for solving fixed point problems distributively.
We then analyze the convergence of the distributed algorithms, e.g. Jacobi
scheme and Gauss-Seidel scheme, under the quantized message passing. Based on
the closed-form convergence performance derived, we propose two quantizer
designs, namely the time invariant convergence-optimal quantizer (TICOQ) and
the time varying convergence-optimal quantizer (TVCOQ), to minimize the effect
of the quantization error on the convergence. We also study the tradeoff
between the convergence error and message passing overhead for both TICOQ and
TVCOQ. As an example, we apply the TICOQ and TVCOQ designs to the iterative
waterfilling algorithm of the MIMO interference game. Comment: 17 pages, 9 figures, Transactions on Signal Processing, accepted.
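The core phenomenon this abstract studies can be sketched in a few lines. The example below is an illustrative assumption, not the paper's TICOQ/TVCOQ designs: a Jacobi fixed-point iteration in which every passed message goes through a uniform quantizer, showing that the quantization step controls the steady-state convergence error.

```python
import numpy as np

# Jacobi fixed-point iteration with quantized message passing: each iterate
# is rounded to a uniform grid before being "sent", and the grid spacing
# determines the achievable accuracy (the effect the quantizer designs in
# the paper are built to minimize).
rng = np.random.default_rng(1)
n = 5
A = np.eye(n) * 5 + rng.uniform(0, 0.4, (n, n))  # diagonally dominant -> contraction
b = rng.uniform(-1, 1, n)
x_true = np.linalg.solve(A, b)
D = np.diag(A)

def quantize(v, step):
    # Uniform mid-tread quantizer with grid spacing `step`.
    return np.round(v / step) * step

def run(step, iters=200):
    x = np.zeros(n)
    for _ in range(iters):
        q = quantize(x, step)            # quantized message passing
        x = (b - A @ q + D * q) / D      # Jacobi sweep on the quantized iterate
    return np.linalg.norm(x - x_true)

coarse, fine = run(1e-2), run(1e-5)
print(coarse, fine)  # a finer quantizer leaves a smaller steady-state error
```

The residual error with the coarse quantizer does not vanish as iterations increase: it settles at a floor proportional to the quantization step, which is the convergence/overhead tradeoff the TICOQ and TVCOQ designs address.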
GHOST: Building blocks for high performance sparse linear algebra on heterogeneous systems
While many of the architectural details of future exascale-class high
performance computer systems are still a matter of intense research, there
appears to be a general consensus that they will be strongly heterogeneous,
featuring "standard" as well as "accelerated" resources. Today, such resources
are available as multicore processors, graphics processing units (GPUs), and
other accelerators such as the Intel Xeon Phi. Any software infrastructure that
claims usefulness for such environments must be able to meet their inherent
challenges: massive multi-level parallelism, topology, asynchronicity, and
abstraction. The "General, Hybrid, and Optimized Sparse Toolkit" (GHOST) is a
collection of building blocks that targets algorithms dealing with sparse
matrix representations on current and future large-scale systems. It implements
the "MPI+X" paradigm, has a pure C interface, and provides hybrid-parallel
numerical kernels, intelligent resource management, and truly heterogeneous
parallelism for multicore CPUs, Nvidia GPUs, and the Intel Xeon Phi. We
describe the details of its design with respect to the challenges posed by
modern heterogeneous supercomputers and recent algorithmic developments.
Implementation details which are indispensable for achieving high efficiency
are pointed out and their necessity is justified by performance measurements or
predictions based on performance models. The library code and several
applications are available as open source. We also provide instructions on how
to make use of GHOST in existing software packages, together with a case study
which demonstrates the applicability and performance of GHOST as a component
within a larger software stack. Comment: 32 pages, 11 figures.
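To make concrete what "building blocks for sparse linear algebra" means, here is a minimal sparse matrix-vector multiply (SpMV) over the CSR storage format, the kind of kernel a toolkit like GHOST optimizes and distributes across heterogeneous devices. This plain-NumPy sketch mirrors only the data layout; it is not GHOST's actual C interface, and the function name is ours.

```python
import numpy as np

def csr_spmv(vals, col_idx, row_ptr, x):
    """y = A @ x for a matrix A stored in CSR format.

    vals    -- nonzero values, row by row
    col_idx -- column index of each nonzero
    row_ptr -- row_ptr[i]:row_ptr[i+1] delimits row i's nonzeros
    """
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        lo, hi = row_ptr[i], row_ptr[i + 1]
        y[i] = vals[lo:hi] @ x[col_idx[lo:hi]]
    return y

# CSR encoding of the 3x3 matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]]:
vals    = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
x = np.array([1.0, 1.0, 1.0])
print(csr_spmv(vals, col_idx, row_ptr, x))  # [5. 3. 7.]
```

The per-row loop is what an "MPI+X" library parallelizes: rows are partitioned across processes and devices, and the indirect access `x[col_idx[...]]` is what makes memory layout and resource placement performance-critical on heterogeneous hardware.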