Parallelization of Modular Algorithms
In this paper we investigate the parallelization of two modular algorithms.
In fact, we consider the modular computation of Gröbner bases (resp. standard
bases) and the modular computation of the associated primes of a
zero-dimensional ideal and describe their parallel implementation in SINGULAR.
Our modular algorithms for solving problems over Q consist of three main parts:
solving the problem modulo p for several primes p, lifting the result to Q by
Chinese remaindering and rational reconstruction, and verifying the result.
Arnold proved, using the Hilbert function, that the verification
part of the modular algorithm for computing Gröbner bases can be simplified for
homogeneous ideals (cf. \cite{A03}). The idea of the proof could easily be
adapted to the local case, i.e. for local orderings and not necessarily
homogeneous ideals, using the Hilbert-Samuel function (cf. \cite{Pf07}). In
this paper we prove the corresponding theorem for non-homogeneous ideals in
the case of a global ordering.
Comment: 16 page
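The lifting step described in this abstract (Chinese remaindering followed by rational reconstruction) can be illustrated on a single rational number. The following is a minimal Python sketch, not the SINGULAR implementation; the function names `mod_image`, `crt`, and `rational_reconstruction` are hypothetical, and a real Gröbner basis computation would apply the same idea coefficient by coefficient before the verification step.

```python
from fractions import Fraction
from math import gcd, isqrt


def mod_image(q, p):
    """Image of the rational q in Z/p (assumes p does not divide the denominator)."""
    return (q.numerator * pow(q.denominator, -1, p)) % p


def crt(residues, moduli):
    """Combine x = r_i (mod m_i), pairwise coprime m_i, into x modulo the product."""
    x, m = 0, 1
    for r, p in zip(residues, moduli):
        # Choose t so that x + m*t = r (mod p); pow(m, -1, p) inverts m modulo p.
        t = ((r - x) * pow(m, -1, p)) % p
        x += m * t
        m *= p
    return x % m, m


def rational_reconstruction(a, m):
    """Return n/d with n = a*d (mod m) and |n|, |d| <= sqrt(m/2), or None."""
    bound = isqrt(m // 2)
    r0, r1 = m, a % m
    t0, t1 = 0, 1
    # Extended Euclidean algorithm, stopped at the first small remainder.
    while r1 > bound:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        t0, t1 = t1, t0 - q * t1
    if t1 == 0 or abs(t1) > bound or gcd(r1, abs(t1)) != 1:
        return None
    return Fraction(r1, t1)


# "Solve modulo p" for several primes, then lift the result to Q.
target = Fraction(-3, 7)          # the unknown answer over Q (illustration only)
primes = [101, 103, 107]
images = [mod_image(target, p) for p in primes]
combined, modulus = crt(images, primes)
lifted = rational_reconstruction(combined, modulus)
```

The modular images are independent of each other, which is what makes the first part of such algorithms a natural candidate for parallel execution. The lifted value is only a candidate until it is verified.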
An Algorithm for Dynamic Load Balancing of Synchronous Monte Carlo Simulations on Multiprocessor Systems
We describe an algorithm for dynamic load balancing of geometrically
parallelized synchronous Monte Carlo simulations of physical models. This
algorithm is designed for a (heterogeneous) multiprocessor system of the MIMD
type with distributed memory. The algorithm is based on a dynamic partitioning
of the domain of the algorithm, taking into account the actual processor
resources of the various processors of the multiprocessor system.
Comment: 12 pages, uuencoded figures included, 75.93.0
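The idea of repartitioning the domain according to the resources each processor actually delivers can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the algorithm from the paper: the domain is assumed to be a strip of columns, each processor's speed is assumed to have been measured in the previous step, and the names `partition_domain` and `column_ranges` are hypothetical.

```python
def partition_domain(n_columns, speeds):
    """Split n_columns among processors in proportion to their measured speeds."""
    total = sum(speeds)
    ideal = [n_columns * s / total for s in speeds]
    counts = [int(x) for x in ideal]          # round every share down
    leftover = n_columns - sum(counts)
    # Give the remaining columns to the processors with the largest fractional parts.
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def column_ranges(counts):
    """Turn per-processor column counts into half-open (start, stop) ranges."""
    ranges, start = [], 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges


# Hypothetical measurement: the third processor is twice as fast as the others.
counts = partition_domain(100, [1.0, 1.0, 2.0])
ranges = column_ranges(counts)
```

In a dynamic scheme, the speeds would be re-measured periodically (for example, columns updated per second of wall-clock time) and the partition recomputed, so that a synchronous step is not held up by the slowest processor.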
Parallel Implementation of the PHOENIX Generalized Stellar Atmosphere Program. II: Wavelength Parallelization
We describe an important addition to the parallel implementation of our
generalized NLTE stellar atmosphere and radiative transfer computer program
PHOENIX. In a previous paper in this series we described data and task parallel
algorithms we have developed for radiative transfer, spectral line opacity, and
NLTE opacity and rate calculations. These algorithms divided the work spatially
or by spectral lines, that is, by distributing the radial zones, individual
spectral lines, or characteristic rays among different processors, and
employed, in addition, task parallelism for logically independent functions
(such as atomic and molecular line opacities). For finite, monotonic velocity fields,
the radiative transfer equation is an initial value problem in wavelength, and
hence each wavelength point depends upon the previous one. However, for
sophisticated NLTE models of both static and moving atmospheres needed to
accurately describe, e.g., novae and supernovae, the number of wavelength
points is very large (200,000--300,000) and hence parallelization over
wavelength can lead both to considerable speedup in calculation time and the
ability to make use of the aggregate memory available on massively parallel
supercomputers. Here, we describe an implementation of a pipelined design for
the wavelength parallelization of PHOENIX, where the necessary data from the
processor working on a previous wavelength point is sent to the processor
working on the succeeding wavelength point as soon as it is known. Our
implementation uses a MIMD design based on a relatively small number of
standard MPI library calls and is fully portable between serial and parallel
computers.
Comment: AAS-TeX, 15 pages, full text with figures available at
ftp://calvin.physast.uga.edu/pub/preprints/Wavelength-Parallel.ps.gz ApJ, in
pres
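The pipelined design described in this abstract can be sketched in a few lines. The following Python sketch is an illustration under stated assumptions, not PHOENIX code: threads and queues stand in for MPI processes and messages, wavelength points are assigned to workers round-robin, and `run_pipeline`, `local_work`, and `combine` are hypothetical names.

```python
import queue
import threading


def run_pipeline(n_points, n_workers, local_work, combine, initial):
    """Process wavelength points 0..n_points-1, each depending on the previous one."""
    inbox = [queue.Queue() for _ in range(n_workers)]
    results = [None] * n_points
    inbox[0].put(initial)                      # boundary data for the first point

    def worker(rank):
        # Worker `rank` owns points rank, rank + n_workers, rank + 2*n_workers, ...
        for k in range(rank, n_points, n_workers):
            local = local_work(k)              # work independent of the previous point
            prev = inbox[rank].get()           # wait for data from point k - 1
            results[k] = combine(prev, local)
            # Forward the data to the owner of point k + 1 as soon as it is known.
            inbox[(rank + 1) % n_workers].put(results[k])

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Toy stand-ins: the "solution" at each point is a running sum.
sums = run_pipeline(5, 3, local_work=lambda k: k + 1,
                    combine=lambda prev, local: prev + local, initial=0)
```

The speedup in such a design comes from the part of the work at each wavelength point that does not depend on the previous point, since the workers can overlap that part while waiting for their predecessors.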
Towards an Achievable Performance for the Loop Nests
Numerous code optimization techniques, including loop nest optimizations,
have been developed over the last four decades. Loop optimization techniques
transform loop nests to improve the performance of the code on a target
architecture, including exposing parallelism. Finding and evaluating an
optimal, semantic-preserving sequence of transformations is a complex problem.
The sequence is guided using heuristics and/or analytical models and there is
no way of knowing how close it gets to optimal performance or if there is any
headroom for improvement. This paper makes two contributions. First, it uses a
comparative analysis of loop optimizations/transformations across multiple
compilers to determine how much headroom may exist for each compiler. Second,
it presents an approach to characterize the loop nests based on their
hardware performance counter values and a Machine Learning approach that
predicts which compiler will generate the fastest code for a loop nest. The
prediction is made for both auto-vectorized, serial compilation and for
auto-parallelization. The results show that the headroom for state-of-the-art
compilers ranges from 1.10x to 1.42x for the serial code and from 1.30x to
1.71x for the auto-parallelized code. These results are based on the Machine
Learning predictions.
Comment: Accepted at the 31st International Workshop on Languages and
Compilers for Parallel Computing (LCPC 2018
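One way to make the notion of headroom concrete is to compare each compiler's runtime on a loop nest with the best runtime any compiler achieved on that nest. The Python sketch below assumes this definition and aggregates the per-loop ratios with a geometric mean; both choices are assumptions for illustration and may differ from the paper's methodology. The timings and compiler names are made up.

```python
from math import prod


def headroom(times):
    """Map each compiler to the geometric mean of (its time / best time) per loop nest.

    `times` maps a compiler name to a list of runtimes, one per loop nest,
    with all lists in the same loop-nest order.
    """
    n_loops = len(next(iter(times.values())))
    # Best runtime achieved by any compiler on each loop nest.
    best = [min(t[i] for t in times.values()) for i in range(n_loops)]
    result = {}
    for name, t in times.items():
        ratios = [t[i] / best[i] for i in range(n_loops)]
        result[name] = prod(ratios) ** (1.0 / n_loops)
    return result


# Made-up runtimes in seconds for two hypothetical compilers on two loop nests.
toy_times = {"compiler_a": [2.0, 4.0], "compiler_b": [1.0, 8.0]}
toy_headroom = headroom(toy_times)
```

A headroom of 1.0 under this definition means the compiler already produced the fastest code for every loop nest; larger values indicate how much faster the code could be if the best compiler were chosen per loop nest.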
A parallel Heap-Cell Method for Eikonal equations
Numerous applications of Eikonal equations prompted the development of many
efficient numerical algorithms. The Heap-Cell Method (HCM) is a recent serial
two-scale technique that has been shown to have advantages over other serial
state-of-the-art solvers for a wide range of problems. This paper presents a
parallelization of HCM for a shared memory architecture. The numerical
experiments show that the parallel HCM exhibits good algorithmic
behavior and scales well, resulting in a very fast and practical solver.
We further explore the influence on performance and scaling of data
precision, early termination criteria, and the hardware architecture. A shorter
version of this manuscript (omitting these more detailed tests) has been
submitted to SIAM Journal on Scientific Computing in 2012.
Comment: (a minor update to address the reviewers' comments) 31 pages; 15
figures; this is an expanded version of a paper accepted by SIAM Journal on
Scientific Computin
A review of parallel computing for large-scale remote sensing image mosaicking
Interest in image mosaicking has been spurred by a wide variety of research and management needs. For large-scale applications, however, remote sensing image mosaicking usually requires significant computational capabilities. Several studies have applied parallel computing to improve image mosaicking algorithms and to speed up the computation. The state of the art in this field has not yet been summarized, although such a summary is essential for a better understanding of, and further research on, large-scale image mosaicking parallelism. This paper provides a perspective on the current state of image mosaicking parallelization for large-scale applications. We first introduce the motivation for parallel image mosaicking in large-scale applications and analyze its difficulties, such as scheduling a huge number of dependent tasks, programming multi-step procedures, and dealing with frequent I/O operations. We then summarize the existing studies of parallel computing in image mosaicking for large-scale applications with respect to problem decomposition and parallel strategy, parallel architecture, task scheduling strategy, and implementation. Finally, we address the key problems and potential future research directions for image mosaicking.