14,580 research outputs found

    Parallelization of Modular Algorithms

    Get PDF
    In this paper we investigate the parallelization of two modular algorithms. In fact, we consider the modular computation of Gr\"obner bases (resp. standard bases) and the modular computation of the associated primes of a zero-dimensional ideal and describe their parallel implementation in SINGULAR. Our modular algorithms to solve problems over Q mainly consist of three parts, solving the problem modulo p for several primes p, lifting the result to Q by applying Chinese remainder resp. rational reconstruction, and a part of verification. Arnold proved using the Hilbert function that the verification part in the modular algorithm to compute Gr\"obner bases can be simplified for homogeneous ideals (cf. \cite{A03}). The idea of the proof could easily be adapted to the local case, i.e. for local orderings and not necessarily homogeneous ideals, using the Hilbert-Samuel function (cf. \cite{Pf07}). In this paper we prove the corresponding theorem for non-homogeneous ideals in case of a global ordering.Comment: 16 page

    An Algorithm for Dynamic Load Balancing of Synchronous Monte Carlo Simulations on Multiprocessor Systems

    Full text link
    We describe an algorithm for dynamic load balancing of geometrically parallelized synchronous Monte Carlo simulations of physical models. This algorithm is designed for a (heterogeneous) multiprocessor system of the MIMD type with distributed memory. The algorithm is based on a dynamic partitioning of the domain of the algorithm, taking into account the actual processor resources of the various processors of the multiprocessor system.Comment: 12 pages, uuencoded figures included, 75.93.0

    Parallel Implementation of the PHOENIX Generalized Stellar Atmosphere Program. II: Wavelength Parallelization

    Get PDF
    We describe an important addition to the parallel implementation of our generalized NLTE stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000--300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard MPI library calls and is fully portable between serial and parallel computers.Comment: AAS-TeX, 15 pages, full text with figures available at ftp://calvin.physast.uga.edu/pub/preprints/Wavelength-Parallel.ps.gz ApJ, in pres

    Towards an Achievable Performance for the Loop Nests

    Full text link
    Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture, including exposing parallelism. Finding and evaluating an optimal, semantic-preserving sequence of transformations is a complex problem. The sequence is guided using heuristics and/or analytical models and there is no way of knowing how close it gets to optimal performance or if there is any headroom for improvement. This paper makes two contributions. First, it uses a comparative analysis of loop optimizations/transformations across multiple compilers to determine how much headroom may exist for each compiler. And second, it presents an approach to characterize the loop nests based on their hardware performance counter values and a Machine Learning approach that predicts which compiler will generate the fastest code for a loop nest. The prediction is made for both auto-vectorized, serial compilation and for auto-parallelization. The results show that the headroom for state-of-the-art compilers ranges from 1.10x to 1.42x for the serial code and from 1.30x to 1.71x for the auto-parallelized code. These results are based on the Machine Learning predictions.Comment: Accepted at the 31st International Workshop on Languages and Compilers for Parallel Computing (LCPC 2018

    A parallel Heap-Cell Method for Eikonal equations

    Full text link
    Numerous applications of Eikonal equations prompted the development of many efficient numerical algorithms. The Heap-Cell Method (HCM) is a recent serial two-scale technique that has been shown to have advantages over other serial state-of-the-art solvers for a wide range of problems. This paper presents a parallelization of HCM for a shared memory architecture. The numerical experiments in R3R^3 show that the parallel HCM exhibits good algorithmic behavior and scales well, resulting in a very fast and practical solver. We further explore the influence on performance and scaling of data precision, early termination criteria, and the hardware architecture. A shorter version of this manuscript (omitting these more detailed tests) has been submitted to SIAM Journal on Scientific Computing in 2012.Comment: (a minor update to address the reviewers' comments) 31 pages; 15 figures; this is an expanded version of a paper accepted by SIAM Journal on Scientific Computin

    A review of parallel computing for large-scale remote sensing image mosaicking

    Get PDF
    Interest in image mosaicking has been spurred by a wide variety of research and management needs. However, for large-scale applications, remote sensing image mosaicking usually requires significant computational capabilities. Several studies have attempted to apply parallel computing to improve image mosaicking algorithms and to speed up calculation process. The state of the art of this field has not yet been summarized, which is, however, essential for a better understanding and for further research of image mosaicking parallelism on a large scale. This paper provides a perspective on the current state of image mosaicking parallelization for large scale applications. We firstly introduce the motivation of image mosaicking parallel for large scale application, and analyze the difficulty and problem of parallel image mosaicking at large scale such as scheduling with huge number of dependent tasks, programming with multiple-step procedure, dealing with frequent I/O operation. Then we summarize the existing studies of parallel computing in image mosaicking for large scale applications with respect to problem decomposition and parallel strategy, parallel architecture, task schedule strategy and implementation of image mosaicking parallelization. Finally, the key problems and future potential research directions for image mosaicking are addressed
    • …
    corecore