867 research outputs found

    Software Support for Irregular and Loosely Synchronous Problems

    Get PDF
    A large class of scientific and engineering applications may be classified as irregular and loosely synchronous from the perspective of parallel processing. We present a partial classification of such problems. This classification has motivated us to enhance Fortran D to provide language support for irregular, loosely synchronous problems. We present techniques for parallelization of such problems in the context of Fortran D

    Software Support for Irregular and Loosely Synchronous Problems

    Get PDF
    A large class of scientific and engineering applications may be classified as irregular and loosely synchronous from the perspective of parallel processing. We present a partial classification of such problems. This classification has motivated us to enhance Fortran D to provide language support for irregular, loosely synchronous problems. We present techniques for parallelization of such problems in the context of Fortran D

    Applications for Multicore System

    Get PDF
    A multi-core processor is a single computing unit with two or more processors (“cores”). These cores are integrated into a single IC for enhanced performance, reduced power consumption and more efficient simultaneous processing of multiple tasks. Homogeneous multi-core systems include only identical cores, whereas heterogeneous multi-core systems have cores that are not identical. Most of the computers and workstations these days have multicore processors. However most software programs are not designed to make use of multi-core processors and hence even though we run these programs on the new machines equipped with multicore processors, we don’t see sizable improvements in application performance. The idea behind improved performance is in parallelizing the code and distributing the work amongst multiple cores, but writing programming logic to achieve this is complex. The conventional model of lock-based parallelism for writing such programs is difficult in use, error-prone and does not always lead to efficient use of the resources but with the help of OpenMP, programmers have enhanced support for parallel programming. In this work I have implemented quicksort algorithm using OpenMP library and analysed the performance in terms of execution time

    Parallelization of Plane Sweep Based Voronoi Construction with Compiler Directives

    Get PDF
    Voronoi diagram construction is a common and fundamental problem in computational geometry and spatial computing. Numerous sequential and parallel algorithms for Voronoi diagram construction exists in literature. This paper presents a multi-threaded approach where we augment an existing sequential implementation of Fortune\u27s planesweep algorithm with compiler directives. The novelty of our fine-grained parallel algorithm lies in exploiting the concurrency available at each event point encountered during the algorithm. On the Intel Xeon E5 CPU, our shared-memory parallelization with OpenMP achieves around 2x speedup compared to the sequential implementation using datasets containing 2k-128k sites

    High Performance Fortran and Possible Extensions to support Conjugate Gradient Algorithms

    Get PDF
    We evaluate the High-Performance Fortran (HPF) language for the compact expression and efficient implementation of conjugate gradient iterative matrix-solvers on High Performance Computing and Communications(HPCC) platforms. We discuss the use of intrinsic functions, data distribution directives and explicitly parallel constructs to optimize performance by minimizing communications requirements in a portable manner. We focus on implementations using the existing HPF definitions but also discuss issues arising that may influence a revised definition for HPF-2. Some of the codes discussed are available on the World Wide Web at http://www.npac.syr.edu/hpfa/ alongwith other educational and discussion material related to applications in HPF

    Compilation techniques for irregular problems on parallel machines

    Get PDF
    Massively parallel computers have ushered in the era of teraflop computing. Even though large and powerful machines are being built, they are used by only a fraction of the computing community. The fundamental reason for this situation is that parallel machines are difficult to program. Development of compilers that automatically parallelize programs will greatly increase the use of these machines.;A large class of scientific problems can be categorized as irregular computations. In this class of computation, the data access patterns are known only at runtime, creating significant difficulties for a parallelizing compiler to generate efficient parallel codes. Some compilers with very limited abilities to parallelize simple irregular computations exist, but the methods used by these compilers fail for any non-trivial applications code.;This research presents development of compiler transformation techniques that can be used to effectively parallelize an important class of irregular programs. A central aim of these transformation techniques is to generate codes that aggressively prefetch data. Program slicing methods are used as a part of the code generation process. In this approach, a program written in a data-parallel language, such as HPF, is transformed so that it can be executed on a distributed memory machine. An efficient compiler runtime support system has been developed that performs data movement and software caching

    Efficient Machine-Independent Programming of High-Performance Multiprocessors

    Get PDF
    Parallel computing is regarded by most computer scientists as the most likely approach for significantly improving computing power for scientists and engineers. Advances in programming languages and parallelizing compilers are making parallel computers easier to use by providing a high-level portable programming model that protects software investment. However, experience has shown that simply finding parallelism is not always sufficient for obtaining good performance from today's multiprocessors. The goal of this project is to develop advanced compiler analysis of data and computation decompositions, thread placement, communication, synchronization, and memory system effects needed in order to take advantage of performance-critical elements in modern parallel architectures

    Run-time parallelization and scheduling of loops

    Get PDF
    The class of problems that can be effectively compiled by parallelizing compilers is discussed. This is accomplished with the doconsider construct which would allow these compilers to parallelize many problems in which substantial loop-level parallelism is available but cannot be detected by standard compile-time analysis. We describe and experimentally analyze mechanisms used to parallelize the work required for these types of loops. In each of these methods, a new loop structure is produced by modifying the loop to be parallelized. We also present the rules by which these loop transformations may be automated in order that they be included in language compilers. The main application area of the research involves problems in scientific computations and engineering. The workload used in our experiment includes a mixture of real problems as well as synthetically generated inputs. From our extensive tests on the Encore Multimax/320, we have reached the conclusion that for the types of workloads we have investigated, self-execution almost always performs better than pre-scheduling. Further, the improvement in performance that accrues as a result of global topological sorting of indices as opposed to the less expensive local sorting, is not very significant in the case of self-execution
    corecore