
    CCL: a portable and tunable collective communication library for scalable parallel computers

    A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. One such library, the Collective Communication Library (CCL), has been designed for IBM's line of scalable parallel computer products. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library, focusing on three novel aspects of the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model.
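
    To make the listed operations concrete, here is a minimal single-process sketch of what several of them compute over a process group. It is not CCL's interface: the function names, the representation of a group as a list of ranks, and the use of a dict from rank to local value are all assumptions made for illustration, and no real communication takes place.

```python
from functools import reduce as fold
from operator import add

# Hypothetical model (not the CCL API): a group is a list of ranks, and
# `data` maps each rank to the value that process holds locally.

def broadcast(group, root, data):
    """Every member of the group ends up with the root's value."""
    return {r: data[root] for r in group}

def reduce_to_root(group, root, data, op=add):
    """Fold all members' values with `op`; only the root holds the result."""
    total = fold(op, (data[r] for r in group))
    return {r: (total if r == root else None) for r in group}

def scatter(group, root, data):
    """The root holds one item per member; the i-th member receives item i."""
    return {r: data[root][i] for i, r in enumerate(group)}

def gather(group, root, data):
    """The root collects every member's value, in group order."""
    collected = [data[r] for r in group]
    return {r: (collected if r == root else None) for r in group}

def concatenate(group, data):
    """Like gather, but every member receives the full list."""
    collected = [data[r] for r in group]
    return {r: list(collected) for r in group}

def shift(group, data, step=1):
    """Cyclic shift: each value moves `step` positions along the group."""
    n = len(group)
    return {group[(i + step) % n]: data[r] for i, r in enumerate(group)}

# A group need not contain every process: here only ranks 0, 2 and 4 take part.
group = [0, 2, 4]
local = {0: 1, 2: 10, 4: 100}
print(broadcast(group, 2, local))
print(reduce_to_root(group, 0, local))
print(shift(group, local))
```

    Passing the group explicitly to every call mirrors the idea of process groups: a collective involves only the listed members, so disjoint groups could run collectives independently.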

    Acceleration computing process in wavelength scanning interferometry

    Optical interferometry has been widely explored for surface measurement because it offers non-contact, high-accuracy interrogation. Some interferometers, such as white light interferometry and wavelength scanning interferometry (WSI), can measure both rough and smooth surfaces. WSI can measure large discontinuous surface profiles without phase ambiguity problems. However, WSI usually needs to capture hundreds of interferograms at different wavelengths in order to evaluate the surface finish of a sample. Evaluating this large amount of data takes a long time with traditional CPU programming. This paper presents a parallel programming model that exploits data parallelism to accelerate the analysis of the captured data. The parallel program is based on the CUDA™ C program structure developed by NVIDIA. Additionally, this paper explains the mathematical algorithm used to evaluate the surface profiles. The computing time and accuracy obtained from the CUDA program, running on a GeForce GTX 280 graphics processing unit (GPU), were compared with those obtained from a sequentially executed Matlab program running on an Intel® Core™2 Duo CPU. The results of measuring a step height sample show that the parallel programming capability of the GPU can greatly increase floating-point calculation throughput compared with a multicore CPU.
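
    The abstract does not state the evaluation algorithm, so the following is only a sketch of why the problem is data parallel, under an assumed textbook model: at each pixel the intensity over the scan is a cosine whose fringe frequency is proportional to the surface height, and the height is read off the strongest Fourier bin. The function names, the signal model, and the parameter `d_sigma` (wavenumber step per frame) are assumptions, not the paper's method.

```python
import cmath
import math

def peak_bin(signal):
    """Index of the strongest non-DC frequency bin, using a plain DFT."""
    n = len(signal)

    def magnitude(m):
        return abs(sum(s * cmath.exp(-2j * math.pi * m * k / n)
                       for k, s in enumerate(signal)))

    return max(range(1, n // 2 + 1), key=magnitude)

def height_from_scan(signal, d_sigma):
    """Assumed model: I_k = a + b*cos(2*pi * 2*h * (sigma0 + k*d_sigma)).

    The fringe frequency is 2*h*d_sigma cycles per frame, so a peak at
    bin m of an N-frame scan gives h = m / (2 * N * d_sigma).
    """
    return peak_bin(signal) / (2 * len(signal) * d_sigma)

def height_map(scans, d_sigma):
    """Evaluate every pixel independently.

    No pixel depends on another, so on a GPU each iteration of this loop
    could be handled by its own thread.
    """
    return [height_from_scan(scan, d_sigma) for scan in scans]
```

    This sketch resolves height only to the nearest frequency bin; a practical evaluation would refine the estimate, for example from the phase slope.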

    Genetic Programming Approach for Classification Problem using GPU

    Genetic programming (GP) is a machine learning technique based on the evolution of computer programs using a genetic algorithm. Genetic programming has proven to be a good technique for solving data set classification problems, but at a high computational cost. The objective of this research is to accelerate the execution of classification algorithms by proposing a general model for executing the adjustment (fitness) function of the individuals of the population on a GPU. The computation times of each phase of the evolutionary process and the operation of the GPU parallel programming model were studied. Genetic programming is interesting to parallelize from the perspective of evolving a population of individuals in parallel.
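
    As a rough illustration of why fitness evaluation parallelizes well, the sketch below scores each individual of a population independently against a labelled data set. The tree encoding, the rule that a positive output means class 1, and all function names are assumptions for this example; a thread pool stands in for the GPU only to show that the evaluations do not depend on one another.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(tree, x):
    """Evaluate an expression tree on a feature vector x.

    Assumed encoding: ('x', i) reads feature i, ('const', c) is a constant,
    and (op, left, right) applies op, one of '+', '-', '*'.
    """
    tag = tree[0]
    if tag == 'x':
        return x[tree[1]]
    if tag == 'const':
        return tree[1]
    left, right = evaluate(tree[1], x), evaluate(tree[2], x)
    if tag == '+':
        return left + right
    if tag == '-':
        return left - right
    if tag == '*':
        return left * right
    raise ValueError(f"unknown node type: {tag!r}")

def fitness(tree, samples):
    """Fraction of samples classified correctly (output > 0 means class 1)."""
    hits = sum(1 for x, label in samples
               if (evaluate(tree, x) > 0) == (label == 1))
    return hits / len(samples)

def evaluate_population(population, samples, workers=4):
    """Score every individual; each score depends only on its own tree."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda tree: fitness(tree, samples), population))

samples = [((1.0, 2.0), 0), ((3.0, 1.0), 1), ((0.0, 5.0), 0), ((4.0, 4.0), 0)]
population = [('-', ('x', 0), ('x', 1)), ('const', 1)]
print(evaluate_population(population, samples))
```

    On a GPU the same independence can be exploited at two levels, across individuals and across samples, which is one reason the fitness phase is a natural target for acceleration.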

    Linda Implementations Using Monitors and Message Passing

    Linda is a new parallel programming language built around an interprocess communication model called generative communication. This model differs from previous ones in specifying that shared data be added in tuple form to an environment called tuple space, where a tuple exists independently until some process chooses to use it. Interesting properties arise from the model, including space and time uncoupling as well as structured naming. We delineate the essential Linda operations, then discuss the properties of generative communication. We are particularly concerned with implementing Linda on top of two traditional parallel programming paradigms: process communication through globally shared memory via monitors, and process communication in local memory architectures through message passing constructs. We discuss monitors and message passing, then follow with a description of the two Linda implementations.
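
    The monitor-based approach can be sketched with a single lock and condition variable guarding the tuple space. This is a minimal illustration, not the paper's implementation: the class name, the use of `None` as a wildcard in patterns, and the method name `in_` (since `in` is a Python keyword) are assumptions. `out`, `in`, and `rd` are the usual names for Linda's add, withdraw, and read operations.

```python
import threading

class TupleSpace:
    """A tuple space protected by one monitor (a lock plus a condition)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        """Add a tuple; it stays in the space until some process withdraws it."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()  # wake any process blocked in in_/rd

    @staticmethod
    def _matches(pattern, tup):
        # None in a pattern matches any value in that position (assumption).
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup))

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def _wait_for(self, pattern, remove):
        with self._cond:
            tup = self._find(pattern)
            while tup is None:
                self._cond.wait()  # releases the monitor while blocked
                tup = self._find(pattern)
            if remove:
                self._tuples.remove(tup)
            return tup

    def in_(self, *pattern):
        """Block until a matching tuple exists, then withdraw and return it."""
        return self._wait_for(pattern, remove=True)

    def rd(self, *pattern):
        """Block until a matching tuple exists, then return it without removing it."""
        return self._wait_for(pattern, remove=False)
```

    The producer never names the consumer and may finish before the consumer starts, which is the space and time uncoupling the abstract refers to.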

    An Abstract Description Method of Map-Reduce-Merge Using Haskell

    Map-Reduce-Merge is an improved parallel programming model based on Map-Reduce for cloud computing environments. Through the new Merge module, Map-Reduce-Merge can process multiple related heterogeneous datasets more efficiently. To demonstrate the validity and effectiveness of this new model, we present a rigorous description of the Map-Reduce-Merge model using Haskell. First, we describe the basic program skeleton of the Map-Reduce-Merge programming model. Second, we present an abstract description of the Merge module by analyzing its structure and function, with Haskell as the description tool. Third, we evaluate the Map-Reduce-Merge model on the basis of our description. Our abstract description captures the functional characteristics of the Map-Reduce-Merge model, which can provide a theoretical basis for designing more efficient parallel programming models for processing join operations.
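
    The shape of the model can be sketched in a few lines. The paper's description is in Haskell; Python is used here only for illustration, and the function names, the in-memory dictionaries, and the choice of an equi-join on matching keys as the merge strategy are all assumptions of this sketch.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Run one Map-Reduce pass over a dataset.

    mapper(record) yields (key, value) pairs; reducer(key, values) folds
    all values that share a key into one result.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

def merge(left, right, merger):
    """Combine two reduced outputs on equal keys (an equi-join)."""
    shared = sorted(left.keys() & right.keys())
    return [merger(key, left[key], right[key]) for key in shared]

# Two heterogeneous datasets that are related through the customer name.
orders = [("ann", 10), ("bob", 5), ("ann", 7)]
customers = [("ann", "EU"), ("bob", "US"), ("cy", "US")]

totals = map_reduce(orders, lambda r: [(r[0], r[1])], lambda k, vs: sum(vs))
regions = map_reduce(customers, lambda r: [(r[0], r[1])], lambda k, vs: vs[0])
joined = merge(totals, regions, lambda k, total, region: (k, region, total))
print(joined)
```

    Each dataset gets its own Map-Reduce pass with its own key and value types, and only the merge step needs to know how the two outputs relate.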

    Optimizing Parallel Reduction In Cuda To Reach GPU Peak Performance

    GPUs are massively multithreaded many-core chips that have hundreds of cores and thousands of concurrent threads, with high performance and memory bandwidth. Nowadays, GPUs are used not only for graphics processing but also to accelerate numerically intensive high-performance computing applications in parallel. This thesis aims primarily to demonstrate programming model approaches that can maximize the performance of GPUs. This is accomplished by showing that the maximum memory bandwidth can be reached and by measuring the speedup obtained from the GPU used for the parallel computation. The programming environment used is NVIDIA's CUDA, a parallel architecture and model for general-purpose computing on a GPU.
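
    The access pattern behind a tree-based parallel reduction can be simulated sequentially. The sketch below is not the thesis's kernel: the function name, the zero padding, and the choice of the sequential-addressing variant are assumptions, and the inner loop merely stands in for threads that would run concurrently on a GPU.

```python
def tree_reduce(values):
    """Sum `values` the way a block-level parallel reduction would.

    Returns (total, steps), where steps is the number of parallel rounds.
    """
    data = list(values)

    # Pad to a power of two with 0, the identity for addition.
    n = 1
    while n < len(data):
        n *= 2
    data += [0] * (n - len(data))

    stride = n // 2
    steps = 0
    while stride > 0:
        # On a GPU each `tid` is one thread. Within a round, thread tid
        # writes only data[tid] and reads only data[tid + stride], so no
        # two threads touch the same element. A barrier follows each round.
        for tid in range(stride):
            data[tid] += data[tid + stride]
        stride //= 2
        steps += 1
    return data[0], steps

print(tree_reduce(range(1, 11)))  # (55, 4)
```

    Ten inputs are padded to sixteen, so the sum completes in four rounds rather than nine sequential additions. Neighbouring threads also read neighbouring elements in every round, which is the kind of access pattern that helps a real kernel approach peak memory bandwidth.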