CCL: a portable and tunable collective communication library for scalable parallel computers

Author: Alex Ho
Ching-tien Ho
Jehoshua Bruck
Marc Snir
Pablo Elustondo
Robert Cypher
Shlomo Kipnis
Vasanth Bala
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1995

A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model

CiteSeerX

Caltech Authors

Acceleration computing process in wavelength scanning interferometry

Author: Gao F.
Jiang Xiang
Muhamedsalih Hussam
Publication date: 29/06/2011

Optical interferometry has been widely explored for surface measurement because it offers non-contact, high-accuracy interrogation. Some interferometers, such as white light interferometry and wavelength scanning interferometry (WSI), are used to measure both rough and smooth surfaces. WSI can measure large discontinuous surface profiles without phase ambiguity problems. However, WSI usually needs to capture hundreds of interferograms at different wavelengths in order to evaluate the surface finish of a sample. Evaluating this large amount of data takes a long processing time if traditional CPU programming is used. This paper presents a parallel programming model that exploits data parallelism to accelerate the analysis of the captured data. The parallel program is based on the CUDA C program structure developed by NVIDIA. Additionally, this paper explains the mathematical algorithm used for evaluating the surface profiles. The computing time and accuracy obtained from the CUDA program, using a GeForce GTX 280 graphics processing unit (GPU), were compared to those obtained from a sequentially executed Matlab program, using an Intel® Core™2 Duo CPU. The results of measuring a step height sample show that the parallel programming capability of the GPU can greatly accelerate floating point calculation throughput compared to a multicore CPU.

University of Huddersfield Repository

Genetic Programming Approach for Classification Problem using GPU

Author: Santoso Leo Willyanto
Publication venue: 'Institute of Advanced Engineering and Science'
Publication date: 01/10/2020

Genetic programming (GP) is a machine learning technique based on the evolution of computer programs using a genetic algorithm. Genetic programming has proven to be a good technique for solving data set classification problems, but at a high computational cost. The objective of this research is to accelerate the execution of the classification algorithms by proposing a general model for GPU execution of the adjustment function of the individuals in the population. The computation time of each phase of the evolutionary process and the operation of the GPU parallel programming model were studied. Genetic programming is attractive to parallelize because a population of individuals can be evolved in parallel.

Proceeding of the Electrical Engineering Computer Science and Informatics

Linda Implementations Using Monitors and Message Passing

Author: Leveton Alan L.
Publication venue: UNF Digital Commons
Publication date: 01/01/1990

Linda is a new parallel programming language that is built around an interprocess communication model called generative communication that differs from previous models in specifying that shared data be added in tuple form to an environment called tuple space, where a tuple exists independently until some process chooses to use it. Interesting properties arise from the model, including space and time uncoupling as well as structured naming. We delineate the essential Linda operations, then discuss the properties of generative communication. We are particularly concerned with implementing Linda on top of two traditional parallel programming paradigms - process communication through globally shared memory via monitors, and process communication in local memory architectures through the use of message passing constructs. We discuss monitors and message passing, then follow with a description of the two Linda implementations

UNF Digital Commons
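
The monitor-based variant mentioned in the Linda abstract above can be illustrated with a toy tuple space. This is only a minimal sketch and not the thesis's implementation: the class name `TupleSpace`, the method name `in_` (Python reserves `in`), and the use of `None` as a wildcard in patterns are all assumptions made for illustration. A single lock with a condition variable plays the role of the monitor.

```python
import threading


class TupleSpace:
    """Toy tuple space guarded by one monitor (a lock plus a condition variable)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    @staticmethod
    def _matches(pattern, tup):
        # Assumption of this sketch: None in a pattern matches any field value.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern):
        # Return the first stored tuple matching the pattern, or None.
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def out(self, *tup):
        """Add a tuple to the space and wake every waiting process."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def rd(self, *pattern):
        """Block until a matching tuple exists; return it without removing it."""
        with self._cond:
            while (tup := self._find(pattern)) is None:
                self._cond.wait()
            return tup

    def in_(self, *pattern):
        """Block until a matching tuple exists; remove it and return it."""
        with self._cond:
            while (tup := self._find(pattern)) is None:
                self._cond.wait()
            self._tuples.remove(tup)
            return tup
```

A producer calls `ts.out("job", 42)` and a consumer calls `ts.in_("job", None)`. Neither needs to know the other, and the tuple stays in the space until some process withdraws it, which is the space and time uncoupling the abstract refers to.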

An Abstract Description Method of Map-Reduce-Merge Using Haskell

Author: Dongqing Liu
Lei Liu
Peng Zhang
Shuai Lü
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2013

Map-Reduce-Merge is an improved parallel programming model based on Map-Reduce in a cloud computing environment. Through the new Merge module, Map-Reduce-Merge can process multiple related heterogeneous datasets more efficiently. In order to demonstrate the validity and effectiveness of this new model, we present a rigorous description of the Map-Reduce-Merge model using Haskell. Firstly, we describe the basic program skeleton of the Map-Reduce-Merge programming model. Secondly, we present an abstract description of the Merge module by analyzing its structure and function, with Haskell as the description tool. Thirdly, we evaluate the Map-Reduce-Merge model on the basis of our description. Our abstract description captures the functional characteristics of the Map-Reduce-Merge model, which can provide a theoretical basis for designing more efficient parallel programming models to process join operations.

Crossref

Directory of Open Access Journals
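
The Map-Reduce-Merge skeleton described in the abstract above can be sketched in a few lines. The paper itself uses Haskell; the Python below is only an illustrative approximation, and the function names `map_reduce` and `merge`, as well as the choice of an inner join on keys for the Merge step, are assumptions of this sketch rather than the authors' definitions.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one Map-Reduce pass over a single dataset.

    mapper(record) yields (key, value) pairs; reducer(key, values)
    folds all values that share a key into one result.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def merge(left, right, merger):
    """Join two reduced outputs on their shared keys (inner join, by assumption)."""
    return {
        key: merger(key, left[key], right[key])
        for key in left.keys() & right.keys()
    }


# Hypothetical data: two heterogeneous datasets that share a department key.
orders = [("toys", 10), ("books", 5), ("toys", 20)]
staff = [("toys", "ann"), ("toys", "bob"), ("music", "cy")]

totals = map_reduce(orders, lambda r: [(r[0], r[1])], lambda k, vs: sum(vs))
heads = map_reduce(staff, lambda r: [(r[0], 1)], lambda k, vs: sum(vs))
per_head = merge(totals, heads, lambda k, total, n: total / n)
```

Here each dataset goes through its own map and reduce phase, and only the Merge step relates the two, which is what lets the model express join-like operations over heterogeneous inputs.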

ParaMonte: An Efficient Serial/Parallel MCMC Library

Author: Bagheri Fatemeh
Osborne Joshua A
Sapkota Parvat
Shahmoradi Amir
Shashank Kumbhare
Publication date: 01/01/2021

Scientific inference is a multistep process requiring observational data from which a model/hypothesis is derived. The parameters of this physical model then have to be tuned to represent the data more accurately, in a process known as model calibration. The calibrated model is then validated and finally used to predict different quantities of interest. The most fundamental tool for model calibration and uncertainty quantification is Markov Chain Monte Carlo (MCMC). While existing packages achieve many of the goals of MCMC simulations, none currently addresses all critical aspects of an MCMC simulation. For instance, packages are frequently limited to only one programming language environment, perform serial or parallel simulations, or lack restart functionality. We present ParaMonte, a generic, user-friendly, high-performance Monte Carlo simulation toolbox for serial and parallel Monte Carlo simulations accessible from multiple programming languages. ParaMonte features automatically enabled restart functionality for all simulations in serial or parallel, and comprehensive post-processing and visualization of the simulation results. This package is available to the public under the MIT license from its permanent repository: https://github.com/cdslaborg/paramont

Texas ScholarWorks
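
For readers unfamiliar with the MCMC idea behind the ParaMonte entry above, the following is a minimal random-walk Metropolis sampler. It is a generic textbook sketch written against Python's standard library only; it does not use or imitate the ParaMonte API, and the function name `metropolis` and its parameters are hypothetical.

```python
import math
import random


def metropolis(log_density, x0, n_samples, step=1.0, seed=0):
    """Draw samples from an unnormalized 1-D density with random-walk Metropolis.

    log_density(x) returns the log of the target density up to a constant.
    """
    rng = random.Random(seed)
    x, logp = x0, log_density(x0)
    samples = []
    for _ in range(n_samples):
        # Propose a Gaussian step around the current state.
        proposal = x + rng.gauss(0.0, step)
        logp_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, logp_new - logp)):
            x, logp = proposal, logp_new
        # A rejected proposal repeats the current state in the chain.
        samples.append(x)
    return samples
```

Calling `metropolis(lambda x: -0.5 * x * x, 0.0, 20000)` samples a standard normal target. In a calibration setting, `log_density` would instead be the log-posterior of the model parameters given the observational data.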

Optimizing Parallel Reduction In Cuda To Reach GPU Peak Performance

Author: Hasta Deni Tri
Mahardito Adityo
Suhendra Adang

GPUs are massively multithreaded many-core chips that have hundreds of cores and thousands of concurrent threads, with high performance and memory bandwidth. Nowadays, GPUs are used not only for graphics processing but also to accelerate numerically intensive high-performance computing applications in parallel. This thesis aims primarily to demonstrate programming model approaches that can maximize the performance of GPUs. This is accomplished by showing that the maximum memory bandwidth is reached and that a speedup is obtained from the GPU used for the parallel computation. The programming environment used is NVIDIA's CUDA, a parallel architecture and model for general-purpose computing on a GPU.

Gunadarma University Repository
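
The abstract above does not spell out the reduction scheme it optimizes. A common starting point for parallel reduction on GPUs is a tree-based sum with sequential addressing, and the sketch below emulates that pattern serially in Python. It is an assumption-laden illustration, not the thesis's CUDA kernel: the name `tree_reduce` is hypothetical, and the inner loop stands in for threads that would run concurrently within one thread block.

```python
def tree_reduce(values):
    """Sum values with a tree-shaped, sequential-addressing reduction.

    Each pass halves the active range; on a GPU every iteration of the
    inner loop would be executed by a separate thread, with a barrier
    between passes.
    """
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty sequence")

    # Pad to a power of two with the identity element of addition.
    size = 1
    while size < len(data):
        size *= 2
    data += [0] * (size - len(data))

    stride = size // 2
    while stride > 0:
        for tid in range(stride):  # one simulated thread per element pair
            data[tid] += data[tid + stride]
        stride //= 2  # the next pass works on half as many elements
    return data[0]
```

With n elements the loop finishes in about log2(n) passes, which is why the pattern maps well onto thousands of concurrent threads. Real kernels then tune memory access patterns to approach the bandwidth limit the thesis targets.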

UPC++ v1.0 Programmer’s Guide, Revision 2020.3.0

Author: Bachan J
Baden S
Bonachea Dan
Grossman J
Hargrove P
Hofmeyr S
Jacquelin M
Kamil A
Van Straalen B
Publication venue: eScholarship, University of California
Publication date: 12/03/2020

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate constituent process having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the processes. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores

eScholarship - University of California