CRBLASTER: A Parallel-Processing Computational Framework for Embarrassingly-Parallel Image-Analysis Algorithms
The development of parallel-processing image-analysis codes is generally a
challenging task that requires complicated choreography of interprocessor
communications. If, however, the image-analysis algorithm is embarrassingly
parallel, then the development of a parallel-processing implementation of that
algorithm can be a much easier task to accomplish because, by definition, there
is little need for communication between the compute processes. I describe the
design, implementation, and performance of a parallel-processing image-analysis
application, called CRBLASTER, which does cosmic-ray rejection of CCD
(charge-coupled device) images using the embarrassingly-parallel L.A.COSMIC
algorithm. CRBLASTER is written in C using the high-performance computing
industry standard Message Passing Interface (MPI) library. The code has been
designed to be used by research scientists who are familiar with C as a
parallel-processing computational framework that enables the easy development
of parallel-processing image-analysis programs based on embarrassingly-parallel
algorithms. The CRBLASTER source code is freely available at the official
application website at the National Optical Astronomy Observatory. Removing
cosmic rays from a single 800x800 pixel Hubble Space Telescope WFPC2 image
takes 44 seconds with the IRAF script lacos_im.cl running on a single core of
an Apple Mac Pro computer with two 2.8-GHz quad-core Intel Xeon processors.
CRBLASTER is 7.4 times faster processing the same image on a single core on the
same machine. Processing the same image with CRBLASTER simultaneously on all 8
cores of the same machine takes 0.875 seconds, which is a speedup factor of
50.3 over the IRAF script. A detailed analysis is presented of the
performance of CRBLASTER using between 1 and 57 processors on a low-power
Tilera 700-MHz 64-core TILE64 processor.
Comment: 8 pages, 2 figures, 1 table, accepted for publication in PAS
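The decomposition that makes codes like CRBLASTER embarrassingly parallel, splitting an image into sub-images that are processed independently and then reassembled, can be sketched as follows. This is a hypothetical Python stand-in for the C/MPI implementation, and `clean_tile` applies a trivial median filter rather than the L.A.COSMIC algorithm:

```python
# Sketch of an embarrassingly-parallel image-processing decomposition:
# split the image into horizontal strips, process each strip independently
# (no interprocess communication), and stitch the results back together.
# Hypothetical stand-in for the C/MPI code; clean_tile is a toy 1-D median
# filter, not the L.A.COSMIC cosmic-ray rejection algorithm.
from concurrent.futures import ThreadPoolExecutor
from statistics import median

def clean_tile(tile):
    """Replace each pixel by the median of its row neighbourhood (toy filter)."""
    cleaned = []
    for row in tile:
        cleaned.append([
            median(row[max(0, j - 1): j + 2]) for j in range(len(row))
        ])
    return cleaned

def split_rows(image, n_tiles):
    """Partition the image into n_tiles horizontal strips."""
    step = (len(image) + n_tiles - 1) // n_tiles
    return [image[i:i + step] for i in range(0, len(image), step)]

def process_image(image, n_workers=4):
    tiles = split_rows(image, n_workers)
    # Each strip is independent, so a plain parallel map suffices.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        cleaned = list(pool.map(clean_tile, tiles))
    return [row for tile in cleaned for row in tile]  # reassemble

image = [[float((i * 7 + j * 3) % 10) for j in range(8)] for i in range(8)]
result = process_image(image)
```

Note that a real cosmic-ray rejector whose filter looks at 2-D neighbourhoods would need each strip to carry a few rows of overlap with its neighbours; the toy row-wise filter above sidesteps that.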
On Probabilistic Parallel Programs with Process Creation and Synchronisation
We initiate the study of probabilistic parallel programs with dynamic process
creation and synchronisation. To this end, we introduce probabilistic
split-join systems (pSJSs), a model for parallel programs, generalising both
probabilistic pushdown systems (a model for sequential probabilistic procedural
programs which is equivalent to recursive Markov chains) and stochastic
branching processes (a classical mathematical model with applications in
various areas such as biology, physics, and language processing). Our pSJS
model allows for a possibly recursive spawning of parallel processes; the
spawned processes can synchronise and return values. We study the basic
performance measures of pSJSs, especially the distribution and expectation of
space, work and time. Our results extend and improve previously known results
on the subsumed models. We also show how to do performance analysis in
practice, and present two case studies illustrating the modelling power of
pSJSs.
Comment: This is a technical report accompanying a TACAS'11 paper
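As a toy illustration of the kind of performance measure studied for such models (not the paper's actual analysis), the expected total work of a subcritical branching process, in which each process spawns on average m < 1 children, is the geometric series 1/(1 - m). A short Python check, with made-up function names:

```python
# Toy illustration (not the pSJS analysis from the paper): in a subcritical
# branching process where each process spawns on average m < 1 children,
# the expected total number of processes ever created (the "work"),
# starting from one root, is sum_{k>=0} m^k = 1 / (1 - m).

def expected_work(m, generations=200):
    """Expected population summed over generations (truncated series)."""
    total, gen_mean = 0.0, 1.0   # generation 0 holds the single root process
    for _ in range(generations):
        total += gen_mean
        gen_mean *= m            # expected size of the next generation
    return total

def expected_work_closed_form(m):
    return 1.0 / (1.0 - m)

# With m = 0.5 the truncated series converges to the closed form 2.0.
work = expected_work(0.5)
```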
Recommended from our members
Improving parallel program performance using critical path analysis
A programming tool that performs analysis of critical paths for parallel programs has been developed. This tool determines the critical path for the program as scheduled onto a parallel computer with P processing elements, the critical path for the program expressed as a data flow graph (when maximal parallelism can be expressed), and the minimum number of processing elements (P_opt) needed to obtain maximum program speedup. Experiments were performed using several versions of a Gaussian elimination program to examine how speedup varied with changes in granularity and critical path length. These experiments showed that when the available number of processing elements P < P_opt, increasing granularity improved program speedup more than reducing (the data flow graph's) critical path length, whereas when P ≥ P_opt, increasing granularity degraded program speedup while reducing critical path length improved program speedup.
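The quantities involved can be illustrated with a generic longest-path computation over a task data-flow graph (a sketch, not the tool described above; task names and durations are made up):

```python
# Generic critical-path computation for a task DAG (illustrative only).
# Each task has an execution time; edges are data dependencies. The
# critical path is the longest weighted path through the graph, which
# lower-bounds parallel execution time on any number of processors.
from graphlib import TopologicalSorter

def critical_path(times, deps):
    """times: {task: duration}; deps: {task: set of prerequisite tasks}."""
    finish = {}  # earliest possible finish time of each task
    for task in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps.get(task, ())), default=0.0)
        finish[task] = start + times[task]
    return max(finish.values())

# Hypothetical four-task dependence structure.
times = {"a": 2.0, "b": 1.0, "c": 3.0, "d": 1.0}
deps = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
span = critical_path(times, deps)   # longest path a -> c -> d = 2 + 3 + 1
total_work = sum(times.values())    # serial time on one processor
```

The ratio total_work / span bounds the achievable speedup, and comparing the two is one way to see whether adding processors beyond some P_opt can still pay off.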
PETRI NET BASED MODELING OF PARALLEL PROGRAMS EXECUTING ON DISTRIBUTED MEMORY MULTIPROCESSOR SYSTEMS
The development of parallel programs following the paradigm of communicating sequential processes to be executed on distributed memory multiprocessor systems is addressed. The key issue in programming parallel machines today is to provide computerized tools supporting the development of efficient parallel software, i.e. software effectively harnessing the power of parallel processing systems. The critical situations where a parallel programmer needs help are in expressing a parallel algorithm in a programming language, in getting a parallel program to work, and in tuning it to get optimum performance (for example, speedup).
We show that the Petri net formalism is highly suitable as a performance modeling technique for asynchronous parallel systems, by introducing a model that captures the influences of the parallel program, the parallel architecture, and the mapping on overall system performance. PRM-net (Program-Resource-Mapping) models combine a Petri net model of the multiple flows of control in a parallel program, a Petri net model of the parallel hardware, and the process-to-processor mapping information into a single integrated performance model. Automated analysis of PRM-net models addresses correctness and performance of parallel programs mapped to parallel hardware. Questions about the correctness of parallel programs can be answered by investigating behavioural properties of Petri net programs such as liveness, reachability, boundedness, mutual exclusion, etc. Performance of parallel programs is usefully considered only in connection with a dedicated target hardware. For this reason it is essential to integrate multiprocessor hardware characteristics into the specification of a parallel program. The integration is done by assigning the concurrent processes to physical processing devices and the communication patterns among parallel processes to the communication media connecting the processing elements, yielding an integrated, Petri net based performance model. Evaluation of the integrated model applies simulation and Markovian analysis to derive expressions characterising the performance of the program being developed.
Synthesis and decomposition rules for hierarchical models naturally give rise to using PRM-net models for graphical, performance-oriented parallel programming, supporting top-down (stepwise refinement) as well as bottom-up development approaches. The graphical representation of Petri net programs visualizes phenomena like parallelism, synchronisation, communication, and sequential and alternative execution. Modularity of program blocks aids reusability; prototyping is promoted by automated code generation on the basis of high-level program specifications.
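The place/transition semantics underlying such models can be sketched with a minimal token-game interpreter (a generic Petri net, not a PRM-net; PRM-nets layer timing, hardware resources, and mapping information on top of this):

```python
# Minimal place/transition Petri net interpreter (illustrative only).
# A transition is a pair (input places, output places). It is enabled when
# every input place holds at least one token; firing consumes one token per
# input place and produces one token per output place.

def enabled(marking, transition):
    inputs, _ = transition
    return all(marking.get(p, 0) >= 1 for p in inputs)

def fire(marking, transition):
    inputs, outputs = transition
    new = dict(marking)
    for p in inputs:
        new[p] -= 1
    for p in outputs:
        new[p] = new.get(p, 0) + 1
    return new

# Hypothetical net: a fork transition splits one flow of control into two
# parallel branches; a join transition synchronises them again.
fork = (["start"], ["left", "right"])
join = (["left", "right"], ["done"])

m0 = {"start": 1}
m1 = fire(m0, fork)   # both branches now hold a token
m2 = fire(m1, join)   # the join needs a token from each branch
```

Behavioural questions like the reachability and boundedness mentioned above amount to exploring the markings generated by repeated `fire` steps.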
The Paradigm Compiler: mapping a functional language for the Connection Machine
The Paradigm Compiler implements a new approach to compiling programs written in high level languages for execution on highly parallel computers. The general approach is to identify the principal data structures constructed by the program and to map these structures onto the processing elements of the target machine. The mapping is chosen to maximize performance as determined through compile time global analysis of the source program. The source language is Sisal, a functional language designed for scientific computations, and the target language is Paris, the published low level interface to the Connection Machine. The data structures considered are multidimensional arrays whose dimensions are known at compile time. Computations that build such arrays usually offer opportunities for highly parallel execution; they are data parallel. The Connection Machine is an attractive target for these computations, and the parallel for construct of the Sisal language is a convenient high level notation for data parallel algorithms. The principles and organization of the Paradigm Compiler are discussed.
Three-dimensional boundary layer analysis program Blay and its application
The boundary layer calculation program (BLAY) is a program code which accurately analyzes the three-dimensional boundary layer of a wing with an undefined plane. In comparison with other preexisting programs, BLAY is characterized by the following: (1) the time required for computation is shorter than for any of the others; (2) the program is adaptable to a parallel processing computer; and (3) the program has second-order accuracy in the z-direction. As a boundary layer modification to transonic nonviscous flow analysis programs, it is used to treat viscous and nonviscous interference problems iteratively. Its efficiency is an important factor in cost reduction in aircraft design.
Towards Parallel Programming Models for Predictability
Future embedded systems for performance-demanding applications will be massively parallel. High performance tasks will be parallel programs, running on several cores, rather than single threads running on single cores. For hard real-time applications, WCETs for such tasks must be bounded. Low-level parallel programming models, based on concurrent threads, are notoriously hard to use due to their inherent nondeterminism. Therefore the parallel processing community has long considered high-level parallel programming models, which restrict the low-level models to regain determinism. In this position paper we argue that such parallel programming models are beneficial also for WCET analysis of parallel programs. We review some proposed models, and discuss their influence on timing predictability. In particular we identify data parallel programming as a suitable paradigm as it is deterministic and allows current methods for WCET analysis to be extended to parallel code. GPUs are increasingly used for high performance applications: we discuss a current GPU architecture, and we argue that it offers a parallel platform for compute-intensive applications for which it seems possible to construct precise timing models. Thus, a promising route for the future is to develop WCET analyses for data-parallel software running on GPUs.
PyPele Rewritten To Use MPI
A computer program known as PyPele, originally written as a Python-language extension module of a C++ language program, has been rewritten in pure Python. The original version of PyPele dispatches and coordinates parallel-processing tasks on cluster computers and provides a conceptual framework for spacecraft-mission-design and -analysis software tools to run in an embarrassingly parallel mode. The original version of PyPele uses SSH (Secure Shell, a set of standards and an associated network protocol for establishing a secure channel between a local and a remote computer) to coordinate parallel processing. Instead of SSH, the present Python version of PyPele uses the Message Passing Interface (MPI) [an unofficial de-facto standard language-independent application programming interface for message passing on a parallel computer] while keeping the same user interface. The use of MPI instead of SSH and the preservation of the original PyPele user interface make it possible for parallel application programs written previously for the original version of PyPele to run on MPI-based cluster computers. As a result, engineers using the previously written application programs can take advantage of embarrassing parallelism without needing to rewrite those programs.
Parallel programming in biomedical signal processing
Dissertation submitted for the degree of Master in Biomedical Engineering.
Patients with neuromuscular and cardiorespiratory diseases need to be monitored continuously. This constant monitoring gives rise to huge amounts of multivariate data which need to be processed as soon as possible, so that their most relevant features can be extracted.
The field of parallel processing, an area of the computational sciences, comes naturally as a way to provide an answer to this problem. For parallel processing to succeed it is necessary to adapt the pre-existing signal processing algorithms to the modern architectures of computer systems with several processing units.
In this work parallel processing techniques are applied to biosignals, connecting the area of computer science to the biomedical domain. Several considerations are made on how to design parallel algorithms for signal processing, following the data parallel paradigm. The emphasis is given to algorithm design, rather than the computing systems that execute these algorithms. Nonetheless, shared memory systems and distributed memory systems are mentioned in the present work.
Two signal processing tools integrating some of the parallel programming concepts mentioned throughout this work were developed. These tools allow a fast and efficient analysis of long-term biosignals. The two kinds of analysis focus on heart rate variability and breathing frequency, and aim at the processing of electrocardiograms and respiratory signals, respectively.
The proposed tools make use of the several processing units that most modern computers include in their architecture, giving the clinician a fast tool without the need to set up a system specifically meant to run parallel programs.
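The data-parallel style described above can be illustrated with a generic sketch (not the dissertation's tools; the SDNN measure over RR intervals is a standard heart-rate-variability statistic, but the function names and chunking scheme here are made up):

```python
# Data-parallel sketch of a heart-rate-variability measure (illustrative,
# not the dissertation's tools). A long record of RR intervals (seconds
# between successive heartbeats) is split into chunks; per-chunk partial
# sums are computed independently in parallel and then combined into the
# global SDNN (standard deviation of the RR intervals).
from concurrent.futures import ThreadPoolExecutor
from math import sqrt

def partial_sums(chunk):
    """Independent per-chunk statistics: count, sum, sum of squares."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)

def sdnn(rr_intervals, n_workers=4):
    step = max(1, len(rr_intervals) // n_workers)
    chunks = [rr_intervals[i:i + step]
              for i in range(0, len(rr_intervals), step)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(partial_sums, chunks)   # data-parallel phase
    n = s = ss = 0.0
    for cn, cs, css in parts:                    # cheap sequential reduce
        n, s, ss = n + cn, s + cs, ss + css
    mean = s / n
    return sqrt(ss / n - mean * mean)  # population standard deviation

rr = [0.8, 0.82, 0.79, 0.85, 0.81, 0.83, 0.8, 0.84] * 100  # synthetic data
hrv = sdnn(rr)
```

Because each chunk contributes only count/sum/sum-of-squares, the per-chunk work dominates and the final reduction stays negligible, which is what makes long-term recordings amenable to this split.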