
    Safe Concurrency Introduction through Slicing

    Traditional refactoring modifies the structure of existing code without changing its behaviour, with the aim of making the code easier to understand, modify, or reuse. In this paper, we introduce three novel refactorings for retrofitting concurrency to Erlang applications, and demonstrate how the use of program slicing makes the automation of these refactorings possible.
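
    As a rough illustration of the idea (a Python sketch, not the paper's Erlang refactorings): when slicing shows that two computations share no data dependences, they form independent slices that a behaviour-preserving refactoring can run concurrently.

        # Sketch only: two computations with disjoint data dependences form
        # independent slices, so a refactoring may run them concurrently
        # without changing the observable result.
        from concurrent.futures import ThreadPoolExecutor

        def stats(xs, ys):
            # Slice 1 depends only on xs; slice 2 depends only on ys.
            sx = sum(x * x for x in xs)
            sy = sum(y * y for y in ys)
            return sx + sy

        def stats_concurrent(xs, ys):
            # Behaviour-preserving refactoring: the slices run concurrently.
            with ThreadPoolExecutor() as pool:
                fx = pool.submit(lambda: sum(x * x for x in xs))
                fy = pool.submit(lambda: sum(y * y for y in ys))
                return fx.result() + fy.result()

        assert stats([1, 2], [3, 4]) == stats_concurrent([1, 2], [3, 4])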

    Accelerating NBODY6 with Graphics Processing Units

    We describe the use of Graphics Processing Units (GPUs) for speeding up the code NBODY6, which is widely used for direct N-body simulations. Over the years, the N^2 nature of the direct force calculation has proved a barrier to extending the particle number. Following an early introduction of force polynomials and individual time-steps, the calculation cost was first reduced by the introduction of a neighbour scheme. After a decade of GRAPE computers, which speeded up the force calculation further, we are now in the era of GPUs, where relatively small hardware systems are highly cost-effective. A significant gain in efficiency is achieved by employing the GPU to obtain the so-called regular force, which typically involves some 99 percent of the particles, while the remaining local forces are evaluated on the host. However, the latter operation is performed up to 20 times more frequently and may still account for a significant cost. This effort is reduced by parallel SSE/AVX procedures where each interaction term is calculated using mainly single precision. We also discuss further strategies connected with the coordinate and velocity prediction required by the integration scheme. This leaves hard binaries and multiple close encounters, which are treated by several regularization methods. The present NBODY6-GPU code is well balanced for simulations in the particle range 10^4 to 2x10^5 on a dual-GPU system attached to a standard PC. (8 pages, 3 figures, 2 tables; accepted by MNRAS)
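
    A toy sketch of the regular/irregular split described above, using NumPy and an invented neighbour radius rather than anything from NBODY6: distant "regular" interactions dominate the O(N^2) cost and suit the GPU, while the nearby "irregular" neighbours must be recomputed far more often on the host.

        # Toy regular/irregular force split (hypothetical parameters, not the
        # NBODY6 implementation). G = 1, softened point-mass forces.
        import numpy as np

        def split_forces(pos, mass, r_neigh, eps=1e-4):
            dr = pos[None, :, :] - pos[:, None, :]       # pairwise separations
            r2 = (dr ** 2).sum(-1) + eps ** 2            # softened distances
            np.fill_diagonal(r2, np.inf)                 # no self-interaction
            f = mass[None, :, None] * dr / r2[:, :, None] ** 1.5
            near = r2 < r_neigh ** 2                     # neighbour mask
            f_irr = np.where(near[:, :, None], f, 0.0).sum(1)  # host, small steps
            f_reg = np.where(near[:, :, None], 0.0, f).sum(1)  # GPU, large steps
            return f_reg, f_irr

        pos = np.random.rand(100, 3)
        f_reg, f_irr = split_forces(pos, np.ones(100), r_neigh=0.1)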

    The PARSE Programming Paradigm. Part I: Software Development Methodology. Part II: Software Development Support Tools

    The programming methodology of PARSE (parallel software environment), a software environment being developed for reconfigurable non-shared memory parallel computers, is described. This environment will consist of an integrated collection of language interfaces, automatic and semi-automatic debugging and analysis tools, and an operating system, all of which are made more flexible by the use of a knowledge-based implementation for the tools that make up PARSE. The programming paradigm supports the user freely choosing among three basic approaches/abstractions for programming a parallel machine: logic-based descriptive, sequential-control procedural, and parallel-control procedural programming. All of these result in efficient parallel execution. The current work discusses the methodology underlying PARSE, whereas the companion paper, "The PARSE Programming Paradigm. Part II: Software Development Support Tools," details each of the component tools.
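
    A loose Python analogy (not PARSE itself, whose target is reconfigurable non-shared-memory machines) of how the same computation looks under the three abstractions:

        # The same computation under three programming abstractions.
        from multiprocessing import Pool

        data = [1, 2, 3, 4]

        # 1. Logic-based / descriptive: declare what the result is.
        squares_descriptive = [x * x for x in data]

        # 2. Sequential-control procedural: spell out the order of operations.
        squares_sequential = []
        for x in data:
            squares_sequential.append(x * x)

        # 3. Parallel-control procedural: state explicitly what runs in parallel.
        def square(x):
            return x * x

        if __name__ == "__main__":
            with Pool() as pool:
                squares_parallel = pool.map(square, data)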

    Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

    NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, how theory relates to practice, and measured performance. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for the space station, EOS, and the Great Observatories era.
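
    A small sketch of why lattice problems suit a massively parallel SIMD array such as the MPP: every site applies the same update in lockstep. Here this is written as whole-array NumPy operations over a hypothetical 2-D relaxation problem; on the MPP, each processing element would hold one site and shift data to its neighbours.

        # Hypothetical 2-D Jacobi relaxation in data-parallel (SIMD-like) style.
        import numpy as np

        grid = np.zeros((128, 128))
        grid[0, :] = 1.0                  # fixed boundary condition

        for _ in range(100):
            # One lockstep update of all interior sites at once.
            grid[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                                       grid[1:-1, :-2] + grid[1:-1, 2:])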

    A Reverse Engineering Methodology for Extracting Parallelism From Design Abstractions.

    Migration of code from sequential environments to parallel processing environments is often done in an ad hoc manner. The purpose of this research is to develop a reverse engineering methodology to facilitate systematic migration of code from sequential to parallel processing environments. The research results include the development of a three-phase methodology and the design and development of a reverse engineering toolkit (abbreviated as RETK) which serves to establish a working model for the methodology. The methodology consists of three phases: Analysis, Synthesis, and Transformation. The Analysis phase uses concepts from reverse engineering research to recover the sequential design description from programs using a new design recovery technique. The Synthesis phase is comprised of processes that compute the data and control dependences by using the design abstractions produced by the Analysis phase to construct the program dependence graph. The Transformation phase consists of processes that require knowledge-based analysis of the program and dependence information produced by the Analysis and Synthesis phases, respectively. Design recommendations for parallel environments are the key output of the Transformation phase. The main components of RETK are an Information Extractor, a Dependence Analyzer, and a Design Assistant that implement the processes of the Analysis, Synthesis, and Transformation phases, respectively. The object-oriented design and implementation of the Information Extractor and Dependence Analyzer are described. The design and implementation of the Design Assistant using the C Language Interface Production System (CLIPS) are described. In addition, experimental results of applying the methodology to test programs by RETK are presented. The results include analysis of a Numerical Aerodynamic Simulation (NAS) benchmark program. By uniquely combining research in reverse engineering, dependence analysis, and knowledge-based analysis, the methodology provides a systematic approach for code migration. The benefits of using the methodology are increased comprehensibility and improved efficiency in migrating sequential systems to parallel environments.
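
    A minimal sketch of the Synthesis phase's core idea, with invented statements and def/use sets rather than RETK's actual representation: compute flow dependences from what each statement defines and uses, and treat statements with no path between them in the resulting dependence graph as candidates for parallel execution.

        # Flow (data) dependence: a later statement uses a variable that an
        # earlier statement defines. Statements left unordered by the graph
        # are candidates for parallel execution.
        defs = {"s1": {"a"}, "s2": {"b"}, "s3": {"c"}, "s4": {"d"}}
        uses = {"s1": set(), "s2": set(), "s3": {"a"}, "s4": {"b"}}
        order = ["s1", "s2", "s3", "s4"]

        deps = {(p, q)
                for i, p in enumerate(order)
                for q in order[i + 1:]
                if defs[p] & uses[q]}

        print(deps)  # {('s1', 's3'), ('s2', 's4')}: s3 and s4 may run in parallel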

    Master of Science

    The advent of the era of cheap and pervasive many-core and multicore parallel systems has highlighted the disparity in performance achieved between novice and expert developers targeting parallel architectures. This disparity is most noticeable with software for running general-purpose computations on graphics processing units (GPGPU programs). Current methods for implementing GPGPU programs require an expert-level understanding of the memory hierarchy and execution model of the hardware to reach peak performance. Even for experts, rewriting a program to exploit these hardware features can be tedious and error-prone. Compilers and their ability to make code transformations can assist in the implementation of GPGPU programs, handling many of the target-specific details. This thesis presents CUDA-CHiLL, a source-to-source compiler transformation and code generation framework for the parallelization and optimization of computations expressed in sequential loop nests for running on many-core GPUs. This system uniquely uses a complete scripting language to describe composable compiler transformations that can be written, shared, and reused by nonexpert application and library developers. CUDA-CHiLL is built on the polyhedral program transformation and code generation framework CHiLL, which is capable of robust composition of transformations while preserving the correctness of the program at each step. Through its use of powerful abstractions and a scripting interface, CUDA-CHiLL allows a developer to focus on optimization strategies and ignore the error-prone details and low-level constructs of GPGPU programming. The high-level framework can be used inside an orthogonal auto-tuning system that can quickly evaluate the space of possible implementations. Although specific to CUDA at the moment, many of the abstractions would hold for any GPGPU framework, particularly OpenCL. The contributions of this thesis include a programming-language approach to providing transformation abstraction and composition, a unifying framework for general and GPU-specific transformations, and a demonstration of the framework on standard benchmarks showing it capable of matching or outperforming hand-tuned GPU kernels.
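
    An illustrative analogy, in plain Python rather than a CHiLL transformation script, of one composable transformation such a recipe might apply (tiling); the real framework generates CUDA, where tiles map to thread blocks and the elements within a tile map to threads.

        # Tiling a loop: same computation, restructured to expose a two-level
        # (block/thread-like) parallel structure. Names and sizes are invented.
        N, TILE = 16, 4
        a = list(range(N))
        out = [0] * N

        # Original sequential loop.
        for i in range(N):
            out[i] = a[i] * 2

        # After tiling: the outer loop enumerates tiles (-> CUDA blocks), the
        # inner loop enumerates elements within a tile (-> CUDA threads).
        for ti in range(0, N, TILE):
            for i in range(ti, min(ti + TILE, N)):
                out[i] = a[i] * 2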

    Functional design for operational earth resources ground data processing

    The author has identified the following significant results. Study emphasis was on developing a unified concept for the required ground system, capable of handling data from all viable acquisition platforms and sensor groupings envisaged as supporting operational earth survey programs. The platforms considered include both manned and unmanned spacecraft in near-Earth orbit, and continued use of low and high altitude aircraft. The sensor systems include both imaging and nonimaging devices, operated both passively and actively, from the ultraviolet to the microwave regions of the electromagnetic spectrum.