
    Safe Concurrency Introduction through Slicing

    Traditional refactoring modifies the structure of existing code without changing its behaviour, with the aim of making the code easier to understand, modify, or reuse. In this paper, we introduce three novel refactorings for retrofitting concurrency to Erlang applications, and demonstrate how the use of program slicing makes the automation of these refactorings possible.
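
    As a rough illustration of the idea (a Python sketch, not the paper's Erlang refactorings): when slicing shows that two computations share no data dependences, they form independent slices that a behaviour-preserving refactoring can run concurrently.

        # Sketch only: two computations with disjoint data dependences form
        # independent slices, so a refactoring may run them concurrently
        # without changing the observable result.
        from concurrent.futures import ThreadPoolExecutor

        def stats(xs, ys):
            # Slice 1 depends only on xs; slice 2 depends only on ys.
            sx = sum(x * x for x in xs)
            sy = sum(y * y for y in ys)
            return sx + sy

        def stats_concurrent(xs, ys):
            # Behaviour-preserving refactoring: the slices run concurrently.
            with ThreadPoolExecutor() as pool:
                fx = pool.submit(lambda: sum(x * x for x in xs))
                fy = pool.submit(lambda: sum(y * y for y in ys))
                return fx.result() + fy.result()

        assert stats([1, 2], [3, 4]) == stats_concurrent([1, 2], [3, 4])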

    Accelerating NBODY6 with Graphics Processing Units

    We describe the use of Graphics Processing Units (GPUs) for speeding up the code NBODY6, which is widely used for direct N-body simulations. Over the years, the N^2 nature of the direct force calculation has proved a barrier to extending the particle number. Following an early introduction of force polynomials and individual time-steps, the calculation cost was first reduced by the introduction of a neighbour scheme. After a decade of GRAPE computers, which speeded up the force calculation further, we are now in the era of GPUs, where relatively small hardware systems are highly cost-effective. A significant gain in efficiency is achieved by employing the GPU to obtain the so-called regular force, which typically involves some 99 percent of the particles, while the remaining local forces are evaluated on the host. However, the latter operation is performed up to 20 times more frequently and may still account for a significant cost. This effort is reduced by parallel SSE/AVX procedures where each interaction term is calculated using mainly single precision. We also discuss further strategies connected with the coordinate and velocity prediction required by the integration scheme. This leaves hard binaries and multiple close encounters, which are treated by several regularization methods. The present NBODY6-GPU code is well balanced for simulations in the particle range 10^4 to 2x10^5 on a dual-GPU system attached to a standard PC. (8 pages, 3 figures, 2 tables; accepted by MNRAS)
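
    A toy sketch of the regular/irregular split described above, using NumPy and an invented neighbour radius rather than anything from NBODY6: distant "regular" interactions dominate the O(N^2) cost and suit the GPU, while the nearby "irregular" neighbours must be recomputed far more often on the host.

        # Toy regular/irregular force split (hypothetical parameters, not the
        # NBODY6 implementation). G = 1, softened point-mass forces.
        import numpy as np

        def split_forces(pos, mass, r_neigh, eps=1e-4):
            dr = pos[None, :, :] - pos[:, None, :]       # pairwise separations
            r2 = (dr ** 2).sum(-1) + eps ** 2            # softened distances
            np.fill_diagonal(r2, np.inf)                 # no self-interaction
            f = mass[None, :, None] * dr / r2[:, :, None] ** 1.5
            near = r2 < r_neigh ** 2                     # neighbour mask
            f_irr = np.where(near[:, :, None], f, 0.0).sum(1)  # host, small steps
            f_reg = np.where(near[:, :, None], 0.0, f).sum(1)  # GPU, large steps
            return f_reg, f_irr

        pos = np.random.rand(100, 3)
        f_reg, f_irr = split_forces(pos, np.ones(100), r_neigh=0.1)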

    The PARSE Programming Paradigm. Part I: Software Development Methodology. Part II: Software Development Support Tools

    The programming methodology of PARSE (parallel software environment), a software environment being developed for reconfigurable non-shared memory parallel computers, is described. This environment will consist of an integrated collection of language interfaces, automatic and semi-automatic debugging and analysis tools, and an operating system, all of which are made more flexible by the use of a knowledge-based implementation for the tools that make up PARSE. The programming paradigm supports the user freely choosing among three basic approaches/abstractions for programming a parallel machine: logic-based descriptive, sequential-control procedural, and parallel-control procedural programming. All of these result in efficient parallel execution. The current work discusses the methodology underlying PARSE, whereas the companion paper, "The PARSE Programming Paradigm. Part II: Software Development Support Tools," details each of the component tools.
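
    A loose Python analogy (not PARSE itself, whose target is reconfigurable non-shared-memory machines) of how the same computation looks under the three abstractions:

        # The same computation under three programming abstractions.
        from multiprocessing import Pool

        data = [1, 2, 3, 4]

        # 1. Logic-based / descriptive: declare what the result is.
        squares_descriptive = [x * x for x in data]

        # 2. Sequential-control procedural: spell out the order of operations.
        squares_sequential = []
        for x in data:
            squares_sequential.append(x * x)

        # 3. Parallel-control procedural: state explicitly what runs in parallel.
        def square(x):
            return x * x

        if __name__ == "__main__":
            with Pool() as pool:
                squares_parallel = pool.map(square, data)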

    Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

    NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, how theory relates to practice, and measured performance. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for the space station, EOS, and the Great Observatories era.
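
    A small sketch of why lattice problems suit a massively parallel SIMD array such as the MPP: every site applies the same update in lockstep. Here this is written as whole-array NumPy operations over a hypothetical 2-D relaxation problem; on the MPP, each processing element would hold one site and shift data to its neighbours.

        # Hypothetical 2-D Jacobi relaxation in data-parallel (SIMD-like) style.
        import numpy as np

        grid = np.zeros((128, 128))
        grid[0, :] = 1.0                  # fixed boundary condition

        for _ in range(100):
            # One lockstep update of all interior sites at once.
            grid[1:-1, 1:-1] = 0.25 * (grid[:-2, 1:-1] + grid[2:, 1:-1] +
                                       grid[1:-1, :-2] + grid[1:-1, 2:])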

    A Reverse Engineering Methodology for Extracting Parallelism From Design Abstractions.

    Migration of code from sequential environments to parallel processing environments is often done in an ad hoc manner. The purpose of this research is to develop a reverse engineering methodology to facilitate systematic migration of code from sequential to parallel processing environments. The research results include the development of a three-phase methodology and the design and development of a reverse engineering toolkit (abbreviated as RETK) which serves to establish a working model for the methodology. The methodology consists of three phases: Analysis, Synthesis, and Transformation. The Analysis phase uses concepts from reverse engineering research to recover the sequential design description from programs using a new design recovery technique. The Synthesis phase is comprised of processes that compute the data and control dependences by using the design abstractions produced by the Analysis phase to construct the program dependence graph. The Transformation phase consists of processes that require knowledge-based analysis of the program and dependence information produced by the Analysis and Synthesis phases, respectively. Design recommendations for parallel environments are the key output of the Transformation phase. The main components of RETK are an Information Extractor, a Dependence Analyzer, and a Design Assistant that implement the processes of the Analysis, Synthesis, and Transformation phases, respectively. The object-oriented design and implementation of the Information Extractor and Dependence Analyzer are described. The design and implementation of the Design Assistant using the C Language Interface Production System (CLIPS) are described. In addition, experimental results of applying the methodology to test programs by RETK are presented. The results include analysis of a Numerical Aerodynamic Simulation (NAS) benchmark program. By uniquely combining research in reverse engineering, dependence analysis, and knowledge-based analysis, the methodology provides a systematic approach for code migration. The benefits of using the methodology are increased comprehensibility and improved efficiency in migrating sequential systems to parallel environments.
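
    A minimal sketch of the Synthesis phase's core idea, with invented statements and def/use sets rather than RETK's actual representation: compute flow dependences from what each statement defines and uses, and treat statements with no path between them in the resulting dependence graph as candidates for parallel execution.

        # Flow (data) dependence: a later statement uses a variable that an
        # earlier statement defines. Statements left unordered by the graph
        # are candidates for parallel execution.
        defs = {"s1": {"a"}, "s2": {"b"}, "s3": {"c"}, "s4": {"d"}}
        uses = {"s1": set(), "s2": set(), "s3": {"a"}, "s4": {"b"}}
        order = ["s1", "s2", "s3", "s4"]

        deps = {(p, q)
                for i, p in enumerate(order)
                for q in order[i + 1:]
                if defs[p] & uses[q]}

        print(deps)  # {('s1', 's3'), ('s2', 's4')}: s3 and s4 may run in parallel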

    Master of Science

    The advent of the era of cheap and pervasive many-core and multicore parallel systems has highlighted the disparity in performance achieved between novice and expert developers targeting parallel architectures. This disparity is most noticeable with software for running general-purpose computations on graphics processing units (GPGPU programs). Current methods for implementing GPGPU programs require an expert-level understanding of the memory hierarchy and execution model of the hardware to reach peak performance. Even for experts, rewriting a program to exploit these hardware features can be tedious and error-prone. Compilers and their ability to make code transformations can assist in the implementation of GPGPU programs, handling many of the target-specific details. This thesis presents CUDA-CHiLL, a source-to-source compiler transformation and code generation framework for the parallelization and optimization of computations expressed in sequential loop nests for running on many-core GPUs. This system uniquely uses a complete scripting language to describe composable compiler transformations that can be written, shared, and reused by nonexpert application and library developers. CUDA-CHiLL is built on the polyhedral program transformation and code generation framework CHiLL, which is capable of robust composition of transformations while preserving the correctness of the program at each step. Through its use of powerful abstractions and a scripting interface, CUDA-CHiLL allows a developer to focus on optimization strategies and ignore the error-prone details and low-level constructs of GPGPU programming. The high-level framework can be used inside an orthogonal auto-tuning system that can quickly evaluate the space of possible implementations. Although specific to CUDA at the moment, many of the abstractions would hold for any GPGPU framework, particularly OpenCL. The contributions of this thesis include a programming-language approach to providing transformation abstraction and composition, a unifying framework for general and GPU-specific transformations, and a demonstration of the framework on standard benchmarks showing it capable of matching or outperforming hand-tuned GPU kernels.
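
    An illustrative analogy, in plain Python rather than a CHiLL transformation script, of one composable transformation such a recipe might apply (tiling); the real framework generates CUDA, where tiles map to thread blocks and the elements within a tile map to threads.

        # Tiling a loop: same computation, restructured to expose a two-level
        # (block/thread-like) parallel structure. Names and sizes are invented.
        N, TILE = 16, 4
        a = list(range(N))
        out = [0] * N

        # Original sequential loop.
        for i in range(N):
            out[i] = a[i] * 2

        # After tiling: the outer loop enumerates tiles (-> CUDA blocks), the
        # inner loop enumerates elements within a tile (-> CUDA threads).
        for ti in range(0, N, TILE):
            for i in range(ti, min(ti + TILE, N)):
                out[i] = a[i] * 2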

    Functional design for operational earth resources ground data processing

    The author has identified the following significant results. Study emphasis was on developing a unified concept for the required ground system, capable of handling data from all viable acquisition platforms and sensor groupings envisaged as supporting operational earth survey programs. The platforms considered include both manned and unmanned spacecraft in near-Earth orbit, and continued use of low and high altitude aircraft. The sensor systems include both imaging and nonimaging devices, operated both passively and actively, from the ultraviolet to the microwave regions of the electromagnetic spectrum.