
    Safe Concurrency Introduction through Slicing

    Traditional refactoring modifies the structure of existing code without changing its behaviour, with the aim of making the code easier to understand, modify, or reuse. In this paper, we introduce three novel refactorings for retrofitting concurrency to Erlang applications, and demonstrate how program slicing makes it possible to automate these refactorings.

    Hierarchical Parallelisation of Functional Renormalisation Group Calculations -- hp-fRG

    The functional renormalisation group (fRG) has evolved into a versatile tool in condensed matter theory for studying important aspects of correlated electron systems. Practical applications of the method often involve a high numerical effort, motivating the question of to what extent High Performance Computing (HPC) can benefit the approach. In this work we report on a multi-level parallelisation of the underlying computational machinery and show that this can speed up the code by several orders of magnitude. This in turn can extend the applicability of the method to otherwise inaccessible cases. We exploit three levels of parallelisation: distributed computing by means of message passing (MPI), shared-memory computing using OpenMP, and vectorisation by means of SIMD (single-instruction-multiple-data) units. Results are provided for two distinct HPC platforms, namely the IBM-based BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster. We discuss how certain issues and obstacles were overcome in the course of adapting the code. Most importantly, we conclude that this vast improvement can be accomplished by introducing only moderate changes to the code, so that this strategy may serve as a guideline for other researchers to likewise improve the efficiency of their codes.
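
The three levels named in this abstract (MPI ranks, OpenMP threads, SIMD lanes) can be pictured as nested splits of one index range. The following is a minimal sketch of that bookkeeping only, not the authors' scheme: it performs no actual message passing, threading, or vectorisation, and the names `split_range`, `hierarchical_plan`, and all parameters are hypothetical.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open interval [lo, hi)


def split_range(start: int, stop: int, parts: int) -> List[Range]:
    """Split [start, stop) into `parts` contiguous, near-equal ranges."""
    base, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for p in range(parts):
        # The first `extra` parts receive one additional item.
        hi = lo + base + (1 if p < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def hierarchical_plan(
    n_items: int, n_ranks: int, n_threads: int, simd_width: int
) -> Dict[Tuple[int, int], List[Range]]:
    """Map (rank, thread) to the vector-sized chunks that worker would process."""
    plan = {}
    # Level 1: distribute items over (hypothetical) distributed-memory ranks.
    for rank, (r_lo, r_hi) in enumerate(split_range(0, n_items, n_ranks)):
        # Level 2: distribute each rank's share over shared-memory threads.
        for thread, (t_lo, t_hi) in enumerate(split_range(r_lo, r_hi, n_threads)):
            # Level 3: cut each thread's share into chunks of at most simd_width.
            plan[(rank, thread)] = [
                (c, min(c + simd_width, t_hi))
                for c in range(t_lo, t_hi, simd_width)
            ]
    return plan
```

For example, `hierarchical_plan(100, 4, 2, 8)` assigns 25 items to each of 4 ranks, splits those over 2 threads, and cuts each thread's share into chunks of at most 8 items.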

    On Designing Multicore-aware Simulators for Biological Systems

    The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It is often an enlightening technique, but it can be computationally expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions involving both the single simulation and a bulk of independent simulations (either replicas or derived from a parameter sweep). The proposed solutions are tested by parallelising the CWC (Calculus of Wrapped Compartments) simulator with the FastFlow programming framework, which enables fast development and efficient execution on multi-cores.
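
The "bulk of independent simulations" case can be illustrated with a small farm of seeded replicas. This is a sketch under stated assumptions, not the CWC simulator or the FastFlow API (FastFlow is a C++ framework): the toy decay reaction, the function names, and the parameters are all hypothetical, and Python's standard `concurrent.futures` stands in for the farm.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def simulate_decay(seed: int, n0: int = 100, rate: float = 1.0,
                   t_end: float = 1.0) -> int:
    """Gillespie-style run of the toy reaction A -> 0.

    Returns the number of molecules left at time t_end.
    """
    rng = random.Random(seed)  # private generator: replicas share no state
    t, n = 0.0, n0
    while n > 0:
        # Waiting time to the next reaction is exponential with rate * n.
        t += rng.expovariate(rate * n)
        if t > t_end:
            break
        n -= 1
    return n


def run_replicas(seeds: Iterable[int], workers: int = 4) -> List[int]:
    """Run one independent replica per seed; results keep the seed order."""
    # Threads keep the sketch simple. In CPython, a process pool would be
    # needed for a real speed-up of CPU-bound replicas.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(simulate_decay, seeds))
```

Because each replica owns its random generator, the parallel run reproduces the sequential results for the same seeds, which makes the farm easy to check.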

    Parallelisation of algorithms

    Most numerical software involves performing an extremely large volume of algebraic computations. This is both costly and time-consuming in terms of computer resources and, for large problems, supercomputer power is often required to obtain results in a reasonable amount of time. One way to reduce both the cost and the time is to apply the principle "Many hands make light work": several computers operate simultaneously on the code, working towards a common goal, and ideally obtain the required results in a fraction of the usual time and cost. This can be achieved by modifying the costly, time-consuming code, breaking it up into separate code segments which may be executed concurrently on different processors. This is termed parallelisation of code. This document describes communication between sequential processes, protocols, message routing and parallelisation of algorithms. In particular, it deals with these aspects with reference to the Transputer as developed by INMOS and includes two parallelisation examples, namely parallelisation of code to study airflow and of code to determine far-field patterns of antennas. This document also reports on practical experiences with programming in parallel.

    Viper: a visualisation tool for parallel program construction

    +133 pages; 24 cm

    The MOLDY short-range molecular dynamics package

    We describe a parallelised version of the MOLDY molecular dynamics program. This Fortran code is aimed at systems which may be described by short-range potentials and specifically those which may be addressed with the embedded atom method. This includes a wide range of transition metals and alloys. MOLDY provides a range of options in terms of the molecular dynamics ensemble used and the boundary conditions which may be applied. A number of standard potentials are provided, and the modular structure of the code allows new potentials to be added easily. The code is parallelised using OpenMP and can therefore be run on shared memory systems, including modern multicore processors. Particular attention is paid to the updates required in the main force loop, where synchronisation is often required in OpenMP implementations of molecular dynamics. We examine the performance of the parallel code in detail and give some examples of applications to realistic problems, including the dynamic compression of copper and carbon migration in an iron-carbon alloy.
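
The synchronisation issue in the force loop arises because applying Newton's third law means one pair (i, j) updates two entries of the force array, so workers handling different pairs can write to the same entry. One common remedy, sketched below, is to give each worker a private force array and sum the arrays afterwards. The abstract does not say which scheme MOLDY uses, so this is an illustration only: the 1D toy force, the cutoff value, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import List

CUTOFF = 1.5  # hypothetical interaction range of the toy potential


def pair_force(xi: float, xj: float) -> float:
    """Toy 1D repulsive force on particle i due to particle j."""
    d = xi - xj
    if d == 0.0 or abs(d) >= CUTOFF:
        return 0.0
    return d * (CUTOFF - abs(d))  # pushes i away from j, vanishes at the cutoff


def forces_serial(x: List[float]) -> List[float]:
    """Reference loop: each pair is visited once and updates two entries."""
    f = [0.0] * len(x)
    for i, j in combinations(range(len(x)), 2):
        fij = pair_force(x[i], x[j])
        f[i] += fij
        f[j] -= fij  # Newton's third law: equal and opposite
    return f


def forces_private(x: List[float], workers: int = 4) -> List[float]:
    """Parallel loop: every worker accumulates into its own private array."""
    pairs = list(combinations(range(len(x)), 2))

    def work(w: int) -> List[float]:
        local = [0.0] * len(x)  # private copy, so no locking is needed
        for i, j in pairs[w::workers]:  # round-robin share of the pairs
            fij = pair_force(x[i], x[j])
            local[i] += fij
            local[j] -= fij
        return local

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, range(workers)))
    # Reduction step: sum the private arrays entry by entry.
    return [sum(column) for column in zip(*partials)]
```

The trade-off is memory: the private arrays cost one copy of the force array per worker, in exchange for a loop body free of locks or atomic updates.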

    A new parallelisation technique for heterogeneous CPUs

    Parallelization has moved in recent years into mainstream compilers, and the demand for parallelizing tools that can do a better job of automatic parallelization is higher than ever. During the last decade considerable attention has been focused on developing programming tools that support both explicit and implicit parallelism to keep up with the power of the new multiple core technology. Yet success in developing automatic parallelising compilers has been limited, mainly due to the complexity of the analytic process required to exploit available parallelism and manage other parallelisation measures such as data partitioning, alignment and synchronization. This dissertation investigates developing a programming tool that automatically parallelises large data structures on a heterogeneous architecture, and whether a high-level programming language compiler can use this tool to exploit implicit parallelism and make use of the performance potential of modern multicore technology. The work involved the development of a fully automatic parallelisation tool, called VSM, that completely hides the underlying details of general purpose heterogeneous architectures. The VSM implementation provides direct and simple access for users to parallelise array operations on the Cell's accelerators without the need for any annotations or process directives. This work also involved the extension of the Glasgow Vector Pascal compiler to work with the VSM implementation as a single compiler system. The developed compiler system, which is called VP-Cell, takes a single source code and parallelises array expressions automatically. Several experiments were conducted using Vector Pascal benchmarks to show the validity of the VSM approach. The VP-Cell system achieved significant runtime performance on one accelerator as compared to the master processor's performance and near-linear speedups over code runs on the Cell's accelerators. Though VSM was mainly designed for developing parallelising compilers, it also performed considerably well when running C code on the Cell's accelerators.