    High-level Multicore Programming with C++11

    Nowadays, one of the most important challenges in programming is the efficient usage of multicore processors. All modern programming languages support multicore programming at the native or library level. C++11, the next standard of the C++ programming language, also supports multithreading at a low level. In this paper we argue for some extensions of the C++ Standard Template Library based on the features of C++11. These extensions make the standard library more powerful in the multicore realm. Our approach is based on functors and lambda expressions, which are major extensions in the language. We contribute three case studies: how to efficiently compose functors in pipelines, how to evaluate Boolean operators in parallel, and how to efficiently accumulate over associative functors.
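    As a rough illustration of the second case study, the sketch below uses C++11 lambdas and std::async to evaluate two independent Boolean operands concurrently. It is a minimal, hypothetical example rather than the paper's actual STL extension, and note that, unlike the built-in &&, both operands are always evaluated.

```cpp
#include <future>
#include <iostream>

// Hypothetical helper (not the paper's interface): evaluate two independent
// Boolean predicates in parallel and combine them with logical AND.
// Unlike the built-in &&, there is no short-circuiting here.
template <typename F, typename G>
bool parallel_and(F f, G g) {
    auto lhs = std::async(std::launch::async, f);  // run f on another thread
    bool rhs = g();                                // run g on this thread
    return lhs.get() && rhs;
}

int main() {
    // Two independent, potentially expensive predicates expressed as lambdas.
    auto check_a = [] { return 1 + 1 == 2; };
    auto check_b = [] { return 2 * 2 == 4; };
    std::cout << std::boolalpha << parallel_and(check_a, check_b) << '\n';  // true
}
```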

    Adaptive heterogeneous parallelism for semi-empirical lattice dynamics in computational materials science.

    With the variability in performance of the multitude of parallel environments available today, the conceptual overhead created by the need to anticipate runtime information to make design-time decisions has become overwhelming. Performance-critical applications and libraries carry implicit assumptions based on incidental metrics that are not portable to emerging computational platforms or even alternative contemporary architectures. Furthermore, the significance of runtime concerns such as makespan, energy efficiency and fault tolerance depends on the situational context. This thesis presents a case study in the application of both Mattson's prescriptive pattern-oriented approach and the more principled structured parallelism formalism to the computational simulation of inelastic neutron scattering spectra on hybrid CPU/GPU platforms. The original ad hoc implementation as well as new pattern-based and structured implementations are evaluated for relative performance and scalability. Two new structural abstractions are introduced to facilitate adaptation by lazy optimisation and runtime feedback. A deferred-choice abstraction represents a unified space of alternative structural program variants, allowing static adaptation through model-specific exhaustive calibration with regard to the extrafunctional concerns of runtime, average instantaneous power and total energy usage. Instrumented queues serve as a mechanism for structural composition and provide a representation of extrafunctional state that allows realisation of a market-based decentralised coordination heuristic for competitive resource allocation and the Lyapunov drift algorithm for cooperative scheduling.
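    One way to picture the deferred-choice idea described above is a container of semantically equivalent program variants that is calibrated once at run time and then always dispatches to the cheapest one. The sketch below is a minimal, hypothetical rendering of that concept in C++; all names are invented, it is not the thesis' actual abstraction, and it calibrates only for runtime rather than power or energy.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical deferred choice: a set of semantically equivalent variants of
// the same computation. calibrate() times each variant once and remembers the
// fastest; operator() then always dispatches to the chosen variant.
class DeferredChoice {
public:
    void add_variant(std::function<void()> v) { variants_.push_back(std::move(v)); }

    void calibrate() {
        double best = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < variants_.size(); ++i) {
            auto t0 = std::chrono::steady_clock::now();
            variants_[i]();                                        // trial run
            std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
            if (dt.count() < best) { best = dt.count(); chosen_ = i; }
        }
    }

    void operator()() const { variants_[chosen_](); }              // run the winner

private:
    std::vector<std::function<void()>> variants_;
    std::size_t chosen_ = 0;
};
```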

    Modular Programming of Synchronization and Communication among Tasks in Parallel Programs

    Implementing synchronization and communication among tasks in parallel programs is a major challenge. We present a high-level DSL geared toward this challenge by generalizing the existing protocol language Reo from supporting only a compile-time/statically set number of tasks (unsuitable for parallel programming) to also supporting a run-time/dynamically set number of tasks. Our contribution comprises new syntax, a new compilation/execution approach, and experimental results. Most surprisingly, the new approach can outperform the existing approach, even though the new approach requires more work to be done at run time.

    Asynchronous Algorithmic Skeletons: Application to Domain-Specific Languages

    In this thesis, we present developments to the approach used by the LRI ParSys team to automatically translate MATLAB-like scientific codes into high-performance production codes. To reach a high level of performance, we have combined C++ template meta-programming and asynchronous parallel programming: each expression is first analysed to detect parallelism opportunities, and the available resources of multi-core machines are then used near-optimally. To link these two stages of the code-generation process, we have implemented a solution based on multi-level algorithmic skeletons. Our tools have been implemented in the NT2 library and evaluated on several significant real-world scientific benchmarks.
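    For flavour, the sketch below shows a bare-bones asynchronous "map" skeleton of the kind such a multi-level approach could build on: the data is split into chunks and each chunk is processed by a std::async task. It is a minimal, hypothetical example and is not NT2's actual interface or code-generation scheme.

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical multicore map skeleton (not NT2's actual API): split the data
// into one chunk per hardware thread, transform each chunk asynchronously,
// and join all futures before returning.
template <typename T, typename F>
void async_map(std::vector<T>& data, F f) {
    const std::size_t workers =
        std::max<std::size_t>(1, std::thread::hardware_concurrency());
    const std::size_t chunk = (data.size() + workers - 1) / workers;

    std::vector<std::future<void>> tasks;
    for (std::size_t begin = 0; begin < data.size(); begin += chunk) {
        const std::size_t end = std::min(begin + chunk, data.size());
        tasks.push_back(std::async(std::launch::async, [&data, f, begin, end] {
            for (std::size_t i = begin; i < end; ++i) data[i] = f(data[i]);
        }));
    }
    for (auto& t : tasks) t.get();  // wait for every chunk to finish
}

int main() {
    std::vector<double> v(1000000, 1.0);
    async_map(v, [](double x) { return 2.0 * x; });  // element-wise doubling
    return v[0] == 2.0 ? 0 : 1;
}
```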

    Easing parallel programming on heterogeneous systems

    The most common way to solve HPC (High Performance Computing) applications in reasonable execution times and in a scalable fashion is to use parallel computing systems. The current trend in HPC systems is to include in the same machine several computing devices of different types and architectures. However, their use imposes specific challenges on the programmer. To create hybrid programs that can efficiently exploit all the capabilities of the machine, a programmer must be an expert in the existing tools and abstractions for distributed memory, in the programming models for shared-memory systems, and in the specific programming models for each type of co-processor. Currently, all of these problems have to be solved by the programmer, which makes programming a heterogeneous machine a real challenge. This thesis addresses several of the main problems related to parallel programming of highly heterogeneous and distributed systems. It proposes solutions to problems ranging from the creation of codes that are portable across different types of devices, accelerators, and architectures, while still achieving maximum efficiency, to the problems that appear in distributed-memory systems in relation to communication and the partitioning of data structures.

    High-Performance Modelling and Simulation for Big Data Applications

    This open access book was prepared as a Final Publication of the COST Action IC1406 “High-Performance Modelling and Simulation for Big Data Applications (cHiPSet)” project. Long considered important pillars of the scientific method, Modelling and Simulation have evolved from traditional discrete numerical methods to complex data-intensive continuous analytical optimisations. Resolution, scale, and accuracy have become essential to predict and analyse natural and complex systems in science and engineering. As their level of abstraction rises to give a better discernment of the domain at hand, their representation becomes increasingly demanding in computational and data resources. On the other hand, High Performance Computing typically entails the effective use of parallel and distributed processing units coupled with efficient storage, communication and visualisation systems to underpin complex data-intensive applications in distinct scientific and technical domains. A seamless interaction of High Performance Computing with Modelling and Simulation is therefore arguably required in order to store, compute, analyse, and visualise large data sets in science and engineering. Funded by the European Commission, cHiPSet has provided a dynamic trans-European forum for its members and distinguished guests to openly discuss novel perspectives and topics of interest for these two communities. This cHiPSet compendium presents a set of selected case studies related to healthcare, biological data, computational advertising, multimedia, finance, bioinformatics, and telecommunications.

    Automatic performance optimisation of parallel programs for GPUs via rewrite rules

    Get PDF
    Graphics Processing Units (GPUs) are now commonplace in computing systems and are the most successful parallel accelerators. Their performance is orders of magnitude higher than that of traditional Central Processing Units (CPUs), making them attractive for many application domains with high computational demands. However, achieving their full performance potential is extremely hard, even for experienced programmers, as it requires specialised software tailored for specific devices and written in low-level languages such as OpenCL. Differences in device characteristics between manufacturers and even hardware generations often lead to large performance variations when different optimisations are applied. This inevitably leads to code that is not performance portable across different hardware. This thesis demonstrates that achieving performance portability is possible using LIFT, a functional data-parallel language which allows programs to be expressed at a high level in a hardware-agnostic way. The LIFT compiler automatically explores the optimisation space using a set of well-defined rewrite rules to transform programs seamlessly between different high-level algorithmic forms before translating them to a low-level OpenCL-specific form. The first contribution of this thesis is the development of techniques to compile functional LIFT programs that have optimisations explicitly encoded into efficient imperative OpenCL code. Producing efficient code is non-trivial, as many performance-sensitive details such as memory allocation, array accesses or synchronisation are not explicitly represented in the functional LIFT language. The thesis shows that the newly developed techniques are essential for achieving performance on par with manually optimised code for GPU programs with the exact same complex optimisations applied. The second contribution of this thesis is the presentation of techniques that enable the LIFT compiler to perform complex optimisations that usually require tens to hundreds of individual rule applications, by grouping them as macro-rules that cut through the optimisation space. Using matrix multiplication as an example, starting from a single high-level program the compiler automatically generates highly optimised and specialised implementations for desktop and mobile GPUs with very different architectures, achieving performance portability. The final contribution of this thesis is the demonstration of how low-level and GPU-specific features are extracted directly from the high-level functional LIFT program, enabling the construction of a statistical performance model that makes accurate predictions about the performance of differently optimised program variants. This performance model is then used to drastically reduce the time taken by the optimisation-space exploration, by ranking the different variants based on their predicted performance. Overall, this thesis demonstrates that performance portability is achievable using LIFT.
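    To give a concrete feel for what a semantics-preserving rewrite rule looks like, the toy C++ sketch below illustrates classic map fusion, rewriting map(f) after map(g) into map(f . g), which removes an intermediate array and one pass over the data. It is a hypothetical illustration only; LIFT's own rules, notation and implementation are quite different.

```cpp
#include <vector>

// Toy illustration of a map-fusion rewrite (names hypothetical, not LIFT's):
// semantically, map(f) . map(g) == map(f . g). The fused form avoids one
// temporary vector and one traversal of the data.
template <typename F, typename G>
auto compose(F f, G g) {
    return [f, g](auto x) { return f(g(x)); };
}

template <typename T, typename F>
std::vector<T> map_vec(F f, const std::vector<T>& in) {
    std::vector<T> out;
    out.reserve(in.size());
    for (const T& x : in) out.push_back(f(x));
    return out;
}

int main() {
    const std::vector<int> in = {1, 2, 3, 4};
    auto inc = [](int x) { return x + 1; };
    auto dbl = [](int x) { return x * 2; };

    // Unfused: two traversals and an intermediate vector.
    std::vector<int> two_pass = map_vec(dbl, map_vec(inc, in));

    // Fused: a single traversal, no intermediate vector.
    std::vector<int> fused = map_vec(compose(dbl, inc), in);

    return two_pass == fused ? 0 : 1;  // both compute {4, 6, 8, 10}
}
```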

    Data-Parallel Algorithmic Skeletons: Extensions and Applications of the Münster Skeleton Library Muesli

    This thesis focuses on the data-parallel part of the Münster skeleton library Muesli and describes, besides a number of implemented extensions, applications that have been parallelised with Muesli. One of the most important new features is support for multi-core processors through the use of OpenMP, as a result of which programs developed with Muesli also scale on parallel computers with a hybrid memory architecture. A further extension is the newly developed distributed data structure for sparse matrices. The latter implements a flexible design concept that allows user-defined compression and load-distribution mechanisms to be used. In addition, two applications that were parallelised with Muesli are presented: the LM OSEM algorithm and ART 2 networks. Besides a description of the functionality, properties and concepts of MPI and OpenMP, the current state of research is also outlined.
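    As a rough sketch of how such a sparse-matrix structure can be used in OpenMP-parallelised code on one multi-core node (in Muesli the intra-node OpenMP level is combined with MPI across nodes for hybrid-memory machines), the example below multiplies a compressed-row sparse matrix by a vector. The layout and names are hypothetical; Muesli's actual distributed data structures and user-defined compression mechanisms look different.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical compressed-row sparse matrix (not Muesli's actual data
// structure): non-zero values, their column indices, and each row's offset.
struct SparseMatrixCRS {
    std::size_t rows;
    std::vector<double> values;
    std::vector<std::size_t> cols;
    std::vector<std::size_t> row_ptr;  // size rows + 1
};

// Sparse matrix-vector product with the row loop shared among the cores of
// one node via OpenMP; compile with -fopenmp (or equivalent).
std::vector<double> spmv(const SparseMatrixCRS& A, const std::vector<double>& x) {
    std::vector<double> y(A.rows, 0.0);
    #pragma omp parallel for
    for (long r = 0; r < static_cast<long>(A.rows); ++r) {
        double sum = 0.0;
        for (std::size_t k = A.row_ptr[r]; k < A.row_ptr[r + 1]; ++k)
            sum += A.values[k] * x[A.cols[k]];
        y[r] = sum;
    }
    return y;
}

int main() {
    // 2x2 diagonal matrix [[2, 0], [0, 3]] stored in compressed-row form.
    SparseMatrixCRS A{2, {2.0, 3.0}, {0, 1}, {0, 1, 2}};
    std::vector<double> x = {1.0, 1.0};
    std::vector<double> y = spmv(A, x);   // y == {2.0, 3.0}
    return (y[0] == 2.0 && y[1] == 3.0) ? 0 : 1;
}
```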

    Generative Version of the FastFlow Multicore Library

    Nowadays, one of the most important challenges in programming is the efficient usage of multicore processors. Many new programming languages and libraries support multicore programming. FastFlow is one of the most promising multicore C++ libraries. Unfortunately, the library has a design problem: one of its most important methods, which supports the communication between different threads, is a pure virtual function in a base class. Because it is virtual, it cannot be a template function; hence, the threads pass and receive their arguments as void* pointers. The base class is not a template either. This approach is not type-safe. We make the library more efficient and safer with the help of generative technologies.
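    The simplified sketch below illustrates the shape of the problem and of a type-safe alternative. It deliberately does not reproduce FastFlow's real classes: the names are invented, and the paper's actual generative solution is not shown.

```cpp
#include <string>

// Simplified illustration (not FastFlow's actual classes). The virtual hook
// forces untyped void* arguments between threads:
struct node {
    virtual void* svc(void* task) = 0;   // cannot be a template: it is virtual
    virtual ~node() = default;
};

struct shouter : node {
    void* svc(void* task) override {
        auto* s = static_cast<std::string*>(task);  // unchecked cast: passing a
        *s += "!";                                  // non-string still compiles
        return s;                                   // but misbehaves at run time
    }
};

// A type-safe alternative in the spirit of a generative approach: making the
// node a class template lets the worker body receive and return real types,
// so type mismatches are rejected at compile time.
template <typename In, typename Out>
struct typed_node {
    virtual Out svc(In task) = 0;
    virtual ~typed_node() = default;
};

struct typed_shouter : typed_node<std::string, std::string> {
    std::string svc(std::string task) override { return task + "!"; }
};

int main() {
    std::string msg = "hello";
    shouter unsafe;
    unsafe.svc(&msg);                    // caller must remember the real type

    typed_shouter safe;
    return safe.svc(msg) == "hello!!" ? 0 : 1;
}
```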