37 research outputs found

    The PEPPHER Approach to Programmability and Performance Portability for Heterogeneous many-core Architectures

    Get PDF
    International audienceThe European FP7 project PEPPHER is addressing programmability and performance portability for current and emerging heterogeneous many-core archi- tectures. As its main idea, the project proposes a multi-level parallel execution model comprised of potentially parallelized components existing in variants suitable for different types of cores, memory configurations, input characteristics, optimization criteria, and couples this with dynamic and static resource and architecture aware scheduling mechanisms. Crucial to PEPPHER is that components can be made performance aware, allowing for more efficient dynamic and static scheduling on the concrete, available resources. The flexibility provided in the software model, combined with a customizable, heterogeneous, memory and topology aware run-time system is key to efficiently exploiting the resources of each concrete hardware configuration. The project takes a holistic approach, relying on existing paradigms, interfaces, and languages for the parallelization of components, and develops a prototype framework, a methodology for extending the framework, and guidelines for constructing performance portable software and systems ­ including paths to migration of existing software ­ for heterogeneous many-core processors. This paper gives a high-level project overview, and presents a specific example showing how the PEPPHER component variant model and resource-aware run-time system enable performance portability of a numerical kernel

    Programmability and Performance Portability Aspects of Heterogeneous Multi-/Manycore Systems

    Get PDF
    International audienceWe discuss three complementary approaches that can provide both portability and an increased level of abstraction for the programming of heterogeneous multicore systems. Together, these approaches also support performance portability, as currently investigated in the EU FP7 project PEPPHER. In particular, we consider (1) a library-based approach, here represented by the integration of the SkePU C++ skeleton programming library with the StarPU runtime system for dynamic scheduling and dynamic selection of suitable execution units for parallel tasks; (2) a language-based approach, here represented by the Offload-C++ high-level language extensions and Offload compiler to generate platform-specific code; and (3) a component-based approach, specifically the PEPPHER component system for annotating user-level application components with performance metadata, thereby preparing them for performance-aware composition. We discuss the strengths and weaknesses of these approaches and show how they could complement each other in an integrational programming framework for heterogeneous multicore systems

    TANGO: Transparent heterogeneous hardware Architecture deployment for eNergy Gain in Operation

    Get PDF
    The paper is concerned with the issue of how software systems actually use Heterogeneous Parallel Architectures (HPAs), with the goal of optimizing power consumption on these resources. It argues the need for novel methods and tools to support software developers aiming to optimise power consumption resulting from designing, developing, deploying and running software on HPAs, while maintaining other quality aspects of software to adequate and agreed levels. To do so, a reference architecture to support energy efficiency at application construction, deployment, and operation is discussed, as well as its implementation and evaluation plans.Comment: Part of the Program Transformation for Programmability in Heterogeneous Architectures (PROHA) workshop, Barcelona, Spain, 12th March 2016, 7 pages, LaTeX, 3 PNG figure

    EXA2PRO programming environment:Architecture and applications

    Get PDF
    The EXA2PRO programming environment will integrate a set of tools and methodologies that will allow to systematically address many exascale computing challenges, including performance, performance portability, programmability, abstraction and reusability, fault tolerance and technical debt. The EXA2PRO tool-chain will enable the efficient deployment of applications in exascale computing systems, by integrating high-level software abstractions that offer performance portability and efficient exploitation of exascale systems' heterogeneity, tools for efficient memory management, optimizations based on trade-offs between various metrics and fault-tolerance support. Hence, by addressing various aspects of productivity challenges, EXA2PRO is expected to have significant impact in the transition to exascale computing, as well as impact from the perspective of applications. The evaluation will be based on 4 applications from 4 different domains that will be deployed in JUELICH supercomputing center. The EXA2PRO will generate exploitable results in the form of a tool-chain that support diverse exascale heterogeneous supercomputing centers and concrete improvements in various exascale computing challenges

    The Potential of the Intel Xeon Phi for Supervised Deep Learning

    Full text link
    Supervised learning of Convolutional Neural Networks (CNNs), also known as supervised Deep Learning, is a computationally demanding process. To find the most suitable parameters of a network for a given application, numerous training sessions are required. Therefore, reducing the training time per session is essential to fully utilize CNNs in practice. While numerous research groups have addressed the training of CNNs using GPUs, so far not much attention has been paid to the Intel Xeon Phi coprocessor. In this paper we investigate empirically and theoretically the potential of the Intel Xeon Phi for supervised learning of CNNs. We design and implement a parallelization scheme named CHAOS that exploits both the thread- and SIMD-parallelism of the coprocessor. Our approach is evaluated on the Intel Xeon Phi 7120P using the MNIST dataset of handwritten digits for various thread counts and CNN architectures. Results show a 103.5x speed up when training our large network for 15 epochs using 244 threads, compared to one thread on the coprocessor. Moreover, we develop a performance model and use it to assess our implementation and answer what-if questions.Comment: The 17th IEEE International Conference on High Performance Computing and Communications (HPCC 2015), Aug. 24 - 26, 2015, New York, US

    Parallel computing 2011, ParCo 2011: book of abstracts

    Get PDF
    This book contains the abstracts of the presentations at the conference Parallel Computing 2011, 30 August - 2 September 2011, Ghent, Belgiu

    Applications, tools and techniques on the road to exascale computing

    Get PDF
    This volume of the book series “Advances in Parallel Computing” contains the proceedings of ParCo2011, the 14th biennial ParCo Conference, held from 31 August to 3 September 2011, in Ghent, Belgium. In an era when physical limitations have slowed down advances in the performance of single processing units, and new scientific challenges require exascale speed, parallel processing has gained momentum as a key gateway to HPC (High Performance Computing). Historically, the ParCo conferences have focused on three main themes: Algorithms, Architectures (both hardware and software) and Applications. Nowadays, the scenery has changed from traditional multiprocessor topologies to heterogeneous manycores, incorporating standard CPUs, GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays). These platforms are, at a higher abstraction level, integrated in clusters, grids, and clouds. This is reflected in the papers presented at the conference and the contributions as included in these proceedings. An increasing number of new algorithms are optimized for heterogeneous platforms and performance tuning is targeting extreme scale computing. Heterogeneous platforms utilising the compute power and energy efficiency of GPGPUs (General Purpose GPUs) are clearly becoming mainstream HPC systems for a large number of applications in a wide spectrum of application areas. These systems excel in areas such as complex system simulation, real-time image processing and visualisation, etc. High performance computing accelerators may well become the cornerstone of exascale computing applications such as 3-D turbulent combustion flows, nuclear energy simulations, brain research, financial and geophysical modelling. The exploration of new architectures, programming tools and techniques was evidenced by the mini-symposia “Parallel Computing with FPGAs” and “Exascale Programming Models”. The need for exascale hardware and software was also stressed in the industrial session, with contributions from Cray and the European exascale software initiative. Our sincere appreciation goes to the keynote speakers who gave their perspectives on the impact of parallel computing today and the road to exascale computing tomorrow. Our heartfelt thanks go to the authors for their valuable scientific contributions and to the programme committee who reviewed the papers and provided constructive remarks. The international audience was inspired by the quality of the presentations. The attendance and interaction was high and the conference has been an agora where many fruitful ideas were exchanged and explored. We wish to express our sincere thanks to the organizers for the smooth operation of the conference. The University conference centre Het Pand offered an excellent environment for the conference as it allowed delegates to interact informally and easily. A special word of thanks is due to the management and support staff of Het Pand for their proficient and friendly support. The organizers managed to put together an extensive social programme. This included a reception at the medieval Town Hall of Ghent as well as a memorable conference dinner. These social events stimulated interaction amongst delegates and resulted in many new contacts being made. Finally we wish to thank all the many supporters who assisted in the organization and successful running of the event. Erik D'Hollander, Ghent University, Belgium Koen De Bosschere, Ghent University, Belgium Gerhard R. Joubert, TU Clausthal, Germany David Padua, University of Illinois, USA Frans Peters, Philips Research, Netherland

    Analyzing large-scale DNA Sequences on Multi-core Architectures

    Full text link
    Rapid analysis of DNA sequences is important in preventing the evolution of different viruses and bacteria during an early phase, early diagnosis of genetic predispositions to certain diseases (cancer, cardiovascular diseases), and in DNA forensics. However, real-world DNA sequences may comprise several Gigabytes and the process of DNA analysis demands adequate computational resources to be completed within a reasonable time. In this paper we present a scalable approach for parallel DNA analysis that is based on Finite Automata, and which is suitable for analyzing very large DNA segments. We evaluate our approach for real-world DNA segments of mouse (2.7GB), cat (2.4GB), dog (2.4GB), chicken (1GB), human (3.2GB) and turkey (0.2GB). Experimental results on a dual-socket shared-memory system with 24 physical cores show speed-ups of up to 17.6x. Our approach is up to 3x faster than a pattern-based parallel approach that uses the RE2 library.Comment: The 18th IEEE International Conference on Computational Science and Engineering (CSE 2015), Porto, Portugal, 20 - 23 October 201
    corecore