8 research outputs found

    Programming-Model Centric Debugging for OpenMP

    Get PDF
    International audienceIn this paper, we introduce a new approach for debugging parallel applications based on the OpenMP programming standard. This approach is centered on the programming model, by opposition with more traditionnal debugging methods focused on the programming language and machine code.We detail the main axes of programming-model centric debugging: providinga structural representation of the application, following the execution dynamicbehaviors and allowing interactions with the abstract and physical machines;and we explain how they apply to OpenMP fork-join and task-based programming

    Model-centric task debugging at scale

    Get PDF
    Chapter 1, Introduction, presents state of the art debugging techniques in high-performance computing. The lack of information out of the programming model, these traditional debugging tools suffer, motivated the model-centric debugging approach. Chapter 2, Technical Background: Parallel Programming Models & Tools, exemplifies the programming models used in the scope of my work. The differences between those models are illustrated, and for the most popular programming models in HPC, examples are attached in this chapter. The chapter also describes Temanejo, the toolchain's front-end, which supports the application developer during his actions. In the following chapter (Chapter 4), Design: Events & Requests in Ayudame, the theory of task" and dependency" representation is stated. The chapter includes the design of different information types, which are later on used for the communication between a programming model and the model-centric debugging approach. In chapter 5, Design: Communication Back-end Ayudame, the design of the back-end tool infrastructure is described in detail. This also includes the problems occurring during the design process and their specific solutions. The concept of a multi-process environment and the usage of different programming models at the same time is also part of this chapter. The following chapter (Chapter 6), Instrumentation of Runtime Systems, briefly describes the information exchange between a programming model and the model-centric debugging approach. The different ways of monitoring and controlling an application through its programming model are illustrated. In chapter 7, Case Study: Performance Debugging, the model-centric debugging approach is used for optimising an application. All necessary optimisation steps are described in detail, with the help of mock-ups. Additionally, a description of the different optimised versions is included in this chapter. The evaluation, done on different hardware architectures, is presented and discussed. This includes not only the behaviour of the versions on different platforms but also architecture specific issues

    Using OpenMP superscalar for parallelization of embedded and consumer applications

    Get PDF
    In the past years, research and industry have introduced several parallel programming models to simplify the development of parallel applications. A popular class among these models are task-based programming models which proclaim ease-of-use, portability, and high performance. A novel model in this class, OpenMP Superscalar, combines advanced features such as automated runtime dependency resolution, while maintaining simple pragma-based programming for C/C++. OpenMP Superscalar has proven to be effective in leveraging parallelism in HPC workloads. Embedded and consumer applications, however, are currently still mainly parallelized using traditional thread-based programming models. In this work, we investigate how effective OpenMP Superscalar is for embedded and consumer applications in terms of usability and performance. To determine the usability of OmpSs, we show in detail how to implement complex parallelization strategies such as ones used in parallel H.264 decoding. To evaluate the performance we created a collection of ten embedded and consumer benchmarks parallelized in both OmpSs and Pthreads.EC/FP7/248647/EU/ENabling technologies for a programmable many-CORE/ENCOR

    Applications, tools and techniques on the road to exascale computing

    Get PDF
    This volume of the book series “Advances in Parallel Computing” contains the proceedings of ParCo2011, the 14th biennial ParCo Conference, held from 31 August to 3 September 2011, in Ghent, Belgium. In an era when physical limitations have slowed down advances in the performance of single processing units, and new scientific challenges require exascale speed, parallel processing has gained momentum as a key gateway to HPC (High Performance Computing). Historically, the ParCo conferences have focused on three main themes: Algorithms, Architectures (both hardware and software) and Applications. Nowadays, the scenery has changed from traditional multiprocessor topologies to heterogeneous manycores, incorporating standard CPUs, GPUs (Graphics Processing Units) and FPGAs (Field Programmable Gate Arrays). These platforms are, at a higher abstraction level, integrated in clusters, grids, and clouds. This is reflected in the papers presented at the conference and the contributions as included in these proceedings. An increasing number of new algorithms are optimized for heterogeneous platforms and performance tuning is targeting extreme scale computing. Heterogeneous platforms utilising the compute power and energy efficiency of GPGPUs (General Purpose GPUs) are clearly becoming mainstream HPC systems for a large number of applications in a wide spectrum of application areas. These systems excel in areas such as complex system simulation, real-time image processing and visualisation, etc. High performance computing accelerators may well become the cornerstone of exascale computing applications such as 3-D turbulent combustion flows, nuclear energy simulations, brain research, financial and geophysical modelling. The exploration of new architectures, programming tools and techniques was evidenced by the mini-symposia “Parallel Computing with FPGAs” and “Exascale Programming Models”. The need for exascale hardware and software was also stressed in the industrial session, with contributions from Cray and the European exascale software initiative. Our sincere appreciation goes to the keynote speakers who gave their perspectives on the impact of parallel computing today and the road to exascale computing tomorrow. Our heartfelt thanks go to the authors for their valuable scientific contributions and to the programme committee who reviewed the papers and provided constructive remarks. The international audience was inspired by the quality of the presentations. The attendance and interaction was high and the conference has been an agora where many fruitful ideas were exchanged and explored. We wish to express our sincere thanks to the organizers for the smooth operation of the conference. The University conference centre Het Pand offered an excellent environment for the conference as it allowed delegates to interact informally and easily. A special word of thanks is due to the management and support staff of Het Pand for their proficient and friendly support. The organizers managed to put together an extensive social programme. This included a reception at the medieval Town Hall of Ghent as well as a memorable conference dinner. These social events stimulated interaction amongst delegates and resulted in many new contacts being made. Finally we wish to thank all the many supporters who assisted in the organization and successful running of the event. Erik D'Hollander, Ghent University, Belgium Koen De Bosschere, Ghent University, Belgium Gerhard R. Joubert, TU Clausthal, Germany David Padua, University of Illinois, USA Frans Peters, Philips Research, Netherland

    Parallel computing 2011, ParCo 2011: book of abstracts

    Get PDF
    This book contains the abstracts of the presentations at the conference Parallel Computing 2011, 30 August - 2 September 2011, Ghent, Belgiu

    Evaluation of Distributed Programming Models and Extensions to Task-based Runtime Systems

    Get PDF
    High Performance Computing (HPC) has always been a key foundation for scientific simulation and discovery. And more recently, deep learning models\u27 training have further accelerated the demand of computational power and lower precision arithmetic. In this era following the end of Dennard\u27s Scaling and when Moore\u27s Law seemingly still holds true to a lesser extent, it is not a coincidence that HPC systems are equipped with multi-cores CPUs and a variety of hardware accelerators that are all massively parallel. Coupling this with interconnect networks\u27 speed improvements lagging behind those of computational power increases, the current state of HPC systems is heterogeneous and extremely complex. This was heralded as a great challenge to the software stacks and their ability to extract performance from these systems, but also as a great opportunity to innovate at the programming model level to explore the different approaches and propose new solutions. With usability, portability, and performance as the main factors to consider, this dissertation first evaluates some of the widely used parallel programming models (MPI, MPI+OpenMP, and task-based runtime systems) ability to manage the load imbalance among the processes computing the LU factorization of a large dense matrix stored in the Block Low-Rank (BLR) format. Next I proposed a number of optimizations and implemented them in PaRSEC\u27s Dynamic Task Discovery (DTD) model, including user-level graph trimming and direct Application Programming Interface (API) calls to perform data broadcast operation to further extend the limit of STF model. On the other hand, the Parameterized Task Graph (PTG) approach in PaRSEC is the most scalable approach for many different applications, which I then explored the possibility of combining both the algorithmic approach of Communication-Avoiding (CA) and the communication-computation overlapping benefits provided by runtime systems using 2D five-point stencil as the test case. This broad programming models evaluation and extension work highlighted the abilities of task-based runtime system in achieving scalable performance and portability on contemporary heterogeneous HPC systems. Finally, I summarized the profiling capability of PaRSEC runtime system, and demonstrated with a use case its important role in the performance bottleneck identification leading to optimizations
    corecore