9 research outputs found

    A Generic Parallel Pattern Interface for Stream and Data Processing

    Current parallel programming frameworks aid developers to a great extent in implementing applications that exploit parallel hardware resources. Nevertheless, developers require additional expertise to properly use and tune them to operate efficiently on specific parallel platforms. On the other hand, porting applications between different parallel programming models and platforms is not straightforward and demands considerable effort and specific knowledge. Apart from that, the lack of high-level parallel pattern abstractions in those frameworks further increases the complexity of developing parallel applications. To pave the way in this direction, this paper proposes GRPPI, a generic and reusable parallel pattern interface for both stream processing and data-intensive C++ applications. GRPPI provides a layer between developers and existing parallel programming frameworks targeting multi-core processors, such as C++ threads, OpenMP and Intel TBB, and accelerators, such as CUDA Thrust. Furthermore, thanks to its high-level C++ application programming interface and pattern composability features, GRPPI allows users to easily expose parallelism via standalone patterns or pattern compositions that match sequential applications. We evaluate this interface using an image processing use case and demonstrate its benefits from the usability, flexibility, and performance points of view. Furthermore, we analyze the impact of using stream and data pattern compositions on CPUs, GPUs and heterogeneous configurations. This work has been partially supported by the EU project ICT 644235 “REPHRASE: REfactoring Parallel Heterogeneous Resource-aware Applications” and the Spanish “Ministerio de Economía y Competitividad” under the grant TIN2016-79673-P “Towards Unification of HPC and Big Data Paradigms”.

    Paving the way towards high-level parallel pattern interfaces for data stream processing

    The emergence of Internet of Things (IoT) data stream applications has posed a number of new challenges to existing infrastructures, processing engines, and programming models. In this sense, high-level interfaces, encapsulating algorithmic aspects in pattern-based constructions, have considerably reduced the development and parallelization efforts for this type of application. An example of a parallel pattern interface is GrPPI, a C++ generic high-level library that acts as a layer between developers and existing parallel programming frameworks, such as C++ threads, OpenMP and Intel TBB. In this paper, we complement the basic patterns supported by GrPPI with the new stream operators Split-Join and Window, and the advanced parallel patterns Stream-Pool, Windowed-Farm and Stream-Iterator for the aforementioned back ends. Thanks to these new stream operators, complex compositions among streaming patterns can be expressed. On the other hand, the collection of advanced patterns allows users to tackle some domain-specific applications, ranging from the evolutionary to the real-time computing areas, where compositions of basic patterns are not capable of fully mimicking the algorithmic behavior of their original sequential codes. The experimental evaluation of the new advanced patterns and the stream operators on a set of domain-specific use cases, using different back ends and pattern-specific parameters, reports considerable performance gains with respect to the sequential versions. Additionally, we demonstrate the benefits of the GrPPI pattern interface from the usability, flexibility and readability points of view. This work was partially supported by the EU project ICT 644235 “RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications” and the project TIN2013-41350-P “Scalable Data Management Techniques for High-End Computing Systems” from the Ministerio de Economía y Competitividad, Spain.

    Towards automatic parallelization of stream processing applications

    Parallelizing and optimizing codes for recent multi-/many-core processors has been recognized as a complex task. For this reason, strategies to automatically transform sequential codes into parallel ones and discover optimization opportunities are crucial to relieve the burden on developers. In this paper, we present a compile-time framework to (semi-)automatically find parallel patterns (Pipeline and Farm) and transform sequential streaming applications into parallel ones using GrPPI, a generic parallel pattern interface. This framework uses a novel pipeline stage-balancing technique which provides the code generator module with the necessary information to produce balanced pipelines. The evaluation, using a synthetic video benchmark and a real-world computer vision application, demonstrates that the presented framework is capable of producing parallel and optimized versions of the application. A comparison study under several thread-core oversubscribed conditions reveals that the framework can deliver performance comparable to that of the Intel TBB programming framework. This work was supported in part by the Spanish Ministerio de Economía y Competitividad through the Project Toward Unification of HPC and Big Data Paradigms under Grant TIN2016-79637-P and in part by the EU Project RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications under Grant ICT 644235.

    Detecting semantic violations of lock-free data structures through C++ contracts

    The use of synchronization mechanisms in multithreaded applications is essential on shared-memory multi-core architectures. However, debugging parallel applications to avoid potential failures, such as data races or deadlocks, can be challenging. Race detectors are key to spotting such concurrency bugs; nevertheless, if lock-free data structures are used, these detectors may emit a significant number of false positives. In this paper, we present a framework for semantic violation detection of lock-free data structures which makes use of contracts, a novel feature of the upcoming C++20, and a customized version of the ThreadSanitizer race detector. We evaluate the detection accuracy of the framework in terms of false positives and false negatives using synthetic benchmarks built on the SPSC and MPMC lock-free queue structures from the Boost C++ library. Thanks to this framework, we are able to check the correct use of lock-free data structures, thus reducing the number of false positives. This work has been partially funded by the Spanish Ministry of Economy and Competitiveness through Project Grant TIN2016-79637-P (BigHPC - Towards Unification of HPC and Big Data Paradigms) and the European Commission through Grant No. 801091 (ASPIDE - Exascale programmIng models for extreme data processing).

    Analyzing Power Consumption of I/O Operations in HPC Applications

    Data movement is becoming a key issue in terms of performance and energy consumption in high performance computing (HPC) systems in general, and Exascale systems in particular. A preliminary step to perform I/O optimization and face the Exascale challenges is to deepen our understanding of energy consumption across the I/O stacks. In this paper, we analyze the power draw of different I/O operations using a new fine-grained internal wattmeter while simultaneously collecting system metrics. Based on correlations between the recorded metrics and the instantaneous internal power consumption, our methodology identifies the significant metrics with respect to power consumption and decides which ones should contribute directly or in a derivative manner. This approach has the advantage of building I/O power models based on a previously identified set of utilization metrics. This technique will be validated using write operations on an Intel Xeon Nehalem server system, as writes exhibit interesting patterns and distinct power regimes. The work presented in this paper has been partially supported by the EU Project FP7 318793 “EXA2GREEN”, by the EU under the COST Programme Action IC1305, “Network for Sustainable Ultrascale Computing (NESUS)”, and by the grant TIN2013-41350-P, Scalable Data Management Techniques for High-End Computing Systems, from the Spanish Ministry of Economy and Competitiveness. European Community's Seventh Framework Progra

    An adaptive offline implementation selector for heterogeneous parallel platforms

    Heterogeneous parallel platforms, comprising multiple processing units and architectures, have become a cornerstone in improving the overall performance and energy efficiency of scientific and engineering applications. Nevertheless, taking full advantage of their resources comes with a variety of difficulties: developers require technical expertise in using different parallel programming frameworks and previous knowledge about the algorithms used underneath by the application. To alleviate this burden, we present an adaptive offline implementation selector that allows users to better exploit resources provided by heterogeneous platforms. Specifically, this framework selects, at compile time, the device-implementation tuple that delivers the best performance on a given platform. The user interface of the framework leverages two C++ language features: attributes and concepts. To evaluate the benefits of this framework, we analyse the global performance and convergence of the selector using two different use cases. The experimental results demonstrate that the proposed framework allows users to enhance performance while minimizing the effort needed to tune applications targeted to heterogeneous platforms. Furthermore, we also demonstrate that our framework delivers performance figures comparable to those of other approaches. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been partially supported by the Spanish ‘Ministerio de Economía y Competitividad’ under the project grant TIN2016-79637-P ‘Towards Unification of High Performance Computing (HPC) and Big Data Paradigms’ and the EU Projects ICT 644235 ‘RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications’ and the FP7 609666 ‘Repara: Reengineering and Enabling Performance And poweR of Applications’.

    Finding parallel patterns through static analysis in C++ applications

    Since the 'free lunch' of processor performance is over, parallelism has become the new trend in hardware and architecture design. However, parallel resources deployed in data centers are underused in many cases, given that sequential programming is still deeply rooted in current software development. To address this problem, new methodologies and techniques for parallel programming have been progressively developed. For instance, parallel frameworks, offering programming patterns, allow expressing concurrency in applications to better exploit parallel hardware. Nevertheless, a large portion of production software, from a broad range of scientific and industrial areas, is still developed sequentially. Considering that these software modules contain thousands, or even millions, of lines of code, an extremely large amount of effort is needed to identify parallel regions. To pave the way in this area, this paper presents Parallel Pattern Analyzer Tool, a software component that aids the discovery and annotation of parallel patterns in source codes. This tool simplifies the transformation of sequential source code to parallel. Specifically, we provide support for identifying Map, Farm, and Pipeline parallel patterns and evaluate the quality of the detection for a set of different C++ applications. This work was partially supported by the EU Projects ICT 644235 “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications” and the FP7 609666 “Repara: Reengineering and Enabling Performance and Power of Applications”.

    Migración portable y de altas prestaciones de aplicaciones Matlab a C++: deconvolución esférica de datos de resonancia magnética por difusión

    In many fields of scientific research, Matlab has become established as the de facto tool for application design. This approach offers many advantages, such as rapid deployment of prototypes and high performance in linear algebra, among others. However, the applications developed are highly dependent on the Matlab execution engine, which limits their deployment on many high-performance platforms. In this work we present a case study of migrating an application initially based on Matlab to a native application written in C++. To this end, we present the methodology used for the migration and the tools that facilitate this task. The evaluation carried out shows that the implemented solution offers good performance on different platforms and highly heterogeneous systems. This work has been funded by the European Project ICT 644235 RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications and the Ministerio de Economía y Competitividad, under the project TIN2013-41350-P Scalable Data Management Techniques for High-End Computing Systems.

    Exploring stream parallel patterns in distributed MPI environments

    In recent years, the large volumes of stream data and the near real-time requirements of data streaming applications have exacerbated the need for new scalable algorithms and programming interfaces for distributed and shared-memory platforms. To contribute in this direction, this paper presents a new distributed MPI back end for GrPPI, a C++ high-level generic interface of data-intensive and stream processing parallel patterns. This back end, as a new execution policy, supports distributed and hybrid (distributed+shared-memory) parallel executions of the Pipeline and Farm patterns, where the hybrid mode combines the MPI policy with a GrPPI shared-memory one. These patterns internally leverage distributed queues, which can be configured to use two-sided or one-sided MPI primitives to communicate items among nodes. A detailed analysis of the GrPPI MPI execution policy reports considerable benefits from the programmability, flexibility and readability points of view. The experimental evaluation of two different streaming applications with different distributed and shared-memory scenarios reports considerable performance gains with respect to the sequential versions at the expense of negligible GrPPI overheads. This work was partially supported by the EU project No. 801091 "ASPIDE: Exascale programming models for extreme data processing" and the project TIN2013-41350-P "Scalable Data Management Techniques for High-End Computing Systems" from the Ministerio de Economía y Competitividad, Spain.