18 research outputs found

    Parallel source code transformation techniques using design patterns

    Get PDF
    Mención Internacional en el título de doctorIn recent years, the traditional approaches for improving performance, such as increasing the clock frequency, has come to a dead-end. To tackle this issue, parallel architectures, such as multi-/many-core processors, have been envisioned to increase the performance by providing greater processing capabilities. However, programming efficiently for this architectures demands big efforts in order to transform sequential applications into parallel and to optimize such applications. Compared to sequential programming, designing and implementing parallel applications for operating on modern hardware poses a number of new challenges to developers such as data races, deadlocks, load imbalance, etc. To pave the way, parallel design patterns provide a way to encapsulate algorithmic aspects, allowing users to implement robust, readable and portable solutions with such high-level abstractions. Basically, these patterns instantiate parallelism while hiding away the complexity of concurrency mechanisms, such as thread management, synchronizations or data sharing. Nonetheless, frameworks following this philosophy does not share the same interface and users require understanding different libraries, and their capabilities, not only to decide which fits best for their purposes but also to properly leverage them. Furthermore, in order to parallelize these applications, it is necessary to analyze the sequential code in order to detect the regions of code that can be parallelized that is a time consuming and complex task. Additionally, different libraries targeted to specific devices provide some algorithms implementations that are already parallel and highly-tuned. In these situations, it is also necessary to analyze and determine which routine implementation is the most suitable for a given problem. To tackle these issues, this thesis aims at simplifying and minimizing the necessary efforts to transform sequential applications into parallel. This way, resulting codes will improve their performance by fully exploiting the available resources while the development efforts will be considerably reduced. Basically, in this thesis, we contribute with the following. First, we propose a technique to detect potential parallel patterns in sequential code. Second, we provide a novel generic C++ interface for parallel patterns which acts as a switch among existing frameworks. Third, we implement a framework that is able to transform sequential code into parallel using the proposed pattern discovery technique and pattern interface. Finally, we propose mechanisms that are able to select the most suitable device and routine implementation to solve a given problem based on previous performance information. The evaluation demonstrates that using the proposed techniques can minimize the refactoring and optimization time while improving the performance of the resulting applications with respect to the original code.En los últimos años, las técnicas tradicionales para mejorar el rendimiento, como es el caso del incremento de la frecuencia de reloj, han llegado a sus límites. Con el fin de seguir mejorando el rendimiento, se han desarrollado las arquitecturas paralelas, las cuales proporcionan un incremento del rendimiento al estar provistas de mayores capacidades de procesamiento. Sin embargo, programar de forma eficiente para estas arquitecturas requieren de grandes esfuerzos por parte de los desarrolladores. Comparado con la programación secuencial, diseñar e implementar aplicaciones paralelas enfocadas a trabajar en estas arquitecturas presentan una gran cantidad de dificultades como son las condiciones de carrera, los deadlocks o el incorrecto balanceo de la carga. En este sentido, los patrones paralelos son una forma de encapsular aspectos algorítmicos de las aplicaciones permitiendo el desarrollo de soluciones robustas, portables y legibles gracias a las abstracciones de alto nivel. En general, estos patrones son capaces de proporcionar el paralelismo a la vez que ocultan las complejidades derivadas de los mecanismos de control de concurrencia necesarios como el manejo de los hilos, las sincronizaciones o la compartición de datos. No obstante, los diferentes frameworks que siguen esta filosofía no comparten una única interfaz lo que conlleva que los usuarios deban conocer múltiples bibliotecas y sus capacidades, con el fin de decidir cuál de ellos es mejor para una situación concreta y como usarlos de forma eficiente. Además, con el fin de paralelizar aplicaciones existentes, es necesario analizar e identificar las regiones del código que pueden ser paralelizadas, lo cual es una tarea ardua y compleja. Además, algunos algoritmos ya se encuentran implementados en paralelo y optimizados para arquitecturas concretas en diversas bibliotecas. Esto da lugar a que sea necesario analizar y determinar que implementación concreta es la más adecuada para solucionar un problema dado. Para paliar estas situaciones, está tesis busca simplificar y minimizar el esfuerzo necesario para transformar aplicaciones secuenciales en paralelas. De esta forma, los códigos resultantes serán capaces de explotar los recursos disponibles a la vez que se reduce considerablemente el esfuerzo de desarrollo necesario. En general, esta tesis contribuye con lo siguiente. En primer lugar, se propone una técnica de detección de patrones paralelos en códigos secuenciales. En segundo lugar, se presenta una interfaz genérica de patrones paralelos para C++ que permite seleccionar la implementación de dichos patrones proporcionada por frameworks ya existentes. En tercer lugar, se introduce un framework de transformación de código secuencial a paralelo que hace uso de las técnicas de detección de patrones y la interfaz presentadas. Finalmente, se proponen mecanismos capaces de seleccionar la implementación más adecuada para solucionar un problema concreto basándose en el rendimiento obtenido en ejecuciones previas. Gracias a la evaluación realizada se ha podido demostrar que uso de las técnicas presentadas pueden minimizar el tiempo necesario para transformar y optimizar el código a la vez que mejora el rendimiento de las aplicaciones transformadas.Programa Oficial de Doctorado en Ciencia y Tecnología InformáticaPresidente: David Expósito Singh.- Secretario: Rafael Asenjo Plaza.- Vocal: Marco Aldinucc

    Simulación epidemiológica y visualización

    Get PDF
    La epidemiologia ha ganado gran importancia en la sociedad debido principalmente a la propagación del virus de la gripe A en el año 2009. Esta epidemia tuvo gran impacto social debido a su rápida expansión pasando a ser rápidamente una pandemia mundial y al temor resultante en la sociedad. Esto provoco una gran alarma social y una falta de medios para controlar la enfermedad por parte de las autoridades sanitarias. Por este motivo, se han desarrollado diversas herramientas que permitan realizar previsiones sobre la propagación de enfermedades con el fin de adoptar medidas capaces de reducir el impacto sobre la sociedad. Dentro de estas herramientas se encuentran una gran cantidad de simuladores del virus de la gripe. Estos simuladores mediante diversos modelos matemáticos y sistemas de representación de las poblaciones son capaces de generar una previsión sobre la propagación de la enfermedad. Los simuladores epidemiológicos desarrollados se dividen principalmente en dos grupos según el modelo de enfermedad que implementen. Según el modelo de enfermedad los simuladores se clasifican en estocásticos, si son modelos probabilísticos, o deterministas, si son modelos deterministas que generalmente se basan en ecuaciones diferenciales. En este trabajo se parte del simulador EpiGraph el cual implementa un modelo estocástico, capaz de simular el comportamiento de la enfermedad de la gripe , y un sistema de representación de la población mediante grafos basados en las redes sociales. Sin embargo, las previsiones resultantes del simulador son difíciles de analizar a simple vista y el sistema que genera modelos de la población en grafos mediante muestras de los grafos de las redes sociales puede mejorarse notablemente. Por este motivo, en este proyecto, se buscara, por un lado, diseñar e implementar un algoritmo de muestreo de grafos para generar los modelos de población utilizados en el simulador capaz de mantener las propiedades del grafo original y, por otro lado, se buscara diseñar una aplicación que sea capaz de mostrar los resultados de las previsiones permitiendo un análisis completo con la menor dificultar posible.Epidemiology has gained great importance in society mainly due to the spread of influenza virus in 2009. This epidemic had a great social impact because of its fast expansion becoming a global pandemic in a short time and the resulting fear in society. This provoked a great social alarm and uncertainty emerged in health authorities on how to control the disease. For this reason, has developed different tools to make predictions about the spread of disease in order to take measures that may reduce the impact on society. Among these tools are a large amount of simulators of influenza virus. These simulators are able to generate a forecast of the spread of the disease using mathematical models and populations representing models. Epidemiological simulators developed are mainly divided into two groups according to the implemented disease model. According to the disease model, simulators are classified as stochastic, if implements probabilistic models, or deterministic, if implements deterministic models that are usually based on differential equations. This project starts from the epigraph simulator which implements a stochastic model, able to simulate the behaviour of influenza disease, and a system of representation of the population by means of graphs based on social networks. However, the resulting forecasts are difficult to analyse with the naked eye and the system that generates the population models on graphs using sample graphs of social networks can be significantly improved. For this reason, this project will seek, on the one hand, to design and implement a graph-­‐sampling algorithm to generate population models used in the simulator to keep the properties of the original graph and on the other hand, design an application able to display the results of the forecasts allowing a complete analysis easily.Ingeniería Informátic

    A Generic Parallel Pattern Interface for Stream and Data Processing

    Get PDF
    Current parallel programming frameworks aid developers to a great extent in implementing applications that exploit parallel hardware resources. Nevertheless, developers require additional expertise to properly use and tune them to operate efficiently on specific parallel platforms. On the other hand, porting applications between different parallel programming models and platforms is not straightforward and demands considerable efforts and specific knowledge. Apart from that, the lack of high-level parallel pattern abstractions, in those frameworks, further increases the complexity in developing parallel applications. To pave the way in this direction, this paper proposes GRPPI, a generic and reusable parallel pattern interface for both stream processing and data-intensive C++ applications. GRPPI accommodates a layer between developers and existing parallel programming frameworks targeting multi-core processors, such as C++ threads, OpenMP and Intel TBB, and accelerators, as CUDA Thrust. Furthermore, thanks to its high-level C++ application programming interface and pattern composability features, GRPPI allows users to easily expose parallelism via standalone patterns or patterns compositions matching in sequential applications. We evaluate this interface using an image processing use case and demonstrate its benefits from the usability, flexibility, and performance points of view. Furthermore, we analyze the impact of using stream and data pattern compositions on CPUs, GPUs and heterogeneous configurations.This work has been partially supported by the EU project ICT 644235 “REPHRASE: REfactoring Parallel Heterogeneous Resource-aware Applications” and the Spanish “Ministerio de Economía y Competitividad” under the grant TIN2016-79673-P “Towards Unification of HPC and Big Data Paradigms.

    Exploiting stream parallelism of MRI reconstruction using GrPPI over multiple back-ends

    Get PDF
    Proceeding of: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Larnaca, Cyprus, 14-17 May 2019In recent years, on-line processing of data streams has been established as a major computing paradigm. This is due mainly to two reasons: first, more and more data are generated in near real-time that need to be processed; the second reason is given by the need of efficient parallel applications. However, the above-mentioned areas expose a tough challenge over traditional data-analysis techniques, which have been forced to evolve to a stream perspective. In this work we present an comparative study of a stream-aware multi-staged application, which has been implemented using GrPPI, a generic and reusable parallel pattern interface for C++ applications. We demonstrate the benefits of using this interface in terms of programability, performance, and scalability.This work was supported by the EU project “ASPIDE: Exascale Programing Models for Extreme Data Processing” under grant 80109

    Towards Automatic Parallelization of Stream Processing Applications

    Get PDF
    Parallelizing and optimizing codes for recent multi-/many-core processors have been recognized to be a complex task. For this reason, strategies to automatically transform sequential codes into parallel and discover optimization opportunities are crucial to relieve the burden to developers. In this paper, we present a compile-time framework to (semi) automatically find parallel patterns (Pipeline and Farm) and transform sequential streaming applications into parallel using GrPPI, a generic parallel pattern interface. This framework uses a novel pipeline stage-balancing technique which provides the code generator module with the necessary information to produce balanced pipelines. The evaluation, using a synthetic video benchmark and a real-world computer vision application, demonstrates that the presented framework is capable of producing parallel and optimized versions of the application. A comparison study under several thread-core oversubscribed conditions reveals that the framework can bring comparable performance results with respect to the Intel TBB programming framework

    Paving the way towards high-level parallel pattern interfaces for data stream processing

    Get PDF
    The emergence of the Internet of Things (IoT) data stream applications has posed a number of new challenges to existing infrastructures, processing engines, and programming models. In this sense, high-level interfaces, encapsulating algorithmic aspects in pattern-based constructions, have considerably reduced the development and parallelization efforts of this type of applications. An example of parallel pattern interface is GrPPI, a C++ generic high-level library that acts as a layer between developers and existing parallel programming frameworks, such as C++ threads, OpenMP and Intel TBB. In this paper, we complement the basic patterns supported by GrPPI with the new stream operators Split-Join and Window, and the advanced parallel patterns Stream-Pool, Windowed-Farm and Stream-Iterator for the aforementioned back ends. Thanks to these new stream operators, complex compositions among streaming patterns can be expressed. On the other hand, the collection of advanced patterns allows users to tackle some domain-specific applications, ranging from the evolutionary to the real-time computing areas, where compositions of basic patterns are not capable of fully mimicking the algorithmic behavior of their original sequential codes. The experimental evaluation of the new advanced patterns and the stream operators on a set of domain-specific use-cases, using different back ends and pattern-specific parameters, reports considerable performance gains with respect to the sequential versions. Additionally, we demonstrate the benefits of the GrPPI pattern interface from the usability, flexibility and readability points of view.This work was partially supported by the EU project ICT 644235 “RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications” and the project TIN2013-41350-P “Scalable Data Management Techniques for High-End Computing Systems” from the Ministerio de Economía y Competitividad, Spai

    Challenging the abstraction penalty in parallel patterns libraries

    Get PDF
    In the last years, pattern-based programming has been recognized as a good practice for efficiently exploiting parallel hardware resources. Following this approach, multiple libraries have been designed for providing such high-level abstractions to ease the parallel programming. However, those libraries do not share a common interface. To pave the way, GrPPI has been designed for providing an intermediate abstraction layer between application developers and existing parallel programming frameworks like OpenMP, Intel TBB or ISO C++ threads. On the other hand, FastFlow has been adopted as an efficient object-based programming framework that may benefit from being supported as an additional GrPPI backend. However, the object-based approach presents some major challenges to be incorporated under the GrPPI type safe functional programming style. In this paper, we present the integration of FastFlow as a new GrPPI backend to demonstrate that structured parallel programming frameworks perfectly fit the GrPPI design. Additionally, we also demonstrate that GrPPI does not incur in additional overheads for providing its abstraction layer, and we study the programmability in terms of lines of code and cyclomatic complexity. In general, the presented work acts as reciprocal validation of both FastFlow (as an efficient, native structured parallel programming framework) and GrPPI (as an efficient abstraction layer on top of existing parallel programming frameworks).This work has been partially supported by the European Commission EU H2020-ICT-2014-1 Project RePhrase (No. 644235) and by the Spanish Ministry of Economy and Competitiveness through TIN2016-79637-P “Towards Unification of HPC and Big Data Paradigms”

    Detecting semantic violations of lock-free data structures through C++ contracts

    Get PDF
    The use of synchronization mechanisms in multithreaded applications is essential on shared-memory multi-core architectures. However, debugging parallel applications to avoid potential failures, such as data races or deadlocks, can be challenging. Race detectors are key to spot such concurrency bugs; nevertheless, if lock-free data structures are used, these may emit a significant number of false positives. In this paper, we present a framework for semantic violation detection of lock-free data structures which makes use of contracts, a novel feature of the upcoming C++20, and a customized version of the ThreadSanitizer race detector. We evaluate the detection accuracy of the framework in terms of false positives and false negatives leveraging some synthetic benchmarks which make use of the SPSC and MPMC lock-free queue structures from the Boost C++ library. Thanks to this framework, we are able to check the correct use of lock-free data structures, thus reducing the number of false positives.This work has been partially funded by the Spanish Ministry of Economy and Competitiveness through Project Grant TIN2016-79637-P (BigHPC - Towards Unification of HPC and Big Data Paradigms) and the European Commission through Grant No. 801091 (ASPIDE - Exascale programmIng models for extreme data processing)

    An adaptive offline implementation selector for heterogeneous parallel platforms

    Get PDF
    Heterogeneous Parallel Platforms, Comprising Multiple Processing Units And Architectures, Have Become A Cornerstone In Improving The Overall Performance And Energy Efficiency Of Scientific And Engineering Applications. Nevertheless, Taking Full Advantage Of Their Resources Comes Along With A Variety Of Difficulties: Developers Require Technical Expertise In Using Different Parallel Programming Frameworks And Previous Knowledge About The Algorithms Used Underneath By The Application. To Alleviate This Burden, We Present An Adaptive Offline Implementation Selector That Allows Users To Better Exploit Resources Provided By Heterogeneous Platforms. Specifically, This Framework Selects, At Compile Time, The Tuple Device-Implementation That Delivers The Best Performance On A Given Platform. The User Interface Of The Framework Leverages Two C&#43 &#43 Language Features: Attributes And Concepts. To Evaluate The Benefits Of This Framework, We Analyse The Global Performance And Convergence Of The Selector Using Two Different Use Cases. The Experimental Results Demonstrate That The Proposed Framework Allows Users Enhancing Performance While Minimizing Efforts To Tune Applications Targeted To Heterogeneous Platforms. Furthermore, We Also Demonstrate That Our Framework Delivers Comparable Performance Figures With Respect To Other Approaches.The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work has been partially supported by the Spanish ‘Ministerio de Economía y Competitividad’ under the project grant TIN2016-79637-P ‘Towards Unification of High Performance Computing (HPC) and Big Data Paradigms’ and the EU Projects ICT 644235 ‘RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications’ and the FP7 609666 ‘Repara: Reengineering and Enabling Performance And poweR of Applications’

    Towards automatic parallelization of stream processing applications

    Get PDF
    Parallelizing and optimizing codes for recent multi-/many-core processors have been recognized to be a complex task. For this reason, strategies to automatically transform sequential codes into parallel and discover optimization opportunities are crucial to relieve the burden to developers. In this paper, we present a compile-time framework to (semi) automatically find parallel patterns (Pipeline and Farm) and transform sequential streaming applications into parallel using GrPPI, a generic parallel pattern interface. This framework uses a novel pipeline stage-balancing technique which provides the code generator module with the necessary information to produce balanced pipelines. The evaluation, using a synthetic video benchmark and a real-world computer vision application, demonstrates that the presented framework is capable of producing parallel and optimized versions of the application. A comparison study under several thread-core oversubscribed conditions reveals that the framework can bring comparable performance results with respect to the Intel TBB programming framework.This work was supported in part by the Spanish Ministerio de Economía y Competitividad through the Project Toward Uni cation of HPC and Big Data Paradigms under Grant TIN2016-79637-P and in part by the EU Project RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications under Grant ICT 644235
    corecore