
    Enhancing the interoperability between distributed-memory and task-based programming models

    Hybrid applications make it possible to exploit both inter- and intra-node parallelism; however, the programming models currently used were not designed to be combined. For this reason, we propose a generic mechanism to enhance the interoperability between distributed-memory and task-based programming models.
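The abstract does not describe the mechanism itself. One common idea behind such interoperability is to let a task that would block on communication be suspended, so the worker thread can run other tasks until the communication completes. The sketch below illustrates only that general idea on a single worker, using Python generators; `Request`, `run`, `comm_task` and `compute_task` are hypothetical names, and no real MPI library or the authors' mechanism is used.

```python
from collections import deque


class Request:
    """Hypothetical stand-in for a non-blocking communication request."""

    def __init__(self, polls_until_done):
        self._remaining = polls_until_done  # simulated latency, counted in polls

    def test(self):
        """Advance the simulated transfer by one poll; True once complete."""
        if self._remaining > 0:
            self._remaining -= 1
        return self._remaining == 0


def run(tasks):
    """Round-robin scheduler for generator-based tasks on one worker.

    A task that yields a Request is parked instead of blocking the worker;
    any other yielded value is recorded in the execution trace."""
    ready = deque(tasks)
    parked = []  # (task, request) pairs waiting on communication
    trace = []
    while ready or parked:
        # Resume parked tasks whose communication has completed.
        still_parked = []
        for task, req in parked:
            if req.test():
                ready.append(task)
            else:
                still_parked.append((task, req))
        parked = still_parked
        if not ready:
            continue  # only communication is pending: keep polling
        task = ready.popleft()
        try:
            item = next(task)
        except StopIteration:
            continue  # task finished
        if isinstance(item, Request):
            parked.append((task, item))
        else:
            trace.append(item)
            ready.append(task)
    return trace


def comm_task(name):
    yield f"{name}: post receive"
    yield Request(polls_until_done=3)  # a plain blocking receive would stall here
    yield f"{name}: receive done"


def compute_task(name, steps):
    for i in range(steps):
        yield f"{name}: step {i}"


print(run([comm_task("A"), compute_task("B", 3)]))
```

Task B's steps execute while task A waits for its (simulated) message, which is the overlap a blocking call inside a task would otherwise prevent.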

    A hierarchic task-based programming model for distributed heterogeneous computing

    Distributed computing platforms are evolving into heterogeneous ecosystems: Clusters, Grids and Clouds are introducing into their computing nodes processors with different core architectures, accelerators (e.g., GPUs, FPGAs), and different memories and storage devices, in order to achieve better performance with lower energy consumption. As a consequence of this heterogeneity, programming applications for these distributed heterogeneous platforms becomes a complex task. In addition to the complexity of developing an application for distributed platforms, developers must now also deal with the complexity of the different computing devices inside each node. In this article, we present a programming model that aims to facilitate the development and execution of applications on current and future distributed heterogeneous parallel architectures. This programming model is based on the hierarchical composition of the COMP Superscalar (COMPSs) and Omp Superscalar (OmpSs) programming models, which allows developers to implement infrastructure-agnostic applications. The underlying runtime enables applications to adapt to the infrastructure without the need to maintain different versions of the code. Our proposed programming model has been evaluated on real platforms in terms of heterogeneous resource usage, performance and adaptation.
    This work has been supported by the European Commission through the Horizon 2020 Research and Innovation program under contract 687584 (TANGO project), by the Spanish Government under contract TIN2015-65316 and grant SEV-2015-0493 (Severo Ochoa Program), and by Generalitat de Catalunya under contracts 2014-SGR-1051 and 2014-SGR-1272.
    Peer Reviewed. Postprint (author's final draft)
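To make the idea of hierarchical composition concrete, here is a minimal sketch in which coarse-grained outer tasks (the role COMPSs plays across nodes) each spawn fine-grained inner tasks (the role OmpSs plays inside a node). Both levels are simulated with Python thread pools; the function names and the sum-of-squares workload are hypothetical and this is not the COMPSs or OmpSs API.

```python
from concurrent.futures import ThreadPoolExecutor


def node_level_kernel(block):
    """Inner level (OmpSs-like role): split one block into fine-grained tasks
    that run on the cores of a single node (simulated by a thread pool)."""
    chunks = [block[i:i + 2] for i in range(0, len(block), 2)]
    with ThreadPoolExecutor(max_workers=4) as cores:
        partials = list(cores.map(lambda c: sum(x * x for x in c), chunks))
    return sum(partials)


def distributed_sum_of_squares(data, n_blocks):
    """Outer level (COMPSs-like role): one coarse-grained task per data block,
    which a distributed runtime would place on different nodes."""
    size = -(-len(data) // n_blocks)  # ceiling division
    blocks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_blocks) as nodes:
        return sum(nodes.map(node_level_kernel, blocks))


print(distributed_sum_of_squares(list(range(10)), 3))
```

The outer code never needs to know how each node parallelises its block, which is the kind of separation that lets the same application adapt to different infrastructures.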

    Equipping Sparse Solvers for Exascale - A Survey of the DFG Project ESSEX

    The ESSEX project investigates computational issues arising at exascale for large-scale sparse eigenvalue problems and develops programming concepts and numerical methods for their solution. The project pursues a coherent co-design of all software layers, in which a holistic performance engineering process guides code development across the classic boundaries of application, numerical method and basic kernel library. Within ESSEX, the numerical methods cover both widely applicable solvers, such as classic Krylov, Jacobi-Davidson or the recent FEAST methods, and domain-specific iterative schemes relevant to the ESSEX quantum physics application. This presentation introduces the project structure and presents selected results that demonstrate the potential impact of ESSEX for efficient sparse solvers on highly scalable heterogeneous supercomputers. In the second project phase, from 2016 to 2018, the ESSEX consortium will include partners from the Universities of Tokyo and Tsukuba. Extensions of existing work will address numerically reliable computing methods, scalability improvements by leveraging functional parallelism in asynchronous preconditioners, hiding and reducing communication cost, improving load balancing by advanced partitioning schemes, and the treatment of non-Hermitian matrix problems.
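Iterative sparse eigensolvers such as Krylov methods are built on a basic kernel: the sparse matrix-vector product (SpMV). The sketch below shows that kernel for a matrix in CSR (compressed sparse row) format and uses it in power iteration, the simplest method based on the Krylov sequence x, Ax, A²x, and so on. It is an illustrative example only, not one of the ESSEX solvers or kernels; the 3×3 test matrix (a 1D Laplacian, whose largest eigenvalue is 2 + √2) is an arbitrary choice.

```python
import math


def spmv_csr(row_ptr, col_idx, values, x):
    """y = A @ x for a sparse matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def power_iteration(row_ptr, col_idx, values, iters=200):
    """Estimate the dominant eigenvalue of a symmetric sparse matrix,
    using exactly one SpMV per iteration."""
    n = len(row_ptr) - 1
    x = [1.0 / math.sqrt(n)] * n  # unit-norm start vector
    lam = 0.0
    for _ in range(iters):
        y = spmv_csr(row_ptr, col_idx, values, x)
        lam = sum(a * b for a, b in zip(x, y))  # Rayleigh quotient x^T A x
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return lam


# 1D Laplacian [[2, -1, 0], [-1, 2, -1], [0, -1, 2]] in CSR form.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
print(power_iteration(row_ptr, col_idx, values))  # close to 2 + sqrt(2)
```

Because each iteration is dominated by the SpMV, the performance of this kernel largely determines the performance of the whole solver, which is why the abstract stresses co-design down to the kernel library.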

    Programming models to support data science workflows

    Data Science workflows have become essential for progress in many scientific areas, such as the life, health, and earth sciences. In contrast to traditional HPC workflows, they are more heterogeneous, combining binary executions, MPI simulations, multi-threaded applications, custom analyses (possibly written in Java, Python, C/C++ or R), and real-time processing. Furthermore, whereas in the past field experts were capable of programming and running small simulations, nowadays simulations requiring hundreds or thousands of cores are widely used, and programming them efficiently has become a challenge even for computer scientists. Thus, programming languages and models put considerable effort into easing programmability while maintaining acceptable performance. This thesis contributes to the adaptation of High-Performance frameworks to the needs and challenges of Data Science workflows by extending COMPSs, a mature, general-purpose, task-based, distributed programming model. First, we enhance our prototype to orchestrate different frameworks inside a single programming model, so that non-expert users can build complex workflows in which some steps require highly optimised, state-of-the-art frameworks. This extension includes the @binary, @OmpSs, @MPI, @COMPSs, and @MultiNode annotations for both Java and Python workflows. Second, we integrate container technologies so that developers can easily port, distribute, and scale their applications on distributed computing platforms. In addition to a straightforward methodology for parallelising applications from sequential code, this combination provides efficient image management and application deployment that ease the packaging and distribution of applications. We distinguish between static, HPC, and dynamic container management and provide representative use cases for each scenario using Docker, Singularity, and Mesos.
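Task-based models of this kind let users write sequential-looking code in which annotated functions become asynchronous tasks and data dependencies are inferred from their arguments. The following minimal sketch illustrates that idea with a hypothetical `task` decorator and `wait_on` helper built on `concurrent.futures`; it is not the PyCOMPSs API and does not model the @binary, @MPI or other annotations.

```python
from concurrent.futures import Future, ThreadPoolExecutor

_POOL = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Hypothetical @task decorator: calling the function returns a Future
    immediately; Future arguments are treated as data dependencies."""
    def wrapper(*args):
        def run():
            # Wait for every input produced by an earlier task. A dependency
            # is always submitted before the task that uses it, so the pool
            # always has an earlier task it can make progress on.
            resolved = [a.result() if isinstance(a, Future) else a for a in args]
            return func(*resolved)
        return _POOL.submit(run)
    return wrapper


def wait_on(value):
    """Synchronise: block until a task result is available."""
    return value.result() if isinstance(value, Future) else value


@task
def increment(x):
    return x + 1


@task
def add(a, b):
    return a + b


a = increment(1)   # returns a Future right away
b = increment(a)   # depends on a
c = add(a, b)      # depends on a and b
print(wait_on(c))  # 5
```

The main program reads like sequential code; the only explicit synchronisation point is `wait_on`, which is the programmability argument the abstract makes for task-based models.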
Third, we design, implement and integrate AutoParallel, a Python module that automatically finds an appropriate task-based parallelisation of affine loop nests and executes them in parallel on a distributed computing infrastructure. It is based on sequential programming and requires a single annotation (the @parallel Python decorator), so that anyone with intermediate-level programming skills can scale an application up to hundreds of cores. Finally, we propose a way to extend task-based management systems to support continuous input and output data, enabling the combination of task-based workflows and dataflows (Hybrid Workflows) within a single programming model. Hence, developers can build complex Data Science workflows following different approaches, depending on their requirements, without the effort of combining several frameworks at the same time. Also, to illustrate the capabilities of Hybrid Workflows, we have built a Distributed Stream Library (DistroStreamLib) that can be easily integrated with existing task-based frameworks to provide support for dataflows. The library provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end.
Postprint (published version)
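In a Hybrid Workflow, a producer task can publish partial results to a stream while a consumer task processes them before the producer has finished. The sketch below illustrates that pattern with a hypothetical, in-process `ObjectStream` class built on `queue.Queue`, with threads standing in for tasks; it does not reproduce DistroStreamLib's API, supports a single consumer only, and has no distributed streaming back-end.

```python
import queue
import threading


class ObjectStream:
    """Hypothetical in-process object stream (single consumer)."""

    _END = object()  # sentinel marking the end of the stream

    def __init__(self):
        self._items = queue.Queue()

    def publish(self, item):
        self._items.put(item)

    def close(self):
        self._items.put(self._END)

    def __iter__(self):
        # Blocks until the next item arrives; stops at the sentinel.
        while True:
            item = self._items.get()
            if item is self._END:
                return
            yield item


def simulation(stream, steps):
    """Producer task: emits a partial result as soon as each step finishes."""
    for step in range(steps):
        stream.publish({"step": step, "value": step * step})
    stream.close()


def analysis(stream, results):
    """Consumer task: processes elements while the producer is still running."""
    for element in stream:
        results.append(element["value"])


stream = ObjectStream()
results = []
consumer = threading.Thread(target=analysis, args=(stream, results))
producer = threading.Thread(target=simulation, args=(stream, 5))
consumer.start()
producer.start()
producer.join()
consumer.join()
print(results)  # [0, 1, 4, 9, 16]
```

Hiding the queue behind `publish`, `close` and iteration is what lets task code handle streamed elements without touching the back-end directly.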

    Fast and generic concurrent message-passing

    Communication hardware and software have a significant impact on the performance of clusters and supercomputers. The message-passing model and the Message-Passing Interface (MPI) have been used with great success for communication in the High-Performance Computing (HPC) community. However, MPI has recently faced new challenges due to the emergence of many-core architectures and of programming models with dynamic task parallelism, which assume a large number of concurrent, light-weight threads. These programming models are used by important classes of applications, such as graph and data analytics. Using MPI with these languages/runtimes is inefficient because MPI implementations do not perform well with threads. Using MPI as a communication middleware is also inefficient, since MPI has to provide many abstractions that many of these frameworks do not need, which adds extra overhead. In this thesis, we studied MPI performance under these new assumptions. We identified several factors in the message-passing model that are inherently problematic for scalability and performance. Next, we analyzed the communication of a number of graph, threading and data-flow frameworks to identify generic patterns. We then proposed a low-level communication interface (LCI) to bridge the gap between the communication architecture and the runtime. The core of our idea is to attach to each message a few simple operations that fit the current hardware better and can be implemented efficiently. We show that, with only a few carefully chosen primitives and an appropriate design, message passing under this interface can easily outperform production MPI when running atop a multi-threaded environment. Furthermore, LCI is simple to use for a variety of use cases.
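The abstract's core idea is that each message carries a small, simple operation to perform when it arrives. The sketch below illustrates that idea under assumed semantics: the three operations ("enqueue" into a completion queue, "signal" a synchroniser, or run a "handler"), and the names `Message`, `Synchronizer` and `deliver`, are hypothetical and are not LCI's actual interface.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Any


class Synchronizer:
    """Completion object that a waiting thread can block on."""

    def __init__(self):
        self._event = threading.Event()
        self.data = None

    def signal(self, data):
        self.data = data
        self._event.set()

    def wait(self, timeout=None):
        self._event.wait(timeout)
        return self.data


@dataclass
class Message:
    payload: Any
    op: str      # action to run on arrival: "enqueue", "signal" or "handler"
    target: Any  # a queue.Queue, a Synchronizer, or a callable, matching `op`


def deliver(msg):
    """Receiver-side progress: apply the simple operation attached to the message."""
    if msg.op == "enqueue":
        msg.target.put(msg.payload)
    elif msg.op == "signal":
        msg.target.signal(msg.payload)
    elif msg.op == "handler":
        msg.target(msg.payload)
    else:
        raise ValueError(f"unknown op {msg.op!r}")


completion_queue = queue.Queue()
sync = Synchronizer()
handled = []
incoming = [
    Message(1, "enqueue", completion_queue),
    Message(2, "signal", sync),
    Message(3, "handler", handled.append),
]
# A separate "progress" thread delivers messages, as a network thread would.
progress = threading.Thread(target=lambda: [deliver(m) for m in incoming])
progress.start()
progress.join()
print(completion_queue.get_nowait(), sync.wait(), handled)  # 1 2 [3]
```

Each operation is cheap and thread-safe on its own, so many runtime threads can consume completions concurrently without a heavyweight matching layer; whether LCI makes exactly these choices is not stated in the abstract.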

    Runtime MPI Correctness Checking with a Scalable Tools Infrastructure

    The increasing computational demand of simulations motivates the use of parallel computing systems. At the same time, this parallelism poses challenges to application developers. The Message Passing Interface (MPI) is a de facto standard for distributed-memory programming in high performance computing. However, its use also enables complex parallel programming errors such as races, communication errors, and deadlocks. Automatic tools can assist application developers in detecting and removing such errors. This thesis considers tools that detect such errors during an application run and advances them towards a combination of precise checks (neither false positives nor false negatives) and scalability. This includes novel hierarchical checks that provide scalability, as well as a formal basis for a distributed deadlock detection approach. At the same time, the development of parallel runtime tools is challenging and time-consuming, especially if scalability and portability are key design goals. Current tool development projects often create similar tool components, while component reuse remains low. To provide a perspective towards more efficient tool development, which simplifies scalable implementations, component reuse, and tool integration, this thesis proposes an abstraction for a parallel tools infrastructure along with a prototype implementation. This abstraction removes the need for multiple interfaces for different types of tool functionality, which limit flexible component reuse. Thus, this thesis advances runtime error detection tools and uses their redesign and increased scalability requirements to apply and evaluate a novel tool infrastructure abstraction. The new abstraction ultimately allows developers to focus on their tool functionality, rather than on developing or integrating common tool components. The use of such an abstraction in a wide range of parallel runtime tool development projects could greatly increase component reuse, thus decreasing tool development time and cost. An application study with up to 16,384 application processes demonstrates the applicability of both the proposed runtime correctness concepts and the proposed tools infrastructure.
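Runtime deadlock detection is often explained with a wait-for graph: each blocked process points to the processes it waits on, and a cycle means none of them can proceed. The sketch below is a centralised, textbook AND-model cycle check, not the hierarchical or distributed approach proposed in the thesis. Wildcard receives (MPI_ANY_SOURCE) need a more general model and are ignored here; the `wait_for` input format is an assumption.

```python
def find_deadlock(wait_for):
    """Return the processes forming a cycle in the wait-for graph, or None.

    wait_for maps each process rank to the ranks it is blocked on
    (e.g. the source of a pending blocking receive)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {p: WHITE for p in wait_for}
    path = []

    def visit(p):
        color[p] = GREY
        path.append(p)
        for q in wait_for.get(p, ()):
            if color.get(q, WHITE) == GREY:
                return path[path.index(q):]  # back edge: cycle found
            if color.get(q, WHITE) == WHITE:
                cycle = visit(q)
                if cycle:
                    return cycle
        color[p] = BLACK
        path.pop()
        return None

    for p in wait_for:
        if color[p] == WHITE:
            cycle = visit(p)
            if cycle:
                return cycle
    return None


# Rank 0 receives from 1, 1 from 2, and 2 from 0: a classic circular wait.
print(find_deadlock({0: [1], 1: [2], 2: [0]}))  # [0, 1, 2]
print(find_deadlock({0: [1], 1: [], 2: [0]}))   # None
```

Building this graph requires collecting blocking state from every process, which is exactly what becomes expensive at scale and motivates the hierarchical, distributed checks described in the abstract.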