
    Parallel architectures and runtime systems co-design for task-based programming models

    The increasing parallelism of modern computing systems has heightened the need for a holistic vision when designing multiprocessor architectures, one that takes into account the needs of programming models and applications. Today, system design consists of several layers stacked on top of each other, from the architecture up to the application software. Although this layering enables a separation of concerns, where each layer can be changed independently thanks to well-defined interfaces between them, it hampers the design of future systems as Moore's Law comes to an end. Current performance improvements in computer architecture are driven by shrinking the transistor channel width, which allows faster and more power-efficient chips to be made. However, technology is reaching physical limits beyond which the transistor size cannot be reduced further, requiring a change of paradigm in systems design. This thesis proposes to break this layered design and advocates for a system in which the architecture and the programming-model runtime system exchange information toward a common goal: improving performance and reducing power consumption. By making the architecture aware of runtime information, such as the Task Dependency Graph (TDG) in dataflow task-based programming models, power consumption can be reduced by exploiting the critical path of the graph. Moreover, the architecture can provide hardware support for building this graph, reducing runtime overheads and making the execution of fine-grained tasks possible, which increases the available parallelism. Finally, the current status of inter-node communication primitives can be exposed to the runtime system to enable more efficient communication scheduling, which also creates new opportunities for overlapping computation and communication that were not possible before. The thesis provides an evaluation of these proposals, together with a methodology to simulate and characterize application behavior.
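    To make the critical-path idea concrete, here is a minimal sketch of critical-path-aware frequency selection over a TDG. The graph, the task costs, and the two frequency levels are illustrative assumptions, not structures taken from the thesis: tasks with zero slack lie on the critical path and keep the nominal frequency, while tasks with slack can be slowed down (e.g. via DVFS) without lengthening the overall makespan.

```python
# Hypothetical sketch: identify critical-path tasks in a task dependency
# graph (TDG) and pick a frequency level per task. The DAG, costs, and
# frequency labels are illustrative, not taken from the thesis.
from collections import defaultdict

# Each task maps to (cost, list_of_successor_tasks).
tdg = {
    "t0": (10, ["t1", "t2"]),
    "t1": (30, ["t3"]),
    "t2": (5,  ["t3"]),
    "t3": (10, []),
}

def critical_path_slack(tdg):
    """Return per-task slack: 0 means the task lies on the critical path."""
    # Longest path from each task down to a sink, including its own cost.
    down = {}
    def downward(t):
        if t not in down:
            cost, succs = tdg[t]
            down[t] = cost + max((downward(s) for s in succs), default=0)
        return down[t]
    # Earliest start: longest path from any source up to (excluding) t.
    preds = defaultdict(list)
    for t, (_, succs) in tdg.items():
        for s in succs:
            preds[s].append(t)
    est = {}
    def earliest(t):
        if t not in est:
            est[t] = max((earliest(p) + tdg[p][0] for p in preds[t]), default=0)
        return est[t]
    makespan = max(downward(t) for t in tdg)
    return {t: makespan - (earliest(t) + down[t]) for t in tdg}

for task, slack in sorted(critical_path_slack(tdg).items()):
    # Zero-slack tasks keep the nominal frequency; the rest can be slowed.
    print(f"{task}: slack={slack}, freq={'nominal' if slack == 0 else 'reduced'}")
```

    In this toy graph, t0, t1, and t3 form the critical path, while t2 has 25 units of slack and could run at a reduced frequency with no effect on the makespan.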

    Performance Observability and Monitoring of High Performance Computing with Microservices

    Traditionally, High Performance Computing (HPC) software has been built and deployed as bulk-synchronous, parallel executables based on the message-passing interface (MPI) programming model. The rise of data-oriented computing paradigms and an explosion in the variety of applications that need to be supported on HPC platforms have forced a rethink of the appropriate programming and execution models to integrate this new functionality. In situ workflows mark a paradigm shift in HPC software development methodologies, enabling a range of new applications, from user-level data services to machine learning (ML) workflows that run alongside traditional scientific simulations. By tracing the evolution of HPC software development over the past 30 years, this dissertation identifies the key elements and trends responsible for the emergence of coupled, distributed, in situ workflows. This dissertation's focus is on coupled in situ workflows involving composable, high-performance microservices. After outlining the motivation for enabling performance observability of these services and why existing HPC performance tools and techniques cannot be applied in this context, this dissertation proposes a solution in which a set of techniques gathers, analyzes, and orients performance data from different sources to generate observability. By leveraging microservice components initially designed to build high-performance data services, this dissertation demonstrates their broader applicability for building and deploying performance monitoring and visualization as services within an in situ workflow. The results of this dissertation suggest that: (1) integration of performance data from different sources is vital to understanding the performance of service components; (2) in situ (online) analysis of this performance data is needed to enable the adaptivity of distributed components and to manage monitoring data volume; (3) statistical modeling combined with performance observations can help generate better service configurations; and (4) services are a promising architectural choice for deploying in situ performance monitoring and visualization functionality. This dissertation includes previously published and co-authored material as well as unpublished co-authored material.
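    As an illustration of point (2), below is a minimal sketch of the kind of online reduction such a monitoring service could perform: rather than shipping every raw sample off the node, it maintains single-pass summary statistics (Welford's algorithm) and publishes only a constant-size aggregate. The OnlineStats class and the latency samples are hypothetical, not taken from the dissertation.

```python
# Hypothetical sketch of in situ (online) reduction of monitoring data:
# keep streaming summary statistics per metric instead of raw samples.
# Names and sample values are illustrative, not from the dissertation.
import math

class OnlineStats:
    """Single-pass mean/variance over a stream of performance samples."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def observe(self, sample: float) -> None:
        # Welford's update: numerically stable, O(1) memory per metric.
        self.n += 1
        delta = sample - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (sample - self.mean)

    def summary(self) -> dict:
        var = self.m2 / (self.n - 1) if self.n > 1 else 0.0
        return {"count": self.n, "mean": self.mean, "stddev": math.sqrt(var)}

# A service component would feed per-request latencies (or counter deltas)
# into the aggregator and expose summary() to the monitoring workflow,
# turning thousands of samples into one constant-size record.
stats = OnlineStats()
for latency_us in [120.0, 95.0, 143.0, 110.0, 98.0]:
    stats.observe(latency_us)
print(stats.summary())
```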