
    Improving utilization of heterogeneous clusters

    Datacenters often combine sets of nodes with different capabilities, leading to sub-optimal resource utilization. One of the best ways of improving utilization is to balance the load by taking into account the heterogeneity of these clusters. This article presents a novel way of expressing computational capacity that is better suited to heterogeneous clusters, and also advocates task migration to further improve utilization. The experimental evaluation shows that both proposals are advantageous: they improve the utilization of heterogeneous clusters and reduce the makespan to 16.7% and 17.1%, respectively. This work has been supported by the Spanish Science and Technology Commission under contracts TIN2016-76635-C2-2-R and TIN2016-81840-REDT (CAPAP-H6 network) and the European HiPEAC Network of Excellence

    Sigmoid: An auto-tuned load balancing algorithm for heterogeneous systems

    A challenge that programmers of heterogeneous systems face is leveraging the performance of all the devices in the system. This paper presents Sigmoid, a new load balancing algorithm that efficiently co-executes a single OpenCL data-parallel kernel on all the devices of a heterogeneous system. Sigmoid splits the workload in proportion to the capabilities of the devices, drastically reducing response time and energy consumption. It is designed around several features: it is dynamic, adaptive, guided and effortless, as it does not require the user to supply any parameter and adapts to the behaviour of each kernel at runtime. To evaluate Sigmoid's performance, it has been implemented in Maat, a system abstraction library. Experimental results with different kernel types show that Sigmoid exhibits excellent performance, reaching a utilization of 90% together with energy savings of up to 20%, always reducing programming effort compared to OpenCL and facilitating portability to other heterogeneous machines. This work has been supported by the Spanish Science and Technology Commission under contract PID2019-105660RB-C22 and the European HiPEAC Network of Excellence
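
    The idea of splitting a workload in proportion to device capabilities can be illustrated with a minimal sketch. This is a static, one-shot split and not Sigmoid itself, which the abstract describes as dynamic and adaptive. The function name `proportional_split` and the throughput figures are hypothetical, chosen only for illustration.

```python
def proportional_split(total_items, throughputs):
    """Divide total_items work-items among devices in proportion to throughput.

    throughputs holds one positive number per device (hypothetical relative
    speeds). Returns a list of integer chunk sizes that sums to total_items.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not throughputs or any(t <= 0 for t in throughputs):
        raise ValueError("throughputs must be a non-empty list of positive numbers")

    total_speed = sum(throughputs)
    # Ideal (fractional) share of each device.
    exact = [total_items * t / total_speed for t in throughputs]
    # Round every share down, then hand the leftover items to the devices
    # with the largest fractional remainders (largest-remainder method).
    chunks = [int(e) for e in exact]
    leftover = total_items - sum(chunks)
    by_remainder = sorted(
        range(len(exact)), key=lambda i: exact[i] - chunks[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        chunks[i] += 1
    return chunks


if __name__ == "__main__":
    # Three hypothetical devices with relative speeds 1 : 3 : 6.
    print(proportional_split(1000, [1.0, 3.0, 6.0]))
```

    A dynamic scheme would repeat a split like this on successive packages of work while re-estimating the throughputs at runtime; the sketch covers only the proportional division step.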

    A system for distance teaching in courses with real hardware

    Hands-on laboratory teaching in hardware-centred courses, such as those in the area of Computer Structure and Organization, has been severely affected by COVID-19. This article introduces a new remote laboratory system for practical sessions based on Raspberry Pi boards running the RISC OS operating system. The system manages both the power supply of the machines and the input/output performed through peripheral devices, and lets students view and interact with the desktop of the remote machine and with the hardware devices connected to it. The system also allows a student and an instructor to view the remote machine simultaneously in real time, which makes it easier to resolve questions and to conduct assessment tests. The system combines control logic based on Arduino modules and Ethernet connections with a web interface written in PHP. With these specifications, a proof of concept with two remote machines and two input interfaces has been successfully developed. The authors thank Fernando Vallejo, Carmen Martínez and Cristóbal Camarero for their collaboration. This work has been partially funded by the V Convocatoria de Proyectos de Innovación Docente of the Vicerrectorado de Ordenación Académica y Profesorado of the Universidad de Cantabria

    Assessing the Suitability of King Topologies for Interconnection Networks

    In recent years many different interconnection networks have been used, following two main tendencies. One is characterized by the use of high-degree routers with long wires, while the other uses routers of much smaller degree. The latter rely on two-dimensional mesh and torus topologies with shorter local links. This paper focuses on doubling the degree of common 2D meshes and tori while still preserving an attractive layout for VLSI design. By adding a set of diagonal links in one direction, diagonal networks are obtained. By adding a second set of links, networks of degree eight are built, named king networks. This research presents a comprehensive study of these networks, which includes a topological analysis, the proposal of appropriate routing procedures and an empirical evaluation. King networks exhibit a number of attractive characteristics which translate to reduced execution times of parallel applications. For example, the execution times of the NPB suite are reduced by up to 30 percent. In addition, this work reveals other properties of king networks, such as perfect partitioning, that deserve further attention for their convenient exploitation in forthcoming high-performance parallel systems
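
    The degree-eight construction can be made concrete with a small sketch, assuming a square n x n torus with n >= 3. It lists the eight neighbours of a node and computes hop distance using the standard property of king's graphs that a hop may change both coordinates by at most one, so the distance is the larger of the two per-dimension ring distances. The function names are hypothetical, and the routing procedures proposed in the paper are not reproduced here; a breadth-first search is included only to cross-check the closed form.

```python
from collections import deque


def king_neighbors(x, y, n):
    """Eight neighbours of (x, y) in an n x n king torus (assumes n >= 3)."""
    return [
        ((x + dx) % n, (y + dy) % n)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    ]


def ring_distance(a, b, n):
    """Shortest distance between positions a and b on a ring of n nodes."""
    d = abs(a - b) % n
    return min(d, n - d)


def king_torus_distance(src, dst, n):
    """Hop count between src and dst: the larger per-dimension ring distance."""
    return max(
        ring_distance(src[0], dst[0], n),
        ring_distance(src[1], dst[1], n),
    )


def bfs_distances(src, n):
    """Hop counts from src to every node, found by breadth-first search."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nb in king_neighbors(node[0], node[1], n):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


if __name__ == "__main__":
    n = 6
    reference = bfs_distances((0, 0), n)
    # The closed form should agree with BFS for every destination.
    agree = all(
        reference[(x, y)] == king_torus_distance((0, 0), (x, y), n)
        for x in range(n)
        for y in range(n)
    )
    print("closed form matches BFS:", agree)
```

    In a plain 2D torus the same pair of nodes would be separated by the sum of the two ring distances rather than their maximum, which is one way to see why the added diagonal links shorten paths.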

    Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems

    Heterogeneous systems have become notably more common in recent years. The nodes of the most powerful computers integrate several compute accelerators, such as GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task-based parallel applications that allows the execution of OpenCL kernels on different compute devices. However, it does not support the co-execution of a single kernel on several devices. This paper presents an extension of OmpSs that rises to this challenge, and presents Auto-Tune, a load balancing algorithm that automatically adjusts its internal parameters to suit the hardware capabilities and application behavior. The extension allows programmers to take full advantage of the computing devices with negligible impact on the code. It takes care of two main issues: first, the automatic distribution of datasets and the management of device memory address spaces; second, the implementation of a set of load balancing algorithms to adapt to the particularities of applications and systems. Experimental results reveal that the co-execution of single kernels on all the devices in the node is beneficial in terms of performance and energy consumption, and that Auto-Tune gives the best overall results. This work has been supported by the University of Cantabria with grant CVE-2014-18166, the Generalitat de Catalunya under grant 2014-SGR-1051, the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2016-76635-C2-2-R (AEI/FEDER, UE) and TIN2015-65316-P. The Spanish Government through the Programa Severo Ochoa (SEV-2015-0493

    Outcomes from elective colorectal cancer surgery during the SARS-CoV-2 pandemic

    This study aimed to describe the change in surgical practice and the impact of SARS-CoV-2 on mortality after surgical resection of colorectal cancer during the initial phases of the SARS-CoV-2 pandemic

    King topologies as interconnection networks : cross my mesh and hope to rule

    King topologies are an evolution of the meshes and tori commonly used as interconnection networks for high-performance computing. In order to increase the degree of the latter, king networks add diagonal links in both orientations. This improves performance by increasing throughput and reducing latency. This thesis proposes several routing algorithms that satisfy different needs. First, it studies minimum-distance routing for applications requiring short latencies. Next, it proposes a misrouting algorithm that relaxes the minimum-distance restriction to improve load balancing in the presence of adverse traffic patterns. In addition, it studies fault-tolerant routing algorithms and proposes an original algorithm specific to king networks. Lastly, the thesis presents an area and energy cost evaluation to establish that these networks are a viable alternative to traditional networks

    Performance and energy task migration model for heterogeneous clusters

    This article presents a set of linear regression models to predict the impact of task migration on different objectives, such as performance and energy consumption. The models make it possible to establish whether, at a given moment, migrating a task is profitable in terms of performance or energy consumption. They can also be used to determine the best node to migrate a task to, depending on the objective. The model uses a small set of parameters that are easily measurable. It has been validated against a small heterogeneous cluster using the Slurm resource manager. The model captures the tendencies observed in the experimental results, with average relative errors below 3.5% in execution time and 2.5% in energy consumption. Acknowledgements: This work has been supported by the Spanish Science and Technology Commission under contract PID2019-105660RB-C22 and the European HiPEAC Network of Excellence
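
    How a linear regression model can feed a migration decision is sketched below, under loud assumptions: the abstract does not list the model's parameters, so the single predictor, the sample points and the function names are all hypothetical, and the data are synthetic values made up for the example rather than measurements from the article. The sketch fits a one-variable least-squares line and then compares the predicted cost of staying against the predicted cost of migrating plus an overhead.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Evaluate a fitted (intercept, slope) model at x."""
    intercept, slope = model
    return intercept + slope * x


def migration_is_profitable(cost_if_stay, cost_if_migrate, overhead):
    """True when migrating, including its overhead, is predicted to cost less."""
    return cost_if_migrate + overhead < cost_if_stay


if __name__ == "__main__":
    # Synthetic samples: remaining work units vs. time on two hypothetical nodes.
    work = [1.0, 2.0, 3.0, 4.0]
    slow_node = fit_line(work, [4.0, 8.0, 12.0, 16.0])
    fast_node = fit_line(work, [3.0, 5.0, 7.0, 9.0])

    remaining = 5.0
    stay = predict(slow_node, remaining)
    move = predict(fast_node, remaining)
    print(migration_is_profitable(stay, move, overhead=2.0))
```

    The same comparison could be run with energy as the predicted quantity instead of time, which matches the abstract's point that the best decision depends on the objective.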

    A simulator for intelligent workload managers in heterogeneous clusters

    Modern High Performance Computing (HPC) clusters often comprise a large number of computing resources with different capabilities, making them heterogeneous and difficult to manage. In addition, they must deal with a wide range of applications with different requirements. All this poses a great challenge to the workload managers that assign applications to resources. There are many new proposals to overcome this challenge, including some that employ Deep Reinforcement Learning (DRL) techniques. This paper proposes a novel simulation framework conceived to foster the study of workload managers, in particular those based on DRL techniques. Its main features include the simulation of heterogeneous clusters based on multicore architectures, taking into account contention in shared memory access and energy consumption. The accuracy and performance of the simulator were validated against a real environment based on Slurm. The validation shows good accuracy, with a relative error below 5% in makespan and 10% in energy consumption, and speedups of up to 200. This work has been supported by the Spanish Science and Technology Commission under contract PID2019-105660RB-C22 and the European HiPEAC Network of Excellence