9 research outputs found

    Sigmoid: An auto-tuned load balancing algorithm for heterogeneous systems

    A challenge that heterogeneous system programmers face is leveraging the performance of all the devices that make up the system. This paper presents Sigmoid, a new load balancing algorithm that efficiently co-executes a single OpenCL data-parallel kernel on all the devices of heterogeneous systems. Sigmoid splits the workload proportionally to the capabilities of the devices, drastically reducing response time and energy consumption. It is designed around several features: it is dynamic, adaptive, guided and effortless, as it does not require the user to give any parameter, adapting to the behaviour of each kernel at runtime. To evaluate Sigmoid's performance, it has been implemented in Maat, a system abstraction library. Experimental results with different kernel types show that Sigmoid exhibits excellent performance, reaching a utilization of 90%, together with energy savings up to 20%, always reducing programming effort compared to OpenCL, and facilitating the portability to other heterogeneous machines. This work has been supported by the Spanish Science and Technology Commission under contract PID2019-105660RB-C22 and the European HiPEAC Network of Excellence
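    The idea of splitting a workload "proportionally to the capabilities of the devices" can be illustrated with a minimal static sketch. This is not Sigmoid's actual algorithm, which the abstract describes as dynamic and adaptive; the function name `split_proportionally` and the throughput figures are hypothetical and chosen only for illustration.

```python
from typing import Dict


def split_proportionally(total_items: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Give each device a share of work-items proportional to its throughput.

    `throughput` maps a (hypothetical) device name to a relative speed,
    e.g. work-items per second measured in an earlier run.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not throughput or any(t <= 0 for t in throughput.values()):
        raise ValueError("every device needs a positive throughput")

    total_speed = sum(throughput.values())
    # Start from the floor of each device's ideal share.
    shares = {
        dev: int(total_items * speed / total_speed)
        for dev, speed in throughput.items()
    }
    # Flooring can leave a few items unassigned; hand them to the fastest devices.
    leftover = total_items - sum(shares.values())
    for dev in sorted(throughput, key=throughput.get, reverse=True)[:leftover]:
        shares[dev] += 1
    return shares


if __name__ == "__main__":
    # Assumed relative speeds: the GPU is three times as fast as the CPU.
    print(split_proportionally(1000, {"cpu": 1.0, "gpu": 3.0}))
```

    A static split like this is only as good as the throughput estimates it is given, which is one reason a runtime-adaptive scheme can be attractive.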

    Infrared Thermography for Estimating Supraclavicular Skin Temperature and BAT Activity in Humans: A Systematic Review

    Objective: Brown adipose tissue (BAT) is a thermogenic tissue with potential as a therapeutic target in the treatment of obesity and related metabolic disorders. The most used technique for quantifying human BAT activity is the measurement of 18F-fluorodeoxyglucose uptake via a positron emission tomography/computed tomography scan following exposure to cold. However, several studies have indicated the measurement of the supraclavicular skin temperature (SST) by infrared thermography (IRT) to be a less invasive alternative. This work reviews the state of the art of this latter method as a means of determining BAT activity in humans. Methods: The data sources for this review were PubMed, Web of Science, and EBSCOhost (SPORTdiscus), and eligible studies were those conducted in humans. Results: In most studies in which participants were first cooled, an increase in IRT-measured SST was noted. However, only 5 of 24 such studies also involved a nuclear technique that confirmed increased activity in BAT, and only 2 took into account the thickness of the fat layer when measuring SST by IRT. 
Conclusions: More work is needed to understand the involvement of tissues other than BAT in determining IRT-measured SST; at present, IRT cannot determine whether any increase in SST is due to increased BAT activity. This study was supported by the Spanish Ministry of Economy and Competitiveness (MINECO) via the Fondo de Investigación Sanitaria del Instituto de Salud Carlos III (PI13/01393), Retos de la Sociedad (DEP2016-79512-R) and European Regional Development Funds (ERDF), the Fundación Iberoamericana de Nutrición (FINUT), the Redes Temáticas de Investigación Cooperativa RETIC (Red SAMID RD16/0022), the AstraZeneca HealthCare Foundation, the University of Granada Plan Propio de Investigación 2016 Excellence actions: Unit of Excellence on Exercise and Health (UCEES), and Plan Propio de Investigación 2018 and the Junta de Andalucía, Consejería de Conocimiento, Investigación y Universidades (ERDF: SOMM17/6107/UGR). DSI is an Investigator of the Miguel Servet Fund from Carlos III National Institute of Health, Spain (CP15/00106). DJP is supported by grants from the Spanish Ministry of Science and Innovation-MINECO (RYC-2014-16938), MINECO/European Fund for Regional Development (FEDER) (DEP2016-76123-R), the Government of Andalusia, the Integrated Territorial Initiative 2014-2020 for the Province of Cádiz (PI-0002-2017), the European Union's ERASMUS+SPORT program (Grant Agreement 603121-EPP-1-2018-1-ES-SPO-SCP), and the EXERNET Research Network on Exercise and Health in Special Populations (DEP2005-00046/ACTI)

    Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems

    The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, like GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task-based parallel applications that allows the execution of OpenCL kernels on different compute devices. However, it does not support the co-execution of a single kernel on several devices. This paper presents an extension of OmpSs that rises to this challenge, and presents Auto-Tune, a load balancing algorithm that automatically adjusts its internal parameters to suit the hardware capabilities and application behavior. The extension allows programmers to take full advantage of the computing devices with negligible impact on the code. It takes care of two main issues: first, the automatic distribution of datasets and the management of device memory address spaces; second, the implementation of a set of load balancing algorithms to adapt to the particularities of applications and systems. Experimental results reveal that the co-execution of single kernels on all the devices in the node is beneficial in terms of performance and energy consumption, and that Auto-Tune gives the best overall results. This work has been supported by the University of Cantabria with grant CVE-2014-18166, the Generalitat de Catalunya under grant 2014-SGR-1051, the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2016-76635-C2-2-R (AEI/FEDER, UE) and TIN2015-65316-P. The Spanish Government through the Programa Severo Ochoa (SEV-2015-0493
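    To make the notion of dynamic load balancing across devices more concrete, the following sketch simulates a generic guided scheduler: whichever device becomes idle first receives the next package, and package sizes shrink as the remaining work shrinks. It is not the Auto-Tune algorithm from the paper; the function `guided_schedule`, the parameters `min_chunk` and `k`, and the device speeds are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def guided_schedule(
    total_items: int,
    speed: Dict[str, float],
    min_chunk: int = 64,
    k: float = 2.0,
) -> List[Tuple[str, int, int]]:
    """Simulate guided dynamic scheduling over several devices.

    Returns (device, offset, size) packages in dispatch order.
    `speed` holds assumed relative device speeds; `k` damps the package
    size so that late packages are small and devices finish close together.
    """
    if min_chunk < 1 or k <= 0:
        raise ValueError("min_chunk must be >= 1 and k must be > 0")
    if not speed or any(s <= 0 for s in speed.values()):
        raise ValueError("every device needs a positive speed")

    total_speed = sum(speed.values())
    free_at = {dev: 0.0 for dev in speed}  # simulated time each device goes idle
    offset = 0
    packages: List[Tuple[str, int, int]] = []

    while offset < total_items:
        dev = min(free_at, key=free_at.get)  # next idle device
        remaining = total_items - offset
        # Package size: this device's share of the remaining work, damped by k.
        size = int(remaining * speed[dev] / (k * total_speed))
        size = min(remaining, max(min_chunk, size))
        packages.append((dev, offset, size))
        free_at[dev] += size / speed[dev]  # simulated execution time
        offset += size
    return packages


if __name__ == "__main__":
    for dev, off, size in guided_schedule(10_000, {"cpu": 1.0, "gpu": 4.0}):
        print(f"{dev}: items [{off}, {off + size})")
```

    In a real runtime each package would map to a kernel launch over a sub-range of the index space; here the execution time is only simulated from the assumed speeds.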

    Impact of an intermittent and localized cooling intervention on skin temperature, sleep quality and energy expenditure in free-living, young, healthy adults

    Where people live and work together it is not always possible to modify the ambient temperature; ways must therefore be found that allow individuals to feel thermally comfortable in such settings. The Embr Wave (R) is a wrist-worn device marketed as a 'personal thermostat' that can apply a local cooling stimulus to the skin. The aim of the present study was to determine the effect of an intermittent mild cold stimulus of 25 °C for 15-20 s every 5 min over 3.5 days under free-living conditions on 1) skin temperature, 2) perception of skin temperature, 3) sleep quality and 4) resting energy expenditure (REE) in young, healthy adults. Ten subjects wore the device for 3.5 consecutive days. This intervention reduced distal skin temperature after correcting for personal ambient temperature (P = 0.051). Thus, this intermittent mild cold regime can reduce distal skin temperature, and wearing it under free-living conditions for 3.5 days does not seem to impair the perception of skin temperature and sleep quality or modify REE. The study was funded by the Spanish Ministry of Economy and Competitiveness via the Fondo de Investigacion Sanitaria del Instituto de Salud Carlos III (PI13/01393 and CB16/10/00239) and PTA 12264-I, Retos de la Sociedad (DEP2016-79512-R), and European Regional Development Funds (ERDF). Other funders included the Spanish Ministry of Education (FPU 16/05159, 15/04059 and 19/02326), the Fundacion Iberoamericana de Nutricion (FINUT), the Redes Tematicas De Investigacion Cooperativa RETIC (Red SAMID RD16/0022), the AstraZeneca Health Care Foundation, the University of Granada Plan Propio de Investigacion 2016 (Excellence actions: Unit of Excellence on Exercise, Nutrition and Health [UCEENS]), and by the Junta de Andalucia, Consejeria de Conocimiento, Investigacion y Universidades (ERDF, SOMM17/6107/UGR).
AMT was supported by Seneca Foundation through grant 19899/GERM/15 and the Ministry of Science Innovation and Universities RTI2018-093528-B-I0, as well as DJP (MINECO; RYC-2014-16938). BMT was supported by an individual postdoctoral grant from the Fundacion Alfonso Martin Escudero. We thank Dr. Matt Smith of Embr Labs Inc. for configuring the Embr Wave (R) devices used in this experiment

    Vitruvius+: An area-efficient RISC-V decoupled vector coprocessor for high performance computing applications

    The maturity level of RISC-V and the availability of domain-specific instruction set extensions, like vector processing, make RISC-V a good candidate for supporting the integration of specialized hardware in processor cores for the High Performance Computing (HPC) application domain. In this article, we present Vitruvius+, the vector processing acceleration engine that represents the core of vector instruction execution in the HPC challenge that comes within the EuroHPC initiative. It implements the RISC-V vector extension (RVV) 0.7.1 and can be easily connected to a scalar core using the Open Vector Interface standard. Vitruvius+ natively supports long vectors: 256 double precision floating-point elements in a single vector register. It is composed of a set of identical vector pipelines (lanes), each containing a slice of the Vector Register File and functional units (one integer, one floating point). The vector instruction execution scheme is hybrid in-order/out-of-order and is supported by register renaming and arithmetic/memory instruction decoupling. On a stand-alone synthesis, Vitruvius+ reaches a maximum frequency of 1.4 GHz in typical conditions (TT/0.80V/25°C) using GlobalFoundries 22FDX FD-SOI. The silicon implementation has a total area of 1.3 mm² and maximum estimated power of ~920 mW for one instance of Vitruvius+ equipped with eight vector lanes. This research has received funding from the European High Performance Computing Joint Undertaking (JU) under Framework Partnership Agreement No 800928 (European Processor Initiative) and Specific Grant Agreement No 101036168 (EPI SGA2). The JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Croatia, France, Germany, Greece, Italy, Netherlands, Portugal, Spain, Sweden, and Switzerland. The EPI-SGA2 project, PCI2022-132935 is also co-funded by MCIN/AEI/10.13039/501100011033 and by the UE NextGenerationEU/PRTR.
This work has also been partially supported by the Spanish Ministry of Science and Innovation (PID2019-107255GB-C21/AEI/10.13039/501100011033). Peer Reviewed. Postprint (author's final draft)
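    The figures quoted in the Vitruvius+ abstract (256 double-precision elements per vector register, eight lanes, 1.4 GHz) allow some back-of-the-envelope arithmetic, sketched below. The even striping of elements across lanes and the one-floating-point-operation-per-lane-per-cycle rate are assumptions made here, not statements from the abstract, so the peak figure is only a rough upper bound under those assumptions.

```python
ELEMENTS_PER_REGISTER = 256  # from the abstract
BITS_PER_ELEMENT = 64        # IEEE 754 double precision
LANES = 8                    # from the abstract (eight-lane instance)
FREQ_GHZ = 1.4               # from the abstract (stand-alone synthesis)


def register_bytes() -> int:
    """Size of one vector register in bytes."""
    return ELEMENTS_PER_REGISTER * BITS_PER_ELEMENT // 8


def elements_per_lane() -> int:
    """Elements of one register held by each lane, ASSUMING even striping."""
    return ELEMENTS_PER_REGISTER // LANES


def peak_gflops(flops_per_lane_per_cycle: int = 1) -> float:
    """Rough peak throughput, ASSUMING each lane's single floating-point
    unit retires `flops_per_lane_per_cycle` operations every cycle."""
    return LANES * FREQ_GHZ * flops_per_lane_per_cycle


if __name__ == "__main__":
    print(register_bytes())     # 2048 bytes per vector register
    print(elements_per_lane())  # 32 elements per lane
    print(peak_gflops())        # about 11.2 GFLOP/s under the stated assumption
```

    Passing `flops_per_lane_per_cycle=2` would model a fused multiply-add counted as two operations, which is again an assumption rather than a reported figure.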

    DVINO: A RISC-V vector processor implemented in 65nm technology

    This paper describes the design, verification, implementation and fabrication of the Drac Vector IN-Order (DVINO) processor, a RISC-V vector processor capable of booting Linux jointly developed by BSC, CIC-IPN, IMB-CNM (CSIC), and UPC. The DVINO processor includes an internally developed two-lane vector processor unit as well as a Phase Locked Loop (PLL) and an Analog-to-Digital Converter (ADC). The paper summarizes the design from architectural as well as logic synthesis and physical design in CMOS 65nm technology. The DRAC project is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of total eligible cost. The authors are part of RedRISCV which promotes activities around open hardware. The Lagarto Project is supported by the Research and Graduate Secretary (SIP) of the Instituto Politecnico Nacional (IPN) from Mexico, and by the CONACyT scholarship for Center for Research in Computing (CIC-IPN). Peer Reviewed. Article signed by 43 authors: Guillem Cabo∗, Gerard Candón∗, Xavier Carril∗, Max Doblas∗, Marc Domínguez∗, Alberto González∗, Cesar Hernández†, Víctor Jiménez∗, Vatistas Kostalampros∗, Rubén Langarita∗, Neiel Leyva†, Guillem López-Paradís∗, Jonnatan Mendoza∗, Francesco Minervini∗, Julian Pavón∗, Cristobal Ramírez∗, Narcís Rodas∗, Enrico Reggiani∗, Mario Rodríguez∗, Carlos Rojas∗, Abraham Ruiz∗, Víctor Soria∗, Alejandro Suanes‡, Iván Vargas∗, Roger Figueras∗, Pau Fontova∗, Joan Marimon∗, Víctor Montabes∗, Adrián Cristal∗, Carles Hernández∗, Ricardo Martínez‡, Miquel Moretó∗§, Francesc Moll∗§, Oscar Palomar∗§, Marco A. Ramírez†, Antonio Rubio§, Jordi Sacristán‡, Francesc Serra-Graells‡, Nehir Sonmez∗, Lluís Terés‡, Osman Unsal∗, Mateo Valero∗§, Luís Villa† // ∗Barcelona Supercomputing Center (BSC), Barcelona, Spain;
†Centro de Investigación en Computación, Instituto Politécnico Nacional (CIC-IPN), Mexico City, Mexico; ‡Institut de Microelectronica de Barcelona, IMB-CNM (CSIC), Spain; §Universitat Politecnica de Catalunya (UPC), Barcelona, Spain. (author's final draft)

    RICORS2040 : The need for collaborative research in chronic kidney disease

    Chronic kidney disease (CKD) is a silent and poorly known killer. The current concept of CKD is relatively young and uptake by the public, physicians and health authorities is not widespread. Physicians still confuse CKD with chronic kidney insufficiency or failure. For the wider public and health authorities, CKD evokes kidney replacement therapy (KRT). In Spain, the prevalence of KRT is 0.13%. Thus health authorities may consider CKD a non-issue: very few persons eventually need KRT and, for those in whom kidneys fail, the problem is 'solved' by dialysis or kidney transplantation. However, KRT is the tip of the iceberg in the burden of CKD. The main burden of CKD is accelerated ageing and premature death. The cut-off points for kidney function and kidney damage indexes that define CKD also mark an increased risk for all-cause premature death. CKD is the most prevalent risk factor for lethal coronavirus disease 2019 (COVID-19) and the factor that most increases the risk of death in COVID-19, after old age. Men and women undergoing KRT still have an annual mortality that is 10- to 100-fold higher than similar-age peers, and life expectancy is shortened by ~40 years for young persons on dialysis and by 15 years for young persons with a functioning kidney graft. CKD is expected to become the fifth greatest global cause of death by 2040 and the second greatest cause of death in Spain before the end of the century, a time when one in four Spaniards will have CKD. However, by 2022, CKD will become the only top-15 global predicted cause of death that is not supported by a dedicated well-funded Centres for Biomedical Research (CIBER) network structure in Spain. Realizing the underestimation of the CKD burden of disease by health authorities, the Decade of the Kidney initiative for 2020-2030 was launched by the American Association of Kidney Patients and the European Kidney Health Alliance. 
Leading Spanish kidney researchers grouped in the kidney collaborative research network Red de Investigación Renal have now applied for the Redes de Investigación Cooperativa Orientadas a Resultados en Salud (RICORS) call for collaborative research in Spain with the support of the Spanish Society of Nephrology, Federación Nacional de Asociaciones para la Lucha Contra las Enfermedades del Riñón and ONT: RICORS2040 aims to prevent the dire predictions for the global 2040 burden of CKD from becoming true

    Efficient co-execution support in heterogeneous systems.

    ABSTRACT: Heterogeneous architectures offer outstanding capabilities in terms of both performance and energy efficiency. However, most current systems and programming models regard heterogeneous resources as independent entities, leaving their management in the hands of the programmer. This favours task parallelism and a host-device approach to programming. As a consequence, the programmer is effectively left alone regarding co-execution, in which all the devices collaborate on the computation of a single workload in a data-parallel manner.
Achieving it requires a careful, manual division of the workload, which forces the programmer to make complex decisions such as those related to data distribution and load balancing. Nevertheless, for heterogeneous co-execution to be useful, it has to be effortless, requiring no more work than using a single device. This dissertation proposes software and hardware techniques to enable effortless, performant co-execution, including two novel load balancing algorithms, a new abstraction library, an implementation of co-execution in a task-based programming model and a new dispatcher design for hardware-supported co-execution

    Dynamic load balancing on Multi-CPU and Multi-GPU systems

    ABSTRACT: Due to the emergence of GPUs as general-purpose devices with great parallel computing capabilities, heterogeneous systems, which use GPUs and CPUs, have gained special prominence in the field of High Performance Computing. As a result, programming models that make it possible to work with heterogeneous devices have been developed. However, the support these models offer for many very different devices has an important drawback: device management has to be done independently for each device, with all the difficulties that this carries. Obtaining good performance while making the most of all the available resources is not a trivial task either, because it implies distributing the workload according to the computational power of each device and managing memory correctly, since the memory of each device is usually separate. The purpose of this project is to develop a programming model that distributes the workload among all the resources available in a system, transparently to the programmer and making the most of the available computational power. To accomplish this, the model implements 4 load balancing techniques, so that workload distribution can be adapted to the needs of each application and resources are used adequately. The notion of a "single system", which lets the programmer communicate with the whole system instead of with a myriad of isolated devices, is also introduced. This idea makes source code portable to systems with different hardware and also reduces the number of lines of code needed to run an application, making programming easier. Ingeniería en Informátic