76 research outputs found
Mashing load balancing algorithm to boost hybrid kernels in molecular dynamics simulations
The path to the efficient exploitation of molecular dynamics simulators is strongly driven by the increasingly intensive use of accelerators. However, accelerators suffer from performance portability issues, making it necessary both to achieve technological combinations that take advantage of each programming model and device, and to define more effective load distribution strategies that consider the simulation conditions. In this work, a new load balancing algorithm is presented, together with a set of optimizations to support hybrid co-execution in a runtime system for heterogeneous computing. The new extended design enables the joint exploitation of custom kernels and acceleration technologies, encapsulated from the rest of the runtime and its scheduling system. With this support, the Mash algorithm can simultaneously leverage different workload distribution strategies, benefiting from the most advantageous one per device and technology. Experiments show that these proposals achieve an efficiency close to 0.90 and an energy efficiency improvement of around 1.80x over the original optimized version. This work has been supported by the Spanish Ministry of Education (FPU16/03299 grant), the Spanish Science and Technology Commission under contract PID2019-105660RB-C22, and performed under the Project HPC-EUROPA3 (INFRAIA-2016-1-730897), with the support of the EC Research Innovation Action (H2020). The author gratefully acknowledges the support of the SPMT group, part of HLRS
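The per-device strategy selection described above can be sketched as follows. This is a minimal, hypothetical illustration of the idea of "benefiting from the most advantageous strategy per device"; the abstract does not describe Mash's internals, and all names and numbers here are invented for the example.

```python
# Hypothetical sketch: each device keeps whichever workload-distribution
# strategy yielded the highest measured throughput for it. This is NOT
# Mash's actual implementation, only an illustration of the selection idea.

def pick_strategies(measured):
    """measured: {device: {strategy: throughput}} -> {device: best strategy}."""
    return {dev: max(perf, key=perf.get) for dev, perf in measured.items()}

# Illustrative measurements (made-up device and strategy names).
measured = {
    "gpu0": {"static": 120.0, "adaptive": 150.0},
    "cpu":  {"static": 40.0,  "adaptive": 35.0},
}
best = pick_strategies(measured)
# best == {"gpu0": "adaptive", "cpu": "static"}
```

The point of the sketch is that the choice is made independently per device, so a strategy that suits the GPU need not be imposed on the CPU.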
Improving utilization of heterogeneous clusters
Datacenters often comprise sets of nodes with different capabilities, leading to sub-optimal resource utilization. One of the best ways of improving utilization is to balance the load taking into account the heterogeneity of these clusters. This article presents a novel way of expressing computational capacity, more adequate for heterogeneous clusters, and also advocates task migration in order to further improve utilization. The experimental evaluation shows that both proposals are advantageous, improving the utilization of heterogeneous clusters and reducing the makespan by 16.7% and 17.1%, respectively. This work has been supported by the Spanish Science and Technology Commission under contracts TIN2016-76635-C2-2-R and TIN2016-81840-REDT (CAPAP-H6 network) and the European HiPEAC Network of Excellence
Simulation with skeletons of applications using Dimemas
Large computer systems, like those in the TOP500 ranking, comprise hundreds of thousands of cores. Simulating application execution on these systems is very complex and costly. This article explores the option of using application skeletons, together with an analytic simulator, to study the performance of these large systems. With this aim, the Dimemas simulator has been enhanced with the capability of simulating application skeletons. This enhancement allows simulating the skeleton of Lulesh, an application with 90k processes, in a single day. In addition, it also generates traces, which is of great value to validate skeletons and simulations. This work has been partially supported by the Spanish Ministry of Science, Innovation and Universities under contract TIN2016-76635-C2-2-R (AEI/FEDER, UE); the Mont-Blanc project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 671697; and a Juan de la Cierva-Formación contract (FJCI-2017-31643) from the Ministerio de Ciencia, Innovación y Universidades of Spain
Sigmoid: An auto-tuned load balancing algorithm for heterogeneous systems
A challenge that heterogeneous system programmers face is leveraging the performance of all the devices that integrate the system. This paper presents Sigmoid, a new load balancing algorithm that efficiently co-executes a single OpenCL data-parallel kernel on all the devices of heterogeneous systems. Sigmoid splits the workload proportionally to the capabilities of the devices, drastically reducing response time and energy consumption. It is designed around several features: it is dynamic, adaptive, guided and effortless, as it does not require the user to give any parameter, adapting to the behaviour of each kernel at runtime. To evaluate Sigmoid's performance, it has been implemented in Maat, a system abstraction library. Experimental results with different kernel types show that Sigmoid exhibits excellent performance, reaching a utilization of 90%, together with energy savings up to 20%, always reducing programming effort compared to OpenCL, and facilitating the portability to other heterogeneous machines. This work has been supported by the Spanish Science and Technology Commission under contract PID2019-105660RB-C22 and the European HiPEAC Network of Excellence
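The core idea of splitting a data-parallel workload proportionally to device capabilities can be sketched as follows. This is a generic illustration under assumed throughput figures, not Sigmoid's actual algorithm; the `chunk` granularity mimics the fact that OpenCL work must be partitioned in multiples of the work-group size.

```python
# Generic proportional-split sketch (not Sigmoid itself): divide work-items
# among devices in proportion to their throughput, rounding each share to a
# multiple of `chunk` and giving the remainder to the last device.

def proportional_split(total_items, throughputs, chunk=64):
    """throughputs: {device: items/second}. Returns {device: item count}."""
    total_tp = sum(throughputs.values())
    devices = list(throughputs)
    shares, assigned = {}, 0
    for dev in devices[:-1]:
        n = round(total_items * throughputs[dev] / total_tp / chunk) * chunk
        shares[dev] = n
        assigned += n
    shares[devices[-1]] = total_items - assigned  # remainder to last device
    return shares

# With a GPU measured 3x faster than the CPU (illustrative numbers):
shares = proportional_split(4096, {"gpu": 300.0, "cpu": 100.0})
# shares == {"gpu": 3072, "cpu": 1024}
```

Giving the remainder to the last device keeps the partition exact even when rounding to the chunk granularity would otherwise lose or duplicate work-items.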
Parallelisation of decision-making techniques in aquaculture enterprises
Nowadays, Artificial Intelligence (AI) techniques are applied in enterprise software to solve Big Data and Business Intelligence (BI) problems. However, most AI techniques are computationally expensive, and they become unfeasible for common business use. Therefore, specific high performance computing is needed to reduce the response time and make these software applications viable in an industrial environment. The main objective of this paper is to demonstrate the improvement of an aquaculture BI tool based on AI techniques, using parallel programming. This tool, called AquiAID, was created by the research group of Economic Management for the Sustainable Development of the Primary Sector of the Universidad de Cantabria. The parallelisation reduces the computation time by up to 60 times, and improves the energy efficiency by 600 times with respect to the sequential program. With these improvements, the software will improve fish farming management in the aquaculture industry. This work has been supported by the Spanish Science and Technology Commission under contracts PID2019-105660RB-C22 and TED2021-131176B-I00, and grant FPU21/03110
Characterizing the Communication Demands of the Graph500 Benchmark on a Commodity Cluster
Big Data applications have gained importance over the last few years. Such applications focus on the analysis of huge amounts of unstructured information and present a series of differences with respect to traditional High Performance Computing (HPC) applications. To illustrate these dissimilarities, this paper analyzes the behavior of the most scalable version of the Graph500 benchmark when run on a state-of-the-art commodity cluster facility. Our work shows that this new computation paradigm stresses the interconnection subsystem. We provide both analytical and empirical characterizations of the Graph500 benchmark, showing that its communication needs bound the performance achieved on a cluster facility. To the best of our knowledge, our evaluation is the first to consider the impact of message aggregation on the communication overhead and explore a tradeoff that diminishes benchmark execution time, increasing system performance
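The message-aggregation tradeoff mentioned above can be illustrated with a small sketch: coalescing many fine-grained messages to the same destination into batches reduces the message count (and per-message overhead) at the cost of some added latency. This is a generic illustration, not the benchmark's actual communication layer; the `Aggregator` class and its parameters are invented for the example.

```python
from collections import defaultdict

class Aggregator:
    """Illustrative sketch: buffer small per-destination messages and send
    them in batches of `batch` items, trading latency for fewer messages."""

    def __init__(self, batch, send):
        self.batch = batch
        self.send = send          # callback: send(dest, list_of_items)
        self.buf = defaultdict(list)

    def put(self, dest, item):
        self.buf[dest].append(item)
        if len(self.buf[dest]) >= self.batch:
            self.flush(dest)

    def flush(self, dest):
        if self.buf[dest]:
            self.send(dest, self.buf[dest])
            self.buf[dest] = []

# 10 fine-grained items to one destination, batched 4 at a time:
sent = []
agg = Aggregator(4, lambda d, items: sent.append((d, list(items))))
for v in range(10):
    agg.put(0, v)
agg.flush(0)  # drain the partial final batch
# 10 items delivered in 3 messages instead of 10
```

The batch size is exactly the tradeoff knob: larger batches cut per-message overhead further but hold data in the buffer longer before it is sent.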
The Role of Aesthetics in the Teaching and Learning of Mathematics
We present an exploration of the role of aesthetics in the teaching and learning of mathematics. Firstly, we review the state of the art. Subsequently, through a questionnaire, we find that, among the aesthetic criteria that appear in the literature, the one related to simplicity seems to be shared by secondary school students with middle and low academic qualifications. Finally, through a case study of a specific mathematical content, in this case tessellations of the plane, we observe the emergence of aesthetic considerations and experiences in secondary school students
A clustering-based knowledge discovery process for data centre infrastructure management
Data centre infrastructure management (DCIM) is the integration of information technology and facility management disciplines to centralise monitoring and management in data centres. One of the most important problems of DCIM tools is the analysis of the huge amount of data obtained from the real-time monitoring of thousands of resources. In this paper, an adaptation of the knowledge discovery process for dealing with data analysis in DCIM tools is proposed. A case study based on the real-time monitoring and labelling of nodes of a high performance computing data centre is presented. It shows that characterising the state of the nodes according to a reduced and relevant set of metrics is feasible and that its outcome is directly usable, consequently simplifying the decision-making process in these complex infrastructures. © 2016, Springer Science+Business Media New York. This work has been supported by the Spanish Science and Technology Commission (CICYT) under contract TIN2013-46957-C2-2-P, the CAPAP-H5 network (TIN2014-53522), and the European HiPEAC Network of Excellence
Auto-tuned OpenCL kernel co-execution in OmpSs for heterogeneous systems
The emergence of heterogeneous systems has been very notable recently. The nodes of the most powerful computers integrate several compute accelerators, such as GPUs. Profiting from such node configurations is not a trivial endeavour. OmpSs is a framework for task-based parallel applications that allows the execution of OpenCL kernels on different compute devices. However, it does not support the co-execution of a single kernel on several devices. This paper presents an extension of OmpSs that rises to this challenge, and presents Auto-Tune, a load balancing algorithm that automatically adjusts its internal parameters to suit the hardware capabilities and application behavior. The extension allows programmers to take full advantage of the computing devices with negligible impact on the code. It takes care of two main issues: first, the automatic distribution of datasets and the management of device memory address spaces; second, the implementation of a set of load balancing algorithms to adapt to the particularities of applications and systems. Experimental results reveal that the co-execution of single kernels on all the devices in the node is beneficial in terms of performance and energy consumption, and that Auto-Tune gives the best overall results. This work has been supported by the University of Cantabria with grant CVE-2014-18166, the Generalitat de Catalunya under grant 2014-SGR-1051, the Spanish Ministry of Economy, Industry and Competitiveness under contracts TIN2016-76635-C2-2-R (AEI/FEDER, UE) and TIN2015-65316-P, and the Spanish Government through the Programa Severo Ochoa (SEV-2015-0493)
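The kind of self-adjustment described above, where a balancer shifts work toward devices observed to finish faster, can be sketched generically. This is not Auto-Tune's actual algorithm, which the abstract does not detail; the update rule, the learning-rate parameter `lr`, and the numbers below are all invented for the example.

```python
# Generic adaptive-rebalancing sketch (not Auto-Tune itself): after each
# kernel execution, move each device's share toward a target proportional
# to its observed speed, damped by a learning rate `lr`.

def update_shares(shares, times, lr=0.5):
    """shares: {device: work-items}; times: {device: seconds observed}."""
    speeds = {d: shares[d] / times[d] for d in shares}  # items per second
    total_items = sum(shares.values())
    total_speed = sum(speeds.values())
    new = {}
    for d in shares:
        target = total_items * speeds[d] / total_speed
        new[d] = shares[d] + lr * (target - shares[d])
    return new

# Start from an even split; the GPU finishes 4x faster (made-up timings):
shares = {"gpu": 500, "cpu": 500}
times = {"gpu": 1.0, "cpu": 4.0}
shares = update_shares(shares, times)
# gpu's share moves from 500 toward its target of 800; cpu's toward 200
```

Damping the correction with `lr` avoids oscillation when execution times are noisy, at the cost of converging over several iterations rather than one.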