A study of the effect of process malleability on the energy efficiency of GPU-based clusters
The adoption of graphics processing units (GPUs) in high-performance computing (HPC) infrastructures largely determines the energy consumption of those facilities. For this reason, efficient management and administration of GPU-enabled clusters is crucial for their optimal operation. The main aim of this work is to study and design efficient job-scheduling mechanisms for GPU-enabled clusters that leverage process malleability techniques to reconfigure running jobs depending on the cluster status. This paper presents a model that improves energy efficiency when processing a batch of jobs in an HPC cluster. The model is validated with the MPDATA algorithm, a representative example of the stencil computations used in numerical weather prediction. The proposed solution applies the efficiency metrics obtained in a new reconfiguration policy aimed at job arrays. This solution reduces workload processing time by up to 4.8 times and cluster energy consumption by up to 2.4 times compared with traditional job management, where jobs are not reconfigured during their execution.
Modeling power consumption of 3D MPDATA and the CG method on ARM and Intel multicore architectures
We propose an approach to estimate the power consumption of algorithms, as a function of frequency and number of cores, using only a very reduced set of real power measurements. In addition, we formulate a method to select the voltage-frequency scaling and concurrency throttling configurations that should be tested in order to obtain accurate estimations of the power dissipation. The power models and selection methodology are verified using two real scientific applications: the stencil-based 3D MPDATA algorithm and the conjugate gradient (CG) method for sparse linear systems. MPDATA is a crucial component of the EULAG model, which is widely used in weather forecast simulations. The CG algorithm is the keystone of the iterative solution of sparse symmetric positive definite linear systems via Krylov subspace methods. The reliability of the method is confirmed on a variety of ARM and Intel architectures, where the estimated results correspond to the real measured values with an average error slightly below 5% in all cases.
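A model of this kind can be sketched as a least-squares fit over a handful of measurements. The model form P(f, c) = p_static + a·c + b·c·f³ and the sample readings below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

# Hypothetical power model: P(f, c) = p_static + a*c + b*c*f**3
# (static power, per-core leakage, and a dynamic switching term).
def design_matrix(f, c):
    f, c = np.asarray(f, float), np.asarray(c, float)
    return np.column_stack([np.ones_like(f), c, c * f**3])

# A reduced set of "measured" samples (frequency in GHz, active cores, watts).
# These readings are synthetic, generated from p_static=10, a=0.2, b=0.7.
f_meas = np.array([1.0, 1.0, 2.0, 2.0, 1.5])
c_meas = np.array([1, 4, 1, 4, 2])
p_meas = np.array([10.9, 13.6, 15.8, 33.2, 15.125])

# Fit the three model coefficients from the five samples.
coef, *_ = np.linalg.lstsq(design_matrix(f_meas, c_meas), p_meas, rcond=None)

def predict_power(f, c):
    """Estimate power at an untested frequency/core configuration."""
    return float(design_matrix([f], [c]) @ coef)

print(predict_power(1.8, 3))  # ≈ 22.85 under the assumed model
```

The selection methodology in the paper addresses which (f, c) points to measure so that such a fit extrapolates accurately; the five points above are an arbitrary choice.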
Stencil codes on a vector length agnostic architecture
Data-level parallelism is frequently ignored or underutilized. Achieved through vector/SIMD capabilities, it can provide substantial performance improvements on top of widely used techniques such as thread-level parallelism. However, manual vectorization is a tedious and costly process that needs to be repeated for each specific instruction set or register size. In addition, automatic compiler vectorization is susceptible to code complexity, and usually limited due to data and control dependencies. To address some of these issues, Arm recently released a new vector ISA, the Scalable Vector Extension (SVE), which is Vector-Length Agnostic (VLA). VLA enables the generation of binary files that run regardless of the physical vector register length.
In this paper we leverage the main characteristics of SVE to implement and optimize stencil computations, which are ubiquitous in scientific computing. We show that SVE enables easy deployment of textbook optimizations like loop unrolling, loop fusion, load trading, or data reuse. Our detailed simulations using vector lengths ranging from 128 to 2,048 bits show that these optimizations can lead to performance improvements over straightforward vectorized code of up to 56.6% for 2,048-bit vectors. In addition, we show that certain optimizations can hurt performance due to a reduction in arithmetic intensity, and provide insight useful for compiler optimizers.
This work has been partially supported by the European HiPEAC Network of Excellence, by the Spanish Ministry of Economy and Competitiveness (contract TIN2015-65316-P), and by the Generalitat de Catalunya (contracts 2017-SGR-1328 and 2017-SGR-1414). The Mont-Blanc project receives funding from the EU's H2020 Framework Programme (H2020/2014-2020) under grant agreements no. 671697 and no. 779877. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramon y Cajal fellowship number RYC-2016-21104. Finally, A. Armejach has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Juan de la Cierva postdoctoral fellowship number FJCI-2015-24753.
Peer reviewed. Postprint (author's final draft).
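The loop-fusion optimization mentioned in the abstract can be illustrated independently of SVE. The pure-Python sketch below (the 3-point stencil and grid are illustrative, not the paper's kernels) shows two consecutive sweeps computed first as separate passes over memory, then fused into a single pass that keeps the intermediate row in a small rolling window:

```python
# Two consecutive 3-point average sweeps over a 1-D grid: separate passes
# (two full traversals of memory) versus a fused single pass.

def sweep(u):
    # One Jacobi-style 3-point stencil pass; boundary values held fixed.
    return [u[0]] + [(u[i-1] + u[i] + u[i+1]) / 3
                     for i in range(1, len(u) - 1)] + [u[-1]]

def two_sweeps_separate(u):
    return sweep(sweep(u))

def two_sweeps_fused(u):
    # Fusion keeps the intermediate values in three scalars (a rolling
    # window) instead of a full temporary array, trading loads for reuse.
    n = len(u)                              # assumes n >= 4
    mid_prev = u[0]                         # intermediate value at i-1
    mid_curr = (u[0] + u[1] + u[2]) / 3     # intermediate value at i
    out = [u[0]]
    for i in range(1, n - 1):
        # Intermediate value at i+1 (boundary value on the last iteration).
        mid_next = u[i+1] if i + 1 == n - 1 else (u[i] + u[i+1] + u[i+2]) / 3
        out.append((mid_prev + mid_curr + mid_next) / 3)
        mid_prev, mid_curr = mid_curr, mid_next
    out.append(u[-1])
    return out
```

Both variants compute identical results; the fused version performs one traversal instead of two, which is the memory-traffic saving the optimization targets.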
IMPROVING THE PERFORMANCE AND TIME-PREDICTABILITY OF GPUs
Graphics Processing Units (GPUs) were originally designed mainly to accelerate graphics applications. Now the capability of GPUs to accelerate applications that can be parallelized into a massive number of threads makes them the ideal accelerator for boosting the performance of such general-purpose applications. It is also very promising to apply GPUs to embedded and real-time applications, where high throughput and intensive computation are needed.
However, due to the different architecture and programming model of GPUs, how to fully utilize the advanced architectural features of GPUs to boost the performance and how to analyze the worst-case execution time (WCET) of GPU applications are the problems that need to be addressed before exploiting GPUs further in embedded and real-time applications. We propose to apply both architectural modification and static analysis methods to address these problems. First, we propose to study the GPU cache behavior and use bypassing to reduce unnecessary memory traffic and to improve the performance. The results show that the proposed bypassing method can reduce the global memory traffic by about 22% and improve the performance by about 13% on average. Second, we propose a cache access reordering framework based on both architectural extension and static analysis to improve the predictability of GPU L1 data caches. The evaluation results show that the proposed method can provide good predictability in GPU L1 data caches, while allowing the dynamic warp scheduling for good performance. Third, based on the analysis of the architecture and dynamic behavior of GPUs, we propose a WCET timing model based on a predictable warp scheduling policy to enable the WCET estimation on GPUs. The experimental results show that the proposed WCET analyzer can effectively provide WCET estimations for both soft and hard real-time application purposes. Last, we propose to analyze the shared Last Level Cache (LLC) in integrated CPU-GPU architectures and to integrate the analysis of the shared LLC into the WCET analysis of the GPU kernels in such systems. The results show that the proposed shared data LLC analysis method can improve the accuracy of the shared LLC miss rate estimations, which can further improve the WCET estimations of the GPU kernels
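The cache-bypassing idea can be illustrated with a toy model: when streaming (never-reused) accesses are allowed to allocate cache lines, they evict the reused working set; bypassing them preserves it. Everything below (cache size, access pattern, policy) is an illustrative assumption, not the thesis's simulator or its 22%/13% results:

```python
# Toy direct-mapped cache illustrating why bypassing streaming accesses
# can cut misses for a small, reused working set.

class DirectMappedCache:
    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines
        self.misses = 0

    def access(self, addr, bypass=False):
        idx, tag = addr % self.num_lines, addr // self.num_lines
        if self.tags[idx] == tag:
            return              # hit
        self.misses += 1
        if not bypass:          # a bypassed miss does not evict the resident line
            self.tags[idx] = tag

def run(bypass_stream):
    cache = DirectMappedCache(4)
    hot = [0, 1, 2, 3]                      # small reused working set
    for round_ in range(10):
        for a in hot:                       # reused data: want these cached
            cache.access(a)
        stream = 100 + round_ * 4           # streaming data, touched once
        for a in range(stream, stream + 4):
            cache.access(a, bypass=bypass_stream)
    return cache.misses

print(run(False), run(True))  # → 80 44
```

Without bypassing, every round re-misses the hot set because the stream evicted it; with bypassing, the hot set misses only once. Identifying which accesses to bypass (here hard-coded) is the actual problem the thesis's method addresses.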
Proceedings of the Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016), Sofia, Bulgaria, October 6-7, 2016
Performance and Energy Optimization of the Iterative Solution of Sparse Linear Systems on Multicore Processors
In this dissertation we target the solution of large sparse systems of linear equations using preconditioned iterative methods based on Krylov subspaces. Specifically, we focus on ILUPACK, a library that offers multilevel ILU preconditioners for the effective solution of sparse linear systems. The increase in the number of equations and the introduction of new HPC architectures motivate us to develop a parallel version of ILUPACK that optimizes both execution time and energy consumption on current multicore architectures and on clusters of nodes built from this type of technology. Thus, the main goal of this thesis is the design, implementation and evaluation of parallel, energy-efficient iterative sparse linear system solvers for multicore processors as well as recent manycore accelerators such as the Intel Xeon Phi. To fulfill this objective, we optimize ILUPACK by exploiting task parallelism via OmpSs and MPI, and also develop an automatic framework to detect energy inefficiencies.
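For reference, the Krylov iteration underlying this work can be sketched as a minimal unpreconditioned conjugate gradient in NumPy; ILUPACK layers multilevel ILU preconditioning and parallelism on top of this basic loop (the test matrix below is an illustrative 1-D Poisson system, not one of the thesis's benchmarks):

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite system Ax = b."""
    x = np.zeros_like(b)
    r = b - A @ x               # residual
    p = r.copy()                # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)   # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p   # new A-conjugate direction
        rs = rs_new
    return x

# SPD test system: tridiagonal 1-D Poisson matrix.
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))
```

A preconditioned variant replaces `p = r + ...` with a preconditioner solve `z = M_inv(r)`; the quality of that solve (here, ILUPACK's multilevel ILU) is what determines iteration counts, and hence time and energy, on real problems.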
Accelerating Materials Discovery with Machine Learning
As we enter the data age, ever-increasing amounts of human knowledge are being recorded in machine-readable formats.
This has opened up new opportunities to leverage data to accelerate scientific discovery.
This thesis focuses on how we can use historical and computational data to aid the discovery and development of new materials.
We begin by looking at a traditional materials informatics task -- elucidating the structure-function relationships of high-temperature cuprate superconductors.
One of the most significant challenges for materials informatics is the limited availability of relevant data.
We propose a simple calibration-based approach to estimate the apical and in-plane copper-oxygen distances from more readily available lattice parameter data to address this challenge for cuprate superconductors.
Our investigation uncovers a large, unexplored region of materials space that may yield cuprates with higher critical temperatures.
We propose two experimental avenues that may enable this region to be accessed.
Computational materials exploration is bottlenecked by our ability to provide input structures to feed our workflows.
Whilst ab initio structure identification is possible, it is computationally burdensome, and we lack design rules for deciding where to target searches in high-throughput setups.
To address this, there is a need to develop tools that suggest promising candidates, enabling automated deployment and increased efficiency.
Machine learning models are well suited to this task; however, current approaches typically use hand-engineered inputs.
This means that their performance is circumscribed by the intuitions reflected in the chosen inputs.
We propose a novel way to formulate the machine learning task as a set regression problem over the elements in a material.
We show that our approach leads to higher sample efficiency than other well-established composition-based approaches.
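The set-regression formulation can be sketched as an embed-pool-regress pipeline over the elements of a composition. The embedding size, element vocabulary, and random (untrained) weights below are placeholders, not the thesis's model:

```python
import numpy as np

# Sketch of composition-as-a-set regression: each element gets an embedding,
# a permutation-invariant pooling aggregates them weighted by stoichiometry,
# and a readout maps the pooled vector to a property estimate.

rng = np.random.default_rng(0)
ELEMENTS = ["H", "O", "Fe", "Cu", "La", "Sr"]   # toy vocabulary
DIM = 8
embed = {el: rng.normal(size=DIM) for el in ELEMENTS}  # stand-in embeddings
w_out = rng.normal(size=DIM)                           # stand-in trained head

def predict(composition):
    """composition: dict element -> stoichiometric amount, e.g. {'Fe': 2, 'O': 3}."""
    total = sum(composition.values())
    # Weighted sum over the set: invariant to element order and to
    # rescaling the formula unit (only fractions matter).
    pooled = sum((n / total) * embed[el] for el, n in composition.items())
    return float(w_out @ np.maximum(pooled, 0))  # ReLU then linear readout
```

Because pooling is a sum over set members, the prediction is invariant to how the formula is written, which is the structural property that distinguishes this formulation from fixed-length hand-engineered descriptors.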
Having demonstrated the ability of machine learning to aid in the selection of promising compound compositions, we next explore how useful machine learning might be for identifying fabrication routes.
Using a recently released data-mined data set of solid-state synthesis reactions, we design a two-stage model to predict the products of inorganic reactions.
We critically explore the performance of this model, showing that whilst the predictions fall short of the accuracy required to be chemically discriminative, the model provides valuable insights into understanding inorganic reactions.
Through careful investigation of the model's failure modes, we explore the challenges that remain in the construction of forward inorganic reaction prediction models and suggest some pathways to tackle the identified issues.
One of the principal ways that material scientists understand and categorise materials is in terms of their symmetries.
Crystal structure prototypes are assigned based on the presence of symmetrically equivalent sites known as Wyckoff positions.
We show that a powerful coarse-grained representation of materials structures can be constructed from the Wyckoff positions by discarding information about their coordinates within crystal structures.
One of the strengths of this representation is that it maintains the ability of structure-based methods to distinguish polymorphs whilst also allowing combinatorial enumeration akin to composition-based approaches.
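The combinatorial enumeration enabled by discarding coordinates can be sketched as follows; the Wyckoff-site multiplicities used here are illustrative, not those of any particular space group:

```python
from itertools import combinations_with_replacement

# Toy enumeration of Wyckoff-position assignments: given the multiplicities
# of a space group's Wyckoff sites, list which combinations of sites can
# host a target number of atoms (ignoring coordinates entirely).
wyckoff_mult = {"a": 1, "b": 1, "c": 2, "d": 4}   # illustrative multiplicities

def assignments(n_atoms, max_sites=4):
    found = []
    for k in range(1, max_sites + 1):
        for combo in combinations_with_replacement(sorted(wyckoff_mult), k):
            if sum(wyckoff_mult[w] for w in combo) == n_atoms:
                found.append(combo)
    return found

# 10 valid assignments for 4 atoms, including ('d',) and ('c', 'c').
print(assignments(4))
```

Each assignment is a candidate coarse-grained structure, which is what makes enumeration over this representation tractable in a way that full coordinate-level enumeration is not. (A real implementation would also respect site-occupation constraints, which this sketch omits.)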
We construct an end-to-end differentiable model that takes our proposed Wyckoff representation as input.
The performance of this approach is examined on a suite of materials discovery experiments showing that it leads to strong levels of enrichment in materials discovery tasks.
The research presented in this thesis highlights the promise of applying data-driven workflows and machine learning in materials discovery and development.
This thesis concludes by speculating about promising research directions for applying machine learning within materials discovery
Ocean Modelling in Support of Operational Ocean and Coastal Services
Operational oceanography is maturing rapidly. Its capabilities are being noticeably enhanced in response to a growing demand for regularly updated ocean information. Today, several core forecasting and monitoring services, such as the Copernicus Marine ones focused on global and regional scales, are well established. The sustained availability of oceanography products has favored the proliferation of specific downstream services devoted to coastal monitoring and forecasting. Ocean models are a key component of these operational oceanographic systems (especially in a context marked by the extensive application of dynamical downscaling approaches), and progress in ocean modeling is certainly a driver for the evolution of these services. The goal of this Special Issue is to publish research papers on ocean modeling that benefit model applications supporting existing operational oceanographic services. This Special Issue is addressed to an audience with interests in physical oceanography, and especially in its operational applications. There is a focus on the numerical modeling needed for better forecasts in marine environments, and on using seamless modeling approaches to simulate global-to-coastal processes.
101 geodynamic modelling: how to design, interpret, and communicate numerical studies of the solid Earth
Geodynamic modelling provides a powerful tool to investigate processes in the Earth's crust, mantle, and core that are not directly observable. However, numerical models are inherently subject to the assumptions and simplifications on which they are based. In order to use and review numerical modelling studies appropriately, one needs to be aware of the limitations of geodynamic modelling as well as its advantages. Here, we present a comprehensive yet concise overview of the geodynamic modelling process applied to the solid Earth from the choice of governing equations to numerical methods, model setup, model interpretation, and the eventual communication of the model results. We highlight best practices and discuss their implementations including code verification, model validation, internal consistency checks, and software and data management. Thus, with this perspective, we encourage high-quality modelling studies, fair external interpretation, and sensible use of published work. We provide ample examples, from lithosphere and mantle dynamics specifically, and point out synergies with related fields such as seismology, tectonophysics, geology, mineral physics, planetary science, and geodesy. We clarify and consolidate terminology across geodynamics and numerical modelling to set a standard for clear communication of modelling studies. All in all, this paper presents the basics of geodynamic modelling for first-time and experienced modellers, collaborators, and reviewers from diverse backgrounds to (re)gain a solid understanding of geodynamic modelling as a whole
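The code-verification practice the paper advocates can be illustrated with a minimal example: checking a solver against a known analytic solution. The equation, scheme, and tolerances below are illustrative choices, not from the paper:

```python
import numpy as np

# Code verification: solve the 1-D diffusion equation u_t = kappa * u_xx
# with u(x, 0) = sin(pi x) and u = 0 at both ends, whose exact solution is
# u(x, t) = exp(-kappa * pi**2 * t) * sin(pi x), then measure the error.

def solve_diffusion(nx=101, kappa=1.0, t_end=0.05):
    x = np.linspace(0.0, 1.0, nx)
    dx = x[1] - x[0]
    dt = 0.4 * dx**2 / kappa            # respects the explicit stability limit
    u = np.sin(np.pi * x)
    t = 0.0
    while t < t_end:
        step = min(dt, t_end - t)       # land exactly on t_end
        u[1:-1] += step * kappa * (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
        t += step
    return x, u

x, u = solve_diffusion()
exact = np.exp(-np.pi**2 * 0.05) * np.sin(np.pi * x)
print(np.max(np.abs(u - exact)))        # should be small if the code is correct
```

Verification of this kind checks that the code solves its governing equations correctly (for example, that the error shrinks at the expected rate as the grid is refined); validation, discussed separately in the paper, asks whether those equations describe the Earth.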