54 research outputs found
Development of an oceanographic application in HPC
High Performance Computing (HPC) is used for running advanced application programs
efficiently, reliably, and quickly.
In earlier decades, performance analysis of HPC applications was evaluated based on
speed, scalability of threads, memory hierarchy. Now, it is essential to consider the
energy or the power consumed by the system while executing an application.
In fact, the High Power Consumption (HPC) is one of biggest problems for the High
Performance Computing (HPC) community and one of the major obstacles for exascale
systems design.
The new generations of HPC systems intend to achieve exaflop performances and will
demand even more energy to processing and cooling. Nowadays, the growth of HPC
systems is limited by energy issues
Recently, many research centers have focused the attention on doing an automatic tuning
of HPC applications which require a wide study of HPC applications in terms of power
efficiency.
In this context, this paper aims to propose the study of an oceanographic application,
named OceanVar, that implements Domain Decomposition based 4D Variational model
(DD-4DVar), one of the most commonly used HPC applications, going to evaluate not
only the classic aspects of performance but also aspects related to power efficiency in
different case of studies.
These work were realized at Bsc (Barcelona Supercomputing Center), Spain within the
Mont-Blanc project, performing the test first on HCA server with Intel technology and then on a mini-cluster Thunder with ARM technology.
In this work of thesis it was initially explained the concept of assimilation date, the
context in which it is developed, and a brief description of the mathematical model
4DVAR.
After this problem’s close examination, it was performed a porting from Matlab
description of the problem of data-assimilation to its sequential version in C language.
Secondly, after identifying the most onerous computational kernels in order of time, it
has been developed a parallel version of the application with a parallel multiprocessor
programming style, using the MPI (Message Passing Interface) protocol.
The experiments results, in terms of performance, have shown that, in the case of
running on HCA server, an Intel architecture, values of efficiency of the two most
onerous functions obtained, growing the number of process, are approximately equal to
80%.
In the case of running on ARM architecture, specifically on Thunder mini-cluster,
instead, the trend obtained is labeled as "SuperLinear Speedup" and, in our case, it can
be explained by a more efficient use of resources (cache memory access) compared with
the sequential case.
In the second part of this paper was presented an analysis of the some issues of this
application that has impact in the energy efficiency.
After a brief discussion about the energy consumption characteristics of the Thunder
chip in technological landscape, through the use of a power consumption detector, the
Yokogawa Power Meter, values of energy consumption of mini-cluster Thunder were
evaluated in order to determine an overview on the power-to-solution of this application
to use as the basic standard for successive analysis with other parallel styles.
Finally, a comprehensive performance evaluation, targeted to estimate the goodness of
MPI parallelization, is conducted using a suitable performance tool named Paraver,
developed by BSC.
Paraver is such a performance analysis and visualisation tool which can be used to
analyse MPI, threaded or mixed mode programmes and represents the key to perform a parallel profiling and to optimise the code for High Performance Computing.
A set of graphical representation of these statistics make it easy for a developer to
identify performance problems. Some of the problems that can be easily identified are
load imbalanced decompositions, excessive communication overheads and poor average
floating operations per second achieved.
Paraver can also report statistics based on hardware counters, which are provided by the
underlying hardware.
This project aimed to use Paraver configuration files to allow certain metrics to be
analysed for this application.
To explain in some way the performance trend obtained in the case of analysis on the
mini-cluster Thunder, the tracks were extracted from various case of studies and the
results achieved is what expected, that is a drastic drop of cache misses by the case ppn
(process per node) = 1 to case ppn = 16.
This in some way explains a more efficient use of cluster resources with an increase of
the number of processes
Resilience for large ensemble computations
With the increasing power of supercomputers, ever more detailed models of physical systems can be simulated, and ever larger problem sizes can be considered for any kind of numerical system. During the last twenty years the performance of the fastest clusters went from the teraFLOPS domain (ASCI RED: 2.3 teraFLOPS) to the pre-exaFLOPS domain (Fugaku: 442 petaFLOPS), and we will soon have the first supercomputer with a peak performance cracking the exaFLOPS (El Capitan: 1.5 exaFLOPS). Ensemble techniques experience a renaissance with the availability of those extreme scales. Especially recent techniques, such as particle filters, will benefit from it. Current ensemble methods in climate science, such as ensemble Kalman filters, exhibit a linear dependency between the problem size and the ensemble size, while particle filters show an exponential dependency. Nevertheless, with the prospect of massive computing power come challenges such as power consumption and fault-tolerance. The mean-time-between-failures shrinks with the number of components in the system, and it is expected to have failures every few hours at exascale. In this thesis, we explore and develop techniques to protect large ensemble computations from failures. We present novel approaches in differential checkpointing, elastic recovery, fully asynchronous checkpointing, and checkpoint compression. Furthermore, we design and implement a fault-tolerant particle filter with pre-emptive particle prefetching and caching. And finally, we design and implement a framework for the automatic validation and application of lossy compression in ensemble data assimilation. Altogether, we present five contributions in this thesis, where the first two improve state-of-the-art checkpointing techniques, and the last three address the resilience of ensemble computations. The contributions represent stand-alone fault-tolerance techniques, however, they can also be used to improve the properties of each other. For instance, we utilize elastic recovery (2nd contribution) for mitigating resiliency in an online ensemble data assimilation framework (3rd contribution), and we built our validation framework (5th contribution) on top of our particle filter implementation (4th contribution). We further demonstrate that our contributions improve resilience and performance with experiments on various architectures such as Intel, IBM, and ARM processors.Amb l’increment de les capacitats de còmput dels supercomputadors, es poden simular models de sistemes físics encara més detallats, i es poden resoldre problemes de més grandària en qualsevol tipus de sistema numèric. Durant els últims vint anys, el rendiment dels clústers més ràpids ha passat del domini dels teraFLOPS (ASCI RED: 2.3 teraFLOPS) al domini dels pre-exaFLOPS (Fugaku: 442 petaFLOPS), i aviat tindrem el primer supercomputador amb un rendiment màxim que sobrepassa els exaFLOPS (El Capitan: 1.5 exaFLOPS). Les tècniques d’ensemble experimenten un renaixement amb la disponibilitat d’aquestes escales tan extremes. Especialment les tècniques més noves, com els filtres de partícules, se¿n beneficiaran. Els mètodes d’ensemble actuals en climatologia, com els filtres d’ensemble de Kalman, exhibeixen una dependència lineal entre la mida del problema i la mida de l’ensemble, mentre que els filtres de partícules mostren una dependència exponencial. No obstant, juntament amb les oportunitats de poder computar massivament, apareixen desafiaments com l’alt consum energètic i la necessitat de tolerància a errors. El temps de mitjana entre errors es redueix amb el nombre de components del sistema, i s’espera que els errors s’esdevinguin cada poques hores a exaescala. En aquesta tesis, explorem i desenvolupem tècniques per protegir grans càlculs d’ensemble d’errors. Presentem noves tècniques en punts de control diferencials, recuperació elàstica, punts de control totalment asincrònics i compressió de punts de control. A més, dissenyem i implementem un filtre de partícules tolerant a errors amb captació i emmagatzematge en caché de partícules de manera preventiva. I finalment, dissenyem i implementem un marc per la validació automàtica i l’aplicació de compressió amb pèrdua en l’assimilació de dades d’ensemble. En total, en aquesta tesis presentem cinc contribucions, les dues primeres de les quals milloren les tècniques de punts de control més avançades, mentre que les tres restants aborden la resiliència dels càlculs d’ensemble. Les contribucions representen tècniques independents de tolerància a errors; no obstant, també es poden utilitzar per a millorar les propietats de cadascuna. Per exemple, utilitzem la recuperació elàstica (segona contribució) per a mitigar la resiliència en un marc d’assimilació de dades d’ensemble en línia (tercera contribució), i construïm el nostre marc de validació (cinquena contribució) sobre la nostra implementació del filtre de partícules (quarta contribució). A més, demostrem que les nostres contribucions milloren la resiliència i el rendiment amb experiments en diverses arquitectures, com processadors Intel, IBM i ARM.Postprint (published version
Multilevel Algebraic Approach for Performance Analysis of Parallel Algorithms
In order to solve a problem in parallel we need to undertake the fundamental step of splitting the computational tasks into parts, i.e. decomposing the problem solving. A whatever decomposition does not necessarily lead to a parallel algorithm with the highest performance. This topic is even more important when complex parallel algorithms must be developed for hybrid or heterogeneous architectures. We present an innovative approach which starts from a problem decomposition into parts (sub-problems). These parts will be regarded as elements of an algebraic structure and will be related to each other according to a suitably defined dependency relationship. The main outcome of such framework is to define a set of block matrices (dependency, decomposition, memory accesses and execution) which simply highlight fundamental characteristics of the corresponding algorithm, such as inherent parallelism and sources of overheads. We provide a mathematical formulation of this approach, and we perform a feasibility analysis for the performance of a parallel algorithm in terms of its time complexity and scalability. We compare our results with standard expressions of speed up, efficiency, overhead, and so on. Finally, we show how the multilevel structure of this framework eases the choice of the abstraction level (both for the problem decomposition and for the algorithm description) in order to determine the granularity of the tasks within the performance analysis. This feature is helpful to better understand the mapping of parallel algorithms on novel hybrid and heterogeneous architectures
CIRA annual report FY 2016/2017
Reporting period April 1, 2016-March 31, 2017
Recommended from our members
The WWRP Polar Prediction Project (PPP)
Mission statement: “Promote cooperative international research enabling development of improved weather and environmental prediction services for the polar regions, on time scales from hours to seasonal”. Increased economic, transportation and research activities in polar regions are leading to more demands for sustained and improved availability of predictive weather and climate information to support decision-making. However, partly as a result of a strong emphasis of previous international efforts on lower and middle latitudes, many gaps in weather, sub-seasonal and seasonal forecasting in polar regions hamper reliable decision making in the Arctic, Antarctic and possibly the middle latitudes as well.
In order to advance polar prediction capabilities, the WWRP Polar Prediction Project (PPP) has been established as one of three THORPEX (THe Observing System Research and Predictability EXperiment) legacy activities. The aim of PPP, a ten year endeavour (2013-2022), is to promote cooperative international research enabling development of improved weather and environmental prediction services for the polar regions, on hourly to seasonal time scales. In order to achieve its goals, PPP will enhance international and interdisciplinary collaboration through the development of strong linkages with related initiatives; strengthen linkages between academia, research institutions and operational forecasting centres; promote interactions and communication between research and stakeholders; and foster education and outreach.
Flagship research activities of PPP include sea ice prediction, polar-lower latitude linkages and the Year of Polar Prediction (YOPP) - an intensive observational, coupled modelling, service-oriented research and educational effort in the period mid-2017 to mid-2019
Teaching and Learning of Fluid Mechanics, Volume II
This book is devoted to the teaching and learning of fluid mechanics. Fluid mechanics occupies a privileged position in the sciences; it is taught in various science departments including physics, mathematics, mechanical, chemical and civil engineering and environmental sciences, each highlighting a different aspect or interpretation of the foundation and applications of fluids. While scholarship in fluid mechanics is vast, expanding into the areas of experimental, theoretical and computational fluid mechanics, there is little discussion among scientists about the different possible ways of teaching this subject. We think there is much to be learned, for teachers and students alike, from an interdisciplinary dialogue about fluids. This volume therefore highlights articles which have bearing on the pedagogical aspects of fluid mechanics at the undergraduate and graduate level
- …