The ESCAPE project : Energy-efficient Scalable Algorithms for Weather Prediction at Exascale

Author: Baldauf Michael
Bauer Peter
Berg Per
Bosak Bartosz
Bénard Pierre
Błażewicz Marek
Ciesielski Sebastian
Ciznicki Milosz
Clement Valentin
Colavolpe Charles
Deconinck Willem
Degrauwe Daan
Diamantakis Michail
Douriez Louis
Fuhrer Oliver
Gillard Mike
Glinton Michael
Gray Alan
Guibert David
Hamrud Mats
Kulczewski Michał
Kurowski Krzysztof
Kühnlein Christian
Lange Michael
Lock Sarah-Jane
Lysaght Michael
Macfaden Alexander J
Marguinaud Philippe
Mazauric Cyril
McKinstry Alastair
Mengaldo Gianmarco
Messmer Peter
Mozdzynski George
Müller Andreas
New Nick
Nielsen Kristian P
O'Brien Enda
Osuna Carlos
Piotrowski Zbigniew P
Piątek Wojciech
Poulsen Jacob W
Procyk Marcin
Raffin Erwan
Robinson Oisín
Saarinen Sami
Sass Bent H
Shukla Parijat
Smet Geert
Smolarkiewicz Piotr K
Spychala Pawel
Szmelter Joanna
Termonia Piet
Thiemert Daniel
Van Bever Joris
Vigouroux Xavier
Voitus Fabrice
Wedi Nils
Wyszogrodzki Andrzej
Zheng Yongjun
Publication venue: 'Copernicus GmbH'
Publication date: 01/01/2019
In the simulation of complex multi-scale flows arising in weather and climate modelling, one of the biggest challenges is to satisfy strict service requirements in terms of time to solution and to satisfy budgetary constraints in terms of energy to solution, without compromising the accuracy and stability of the application. These simulations require algorithms that minimise the energy footprint along with the time required to produce a solution, maintain the physically required level of accuracy, are numerically stable, and are resilient in case of hardware failure. The European Centre for Medium-Range Weather Forecasts (ECMWF) led the ESCAPE (Energy-efficient Scalable Algorithms for Weather Prediction at Exascale) project, funded by Horizon 2020 (H2020) under the FET-HPC (Future and Emerging Technologies in High Performance Computing) initiative. The goal of ESCAPE was to develop a sustainable strategy to evolve weather and climate prediction models to next-generation computing technologies. The project partners incorporate the expertise of leading European regional forecasting consortia, university research, experienced high-performance computing centres, and hardware vendors. This paper presents an overview of the ESCAPE strategy: (i) identify domain-specific key algorithmic motifs in weather prediction and climate models (which we term Weather & Climate Dwarfs), (ii) categorise them in terms of computational and communication patterns while (iii) adapting them to different hardware architectures with alternative programming models, (iv) analyse the challenges in optimising, and (v) find alternative algorithms for the same scheme. 
The participating weather prediction models are the following: IFS (Integrated Forecasting System); ALARO, a combination of AROME (Application de la Recherche a l'Operationnel a Meso-Echelle) and ALADIN (Aire Limitee Adaptation Dynamique Developpement International); and COSMO-EULAG, a combination of COSMO (Consortium for Small-scale Modeling) and EULAG (Eulerian and semi-Lagrangian fluid solver). For many of the weather and climate dwarfs, ESCAPE provides prototype implementations on different hardware architectures (mainly Intel Skylake CPUs, NVIDIA GPUs, Intel Xeon Phi, Optalysys optical processor) with different programming models. The spectral transform dwarf represents a detailed example of the co-design cycle of an ESCAPE dwarf. The dwarf concept has proven to be extremely useful for the rapid prototyping of alternative algorithms and their interaction with hardware, e.g. the use of a domain-specific language (DSL). Manual adaptations have led to substantial accelerations of key algorithms in numerical weather prediction (NWP) but are not a general recipe for the performance portability of complex NWP models. Existing DSLs are found to require further evolution but are promising tools for achieving the latter. Measurements of energy and time to solution suggest that a future focus needs to be on exploiting the simultaneous use of all available resources in hybrid CPU-GPU arrangements.

Loughborough University Institutional Repository

Ghent University Academic Bibliography

From FPGA to ASIC: A RISC-V processor experience

Author: Rojas Morales Carlos
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019
This work documents a correct design flow using these tools for the Lagarto RISC-V processor, together with the RTL design considerations that must be taken into account to move from a design for FPGA to a design for ASIC.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

X-SRAM: Enabling In-Memory Boolean Computations in CMOS Static Random Access Memories

Author: Alessio Giacomini
Armin Tarrah
Laura Treu
Sabrina Giaretta
Stefano Campanaro
Veronica Vendramin
Vinícius da Silva Duarte
Viviana Corich
Publication date: 01/01/2018
Silicon-based Static Random Access Memories (SRAM) and digital Boolean logic have been the workhorse of state-of-the-art computing platforms. Despite tremendous strides in scaling the ubiquitous metal-oxide-semiconductor transistor, the underlying von-Neumann computing architecture has remained unchanged. The limited throughput and energy efficiency of state-of-the-art computing systems result, to a large extent, from the well-known von-Neumann bottleneck. The energy and throughput inefficiency of von-Neumann machines has been accentuated in recent times by the present emphasis on data-intensive applications such as artificial intelligence, machine learning, etc. A possible approach towards mitigating the overhead associated with the von-Neumann bottleneck is to enable in-memory Boolean computations. In this manuscript, we present an augmented version of the conventional SRAM bit-cells, called the X-SRAM, with the ability to perform in-memory, vector Boolean computations, in addition to the usual memory storage operations. We propose at least six different schemes for enabling in-memory vector computations, including NAND, NOR, IMP (implication), and XOR logic gates, with respect to different bit-cell topologies: the 8T cell and the 8+T Differential cell. In addition, we also present a novel 'read-compute-store' scheme, wherein the computed Boolean function can be directly stored in the memory without the need of latching the data and carrying out a subsequent write operation. The feasibility of the proposed schemes has been verified using predictive transistor models and Monte-Carlo variation analysis.

Comment: This article has been accepted in a future issue of IEEE Transactions on Circuits and Systems-I: Regular Paper
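
As a purely functional illustration of the vector Boolean operations named in the abstract above (NAND, NOR, IMP, XOR), the following sketch applies them bitwise to two stored words. It models only the logic, not the circuit: the function name `vector_ops`, the list-based `memory`, and the row indices are hypothetical, and nothing here reflects how the X-SRAM bit-cells actually compute.

```python
def vector_ops(a: int, b: int, width: int = 8) -> dict:
    """Bitwise NAND, NOR, IMP and XOR of two words of `width` bits.

    Hypothetical helper: a software model of the logic functions only.
    """
    mask = (1 << width) - 1  # keep results inside the word width
    return {
        "NAND": ~(a & b) & mask,
        "NOR": ~(a | b) & mask,
        "IMP": (~a | b) & mask,  # a implies b, bit by bit
        "XOR": (a ^ b) & mask,
    }


# Toy "memory" with two operand rows and one destination row (assumed layout).
memory = [0b1100, 0b1010, 0b0000]

# Loose analogue of 'read-compute-store': the result of the operation on
# rows 0 and 1 is placed directly into row 2.
memory[2] = vector_ops(memory[0], memory[1], width=4)["XOR"]
```

With 4-bit words `0b1100` and `0b1010`, the destination row ends up holding `0b0110`.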

arXiv.org e-Print Archive

Locus Repositório Institucional da UFV

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Frontiers - Publisher Connector

Online Research Database In Technology

Archivio istituzionale della ricerca - Università di Padova

FigShare

Dynamic Energy Management for Chip Multi-processors under Performance Constraints

Author: Ababei Cristinel
Moghaddam Milad Ghorbani
Publication venue: e-Publications@Marquette
Publication date: 01/10/2017
We introduce a novel algorithm for dynamic energy management (DEM) under performance constraints in chip multi-processors (CMPs). Using the novel concept of delayed instructions count, performance loss estimations are calculated at the end of each control period for each core. In addition, a Kalman-filtering-based approach is employed to predict the workload in the next control period, for which voltage-frequency pairs must be selected. This selection is done with a novel dynamic voltage and frequency scaling (DVFS) algorithm whose objective is to reduce energy consumption without degrading performance beyond the user-set threshold. Using our customized Sniper-based CMP system simulation framework, we demonstrate the effectiveness of the proposed algorithm for a variety of benchmarks on 16-core and 64-core network-on-chip-based CMP architectures. Simulation results show consistent energy savings across the board. We present our work as an investigation of the tradeoff between the energy reduction achievable via DVFS, with predictions made by the Kalman filter, and different performance penalty thresholds.
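
The predict-then-select loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses a scalar Kalman filter with a random-walk workload model, and a made-up loss estimate (`predicted_load * (1 - f / f_max)`) in place of the delayed-instructions-count metric. All names, noise variances, and voltage-frequency values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """One-dimensional Kalman filter with a random-walk state model (assumed)."""
    estimate: float = 0.0       # current workload estimate
    variance: float = 1.0       # uncertainty of that estimate
    process_var: float = 1e-2   # assumed process noise
    measure_var: float = 1e-1   # assumed measurement noise

    def update(self, measurement: float) -> float:
        # Predict: a random walk keeps the estimate, uncertainty grows.
        prior_var = self.variance + self.process_var
        # Correct: blend prior and measurement using the Kalman gain.
        gain = prior_var / (prior_var + self.measure_var)
        self.estimate += gain * (measurement - self.estimate)
        self.variance = (1.0 - gain) * prior_var
        # Under the random-walk model this is also the next-period prediction.
        return self.estimate


def select_vf_pair(pairs, predicted_load, max_loss):
    """Pick the lowest-frequency (voltage, frequency) pair whose estimated
    performance loss stays within max_loss. The loss model is a placeholder."""
    f_max = max(freq for _, freq in pairs)
    for volt, freq in sorted(pairs, key=lambda p: p[1]):
        est_loss = predicted_load * (1.0 - freq / f_max)
        if est_loss <= max_loss:
            return volt, freq
    return max(pairs, key=lambda p: p[1])  # fall back to the fastest pair


# One control period per measured utilisation sample (illustrative values).
kf = ScalarKalman()
pairs = [(0.8, 1.0), (1.0, 2.0), (1.2, 3.0)]  # (volts, GHz), hypothetical
for measured_load in [0.5] * 50:
    predicted = kf.update(measured_load)
    chosen = select_vf_pair(pairs, predicted, max_loss=0.2)
```

Lowering `max_loss` pushes the selection towards higher frequencies, which is the energy-versus-performance tradeoff the abstract investigates.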

epublications@Marquette