77 research outputs found
LEONARDO: A Pan-European Pre-Exascale Supercomputer for HPC and AI Applications
A new pre-exascale computer cluster, called LEONARDO, has been designed to
foster scientific progress and competitive innovation across European research
systems. This paper describes the general architecture of the system
and focuses on the technologies adopted for its GPU-accelerated partition.
High-density processing elements, fast data movement capabilities and a mature
software stack allow the machine to run intensive workloads in a
flexible and scalable way. Scientific applications from traditional High
Performance Computing (HPC) as well as emerging Artificial Intelligence (AI)
domains can benefit from this large apparatus in terms of time and energy to
solution.
Comment: 16 pages, 5 figures, 7 tables, to be published in Journal of Large
Scale Research Facilities
Three-Dimensional Lattice Boltzmann Model Results for Complex Fluids Ordering
The kinetics of domain growth of fluid mixtures quenched from a disordered to a lamellar phase has been studied in three dimensions. We use a numerical approach based on the lattice Boltzmann method (LBM). A novel LBM implementation that "fuses" the collision and streaming steps is used in order to reduce memory and bandwidth requirements. We find that extended defects between stacks of lamellae with different orientations dominate the late-time dynamics
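As a rough illustration of the "fused" collision-streaming idea mentioned in the abstract (this is a minimal sketch, not the authors' implementation), the following D2Q9 BGK step computes the post-collision populations and pushes them directly into the neighbour cells of a fresh array, so no intermediate post-collision field has to be stored:

```python
import numpy as np

# D2Q9 lattice: discrete velocities and weights
c = np.array([(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def fused_collide_stream(f, tau=0.6):
    """One LBM step with fused collision and streaming (push scheme):
    each post-collision value is written straight into the neighbour
    cell of f_new, avoiding a separate post-collision buffer."""
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    usq = ux**2 + uy**2
    f_new = np.empty_like(f)
    for i in range(9):
        cu = c[i, 0] * ux + c[i, 1] * uy
        feq = w[i] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)
        post = f[i] - (f[i] - feq) / tau          # BGK collision
        # stream: push to the neighbour in direction c[i] (periodic box)
        f_new[i] = np.roll(np.roll(post, c[i, 0], axis=0), c[i, 1], axis=1)
    return f_new
```

The relaxation time `tau` and the periodic boundary are illustrative choices; a production code would keep the same fused loop but handle boundaries explicitly.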
Improving Computational Efficiency for Energy Management Systems in Plug-in Hybrid Electric Vehicles Using Dynamic Programming Based Controllers
Reducing computational time has become a critical issue in recent years, particularly in the transportation field, where the complexity of scenarios demands lightweight controllers to run large simulations and gather results to study different behaviors. This study proposes two novel formulations of the Optimal Control Problem (OCP) for the Energy Management System of a Plug-in Hybrid Electric Vehicle (PHEV) and compares their performance
with a benchmark found in the literature. Dynamic Programming was chosen as the optimization algorithm to solve the OCP in a Matlab environment, using the DynaProg toolbox. The objective is to address both the optimality of the fuel economy solution and the computational time. To improve the computational efficiency of the algorithm, an existing formulation from the literature, which originally used three control inputs, was modified. The approach leverages the equations that describe the Input-Split Hybrid powertrain, reducing the control inputs first to two and finally to one in the proposed solutions. These formulations are referred to as 2-Controls and 1-Control. Virtual tests were conducted to evaluate the performance of the two formulations. The simulations were carried out in various scenarios, including urban and highway
driving, to ensure the versatility of the controllers. The results demonstrate that both proposed formulations reduce computational time compared to the benchmark: the 2-Controls formulation is roughly 40 times faster, while the 1-Control formulation achieves a remarkable speedup of roughly 850 times. These reductions were obtained with a maximum fuel-economy difference of approximately 1.5% for the 1-Control formulation with respect to the benchmark solution. Overall, this study provides valuable insights into the development of efficient and optimal controllers for PHEVs, which can be applied to various transportation scenarios. The proposed formulations reduce computational time without sacrificing the optimality of the fuel economy solution, making them a promising approach for future research in this area
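To make the 1-Control idea concrete, here is a toy backward Dynamic Programming solver over a state-of-charge grid with a single scalar control. It is a hedged sketch only: the model functions (`step_cost`, `soc_next`, `terminal_cost`) are hypothetical placeholders, and this is not the DynaProg toolbox or the paper's powertrain model.

```python
import numpy as np

def solve_dp(soc_grid, u_grid, n_steps, step_cost, soc_next, terminal_cost):
    """Backward value iteration for a 1-control energy-management OCP.
    soc_grid: 1-D state grid (battery state of charge).
    u_grid: 1-D grid of the single control input.
    step_cost(s, u) -> per-step cost array over soc_grid.
    soc_next(s, u) -> next state array over soc_grid.
    Returns the initial-time value function and the greedy policy."""
    V = terminal_cost(soc_grid)                       # value at final time
    policy = np.zeros((n_steps, soc_grid.size), dtype=int)
    for k in range(n_steps - 1, -1, -1):
        Q = np.empty((u_grid.size, soc_grid.size))
        for j, u in enumerate(u_grid):
            s_nxt = np.clip(soc_next(soc_grid, u), soc_grid[0], soc_grid[-1])
            # interpolate the future value function on the state grid
            Q[j] = step_cost(soc_grid, u) + np.interp(s_nxt, soc_grid, V)
        policy[k] = np.argmin(Q, axis=0)              # best control index
        V = Q.min(axis=0)
    return V, policy
```

Collapsing the control space from three inputs to one, as the abstract describes, shrinks the inner loop over `u_grid` and is what drives the reported speedups.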
Accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit, and customized 16-bit number formats
Fluid dynamics simulations with the lattice Boltzmann method (LBM) are very
memory-intensive. Alongside reduction in memory footprint, significant
performance benefits can be achieved by using FP32 (single) precision compared
to FP64 (double) precision, especially on GPUs. Here, we evaluate the
possibility of using even FP16 and Posit16 (half) precision for storing fluid
populations, while still carrying out arithmetic operations in FP32. For this,
we first show that the number range commonly occurring in the LBM is much
smaller
than the FP16 number range. Based on this observation, we develop novel 16-bit
formats - based on a modified IEEE-754 and on a modified Posit standard - that
are specifically tailored to the needs of the LBM. We then carry out an
in-depth characterization of LBM accuracy for six different test systems with
increasing complexity: Poiseuille flow, Taylor-Green vortices, Kármán vortex
streets, lid-driven cavity, a microcapsule in shear flow (utilizing the
immersed-boundary method) and finally the impact of a raindrop (based on a
Volume-of-Fluid approach). We find that the difference in accuracy between FP64
and FP32 is negligible in almost all cases, and that for a large number of
cases even 16-bit is sufficient. Finally, we provide a detailed performance
analysis of all precision levels on a large number of hardware
microarchitectures and show that significant speedup is achieved with mixed
FP32/16-bit.
Comment: 30 pages, 20 figures, 4 tables, 2 code listings
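The mixed FP32/16-bit pattern described above (16-bit storage, FP32 arithmetic) can be sketched as a pair of conversion helpers. This is a simplified stand-in: plain IEEE float16 with a power-of-two pre-scale approximates the idea of tailoring the representable range to the LBM, whereas the paper's custom formats instead modify the IEEE-754 and Posit standards themselves. The scale factor below is an assumed illustrative value, not taken from the paper.

```python
import numpy as np

# Hypothetical power-of-two scale that moves typical LBM population
# values into a well-resolved part of the float16 range; exact scaling
# by a power of two introduces no extra rounding error.
SHIFT = np.float32(2.0 ** 10)

def to_storage16(f32):
    """Compress FP32 populations to 16 bits for storage."""
    return (np.asarray(f32, dtype=np.float32) * SHIFT).astype(np.float16)

def from_storage16(f16):
    """Expand stored 16-bit populations back to FP32 before arithmetic."""
    return f16.astype(np.float32) / SHIFT

# All collision/streaming arithmetic would run on the FP32 expansion,
# and only the 16-bit array would be kept resident in memory.
```

Halving the storage footprint this way is what yields both the memory savings and, on bandwidth-bound hardware, most of the reported speedup.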
INSPEcT: a computational tool to infer mRNA synthesis, processing and degradation dynamics from RNA- and 4sU-seq time course experiments.
Abstract
Motivation: Cellular mRNA levels originate from the combined action of multiple regulatory processes, which can be recapitulated by the rates of pre-mRNA synthesis, pre-mRNA processing and mRNA degradation. Recent experimental and computational advances have laid the groundwork for studying these intertwined levels of regulation. Nevertheless, software for the comprehensive quantification of RNA dynamics is still lacking.
Results: INSPEcT is an R package for the integrative analysis of RNA- and 4sU-seq data to study the dynamics of transcriptional regulation. INSPEcT provides gene-level quantification of these rates, and a modeling framework to identify which of these regulatory processes are most likely to explain the observed mRNA and pre-mRNA concentrations. Software performance is tested on a synthetic dataset, instrumental to guide the choice of the modeling parameters and the experimental design.
Availability and implementation: INSPEcT is submitted to Bioconductor and is currently available as Supplementary Additional File S1.
Contact: [email protected]
Supplementary Information: Supplementary data are available at Bioinformatics online
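The three rates the abstract refers to (synthesis, processing, degradation) enter the standard two-species model of RNA dynamics, dP/dt = k1 - k2·P and dM/dt = k2·P - k3·M for premature RNA P and mature RNA M. The sketch below integrates this model with constant rates and explicit Euler stepping; it illustrates the model such tools fit, not INSPEcT's own estimation code.

```python
import numpy as np

def simulate_rna(k1, k2, k3, t_end=10.0, dt=1e-3):
    """Forward-integrate the two-species RNA model:
        dP/dt = k1 - k2 * P      (synthesis minus processing)
        dM/dt = k2 * P - k3 * M  (processing minus degradation)
    Constant rates and explicit Euler are deliberate simplifications;
    returns (P, M) at time t_end starting from P = M = 0."""
    n = int(t_end / dt)
    P, M = 0.0, 0.0
    for _ in range(n):
        dP = k1 - k2 * P
        dM = k2 * P - k3 * M
        P += dt * dP
        M += dt * dM
    return P, M
```

At steady state the model gives P = k1/k2 and M = k1/k3, which is why measured pre-mRNA and mRNA levels constrain the three rates jointly rather than individually.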
Thread-safe lattice Boltzmann for high-performance computing on GPUs
We present thread-safe, highly-optimized lattice Boltzmann implementations,
specifically aimed at exploiting the high memory bandwidth of GPU-based
architectures. At variance with standard approaches to LB coding, the proposed
strategy, based on the reconstruction of the post-collision distribution via
Hermite projection, enforces data locality and avoids the onset of memory
dependencies, which may arise during the propagation step, with no need to
resort to more complex streaming strategies. The thread-safe lattice Boltzmann
achieves peak performance in both two and three dimensions, and it allows a
significant reduction in allocated memory (tens of gigabytes for simulations
with billions of lattice nodes) while retaining the algorithmic simplicity of
standard LB computing. Our findings open attractive prospects for high-performance
simulations of complex flows on GPU-based architectures
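The key ingredient named in the abstract, reconstructing the distribution via Hermite projection, can be illustrated with the second-order Hermite expansion on a D2Q9 lattice. The sketch below rebuilds a distribution purely from local hydrodynamic moments (density and velocity), which is why each node can be updated without read/write races between threads; the authors' full scheme also handles the non-equilibrium part, which is omitted here.

```python
import numpy as np

# D2Q9 lattice: discrete velocities and weights
c = np.array([(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
              (1, 1), (-1, 1), (-1, -1), (1, -1)], dtype=float)
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def hermite_reconstruct(rho, ux, uy):
    """Reconstruct a D2Q9 distribution from its hydrodynamic moments
    using the second-order Hermite expansion. Because the result
    depends only on quantities local to the node, no neighbouring
    populations are read during the update."""
    cu = c[:, 0] * ux + c[:, 1] * uy        # c_i . u for each direction
    usq = ux * ux + uy * uy
    return w * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)
```

By construction the reconstruction reproduces the density and momentum it was built from, so the scheme stays consistent with the hydrodynamics while sidestepping the memory dependencies of plain streaming.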