77 research outputs found

    LEONARDO: A Pan-European Pre-Exascale Supercomputer for HPC and AI Applications

    LEONARDO is a new pre-exascale computer cluster designed to foster scientific progress and competitive innovation across European research systems. This paper describes the general architecture of the system and focuses on the technologies adopted for its GPU-accelerated partition. High-density processing elements, fast data movement capabilities and mature software stack collections allow the machine to run intensive workloads in a flexible and scalable way. Scientific applications from traditional High Performance Computing (HPC) as well as emerging Artificial Intelligence (AI) domains can benefit from this large apparatus in terms of time and energy to solution. Comment: 16 pages, 5 figures, 7 tables, to be published in Journal of Large Scale Research Facilities

    Three-Dimensional Lattice Boltzmann Model Results for Complex Fluids Ordering

    The kinetics of domain growth of fluid mixtures quenched from a disordered to a lamellar phase has been studied in three dimensions. We use a numerical approach based on the lattice Boltzmann method (LBM). A novel LBM implementation that "fuses" the collision and streaming steps is used to reduce memory and bandwidth requirements. We find that extended defects between stacks of lamellae with different orientations dominate the late-time dynamics.
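The idea of fusing collision and streaming can be illustrated with a minimal sketch. It is not the implementation from the paper: it assumes a plain single-fluid D2Q9 BGK model with periodic boundaries (the lamellar-phase physics is omitted), and the names `fused_step` and `run` are hypothetical. Each update pulls the populations arriving from upstream neighbours, collides them, and writes the result to a new array, so no intermediate post-collision lattice is stored between the two steps.

```python
import numpy as np

# D2Q9 lattice velocities and weights (standard textbook values).
C = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
W = np.array([4 / 9] + [1 / 9] * 4 + [1 / 36] * 4)


def equilibrium(rho, ux, uy):
    """Second-order BGK equilibrium for every direction and node."""
    cu = C[:, 0, None, None] * ux + C[:, 1, None, None] * uy
    usq = ux ** 2 + uy ** 2
    return W[:, None, None] * rho * (1 + 3 * cu + 4.5 * cu ** 2 - 1.5 * usq)


def fused_step(f_src, tau):
    """One fused update: pull from upstream neighbours, collide, return new array."""
    # Streaming as a gather: f_in[i] at node x is f_src[i] at node x - c_i (periodic).
    f_in = np.stack([
        np.roll(f_src[i], shift=(int(C[i, 0]), int(C[i, 1])), axis=(0, 1))
        for i in range(9)
    ])
    # Moments of the freshly gathered populations.
    rho = f_in.sum(axis=0)
    ux = (C[:, 0, None, None] * f_in).sum(axis=0) / rho
    uy = (C[:, 1, None, None] * f_in).sum(axis=0) / rho
    # BGK relaxation applied directly to the gathered values.
    return f_in - (f_in - equilibrium(rho, ux, uy)) / tau


def run(nx=16, ny=16, steps=20, tau=0.8):
    """Relax a small sinusoidal density perturbation (illustrative setup)."""
    x = np.arange(nx)[:, None]
    rho0 = 1.0 + 0.01 * np.sin(2 * np.pi * x / nx) * np.ones((nx, ny))
    f = equilibrium(rho0, np.zeros((nx, ny)), np.zeros((nx, ny)))
    for _ in range(steps):
        f = fused_step(f, tau)
    return rho0, f
```

In NumPy the gather still creates temporaries, so this only shows the data flow; in a compiled or GPU kernel the same pattern reads each population once and writes it once per time step.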

    Improving Computational Efficiency for Energy Management Systems in Plug-in Hybrid Electric Vehicles Using Dynamic Programming Based Controllers

    Reducing computational time has become a critical issue in recent years, particularly in the transportation field, where the complexity of scenarios demands lightweight controllers to run large simulations and gather results to study different behaviors. This study proposes two novel formulations of the Optimal Control Problem (OCP) for the Energy Management System of a Plug-in Hybrid Electric Vehicle (PHEV) and compares their performance with a benchmark found in the literature. Dynamic Programming was chosen as the optimization algorithm to solve the OCP in a Matlab environment, using the DynaProg toolbox. The objective is to address both the optimality of the fuel economy solution and the computational time. To improve the computational efficiency of the algorithm, an existing formulation from the literature, which originally used three control inputs, was modified. The approach exploits the equations that describe the Input-Split Hybrid powertrain to reduce the number of control inputs first to two and finally to one. The resulting formulations are referred to as 2-Controls and 1-Control. Virtual tests were conducted to evaluate the performance of the two formulations. The simulations were carried out in various scenarios, including urban and highway driving, to ensure the versatility of the controllers. The results demonstrate that both proposed formulations reduce computational time compared to the benchmark: by a factor of approximately 40 for the 2-Controls formulation and by a factor of approximately 850 for the 1-Control formulation. These reductions were achieved with a maximum difference in fuel economy of approximately 1.5% for the 1-Control formulation with respect to the benchmark solution.
    Overall, this study provides insights into the development of efficient and optimal controllers for PHEVs, which can be applied to various transportation scenarios. The proposed formulations reduce computational time while keeping the fuel economy solution close to the benchmark, making them a promising approach for future research in this area.
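Why removing control inputs pays off so strongly in grid-based Dynamic Programming can be seen in a toy sketch. It is not the DynaProg toolbox and not the PHEV model from the study: the one-dimensional state, the dynamics, the cost and the names `backward_dp` and `demo` are all assumptions made for illustration. The point is only that the backward sweep visits every combination of control grid points, so the work grows with the product of the control grid sizes.

```python
import itertools
import numpy as np


def backward_dp(n_steps, state_grid, control_grids, step, stage_cost):
    """Backward DP on a 1-D state grid; returns cost-to-go and evaluation count."""
    cost_to_go = np.zeros(len(state_grid))  # zero terminal cost (assumed)
    evaluations = 0
    for _ in range(n_steps):
        new_cost = np.full(len(state_grid), np.inf)
        for i, x in enumerate(state_grid):
            # Every combination of control grid points is tried at every state.
            for u in itertools.product(*control_grids):
                evaluations += 1
                x_next = step(x, u)
                if state_grid[0] <= x_next <= state_grid[-1]:
                    total = stage_cost(x, u) + np.interp(x_next, state_grid, cost_to_go)
                    new_cost[i] = min(new_cost[i], total)
        cost_to_go = new_cost
    return cost_to_go, evaluations


def demo(n_controls, n_points=5):
    """Toy problem with a configurable number of control inputs (hypothetical)."""
    state_grid = np.linspace(0.0, 1.0, 11)
    grids = [np.linspace(-0.1, 0.1, n_points)] * n_controls
    step = lambda x, u: x + sum(u) / len(u)          # toy dynamics
    cost = lambda x, u: sum(v * v for v in u) + (x - 0.5) ** 2  # toy stage cost
    return backward_dp(4, state_grid, grids, step, cost)
```

With five grid points per input, `demo(3)` performs 25 times as many model evaluations as `demo(1)`. The speed-ups reported in the abstract depend on the actual powertrain model and grids, so this count does not reproduce them.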

    Accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit, and customized 16-bit number formats

    Fluid dynamics simulations with the lattice Boltzmann method (LBM) are very memory-intensive. Alongside reduction in memory footprint, significant performance benefits can be achieved by using FP32 (single) precision compared to FP64 (double) precision, especially on GPUs. Here, we evaluate the possibility of using even FP16 and Posit16 (half) precision for storing fluid populations, while still carrying out arithmetic operations in FP32. For this, we first show that the commonly occurring number range in the LBM is a lot smaller than the FP16 number range. Based on this observation, we develop novel 16-bit formats, based on a modified IEEE-754 and on a modified Posit standard, that are specifically tailored to the needs of the LBM. We then carry out an in-depth characterization of LBM accuracy for six different test systems with increasing complexity: Poiseuille flow, Taylor-Green vortices, Karman vortex streets, lid-driven cavity, a microcapsule in shear flow (utilizing the immersed-boundary method) and finally the impact of a raindrop (based on a Volume-of-Fluid approach). We find that the difference in accuracy between FP64 and FP32 is negligible in almost all cases, and that for a large number of cases even 16-bit is sufficient. Finally, we provide a detailed performance analysis of all precision levels on a large number of hardware microarchitectures and show that significant speedup is achieved with mixed FP32/16-bit. Comment: 30 pages, 20 figures, 4 tables, 2 code listings
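The split between storage precision and arithmetic precision can be sketched as follows. This uses NumPy's standard IEEE FP16, not the customized 16-bit formats developed in the paper, and the helper names `store`, `load`, `relax_step` and `roundtrip_error` are hypothetical. Populations live in memory as 16-bit values and are widened to FP32 only while they are being updated.

```python
import numpy as np


def store(values):
    """Narrow FP32 working values to 16-bit storage."""
    return values.astype(np.float16)


def load(stored):
    """Widen 16-bit stored values to FP32 for arithmetic."""
    return stored.astype(np.float32)


def relax_step(stored, target, omega):
    """Relax stored populations toward `target`, computing entirely in FP32."""
    f = load(stored)
    f = f + np.float32(omega) * (target.astype(np.float32) - f)
    return store(f)


def roundtrip_error(values):
    """Largest relative error introduced by one store/load round trip."""
    back = load(store(values))
    return float(np.max(np.abs(back - values) / np.abs(values)))
```

For values well inside the FP16 normal range, one round trip costs a relative error of at most about 2^-11, while the stored array takes half the memory of its FP32 counterpart.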

    INSPEcT: a computational tool to infer mRNA synthesis, processing and degradation dynamics from RNA- and 4sU-seq time course experiments.

    Motivation: Cellular mRNA levels originate from the combined action of multiple regulatory processes, which can be recapitulated by the rates of pre-mRNA synthesis, pre-mRNA processing and mRNA degradation. Recent experimental and computational advances set the basis to study these intertwined levels of regulation. Nevertheless, software for the comprehensive quantification of RNA dynamics is still lacking. Results: INSPEcT is an R package for the integrative analysis of RNA- and 4sU-seq data to study the dynamics of transcriptional regulation. INSPEcT provides gene-level quantification of these rates, and a modeling framework to identify which of these regulatory processes are most likely to explain the observed mRNA and pre-mRNA concentrations. Software performance is tested on a synthetic dataset, instrumental to guide the choice of the modeling parameters and the experimental design. Availability and implementation: INSPEcT is submitted to Bioconductor and is currently available as Supplementary Additional File S1. Supplementary Information: Supplementary data are available at Bioinformatics online.
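How the three rates combine to set RNA levels can be sketched with a simple two-compartment model. This is an assumption for illustration, not the INSPEcT code or its API (INSPEcT itself is an R package): pre-mRNA is produced at a constant synthesis rate and converted to mature mRNA by first-order processing, and mature mRNA is removed by first-order degradation. The function name `simulate` and the parameter values are hypothetical.

```python
def simulate(synthesis, processing, degradation, t_end=50.0, dt=0.01):
    """Forward-Euler integration of a first-order pre-mRNA / mature mRNA model."""
    premrna, mature = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        # Pre-mRNA: produced by synthesis, consumed by processing.
        d_pre = synthesis - processing * premrna
        # Mature mRNA: produced by processing, consumed by degradation.
        d_mat = processing * premrna - degradation * mature
        premrna += dt * d_pre
        mature += dt * d_mat
    return premrna, mature
```

Under these assumptions the steady state is pre-mRNA = synthesis / processing and mature mRNA = synthesis / degradation, so processing changes the pre-mRNA level but not the final mature level.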

    Thread-safe lattice Boltzmann for high-performance computing on GPUs

    We present thread-safe, highly optimized lattice Boltzmann implementations, specifically aimed at exploiting the high memory bandwidth of GPU-based architectures. Unlike standard approaches to LB coding, the proposed strategy, based on the reconstruction of the post-collision distribution via Hermite projection, enforces data locality and avoids the memory dependencies that may arise during the propagation step, with no need to resort to more complex streaming strategies. The thread-safe lattice Boltzmann achieves peak performance in both two and three dimensions and significantly reduces the allocated memory (tens of gigabytes for simulations with on the order of billions of lattice nodes) while retaining the algorithmic simplicity of standard LB computing. Our findings open attractive prospects for high-performance simulations of complex flows on GPU-based architectures.