The HPCG benchmark: analysis, shared memory preliminary improvements and evaluation on an Arm-based platform
The High-Performance Conjugate Gradient (HPCG) benchmark complements the LINPACK benchmark in the performance evaluation of large High-Performance Computing (HPC) systems. Due to its lower arithmetic intensity and higher memory pressure, HPCG is recognized as a more representative benchmark for data-center workloads and workloads with irregular memory access patterns; its popularity and acceptance are therefore rising within the HPC community. As only a small fraction of the reference version of the HPCG benchmark is parallelized with shared-memory techniques (OpenMP), we introduce in this report two OpenMP parallelization methods. Due to the increasing importance of the Arm architecture in HPC, we evaluate our HPCG code at scale on a state-of-the-art HPC system based on the Cavium ThunderX2 SoC. We consider our work a contribution to the Arm ecosystem: along with this technical report, we plan to release our code to support the tuning of the HPCG benchmark within the Arm community.
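For readers unfamiliar with the kernel behind the benchmark, the following is a minimal sketch of an unpreconditioned conjugate gradient solver in plain Python. It is not the HPCG reference code and not the authors' OpenMP implementation: the function names `conjugate_gradient` and `laplacian_1d` are hypothetical, the 1D Laplacian is only a stand-in test operator, and the preconditioner used by the benchmark itself is omitted. It illustrates why the method has low arithmetic intensity: each iteration is dominated by one sparse matrix-vector product and a few vector updates.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A given as matvec(v) = A v."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x, with x = 0 initially
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # stop once the residual norm is small
            break
        ap = matvec(p)               # the memory-bound sparse product
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_1d(v):
    """Apply the tridiagonal (-1, 2, -1) operator; an illustrative SPD matrix only."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


if __name__ == "__main__":
    expected = [1.0, 2.0, 3.0, 4.0, 5.0]
    rhs = laplacian_1d(expected)
    print(conjugate_gradient(laplacian_1d, rhs))
```

In a shared-memory setting, the loops hidden inside `matvec` and the vector updates are the natural candidates for parallelization; how the report's two OpenMP methods do this is not described in the abstract.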
Strong scaling of general-purpose molecular dynamics simulations on GPUs
We describe a highly optimized implementation of MPI domain decomposition in
a GPU-enabled, general-purpose molecular dynamics code, HOOMD-blue (Anderson
and Glotzer, arXiv:1308.5587). Our approach is inspired by a traditional
CPU-based code, LAMMPS (Plimpton, J. Comp. Phys. 117, 1995), but is implemented
within a code that was designed for execution on GPUs from the start (Anderson
et al., J. Comp. Phys. 227, 2008). The software supports short-ranged pair
force and bond force fields and achieves optimal GPU performance using an
autotuning algorithm. We are able to demonstrate equivalent or superior scaling
on up to 3,375 GPUs in Lennard-Jones and dissipative particle dynamics (DPD)
simulations of up to 108 million particles. GPUDirect RDMA capabilities in
recent GPU generations provide better performance in full double precision
calculations. For a representative polymer physics application, HOOMD-blue 1.0
provides an effective GPU vs. CPU node speed-up of 12.5x.
FluTAS: A GPU-accelerated finite difference code for multiphase flows
We present the Fluid Transport Accelerated Solver, FluTAS, a scalable GPU
code for multiphase flows with thermal effects. The code solves the
incompressible Navier-Stokes equation for two-fluid systems, with a direct
FFT-based Poisson solver for the pressure equation. The interface between the
two fluids is represented with the Volume of Fluid (VoF) method, which is mass
conserving and well suited for complex flows thanks to its capacity of handling
topological changes. The energy equation is explicitly solved and coupled with
the momentum equation through the Boussinesq approximation. The code is
conceived in a modular fashion so that different numerical methods can be used
independently, the existing routines can be modified, and new ones can be
included in a straightforward and sustainable manner. FluTAS is written in
modern Fortran and parallelized using hybrid MPI/OpenMP in the CPU-only version
and accelerated with OpenACC directives in the GPU implementation. We present
different benchmarks to validate the code, and two large-scale simulations of
fundamental interest in turbulent multiphase flows: isothermal emulsions in
homogeneous isotropic turbulence (HIT) and two-layer Rayleigh-Bénard
convection. FluTAS is distributed under an MIT license and arises from a
collaborative effort of several scientists, aiming to become a flexible tool
to study complex multiphase flows.
Neural network fusion: a novel CT-MR Aortic Aneurysm image segmentation method.
Medical imaging examinations of patients usually involve more than one imaging modality, such as Computed Tomography (CT), Magnetic Resonance (MR) and Positron Emission Tomography (PET) imaging. Multimodal imaging allows examiners to benefit from the advantages of each modality. For example, for Abdominal Aortic Aneurysm, CT imaging shows calcium deposits in the aorta clearly, while MR imaging distinguishes thrombus and soft tissues better. Analysing and segmenting both CT and MR images and combining the results will greatly help radiologists and doctors to treat the disease. In this work, we present methods for using deep neural network models to perform such multi-modal medical image segmentation. As CT and MR images of the abdominal area cannot be well registered due to non-affine deformations, a naive approach is to train the CT and MR segmentation networks separately. However, such an approach is time-consuming and resource-inefficient. We propose a new approach that fuses the high-level parts of the CT and MR networks, hypothesizing that neurons recognizing the high-level concepts of Aortic Aneurysm can be shared across multiple modalities. Such a network can be trained end-to-end on non-registered CT and MR images in a shorter training time. Moreover, network fusion allows a shared representation of the aorta in both CT and MR images to be learnt. Through experiments, we discovered that for parts of the aorta showing similar aneurysm conditions, the neural representations in the network lie at shorter distances from each other. Such feature-level distances are helpful for registering CT and MR images.
Machine learning application for development of a data-driven predictive model able to investigate quality of life scores in a rare disease.
BACKGROUND: Alkaptonuria (AKU) is an ultra-rare autosomal recessive disease caused by a mutation in the homogentisate 1,2-dioxygenase (HGD) gene. One of the main obstacles in studying AKU, and other ultra-rare diseases, is the lack of a standardized methodology to assess disease severity or response to treatment. Quality of Life (QoL) scores are a reliable way to monitor patients' clinical condition and health status. QoL scores make it possible to monitor the evolution of diseases and assess the suitability of treatments by taking into account patients' symptoms, general health status and care satisfaction. However, more comprehensive tools to study a complex and multi-systemic disease like AKU are needed. In this study, a Machine Learning (ML) approach was implemented with the aim of predicting QoL scores based on clinical data deposited in ApreciseKUre, an AKU-dedicated database. METHOD: Data derived from 129 AKU patients were first examined through a preliminary statistical analysis (Pearson correlation coefficient) to measure the linear correlation between 11 QoL scores. The variable importance of 110 ApreciseKUre biomarkers in QoL score prediction was then calculated using XGBoost with a k-nearest neighbours (k-NN) approach. Due to the limited amount of data available, this model was validated using surrogate data analysis. RESULTS: We identified a direct correlation of 6 (age, Serum Amyloid A, Chitotriosidase, Advanced Oxidation Protein Products, S-thiolated proteins and Body Mass Index) out of 110 biomarkers with the QoL health status, in particular with the KOOS (Knee injury and Osteoarthritis Outcome Score) symptoms (Relative Absolute Error (RAE) 0.25). The error distribution of the surrogate model (RAE 0.38) was unequivocally higher than that of the true model (RAE 0.25), confirming the consistency of our dataset. Our data showed that inflammation, oxidative stress, amyloidosis and the lifestyle of patients correlate with the QoL scores for physical status, while no correlation between the biomarkers and patients' mental health was present (RAE 1.1). CONCLUSIONS: This proof-of-principle study for rare diseases confirms the importance of databases, allowing data management and analysis, which can be used to predict more effective treatments.
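To make the reported RAE values easier to read, here is a minimal sketch of the Relative Absolute Error under its common definition: the total absolute prediction error divided by the total absolute error of a baseline that always predicts the mean of the observed values. That the study uses exactly this definition is an assumption; the function name `relative_absolute_error` and the numbers below are illustrative and are not data from the paper.

```python
def relative_absolute_error(y_true, y_pred):
    """RAE = sum|y - y_hat| / sum|y - mean(y)| (assumed definition)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    model_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    baseline_error = sum(abs(t - mean_true) for t in y_true)
    if baseline_error == 0:
        raise ValueError("RAE is undefined when all observed values are equal")
    return model_error / baseline_error


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 3.0, 3.5]
    print(relative_absolute_error(observed, predicted))
    # A model that always predicts the mean scores exactly 1.0.
    print(relative_absolute_error(observed, [2.5] * 4))
```

Under this definition, a value below 1 means the model beats the mean-only baseline, while a value at or above 1 (such as the 1.1 reported for mental health) means it does no better than that baseline.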
Quantum ESPRESSO: One Further Step toward the Exascale
We review the status of the Quantum ESPRESSO software suite for electronic-structure calculations based on plane waves, pseudopotentials, and density-functional theory. We highlight the recent developments in the porting to GPUs of the main codes, using an approach based on OpenACC and CUDA Fortran offloading. We describe, in particular, the results achieved on linear-response codes, which are one of the distinctive features of the Quantum ESPRESSO suite. We also present extensive performance benchmarks on different GPU-accelerated architectures for the main codes of the suite.
Hardware calibrated learning to compensate heterogeneity in analog RRAM-based Spiking Neural Networks
Spiking Neural Networks (SNNs) can unleash the full power of circuits based on analog Resistive Random Access Memories (RRAMs) for low-power signal processing. Their inherent computational sparsity naturally results in energy efficiency benefits. The main challenge in implementing robust SNNs is the intrinsic variability (heterogeneity) of both analog CMOS circuits and RRAM technology. In this work, we assessed the performance and variability of RRAM-based neuromorphic circuits that were designed and fabricated using a 130 nm technology node. Based on these results, we propose a Neuromorphic Hardware Calibrated (NHC) SNN, where the learning circuits are calibrated on the measured data. We show that by taking into account the measured heterogeneity characteristics in the off-chip learning phase, the NHC SNN self-corrects its hardware non-idealities and learns to solve benchmark tasks with high accuracy. This work demonstrates how to cope with the heterogeneity of neurons and synapses to increase classification accuracy in temporal tasks.
The polymorphism L412F in TLR3 inhibits autophagy and is a marker of severe COVID-19 in males
The polymorphism L412F in TLR3 has been associated with several infectious diseases. However, the mechanism underlying this association is still unexplored. Here, we show that the L412F polymorphism in TLR3 is a marker of severity in COVID-19. This association increases in the sub-cohort of males. Impaired macroautophagy/autophagy and reduced TNF/TNFα production were demonstrated in HEK293 cells transfected with a TLR3L412F-encoding plasmid and stimulated with the specific agonist poly(I:C). A statistically significant reduction in survival at 28 days was shown in L412F COVID-19 patients treated with the autophagy inhibitor hydroxychloroquine (p = 0.038). An increased frequency of autoimmune disorders as a co-morbidity was found in L412F COVID-19 males with specific class II HLA haplotypes prone to autoantigen presentation. Our analyses indicate that the L412F polymorphism puts males at risk of severe COVID-19 and provides a rationale for reinterpreting clinical trials considering autophagy pathways. Abbreviations: AP: autophagosome; AUC: area under the curve; BafA1: bafilomycin A1; COVID-19: coronavirus disease-2019; HCQ: hydroxychloroquine; RAP: rapamycin; ROC: receiver operating characteristic; SARS-CoV-2: severe acute respiratory syndrome coronavirus 2; TLR: toll like receptor; TNF/TNF-α: tumor necrosis factor.