Search CORE

356 research outputs found

CUDA implementation of the solution of a system of linear equations arising in an hp-Finite Element code

Author: Celorrio de Pablo Ricardo
Osés Villanueva Javier
Pardo Zubiaur David
Publication venue: 'Universidad de Zaragoza'
Publication date: 01/01/2013

The finite element method (FEM) has proven to be one of the most efficient methods for solving differential equations. It was designed to exploit the computing power of computers, and the improvements made over the years have made it possible to solve ever larger problems. One of the latest improvements has been the development of graphics cards (GPUs). Scientific programming on GPUs was extremely complex until 2006, when NVIDIA developed CUDA, a general-purpose programming language that requires no knowledge of traditional GPU programming. These devices can perform large numbers of operations simultaneously, which makes them very attractive for FEM computations. One of the most computationally demanding parts of FEM is the solution of systems of linear equations. In this master's thesis, an algorithm for solving systems of linear equations will be implemented in CUDA. The system will arise from applying an hp-FEM method to the Laplace equation. The goal is to compare the execution of the solver implemented in CUDA against a C implementation and to determine whether CUDA offers advantages over traditional programming.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Universidad de Zaragoza

Towards enhancing coding productivity for GPU programming using static graphs

Author: Peña Antonio
Toledo Leonel
Valero Lara Pedro
Vetter Jeffrey S.
Publication venue: 'MDPI AG'
Publication date: 01/01/2022

The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs. GPU capabilities have been increasing significantly in terms of performance and memory capacity. However, there are still some problems in terms of scalability and limitations to the amount of work that a GPU can perform at a time. To minimize the overhead associated with the launch of GPU kernels, as well as to maximize the use of GPU capacity, we have combined the new CUDA Graph API with the CUDA programming model (including CUDA math libraries) and the OpenACC programming model. We use as test cases two different, well-known and widely used problems in HPC and AI: the Conjugate Gradient method and the Particle Swarm Optimization. In the first test case (Conjugate Gradient) we focus on the integration of Static Graphs with CUDA. In this case, we are able to significantly outperform the NVIDIA reference code, reaching an acceleration of up to 11× thanks to a better implementation, which can benefit from the new CUDA Graph capabilities. In the second test case (Particle Swarm Optimization), we complement the OpenACC functionality with the use of CUDA Graph, achieving again accelerations of up to one order of magnitude, with average speedups ranging from 2× to 4×, and performance very close to a reference and optimized CUDA code. Our main target is to achieve a higher coding productivity model for GPU programming by using Static Graphs, which provides, in a very transparent way, a better exploitation of the GPU capacity. The combination of using Static Graphs with two of the current most important GPU programming models (CUDA and OpenACC) is able to reduce considerably the execution time w.r.t. the use of CUDA and OpenACC only, achieving accelerations of up to more than one order of magnitude. 
Finally, we propose an interface to incorporate the concept of Static Graphs into the OpenACC Specifications. This research was funded by the EPEEC project from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No. 801051. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan, accessed on 13 April 2022). Peer reviewed. Postprint (published version).
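For readers unfamiliar with the first test case, the following is a minimal sketch of the Conjugate Gradient iteration in plain Python. It is not the paper's CUDA code: the dense 3x3 matrix, the tolerance, and all function names are illustrative assumptions. It does show why launch overhead matters for this method, since every iteration repeats the same short sequence of operations (one matrix-vector product, two dot products, three vector updates), which on a GPU would each be a separate kernel launch.

```python
# Minimal conjugate gradient sketch (pure Python, dense matrix for clarity).
# The system below is an illustrative assumption, not data from the paper.

def matvec(A, x):
    """Dense matrix-vector product A @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)                  # residual b - A x for the initial guess x = 0
    p = list(r)                  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:  # stop once the residual norm is small
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = conjugate_gradient(A, b)
```

Because the loop body is identical from one iteration to the next, it is the kind of fixed operation sequence that a static graph can capture once and replay.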

UPCommons. Portal del coneixement obert de la UPC

Directory of Open Access Journals

Accelerated spatially resolved electrical simulation of photovoltaic devices using photovoltaic-oriented nodal analysis

Author: Archana Sinha (7207964)
Martin Bliss (1250019)
Rajesh Gupta (65584)
Ralph Gottschalg (1247661)
Tom Betts (1258395)
Xiaofeng Wu (1254888)
Publication venue
Publication date: 01/01/2015

This paper presents photovoltaic-oriented nodal analysis (PVONA), a general and flexible tool for efficient spatially resolved simulations of photovoltaic (PV) cells and modules. This approach overcomes the major problem of conventional approaches based on the Simulation Program with Integrated Circuit Emphasis (SPICE) for solving circuit network models, namely the limited number of nodes that can be simulated due to memory and computing time requirements. PVONA integrates a specifically designed sparse data structure and a graphics processing unit-based parallel conjugate gradient algorithm into a PV-oriented iterative Newton-Raphson solver. This first avoids the complicated and time-consuming netlist parsing, second saves memory space, and third accelerates the simulation procedure. In the tests, PVONA generated the local current and voltage maps of a model with 316 x 316 nodes of a thin-film PV cell in 15 s, i.e., using only 4.6% of the time required by the latest LTSpice package. The 2-D characterization is used as a case study, and the potential application of PVONA to quantitative analysis of electroluminescence is discussed.
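As a rough illustration of the Newton-Raphson iteration mentioned in the abstract, the sketch below solves the current balance at a single node of a one-diode PV model. It is not PVONA itself: the one-diode equation, all parameter values, and the function names are assumptions chosen for illustration. In a spatially resolved model the scalar division by the derivative would become a sparse linear solve, which is where a conjugate gradient solver would be used.

```python
import math

# Illustrative one-diode parameters (assumed, not taken from the paper).
I_PH = 0.03      # photocurrent in A
I_0 = 1e-9       # diode saturation current in A
V_T = 0.02585    # thermal voltage in V near room temperature
R_SH = 1000.0    # shunt resistance in ohm

def residual(v):
    """Net current leaving the node; zero at the open-circuit voltage."""
    return I_0 * (math.exp(v / V_T) - 1.0) + v / R_SH - I_PH

def jacobian(v):
    """Derivative of the residual with respect to the node voltage."""
    return I_0 / V_T * math.exp(v / V_T) + 1.0 / R_SH

def newton_raphson(v0, tol=1e-12, max_iter=100):
    """Iterate v <- v - f(v) / f'(v) until the update is below tol."""
    v = v0
    for _ in range(max_iter):
        step = residual(v) / jacobian(v)
        v -= step
        if abs(step) < tol:
            break
    return v

# Start above the root: the residual is convex, so the iterates then
# decrease monotonically and the exponential cannot overflow.
v_node = newton_raphson(0.6)
```

The starting guess matters here. Starting from 0 V, the first Newton step would overshoot far enough for the exponential to overflow, which is one reason practical circuit solvers limit or damp the voltage update.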

Loughborough University Institutional Repository

Modeling and simulation of the electric activity of the heart using graphic processing units

Author: Mena Tobar Andrés
Rodríguez Matas José Félix
Publication venue: Universidad de Zaragoza, Prensas de la Universidad
Publication date: 01/01/2017

Mathematical modelling and simulation of the electric activity of the heart (cardiac electrophysiology) offers an ideal framework for combining clinical and experimental data in order to help understand the mechanisms underlying the response observed under physiological and pathological conditions. In this regard, solving for the electric activity of the heart poses a major challenge, not only because of the structural complexity inherent to heart tissue, but also because of the complex electric behaviour of cardiac cells. The multi-scale nature of the electrophysiology problem makes its numerical solution difficult: accurate simulations require temporal and spatial resolutions of 0.1 ms and 0.2 mm respectively, leading to models with millions of degrees of freedom that must be solved for thousands of time steps. Solving this problem requires algorithms with a higher level of parallelism on multi-core platforms. In this regard, the newer programmable graphics processing units (GPUs) have become a valid alternative thanks to their tremendous computational horsepower. This thesis centres on the implementation of electrophysiology simulation software developed entirely in Compute Unified Device Architecture (CUDA) for GPU computing. The software implements fully explicit and semi-implicit solvers for the monodomain model, using operator splitting and the finite element method for space discretization. Performance is compared with classical multi-core MPI-based solvers running on dedicated high-performance computer clusters. Results obtained with the GPU-based solver show enormous potential for this technology, with accelerations over 50× for three-dimensional problems when using an implicit scheme for the parabolic equation, and up to 100× for the explicit implementation. The implemented solver has been applied to study pro-arrhythmic mechanisms during acute ischemia.
In particular, we investigate how hyperkalemia affects the vulnerability window to reentry and the reentry patterns in the heterogeneous substrate caused by acute regional ischemia, using an anatomically and biophysically detailed human biventricular model. A three-dimensional, geometrically and anatomically accurate, regionally ischemic human heart model was created. The ischemic region was located on the inferolateral and posterior side of the left ventricle, mimicking the occlusion of the circumflex artery, and a washed-out zone not affected by ischemia was incorporated at the endocardium. Realistic heterogeneity and fiber anisotropy were also considered in the model. An electrophysiologically highly detailed human action potential model was adapted to make it suitable for modeling ischemic conditions (hyperkalemia, hypoxia, and acidic conditions) by introducing a formulation of the ATP-sensitive K+ current. The model predicts the generation of sustained re-entrant activity in the form of single and double circus around a blocked area within the ischemic zone for K+ concentrations below 9 mM, with the reentrant activity associated with ventricular tachycardia in all cases. Results suggest that the washed-out zone is a potential pro-arrhythmic substrate factor that helps establish sustained ventricular tachycardia.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Universidad de Zaragoza

Development and optimization of a combinational multigrid algorithm for large scale circuit simulation on massively parallel architectures

Author: Γαρυφάλλου Δημήτριος Κ.
Publication venue
Publication date: 01/01/2015

University of Thessaly Institutional Repository

Simulation of large-scale circuits with Steiner node preconditioners on parallel architectures

Author: Γαρυφάλλου Δημήτριος Κ.
Publication venue
Publication date: 01/01/2015

University of Thessaly Institutional Repository

Volume 19 (2) 2013

Author: Andrei Mihail-Iulian
Publication venue: 'Wydawnictwo SIGMA-NOT, sp. z.o.o.'
Publication date

The new generation of General Purpose Graphic Processing Unit (GPGPU) cards, with their large computational power, makes it possible to tackle difficult tasks in the modeling of Radio Frequency Integrated Circuits (RFICs). Modeling RFIC devices with electromagnetic methods such as the Finite Element Method (FEM) and the Finite Integration Technique (FIT) requires solving large systems of linear equations. This paper presents the benefits of using Graphic Processing Unit (GPU) computations for solving such systems, which are characterized by sparse complex matrices. CUSP is a generic GPU library of parallel algorithms for sparse linear algebra and graph computations, based on Compute Unified Device Architecture (CUDA). The code calls the iterative methods available in CUSP to solve these complex systems of linear equations. The tests were performed on various Central Processing Unit (CPU) and GPU hardware configurations. The results show that using GPU computations to solve the systems of linear equations accelerates the electromagnetic modeling of RFIC devices while maintaining a high level of computational accuracy. Tests were carried out on matrices obtained for an integrated inductor designed for RFICs and for a Micro Stripe (MS) designed for a Photonics Integrated Circuit (PIC).
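The core operation inside iterative solvers for sparse complex systems is the sparse matrix-vector product. The sketch below shows it in plain Python for the compressed sparse row (CSR) layout. It does not use CUSP or reproduce the paper's code: the choice of CSR, the 3x3 matrix, and the function name are assumptions made for illustration.

```python
# Sparse complex matrix-vector product in CSR format (illustrative sketch).
# CSR stores only the nonzeros: their values, their column indices, and
# the offset at which each row starts.

def csr_matvec(data, indices, indptr, x):
    """Return y = A @ x for a CSR matrix with complex entries."""
    y = []
    for row in range(len(indptr) - 1):
        acc = 0j
        # Walk over the nonzeros of this row only.
        for p in range(indptr[row], indptr[row + 1]):
            acc += data[p] * x[indices[p]]
        y.append(acc)
    return y

# Assumed example matrix:
# [[2+1j,  0,   -1j ],
#  [0,     3,    0  ],
#  [1,     0,   4-2j]]
data = [2 + 1j, -1j, 3 + 0j, 1 + 0j, 4 - 2j]
indices = [0, 2, 1, 0, 2]
indptr = [0, 2, 3, 5]
x = [1 + 0j, 1j, 2 + 0j]

y = csr_matvec(data, indices, indptr, x)
```

Each row's accumulation is independent of the others, so the outer loop is the natural unit of parallel work on a GPU.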

PSNC Institutional Repository

Research on Acceleration Technology for FDTD Based on Vivado HLS

Author: 陈瑞
Publication venue
Publication date: 08/12/2017

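The finite-difference time-domain (FDTD) method is a fundamental method of computational electromagnetics: by alternately computing the electric and magnetic fields in space, it yields the electromagnetic distribution over the entire region under study. For many electromagnetic problems, FDTD is the simplest computational method, both conceptually and in terms of implementability. FDTD can solve complex electromagnetic problems, but it consumes a large amount of computing resources and takes a long time to compute. To obtain results faster and more efficiently, hardware techniques can be used for acceleration, which has been one of the more closely watched areas of FDTD research in recent years. Vivado HLS (High-Level Synthesis), the high-level synthesis tool newly released by Xilinx, uses the C/C++ languages directly to develop hard...
In the field of computational electromagnetics, the finite-difference time-domain (FDTD) method has been widely used. With FDTD, the electromagnetic distribution of the whole field is obtained by alternately calculating the electric and magnetic fields. For many computational electromagnetics problems, FDTD is the simplest method in terms of both concept and feasibility. Although FDTD can...
Degree: Master of Engineering. Department and major: College of Physical Science and Technology, Master of Engineering (Electronics and Communication Engineering). Student ID: 3432014115280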
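The alternating electric and magnetic field updates described in the abstract can be sketched in one dimension as follows. This is plain Python rather than Vivado HLS code, and it is not the thesis implementation: the normalized units, grid size, Courant number, and Gaussian source are all assumptions made for illustration.

```python
import math

def fdtd_1d(n_cells=200, n_steps=150, courant=0.5, src=100):
    """1-D FDTD leapfrog in normalized units (E scaled by the wave impedance)."""
    ez = [0.0] * n_cells   # electric field at integer grid points
    hy = [0.0] * n_cells   # magnetic field at half-integer grid points
    for t in range(n_steps):
        # Half step 1: update H from the spatial difference of E.
        for k in range(n_cells - 1):
            hy[k] += courant * (ez[k + 1] - ez[k])
        # Half step 2: update E from the spatial difference of H.
        # ez[0] is never updated, so it acts as a perfect conductor wall.
        for k in range(1, n_cells):
            ez[k] += courant * (hy[k] - hy[k - 1])
        # Additive (soft) Gaussian pulse injected at one cell.
        ez[src] += math.exp(-0.5 * ((t - 30) / 8.0) ** 2)
    return ez, hy

ez, hy = fdtd_1d()
```

Within each half step every cell depends only on values from the previous half step, so the two inner loops can be fully unrolled or pipelined in hardware. That regularity is what makes the method attractive for FPGA or GPU acceleration.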

Xiamen University Institutional Repository

Heterogeneous multicore systems for signal processing

Author: Ries Florian <1980>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 18/04/2011

This thesis explores the capabilities of heterogeneous multi-core systems based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated desk-side computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had previously been considered infeasible. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing a crucial kernel in the computation of multi-mesh head models for encephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself, achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-Hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only a highly optimized multi-CPU reference, but related GPU-based work as well.
Finally, a GPU-accelerated graphical software package for real-time EEG source localization was implemented. Thanks to acceleration, it can meet real-time requirements in unprecedented anatomical detail while running more complex localization algorithms. Additionally, a novel implementation for extracting anatomical priors from static Magnetic Resonance (MR) scans has been included.
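To make the triangular matrix inversion kernel concrete, here is a minimal sequential sketch in Python that inverts a lower triangular matrix column by column using forward substitution. It is not the thesis's dual-GPU algorithm or its compressed storage scheme: the dense list-of-lists storage, the 3x3 example, and the function name are assumptions made for illustration.

```python
def invert_lower_triangular(L):
    """Invert a nonsingular lower triangular matrix by forward substitution."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):               # each column solves L @ x = e_j
        inv[j][j] = 1.0 / L[j][j]
        for i in range(j + 1, n):
            # Row i of L times column j of the inverse must equal zero.
            s = sum(L[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = -s / L[i][i]
    return inv

# Assumed example matrix.
L = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [3.0, 5.0, 8.0]]
L_inv = invert_lower_triangular(L)

# Product L @ L_inv, which should be close to the identity.
product = [[sum(L[i][k] * L_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]
```

The columns of the inverse do not depend on one another, so they can be computed in parallel, while the rows within a column must be processed in order. The inverse is itself lower triangular, so only about half of the entries ever need to be stored.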

AMS Tesi di Dottorato