356 research outputs found

    CUDA implementation of the solution of a system of linear equations arising in an hp-Finite Element code

    The finite element method (FEM) has proven to be one of the most efficient methods for solving differential equations. It was designed to exploit the computing power of computers, and the improvements made over the years have made it possible to solve ever larger problems. One of the latest advances has been the development of graphics cards (GPUs). Scientific programming on GPUs was extremely complex until NVIDIA developed CUDA in 2006. It is a general-purpose programming language that requires no knowledge of traditional GPU programming. These devices can perform large numbers of operations simultaneously, which makes them very attractive for FEM computations. One of the most computationally demanding parts of FEM is the solution of systems of linear equations. In this master's thesis, an algorithm for solving systems of linear equations is implemented in CUDA. The system arises from applying an hp-FEM method to the Laplace equation. The goal is to compare the execution of the CUDA solver against a C implementation and to determine whether CUDA offers advantages over traditional programming.

    Towards enhancing coding productivity for GPU programming using static graphs

    The main contribution of this work is to increase the coding productivity of GPU programming by using the concept of Static Graphs. GPU capabilities have been increasing significantly in terms of performance and memory capacity. However, there are still problems of scalability and limits on the amount of work that a GPU can perform at a time. To minimize the overhead associated with launching GPU kernels, and to maximize the use of GPU capacity, we have combined the new CUDA Graph API with the CUDA programming model (including the CUDA math libraries) and the OpenACC programming model. As test cases we use two well-known and widely used problems in HPC and AI: the Conjugate Gradient method and Particle Swarm Optimization. In the first test case (Conjugate Gradient) we focus on the integration of Static Graphs with CUDA. Here we significantly outperform the NVIDIA reference code, reaching an acceleration of up to 11× thanks to a better implementation that benefits from the new CUDA Graph capabilities. In the second test case (Particle Swarm Optimization), we complement the OpenACC functionality with CUDA Graph, again achieving accelerations of up to one order of magnitude, with average speedups ranging from 2× to 4×, and performance very close to a reference, optimized CUDA code. Our main target is a more productive coding model for GPU programming based on Static Graphs, which provides, in a very transparent way, better exploitation of the GPU capacity. Combining Static Graphs with two of the most important current GPU programming models (CUDA and OpenACC) considerably reduces execution time relative to using CUDA or OpenACC alone, achieving accelerations of more than one order of magnitude. 
Finally, we propose an interface to incorporate the concept of Static Graphs into the OpenACC Specifications. This research was funded by the EPEEC project from the European Union’s Horizon 2020 Research and Innovation program under grant agreement No. 801051. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan, accessed on 13 April 2022). Peer Reviewed. Postprint (published version)
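For orientation, the Conjugate Gradient test case named in this abstract can be sketched in a few lines of plain Python. This is only an illustrative CPU sketch, not the authors' CUDA Graph implementation; the names `conjugate_gradient` and `laplacian_1d` and the tridiagonal test operator are assumptions introduced here. Each loop iteration repeats the same fixed sequence of small operations (one matrix-vector product, two dot products, three vector updates), which is the kind of repeated launch pattern a static graph is meant to capture once and replay.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A given as a callable."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the initial guess x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # converged: residual norm below tolerance
            break
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_1d(v):
    """Hypothetical test operator: apply tridiag(-1, 2, -1) to the vector v."""
    n = len(v)
    return [2.0 * v[i]
            - (v[i - 1] if i > 0 else 0.0)
            - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]
```

Calling `conjugate_gradient(laplacian_1d, [1.0] * 8)` returns an approximate solution whose residual can be checked by applying `laplacian_1d` to it.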

    Accelerated spatially resolved electrical simulation of photovoltaic devices using photovoltaic-oriented nodal analysis

    This paper presents photovoltaic-oriented nodal analysis (PVONA), a general and flexible tool for efficient spatially resolved simulations of photovoltaic (PV) cells and modules. This approach overcomes the major problem of conventional approaches based on the Simulation Program with Integrated Circuit Emphasis (SPICE) for solving circuit network models, namely the limited number of nodes that can be simulated due to memory and computing time requirements. PVONA integrates a specifically designed sparse data structure and a graphics processing unit-based parallel conjugate gradient algorithm into a PV-oriented iterative Newton-Raphson solver. This (i) avoids the complicated and time-consuming netlist parsing, (ii) saves memory space, and (iii) accelerates the simulation procedure. In the tests, PVONA generated the local current and voltage maps of a model with 316 × 316 nodes of a thin-film PV cell in 15 s, i.e., using only 4.6% of the time required by the latest LTSpice package. The 2-D characterization is used as a case study, and the potential application of PVONA to the quantitative analysis of electroluminescence is discussed.
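To make the Newton-Raphson idea in this abstract concrete, the sketch below solves a single hypothetical node: a photocurrent source in parallel with one diode and a shunt resistor. The circuit, the function name `solve_node_voltage`, and all parameter values are assumptions chosen for illustration; PVONA itself solves a large sparse network and uses a conjugate gradient solver for the linear step, which a scalar division replaces here.

```python
import math


def solve_node_voltage(i_ph, i_0, v_t, r_sh, v0=0.5, tol=1e-12, max_iter=100):
    """Newton-Raphson on f(V) = i_ph - i_0 * (exp(V / v_t) - 1) - V / r_sh = 0.

    i_ph: photocurrent (A), i_0: diode saturation current (A),
    v_t: thermal voltage (V), r_sh: shunt resistance (ohm).
    All of these describe a hypothetical one-node cell model.
    """
    v = v0
    for _ in range(max_iter):
        e = math.exp(v / v_t)
        f = i_ph - i_0 * (e - 1.0) - v / r_sh     # current balance at the node
        df = -i_0 * e / v_t - 1.0 / r_sh          # derivative of f with respect to V
        step = f / df
        v -= step                                 # Newton update
        if abs(step) < tol:
            break
    return v
```

In a spatially resolved model the scalar `df` becomes a sparse Jacobian matrix and `step` is obtained from a linear solve at every Newton iteration, which is where an iterative solver such as conjugate gradient would enter.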

    Modeling and simulation of the electric activity of the heart using graphic processing units

    Mathematical modelling and simulation of the electric activity of the heart (cardiac electrophysiology) offers an ideal framework for combining clinical and experimental data in order to help understand the mechanisms underlying the observed response under physiological and pathological conditions. Solving the electric activity of the heart poses a big challenge, not only because of the structural complexity inherent to heart tissue, but also because of the complex electric behaviour of cardiac cells. The multi-scale nature of the electrophysiology problem makes its numerical solution difficult: accurate simulations require temporal and spatial resolutions of 0.1 ms and 0.2 mm respectively, leading to models with millions of degrees of freedom that need to be solved for thousands of time steps. Solving this problem requires algorithms with a higher level of parallelism on multi-core platforms. In this regard, the newer programmable graphics processing units (GPUs) have become a valid alternative due to their tremendous computational horsepower. This thesis centres on the implementation of an electrophysiology simulation software developed entirely in Compute Unified Device Architecture (CUDA) for GPU computing. The software implements fully explicit and semi-implicit solvers for the monodomain model, using operator splitting and the finite element method for space discretization. Performance is compared with classical multi-core MPI-based solvers operating on dedicated high-performance computer clusters. Results obtained with the GPU-based solver show enormous potential for this technology, with accelerations over 50× for three-dimensional problems when using an implicit scheme for the parabolic equation, whereas accelerations reach values up to 100× for the explicit implementation. The implemented solver has been applied to study pro-arrhythmic mechanisms during acute ischemia. 
In particular, we investigate how hyperkalemia affects the vulnerability window to reentry and the reentry patterns in the heterogeneous substrate caused by acute regional ischemia, using an anatomically and biophysically detailed human biventricular model. A three-dimensional, geometrically and anatomically accurate, regionally ischemic human heart model was created. The ischemic region was located in the inferolateral and posterior side of the left ventricle, mimicking the occlusion of the circumflex artery, and the presence of a washed-out zone not affected by ischemia at the endocardium has been incorporated. Realistic heterogeneity and fiber anisotropy have also been considered in the model. A highly electrophysiologically detailed action potential model for human has been adapted to make it suitable for modeling ischemic conditions (hyperkalemia, hypoxia, and acidic conditions) by introducing a formulation of the ATP-sensitive K+ current. The model predicts the generation of sustained re-entrant activity in the form of single and double circus movements around a blocked area within the ischemic zone for K+ concentrations below 9 mM, with the reentrant activity associated with ventricular tachycardia in all cases. Results suggest the washed-out zone as a potential pro-arrhythmic substrate factor helping to establish sustained ventricular tachycardia.
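The operator-splitting idea mentioned in this abstract can be illustrated on a one-dimensional cable: each time step first advances the cell-level reaction term at every node, then advances diffusion across nodes. The sketch below is a heavily simplified assumption, not the thesis software: it uses finite differences instead of finite elements, a cubic reaction term instead of a detailed human action potential model, and the names `reaction_step`, `diffusion_step`, and `simulate` are introduced here for illustration.

```python
def reaction_step(v, dt, a=0.1):
    """Explicit Euler step of dv/dt = v * (1 - v) * (v - a), node by node."""
    return [vi + dt * vi * (1.0 - vi) * (vi - a) for vi in v]


def diffusion_step(v, dt, dx, d):
    """Explicit finite-difference diffusion step with no-flux boundaries."""
    n = len(v)
    lam = d * dt / (dx * dx)         # should stay <= 0.5 for stability
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else v[i]        # mirrored ghost value at the left end
        right = v[i + 1] if i < n - 1 else v[i]   # mirrored ghost value at the right end
        out.append(v[i] + lam * (left - 2.0 * v[i] + right))
    return out


def simulate(v, steps, dt, dx, d):
    """First-order splitting: reaction sub-step followed by diffusion sub-step."""
    for _ in range(steps):
        v = reaction_step(v, dt)
        v = diffusion_step(v, dt, dx, d)
    return v
```

Both sub-steps update every node from a small neighbourhood of the previous state, so each maps naturally onto one GPU thread per node; the reaction sub-step in particular involves no communication between nodes.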

    Volume 19 (2) 2013

    The new generation of General Purpose Graphic Processing Unit (GPGPU) cards, with their large computational power, makes it possible to tackle difficult tasks in the modeling of Radio Frequency Integrated Circuits (RFICs). When different electromagnetic modeling methods, the Finite Element Method (FEM) and the Finite Integration Technique (FIT), are used to model RFIC devices, large systems of linear equations have to be solved. This paper presents the benefits of using Graphic Processing Unit (GPU) computations for solving such systems, which are characterized by sparse complex matrices. CUSP is a generic GPU library of parallel algorithms for sparse linear algebra and graph computations, based on Compute Unified Device Architecture (CUDA). The code calls the iterative methods available in CUSP to solve these complex linear systems. The tests were performed on various Central Processing Unit (CPU) and GPU hardware configurations. The results show that, by using GPU computations to solve the linear systems, the electromagnetic modeling of RFIC devices can be accelerated while a high level of computational accuracy is maintained. Tests were carried out on matrices obtained for an integrated inductor designed for RFICs and for a Micro Stripe (MS) designed for a Photonics Integrated Circuit (PIC). Pozna

    Research on Acceleration Technology for FDTD Based on Vivado HLS

    The finite-difference time-domain (FDTD) method is a fundamental method of computational electromagnetics: by alternately computing the electric and magnetic fields in space, it obtains the electromagnetic field distribution over the whole region under study. For many electromagnetic problems, FDTD is the simplest computational method, both conceptually and in terms of ease of implementation. FDTD can solve complex electromagnetic computation problems, but it consumes a large amount of computing resources and takes a long time to run. To obtain results faster and more efficiently, hardware can be used for acceleration, which has been one of the more closely watched areas of FDTD research in recent years. Vivado HLS (High-Level Synthesis), a high-level synthesis tool newly released by Xilinx, uses C/C++ directly to develop ... Degree: Master of Engineering. Department and major: School of Physical Science and Technology, Master of Engineering (Electronics and Communication Engineering). Student ID: 3432014115280
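The alternating electric and magnetic field updates described in this abstract can be sketched as a one-dimensional FDTD loop. This is a minimal illustration in normalised units, not the thesis's Vivado HLS design; the function name `fdtd_1d`, the grid size, the Courant number of 0.5, and the Gaussian source are all assumptions made here.

```python
import math


def fdtd_1d(n_cells=200, n_steps=100, src=100):
    """1-D FDTD leapfrog update in normalised units with Courant number 0.5."""
    ez = [0.0] * n_cells             # electric field samples
    hy = [0.0] * n_cells             # magnetic field samples, staggered half a cell
    for t in range(n_steps):
        # Magnetic field update from the spatial difference of the electric field.
        for k in range(n_cells - 1):
            hy[k] += 0.5 * (ez[k + 1] - ez[k])
        # Electric field update from the spatial difference of the magnetic field.
        for k in range(1, n_cells):
            ez[k] += 0.5 * (hy[k] - hy[k - 1])
        # Additive (soft) Gaussian pulse injected at the source cell.
        ez[src] += math.exp(-((t - 30.0) / 10.0) ** 2)
    return ez, hy
```

Within each of the two inner loops, every cell is updated independently of the others, which is the property that makes the method a candidate for pipelined or parallel hardware. The end cells `ez[0]` and `hy[n_cells - 1]` are never updated, so they act as simple reflecting boundaries in this sketch.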

    Heterogeneous multicore systems for signal processing

    This thesis explores the capabilities of heterogeneous multi-core systems based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated deskside computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had previously been considered unfeasible. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing a crucial kernel in the computation of multi-mesh head models for electroencephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself, achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-Hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. 
In terms of computational throughput, it outperforms not only a highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical software for real-time EEG source localization was implemented. Thanks to acceleration, it can meet real-time requirements at unprecedented anatomical detail while running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scans has been included.
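As a reference point for the triangular matrix inversion (TMI) kernel mentioned in this abstract, the sketch below inverts a dense lower-triangular matrix column by column using forward substitution. It is a plain sequential illustration, not the thesis's dual-GPU routine or its compressed storage scheme; the function name `invert_lower_triangular` and the list-of-lists layout are assumptions.

```python
def invert_lower_triangular(l):
    """Invert a dense lower-triangular matrix by forward substitution.

    Column j of the inverse solves L x = e_j, where e_j is the j-th unit vector.
    """
    n = len(l)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):               # columns are mutually independent
        inv[j][j] = 1.0 / l[j][j]
        for i in range(j + 1, n):
            s = sum(l[i][k] * inv[k][j] for k in range(j, i))
            inv[i][j] = -s / l[i][i]
    return inv
```

Because each column of the inverse depends only on the input matrix and not on other columns, the outer loop is one natural axis along which such work could be distributed across parallel threads or devices.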