The art of solving a large number of non-stiff, low-dimensional ordinary differential equation systems on GPUs and CPUs
This paper discusses the main performance barriers to solving a large number of independent ordinary differential equation systems on processors (CPUs) and graphics cards (GPUs). With a naïve approach, the utilisation of a CPU can be as low as 4% of its theoretical peak processing power. The main barriers, identified by a detailed analysis of the hardware architectures and by profiling with hardware performance monitoring units, are as follows. First, exploiting the SIMD capabilities of the CPU via its vector registers; the solution is to implement or enforce explicit vectorisation. Second, hiding instruction latencies on both CPUs and GPUs, which can be achieved by increasing (instruction-level) parallelism. Third, efficiently handling large timescale differences or event handling on the massively parallel architecture of GPUs; a viable option to overcome this difficulty is asynchronous time stepping. These optimisation techniques and their implementation possibilities are discussed and tested on three program packages: MPGOS, written in C++ and specialised for GPUs; ODEINT, implemented in C++, which supports execution on both CPUs and GPUs; and DifferentialEquations.jl, written in Julia, which also supports execution on both CPUs and GPUs. The tested systems (the Lorenz equation, the Keller–Miksis equation and a pressure relief valve model) are non-stiff and low-dimensional. Thus, the performance of the codes is not limited by memory bandwidth, and Runge–Kutta type solvers are efficient and suitable choices. The employed hardware is an Intel Core i7-4820K CPU with 30.4 GFLOPS peak double-precision performance per core and an Nvidia GeForce Titan Black GPU with a total of 1707 GFLOPS peak double-precision performance.
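The ensemble approach the paper benchmarks can be illustrated with a minimal NumPy sketch (this is not code from MPGOS, ODEINT, or DifferentialEquations.jl): by storing one system per column, every arithmetic operation of a classical Runge–Kutta step acts on the whole ensemble at once, which is exactly the data layout that maps onto SIMD vector registers and GPU threads.

```python
import numpy as np

def lorenz_rhs(state, sigma=10.0, beta=8.0 / 3.0, rho=28.0):
    """Right-hand side for an ensemble of Lorenz systems.

    state has shape (3, N): one column per independent system, so every
    operation below is a vectorised array operation over all N systems.
    """
    x, y, z = state
    return np.stack([sigma * (y - x),
                     x * (rho - z) - y,
                     x * y - beta * z])

def rk4_step(f, state, dt):
    """One classical 4th-order Runge-Kutta step for the whole ensemble."""
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# 10_000 independent systems with slightly perturbed initial conditions.
rng = np.random.default_rng(0)
state = np.ones((3, 10_000)) + 1e-3 * rng.standard_normal((3, 10_000))
for _ in range(1000):
    state = rk4_step(lorenz_rhs, state, 1e-3)
```

The production packages differ in how this layout reaches the hardware (explicit intrinsics, expression templates, or GPU kernels), but the batched data structure is the common idea.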
emgr - The Empirical Gramian Framework
System Gramian matrices are a well-known encoding for properties of input-output systems such as controllability, observability or minimality. These so-called system Gramians were developed in linear system theory for applications such as model order reduction of control systems. Empirical Gramians are an extension of the system Gramians to parametric and nonlinear systems, as well as a data-driven method of computation. The empirical Gramian framework - emgr - implements the empirical Gramians in a uniform and configurable manner, with applications such as Gramian-based (nonlinear) model reduction, decentralized control, sensitivity analysis, parameter identification and combined state and parameter reduction.
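The data-driven idea behind empirical Gramians can be sketched as follows (an illustrative Python sketch, not emgr's actual interface): an empirical controllability Gramian is assembled from simulated impulse-response snapshots, so it needs only trajectory data, not a linear model; for a linear system it recovers the classical controllability Gramian.

```python
import numpy as np

def empirical_controllability_gramian(f, n, m, dt=0.01, T=5.0):
    """Empirical controllability Gramian from impulse-response snapshots.

    f(x, u) is the (possibly nonlinear) vector field with state dimension n
    and input dimension m.  For each input channel, simulate the impulse
    response with forward Euler and accumulate the outer products of the
    state snapshots (a quadrature of the integral of x(t) x(t)^T).
    """
    steps = int(T / dt)
    W = np.zeros((n, n))
    for j in range(m):
        x = np.zeros(n)
        u = np.zeros(m)
        u[j] = 1.0 / dt            # discrete approximation of a unit impulse
        for k in range(steps):
            x = x + dt * f(x, u if k == 0 else np.zeros(m))
            W += dt * np.outer(x, x)
    return W

# Linear test system dx/dt = A x + B u, for which the empirical Gramian
# approximates the classical controllability Gramian diag(1/2, 1/4).
A = np.array([[-1.0, 0.0], [0.0, -2.0]])
B = np.eye(2)
W = empirical_controllability_gramian(lambda x, u: A @ x + B @ u, n=2, m=2)
```

Because only simulations enter the computation, the same recipe applies unchanged to nonlinear or parametric models, which is the point of the empirical extension.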
Memory-friendly fixed-point iteration method for nonlinear surface mode oscillations of acoustically driven bubbles: from the perspective of high-performance GPU programming
A fixed-point iteration technique is presented to handle the implicit nature of the governing equations of nonlinear surface mode oscillations of acoustically excited microbubbles. The model is adopted from the theoretical work of Shaw [1], in which the dynamics of the mean bubble radius and the surface modes are bi-directionally coupled via nonlinear terms. The model comprises a set of second-order ordinary differential equations, extending the classic Keller–Miksis equation with linearized dynamical equations for each surface mode. Only the implicit parts (containing the second derivatives) are reevaluated during the iteration process. The performance of the technique is tested at various parameter combinations. The majority of the test cases need only a single reevaluation to achieve an error of 10^-9. Although the arithmetic operation count is higher than that of Gaussian elimination, its memory-friendly, matrix-free nature makes it a viable alternative for high-performance GPU computations in massive parameter studies.
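The structure of such an iteration can be sketched generically (a toy stand-in, not the paper's bubble model): the implicitly coupled accelerations satisfy (I + N) a = rhs, and instead of factorising the matrix one iterates a <- rhs - N a, reevaluating only the coupling term, which needs no matrix storage or pivoting and therefore suits GPU registers.

```python
import numpy as np

def fixed_point_accelerations(apply_coupling, rhs, tol=1e-9, max_iter=50):
    """Solve (I + N) a = rhs for the accelerations a without forming N.

    apply_coupling(a) returns N @ a, the coupling between the equations;
    the iteration a <- rhs - N a converges whenever the coupling is a
    contraction, and only the coupling term is reevaluated per sweep.
    """
    a = rhs.copy()                      # initial guess: uncoupled problem
    for it in range(max_iter):
        a_new = rhs - apply_coupling(a)
        if np.max(np.abs(a_new - a)) < tol:
            return a_new, it + 1
        a = a_new
    return a, max_iter

# Weak off-diagonal coupling, mimicking the regime where a handful of
# reevaluations already reaches the 1e-9 error target.
N = 0.05 * (np.ones((4, 4)) - np.eye(4))
rhs = np.array([1.0, 2.0, 3.0, 4.0])
a, iters = fixed_point_accelerations(lambda v: N @ v, rhs)
```

The trade-off matches the abstract: more arithmetic per solve than elimination, but no fill-in and no per-system matrix in memory.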
Hybrid Analog-Digital Co-Processing for Scientific Computation
In the past 10 years, computer architecture research has moved toward more heterogeneity and less adherence to conventional abstractions. Scientists and engineers hold an unshakable belief that computing holds keys to unlocking humanity's Grand Challenges. Acting on that belief, they have looked deeper into computer architecture to find specialized support for their applications. Likewise, computer architects have looked deeper into circuits and devices in search of untapped performance and efficiency. The lines between computer architecture layers---applications, algorithms, architectures, microarchitectures, circuits and devices---have blurred. Against this backdrop, a menagerie of computer architectures is on the horizon: ones that forgo basic assumptions about computer hardware and require new thinking about how such hardware supports problems and algorithms.
This thesis is about revisiting hybrid analog-digital computing in support of diverse modern workloads. Hybrid computing had extensive applications in early computing history, and has been revisited for small-scale applications in embedded systems. But architectural support for using hybrid computing in modern workloads, at scale and with high accuracy solutions, has been lacking.
I demonstrate solving a variety of scientific computing problems, including stochastic ODEs, partial differential equations, linear algebra, and nonlinear systems of equations, as case studies in hybrid computing. I solve these problems on a system of multiple prototype analog accelerator chips built by a team at Columbia University. On that team I made contributions toward programming the chips, building the digital interface, and validating the chips' functionality. The analog accelerator chip is intended for use in conjunction with a conventional digital host computer.
The appeal and motivation for using an analog accelerator is efficiency and performance, but it comes with limitations in accuracy and problem sizes that we have to work around.
The first problem is how to express problems in this unconventional computation model. Scientific computing phrases problems as differential equations and algebraic equations. Differential equations are a continuous view of the world, while algebraic equations are a discrete one. Prior work in analog computing focused mostly on differential equations; algebraic equations played only a minor role. The key to using the analog accelerator to support modern workloads on conventional computers is that these two viewpoints are interchangeable: the algebraic equations that underlie most workloads can be solved as differential equations, and differential equations are naturally solvable in the analog accelerator chip. A hybrid analog-digital computer architecture can therefore focus on solving linear and nonlinear algebra problems to support many workloads.
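The interchangeability of the two viewpoints can be made concrete with a small sketch (a software simulation of the idea, not the Columbia chip's programming interface): a linear algebraic system A x = b is solved by integrating the ODE dx/dt = b - A x until it reaches its equilibrium, which is exactly the algebraic solution whenever the eigenvalues of A have positive real parts.

```python
import numpy as np

def solve_by_integration(A, b, dt=0.01, steps=5000):
    """Solve A x = b by driving dx/dt = b - A x to its steady state.

    This mimics, in discrete software form, what an analog accelerator
    does in continuous time: the equilibrium of the ODE is the solution
    of the algebraic system.
    """
    x = np.zeros_like(b)
    for _ in range(steps):
        x = x + dt * (b - A @ x)      # forward-Euler "analog" dynamics
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # symmetric positive definite
b = np.array([1.0, 2.0])
x = solve_by_integration(A, b)
```

On the actual hardware the integration is performed by physical integrator circuits rather than a time-stepping loop, so the "steps" above collapse into continuous analog evolution.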
The second problem is how to get accurate solutions using hybrid analog-digital computing. The reason the analog computation model gives less accurate solutions is that it gives up representing numbers as digital binary values, instead using the full range of analog voltage and current to represent real numbers. Prior work has established that encoding data in analog signals gives an energy efficiency advantage as long as the analog data precision is limited. While the analog accelerator alone may be useful for energy-constrained applications where inputs and outputs are imprecise, we are more interested in using analog in conjunction with digital for precise solutions. This thesis gives the novel insight that the trick to doing so is to solve nonlinear problems, where low-precision guesses are useful starting points for conventional digital algorithms.
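This refinement pattern can be sketched with a hypothetical scalar example (the equation, precision level, and helper names are illustrative, not taken from the thesis): a low-precision "analog" root estimate seeds a digital Newton iteration, whose quadratic convergence recovers full double precision in a handful of steps.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Digital Newton iteration refining a coarse initial guess."""
    x = x0
    for it in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x, it + 1
    return x, max_iter

f  = lambda x: x**3 - 2.0 * x - 5.0       # classic Newton test equation
df = lambda x: 3.0 * x**2 - 2.0

# Simulate a low-precision "analog" solution: the true root is ~2.09455,
# but the accelerator delivers only about two decimal digits of accuracy.
analog_guess = 2.1
x, iters = newton(f, df, analog_guess)
```

The division of labour is the point: the analog stage supplies a cheap approximate solution, and the digital stage supplies the precision the analog representation cannot.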
The third problem is how to solve large problems using hybrid analog-digital computing. The reason the analog computation model cannot handle large problems is that it gives up step-by-step discrete-time operation, instead allowing variables to evolve smoothly in continuous time. To make that happen, the analog accelerator chains hardware for mathematical operations end-to-end. During computation, analog data flows through the hardware with no overheads in control logic or memory accesses. The downside is that the required hardware grows with the problem size. While scientific computing researchers have long split large problems into smaller subproblems to fit the constraints of digital computers, this thesis is a first attempt to treat these divide-and-conquer algorithms as an essential tool for using the analog model of computation.
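A block-iterative sketch shows the divide-and-conquer idea (an illustrative block-Jacobi scheme, not the thesis's specific decomposition): each fixed-size diagonal block plays the role of a subproblem small enough to fit the accelerator, the off-block coupling is folded into the right-hand side, and sweeps are repeated until the pieces agree.

```python
import numpy as np

def block_jacobi(A, b, block, sweeps=200):
    """Divide-and-conquer linear solve: split A x = b into fixed-size blocks.

    Each diagonal block is small enough to 'fit on the accelerator' and is
    solved exactly; the coupling to the other blocks is moved to the
    right-hand side and the sweep is repeated until convergence.
    """
    n = len(b)
    x = np.zeros(n)
    for _ in range(sweeps):
        x_new = x.copy()
        for i in range(0, n, block):
            s = slice(i, min(i + block, n))
            # Remove this block's own contribution from the residual term.
            r = b[s] - A[s, :] @ x + A[s, s] @ x[s]
            x_new[s] = np.linalg.solve(A[s, s], r)
        x = x_new
    return x

# Diagonally dominant test system so the block iteration converges.
n = 8
A = np.eye(n) * 4.0 + 0.1 * np.ones((n, n))
b = np.arange(1.0, n + 1.0)
x = block_jacobi(A, b, block=2)
```

In the hybrid setting, the inner `np.linalg.solve` on each small block is the part a size-limited analog chip would perform.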
As we enter the post-Moore's law era of computing, unconventional architectures will offer specialized models of computation that uniquely support specific problem types. Two prominent examples are deep neural networks and quantum computers. Recent trends in computer science research show these unconventional architectures will soon have broad adoption. In this thesis I show that another such specialized, unconventional architecture is the analog accelerator for solving problems in scientific computing. Computer architecture researchers will discover other important models of computation in the future. This thesis is an example of the discovery process, implementation, and evaluation of how an unconventional architecture supports specialized workloads.
Software for Exascale Computing - SPPEXA 2016-2019
This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG), presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer's series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA's first funding phase, and provides an overview of SPPEXA's contributions towards exascale computing in today's supercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest.
High-performance computing for impact-induced fracture analysis exploiting octree mesh patterns
Impact-induced fracture analysis has a wide range of engineering and defence applications, including aerospace, manufacturing and construction. Accurate simulation of impact events often requires modelling large-scale complex geometries along with dynamic stress waves and damage propagation. To perform such simulations in a timely manner, a highly efficient and scalable computational framework is necessary.
This thesis aims to develop a high-performance computational framework for analysing large-scale structural problems pertaining to impact-induced fracture events. A hierarchical grid-based mesh containing octree cells is utilised for discretising the problem domain. The scaled boundary finite element method (SBFEM) is employed, which can efficiently handle the octree cells by eliminating hanging-node issues.
The octree mesh is used in its balanced form, which limits the number of distinct octree cell patterns. The master element matrices of each pattern are pre-computed, and the storage of individual element matrices is avoided, leading to a significant reduction in memory requirements, especially for large-scale models. Furthermore, the advantages of octree cells are leveraged through automatic mesh generation and a local refinement process, which enables efficient pre-processing of models with complex geometries.
To handle the matrix operations associated with large-scale simulation, a pattern-by-pattern (PBP) approach is proposed. In this technique, the octree patterns are exploited to recast the majority of the computational work into pattern-level dense matrix operations. This avoids global matrix assembly, allows better cache utilisation, and aids the associated memory-bandwidth-limited computations, resulting in significant performance gains in matrix operations.
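The pattern-level idea can be sketched in a few lines of NumPy (an illustrative sketch of the concept, not the thesis's solver): elements sharing a pattern reuse one precomputed master matrix, so a global matrix-vector product becomes a handful of batched dense multiplications followed by a scatter-add, with no global matrix ever assembled.

```python
import numpy as np

def pattern_matvec(patterns, elem_pattern, elem_dofs, u):
    """Global stiffness-vector product without assembling the global matrix.

    patterns[p] is the precomputed dense master matrix shared by every
    element of pattern p.  Grouping elements by pattern turns the product
    into batched dense multiplications (cache friendly), followed by a
    scatter-add of the element contributions into the global vector.
    """
    y = np.zeros_like(u)
    for p, K in enumerate(patterns):
        elems = np.where(elem_pattern == p)[0]
        if elems.size == 0:
            continue
        dofs = elem_dofs[elems]                      # (n_elems, dofs_per_elem)
        local = np.einsum('ij,ej->ei', K, u[dofs])   # batched K @ u_e
        np.add.at(y, dofs, local)                    # scatter-add into global
    return y

# Tiny 1D example: four two-node elements, all sharing one pattern.
K_master = np.array([[1.0, -1.0], [-1.0, 1.0]])
patterns = [K_master]
elem_pattern = np.zeros(4, dtype=int)
elem_dofs = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])
u = np.linspace(0.0, 1.0, 5)
y = pattern_matvec(patterns, elem_pattern, elem_dofs, u)
```

Because only one master matrix per pattern is stored, memory grows with the number of patterns rather than the number of elements, which is what makes billion-DOF meshes tractable.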
The PBP approach also supports large-scale parallelism. In this work, parallel computation is carried out using a mesh-partitioning strategy implemented with message passing. It is shown that the developed solvers can simulate large-scale and complex structural problems, e.g. delamination/fracture in sandwich panels with approximately a billion unknowns (degrees of freedom, DOFs). Massive scaling is achieved on more than ten thousand cores in a distributed computing environment, reducing the computation time from months (on a single core) to a few minutes.
Towards Cognition-Guided Patient-Specific Numerical Simulation for Cardiac Surgery Assistance
Motivation.
Patient-specific, knowledge-based, holistic surgical treatment planning is of utmost importance when dealing with complex surgery. Surgeons need to account for all available medical patient data, keep track of technical developments, and stay on top of current surgical expert knowledge to define a suitable surgical treatment strategy.
There is large potential for computer assistance, particularly surgery simulation, which gives surgeons the opportunity not only to plan but also to simulate some steps of an intervention and to forecast relevant surgical situations.
Purpose.
In this work, we particularly look at mitral valve reconstruction (MVR) surgery, which re-establishes the functionality of an incompetent mitral valve (MV) through implantation of an artificial ring that reshapes the valvular morphology. We aim to support MVR by providing surgeons with biomechanical FEM-based MVR surgery simulations that enable them to assess the simulated behavior of the MV after an MVR. However, given the above requirements, such surgery simulation is truly beneficial to surgeons only if it is patient-specific, based on surgical expert knowledge, comprehensive in terms of the underlying model and the patient's data, and if its setup and execution are fully automated and integrated into the surgical treatment workflow.
Methods.
This PhD work conducts research on simulation-enhanced, cognition-guided, patient-specific cardiac surgery assistance. First, we derive a biomechanical MV/MVR model and develop an FEM-based MVR surgery simulation using the FEM software toolkit HiFlow3. Next, we outline the functionality and features of the Medical Simulation Markup Language (MSML) and how it simplifies the biomechanical modeling workflow. We then detail how, by means of the MSML and a set of dedicated MVR simulation preprocessing operators, patient-individual medical data can be comprehensively analyzed and processed for the fully automated setup of MVR simulation scenarios. Finally, the presented work is integrated into the cognitive system architecture of the joint research project Cognition-Guided Surgery. We particularly look at its semantic knowledge and data infrastructure as well as at the setup of its cognitive software components, which ultimately facilitate cognition-guidance and patient-specificity for the overall simulation-enhanced MVR assistance pipeline.
Results and Discussion.
We have proposed and implemented, for the first time, a prototype system for simulation-enhanced, cognition-guided, patient-specific cardiac surgery assistance. The overall system was evaluated in terms of functionality and performance. Through its cognitive, data-driven pipeline setup, medical patient data and surgical information are analyzed and processed comprehensively, efficiently and fully automatically, and the resulting simulation scenarios yield reliable, patient-specific MVR surgery simulation results. This indicates the system's usability and applicability. The proposed work thus presents an important step towards simulation-enhanced, cognition-guided, patient-specific cardiac surgery assistance, and can – once operative – be expected to significantly enhance MVR surgery. In conclusion, we discuss possible further research directions and promising applications that build upon the presented work.