191 research outputs found

DualSPHysics: from fluid dynamics to multiphysics problems

Authors: Altomare C.; Canelas R. B.; Crespo A. J. C.; Dominguez J. M.; Fourtakas G.; Garcia-Feal O.; Gomez-Gesteira M.; Martinez-Estevez I.; Mokos A.; Rogers B. D.; Stansby P. K.; Tafuni A.; Vacondio R.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

DualSPHysics is a weakly compressible smoothed particle hydrodynamics (SPH) Navier–Stokes solver initially conceived to deal with coastal engineering problems, especially those related to wave impact on coastal structures. Since its first release in 2011, DualSPHysics has proven to be robust and accurate for simulating extreme wave events, with continuous improvements in efficiency thanks to the exploitation of hardware such as graphics processing units for scientific computing and to coupling with wave propagation models such as SWASH and OceanWave3D. Numerous additional functionalities have also been included in the DualSPHysics package over the last few years, which allow the simulation of fluid-driven objects. The use of the discrete element method has allowed the solver to simulate the interaction among different bodies (sliding rocks, for example), which provides a unique tool to analyse debris flows. In addition, the recent coupling with other solvers such as Project Chrono and MoorDyn has been a milestone in the development of the solver. Project Chrono allows the simulation of articulated structures with joints, hinges, sliders and springs, and MoorDyn allows the simulation of moored structures. Both functionalities make DualSPHysics especially suited to the simulation of offshore energy harvesting devices. More recently, the solver has matured beyond single-phase simulations, allowing multi-phase gas–liquid simulations and combinations of Newtonian and non-Newtonian models, further expanding the capabilities and range of applications of the DualSPHysics solver. These advances and functionalities make DualSPHysics an advanced meshless solver with emphasis on free-surface flow modelling.

Archivio istituzionale della Ricerca - Università degli Studi di Parma

Radial Basis Functions: Biomedical Applications and Parallelization

Author: Liu Ke
Publication venue: UWM Digital Commons
Publication date: 01/12/2016

A radial basis function (RBF) is a real-valued function whose values depend only on the distances between an interpolation point and a set of user-specified points called centers. RBF interpolation is one of the primary methods to reconstruct functions from multi-dimensional scattered data. Its ability to generalize to arbitrary space dimensions and to provide spectral accuracy has made it particularly popular in different application areas, including but not limited to numerical solution of partial differential equations (PDEs), image processing, computer vision and graphics, and deep learning and neural networks. The present thesis discusses three applications of RBF interpolation in biomedical engineering: (1) calcium dynamics modeling, in which we numerically solve a set of PDEs by using meshless numerical methods and RBF-based interpolation techniques; (2) image restoration and transformation, where an image is restored from its triangular mesh representation or transformed from its original form under translation, rotation, scaling, etc.; (3) porous structure design, in which RBF interpolation is used to reconstruct a 3D volume containing porous structures from a set of regularly or randomly placed points inside a user-provided surface shape. All three applications have been investigated and their effectiveness has been supported by numerous experimental results. In particular, we use anisotropic distance metrics to define the distance in RBF interpolation and apply them to the second and third applications, where they show significant improvement in preserving image features or capturing connected porous structures over the isotropic distance-based RBF method. Besides the algorithm designs and their applications in biomedical areas, we also explore several common parallelization techniques (including OpenMP and CUDA-based GPU programming) to accelerate the present algorithms. In particular, we analyze how parallel programming can help RBF interpolation speed up the meshless PDE solver as well as image processing. While RBF has been widely used in various science and engineering fields, the current thesis is expected to attract more interest from computational scientists and students in this fast-growing area, and specifically in applying these techniques to biomedical problems such as the ones investigated in the present work.
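
The definition in this abstract (an interpolant built from functions of the distance to a set of centers) can be sketched in a few lines. This is a minimal illustration only, not the thesis's implementation: the Gaussian kernel, the shape parameter `eps`, the helper names `gaussian_rbf`, `solve_linear` and `fit_rbf`, and the sample data are all assumptions made here. It uses the isotropic Euclidean distance; the anisotropic metrics mentioned in the abstract are not reproduced.

```python
import math


def gaussian_rbf(r, eps=1.0):
    """Gaussian kernel phi(r) = exp(-(eps * r)^2); kernel choice is an assumption."""
    return math.exp(-(eps * r) ** 2)


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_rbf(centers, values, eps=1.0):
    """Return s(x) = sum_j w_j * phi(|x - c_j|) with s(c_i) = values[i]."""
    phi = [[gaussian_rbf(math.dist(ci, cj), eps) for cj in centers]
           for ci in centers]
    weights = solve_linear(phi, values)

    def interpolant(x):
        return sum(w * gaussian_rbf(math.dist(x, c), eps)
                   for w, c in zip(weights, centers))

    return interpolant


# Scattered 2D data (hypothetical sample points and values).
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
values = [0.0, 1.0, 1.0, 2.0]
s = fit_rbf(centers, values)
print(s((0.5, 0.5)))  # value of the interpolant between the centers
```

The dense solve costs O(n^3) in the number of centers, which is one reason parallelization of the kind the abstract mentions becomes attractive for large point sets.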

University of Wisconsin-Milwaukee

Schnelle Löser für partielle Differentialgleichungen

Publication venue: Zürich : EMS Publ. House
Publication date: 01/01/2008

[no abstract available]

Repositorium für Naturwissenschaften und Technik

High-performance tsunami modelling with modern GPU technology

Author: Amouzgar Reza
Publication venue: Newcastle University
Publication date: 01/01/2017

PhD Thesis

Earthquake-induced tsunamis commonly propagate in the deep ocean as long waves and develop into sharp-fronted surges moving rapidly coastward, which may be effectively simulated by hydrodynamic models solving the nonlinear shallow water equations (SWEs). Tsunamis can cause substantial economic and human losses, which could be mitigated through early warning systems given efficient and accurate modelling. Most existing tsunami models require long simulation times for real-world applications. This thesis presents a graphics processing unit (GPU) accelerated finite volume hydrodynamic model using the compute unified device architecture (CUDA) for computationally efficient tsunami simulations. Compared with a standard PC, the model is able to reduce run-time by a factor of > 40. The validated model is used to reproduce the 2011 Japan tsunami. Two source models were tested, one based on tsunami waveform inversion and another using deep-ocean tsunameters. Vertical sea surface displacement is computed by the Okada model, assuming instantaneous sea-floor deformation. Both source models can reproduce the wave propagation at offshore and nearshore gauges, but the tsunameter-based model better simulates the first wave amplitude. The effects of grid resolution (from 450 m to 3600 m), slope limiters, and numerical accuracy are also investigated for the simulation of the 2011 Japan tsunami. Grid resolutions of 1-2 km perform well with a proper limiter; the Sweby limiter is optimal for coarser resolutions, recovers wave peaks better than minmod, and is more numerically stable than Superbee. One hour of tsunami propagation can be predicted about 50 times faster on a regular low-cost PC-hosted GPU than on a single CPU. For 450 m resolution on a larger-memory server-hosted GPU, performance increased by ~70 times. Finally, two adaptive mesh refinement (AMR) techniques, simplified dynamic adaptive grids on CPU and a static adaptive grid on GPU, are introduced to provide multi-scale simulations. Both can reduce run-time by ~3 times while maintaining acceptable accuracy. The proposed computationally efficient tsunami model is expected to provide a new practical tool for tsunami modelling for different purposes, including real-time warning, evacuation planning, risk management and city planning.
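
The three limiters compared in this abstract are closely related, which a short sketch makes concrete. The code below uses the standard textbook flux-limiter form, written as a function of the ratio `r` of consecutive solution gradients. It is an illustration under assumptions, not the thesis's code: the function names are chosen here, and the default `beta=1.5` is an arbitrary value, since the abstract does not state which Sweby parameter was used.

```python
def sweby(r, beta=1.5):
    """Sweby limiter: phi(r) = max(0, min(beta*r, 1), min(r, beta)), 1 <= beta <= 2.

    beta = 1.5 is an illustrative default, not a value taken from the thesis.
    """
    if not 1.0 <= beta <= 2.0:
        raise ValueError("beta must lie in [1, 2]")
    return max(0.0, min(beta * r, 1.0), min(r, beta))


def minmod(r):
    """Minmod is the most diffusive member of the family (beta = 1)."""
    return sweby(r, beta=1.0)


def superbee(r):
    """Superbee is the most compressive member of the family (beta = 2)."""
    return sweby(r, beta=2.0)


# Compare the limiters on a few gradient ratios.
for r in (-1.0, 0.25, 0.5, 1.0, 3.0):
    print(r, minmod(r), sweby(r), superbee(r))
```

Because an intermediate `beta` sits between the two extremes, it is at least consistent with the abstract's finding that the Sweby limiter recovers peaks better than minmod while being more stable than Superbee.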

Newcastle University eTheses

Fast algorithms for biophysically-constrained inverse problems in medical imaging

Author: Gholaminejad Amir
Publication date: 05/02/2018

We present algorithms and software for parameter estimation for forward and inverse tumor growth problems and diffeomorphic image registration. Our methods target the following scenarios: automatic image registration of healthy images to tumor-bearing medical images, and parameter estimation/calibration of tumor models. This thesis focuses on robust and scalable algorithms for these problems. Although the proposed framework applies to many problems in oncology, we focus on primary brain tumors and in particular low- and high-grade gliomas. For the tumor model, the main quantity of interest is the extent of tumor infiltration into the brain, beyond what is visible in imaging. The inverse tumor problem assumes that we have patient images at two (or more) well-separated times so that we can observe the tumor growth. The inverse problem also requires that the two images are segmented. In a clinical setting, however, such information is usually not available. In a typical case, we just have multimodal magnetic resonance images with no segmentation. We address this lack of information by solving a coupled inverse registration and tumor problem. The role of image registration is to find a plausible mapping between the patient's tumor-bearing image and a normal brain (atlas) with known segmentation. Solving this coupled inverse problem has a prohibitive computational cost, especially in 3D. To address this challenge we have developed novel schemes, scaled up to 200K cores. Our main contribution is the design and implementation of fast solvers for these problems. We also study the performance of the tumor parameter estimation and registration solvers and their algorithmic scalability. In particular, we introduce the following novel algorithms: an adjoint formulation for tumor-growth problems with/without mass effect; the first parallel 3D Newton-Krylov method for large diffeomorphic image registration; a novel parallel semi-Lagrangian algorithm for solving advection equations in image registration and its parallel implementation on shared and distributed memory architectures; and Accelerated FFT (AccFFT), an open-source parallel FFT library for CPUs and GPUs scaled up to 131,000 cores with optimized kernels for computing spectral operators. The scientific outcomes of this thesis have appeared in the proceedings of three ACM/IEEE SCxy conferences (two best student paper finalists and one ACM SRC gold medal), two journal papers, two papers in review, four papers in preparation (coupling, mass effect, segmentation, and multi-species tumor model), and seven conference presentations.

Computational Science, Engineering, and Mathematics

Texas ScholarWorks

Using GPUs to accelerate computational diffusion MRI: from microstructure estimation to tractography and connectomes

Authors: Giles Mike; Hernandez-Fernandez Moises; Jbabdi Saad; Reguly Istvan; Smith Stephen; Sotiropoulos Stamatios N.
Publication venue: Elsevier BV
Publication date: 08/12/2018

The great potential of computational diffusion MRI (dMRI) relies on indirect inference of tissue microstructure and brain connections, since modelling and tractography frameworks map diffusion measurements to neuroanatomical features. This mapping, however, can be computationally very expensive, particularly given the trend of increasing dataset sizes and the complexity of biophysical modelling. Limitations on computing resources can restrict data exploration and methodology development. A step forward is to take advantage of the computational power offered by recent parallel computing architectures, especially Graphics Processing Units (GPUs). GPUs are massively parallel processors that offer trillions of floating point operations per second, and have made possible the solution of computationally intensive scientific problems that were intractable before. However, they are not inherently suited to all problems. Here, we present two different frameworks for accelerating dMRI computations using GPUs that cover the most typical dMRI applications: a framework for performing biophysical modelling and microstructure estimation, and a second framework for performing tractography and long-range connectivity estimation. The former provides a front-end and automatically generates a GPU executable file from a user-specified biophysical model, allowing accelerated non-linear model fitting in both deterministic and stochastic ways (Bayesian inference). The latter performs probabilistic tractography; it can generate whole-brain connectomes and supports new functionality for imposing anatomical constraints, such as inherent consideration of surface meshes (GIFTI files) along with volumetric images. We validate the frameworks against well-established CPU-based implementations and show that, despite the very different challenges of parallelising these problems, a single GPU achieves better performance than 200 CPU cores thanks to our parallel designs.

Repository@Nottingham

Oxford University Research Archive

Repository of the Academy's Library

IDEFIX: a versatile performance-portable Godunov code for astrophysical flows

Authors: Baghdadi S.; Bossche M. Van den; Lesur G. R. J.; Mauxion J.; Robert C. M. T.; Wafflard-Fernandez G.
Publication date: 26/04/2023

The exascale supercomputers now becoming available rely on hybrid energy-efficient architectures that involve accelerators such as Graphics Processing Units (GPUs). Leveraging the computational power of these machines often means a significant rewrite of the numerical tools each time a new architecture becomes available. To address these issues, we present Idefix, a new code for astrophysical flows that relies on the Kokkos meta-programming library to guarantee performance portability on a wide variety of architectures while keeping the code as simple as possible for the user. Idefix is based on a Godunov finite-volume method that solves the non-relativistic HD and MHD equations on various grid geometries. Idefix includes a wide choice of solvers and several additional modules (constrained transport, orbital advection, non-ideal MHD), allowing users to address complex astrophysical problems. Idefix has been successfully tested on Intel and AMD CPUs (up to 131 072 CPU cores on Irene-Rome at TGCC) as well as NVidia and AMD GPUs (up to 1024 GPUs on Adastra at CINES). Idefix achieves more than 1e8 cell/s in MHD on a single NVidia V100 GPU and 3e11 cell/s on 256 Adastra nodes (1024 GPUs) with 95% parallelization efficiency (compared to a single node). For the same problem, Idefix is up to 6 times more energy efficient on GPUs than on Intel Cascade Lake CPUs. Idefix is now a mature exascale-ready open-source code that can be used for a large variety of astrophysical and fluid dynamics applications.

Comment: 18 pages, 18 figures, 3 tables, accepted for publication in Astronomy & Astrophysics

arXiv.org e-Print Archive

Dense and sparse parallel linear algebra algorithms on graphics processing units

Author: Lamas Daviña Alejandro
Publication venue: Universitat Politècnica de València
Publication date: 13/11/2018

One line of development followed in the field of supercomputing is the use of special-purpose processors to speed up certain types of computations. In this thesis we study the use of graphics processing units as computing accelerators and apply them to the field of linear algebra. In particular, we work with the SLEPc library to solve large-scale eigenvalue problems and to apply matrix functions in scientific applications. SLEPc is a parallel library based on the MPI standard and is developed with the premise of being scalable, i.e. of allowing larger problems to be solved by increasing the number of processing units. We address the linear eigenvalue problem, Ax = lambda x in its standard form, using iterative techniques, in particular Krylov methods, with which we compute a small portion of the eigenvalue spectrum. This type of algorithm is based on generating a subspace of reduced size (m) onto which the large problem (of dimension n) is projected, with m << n. Once the problem has been projected, it is solved by direct methods, which provide approximations to the eigenvalues of the original problem. The operations used in the expansion of the subspace vary depending on whether the desired eigenvalues are in the exterior or in the interior of the spectrum. For exterior eigenvalues, the expansion is done by matrix-vector multiplications. We perform this operation on the GPU, either by using libraries or by writing functions that take advantage of the structure of the matrix. For eigenvalues in the interior of the spectrum, the expansion requires solving linear systems of equations. In this thesis we implement several algorithms that run on the GPU to solve linear systems of equations for the specific case of matrices with a block-tridiagonal structure. In the computation of matrix functions we have to distinguish between the direct application of a function to a matrix, f(A), and the action of a matrix function on a vector, f(A)b. The first case involves a dense computation that limits the size of the problem. The second allows us to work with large sparse matrices, and to solve it we also make use of Krylov methods. The expansion of the subspace is done by matrix-vector multiplications, and we use GPUs in the same way as when solving eigenvalue problems. In this case the projected problem starts with size m, but grows by m on each restart of the method. The projected problem is solved by directly applying a matrix function. We have implemented several algorithms to compute the matrix square root and the matrix exponential, in which the use of GPUs allows us to speed up the computation.

Lamas Daviña, A. (2018). Dense and sparse parallel linear algebra algorithms on graphics processing units [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/112425
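
The projection idea described in this abstract can be written compactly. The following is a sketch in standard Arnoldi notation, not notation taken from the thesis: the basis matrix V_m, the projected matrix H_m, the Ritz pairs (theta_i, V_m y_i) and the shift sigma are symbols introduced here for illustration.

```latex
\begin{aligned}
\mathcal{K}_m(A, v_1) &= \operatorname{span}\{v_1,\, A v_1,\, \dots,\, A^{m-1} v_1\}, \qquad m \ll n, \\
A V_m &= V_m H_m + h_{m+1,m}\, v_{m+1} e_m^{T}, \qquad V_m^{*} V_m = I_m, \\
H_m y_i &= \theta_i y_i \;\Longrightarrow\; (\theta_i,\; V_m y_i) \approx (\lambda_i,\; x_i), \\
f(A)\, b &\approx \|b\|_2\, V_m\, f(H_m)\, e_1, \qquad v_1 = b / \|b\|_2 .
\end{aligned}
```

Building the basis costs one matrix-vector product with A per new vector, which is the operation the abstract places on the GPU. For interior eigenvalues, a common choice (assumed here, not stated in the abstract) is to build the subspace with (A - sigma I)^{-1} instead of A, which is where the linear solves come from. The small dense problems involving H_m and f(H_m) are the ones handled by direct methods.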

RiuNet