    PyFR: An Open Source Framework for Solving Advection-Diffusion Type Problems on Streaming Architectures using the Flux Reconstruction Approach

    High-order numerical methods for unstructured grids combine the superior accuracy of high-order spectral or finite difference methods with the geometric flexibility of low-order finite volume or finite element schemes. The Flux Reconstruction (FR) approach unifies various high-order schemes for unstructured grids within a single framework. Additionally, the FR approach exhibits a significant degree of element locality, and is thus able to run efficiently on modern streaming architectures, such as Graphics Processing Units (GPUs). These properties mean FR offers a promising route to performing affordable, and hence industrially relevant, scale-resolving simulations of hitherto intractable unsteady flows in the vicinity of real-world engineering geometries. In this paper we present PyFR, an open-source Python-based framework for solving advection-diffusion type problems on streaming architectures using the FR approach. The framework is designed to solve a range of governing systems on mixed unstructured grids containing various element types. It is also designed to target a range of hardware platforms via an in-built domain-specific language based on the Mako templating engine. The current release of PyFR can solve the compressible Euler and Navier-Stokes equations on grids of quadrilateral and triangular elements in two dimensions, and hexahedral elements in three dimensions, targeting clusters of CPUs and NVIDIA GPUs. Results are presented for various benchmark flow problems, single-node performance is discussed, and scalability of the code is demonstrated on up to 104 NVIDIA M2090 GPUs. The software is freely available under a 3-Clause New Style BSD license (see www.pyfr.org).
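    To make the FR correction step concrete, here is a minimal one-dimensional sketch for a scalar flux in a single element. It assumes Gauss-Legendre solution points and the degree-(p+1) Radau correction functions, which are known to recover a nodal DG scheme; all function names are hypothetical, and this is not PyFR's implementation, which works on 2D/3D mixed grids through generated kernels. The interface (common) fluxes are taken as inputs, since computing them (for example by upwinding) depends on the neighbouring elements.

```python
import math


def legendre(n, x):
    """Return (P_n(x), P_n'(x)) using three-term recurrences."""
    if n == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, x      # P_0, P_1
    d_prev, d = 0.0, 1.0    # P_0', P_1'
    for k in range(2, n + 1):
        p_next = ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
        d_next = d_prev + (2 * k - 1) * p   # P_k' = P_{k-2}' + (2k-1) P_{k-1}
        p_prev, p = p, p_next
        d_prev, d = d, d_next
    return p, d


def gauss_legendre(n):
    """Gauss-Legendre nodes and weights on [-1, 1] via Newton iteration."""
    nodes, weights = [], []
    for i in range(n):
        x = math.cos(math.pi * (i + 0.75) / (n + 0.5))   # initial guess
        for _ in range(100):
            p, d = legendre(n, x)
            dx = p / d
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, d = legendre(n, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * d * d))
    return nodes, weights


def lagrange(nodes, j, x):
    """Value of the j-th Lagrange basis polynomial at x."""
    return math.prod((x - xk) / (nodes[j] - xk)
                     for k, xk in enumerate(nodes) if k != j)


def lagrange_deriv(nodes, j, x):
    """Derivative of the j-th Lagrange basis polynomial at x."""
    total = 0.0
    for m, xm in enumerate(nodes):
        if m == j:
            continue
        term = 1.0 / (nodes[j] - xm)
        for k, xk in enumerate(nodes):
            if k not in (j, m):
                term *= (x - xk) / (nodes[j] - xk)
        total += term
    return total


def fr_flux_divergence(f, f_left, f_right):
    """Corrected flux derivative dF/dxi at the p+1 Gauss solution points.

    f: flux values at the solution points; f_left / f_right: common
    interface fluxes at xi = -1 and xi = +1.
    """
    nodes, _ = gauss_legendre(len(f))
    k = len(f)  # correction functions have degree p + 1
    # Jumps between the common fluxes and the discontinuous interpolated flux
    jump_l = f_left - sum(fj * lagrange(nodes, j, -1.0) for j, fj in enumerate(f))
    jump_r = f_right - sum(fj * lagrange(nodes, j, 1.0) for j, fj in enumerate(f))
    out = []
    for x in nodes:
        dfdx = sum(fj * lagrange_deriv(nodes, j, x) for j, fj in enumerate(f))
        _, dk = legendre(k, x)
        _, dkm1 = legendre(k - 1, x)
        dg_left = (-1) ** k * 0.5 * (dk - dkm1)   # g_L' where g_L(-1)=1, g_L(1)=0
        dg_right = 0.5 * (dk + dkm1)              # g_R' where g_R(-1)=0, g_R(1)=1
        out.append(dfdx + jump_l * dg_left + jump_r * dg_right)
    return out
```

    Because the corrected flux is a polynomial of degree p+1 that takes the common values at both ends, Gauss quadrature of the returned derivative gives exactly f_right - f_left, which is what makes the scheme conservative. For linear advection u_t + a u_x = 0 with a > 0, f_left would typically be the upwind value a·u taken from the left neighbour.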

    QuickCSG: Fast Arbitrary Boolean Combinations of N Solids

    QuickCSG computes the result of general N-polyhedron boolean expressions without building an intermediate tree of solids. We propose a vertex-centric view of the problem, which simplifies the identification of final geometric contributions and facilitates its spatial decomposition. The problem is then cast as a single KD-tree exploration, geared toward the result by early pruning of any region of space that does not contribute to the final surface. We assume strong regularity properties of the input meshes and that they are in general position. This simplifying assumption, combined with our vertex-centric approach, makes the method faster. Complemented with task-stealing parallelization, the algorithm achieves speedups of one to two orders of magnitude over state-of-the-art CPU algorithms on boolean operations over two to dozens of polyhedra. The algorithm also outperforms GPU implementations with approximate discretizations, while producing an output without redundant facets. Despite the restrictive assumptions on the input, we show the usefulness of QuickCSG for applications with large CSG problems and strong temporal constraints, e.g. modeling for 3D printers, reconstruction from visual hulls and collision detection.
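    The vertex-centric test and the pruning rule can be illustrated with a small, hypothetical sketch: the N-solid boolean expression is a predicate over per-solid inside/outside flags. A point on one solid's surface lies on the result surface only if flipping that solid's flag changes the predicate, and a KD-tree cell can be skipped when the predicate is constant over every combination of the solids that cross it. The function names and the brute-force enumeration are assumptions for illustration, not QuickCSG's implementation; general position is assumed, as in the abstract, so a surface point lies on only one input surface.

```python
from itertools import product
from typing import Callable, Dict, Sequence

# A CSG expression over N solids, evaluated on per-solid inside/outside flags.
Expr = Callable[[Sequence[bool]], bool]


def on_result_surface(expr: Expr, inside: Sequence[bool], i: int) -> bool:
    """A point on the surface of solid i lies on the result surface iff
    crossing solid i's boundary there flips the expression's value.
    `inside` is the point's status w.r.t. every solid (entry i is ignored)."""
    outside_i = list(inside)
    outside_i[i] = False
    inside_i = list(inside)
    inside_i[i] = True
    return expr(outside_i) != expr(inside_i)


def cell_can_be_pruned(expr: Expr, n: int, fixed: Dict[int, bool]) -> bool:
    """`fixed` maps each solid that entirely contains (True) or misses (False)
    a KD-tree cell to its status; the remaining solids cross the cell. If the
    expression is constant over every status of the crossing solids, the
    result is uniformly in or out there, so no result surface lies in the cell."""
    free = [i for i in range(n) if i not in fixed]
    seen = set()
    for combo in product((False, True), repeat=len(free)):
        status = [False] * n
        for i, value in fixed.items():
            status[i] = value
        for i, value in zip(free, combo):
            status[i] = value
        seen.add(expr(status))
        if len(seen) > 1:
            return False
    return True
```

    For example, with A ∪ (B − C) written as `lambda s: s[0] or (s[1] and not s[2])`, any cell entirely inside A can be pruned, while a cell outside A that is crossed by both B and C cannot.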

    Doctor of Philosophy

    Memory access irregularities are a major bottleneck for bandwidth-limited problems on Graphics Processing Unit (GPU) architectures. GPU memory systems are designed to coalesce consecutive memory accesses into a single memory transaction; noncontiguous accesses by a group of threads executing in lockstep may instead be serialized into multiple memory transfers. Irregular algorithms can have data-dependent control flow and memory access patterns that can only be evaluated with runtime information, so compile-time methods for evaluating parallelism, such as static dependence graphs, cannot handle them. The goals of this dissertation are to study irregularities in unstructured mesh and sparse matrix problems, analyze the impact of vectorization widths on irregularities, and present data-centric methods that reduce control flow and memory access irregularity in those contexts. Reordering associative operations has often been exploited for performance gains in parallel algorithms. This dissertation presents a method for associative reordering of stencil computations over unstructured meshes that increases data reuse through caching; this novel parallelization scheme offers considerable speedups over standard methods. Vectorization widths can have a significant impact on the performance of vectorized computations. Although the hardware vector width is generally fixed, the logical vector width used within a computation can range from one up to the width of the computation, and significant performance differences can arise from thread scheduling and resource limitations. This dissertation analyzes the impact of vectorization widths on dense numerical computations such as 3D discontinuous Galerkin (dG) postprocessing. Dynamic updates are difficult to perform efficiently on traditional sparse matrix formats. Explicitly controlling memory segmentation allows in-place dynamic updates in sparse matrices, and updating the matrix without rebuilding or sorting greatly improves processing time and overall throughput. This dissertation presents a new sparse matrix format, dynamic compressed sparse row (DCSR), which allows dynamic streaming updates to a sparse matrix, as well as a new method for parallel sparse matrix-matrix multiplication (SpMM) that uses dynamic updates.
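    A minimal sequential sketch of the segmented-storage idea behind a dynamic CSR format: each row owns one or more segments in shared column and value arrays, an insertion fills a free slot in the row's last segment, and a full row gets a new, larger segment appended at the end of the arrays instead of triggering a rebuild or sort. The class name, the initial capacity, and the capacity-doubling policy are illustrative assumptions; the dissertation's DCSR format targets GPUs and its exact layout is not given in this abstract.

```python
from typing import Iterator, List, Tuple


class DynamicCSR:
    """Sparse matrix whose rows own growable segments of shared arrays."""

    def __init__(self, n_rows: int, initial_capacity: int = 2) -> None:
        self.cols: List[int] = []      # column index per slot (-1 = unused)
        self.vals: List[float] = []    # value per slot
        # Per row: list of segments [start, used, capacity] into cols/vals.
        self.segments: List[List[List[int]]] = [
            [self._allocate(initial_capacity)] for _ in range(n_rows)
        ]

    def _allocate(self, capacity: int) -> List[int]:
        """Append a fresh segment at the end of the shared arrays."""
        start = len(self.cols)
        self.cols.extend([-1] * capacity)
        self.vals.extend([0.0] * capacity)
        return [start, 0, capacity]

    def insert(self, row: int, col: int, val: float) -> None:
        """Set A[row, col] = val in place, without rebuilding or sorting."""
        for start, used, _ in self.segments[row]:
            for k in range(start, start + used):
                if self.cols[k] == col:        # existing entry: overwrite
                    self.vals[k] = val
                    return
        last = self.segments[row][-1]
        if last[1] == last[2]:                 # row full: chain a new segment
            last = self._allocate(2 * last[2])
            self.segments[row].append(last)
        k = last[0] + last[1]
        self.cols[k] = col
        self.vals[k] = val
        last[1] += 1

    def row(self, r: int) -> Iterator[Tuple[int, float]]:
        """Yield (column, value) pairs of row r, segment by segment."""
        for start, used, _ in self.segments[r]:
            for k in range(start, start + used):
                yield self.cols[k], self.vals[k]

    def matvec(self, x: List[float]) -> List[float]:
        """Sparse matrix-vector product y = A x."""
        return [sum((v * x[c] for c, v in self.row(r)), 0.0)
                for r in range(len(self.segments))]
```

    Entries within a row are left unsorted, which is the trade-off that keeps insertion cheap; a parallel version would additionally need an atomic counter for the end of the shared arrays.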

    Efficient computation of two-dimensional steady free-surface flows

    We consider a family of steady free-surface flow problems in two dimensions, concentrating on the effect of nonlinearity on the train of gravity waves that appear downstream of a disturbance. By exploiting standard complex variable techniques, these problems are formulated in terms of a coupled system of Bernoulli's equation and an integral equation. When applying a numerical collocation scheme, the Jacobian for the system is dense, as the integral equation forces each of the algebraic equations to depend on each of the unknowns. We present here a strategy for overcoming this challenge, which leads to a numerical scheme that is much more efficient than the schemes normally employed for these types of problems, allowing for many more grid points over the free surface. In particular, we provide a simple recipe for constructing a sparse approximation to the Jacobian that is used as a preconditioner in a Jacobian-free Newton-Krylov method for solving the nonlinear system. We use this approach to compute numerical results for a variety of prototype problems, including flows past pressure distributions, a surface-piercing object and bottom topographies. Comment: 20 pages, 13 figures, under review
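    The Jacobian-free Newton-Krylov idea can be sketched with NumPy on a toy system (an assumption for illustration, not the paper's Bernoulli and integral equations) whose residual, like the collocation system described above, couples every unknown to every other through an integral-like average. The Krylov solver only needs Jacobian-vector products, approximated here by a forward difference, while the preconditioner uses a sparse approximation of the Jacobian (here just its diagonal) that drops the dense coupling. GMRES is written out with right preconditioning to avoid further dependencies; all function names are hypothetical.

```python
import numpy as np


def gmres_right(apply_a, b, apply_minv, m=50, tol=1e-10):
    """Solve A x = b with GMRES on A M^{-1} y = b, then x = M^{-1} y."""
    n = b.size
    beta = np.linalg.norm(b)
    if beta == 0.0:
        return np.zeros(n)
    V = np.zeros((n, m + 1))   # orthonormal Arnoldi basis
    Z = np.zeros((n, m))       # preconditioned directions M^{-1} v_j
    H = np.zeros((m + 1, m))   # upper Hessenberg matrix
    V[:, 0] = b / beta
    for j in range(m):
        Z[:, j] = apply_minv(V[:, j])
        w = apply_a(Z[:, j])
        for i in range(j + 1):             # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y = np.linalg.lstsq(H[: j + 2, : j + 1], e1, rcond=None)[0]
        converged = np.linalg.norm(H[: j + 2, : j + 1] @ y - e1) < tol * beta
        if converged or H[j + 1, j] == 0.0:
            break
        V[:, j + 1] = w / H[j + 1, j]
    return Z[:, : y.size] @ y


def jfnk(residual, u0, make_precond, tol=1e-10, max_newton=20):
    """Newton's method whose linear solves use only J v products."""
    u = np.array(u0, dtype=float)
    for _ in range(max_newton):
        r = residual(u)
        if np.linalg.norm(r) < tol:
            break
        h = 1e-7 * (1.0 + np.linalg.norm(u))

        def jv(v, u=u, r=r, h=h):
            return (residual(u + h * v) - r) / h   # finite-difference J v

        u = u + gmres_right(jv, -r, make_precond(u))
    return u


# Toy problem (hypothetical): strong local terms plus a dense, weak coupling.
n = 40
b = np.linspace(0.0, 1.0, n)


def residual(u):
    return u ** 3 + 2.0 * u - np.mean(np.sin(u)) - b


def make_precond(u):
    diag = 3.0 * u ** 2 + 2.0   # sparse (diagonal) part of the exact Jacobian
    return lambda v: v / diag


u = jfnk(residual, np.zeros(n), make_precond)
```

    The dense Jacobian is never formed: only the cheap sparse part is stored, and the dense coupling enters solely through residual evaluations.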

    Radial Basis Functions: Biomedical Applications and Parallelization

    A radial basis function (RBF) is a real-valued function whose value depends only on the distance between its input point and a fixed point called a center; an RBF interpolant is a weighted sum of such functions placed at a set of user-specified centers. RBF interpolation is one of the primary methods for reconstructing functions from multi-dimensional scattered data. Its ability to generalize to arbitrary space dimensions and to provide spectral accuracy has made it popular in many application areas, including the numerical solution of partial differential equations (PDEs), image processing, computer vision and graphics, and deep learning and neural networks. This thesis discusses three applications of RBF interpolation in biomedical engineering: (1) calcium dynamics modeling, in which we numerically solve a set of PDEs using meshless numerical methods and RBF-based interpolation techniques; (2) image restoration and transformation, in which an image is restored from its triangular mesh representation or transformed from its original form under translation, rotation, scaling, and similar operations; and (3) porous structure design, in which RBF interpolation is used to reconstruct a 3D volume containing porous structures from a set of regularly or randomly placed points inside a user-provided surface shape. All three applications have been investigated, and their effectiveness is supported by numerous experimental results. In particular, we use anisotropic distance metrics to define the distance in RBF interpolation and apply them to the second and third applications, where they show significant improvement over the isotropic distance-based RBF method in preserving image features or capturing connected porous structures. Besides the algorithm designs and their biomedical applications, we also explore several common parallelization techniques (including OpenMP and CUDA-based GPU programming) to accelerate these algorithms. In particular, we analyze how parallel programming can help RBF interpolation speed up the meshless PDE solver as well as image processing. While RBFs are already widely used in many science and engineering fields, this thesis aims to stimulate further interest among computational scientists and students in this fast-growing area, and specifically in applying these techniques to biomedical problems such as the ones investigated in this work.
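    As a concrete reference for the interpolation step, here is a minimal NumPy sketch of Gaussian RBF interpolation in which the Euclidean distance is replaced by an anisotropic one, d(x, c)^2 = (x - c)^T M (x - c), with a fixed symmetric positive definite matrix M. The Gaussian kernel, the shape parameter `eps`, the constant metric M, and the function names are assumptions for illustration; the thesis's own anisotropic metrics (for example, ones adapted to image features) may vary in space and are not reproduced here.

```python
import numpy as np


def rbf_kernel(x, centers, eps, metric):
    """Gaussian RBF matrix exp(-eps^2 d^2) with the anisotropic distance
    d(x, c)^2 = (x - c)^T M (x - c) for a symmetric positive definite M."""
    diff = x[:, None, :] - centers[None, :, :]            # (n_x, n_c, dim)
    d2 = np.einsum("ijk,kl,ijl->ij", diff, metric, diff)   # squared distances
    return np.exp(-(eps ** 2) * d2)


def rbf_fit(centers, values, eps, metric):
    """Weights w solving Phi w = f, so the interpolant matches the data."""
    return np.linalg.solve(rbf_kernel(centers, centers, eps, metric), values)


def rbf_eval(x, centers, weights, eps, metric):
    """Interpolant s(x) = sum_j w_j * phi(d(x, c_j)) at each row of x."""
    return rbf_kernel(x, centers, eps, metric) @ weights


rng = np.random.default_rng(0)
centers = rng.random((10, 2))                      # scattered data sites
values = np.sin(3.0 * centers[:, 0]) + centers[:, 1]
eps = 4.0
isotropic = np.eye(2)
anisotropic = np.diag([4.0, 0.25])   # hypothetical: stretch x, shrink y
w_iso = rbf_fit(centers, values, eps, isotropic)
w_aniso = rbf_fit(centers, values, eps, anisotropic)
query = np.array([[0.5, 0.5]])
print(rbf_eval(query, centers, w_iso, eps, isotropic),
      rbf_eval(query, centers, w_aniso, eps, anisotropic))
```

    Both interpolants reproduce the data exactly at the centers; they differ between centers, where the anisotropic metric makes each basis function decay faster along x than along y.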