
    Comparative Study of Uniform and Graded Meshes for Solving Convection-Diffusion Equation with Quadratic Source

    Due to their fundamental nature, convection-diffusion problems arise in a variety of aviation, science, and engineering applications. Major applications include the study of aircraft wake-vortex dynamics and their interaction with turbulent jets, a serious hazard in aviation. Other applications include the intrusive sampling of jet-engine exhaust gases and the effectiveness of hot-fluid injection for removing ice from aircraft wings. Numerical solution of convection-diffusion problems requires a proper meshing scheme. The major mesh types in computational fluid dynamics are uniform, piecewise-uniform, graded, and hybrid meshes, over which the discretized governing equations are solved. Unwitting application of a mesh can produce poor solutions: spurious fluctuations, over- or under-prediction, and excessive computation time. Focusing on the comparative effectiveness of two meshes, a uniform mesh and a graded mesh with a mesh expansion factor, this paper considers the solution of a convection-diffusion equation with a quadratic source term. The problem is solved by assigning several values of the mesh expansion factor to the graded mesh while the number of mesh cells is kept constant. The factors are calculated from a generalization of their logarithmically linear relationship with low Peclet numbers derived in previous work. Five test cases, distinguished by their Peclet numbers, are considered. The graded mesh proves relatively more robust, in particular because its solution is free from spurious fluctuations; its accuracy is also up to two orders of magnitude higher. The mesh expansion factor therefore contributes to a stable and highly accurate solution for all Peclet numbers of interest.
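
    To make the mesh comparison concrete, here is a minimal Python/NumPy sketch of the same kind of experiment. It is an illustration under stated assumptions, not the paper's exact scheme: the discretization is standard central differencing on a nonuniform node set, the grading rule scales each successive cell width by the expansion factor r, and the quadratic source, boundary values, and parameter values are all illustrative.

        import numpy as np

        def graded_mesh(n, r):
            # n cells on [0, 1]; each cell is r times wider than the previous
            # one (r = 1 reproduces the uniform mesh).
            h = r ** np.arange(n)
            x = np.concatenate(([0.0], np.cumsum(h)))
            return x / x[-1]

        def solve_conv_diff(x, u=1.0, gamma=0.01, src=lambda s: s * (1.0 - s)):
            # Steady 1D problem  u*phi' = gamma*phi'' + S(x),  phi(0)=0, phi(1)=1,
            # central-differenced on a (possibly nonuniform) node set x.
            n = len(x) - 1
            A = np.zeros((n + 1, n + 1))
            b = np.zeros(n + 1)
            A[0, 0] = A[n, n] = 1.0          # Dirichlet boundary rows
            b[n] = 1.0
            for i in range(1, n):
                hL, hR = x[i] - x[i - 1], x[i + 1] - x[i]
                A[i, i - 1] = 2.0 * gamma / (hL * (hL + hR)) + u / (hL + hR)
                A[i, i + 1] = 2.0 * gamma / (hR * (hL + hR)) - u / (hL + hR)
                A[i, i] = -2.0 * gamma / (hL * hR)
                b[i] = -src(x[i])
            return np.linalg.solve(A, b)

        phi_uniform = solve_conv_diff(graded_mesh(40, 1.0))
        phi_graded = solve_conv_diff(graded_mesh(40, 0.9))  # cells shrink toward
                                                            # the layer at x = 1

    With these parameters the cell Peclet number on the uniform mesh exceeds 2, which is the classic condition for spurious oscillations with central differencing; the graded mesh with r < 1 concentrates cells in the boundary layer at x = 1 and avoids them.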

    A GPU-based Laplacian Solver for Magnetostatic Boundary Value Problems

    Modern graphics processing units (GPUs) offer more computing power than CPUs, and GPUs are therefore proposed as more efficient compute units for scientific problems with large parallelizable computational loads. In this study, we present a GPU algorithm to solve a magnetostatic boundary value problem that exhibits such parallel properties. In particular, we solve the Laplace equation to find the magnetic scalar potential in the region between two coaxial cylinders. This requires discretizing the problem domain into small cells and finding the solution at each node of the generated mesh. The smaller the cell size, the more accurate the solution; a more accurate solution leads to a better estimate of the surface current needed to generate a uniform magnetic field inside the inner cylinder, which is the final goal. Although solving a mesh with a large number of small cells is computationally intensive, GPU computing provides techniques to accelerate performance. The problem domain is discretized using the finite difference method (FDM), and the linear system of equations obtained from the FDM is solved by the successive over-relaxation (SOR) method. The parallel program is implemented using the CUDA framework. The performance of the parallel algorithm is optimized using several CUDA optimization strategies, and the speedup of the parallel GPU implementation over the sequential CPU implementation is reported. Master of Science in Applied Computer Science.
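
    As an illustration of the serial building block that the GPU version parallelizes, the following Python sketch applies SOR to the Laplace equation on a uniform 2D grid. The grid size, relaxation factor, and the two-circle Dirichlet setup are illustrative assumptions, not the thesis's configuration.

        import numpy as np

        def sor_laplace(phi, fixed, omega=1.8, tol=1e-6, max_iter=20000):
            # Successive over-relaxation for  lap(phi) = 0  on a uniform grid.
            # `fixed` marks Dirichlet nodes whose values must be kept.
            for it in range(max_iter):
                max_delta = 0.0
                for i in range(1, phi.shape[0] - 1):
                    for j in range(1, phi.shape[1] - 1):
                        if fixed[i, j]:
                            continue
                        gs = 0.25 * (phi[i + 1, j] + phi[i - 1, j]
                                     + phi[i, j + 1] + phi[i, j - 1])
                        delta = omega * (gs - phi[i, j])  # over-relaxed update
                        phi[i, j] += delta
                        max_delta = max(max_delta, abs(delta))
                if max_delta < tol:
                    return phi, it
            return phi, max_iter

        # Cross-section of two coaxial cylinders: potential 1 on the inner
        # circle, 0 near the outer boundary (radii and resolution illustrative).
        n = 65
        y, x = np.mgrid[-1:1:n*1j, -1:1:n*1j]
        r = np.hypot(x, y)
        phi = np.where(r <= 0.3, 1.0, 0.0)
        fixed = (r <= 0.3) | (r >= 0.95)
        phi, iters = sor_laplace(phi, fixed)

    Because each update above depends on freshly updated neighbors, a GPU port typically switches to a red-black (checkerboard) ordering so that all nodes of one color can be updated in parallel within a CUDA kernel.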

    Tensor B-spline numerical method for PDEs : a high performance approach

    Solutions of Partial Differential Equations (PDEs) form the basis of many mathematical models in physics and medicine. In this work, a novel Tensor B-spline methodology for the numerical solution of linear second-order PDEs is proposed. The methodology applies the B-spline signal processing framework and computational tensor algebra to construct high-performance numerical solvers for PDEs. The method allows high-order approximations, is mesh-free and matrix-free, and is computationally and memory efficient. The first chapter introduces the main ideas of the Tensor B-spline method, summarizes the main contributions of the thesis, and outlines the thesis structure. The second chapter provides an introduction to PDEs, reviews the numerical methods for solving PDEs, introduces splines and signal processing techniques with B-splines, and describes tensors and computational tensor algebra. The third chapter describes the principles of the Tensor B-spline methodology. The main aspects are 1) discretization of the PDE variational formulation via B-spline representation of the solution, the coefficients, and the source term, 2) introduction of the tensor B-spline kernels, 3) application of tensors and computational tensor algebra to the discretized variational formulation of the PDE, 4) tensor-based analysis of the problem structure, 5) derivation of efficient computational techniques, and 6) efficient boundary processing and numerical integration procedures. The fourth chapter describes 1) different computational strategies of the Tensor B-spline solver and an evaluation of their performance, 2) the application of the method to the forward problem of Optical Diffusion Tomography and an extensive comparison with the state-of-the-art Finite Element Method on synthetic and real medical data, 3) high-performance multicore CPU- and GPU-based implementations, and 4) the solution of large-scale problems on hardware with limited memory resources.
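
    To illustrate the representation at the heart of the method (a sketch, not the thesis's solver), the following Python code evaluates a 2D field stored as tensor-product cubic B-spline coefficients. The separable contraction over the kernel's small support is the tensor structure such methods exploit to stay matrix-free.

        import numpy as np

        def bspline3(t):
            # Centered cubic B-spline kernel from the signal-processing framework.
            t = np.abs(np.asarray(t, dtype=float))
            out = np.zeros_like(t)
            near, far = t < 1.0, (t >= 1.0) & (t < 2.0)
            out[near] = 2.0 / 3.0 - t[near] ** 2 + 0.5 * t[near] ** 3
            out[far] = (2.0 - t[far]) ** 3 / 6.0
            return out

        def eval_tensor_bspline(c, x, y):
            # f(x, y) = sum_{k,l} c[k, l] * b(x - k) * b(y - l).  The cubic
            # kernel has support 4, so only a 4x4 block of coefficients
            # contributes at any point, and the double sum factorizes into
            # two 1D contractions (a tensor contraction, no global matrix).
            bx = bspline3(x - np.arange(c.shape[0]))
            by = bspline3(y - np.arange(c.shape[1]))
            return bx @ c @ by

        c = np.random.rand(8, 8)             # coefficient grid (illustrative)
        value = eval_tensor_bspline(c, 3.4, 5.1)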

    Methods for Multilevel Parallelism on GPU Clusters: Application to a Multigrid Accelerated Navier-Stokes Solver

    Computational Fluid Dynamics (CFD) is an important field in high performance computing with numerous applications. Solving problems in the thermal and fluid sciences demands enormous computing resources and has been one of the primary applications of supercomputers and large clusters. Modern graphics processing units (GPUs) with many-core architectures have emerged as general-purpose parallel computing platforms that can substantially accelerate simulation science applications. While significant speedups have been obtained with single and multiple GPUs on a single workstation, large problems require more resources, and conventional clusters of central processing units (CPUs) are now being augmented with GPUs in each compute node to tackle them. The present research investigates methods of exploiting the multilevel parallelism in multi-node, multi-GPU systems to develop scalable simulation science software. The primary application developed is a cluster-ready, GPU-accelerated incompressible Navier-Stokes flow solver that includes advanced numerical methods, among them a geometric multigrid pressure Poisson solver. The research investigates multiple implementations to explore computation/communication overlapping, and explores methods for coarse-grain parallelism, including POSIX threads, MPI, and a hybrid OpenMP-MPI model. The application includes a number of usability features, including periodic VTK (Visualization Toolkit) output, a run-time configuration file, and flexible setup of obstacles to represent urban areas and complex terrain. Numerical features include a variety of time-stepping methods, buoyancy-driven flow, adaptive time-stepping, various iterative pressure solvers, and a new parallel 3D geometric multigrid solver. At each step, the project examines performance and scalability using the Lincoln Tesla cluster at the National Center for Supercomputing Applications (NCSA) and the Longhorn cluster at the Texas Advanced Computing Center (TACC). The results demonstrate that multi-GPU clusters can substantially accelerate computational fluid dynamics simulations.
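
    A geometric multigrid pressure solver is organized around the V-cycle. The following compact Python sketch of a 2D V-cycle for -lap(p) = f uses damped Jacobi smoothing, injection restriction, and bilinear prolongation; these component choices and the parameters are illustrative assumptions, not necessarily those of the solver developed in this work.

        import numpy as np

        def smooth(u, f, h, sweeps=3):
            # Damped Jacobi relaxation for  -lap(u) = f  (boundaries stay fixed).
            for _ in range(sweeps):
                u[1:-1, 1:-1] += 0.8 * (0.25 * (u[2:, 1:-1] + u[:-2, 1:-1]
                                                + u[1:-1, 2:] + u[1:-1, :-2]
                                                + h * h * f[1:-1, 1:-1])
                                        - u[1:-1, 1:-1])
            return u

        def residual(u, f, h):
            r = np.zeros_like(u)
            r[1:-1, 1:-1] = f[1:-1, 1:-1] + (u[2:, 1:-1] + u[:-2, 1:-1]
                                             + u[1:-1, 2:] + u[1:-1, :-2]
                                             - 4.0 * u[1:-1, 1:-1]) / (h * h)
            return r

        def prolong(e):
            # Bilinear interpolation from an (m, n) coarse grid to (2m-1, 2n-1).
            m, n = e.shape
            fine = np.zeros((2 * m - 1, 2 * n - 1))
            fine[::2, ::2] = e
            fine[1::2, ::2] = 0.5 * (fine[:-2:2, ::2] + fine[2::2, ::2])
            fine[:, 1::2] = 0.5 * (fine[:, :-2:2] + fine[:, 2::2])
            return fine

        def v_cycle(u, f, h):
            if u.shape[0] <= 3:                      # coarsest level
                return smooth(u, f, h, sweeps=50)
            u = smooth(u, f, h)                      # pre-smooth
            r = residual(u, f, h)[::2, ::2]          # restrict residual (injection)
            u += prolong(v_cycle(np.zeros_like(r), r, 2 * h))  # coarse correction
            return smooth(u, f, h)                   # post-smooth

        n = 129                                      # 2^7 + 1 nodes per side
        f = np.ones((n, n)); u = np.zeros((n, n)); h = 1.0 / (n - 1)
        for _ in range(10):
            u = v_cycle(u, f, h)

    Each V-cycle reduces the error on all length scales at once, which is why multigrid converges in a grid-independent number of cycles; in a multi-GPU setting the smoothing and transfer operators are what get distributed across devices.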

    Numerical modeling of extrusion forming tools: improving its efficiency on heterogeneous parallel computers

    Master's dissertation in Informatics Engineering. Polymer processing usually requires several experimentation and calibration attempts before the final result reaches the desired quality. Because this incurs large costs, software applications have been developed to replace laboratory experimentation with computer-based simulations and hence lower those costs. This dissertation focuses on one of these applications, FlowCode, an application that aids the design of extrusion forming tools for plastics processing or the processing of other fluids. The original application had two versions of the code, one running sequentially on the CPU and the other on NVIDIA GPU devices. With the increasing use of heterogeneous platforms, many applications can now leverage the computational power of these platforms. Because doing so requires some expertise, mostly to schedule tasks/functions and to transfer the necessary data to the devices, several frameworks have been developed to aid development, StarPU being the one with the most international relevance, although others are emerging, such as the Dynamic Irregular Computing Environment (DICE). The main objectives of this dissertation were to improve FlowCode and to assess the use of one framework to develop an efficient heterogeneous version. Only the CPU version of the code was improved, first by applying techniques to the sequential version and then by parallelizing it with OpenMP on both multi-core CPUs (a 12-core Intel Xeon) and many-core devices (a 61-core Intel Xeon Phi). For the heterogeneous version, StarPU was chosen after a study of both the StarPU and DICE frameworks. Results show the parallel CPU version to be faster than the GPU one for all input datasets; the GPU code is far from efficient and requires several improvements, so comparing the devices with each other would not be fair. The Xeon Phi version proves to be the fastest when no framework is used. For the StarPU version, several schedulers were tested, leading to the most efficient one for our problem. Executing the code on two GPU devices is 1.7 times faster than executing the GPU version without the framework in one of the test cases. Adding the CPU to the GPUs of the testing environment does not improve execution time with most schedulers, due to the lack of available parallelism in the application. Globally, the StarPU version is the fastest, followed by the Xeon Phi, CPU, and GPU versions.
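
    StarPU itself is a C library, so the following Python sketch is only a language-neutral illustration of the coarse-grain pattern evaluated here: independent tasks pulled from a shared queue by heterogeneous workers, the idea behind an eager scheduling policy. The kernel functions are hypothetical stand-ins and no real GPU is driven.

        import queue
        import threading
        import numpy as np

        def cpu_kernel(block):
            # Stand-in for the OpenMP CPU code path (illustrative only).
            return np.tanh(block).sum()

        def gpu_kernel(block):
            # Stand-in for the CUDA code path; a real version would offload.
            return np.tanh(block).sum()

        tasks = queue.Queue()
        for block in np.array_split(np.random.rand(1_000_000), 32):
            tasks.put(block)                  # 32 independent work units

        results, lock = [], threading.Lock()

        def worker(kernel):
            # Each device pulls the next task as soon as it is idle, so faster
            # devices naturally process more tasks and the load balances itself.
            while True:
                try:
                    block = tasks.get_nowait()
                except queue.Empty:
                    return
                r = kernel(block)
                with lock:
                    results.append(r)

        devices = [cpu_kernel, gpu_kernel, gpu_kernel]   # 1 CPU + 2 GPUs
        threads = [threading.Thread(target=worker, args=(k,)) for k in devices]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        print(sum(results))

    When one worker is much slower than the others, or when tasks depend on each other, a greedy queue like this stops paying off, which matches the observation above that adding the CPU to the GPUs did not improve execution time for most schedulers.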