
    MATLAB Implementation of a Multigrid Solver for Diffusion Problems: Graphics Processing Unit vs. Central Processing Unit

    Graphics Processing Units (GPUs) are immensely powerful processors, and for a variety of applications they outperform the Central Processing Unit (CPU). Recent generations of GPUs have a more flexible architecture than older generations and a more user-friendly programming interface, which makes them better suited for general-purpose programming. A high-end GPU can give a desktop computer the same computational power as a small cluster of CPUs. Speedup of applications by using the GPU has been shown in a variety of research fields, including medicine, finance and earth science. 3D seismic imaging is extensively used in oil exploration, and imaging complex geological areas is a computationally heavy task involving terabytes of data. Seismic imaging software that utilizes GPUs is being developed by companies such as SeismicCity, which found that a 20x performance increase can be achieved by utilizing GPUs in their computer setup. In this thesis it is shown that the GPU architecture is well suited for solving partial differential equations on structured grids. A parallel multigrid algorithm that can harness the computational power of the GPU is implemented using Jacket. Jacket uses MATLAB syntax, which allows for more rapid development of algorithms. This does, however, come at a price: implementations developed in high-level languages are not as efficient as implementations developed in low-level languages such as C. The ideas used in multigrid have been adapted to solve a broad spectrum of problems involving structures that do not necessarily resemble any form of physical grid; they can, for example, be used to solve problems characterized by matrix structures, particle structures and lattice structures. The collection of methods that build on the same ideas as the multigrid method is often called multilevel methods, but there is no official unified term for them. The multigrid algorithm implemented in this thesis efficiently solves Poisson problems for homogeneous systems in 2 and 3 dimensions. The GPU implementation is 60 to 70 times faster than the equivalent CPU implementation, and can solve systems of size 257^3 in less than a second. Poisson solvers can be used to solve a variety of physical problems, either as a stand-alone solver or as part of another solver. In this thesis it is shown that the solver can be used in an application where porous convection is simulated; porous convection can describe the migration of ground water and hydrocarbons in the earth's crust.
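
    As a concrete illustration of the kind of structured-grid kernel such a solver maps to the GPU, the following is a minimal CUDA sketch of a damped-Jacobi smoothing sweep, the standard relaxation step inside a multigrid V-cycle for the 2D Poisson problem. The thesis implementation itself uses Jacket's MATLAB syntax; all names and parameters here (jacobi_smooth, omega, the choice of smoother) are illustrative assumptions, not taken from the thesis.

        // Damped-Jacobi smoother for -laplace(u) = f on a uniform nx-by-ny grid
        // with spacing h and fixed (Dirichlet) boundary values.
        #include <cstdio>
        #include <cuda_runtime.h>

        __global__ void jacobi_smooth(const float* u, float* u_new, const float* f,
                                      int nx, int ny, float h2, float omega)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            int j = blockIdx.y * blockDim.y + threadIdx.y;
            if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return; // boundary fixed

            int idx = j * nx + i;
            // 5-point stencil: u_ij = (u_E + u_W + u_N + u_S + h^2 * f_ij) / 4
            float jac = 0.25f * (u[idx - 1] + u[idx + 1] +
                                 u[idx - nx] + u[idx + nx] + h2 * f[idx]);
            u_new[idx] = u[idx] + omega * (jac - u[idx]); // damped update
        }

        int main()
        {
            const int nx = 257, ny = 257;               // one grid level
            const float h = 1.0f / (nx - 1);
            const size_t bytes = nx * ny * sizeof(float);
            float *d_u, *d_un, *d_f;
            cudaMalloc(&d_u, bytes);  cudaMalloc(&d_un, bytes);  cudaMalloc(&d_f, bytes);
            cudaMemset(d_u, 0, bytes); cudaMemset(d_un, 0, bytes);
            cudaMemset(d_f, 0, bytes);                  // zero source term for brevity

            dim3 block(16, 16), grid((nx + 15) / 16, (ny + 15) / 16);
            for (int sweep = 0; sweep < 3; ++sweep) {   // a few pre-smoothing sweeps
                jacobi_smooth<<<grid, block>>>(d_u, d_un, d_f, nx, ny, h * h, 0.8f);
                float* t = d_u; d_u = d_un; d_un = t;   // double buffering
            }
            cudaDeviceSynchronize();
            printf("smoothing done: %s\n", cudaGetErrorString(cudaGetLastError()));
            cudaFree(d_u); cudaFree(d_un); cudaFree(d_f);
            return 0;
        }

    In a full V-cycle this smoother runs on every level, with restriction of the residual to coarser grids and prolongation of corrections back to finer ones; only the per-level relaxation is sketched here.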

    Hybrid Analog-Digital Co-Processing for Scientific Computation

    In the past 10 years computer architecture research has moved toward more heterogeneity and less adherence to conventional abstractions. Scientists and engineers hold an unshakable belief that computing holds keys to unlocking humanity's Grand Challenges. Acting on that belief, they have looked deeper into computer architecture to find specialized support for their applications. Likewise, computer architects have looked deeper into circuits and devices in search of untapped performance and efficiency. The lines between computer architecture layers---applications, algorithms, architectures, microarchitectures, circuits and devices---have blurred. Against this backdrop, a menagerie of computer architectures is on the horizon: ones that forgo basic assumptions about computer hardware and require new thinking about how such hardware supports problems and algorithms. This thesis is about revisiting hybrid analog-digital computing in support of diverse modern workloads. Hybrid computing had extensive applications in early computing history, and has been revisited for small-scale applications in embedded systems, but architectural support for using hybrid computing in modern workloads, at scale and with high-accuracy solutions, has been lacking. I demonstrate solving a variety of scientific computing problems, including stochastic ODEs, partial differential equations, linear algebra, and nonlinear systems of equations, as case studies in hybrid computing. I solve these problems on a system of multiple prototype analog accelerator chips built by a team at Columbia University. On that team I made contributions toward programming the chips, building the digital interface, and validating the chips' functionality. The analog accelerator chip is intended for use in conjunction with a conventional digital host computer. The appeal and motivation for using an analog accelerator are efficiency and performance, but it comes with limitations in accuracy and problem size that we have to work around. The first problem is how to solve problems in this unconventional computation model. Scientific computing phrases problems as differential equations and algebraic equations. Differential equations are a continuous view of the world, while algebraic equations are a discrete one. Prior work in analog computing mostly focused on differential equations; algebraic equations played only a minor role. The secret to using the analog accelerator to support modern workloads on conventional computers is that these two viewpoints are interchangeable: the algebraic equations that underlie most workloads can be solved as differential equations, and differential equations are naturally solvable in the analog accelerator chip. A hybrid analog-digital computer architecture can therefore focus on solving linear and nonlinear algebra problems to support many workloads. The second problem is how to get accurate solutions using hybrid analog-digital computing. The reason the analog computation model gives less accurate solutions is that it gives up representing numbers as digital binary numbers, and instead uses the full range of analog voltage and current to represent real numbers. Prior work has established that encoding data in analog signals gives an energy-efficiency advantage as long as the analog data precision is limited.
    While the analog accelerator alone may be useful for energy-constrained applications where inputs and outputs are imprecise, we are more interested in using analog in conjunction with digital for precise solutions. This thesis gives the novel insight that the trick to do so is to solve nonlinear problems, where low-precision guesses are useful seeds for conventional digital algorithms. The third problem is how to solve large problems using hybrid analog-digital computing. The reason the analog computation model can't handle large problems is that it gives up step-by-step discrete-time operation, instead allowing variables to evolve smoothly in continuous time. To make that happen, the analog accelerator works by chaining hardware for mathematical operations end-to-end; during computation, analog data flows through the hardware with no overheads in control logic or memory accesses. The downside is that the needed hardware grows alongside the problem size. While scientific computing researchers have long split large problems into smaller subproblems to fit digital computer constraints, this thesis is a first attempt to consider these divide-and-conquer algorithms as an essential tool in using the analog model of computation. As we enter the post-Moore's law era of computing, unconventional architectures will offer specialized models of computation that uniquely support specific problem types. Two prominent examples are deep neural networks and quantum computers. Recent trends in computer science research show these unconventional architectures will soon have broad adoption. In this thesis I show that another specialized, unconventional architecture, the analog accelerator, can solve problems in scientific computing. Computer architecture researchers will discover other important models of computation in the future. This thesis is an example of the discovery process, implementation, and evaluation of how an unconventional architecture supports specialized workloads.
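
    The algebraic-to-differential reduction described above can be sketched in ordinary code. The following CUDA kernel is hypothetical and not the thesis's hardware or software: on the real chip the integration happens physically in continuous time, while here the gradient-flow ODE dx/dt = b - Ax is imitated with forward-Euler steps. For a symmetric positive-definite A the ODE settles at x = A^{-1}b, and a low-precision steady state of this kind is exactly the guess a digital refinement loop would then polish.

        __global__ void ode_step(const float* A, const float* b, const float* x,
                                 float* x_new, int n, float dt)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float r = b[i];
            for (int j = 0; j < n; ++j)
                r -= A[i * n + j] * x[j];   // residual component (b - Ax)_i
            x_new[i] = x[i] + dt * r;       // forward-Euler step of dx/dt = b - Ax
        }

        // Host loop (sketch): step until the residual norm stalls, then hand the
        // low-precision x to a digital solver for iterative refinement.
        //   for (int step = 0; step < steps; ++step) {
        //       ode_step<<<(n + 255) / 256, 256>>>(d_A, d_b, d_x, d_xn, n, dt);
        //       std::swap(d_x, d_xn);
        //   }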

    Real-time 3D rendering of water using CUDA

    This thesis addresses the real-time simulation of 3D water, both on the CPU and on the GPU. The stable fluids method is extended to 3D and implemented on both processors. The GPU-based implementation is done using the NVIDIA Compute Unified Device Architecture (CUDA) API (Application Programming Interface). The stable fluids method requires the use of an iterative sparse linear system solver; therefore, three solvers were implemented on both CPU and GPU, namely Jacobi, Gauss-Seidel, and Conjugate Gradient solvers. Rendering of the water or its velocities, of the moving obstacles, of the static obstacles, and of the world is done using Vertex Buffer Objects (VBOs). In the CPU-based version standard OpenGL VBOs are used, while in the GPU-based version both OpenGL-CUDA interoperability VBOs and standard OpenGL VBOs are used.
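
    The interoperability path mentioned above can be sketched as follows. This is a minimal, hypothetical example (assuming an existing OpenGL context and a VBO of n float4 vertices; it is not the thesis's actual renderer) of registering a VBO with CUDA, mapping it each frame, and letting a kernel write vertex data directly, so the simulated data never has to round-trip through the CPU.

        #include <GL/gl.h>
        #include <cuda_gl_interop.h>

        __global__ void write_vertices(float4* verts, const float4* pos, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i < n) verts[i] = pos[i];   // copy simulated positions into the VBO
        }

        void update_vbo(GLuint vbo, const float4* d_pos, int n)
        {
            static cudaGraphicsResource* res = nullptr;
            if (!res)   // one-time registration of the GL buffer with CUDA
                cudaGraphicsGLRegisterBuffer(&res, vbo,
                                             cudaGraphicsRegisterFlagsWriteDiscard);
            cudaGraphicsMapResources(1, &res, 0);
            float4* verts = nullptr;
            size_t bytes = 0;
            cudaGraphicsResourceGetMappedPointer((void**)&verts, &bytes, res);
            write_vertices<<<(n + 255) / 256, 256>>>(verts, d_pos, n);
            cudaGraphicsUnmapResources(1, &res, 0);   // OpenGL may now draw the VBO
        }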

    High-performance computing and communication models for solving the complex interdisciplinary problems on DPCS

    The paper presents some advanced high-performance computing (HPC) and parallel computing (PC) methodologies for solving large, complex problems that span different research areas. About eight interdisciplinary problems will be accurately solved on multiple computers communicating over a local area network. The mathematical modeling and large sparse simulation of this interdisciplinary effort involve the areas of science, engineering, biomedicine, nanotechnology, software engineering, agriculture, image processing and urban planning. The specific PC software methodologies under consideration include PVM, MPI, LUNA, MDC, OpenMP, CUDA and LINDA, integrated with COMSOL and C++/C. There are different communication models of parallel programming, so some definitions of parallel processing, distributed processing and memory types are explained to clarify the main contribution of this paper. The matching between a PC methodology and a large sparse application depends on the domain of the solution, the dimension of the targeted area, the computational and communication pattern, the architecture of the distributed parallel computing system (DPCS), the structure of the computational complexity and the communication cost. The originality of this paper lies in obtaining a complex numerical model dealing with a large-scale partial differential equation (PDE), discretization by finite difference (FDM) or finite element (FEM) methods, numerical simulation, high-performance simulation and performance measurement. The simulation of the PDE is performed by sequential and parallel algorithms to visualize the complex model in high resolution. In the context of the mathematical model, various independent and dependent parameters represent the complex, real phenomena of the interdisciplinary application. As a model executes, these parameters can be manipulated and changed, and as a result some chemical or mechanical properties can be predicted from the observed parameter changes. The methodologies of the parallel programs build on the client-server, master-slave and fragmented models. The HPC communication models for solving the interdisciplinary problems above are analyzed using the flow of the algorithm, numerical analysis and a comparison of parallel performance evaluations. In conclusion, the integration of HPC, communication models, PC software, performance and numerical analysis is an important approach to fulfilling the matching requirement and optimizing the solution of complex interdisciplinary problems.
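
    As a minimal sketch of the message-passing communication pattern such FDM simulations rely on (illustrative only; the paper's own codes, problem sizes and software stack differ), the following MPI program advances a 1D heat equation with each rank owning one block of the grid and exchanging a single halo cell with each neighbour per time step. It is plain host code, compiled with mpicxx.

        #include <mpi.h>
        #include <vector>

        int main(int argc, char** argv)
        {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const int nlocal = 1000;                  // interior points per rank
            std::vector<double> u(nlocal + 2, 0.0), un(nlocal + 2, 0.0);
            int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
            int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
            if (rank == 0) u[0] = 1.0;                // fixed left boundary in halo cell
            const double r = 0.25;                    // dt*alpha/dx^2, stable if <= 0.5

            for (int step = 0; step < 100; ++step) {
                // Exchange halo cells with neighbours (no-op at physical boundaries).
                MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                             &u[nlocal + 1], 1, MPI_DOUBLE, right, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Sendrecv(&u[nlocal], 1, MPI_DOUBLE, right, 1,
                             &u[0], 1, MPI_DOUBLE, left, 1,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                // Explicit FDM update on the local subdomain.
                for (int i = 1; i <= nlocal; ++i)
                    un[i] = u[i] + r * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
                if (rank == 0) un[0] = u[0];          // keep Dirichlet value
                u.swap(un);
            }
            MPI_Finalize();
            return 0;
        }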

    A GPU-based Laplacian Solver for Magnetostatic Boundary Value Problems

    Modern graphics processing units (GPUs) have more computing power than CPUs, and are thus proposed as more efficient compute units for solving scientific problems with large, parallelizable computational loads. In our study, we present a GPU algorithm to solve a magnetostatic boundary value problem which exhibits parallel properties. In particular, we solve the Laplace equation to find the magnetic scalar potential in the region between two coaxial cylinders. This requires discretizing the problem domain into small cells and finding the solution at each node of the generated mesh. The smaller the cell size, the more accurate the solution; a more accurate solution leads to a better estimate of the surface current needed to generate a uniform magnetic field inside the inner cylinder, which is the final goal. Although solving a mesh with a large number of smaller cells is computationally intensive, GPU computing provides techniques to accelerate performance. The problem domain is discretized using the finite difference method (FDM) and the linear system of equations obtained from the FDM is solved by the successive over-relaxation (SOR) method. The parallel program is implemented using the CUDA framework. The performance of the parallel algorithm is optimized using several CUDA optimization strategies, and the speedup of the parallel GPU implementation over the sequential CPU implementation is provided.
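
    A minimal CUDA sketch of the FDM + SOR update described above: the standard red-black colouring makes the inherently sequential SOR sweep parallel, since all nodes of one colour depend only on nodes of the other colour. The kernel below assumes a Cartesian grid with fixed boundary values for simplicity; the study's coaxial-cylinder geometry, boundary conditions and optimization strategies are not reproduced, and all names are illustrative.

        __global__ void sor_color_sweep(float* phi, int nx, int ny,
                                        float omega, int color)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            int j = blockIdx.y * blockDim.y + threadIdx.y;
            if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;
            if (((i + j) & 1) != color) return;   // update one colour per sweep

            int idx = j * nx + i;
            // Gauss-Seidel value from the 5-point Laplace stencil...
            float gs = 0.25f * (phi[idx - 1] + phi[idx + 1] +
                                phi[idx - nx] + phi[idx + nx]);
            // ...over-relaxed by omega (1 < omega < 2 accelerates convergence).
            phi[idx] = phi[idx] + omega * (gs - phi[idx]);
        }

        // One full SOR iteration is two kernel launches:
        //   sor_color_sweep<<<grid, block>>>(d_phi, nx, ny, 1.9f, 0); // red nodes
        //   sor_color_sweep<<<grid, block>>>(d_phi, nx, ny, 1.9f, 1); // black nodes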

    GPU Accelerated Three Dimensional Unstructured Geometric Multi-Grid Solver

    Consider a set of points P in three-dimensional Euclidean space. For each point p, the neighbourhood N(p) is defined as the set of points in P that are its Voronoi neighbours. Each point in P represents a variable whose value depends on the values in its neighbourhood: it is given by the sum of the values of the points in its neighbourhood scaled by predefined constants, where the constants depend on the spacing between the points. The problem is to solve for all the variables. Such representations arise naturally when solving flow equations in Computational Fluid Dynamics on domains represented by unstructured meshes, and the problem reduces to solving a system of linear equations. In this work a geometric multigrid method is implemented to solve the problem faster. Solving this problem on very large inputs is time consuming; the inputs considered here have sizes of the order of millions of points. Graphics Processing Units (GPUs) are dedicated parallel processors which serve both as programmable graphics processors and as scalable parallel computing platforms. Parallelizing this problem for GPUs is not straightforward because of its irregularity. The CFD problems used for the experiments are the steady and unsteady heat transfer problems on 3D unstructured meshes. The combination of the multigrid algorithm and the GPU implementation for the steady problem on a 1.6 million point mesh gives a 1630x speedup compared to a non-multigrid CPU implementation.
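
    A minimal CUDA sketch of the irregular gather that makes this problem harder to parallelize than the structured-grid case: one Jacobi relaxation sweep with the Voronoi-neighbour graph stored in compressed sparse row (CSR) form. The data layout and names here are assumptions, not the thesis's; in a geometric multigrid cycle this smoother would run on every level, with restriction and prolongation operators moving residuals and corrections between coarser and finer point sets.

        // For node i the neighbour indices are nbr[off[i] .. off[i+1]) with
        // weights w over the same range, and the equation at node i is
        // diag_i * x_i - sum_k w_k * x_nbr_k = b_i, so the Jacobi update is
        // x_i <- (b_i + sum_k w_k * x_nbr_k) / diag_i.
        __global__ void unstructured_jacobi(const int* off, const int* nbr,
                                            const float* w, const float* diag,
                                            const float* b, const float* x,
                                            float* x_new, int n)
        {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= n) return;
            float sum = b[i];
            for (int k = off[i]; k < off[i + 1]; ++k)   // irregular per-node fan-in
                sum += w[k] * x[nbr[k]];                // gather from Voronoi neighbours
            x_new[i] = sum / diag[i];
        }

    Unlike the structured-grid stencil, the neighbour loop here has a variable trip count and scattered memory accesses, which is the irregularity the thesis's GPU implementation has to contend with.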

    Integration of a big data emerging on large sparse simulation and its application on green computing platform

    The processes of analyzing large data and verifying a big data set are a challenge for understanding the fundamental concepts behind them. Many big data analysis techniques suffer from the poor scalability, variation inequality, instability, slow convergence, and weak accuracy of large-scale numerical algorithms. These limitations open a wide opportunity for numerical analysts to develop efficient and novel parallel algorithms. Big data analytics plays an important role in science and engineering for extracting patterns, trends and actionable information from large sets of data and for improving decision-making strategies. A large data set may consist of large-scale data collected via sensor networks, transformations from signals to digital images, the output of a high-resolution sensing system, industry forecasts, or existing customer records used to predict trends and prepare for new demand. This paper proposes three types of big data analytics, in accordance with the analytics requirements, involving large-scale numerical simulation and mathematical modeling for solving a complex problem. The first is big data analytics for the theory and fundamentals of nanotechnology numerical simulation. The second is big data analytics for enhancing digital images in 3D visualization and for performance analysis of embedded systems based on the large sparse data sets generated by the device. The last is the extraction of patterns from electroencephalogram (EEG) data sets for detecting horizontal and vertical eye movements. The process of examining big data analytics is thus to investigate the behavior of hidden patterns and unknown correlations, identify anomalies, discover structure inside unstructured data, and extract the essence, with trend prediction, multi-dimensional visualization and real-time observation using the mathematical model. Parallel algorithms, mesh generation, domain-function decomposition approaches, inter-node communication design, mapping of subdomains, numerical analysis and parallel performance evaluations (PPE) are the processes of the big data analytics implementation. The superiority of parallel numerical methods such as AGE, Brian and IADE was proven for solving a large sparse model on a green computing platform built from obsolete computers, old-generation servers and outdated hardware, with distributed virtual memory and multiple processors. The integration of low-cost message-passing communication software and the green computing platform is capable of increasing the PPE by up to 60% compared to the limited memory of a single processor. In conclusion, large-scale numerical algorithms with good scalability, equality, stability, convergence, and accuracy are important features in analyzing big data simulations.
