
    On Moving Least Squares Based Flow Visualization

    Modern simulation and measurement methods tend to produce meshfree data sets when processes or objects with free surfaces or boundaries are modeled. In Computational Fluid Dynamics (CFD), such data sets are described by particle-based vector fields. This paper presents a summary of selected methods for the extraction of geometric features from such point-based vector fields, pointing out their challenges, limitations, and applications.
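    Moving least squares, the technique named in the title, fits a locally weighted polynomial around each query point of the scattered data. The sketch below is an illustration of that idea for one scalar component of a point-based field, not code from the paper; the Gaussian weight, the bandwidth `h`, and all names are our assumptions.

```python
import numpy as np

def mls_eval(xq, pts, vals, h=0.3):
    """Moving-least-squares value at query point xq: a weighted linear fit
    to scattered samples (pts, vals) with Gaussian weights of bandwidth h."""
    d = pts - xq                                # shift the basis to the query point
    w = np.exp(-np.sum(d**2, axis=1) / h**2)    # Gaussian weights
    B = np.hstack([np.ones((len(pts), 1)), d])  # linear basis [1, dx, dy]
    A = B.T @ (w[:, None] * B)                  # weighted normal equations
    b = B.T @ (w * vals)
    return np.linalg.solve(A, b)[0]             # constant term = value at xq

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pts = rng.uniform(size=(200, 2))
    vals = 2.0 * pts[:, 0] + 3.0 * pts[:, 1] + 1.0   # a linear test field
    # MLS with a linear basis reproduces linear fields exactly:
    print(mls_eval(np.array([0.4, 0.6]), pts, vals))  # ~ 2*0.4 + 3*0.6 + 1 = 3.6
```

    Per-component application of the same fit gives a vector-field approximation, from which geometric features (e.g. derivatives for critical-point extraction) follow.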

    Parallel computation of 3-D electromagnetic scattering using finite elements

    The finite element method (FEM) with local absorbing boundary conditions has recently been applied to compute electromagnetic scattering from large 3-D geometries. In this paper, we present details pertaining to code implementation and optimization. Various types of sparse matrix storage schemes are discussed and their performance is examined in terms of vectorization and net storage requirements. The system of linear equations is solved using a preconditioned biconjugate gradient (BCG) algorithm, and a fairly detailed study of existing point and block preconditioners (diagonal and incomplete LU) is carried out. A modified ILU preconditioning scheme is also introduced which works better than the traditional version for our matrix systems. The parallelization of the iterative sparse solver and the matrix generation/assembly as implemented on the KSR1 multiprocessor is described, and the interprocessor communication patterns are analysed in detail. Near-linear speed-up is obtained for both the iterative solver and the matrix generation/assembly phases. Results are presented for a problem having 224,476 unknowns and validated by comparison with measured data.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/50413/1/1660070504_ftp.pd

    Hypercube matrix computation task

    A major objective of the Hypercube Matrix Computation effort at the Jet Propulsion Laboratory (JPL) is to investigate the applicability of a parallel computing architecture to the solution of large-scale electromagnetic scattering problems. Three scattering analysis codes are being implemented and assessed on a JPL/California Institute of Technology (Caltech) Mark 3 Hypercube. The codes, which utilize different underlying algorithms, give a means of evaluating the general applicability of this parallel architecture. The three analysis codes being implemented are a frequency domain method of moments code, a time domain finite difference code, and a frequency domain finite elements code. These analysis capabilities are being integrated into an electromagnetics interactive analysis workstation which can serve as a design tool for the construction of antennas and other radiating or scattering structures. The first two years of work on the Hypercube Matrix Computation effort are summarized, including both new developments and results as well as work previously reported in the Hypercube Matrix Computation Task: Final Report for 1986 to 1987 (JPL Publication 87-18).

    Nodal Discontinuous Galerkin Methods on Graphics Processors

    Discontinuous Galerkin (DG) methods for the numerical solution of partial differential equations have enjoyed considerable success because they are both flexible and robust: They allow arbitrary unstructured geometries and easy control of accuracy without compromising simulation stability. Lately, another property of DG has been growing in importance: The majority of a DG operator is applied in an element-local way, with weak penalty-based element-to-element coupling. The resulting locality in memory access is one of the factors that enables DG to run on off-the-shelf, massively parallel graphics processors (GPUs). In addition, DG's high-order nature lets it require fewer data points per represented wavelength and hence fewer memory accesses, in exchange for higher arithmetic intensity. Both of these factors work significantly in favor of a GPU implementation of DG. Using a single US$400 Nvidia GTX 280 GPU, we accelerate a solver for Maxwell's equations on a general 3D unstructured grid by a factor of 40 to 60 relative to a serial computation on a current-generation CPU. In many cases, our algorithms exhibit full use of the device's available memory bandwidth. Example computations achieve and surpass 200 gigaflops/s of net application-level floating point work. In this article, we describe and derive the techniques used to reach this level of performance. In addition, we present comprehensive data on the accuracy and runtime behavior of the method.
    Comment: 33 pages, 12 figures, 4 tables
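    The element-local structure emphasized above means each element applies a small dense operator to its own nodal values, independently of every other element, which is exactly the batched-matrix pattern that maps well onto GPUs. A hedged NumPy illustration of that pattern (the sizes and names are assumptions, not the authors' code):

```python
import numpy as np

def apply_elementwise(D, u):
    """Apply the Np x Np local operator D to every element's nodal values.

    u has shape (K, Np): K elements, Np nodes per element.  No data from a
    neighboring element is touched, so memory access stays element-local.
    """
    return u @ D.T   # one batched matrix product over all K elements

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, Np = 1000, 10                          # assumed sizes, illustration only
    D = rng.standard_normal((Np, Np))         # stand-in local operator matrix
    u = rng.standard_normal((K, Np))
    v = apply_elementwise(D, u)
    # Same result as an explicit per-element loop:
    v_loop = np.stack([D @ u[k] for k in range(K)])
    print(np.allclose(v, v_loop))
```

    On a GPU the batched product becomes one kernel launch with fully coalesced, element-local memory traffic; only the penalty-based flux terms need neighbor data.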

    Solution of partial differential equations on vector and parallel computers

    The present status of numerical methods for partial differential equations on vector and parallel computers is reviewed. The relevant aspects of these computers are discussed, and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations, as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods, as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
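    Explicit methods of the kind surveyed here vectorize naturally, because each grid update reads only a fixed set of neighbors. A minimal sketch (assumed names, not from the review) of one forward-time centered-space step for the 1-D heat equation u_t = u_xx:

```python
import numpy as np

def ftcs_step(u, r):
    """One explicit FTCS step on a 1-D grid with fixed (Dirichlet) endpoints.
    r = dt/dx**2 must satisfy r <= 0.5 for stability."""
    un = u.copy()
    un[1:-1] = u[1:-1] + r * (u[2:] - 2.0 * u[1:-1] + u[:-2])  # vectorized stencil
    return un

if __name__ == "__main__":
    N, r = 65, 0.4
    x = np.linspace(0.0, 1.0, N)
    u = np.sin(np.pi * x)                    # a single sine mode
    for _ in range(10):
        u = ftcs_step(u, r)
    # The discrete mode decays by g = 1 - 4 r sin^2(pi*dx/2) per step, exactly:
    g = 1.0 - 4.0 * r * np.sin(np.pi / (2 * (N - 1))) ** 2
    print(np.allclose(u, g**10 * np.sin(np.pi * x)))
```

    The whole-array slice update is the kind of operation vector hardware executes at full rate, whereas implicit methods trade this simplicity for larger stable time steps at the cost of a linear solve per step.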

    Graph Contraction for Mapping Data on Parallel Computers: A Quality–Cost Tradeoff


    A Parallel Monte Carlo Code for Simulating Collisional N-body Systems

    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N~10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and assumes spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures, and the introduction of a parallel random number generation scheme, as well as a parallel sorting algorithm, required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce, along with our choice of decomposition scheme, minimize communication costs and ensure optimal distribution of data and workload among the processing units. The implementation uses the Message Passing Interface (MPI) library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse, with the number of stars, N, spanning three orders of magnitude, from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within less than 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N=10^5, 128 for N=10^6, and 256 for N=10^7. The runtime reaches a saturation with the addition of more processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60x, 100x, and 220x, respectively.
    Comment: 53 pages, 13 figures, accepted for publication in ApJ Supplement
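    The Plummer initial conditions used in the validation runs can be sampled by inverting the model's enclosed-mass fraction, M(r)/M = r^3 / (r^2 + a^2)^(3/2). The following is a hedged serial illustration in NumPy, not the paper's MPI code; the function name and the clipping guard are our assumptions.

```python
import numpy as np

def sample_plummer_radii(n, a=1.0, rng=None):
    """Draw n radii from a Plummer sphere of scale length a by inverse
    transform sampling of the enclosed-mass fraction."""
    rng = rng or np.random.default_rng()
    # Clip away the endpoints to avoid r = 0 and r = infinity (an assumption
    # made here for numerical safety, not part of the model).
    u = np.clip(rng.random(n), 1e-12, 1.0 - 1e-12)
    # Inverting u = r^3/(r^2+a^2)^(3/2) gives r = a / sqrt(u^(-2/3) - 1).
    return a / np.sqrt(u ** (-2.0 / 3.0) - 1.0)

if __name__ == "__main__":
    r = sample_plummer_radii(200_000, a=1.0, rng=np.random.default_rng(42))
    # The mass fraction inside r = a should approach 1/2^(3/2) ~= 0.3536.
    print(np.mean(r <= 1.0))
```

    Adding isotropic angles and velocities drawn from the Plummer distribution function completes a full set of initial conditions.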

    A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters

    In this work, we consider the solution of boundary integral equations by means of a scalable hierarchical matrix approach on clusters equipped with graphics hardware, i.e. graphics processing units (GPUs). To this end, we extend our existing single-GPU hierarchical matrix library hmglib such that it is able to scale on many GPUs and such that it can be coupled to arbitrary application codes. Using a model GPU implementation of a boundary element method (BEM) solver, we are able to achieve more than 67 percent relative parallel speed-up going from 128 to 1024 GPUs for a model geometry test case with 1.5 million unknowns and a real-world geometry test case with almost 1.2 million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6 minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the setup phase and 20 seconds for the iterative solver. To the best of the authors' knowledge, this is the first fully GPU-based, distributed-memory parallel, open-source hierarchical matrix library using the traditional H-matrix format and adaptive cross approximation, with an application to BEM problems.
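    The adaptive cross approximation mentioned above builds a low-rank factorization of an admissible matrix block from a few of its rows and columns, never forming the block densely. A serial, hedged sketch (not the hmglib implementation; all names and the stopping rule are assumptions):

```python
import numpy as np

def aca(get_row, get_col, m, n, tol=1e-7, max_rank=60):
    """Partial-pivoted adaptive cross approximation: A ~= U @ V, built by
    sampling individual rows and columns of A."""
    us, vs = [], []
    i = 0                                   # first pivot row (heuristic choice)
    for _ in range(max_rank):
        r = get_row(i).astype(float)        # residual of row i
        for u, v in zip(us, vs):
            r -= u[i] * v
        j = int(np.argmax(np.abs(r)))       # pivot column
        if abs(r[j]) < tol:                 # absolute pivot tolerance (assumed)
            break
        v = r / r[j]                        # normalized residual row
        c = get_col(j).astype(float)        # residual of column j
        for u, vv in zip(us, vs):
            c -= vv[j] * u
        us.append(c)
        vs.append(v)
        c_abs = np.abs(c)
        c_abs[i] = -1.0                     # next pivot row: largest new entry
        i = int(np.argmax(c_abs))
    U = np.array(us).T if us else np.zeros((m, 0))
    V = np.array(vs) if vs else np.zeros((0, n))
    return U, V

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 100)
    y = np.linspace(2.0, 3.0, 100)
    kernel = lambda i, j: 1.0 / (1.0 + x[i] + y[j])   # smooth far-field kernel
    U, V = aca(lambda i: kernel(i, np.arange(100)),
               lambda j: kernel(np.arange(100), j), 100, 100)
    A = kernel(np.arange(100)[:, None], np.arange(100)[None, :])
    print(U.shape[1], np.linalg.norm(A - U @ V) / np.linalg.norm(A))
```

    Because only O(k(m+n)) entries are ever evaluated, such blocks can be assembled and stored cheaply on each GPU, which is what makes the H-matrix setup phase scalable.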

    Computation of 3D Frequency-Domain Waveform Kernels for c(x,y,z) Media

    Seismic tomography, as typically practiced on the exploration, crustal, and global scales, considers only the arrival times of selected sets of phases and relies primarily on WKBJ theory during inversion. Since the mid-1980s, researchers have explored, largely on a theoretical level, the possibility of inverting the entire seismic record. Due to the ongoing advances in CPU performance, full waveform inversion is finally becoming feasible on select problems, with promising results emerging from frequency-domain methods. However, frequency-domain techniques using sparse direct solvers are currently constrained by memory limitations in 3D, where they exhibit an O(n^4) worst-case bound on memory usage. We sidestep this limitation by using a hybrid approach, calculating frequency-domain Green's functions for the scalar wave equation by driving a high-order, time-domain, finite-difference (FDTD) code to steady state using a periodic source. The frequency-domain response is extracted using the phase sensitive detection (PSD) method recently developed by Nihei and Li (2006). The resulting algorithm has an O(n^3) memory footprint and is amenable to parallelization in the space, shot, or frequency domains. We demonstrate this approach by generating waveform inversion kernels for fully c(x,y,z) models. Our test examples include a realistic VSP experiment using the geometry and velocity models obtained from a site in Western Wyoming, and a deep crustal reflection/refraction profile based on the LARSE II geometry and the SCEC community velocity model. We believe that our 3D solutions to the scalar Helmholtz equation, for models with upwards of 100 million degrees of freedom, are the largest examples documented in the open geophysical literature. Such results suggest that iterative 3D waveform inversion is an achievable goal in the near future.
    Shell GameChanger
    Massachusetts Institute of Technology. Earth Resources Laborator
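    The phase-sensitive detection step can be illustrated in isolation: once the periodically driven field is at steady state, multiplying it by a complex reference at the drive frequency and averaging over an integer number of periods recovers the complex frequency-domain amplitude, since the double-frequency term averages to zero. A hedged one-trace sketch (assumed names and test values, not the authors' FDTD code):

```python
import numpy as np

def psd_extract(s, t, omega):
    """Complex amplitude A*exp(i*phi) of a steady-state signal
    s(t) = A*cos(omega*t + phi), sampled over whole periods."""
    # The A*cos term splits into e^{+i omega t} and e^{-i omega t} halves;
    # demodulating and averaging kills the double-frequency half exactly.
    return 2.0 * np.mean(s * np.exp(-1j * omega * t))

if __name__ == "__main__":
    omega = 2.0 * np.pi * 5.0          # 5 Hz drive (assumed test values)
    dt = 0.01                          # 20 samples per period
    t = np.arange(160) * dt            # exactly 8 periods
    s = 1.7 * np.cos(omega * t + 0.6)
    z = psd_extract(s, t, omega)
    print(abs(z), np.angle(z))         # amplitude ~1.7, phase ~0.6
```

    Applied at every grid point, this yields the monochromatic Green's function while the FDTD code only ever stores the O(n^3) time-domain field.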

    Computer algebra and transputers applied to the finite element method

    Recent developments in computing technology have opened new prospects for computationally intensive numerical methods such as the finite element method. More complex and refined problems can be solved, for example by increasing the number and order of the elements to improve accuracy. The power of Computer Algebra systems and parallel processing techniques is expected to bring significant improvement in such methods. The main objective of this work has been to assess the use of these techniques in the finite element method. The generation of interpolation functions and element matrices has been investigated using Computer Algebra. Symbolic expressions were obtained automatically and efficiently converted into FORTRAN routines. Shape functions based on Lagrange polynomials and mapping functions for infinite elements were considered. One and two dimensional element matrices for bending problems based on Hermite polynomials were also derived. Parallel solvers for systems of linear equations have been developed, since such systems often arise in numerical methods. Both symmetric and asymmetric solvers have been considered. The implementation was on Transputer-based machines. The speed-ups obtained are good. An analysis by the finite element method of a free surface flow over a spillway has been carried out. Computer Algebra was used to derive the integrand of the element matrices, and their numerical evaluation was done in parallel on a Transputer-based machine. A graphical interface was developed to enable the visualisation of the free surface and the influence of the parameters. The speed-ups obtained were good. Convergence of the iterative solution method used was good for gated spillways. Some problems experienced with the non-gated spillways have led to a discussion and tests of the potential factors of instability.
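    The symbolic-to-numeric workflow described above (derive shape functions and element matrices algebraically, then emit numeric routines) can be sketched with SymPy as a modern stand-in for the Computer Algebra system and FORTRAN generation used in the thesis; all names here are assumptions.

```python
import numpy as np
import sympy as sp

x, h = sp.symbols("x h", positive=True)
# Linear Lagrange shape functions on the element [0, h]:
N = sp.Matrix([1 - x / h, x / h])
# Consistent element mass matrix, M_ij = integral over [0, h] of N_i * N_j:
M = sp.integrate(N * N.T, (x, 0, h))
# Convert the symbolic result into a numeric routine (the FORTRAN role here):
M_num = sp.lambdify(h, M, "numpy")

if __name__ == "__main__":
    # Entries come out as h/3 on the diagonal and h/6 off-diagonal.
    print(M_num(2.0))
```

    Higher-order Lagrange or Hermite bases follow the same pattern, with the symbolic integration absorbing the tedium of deriving each entry by hand.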