29 research outputs found

    A fast algorithm for solving diagonally dominant symmetric quasi-pentadiagonal Toeplitz linear systems

    In this paper, we develop a new algorithm for solving diagonally dominant symmetric quasi-pentadiagonal Toeplitz linear systems. Numerical experiments are given in order to illustrate the validity and efficiency of our algorithm. The authors would like to thank the support of the Portuguese Funds through FCT – Fundação para a Ciência e a Tecnologia, within the Project UID/MAT/00013/2013.
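    The paper's fast algorithm is not reproduced in this listing, but as a point of reference, such a system can be solved by plain banded LU without pivoting, which is safe precisely because of the diagonal dominance. A minimal sketch (function name and interface invented for illustration; assumes n >= 3):

```python
def solve_pentadiagonal_toeplitz(a, b, c, f):
    """Solve T x = f, where T is symmetric pentadiagonal Toeplitz:
    main diagonal a, first off-diagonals b, second off-diagonals c.
    Banded LU without pivoting; diagonal dominance keeps pivots safe."""
    n = len(f)
    d = [a] * n            # main diagonal
    u1 = [b] * n           # superdiagonal entries (i, i+1)
    u2 = [c] * n           # second superdiagonal entries (i, i+2)
    l1 = [b] * n           # subdiagonal entries (i+1, i)
    l2 = [c] * n           # second subdiagonal entries (i+2, i)
    x = list(f)
    for i in range(n - 1):
        m = l1[i] / d[i]                 # eliminate entry (i+1, i)
        d[i + 1] -= m * u1[i]
        if i + 1 < n - 1:
            u1[i + 1] -= m * u2[i]
        x[i + 1] -= m * x[i]
        if i < n - 2:
            m = l2[i] / d[i]             # eliminate entry (i+2, i)
            l1[i + 1] -= m * u1[i]
            d[i + 2] -= m * u2[i]
            x[i + 2] -= m * x[i]
    x[n - 1] /= d[n - 1]                 # back substitution
    x[n - 2] = (x[n - 2] - u1[n - 2] * x[n - 1]) / d[n - 2]
    for i in range(n - 3, -1, -1):
        x[i] = (x[i] - u1[i] * x[i + 1] - u2[i] * x[i + 2]) / d[i]
    return x
```

    A structure-exploiting fast algorithm, such as the one the paper develops, can improve on the constants and storage of this generic banded elimination.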

    A low-cost parallel implementation of direct numerical simulation of wall turbulence

    A numerical method for the direct numerical simulation of incompressible wall turbulence in rectangular and cylindrical geometries is presented. Its distinctive feature is a design targeted at efficient distributed-memory parallel computing on commodity hardware. The adopted discretization is spectral in the two homogeneous directions; fourth-order accurate, compact finite-difference schemes over a variable-spacing mesh in the wall-normal direction are key to our parallel implementation. The parallel algorithm is designed to minimize data exchange among the computing machines, and in particular to avoid taking a global transpose of the data during the pseudo-spectral evaluation of the non-linear terms. The computing machines can then be connected to each other through low-cost network devices. The code is optimized for memory requirements, which can moreover be subdivided among the computing nodes. The layout of a simple, dedicated and optimized computing system based on commodity hardware is described. The performance of the numerical method on this computing system is evaluated and compared with that of other codes described in the literature, as well as with that of the same code implementing a commonly employed strategy for the pseudo-spectral calculation. To be published in the Journal of Computational Physics.
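    The fourth-order compact finite-difference schemes mentioned above are a key ingredient: each derivative evaluation reduces to a scalar tridiagonal solve along a grid line, which keeps the work process-local. A one-dimensional sketch on a uniform mesh (the standard Padé interior scheme with third-order one-sided closures; illustrative only, not the paper's actual variable-spacing discretization):

```python
import math

def thomas(lo, di, up, rhs):
    """Solve a tridiagonal system in O(n); no pivoting."""
    n = len(rhs)
    di = list(di)
    rhs = list(rhs)
    for i in range(1, n):
        m = lo[i - 1] / di[i - 1]
        di[i] -= m * up[i - 1]
        rhs[i] -= m * rhs[i - 1]
    x = [0.0] * n
    x[-1] = rhs[-1] / di[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (rhs[i] - up[i] * x[i + 1]) / di[i]
    return x

def compact_derivative(f, h):
    """Fourth-order compact (Pade) first derivative on a uniform grid:
    f'_{i-1} + 4 f'_i + f'_{i+1} = 3 (f_{i+1} - f_{i-1}) / h,
    closed at the ends with third-order one-sided relations."""
    n = len(f)
    lo = [1.0] * (n - 1)
    di = [4.0] * n
    up = [1.0] * (n - 1)
    rhs = [0.0] * n
    for i in range(1, n - 1):
        rhs[i] = 3.0 * (f[i + 1] - f[i - 1]) / h
    di[0], up[0] = 1.0, 2.0              # closure: f'_0 + 2 f'_1 = ...
    rhs[0] = (-2.5 * f[0] + 2.0 * f[1] + 0.5 * f[2]) / h
    di[-1], lo[-1] = 1.0, 2.0            # mirrored closure at the far end
    rhs[-1] = (2.5 * f[-1] - 2.0 * f[-2] - 0.5 * f[-3]) / h
    return thomas(lo, di, up, rhs)
```

    Because each line's system is independent, a three-dimensional field can be differentiated line by line without any global data transpose.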

    High-order implicit residual smoothing time scheme for direct and large eddy simulations of compressible flows

    Restrictions on the maximum allowable time step of explicit time integration methods for direct and large eddy simulations of compressible turbulent flows at high Reynolds numbers can be very severe, because of the extremely small space steps used close to solid walls to capture the tiny, elongated boundary-layer structures. A way of increasing the stability limit is to use implicit time integration schemes. However, the price to pay is a higher computational cost per time step, higher discretization errors and lower parallel scalability. In the quest for an implicit time scheme for scale-resolving simulations that provides the best possible compromise between these opposing requirements, we develop a Runge–Kutta implicit residual smoothing (IRS) scheme of fourth-order accuracy, based on a bilaplacian operator. The implicit operator involves the inversion of scalar pentadiagonal systems, for which efficient parallel algorithms are available. The proposed method is assessed against two explicit and two implicit time integration techniques in terms of the computational cost required to achieve a threshold level of accuracy. Specifically, the proposed time scheme is compared to four-stage and six-stage low-storage Runge–Kutta methods, to the second-order IRS, and to a second-order backward scheme solved by means of matrix-free quasi-exact Newton subiterations. Numerical results show that the proposed IRS scheme reduces the computational time by a factor of 3 to 5 for an accuracy comparable to that of the corresponding explicit Runge–Kutta scheme.

    Practical solutions for seismic free-surface and internal multiple attenuation based on inversion

    Multiple prediction through inversion (MPI) is an effective method for seismic multiple attenuation. The research in this thesis aims to make the MPI method more practical for both free-surface and internal multiple attenuation. For free-surface multiple attenuation, the MPI scheme requires the input data to be densely and regularly sampled, with one shot at each receiver position. In order to meet these requirements, I use a multilevel B-spline method for seismic data reconstruction. This method can perform regularisation and interpolation on seismic data without any prior knowledge of models. For free-surface multiple attenuation on marine data, MPI can generate superior results compared to SRME (surface-related multiple elimination). However, MPI is more computationally expensive due to the large number of matrix operations involved. The conventional implementation addresses this by approximating the multiple model prediction operator as a pentadiagonal or a tridiagonal matrix. I tackle this problem by solving the full prediction operator on a graphics processing unit (GPU), which accelerates the processing and improves the multiple attenuation results, especially for far-offset traces. As extensions of SRME for internal multiple attenuation, both the CFP (common-focus-point) technique and the correlation method have problems. Their results can be improved using the MPI method with GPU acceleration. The correlation method is preferred as the initial step for MPI because it can be implemented as a fully data-driven pre-stack domain approach in either the forward or the inverse data space. In all cases, the MPI scheme generates internal multiple models with improved kinematic and dynamic accuracy.
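    The band approximation mentioned above can be illustrated in miniature: truncating a dense prediction operator to a tridiagonal or pentadiagonal band discards the off-band contributions, and it is these dropped terms that degrade far-offset traces. A toy sketch with an invented, smoothly decaying operator (not seismic data):

```python
def band_truncate(A, k):
    """Zero all entries of A more than k diagonals from the main diagonal
    (k = 1 gives a tridiagonal, k = 2 a pentadiagonal approximation)."""
    n = len(A)
    return [[A[i][j] if abs(i - j) <= k else 0.0 for j in range(n)]
            for i in range(n)]

def matvec(A, x):
    """Dense matrix-vector product, row by row."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# Invented operator with slowly decaying off-diagonals: the wider the
# retained band, the closer the product is to applying the full operator.
A = [[1.0 / (1 + abs(i - j)) for j in range(6)] for i in range(6)]
trace = [1.0] * 6
full = matvec(A, trace)
tri = matvec(band_truncate(A, 1), trace)
penta = matvec(band_truncate(A, 2), trace)
```

    Solving with the full operator, as done on the GPU in this work, keeps exactly the contributions that the banded shortcuts throw away.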

    Development of a high-order parallel solver for direct and large eddy simulations of turbulent flows

    Turbulence is inherent in fluid dynamics, in that laminar flows are the exception rather than the rule, hence the longstanding interest in the subject, both within the academic community and in industrial R&D laboratories. Since 1883, much progress has been made: statistics applied to turbulence have provided an understanding of the scaling laws peculiar to several model flows, and experiments have given insight into the structure of real-world flows; soon enough, however, numerical approaches became the most promising ones, since they lay the groundwork for the solution of the high-Reynolds-number unsteady Navier-Stokes equations by means of computer systems. Nevertheless, despite the exponential rise in computational capability over the last few decades, the more computer technology advances, the higher the Reynolds number sought for test-cases of industrial interest: there is a natural tendency to perform simulations as large as possible, a habit that leaves no room for wasting resources. Indeed, as the scale separation grows with Re, the reduction of wall-clock times for a high-fidelity solution of desired accuracy becomes increasingly important. To achieve this task, a CFD solver should rely on the use of appropriate physical models, consistent numerical methods to discretize the equations, accurate non-dissipative numerical schemes, efficient algorithms to solve the numerics, and fast routines implementing those algorithms. Two archetypal approaches to CFD are direct and large-eddy simulation (DNS and LES respectively), which differ profoundly in several aspects but are both “eddy-resolving” methods, meant to resolve the structures of the flow field with the highest possible accuracy while introducing as little spurious dissipation as possible.
These two requirements of accurate resolution of scales, and energy conservation, should be addressed by any numerical method, since they are essential to many real-world fluid flows of industrial interest. As a consequence, high-order numerical schemes, and compact schemes among them, have received much consideration, since they address both goals, at the cost of a less straightforward application of boundary conditions and a higher computational cost. The latter problem is tackled with parallel computing, which also makes it possible to exploit the currently available computer power to the fullest extent. The research activity conducted by the present author has concerned the development, from scratch, of a three-dimensional, unsteady, incompressible Navier-Stokes parallel solver, which uses an advanced algorithm for the process-wise solution of the linear systems arising from the application of high-order compact finite difference schemes, and hinges upon a three-dimensional decomposition of the Cartesian computational space. The code is written in modern Fortran 2003 — plus a few features which are unique to the 2008 standard — and is parallelized through the use of the MPI 3.1 standard's advanced routines, as implemented by the OpenMPI library project. The coding was carried out with the objective of creating an original CFD high-order parallel solver which is maintainable and extendable, of course within a well-defined range of possibilities.
With this main priority outlined, particular attention was paid to several key concepts: modularity and readability of the source code and, in turn, its reusability; ease of implementation of virtually any new explicit or implicit finite difference scheme; a modern programming style that avoids deprecated legacy Fortran constructs and features, so that the world wide web remains a reliable and active means for quickly solving coding problems arising from the implementation of new modules in the code; and, last but not least, thorough comments, especially in critical sections of the code, explaining motives and possible expected weak links. The design, production, and documentation of a program written from scratch are almost never complete, and this is certainly true of the present effort. The method and the code are verified against the full three-dimensional lid-driven cavity and Taylor-Green vortex flows. The latter test is also used for the assessment of scalability and parallel efficiency.
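As a sketch of the bookkeeping behind a three-dimensional decomposition of a Cartesian computational space, the usual block-partitioning arithmetic can be written in a few lines (a Python stand-in for brevity; the solver itself is Fortran/MPI, and these helper names are invented):

```python
def block_range(n, p, r):
    """Split n cells over p processes; return the [start, end) range owned
    by rank r, spreading the remainder over the first n % p ranks."""
    base, rem = divmod(n, p)
    start = r * base + min(r, rem)
    return start, start + base + (1 if r < rem else 0)

def rank_to_coords(rank, px, py, pz):
    """Map a linear MPI-style rank to (i, j, k) coordinates in a
    px * py * pz process grid (i varies fastest)."""
    k, rest = divmod(rank, px * py)
    j, i = divmod(rest, px)
    return i, j, k
```

Each process then owns the Cartesian product of its three per-direction ranges; neighbor exchange is determined directly from the (i, j, k) coordinates.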

    Poloidal-toroidal decomposition in a finite cylinder. II. Discretization, regularization and validation

    The Navier-Stokes equations in a finite cylinder are written in terms of poloidal and toroidal potentials in order to impose incompressibility. Regularity of the solutions is ensured in several ways: first, the potentials are represented using a spectral basis which is analytic at the cylindrical axis; second, the non-physical discontinuous boundary conditions at the cylindrical corners are smoothed using a polynomial approximation to a steep exponential profile; third, the nonlinear term is evaluated in such a way as to eliminate singularities. The resulting pseudo-spectral code is tested using exact polynomial solutions, and the spectral convergence of the coefficients is demonstrated. Our solutions are shown to agree with exact polynomial solutions, with previous axisymmetric calculations of vortex breakdown, and with nonaxisymmetric calculations of the onset of helical spirals. Parallelization by azimuthal wavenumber is shown to be highly effective.
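    For reference, the poloidal-toroidal representation takes the standard form (notation assumed here; the paper's conventions may differ):

```latex
\mathbf{u} \;=\; \nabla\times\left(\psi\,\hat{\mathbf{e}}_z\right)
\;+\; \nabla\times\nabla\times\left(\phi\,\hat{\mathbf{e}}_z\right),
\qquad
\nabla\cdot\mathbf{u} \equiv 0,
```

    where ψ is the toroidal and φ the poloidal potential. Since the divergence of a curl vanishes identically, incompressibility is built into the representation rather than imposed as a separate constraint.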

    Efficient Parallel Resolution of The Simplified Transport Equations in Mixed-Dual Formulation

    A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine models are difficult for our sequential solver, based on the simplified transport equations, to handle in terms of memory consumption and computational time. A first implementation of a Lagrangian-based domain decomposition method leads to poor parallel efficiency because of an increase in the number of power iterations. In order to obtain high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart–Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order). However, it can be significantly optimized for the matching-grid case. The good behavior of the new parallelization scheme is demonstrated for the matching-grid case on several hundred nodes for computations based on a pin-by-pin discretization.
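    The power algorithm mentioned above can be sketched for a generalized eigenproblem F x = k A x, where the highest eigenvalue k is found by iterating x <- solve(A, F x). This is a dense toy version with invented function names; a production solver would exploit the finite-element structure rather than dense elimination:

```python
def gauss_solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            m = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= m * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def power_iteration(A, F, iters=100):
    """Largest eigenvalue k of F x = k A x: repeatedly solve A y = F x,
    normalizing by the dominant component of y at each step."""
    n = len(A)
    x = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        fx = [sum(F[i][j] * x[j] for j in range(n)) for i in range(n)]
        y = gauss_solve(A, fx)
        lam = max(y, key=abs)        # dominant component sets the scale
        x = [v / lam for v in y]
    return lam, x
```

    The parallel difficulty discussed in the abstract stems from the solve step: a domain-decomposed inner solve that converges more slowly can inflate the number of these outer power iterations.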

    Exploiting Locality and Parallelism with Hierarchically Tiled Arrays

    The importance of tiles or blocks in mathematics, and thus in computer science, cannot be overstated. From a high-level point of view, they are the natural way to express many algorithms, both in iterative and recursive forms: tiles or sub-tiles are used as the basic units in the algorithm description. From a low-level point of view, tiling, either as the unit maintained by the algorithm or as a class of data layouts, is one of the most effective ways to exploit locality, which is a must to achieve good performance in current computers given the growing gap between memory and processor speed. Finally, tiles and operations on them are also basic to expressing data distribution and parallelism. Despite the importance of this concept, which makes its widespread usage inevitable, most languages do not support it directly. Programmers have to understand and manage the low-level details along with the introduction of tiling. This gives rise to bloated, potentially error-prone programs in which opportunities for performance are lost, while the disparity between the algorithm and the actual implementation grows. This thesis illustrates the power of Hierarchically Tiled Arrays (HTAs), a data type which enables the easy manipulation of tiles in object-oriented languages. The objective is to evolve this data type in order to make the representation of all classes of algorithms with a high degree of parallelism and/or locality as natural as possible. We show in the thesis a set of tile operations which leads to natural and easy implementations of different algorithms, in parallel and in sequential form, with higher clarity and smaller size. In particular, two new language constructs, dynamic partitioning and overlapped tiling, are discussed in detail. They are extensions of the HTA data type that improve its ability to express algorithms with a high level of abstraction and free programmers from tedious low-level tasks. 
    To prove these claims, two popular languages, C++ and MATLAB, are extended with our HTA data type. In addition, several important dense linear algebra kernels and stencil computation kernels, as well as some benchmarks from the NAS benchmark suite, were implemented. We show that the HTA codes need less programming effort, with a negligible effect on performance.
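    The tile-as-first-class-unit idea can be caricatured in a few lines (a Python stand-in with invented names; the actual HTA work extends C++ and MATLAB):

```python
class TiledArray:
    """Toy 1-D hierarchically tiled array: a flat buffer viewed as tiles.
    ta[t] returns tile t as a list; ta[t, i] returns element i of tile t."""

    def __init__(self, data, tile_size):
        assert len(data) % tile_size == 0
        self.data = list(data)
        self.tile_size = tile_size

    def __getitem__(self, key):
        if isinstance(key, tuple):       # (tile, element) indexing
            t, i = key
            return self.data[t * self.tile_size + i]
        s = key * self.tile_size         # whole-tile indexing
        return self.data[s:s + self.tile_size]

    def map_tiles(self, fn):
        """Apply fn to each tile independently -- the natural unit of
        locality and parallelism in the HTA model."""
        n = len(self.data) // self.tile_size
        return TiledArray(sum((fn(self[t]) for t in range(n)), []),
                          self.tile_size)
```

    In a real HTA, tiles may themselves be tiled (hence "hierarchically"), and `map_tiles`-style operations become the hooks for data distribution across processors.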