10 research outputs found

    Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

    This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096^3 computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method achieves just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. Under these conditions, the calculation time for one time step was 108 seconds for the vortex method and 154 seconds for the spectral method. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex method calculations to date.

    Vorticity structure and evolution in a transverse jet with new algorithms for scalable particle simulation

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2004, by Youssef Mohamed Marzouk. Includes bibliographical references (p. 188-200).
    Transverse jets arise in many applications, including propulsion, effluent dispersion, oil field flows, V/STOL aerodynamics, and drug delivery. Furthermore, they exemplify flows dominated by coherent structures that cascade into smaller scales, a source of many current challenges in fluid dynamics. This study seeks a fundamental, mechanistic understanding of the relationship between the dispersion of jet fluid and the underlying vortical structures of the transverse jet, and of how to develop actuation that optimally manipulates their dynamics to affect mixing. We develop a massively parallel 3-D vortex simulation of a high-momentum transverse jet at large Reynolds number, featuring a discrete filament representation of the vorticity field with local mesh refinement to capture stretching and folding and hairpin removal to regularize the formation of small scales. A novel formulation of the vorticity flux boundary conditions rigorously accounts for the interaction of channel vorticity with the jet boundary layer. This formulation yields analytical expressions for vortex lines in the near field of the jet and suggests effective modes of unsteady actuation at the nozzle. The present computational approach requires hierarchical N-body methods for velocity evaluation at each timestep, as direct summation is prohibitively expensive. We introduce new clustering algorithms for parallel domain decomposition of N-body interactions and demonstrate the optimality of the resulting cluster geometries. We also develop compatible techniques for dynamic load balancing, including adaptive scaling of cluster metrics and adaptive redistribution of their centroids. These tools extend to parallel hierarchical simulation of N-body problems in gravitational astrophysics, molecular dynamics, and other fields. Simulations reveal the mechanisms by which vortical structures evolve; previous computational and experimental investigations of these processes have been incomplete at best, limited to low Reynolds numbers, transient early-stage dynamics, or Eulerian diagnostics of essentially Lagrangian phenomena. Transformation of the cylindrical shear layer emanating from the nozzle, initially dominated by azimuthal vorticity, begins with axial elongation of its lee side to form sections of counter-rotating vorticity aligned with the jet trajectory. Periodic rollup of the shear layer accompanies this deformation, creating arcs carrying azimuthal vorticity of alternating signs, curved toward the windward side of the jet. Following the pronounced bending of the trajectory into the crossflow, we observe a catastrophic breakdown of these sparse periodic structures into a dense distribution of smaller scales, with an attendant complexity of tangled vortex filaments. Nonetheless, spatial filtering of this region reveals the persistence of counter-rotating streamwise vorticity. We further characterize the flow by calculating maximum direct Lyapunov exponents of particle trajectories, identifying repelling material surfaces that organize finite-time mixing.
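    The remark that direct summation is prohibitively expensive can be illustrated with a minimal sketch of a direct Biot-Savart sum over point vortex elements. This is not the thesis's filament code: the function name `induced_velocities`, the point-particle representation, and the algebraic smoothing radius `delta` are assumptions made for illustration. The point is only that every target visits every source, so the work grows as N^2, which is what hierarchical N-body methods avoid.

```python
import math


def induced_velocities(positions, strengths, delta=0.05):
    """Direct-sum velocity induced by point vortex elements (illustrative only).

    positions: list of (x, y, z) element locations
    strengths: list of (ax, ay, az) vector strengths (vorticity times volume)
    delta:     assumed smoothing radius that keeps the kernel finite at r = 0
    Returns the list of velocities and the number of pair evaluations (N * N).
    """
    n = len(positions)
    velocities = []
    pair_evaluations = 0
    for i in range(n):
        xi, yi, zi = positions[i]
        ux = uy = uz = 0.0
        for j in range(n):
            xj, yj, zj = positions[j]
            ax, ay, az = strengths[j]
            rx, ry, rz = xi - xj, yi - yj, zi - zj
            # Smoothed kernel: 1 / (4 pi (|r|^2 + delta^2)^(3/2))
            denom = 4.0 * math.pi * (rx * rx + ry * ry + rz * rz + delta * delta) ** 1.5
            # Biot-Savart contribution: (alpha_j x r) / denom
            ux += (ay * rz - az * ry) / denom
            uy += (az * rx - ax * rz) / denom
            uz += (ax * ry - ay * rx) / denom
            pair_evaluations += 1
        velocities.append((ux, uy, uz))
    return velocities, pair_evaluations
```

    Doubling the number of elements quadruples `pair_evaluations`; tree codes and fast multipole methods replace the inner loop over distant sources with cluster approximations.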

    Irregular Computations in Fortran – Expression and Implementation Strategies


    Directive-based Approach to Heterogeneous Computing

    The world of high-performance computing is undergoing major changes that markedly increase its complexity. The inability of single-processor or even multiprocessor systems to sustain the growth in computing power needed by the scientific community has forced the emergence of massively parallel hardware architectures and of specialized units for specific operations. GPUs (graphics processing units) are a good example of this kind of device. Traditionally dedicated to graphics programming, these devices have recently become an ideal platform for implementing massively parallel computations. The combination of GPUs for compute-intensive tasks with multiprocessors for tasks that are less intensive but have more complex control logic has become, in recent years, one of the most common platforms for low-cost scientific computing, since the computing power delivered can in many cases match that of small or medium-sized clusters, with markedly lower initial and maintenance costs. Incorporating GPUs into clusters has also increased the capacity of those clusters. However, the complexity of GPU programming, and of integrating it with existing codes, greatly hinders the adoption of these technologies among less expert users. In this thesis we explore the use of directive-based programming models for this kind of environment (multi-core, many-core, GPUs and clusters), where the average user's productivity is markedly reduced by the difficulty of programming. To explore the best way of applying directives in these environments, we have developed a set of highly flexible software tools (a compiler and a runtime) that allow various techniques to be explored with relatively little effort. The arrival of the OpenACC directive-based programming standard allowed us to demonstrate the capability of these tools by producing an experimental implementation of the standard (accULL) in very little time and with far from negligible performance. The computational results provided allow us to demonstrate: (a) the reduction in programming effort afforded by directive-based approaches, (b) the capability and flexibility of the tools designed during this thesis for exploring these approaches, and finally (c) the potential for future development of accULL as an experimental OpenACC tool, based on the performance currently obtained compared with that of other, commercial approaches.

    Validating DOE's Office of Science "capability" computing needs.


    Aeromechanics of Coaxial Rotor Helicopters using the Viscous Vortex Particle Method

    Coaxial rotor helicopters are a candidate for the next generation of rotorcraft due to their ability to achieve high speeds without compromising hover performance. Coaxial rotors are designed to offload the retreating side of the rotor in high speed flight to delay the effects of reverse flow and blade stall which limit the speed of conventional single main rotor helicopters. The proximity of the two rotors induces periodic blade passage effect loads and unsteady rotor wake interactions absent in single rotor configurations. Coaxial rotors employ stiff composite hingeless blades to prevent the possibility of blade strike. At high speeds, the coaxial rotor operates at reduced RPM to avoid the drag penalty on the advancing blade tip. This combination of rotor lift distribution, periodic blade passage effect, unsteady rotor wake interaction, combined with stiff hingeless blades and reduced rotor RPM implies that a coaxial rotor system requires a specialized aeromechanical analysis. The goal of this dissertation is to develop a comprehensive aeromechanical analysis capable of modeling the aeroelasticity of stiff hingeless counter-rotating blades and the complex rotor-wake interactions present in a coaxial rotor system. The rotor wake is modeled with the Viscous Vortex Particle method, a grid free approach for calculating vortex interactions over long distances. The spanwise blade loading in attached flow is obtained from a computational fluid dynamics based rational function approximation unsteady aerodynamic model. The ONERA dynamic stall model is extended to capture three dimensional effects due to flow separation. The combination of the viscous vortex particle method with reduced order models for spanwise loading captures the unsteady coaxial rotor loads with computational efficiency. Trim procedures are developed to determine control inputs for a coaxial rotor to maintain equilibrium in hover and forward flight. 
In forward flight, two different trim conditions are considered: trim with propulsor off, and trim at level attitude. The two trim conditions have a significant impact on the vibratory hub loads, rotor inflow distribution and the aeroelastic stability. A unique aspect of the coaxial rotor is that its stability in both hover and forward flight is governed by equations with periodic coefficients. Therefore, a periodic aeroelastic stability analysis based on Floquet theory is applied. A new graphical method is developed to identify coupling between the blade modes of the two rotors. The aeromechanical formulation is applied to a rotor resembling the Sikorsky X2TD coaxial helicopter. In hover, the rotor experiences 8/rev blade passage loads due to oscillations in the blade bound circulation induced inflow. Increasing the collective pitch increases the coupling between the flap and lag modes of the blade. The aerodynamic interactions lead to an inter-rotor coupling of the first flap modes. In forward flight, the effects of trim condition, advance ratio, lift offset, and separated wake on the hub loads, inflow distribution and aeroelastic stability are examined. The results indicate that the aeroelastic stability of the lag mode is reduced in forward flight at a level attitude compared to hover. This study provides an improved physical understanding of the aeroelastic interactions in coaxial rotors. The work presented in this dissertation has the potential to facilitate design and development of future high-speed coaxial rotorcraft.
PhD, Aerospace Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163227/1/punsingh_1.pd

    FieldPlacer - A flexible, fast and unconstrained force-directed placement method for heterogeneous reconfigurable logic architectures

    The field of placement methods for components of integrated circuits, especially in the domain of reconfigurable chip architectures, is dominated by a handful of concepts. While some of these are easy to apply but difficult to adapt to new situations, others are more flexible but rather complex to realize. This work presents the FieldPlacer framework, a flexible, fast and unconstrained force-directed placement method for heterogeneous reconfigurable logic architectures, in particular for heterogeneous FPGAs. In contrast to many other force-directed placers, this approach is called ‘unconstrained’ because it does not require a priori fixed logic elements in order to calculate a force equilibrium as the solution to a system of equations. Instead, it is based on a free spring embedder simulation of a graph representation that includes all logic block types of a design simultaneously. The FieldPlacer framework offers considerable flexibility in applying different distance norms (e.g., the Manhattan distance) for the force-directed layout and aims at creating adapted layouts for various objective functions, e.g., highest performance or improved routability. Depending on the individual situation, a runtime-quality trade-off can be chosen to either produce a decent placement in a very short time or generate an exceptionally good placement, which takes longer. An extensive comparison with the latest simulated annealing placement method from the well-known Versatile Place and Route (VPR) framework shows that the FieldPlacer approach can create placements of comparable quality much faster than VPR or, alternatively, generate better placements in the same time. The flexibility in defining arbitrary objective functions and the intuitive adaptability of the method, which includes, among others, different concepts from the field of graph drawing, should facilitate further developments with this framework, e.g., for upcoming optimization targets such as the energy consumption of an implemented design.
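    The idea of a free spring embedder under a Manhattan distance norm can be sketched in a few lines. This is a generic illustration, not FieldPlacer's algorithm: the function name `manhattan_spring_layout`, the linear spring law, the optional inverse-square repulsion, and all parameter defaults are assumptions. No node is fixed in advance; every node moves under the forces acting on it.

```python
def manhattan_spring_layout(positions, edges, rest_length=1.0,
                            repulsion=0.0, step=0.1, iterations=100):
    """Relax a graph layout with springs measured in Manhattan distance.

    positions: dict mapping node -> (x, y) initial coordinates
    edges:     iterable of (u, v) pairs joined by a spring
    All force laws and constants here are illustrative assumptions.
    """
    pos = {node: [float(x), float(y)] for node, (x, y) in positions.items()}
    nodes = list(pos)
    for _ in range(iterations):
        force = {node: [0.0, 0.0] for node in nodes}
        # Attractive/repulsive spring along each edge, proportional to (d - rest_length)
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            d = abs(dx) + abs(dy)  # Manhattan distance
            if d == 0.0:
                continue
            pull = (d - rest_length) / d
            force[u][0] += pull * dx
            force[u][1] += pull * dy
            force[v][0] -= pull * dx
            force[v][1] -= pull * dy
        # Optional pairwise repulsion of magnitude repulsion / d^2 between all nodes
        if repulsion > 0.0:
            for i, u in enumerate(nodes):
                for v in nodes[i + 1:]:
                    dx = pos[v][0] - pos[u][0]
                    dy = pos[v][1] - pos[u][1]
                    d = abs(dx) + abs(dy)
                    if d == 0.0:
                        continue
                    push = repulsion / (d * d * d)
                    force[u][0] -= push * dx
                    force[u][1] -= push * dy
                    force[v][0] += push * dx
                    force[v][1] += push * dy
        # Move every node a small step along its net force
        for node in nodes:
            pos[node][0] += step * force[node][0]
            pos[node][1] += step * force[node][1]
    return {node: (x, y) for node, (x, y) in pos.items()}
```

    With two connected nodes and no repulsion, the Manhattan distance between them relaxes toward `rest_length`. A real placer would additionally legalize positions onto the discrete sites of the target architecture, which this sketch does not attempt.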

    Generalized averaged Gaussian quadrature and applications

    A simple numerical method for constructing the optimal generalized averaged Gaussian quadrature formulas will be presented. These formulas exist in many cases in which real positive Gauss-Kronrod formulas do not exist, and can be used as an adequate alternative in order to estimate the error of a Gaussian rule. We also investigate the conditions under which the optimal averaged Gaussian quadrature formulas and their truncated variants are internal.

    MS FT-2-2 7 Orthogonal polynomials and quadrature: Theory, computation, and applications

    Quadrature rules find many applications in science and engineering. Their analysis is a classical area of applied mathematics and continues to attract considerable attention. This seminar brings together speakers with expertise in a large variety of quadrature rules. The aim of the seminar is to provide an overview of recent developments in the analysis of quadrature rules. The computation of error estimates and novel applications are also described.