5 research outputs found

    Scaling Hierarchical N-body Simulations on GPU Clusters

    This paper focuses on the use of GPGPU-based clusters for hierarchical N-body simulations. Whereas the behavior of these hierarchical methods has been studied in the past on CPU-based architectures, we investigate key performance issues in the context of clusters of GPUs. These include kernel organization and efficiency, the balance between tree traversal and force computation work, grain size selection through the tuning of offloaded work request sizes, and the reduction of sequential bottlenecks. The effects of various application parameters are studied, and experiments are conducted to quantify gains in performance. Our studies are carried out in the context of a production-quality parallel cosmological simulator called ChaNGa. We highlight the re-engineering of the application to make it more suitable for GPU-based environments. Finally, we present performance results from experiments on the NCSA Lincoln GPU cluster, including a note on GPU use in multistepped simulations.

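    Parallel implementation of the Barnes-Hut algorithm for simulating the N-body problem using an arbitrary number of GPUs (original title: Implementação paralela do algoritmo Barnes-Hut para simulação do problema N-Corpos usando um número arbitrário de GPU’s)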

    The N-body problem concerns predicting the behavior of individual bodies within a dynamical system in which all bodies interact with one another. It appears in different fields of study, from simulating interactions between bodies of enormous scale, such as celestial bodies (planets, stars, galaxies), down to microscopic scales such as small particles. The problem normally requires great computational effort to solve, and for this reason several algorithms have been developed over the years to reduce the processing time required. Among them is the Barnes-Hut algorithm, which approximates the bodies being computed by grouping them so as to minimize the calculations performed. In recent years, as graphics processing units (GPUs) have become available for parallel execution of general-purpose algorithms, several parallel solutions to the N-body problem have been proposed for this new platform. In this work, we extend a parallel GPU implementation of the Barnes-Hut algorithm to allow the use of an arbitrary number of GPUs. We evaluate the program's behavior with each GPU added, testing with up to 4 GPUs simultaneously. We observe that the positive results obtained agree with our expectations: the speedup initially approaches the theoretical maximum, but as the number of GPUs increases, the management cost grows, reducing the speedup gain.

    Scalable Parallel Formulations of the Barnes-Hut Method for n-Body Simulations

    In this paper, we present two new parallel formulations of the Barnes-Hut method. These parallel formulations are especially suited for simulations with irregular particle densities. We first present a parallel formulation that uses a static partitioning of the domain and assignment of subdomains to processors. We demonstrate that this scheme delivers acceptable load balance and, coupled with two collective communication operations, yields good performance. We present a second parallel formulation which combines static decomposition of the domain with an assignment of subdomains to processors based on Morton ordering. This alleviates the load imbalance inherent in the first scheme. The second parallel formulation is inspired by the two best-known current parallel algorithms for the Barnes-Hut method. We present an experimental evaluation of these schemes on a 256-processor nCUBE2 parallel computer for an astrophysical simulation.

    Programming SMP clusters: node-level object groups and their use in a framework for Nbody applications

    Thesis (Master's), Bilkent University, 1999. Ankara: The Department of Computer Engineering and Information Science and the Institute of Engineering and Sciences of Bilkent University, 1999. Includes bibliographical references (leaves 64-66). By İlker Cengiz. M.S.

    Vorticity structure and evolution in a transverse jet with new algorithms for scalable particle simulation

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2004. Includes bibliographical references (p. 188-200). By Youssef Mohamed Marzouk.
    Transverse jets arise in many applications, including propulsion, effluent dispersion, oil field flows, V/STOL aerodynamics, and drug delivery. Furthermore, they exemplify flows dominated by coherent structures that cascade into smaller scales, a source of many current challenges in fluid dynamics. This study seeks a fundamental, mechanistic understanding of the relationship between the dispersion of jet fluid and the underlying vortical structures of the transverse jet, and of how to develop actuation that optimally manipulates their dynamics to affect mixing. We develop a massively parallel 3-D vortex simulation of a high-momentum transverse jet at large Reynolds number, featuring a discrete filament representation of the vorticity field with local mesh refinement to capture stretching and folding and hairpin removal to regularize the formation of small scales. A novel formulation of the vorticity flux boundary conditions rigorously accounts for the interaction of channel vorticity with the jet boundary layer. This formulation yields analytical expressions for vortex lines in the near field of the jet and suggests effective modes of unsteady actuation at the nozzle. The present computational approach requires hierarchical N-body methods for velocity evaluation at each timestep, as direct summation is prohibitively expensive. We introduce new clustering algorithms for parallel domain decomposition of N-body interactions and demonstrate the optimality of the resulting cluster geometries. We also develop compatible techniques for dynamic load balancing, including adaptive scaling of cluster metrics and adaptive redistribution of their centroids. These tools extend to parallel hierarchical simulation of N-body problems in gravitational astrophysics, molecular dynamics, and other fields.
    Simulations reveal the mechanisms by which vortical structures evolve; previous computational and experimental investigations of these processes have been incomplete at best, limited to low Reynolds numbers, transient early-stage dynamics, or Eulerian diagnostics of essentially Lagrangian phenomena. Transformation of the cylindrical shear layer emanating from the nozzle, initially dominated by azimuthal vorticity, begins with axial elongation of its lee side to form sections of counter-rotating vorticity aligned with the jet trajectory. Periodic rollup of the shear layer accompanies this deformation, creating arcs carrying azimuthal vorticity of alternating signs, curved toward the windward side of the jet. Following the pronounced bending of the trajectory into the crossflow, we observe a catastrophic breakdown of these sparse periodic structures into a dense distribution of smaller scales, with an attendant complexity of tangled vortex filaments. Nonetheless, spatial filtering of this region reveals the persistence of counter-rotating streamwise vorticity. We further characterize the flow by calculating maximum direct Lyapunov exponents of particle trajectories, identifying repelling material surfaces that organize finite-time mixing.