
    Comparing velocity and passive scalar statistics in fluid turbulence at high Schmidt numbers and Reynolds numbers

    Recently, Shete et al. [Phys. Rev. Fluids 7, 024601 (2022)] explored the characteristics of passive scalars mixed by stationary isotropic turbulence in the presence of a uniform mean gradient. They concluded that at high Reynolds and Schmidt numbers, the presence of both inertial-convective and viscous-convective ranges causes the statistics of the scalar and velocity fluctuations to behave similarly. However, their data included Schmidt numbers of 0.1, 0.7, 1.0 and 7.0, only the last of which can (at best) be regarded as moderately high. Additionally, they did not consider data already available in the literature at substantially higher Schmidt numbers of up to 512. By including these data, we demonstrate here that the differences between velocity and scalar statistics show no vanishing trend with increasing Reynolds and Schmidt numbers, and that essential differences remain intact at all Reynolds and Schmidt numbers. Comment: accepted and to be published in Physical Review Fluids as a Comment
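
    For context, the two ranges invoked above are the classical Obukhov-Corrsin and Batchelor ranges of the scalar spectrum. As a reminder of the standard forms (stated here for the reader; they are not part of the abstract), with χ the mean scalar dissipation rate, Δ the mean energy dissipation rate, Μ the kinematic viscosity, η the Kolmogorov scale and η_B = η Sc^{-1/2} the Batchelor scale, the scalar spectrum E_Ξ(k) is expected to follow

        E_Ξ(k) ∝ χ Δ^{-1/3} k^{-5/3}      (inertial-convective range, 1/L â‰Ș k â‰Ș 1/η)
        E_Ξ(k) ∝ χ (Μ/Δ)^{1/2} k^{-1}     (viscous-convective range, 1/η â‰Ș k â‰Ș 1/η_B)

    the latter existing only for Sc > 1, which is why truly high Schmidt numbers are needed to test claims about the viscous-convective range.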

    Intermittency of turbulent velocity and scalar fields using 3D local averaging

    An efficient approach for extracting 3D local averages in spherical subdomains is proposed and applied to study the intermittency of small-scale velocity and scalar fields in direct numerical simulations of isotropic turbulence. We focus on the inertial-range scaling exponents of the locally averaged energy dissipation rate, enstrophy and scalar dissipation rate corresponding to the mixing of a passive scalar Ξ in the presence of a uniform mean gradient. The Taylor-scale Reynolds number R_λ goes up to 1300, and the Schmidt number Sc up to 512 (albeit at smaller R_λ). The intermittency exponent of the energy dissipation rate is ÎŒ ≈ 0.23, whereas that of enstrophy is slightly larger; trends with R_λ suggest that this will be the case even at extremely large R_λ. The intermittency exponent of the scalar dissipation rate is ÎŒ_Ξ ≈ 0.35 for Sc = 1. These findings are in essential agreement with previously reported results in the literature. We further show that ÎŒ_Ξ decreases monotonically with increasing Sc, either as 1/log Sc or as a weak power law, suggesting that ÎŒ_Ξ → 0 as Sc → ∞, reaffirming recent results on the breakdown of the scalar dissipation anomaly in this limit. Comment: 7 pages, 5 figures
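
    For readers unfamiliar with the quantities involved: the intermittency exponent ÎŒ is obtained from the inertial-range scaling of the second moment of the locally averaged dissipation, âŸšÎ”_r^2⟩ ∝ r^{-ÎŒ}, where Δ_r is the average of Δ over a ball of radius r. The paper proposes an efficient way of extracting such averages; the direct-summation sketch below (in C, with hypothetical names and sizes, and none of the paper's optimizations) only illustrates what is being computed for a periodic N^3 field.

        /* Local average of a periodic 3D field over spheres of radius r
           (in grid spacings), by direct summation over precomputed offsets. */
        #include <stdlib.h>

        #define N 64                              /* example grid size */
        #define IDX(i,j,k) ((((i)*N) + (j))*N + (k))

        void local_average(const double *eps, double *eps_r, int r)
        {
            /* Precompute integer offsets lying inside the sphere. */
            int cap = (2*r+1)*(2*r+1)*(2*r+1), m = 0;
            int (*off)[3] = malloc(sizeof(int[3]) * (size_t)cap);
            for (int a = -r; a <= r; a++)
                for (int b = -r; b <= r; b++)
                    for (int c = -r; c <= r; c++)
                        if (a*a + b*b + c*c <= r*r) {
                            off[m][0] = a; off[m][1] = b; off[m][2] = c; m++;
                        }

            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++)
                    for (int k = 0; k < N; k++) {
                        double s = 0.0;
                        for (int q = 0; q < m; q++) {
                            int ii = (i + off[q][0] + N) % N;   /* periodic wrap */
                            int jj = (j + off[q][1] + N) % N;
                            int kk = (k + off[q][2] + N) % N;
                            s += eps[IDX(ii, jj, kk)];
                        }
                        eps_r[IDX(i, j, k)] = s / m;
                    }
            free(off);
        }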

    Extreme-scale computing and studies of intermittency, mixing of passive scalars and stratified flows in turbulence

    Turbulent flows are known for the intermittent occurrence of intense strain rates and local rotation, and for their ability to provide efficient mixing. This thesis focuses on pursuing fundamental advances in physical understanding, using high-resolution Direct Numerical Simulations based on a Fourier pseudo-spectral approach. The computations are very demanding, and ever-larger simulations are required for studies of intermittency, where high Reynolds number and good small-scale resolution are important. A new batched asynchronous algorithm capable of extremely large problem sizes has been developed for dense-node heterogeneous architectures like Summit. Optimizing data copies between CPU and GPU and communication over the network, while overlapping data copies with computations, is key to achieving good performance. Processing data residing in the larger CPU memory in batches on the GPU helps avoid limitations on problem size. Favorable performance is obtained up to a world-leading problem size of 18432^3 (over 6 trillion grid points) on 3072 Summit nodes. A more portable implementation using OpenMP is pursued to target a 32768^3 problem size on the exascale machine Frontier, expected in early 2022. Hero-sized simulations are often relatively short in time, which raises concerns regarding sampling and statistical independence. A Multiple Resolution Independent Simulations (MRIS) approach is developed to address this issue, via multiple short simulation segments evolving from lower-resolution datasets distributed over a longer physical time span. Using this approach, the effects of small-scale intermittency are studied through statistics of local averages of the dissipation rate and enstrophy. The dissipation rate is further studied from a multifractal viewpoint. The MRIS approach is also used to study passive scalar intermittency and to test the refined similarity hypothesis, through statistics of the scalar dissipation rate at high Reynolds number. Lastly, density-stratified flows are studied under both stable and unstable stratification, with anisotropy development examined through the Reynolds-stress budget. Ph.D. thesis
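
    The batching idea summarized above (keep the full field in the larger CPU memory and stage slices through the smaller GPU memory) can be conveyed in a few lines. The fragment below is a conceptual C/OpenMP illustration with a placeholder kernel and made-up sizes, not the thesis code; deferred "target nowait" tasks allow the copies and compute of different batches to overlap.

        #define NTOTAL (1L << 28)   /* CPU-resident data (example size)  */
        #define NBATCH (1L << 24)   /* one batch fits in GPU memory      */

        void process_in_batches(double *u)
        {
            #pragma omp parallel
            #pragma omp single
            {
                for (long b = 0; b < NTOTAL; b += NBATCH) {
                    double *chunk = u + b;
                    /* Each iteration becomes a deferred device task:
                       copy-in, kernel, copy-out for this batch only. */
                    #pragma omp target teams distribute parallel for \
                            map(tofrom: chunk[0:NBATCH]) nowait
                    for (long i = 0; i < NBATCH; i++)
                        chunk[i] = chunk[i] * chunk[i];  /* placeholder kernel */
                }
                #pragma omp taskwait   /* wait for all batches to finish */
            }
        }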

    An Efficient Particle Tracking Algorithm for Large-Scale Parallel Pseudo-Spectral Simulations of Turbulence

    Particle tracking in large-scale numerical simulations of turbulent flows presents one of the major bottlenecks in parallel performance and scaling efficiency. Here, we describe a particle tracking algorithm for large-scale parallel pseudo-spectral simulations of turbulence which scales well up to billions of tracer particles on modern high-performance computing architectures. We summarize the standard parallel methods used to solve the fluid equations in our hybrid MPI/OpenMP implementation. As the main focus, we describe the implementation of the particle tracking algorithm and document its computational performance. To address the extensive inter-process communication required by particle tracking, we introduce a task-based approach to overlap point-to-point communications with computations, thereby enabling improved resource utilization. We characterize the computational cost as a function of the number of particles tracked and compare it with that of the flow field computation, showing that the cost of particle tracking is very small for typical applications.
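
    As a concrete picture of what "tracking a tracer" involves (independent of the parallel machinery described above), one needs interpolation of the grid velocity at each particle position and a time integrator. Below is a minimal serial C sketch using trilinear interpolation and a second-order Runge-Kutta step on a periodic N^3 grid; the names are hypothetical, and production pseudo-spectral codes typically use higher-order (e.g. spline) interpolation plus the inter-process communication this paper is actually about.

        #include <math.h>
        #include <stddef.h>

        struct particle { double x[3]; };

        /* Trilinear interpolation of one periodic grid field at point p. */
        static double trilerp(const double *u, int N, double dx, const double p[3])
        {
            int i0[3]; double w[3];
            for (int d = 0; d < 3; d++) {
                double s = p[d] / dx, f = floor(s);
                i0[d] = ((int)f % N + N) % N;   /* periodic base index */
                w[d]  = s - f;                  /* fractional weight   */
            }
            double val = 0.0;
            for (int a = 0; a < 2; a++)
                for (int b = 0; b < 2; b++)
                    for (int c = 0; c < 2; c++) {
                        int i = (i0[0]+a) % N, j = (i0[1]+b) % N, k = (i0[2]+c) % N;
                        double wt = (a ? w[0] : 1.0-w[0])
                                  * (b ? w[1] : 1.0-w[1])
                                  * (c ? w[2] : 1.0-w[2]);
                        val += wt * u[((size_t)i*N + j)*N + k];
                    }
            return val;
        }

        /* One RK2 (Heun) step for np particles; ux, uy, uz hold the
           velocity components on the grid at the current time. */
        void advance(struct particle *p, long np, const double *ux,
                     const double *uy, const double *uz,
                     int N, double dx, double dt)
        {
            for (long q = 0; q < np; q++) {
                double v1[3] = { trilerp(ux,N,dx,p[q].x),
                                 trilerp(uy,N,dx,p[q].x),
                                 trilerp(uz,N,dx,p[q].x) };
                double xs[3];                       /* predictor position */
                for (int d = 0; d < 3; d++) xs[d] = p[q].x[d] + dt*v1[d];
                double v2[3] = { trilerp(ux,N,dx,xs),
                                 trilerp(uy,N,dx,xs),
                                 trilerp(uz,N,dx,xs) };
                for (int d = 0; d < 3; d++)         /* corrector */
                    p[q].x[d] += 0.5*dt*(v1[d] + v2[d]);
            }
        }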

    Universality and Scaling in Compressible Turbulence and Mixing

    Compressible turbulence and turbulent mixing play a critical role in diverse systems ranging from engineering devices to astrophysics. Examples include high-speed scramjets, hypersonic flows, combustion and star formation. The phenomenon is poorly understood due to complicated interactions between the compressible (dilatational) and vortical (solenoidal) modes, in addition to the coupling of the flow field with thermodynamic variables. Attempts to make progress using the traditional governing parameters, namely the Taylor-scale Reynolds number (R_λ) and the turbulent Mach number (M_t), have been marred by inconsistencies and conflicting results in the literature. The ultimate objectives of this project are to resolve these discrepancies, further our understanding of the phenomenon, develop new turbulence models for actual applications, and enable flow control in practical situations. To this end, we perform direct numerical simulations for a wide range of forcing conditions using state-of-the-art massively parallel codes that we show to be scalable up to 431200 cores at world-record resolutions. The aggregate database comprises an unprecedentedly wide range of values of the governing parameters. Through a novel asymptotic theoretical approach and systematic data analysis, we identify a new non-dimensional scaling parameter, ÎŽ, the ratio of compressible to vortical strength, which together with the traditional parameters unravels universal behaviour and scaling laws, resolving several major issues currently plaguing the field. This could prove to be a paradigm shift in how compressible turbulence is studied. We predict the energy distribution across scales of the dilatational part of the turbulent kinetic energy by dividing the ÎŽ-M_t plane into different physical regimes. These insights are also applied to passive scalar mixing. Although the large scales of motion of passive scalars are oblivious to the effects of compressibility, compressibility has a strong effect on the smallest scales. With these insights, we successfully parametrize the mixing efficiency in terms of the governing parameters. Our results have major implications for turbulence modeling, paving the road towards more accurate, robust and generic models. In order to generate the current unique database, several computational issues had to be addressed, such as I/O at scale, the use of accelerators, and the overhead associated with high levels of parallelism. Thus we also contribute towards extending the capabilities of the grand computational challenge of simulating turbulence at the realistic conditions seen in nature and engineering applications.
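
    The decomposition underlying the above is the Helmholtz split of the velocity field, which is local in Fourier space: writing û(k) for the Fourier transform of u,

        û_d(k) = [(k·û(k))/|k|^2] k ,    û_s(k) = û(k) - û_d(k)

    so that u_d is the dilatational (curl-free) part and u_s the solenoidal (divergence-free) part. One natural reading of ÎŽ as a "ratio of compressible to vortical strength" (the abstract does not spell out its exact definition) is then

        Ύ = ⟹|u_d|^2⟩^{1/2} / ⟹|u_s|^2⟩^{1/2}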

    XSEDE: eXtreme Science and Engineering Discovery Environment Third Quarter 2012 Report

    The Extreme Science and Engineering Discovery Environment (XSEDE) is the most advanced, powerful, and robust collection of integrated digital resources and services in the world. It is an integrated cyberinfrastructure ecosystem with singular interfaces for allocations, support, and other key services that researchers can use to interactively share computing resources, data, and expertise. This is a report of project activities and highlights from the third quarter of 2012. National Science Foundation, OCI-105357

    Visualization challenges in distributed heterogeneous computing environments

    Large-scale computing environments are important for many aspects of modern life. They drive scientific research in biology and physics, facilitate industrial rapid prototyping, and provide information relevant to everyday life, such as weather forecasts. Their computational power grows steadily to provide faster response times and to satisfy the demand for higher complexity in simulation models as well as more details and higher resolutions in visualizations. For some years now, the prevailing trend for these large systems has been the utilization of additional processors, like graphics processing units. These heterogeneous systems, which employ more than one kind of processor, are becoming increasingly widespread since they provide many benefits, like higher performance or increased energy efficiency. At the same time, they are more challenging and complex to use because the various processing units differ in their architecture and programming model. This heterogeneity is often addressed by abstraction, but existing approaches entail restrictions or are not universally applicable. As these systems also grow in size and complexity, they become more prone to errors and failures. Therefore, developers and users become increasingly interested in resilience besides traditional aspects like performance and usability. While fault tolerance is well researched in general, it is mostly neglected in distributed visualization or not adapted to its special requirements. Finally, analysis and tuning of these systems and their software are required to assess their status and to improve their performance. The available tools and methods to capture and evaluate the necessary information are often isolated from the context or not designed for interactive use cases. These problems are amplified in heterogeneous computing environments, since more data is available and required for the analysis. Additionally, real-time feedback is required in distributed visualization to correlate user interactions with performance characteristics and to decide on the validity and correctness of the data and its visualization. This thesis presents contributions to all of these aspects. Two approaches to abstraction are explored for general-purpose computing on graphics processing units and visualization in heterogeneous computing environments. The first approach hides details of different processing units and allows using them in a unified manner. The second approach employs per-pixel linked lists as a generic framework for compositing and for simplifying order-independent transparency in distributed visualization. Traditional methods for fault tolerance in high performance computing systems are discussed in the context of distributed visualization. On this basis, strategies for fault-tolerant distributed visualization are derived and organized in a taxonomy. Example implementations of these strategies, their trade-offs, and resulting implications are discussed. For analysis, local graph exploration and tuning of volume visualization are evaluated. Challenges in dense graphs, like visual clutter, ambiguity, and the inclusion of additional attributes, are tackled in node-link diagrams using a lens metaphor as well as supplementary views. An exploratory approach for performance analysis and tuning of parallel volume visualization on a large, high-resolution display is evaluated. This thesis is the first to take a broader look at the issues of distributed visualization on large displays and in heterogeneous computing environments.
While the presented approaches all solve individual challenges and are successfully employed in this context, their joint utility forms a solid basis for future research in this young field. In its entirety, this thesis presents building blocks for robust distributed visualization on current and future heterogeneous visualization environments.
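
    Per-pixel linked lists, used above as a compositing framework, are a standard device for order-independent transparency: every translucent fragment is appended to a list belonging to its pixel, and a resolve pass sorts each list by depth and blends it. The serial C sketch below uses hypothetical structures, not the thesis framework; on a GPU, the head-pointer update would be an atomic exchange and the pool index an atomic counter, and a distributed renderer would additionally merge lists across nodes.

        /* One translucent fragment for a pixel. */
        struct fragment {
            float depth, r, g, b, a;
            int   next;                 /* index of next fragment, -1 = end  */
        };

        struct ppll {
            int *head;                  /* head[pixel]: first fragment index */
            struct fragment *pool;      /* shared fragment pool              */
            int count;                  /* fragments used so far             */
        };

        /* Append a fragment to a pixel's list (prepend, O(1)). */
        void push(struct ppll *L, int pixel, struct fragment f)
        {
            int idx = L->count++;       /* atomic counter on a GPU  */
            f.next = L->head[pixel];    /* atomic exchange on a GPU */
            L->pool[idx] = f;
            L->head[pixel] = idx;
        }

        /* Resolve one pixel: sort fragments front-to-back, then blend. */
        void resolve(const struct ppll *L, int pixel, float out[3])
        {
            struct fragment tmp[64];
            int n = 0;
            for (int i = L->head[pixel]; i >= 0 && n < 64; i = L->pool[i].next)
                tmp[n++] = L->pool[i];
            for (int i = 1; i < n; i++) {          /* insertion sort by depth */
                struct fragment f = tmp[i];
                int j = i - 1;
                while (j >= 0 && tmp[j].depth > f.depth) { tmp[j+1] = tmp[j]; j--; }
                tmp[j+1] = f;
            }
            float t = 1.0f;                        /* remaining transparency */
            out[0] = out[1] = out[2] = 0.0f;
            for (int i = 0; i < n; i++) {          /* front-to-back "over"   */
                out[0] += t * tmp[i].a * tmp[i].r;
                out[1] += t * tmp[i].a * tmp[i].g;
                out[2] += t * tmp[i].a * tmp[i].b;
                t *= 1.0f - tmp[i].a;
            }
        }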

    GPU acceleration of a petascale application for turbulent mixing at high Schmidt number using OpenMP 4.5

    This paper reports on the successful implementation of a massively parallel GPU-accelerated algorithm for the direct numerical simulation of turbulent mixing at high Schmidt number. The work stems from a recent development (Comput. Phys. Commun., vol. 219, 2017, 313-328), in which a low-communication algorithm was shown to attain high degrees of scalability on the Cray XE6 architecture when overlapping communication and computation via dedicated communication threads. An even higher level of performance has now been achieved using OpenMP 4.5 on the Cray XK7 architecture, where on each node the 16 integer cores of an AMD Interlagos processor share a single Nvidia K20X GPU accelerator. In the new algorithm, data movements are minimized by performing virtually all of the intensive scalar field computations in the form of combined compact finite difference (CCD) operations on the GPUs. A memory layout departing from usual practices is found to provide much better performance for a specific kernel required to apply the CCD scheme. Asynchronous execution, enabled by adding the OpenMP 4.5 NOWAIT clause to TARGET constructs, improves scalability when used to overlap computation on the GPUs with computation and communication on the CPUs. On the 27-petaflops supercomputer Titan at Oak Ridge National Laboratory, USA, a GPU-to-CPU speedup factor of approximately 5 is consistently observed at the largest problem size of 8192^3 grid points for the scalar field, computed with 8192 XK7 nodes.
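
    The construct named above can be illustrated compactly. The C fragment below is a hedged sketch of the overlap pattern only (placeholder stencil and arrays, not the paper's CCD kernels): the NOWAIT clause turns the TARGET region into a deferred device task, the host keeps working (e.g. on communication) in the meantime, and a taskwait joins the two.

        void step(double *phi, double *rhs, long n, double *work, long m)
        {
            #pragma omp parallel
            #pragma omp single
            {
                /* Deferred device task: stencil-like kernel on the GPU. */
                #pragma omp target teams distribute parallel for \
                        map(to: phi[0:n]) map(from: rhs[0:n]) nowait
                for (long i = 1; i < n - 1; i++)
                    rhs[i] = 0.5 * (phi[i+1] - phi[i-1]);   /* placeholder */

                /* Host proceeds concurrently: CPU computation and, in the
                   real code, MPI communication would go here. */
                for (long i = 0; i < m; i++)
                    work[i] *= 2.0;                         /* placeholder */

                #pragma omp taskwait   /* join the GPU task before using rhs */
            }
        }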

    Towards Asynchronous Simulations of Turbulent Flows: Accuracy, Performance, and Optimization

    Our understanding of turbulence has relied heavily on high-fidelity Direct Numerical Simulations (DNS) that resolve all dynamically relevant scales. But because of the inherent complexities of turbulent flows, these simulations are computationally very expensive and practically impossible at realistic conditions. Advancements in high performance computing provided a much-needed boost to computational resources through increasing levels of parallelism and made DNS realizable, albeit only in a limited parameter range. As the number of processing elements (PEs) in parallel machines increases, the penalties incurred in current algorithms due to the communications and synchronizations between PEs needed to update data become significant. These overheads are expected to pose a serious challenge to scalability on the next-generation exascale machines. An effective way to mitigate this bottleneck is to relax strict communication and synchronization constraints and proceed with computations asynchronously, i.e., without waiting for updated information from the other PEs. In this work, we investigate the viability of such asynchronous computing using high-order Asynchrony-Tolerant (AT) schemes for accurate and scalable simulations of reacting and non-reacting turbulence at extreme scales. For this, we first assess the important numerical properties of AT schemes, including conservation, stability, and spectral accuracy. Through rigorous mathematical analysis, we expose the breakdown of the standard von Neumann analysis for the stability of multi-level schemes, even for widely used synchronous schemes. We overcome these limitations through what we call the generalized von Neumann analysis, which is then used to assess the stability of the AT schemes. We then propose and implement two computational algorithms to introduce asynchrony in a three-dimensional compressible flow solver. We use these to perform a first-of-its-kind asynchronous simulation of compressible turbulence and analyze the effect of asynchrony on important physical characteristics of turbulence. Specifically, we show that both large-scale and small-scale features, including highly intermittent instantaneous events, are accurately resolved by these algorithms. We also show excellent strong and weak scaling of the asynchronous algorithms up to a processor count of P = 262144, owing to a significant reduction in communication overheads. As a precursor to the development of asynchronous combustion codes for simulations of more challenging problems with additional physical and numerical complexities, we investigate the effect of asynchrony on several canonical reacting flows. Furthermore, for problems with shocks and discontinuities, such as detonations, we derive and verify AT-WENO (weighted essentially non-oscillatory) schemes. With the ultimate goal of deriving new optimal AT schemes, we also develop a unified framework for the derivation of finite difference schemes. We show explicit trade-offs between order of accuracy, spectral accuracy and stability under this unifying framework, which can be exploited to devise very accurate numerical schemes for asynchronous computations at extreme scales with minimal overheads.
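
    The flavor of an asynchrony-tolerant scheme can be conveyed with a toy example. In the C sketch below (illustrative only, and not the specific AT schemes derived in this work), the 1D heat equation is advanced explicitly, but the value from the neighboring PE arrives with a random delay of k steps; instead of using the stale value directly, which degrades accuracy, the scheme extrapolates in time across two delayed levels so that the leading delay error cancels.

        #include <stdlib.h>

        #define NX 128
        #define NH 4      /* history levels kept for the remote halo value */

        /* AT-style halo value: linear-in-time extrapolation from levels
           (n-k) and (n-k-1): u^n ~ (k+1)*u^{n-k} - k*u^{n-k-1}.
           For k = 0 this reduces to the synchronous value. */
        double at_halo(const double hist[NH], int k)
        {
            return (k + 1) * hist[k] - k * hist[k + 1];
        }

        /* One explicit step of u_t = nu*u_xx with alpha = nu*dt/dx^2;
           hist[j] holds the neighbor's boundary value at time level n-j.
           Physical/periodic boundaries at i = 0 are omitted for brevity. */
        void step(const double u[NX], double unew[NX],
                  const double hist[NH], double alpha)
        {
            for (int i = 1; i < NX - 1; i++)
                unew[i] = u[i] + alpha * (u[i+1] - 2.0*u[i] + u[i-1]);

            /* Last interior point: the right neighbor lives on another PE
               and is delayed by a random k in [0, NH-2]. */
            int k = rand() % (NH - 1);
            double uR = at_halo(hist, k);
            unew[NX-1] = u[NX-1] + alpha * (uR - 2.0*u[NX-1] + u[NX-2]);
        }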