Mach number and wall thermal boundary condition effects on near-wall compressible turbulence
We investigate the effects of thermal boundary conditions and Mach number on
turbulence close to walls. In particular, we study the near-wall asymptotic
behavior for adiabatic and pseudo-adiabatic walls, and compare to the
asymptotic behavior recently found near isothermal cold walls (Baranwal et al.
(2022)). This is done by analyzing a new large database of highly resolved
direct numerical simulations of turbulent channels with different wall thermal
conditions and centerline Mach numbers. We observe that the asymptotic
power-law behavior of Reynolds stresses and heat fluxes changes with both the
centerline Mach number and the thermal condition at the wall. The power-law
exponents transition from the values given by the analytical expansion for
solenoidal fields to those for non-solenoidal fields as the Mach number
increases, although this transition depends on the thermal boundary
condition. The
correlation coefficients between velocity and temperature are also found to be
affected by these factors. Consistent with recent proposals on universal
behavior of compressible turbulence, we find that dilatation at the wall is the
key scaling parameter for these power-law exponents, yielding a universal
functional law that can serve as a basis for general models of near-wall
behavior.
Comment: 24 pages, 15 figures, under consideration for publication in the
Journal of Fluid Mechanics
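Why wall dilatation controls the exponents can be seen from a Taylor expansion in the wall distance y (the coefficient names a1, b1, b2, c1 are notation chosen here, not taken from the paper). No-slip gives u' ≈ a1 y and w' ≈ c1 y, so <u'u'> ~ y^2. At the wall the tangential derivatives of u' and w' vanish, so the fluctuating dilatation there equals dv'/dy = b1. For a solenoidal field b1 = 0, hence v' ≈ b2 y^2 and <v'v'> ~ y^4; a nonzero wall dilatation gives v' ≈ b1 y and <v'v'> ~ y^2. The minimal sketch below, assuming synthetic profiles rather than the DNS database, estimates such an exponent as a least-squares slope in log-log space. The function name and the sample data are hypothetical and illustrate the idea only; this is not the authors' fitting procedure.

```python
import math
from typing import Sequence


def local_power_law_exponent(y: Sequence[float], stress: Sequence[float]) -> float:
    """Least-squares slope of log(stress) versus log(y).

    If stress ~ C * y**n near the wall, the slope in log-log space is n.
    All samples must satisfy y > 0 and stress > 0.
    """
    if len(y) != len(stress) or len(y) < 2:
        raise ValueError("need at least two matching (y, stress) samples")
    xs = [math.log(v) for v in y]
    ys = [math.log(s) for s in stress]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    cov = sum((a - x_mean) * (b - y_mean) for a, b in zip(xs, ys))
    var = sum((a - x_mean) ** 2 for a in xs)
    return cov / var


# Hypothetical synthetic profiles (not DNS data), y in wall units.
y_plus = [0.1 * k for k in range(1, 11)]
# Solenoidal case: b1 = 0, so v' ~ b2 y^2 and <v'v'> ~ y^4.
vv_solenoidal = [3e-4 * y ** 4 for y in y_plus]
# Nonzero wall dilatation: b1 != 0, so v' ~ b1 y and <v'v'> ~ y^2 at leading order.
vv_dilatational = [5e-3 * y ** 2 + 3e-4 * y ** 4 for y in y_plus]

print(round(local_power_law_exponent(y_plus, vv_solenoidal), 3))  # 4.0
print(local_power_law_exponent(y_plus, vv_dilatational))  # close to 2
```

In the dilatational case the fitted slope comes out slightly above 2 because the y^4 term still contributes over the sampled range. This is why such exponents are read from the region closest to the wall.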
GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping
Nanopore sequencing is a widely-used high-throughput genome sequencing
technology that can sequence long fragments of a genome into raw electrical
signals at low cost. Nanopore sequencing requires two computationally costly
processing steps for accurate downstream genome analysis. The first step,
basecalling, translates the raw electrical signals into nucleotide bases (i.e.,
A, C, G, T). The second step, read mapping, finds the correct location of a
read in a reference genome. In existing genome analysis pipelines, basecalling
and read mapping are executed separately. We observe in this work that such
separate execution of the two most time-consuming steps inherently leads to (1)
significant data movement and (2) redundant computations on the data, slowing
down the genome analysis pipeline. This paper proposes GenPIP, an in-memory
genome analysis accelerator that tightly integrates basecalling and read
mapping. GenPIP improves the performance of the genome analysis pipeline with
two key mechanisms: (1) in-memory fine-grained collaborative execution of the
major genome analysis steps in parallel; (2) a new technique for early
rejection of low-quality and unmapped reads that stops genome analysis of such
reads as soon as possible, avoiding wasted computation. Our
experiments show that, for the execution of the genome analysis pipeline,
GenPIP provides 41.6X (8.4X) speedup and 32.8X (20.8X) energy savings with
negligible accuracy loss compared to the state-of-the-art software genome
analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design
that combines state-of-the-art in-memory basecalling and read mapping
accelerators, GenPIP provides 1.39X speedup and 1.37X energy savings.
Comment: 17 pages, 13 figures
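The interplay of chunk-wise collaborative execution and early rejection can be sketched in software. The sketch below processes a read one basecalled chunk at a time, feeds each chunk straight into a mapping step, and stops at a checkpoint if the read looks low-quality or unmapped. It is a minimal illustration under stated assumptions, not GenPIP's in-memory hardware design. The names `Chunk`, `build_index`, and `process_read`, the quality threshold, the checkpoint position, and the toy k-mer seed index that stands in for a real read mapper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# Hypothetical parameters; GenPIP's actual thresholds are not given here.
QUALITY_THRESHOLD = 7.0  # minimum mean Phred quality to keep a read
CHECK_AFTER_CHUNKS = 2   # early-rejection checkpoint (in chunks)
K = 5                    # k-mer length for the toy seeding step


@dataclass
class Chunk:
    bases: str            # basecalled nucleotides for one signal chunk
    qualities: List[int]  # per-base Phred scores


def build_index(reference: str, k: int = K) -> Set[str]:
    """Toy seed index: the set of all k-mers in the reference."""
    return {reference[i:i + k] for i in range(len(reference) - k + 1)}


def process_read(chunks: Iterable[Chunk], index: Set[str], k: int = K) -> Tuple[str, int]:
    """Basecall and map a read chunk by chunk, rejecting it early if possible.

    Returns (status, chunks_processed); status is "low_quality",
    "unmapped", or "mapped".
    """
    seq, qual_sum, n_bases, seed_hits, processed = "", 0, 0, 0, 0
    for chunk in chunks:
        processed += 1
        # Stand-in for basecalling: accumulate bases and qualities.
        seq += chunk.bases
        qual_sum += sum(chunk.qualities)
        n_bases += len(chunk.qualities)
        # Stand-in for mapping: seed only k-mers touching the new bases,
        # including those that span the previous chunk boundary.
        start = max(0, len(seq) - len(chunk.bases) - (k - 1))
        for i in range(start, len(seq) - k + 1):
            if seq[i:i + k] in index:
                seed_hits += 1
        # Early-rejection checkpoint: stop work on hopeless reads.
        if processed == CHECK_AFTER_CHUNKS:
            if n_bases and qual_sum / n_bases < QUALITY_THRESHOLD:
                return "low_quality", processed
            if seed_hits == 0:
                return "unmapped", processed
    return ("mapped" if seed_hits else "unmapped"), processed


# Usage with made-up data.
index = build_index("ACGTACGTTTGACCAGTAGGCATCGATCGGA")
good = [Chunk("ACGTACGT", [20] * 8), Chunk("TTGACCAG", [20] * 8), Chunk("TAGGCATC", [20] * 8)]
noisy = [Chunk("ACGTACGT", [3] * 8), Chunk("TTGACCAG", [4] * 8), Chunk("TAGGCATC", [20] * 8)]
foreign = [Chunk("GGGGGGGG", [20] * 8), Chunk("CCCCCCCC", [20] * 8), Chunk("AAAAAAAA", [20] * 8)]
print(process_read(good, index))     # ('mapped', 3)
print(process_read(noisy, index))    # ('low_quality', 2)
print(process_read(foreign, index))  # ('unmapped', 2)
```

The two rejected reads never reach their third chunk. In a separated pipeline, by contrast, every read would be fully basecalled before mapping could discover it was useless.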
Optimizing GPU Convnets
Convolution layers are useful for improving the accuracy of neural networks. In networks such as CosmoFlow, which contain multiple consecutive convolution layers, the convolution layers dominate the end-to-end runtime. Several convolution algorithms, such as implicit GEMM, the fast Fourier transform (FFT), and Winograd, have been optimized for different platforms. Achieving performance close to theoretical bounds often requires manual fine-tuning specific to the target architecture. We use the DaCe framework to develop portable GPU optimizations of 3D convolutions for the implicit GEMM and direct convolution algorithms. We benchmark the optimized code against the available manually tuned library implementations.
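The difference between the two algorithms named above can be illustrated in NumPy. Direct convolution loops over output positions and reduces each receptive field against the filter. The GEMM formulation flattens every receptive field into a column and computes all outputs with one matrix multiply. The sketch below is an assumption-laden illustration, not DaCe code and not the paper's optimized kernels. It builds the column matrix explicitly (im2col) for clarity, whereas implicit GEMM computes the same indices on the fly without materializing it. The function names, shapes, stride 1, and the absence of padding are all choices made here.

```python
import numpy as np


def conv3d_direct(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Direct 3D convolution (cross-correlation), stride 1, no padding.

    x: input of shape (C_in, D, H, W)
    w: filters of shape (C_out, C_in, KD, KH, KW)
    returns: output of shape (C_out, D-KD+1, H-KH+1, W-KW+1)
    """
    c_out, _, kd, kh, kw = w.shape
    _, d, h, wd = x.shape
    od, oh, ow = d - kd + 1, h - kh + 1, wd - kw + 1
    y = np.zeros((c_out, od, oh, ow), dtype=x.dtype)
    for o in range(c_out):
        for i in range(od):
            for j in range(oh):
                for k in range(ow):
                    patch = x[:, i:i + kd, j:j + kh, k:k + kw]
                    y[o, i, j, k] = np.sum(patch * w[o])
    return y


def conv3d_gemm(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Same convolution lowered to a single matrix multiply (explicit im2col).

    Implicit GEMM performs the same multiply but generates the column
    entries on the fly instead of materializing `cols` as done here.
    """
    c_out, c_in, kd, kh, kw = w.shape
    _, d, h, wd = x.shape
    od, oh, ow = d - kd + 1, h - kh + 1, wd - kw + 1
    # One column per output position: shape (C_in*KD*KH*KW, OD*OH*OW).
    cols = np.empty((c_in * kd * kh * kw, od * oh * ow), dtype=x.dtype)
    n = 0
    for i in range(od):
        for j in range(oh):
            for k in range(ow):
                cols[:, n] = x[:, i:i + kd, j:j + kh, k:k + kw].ravel()
                n += 1
    # (C_out, C_in*KD*KH*KW) @ (C_in*KD*KH*KW, OD*OH*OW) -> (C_out, OD*OH*OW)
    y = w.reshape(c_out, -1) @ cols
    return y.reshape(c_out, od, oh, ow)


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 6, 6))
w = rng.standard_normal((4, 2, 3, 3, 3))
print(np.allclose(conv3d_direct(x, w), conv3d_gemm(x, w)))  # True
```

The explicit column matrix repeats each input element up to KD*KH*KW times. For 3D convolutions this memory blow-up is the main motivation for generating the indices implicitly.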