NVIDIA Tensor Core Programmability, Performance & Precision
The NVIDIA Volta GPU microarchitecture introduces a specialized unit, called
"Tensor Core", that performs one matrix-multiply-and-accumulate on 4x4 matrices
per clock cycle. The NVIDIA Tesla V100 accelerator, featuring the Volta
microarchitecture, provides 640 Tensor Cores with a theoretical peak
performance of 125 Tflops/s in mixed precision. In this paper, we investigate
current approaches to programming NVIDIA Tensor Cores, their performance, and the
precision loss due to computation in mixed precision.
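As a rough check of the quoted peak: one 4x4 matrix-multiply-and-accumulate D = A·B + C takes 4·4·4 = 64 fused multiply-adds (FMAs), i.e. 128 floating-point operations, per Tensor Core per clock. The clock rate is not given above; the figure of about 1.53 GHz used here is an assumption (a V100 boost clock), so treat this as a back-of-the-envelope sketch rather than the paper's derivation.

```latex
\[
\underbrace{640}_{\text{Tensor Cores}}
\times \underbrace{4 \cdot 4 \cdot 4}_{\text{FMA per clock}}
\times \underbrace{2}_{\text{flop per FMA}}
\times \underbrace{1.53 \times 10^{9}\,\mathrm{s}^{-1}}_{\text{assumed clock}}
\approx 1.25 \times 10^{14}\ \text{flop/s} = 125\ \text{Tflop/s}
\]
```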
Currently, NVIDIA provides three different ways of programming
matrix-multiply-and-accumulate on Tensor Cores: the CUDA Warp Matrix Multiply
Accumulate (WMMA) API; CUTLASS, a templated library based on WMMA; and cuBLAS
GEMM. After experimenting with different approaches, we found that NVIDIA
Tensor Cores can deliver up to 83 Tflops/s in mixed precision on a Tesla V100
GPU, seven and three times the performance in single and half precision
respectively. A WMMA implementation of batched GEMM reaches a performance of 4
Tflops/s. While precision loss due to matrix multiplication with half precision
input might be critical in many HPC applications, it can be considerably
reduced at the cost of increased computation. Our results indicate that HPC
applications using matrix multiplications can strongly benefit from using
NVIDIA Tensor Cores.
Comment: This paper has been accepted by the Eighth International Workshop on
Accelerators and Hybrid Exascale Systems (AsHES) 201
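The abstract does not say how the precision loss is reduced "at the cost of increased computation". One known technique, assumed here and not necessarily the paper's, splits each single-precision matrix into a half-precision part plus a half-precision residual and computes three Tensor Core-style products instead of one. The sketch below emulates this in plain Python: binary16 rounding through the standard `struct` format `'e'`, binary32 accumulation through `'f'`. It only approximates Tensor Core arithmetic (hardware rounding details may differ), and the function names, matrix size, and random inputs are illustrative choices.

```python
import random
import struct


def to_half(x):
    """Round to the nearest IEEE 754 binary16 (half-precision) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x):
    """Round to the nearest IEEE 754 binary32 (single-precision) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mixed_gemm(a, b):
    """C = A @ B with half-precision inputs and single-precision accumulation."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for p in range(inner):
                # The product of two binary16 values is exact in binary32,
                # so only the running sum is rounded.
                acc = to_single(acc + to_half(a[i][p]) * to_half(b[p][j]))
            c[i][j] = acc
    return c


def split(m):
    """Split a matrix into a half-precision part and a half-precision residual."""
    hi = [[to_half(x) for x in row] for row in m]
    lo = [[to_half(x - h) for x, h in zip(row, hrow)] for row, hrow in zip(m, hi)]
    return hi, lo


def refined_gemm(a, b):
    """Approximate A @ B as A_hi B_hi + A_lo B_hi + A_hi B_lo (A_lo B_lo dropped)."""
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    parts = [mixed_gemm(a_hi, b_hi), mixed_gemm(a_lo, b_hi), mixed_gemm(a_hi, b_lo)]
    # Combine the three partial products and round the result to binary32.
    return [[to_single(sum(p[i][j] for p in parts)) for j in range(len(b[0]))]
            for i in range(len(a))]


def max_abs_err(c, ref):
    return max(abs(x - y) for row, rrow in zip(c, ref) for x, y in zip(row, rrow))


random.seed(0)
N = 16
A = [[to_single(random.uniform(-1, 1)) for _ in range(N)] for _ in range(N)]
B = [[to_single(random.uniform(-1, 1)) for _ in range(N)] for _ in range(N)]
# Reference: double-precision product of the single-precision inputs.
ref = [[sum(A[i][p] * B[p][j] for p in range(N)) for j in range(N)] for i in range(N)]

plain_err = max_abs_err(mixed_gemm(A, B), ref)
refined_err = max_abs_err(refined_gemm(A, B), ref)
print(f"max |error|: plain {plain_err:.2e}, refined {refined_err:.2e}")
```

On inputs like these, the refined product should be far more accurate than the plain half-precision-input product, at roughly three times the multiply work.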
Soil Moisture Active/Passive (SMAP) Forward Brightness Temperature Simulator
The SMAP mission is one of four first-tier missions recommended by the US National Research Council's Committee on Earth Science and Applications from Space (Earth Science and Applications from Space: National Imperatives for the Next Decade and Beyond, Space Studies Board, National Academies Press, 2007) [1]. Its objective is to measure global soil moisture and freeze/thaw state from space. One of its spaceborne instruments is an L-band radiometer with a shared single feedhorn and a parabolic mesh reflector. While the radiometer measures the emission over a footprint of interest, the antenna also receives unwanted emissions through its sidelobes from the cosmic background and from other error sources such as the Sun, the Moon, and the galaxy. These effects need to be accounted for accurately, and analyzing the overall performance of the radiometer requires an end-to-end performance simulation from Earth emission to antenna brightness temperature, such as the global simulation of L-band brightness temperatures over land and sea [2]. To assist with the development of the SMAP radiometer Level 1B (L1B) algorithm, the SMAP forward brightness temperature simulator has been developed by adapting the Aquarius simulator [2] with the necessary modifications. This poster presents the current status of the SMAP forward brightness temperature simulator's development, including the incorporation of the land microwave emission model and its input datasets, and a simplified atmospheric radiative transfer model. The latest simulation results are also presented to demonstrate the simulator's ability to support SMAP L1B algorithm development.
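The forward chain described above (surface emission, atmosphere, antenna pattern) can be sketched in a few lines. This is a simplified illustration, not the SMAP simulator: it assumes a non-scattering isothermal atmosphere, a specular surface, and an antenna pattern discretized into equal-solid-angle cells, so the antenna temperature is just the gain-weighted mean of the scene brightness temperatures. All function names and numbers are hypothetical.

```python
import math

T_COSMIC = 2.73  # cosmic microwave background brightness temperature, K


def toa_brightness(t_surface, emissivity, tau, t_atm):
    """Top-of-atmosphere brightness temperature (K) over a specular surface.

    Assumes a non-scattering, isothermal atmosphere at t_atm (K) with optical
    depth tau along the viewing path, and cold-sky emission reflected by the surface.
    """
    trans = math.exp(-tau)                 # atmospheric transmissivity
    t_up = t_down = (1.0 - trans) * t_atm  # up- and downwelling atmospheric emission
    sky = t_down + trans * T_COSMIC        # sky brightness seen by the surface
    return t_up + trans * (emissivity * t_surface + (1.0 - emissivity) * sky)


def antenna_temperature(gains, scene_temps):
    """Gain-weighted mean of scene brightness temperatures over equal-solid-angle
    antenna-pattern cells (a discretized pattern-weighted integral)."""
    return sum(g * t for g, t in zip(gains, scene_temps)) / sum(gains)


# Hypothetical numbers: land scene in the main beam, cold sky in the sidelobes.
tb_land = toa_brightness(t_surface=290.0, emissivity=0.9, tau=0.01, t_atm=260.0)
ta = antenna_temperature([0.9, 0.1], [tb_land, T_COSMIC])
print(f"T_B(TOA) = {tb_land:.1f} K, T_A = {ta:.1f} K")
```

A real simulator replaces the two-cell "pattern" with the measured antenna gain on a fine grid and adds sources such as the Sun, the Moon, and the galaxy as separate sidelobe terms.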
Elemental tellurium as a chiral p-type thermoelectric material
The thermoelectric transport properties of elemental tellurium are investigated by density functional theory combined with the Boltzmann transport equation in the rigid band approximation. We find that the thermoelectric transport properties parallel and perpendicular to the helical chains are highly asymmetric (almost symmetric) for p- (n-) type doped tellurium due to the anisotropic (isotropic) hole (electron) pockets of the Fermi surface. The electronic band structure shows that the lone-pair-derived uppermost heavy-hole valence band and the extremely light-hole lower valence band offer the opportunity to obtain both a high Seebeck coefficient and a high electrical conductivity along the chains through Sb or Bi doping. Furthermore, the stairlike density of states yields a large asymmetry in the transport distribution function relative to the Fermi energy, which leads to a large thermopower. The calculations reveal that tellurium has the potential to be a good p-type thermoelectric material, with an optimum figure of merit zT of 0.31 (0.56) at room temperature (500 K) at a hole concentration of around 1×10^19 cm^−3. Exploiting the rich chemistry of lone pairs in chiral solids may have important implications for the discovery of high-zT polychalcogenide-based thermoelectric materials.
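The link between an asymmetric transport distribution function Σ(E) and a large thermopower can be made concrete with the standard Fermi-window expression S = -(1/T)·⟨E - μ⟩, where the average is weighted by Σ(E)(-∂f/∂E) and energies are in eV (so S comes out in V/K), together with zT = S²σT/κ. The sketch below compares a hypothetical step-like Σ(E), nonzero only below a valence-band edge at E = 0, with an energy-independent one. The Σ shapes, chemical potential, and grid are illustrative assumptions, not the paper's DFT data.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant, eV/K


def seebeck(sigma_of_e, mu, temp, e_min, e_max, n=4001):
    """Seebeck coefficient (V/K) from a transport distribution function Sigma(E).

    Energies in eV. Computes S = -(1/T) * <E - mu>, with the average weighted
    by Sigma(E) * (-df/dE) on a uniform grid. Electron sign convention, so
    transport carried by states below mu (holes) gives S > 0.
    """
    kt = K_B * temp
    de = (e_max - e_min) / (n - 1)
    num = den = 0.0
    for i in range(n):
        e = e_min + i * de
        x = (e - mu) / kt
        window = 1.0 / (4.0 * kt * math.cosh(x / 2.0) ** 2)  # -df/dE
        w = sigma_of_e(e) * window
        num += w * (e - mu)
        den += w
    return -num / (den * temp)


def figure_of_merit(s, sigma, kappa, temp):
    """zT = S^2 * sigma * T / kappa, all in SI units."""
    return s * s * sigma * temp / kappa


def stair(e):
    """Hypothetical step-like Sigma(E): valence states only below E_v = 0."""
    return 1.0 if e <= 0.0 else 0.0


def flat(e):
    """Energy-independent Sigma(E), symmetric about any mu."""
    return 1.0


mu = 0.05  # eV above the valence-band edge (hypothetical p-type doping)
s_stair = seebeck(stair, mu, 300.0, mu - 1.0, mu + 1.0)
s_flat = seebeck(flat, mu, 300.0, mu - 1.0, mu + 1.0)
print(f"stairlike: {s_stair * 1e6:.0f} uV/K, flat: {s_flat * 1e6:.2f} uV/K")
```

The flat Σ(E) weights states above and below μ equally, so ⟨E - μ⟩ vanishes and so does S; the step-like Σ(E) samples only one side of the Fermi window, which is the asymmetry the abstract credits for the large thermopower.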
Dynamic Patterns of Transcript Abundance of Transposable Element Families in Maize.
Transposable elements (TEs) are mobile elements that make up the majority of the DNA sequence in the maize genome. Because of their repetitive nature, genomic studies of TEs are complicated by the difficulty of properly attributing multi-mapped short reads to specific genomic loci. Here, we use a method that attributes RNA-seq reads to TE families rather than to particular loci in order to characterize transcript abundance for TE families in the maize genome. We applied this method to assess per-family expression of transposable elements in more than 800 published RNA-seq libraries representing a range of maize developmental stages, genotypes, and hybrids. While a relatively small proportion of TE families are transcribed, expression is highly dynamic, with most families exhibiting tissue-specific expression. A large number of TE families were specifically detected in pollen and endosperm, consistent with reproductive dynamics that maintain silencing of TEs in the germ line. We find that B73 transcript abundance is a poor predictor of TE expression in other genotypes and that transcript levels can differ even for shared TEs. Finally, assessing recombinant inbred line and hybrid transcriptomes revealed complex patterns of TE transcript abundance across genotypes. Taken together, this study reveals a dynamic contribution of TEs to maize transcriptomes.
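Attributing reads to families instead of loci sidesteps much of the multi-mapping problem, because a read that maps to many copies of the same family is still unambiguous at the family level. The abstract does not give the exact counting rule; the sketch below assumes a simple one: a read is credited to a family only if every locus it aligns to belongs to that single family, and everything else is left unassigned. The function name, data layout, and toy locus/family names are hypothetical.

```python
from collections import Counter, defaultdict


def family_counts(alignments, locus_to_family):
    """Tally RNA-seq reads per TE family rather than per locus.

    alignments: iterable of (read_id, locus_id) pairs; a multi-mapped read
    appears once per locus it aligns to.
    locus_to_family: dict from TE locus id to TE family name.
    A read is credited to a family only if all of its alignments fall in loci
    of that single family; all other reads are counted as unassigned.
    """
    loci_by_read = defaultdict(set)
    for read_id, locus in alignments:
        loci_by_read[read_id].add(locus)

    counts = Counter()
    unassigned = 0
    for loci in loci_by_read.values():
        families = {locus_to_family.get(locus) for locus in loci}  # None: not a TE locus
        if len(families) == 1 and None not in families:
            counts[families.pop()] += 1
        else:
            unassigned += 1
    return counts, unassigned


# Hypothetical toy data: read r1 multi-maps within one family, r2 across two.
locus_to_family = {"te1": "famA", "te2": "famA", "te3": "famB"}
alignments = [("r1", "te1"), ("r1", "te2"), ("r2", "te1"), ("r2", "te3"), ("r3", "te3")]
counts, unassigned = family_counts(alignments, locus_to_family)
print(dict(counts), unassigned)
```

Here r1 is counted for famA even though it maps to two loci, while r2 is left unassigned because its loci belong to different families.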
Freeze-drying Silica Based Aerogels Using Cryoprotectants and Eutectic Solvent Mixtures
Silica-based aerogels have unique properties, including good thermal insulation and convective inhibition. A sol-gel process can be used to produce semi-opaque, monolithic gels, which can then be dried to produce aerogels. Multiple drying methods are available industrially; however, these methods require high temperatures and pressures and specialized equipment, and they are time-consuming. This project aims to experimentally study the feasibility of a new method for drying wet gels by freeze-drying, using cryoprotectants, eutectics, and polymers to inhibit and control ice formation and growth during drying. Silica wet gels were produced using tetraethylorthosilicate (TEOS), ethanol, water, and hydrochloric acid/ammonium hydroxide. After gelation, the gels were subjected to solvent exchanges with varying concentrations of cryoprotectants, eutectics, polymers, and combinations of the three. A customized freeze-dryer was used to obtain silica aerogels from the wet gels, and the monolithicity and porosity of the resulting aerogels were characterized by SEM and BET. The results indicated that the addition of cryoprotectants, eutectics, and polymers yielded monolithic foams that were structurally stable and had measurable porosity and surface area. The processes developed in this work could enable simpler, more cost-effective methods for drying wet gels; such methods could be used to produce freeze-dried aerogels with better properties and have potential for industrial implementation.
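The BET surface area mentioned above comes from fitting the linearized BET equation, x / (v(1 - x)) = 1/(v_m c) + (c - 1)/(v_m c) · x with x = p/p0, to an adsorption isotherm, then converting the monolayer volume v_m to an area. The sketch below assumes N2 adsorption at 77 K with a molecular cross-section of 0.162 nm² and uses synthetic, noise-free data generated from the BET isotherm itself. The data and parameter values are hypothetical, not results from this work.

```python
N_A = 6.02214076e23   # Avogadro constant, 1/mol
V_STP = 22414.0       # molar volume of an ideal gas at STP, cm^3/mol
SIGMA_N2 = 0.162e-18  # assumed N2 cross-sectional area at 77 K, m^2


def bet_isotherm(x, v_m, c):
    """Adsorbed volume (cm^3 STP/g) at relative pressure x = p/p0."""
    return v_m * c * x / ((1.0 - x) * (1.0 + (c - 1.0) * x))


def bet_fit(rel_pressures, volumes):
    """Least-squares fit of x / (v (1 - x)) = 1/(v_m c) + (c - 1)/(v_m c) * x.

    Returns (v_m, c); intended for the linear range, typically x = 0.05-0.30.
    """
    xs = list(rel_pressures)
    ys = [x / (v * (1.0 - x)) for x, v in zip(xs, volumes)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    v_m = 1.0 / (slope + intercept)  # slope + intercept = 1 / v_m
    c = 1.0 + slope / intercept      # slope / intercept = c - 1
    return v_m, c


def bet_surface_area(v_m):
    """Specific surface area (m^2/g) from monolayer volume v_m (cm^3 STP/g)."""
    return v_m / V_STP * N_A * SIGMA_N2


xs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
vs = [bet_isotherm(x, 100.0, 80.0) for x in xs]  # hypothetical isotherm data
v_m, c = bet_fit(xs, vs)
area = bet_surface_area(v_m)
print(f"v_m = {v_m:.2f} cm^3/g, c = {c:.1f}, S_BET = {area:.0f} m^2/g")
```

With real, noisy isotherms the fit range matters; checking that c comes out positive is a common sanity check on the chosen range.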
Optimization principles and the figure of merit for triboelectric generators
Energy harvesting with triboelectric nanogenerators is a burgeoning field, with a growing portfolio of creative application schemes attracting much interest. Although power generation capability and its optimization are among the most important subjects, a satisfactory elementary model that illustrates the basic principles and sets the optimization guideline remains elusive. We use a simple model to clarify that the energy generation mechanism is electrostatic induction, but with a time-varying character that makes the optimal matching for power generation more restrictive. By combining multiple parameters into dimensionless variables, we pinpoint the optimum condition with only two independent parameters, leading to predictions of the maximum limit of power density, which allows us to derive the triboelectric material and device figure of merit. We reveal the importance of optimizing the device capacitance, not only the load resistance, and of minimizing the impact of parasitic capacitance. Optimized capacitances can increase the overall power density by more than a factor of 10.
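The abstract does not spell out its model. A widely used starting point, assumed here, is the parallel-plate contact-separation model with a resistive load: R dQ/dt = -Q(d0 + x(t))/(ε0 S) + σ x(t)/ε0, where σ is the triboelectric surface charge density, S the plate area, d0 the effective dielectric thickness, and x(t) the gap. With q = Q/(σS), u = x/d0, τ = ωt, C0 = ε0 S/d0, and α = 1/(ωRC0), this becomes dq/dτ = α(u - q(1 + u)), and the mean power in units of (σd0/ε0)² ωC0 is α⟨(u - q(1 + u))²⟩. For a sinusoidal gap u = (u_max/2)(1 - cos τ), only α and u_max remain. Because α contains both R and C0, load and device capacitance enter together. The sketch scans α and finds an interior optimum. It illustrates optimal matching under these assumptions and is not the paper's derivation. The function names are hypothetical.

```python
import math


def period_map(q0, alpha, u_max, steps=4000):
    """Integrate dq/dtau = alpha * (u - q * (1 + u)) over one period with RK4.

    q = Q / (sigma * S) is the transferred charge, u = x / d0 the gap, and
    tau = omega * t. Returns (q at the end of the period, mean dimensionless power).
    """
    h = 2.0 * math.pi / steps

    def gap(tau):
        return 0.5 * u_max * (1.0 - math.cos(tau))

    def rate(tau, q):
        u = gap(tau)
        return alpha * (u - q * (1.0 + u))

    q, energy = q0, 0.0
    for n in range(steps):
        tau = n * h
        # Dimensionless load voltage v = u - q(1 + u); power p = alpha * v**2.
        u = gap(tau)
        v = u - q * (1.0 + u)
        energy += alpha * v * v * h
        k1 = rate(tau, q)
        k2 = rate(tau + h / 2, q + h / 2 * k1)
        k3 = rate(tau + h / 2, q + h / 2 * k2)
        k4 = rate(tau + h, q + h * k3)
        q += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return q, energy / (2.0 * math.pi)


def steady_power(alpha, u_max):
    """Mean power in the periodic steady state.

    The ODE is linear in q, so the one-period map is affine, q1 = m * q0 + c;
    two trial runs give m and c, and the periodic solution starts at c / (1 - m).
    """
    c, _ = period_map(0.0, alpha, u_max)
    m = period_map(1.0, alpha, u_max)[0] - c
    q_star = c / (1.0 - m)
    return period_map(q_star, alpha, u_max)[1]


alphas = [10 ** (k / 2) for k in range(-4, 5)]  # 0.01 ... 100
powers = [steady_power(a, u_max=1.0) for a in alphas]
best = max(range(len(alphas)), key=lambda i: powers[i])
print(f"best alpha on this grid: {alphas[best]:.3g}")
```

Power falls off at both ends of the scan. For large R (small α) almost no charge flows, while for small R (large α) the load voltage collapses. This is why matching, here expressed through the single product ωRC0, has an optimum.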