Temporal Dynamic Quantization for Diffusion Models
The diffusion model has gained popularity in vision applications due to its
remarkable generative performance and versatility. However, high storage and
computation demands, resulting from the model size and iterative generation,
hinder its use on mobile devices. Existing quantization techniques struggle to
maintain performance even at 8-bit precision due to the diffusion model's
unique property of temporally varying activations. We introduce a novel
quantization method that dynamically adjusts the quantization interval based on
time step information, significantly improving output quality. Unlike
conventional dynamic quantization techniques, our approach has no computational
overhead during inference and is compatible with both post-training
quantization (PTQ) and quantization-aware training (QAT). Our extensive
experiments demonstrate substantial improvements in output quality with the
quantized diffusion model across various datasets.
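The abstract above describes adjusting the quantization interval per diffusion time step while keeping inference free of extra computation. Below is a minimal sketch of that general idea, assuming per-timestep activation scales that are calibrated offline and merely looked up at sampling time; the class `TimestepActQuantizer`, the max-abs calibration, and all parameter names are illustrative assumptions, not the paper's actual PTQ/QAT procedure.

```python
import torch

class TimestepActQuantizer:
    """Illustrative per-timestep activation quantizer (not the paper's code).

    A separate quantization scale is calibrated offline for each diffusion
    time step, so inference only performs a table lookup -- no extra
    computation compared with static quantization.
    """

    def __init__(self, num_timesteps: int, n_bits: int = 8):
        self.qmax = 2 ** (n_bits - 1) - 1
        # One scale per diffusion time step, filled in during calibration.
        self.scales = torch.ones(num_timesteps)

    @torch.no_grad()
    def calibrate(self, t: int, activations: torch.Tensor):
        # Simple max-abs calibration per time step; the paper's learned
        # intervals (PTQ or QAT) would replace this heuristic.
        self.scales[t] = activations.abs().max() / self.qmax

    def fake_quantize(self, x: torch.Tensor, t: int) -> torch.Tensor:
        # Inference-time path: index the precomputed scale for step t,
        # then round and clamp to the signed n-bit range.
        s = self.scales[t]
        return torch.clamp(torch.round(x / s), -self.qmax - 1, self.qmax) * s
```

A typical use would be `q = TimestepActQuantizer(num_timesteps=1000)`, calibrating once per time step on sample activations and then calling `q.fake_quantize(x, t)` inside the sampling loop; the lookup keeps the runtime cost identical to static quantization.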
Clouds and convective self-aggregation in a multi-model ensemble of radiative-convective equilibrium simulations
The Radiative-Convective Equilibrium Model Intercomparison Project (RCEMIP) is an intercomparison of multiple types of numerical models configured in radiative-convective equilibrium (RCE). RCE is an idealization of the tropical atmosphere that has long been used to study basic questions in climate science. Here, we employ RCE to investigate the role that clouds and convective activity play in determining cloud feedbacks, climate sensitivity, the state of convective aggregation, and the equilibrium climate. RCEMIP is unique amongst intercomparisons in its inclusion of a wide range of model types, including atmospheric general circulation models (GCMs), single column models (SCMs), cloud-resolving models (CRMs), large eddy simulations (LES), and global cloud-resolving models (GCRMs).
The first results are presented from the RCEMIP ensemble of more than 30 models. While there are large differences across the RCEMIP ensemble in the representation of mean profiles of temperature, humidity, and cloudiness, in a majority of models anvil clouds rise, warm, and decrease in area coverage in response to an increase in sea surface temperature (SST). Nearly all models exhibit self-aggregation in large domains and agree that self-aggregation acts to dry and warm the troposphere, reduce high cloudiness, and increase cooling to space. The degree of self-aggregation exhibits no clear tendency with warming. There is a wide range of climate sensitivities, but models with parameterized convection tend to have lower climate sensitivities than models with explicit convection. In models with parameterized convection, aggregated simulations have lower climate sensitivities than un-aggregated simulations.
Input-Splitting of Large Neural Networks for Power-Efficient Accelerator with Resistive Crossbar Memory Array
Resistive Crossbar memory Arrays (RCA) have been gaining interest as a promising platform to implement Convolutional Neural Networks (CNN). One of the major challenges in RCA-based design is that the number of rows in an RCA is often smaller than the number of input neurons in a layer. Previous works used high-resolution Analog-to-Digital Converters (ADCs) to compute the partial weighted sum in each array and merged partial sums from multiple arrays outside the RCAs. However, such an approach suffers from significant power consumption due to the need for high-resolution ADCs. In this paper, we propose a methodology to more efficiently construct a large CNN with multiple RCAs. By splitting the input feature map and retraining the CNN with proper initialization, we demonstrate that any CNN model can be represented with multiple arrays without using intermediate partial sums. The experimental results show that the ADC power of the proposed design is 32x smaller and the total chip power of the proposed design is 3x smaller than those of the baseline design.
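The input-splitting idea can be sketched as follows, under the assumption that a large layer is divided along its input dimension into per-array sub-layers, each applying its own activation, so that no high-resolution analog partial sums need to be merged between arrays. The class `InputSplitLinear`, the ReLU placement, and the final digital summation are illustrative guesses; the paper's initialization and retraining scheme is not reproduced here.

```python
import torch
import torch.nn as nn

class InputSplitLinear(nn.Module):
    """Illustrative input-split layer (names and structure are assumptions).

    A layer with `in_features` inputs is split into `num_arrays` sub-layers,
    each small enough to map onto one resistive crossbar array (RCA).
    Every sub-layer applies its own nonlinearity, so the arrays never have
    to exchange high-resolution partial sums; their activated outputs are
    combined digitally, and the network would be retrained to recover
    accuracy from this restructuring.
    """

    def __init__(self, in_features: int, out_features: int, num_arrays: int):
        super().__init__()
        assert in_features % num_arrays == 0, "inputs must divide evenly"
        self.split_size = in_features // num_arrays
        self.sub_layers = nn.ModuleList(
            nn.Linear(self.split_size, out_features) for _ in range(num_arrays)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Split the input feature vector across the arrays.
        chunks = torch.split(x, self.split_size, dim=-1)
        # Each array produces an activated output on its own; no analog
        # partial-sum merging between arrays is required.
        outs = [torch.relu(layer(c)) for layer, c in zip(self.sub_layers, chunks)]
        return torch.stack(outs, dim=0).sum(dim=0)
```

For example, `InputSplitLinear(1024, 256, num_arrays=8)` maps a 1024-input layer onto eight arrays of 128 rows each, matching the row-count constraint described in the abstract.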
Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network
OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator