Temporal Dynamic Quantization for Diffusion Models

Author: Ahn Daehyun
Kim Hyungjun
Lee Jungwon
Park Eunhyeok
So Junhyuk
Publication date: 04/06/2023

The diffusion model has gained popularity in vision applications due to its remarkable generative performance and versatility. However, high storage and computation demands, resulting from the model size and iterative generation, hinder its use on mobile devices. Existing quantization techniques struggle to maintain performance even at 8-bit precision due to the diffusion model's unique property of temporal variation in activations. We introduce a novel quantization method that dynamically adjusts the quantization interval based on time step information, significantly improving output quality. Unlike conventional dynamic quantization techniques, our approach has no computational overhead during inference and is compatible with both post-training quantization (PTQ) and quantization-aware training (QAT). Our extensive experiments demonstrate substantial improvements in output quality with the quantized diffusion model across various datasets.
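The idea of tying the quantization interval to the time step can be illustrated with a small sketch. This is not the authors' method: the abstract does not say how the interval is derived, so the max-abs calibration, the per-step lookup table, the synthetic activations, and the names `calibrate_intervals` and `quantize` are all assumptions made for illustration. The table is built offline, so choosing an interval at inference time is only an index lookup.

```python
import numpy as np

def calibrate_intervals(activations_per_step, num_bits=8):
    """Assumed calibration: one interval (scale) per time step from max-abs statistics."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.array([np.abs(a).max() / qmax for a in activations_per_step])

def quantize(x, scale, num_bits=8):
    """Uniform symmetric fake quantization: round to the integer grid, then rescale."""
    qmax = 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

rng = np.random.default_rng(0)
# Hypothetical calibration data: the activation spread grows with the time step.
calib = [rng.normal(0.0, 0.1 * (t + 1), size=1024) for t in range(10)]

scales = calibrate_intervals(calib)  # per-time-step lookup table, built offline
static_scale = scales.max()          # one interval wide enough for every step

x = calib[0]  # activations at the step with the narrowest range
err_dynamic = np.mean((x - quantize(x, scales[0])) ** 2)
err_static = np.mean((x - quantize(x, static_scale)) ** 2)
print(err_dynamic < err_static)
```

In this toy setting, a single static interval has to cover the widest step, so it wastes resolution on steps whose activations are narrow; the per-step interval avoids that.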

arXiv.org e-Print Archive

Recommended from our members

Clouds and convective self-aggregation in a multi-model ensemble of radiative-convective equilibrium simulations

The Radiative-Convective Equilibrium Model Intercomparison Project (RCEMIP) is an intercomparison of multiple types of numerical models configured in radiative-convective equilibrium (RCE). RCE is an idealization of the tropical atmosphere that has long been used to study basic questions in climate science. Here, we employ RCE to investigate the role that clouds and convective activity play in determining cloud feedbacks, climate sensitivity, the state of convective aggregation, and the equilibrium climate. RCEMIP is unique amongst intercomparisons in its inclusion of a wide range of model types, including atmospheric general circulation models (GCMs), single column models (SCMs), cloud-resolving models (CRMs), large eddy simulations (LES), and global cloud-resolving models (GCRMs). The first results are presented from the RCEMIP ensemble of more than 30 models. While there are large differences across the RCEMIP ensemble in the representation of mean profiles of temperature, humidity, and cloudiness, in a majority of models anvil clouds rise, warm, and decrease in area coverage in response to an increase in sea surface temperature (SST). Nearly all models exhibit self-aggregation in large domains and agree that self-aggregation acts to dry and warm the troposphere, reduce high cloudiness, and increase cooling to space. The degree of self-aggregation exhibits no clear tendency with warming. There is a wide range of climate sensitivities, but models with parameterized convection tend to have lower climate sensitivities than models with explicit convection. In models with parameterized convection, aggregated simulations have lower climate sensitivities than un-aggregated simulations.

Central Archive at the University of Reading

Crossref

TU Delft Repository

CWI's Institutional Repository

Directory of Open Access Journals

HAL-INSU

eScholarship - University of California

HAL-IRD

MPG.PuRe

HAL-Polytechnique

HAL-Ecole des Ponts ParisTech

Input-Splitting of Large Neural Networks for Power-Efficient Accelerator with Resistive Crossbar Memory Array

Author: Ahn Daehyun
Kim Hyungjun
Kim Jae-Joon
Kim Yulhwa
Publication venue: ACM/IEEE
Publication date: 23/07/2018

Resistive crossbar memory arrays (RCAs) have been gaining interest as a promising platform for implementing convolutional neural networks (CNNs). One of the major challenges in RCA-based design is that the number of rows in an RCA is often smaller than the number of input neurons in a layer. Previous works used high-resolution analog-to-digital converters (ADCs) to compute the partial weighted sum in each array and merged the partial sums from multiple arrays outside the RCAs. However, such an approach suffers from significant power consumption due to the need for high-resolution ADCs. In this paper, we propose a methodology to more efficiently construct a large CNN with multiple RCAs. By splitting the input feature map and retraining the CNN with proper initialization, we demonstrate that any CNN model can be represented with multiple arrays without using intermediate partial sums. The experimental results show that the ADC power of the proposed design is 32x smaller and the total chip power of the proposed design is 3x smaller than those of the baseline design.
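The contrast between merging partial sums and splitting the input can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's implementation: the array size `ROWS`, the helper names, and in particular the per-array sign activation in `input_split_forward` are hypothetical, and the retraining step mentioned in the abstract is not shown.

```python
import numpy as np

ROWS = 128  # hypothetical number of rows available in one crossbar array

def split_rows(n_inputs, rows=ROWS):
    """Partition the input indices into chunks that each fit in one array."""
    return [slice(s, min(s + rows, n_inputs)) for s in range(0, n_inputs, rows)]

def baseline_forward(x, w):
    """Baseline: each array yields a partial weighted sum, merged outside the arrays."""
    partial = [x[s] @ w[s] for s in split_rows(len(x))]
    return np.sum(partial, axis=0)

def input_split_forward(x, w):
    """Assumed variant: each array applies its own low-resolution (sign) activation
    and the outputs are concatenated, so no partial sums are merged."""
    return np.concatenate([np.sign(x[s] @ w[s]) for s in split_rows(len(x))])

rng = np.random.default_rng(0)
x = rng.normal(size=300)        # 300 inputs do not fit in one 128-row array
w = rng.normal(size=(300, 16))  # weights for 16 output neurons

merged = baseline_forward(x, w)        # needs precise partial sums from 3 arrays
split_out = input_split_forward(x, w)  # 3 arrays x 16 outputs, each already activated
```

The baseline reproduces the full matrix product only if every partial sum is read out precisely, which is where the high-resolution ADCs come in. The split variant computes a different function, which is consistent with the abstract's statement that the network has to be retrained.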

Pohang University of Science and Technology (POSTECH)

Input-splitting of large neural networks for power-efficient accelerator with resistive crossbar memory array

Author: Ahn Daehyun
Kim Hyungjun
Kim Jae-Joon
Kim Yulhwa
Publication venue: Institute of Electrical and Electronics Engineers Inc.
Publication date: 24/07/2018

Pohang University of Science and Technology (POSTECH)

Double Viterbi: Weight Encoding for High Compression Ratio and Fast On-Chip Reconstruction for Deep Neural Network

Author: Ahn Daehyun
Kim Jae-Joon
Lee Dong-Kyu
Kim Taesu
Publication venue: International Conference on Learning Representations
Publication date: 08/05/2019

Pohang University of Science and Technology (POSTECH)

OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator

Author: Ahn Daehyun
Choi Jungwook
Kim Jae-Joon
Yoon Hyunsung
Park Junki
Publication venue: Conference on Machine Learning and Systems (MLSys)
Publication date: 03/03/2020

Pohang University of Science and Technology (POSTECH)