31 research outputs found

Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders

Authors: Chang Du-Seong, Choi Jungwook, Hong Sukjin, Kim Minsoo, Lee Sihwa
Publication date: 20/11/2022

Knowledge distillation (KD) is a ubiquitous model-compression method that strengthens a lightweight student model with knowledge transferred from a teacher. In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders such as BERT to improve the accuracy of a student model with reduced-precision weight parameters. However, little is understood about which of the various KD approaches best fits the QAT of Transformers. In this work, we provide an in-depth analysis of how KD recovers attention in quantized large Transformers. In particular, we reveal that the previously adopted MSE loss on the attention score is insufficient for recovering the self-attention information. We therefore propose two KD methods: attention-map and attention-output losses. Furthermore, we explore unifying both losses to address the task-dependent preference between attention-map and attention-output losses. Experimental results on various Transformer encoder models demonstrate that the proposed KD methods achieve state-of-the-art accuracy for QAT with sub-2-bit weight quantization. Comment: EMNLP 2022 Main Track Long Paper
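The abstract contrasts an MSE loss on raw attention scores with a loss on the attention map. The following is a minimal illustrative sketch, not the authors' implementation: it assumes the scores for one head are given as plain nested lists (one row per query token), and it assumes the attention-map loss is a KL divergence between the teacher's and student's softmax probabilities. The function names `attention_score_mse` and `attention_map_kl` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_score_mse(teacher_scores, student_scores):
    """Hypothetical score-level loss: MSE over all pre-softmax entries."""
    sq = [
        (t - s) ** 2
        for t_row, s_row in zip(teacher_scores, student_scores)
        for t, s in zip(t_row, s_row)
    ]
    return sum(sq) / len(sq)


def attention_map_kl(teacher_scores, student_scores):
    """Hypothetical map-level loss: mean over query rows of
    KL(teacher attention probabilities || student attention probabilities)."""
    total = 0.0
    for t_row, s_row in zip(teacher_scores, student_scores):
        p = softmax(t_row)
        q = softmax(s_row)
        # Terms with p_i == 0 contribute nothing to the KL divergence.
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_scores)


if __name__ == "__main__":
    teacher = [[1.0, 2.0, 3.0]]
    student = [[2.0, 3.0, 4.0]]  # every score shifted by +1.0
    print(attention_score_mse(teacher, student))  # 1.0
    print(attention_map_kl(teacher, student))     # 0.0
```

Because softmax is invariant to adding the same constant to every score in a row, a student whose scores are the teacher's shifted by 1.0 incurs an MSE of 1.0 yet produces exactly the same attention map (a KL of 0.0). This is one simple way to see how a score-level MSE can disagree with what the attention probabilities actually encode.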

arXiv.org e-Print Archive

Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

Authors: Chang Du-Seong, Choi Jungwook, Hong Sukjin, Kim Minsoo, Lee Janghwan, Lee Sihwa, Sung Wonyong
Publication date: 13/08/2023

Generative language models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning. However, their large model size poses challenges for practical deployment. To address this problem, quantization-aware training (QAT) has become increasingly popular, but current QAT methods for generative models incur a noticeable loss of accuracy. To counteract this issue, we propose a novel knowledge distillation method designed specifically for GLMs. Our method, called token-scaled logit distillation, prevents overfitting and enables superior learning from both the teacher model and the ground truth. This research presents the first evaluation of ternary-weight quantization-aware training of large-scale GLMs with less than 1.0 degradation in perplexity and no loss of accuracy on a reasoning task.

arXiv.org e-Print Archive

Computer Organization

Authors: Angelopoulou Anastasia, Ge Linqiang, Hodhod Rania, Koech Japheth, Lee Sukjin
Publication venue: GALILEO Open Learning Materials
Publication date: 01/04/2022

This open textbook for Computer Organization was developed as a result of a Round 18 Transformation Grant at Columbus State University.

GALILEO, University System of Georgia

Computer Science Resources and Student Projects at Columbus State University

Authors: Angelopoulou Anastasia, Ge Linqiang, Hodhod Rania, Lee Sukjin, Zhou Yi
Publication venue: GALILEO Open Learning Materials
Publication date: 01/04/2023

This set of student-created and curated educational resources for Wireless, IoT and Mobile Security, Theory of Computation, Software Engineering, Computer Architecture, and Digital Media was created under an Affordable Materials Grant.

GALILEO, University System of Georgia

Toughness behavior and deformation mechanisms in FCC-based Fe45Co30Cr10V10Ni5-xMnx high-entropy alloys: Insights from instrumented Charpy impact tests

Authors: Dong Geun Kim, Jeongho Han, Sukjin Lee, Yeon Taek Choi, Yong Hee Jo
Publication venue: Elsevier
Publication date: 01/05/2024

This study investigated the toughness of FCC-based Fe45Co30Cr10V10Ni5-xMnx high-entropy alloys (HEAs). Because all alloys showed a ductile dimpled rupture mode at every test temperature, the Charpy impact energy was difficult to explain precisely; the instrumented Charpy test, however, broke the total energy (ET) down into initiation energy (EI) and propagation energy (EP), offering critical insights into Mn-content- and temperature-related energy variations. Pmax and TP, which reflect, respectively, the maximum flow stress and the time until crack propagation initiates from the notch tip, were associated with a transition in deformation mechanism from slip to TWIP and then to BCC-TRIP, which increased strain hardening and delayed plastic instability at the notch tip. At 25 °C, increasing Mn content shifted the deformation behavior from slip to TWIP and subsequently to BCC-TRIP, correlating well with higher Pmax and TP, and consequently higher ET, owing to enhanced strain hardenability. At −196 °C, predominant BCC-TRIP activity in all alloys increased the formation of BCC martensite, intensifying strain hardening and requiring a higher Pmax for crack initiation than at 25 °C. However, because slip-line field formation was minimal owing to reduced plastic deformation, cracks initiated directly from the center of the notch tip, representing fast initiation of crack propagation and consequently a reduction in EI and ET. Thus, parameters from instrumented Charpy tests, including Pmax, TP, EI, EP, and ET, provided insights into fracture phenomena and their interrelations in the present HEAs.
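The abstract splits the total absorbed energy ET into initiation energy EI and propagation energy EP. The following is a minimal sketch of such a split, assuming the common instrumented-Charpy convention that EI is the area under the load-displacement curve up to the peak load Pmax and EP is the remaining area; the paper's exact definition is not stated here. The function `split_charpy_energy` and the toy trace in the usage example are hypothetical and do not represent measured data.

```python
def trapezoid(x, y):
    """Area under a sampled curve using the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i] + y[i + 1]) / 2 for i in range(len(x) - 1))


def split_charpy_energy(displacement, load):
    """Hypothetical helper: split total energy into (E_I, E_P, E_T).

    Assumes E_I is the area up to the peak-load sample (P_max) and E_P is
    the area after it. With displacement in mm and load in kN, the energies
    come out in joules (1 kN * 1 mm = 1 J).
    """
    if len(displacement) != len(load) or len(load) < 2:
        raise ValueError("need matching sequences with at least two samples")
    # Index of the peak load P_max (first occurrence if tied).
    i_max = max(range(len(load)), key=lambda i: load[i])
    e_i = trapezoid(displacement[: i_max + 1], load[: i_max + 1])
    e_p = trapezoid(displacement[i_max:], load[i_max:])
    return e_i, e_p, e_i + e_p


if __name__ == "__main__":
    # Toy trace (not measured data): load rises to a peak, then decays.
    disp_mm = [0.0, 1.0, 2.0, 3.0]
    load_kn = [0.0, 10.0, 5.0, 0.0]
    print(split_charpy_energy(disp_mm, load_kn))  # (5.0, 10.0, 15.0)
```

Splitting at the peak-load sample makes EI + EP equal ET by construction, which is what lets Mn-content or temperature effects be attributed to the initiation or propagation stage separately.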

Directory of Open Access Journals

Changes in Sunlight and Outdoor Thermal Environment Conditions Based on the Layout Plan of Flat Type Apartment Houses

Authors: Athreya, Hwang, Kim, Kim, Lee, Park, Seonghwan Yoon, Sukjin Jung, Taguchi
Publication venue: MDPI AG

Crossref

EGR1 Regulation of Vasculogenic Mimicry in the MDA-MB-231 Triple-Negative Breast Cancer Cell Line through the Upregulation of KLF4 Expression

Authors: Euitaek Jung, Soon Young Shin, Sukjin Ou, Tae Yoon Kim, Young Han Lee
Publication venue: MDPI AG
Publication date: 01/09/2023

Vasculogenic mimicry (VM) is an intriguing phenomenon observed in tumor masses, in which cancer cells organize themselves into capillary-like channels that closely resemble the structure and function of blood vessels. Although VM is believed to contribute to alternative tumor vascularization, the regulatory mechanisms controlling these cellular processes remain poorly understood. This study investigated the role of Early Growth Response 1 (EGR1) in regulating VM in aggressive cancer cells, specifically MDA-MB-231 triple-negative breast cancer cells. We found that EGR1 promotes the formation of capillary-like tubes by MDA-MB-231 cells in a three-dimensional Matrigel matrix. EGR1 was observed to upregulate the expression of Kruppel-like factor 4 (KLF4), which regulates the formation of the capillary-like tube structure. Additionally, our findings highlight the involvement of the ERK1/2 and p38 mitogen-activated protein kinase pathways in mediating the expression of EGR1 and KLF4, underscoring their crucial role in VM in MDA-MB-231 cells. Understanding these regulatory mechanisms will provide valuable insights into potential therapeutic targets for preventing VM during the treatment of triple-negative breast cancer.

Directory of Open Access Journals

Programming Languages Adoptions (CSU)

Authors: Angelopoulou Anastasia, Gao Yujing, Ge Linqiang, Hodhod Rania, Koech Japheth, Lee Sukjin
Publication venue: GALILEO Open Learning Materials
Publication date: 01/04/2022

This adoption for the Programming Languages, Formal Syntax and Semantics of Programming Languages, and Basics of Compiler Design courses at Columbus State University is the result of a Round 18 Transformation Grant.

GALILEO, University System of Georgia