31 research outputs found
Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders
Knowledge distillation (KD) has been a ubiquitous method for model
compression to strengthen the capability of a lightweight model with the
transferred knowledge from the teacher. In particular, KD has been employed in
quantization-aware training (QAT) of Transformer encoders like BERT to improve
the accuracy of the student model with the reduced-precision weight parameters.
However, little is understood about which of the various KD approaches best
fits the QAT of Transformers. In this work, we provide an in-depth analysis of
the mechanism of KD on attention recovery of quantized large Transformers. In
particular, we reveal that the previously adopted MSE loss on the attention
score is insufficient for recovering the self-attention information. Therefore,
we propose two KD methods: attention-map and attention-output losses.
Furthermore, we explore the unification of both losses to address
task-dependent preference between attention-map and output losses. The
experimental results on various Transformer encoder models demonstrate that the
proposed KD methods achieve state-of-the-art accuracy for QAT with sub-2-bit
weight quantization. Comment: EMNLP 2022 Main Track Long Paper
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Generative Language Models (GLMs) have shown impressive performance in tasks
such as text generation, understanding, and reasoning. However, the large model
size poses challenges for practical deployment. To solve this problem,
Quantization-Aware Training (QAT) has become increasingly popular. However,
current QAT methods for generative models have resulted in a noticeable loss of
accuracy. To counteract this issue, we propose a novel knowledge distillation
method specifically designed for GLMs. Our method, called token-scaled logit
distillation, prevents overfitting and provides superior learning from the
teacher model and ground truth. This research marks the first evaluation of
ternary weight quantization-aware training of large-scale GLMs with less than
1.0 degradation in perplexity and no loss of accuracy in a reasoning task.
Computer Organization
This open textbook for Computer Organization was developed as a result of a Round 18 Transformation Grant at Columbus State University.
Computer Science Resources and Student Projects at Columbus State University
This set of student-created and curated educational resources for Wireless, IoT and Mobile Security, Theory of Computation, Software Engineering, Computer Architecture, and Digital Media was created under an Affordable Materials Grant.
Toughness behavior and deformation mechanisms in FCC-based Fe45Co30Cr10V10Ni5-xMnx high-entropy alloys: Insights from instrumented Charpy impact tests
This study investigated the toughness properties of FCC-based Fe45Co30Cr10V10Ni5-xMnx high-entropy alloys (HEAs). A ductile-dimpled rupture mode in all alloys and at all test temperatures complicated a precise explanation of the Charpy impact energy, but the instrumented Charpy test provided a breakdown of total energy (ET) into initiation energy (EI) and propagation energy (EP), offering critical insights into Mn-content- and temperature-related energy variations. Pmax and TP, reflecting maximum flow stress and time, respectively, until initiation of crack propagation from the notch tip, involved a transition of deformation mechanisms from slip to TWIP, then to BCC-TRIP, thereby inducing increased strain hardening and delaying plastic instability at the notch tip. At 25 °C, an increase in Mn content prompted a transition in deformation behavior from slip to TWIP and subsequently to BCC-TRIP, correlating well with increased Pmax and TP and consequently ET due to enhanced strain hardenability. At −196 °C, predominant BCC-TRIP activity in all alloys led to increased formation of BCC martensite, intensifying strain hardening and requiring higher Pmax for crack initiation than at 25 °C. Due to minimal slip line field formation associated with reduced plastic deformation, however, cracks initiated directly from the notch-tip center, representing fast initiation of crack propagation and consequently a reduction in EI and ET. Thus, utilizing parameters from instrumented Charpy tests, including Pmax, TP, EI, EP, and ET, provided insights into fracture phenomena and their interrelations in the present HEAs.
EGR1 Regulation of Vasculogenic Mimicry in the MDA-MB-231 Triple-Negative Breast Cancer Cell Line through the Upregulation of KLF4 Expression
Vasculogenic mimicry (VM) is an intriguing phenomenon observed in tumor masses, in which cancer cells organize themselves into capillary-like channels that closely resemble the structure and function of blood vessels. Although VM is believed to contribute to alternative tumor vascularization, the detailed regulatory mechanisms controlling these cellular processes remain poorly understood. Our study aimed to investigate the role of Early Growth Response 1 (EGR1) in regulating VM in aggressive cancer cells, specifically MDA-MB-231 triple-negative breast cancer cells. Our study revealed that EGR1 promotes the formation of capillary-like tubes by MDA-MB-231 cells in a 3-dimensional Matrigel matrix. EGR1 was observed to upregulate Kruppel-like factor 4 (KLF4) expression, which regulates the formation of the capillary-like tube structure. Additionally, our findings highlight the involvement of the ERK1/2 and p38 mitogen-activated protein kinase pathways in mediating the expression of EGR1 and KLF4, underscoring their crucial role in VM in MDA-MB-231 cells. Understanding these regulatory mechanisms will provide valuable insights into potential therapeutic targets for preventing VM during the treatment of triple-negative breast cancer.
Programming Languages Adoptions (CSU)
This adoption for the Programming Languages, Formal Syntax and Semantics of Programming Languages, and Basics of Compiler Design courses at Columbus State University is the result of a Round 18 Transformation Grant.