31 research outputs found

    Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders

    Full text link
    Knowledge distillation (KD) has been a ubiquitous method for model compression, strengthening a lightweight model with knowledge transferred from a teacher. In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders such as BERT to improve the accuracy of the student model with reduced-precision weight parameters. However, little is understood about which of the various KD approaches best fits the QAT of Transformers. In this work, we provide an in-depth analysis of the mechanism of KD on attention recovery of quantized large Transformers. In particular, we reveal that the previously adopted MSE loss on the attention score is insufficient for recovering the self-attention information. Therefore, we propose two KD methods: attention-map and attention-output losses. Furthermore, we explore the unification of both losses to address the task-dependent preference between attention-map and attention-output losses. The experimental results on various Transformer encoder models demonstrate that the proposed KD methods achieve state-of-the-art accuracy for QAT with sub-2-bit weight quantization. Comment: EMNLP 2022 Main Track Long Paper
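    The abstract names the two losses but not their formulas. The sketch below is a minimal illustration under stated assumptions: the attention-map loss is taken to be a KL divergence between teacher and student attention probabilities (the softmax of the scores), the attention-output loss a mean squared error on the attention outputs, and `unified_loss` with its `alpha` weight is a hypothetical stand-in for the paper's unification, not its actual formulation. Plain Python lists stand in for tensors.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def mse(a, b):
    """Mean squared error between two equally shaped 2-D lists."""
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def attention_map_loss(teacher_scores, student_scores):
    """Assumed form: KL(teacher || student) between attention maps, averaged over query rows."""
    total = 0.0
    for t_row, s_row in zip(teacher_scores, student_scores):
        p = softmax(t_row)
        q = softmax(s_row)
        total += sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
    return total / len(teacher_scores)


def attention_output_loss(teacher_out, student_out):
    """Assumed form: MSE between attention outputs (attention map applied to values)."""
    return mse(teacher_out, student_out)


def unified_loss(t_scores, s_scores, t_out, s_out, alpha=0.5):
    """Hypothetical weighted mix of both losses; alpha is an assumed knob."""
    return (alpha * attention_map_loss(t_scores, s_scores)
            + (1 - alpha) * attention_output_loss(t_out, s_out))


# A student whose scores equal the teacher's plus a constant per row:
teacher = [[1.0, 2.0, 3.0]]
student = [[11.0, 12.0, 13.0]]
print(mse(teacher, student), attention_map_loss(teacher, student))  # 100.0 0.0
```

    Because softmax is unchanged when a constant is added to a row of scores, a score-level MSE can be large while the attention maps are identical, as the last line shows. This illustrates one way score-level and map-level losses can disagree; it is not necessarily the paper's own argument.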

    Token-Scaled Logit Distillation for Ternary Weight Generative Language Models

    Full text link
    Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning. However, their large model size poses challenges for practical deployment. To address this problem, Quantization-Aware Training (QAT) has become increasingly popular. However, current QAT methods for generative models have resulted in a noticeable loss of accuracy. To counteract this issue, we propose a novel knowledge distillation method specifically designed for GLMs. Our method, called token-scaled logit distillation, prevents overfitting and provides superior learning from both the teacher model and the ground truth. This research marks the first evaluation of ternary weight quantization-aware training of large-scale GLMs with less than 1.0 degradation in perplexity and no loss of accuracy in a reasoning task.
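    The abstract does not state how weights are ternarized or how tokens are scaled, so the following is illustrative only. `ternarize` uses the threshold rule from Ternary Weight Networks (threshold 0.7 times the mean absolute weight), a widely used scheme swapped in here as an example. `token_weighted_logit_kd` computes a per-token KL divergence between teacher and student next-token distributions and averages it with a caller-supplied `token_scales` list; that list is a hypothetical placeholder for the paper's token-scaling rule, which is not reproduced.

```python
import math


def ternarize(weights):
    """TWN-style ternarization (swapped-in example): map each weight to -alpha, 0 or +alpha."""
    delta = 0.7 * sum(abs(w) for w in weights) / len(weights)
    kept = [abs(w) for w in weights if abs(w) > delta]
    alpha = sum(kept) / len(kept) if kept else 0.0  # scale = mean magnitude of kept weights
    return [math.copysign(alpha, w) if abs(w) > delta else 0.0 for w in weights]


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def token_weighted_logit_kd(teacher_logits, student_logits, token_scales):
    """Weighted average over tokens of KL(teacher || student).

    token_scales is a hypothetical per-token weight; the paper's rule
    for computing it is not reproduced here.
    """
    weighted, norm = 0.0, 0.0
    for t, s, w in zip(teacher_logits, student_logits, token_scales):
        lp, lq = log_softmax(t), log_softmax(s)
        kl = sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))
        weighted += w * kl
        norm += w
    return weighted / norm if norm > 0 else 0.0


print(ternarize([1.0, -1.0, 0.1, -0.1]))  # [1.0, -1.0, 0.0, 0.0]
```

    In QAT, ternarized weights like these would be used in the forward pass while full-precision shadow weights receive the gradients (for example through a straight-through estimator); that training loop is omitted from the sketch.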

    Computer Organization

    No full text
    This open textbook for Computer Organization was developed as a result of a Round 18 Transformation Grant at Columbus State University.

    Computer Science Resources and Student Projects at Columbus State University

    No full text
    This set of student-created and curated educational resources for Wireless, IoT and Mobile Security, Theory of Computation, Software Engineering, Computer Architecture, and Digital Media was created under an Affordable Materials Grant.

    Toughness behavior and deformation mechanisms in FCC-based Fe45Co30Cr10V10Ni5-xMnx high-entropy alloys: Insights from instrumented Charpy impact tests

    No full text
    This study investigated the toughness properties of FCC-based Fe45Co30Cr10V10Ni5-xMnx high-entropy alloys (HEAs). The ductile dimpled rupture mode observed in all alloys at all test temperatures complicated precise explanation of the Charpy impact energy, but the instrumented Charpy test provided a breakdown of the total energy (ET) into initiation energy (EI) and propagation energy (EP), offering critical insights into Mn-content- and temperature-related energy variations. Pmax and TP, which reflect the maximum flow stress and the time, respectively, until crack propagation initiates from the notch tip, involved a transition of deformation mechanisms from slip to TWIP and then to BCC-TRIP, thereby increasing strain hardening and delaying plastic instability at the notch tip. At 25 °C, an increase in Mn content prompted a transition in deformation behavior from slip to TWIP and subsequently to BCC-TRIP, correlating well with increased Pmax and TP, and consequently ET, due to enhanced strain hardenability. At −196 °C, predominant BCC-TRIP activity in all alloys led to increased formation of BCC martensite, intensifying strain hardening and requiring a higher Pmax for crack initiation than at 25 °C. However, because minimal slip line field formation accompanied the reduced plastic deformation, cracks initiated directly from the notch-tip center, indicating fast initiation of crack propagation and consequently a reduction in EI and ET. Thus, the parameters from instrumented Charpy tests, including Pmax, TP, EI, EP, and ET, provided insights into fracture phenomena and their interrelations in the present HEAs.
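    The breakdown of ET into EI and EP can be sketched from a recorded load-displacement-time trace. A minimal sketch, assuming the common convention that EI is the energy absorbed up to maximum load (Pmax), EP is the remainder, and TP is the time at Pmax; the study may define the split point differently. Energies are trapezoid-rule areas, so with load in kN and displacement in mm the results are in joules. The function and argument names are hypothetical.

```python
def trapezoid(x, y):
    """Area under y(x) by the trapezoidal rule."""
    return sum((x[k + 1] - x[k]) * (y[k] + y[k + 1]) / 2 for k in range(len(x) - 1))


def charpy_energies(time_ms, displacement_mm, load_kN):
    """Split the total absorbed energy at maximum load (assumed convention).

    Returns P_max (kN), T_P (ms, time at P_max), E_I and E_P (J), and
    E_T = E_I + E_P. kN x mm = J, so no unit conversion is needed.
    """
    if not (len(time_ms) == len(displacement_mm) == len(load_kN)) or len(load_kN) < 2:
        raise ValueError("need at least two samples in equal-length series")
    i_max = max(range(len(load_kN)), key=lambda i: load_kN[i])  # first index of P_max
    e_i = trapezoid(displacement_mm[: i_max + 1], load_kN[: i_max + 1])
    e_p = trapezoid(displacement_mm[i_max:], load_kN[i_max:])
    return {"P_max": load_kN[i_max], "T_P": time_ms[i_max],
            "E_I": e_i, "E_P": e_p, "E_T": e_i + e_p}


# Toy triangular trace (not data from the study):
result = charpy_energies([0.0, 0.1, 0.2, 0.3, 0.4],
                         [0.0, 1.0, 2.0, 3.0, 4.0],
                         [0.0, 2.0, 4.0, 2.0, 0.0])
print(result)  # {'P_max': 4.0, 'T_P': 0.2, 'E_I': 4.0, 'E_P': 4.0, 'E_T': 8.0}
```

    Under this convention, a later and higher Pmax (as with enhanced strain hardening) shifts more of the area into EI, while a crack that initiates early at the notch-tip center shrinks EI, which matches the qualitative trends the abstract describes.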

    EGR1 Regulation of Vasculogenic Mimicry in the MDA-MB-231 Triple-Negative Breast Cancer Cell Line through the Upregulation of KLF4 Expression

    No full text
    Vasculogenic mimicry (VM) is an intriguing phenomenon observed in tumor masses, in which cancer cells organize themselves into capillary-like channels that closely resemble the structure and function of blood vessels. Although VM is believed to contribute to alternative tumor vascularization, the detailed regulatory mechanisms controlling these cellular processes remain poorly understood. Our study aimed to investigate the role of Early Growth Response 1 (EGR1) in regulating VM in aggressive cancer cells, specifically MDA-MB-231 triple-negative breast cancer cells. We found that EGR1 promotes the formation of capillary-like tubes by MDA-MB-231 cells in a 3-dimensional Matrigel matrix. EGR1 was observed to upregulate Kruppel-like factor 4 (KLF4) expression, which regulates the formation of the capillary-like tube structure. Additionally, our findings highlight the involvement of the ERK1/2 and p38 mitogen-activated protein kinase pathways in mediating the expression of EGR1 and KLF4, underscoring their crucial role in VM in MDA-MB-231 cells. Understanding these regulatory mechanisms will provide valuable insights into potential therapeutic targets for preventing VM during the treatment of triple-negative breast cancer.

    Programming Languages Adoptions (CSU)

    No full text
    This adoption for the Programming Languages, Formal Syntax and Semantics of Programming Languages, and Basics of Compiler Design courses at Columbus State University is the result of a Round 18 Transformation Grant.