Understanding and Improving Knowledge Distillation for Quantization-Aware Training of Large Transformer Encoders
Knowledge distillation (KD) has been a ubiquitous method for model
compression to strengthen the capability of a lightweight model with the
transferred knowledge from the teacher. In particular, KD has been employed in
quantization-aware training (QAT) of Transformer encoders like BERT to improve
the accuracy of the student model with the reduced-precision weight parameters.
However, little is understood about which of the various KD approaches best
fits the QAT of Transformers. In this work, we provide an in-depth analysis of
the mechanism of KD on attention recovery of quantized large Transformers. In
particular, we reveal that the previously adopted MSE loss on the attention
score is insufficient for recovering the self-attention information. Therefore,
we propose two KD methods: attention-map and attention-output losses.
Furthermore, we explore the unification of both losses to address
task-dependent preference between attention-map and output losses. The
experimental results on various Transformer encoder models demonstrate that the
proposed KD methods achieve state-of-the-art accuracy for QAT with sub-2-bit
weight quantization. Comment: EMNLP 2022 Main Track Long Pape
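As a rough illustration of why a loss on raw attention scores and a loss on the attention map can disagree, the sketch below compares the two on a single attention row. It assumes the attention map is the softmax of the scores and uses a KL divergence as the map loss; both choices, along with the function names and the toy numbers, are assumptions for illustration and are not taken from the paper.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def score_mse(teacher, student):
    """Mean squared error on raw (pre-softmax) attention scores."""
    return sum((t - s) ** 2 for t, s in zip(teacher, student)) / len(teacher)

def attention_map_kl(teacher, student):
    """KL(teacher map || student map) on softmax-normalized scores (assumed form)."""
    p = softmax(teacher)
    q = softmax(student)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical scores: the student row is the teacher row plus a constant.
teacher_scores = [2.0, 0.5, -1.0, 0.0]
student_scores = [v + 3.0 for v in teacher_scores]

mse = score_mse(teacher_scores, student_scores)        # penalizes the shift
kl = attention_map_kl(teacher_scores, student_scores)  # ignores the shift
print(mse, kl)
```

A constant shift leaves the softmax output unchanged, so the score MSE is large while the map loss is zero. This only shows that the two objectives measure different things; it does not reproduce the paper's analysis.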
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Generative Language Models (GLMs) have shown impressive performance in tasks
such as text generation, understanding, and reasoning. However, the large model
size poses challenges for practical deployment. To solve this problem,
Quantization-Aware Training (QAT) has become increasingly popular. However,
current QAT methods for generative models have resulted in a noticeable loss of
accuracy. To counteract this issue, we propose a novel knowledge distillation
method specifically designed for GLMs. Our method, called token-scaled logit
distillation, prevents overfitting and provides superior learning from the
teacher model and ground truth. This research marks the first evaluation of
ternary weight quantization-aware training of large-scale GLMs with less than
1.0 degradation in perplexity and no loss of accuracy in a reasoning task.
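For readers unfamiliar with ternary weights, the following is a minimal sketch of one generic ternarization rule: a magnitude threshold with a single scale, in the style of Ternary Weight Networks. It is not the quantizer or the distillation method of the paper above; the 0.7 threshold ratio, the function name, and the sample weights are assumptions.

```python
def ternarize(weights, threshold_ratio=0.7):
    """Map each weight to {-1, 0, +1} and return one shared scale.

    Weights whose magnitude is at most delta = threshold_ratio * mean(|w|)
    become 0; the scale is the mean magnitude of the weights that are kept.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_ratio * mean_abs
    kept = [abs(w) for w in weights if abs(w) > delta]
    scale = sum(kept) / len(kept) if kept else 0.0
    codes = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    return codes, scale

# Hypothetical full-precision weights.
codes, scale = ternarize([0.9, -0.05, 0.4, -0.8, 0.1])
dequantized = [scale * c for c in codes]
print(codes, scale, dequantized)
```

In quantization-aware training a rule like this would be applied in the forward pass while gradients update the full-precision weights; that training loop is omitted here.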
PSYDIAL: Personality-based Synthetic Dialogue Generation using Large Language Models
We present a novel end-to-end personality-based synthetic dialogue data
generation pipeline, specifically designed to elicit responses from large
language models via prompting. We design the prompts to generate more
human-like dialogues considering real-world scenarios when users engage with
chatbots. We introduce PSYDIAL, the first Korean dialogue dataset focused on
personality-based dialogues, curated using our proposed pipeline. Notably, we
focus on the Extraversion dimension of the Big Five personality model in our
research. Experimental results indicate that while pre-trained models and those
fine-tuned with a chit-chat dataset struggle to generate responses reflecting
personality, models trained with PSYDIAL show significant improvements. The
versatility of our pipeline extends beyond dialogue tasks, offering potential
for other non-dialogue related applications. This research opens doors for more
nuanced, personality-driven conversational AI in Korean and potentially other
languages. Our code is publicly available at
https://github.com/jiSilverH/psydial. Comment: LREC-COLING 2024 Mai
Lipase-catalyzed dimethyl adipate synthesis: response surface modeling and kinetics
Dimethyl adipate (DMA) was synthesized by immobilized Candida antarctica lipase B-catalyzed esterification of adipic acid and methanol. To optimize the reaction conditions for ester production, response surface methodology was applied, and the effects of four factors, namely time, temperature, enzyme concentration, and molar ratio of substrates, on product synthesis were determined. A statistical model predicted that the maximum conversion yield would be 97.6% at the optimal conditions of 58.5°C, 54.0 mg enzyme, 358.0 min, and a 12:1 molar ratio of methanol to adipic acid. The R² (0.9769) shows a high correlation between predicted and experimental values. The kinetics of the reaction were also investigated in this study. The reaction was found to obey the ping-pong bi-bi mechanism with methanol inhibition. The kinetic parameters were determined and used to simulate the experimental results. A good quality of fit was observed between the simulated and experimental initial rates.
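The ping-pong bi-bi mechanism with methanol inhibition mentioned above can be sketched with a commonly used textbook rate law, v = Vmax·[A]·[B] / (Km_A·[B]·(1 + [B]/Ki_B) + Km_B·[A] + [A]·[B]), where A is adipic acid and B is methanol. The abstract does not give the paper's exact equation or fitted parameters, so the form below and every numeric value are assumptions for illustration only.

```python
def initial_rate(acid, methanol, vmax, km_acid, km_methanol, ki_methanol):
    """Initial rate for a ping-pong bi-bi mechanism with inhibition by methanol.

    Concentrations and Km/Ki values share one unit; vmax sets the rate unit.
    """
    denominator = (
        km_acid * methanol * (1.0 + methanol / ki_methanol)  # inhibition term
        + km_methanol * acid
        + acid * methanol
    )
    return vmax * acid * methanol / denominator

# Hypothetical parameters, chosen only to show the shape of the curve.
params = dict(vmax=1.0, km_acid=0.5, km_methanol=0.5, ki_methanol=1.0)
rate_moderate = initial_rate(acid=1.0, methanol=1.0, **params)
rate_excess = initial_rate(acid=1.0, methanol=10.0, **params)
print(rate_moderate, rate_excess)
```

With these made-up parameters the rate falls as methanol goes from 1 to 10 concentration units, which is the qualitative signature of substrate inhibition.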
Bloodstream Infections and Clinical Significance of Healthcare-associated Bacteremia: A Multicenter Surveillance Study in Korean Hospitals
Recent changes in healthcare systems have changed the epidemiologic paradigms in many infectious fields, including bloodstream infection (BSI). We compared clinical characteristics of community-acquired (CA), hospital-acquired (HA), and healthcare-associated (HCA) BSI. We performed a prospective nationwide multicenter surveillance study at 9 university hospitals in Korea. A total of 1,605 blood isolates were collected from 2006 to 2007, and 1,144 isolates were considered true pathogens. HA-BSI accounted for 48.8%, CA-BSI for 33.2%, and HCA-BSI for 18.0%. Patients with HA-BSI and HCA-BSI were more likely to have severe comorbidities. Escherichia coli was the most common isolate in CA-BSI (47.1%) and HCA-BSI (27.2%). In contrast, Staphylococcus aureus (15.2%) and coagulase-negative Staphylococcus (15.1%) were the most common isolates in HA-BSI. The rate of appropriate empiric antimicrobial therapy was highest in CA-BSI (89.0%), followed by HCA-BSI (76.4%) and HA-BSI (75.0%). The 30-day mortality rate was highest in HA-BSI (23.0%), followed by HCA-BSI (18.4%) and CA-BSI (10.2%). A high Pitt score and inappropriate empirical antibiotic therapy were independent risk factors for mortality in multivariate analysis. In conclusion, the present data suggest that clinical features, outcomes, and microbiologic features of causative pathogens vary by origin of BSI. In particular, HCA-BSI shows unique clinical characteristics and should be considered a distinct category for more appropriate antibiotic treatment.
The Fourteenth Data Release of the Sloan Digital Sky Survey: First Spectroscopic Data from the extended Baryon Oscillation Spectroscopic Survey and from the second phase of the Apache Point Observatory Galactic Evolution Experiment
The fourth generation of the Sloan Digital Sky Survey (SDSS-IV) has been in
operation since July 2014. This paper describes the second data release from
this phase, and the fourteenth from SDSS overall (making this Data Release
Fourteen, or DR14). This release makes public data taken by SDSS-IV in its first
two years of operation (July 2014-2016). Like all previous SDSS releases, DR14
is cumulative, including the most recent reductions and calibrations of all
data taken by SDSS since the first phase began operations in 2000. New in DR14
is the first public release of data from the extended Baryon Oscillation
Spectroscopic Survey (eBOSS); the first data from the second phase of the
Apache Point Observatory (APO) Galactic Evolution Experiment (APOGEE-2),
including stellar parameter estimates from an innovative data driven machine
learning algorithm known as "The Cannon"; and almost twice as many data cubes
from the Mapping Nearby Galaxies at APO (MaNGA) survey as were in the previous
release (N = 2812 in total). This paper describes the location and format of
the publicly available data from SDSS-IV surveys. We provide references to the
important technical papers describing how these data have been taken (both
targeting and observation details) and processed for scientific use. The SDSS
website (www.sdss.org) has been updated for this release, and provides links to
data downloads, as well as tutorials and examples of data use. SDSS-IV is
planning to continue to collect astronomical data until 2020, and will be
followed by SDSS-V. Comment: SDSS-IV collaboration alphabetical author data release paper. DR14
happened on 31st July 2017. 19 pages, 5 figures. Accepted by ApJS on 28th Nov
2017 (this is the "post-print" and "post-proofs" version; minor corrections
only from v1, and most of errors found in proofs corrected
World-Wide FINGERS Network: A global approach to risk reduction and prevention of dementia
© 2020 The Authors. Alzheimer's & Dementia published by Wiley Periodicals, Inc. on behalf of Alzheimer's Association. Reducing the risk of dementia can halt the worldwide increase of affected people. The multifactorial and heterogeneous nature of late-onset dementia, including Alzheimer's disease (AD), indicates a potential impact of multidomain lifestyle interventions on risk reduction. The positive results of the landmark multidomain Finnish Geriatric Intervention Study to Prevent Cognitive Impairment and Disability (FINGER) support such an approach. The World-Wide FINGERS (WW-FINGERS), launched in 2017 and including over 25 countries, is the first global network of multidomain lifestyle intervention trials for dementia risk reduction and prevention. WW-FINGERS aims to adapt, test, and optimize the FINGER model to reduce risk across the spectrum of cognitive decline (from at-risk asymptomatic states to early symptomatic stages) in different geographical, cultural, and economic settings. WW-FINGERS aims to harmonize and adapt multidomain interventions across various countries and settings, to facilitate data sharing and analysis across studies, and to promote international joint initiatives to identify globally implementable and effective preventive strategies.
Robust estimation of bacterial cell count from optical density
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data
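The idea of calibrating OD against a serial dilution of microspheres can be sketched as follows: each dilution step has a known particle count, so the blank-subtracted OD at each step gives an estimate of particles per OD unit, and those estimates are averaged. The function name, the 2-fold dilution factor, the starting particle count, and all readings below are hypothetical; the study's actual protocol and quality-control steps are not reproduced here.

```python
def particles_per_od(initial_particles, od_readings, blank_od, dilution_factor=2.0):
    """Estimate particles per OD unit from a serial dilution of microspheres.

    od_readings[i] is the raw OD after i dilution steps; readings at or
    below the blank are skipped because they carry no usable signal.
    """
    ratios = []
    for step, od in enumerate(od_readings):
        particles = initial_particles / dilution_factor ** step
        net_od = od - blank_od
        if net_od > 0:
            ratios.append(particles / net_od)
    if not ratios:
        raise ValueError("no reading exceeds the blank")
    return sum(ratios) / len(ratios)

# Hypothetical plate-reader data for a 2-fold dilution series.
factor = particles_per_od(
    initial_particles=3.0e8,
    od_readings=[0.64, 0.34, 0.19, 0.115],
    blank_od=0.04,
)

# Convert a hypothetical blank-subtracted culture reading to an estimated cell count.
estimated_cells = factor * 0.25
print(factor, estimated_cells)
```

A real workflow would also check which dilution steps fall inside the instrument's linear range before averaging; this sketch uses every reading above the blank.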