62 research outputs found
How to Fine-Tune BERT for Text Classification?
Language model pre-training has proven to be useful in learning universal
language representations. As a state-of-the-art language model pre-training
model, BERT (Bidirectional Encoder Representations from Transformers) has
achieved amazing results in many language understanding tasks. In this paper,
we conduct exhaustive experiments to investigate different fine-tuning methods
of BERT on the text classification task and provide a general solution for BERT
fine-tuning. Finally, the proposed solution obtains new state-of-the-art
results on eight widely-studied text classification datasets.
SDCL: Self-Distillation Contrastive Learning for Chinese Spell Checking
Due to the ambiguity of homophones, Chinese Spell Checking (CSC) has
widespread applications. Existing systems typically utilize BERT for text
encoding. However, CSC requires the model to account for both phonetic and
graphemic information. To adapt BERT to the CSC task, we propose a token-level
self-distillation contrastive learning method. We employ BERT to encode both
the corrupted sentence and its corresponding correct sentence. Then, we use a
contrastive learning loss to regularize corrupted tokens' hidden states to be
closer to their counterparts in the correct sentence. On three CSC datasets, we
confirm that our method provides a significant improvement over baselines.
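The token-level objective described in this abstract can be illustrated with a minimal sketch. The abstract only says that a contrastive loss pulls corrupted tokens' hidden states toward their counterparts in the correct sentence. The InfoNCE form, the cosine similarity, the temperature value, and the use of the other correct-sentence tokens as negatives are all assumptions made here for illustration. The function names are hypothetical, and the hidden states are plain Python lists standing in for BERT outputs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def token_contrastive_loss(corrupted, correct, corrupted_positions, temperature=0.1):
    """Mean InfoNCE-style loss over the corrupted token positions.

    corrupted, correct: per-token hidden-state vectors of equal length
    (stand-ins for encoder outputs of the two sentences).
    corrupted_positions: indices of tokens that differ between the sentences.
    Assumption: the positive is the same-position token of the correct
    sentence, and the negatives are its other tokens.
    """
    if len(corrupted) != len(correct):
        raise ValueError("sentences must have the same number of tokens")
    if not corrupted_positions:
        return 0.0
    total = 0.0
    for i in corrupted_positions:
        logits = [cosine(corrupted[i], h) / temperature for h in correct]
        # Numerically stable log-sum-exp over all candidate tokens.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(corrupted_positions)


# Toy 2-d hidden states; position 1 is the corrupted token.
correct_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
close_states = [[1.0, 0.0], [0.1, 0.9], [-1.0, 0.0]]
far_states = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]

loss_close = token_contrastive_loss(close_states, correct_states, [1])
loss_far = token_contrastive_loss(far_states, correct_states, [1])
print(loss_close < loss_far)
```

In this toy setup the loss is lower when the corrupted token's hidden state already lies near its counterpart, which is the behavior the regularizer is meant to encourage.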
PVC-LOT-008-J-038
Laser-based powder-bed fusion additive manufacturing or three-dimensional printing technology has gained tremendous attention due to its controllable, digital, and automated manufacturing process, which can afford a refined microstructure and superior strength. However, it is a major challenge to additively manufacture metal parts with satisfactory ductility and toughness. Here we report a novel selective laser melting process to simultaneously enhance the strength and ductility of stainless steel 316L by in-process engineering its microstructure into a crystallographic texture. We find that the tensile strength and ductility of SLM-built stainless steel 316L samples could be enhanced by ~16% and ~40% respectively, with the engineered textured microstructure compared to the common textured microstructure. This is because the favorable nano-twinning mechanism was significantly more activated in the textured stainless steel 316L samples during plastic deformation. In addition, kinetic simulations were performed to unveil the relationship between the melt pool geometry and crystallographic texture. The new additive manufacturing strategy of engineering the crystallographic texture can be applied to other metals and alloys with twinning-induced plasticity. This work paves the way to additively manufacture metal parts with high strength and high ductility. NRF (Natl Research Foundation, S’pore). Published versio
Do Large Language Models Know What They Don't Know?
Large language models (LLMs) have a wealth of knowledge that allows them to
excel in various Natural Language Processing (NLP) tasks. Current research
focuses on enhancing their performance within their existing knowledge. Despite
their vast knowledge, LLMs are still limited by the amount of information they
can accommodate and comprehend. Therefore, the ability to understand their own
limitations on the unknowns, referred to as self-knowledge, is of paramount
importance. This study aims to evaluate LLMs' self-knowledge by assessing their
ability to identify unanswerable or unknowable questions. We introduce an
automated methodology to detect uncertainty in the responses of these models,
providing a novel measure of their self-knowledge. We further introduce a
unique dataset, SelfAware, consisting of unanswerable questions from five
diverse categories and their answerable counterparts. Our extensive analysis,
involving 20 LLMs including GPT-3, InstructGPT, and LLaMA, discovers an
intrinsic capacity for self-knowledge within these models. Moreover, we
demonstrate that in-context learning and instruction tuning can further enhance
this self-knowledge. Despite this promising insight, our findings also
highlight a considerable gap between the capabilities of these models and human
proficiency in recognizing the limits of their knowledge.
Comment: 10 pages, 9 figures, accepted by Findings of ACL202