ELECTRA is a Zero-Shot Learner, Too
Recently, for few-shot and even zero-shot learning, the new "pre-train,
prompt, and predict" paradigm has achieved remarkable results compared with
the "pre-train, fine-tune" paradigm. Following the success of prompt-based
GPT-3, a series of prompt learning methods based on masked language models
(MLMs) such as BERT and RoBERTa became popular and widely used. However,
another efficient pre-trained discriminative model, ELECTRA, has been largely
neglected. In this paper, we attempt to accomplish several NLP tasks in the
zero-shot scenario using our proposed replaced token detection (RTD)-based
prompt learning method. Experimental results show that an ELECTRA model with
RTD-based prompt learning achieves surprisingly strong, state-of-the-art
zero-shot performance. Numerically, compared to MLM-RoBERTa-large and
MLM-BERT-large, our RTD-ELECTRA-large achieves an average improvement of
about 8.4% and 13.7%, respectively, across all 15 tasks. On the SST-2 task in
particular, our RTD-ELECTRA-large achieves an astonishing 90.1% accuracy
without any training data. Overall, compared to pre-trained masked language
models, the pre-trained replaced token detection model performs better in
zero-shot learning. The source code is available at:
https://github.com/nishiwen1214/RTD-ELECTRA.
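Conceptually, RTD-based prompting reuses ELECTRA's discriminator: each
candidate label word is inserted into a prompt template, and the candidate
the discriminator judges least likely to be a "replaced" token is predicted.
A minimal sketch follows, assuming a Hugging Face ELECTRA checkpoint; the
template and label words are illustrative, not necessarily the paper's exact
prompts.

```python
# Minimal sketch of RTD-style zero-shot prompting with ELECTRA's discriminator.
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

model_name = "google/electra-large-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(model_name)
model = ElectraForPreTraining.from_pretrained(model_name).eval()

def rtd_zero_shot(sentence: str, label_words: dict) -> str:
    """Score each candidate label word by how 'original' (not replaced)
    the discriminator thinks it looks inside the prompt template."""
    scores = {}
    for label, word in label_words.items():
        prompt = f"{sentence} It was {word}."  # illustrative template
        enc = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits[0]  # >0 means "replaced" per token
        # Locate the candidate word's token position(s) inside the prompt.
        word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
        ids = enc["input_ids"][0].tolist()
        for i in range(len(ids) - len(word_ids) + 1):
            if ids[i:i + len(word_ids)] == word_ids:
                # Lower "replaced" logit => the word fits the context better.
                scores[label] = -logits[i:i + len(word_ids)].mean().item()
                break
    return max(scores, key=scores.get)

print(rtd_zero_shot("A deeply moving and beautifully acted film.",
                    {"positive": "great", "negative": "terrible"}))
```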
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models
Recently, Large Language Models (LLMs) have demonstrated remarkable text
understanding and generation capabilities. However, even strong LLMs may
still learn incorrect knowledge from the training corpus, as well as
knowledge that becomes outdated over time. Directly fine-tuning again on data
containing new knowledge may be ineffective at updating knowledge because of
the conflict between old and new knowledge. In this paper, we propose a new
fine-tuning paradigm called F-Learning (Forgetting before Learning), which
uses parametric arithmetic to first forget old knowledge and then learn new
knowledge. Experimental results on two publicly available datasets
demonstrate that our proposed F-Learning noticeably improves the
knowledge-updating performance of both full fine-tuning and LoRA fine-tuning.
Moreover, we also find that forgetting old knowledge by subtracting the LoRA
parameters achieves an effect similar to subtracting the parameters of full
fine-tuning, and sometimes even surpasses it significantly.
COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning
Recently, there have been significant advancements in large language models
(LLMs), particularly focused on the English language. These advancements have
enabled LLMs to understand and execute complex instructions with
unprecedented accuracy and fluency. However, despite this progress, there
remains a noticeable gap in the development of Chinese instruction tuning.
The unique linguistic features and cultural depth of the Chinese language
pose challenges for instruction tuning tasks. Existing datasets are either
derived from English-centric LLMs or ill-suited to the interaction patterns
of real-world Chinese users. To bridge this gap, we introduce COIG-CQIA, a
high-quality Chinese instruction tuning dataset. Our aim is to build a
diverse, wide-ranging instruction-tuning dataset that better aligns model
behavior with human interactions. To this end, we collect a high-quality
human-written corpus from various sources on the Chinese Internet, including
Q&A communities, wikis, examinations, and existing NLP datasets. This corpus
was rigorously filtered and carefully processed to form the COIG-CQIA
dataset. Furthermore, we train models of various scales on different subsets
of CQIA, followed by in-depth evaluation and analyses. The findings from our
experiments offer valuable insights for selecting and developing Chinese
instruction-tuning datasets. We also find that models trained on CQIA-Subset
achieve competitive results in human assessment as well as on knowledge and
security benchmarks. Data
are available at https://huggingface.co/datasets/m-a-p/COIG-CQIA.
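For readers who want to inspect the data, a minimal sketch of loading it with
the `datasets` library follows, assuming the Hub ID m-a-p/COIG-CQIA; the
config name shown is illustrative, since the dataset is organized into
several source-specific subsets.

```python
# Sketch: load one subset of COIG-CQIA from the Hugging Face Hub.
from datasets import load_dataset

# "zhihu" is an assumed subset name standing in for any of the source-specific
# configs; check the dataset card for the actual list.
cqia = load_dataset("m-a-p/COIG-CQIA", name="zhihu", split="train")
print(cqia[0])  # one instruction-tuning record
```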
HAT4RD: Hierarchical Adversarial Training for Rumor Detection in Social Media
With the development of social media, social communication has changed. While this facilitates people's communication and access to information, it also provides an ideal platform for spreading rumors. In both normal and critical situations, rumors can affect people's judgment and even endanger social security. However, natural language is high-dimensional and sparse, and the same rumor may be expressed in hundreds of ways on social media, which calls the robustness and generalization of current rumor detection models into question. We propose a novel hierarchical adversarial training method for rumor detection (HAT4RD) on social media. Specifically, HAT4RD uses gradient ascent to add adversarial perturbations to the embedding layers of the post-level and event-level modules to deceive the detector, while the detector uses stochastic gradient descent to minimize the adversarial risk and learn a more robust model. In this way, the post-level and event-level sample spaces are enhanced, and we verify the robustness of our model under a variety of adversarial attacks. Moreover, visualization experiments indicate that the proposed model drifts into an area with a flat loss landscape, thereby leading to better generalization. We evaluate the proposed method on three public rumor datasets from two commonly used social platforms (Twitter and Weibo). Our experimental results demonstrate that our model achieves better results than state-of-the-art methods.
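The core loop is embedding-level adversarial training: craft a perturbation
by gradient ascent on the embedding weights, take a second forward pass on
the perturbed embeddings, and let SGD minimize the combined risk. A minimal
single-level PyTorch sketch follows; `model`, `embed_layer`, `loss_fn`, and
`eps` are placeholders, and the paper applies the idea hierarchically to both
the post-level and event-level modules rather than to one layer as shown.

```python
# Sketch of one embedding-level adversarial training step (FGM-style).
import torch

def adversarial_step(model, embed_layer, inputs, labels, loss_fn, optimizer,
                     eps=1.0):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()                               # gradients w.r.t. embeddings

    weight = embed_layer.weight
    grad = weight.grad.detach()
    delta = eps * grad / (grad.norm() + 1e-12)    # gradient-ascent perturbation
    weight.data.add_(delta)                       # perturb the embedding table
    adv_loss = loss_fn(model(inputs), labels)     # adversarial forward pass
    adv_loss.backward()                           # accumulate adversarial grads
    weight.data.sub_(delta)                       # restore original embeddings

    optimizer.step()                              # SGD minimizes adversarial risk
    return loss.item(), adv_loss.item()
```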
An Underutilized Food “Miwu”: Diet History, Nutritional Evaluations, and Countermeasures for Industrial Development
About 10 major crops essentially feed the world, yet a large number of plants have not been fully explored and utilized because they have been ignored by the market and by research. Expanding food sources in various countries plays an important role in maintaining global food security and nutrition security. Miwu, the aerial part of the medicinal plant Rhizoma Chuanxiong, is a traditional local specialty food raw material whose edible value is still little known. Through textual research, component determination, a literature survey, field research, and SWOT analysis, this paper develops a comprehensive understanding of Miwu's dietary history, chemical components, safety risks, and industrial development status. We find that Miwu has been eaten for 800 years, is rich in nutrients and active ingredients, and shows no acute toxicity. In addition, the current industrial development of Miwu has significant advantages but also faces many challenges. In sum, Miwu is a potentially underutilized food raw material. This paper also provides countermeasures for the industrialized development of Miwu, offering a reference for its future utilization and development.