ELECTRA is a Zero-Shot Learner, Too
Recently, for few-shot and even zero-shot learning, the new "pre-train, prompt, and predict" paradigm has achieved remarkable results compared with the "pre-train, fine-tune" paradigm. After the success of prompt-based GPT-3, a series of masked language model (MLM)-based (e.g., BERT, RoBERTa) prompt-learning methods became popular and widely used. However, another efficient pre-trained discriminative model, ELECTRA, has arguably been neglected. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using a novel replaced token detection (RTD)-based prompt learning method. Experimental results show that the ELECTRA model based on RTD-prompt learning achieves surprisingly strong, state-of-the-art zero-shot performance. Numerically, compared to MLM-RoBERTa-large and MLM-BERT-large, our RTD-ELECTRA-large achieves average improvements of about 8.4% and 13.7%, respectively, across all 15 tasks. On the SST-2 task in particular, our RTD-ELECTRA-large achieves an astonishing 90.1% accuracy without any training data. Overall, compared to pre-trained masked language models, the pre-trained replaced token detection model performs better in zero-shot learning. The source code is available at: https://github.com/nishiwen1214/RTD-ELECTRA.
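To make the RTD-prompt idea concrete, here is a minimal zero-shot sentiment sketch built on the public ELECTRA discriminator from HuggingFace Transformers: each candidate label is verbalized into the prompt, and the label whose verbalizer token the RTD head judges least likely to be "replaced" wins. The template, the verbalizers ("great"/"terrible"), and the single-token assumption are illustrative choices, not necessarily those of the paper.

```python
# A rough sketch of RTD-based prompting for zero-shot sentiment classification.
import torch
from transformers import ElectraTokenizerFast, ElectraForPreTraining

tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-large-discriminator")
model = ElectraForPreTraining.from_pretrained("google/electra-large-discriminator")
model.eval()

def original_prob(sentence: str, verbalizer: str) -> float:
    """Probability that the verbalizer token is 'original' (NOT replaced)
    according to ELECTRA's replaced-token-detection head."""
    text = f"{sentence} It was {verbalizer}."  # illustrative template
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits.squeeze(0)  # (seq_len,); >0 means "replaced"
    # Locate the verbalizer (assumed to be a single vocabulary token).
    verb_id = tokenizer.convert_tokens_to_ids(verbalizer)
    pos = (enc.input_ids.squeeze(0) == verb_id).nonzero()[0].item()
    return 1.0 - torch.sigmoid(logits[pos]).item()

review = "A gripping, beautifully acted film."
label = "positive" if original_prob(review, "great") > original_prob(review, "terrible") else "negative"
print(label)
```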
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models
Recently, Large Language Models (LLMs) have demonstrated impressive text understanding and generation capabilities. However, even strong LLMs may still learn incorrect knowledge from the training corpus, as well as knowledge that becomes outdated over time. Direct secondary fine-tuning on data containing new knowledge may be ineffective at updating knowledge because of the conflict between old and new knowledge. In this paper, we propose a new fine-tuning paradigm called F-Learning (Forgetting before Learning), which uses parametric arithmetic to achieve forgetting of old knowledge and learning of new knowledge. Experimental results on two publicly available datasets demonstrate that the proposed F-Learning clearly improves the knowledge-updating performance of both full fine-tuning and LoRA fine-tuning. Moreover, we also find that forgetting old knowledge by subtracting the LoRA parameters can achieve an effect similar to subtracting the parameters of full fine-tuning, and sometimes even surpasses it significantly.
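The core of F-Learning is plain arithmetic over parameter deltas. The sketch below is a rough PyTorch reading of the abstract: fine-tune a copy of the model on the old knowledge, subtract the resulting delta from the base weights to "forget", then fine-tune on the new knowledge. Here `fine_tune` is a hypothetical stand-in for an ordinary (full or LoRA) fine-tuning run, and the scaling factor `lam` is an assumed hyperparameter.

```python
# A minimal sketch of "forgetting before learning" via parameter arithmetic.
import copy
import torch

def forget_then_learn(base_model, old_data, new_data, lam: float = 1.0):
    # Step 1: fine-tune a copy on the OLD knowledge to localize it in parameters.
    old_ft = fine_tune(copy.deepcopy(base_model), old_data)  # hypothetical helper

    # Step 2: forgetting -- subtract the old-knowledge delta from the base weights:
    #   theta <- theta - lam * (theta_old_ft - theta)
    with torch.no_grad():
        for p_base, p_old in zip(base_model.parameters(), old_ft.parameters()):
            p_base -= lam * (p_old - p_base)

    # Step 3: learning -- ordinary fine-tuning on the NEW knowledge.
    return fine_tune(base_model, new_data)
```

With LoRA, the same subtraction would operate on the low-rank adapter weights instead of the full parameter set, which the abstract reports can match or exceed the full-parameter variant.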
HAT4RD: Hierarchical Adversarial Training for Rumor Detection in Social Media
With the development of social media, social communication has changed. While this facilitates people's communication and access to information, it also provides an ideal platform for spreading rumors. In normal or critical situations, rumors can affect people's judgment and even endanger public safety. However, natural language is high-dimensional and sparse, and the same rumor may be expressed in hundreds of ways on social media, which calls into question the robustness and generalization of current rumor detection models. We propose a novel hierarchical adversarial training method for rumor detection (HAT4RD) on social media. Specifically, HAT4RD uses gradient ascent to add adversarial perturbations to the embedding layers of the post-level and event-level modules so as to deceive the detector. At the same time, the detector uses stochastic gradient descent to minimize the adversarial risk and learn a more robust model. In this way, the post-level and event-level sample spaces are enlarged, and we verify the robustness of our model under a variety of adversarial attacks. Moreover, visualization experiments indicate that the proposed model converges to an area with a flat loss landscape, thereby leading to better generalization. We evaluate the proposed method on three public rumor datasets from two commonly used social platforms (Twitter and Weibo). Our experimental results demonstrate that our model achieves better results than state-of-the-art methods.
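For readers unfamiliar with embedding-level adversarial training, the sketch below shows an FGM-style step of the general kind the abstract describes: gradient ascent crafts a perturbation on the embedding layer, and stochastic gradient descent then minimizes the combined adversarial risk. The single shared perturbation and the assumption that `model` returns logits are illustrative simplifications; HAT4RD itself perturbs the post-level and event-level modules separately.

```python
# A minimal sketch of one embedding-space adversarial training step (FGM-style).
import torch

def adversarial_training_step(model, inputs, labels, loss_fn, optimizer,
                              epsilon: float = 1.0):
    optimizer.zero_grad()
    emb = model.get_input_embeddings().weight  # shared input embedding matrix

    # 1. Clean forward/backward: gradients now live on the embedding weights.
    loss_fn(model(**inputs), labels).backward()

    # 2. Gradient ascent on the embeddings: move in the direction that
    #    increases the loss, i.e. the perturbation that "deceives" the detector.
    grad = emb.grad.detach()
    delta = epsilon * grad / (grad.norm() + 1e-8)
    emb.data.add_(delta)

    # 3. Adversarial forward/backward; these gradients accumulate with the
    #    clean ones, so the update minimizes the combined adversarial risk.
    loss_fn(model(**inputs), labels).backward()

    # 4. Remove the perturbation and take the descent step on the summed gradients.
    emb.data.sub_(delta)
    optimizer.step()
```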
An Underutilized Food “Miwu”: Diet History, Nutritional Evaluations, and Countermeasures for Industrial Development
Roughly ten major crops feed most of the world, yet a large number of plants remain underexplored and underutilized because they have been ignored by the market and by research. Expanding food sources plays an important role in maintaining global food security and nutrition security. Miwu, the aerial part of the medicinal plant Rhizoma Chuanxiong, is a traditional local specialty food raw material whose edible value is still little known. Through textual research, component determination, a literature survey, field research, and a SWOT analysis, this paper develops a comprehensive picture of Miwu's diet history, chemical components, safety risks, and industrial development status. We find that Miwu has been eaten for some 800 years, is rich in nutrients and active ingredients, and shows no acute toxicity. In addition, the current industrial development of Miwu has significant advantages but also faces many challenges. In sum, Miwu is a potentially valuable, underutilized food raw material. This paper also provides countermeasures for the industrialized development of Miwu, offering a reference for its future utilization and development.