ELECTRA is a Zero-Shot Learner, Too
Recently, for few-shot and even zero-shot learning, the new "pre-train,
prompt, and predict" paradigm has achieved remarkable results compared with
the "pre-train, fine-tune" paradigm. Following the success of prompt-based
GPT-3, a series of prompt learning methods based on masked language models
(MLMs) such as BERT and RoBERTa became popular and widely used. However,
another efficient pre-trained discriminative model, ELECTRA, has been largely
neglected. In this paper, we attempt to accomplish several NLP tasks in the
zero-shot scenario using our proposed replaced token detection (RTD)-based
prompt learning method. Experimental results show that an ELECTRA model with
RTD-based prompt learning achieves surprisingly strong, state-of-the-art
zero-shot performance. Numerically, compared to MLM-RoBERTa-large and
MLM-BERT-large, our RTD-ELECTRA-large achieves an average improvement of
about 8.4% and 13.7%, respectively, across all 15 tasks. On the SST-2 task in
particular, our RTD-ELECTRA-large achieves an astonishing 90.1% accuracy
without any training data. Overall, compared to pre-trained masked language
models, the pre-trained replaced token detection model performs better in
zero-shot learning. The source code is available at:
https://github.com/nishiwen1214/RTD-ELECTRA.
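Conceptually, RTD-based prompting reuses ELECTRA's discriminator: each
candidate label word is inserted into a prompt template, and the candidate
the discriminator judges least likely to be a "replaced" token is predicted.
A minimal sketch follows, assuming a Hugging Face ELECTRA checkpoint; the
template and label words are illustrative, not necessarily the paper's exact
prompts.

```python
# Minimal sketch of RTD-style zero-shot prompting with ELECTRA's discriminator.
import torch
from transformers import ElectraForPreTraining, ElectraTokenizerFast

model_name = "google/electra-large-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(model_name)
model = ElectraForPreTraining.from_pretrained(model_name).eval()

def rtd_zero_shot(sentence: str, label_words: dict) -> str:
    """Score each candidate label word by how 'original' (not replaced)
    the discriminator thinks it looks inside the prompt template."""
    scores = {}
    for label, word in label_words.items():
        prompt = f"{sentence} It was {word}."  # illustrative template
        enc = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits[0]  # >0 means "replaced" per token
        # Locate the candidate word's token position(s) inside the prompt.
        word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
        ids = enc["input_ids"][0].tolist()
        for i in range(len(ids) - len(word_ids) + 1):
            if ids[i:i + len(word_ids)] == word_ids:
                # Lower "replaced" logit => the word fits the context better.
                scores[label] = -logits[i:i + len(word_ids)].mean().item()
                break
    return max(scores, key=scores.get)

print(rtd_zero_shot("A deeply moving and beautifully acted film.",
                    {"positive": "great", "negative": "terrible"}))
```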
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models
Recently, Large Language Models (LLMs) have demonstrated remarkable text
understanding and generation capabilities. However, even strong LLMs may
still learn incorrect knowledge from the training corpus, as well as
knowledge that becomes outdated over time. Directly fine-tuning again on data
containing new knowledge may be ineffective at updating knowledge because of
the conflict between old and new knowledge. In this paper, we propose a new
fine-tuning paradigm called F-Learning (Forgetting before Learning), which
uses parametric arithmetic to first forget old knowledge and then learn new
knowledge. Experimental results on two publicly available datasets
demonstrate that our proposed F-Learning noticeably improves the
knowledge-updating performance of both full fine-tuning and LoRA fine-tuning.
Moreover, we also find that forgetting old knowledge by subtracting the LoRA
parameters achieves an effect similar to subtracting the parameters of full
fine-tuning, and sometimes even surpasses it significantly.
COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning
Recently, there have been significant advancements in large language models
(LLMs), particularly focused on the English language. These advancements have
enabled LLMs to understand and execute complex instructions with
unprecedented accuracy and fluency. However, despite this progress, there
remains a noticeable gap in the development of Chinese instruction tuning.
The unique linguistic features and cultural depth of the Chinese language
pose challenges for instruction tuning tasks. Existing datasets are either
derived from English-centric LLMs or ill-suited to the interaction patterns
of real-world Chinese users. To bridge this gap, we introduce COIG-CQIA, a
high-quality Chinese instruction tuning dataset. Our aim is to build a
diverse, wide-ranging instruction-tuning dataset that better aligns model
behavior with human interactions. To this end, we collect a high-quality
human-written corpus from various sources on the Chinese Internet, including
Q&A communities, wikis, examinations, and existing NLP datasets. This corpus
was rigorously filtered and carefully processed to form the COIG-CQIA
dataset. Furthermore, we train models of various scales on different subsets
of CQIA, followed by in-depth evaluation and analyses. The findings from our
experiments offer valuable insights for selecting and developing Chinese
instruction-tuning datasets. We also find that models trained on CQIA-Subset
achieve competitive results in human assessment as well as on knowledge and
security benchmarks. Data
are available at https://huggingface.co/datasets/m-a-p/COIG-CQIA.
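For readers who want to inspect the data, a minimal sketch of loading it with
the `datasets` library follows, assuming the Hub ID m-a-p/COIG-CQIA; the
config name shown is illustrative, since the dataset is organized into
several source-specific subsets.

```python
# Sketch: load one subset of COIG-CQIA from the Hugging Face Hub.
from datasets import load_dataset

# "zhihu" is an assumed subset name standing in for any of the source-specific
# configs; check the dataset card for the actual list.
cqia = load_dataset("m-a-p/COIG-CQIA", name="zhihu", split="train")
print(cqia[0])  # one instruction-tuning record
```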
HAT4RD: Hierarchical Adversarial Training for Rumor Detection in Social Media
With the development of social media, social communication has changed. While this facilitates people's communication and access to information, it also provides an ideal platform for spreading rumors. In both normal and critical situations, rumors can affect people's judgment and even endanger social security. However, natural language is high-dimensional and sparse, and the same rumor may be expressed in hundreds of ways on social media, which calls the robustness and generalization of current rumor detection models into question. We propose a novel hierarchical adversarial training method for rumor detection (HAT4RD) on social media. Specifically, HAT4RD uses gradient ascent to add adversarial perturbations to the embedding layers of the post-level and event-level modules to deceive the detector, while the detector uses stochastic gradient descent to minimize the adversarial risk and learn a more robust model. In this way, the post-level and event-level sample spaces are enhanced, and we verify the robustness of our model under a variety of adversarial attacks. Moreover, visualization experiments indicate that the proposed model drifts into an area with a flat loss landscape, thereby leading to better generalization. We evaluate the proposed method on three public rumor datasets from two commonly used social platforms (Twitter and Weibo). Our experimental results demonstrate that our model achieves better results than state-of-the-art methods.
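The core loop is embedding-level adversarial training: craft a perturbation
by gradient ascent on the embedding weights, take a second forward pass on
the perturbed embeddings, and let SGD minimize the combined risk. A minimal
single-level PyTorch sketch follows; `model`, `embed_layer`, `loss_fn`, and
`eps` are placeholders, and the paper applies the idea hierarchically to both
the post-level and event-level modules rather than to one layer as shown.

```python
# Sketch of one embedding-level adversarial training step (FGM-style).
import torch

def adversarial_step(model, embed_layer, inputs, labels, loss_fn, optimizer,
                     eps=1.0):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()                               # gradients w.r.t. embeddings

    weight = embed_layer.weight
    grad = weight.grad.detach()
    delta = eps * grad / (grad.norm() + 1e-12)    # gradient-ascent perturbation
    weight.data.add_(delta)                       # perturb the embedding table
    adv_loss = loss_fn(model(inputs), labels)     # adversarial forward pass
    adv_loss.backward()                           # accumulate adversarial grads
    weight.data.sub_(delta)                       # restore original embeddings

    optimizer.step()                              # SGD minimizes adversarial risk
    return loss.item(), adv_loss.item()
```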
An Underutilized Food “Miwu”: Diet History, Nutritional Evaluations, and Countermeasures for Industrial Development
About 10 major crops essentially feed the world, yet a large number of plants have not been fully explored and utilized because they have been ignored by the market and by research. Expanding food sources in various countries plays an important role in maintaining global food security and nutrition security. Miwu, the aerial part of the medicinal plant Rhizoma Chuanxiong, is a traditional local specialty food raw material whose edible value is still little known. Through textual research, component determination, a literature survey, field research, and SWOT analysis, this paper develops a comprehensive understanding of Miwu's dietary history, chemical components, safety risks, and industrial development status. We find that Miwu has been eaten for 800 years, is rich in nutrients and active ingredients, and shows no acute toxicity. In addition, the current industrial development of Miwu has significant advantages but also faces many challenges. In sum, Miwu is a potentially underutilized food raw material. This paper also provides countermeasures for the industrialized development of Miwu, offering a reference for its future utilization and development.