12 research outputs found

    ELECTRA is a Zero-Shot Learner, Too

    Full text link
    Recently, for few-shot and even zero-shot learning, the new "pre-train, prompt, and predict" paradigm has achieved remarkable results compared with the "pre-train, fine-tune" paradigm. After the success of prompt-based GPT-3, a series of masked language model (MLM)-based (e.g., BERT, RoBERTa) prompt learning methods became popular and widely used. However, another efficient pre-trained discriminative model, ELECTRA, has probably been neglected. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using our proposed replaced token detection (RTD)-based prompt learning method. Experimental results show that the ELECTRA model based on RTD-prompt learning achieves surprisingly strong, state-of-the-art zero-shot performance. Numerically, compared to MLM-RoBERTa-large and MLM-BERT-large, our RTD-ELECTRA-large achieves an average improvement of about 8.4% and 13.7%, respectively, across all 15 tasks. In particular, on the SST-2 task, our RTD-ELECTRA-large achieves an astonishing 90.1% accuracy without any training data. Overall, compared to the pre-trained masked language models, the pre-trained replaced token detection model performs better in zero-shot learning. The source code is available at: https://github.com/nishiwen1214/RTD-ELECTRA.

    Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models

    Full text link
    Recently, Large Language Models (LLMs) have demonstrated remarkable text understanding and generation capabilities. However, even strong LLMs may still learn incorrect knowledge from the training corpus, as well as knowledge that becomes outdated over time. Direct secondary fine-tuning on data containing new knowledge may be ineffective at updating knowledge because of the conflict between old and new knowledge. In this paper, we propose a new fine-tuning paradigm called F-Learning (Forgetting before Learning), which uses parametric arithmetic to first forget old knowledge and then learn new knowledge. Experimental results on two publicly available datasets demonstrate that F-Learning clearly improves the knowledge updating performance of both full fine-tuning and LoRA fine-tuning. Moreover, we find that forgetting old knowledge by subtracting the LoRA parameters can achieve an effect similar to subtracting the parameters of full fine-tuning, and sometimes even surpasses it significantly.
    Comment: 8 pages, 2 figures, 2 tables
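    The abstract above does not give the exact update rule, so the following is only a minimal sketch of one way "forgetting by parameter subtraction" could look, in the style of task arithmetic. The function name `subtract_knowledge`, the scaling factor `lam`, and the toy parameter values are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List

# A toy "model": parameter name -> flat list of weights (hypothetical format).
Params = Dict[str, List[float]]


def subtract_knowledge(base: Params, old_ft: Params, lam: float) -> Params:
    """Return base - lam * (old_ft - base), computed per parameter.

    `old_ft` stands for the base model fine-tuned on the OLD knowledge, so
    (old_ft - base) is the parameter change that encodes that knowledge.
    Subtracting a scaled copy of it is the assumed "forgetting" step.
    """
    forgotten: Params = {}
    for name, base_vals in base.items():
        delta = [o - b for o, b in zip(old_ft[name], base_vals)]
        forgotten[name] = [b - lam * d for b, d in zip(base_vals, delta)]
    return forgotten


# Toy values, chosen only so the arithmetic is easy to follow.
base = {"w": [1.0, 2.0]}
old_ft = {"w": [1.5, 1.0]}
forgotten = subtract_knowledge(base, old_ft, lam=0.5)
print(forgotten)
```

    Under this assumption, the "learning" step would then be ordinary fine-tuning on the new knowledge, started from `forgotten` rather than from `base`.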

    COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning

    Full text link
    Recently, there have been significant advancements in large language models (LLMs), particularly for the English language. These advancements have enabled LLMs to understand and execute complex instructions with unprecedented accuracy and fluency. Despite this progress, there remains a noticeable gap in the development of Chinese instruction tuning. The unique linguistic features and cultural depth of the Chinese language pose challenges for instruction tuning tasks. Existing datasets are either derived from English-centric LLMs or are ill-suited for aligning with the interaction patterns of real-world Chinese users. To bridge this gap, we introduce COIG-CQIA, a high-quality Chinese instruction tuning dataset. Our aim is to build a diverse, wide-ranging instruction-tuning dataset to better align model behavior with human interactions. To this end, we collect a high-quality human-written corpus from various sources on the Chinese Internet, including Q&A communities, Wikis, examinations, and existing NLP datasets. This corpus was rigorously filtered and carefully processed to form the COIG-CQIA dataset. Furthermore, we train models of various scales on different subsets of CQIA, followed by in-depth evaluation and analysis. The findings from our experiments offer valuable insights for selecting and developing Chinese instruction-tuning datasets. We also find that models trained on CQIA-Subset achieve competitive results in human assessment as well as knowledge and security benchmarks. Data are available at https://huggingface.co/datasets/m-a-p/COIG-CQI

    HAT4RD: Hierarchical Adversarial Training for Rumor Detection in Social Media

    No full text
    With the development of social media, social communication has changed. While this facilitates people’s communication and access to information, it also provides an ideal platform for spreading rumors. In normal or critical situations, rumors can affect people’s judgment and even endanger social security. However, natural language is high-dimensional and sparse, and the same rumor may be expressed in hundreds of ways on social media. As such, the robustness and generalization of current rumor detection models are in question. We propose a novel hierarchical adversarial training method for rumor detection (HAT4RD) on social media. Specifically, HAT4RD uses gradient ascent to add adversarial perturbations to the embedding layers of the post-level and event-level modules in order to deceive the detector. At the same time, the detector uses stochastic gradient descent to minimize the adversarial risk and learn a more robust model. In this way, the post-level and event-level sample spaces are enhanced, and we verify the robustness of our model under a variety of adversarial attacks. Moreover, visual experiments indicate that the proposed model drifts into an area with a flat loss landscape, thereby leading to better generalization. We evaluate our proposed method on three public rumor datasets from two commonly used social platforms (Twitter and Weibo). Our experimental results demonstrate that our model achieves better results than the state-of-the-art methods.
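    The gradient-ascent perturbation described in this abstract can be illustrated on a single embedding vector. The sketch below is not HAT4RD itself: it uses a toy logistic detector with one weight vector, a normalized one-step perturbation of size `epsilon`, and hypothetical names (`adversarial_embedding`, `grad_wrt_embedding`), all of which are assumptions chosen to keep the example self-contained.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss(w: List[float], e: List[float], y: int) -> float:
    """Binary cross-entropy of a toy logistic detector on embedding e."""
    p = sigmoid(sum(wi * ei for wi, ei in zip(w, e)))
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def grad_wrt_embedding(w: List[float], e: List[float], y: int) -> List[float]:
    """Gradient of the loss with respect to the embedding: (p - y) * w."""
    p = sigmoid(sum(wi * ei for wi, ei in zip(w, e)))
    return [(p - y) * wi for wi in w]


def adversarial_embedding(w: List[float], e: List[float], y: int,
                          epsilon: float) -> List[float]:
    """Take one gradient-ASCENT step of length epsilon on the embedding."""
    g = grad_wrt_embedding(w, e, y)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm == 0.0:
        return list(e)  # no direction increases the loss
    return [ei + epsilon * gi / norm for ei, gi in zip(e, g)]


# Toy values for illustration only.
w = [1.0, -2.0]   # detector weights
e = [0.5, 0.25]   # embedding of one post
y = 1             # label: rumor
e_adv = adversarial_embedding(w, e, y, epsilon=0.1)
clean_loss = loss(w, e, y)
adv_loss = loss(w, e_adv, y)
print(clean_loss, adv_loss)
```

    In adversarial training, the detector's weights would then be updated by gradient descent on the loss at `e_adv`; per the abstract, HAT4RD applies this kind of perturbation at both the post level and the event level.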

    Pattern synthesis approach for circularly polarised four‐dimensional antenna arrays

    No full text

    An Underutilized Food “Miwu”: Diet History, Nutritional Evaluations, and Countermeasures for Industrial Development

    No full text
    About 10 major crops essentially feed the world, yet a large number of plants have not been fully explored or utilized because they have been ignored by the market and by research. Expanding food sources in various countries plays an important role in maintaining global food and nutrition security. Miwu is the aerial part of the medicinal plant Rhizoma Chuanxiong and is a traditional food raw material with local characteristics. Its edible value is still little known. Through textual research, component determination, literature survey, field research, and SWOT analysis, this paper develops a comprehensive understanding of Miwu’s diet history, chemical components, safety risks, and industrial development status. It finds that Miwu has been eaten for 800 years, is rich in nutrients and active ingredients, and has no acute toxicity. In addition, the current industrial development of Miwu has significant advantages but also faces many challenges. In sum, Miwu is a potentially underutilized food raw material. This paper also provides countermeasures for the industrialized development of Miwu, which will provide an important reference for its future utilization and development.