
11 research outputs found

An Empirical Study of Parameter Efficient Fine-tuning on Vision-Language Pre-train Model

Author: Li Yunfan
Liu Dayiheng
Lv Jiancheng
Peng Xi
Ren Xingzhang
Tian Yuxin
Yang Mouxing
Publication date: 18/05/2024

Recent studies applied Parameter Efficient Fine-Tuning techniques (PEFTs) to efficiently narrow the performance gap between pre-training and downstream. There are two important factors for various PEFTs, namely, the accessible data size and fine-tunable parameter size. A natural expectation for PEFTs is that the performance of various PEFTs is positively related to the data size and fine-tunable parameter size. However, according to the evaluation of five PEFTs on two downstream vision-language (VL) tasks, we find that such an intuition holds only if the downstream data and task are not consistent with pre-training. For downstream fine-tuning consistent with pre-training, data size no longer affects the performance, while the influence of fine-tunable parameter size is not monotonous. We believe such an observation could guide the choice of training strategy for various PEFTs. Comment: Accepted by ICME202

arXiv.org e-Print Archive

Qwen Technical Report

Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Qwen, the base pretrained language models, and Qwen-Chat, the chat models finetuned with human alignment techniques. The base language models consistently demonstrate superior performance across a multitude of downstream tasks, and the chat models, particularly those trained using Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The chat models possess advanced tool-use and planning capabilities for creating agent applications, showcasing impressive performance even when compared to bigger models on complex tasks like utilizing a code interpreter. Furthermore, we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as well as mathematics-focused models, Math-Qwen-Chat, which are built upon base language models. These models demonstrate significantly improved performance in comparison with open-source models, and slightly fall behind the proprietary models. Comment: 59 pages, 5 figure

arXiv.org e-Print Archive

A Rough Set Based Optimization Method for Elderly Evaluation

Author: Jingbo Zhang
Tong Mo
Weiping Li
Xingzhang Ren
Zhonghai Wu
Publication venue: Scientific Research Publishing, Inc.
Publication date: 01/01/2017

Crossref

A Security Vulnerability Threat Classification Method

Author: Tong Mo
Weiping Li
Xingzhang Ren
Yongle Hao
Yuanwei Hou
Publication venue: Springer International Publishing
Publication date: 02/11/2017

Crossref

Effective Approaches to Neural Query Language Identification

Author: Baosong Yang
Dayiheng Liu
Haibo Zhang
Jun Xie
Liang Yao
Xiaoyu Lv
Xingzhang Ren
Publication venue: MIT Press - Journals
Publication date: 01/07/2022

Query language identification (Q-LID) plays a crucial role in a cross-lingual search engine. There exist two main challenges in Q-LID: (1) insufficient contextual information in queries for disambiguation; and (2) the lack of query-style training examples for low-resource languages. In this article, we propose a neural Q-LID model by alleviating the above problems from both model architecture and data augmentation perspectives. Concretely, we build our model upon the advanced Transformer model. In order to enhance the discrimination of queries, a variety of external features (e.g., character, word, as well as script) are fed into the model and fused by a multi-scale attention mechanism. Moreover, to remedy the low resource challenge in this task, a novel machine translation–based strategy is proposed to automatically generate synthetic query-style data for low-resource languages. We contribute the first Q-LID test set called QID-21, which consists of search queries in 21 languages. Experimental results reveal that our model yields better classification accuracy than strong baselines and existing LID systems on both query and traditional LID tasks.

Directory of Open Access Journals

Effective Approaches to Neural Query Language Identification

Author: Baosong Yang
Dayiheng Liu
Haibo Zhang
Jun Xie
Liang Yao
Xiaoyu Lv
Xingzhang Ren
Publication venue: MIT Press
Publication date: 01/01/2022

Query language identification (Q-LID) plays a crucial role in a cross-lingual search engine. There exist two main challenges in Q-LID: (1) insufficient contextual information in queries for disambiguation; and (2) the lack of query-style training examples for low-resource languages. In this article, we propose a neural Q-LID model by alleviating the above problems from both model architecture and data augmentation perspectives. Concretely, we build our model upon the advanced Transformer model. In order to enhance the discrimination of queries, a variety of external features (e.g., character, word, as well as script) are fed into the model and fused by a multi-scale attention mechanism. Moreover, to remedy the low resource challenge in this task, a novel machine translation–based strategy is proposed to automatically generate synthetic query-style data for low-resource languages. We contribute the first Q-LID test set called QID-21, which consists of search queries in 21 languages. Experimental results reveal that our model yields better classification accuracy than strong baselines and existing LID systems on both query and traditional LID tasks.

Crossref

Unsupervised Preference-Aware Language Identification

Author: Baosong Yang
Dayiheng Liu
Haibo Zhang
Jun Xie
Liang Yao
Xiaoyu Lv
Xingzhang Ren
Publication venue: Association for Computational Linguistics
Publication date: 01/01/2022

Crossref

Refining Traceability Links Between Vulnerability and Software Component in a Vulnerability Knowledge Graph

Author: Dongdong Du
Jien Chen
Jinan Sun
Qing Gao
Shikun Zhang
Wei Ye
Xiangyu Xi
Xingzhang Ren
Yupeng Wu
Publication venue: Springer International Publishing
Publication date: 01/01/2018

Crossref

Frequency-Aware Contrastive Learning for Neural Machine Translation

Author: Baosong Yang
Dayiheng Liu
Haibo Zhang
Jinan Sun
Long Zhang
Shikun Zhang
Tong Zhang
Wei Ye
Wen Zhao
Xingzhang Ren
Publication venue: Association for the Advancement of Artificial Intelligence (AAAI)
Publication date: 28/06/2022

Low-frequency word prediction remains a challenge in modern neural machine translation (NMT) systems. Recent adaptive training methods promote the output of infrequent words by emphasizing their weights in the overall training objectives. Despite the improved recall of low-frequency words, their prediction precision is unexpectedly hindered by the adaptive objectives. Inspired by the observation that low-frequency words form a more compact embedding space, we tackle this challenge from a representation learning perspective. Specifically, we propose a frequency-aware token-level contrastive learning method, in which the hidden state of each decoding step is pushed away from the counterparts of other target words, in a soft contrastive way based on the corresponding word frequencies. We conduct experiments on widely used NIST Chinese-English and WMT14 English-German translation tasks. Empirical results show that our proposed methods can not only significantly improve the translation quality but also enhance lexical diversity and optimize word representation space. Further investigation reveals that, comparing with related adaptive training strategies, the superiority of our method on low-frequency word prediction lies in the robustness of token-level recall across different frequencies without sacrificing precision.

Crossref