4 research outputs found
Qwen Technical Report
Large language models (LLMs) have revolutionized the field of artificial
intelligence, enabling natural language processing tasks that were previously
thought to be exclusive to humans. In this work, we introduce Qwen, the first
installment of our large language model series. Qwen is a comprehensive
language model series that encompasses distinct models with varying parameter
counts. It includes Qwen, the base pretrained language models, and Qwen-Chat,
the chat models finetuned with human alignment techniques. The base language
models consistently demonstrate superior performance across a multitude of
downstream tasks, and the chat models, particularly those trained using
Reinforcement Learning from Human Feedback (RLHF), are highly competitive. The
chat models possess advanced tool-use and planning capabilities for creating
agent applications, showcasing impressive performance even when compared to
bigger models on complex tasks like utilizing a code interpreter. Furthermore,
we have developed coding-specialized models, Code-Qwen and Code-Qwen-Chat, as
well as mathematics-focused models, Math-Qwen-Chat, which are built upon base
language models. These models demonstrate significantly improved performance in
comparison with open-source models, and slightly fall behind the proprietary
models.Comment: 59 pages, 5 figure
Effective Approaches to Neural Query Language Identification
Query language identification (Q-LID) plays a crucial role in a cross-lingual search engine. There exist two main challenges in Q-LID: (1) insufficient contextual information in queries for disambiguation; and (2) the lack of query-style training examples for low-resource languages. In this article, we propose a neural Q-LID model by alleviating the above problems from both model architecture and data augmentation perspectives. Concretely, we build our model upon the advanced Transformer model. In order to enhance the discrimination of queries, a variety of external features (e.g., character, word, as well as script) are fed into the model and fused by a multi-scale attention mechanism. Moreover, to remedy the low resource challenge in this task, a novel machine translation–based strategy is proposed to automatically generate synthetic query-style data for low-resource languages. We contribute the first
Q-LID test set called QID-21, which consists of search queries in 21 languages. Experimental results reveal that our model yields better classification accuracy than strong baselines and existing LID systems on both query and traditional LID tasks.
Frequency-Aware Contrastive Learning for Neural Machine Translation
Low-frequency word prediction remains a challenge in modern neural machine translation (NMT) systems. Recent adaptive training methods promote the output of infrequent words by emphasizing their weights in the overall training objectives. Despite the improved recall of low-frequency words, their prediction precision is unexpectedly hindered by the adaptive objectives. Inspired by the observation that low-frequency words form a more compact embedding space, we tackle this challenge from a representation learning perspective. Specifically, we propose a frequency-aware token-level contrastive learning method, in which the hidden state of each decoding step is pushed away from the counterparts of other target words, in a soft contrastive way based on the corresponding word frequencies. We conduct experiments on widely used NIST Chinese-English and WMT14 English-German translation tasks. Empirical results show that our proposed methods can not only significantly improve the translation quality but also enhance lexical diversity and optimize word representation space. Further investigation reveals that, comparing with related adaptive training strategies, the superiority of our method on low-frequency word prediction lies in the robustness of token-level recall across different frequencies without sacrificing precision