Speculative Contrastive Decoding
Large language models (LLMs) have shown extraordinary performance in various
language tasks, but high computational requirements hinder their widespread
deployment. Speculative decoding, which uses amateur models to predict the
generation of expert models, has been proposed as a way to accelerate LLM
inference. However, speculative decoding focuses on acceleration instead of
making the best use of the token distribution from amateur models. We propose
Speculative Contrastive Decoding (SCD), an accelerated decoding method
leveraging the natural contrast between expert and amateur models in
speculative decoding. Comprehensive evaluations on four benchmarks show that
SCD achieves acceleration factors similar to those of speculative decoding while
further improving generation quality, as contrastive decoding does. The
analysis of token probabilities further demonstrates the compatibility between
speculative and contrastive decoding. Overall, SCD provides an effective
approach to enhance the decoding quality of LLMs while saving computational
resources.
Comment: Work in progress
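As an illustration of how the speculative and contrastive components might interact, the following is a minimal sketch. It assumes Hugging-Face-style `expert` and `amateur` callables that return `.logits`, batch size 1, greedy drafting, a draft length `gamma`, and a plausibility threshold `alpha`; these names and the exact acceptance/contrast rules are illustrative and may differ from the paper's formulation.

```python
import math
import torch

def speculative_contrastive_decode(expert, amateur, input_ids,
                                   gamma=4, alpha=0.1, max_new_tokens=64):
    """Illustrative sketch (not the paper's exact algorithm, assumes batch size 1).

    The amateur model drafts `gamma` tokens; the expert verifies them in a single
    forward pass. Accepted tokens are kept, and the next token is chosen from a
    contrastive score (expert minus amateur log-probs) restricted to tokens the
    expert considers plausible. No KV cache is used, for clarity.
    """
    ids = input_ids
    while ids.shape[-1] - input_ids.shape[-1] < max_new_tokens:
        # 1) Draft gamma tokens greedily with the cheap amateur model.
        draft = ids
        for _ in range(gamma):
            a_last = amateur(draft).logits[:, -1, :]
            draft = torch.cat([draft, a_last.argmax(-1, keepdim=True)], dim=-1)

        # 2) Score the whole draft with both models in one pass each.
        e_logits = expert(draft).logits   # [1, len, vocab]
        a_logits = amateur(draft).logits  # [1, len, vocab]

        # 3) Verify: accept drafted tokens while the expert's greedy choice agrees.
        n_accept = 0
        for i in range(gamma):
            pos = ids.shape[-1] - 1 + i
            if e_logits[:, pos, :].argmax(-1).item() == draft[:, pos + 1].item():
                n_accept += 1
            else:
                break
        ids = draft[:, : ids.shape[-1] + n_accept]

        # 4) At the first rejected (or final) position, pick the token maximizing
        #    the contrastive score under an adaptive plausibility constraint.
        pos = ids.shape[-1] - 1
        e_lp = torch.log_softmax(e_logits[:, pos, :], dim=-1)
        a_lp = torch.log_softmax(a_logits[:, pos, :], dim=-1)
        implausible = e_lp < e_lp.max(dim=-1, keepdim=True).values + math.log(alpha)
        contrast = (e_lp - a_lp).masked_fill(implausible, float("-inf"))
        ids = torch.cat([ids, contrast.argmax(-1, keepdim=True)], dim=-1)
    return ids
```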
Width-tuned magnetic order oscillation on zigzag edges of honeycomb nanoribbons
Quantum confinement and interference often generate exotic properties in
nanostructures. One recent highlight is the experimental indication of a
magnetic phase transition in zigzag-edged graphene nanoribbons at the critical
ribbon width of about 7 nm [G. Z. Magda et al., Nature 514, 608
(2014)]. Here we show theoretically that with further increase in the ribbon
width, the magnetic correlation of the two edges can exhibit an intriguing
oscillatory behavior between antiferromagnetic and ferromagnetic order, driven
by the acquisition of positive coherence between the two edges, which lowers
the free energy. The oscillation effect is readily tunable in applied magnetic
fields. These properties suggest new experimental manifestations of the edge
magnetic order in graphene nanoribbons and strengthen the hope that
graphene-like spintronic nanodevices can function at room temperature.
Comment: 22 pages, 9 figures
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
The complementary potential of Large Language Models (LLMs) rests on the
assumption that off-the-shelf LLMs have heterogeneous expertise across a wide
range of domains and tasks, so that an ensemble of LLMs can achieve
consistently better performance.
Existing ensemble methods for LLMs mainly focus on reward model ranking of
outputs, leading to significant computation overhead. To combat this issue, we
revisit the complementary potential of LLMs and further elaborate it by mining
latent expertise with off-the-shelf reward models. We propose Zooter, a
reward-guided routing method that distills reward signals on training queries
into a routing function, which then dispatches each query to the LLM with the
relevant expertise. We also integrate a tag-based label enhancement to mitigate
noise from uncertainty when using rewards as silver supervision. Zooter is
computationally efficient at inference, as it adds only the minor overhead of a
routing function compared with reward-model ranking methods. We
evaluate Zooter on a comprehensive benchmark collection with 26 subsets on
different domains and tasks. Zooter outperforms the best single model on
average and ranks first on 44% of tasks, even surpassing multiple reward model
ranking methods.
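To make the distillation idea concrete, here is a minimal sketch. It assumes a hypothetical `Router` MLP over pre-computed query embeddings, softmax-normalized reward-model scores as silver soft labels, and a KL objective; Zooter's actual architecture and its tag-based label enhancement are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Router(nn.Module):
    """Tiny routing function: maps a query embedding to logits over candidate LLMs.
    The embedding dimension and number of experts are illustrative."""
    def __init__(self, emb_dim=768, n_llms=4):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(emb_dim, 256), nn.ReLU(),
                                 nn.Linear(256, n_llms))

    def forward(self, query_emb):
        return self.mlp(query_emb)  # [batch, n_llms]

def distill_step(router, optimizer, query_emb, reward_scores, tau=1.0):
    """One distillation step: reward-model scores of each LLM's answer to the
    query serve as silver supervision. Softmax-normalized rewards become soft
    labels, and the router is trained with a KL objective (assumed setup)."""
    soft_labels = torch.softmax(reward_scores / tau, dim=-1)   # [batch, n_llms]
    log_probs = F.log_softmax(router(query_emb), dim=-1)       # [batch, n_llms]
    loss = F.kl_div(log_probs, soft_labels, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def route(router, query_emb, llms):
    """At inference, send the query to the single most promising LLM."""
    idx = router(query_emb).argmax(dim=-1).item()
    return llms[idx]
```

Because only the selected LLM (plus the lightweight router) runs per query, the inference cost stays close to that of a single model, whereas reward-model ranking must first generate an output with every candidate LLM.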
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
Foundation language models obtain the instruction-following ability through
supervised fine-tuning (SFT). Diversity and complexity are considered critical
factors of a successful SFT dataset, while their definitions remain obscure and
lack quantitative analyses. In this work, we propose InsTag, an open-set
fine-grained tagger, to tag samples within SFT datasets based on semantics and
intentions, and to define instruction diversity and complexity in terms of tags. We
obtain 6.6K tags to describe comprehensive user queries. Then we analyze
popular open-sourced SFT datasets and find that the model ability grows with
more diverse and complex data. Based on this observation, we propose a data
selector based on InsTag to select 6K diverse and complex samples from
open-source datasets and fine-tune models on InsTag-selected data. The
resulting models, TagLM, outperform open-source models trained on considerably
larger SFT datasets when evaluated on MT-Bench, echoing the importance of query
diversity and complexity. We open-source InsTag at
https://github.com/OFA-Sys/InsTag
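The tag-based selection idea can be illustrated with a small sketch. It assumes each sample already carries a `tags` list produced by the tagger, approximates complexity by tag count and diversity by tag coverage, and uses a hypothetical `select_diverse_complex` helper; the actual InsTag selector may use different criteria.

```python
def select_diverse_complex(samples, budget=6000):
    """Illustrative tag-based data selector (assumed procedure, not InsTag's exact one).

    `samples` is a list of dicts, each with a "tags" list from the tagger.
    Complexity ~ number of tags on a sample; diversity ~ coverage of unseen tags.
    """
    # Rank by complexity: more tags on a query ~ a more complex instruction.
    ranked = sorted(samples, key=lambda s: len(s["tags"]), reverse=True)

    selected_idx, covered = [], set()
    # First pass: greedily take complex samples that add unseen tags (diversity).
    for i, s in enumerate(ranked):
        if len(selected_idx) >= budget:
            break
        new_tags = set(s["tags"]) - covered
        if new_tags:
            selected_idx.append(i)
            covered |= new_tags

    # Second pass: fill any remaining budget with the most complex leftovers.
    picked = set(selected_idx)
    for i in range(len(ranked)):
        if len(selected_idx) >= budget:
            break
        if i not in picked:
            selected_idx.append(i)
            picked.add(i)

    return [ranked[i] for i in selected_idx]
```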