    Speculative Contrastive Decoding

    Large language models (LLMs) have shown extraordinary performance in various language tasks, but high computational requirements hinder their widespread deployment. Speculative decoding, which uses amateur models to predict the generation of expert models, has been proposed as a way to accelerate LLM inference. However, speculative decoding focuses on acceleration rather than making the best use of the token distribution from amateur models. We propose Speculative Contrastive Decoding (SCD), an accelerated decoding method that leverages the natural contrast between expert and amateur models in speculative decoding. Comprehensive evaluations on four benchmarks show that SCD achieves acceleration factors similar to speculative decoding while further improving generation quality, as in contrastive decoding. An analysis of token probabilities further demonstrates the compatibility between speculative and contrastive decoding. Overall, SCD provides an effective approach to enhance the decoding quality of LLMs while saving computational resources. Comment: Work in Progress
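
    A minimal sketch of the speculative-contrastive idea described in the abstract, assuming Hugging Face-style causal LMs; the function name `draft_and_verify`, the greedy drafting, the contrastive weight `alpha`, and the argmax-based acceptance rule are illustrative assumptions, not the paper's exact algorithm.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def draft_and_verify(expert, amateur, input_ids, k=4, alpha=1.0):
    """Draft k tokens with the amateur model, then verify them with the expert,
    scoring each position with a contrastive distribution
    log p_expert - alpha * log p_amateur (one common form of contrastive decoding)."""
    draft = input_ids
    # 1) Amateur drafts k tokens greedily (one small forward pass per token).
    for _ in range(k):
        logits = amateur(draft).logits[:, -1, :]
        next_tok = logits.argmax(dim=-1, keepdim=True)
        draft = torch.cat([draft, next_tok], dim=-1)

    # 2) Expert (and amateur) score the whole draft in a single forward pass each.
    expert_logits = expert(draft).logits[:, -k-1:-1, :]
    amateur_logits = amateur(draft).logits[:, -k-1:-1, :]

    # 3) Contrastive scores decide acceptance: keep drafted tokens while they
    #    remain the argmax of the contrastive distribution; otherwise correct and stop.
    contrastive = F.log_softmax(expert_logits, -1) - alpha * F.log_softmax(amateur_logits, -1)
    accepted = input_ids
    for i in range(k):
        drafted_tok = draft[:, input_ids.shape[1] + i]
        if contrastive[:, i, :].argmax(dim=-1).item() == drafted_tok.item():
            accepted = torch.cat([accepted, drafted_tok.unsqueeze(-1)], dim=-1)
        else:
            fix = contrastive[:, i, :].argmax(dim=-1, keepdim=True)
            accepted = torch.cat([accepted, fix], dim=-1)
            break
    return accepted
```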

    Width-tuned magnetic order oscillation on zigzag edges of honeycomb nanoribbons

    Quantum confinement and interference often generate exotic properties in nanostructures. One recent highlight is the experimental indication of a magnetic phase transition in zigzag-edged graphene nanoribbons at a critical ribbon width of about 7 nm [G. Z. Magda et al., Nature 514, 608 (2014)]. Here we show theoretically that with a further increase in the ribbon width, the magnetic correlation of the two edges can exhibit an intriguing oscillatory behavior between antiferromagnetic and ferromagnetic, driven by the acquisition of positive coherence between the two edges to lower the free energy. The oscillation effect is readily tunable in applied magnetic fields. These novel properties suggest new experimental manifestations of the edge magnetic orders in graphene nanoribbons and enhance the hopes of graphene-like spintronic nanodevices functioning at room temperature. Comment: 22 pages, 9 figures

    Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models

    The complementary potential of Large Language Models (LLMs) assumes that off-the-shelf LLMs have heterogeneous expertise across a wide range of domains and tasks, so that an ensemble of LLMs can achieve consistently better performance. Existing ensemble methods for LLMs mainly focus on reward model ranking of outputs, leading to significant computational overhead. To address this issue, we revisit the complementary potential of LLMs and further elaborate on it by mining latent expertise with off-the-shelf reward models. We propose Zooter, a reward-guided routing method that distills rewards on training queries to train a routing function, which can precisely distribute each query to the LLM with the relevant expertise. We also integrate tag-based label enhancement to mitigate noise from uncertainty when using rewards as silver supervision. Zooter is computationally efficient at inference, as it introduces only the minor overhead of a routing function compared with reward model ranking methods. We evaluate Zooter on a comprehensive benchmark collection with 26 subsets across different domains and tasks. Zooter outperforms the best single model on average and ranks first on 44% of tasks, even surpassing multiple reward model ranking methods.
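
    A rough sketch of the reward-distillation routing idea described in the abstract; the encoder interface, the KL-distillation loss, the temperature `tau`, and the names `Router` and `distillation_step` are illustrative assumptions rather than Zooter's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Router(nn.Module):
    """Maps a query embedding to a distribution over candidate LLMs."""
    def __init__(self, embed_dim, num_models):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_models)

    def forward(self, query_embeddings):
        return self.head(query_embeddings)  # logits over candidate models

def distillation_step(router, optimizer, query_embeddings, reward_scores, tau=1.0):
    """One training step: softmax-normalized rewards over models act as soft
    (silver) labels, and the router is trained to match them with a KL loss."""
    targets = F.softmax(reward_scores / tau, dim=-1)              # [batch, num_models]
    log_probs = F.log_softmax(router(query_embeddings), dim=-1)   # [batch, num_models]
    loss = F.kl_div(log_probs, targets, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# At inference, each query is sent to the single highest-scoring model:
#   chosen_model_index = router(embed(query)).argmax(-1)
```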

    #InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models

    Foundation language models obtain instruction-following ability through supervised fine-tuning (SFT). Diversity and complexity are considered critical factors of a successful SFT dataset, yet their definitions remain obscure and lack quantitative analysis. In this work, we propose InsTag, an open-set fine-grained tagger, to tag samples within SFT datasets based on semantics and intentions, and we define instruction diversity and complexity in terms of these tags. We obtain 6.6K tags to describe comprehensive user queries. We then analyze popular open-sourced SFT datasets and find that model ability grows with more diverse and complex data. Based on this observation, we propose a data selector based on InsTag to select 6K diverse and complex samples from open-source datasets and fine-tune models on the InsTag-selected data. The resulting models, TagLM, outperform open-source models trained on considerably larger SFT datasets when evaluated on MT-Bench, echoing the importance of query diversity and complexity. We open-source InsTag at https://github.com/OFA-Sys/InsTag.
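
    A hedged sketch of a tag-based data selector in the spirit of the procedure above: here complexity is approximated by the number of tags per sample and diversity by covering tags not yet selected; the function name, data format, and greedy criterion are illustrative assumptions, not the paper's exact selector.

```python
def select_diverse_complex(samples, budget=6000):
    """samples: list of dicts like {"query": ..., "response": ..., "tags": [...]}.
    Greedily prefer tag-rich samples that introduce tags not seen so far."""
    # Complexity proxy: rank samples with more tags first.
    ranked = sorted(samples, key=lambda s: len(s["tags"]), reverse=True)
    selected, covered = [], set()
    for s in ranked:
        if len(selected) >= budget:
            break
        new_tags = set(s["tags"]) - covered
        # Diversity proxy: take samples that add unseen tags; allow some slack
        # early on so highly complex samples are not all skipped.
        if new_tags or len(selected) < budget // 2:
            selected.append(s)
            covered |= new_tags
    return selected
```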