Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models
The complementary potential of Large Language Models (LLMs) rests on the
assumption that off-the-shelf LLMs have heterogeneous expertise across a wide
range of domains and tasks, so that an ensemble of LLMs can achieve
consistently better performance. Existing ensemble methods for LLMs mainly
focus on reward model ranking of outputs, which incurs significant computation
overhead. To address this issue, we revisit the complementary potential of
LLMs and further elaborate on it by mining latent expertise with off-the-shelf
reward models. We propose Zooter, a reward-guided routing method that distills
rewards on training queries to train a routing function, which can precisely
distribute each query to the LLM with the relevant expertise. We also
integrate a tag-based label enhancement to mitigate noise from uncertainty
when using rewards as silver supervision. Zooter is computationally efficient
at inference, introducing only the minor overhead of a routing function
compared with reward model ranking methods. We evaluate Zooter on a
comprehensive benchmark collection with 26 subsets across different domains
and tasks. Zooter outperforms the best single model on average and ranks first
on 44% of tasks, even surpassing multiple reward model ranking methods.
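
To make the routing idea concrete, here is a minimal sketch, assuming the
setup described above: reward scores for each candidate LLM on a training
query are distilled into a soft routing target via a softmax, and a small
classifier would then be trained to predict that distribution from the query
alone. All function and variable names are illustrative, not the authors'
released implementation.

```python
import numpy as np

def distill_routing_targets(reward_scores: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Turn per-model reward scores for each training query into a soft
    routing distribution via a tempered softmax (the 'silver' supervision)."""
    z = reward_scores / temperature
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

# Toy example: 3 training queries scored by a reward model for 2 candidate LLMs.
scores = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.6, 0.5]])
targets = distill_routing_targets(scores)

# A routing function (e.g. a small text classifier) would be trained to predict
# `targets` from the query text; at inference each query is sent to the argmax
# model, so only one LLM ever generates an answer.
print(targets.argmax(axis=1))  # -> [0 1 0]
```

Because only the lightweight routing function runs before a single LLM is
invoked, inference avoids the cost of generating outputs from every candidate
model, which is what reward model ranking requires.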
Semantic-Aware Adversarial Training for Reliable Deep Hashing Retrieval
Deep hashing has been intensively studied and successfully applied in
large-scale image retrieval systems due to its efficiency and effectiveness.
Recent studies have recognized that the existence of adversarial examples
poses a security threat to deep hashing models, that is, adversarial
vulnerability. Notably, it is challenging to efficiently distill reliable
semantic representatives for deep hashing to guide adversarial learning, which
hinders the enhancement of the adversarial robustness of deep hashing-based
retrieval models. Moreover, current research on adversarial training for deep
hashing is hard to formalize into a unified minimax structure. In this paper,
we explore Semantic-Aware Adversarial Training (SAAT) for improving the
adversarial robustness of deep hashing models. Specifically, we conceive a
discriminative mainstay features learning (DMFL) scheme to construct semantic
representatives for guiding adversarial learning in deep hashing. In
particular, our DMFL, with a strict theoretical guarantee, is adaptively
optimized in a discriminative learning manner, where both discriminative and
semantic properties are jointly considered. Moreover, adversarial examples are
fabricated by maximizing the Hamming distance between the hash codes of
adversarial samples and mainstay features, the efficacy of which is validated
in adversarial attack trials. Further, we formulate, for the first time, the
adversarial training of deep hashing as a unified minimax optimization under
the guidance of the generated mainstay codes. Extensive experiments on
benchmark datasets show superb attack performance against state-of-the-art
algorithms; meanwhile, the proposed adversarial training can effectively
eliminate adversarial perturbations for trustworthy deep hashing-based
retrieval. Our code is available at
https://github.com/xandery-geek/SAAT
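
As a rough illustration of the attack objective (maximizing the Hamming
distance between a perturbed image's hash code and a semantic representative),
the PGD-style sketch below uses the standard tanh relaxation of binary codes,
under which maximizing Hamming distance reduces to minimizing an inner
product. `model`, `mainstay`, and the hyperparameters are hypothetical
placeholders; consult the linked repository for the actual SAAT code.

```python
import torch

def hamming_surrogate(code: torch.Tensor, mainstay: torch.Tensor) -> torch.Tensor:
    # For codes b, m in {-1, +1}^K, Hamming(b, m) = (K - <b, m>) / 2, so
    # maximizing the Hamming distance means minimizing the inner product.
    return (code * mainstay).sum(dim=1).mean()

def hamming_attack(model, x, mainstay, eps=8 / 255, alpha=2 / 255, steps=10):
    """PGD-style perturbation pushing hash codes away from mainstay codes."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        code = torch.tanh(model(x_adv))  # relaxed (continuous) hash code
        loss = hamming_surrogate(code, mainstay)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Descending on the inner product == ascending on the Hamming distance.
        x_adv = x_adv.detach() - alpha * grad.sign()
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0.0, 1.0)
    return x_adv.detach()

# Toy usage with a random linear "hashing model" producing 32-bit codes.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 32))
x = torch.rand(4, 3, 32, 32)
mainstay = torch.sign(torch.randn(4, 32))
x_adv = hamming_attack(model, x, mainstay)
```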
Structure of Agkistrodotoxin in an orthorhombic crystal form with six molecules per asymmetric unit
The structure of agkistrodotoxin crystallized under basic conditions has been determined at 2.8 Å resolution by the molecular-replacement technique and refined to a crystallographic R factor of 0.194 and a free R factor of 0.260 with good stereochemistry. The molecular packing in the crystal differs from that of other PLA2s. The six molecules in the asymmetric unit form three dimers linked by Ca2+ ions in a near-perfect six-ligand octahedral coordinating system. Extensive intermolecular hydrophobic interactions occur at the interfacial recognition site of each neurotoxin molecule, which provides insight into phospholipase A2-membrane interactions. This hydrophobic interaction-induced molecular association along the interfacial recognition site suggests a self-protection mechanism of agkistrodotoxin.
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
Foundation language models acquire instruction-following ability through
supervised fine-tuning (SFT). Diversity and complexity are considered critical
factors of a successful SFT dataset, yet their definitions remain obscure and
lack quantitative analysis. In this work, we propose InsTag, an open-set
fine-grained tagger, to tag samples within SFT datasets based on semantics and
intentions, and we define instruction diversity and complexity in terms of
tags. We obtain 6.6K tags to describe comprehensive user queries. We then
analyze popular open-sourced SFT datasets and find that model ability grows
with more diverse and complex data. Based on this observation, we propose a
data selector based on InsTag to select 6K diverse and complex samples from
open-source datasets and fine-tune models on the InsTag-selected data. The
resulting models, TagLM, outperform open-source models trained on considerably
larger SFT data, as evaluated by MT-Bench, echoing the importance of query
diversity and complexity. We open-source InsTag at
https://github.com/OFA-Sys/InsTag
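
A toy sketch of how tag-based selection might work, under the assumption that
complexity is proxied by the number of tags per query and diversity by greedy
coverage of unseen tags; this is an illustrative heuristic, not the paper's
exact selector.

```python
def select_samples(samples, budget):
    """samples: list of (text, set_of_tags); returns up to `budget` samples."""
    # Prefer complex samples first (more tags per query).
    ranked = sorted(samples, key=lambda s: len(s[1]), reverse=True)
    seen_tags, selected = set(), []
    for text, tags in ranked:
        if len(selected) >= budget:
            break
        if tags - seen_tags:  # sample contributes at least one unseen tag
            selected.append((text, tags))
            seen_tags |= tags
    return selected

data = [("sort a list in python", {"coding", "python", "sorting"}),
        ("write a haiku", {"creative-writing", "poetry"}),
        ("reverse a list in python", {"coding", "python"})]
print([text for text, _ in select_samples(data, budget=2)])
```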
On the Universal Approximation Property and Equivalence of Stochastic Computing-based Neural Networks and Binary Neural Networks
Large-scale deep neural networks are both memory-intensive and
computation-intensive, thereby posing stringent requirements on computing
platforms. Hardware acceleration of deep neural networks has been extensively
investigated in both industry and academia. Specific forms of binary neural
networks (BNNs) and stochastic computing-based neural networks (SCNNs) are
particularly appealing for hardware implementation since they can be
implemented almost entirely with binary operations. Despite the obvious
advantages in hardware implementation, these approximate computing techniques
are questioned by researchers in terms of accuracy and universal
applicability. It is also important to understand the relative pros and cons
of SCNNs and BNNs in theory and in actual hardware implementations. To address
these concerns, in this paper we prove that "ideal" SCNNs and BNNs satisfy the
universal approximation property with probability 1 (due to their stochastic
behavior). The proof is conducted by first proving the property for SCNNs from
the strong law of large numbers, and then using SCNNs as a "bridge" to prove
it for BNNs. Based on the universal approximation property, we further prove
that SCNNs and BNNs exhibit the same energy complexity; in other words, they
have the same asymptotic energy consumption as network size grows. We also
provide a detailed analysis of the pros and cons of SCNNs and BNNs for
hardware implementations and conclude that SCNNs are more suitable for
hardware.

Comment: 9 pages, 3 figures
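
The role of the strong law of large numbers can be illustrated with the basic
stochastic computing primitive: encoding values in [0, 1] as Bernoulli
bitstreams turns multiplication into a bitwise AND whose empirical mean
converges almost surely to the true product as the stream length grows. The
snippet below is a small numerical demonstration of this convergence, not code
from the paper.

```python
import random

def sc_multiply(x: float, y: float, n_bits: int, seed: int = 0) -> float:
    """Estimate x * y by ANDing two Bernoulli bitstreams of length n_bits."""
    rng = random.Random(seed)
    # Each draw (rng.random() < x) is an independent Bernoulli(x) bit, so the
    # AND of the two streams is Bernoulli(x * y); its mean converges to x * y
    # almost surely by the strong law of large numbers.
    matches = sum((rng.random() < x) and (rng.random() < y) for _ in range(n_bits))
    return matches / n_bits

for n in (10, 1_000, 100_000):
    print(n, sc_multiply(0.6, 0.5, n))  # estimates approach 0.30 as n grows
```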