CNM: An Interpretable Complex-valued Network for Matching
This paper seeks to model human language with the mathematical framework of
quantum physics. Building on the well-designed mathematical formulations of quantum
physics, this framework unifies different linguistic units in a single
complex-valued vector space, e.g. words as particles in quantum states and
sentences as mixed systems. A complex-valued network is built to implement this
framework for semantic matching. With well-constrained complex-valued
components, the network admits explicit physical interpretations.
The proposed complex-valued network for matching (CNM) achieves performance
comparable to strong CNN and RNN baselines on two benchmark question
answering (QA) datasets.
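The idea of words as quantum states and sentences as mixed systems can be made concrete with a small density-matrix computation. In this sketch, which uses toy amplitudes, phases and weights rather than learned CNM parameters, each word is a unit-norm complex vector whose entries carry an amplitude and a phase, a sentence is the mixture ρ = Σ_i p_i |w_i⟩⟨w_i|, and a unit measurement vector |v⟩ yields the probability ⟨v|ρ|v⟩. The helper names `word_state`, `mixed_state` and `measure` are hypothetical.

```python
import cmath
import math


def word_state(amplitudes, phases):
    """Build a unit-norm complex vector |w> from per-dimension amplitudes and phases."""
    vec = [a * cmath.exp(1j * p) for a, p in zip(amplitudes, phases)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in vec))
    return [z / norm for z in vec]


def mixed_state(states, weights):
    """Sentence density matrix rho = sum_i p_i |w_i><w_i|, weights normalised to sum to 1."""
    total = sum(weights)
    dim = len(states[0])
    rho = [[0j] * dim for _ in range(dim)]
    for vec, w in zip(states, weights):
        p = w / total
        for r in range(dim):
            for c in range(dim):
                rho[r][c] += p * vec[r] * vec[c].conjugate()
    return rho


def measure(rho, v):
    """Probability <v|rho|v> of projecting the sentence state onto a unit vector |v>."""
    dim = len(v)
    return sum(v[r].conjugate() * rho[r][c] * v[c] for r in range(dim) for c in range(dim)).real


# Hypothetical 3-dimensional word states (toy values, not trained embeddings)
w1 = word_state([1.0, 0.5, 0.2], [0.0, 0.3, 1.2])
w2 = word_state([0.1, 0.9, 0.4], [0.5, 0.0, -0.7])
rho = mixed_state([w1, w2], weights=[2.0, 1.0])

trace = sum(rho[i][i] for i in range(3))
print(round(trace.real, 6))  # 1.0
print(round(measure(rho, w1), 6))
```

Because ρ is Hermitian with unit trace, every measurement probability lies in [0, 1], which is the kind of constraint that lets such components be read physically.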
Learning to Diversify Web Search Results with a Document Repulsion Model
Search diversification (also called diversity search) is an important approach to tackling the query ambiguity problem in information retrieval. It aims to diversify search results that are originally ranked by their probability of relevance to a given query, re-ranking them to cover as many different aspects (or subtopics) of the query as possible. Most existing diversity search models heuristically balance relevance ranking and diversity ranking, but lack an efficient learning mechanism for reaching an optimized parameter setting. To address this problem, we propose a learning-to-diversify approach that directly optimizes search diversification performance (in terms of any effectiveness metric). We first extend the ranking function of a widely used learning-to-rank framework, LambdaMART, so that the extended ranking function can correlate relevance and diversity indicators. We then develop an effective learning algorithm, the Document Repulsion Model (DRM), to train the ranking function based on a Document Repulsion Theory (DRT). DRT assumes that, for the purpose of search diversification, two result documents covering similar query aspects (i.e., subtopics) should be mutually repulsive. Accordingly, the proposed DRM exerts a repulsion force between each pair of similar documents during learning, and includes the diversity effectiveness metric to be optimized as part of the loss function. Existing learning-based diversity search methods often involve an iterative sequential selection process during ranking, which is computationally complex and time-consuming to train; our proposed learning strategy largely reduces this time cost. Extensive experiments are conducted on the TREC diversity track data (2009, 2010 and 2011). The results demonstrate that our model significantly outperforms a number of baselines in terms of effectiveness and robustness.
Further, an efficiency analysis shows that the proposed DRM has a lower computational complexity than state-of-the-art learning-to-diversify methods.
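The repulsion intuition can be illustrated directly in score space. This is a toy, not the DRM training algorithm (which extends LambdaMART and puts a diversity metric inside the loss): it assumes Jaccard overlap of subtopic labels as document similarity and a hypothetical pairwise force of magnitude strength * sim * exp(-|s_i - s_j|) that pushes the scores of each similar pair apart. The function names, scores and labels are all illustrative.

```python
import math


def jaccard(a, b):
    """Subtopic-coverage similarity between two documents (an assumed measure)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def repulsion_forces(scores, subtopics, strength=1.0):
    """Net force on each document's score from every similar document.

    Each pair (i, j) repels with magnitude strength * sim(i, j) * exp(-|s_i - s_j|):
    the higher-scored document is pushed up and the lower-scored one down, so
    near-duplicate documents drift apart in the ranking. Forces sum to zero.
    """
    n = len(scores)
    forces = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            sim = jaccard(subtopics[i], subtopics[j])
            if sim == 0.0:
                continue  # documents on disjoint subtopics do not interact
            magnitude = strength * sim * math.exp(-abs(scores[i] - scores[j]))
            direction = 1.0 if scores[i] >= scores[j] else -1.0
            forces[i] += direction * magnitude
            forces[j] -= direction * magnitude
    return forces


# Hypothetical relevance scores and subtopic labels for four documents
scores = [2.0, 1.9, 1.5, 1.0]
subtopics = [{"a"}, {"a"}, {"b"}, {"a", "b"}]
forces = repulsion_forces(scores, subtopics, strength=0.5)
new_scores = [s + f for s, f in zip(scores, forces)]
ranking = sorted(range(len(scores)), key=lambda i: -new_scores[i])
print(ranking)  # [0, 2, 1, 3]
```

After one update, document 2 (the only one covering just subtopic "b") overtakes document 1, a near-duplicate of the top result, which is the diversification effect the repulsion assumption is meant to produce.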
On Elastic Language Models
Large-scale pretrained language models have achieved compelling performance
in a wide range of language understanding and information retrieval tasks.
Knowledge distillation offers an opportunity to compress a large language model
to a small one, in order to reach a reasonable latency-performance tradeoff.
However, for scenarios where the number of requests (e.g., queries submitted to
a search engine) is highly variable, the static tradeoff attained by the
compressed language model might not always fit. Once a model is assigned a
static tradeoff, it could be inadequate: the latency is too high when
the number of requests is large, or the performance is too low when the number
of requests is small. To this end, we propose an elastic language model
(ElasticLM) that elastically adjusts the tradeoff according to the request
stream. The basic idea is to introduce compute elasticity into the compressed
language model, so that the tradeoff can vary on the fly with scalable and
controllable compute. Specifically, we impose an elastic structure to endow
ElasticLM with compute elasticity and design an elastic optimization to learn
ElasticLM under compute elasticity. To serve ElasticLM, we apply an elastic
schedule. Considering the specificity of information retrieval, we adapt
ElasticLM to dense retrieval and reranking and present ElasticDenser and
ElasticRanker respectively. Offline evaluation is conducted on the language
understanding benchmark GLUE and on several information retrieval tasks, including
Natural Questions, Trivia QA, and MS MARCO. The results show that ElasticLM,
along with ElasticDenser and ElasticRanker, performs correctly and
competitively compared with an array of static baselines. Furthermore, an online
simulation with concurrency is also carried out. The results demonstrate that
ElasticLM can provide elastic tradeoffs with respect to a varying request stream.
Comment: 27 pages, 11 figures, 9 tables
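The abstract does not spell out how the elastic schedule works, so the following is only a sketch of the general idea under stated assumptions: one elastic model exposes a few compute levels (hypothetical names, latencies and quality scores), and for the current load the scheduler picks the highest-quality level whose estimated queueing latency fits a budget. The `ComputeLevel` type, the `schedule` function and the serial-processing latency estimate are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeLevel:
    name: str
    latency_ms: float  # assumed per-request latency at this compute level
    quality: float     # assumed offline quality score (higher is better)


def schedule(levels, queue_length, budget_ms):
    """Pick the highest-quality level whose estimated queueing latency fits the budget.

    Assumes queued requests are processed one after another, so the last one
    waits roughly queue_length * latency_ms. Falls back to the cheapest level
    when no level fits.
    """
    feasible = [lv for lv in levels if queue_length * lv.latency_ms <= budget_ms]
    if not feasible:
        return min(levels, key=lambda lv: lv.latency_ms)
    return max(feasible, key=lambda lv: lv.quality)


# Hypothetical compute levels of one elastic model (e.g., fewer layers or narrower widths)
levels = [
    ComputeLevel("full", latency_ms=40.0, quality=0.90),
    ComputeLevel("half", latency_ms=20.0, quality=0.86),
    ComputeLevel("quarter", latency_ms=10.0, quality=0.80),
]
for queue_length in (1, 4, 8, 20):
    print(queue_length, schedule(levels, queue_length, budget_ms=100.0).name)
# 1 full / 4 half / 8 quarter / 20 quarter
```

With a 100 ms budget, short queues are served at full compute and longer queues fall back to cheaper levels, mirroring the on-the-fly tradeoff the abstract describes.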
Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning
The emergence of large language models (LLMs) has opened up unprecedented
possibilities for automating complex tasks, often at a level comparable to human
performance. Despite their capabilities, LLMs still struggle to
complete tasks that demand high accuracy and involve high complexity because of
their inherent limitations in handling multifaceted problems single-handedly.
This paper introduces Smurfs, a multi-agent framework designed
to improve how LLMs are applied to complex tasks. By transforming a
conventional LLM into a synergistic multi-agent ensemble, Smurfs can enhance
the model's ability to solve complex tasks at no additional cost. This is
achieved through prompting strategies that assign distinct roles
within the model, thereby facilitating collaboration among specialized agents
and forming an intelligent multi-agent system. Our empirical investigation on
both the open-ended tasks of StableToolBench and the closed-ended tasks of HotpotQA
shows Smurfs' superior capability in intricate tool-use scenarios.
Notably, Smurfs outperforms all the baseline methods in both experiments,
setting new state-of-the-art performance. Furthermore, through comprehensive
ablation studies, we dissect the contribution of the core components of the
multi-agent framework to its overall efficacy. This not only verifies the
effectiveness of the framework, but also charts a route for future exploration of
multi-agent LLM systems.
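One way to picture assigning distinct roles within a single model is to drive one text-in/text-out model through several role prompts. The role names (planner, executor, answer, verifier), the prompt wording, the control loop and `stub_llm` are hypothetical illustrations, not the Smurfs prompts; a real setup would replace the stub with an actual LLM call and give the executor real tools.

```python
from typing import Callable, List

LLM = Callable[[str], str]  # any text-in/text-out model; a deterministic stub is used below

# Hypothetical role prompts, all served by the same underlying model.
ROLES = {
    "planner": "You are a planner. Split the task into numbered subtasks, one per line.\nTask: {task}",
    "executor": "You are an executor. Solve this subtask, calling tools if needed.\nSubtask: {subtask}",
    "answer": "You are an answer agent. Combine these results into a final answer.\nTask: {task}\nResults:\n{results}",
    "verifier": "You are a verifier. Reply YES if the answer solves the task, else NO.\nTask: {task}\nAnswer: {answer}",
}


def run_agents(llm: LLM, task: str, max_rounds: int = 2) -> str:
    """Drive one LLM through several roles by swapping the prompt; retry if the verifier says NO."""
    answer = ""
    for _ in range(max_rounds):
        plan = llm(ROLES["planner"].format(task=task))
        subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
        results: List[str] = [llm(ROLES["executor"].format(subtask=s)) for s in subtasks]
        answer = llm(ROLES["answer"].format(task=task, results="\n".join(results)))
        verdict = llm(ROLES["verifier"].format(task=task, answer=answer))
        if verdict.strip().upper().startswith("YES"):
            break
    return answer


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, keyed on the role line of the prompt."""
    if prompt.startswith("You are a planner"):
        return "1. find fact A\n2. find fact B"
    if prompt.startswith("You are an executor"):
        return "A=1" if "fact A" in prompt else "B=2"
    if prompt.startswith("You are an answer agent"):
        # Echo the collected results as the final answer
        return prompt.split("Results:\n", 1)[1].replace("\n", "; ")
    return "YES"  # verifier


print(run_agents(stub_llm, "Find facts A and B."))  # A=1; B=2
```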
Injecting Knowledge into Biomedical Pre-trained Models via Polymorphism and Synonymous Substitution
Pre-trained language models (PLMs) are considered able to store
relational knowledge present in their training data. However, some relational
knowledge appears to be lost in PLMs due to reporting bias:
low-frequency relational knowledge may be underexpressed compared to
high-frequency knowledge in PLMs. This suggests that relational knowledge
might not be redundant with the knowledge already stored in PLMs, but rather
complementary to it. To inject additional relational knowledge into PLMs, we
propose a simple yet effective approach
inspired by three observations (namely, polymorphism, synonymous
substitution, and association). In particular, we switch entities in the
training corpus to related entities (hypernyms, hyponyms, synonyms, or
arbitrarily related concepts). Experimental results show that the proposed
approach not only better captures relational knowledge but also improves
performance on various biomedical downstream tasks. Our model is available
at https://github.com/StevenZHB/BioPLM_InjectingKnowledge
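The entity-switching augmentation can be sketched as follows, assuming entity mentions are already linked and segmented into single tokens, and that a small relation lexicon is available. `LEXICON`, its entries, `substitute_entities` and the substitution probability are hypothetical; the paper's actual resources and sampling scheme may differ.

```python
import random

# Hypothetical relation lexicon: entity -> related entities grouped by relation type.
LEXICON = {
    "aspirin": {"synonym": ["acetylsalicylic acid"], "hypernym": ["NSAID"]},
    "NSAID": {"hyponym": ["aspirin", "ibuprofen"]},
    "headache": {"synonym": ["cephalalgia"], "related": ["migraine"]},
}


def substitute_entities(tokens, lexicon, prob, rng):
    """Replace each known entity with a randomly chosen related entity with probability `prob`.

    `tokens` is a list of pre-segmented entity-or-word strings (entity linking is assumed
    to be done already). Tokens not in the lexicon are kept unchanged.
    """
    out = []
    for tok in tokens:
        relations = lexicon.get(tok)
        if relations and rng.random() < prob:
            candidates = [e for group in relations.values() for e in group]
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out


rng = random.Random(0)
sentence = ["aspirin", "relieves", "headache"]
print(substitute_entities(sentence, LEXICON, prob=1.0, rng=rng))
```

Setting `prob` below 1 keeps most of the original corpus intact while exposing the model to related entities in the same contexts.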
OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning
Large language models (LLMs) often struggle to maintain accuracy
throughout multiple reasoning steps, especially in mathematical
reasoning, where an error in an earlier step can propagate to subsequent ones,
ultimately leading to an incorrect answer. To reduce error propagation,
guided decoding is employed to direct LM decoding on a step-by-step basis.
We argue that in guided decoding, assessing the potential of an incomplete
reasoning path can be more advantageous than simply ensuring per-step
correctness, as the former approach leads towards a correct final answer. This
transforms the task into a planning problem.
Inspired by the findings that , we propose Outcome-supervised
Value Model (OVM) that employs outcome supervision for training a value model,
which prioritizes steps that lead to accurate conclusions. Furthermore, the OVM
eliminates the need for labor-intensive annotations of step-level correctness,
thereby significantly enhancing its scalability. Our experiments on two
multi-step mathematical reasoning datasets, GSM8K and Game of 24, demonstrate
the superior performance of the OVM model. Notably, in GSM8K, our
; especially it does not utilize GPT-4 or code execution. These
findings offer a novel perspective on the role of outcome supervision in
training value models for multi-step reasoning tasks and provide theoretical
justification for its advantage in value estimation for guided decoding.
Comment: Accepted to NAACL Findings.
https://github.com/FreedomIntelligence/OV
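Treating guided decoding as planning can be sketched as a step-level beam search in which partial reasoning paths are ranked by a value estimate rather than by per-step correctness. The search routine is generic; the toy task (reach a total of 7 in three steps of +1, +2 or +3) and the hand-written `toy_value` function are assumptions standing in for an LLM step generator and a trained outcome-supervised value model.

```python
from typing import Callable, List, Tuple

Path = Tuple[str, ...]


def value_guided_beam_search(
    propose: Callable[[Path], List[str]],
    value: Callable[[Path], float],
    is_done: Callable[[Path], bool],
    beam_size: int,
    max_steps: int,
) -> Path:
    """Step-level beam search that ranks partial paths by an estimated value.

    `propose` yields candidate next steps, `value` estimates how likely a (possibly
    incomplete) path is to reach a correct final answer, `is_done` detects finished paths.
    """
    beams: List[Path] = [()]
    for _ in range(max_steps):
        candidates: List[Path] = []
        for path in beams:
            if is_done(path):
                candidates.append(path)  # finished paths stay in the pool
                continue
            candidates.extend(path + (step,) for step in propose(path))
        beams = sorted(candidates, key=value, reverse=True)[:beam_size]
        if all(is_done(p) for p in beams):
            break
    return max(beams, key=value)


TARGET, STEPS = 7, 3


def propose(path: Path) -> List[str]:
    return ["1", "2", "3"]  # toy "reasoning steps": add 1, 2 or 3


def toy_value(path: Path) -> float:
    # Hand-written stand-in for a trained value model: assume each remaining step adds 2.
    remaining = STEPS - len(path)
    projected = sum(int(s) for s in path) + 2 * remaining
    return -float(abs(TARGET - projected))


def is_done(path: Path) -> bool:
    return len(path) == STEPS


best = value_guided_beam_search(propose, toy_value, is_done, beam_size=2, max_steps=STEPS)
print(best)  # ('3', '2', '2')
```

In the OVM setting, `value` would instead be a model trained only on whether complete paths reached the correct final answer, which is what removes the need for step-level correctness annotations.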
Recommended from our members
End-to-End Quantum-like Language Models with Application to Question Answering
Language Modeling (LM) is a fundamental research topic in a range of areas. Recently, inspired by quantum theory, a novel Quantum Language Model (QLM) has been proposed for Information Retrieval (IR). In this paper, we aim to broaden the theoretical and practical basis of QLM. We develop a Neural Network based Quantum-like Language Model (NNQLM) and apply it to Question Answering. Specifically, based on word embeddings, we design a new density matrix, which represents a sentence (e.g., a question or an answer) and encodes a mixture of semantic subspaces. Such a density matrix, together with a joint representation of the question and the answer, can be integrated into neural network architectures (e.g., 2-dimensional convolutional neural networks). Experiments on the TREC-QA and WIKIQA datasets have verified the effectiveness of our proposed models.
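The sentence density matrix can be sketched as a mixture of projectors built from L2-normalised word embeddings, ρ = Σ_i p_i |w_i⟩⟨w_i|, with the joint question-answer representation taken as the product ρ_q ρ_a. The toy 2-d embeddings, uniform weights p_i, and the use of the trace of ρ_q ρ_a as a single matching feature are assumptions for illustration; in the paper the joint representation feeds a neural network (e.g., a 2-D CNN).

```python
import math


def density_matrix(embeddings, weights=None):
    """rho = sum_i p_i |w_i><w_i| over L2-normalised word embeddings (uniform p_i by default)."""
    n = len(embeddings)
    weights = weights or [1.0 / n] * n
    dim = len(embeddings[0])
    rho = [[0.0] * dim for _ in range(dim)]
    for vec, p in zip(embeddings, weights):
        norm = math.sqrt(sum(x * x for x in vec))
        unit = [x / norm for x in vec]
        for r in range(dim):
            for c in range(dim):
                rho[r][c] += p * unit[r] * unit[c]
    return rho


def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b))) for c in range(len(b[0]))]
            for r in range(len(a))]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


# Hypothetical 2-d word embeddings for a question and two candidate answers
question = density_matrix([[1.0, 0.0], [0.8, 0.6]])
good_answer = density_matrix([[0.9, 0.1], [1.0, 0.2]])
bad_answer = density_matrix([[0.0, 1.0], [0.1, 0.9]])

joint_good = matmul(question, good_answer)  # joint representation rho_q rho_a
joint_bad = matmul(question, bad_answer)
print(round(trace(joint_good), 3), round(trace(joint_bad), 3))  # 0.876 0.21
```

For real embeddings, tr(ρ_q ρ_a) = Σ_ij p_i q_j (u_i · v_j)^2, so the answer whose words point in directions similar to the question's scores higher.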
IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models
This paper provides a unified account of two schools of thinking in
information retrieval modelling: generative retrieval, which focuses on
predicting relevant documents given a query, and discriminative retrieval, which
focuses on predicting relevance given a query-document pair. We propose a
game-theoretic minimax game to iteratively optimise both models. On one hand, the
discriminative model, aiming to mine signals from labelled and unlabelled data,
provides guidance to train the generative model towards fitting the underlying
relevance distribution over documents given the query. On the other hand, the
generative model, acting as an attacker to the current discriminative model,
generates difficult examples for the discriminative model in an adversarial way
by minimising its discrimination objective. With the competition between these
two models, we show that the unified framework takes advantage of both schools
of thinking: (i) the generative model learns to fit the relevance distribution
over documents via the signals from the discriminative model, and (ii) the
discriminative model is able to exploit the unlabelled data selected by the
generative model to achieve a better estimation for document ranking. Our
experimental results demonstrate significant performance gains of up to
23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of
applications including web search, item recommendation, and question answering.
Comment: 12 pages; appendix added
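The generator's side of the minimax game can be sketched as a policy gradient over a softmax distribution on candidate documents, with the discriminator's score turned into a reward r(d) = log(1 + exp(f_φ(d, q))). For illustration the expectation is computed exactly over three candidates instead of being estimated from sampled documents; the logits, discriminator scores, step size and function names are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generator_gradient(gen_logits, disc_scores):
    """Exact gradient of E_{d ~ p_theta}[r(d)] with respect to the generator's logits.

    Uses r(d) = log(1 + exp(f_phi(d, q))) as the discriminator-derived reward.
    For a softmax policy, d E[r] / d logit_j = p_j * (r_j - E[r]).
    """
    probs = softmax(gen_logits)
    rewards = [math.log1p(math.exp(s)) for s in disc_scores]
    baseline = sum(p * r for p, r in zip(probs, rewards))  # E[r] under the generator
    return [p * (r - baseline) for p, r in zip(probs, rewards)]


gen_logits = [0.0, 0.0, 0.0]    # generator starts uniform over 3 candidate documents
disc_scores = [2.0, 0.0, -1.0]  # hypothetical discriminator scores f_phi(d, q)
grad = generator_gradient(gen_logits, disc_scores)
gen_logits = [g + 0.5 * dg for g, dg in zip(gen_logits, grad)]  # one gradient-ascent step
print([round(p, 3) for p in softmax(gen_logits)])
```

Documents that the current discriminator scores highly gain probability mass, which is how the generator learns to propose difficult examples for the next discriminator update. For large scores, a numerically stable softplus would replace `log1p(exp(s))`.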
- …