NIR-Prompt: A Multi-task Generalized Neural Information Retrieval Training Framework
Information retrieval (IR) aims to find information that meets users' needs in
a corpus. Different needs correspond to different IR tasks, such as document
retrieval, open-domain question answering, and retrieval-based dialogue, yet
all of these tasks share the same schema of estimating the relationship between
texts. This suggests that a good IR model should generalize across tasks and
domains. However, previous studies indicate that state-of-the-art neural
information retrieval (NIR) models, e.g., pre-trained language models (PLMs),
generalize poorly, mainly because the end-to-end fine-tuning paradigm makes the
model overemphasize task-specific signals and domain biases while losing the
ability to capture generalized essential signals. To address this problem, we
propose a novel NIR training framework named NIR-Prompt for the retrieval and
reranking stages, based on the idea of decoupling signal capturing from signal
combination. NIR-Prompt uses an Essential Matching Module (EMM) to capture
essential matching signals and obtains descriptions of tasks from a Matching
Description Module (MDM). The description serves as task-adaptation information
that combines the essential matching signals to adapt to different tasks.
Experiments under in-domain multi-task, out-of-domain multi-task, and new-task
adaptation settings show that NIR-Prompt improves the generalization of PLMs in NIR for
both the retrieval and reranking stages compared with baselines.
Comment: This article is an extension of arXiv:2204.02725 and has been accepted by TOIS.
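To make the decoupling idea concrete, here is a minimal, hypothetical sketch in PyTorch: a task-agnostic matcher produces a vector of "essential" matching signals, and a task-description embedding is mapped to weights that combine those signals per task. The class names, shapes, and gating scheme are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of decoupled signal capture and combination (assumed design, not
# the paper's code): EMM-like matcher is shared, only the gate is per-task.
import torch
import torch.nn as nn

class EssentialMatcher(nn.Module):
    """Stand-in for EMM: produces task-agnostic matching signals."""
    def __init__(self, dim: int, n_signals: int = 8):
        super().__init__()
        self.proj = nn.Bilinear(dim, dim, n_signals)

    def forward(self, q_emb: torch.Tensor, d_emb: torch.Tensor) -> torch.Tensor:
        return self.proj(q_emb, d_emb)  # (batch, n_signals)

class TaskDescriptionGate(nn.Module):
    """Stand-in for MDM: maps a task-description embedding to signal weights."""
    def __init__(self, dim: int, n_signals: int = 8):
        super().__init__()
        self.to_weights = nn.Linear(dim, n_signals)

    def forward(self, task_emb: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.to_weights(task_emb), dim=-1)

# Task-adaptive relevance = weighted sum of the shared essential signals.
dim = 32
emm, mdm = EssentialMatcher(dim), TaskDescriptionGate(dim)
q, d = torch.randn(4, dim), torch.randn(4, dim)
task = torch.randn(1, dim).expand(4, dim)  # e.g. an "open-domain QA" description
score = (emm(q, d) * mdm(task)).sum(-1)    # (batch,) relevance scores
```

The point of this layout is that only the gate depends on the task description, so the matcher's signals can be shared across IR tasks and domains.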
Think Before You Speak: Cultivating Communication Skills of Large Language Models via Inner Monologue
The emergence of large language models (LLMs) further improves the
capabilities of open-domain dialogue systems, which can now generate fluent,
coherent, and diverse responses. However, LLMs still lack an important ability,
communication skills, which makes them more like information-seeking tools than
anthropomorphic chatbots. To make LLMs more anthropomorphic and proactive
during conversation, we add five communication skills to the response
generation process: topic transition, proactively asking questions, concept
guidance, empathy, and summarising often. The addition of these communication
skills increases users' interest in the conversation and encourages them to
chat for longer. To enable LLMs to better understand and use communication
skills, we design and add an inner monologue to LLMs. The complete process is
achieved through prompt engineering and in-context learning. To evaluate
communication skills, we construct a benchmark named Cskills, which covers
various communication skills and can also more comprehensively evaluate the
model's dialogue generation ability. Experimental results show that the
proposed CSIM strategy improves the backbone models and outperforms the
baselines in both automatic and human evaluations.
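To make the inner-monologue idea concrete, here is a minimal, hypothetical sketch: the prompt asks the model to first reason privately about which communication skill to apply, then emit the visible reply, and only the reply is surfaced to the user. The prompt wording, the MONOLOGUE/RESPONSE tags, and the chat_turn helper are illustrative assumptions, not the paper's exact templates.

```python
# Sketch of inner-monologue prompting via prompt engineering (assumed
# template, not the paper's): plan privately, then answer; hide the plan.
from typing import Callable

SKILLS = ["topic transition", "proactively asking questions",
          "concept guidance", "empathy", "summarising often"]

PROMPT = (
    "You are a proactive, anthropomorphic chatbot.\n"
    f"Communication skills you may use: {', '.join(SKILLS)}.\n"
    "First write an inner monologue choosing a skill, then the reply.\n"
    "Format:\nMONOLOGUE: <your private reasoning>\nRESPONSE: <what the user sees>\n\n"
    "Dialogue so far:\n{history}\nUser: {utterance}\n"
)

def chat_turn(generate: Callable[[str], str], history: str, utterance: str) -> str:
    """Run one turn; strip the monologue so the user only sees the reply."""
    raw = generate(PROMPT.format(history=history, utterance=utterance))
    # Keep everything after the RESPONSE tag; fall back to the raw output.
    return raw.split("RESPONSE:", 1)[-1].strip()

# Usage with any LLM completion function, here a stub for testing:
reply = chat_turn(lambda p: "MONOLOGUE: use empathy.\nRESPONSE: That sounds hard.",
                  history="(empty)", utterance="I failed my exam today.")
print(reply)  # -> "That sounds hard."
```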
RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder for Language Modeling
Retrieval-augmented language models show promise in addressing issues like
outdated information and hallucinations in language models (LMs). However,
current research faces two main problems: 1) determining what information to
retrieve, and 2) effectively combining retrieved information during generation.
We argue that valuable retrieved information should not only be related to the
current source text but should also account for the future target text, given
that LMs model future tokens. Moreover, we propose that aggregation using
latent variables derived from a compact latent space is more efficient than
utilizing explicit raw text, which is limited by context length and susceptible
to noise. Therefore, we introduce RegaVAE, a retrieval-augmented language model
built upon the variational auto-encoder (VAE). It encodes the text corpus into
a latent space, capturing current and future information from both source and
target text. Additionally, we leverage the VAE to initialize the latent space
and adopt the probabilistic form of the retrieval generation paradigm by
expanding the Gaussian prior distribution into a Gaussian mixture distribution.
Theoretical analysis provides an optimizable upper bound for RegaVAE.
Experimental results on various datasets demonstrate significant improvements
in text generation quality and hallucination removal.
Comment: Accepted to the Findings of EMNLP 2023.
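As a concrete illustration of the latent-space retrieval idea above, here is a minimal, hypothetical sketch in PyTorch: documents are represented by Gaussian posteriors (mean and log-variance), neighbours are retrieved by similarity between latent means, and generation samples from the resulting Gaussian mixture instead of concatenating raw retrieved text. The function names, shapes, and similarity-based mixture weights are assumptions for illustration, not RegaVAE's exact formulation.

```python
# Sketch of retrieval in a VAE latent space with Gaussian-mixture
# aggregation (assumed scheme, not the paper's code).
import torch

def retrieve_mixture(query_mu: torch.Tensor,
                     corpus_mu: torch.Tensor,
                     corpus_logvar: torch.Tensor,
                     k: int = 4):
    """Return mixture weights, means, and log-variances of top-k neighbours."""
    sims = torch.nn.functional.cosine_similarity(
        query_mu.unsqueeze(0), corpus_mu, dim=-1)   # (N,) latent similarity
    w, idx = torch.topk(sims, k)
    weights = torch.softmax(w, dim=-1)              # mixture weights
    return weights, corpus_mu[idx], corpus_logvar[idx]

def sample_mixture(weights, mus, logvars):
    """Draw one latent from the retrieved Gaussian mixture."""
    comp = torch.multinomial(weights, 1).item()     # pick a component
    std = torch.exp(0.5 * logvars[comp])
    return mus[comp] + std * torch.randn_like(std)  # reparameterised sample

# Toy usage: 100 "documents" with 16-dim latent posteriors.
corpus_mu, corpus_logvar = torch.randn(100, 16), torch.randn(100, 16)
w, mus, lvs = retrieve_mixture(torch.randn(16), corpus_mu, corpus_logvar)
z = sample_mixture(w, mus, lvs)  # latent fed to the decoder for generation
```

Sampling in latent space keeps the retrieval payload a fixed size regardless of document length, which matches the abstract's efficiency argument against aggregating explicit raw text under a bounded context window.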