Zero-Shot Relation Extraction via Reading Comprehension
We show that relation extraction can be reduced to answering simple reading
comprehension questions, by associating one or more natural-language questions
with each relation slot. This reduction has several advantages: we can (1)
learn relation-extraction models by extending recent neural
reading-comprehension techniques, (2) build very large training sets for those
models by combining relation-specific crowd-sourced questions with distant
supervision, and even (3) do zero-shot learning by extracting new relation
types that are only specified at test-time, for which we have no labeled
training examples. Experiments on a Wikipedia slot-filling task demonstrate
that the approach can generalize to new questions for known relation types with
high accuracy, and that zero-shot generalization to unseen relation types is
possible, at lower accuracy levels, setting the bar for future work on this
task.
Comment: CoNLL 2017
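To make the reduction concrete, here is a minimal sketch of relation extraction as question answering, using an off-the-shelf Hugging Face extractive QA pipeline rather than the paper's own reading-comprehension model; the question templates, the confidence threshold, and the model checkpoint are illustrative assumptions.

```python
# Sketch: each relation slot is phrased as one or more natural-language
# questions, and an extractive QA model fills the slot from the context.
from transformers import pipeline

# Any extractive QA checkpoint works for this sketch.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

# Hypothetical question templates, one or more per relation type.
RELATION_QUESTIONS = {
    "educated_at": ["Where did {subject} study?",
                    "Which university did {subject} attend?"],
    "spouse":      ["Who is {subject}'s spouse?"],
}

def extract_slot(relation, subject, sentence, threshold=0.3):
    """Return the answer span for (relation, subject), or None if the model
    is not confident enough (treated as 'relation not expressed')."""
    for template in RELATION_QUESTIONS[relation]:
        result = qa(question=template.format(subject=subject), context=sentence)
        if result["score"] >= threshold:
            return result["answer"]
    return None

print(extract_slot("educated_at", "Alan Turing",
                   "Alan Turing studied mathematics at King's College, Cambridge."))
```

Because the relation is specified only through its questions, adding an unseen relation type at test time amounts to adding a new entry to the template dictionary, which is the zero-shot setting the abstract describes.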
A Bayesian Approach To Analysing Training Data Attribution In Deep Learning
Training data attribution (TDA) techniques find influential training data for
the model's prediction on the test data of interest. They approximate the
impact of down- or up-weighting a particular training sample. While
conceptually useful, they are hardly applicable to deep models in practice,
particularly because of their sensitivity to different model initialisations. In
this paper, we introduce a Bayesian perspective on the TDA task, where the
learned model is treated as a Bayesian posterior and the TDA estimates as
random variables. From this novel viewpoint, we observe that the influence of
an individual training sample is often overshadowed by the noise stemming from
model initialisation and SGD batch composition. Based on this observation, we
argue that TDA can only be reliably used for explaining deep model predictions
that are consistently influenced by certain training data, independent of other
noise factors. Our experiments demonstrate the rarity of such noise-independent
training-test data pairs but confirm their existence. We recommend that future
researchers and practitioners trust TDA estimates only in such cases. Further,
we find a disagreement between ground truth and estimated TDA distributions and
encourage future work to study this gap. Code is provided at
https://github.com/ElisaNguyen/bayesian-tda
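A minimal sketch of the Bayesian reading described above: since the learned model, and hence any TDA score, varies with the initialisation seed and batch order, an influence estimate is only trusted when it dominates that noise. The `tda_estimate` stub stands in for any concrete attribution method (e.g., influence functions), and the two-sigma rule is an illustrative threshold, not the paper's exact criterion.

```python
import numpy as np

def tda_estimate(seed, train_idx, test_idx):
    # Placeholder: retrain (or approximate retraining) under this seed and
    # return the influence of training sample `train_idx` on the prediction
    # for test sample `test_idx`. Fake scores here for demonstration.
    rng = np.random.default_rng(seed)
    return rng.normal(loc=0.5, scale=1.0)

def reliable_influence(train_idx, test_idx, seeds=range(10)):
    """Treat TDA scores across seeds as samples of a random variable and
    flag the pair only if the mean influence exceeds the seed noise."""
    scores = np.array([tda_estimate(s, train_idx, test_idx) for s in seeds])
    mean, std = scores.mean(), scores.std(ddof=1)
    return mean, std, abs(mean) > 2 * std

print(reliable_influence(train_idx=0, test_idx=0))
```

Pairs that fail this consistency check correspond to the noise-dominated cases the abstract warns about, where TDA estimates should not be used to explain predictions.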
Volcano: Mitigating Multimodal Hallucination through Self-Feedback Guided Revision
Large multimodal models (LMMs) suffer from multimodal hallucination, where
they provide incorrect responses misaligned with the given visual information.
Recent works have conjectured that one cause of multimodal hallucination is
that the vision encoder fails to ground properly on the image. To mitigate
this issue, we propose a novel approach that leverages
self-feedback as visual cues. Building on this approach, we introduce Volcano,
a multimodal self-feedback guided revision model. Volcano generates natural
language feedback to its initial response based on the provided visual
information and utilizes this feedback to self-revise its initial response.
Volcano effectively reduces multimodal hallucination and achieves
state-of-the-art results on MMHal-Bench, POPE, and GAVIE. It also improves
general multimodal abilities, outperforming previous models on MM-Vet and MMBench.
Through a qualitative analysis, we show that Volcano's feedback is better
grounded in the image than its initial response. This indicates that Volcano
can provide itself with richer visual information, helping alleviate multimodal
hallucination. We publicly release Volcano models of 7B and 13B sizes along
with the data and code at https://github.com/kaistAI/Volcano
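The following is a minimal sketch of the self-feedback guided revision loop described above, assuming a generic `generate(image, prompt) -> str` callable for any large multimodal model; the prompt wording and the fixed number of rounds are illustrative, and the released Volcano checkpoints perform these steps natively rather than through external prompting.

```python
def volcano_style_revision(generate, image, question, rounds=3):
    """Critique-and-revise loop: the model's own natural-language feedback
    serves as a visual cue for rewriting its answer."""
    response = generate(image, question)
    for _ in range(rounds):
        # 1. Critique: generate feedback on the current answer, grounded
        #    in the provided visual information.
        feedback = generate(
            image,
            f"Question: {question}\nAnswer: {response}\n"
            "Point out anything in this answer that the image does not support.")
        # 2. Revise: use the feedback to rewrite the answer.
        #    (Volcano additionally decides whether to accept the revision;
        #    that decision step is omitted here for brevity.)
        response = generate(
            image,
            f"Question: {question}\nAnswer: {response}\nFeedback: {feedback}\n"
            "Rewrite the answer so that it is fully grounded in the image.")
    return response
```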
Prompt Injection: Parameterization of Fixed Inputs
Recent works have shown that attaching prompts to the input is effective at
conditioning Language Models (LMs) to perform specific tasks. However, prompts
are always included in the input text during inference, thus incurring
substantial computational and memory overhead. Also, there is currently no
straightforward method of utilizing prompts that are longer than the maximum
input length of the LMs without incurring additional costs during inference. We
propose Prompt Injection (PI), a novel formulation that injects the prompt into
the parameters of an LM as an efficient alternative to attaching fixed
prompts to the input. We show that in scenarios with long fixed prompts, PI can
be up to 280 times more efficient in terms of total FLOPs than previous
approaches. We further explore methodologies for PI and show promising results
in persona-dependent conversation, semantic parsing, and zero-shot learning
with task instructions. Through these explorations, we show that PI can be a
promising direction for conditioning language models, especially in scenarios
with long and fixed prompts.
Comment: PING results in Table 2 updated (bug fixed)
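One way to realize Prompt Injection is by distillation: a teacher that sees the fixed prompt teaches a student that does not, so the prompt's effect is absorbed into the student's parameters and the prompt tokens never need to be reprocessed at inference. The sketch below mirrors this idea only loosely; the gpt2 checkpoint, the persona prompt, the training text, and the single update step are all illustrative assumptions, not the paper's PING recipe.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # any causal LM works for this sketch
tok = AutoTokenizer.from_pretrained(name)
teacher = AutoModelForCausalLM.from_pretrained(name).eval()
student = AutoModelForCausalLM.from_pretrained(name)
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

fixed_prompt = "You are a pirate. Answer in pirate speak.\n"  # hypothetical persona prompt
text = "How is the weather today?"                            # hypothetical training input

prompt_ids = tok(fixed_prompt, return_tensors="pt").input_ids
input_ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    # Teacher conditions on prompt + input; keep only the logits over the
    # input positions so they align with the prompt-free student.
    t_logits = teacher(torch.cat([prompt_ids, input_ids], dim=1)).logits
    t_logits = t_logits[:, prompt_ids.size(1):, :]

s_logits = student(input_ids).logits
# Match the student's next-token distribution to the prompted teacher's.
loss = F.kl_div(F.log_softmax(s_logits, -1), F.softmax(t_logits, -1),
                reduction="batchmean")
loss.backward()
opt.step()
```

After training, the student behaves as if the prompt were present while processing only the raw input, which is where the FLOP savings for long fixed prompts come from.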
Continually Updating Generative Retrieval on Dynamic Corpora
Generative retrieval has recently been gaining a lot of attention from the
research community for its simplicity, high performance, and the ability to
fully leverage the power of deep autoregressive models. However, prior work on
generative retrieval has mostly investigated static benchmarks, while
realistic retrieval applications often involve dynamic environments where
knowledge is temporal and accumulated over time. In this paper, we introduce a
new benchmark called STREAMINGIR, dedicated to quantifying the generalizability
of retrieval methods to dynamically changing corpora. Derived from StreamingQA,
the benchmark simulates realistic retrieval use cases. On this benchmark, we conduct an
in-depth comparative evaluation of bi-encoder and generative retrieval in terms
of performance as well as efficiency under varying degrees of supervision. Our
results suggest that generative retrieval shows (1) detrimental performance
when only supervised data is used for fine-tuning, (2) superior performance
over bi-encoders when only unsupervised data is available, and (3) lower
performance than bi-encoders, due to catastrophic forgetting, when both
unsupervised and supervised data are used; nevertheless, we show that parameter-efficient
measures can effectively mitigate the issue and result in competitive
performance and efficiency with respect to the bi-encoder baseline. Our results
open up a new potential for generative retrieval in practical dynamic
environments. Our work will be open-sourced.
Comment: Work in progress
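A minimal sketch of the parameter-efficient mitigation mentioned above: when new documents arrive, a generative retriever (here, a seq2seq model mapping queries to document identifiers) is updated through small LoRA adapters instead of full fine-tuning, which limits catastrophic forgetting of the earlier corpus. The t5-small backbone, the docid format, the example pair, and the LoRA settings are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import LoraConfig, get_peft_model

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Freeze the backbone; train only low-rank adapters on the new corpus slice.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q", "v"]))
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Hypothetical (query, docid) pairs from the latest corpus snapshot.
new_pairs = [("who won the 2030 world cup", "doc-48213")]

for query, docid in new_pairs:
    batch = tok(query, return_tensors="pt")
    labels = tok(docid, return_tensors="pt").input_ids
    # Standard seq2seq loss: teach the retriever to generate the new docid.
    loss = model(**batch, labels=labels).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```

Because only the adapter weights change, the retriever's behaviour on the previously indexed corpus is largely preserved, which is the competitive-performance regime the abstract reports.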