How do Decisions Emerge across Layers in Neural Models? Interpretation with Differentiable Masking
Attribution methods assess the contribution of inputs to the model
prediction. One way to do so is erasure: a subset of inputs is considered
irrelevant if it can be removed without affecting the prediction. Though
conceptually simple, erasure's objective is intractable and approximate search
remains expensive with modern deep NLP models. Erasure is also susceptible to
the hindsight bias: the fact that an input can be dropped does not mean that
the model 'knows' it can be dropped. The resulting pruning is over-aggressive
and does not reflect how the model arrives at the prediction. To deal with
these challenges, we introduce Differentiable Masking. DiffMask learns to
mask-out subsets of the input while maintaining differentiability. The decision
to include or disregard an input token is made with a simple model based on
intermediate hidden layers of the analyzed model. First, this makes the
approach efficient because we predict rather than search. Second, as with
probing classifiers, this reveals what the network 'knows' at the corresponding
layers. This lets us not only plot attribution heatmaps but also analyze how
decisions are formed across network layers. We use DiffMask to study BERT
models on sentiment classification and question answering.
Comment: Accepted at the 2020 Conference on Empirical Methods in Natural
Language Processing (EMNLP). Source code available at
https://github.com/nicola-decao/diffmask . 18 pages, 15 figures, 4 tables
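The erasure idea described in this abstract (an input is irrelevant if removing it does not change the prediction) can be illustrated with a minimal sketch. This is not DiffMask and not the authors' code: it is a plain leave-one-out erasure check, and `toy_sentiment` is a hypothetical stand-in classifier invented for this example.

```python
from typing import Callable, List


def toy_sentiment(tokens: List[str]) -> int:
    """Hypothetical stand-in classifier (an assumption for illustration).

    Returns 1 if positive words outnumber negative words, else 0.
    """
    positive = {"great", "good", "love"}
    negative = {"bad", "awful", "hate"}
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    return 1 if score > 0 else 0


def leave_one_out_erasure(
    tokens: List[str], predict: Callable[[List[str]], int]
) -> List[bool]:
    """Flag each token as removable if erasing it alone keeps the prediction."""
    original = predict(tokens)
    removable = []
    for i in range(len(tokens)):
        reduced = tokens[:i] + tokens[i + 1:]  # erase exactly one token
        removable.append(predict(reduced) == original)
    return removable


tokens = ["the", "movie", "was", "great"]
flags = leave_one_out_erasure(tokens, toy_sentiment)
for token, flag in zip(tokens, flags):
    print(f"{token}: {'removable' if flag else 'needed'}")
```

Checking single tokens already costs one forward pass per token, and searching over subsets of tokens grows exponentially, which is the cost the abstract attributes to erasure.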
Multimodal Automated Fact-Checking: A Survey
Misinformation is often conveyed in multiple modalities, e.g. a miscaptioned
image. Multimodal misinformation is perceived as more credible by humans, and
spreads faster than its text-only counterparts. While an increasing body of
research investigates automated fact-checking (AFC), previous surveys mostly
focus on text. In this survey, we conceptualise a framework for AFC including
subtasks unique to multimodal misinformation. Furthermore, we discuss related
terms used in different communities and map them to our framework. We focus on
four modalities prevalent in real-world fact-checking: text, image, audio, and
video. We survey benchmarks and models, and discuss limitations and promising
directions for future research.
Comment: The 2023 Conference on Empirical Methods in Natural Language
Processing (EMNLP): Findings
UniK-QA: Unified Representations of Structured and Unstructured Knowledge for Open-Domain Question Answering
We study open-domain question answering with structured, unstructured and
semi-structured knowledge sources, including text, tables, lists and knowledge
bases. Departing from prior work, we propose a unifying approach that
homogenizes all sources by reducing them to text and applies the
retriever-reader model, which has so far been limited to text sources only. Our
approach improves results on knowledge-base QA tasks by 11 points
compared to the latest graph-based methods. More importantly, we demonstrate that
our unified knowledge (UniK-QA) model is a simple and yet effective way to
combine heterogeneous sources of knowledge, advancing the state-of-the-art
results on two popular question answering benchmarks, NaturalQuestions and
WebQuestions, by 3.5 and 2.6 points, respectively.
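As a rough illustration of "reducing sources to text", the sketch below turns a knowledge-base triple and a table row into plain sentences that a text retriever could index. The templates and the function names `linearize_triple` and `linearize_table_row` are assumptions made for this example; the abstract does not specify the format the authors use.

```python
from typing import List


def linearize_triple(subject: str, relation: str, obj: str) -> str:
    """Render a (subject, relation, object) triple as one sentence.

    The template is hypothetical, chosen only for illustration.
    """
    return f"{subject} {relation} {obj}."


def linearize_table_row(header: List[str], row: List[str]) -> str:
    """Render one table row as 'column is value' clauses (hypothetical template)."""
    return ", ".join(f"{h} is {v}" for h, v in zip(header, row)) + "."


triple_text = linearize_triple("Paris", "capital of", "France")
row_text = linearize_table_row(["City", "Country"], ["Paris", "France"])

# Both outputs are ordinary strings, so they can share one text index.
passages = [triple_text, row_text]
print(passages)
```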
NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned
We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage contestants to explore the trade-off between storing retrieval corpora or the parameters of learned models. In this report, we describe the motivation and organization of the competition, review the best submissions, and analyze system predictions to inform a discussion of evaluation for open-domain QA.