What do Deck Chairs and Sun Hats Have in Common? Uncovering Shared Properties in Large Concept Vocabularies
Concepts play a central role in many applications. This includes settings
where concepts have to be modelled in the absence of sentence context. Previous
work has therefore focused on distilling decontextualised concept embeddings
from language models. But concepts can be modelled from different perspectives,
whereas concept embeddings typically capture mostly taxonomic structure. To
address this issue, we propose a strategy for identifying what different
concepts, from a potentially large concept vocabulary, have in common with
each other. We then represent concepts in terms of the properties they share with
the other concepts. To demonstrate the practical usefulness of this way of
modelling concepts, we consider the task of ultra-fine entity typing, which is
a challenging multi-label classification problem. We show that by augmenting
the label set with shared properties, we can improve the performance of the
state-of-the-art models for this task.
Comment: Accepted for EMNLP 2023
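A minimal sketch of the label-augmentation idea described above, assuming a hypothetical mapping from type labels to the properties they share with other concepts (the property names and the mapping itself are illustrative, not taken from the paper):

```python
# Properties that groups of labels have in common, e.g. both "deck_chair"
# and "sun_hat" might share the property "used_at_the_beach".
# (Illustrative placeholder for the paper's property-discovery step.)
shared_properties = {
    "deck_chair": {"used_at_the_beach", "foldable"},
    "sun_hat": {"used_at_the_beach", "worn_on_the_head"},
    "umbrella": {"foldable"},
}

def augment_labels(gold_labels):
    """Extend a gold label set with the properties its labels share."""
    augmented = set(gold_labels)
    for label in gold_labels:
        augmented |= shared_properties.get(label, set())
    return augmented

print(augment_labels({"deck_chair"}))
# {'deck_chair', 'used_at_the_beach', 'foldable'}
```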
Learning to Select from Multiple Options
Many NLP tasks can be regarded as a selection problem from a set of options,
such as classification tasks, multi-choice question answering, etc. Textual
entailment (TE) has been shown as the state-of-the-art (SOTA) approach to
dealing with those selection problems. TE treats input texts as premises (P),
options as hypotheses (H), then handles the selection problem by modeling (P,
H) pairwise. This has two limitations: first, the pairwise modeling is unaware of the other
options, which is counter-intuitive since humans often determine the best option
by comparing competing candidates; second, the inference process of pairwise TE
is time-consuming, especially when the option space is large. To deal with the
two issues, this work first proposes a contextualized TE model (Context-TE) by
appending the k other options as the context of the current (P, H) pair.
Context-TE is able to learn a more reliable decision for H since it takes the
competing options into account. Second, we speed up Context-TE by introducing Parallel-TE,
which learns the decisions of multiple options simultaneously. Parallel-TE
significantly improves the inference speed while keeping comparable performance
with Context-TE. Our methods are evaluated on three tasks (ultra-fine entity
typing, intent detection and multi-choice QA) that are typical selection
problems with option spaces of different sizes. Experiments show that our models set new
SOTA performance; in particular, Parallel-TE is k times faster than pairwise TE
at inference. Our code is publicly available at
https://github.com/jiangshdd/LearningToSelect.
Comment: Accepted by AAAI 2023
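As an illustration of the three modelling strategies, the sketch below constructs pairwise TE, Context-TE and Parallel-TE inputs for a toy intent-detection example; the separator tokens and templates are assumptions rather than the paper's actual format:

```python
premise = "Play the latest album by Adele."
options = ["play music", "set alarm", "book restaurant", "get weather"]

def pairwise_te_inputs(premise, options):
    # Standard TE: one (P, H) pair per option, each scored in isolation.
    return [(premise, h) for h in options]

def context_te_inputs(premise, options):
    # Context-TE: the k competing options are appended as context of each H.
    inputs = []
    for i, h in enumerate(options):
        context = " | ".join(o for j, o in enumerate(options) if j != i)
        inputs.append((premise, f"{h} [CTX] {context}"))
    return inputs

def parallel_te_input(premise, options):
    # Parallel-TE: one input encodes all options, so the decisions for all
    # hypotheses can be predicted in a single forward pass.
    return (premise, " [OPT] ".join(options))

print(pairwise_te_inputs(premise, options)[0])
print(context_te_inputs(premise, options)[0])
print(parallel_te_input(premise, options))
```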
Ultra-fine entity typing with prior knowledge about labels: a simple clustering based approach
Ultra-fine entity typing (UFET) is the task of inferring the semantic types from a large set of fine-grained candidates that apply to a given entity mention. This task is especially challenging because we only have a small number of training examples for many types, even with distant supervision strategies. State-of-the-art models, therefore, have to rely on prior knowledge about the type labels in some way. In this paper, we show that the performance of existing methods can be improved using a simple technique: we use pre-trained label embeddings to cluster the labels into semantic domains and then treat these domains as additional types. We show that this strategy consistently leads to improved results as long as high-quality label embeddings are used. Furthermore, we use the label clusters as part of a simple post-processing technique, which results in further performance gains. Both strategies treat the UFET model as a black box and can thus straightforwardly be used to improve a wide range of existing models.
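A minimal sketch of the clustering strategy, assuming pre-trained label embeddings are available as vectors (random vectors stand in for them here) and treating the resulting clusters as additional "domain" types:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
labels = ["musician", "singer", "hospital", "clinic", "river", "lake"]
label_embeddings = {l: rng.normal(size=50) for l in labels}  # placeholder vectors

# Cluster the label embeddings into semantic domains.
n_domains = 3
X = np.stack([label_embeddings[l] for l in labels])
cluster_ids = KMeans(n_clusters=n_domains, n_init=10, random_state=0).fit_predict(X)
domain_of = dict(zip(labels, cluster_ids))

def add_domain_types(gold_labels):
    """Treat the label clusters as additional (coarser) types."""
    return set(gold_labels) | {f"domain_{domain_of[l]}" for l in gold_labels}

print(add_domain_types({"musician", "singer"}))
```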
Dense Retrieval as Indirect Supervision for Large-space Decision Making
Many discriminative natural language understanding (NLU) tasks have large
label spaces. Learning such a process of large-space decision making is
particularly challenging due to the lack of training instances per label and
the difficulty of selection among many fine-grained labels. Inspired by dense
retrieval methods for passage finding in open-domain QA, we propose a
reformulation of large-space discriminative NLU tasks as a learning-to-retrieve
task, leading to a novel solution named Dense Decision Retrieval (DDR).
Instead of predicting fine-grained decisions as logits, DDR adopts a
dual-encoder architecture that learns to predict by retrieving from a decision
thesaurus. This approach not only leverages rich indirect supervision signals
from easy-to-consume learning resources for dense retrieval, but also leads to
enhanced prediction generalizability with a semantically meaningful
representation of the large decision space. When evaluated on tasks with
decision spaces ranging from hundreds to hundreds of thousands of labels, DDR
outperforms strong baselines by 27.54% in P@1 on two extreme
multi-label classification tasks, by 1.17% in F1 score on ultra-fine entity typing,
and by 1.26% in accuracy on three few-shot intent classification tasks, on average.
Code and resources are available at https://github.com/luka-group/DDR.
Comment: EMNLP 2023 (Findings)
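A minimal sketch of the dual-encoder retrieval idea: decisions are ranked by dense similarity to the mention context rather than predicted as logits. The toy bag-of-words encoder and the label descriptions below are placeholders for the trained encoders and the decision thesaurus:

```python
import numpy as np

# Each decision (label) is paired with a textual description.
decision_thesaurus = {
    "athlete": "a person who competes in sports",
    "politician": "a person who holds or seeks a government office",
    "hospital": "an institution providing medical treatment",
}

vocab = sorted({w for d in decision_thesaurus.values() for w in d.split()})

def encode(text):
    # Toy encoder: bag-of-words over the thesaurus vocabulary, L2-normalised.
    v = np.array([text.split().count(w) for w in vocab], dtype=float)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

label_names = list(decision_thesaurus)
label_matrix = np.stack([encode(d) for d in decision_thesaurus.values()])

def retrieve(mention_context, k=2):
    """Rank decisions by dense similarity instead of predicting logits."""
    scores = label_matrix @ encode(mention_context)
    return [label_names[i] for i in np.argsort(-scores)[:k]]

print(retrieve("she is a person who competes in several sports"))
```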
EnCore: Fine-Grained Entity Typing by Pre-Training Entity Encoders on Coreference Chains
Entity typing is the task of assigning semantic types to the entities that
are mentioned in a text. In the case of fine-grained entity typing (FET), a
large set of candidate type labels is considered. Since obtaining sufficient
amounts of manual annotations is then prohibitively expensive, FET models are
typically trained using distant supervision. In this paper, we propose to
improve on this process by pre-training an entity encoder such that embeddings
of coreferring entities are more similar to each other than to the embeddings
of other entities. The main problem with this strategy, which helps to explain
why it has not previously been considered, is that predicted coreference links
are often too noisy. We show that this problem can be addressed by using a
simple trick: we only consider coreference links that are predicted by two
different off-the-shelf systems. With this prudent use of coreference links,
our pre-training strategy allows us to improve on the state of the art in
fine-grained entity typing benchmarks, as well as in traditional entity
extraction.
Comment: To appear at EACL 2024
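A minimal sketch of the link-filtering trick: only coreference links predicted by both off-the-shelf systems are kept as positive pairs for pre-training. The link sets below are hypothetical placeholders for the two systems' outputs; mentions are identified by (doc_id, start, end) spans:

```python
# Coreference links predicted by two different off-the-shelf systems.
links_system_a = {(("doc1", 0, 2), ("doc1", 14, 15)),
                  (("doc1", 0, 2), ("doc1", 30, 31)),
                  (("doc2", 5, 7), ("doc2", 20, 21))}
links_system_b = {(("doc1", 0, 2), ("doc1", 14, 15)),
                  (("doc2", 5, 7), ("doc2", 40, 41))}

# Keep only the links that both systems agree on, filtering out most of the
# noisy predictions before they are used as positive pairs for pre-training
# the entity encoder.
reliable_links = links_system_a & links_system_b
print(reliable_links)  # {(('doc1', 0, 2), ('doc1', 14, 15))}
```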
Learning to Correct Noisy Labels for Fine-Grained Entity Typing via Co-Prediction Prompt Tuning
Fine-grained entity typing (FET) is an essential task in natural language
processing that aims to assign semantic types to entities in text. However, FET
poses a major challenge known as the noisy labeling problem: current
methods rely on estimating the noise distribution to identify noisy labels, but
they are confounded by the diversity of noise distributions. To address this limitation,
we introduce Co-Prediction Prompt Tuning for noise correction in FET, which
leverages multiple prediction results to identify and correct noisy labels.
Specifically, we integrate prediction results to recall labeled labels and
utilize a differentiated margin to identify inaccurate labels. Moreover, we
design an optimization objective concerning divergent co-predictions during
fine-tuning, ensuring that the model captures sufficient information and
maintains robustness in noise identification. Experimental results on three
widely-used FET datasets demonstrate that our noise correction approach
significantly enhances the quality of various types of training samples,
including those annotated using distant supervision, ChatGPT, and
crowdsourcing.
Comment: Accepted by Findings of EMNLP 2023, 11 pages
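A minimal sketch of how two prediction views could be combined to correct a noisy label set, in the spirit of the co-prediction idea; the scores, thresholds and margin value are hypothetical, and the paper's actual prompt-tuning objective is not reproduced here:

```python
# Scores assigned to candidate types by two prediction views of the model.
scores_view_a = {"musician": 0.92, "person": 0.88, "politician": 0.05, "artist": 0.81}
scores_view_b = {"musician": 0.89, "person": 0.90, "politician": 0.10, "artist": 0.34}
annotated = {"person", "politician"}  # labels from distant supervision

RECALL_T = 0.8  # both views must agree strongly to recall a missing label
MARGIN = 0.5    # annotated labels scored well below this are deemed noisy

recalled = {l for l in scores_view_a
            if min(scores_view_a[l], scores_view_b[l]) >= RECALL_T}
noisy = {l for l in annotated
         if max(scores_view_a[l], scores_view_b[l]) < MARGIN}

corrected = (annotated - noisy) | recalled
print(corrected)  # {'person', 'musician'}
```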