Zero-Shot Learning with Common Sense Knowledge Graphs
Zero-shot learning relies on semantic class representations such as
hand-engineered attributes or learned embeddings to predict classes without any
labeled examples. We propose to learn class representations from common sense
knowledge graphs. Common sense knowledge graphs are an untapped source of
explicit high-level knowledge that requires little human effort to apply to a
range of tasks. To capture the knowledge in the graph, we introduce ZSL-KG, a
general-purpose framework with a novel transformer graph convolutional network
(TrGCN) for generating class representations. Our proposed TrGCN architecture
computes non-linear combinations of the node neighbourhood and shows
improvements on zero-shot learning tasks in language and vision. Our results
show that ZSL-KG outperforms the best-performing graph-based zero-shot
learning framework by an average of 2.1 accuracy points, with improvements as high as 3.4
accuracy points. Our ablation study on ZSL-KG with alternate graph neural
networks shows that our TrGCN contributes an improvement of up to 1.2 accuracy
points on these tasks.
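
As a concrete illustration, here is a minimal PyTorch sketch of a transformer-based neighbourhood aggregator in the spirit of TrGCN. It is a simplified, hypothetical rendering (module and parameter names are ours, not the paper's): self-attention over the neighbour set computes a non-linear combination, which is then fused with the node's own features.

import torch
import torch.nn as nn

class TransformerNeighborhoodAggregator(nn.Module):
    """Hypothetical single-layer TrGCN-style aggregator (simplified sketch)."""
    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # dim must be divisible by num_heads.
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, node: torch.Tensor, neighbors: torch.Tensor) -> torch.Tensor:
        # neighbors: (k, dim). Self-attention over the neighbour set yields a
        # non-linear combination rather than a fixed mean/max pooling.
        h = self.encoder(neighbors.unsqueeze(0))   # (1, k, dim)
        pooled = h.mean(dim=1).squeeze(0)          # (dim,)
        # Fuse the node's own features with the aggregated neighbourhood.
        return torch.relu(self.proj(torch.cat([node, pooled], dim=-1)))

Stacking such layers over a common sense knowledge graph would produce the class representations used for zero-shot prediction.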
NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval
Recognizing entities in texts is a central need in many information-seeking
scenarios, and indeed, Named Entity Recognition (NER) is arguably one of the
most successful examples of a widely adopted NLP task and corresponding NLP
technology. Recent advances in large language models (LLMs) appear to provide
effective solutions also for NER tasks that were traditionally handled with
dedicated models, often matching or surpassing their abilities. Should NER be
considered a solved problem? We argue to the contrary:
the capabilities provided by LLMs are not the end of NER research, but rather
an exciting beginning. They allow taking NER to the next level, tackling
increasingly useful, and increasingly challenging, variants. We
present three variants of the NER task, together with a dataset to support
them. The first is a move towards more fine-grained -- and intersectional --
entity types. The second is a move towards zero-shot recognition and extraction
of these fine-grained types based on entity-type labels. The third, and most
challenging, is the move from the recognition setup to a novel retrieval setup,
where the query is a zero-shot entity type, and the expected result is all the
sentences from a large, pre-indexed corpus that contain entities of these
types, and their corresponding spans. We show that all of these are far from
being solved. We provide a large, silver-annotated corpus of 4 million
paragraphs covering 500 entity types, to facilitate research towards all of
these three goals.
Comment: Findings of EMNLP 2023
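
To make the retrieval variant concrete, the following sketch shows the setup in which the query is an entity-type label and the results are drawn from a pre-indexed corpus. All names here are illustrative assumptions, not the dataset's released API, and span extraction is left as a second stage.

import numpy as np

def retrieve_by_type(type_label: str, encode, sentence_vecs: np.ndarray,
                     sentences: list[str], top_k: int = 10) -> list[str]:
    # encode: any text encoder mapping a string to a (dim,) vector
    # sentence_vecs: (n, dim) pre-computed dense index over the corpus
    q = encode(type_label)              # embedding of the zero-shot type query
    scores = sentence_vecs @ q          # dot-product relevance
    top = np.argsort(-scores)[:top_k]   # highest-scoring sentences first
    return [sentences[i] for i in top]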
Dense Retrieval as Indirect Supervision for Large-space Decision Making
Many discriminative natural language understanding (NLU) tasks have large
label spaces. Learning such a process of large-space decision making is
particularly challenging due to the lack of training instances per label and
the difficulty of selection among many fine-grained labels. Inspired by dense
retrieval methods for passage finding in open-domain QA, we propose a
reformulation of large-space discriminative NLU tasks as a learning-to-retrieve
task, leading to a novel solution named Dense Decision Retrieval (DDR).
Instead of predicting fine-grained decisions as logits, DDR adopts a
dual-encoder architecture that learns to predict by retrieving from a decision
thesaurus. This approach not only leverages rich indirect supervision signals
from easy-to-consume learning resources for dense retrieval, but also leads to
enhanced prediction generalizability with a semantically meaningful
representation of the large decision space. When evaluated on tasks with
decision spaces ranging from hundreds to hundreds of thousands of labels, DDR
outperforms strong baselines by an average of 27.54% in P@1 on two extreme
multi-label classification tasks, 1.17% in F1 score on ultra-fine entity
typing, and 1.26% in accuracy on three few-shot intent classification tasks.
Code and resources are available at https://github.com/luka-group/DDR
Comment: EMNLP 2023 (Findings)
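
A minimal sketch of the dual-encoder prediction step the abstract describes, under assumed encoder interfaces (the real implementation is in the linked repository):

import torch
import torch.nn as nn

class DualEncoderClassifier(nn.Module):
    # input_encoder and label_encoder are assumed to map a string to a
    # (dim,) tensor; their architecture is not specified here.
    def __init__(self, input_encoder, label_encoder):
        super().__init__()
        self.input_encoder = input_encoder
        self.label_encoder = label_encoder

    @torch.no_grad()
    def predict(self, text: str, thesaurus: list[str]) -> int:
        q = self.input_encoder(text)                                      # (dim,)
        labels = torch.stack([self.label_encoder(d) for d in thesaurus])  # (L, dim)
        # Prediction is retrieval: return the nearest decision entry.
        return int(torch.argmax(labels @ q))

Because decisions are retrieved rather than emitted as logits, the label space can grow or change without retraining an output head.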
PIVOINE: Instruction Tuning for Open-world Information Extraction
We consider the problem of Open-world Information Extraction (Open-world IE),
which extracts comprehensive entity profiles from unstructured texts. Different
from the conventional closed-world setting of Information Extraction (IE),
Open-world IE considers a more general situation where entities and relations
could be beyond a predefined ontology. More importantly, we seek to develop a
large language model (LLM) that is able to perform Open-world IE to extract
desirable entity profiles characterized by (possibly fine-grained) natural
language instructions. We achieve this by finetuning LLMs using instruction
tuning. In particular, we construct INSTRUCTOPENWIKI, a substantial instruction
tuning dataset for Open-world IE enriched with a comprehensive corpus,
extensive annotations, and diverse instructions. We finetune the pretrained
BLOOM models on INSTRUCTOPENWIKI and obtain PIVOINE, an LLM for Open-world IE
with strong instruction-following capabilities. Our experiments demonstrate
that PIVOINE significantly outperforms traditional closed-world methods and
other LLM baselines, displaying impressive generalization capabilities on both
unseen instructions and out-of-ontology cases. Consequently, PIVOINE emerges as
a promising solution to tackle the open-world challenge in IE effectively.
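
For intuition, one instruction-tuning example for Open-world IE might be shaped as below. This is a hypothetical illustration; the actual INSTRUCTOPENWIKI format may differ.

# Hypothetical shape of one training example (not the released format).
example = {
    "instruction": "Extract all entities that are research institutions, "
                   "with their aliases and a one-sentence description.",
    "input": "DeepMind, founded in London in 2010, ...",
    "output": [
        {
            "mention": "DeepMind",
            "type": "research institution",  # may fall outside any fixed ontology
            "description": "An AI research laboratory founded in London in 2010.",
        },
    ],
}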
- …