Few-shot classification in Named Entity Recognition Task
For many natural language processing (NLP) tasks the amount of annotated data
is limited. This motivates the use of semi-supervised learning techniques,
such as transfer learning or meta-learning. In this work we tackle the Named
Entity Recognition (NER) task using a Prototypical Network, a metric learning
technique. It learns intermediate representations of words that cluster well
into named entity classes. This property of the model allows classifying words
with an extremely limited number of training examples, and can potentially be
used as a zero-shot learning method. By coupling this technique with transfer
learning we achieve well-performing classifiers trained on only 20 instances of
a target class.
Comment: In proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing
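As a hedged illustration of the metric-learning idea, the sketch below builds class prototypes as mean support embeddings and assigns a query word to the nearest one; the embedding dimension, class names, and random vectors are placeholders rather than the paper's actual model.

```python
# Prototypical-network classification in a few lines: prototypes are mean
# support embeddings, queries go to the nearest prototype. Dimensions, class
# names, and the random "embeddings" are placeholders, not the paper's setup.
import numpy as np

def build_prototypes(support: dict[str, np.ndarray]) -> dict[str, np.ndarray]:
    """Each class prototype is the mean of its support-token embeddings."""
    return {label: embs.mean(axis=0) for label, embs in support.items()}

def classify(query: np.ndarray, prototypes: dict[str, np.ndarray]) -> str:
    """Assign the class whose prototype is nearest in Euclidean distance."""
    return min(prototypes, key=lambda label: np.linalg.norm(query - prototypes[label]))

# Toy usage: 5-dimensional embeddings, 20 support words for two entity
# classes and the outside class "O".
rng = np.random.default_rng(0)
support = {label: rng.normal(size=(20, 5)) + shift
           for label, shift in [("PER", 2.0), ("LOC", -2.0), ("O", 0.0)]}
prototypes = build_prototypes(support)
print(classify(rng.normal(size=5) + 2.0, prototypes))  # likely "PER"
```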
Mutual Reinforcement Effects in Japanese Sentence Classification and Named Entity Recognition Tasks
Information extraction (IE) is a crucial subfield within natural language
processing. However, in the traditionally segmented approach, which treats
sentence classification and Named Entity Recognition separately, the intricate
interactions between these subtasks remain largely uninvestigated. In this
study, we propose an integrative analysis, converging sentence classification
with Named Entity Recognition, with the objective of unveiling and
comprehending the mutual
reinforcement effect within these two information extraction subtasks. To
achieve this, we introduce a Sentence Classification and Named Entity
Recognition Multi-task (SCNM) approach that combines Sentence Classification
(SC) and Named Entity Recognition (NER). We develop a Sentence-to-Label
Generation (SLG) framework for SCNM and construct a Wikipedia dataset
containing both SC and NER annotations. Using a format converter, we unify input formats
and employ a generative model to generate SC-labels, NER-labels, and associated
text segments. We propose a Constraint Mechanism (CM) to improve generated
format accuracy. Our results show SC accuracy increased by 1.13 points and NER
by 1.06 points in SCNM compared to standalone tasks, with CM raising format
accuracy from 63.61 to 100. The findings indicate mutual reinforcement effects
between SC and NER, and integration enhances both tasks' performance. We
additionally implemented the SLG framework on the single SC task. It yielded
superior accuracies compared to the baseline on two distinct Japanese SC
datasets. Notably, in the few-shot learning experiments, the SLG framework
performs much better than the fine-tuning method. These empirical findings
contribute additional evidence affirming the efficacy of the SLG framework.
Comment: 25 pages, 12 figures, 19 tables. arXiv admin note: substantial text overlap with arXiv:2306.1597
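To make the sentence-to-label generation idea concrete, here is a minimal sketch of the kind of format conversion such a framework needs; the exact tags and separators are assumptions, since the paper's serialization is not reproduced here.

```python
# Serializing SC + NER targets for a generative model, in the spirit of SLG;
# the tag names and separators here are assumptions, not the paper's format.
def to_slg_target(sc_label: str, entities: list[tuple[str, str]]) -> str:
    """entities: (surface text, entity label) pairs in sentence order."""
    ner_part = "; ".join(f"{label}: {text}" for text, label in entities)
    return f"SC: {sc_label} | NER: {ner_part or 'none'}"

# A constraint mechanism can then check the generated string against this
# grammar before parsing labels back out.
print(to_slg_target("news", [("東京", "LOC"), ("山田太郎", "PER")]))
# -> SC: news | NER: LOC: 東京; PER: 山田太郎
```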
Zero- and Few-Shot Machine Learning for Named Entity Recognition in Biomedical Texts
Named entity recognition (NER) is an NLP task that involves identifying and classifying named
entities in text. Token classification is a crucial subtask of NER that entails assigning
labels to individual tokens within a text, indicating the named entity category to which
they belong. Fine-tuning large language models (LLMs) on labeled domain datasets has
emerged as a powerful technique for improving NER performance. By training a pretrained
LLM such as BERT on domain-specific labeled data, the model learns to recognize
named entities specific to that domain with high accuracy. This approach has been applied
to a wide range of domains, including the biomedical domain, and has demonstrated significant
improvements in NER accuracy.
Still, fine-tuning pre-trained LLMs requires large amounts of data, and labeling is a
time-consuming and expensive process that demands expert domain knowledge. Moreover,
domains with an open set of classes pose difficulties for traditional machine learning
approaches, since the number of classes to predict needs to be pre-defined.
Our solution to these two problems is based on a data transformation that factorizes
the initial multi-class classification problem into a binary one, and on applying a
cross-encoder-based BERT architecture for zero- and few-shot learning.
To create our dataset, we transformed six widely used biomedical datasets that contain
various biomedical entities such as genes, drugs, diseases, adverse events, chemicals,
etc., into a uniform format. This transformation process enabled us to merge the datasets
into a single cohesive dataset of 26 different named entity classes.
We then fine-tuned two pre-trained language models, BioBERT and PubMedBERT, for the
NER task in zero- and few-shot settings. The results of the experiment on 9 classes in
the zero-shot setting are promising for semantically similar classes, and improve
significantly for almost all classes once only a few supporting examples are provided.
The best results were obtained using a fine-tuned PubMedBERT model, with average F1
scores of 35.44%, 50.10%, 69.94%, and 79.51% for zero-shot, one-shot, 10-shot, and
100-shot NER, respectively.
Book of abstracts: 4th Belgrade Bioinformatics Conference, June 19-23, 2023
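A hedged sketch of the described binary factorization with a cross-encoder: each candidate class name is paired with the sentence in a single BERT input, and a two-label token classifier decides which tokens belong to that class. The PubMedBERT checkpoint name is assumed, and the binary head below is untrained, so this only illustrates the architecture.

```python
# A sketch of the binary reformulation: one (class name, sentence) pair per
# candidate class, with a 2-label token classifier (inside / outside the class).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

CHECKPOINT = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForTokenClassification.from_pretrained(CHECKPOINT, num_labels=2)

def binary_ner(class_name: str, sentence: str) -> list[int]:
    # Cross-encoder input: "[CLS] class name [SEP] sentence [SEP]"; the model
    # sees the class description and the text jointly.
    enc = tokenizer(class_name, sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits  # shape (1, seq_len, 2)
    # Predictions cover all positions, including the class-name tokens; a real
    # pipeline would mask those out and map subwords back to words.
    return logits.argmax(-1).squeeze(0).tolist()

# One binary pass per class; "drug" is one of the 26 merged entity classes.
print(binary_ner("drug", "The patient received aspirin for chest pain."))
```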
Accelerated materials language processing enabled by GPT
Materials language processing (MLP) is one of the key facilitators of
materials science research, as it enables the extraction of structured
information from massive materials science literature. Prior works suggested
high-performance MLP models for text classification, named entity recognition
(NER), and extractive question answering (QA), which require complex model
architectures, exhaustive fine-tuning, and a large number of human-labelled
datasets. In this study, we develop generative pretrained transformer
(GPT)-enabled pipelines where the complex architectures of prior MLP models are
replaced with strategic designs of prompt engineering. First, we develop a
GPT-enabled document classification method for screening relevant documents,
achieving accuracy and reliability comparable to prior models with only a
small dataset. Second, for the NER task, we design entity-centric prompts, and
few-shot learning with them improves performance on most entities in three
open datasets. Finally, we develop a GPT-enabled extractive
QA model, which provides improved performance and shows the possibility of
automatically correcting annotations. Our findings confirm the potential of
GPT-enabled MLP models and their value in terms of reliability and
practicability, and our scientific methods and systematic approach are
applicable to any materials science domain to accelerate the extraction of
information from scientific literature.
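As a rough illustration of the entity-centric prompting strategy, the sketch below assembles a few-shot NER prompt and sends it through the OpenAI chat API; the template wording, the example shots, and the model choice are assumptions, not the authors' exact design.

```python
# A sketch of an entity-centric few-shot NER prompt; the template, the shots,
# and the model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_entities(entity_type: str, shots: list[tuple[str, str]], text: str) -> str:
    examples = "\n".join(f"Text: {t}\n{entity_type}: {e}" for t, e in shots)
    prompt = (f"Extract all {entity_type} entities from the materials science text.\n"
              f"{examples}\nText: {text}\n{entity_type}:")
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content

print(extract_entities("material",
                       [("LiFePO4 cathodes were annealed at 700 C.", "LiFePO4")],
                       "TiO2 thin films were deposited by sputtering."))
```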
A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks
We evaluate four state-of-the-art instruction-tuned large language models
(LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13
real-world clinical and biomedical natural language processing (NLP) tasks in
English, such as named-entity recognition (NER), question-answering (QA),
relation extraction (RE), etc. Our overall results demonstrate that the
evaluated LLMs begin to approach the performance of state-of-the-art models in
zero- and few-shot scenarios for most tasks, performing particularly well on
the QA task, even though they have never seen examples from these tasks
before. However, we observed that on the classification and RE tasks the LLMs
fall below what can be achieved with a model specifically trained for the
medical field, such as PubMedBERT. Finally, we noted that no LLM outperforms
all the others on all the studied tasks, with some models being better suited
for certain tasks than others.
Comment: Under review process
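For context, the sketch below contrasts zero-shot and one-shot prompting of an instruction-tuned model on a toy clinical NER query; a small Flan-T5 checkpoint stands in for the evaluated Flan-T5 UL2, and the prompts and the clinical sentence are illustrative assumptions.

```python
# Zero- vs few-shot prompting of an instruction-tuned seq2seq model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

def generate(prompt: str) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=32)
    return tokenizer.decode(out[0], skip_special_tokens=True)

text = "The patient was started on metformin for type 2 diabetes."
zero_shot = f"List the drug names mentioned in this sentence: {text}"
few_shot = ("Sentence: She takes aspirin daily.\nDrugs: aspirin\n"
            f"Sentence: {text}\nDrugs:")
print(generate(zero_shot))  # instruction only, no task examples
print(generate(few_shot))   # one in-context example
```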
Type-Aware Decomposed Framework for Few-Shot Named Entity Recognition
Despite the recent success achieved by several two-stage prototypical
networks in the few-shot named entity recognition (NER) task, the overdetected
false spans at the span detection stage and the inaccurate and unstable
prototypes at the type classification stage remain to be challenging problems.
In this paper, we propose a novel Type-Aware Decomposed framework, namely
TadNER, to solve these problems. We first present a type-aware span filtering
strategy to filter out false spans by removing those semantically far away from
type names. We then present a type-aware contrastive learning strategy to
construct more accurate and stable prototypes by jointly exploiting support
samples and type names as references. Extensive experiments on various
benchmarks prove that our proposed TadNER framework yields a new
state-of-the-art performance. Our code and data will be available at
https://github.com/NLPWM-WHU/TadNER.
Comment: Accepted to the Findings of EMNLP 2023, camera ready version
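The span filtering idea can be sketched as follows, with random vectors standing in for TadNER's learned representations and an assumed similarity threshold: spans far from every type-name embedding are discarded as false spans.

```python
# Type-aware span filtering, roughly: keep a candidate span only if it is
# semantically close to some type name, and assign it that type. Embeddings
# here are random placeholders and the threshold is an assumed hyperparameter.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_spans(span_embs: dict[str, np.ndarray],
                 type_embs: dict[str, np.ndarray],
                 threshold: float = 0.3) -> dict[str, str]:
    """Map each surviving span to its closest type name."""
    kept = {}
    for span, emb in span_embs.items():
        best_type, best_sim = max(
            ((t, cosine(emb, temb)) for t, temb in type_embs.items()),
            key=lambda pair: pair[1])
        if best_sim >= threshold:  # spans far from all type names are dropped
            kept[span] = best_type
    return kept

rng = np.random.default_rng(1)
types = {"person": rng.normal(size=8), "location": rng.normal(size=8)}
spans = {"John Smith": types["person"] + 0.1 * rng.normal(size=8),
         "quickly": rng.normal(size=8)}
print(filter_spans(spans, types))  # "John Smith" kept, "quickly" likely dropped
```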
FewRel: A Large-Scale Supervised Few-Shot Relation Classification Dataset with State-of-the-Art Evaluation
We present a Few-Shot Relation Classification Dataset (FewRel), consisting of
70,000 sentences on 100 relations derived from Wikipedia and annotated by
crowdworkers. The relation of each sentence is first recognized by distant
supervision methods, and then filtered by crowdworkers. We adapt the most
recent state-of-the-art few-shot learning methods for relation classification
and conduct a thorough evaluation of these methods. Empirical results show that
even the most competitive few-shot learning models struggle on this task,
especially as compared with humans. We also show that a range of different
reasoning skills are needed to solve our task. These results indicate that
few-shot relation classification remains an open problem and still requires
further research. Our detailed analysis points to multiple directions for future
research. All details and resources about the dataset and baselines are
released on http://zhuhao.me/fewrel.
Comment: EMNLP 2018. The first four authors contributed equally; the order was determined by dice rolling. Visit our website http://zhuhao.me/fewrel
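For readers new to the protocol, here is a minimal sketch of N-way K-shot episode sampling as used in few-shot relation classification benchmarks like FewRel; the data layout (relation name mapped to example sentences) is an illustrative assumption.

```python
# N-way K-shot episode sampling: pick N relations, K support sentences each,
# plus one held-out query sentence whose relation the model must predict.
import random

def sample_episode(data: dict[str, list[str]], n_way: int = 5, k_shot: int = 1):
    relations = random.sample(sorted(data), n_way)
    query_rel = random.choice(relations)
    # Draw one extra sentence for the query relation to hold out as the query.
    pool = {rel: random.sample(data[rel], k_shot + (1 if rel == query_rel else 0))
            for rel in relations}
    support = {rel: sents[:k_shot] for rel, sents in pool.items()}
    query_sentence = pool[query_rel][k_shot]
    return support, query_sentence, query_rel

# Toy usage with 2-way 1-shot; FewRel evaluations typically use 5 or 10 ways.
toy = {"founder_of": ["A founded B.", "C established D."],
       "born_in": ["E was born in F.", "G's birthplace is H."]}
print(sample_episode(toy, n_way=2, k_shot=1))
```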
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
Pretrained contextual representation models (Peters et al., 2018; Devlin et
al., 2018) have pushed forward the state-of-the-art on many NLP tasks. A new
release of BERT (Devlin, 2018) includes a model simultaneously pretrained on
104 languages with impressive performance for zero-shot cross-lingual transfer
on a natural language inference task. This paper explores the broader
cross-lingual potential of mBERT (multilingual BERT) as a zero-shot language
transfer model on 5 NLP tasks covering a total of 39 languages from various
language families: NLI, document classification, NER, POS tagging, and
dependency parsing. We compare mBERT with the best published methods for
zero-shot cross-lingual transfer and find mBERT competitive on each task.
Additionally, we investigate the most effective strategy for utilizing mBERT in
this manner, determine to what extent mBERT generalizes away from
language-specific features, and measure factors that influence cross-lingual transfer.
Comment: EMNLP 2019 Camera Ready
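A minimal sketch of this zero-shot transfer recipe under stated assumptions: fine-tune mBERT on English data for a task, then run the identical model on another language. The training loop is elided and the three-way head below is untrained, so only the inference path is shown.

```python
# Zero-shot cross-lingual transfer skeleton: the same mBERT weights serve the
# (English) training language and the unseen target language.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3)  # e.g. NLI: entail/neutral/contradict

def predict(premise: str, hypothesis: str) -> int:
    enc = tokenizer(premise, hypothesis, return_tensors="pt")
    with torch.no_grad():
        return int(model(**enc).logits.argmax(dim=-1))

# Fine-tune on English pairs only, then call predict() on any of the 104
# pretraining languages without further supervision:
print(predict("A man is playing guitar.", "A person makes music."))
print(predict("Un hombre toca la guitarra.", "Una persona hace música."))
```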
- …