1,023 research outputs found
Corpora and evaluation tools for multilingual named entity grammar development
We present an effort for the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible across the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
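The combination of shared gazetteers and finite-state patterns described in this abstract can be sketched minimally in Python; the gazetteer entries and regular expressions below are illustrative assumptions, not the actual SProUT grammar rules.

```python
import re

# Toy gazetteer, shared across languages in the spirit of the paper's
# shared-resources idea (entries here are illustrative examples only).
GAZETTEER = {
    "Berlin": "LOCATION",
    "Siemens": "ORGANIZATION",
    "Tokyo": "LOCATION",
}

# Finite-state-style patterns for date and currency expressions.
PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "DATE"),
    (re.compile(r"\b(?:USD|EUR|JPY)\s?\d+(?:\.\d+)?\b"), "CURRENCY"),
]

def recognize(text):
    """Return (surface, type, start) tuples from gazetteer and pattern matches."""
    entities = []
    for word, etype in GAZETTEER.items():
        for m in re.finditer(re.escape(word), text):
            entities.append((m.group(), etype, m.start()))
    for pattern, etype in PATTERNS:
        for m in pattern.finditer(text):
            entities.append((m.group(), etype, m.start()))
    return sorted(entities, key=lambda e: e[2])
```

A real grammar formalism adds typed feature structures and unification on top of this; the sketch only shows why sharing the gazetteer and pattern subgrammars across languages is cheap.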
PUnifiedNER: a Prompting-based Unified NER System for Diverse Datasets
Much of named entity recognition (NER) research focuses on developing
dataset-specific models based on data from the domain of interest, and a
limited set of related entity types. This is frustrating as each new dataset
requires a new model to be trained and stored. In this work, we present a
``versatile'' model -- the Prompting-based Unified NER system (PUnifiedNER) --
that works with data from different domains and can recognise up to 37 entity
types simultaneously, and in principle arbitrarily many. Using prompt
learning, PUnifiedNER jointly trains across multiple corpora and performs
on-demand entity recognition. Experimental results show that PUnifiedNER leads
to significant
prediction benefits compared to dataset-specific models with impressively
reduced model deployment costs. Furthermore, on some datasets PUnifiedNER
achieves performance competitive with, or even better than, state-of-the-art
domain-specific methods. We also perform comprehensive pilot
and ablation studies to support in-depth analysis of each component in
PUnifiedNER.
Comment: Accepted to AAAI 202
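The on-demand idea in this abstract amounts to encoding the requested entity types in the model input so one text-to-text model serves many datasets. Below is a minimal sketch of that input/target construction, assuming a T5-style text-to-text model; the prompt template and linearized label format are illustrative, not PUnifiedNER's actual ones.

```python
# Sketch of prompt-style on-demand NER encoding for a text-to-text model.
# Template and output format are hypothetical stand-ins for illustration.

def build_prompt(sentence, requested_types):
    """Encode which entity types to extract directly in the model input."""
    type_list = ", ".join(requested_types)
    return f"entity types: {type_list}. sentence: {sentence}"

def encode_target(entities):
    """Linearize gold entities as the expected model output string."""
    return "; ".join(f"{etype}: {span}" for span, etype in entities)

prompt = build_prompt("Apple opened a store in Paris.",
                      ["organization", "location"])
target = encode_target([("Apple", "organization"), ("Paris", "location")])
```

Because the requested types vary per example, the same model can be trained jointly on corpora with disjoint label sets and asked at inference time for only the types a given application needs.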
A Boundary Offset Prediction Network for Named Entity Recognition
Named entity recognition (NER) is a fundamental task in natural language
processing that aims to identify and classify named entities in text. However,
span-based methods for NER typically assign entity types to text spans,
resulting in an imbalanced sample space and neglecting the connections between
non-entity and entity spans. To address these issues, we propose a novel
approach for NER, named the Boundary Offset Prediction Network (BOPN), which
predicts the boundary offsets between candidate spans and their nearest entity
spans. By leveraging the guiding semantics of boundary offsets, BOPN
establishes connections between non-entity and entity spans, enabling
non-entity spans to function as additional positive samples for entity
detection. Furthermore, our method integrates entity type and span
representations to generate type-aware boundary offsets instead of using entity
types as detection targets. We conduct experiments on eight widely-used NER
datasets, and the results demonstrate that our proposed BOPN outperforms
previous state-of-the-art methods.
Comment: Accepted by Findings of EMNLP 2023, 13 page
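The core labeling idea in this abstract, supervising every candidate span with its offset from the nearest gold entity span rather than a flat negative label, can be sketched as follows; the labeling scheme here is a simplified assumption (the paper's actual targets are type-aware), with `max_offset` as a hypothetical cutoff.

```python
# Toy sketch of boundary-offset labels: each candidate span (start, end) gets
# the offsets of its boundaries from the nearest gold entity span, so
# non-entity spans near entities carry a learnable directional signal.

def offset_labels(candidates, gold_spans, max_offset=2):
    """Map each candidate span to (d_start, d_end) w.r.t. its nearest gold span."""
    labels = {}
    for start, end in candidates:
        nearest = min(gold_spans,
                      key=lambda g: abs(g[0] - start) + abs(g[1] - end))
        d_start, d_end = nearest[0] - start, nearest[1] - end
        if abs(d_start) <= max_offset and abs(d_end) <= max_offset:
            labels[(start, end)] = (d_start, d_end)
        else:
            labels[(start, end)] = None  # too far from any entity to supervise
    return labels
```

A span labelled (0, 0) is itself an entity, while a span labelled (1, 0) starts one token too early; in this way near-miss spans act as extra positive training signal instead of being lumped with all negatives.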
Natural Language Processing in-and-for Design Research
We review the scholarly contributions that utilise Natural Language
Processing (NLP) methods to support the design process. Using a heuristic
approach, we collected 223 articles published in 32 journals and within the
period 1991-present. We present state-of-the-art NLP in-and-for design research
by reviewing these articles according to the type of natural language text
sources: internal reports, design concepts, discourse transcripts, technical
publications, consumer opinions, and others. Upon summarizing and identifying
the gaps in these contributions, we utilise an existing design innovation
framework to identify the applications that are currently being supported by
NLP. We then propose a few methodological and theoretical directions for future
NLP in-and-for design research.
Comparative Analysis of Contextual Relation Extraction based on Deep Learning Models
Contextual Relation Extraction (CRE) is mainly used for constructing a
knowledge graph with the help of an ontology. It supports tasks such as
semantic search, query answering, and textual entailment. Relation extraction
identifies the entities from raw texts and the relations among them. An
efficient and accurate CRE system is essential for creating domain knowledge in
the biomedical industry. Existing machine learning and Natural Language
Processing (NLP) techniques are not suitable for efficiently predicting
complex relations from sentences that contain more than two relations or
unspecified entities. In this work, deep learning techniques are used to identify
the appropriate semantic relation based on the context from multiple sentences.
Even though various machine learning models have been used for relation
extraction, they provide good results only for binary relations, i.e.,
relations occurring between exactly two entities in a sentence. Machine
learning models are also ill-suited to complex sentences containing words
with multiple meanings. To address these issues, hybrid deep learning
models have been used to extract relations from complex sentences
effectively. This paper presents a comparative analysis of the various deep
learning models used for relation extraction.
Comment: This paper was presented at the International Conference on FOSS
Approaches towards Computational Intelligence and Language Technology,
February 2023, Thiruvananthapura