18,848 research outputs found
Multi-Grained Named Entity Recognition
This paper presents a novel framework, MGNER, for Multi-Grained Named Entity
Recognition where multiple entities or entity mentions in a sentence could be
non-overlapping or totally nested. Unlike traditional approaches, which treat
NER as a sequential labeling task and annotate entities consecutively,
MGNER detects and recognizes entities at multiple granularities:
it is able to recognize named entities without explicitly assuming
non-overlapping or totally nested structures. MGNER consists of a Detector that
examines all possible word segments and a Classifier that categorizes entities.
In addition, contextual information and a self-attention mechanism are utilized
throughout the framework to improve the NER performance. Experimental results
show that MGNER outperforms current state-of-the-art baselines by up to 4.4%
in terms of the F1 score across nested and non-overlapping NER tasks.
Comment: In ACL 2019 as a long paper
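The detect-then-classify idea behind MGNER can be sketched as follows. This is an illustrative toy, not the paper's actual Detector; the function name `enumerate_segments` and the `max_len` cap are assumptions made for exposition:

```python
from typing import List, Tuple

def enumerate_segments(tokens: List[str], max_len: int = 6) -> List[Tuple[int, int]]:
    """Enumerate all candidate word segments (start, end) up to max_len tokens,
    mirroring the Detector's exhaustive proposal step."""
    spans = []
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_len, len(tokens)) + 1):
            spans.append((start, end))
    return spans

# Overlapping and nested spans are all proposed, so the framework need not
# assume non-overlapping or totally nested entity structures; a Classifier
# would then label each candidate span as an entity type or "not an entity".
tokens = ["New", "York", "City", "police"]
spans = enumerate_segments(tokens, max_len=3)
```

Because every segment is scored independently, nested mentions such as "New York" inside "New York City" survive to the classification stage instead of being forced into a single tag sequence.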
Do Multi-Sense Embeddings Improve Natural Language Understanding?
Learning a distinct representation for each sense of an ambiguous word could
lead to more powerful and fine-grained models of vector-space representations.
Yet while 'multi-sense' methods have been proposed and tested on artificial
word-similarity tasks, it remains unclear whether they improve real natural
language understanding tasks. In this paper we introduce a multi-sense
embedding model based on Chinese Restaurant Processes that achieves
state-of-the-art performance on matching human word similarity judgments, and propose a
pipelined architecture for incorporating multi-sense embeddings into language
understanding.
We then test the performance of our model on part-of-speech tagging, named
entity recognition, sentiment analysis, semantic relation identification and
semantic relatedness, controlling for embedding dimensionality. We find that
multi-sense embeddings do improve performance on some tasks (part-of-speech
tagging, semantic relation identification, semantic relatedness) but not on
others (named entity recognition, various forms of sentiment analysis). We
discuss how these differences may be caused by the different role of word sense
information in each of the tasks. The results highlight the importance of
testing embedding models in real applications.
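The sense-induction mechanism can be sketched with a plain Chinese Restaurant Process: each occurrence of an ambiguous word joins an existing sense with probability proportional to that sense's count, or opens a new sense with probability proportional to a concentration parameter. This sketch shows only the CRP prior; the paper's model additionally weights these choices by context similarity, and the names below are assumptions for illustration:

```python
import random

def crp_assign(n_occurrences: int, alpha: float = 1.0, seed: int = 0) -> list:
    """Assign occurrences of a word to senses via a Chinese Restaurant Process:
    occurrence i joins existing sense k with probability count_k / (i + alpha),
    or opens a new sense with probability alpha / (i + alpha)."""
    rng = random.Random(seed)
    counts = []       # counts[k] = occurrences assigned so far to sense k
    assignments = []
    for i in range(n_occurrences):
        r = rng.random() * (i + alpha)
        for k, c in enumerate(counts):
            if r < c:
                counts[k] += 1        # join an existing sense
                assignments.append(k)
                break
            r -= c
        else:
            counts.append(1)          # open a new sense
            assignments.append(len(counts) - 1)
    return assignments
```

The "rich get richer" dynamic means frequent senses absorb most occurrences while the number of senses grows only logarithmically, so the model does not need the number of senses per word fixed in advance.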
ScienceExamCER: A High-Density Fine-Grained Science-Domain Corpus for Common Entity Recognition
Named entity recognition identifies common classes of entities in text, but
these entity labels are generally sparse, limiting utility to downstream tasks.
In this work we present ScienceExamCER, a densely-labeled semantic
classification corpus of 133k mentions in the science exam domain where nearly
all (96%) of content words have been annotated with one or more fine-grained
semantic class labels including taxonomic groups, meronym groups, verb/action
groups, properties and values, and synonyms. Semantic class labels are drawn
from a manually-constructed fine-grained typology of 601 classes generated
through a data-driven analysis of 4,239 science exam questions. We show an
off-the-shelf BERT-based named entity recognition model, modified for
multi-label classification, achieves an F1 score of 0.85 on this task,
suggesting strong utility for downstream tasks in science-domain question
answering that require densely-labeled semantic classification.
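The multi-label modification mentioned above typically means replacing a softmax over classes with an independent sigmoid per class, so a single content word can carry one or more labels. A minimal sketch of that decision rule, assuming a 0.5 threshold and illustrative class indices (not the corpus's real 601-class typology):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def multilabel_predict(logits: list, threshold: float = 0.5) -> list:
    """Multi-label decision rule: instead of a softmax picking the single best
    class, each class gets an independent sigmoid probability, and every class
    above the threshold is emitted, so a mention can carry several labels."""
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]

# A word might exceed the threshold for both a taxonomic class and a
# property class at once (indices here are purely illustrative).
print(multilabel_predict([2.3, -1.0, 0.8]))   # → [0, 2]
```

Training such a head uses a per-class binary cross-entropy loss rather than categorical cross-entropy, which is the standard change needed when labels are not mutually exclusive.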
Tint, the Swiss-Army Tool for Natural Language Processing in Italian
In this paper we present the latest version of Tint, an open-source, fast, and extendable Natural Language Processing suite for Italian based on Stanford CoreNLP. The new release includes a set of text-processing components for fine-grained linguistic analysis, from tokenization to relation extraction, including part-of-speech tagging, morphological analysis, lemmatization, multi-word-expression recognition, dependency parsing, named-entity recognition, keyword extraction, and much more. Tint is written in Java and freely distributed under the GPL license. Although some modules do not perform at a state-of-the-art level, Tint reaches very good accuracy in all modules and can easily be used out of the box.
MultiNERD: A Multilingual, Multi-Genre and Fine-Grained Dataset for Named Entity Recognition (and Disambiguation)
Named Entity Recognition (NER) is the task of identifying named entities in texts and classifying them through specific semantic categories, a process which is crucial for a wide range of NLP applications. Current datasets for NER focus mainly on coarse-grained entity types, tend to consider a single textual genre, and cover a narrow set of languages, thus limiting the general applicability of NER systems. In this work, we design a new methodology for automatically producing NER annotations, and address the aforementioned limitations by introducing a novel dataset that covers 10 languages, 15 NER categories, and 2 textual genres. We also introduce a manually-annotated test set, and extensively evaluate the quality of our novel dataset on both this new test set and standard benchmarks for NER. In addition, in our dataset, we include: i) disambiguation information to enable the development of multilingual entity linking systems, and ii) image URLs to encourage the creation of multimodal systems. We release our dataset at https://github.com/Babelscape/multinerd
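NER dataset quality is typically evaluated with strict entity-level F1: a predicted entity counts only if both its span and its type match the gold annotation exactly. A self-contained sketch of that standard metric over BIO-tagged sequences (this is the conventional scoring scheme, not necessarily the paper's exact scorer):

```python
def bio_spans(tags):
    """Extract (start, end, type) entity spans from a BIO tag sequence."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):        # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.append((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
        elif tag.startswith("I-") and etype != tag[2:]:
            # ill-formed I- tag: close any open span and start a new entity
            if start is not None:
                spans.append((start, i, etype))
            start, etype = i, tag[2:]
    return spans

def entity_f1(gold, pred):
    """Strict entity-level F1: a prediction counts only on exact span+type match."""
    g, p = set(bio_spans(gold)), set(bio_spans(pred))
    if not g or not p:
        return 0.0
    tp = len(g & p)
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall) if tp else 0.0
```

Scoring at the entity level rather than the token level penalizes partial matches, which is what makes it a meaningful check on automatically produced annotations against a manually-annotated test set.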