Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
We describe the CoNLL-2003 shared task: language-independent named entity
recognition. We give background information on the data sets (English and
German) and the evaluation method, present a general overview of the systems
that have taken part in the task and discuss their performance
Neural Reranking for Named Entity Recognition
We propose a neural reranking system for named entity recognition (NER). The
basic idea is to leverage recurrent neural network models to learn
sentence-level patterns that involve named entity mentions. In particular,
given an output sentence produced by a baseline NER model, we replace all
entity mentions, such as \textit{Barack Obama}, with their entity types, such
as \textit{PER}. The resulting sentence patterns carry direct output
information, yet are less sparse because the specific named entities are abstracted away. For example,
"PER was born in LOC" can be such a pattern. LSTM and CNN structures are
utilised for learning deep representations of such sentences for reranking.
Results show that our system can significantly improve the NER accuracies over
two different baselines, giving the best reported results on a standard
benchmark.

Comment: Accepted as regular paper by RANLP 201
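The pattern-construction step described in the abstract, replacing each entity mention with its entity type, can be sketched as follows (a minimal illustration assuming BIO-tagged input, not the authors' code):

```python
def to_pattern(tokens, bio_tags):
    """Replace each named-entity span with its entity type, e.g.
    ['Barack', 'Obama', 'was', 'born', 'in', 'Hawaii'] with tags
    ['B-PER', 'I-PER', 'O', 'O', 'O', 'B-LOC'] -> 'PER was born in LOC'."""
    out = []
    for token, tag in zip(tokens, bio_tags):
        if tag.startswith("B-"):
            out.append(tag[2:])   # start of an entity: emit its type once
        elif tag.startswith("I-"):
            continue              # inside an entity: type already emitted
        else:
            out.append(token)     # ordinary word, kept verbatim
    return " ".join(out)
```

The resulting patterns are what the reranker's LSTM/CNN encoders would consume.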
Multi-Engine Approach for Named Entity Recognition in Bengali
PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200
Capsule network with shortcut routing
This study introduces "shortcut routing," a novel routing mechanism in
capsule networks that addresses computational inefficiencies by directly
activating global capsules from local capsules, eliminating intermediate
layers. An attention-based approach with fuzzy coefficients is also explored
for improved efficiency. Experimental results on the MNIST, smallNORB, and affNIST
datasets show comparable classification performance, achieving accuracies of
99.52%, 93.91%, and 89.02% respectively. The proposed fuzzy-based and
attention-based routing methods reduce the number of calculations by factors
of 1.42 and 2.5 respectively compared to EM routing, highlighting their computational
advantages in capsule networks. These findings contribute to the advancement of
efficient and accurate hierarchical pattern representation models.

Comment: 8 pages, published at IEICE Transactions on Fundamentals of
Electronics Communications and Computer Sciences E104.A(8
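The idea of routing local capsules directly to global capsules with attention-style coefficients can be sketched roughly as below. This is a speculative illustration under assumptions: the shapes, the single-shot (non-iterative) agreement scoring, and all names are ours, not the paper's.

```python
import numpy as np

def shortcut_attention_routing(local_caps, W):
    """Illustrative one-shot attention routing from local to global capsules.
    local_caps: (n_local, d_in); W: (n_global, d_in, d_out) transformation
    matrices. Coefficients come from a softmax over vote/estimate agreement,
    replacing EM routing's iterative updates (a sketch, not the paper's code)."""
    votes = np.einsum("ni,gio->ngo", local_caps, W)   # each local capsule votes
    mean = votes.mean(axis=0)                         # crude global estimate
    logits = np.einsum("ngo,go->ng", votes, mean)     # vote/estimate agreement
    logits -= logits.max(axis=1, keepdims=True)       # numerically stable softmax
    c = np.exp(logits)
    c /= c.sum(axis=1, keepdims=True)                 # attention coefficients
    return np.einsum("ng,ngo->go", c, votes)          # weighted global capsules
```

Because there is no iteration, the cost is a single pass over the votes, which is the source of the computational savings the abstract reports.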
Improving the Performance of a Named Entity Extractor by Applying a Stacking Scheme
In this paper we investigate how to improve the performance
of a Named Entity Extraction (NEE) system by applying machine
learning techniques and corpus transformation. The main resources used
in our experiments are the publicly available tagger TnT and a corpus
of Spanish texts in which named entity occurrences are tagged with
BIO tags. We split the NEE task into two subtasks: 1) Named Entity
Recognition (NER) that involves the identification of the group of words
that make up the name of an entity and 2) Named Entity Classification
(NEC) that determines the category of a named entity. We have focused
our work on the improvement of the NER task, generating four different
taggers with the same training corpus and combining them using a
stacking scheme. We improve the baseline of the NER task (Fβ=1 value
of 81.84) up to a value of 88.37. When a NEC module is added to the
NER system the performance of the whole NEE task is also improved.
A value of 70.47 is achieved from a baseline of 66.07
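A simple way to combine the outputs of several base taggers, as in the stacking scheme above, is a per-token vote over their BIO tags (a simplified stand-in for the paper's approach, which trains a meta-learner on the base taggers' outputs):

```python
from collections import Counter

def stack_bio(predictions):
    """Combine per-token BIO tags from several base taggers by majority vote.
    predictions: one tag sequence per tagger, all over the same tokens."""
    combined = []
    for token_tags in zip(*predictions):  # tags for one token, across taggers
        tag, _ = Counter(token_tags).most_common(1)[0]
        combined.append(tag)
    return combined
```

A trained meta-classifier can outperform plain voting because it learns which tagger to trust for which tag, but voting already shows how disagreements between the four taggers get resolved token by token.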
UH-MatCom at eHealth-KD Challenge 2020: Deep-Learning and Ensemble Models for Knowledge Discovery in Spanish Documents
The eHealth-KD challenge hosted at IberLEF 2020 proposes a set of resources and evaluation scenarios to encourage the development of systems for the automatic extraction of knowledge from unstructured text. This paper describes the system presented by team UH-MatCom in the challenge. Several deep-learning models are trained and ensembled to automatically extract relevant entities and relations from plain text documents. State-of-the-art techniques such as BERT, Bi-LSTM, and CRF are applied. The use of external knowledge sources such as ConceptNet is explored. The system achieved average results in the challenge, ranking fifth across all different evaluation scenarios. The ensemble method produced a slight improvement in performance. Additional work needs to be done for the relation extraction task to successfully benefit from external knowledge sources.

This research has been partially funded by the University of Alicante and the University of Havana, the Generalitat Valenciana (Conselleria d'Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089)
Cross-sentence contexts in Named Entity Recognition with BERT
Named entity recognition (NER) is a task under the broader scope of Natural Language Processing (NLP). The computational task of NER is often cast as a sequence classification task where the goal is to label each word (or token) in the input sequence with a class from a predefined set of classes. The development of deep transfer learning methodologies in recent years has greatly influenced both NLP and NER. There have been improvements in the performance of NER models, but at the same time the use of cross-sentence context, the sentences around the sentence of interest, has diminished in NER methods. Many of the current methods use inputs that consist of only one sentence of text at a time. It is nevertheless clear that useful information for NER is often also found elsewhere in the text. Recent self-attention models like BERT can both capture long-distance relationships in input and represent inputs consisting of several sentences. This creates opportunities for making use of cross-sentence information in NLP tasks. This thesis presents a systematic study exploring the use of cross-sentence information for NER using BERT models in five languages. The study shows that adding context as additional sentences to BERT input systematically increases NER performance. Adding multiple sentences in input samples also allows the study of predictions for the sentences in different contexts. A straightforward method of Contextual Majority Voting (CMV) is proposed to combine these different predictions. The study demonstrates that using CMV increases NER performance even further. Evaluation of the proposed methods on established datasets, including the Conference on Computational Natural Language Learning (CoNLL) 2002 and 2003 NER benchmarks, demonstrates that the proposed approach can improve on the state-of-the-art NER results for English, Dutch, and Finnish, achieves the best reported BERT-based results for German, and is on par with other BERT-based approaches for Spanish.
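The Contextual Majority Voting idea, tagging each sentence inside several sliding context windows and voting per token over the resulting predictions, can be sketched as follows (an illustrative reconstruction under assumptions; the `predict` interface and window handling are ours, not the thesis code):

```python
from collections import Counter

def cmv(sentences, predict, window=1):
    """Contextual Majority Voting sketch. `predict(block)` is assumed to
    return one tag sequence per sentence in `block`, a list of consecutive
    sentences fed to the model together. Each sentence is tagged in several
    overlapping windows; the final tag per token is the majority vote."""
    votes = [[Counter() for _ in sent] for sent in sentences]
    for start in range(len(sentences)):
        block = sentences[start:start + 2 * window + 1]  # sentence + context
        for offset, tags in enumerate(predict(block)):
            for i, tag in enumerate(tags):
                votes[start + offset][i][tag] += 1       # accumulate one vote
    return [[c.most_common(1)[0][0] for c in sent] for sent in votes]
```

Because each sentence appears in multiple windows, tokens whose labels are unstable across contexts get resolved by the vote, which is where the reported extra gain over single-context tagging comes from.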
The methods implemented for this work are published under open licenses