Towards Automatic Creation of Annotations to Foster Development of Named Entity Recognizers

Matos, Emanuel; Miguel, Pedro

Authors: Emanuel Matos, Pedro Miguel
Publication date: 1 January 2021
Publisher: OASIcs - OpenAccess Series in Informatics. 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

Named Entity Recognition (NER) is an essential step for many natural language processing tasks, including Information Extraction. Despite recent advances, particularly using deep learning techniques, the creation of accurate named entity recognizers remains a complex task, highly dependent on the availability of annotated data. To foster the development of NER systems for new domains, it is crucial to obtain the required large volumes of annotated data with little or no manual labor. This paper proposes a system that creates the annotated data automatically by resorting to a set of existing NERs and information sources (DBpedia). The approach was tested with documents from the Tourism domain. Distinct methods were applied to decide the final named entities and their respective tags. The results show that this approach can increase the confidence in the annotations and/or augment the number of categories that can be annotated. This paper also presents examples of new NERs that can be rapidly created with the obtained annotated data. The annotated data, combined with the possibility of applying both the ensemble of NER systems and the new Gazetteer-based NERs to large corpora, creates the necessary conditions to explore recent state-of-the-art neural deep learning approaches to NER (e.g., BERT) in domains with scarce or nonexistent training data.
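
The abstract does not say which methods were used to decide the final entities, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each existing NER returns character-offset spans, and it uses exact-match majority voting as one plausible combination method. It then builds a simple gazetteer from the accepted entities and uses that gazetteer to tag new text by greedy longest match. The span format, the function names (`majority_vote`, `build_gazetteer`, `gazetteer_tag`), the vote threshold, and the example spans are all hypothetical. DBpedia lookups are not modeled.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical annotation format (not from the paper): (start_char, end_char, label).
Span = Tuple[int, int, str]


def majority_vote(predictions: List[Set[Span]], min_votes: int) -> List[Span]:
    """Combine the outputs of several NER systems by voting (assumed method).

    A (span, label) pair is kept if at least `min_votes` systems proposed it.
    Span boundaries must match exactly. If more than one label for the same
    span passes the threshold, the most voted label wins (ties: alphabetical).
    """
    votes = Counter(span for system in predictions for span in system)
    best: Dict[Tuple[int, int], Tuple[int, str]] = {}
    for (start, end, label), count in votes.items():
        if count < min_votes:
            continue
        prev = best.get((start, end))
        if prev is None or count > prev[0] or (count == prev[0] and label < prev[1]):
            best[(start, end)] = (count, label)
    return sorted((s, e, label) for (s, e), (_, label) in best.items())


def build_gazetteer(text: str, entities: List[Span]) -> Dict[str, str]:
    """Map each accepted surface form (lower-cased) to its label."""
    return {text[s:e].lower(): label for s, e, label in entities}


def gazetteer_tag(text: str, gazetteer: Dict[str, str]) -> List[Span]:
    """Tag `text` by greedy longest match of gazetteer entries over word tokens."""
    tokens = list(re.finditer(r"\w+", text))
    max_len = max((len(re.findall(r"\w+", k)) for k in gazetteer), default=0)
    found: List[Span] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, shrinking one token at a time.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            start, end = tokens[i].start(), tokens[i + n - 1].end()
            label = gazetteer.get(text[start:end].lower())
            if label is not None:
                found.append((start, end, label))
                i += n
                break
        else:
            i += 1  # no gazetteer entry starts at this token
    return found


# Illustrative data: three hypothetical NER outputs for the same sentence.
doc = "Visit Porto and the Torre dos Clérigos in Portugal."
sys_a = {(6, 11, "LOC"), (20, 38, "LOC"), (42, 50, "LOC")}
sys_b = {(6, 11, "LOC"), (20, 38, "ORG")}
sys_c = {(6, 11, "ORG"), (20, 38, "LOC")}

accepted = majority_vote([sys_a, sys_b, sys_c], min_votes=2)
print(accepted)  # [(6, 11, 'LOC'), (20, 38, 'LOC')]

gaz = build_gazetteer(doc, accepted)
print(gazetteer_tag("The Torre dos Clérigos is in Porto.", gaz))
# [(4, 22, 'LOC'), (29, 34, 'LOC')]
```

In this toy run, "Portugal" is dropped because only one system proposed it, and each disputed label is resolved in favor of the two-vote majority. Requiring exact span agreement is a deliberately strict choice. A real ensemble might instead merge overlapping spans or weight systems by their known precision.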

Available Versions: Dagstuhl Research Online Publication Server (oai:drops-oai.dagstuhl.de:1442...)