1,920 research outputs found

Named Entity Recognition without Gazetteers

Author: Grover Claire
Mikheev Andrei
Moens Marc
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/1999

It is often claimed that Named Entity recognition systems need extensive gazetteers (lists of names of people, organisations, locations, and other named entities). Indeed, the compilation of such gazetteers is sometimes mentioned as a bottleneck in the design of Named Entity recognition systems. We report on a Named Entity recognition system which combines rule-based grammars with statistical (maximum entropy) models. We report on the system's performance with gazetteers of different types and different sizes, using test material from the MUC-7 competition. We show that, for the text type and task of this competition, it is sufficient to use relatively small gazetteers of well-known names, rather than large gazetteers of low-frequency names. We conclude with observations about the domain independence of the competition and of our experiments.

CiteSeerX

Crossref

Edinburgh Research Explorer
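
For readers unfamiliar with the term used in the abstract above, the following is a minimal sketch of what a gazetteer lookup does: it matches token sequences against a list of known names. It is not the authors' system (which combines rule-based grammars with maximum entropy models). The gazetteer entries, the type labels, and the function name `tag_with_gazetteer` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer: lower-cased token tuples mapped to an entity type.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("new", "york"): "LOCATION",
    ("microsoft",): "ORGANISATION",
    ("bill", "gates"): "PERSON",
}
# Longest entry, in tokens; bounds the lookup window.
MAX_LEN = max(len(key) for key in GAZETTEER)


def tag_with_gazetteer(tokens: List[str]) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup; returns (phrase, type) pairs in text order."""
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink it.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in GAZETTEER:
                matches.append((" ".join(tokens[i:i + n]), GAZETTEER[key]))
                i += n
                break
        else:
            # No gazetteer entry starts at this token.
            i += 1
    return matches


if __name__ == "__main__":
    sentence = "Bill Gates founded Microsoft before moving to New York".split()
    print(tag_with_gazetteer(sentence))
```

A lookup of this kind finds only names that are already on the list, which is why the size and composition of the gazetteer matter in the experiments the abstract describes.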

Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity

Author: Matwin Stan
Nadeau David
Turney Peter D.
Publication date: 01/01/2006

In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system’s architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands).

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive

KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition

Author: Del Corro Luciano
Dembelova Tatiana
Hoffart Johannes
Seyler Dominic
Weikum Gerhard
Publication date: 01/01/2017

KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy. We also compare KnowNER with state-of-the-art NER approaches across three languages (English, German and Spanish), where it performs amongst state-of-the-art systems in all of them.

arXiv.org e-Print Archive

MPG.PuRe

Semi-supervised sequence tagging with bidirectional language models

Author: Ammar Waleed
Bhagavatula Chandra
Peters Matthew E.
Power Russell
Publication date: 01/01/2017

Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context-sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks. We evaluate our model on two standard datasets for named entity recognition (NER) and chunking, and in both cases achieve state-of-the-art results, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers. Comment: To appear in ACL 201

arXiv.org e-Print Archive

Crossref

Boosting Named Entity Recognition with Neural Character Embeddings

Author: Guimarães Victor
Santos Cicero Nogueira dos
Publication date: 01/01/2015

Most state-of-the-art named entity recognition (NER) systems rely on handcrafted features and on the output of other NLP tasks such as part-of-speech (POS) tagging and text chunking. In this work we propose a language-independent NER system that uses automatically learned features only. Our approach is based on the CharWNN deep neural network, which uses word-level and character-level representations (embeddings) to perform sequential classification. We perform an extensive number of experiments using two annotated corpora in two different languages: the HAREM I corpus, which contains texts in Portuguese, and the SPA CoNLL-2002 corpus, which contains texts in Spanish. Our experimental results shed light on the contribution of neural character embeddings for NER. Moreover, we demonstrate that the same neural network which has been successfully applied to POS tagging can also achieve state-of-the-art results for language-independent NER, using the same hyperparameters, and without any handcrafted features. For the HAREM I corpus, CharWNN outperforms the state-of-the-art system by 7.9 points in the F1-score for the total scenario (ten NE classes), and by 7.2 points in the F1 for the selective scenario (five NE classes). Comment: 9 page

arXiv.org e-Print Archive

Crossref

A geo-temporal information extraction service for processing descriptive metadata in digital libraries

Author: Borbinha José
Manguinhas H.
Martins Bruno
Siabato Vaca Willington Libardo
Publication venue: E.T.S.I. en Topografía, Geodesia y Cartografía (UPM)
Publication date: 01/01/2009

In the context of digital map libraries, resources are usually described according to metadata records that define the relevant subject, location, time-span, format and keywords. With regard to locations and time-spans, metadata records are often incomplete or provide information in a way that is not machine-understandable (e.g. textual descriptions). This paper presents techniques for extracting geotemporal information from text, using relatively simple text mining methods that leverage a Web gazetteer service. The idea is to go from human-made geotemporal referencing (i.e. using place and period names in textual expressions) to geo-spatial coordinates and time-spans. A prototype system, implementing the proposed methods, is described in detail. Experimental results demonstrate the efficiency and accuracy of the proposed approaches.

Archivo Digital UPM

Named Entity Recognition for English Language Using Deep Learning Based Bi Directional LSTM-RNN

Author: Babu A.Ramesh
Duppati Sanjay Kumar
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 17/05/2023

Named entity recognition (NER) is important in applications such as data retrieval and extraction, text summarization, machine translation, and question answering (Q-A). While several investigations have been carried out for NER in English, the literature survey indicates that a high-accuracy tool has yet to be designed. This paper proposes an English named entity recognition methodology based on a bi-directional long short-term memory recurrent neural network (LSTM-RNN). Most English-language NER systems use detailed features and handcrafted algorithms with gazetteers. The proposed model is language-independent and has no domain-specific features or handcrafted algorithms. It also depends on semantic knowledge from word vectors learned by an unsupervised learning algorithm on an unannotated corpus. It achieved state-of-the-art performance in English without any morphological research and without gazetteers of any sort. The small dataset consists of 200 sentences containing 3080 words. Feature selection and generation for capturing named entities are presented. The proposed work aims to predict the named entity of the focus words in a sentence with high accuracy, with the benefit of practical knowledge acquisition techniques.

International Journal on Recent and Innovation Trends in Computing and Communication