1,920 research outputs found

    Named Entity Recognition without Gazetteers

    It is often claimed that Named Entity recognition systems need extensive gazetteers: lists of names of people, organisations, locations, and other named entities. Indeed, the compilation of such gazetteers is sometimes mentioned as a bottleneck in the design of Named Entity recognition systems. We report on a Named Entity recognition system which combines rule-based grammars with statistical (maximum entropy) models. We report on the system's performance with gazetteers of different types and different sizes, using test material from the MUC-7 competition. We show that, for the text type and task of this competition, it is sufficient to use relatively small gazetteers of well-known names, rather than large gazetteers of low-frequency names. We conclude with observations about the domain independence of the competition and of our experiments.

    Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity

    In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system’s architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands).

    KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition

    KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy, and we compare KnowNER with state-of-the-art NER approaches across three languages (English, German and Spanish), where it performs on par with state-of-the-art systems in all three.

    Semi-supervised sequence tagging with bidirectional language models

    Pre-trained word embeddings learned from unlabeled text have become a standard component of neural network architectures for NLP tasks. However, in most cases, the recurrent network that operates on word-level representations to produce context-sensitive representations is trained on relatively little labeled data. In this paper, we demonstrate a general semi-supervised approach for adding pre-trained context embeddings from bidirectional language models to NLP systems and apply it to sequence labeling tasks. We evaluate our model on two standard datasets for named entity recognition (NER) and chunking, and in both cases achieve state-of-the-art results, surpassing previous systems that use other forms of transfer or joint learning with additional labeled data and task-specific gazetteers. Comment: To appear in ACL 201

    Boosting Named Entity Recognition with Neural Character Embeddings

    Most state-of-the-art named entity recognition (NER) systems rely on handcrafted features and on the output of other NLP tasks such as part-of-speech (POS) tagging and text chunking. In this work we propose a language-independent NER system that uses automatically learned features only. Our approach is based on the CharWNN deep neural network, which uses word-level and character-level representations (embeddings) to perform sequential classification. We perform an extensive number of experiments using two annotated corpora in two different languages: the HAREM I corpus, which contains texts in Portuguese, and the SPA CoNLL-2002 corpus, which contains texts in Spanish. Our experimental results shed light on the contribution of neural character embeddings for NER. Moreover, we demonstrate that the same neural network which has been successfully applied to POS tagging can also achieve state-of-the-art results for language-independent NER, using the same hyperparameters, and without any handcrafted features. For the HAREM I corpus, CharWNN outperforms the state-of-the-art system by 7.9 points in the F1-score for the total scenario (ten NE classes), and by 7.2 points in the F1 for the selective scenario (five NE classes).

    A geo-temporal information extraction service for processing descriptive metadata in digital libraries

    In the context of digital map libraries, resources are usually described according to metadata records that define the relevant subject, location, time-span, format and keywords. Regarding locations and time-spans, metadata records are often incomplete, or they provide information in a way that is not machine-understandable (e.g. textual descriptions). This paper presents techniques for extracting geotemporal information from text, using relatively simple text mining methods that leverage a Web gazetteer service. The idea is to go from human-made geotemporal referencing (i.e. using place and period names in textual expressions) to geo-spatial coordinates and time-spans. A prototype system implementing the proposed methods is described in detail. Experimental results demonstrate the efficiency and accuracy of the proposed approaches.

    Named Entity Recognition for English Language Using Deep Learning Based Bi Directional LSTM-RNN

    NER is important in applications such as data retrieval and extraction, text summarization, machine translation, and question answering (Q-A). While several investigations have been carried out for NER in English, the literature survey indicates that a high-accuracy tool has yet to be designed. This paper proposes an English named entity recognition method based on a bidirectional long short-term memory recurrent neural network (LSTM-RNN). Most English-language NER systems use detailed features and handcrafted algorithms with gazetteers. The proposed model is language-independent and has no domain-specific features or handcrafted algorithms. It also depends on semantic knowledge from word vectors learned by an unsupervised learning algorithm on an unannotated corpus. It achieved state-of-the-art performance in English without any morphological analysis and without gazetteers of any sort. A small dataset of 200 sentences comprises 3080 words. Feature selection and generation are presented for capturing named entities. The proposed work aims to predict the named entity of the focus words in a sentence with high accuracy, with the benefit of practical knowledge acquisition techniques.