83,886 research outputs found
KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition
KnowNER is a multilingual Named Entity Recognition (NER) system that
leverages different degrees of external knowledge. A novel modular framework
divides the knowledge into four categories according to the depth of knowledge
they convey. Each category consists of a set of features automatically
generated from different information sources (such as a knowledge-base, a list
of names or document-specific semantic annotations) and is used to train a
conditional random field (CRF). Since those information sources are usually
multilingual, KnowNER can be easily trained for a wide range of languages. In
this paper, we show that the incorporation of deeper knowledge systematically
boosts accuracy and compare KnowNER with state-of-the-art NER approaches across
three languages (i.e., English, German and Spanish) performing amongst
state-of-the art systems in all of them
Automatic extraction of knowledge from web documents
A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper
Entity Recognition at First Sight: Improving NER with Eye Movement Information
Previous research shows that eye-tracking data contains information about the
lexical and syntactic properties of text, which can be used to improve natural
language processing models. In this work, we leverage eye movement features
from three corpora with recorded gaze information to augment a state-of-the-art
neural model for named entity recognition (NER) with gaze embeddings. These
corpora were manually annotated with named entity labels. Moreover, we show how
gaze features, generalized on word type level, eliminate the need for recorded
eye-tracking data at test time. The gaze-augmented models for NER using
token-level and type-level features outperform the baselines. We present the
benefits of eye-tracking features by evaluating the NER models on both
individual datasets as well as in cross-domain settings.Comment: Accepted at NAACL-HLT 201
Weakly Supervised Cross-Lingual Named Entity Recognition via Effective Annotation and Representation Projection
The state-of-the-art named entity recognition (NER) systems are supervised
machine learning models that require large amounts of manually annotated data
to achieve high accuracy. However, annotating NER data by human is expensive
and time-consuming, and can be quite difficult for a new language. In this
paper, we present two weakly supervised approaches for cross-lingual NER with
no human annotation in a target language. The first approach is to create
automatically labeled NER data for a target language via annotation projection
on comparable corpora, where we develop a heuristic scheme that effectively
selects good-quality projection-labeled data from noisy data. The second
approach is to project distributed representations of words (word embeddings)
from a target language to a source language, so that the source-language NER
system can be applied to the target language without re-training. We also
design two co-decoding schemes that effectively combine the outputs of the two
projection-based approaches. We evaluate the performance of the proposed
approaches on both in-house and open NER data for several target languages. The
results show that the combined systems outperform three other weakly supervised
approaches on the CoNLL data.Comment: 11 pages, The 55th Annual Meeting of the Association for
Computational Linguistics (ACL), 201
- …