An integrated architecture for shallow and deep processing
We present an architecture for the integration of shallow and deep NLP components which is aimed at the flexible combination of different language technologies for a range of practical current and future applications. In particular, we describe the integration of a high-level HPSG parsing system with different high-performance shallow components, ranging from named entity recognition to chunk parsing and shallow clause recognition. The NLP components enrich a representation of natural language text with layers of new XML meta-information using a single shared data structure, called the text chart. We describe details of the integration methods, and show how information extraction and language checking applications for real-world German text benefit from a deep grammatical analysis.
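The idea of several components writing annotation layers into one shared structure can be illustrated with a small sketch. This is not the authors' implementation: the abstract describes XML layers, whereas the in-memory `TextChart` and `Annotation` classes, their method names, and the example sentence below are hypothetical stand-ins chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    layer: str   # component that produced the annotation, e.g. "ner" or "chunk"
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str


@dataclass
class TextChart:
    """Hypothetical shared structure: one text, many annotation layers."""
    text: str
    annotations: list = field(default_factory=list)

    def add(self, layer, start, end, label):
        # Reject spans that do not lie inside the text.
        if not 0 <= start < end <= len(self.text):
            raise ValueError("span outside text")
        self.annotations.append(Annotation(layer, start, end, label))

    def layer(self, name):
        # All annotations written by one component.
        return [a for a in self.annotations if a.layer == name]

    def covered_by(self, start, end):
        # Annotations from any layer that lie inside [start, end).
        return [a for a in self.annotations if start <= a.start and a.end <= end]


chart = TextChart("Siemens kauft Firma in Berlin.")
chart.add("ner", 0, 7, "ORG")      # shallow component 1
chart.add("ner", 23, 29, "LOC")
chart.add("chunk", 14, 29, "NP")   # shallow component 2
```

A deep parser could then query `chart.covered_by(14, 29)` to see which shallow results fall inside a span before analysing it, which is one plausible way the layers might be combined.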
Joint Entity Extraction and Assertion Detection for Clinical Text
Negative medical findings are prevalent in clinical reports, yet
discriminating them from positive findings remains a challenging task for
information extraction. Most of the existing systems treat this task as a
pipeline of two separate tasks, i.e., named entity recognition (NER) and
rule-based negation detection. We consider this as a multi-task problem and
present a novel end-to-end neural model to jointly extract entities and
negations. We extend a standard hierarchical encoder-decoder NER model and
first adopt a shared encoder followed by separate decoders for the two tasks.
This architecture performs considerably better than the previous rule-based and
machine learning-based systems. To overcome the problem of increased parameter
size especially for low-resource settings, we propose the Conditional Softmax
Shared Decoder architecture, which achieves state-of-the-art results for NER and
negation detection on the 2010 i2b2/VA challenge dataset and a proprietary
de-identified clinical dataset. Comment: Accepted at the 57th Annual Meeting of the Association for
Computational Linguistics (ACL 2019)
Corpora and evaluation tools for multilingual named entity grammar development
We present an effort for the development of multilingual named entity grammars in a unification-based finite-state formalism (SProUT). Following an extended version of the MUC7 standard, we have developed Named Entity Recognition grammars for German, Chinese, Japanese, French, Spanish, English, and Czech. The grammars recognize person names, organizations, geographical locations, currency, time and date expressions. Subgrammars and gazetteers are shared as much as possible for the grammars of the different languages. Multilingual corpora from the business domain are used for grammar development and evaluation. The annotation format (named entity and other linguistic information) is described. We present an evaluation tool which provides detailed statistics and diagnostics, allows for partial matching of annotations, and supports user-defined mappings between different annotation and grammar output formats.
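To make the difference between exact and partial matching of annotations concrete, here is a minimal sketch. It is not the evaluation tool described above, whose matching criteria the abstract does not specify. The sketch assumes annotations are `(start, end, type)` tuples, treats two spans as a partial match when they share a type and overlap by at least one character, and pairs spans greedily; the function names and sample data are hypothetical.

```python
def overlaps(a, b):
    """True if two (start, end, type) spans have the same type and share a character."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def evaluate(gold, predicted, partial=False):
    """Return (precision, recall, F1) using exact or overlap-based matching."""
    match = overlaps if partial else (lambda g, p: g == p)
    unmatched_gold = list(gold)
    true_positives = 0
    for p in predicted:
        for g in unmatched_gold:
            if match(g, p):
                # Each gold span may be matched at most once (greedy pairing).
                unmatched_gold.remove(g)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical data: the second prediction has the right type but a wrong left boundary.
gold = [(0, 7, "ORG"), (23, 29, "LOC"), (35, 45, "DATE")]
predicted = [(0, 7, "ORG"), (20, 29, "LOC")]

strict_scores = evaluate(gold, predicted)
partial_scores = evaluate(gold, predicted, partial=True)
```

Under exact matching the boundary error counts as both a miss and a false alarm, while under partial matching it is credited, so comparing the two score sets shows how much of the error comes from span boundaries alone.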
Extracting adverse drug reactions and their context using sequence labelling ensembles in TAC2017
Adverse drug reactions (ADRs) are unwanted or harmful effects experienced
after the administration of a certain drug or a combination of drugs,
presenting a challenge for drug development and drug administration. In this
paper, we present a set of taggers for extracting adverse drug reactions and
related entities, including factors, severity, negations, drug class and
animal. The systems used a mix of rule-based, machine learning (CRF) and deep
learning (BLSTM with word2vec embeddings) methodologies in order to annotate
the data. The systems were submitted to the adverse drug reaction shared task,
organised during the Text Analysis Conference in 2017 by the National Institute of
Standards and Technology, achieving F1-scores of 76.00 and 75.61 respectively. Comment: Paper describing submission for TAC ADR shared task
Named Entity Recognition in Twitter using Images and Text
Named Entity Recognition (NER) is an important subtask of information
extraction that seeks to locate and recognise named entities. Despite recent
achievements, we still face limitations with correctly detecting and
classifying entities, particularly in short and noisy text, such as Twitter. An
important drawback of most NER approaches is their heavy dependence on
hand-crafted features and domain-specific knowledge, which are needed to achieve
state-of-the-art results. Thus, devising models to deal with such
linguistically complex contexts is still challenging. In this paper, we propose
a novel multi-level architecture that does not rely on any specific linguistic
resource or encoded rule. Unlike traditional approaches, we use features
extracted from images and text to classify named entities. Experimental tests
against state-of-the-art NER for Twitter on the Ritter dataset present
competitive results (0.59 F-measure), indicating that this approach may lead
towards better NER models. Comment: The 3rd International Workshop on Natural Language Processing for
Informal Text (NLPIT 2017), 8 pages