Information Extraction from Scientific Literature for Method Recommendation
As a research community grows, more and more papers are published each year.
As a result, there is increasing demand for improved methods for finding
relevant papers, automatically understanding their key ideas, and recommending
potential methods for a target problem. Despite advances in search engines, it
is still hard to identify new technologies according to a researcher's need.
Due to the large variety of domains and extremely limited annotated resources,
there has been relatively little work on leveraging natural language processing
in scientific recommendation. In this proposal, we aim to make scientific
recommendations by extracting scientific terms from a large collection of
scientific papers and organizing the terms into a knowledge graph. In
preliminary work, we trained a scientific term extractor using a small amount
of annotated data, and obtained state-of-the-art performance by leveraging a
large collection of unannotated papers through multiple semi-supervised
approaches. We propose to construct a knowledge graph in a way that makes
minimal use of hand-annotated data, using only the extracted terms,
unsupervised relational signals such as co-occurrence, and structural external
resources such as Wikipedia. Latent relations between scientific terms can be
learned from the graph. Recommendations will be made through graph inference
for both observed and unobserved relational pairs.
Comment: Thesis Proposal. arXiv admin note: text overlap with arXiv:1708.0607
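The unsupervised co-occurrence signal mentioned in this proposal can be sketched in a few lines. This is an illustrative toy, not the proposal's implementation, and the term sets below are invented for the example:

```python
# Illustrative toy, not the proposal's implementation: build a weighted
# co-occurrence graph over extracted terms, linking two terms whenever
# they appear in the same paper.
from collections import Counter
from itertools import combinations

def cooccurrence_graph(papers_terms):
    """Map each unordered term pair to its co-occurrence count."""
    edges = Counter()
    for terms in papers_terms:
        for a, b in combinations(sorted(set(terms)), 2):
            edges[(a, b)] += 1
    return edges

# Invented term sets standing in for terms extracted from three papers.
papers = [
    {"CRF", "NER"},
    {"CRF", "NER", "BiLSTM"},
    {"BiLSTM", "NER"},
]
g = cooccurrence_graph(papers)
assert g[("CRF", "NER")] == 2       # co-occur in papers 1 and 2
assert g[("BiLSTM", "CRF")] == 1    # co-occur only in paper 2
```

Edge weights of this kind are one of the relational signals a graph-inference step could then consume.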
Effective Use of Bidirectional Language Modeling for Transfer Learning in Biomedical Named Entity Recognition
Biomedical named entity recognition (NER) is a fundamental task in text
mining of medical documents and has many applications. Deep learning based
approaches to this task have been gaining increasing attention in recent years
as their parameters can be learned end-to-end without the need for
hand-engineered features. However, these approaches rely on high-quality
labeled data, which is expensive to obtain. To address this issue, we
investigate how to use unlabeled text data to improve the performance of NER
models. Specifically, we train a bidirectional language model (BiLM) on
unlabeled data and transfer its weights to "pretrain" an NER model with the
same architecture as the BiLM, which results in a better parameter
initialization of the NER model. We evaluate our approach on four benchmark
datasets for biomedical NER and show that it leads to a substantial improvement
in the F1 scores compared with the state-of-the-art approaches. We also show
that BiLM weight transfer leads to faster model training and that the
pretrained model requires fewer training examples to achieve a particular F1 score.
Comment: Machine Learning for Healthcare (MLHC) 2018; 12 pages, updated authors' affiliation
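The weight-transfer step can be illustrated with a hypothetical toy, not the paper's code: both models share an encoder, so the pretrained encoder weights are copied while the task-specific head stays randomly initialized. Single floats stand in for real weight tensors, and the layer names are invented:

```python
# Hypothetical toy, not the paper's code: copy pretrained encoder weights
# from a bidirectional LM into an NER model with the same encoder, and
# leave the task-specific head randomly initialized.
import random

def init_params(layer_names, seed=None):
    """Randomly initialize a parameter dict, one float per named layer."""
    rng = random.Random(seed)
    return {name: rng.random() for name in layer_names}

def transfer_weights(bilm_params, ner_params, shared_prefix="encoder."):
    """Copy weights for layers whose names start with the shared prefix."""
    transferred = dict(ner_params)
    for name, value in bilm_params.items():
        if name.startswith(shared_prefix) and name in transferred:
            transferred[name] = value
    return transferred

bilm = init_params(["encoder.embed", "encoder.lstm", "lm_head"], seed=1)
ner = init_params(["encoder.embed", "encoder.lstm", "crf_head"], seed=2)
pretrained_ner = transfer_weights(bilm, ner)
assert pretrained_ner["encoder.lstm"] == bilm["encoder.lstm"]  # copied
assert pretrained_ner["crf_head"] == ner["crf_head"]           # untouched
```

The NER model then fine-tunes from this initialization instead of from scratch, which is where the reported speed and data-efficiency gains come from.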
Dynamic Transfer Learning for Named Entity Recognition
State-of-the-art named entity recognition (NER) systems have been improving
continuously using neural architectures over the past several years. However,
many tasks including NER require large sets of annotated data to achieve such
performance. In particular, we focus on NER from clinical notes, which is one
of the most fundamental and critical problems for medical text analysis. Our
work centers on effectively adapting these neural architectures towards
low-resource settings using parameter transfer methods. We complement a
standard hierarchical NER model with a general transfer learning framework
consisting of parameter sharing between the source and target tasks, and
achieve scores significantly above the baseline architecture. These sharing
schemes require an exponential search over tied parameter sets to find an
optimal configuration. To avoid this exhaustive search, we propose Dynamic
Transfer Networks (DTN), a gated architecture that learns the appropriate
parameter sharing scheme between source and target datasets. DTN achieves
the gains of the optimized
transfer learning framework with just a single training setting, effectively
removing the need for exponential search.
Comment: AAAI 2019 Workshop on Health Intelligence
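The gating idea can be sketched as a convex mix of a shared and a target-specific representation. The function below is an illustrative guess at the core computation, not the DTN architecture itself:

```python
# Illustrative guess at the core gating computation, not the DTN
# architecture: a sigmoid gate g mixes a shared (source) representation
# with a target-specific one, h = g * h_shared + (1 - g) * h_target.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_mix(h_shared, h_target, gate_logit):
    """Convex combination of two representations, weighted by the gate."""
    g = sigmoid(gate_logit)
    return [g * s + (1.0 - g) * t for s, t in zip(h_shared, h_target)]

# A zero logit gives g = 0.5, an equal mix of both paths.
assert gated_mix([1.0, 2.0], [3.0, 4.0], 0.0) == [2.0, 3.0]
# A large positive logit routes (almost) everything through the shared path.
assert abs(gated_mix([1.0], [3.0], 20.0)[0] - 1.0) < 1e-6
```

Because the gate logit is a learnable parameter, training can discover per-unit how much to share, instead of enumerating sharing configurations.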
Knowledge-Augmented Language Model and its Application to Unsupervised Named-Entity Recognition
Traditional language models are unable to efficiently model entity names
observed in text. All but the most popular named entities appear infrequently
in text, providing insufficient context. Recent efforts have recognized that
context can be generalized between entity names that share the same type (e.g.,
\emph{person} or \emph{location}) and have equipped language models with access
to an external knowledge base (KB). Our Knowledge-Augmented Language Model
(KALM) continues this line of work by augmenting a traditional model with a KB.
Unlike previous methods, however, we train with an end-to-end predictive
objective optimizing the perplexity of text. We do not require any additional
information such as named entity tags. In addition to improving language
modeling performance, KALM learns to recognize named entities in an entirely
unsupervised way by using entity type information latent in the model. On a
Named Entity Recognition (NER) task, KALM achieves performance comparable with
state-of-the-art supervised models. Our work demonstrates that named entities
(and possibly other types of world knowledge) can be modeled successfully using
predictive learning and training on large corpora of text without any
additional information.
Comment: NAACL 2019; updated to cite Zhou et al. (2018) EMNLP as related work
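The latent-type idea can be illustrated with a toy mixture model. The distributions below are invented, and this is a back-of-the-envelope sketch rather than KALM's actual parameterization:

```python
# Back-of-the-envelope sketch with invented distributions, not KALM's
# actual parameterization: the next-word probability marginalizes over
# latent entity types, P(w | ctx) = sum_t P(t | ctx) * P(w | t).
def next_word_prob(word, type_posterior, type_vocab_dist):
    return sum(p_t * type_vocab_dist[t].get(word, 0.0)
               for t, p_t in type_posterior.items())

type_posterior = {"general": 0.7, "person": 0.3}   # P(type | context)
type_vocab_dist = {
    "general": {"the": 0.5, "works": 0.5},
    "person": {"alice": 0.6, "bob": 0.4},
}
p = next_word_prob("alice", type_posterior, type_vocab_dist)
assert abs(p - 0.18) < 1e-9   # only the "person" type puts mass on "alice"
```

Reading off which latent type contributes most probability mass to a token is what turns such a model into an unsupervised named-entity recognizer.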
A Practical Incremental Learning Framework For Sparse Entity Extraction
This work addresses challenges arising from extracting entities from textual
data, including the high cost of data annotation, model accuracy, selecting
appropriate evaluation criteria, and the overall quality of annotation. We
present a framework that integrates Entity Set Expansion (ESE) and Active
Learning (AL) to reduce the annotation cost of sparse data and provide an
online evaluation method as feedback. This incremental and interactive learning
framework allows for rapid annotation and subsequent extraction of sparse data
while maintaining high accuracy. We evaluate our framework on three publicly
available datasets and show that it drastically reduces the cost of sparse
entity annotation, by an average of 85% and 45% to reach F-scores of 0.9 and
1.0, respectively. Moreover, the method exhibited robust performance across all
datasets.
Comment: https://www.aclweb.org/anthology/C18-1059
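The Active Learning half of such a loop boils down to picking the examples the current model is least certain about. The sketch below assumes a binary entity/non-entity score per candidate, which is an illustrative simplification rather than the paper's method:

```python
# Illustrative simplification, not the paper's method: assume the model
# produces a P(entity) score per unlabeled candidate, and pick the
# `budget` items whose score is closest to 0.5 (maximum uncertainty).
def select_for_annotation(pool, scores, budget):
    """Return the `budget` pool items the model is least certain about."""
    ranked = sorted(zip(pool, scores), key=lambda p: abs(p[1] - 0.5))
    return [item for item, _ in ranked[:budget]]

pool = ["acme corp", "the", "dr. smith", "and", "berlin office"]
scores = [0.52, 0.99, 0.48, 0.97, 0.55]   # model P(entity) per candidate
picked = select_for_annotation(pool, scores, 2)
assert picked == ["acme corp", "dr. smith"]  # most uncertain candidates
```

Each round, the selected items are annotated, the model is retrained, and the scores are refreshed, which is what makes such a framework incremental and interactive.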
Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models
Many business documents processed in modern NLP and IR pipelines are visually
rich: in addition to text, their semantics can also be captured by visual
traits such as layout, format, and fonts. We study the problem of information
extraction from visually rich documents (VRDs) and present a model that
combines the power of large pre-trained language models and graph neural
networks to efficiently encode both textual and visual information in business
documents. We further introduce new fine-tuning objectives to improve in-domain
unsupervised fine-tuning to better utilize large amounts of unlabeled in-domain
data. We experiment on real-world invoice and resume datasets and show that
the proposed method outperforms strong text-based RoBERTa baselines by 6.3%
absolute F1 on invoices and 4.7% absolute F1 on resumes. When evaluated in a
few-shot setting, our method requires up to 30x less annotation data than the
baseline to achieve the same level of performance at ~90% F1.
Comment: 10 pages, to appear in SIGIR 2020 Industry Track
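One simple way to combine textual and visual signals per node, in the spirit of the abstract above, is to append normalized bounding-box coordinates to the text embedding. The exact feature scheme here is assumed for illustration, not taken from the paper:

```python
# Assumed feature scheme, not taken from the paper: fuse text and layout
# by appending normalized bounding-box coordinates to the text embedding.
def node_features(text_embedding, bbox, page_width, page_height):
    """Feature vector for one text box in a visually rich document."""
    x0, y0, x1, y1 = bbox
    layout = [x0 / page_width, y0 / page_height,
              x1 / page_width, y1 / page_height]
    return list(text_embedding) + layout

# Toy 2-d "text embedding" and a box on a 1000x1000 page.
feat = node_features([0.1, 0.2], (100, 50, 300, 80), 1000, 1000)
assert feat == [0.1, 0.2, 0.1, 0.05, 0.3, 0.08]
```

Vectors of this kind would form the node inputs to a graph neural network whose edges encode spatial relations between text boxes.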
Query-Based Named Entity Recognition
In this paper, we propose a new strategy for the task of named entity
recognition (NER). We cast the task as a query-based machine reading
comprehension task: e.g., the task of extracting entities of type PER is
formalized as answering the question "Which person is mentioned in the text?".
Such a strategy comes with the advantage that it solves the long-standing
issue of handling overlapping or nested entities (the same token
participating in more than one entity category) with sequence-labeling
techniques for NER. Additionally, since the query encodes informative prior
knowledge, this strategy facilitates the process of entity extraction, leading
to better performance. We evaluate the proposed model on five widely used
English and Chinese NER datasets, including MSRA, Resume, OntoNotes, ACE04
and ACE05. The proposed model sets new SOTA results on all of these datasets.
Comment: Please refer to the full version of this paper: A unified framework for named entity recognition arXiv:1910.1147
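The query-based formulation can be sketched end to end with a toy reader. The query strings and the gazetteer-based reader below are illustrative placeholders for a trained MRC model, not the paper's implementation:

```python
# Illustrative placeholders, not the paper's implementation: each entity
# type becomes a natural-language query, and a reader extracts answer spans.
QUERIES = {
    "PER": "Which person is mentioned in the text?",
    "LOC": "Which location is mentioned in the text?",
}

def ner_as_mrc(context, reader):
    """Run one reading-comprehension query per entity type."""
    return {etype: reader(query, context) for etype, query in QUERIES.items()}

def toy_reader(query, context):
    """Gazetteer lookup standing in for a trained MRC model."""
    gazetteer = {"person": ["Ada Lovelace"], "location": ["London"]}
    key = "person" if "person" in query else "location"
    return [e for e in gazetteer[key] if e in context]

spans = ner_as_mrc("Ada Lovelace was born in London.", toy_reader)
assert spans == {"PER": ["Ada Lovelace"], "LOC": ["London"]}
```

Because each type gets its own query, the same token can appear in the answers for several types, which is how this formulation sidesteps the nested-entity limitation of sequence labeling.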
UH-MatCom at eHealth-KD Challenge 2020: Deep-Learning and Ensemble Models for Knowledge Discovery in Spanish Documents
The eHealth-KD challenge hosted at IberLEF 2020 proposes a set of resources and evaluation scenarios to encourage the development of systems for the automatic extraction of knowledge from unstructured text. This paper describes the system presented by team UH-MatCom in the challenge. Several deep-learning models are trained and ensembled to automatically extract relevant entities and relations from plain text documents. State-of-the-art techniques such as BERT, Bi-LSTM, and CRF are applied, and the use of external knowledge sources such as ConceptNet is explored. The system achieved average results in the challenge, ranking fifth across all evaluation scenarios. The ensemble method produced a slight improvement in performance. Additional work is needed for the relation extraction task to successfully benefit from external knowledge sources.
This research has been partially funded by the University of Alicante and the University of Havana, the Generalitat Valenciana (Conselleria d'Educació, Investigació, Cultura i Esport) and the Spanish Government through the projects LIVING-LANG (RTI2018-094653-B-C22) and SIIA (PROMETEO/2018/089, PROMETEU/2018/089).
Learning with Joint Inference and Latent Linguistic Structure in Graphical Models
Constructing end-to-end NLP systems requires the processing of many types of linguistic information prior to solving the desired end task. A common approach to this problem is to construct a pipeline, one component for each task, with each system's output becoming input for the next. This approach poses two problems. First, errors propagate, and, much like the childhood game of telephone, combining systems in this manner can lead to unintelligible outcomes. Second, each component task requires annotated training data to act as supervision for training the model. These annotations are often expensive and time-consuming to produce, may differ from each other in genre and style, and may not match the intended application.
In this dissertation we present a general framework for constructing and reasoning on joint graphical model formulations of NLP problems. Individual models are composed using weighted Boolean logic constraints, and inference is performed using belief propagation. The systems we develop are composed of two parts: one a representation of syntax, the other a desired end task (semantic role labeling, named entity recognition, or relation extraction). By modeling these problems jointly, both models are trained in a single, integrated process, with uncertainty propagated between them. This mitigates the accumulation of errors typical of pipelined approaches.
Additionally, we propose a novel marginalization-based training method in which the error signal from end-task annotations is used to guide the induction of a constrained latent syntactic representation. This allows training in the absence of syntactic training data, where the latent syntactic structure is instead optimized to best support the end-task predictions. We find that across many NLP tasks this training method offers performance comparable to fully supervised training of each individual component, and in some instances improves upon it by learning latent structures which are more appropriate for the task.