Normalization of Disease Mentions with Convolutional Neural Networks
Normalization of disease mentions has an important role in biomedical natural language processing (BioNLP) applications, such as the construction of biomedical databases. Various disease mention normalization systems have been developed, though state-of-the-art systems either rely on candidate concept generation, or do not generalize to new concepts not seen during training.
This thesis explores the possibility of building a disease mention normalization system that both generalizes to unseen concepts and does not rely on candidate generation. To this end, it is hypothesized that modern neural networks are sophisticated enough to solve this problem. This hypothesis is tested by building a normalization system using deep learning approaches and evaluating its accuracy on the NCBI disease corpus. The system leverages semantic information in the biomedical literature by using continuous vector space representations for the strings of disease mentions and concepts. A neural encoder is trained to encode these string representations, which in principle enables the model to generalize to concepts unseen during training. The encoded strings are used to compare the similarity between concepts and a given mention. Viewing normalization as a ranking problem, the concept with the highest estimated similarity is selected as the predicted concept for the mention.
For the development of the system, synthetic data is used for pre-training to facilitate learning, and various architectures are explored. While the model succeeds in prediction without candidate concept generation, its performance is not comparable to that of state-of-the-art systems. Normalizing disease mentions without candidate generation, while still allowing the system to generalize to unseen concepts, is not trivial. Further efforts could focus on, for example, testing more neural architectures and using more sophisticated word representations.
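The encode-then-rank scheme described in this abstract can be sketched in a few lines. The sketch below substitutes a trivial character-trigram bag for the learned neural encoder, and the concept identifiers and names are hypothetical; it only illustrates ranking concepts by cosine similarity to an encoded mention, not the thesis's actual model.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    # Bag of character trigrams: a crude stand-in for the learned neural encoder.
    text = f"#{text.lower()}#"
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    # Cosine similarity between two sparse bag-of-ngram vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def normalize(mention, concepts):
    # Normalization as ranking: the concept with the highest similarity wins.
    m = char_ngrams(mention)
    return max(concepts, key=lambda cid: cosine(m, char_ngrams(concepts[cid])))

concepts = {  # hypothetical concept ids and preferred names
    "D001": "type 2 diabetes mellitus",
    "D002": "type 1 diabetes mellitus",
    "D003": "hypertension",
}
best = normalize("type 2 diabetes", concepts)
```

Because every concept name can be encoded, nothing in this ranking step restricts predictions to concepts seen during training.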
Bi-Encoders based Species Normalization -- Pairwise Sentence Learning to Rank
Motivation: Biomedical named-entity normalization involves connecting
biomedical entities with distinct database identifiers in order to facilitate
data integration across various fields of biology. Existing systems for
biomedical named entity normalization heavily rely on dictionaries, manually
created rules, and high-quality representative features such as lexical or
morphological characteristics. However, recent research has investigated the
use of neural network-based models to reduce dependence on dictionaries,
manually crafted rules, and features. Despite these advancements, the
performance of these models is still limited due to the lack of sufficiently
large training datasets. These models have a tendency to overfit small training
corpora and exhibit poor generalization when faced with previously unseen
entities, necessitating the redesign of rules and features. Contribution: We
present a novel deep learning approach for named entity normalization, treating
it as a pair-wise learning to rank problem. Our method utilizes the widely-used
information retrieval algorithm Best Matching 25 (BM25) to generate candidate
concepts, followed by Bidirectional Encoder Representations from Transformers
(BERT) to re-rank the candidate list. Notably, our approach
eliminates the need for feature engineering or rule creation. We conduct
experiments on species entity types and evaluate our method against
state-of-the-art techniques using LINNAEUS and S800 biomedical corpora. Our
proposed approach surpasses existing methods in linking entities to the NCBI
taxonomy. To the best of our knowledge, there is no existing neural
network-based approach for species normalization in the literature.
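The first stage of the pipeline described above can be illustrated with a self-contained Okapi BM25 scorer. The tokenized taxonomy names and query below are toy examples, and the BERT re-ranking stage is omitted since it requires a trained model.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Okapi BM25 over tokenized documents; returns one score per document.
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency of each term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

# Hypothetical tokenized taxonomy names; the top-scoring candidates would then
# be re-ranked by a BERT cross-encoder (omitted here).
names = [["escherichia", "coli"], ["homo", "sapiens"], ["mus", "musculus"]]
scores = bm25_scores(["escherichia", "coli"], names)
top = max(range(len(names)), key=scores.__getitem__)
```

In the full system, only the top-k entries by BM25 score are passed to the more expensive neural re-ranker.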
Medical Entity Linking using Triplet Network
Entity linking (or Normalization) is an essential task in text mining that
maps the entity mentions in the medical text to standard entities in a given
Knowledge Base (KB). This task is of great importance in the medical domain. It
can also be used for merging different medical and clinical ontologies. In this
paper, we center around the problem of disease linking or normalization. This
task is executed in two phases: candidate generation and candidate scoring. In
this paper, we present an approach to rank the candidate Knowledge Base entries
based on their similarity with the disease mention. We make use of the Triplet
Network for candidate ranking. While the existing methods have used carefully
generated sieves and external resources for candidate generation, we introduce
a robust and portable candidate generation scheme that does not rely on
hand-crafted rules. Experimental results on the standard benchmark NCBI disease
dataset demonstrate that our system outperforms the prior methods by a
significant margin.
Comment: ClinicalNLP@NAACL 201
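The candidate-scoring objective behind a Triplet Network can be sketched with the standard triplet margin loss. The embeddings below are hypothetical two-dimensional placeholders for learned mention and Knowledge Base entry vectors.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    # Hinge loss: push the gold KB entry (positive) closer to the mention
    # (anchor) than the distractor (negative), by at least `margin`.
    return max(0.0, euclidean(anchor, positive)
                    - euclidean(anchor, negative) + margin)

anchor = [0.0, 0.0]      # hypothetical disease mention embedding
gold = [0.1, 0.0]        # hypothetical embedding of the correct KB entry
distractor = [2.0, 0.0]  # hypothetical embedding of a wrong candidate
loss = triplet_loss(anchor, gold, distractor)
```

At inference time, candidates are simply ranked by their distance to the mention embedding, so no hand-crafted scoring rules are needed.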
GRAPHENE: A Precise Biomedical Literature Retrieval Engine with Graph Augmented Deep Learning and External Knowledge Empowerment
Effective biomedical literature retrieval (BLR) plays a central role in
precision medicine informatics. In this paper, we propose GRAPHENE, which is a
deep learning based framework for precise BLR. GRAPHENE consists of three
main modules: 1) graph-augmented document representation learning; 2) query
expansion and representation learning and 3) learning to rank biomedical
articles. The graph-augmented document representation learning module
constructs a document-concept graph containing biomedical concept nodes and
document nodes so that globally related biomedical concepts from external
knowledge sources can be captured; this graph is further connected to a BiLSTM
so both local and global topics can be explored. The query expansion and
representation learning module expands the query with abbreviations and
different names, and then builds a CNN-based model to convolve the expanded
query and obtain a vector representation for each query. The learning-to-rank
module minimizes a ranking loss between biomedical articles and the query to
learn the retrieval function. Experimental results on applying our system to TREC
Precision Medicine track data are provided to demonstrate its effectiveness.
Comment: CIKM 201
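The document-concept graph at the heart of GRAPHENE can be illustrated with a minimal bipartite construction. The document identifiers, texts, and concept lexicon below are hypothetical stand-ins for PubMed articles and an external knowledge source.

```python
def build_graph(docs, lexicon):
    # Link each document node to the concept nodes whose surface term it
    # contains, yielding a bipartite document-concept edge set.
    edges = set()
    for doc_id, text in docs.items():
        tokens = set(text.split())
        for term, concept_id in lexicon.items():
            if term in tokens:
                edges.add((doc_id, concept_id))
    return edges

docs = {  # hypothetical PubMed-style documents
    "PMID:1": "braf v600e melanoma targeted therapy",
    "PMID:2": "melanoma immunotherapy response",
}
lexicon = {  # hypothetical surface-term -> concept-id mapping
    "melanoma": "CONCEPT:melanoma",
    "braf": "CONCEPT:braf",
}
edges = build_graph(docs, lexicon)
```

In the full framework this graph feeds graph-augmented representation learning; here it only shows how documents sharing a concept become connected through a common concept node.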
Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking
Extraction from raw text to a knowledge base of entities and fine-grained
types is often cast as prediction into a flat set of entity and type labels,
neglecting the rich hierarchies over types and entities contained in curated
ontologies. Previous attempts to incorporate hierarchical structure have
yielded little benefit and are restricted to shallow ontologies. This paper
presents new methods using real and complex bilinear mappings for integrating
hierarchical information, yielding substantial improvement over flat
predictions in entity linking and fine-grained entity typing, and achieving new
state-of-the-art results for end-to-end models on the benchmark FIGER dataset.
We also present two new human-annotated datasets containing wide and deep
hierarchies which we will release to the community to encourage further
research in this direction: MedMentions, a collection of PubMed abstracts in
which 246k mentions have been mapped to the massive UMLS ontology; and TypeNet,
which aligns Freebase types with the WordNet hierarchy to obtain nearly 2k
entity types. In experiments on all three datasets we show substantial gains
from hierarchy-aware training.
Comment: ACL 201
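One simple way to make a typing objective hierarchy-aware is to expand each gold label to all of its ancestors, so that a coarse but correct prediction is still rewarded. This is an illustrative alternative, not the paper's real and complex bilinear mappings, and the toy type hierarchy below is hypothetical.

```python
# Toy type hierarchy, child -> parent (None marks a root).
parent = {
    "/disease/cancer/melanoma": "/disease/cancer",
    "/disease/cancer": "/disease",
    "/disease": None,
}

def expand(label, parent):
    # Walk up the hierarchy, collecting the label and all of its ancestors.
    out = []
    while label is not None:
        out.append(label)
        label = parent[label]
    return out

# The fine-grained gold label induces positive labels at every coarser level.
gold = expand("/disease/cancer/melanoma", parent)
```

Training against the expanded label set means a model predicting only "/disease/cancer" is partially correct rather than entirely wrong, which is the intuition behind hierarchical losses.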