BERT Based Clinical Knowledge Extraction for Biomedical Knowledge Graph Construction and Analysis
Background: Knowledge evolves over time, often as a result of new
discoveries or changes in the adopted methods of reasoning. New facts or
evidence may also become available, leading to new understandings of complex
phenomena. This is particularly true in the biomedical field, where scientists
and physicians constantly strive to find new methods of diagnosis,
treatment and, eventually, cures. Knowledge Graphs (KGs) offer an effective way
of organizing and retrieving the massive and growing amount of biomedical
knowledge.
Objective: We propose an end-to-end approach for knowledge extraction and
analysis from biomedical clinical notes using the Bidirectional Encoder
Representations from Transformers (BERT) model with a Conditional Random Field
(CRF) layer.
Methods: The approach is based on knowledge graphs, which can effectively
represent abstract biomedical concepts such as relationships and interactions
between medical entities. Besides offering an intuitive way to visualize these
concepts, KGs can solve more complex knowledge retrieval problems by
reducing them to simpler representations or by transforming the problems
into representations from different perspectives. We created a biomedical
KG using Natural Language Processing models for named entity
recognition and relation extraction. The generated biomedical KG
is then used for question answering.
Results: The proposed framework successfully extracts relevant structured
information with high accuracy (90.7% for named entity recognition (NER), 88%
for relation extraction (RE)), according to experimental findings based on
505 real-world unstructured patient clinical notes.
Conclusions: In this paper, we propose a novel end-to-end system for the
construction of a biomedical knowledge graph from clinical text using a
variation of BERT models.
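The abstract does not spell out the BERT+CRF pipeline, but the CRF decoding step that sits on top of BERT's per-token emission scores can be illustrated with a minimal Viterbi decoder. The labels, transition scores, and emission scores below are invented stand-ins for BERT outputs, not the paper's actual model:

```python
# Minimal Viterbi decoder for a linear-chain CRF, as used on top of BERT
# token emissions for NER. All scores below are illustrative only.

def viterbi_decode(emissions, transitions, labels):
    """emissions: list of {label: score} dicts, one per token;
    transitions: {(prev_label, cur_label): score};
    returns the highest-scoring label sequence."""
    # Initialize with the first token's emission scores.
    best = {lab: (emissions[0][lab], [lab]) for lab in labels}
    for em in emissions[1:]:
        new_best = {}
        for cur in labels:
            # Pick the previous label maximizing path score + transition.
            prev, (score, path) = max(
                best.items(),
                key=lambda kv: kv[1][0] + transitions[(kv[0], cur)],
            )
            new_best[cur] = (score + transitions[(prev, cur)] + em[cur],
                             path + [cur])
        best = new_best
    return max(best.values(), key=lambda v: v[0])[1]

labels = ["O", "B-DRUG", "I-DRUG"]
# Transition scores forbid I-DRUG directly after O (illegal in BIO tagging).
transitions = {(p, c): 0.0 for p in labels for c in labels}
transitions[("O", "I-DRUG")] = -100.0
# Hypothetical emission scores for the tokens "took", "aspirin", "daily".
emissions = [
    {"O": 2.0, "B-DRUG": 0.1, "I-DRUG": 0.1},
    {"O": 0.2, "B-DRUG": 1.5, "I-DRUG": 1.6},  # ambiguous without the CRF
    {"O": 2.0, "B-DRUG": 0.1, "I-DRUG": 0.3},
]
print(viterbi_decode(emissions, transitions, labels))
```

Note how the CRF overrides the slightly higher I-DRUG emission on the second token: the forbidden O-to-I-DRUG transition forces the valid B-DRUG tag instead, which is exactly the kind of sequence-level constraint a per-token classifier alone cannot enforce.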
Ensemble Transfer Learning for Multilingual Coreference Resolution
Entity coreference resolution is an important research problem with many
applications, including information extraction and question answering.
Coreference resolution for English has been studied extensively. However, there
is relatively little work for other languages. A problem that frequently occurs
when working with a non-English language is the scarcity of annotated training
data. To overcome this challenge, we design a simple but effective
ensemble-based framework that combines various transfer learning (TL)
techniques. We first train several models using different TL methods. Then,
during inference, we compute the unweighted average scores of the models'
predictions to extract the final set of predicted clusters. Furthermore, we
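The unweighted score averaging described above can be sketched in a few lines. The mention pairs, per-model scores, and decision threshold below are invented for illustration and are not taken from the paper:

```python
# Unweighted averaging of coreference-link scores across ensemble members.
# Mention pairs, scores, and the 0.5 threshold are illustrative assumptions.

def ensemble_scores(per_model_scores):
    """per_model_scores: list of {mention_pair: score} dicts, one per model.
    Returns the unweighted average score for each mention pair."""
    pairs = per_model_scores[0].keys()
    n = len(per_model_scores)
    return {
        pair: sum(scores[pair] for scores in per_model_scores) / n
        for pair in pairs
    }

# Three hypothetical models scoring two candidate coreference links.
model_a = {("her", "Maria"): 0.9, ("her", "the city"): 0.2}
model_b = {("her", "Maria"): 0.7, ("her", "the city"): 0.4}
model_c = {("her", "Maria"): 0.8, ("her", "the city"): 0.1}

avg = ensemble_scores([model_a, model_b, model_c])
# Keep links whose averaged score clears a (here arbitrary) threshold.
links = {pair for pair, score in avg.items() if score > 0.5}
```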
also propose a low-cost TL method that bootstraps coreference resolution models
by utilizing Wikipedia anchor texts. Leveraging the idea that the coreferential
links naturally exist between anchor texts pointing to the same article, our
method builds a sizeable distantly-supervised dataset for the target language
that consists of tens of thousands of documents. We can pre-train a model on
the pseudo-labeled dataset before finetuning it on the final target dataset.
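The anchor-text idea, grouping anchors that point to the same article into a pseudo-coreference cluster, can be sketched as follows. The wiki-style markup parsing and the example document are simplified illustrations, not the paper's actual pipeline:

```python
import re
from collections import defaultdict

# Group wiki-style anchors [[Target|surface text]] by target article:
# anchors pointing to the same article form a pseudo-coreference cluster.
doc = (
    "[[Barack Obama|Obama]] was elected in 2008. "
    "[[Barack Obama|The president]] later visited [[Paris|Paris]]."
)

clusters = defaultdict(list)
for target, surface in re.findall(r"\[\[([^|\]]+)\|([^\]]+)\]\]", doc):
    clusters[target].append(surface)

# Keep only clusters with at least two mentions, i.e., actual
# distantly-supervised coreference links.
pseudo_labels = {t: ms for t, ms in clusters.items() if len(ms) > 1}
```

Applied over a full Wikipedia dump in the target language, this kind of grouping is what would yield the tens of thousands of pseudo-labeled documents mentioned above.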
Experimental results on two benchmark datasets, OntoNotes and SemEval, confirm
the effectiveness of our methods. Our best ensembles consistently outperform
the baseline approach of simple training by up to 7.68% in F1 score. These
ensembles also achieve new state-of-the-art results for three languages:
Arabic, Dutch, and Spanish.
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyze five semantic processing tasks: word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions.
Comment: Published in Information Fusion, Volume 101, 2024, 101988, ISSN
1566-2535. The equal-contribution mark is missing in the published version due
to the publication policies; please contact Prof. Erik Cambria for details.
Stress Testing BERT Anaphora Resolution Models for Reaction Extraction in Chemical Patents
The high volume of published chemical patents and the importance of timely
acquisition of their information motivate automating information
extraction from chemical patents. Anaphora resolution is an important component
of comprehensive information extraction, and is critical for extracting
reactions. In chemical patents, there are five anaphoric relations of interest:
co-reference, transformed, reaction associated, work up, and contained. Our
goal is to investigate how the performance of anaphora resolution models for
reaction texts in chemical patents differs between noise-free and noisy
environments, and to what extent we can improve the robustness of the models
against noise.
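As a rough illustration of the kind of noise injection such a stress test might use: the character-level deletions and swaps below, and the noise rate, are assumptions for the sketch, not the paper's actual noise types:

```python
import random

# Inject simple character-level noise (deletions and swaps) into a sentence,
# as one might do to stress-test an anaphora resolution model. The noise
# types and rate are illustrative assumptions, not the paper's setup.

def add_noise(text, rate=0.1, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducibility
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        if rng.random() < rate and chars[i].isalpha():
            if rng.random() < 0.5:
                i += 1  # delete this character
                continue
            elif i + 1 < len(chars):
                # Swap with the next character (a common typo pattern).
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
        out.append(chars[i])
        i += 1
    return "".join(out)

clean = "The mixture was stirred and it was then filtered."
noisy = add_noise(clean, rate=0.15)
```

Running the same model on `clean` and `noisy` inputs and comparing the extracted anaphoric relations is one straightforward way to quantify the robustness gap the abstract describes.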