Numerical Attribute Extraction from Clinical Texts
This paper describes an information extraction system that extends the system
developed by team Hitachi for the "Disease/Disorder Template filling" task
organized by the ShARe/CLEF eHealth Evaluation Lab 2014. In this extension
module we focus on extracting numerical attributes and values from discharge
summary records and on associating each attribute with its correct value. We
solve the problem in two steps. The first step is the extraction of numerical
attributes and values, which is developed as a Named Entity Recognition (NER)
model using Stanford NLP libraries. The second step is associating the
attributes with their values, which is developed as a relation extraction
module in the Apache cTAKES framework. We integrated the Stanford NER model as
a cTAKES pipeline component and used it in the relation extraction module. A
Conditional Random Field (CRF) algorithm is used for NER and Support Vector
Machines (SVM) for relation extraction. For attribute-value relation
extraction, we observe 95% accuracy using NER alone and a combined accuracy of
87% with NER and SVM.
Comment: 6 pages
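The two-step structure described in this abstract (tag attributes and values, then link them) can be illustrated with a minimal sketch. It is not the system from the paper: the label names `ATTRIBUTE` and `VALUE`, the BIO tagging scheme, the example sentence, and the helper names are all assumptions, and a simple nearest-span heuristic stands in for the SVM relation classifier.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Entity:
    label: str  # "ATTRIBUTE" or "VALUE" (hypothetical label names)
    text: str
    start: int  # index of the first token in the span
    end: int    # index one past the last token in the span


def decode_bio(tokens: Sequence[str], tags: Sequence[str]) -> List[Entity]:
    """Step 1 output handling: collapse BIO tags into entity spans."""
    entities: List[Entity] = []
    current_label = None
    start = 0
    # A trailing "O" sentinel closes any span still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == current_label
        if current_label is not None and not continues:
            entities.append(
                Entity(current_label, " ".join(tokens[start:i]), start, i)
            )
            current_label = None
        if tag.startswith("B-") or (tag.startswith("I-") and current_label is None):
            current_label = tag[2:]
            start = i
    return entities


def token_gap(a: Entity, b: Entity) -> int:
    """Number of tokens separating two non-overlapping spans."""
    return max(a.start, b.start) - min(a.end, b.end)


def pair_nearest(entities: Sequence[Entity]) -> List[Tuple[str, str]]:
    """Step 2 stand-in: link each attribute to its closest value span.

    The paper uses an SVM for this step; this distance heuristic is
    only a placeholder to show the shape of the data flow.
    """
    attributes = [e for e in entities if e.label == "ATTRIBUTE"]
    values = [e for e in entities if e.label == "VALUE"]
    pairs: List[Tuple[str, str]] = []
    for attribute in attributes:
        if not values:
            break
        nearest = min(values, key=lambda v: token_gap(attribute, v))
        pairs.append((attribute.text, nearest.text))
    return pairs


# Hand-tagged example sentence (invented for illustration).
tokens = ["BP", "120/80", "mmHg", ",", "heart", "rate", "72", "bpm"]
tags = [
    "B-ATTRIBUTE", "B-VALUE", "I-VALUE", "O",
    "B-ATTRIBUTE", "I-ATTRIBUTE", "B-VALUE", "I-VALUE",
]
pairs = pair_nearest(decode_bio(tokens, tags))
```

Here `pairs` links "BP" to "120/80 mmHg" and "heart rate" to "72 bpm". In a trained system the tags would come from the CRF model and the linking decision from a classifier over candidate pairs.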
Transfer Learning for Scientific Data Chain Extraction in Small Chemical Corpus with BERT-CRF Model
Computational chemistry has developed rapidly in recent years owing to the
rapid growth of and breakthroughs in AI. Thanks to progress in natural language
processing, researchers can extract more fine-grained knowledge from
publications to stimulate the development of computational chemistry. Because
existing work and corpora for chemical entity extraction have been restricted
to the biomedical or life science fields rather than chemistry itself, we build
a new corpus in the chemical bond field annotated for 7 types of entities:
compound, solvent, method, bond, reaction, pKa and pKa value. This paper
presents a novel BERT-CRF model that builds scientific chemical data chains by
extracting these 7 chemical entity types and their relations from publications.
We also propose a joint model to extract the entities and relations
simultaneously. Experimental results on our Chemical Special Corpus demonstrate
that we achieve state-of-the-art and competitive NER performance.
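As a rough illustration of what a CRF layer adds on top of per-token scores from an encoder such as BERT, the sketch below runs Viterbi decoding over hand-written scores. It assumes a standard linear-chain CRF without start or end transitions; the label set, the score values, and the function name `viterbi` are invented for the example and are not taken from the paper.

```python
from typing import List, Sequence

LABELS = ["O", "B-compound", "I-compound"]  # hypothetical BIO label set


def viterbi(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring label index sequence.

    emissions[t][j]   : score of label j at token t (e.g. encoder output)
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for emit in emissions[1:]:
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_labels):
            # Best previous label for ending in label j at this token.
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    last = max(range(n_labels), key=lambda i: score[i])
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path


# Transition scores: strongly penalise the invalid move O -> I-compound.
transitions = [
    [0.0, 0.0, -1e4],  # from O
    [0.0, 0.0, 0.0],   # from B-compound
    [0.0, 0.0, 0.0],   # from I-compound
]
# Per-token label scores for a three-token sentence (made-up numbers).
emissions = [
    [2.0, 0.5, 0.0],
    [0.2, 1.0, 1.5],
    [0.1, 0.0, 2.0],
]

greedy = [max(range(3), key=row.__getitem__) for row in emissions]
decoded = viterbi(emissions, transitions)
```

With these numbers, picking the best label per token independently gives `O, I-compound, I-compound`, which is not a valid BIO sequence, while the decoded path is `O, B-compound, I-compound`. In a trained model the transition scores are learned rather than set by hand.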