
56 research outputs found

A scalable machine-learning approach to recognize chemical names within large text databases

Author: A Zamora
CH Davis
E Charniak
G Nenadic
I Donaldson
J Finkel
JD Wren
Jonathan D Wren
L Hirschman
LR Rabiner
M Krauthammer
M Narayanaswamy
MA Drake
MD Yandell
PAV Hall
S Albert
S Raychaudhuri
U Leser
WJ Wilbur
WR Pearson
Publication venue: BioMed Central
Publication date: 01/01/2006

MOTIVATION: The use or study of chemical compounds permeates almost every scientific field, and in each of them the amount of textual information is growing rapidly. There is a need to accurately identify chemical names within text for a number of informatics efforts such as database curation, report summarization, tagging of named entities and keywords, or the development/curation of reference databases. RESULTS: A first-order Markov Model (MM) was evaluated for its ability to distinguish chemical names from words, yielding ~93% recall in recognizing chemical terms and ~99% precision in rejecting non-chemical terms on smaller test sets. However, because total false-positive events increase with the number of words analyzed, the scalability of name recognition was measured by processing 13.1 million MEDLINE records. Precision ranged from 54.7% to 100%, depending upon the cutoff score used, averaging 82.7% for approximately 1.05 million putative chemical terms extracted. Extracted chemical terms were analyzed to estimate the number of spelling variants per term, which correlated with the total number of times the chemical name appeared in MEDLINE. This variability in term construction was found to affect both information retrieval and term mapping when using PubMed and Ovid.
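As a rough illustration of the idea in this abstract, the sketch below scores a token with two first-order Markov models over characters, one trained on chemical names and one on ordinary words, and compares their log-likelihoods. The character-level design, the add-one smoothing, the `n_symbols` constant, the function names, and the toy training lists are all assumptions made for illustration; the abstract does not describe the authors' actual model in this detail.

```python
import math
from collections import defaultdict

# Toy training data (hypothetical, not taken from the paper).
CHEM_NAMES = ["ethanol", "methanol", "propanol", "butanol",
              "acetone", "benzene", "toluene"]
COMMON_WORDS = ["table", "window", "garden", "people",
                "number", "water", "little"]

def train(words):
    """Count character-to-character transitions; ^ and $ mark word boundaries."""
    counts = defaultdict(lambda: defaultdict(float))
    for word in words:
        padded = "^" + word.lower() + "$"
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1.0
    return counts

def log_prob(word, counts, alpha=1.0, n_symbols=64):
    """Log-probability of a word under a first-order (bigram) character model.

    alpha and n_symbols implement additive smoothing; both are assumed values.
    """
    padded = "^" + word.lower() + "$"
    total = 0.0
    for prev, nxt in zip(padded, padded[1:]):
        row = counts.get(prev, {})
        numerator = row.get(nxt, 0.0) + alpha
        denominator = sum(row.values()) + alpha * n_symbols
        total += math.log(numerator / denominator)
    return total

def chem_score(word, chem_counts, word_counts):
    """Positive when the chemical-name model explains the word better."""
    return log_prob(word, chem_counts) - log_prob(word, word_counts)

chem_model = train(CHEM_NAMES)
word_model = train(COMMON_WORDS)
```

A cutoff on `chem_score` would then trade precision against recall, which is in the spirit of the cutoff-dependent precision range reported in the abstract.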

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Large Scale Application of Neural Network Based Semantic Role Labeling for Automated Relation Extraction from Biomedical Texts

Author: AB Clegg
C Nedellec
D Klein
D Rebholz-Schuhmann
E Charniak
H Jose
Hans-Werner Mewes
I Donaldson
J Tsujii
J-H Eom
Jason Weston
K Fundel
L Hirschman
M Lease
M Palmer
Mark Isalan
R Collobert
R Hoffmann
Ronan Collobert
RT-H Tsai
S Bethard
S Pradhan
TH Tsai
Thorsten Barnickel
Volker Stümpflen
Y Kogan
Y Miyao
Publication venue: Public Library of Science
Publication date: 01/01/2009

To reduce the increasing amount of time spent on literature search in the life sciences, several methods for automated knowledge extraction have been developed. Co-occurrence based approaches can deal with large text corpora like MEDLINE in an acceptable time but are not able to extract any specific type of semantic relation. Semantic relation extraction methods based on syntax trees, on the other hand, are computationally expensive and the interpretation of the generated trees is difficult. Several natural language processing (NLP) approaches for the biomedical domain exist focusing specifically on the detection of a limited set of relation types. For systems biology, generic approaches for the detection of a multitude of relation types which in addition are able to process large text corpora are needed but the number of systems meeting both requirements is very limited. We introduce the use of SENNA (“Semantic Extraction using a Neural Network Architecture”), a fast and accurate neural network based Semantic Role Labeling (SRL) program, for the large scale extraction of semantic relations from the biomedical literature. A comparison of processing times of SENNA and other SRL systems or syntactical parsers used in the biomedical domain revealed that SENNA is the fastest Proposition Bank (PropBank) conforming SRL program currently available. 89 million biomedical sentences were tagged with SENNA on a 100 node cluster within three days. The accuracy of the presented relation extraction approach was evaluated on two test sets of annotated sentences resulting in precision/recall values of 0.71/0.43. We show that the accuracy as well as processing speed of the proposed semantic relation extraction approach is sufficient for its large scale application on biomedical text. The proposed approach is highly generalizable regarding the supported relation types and appears to be especially suited for general-purpose, broad-scale text mining systems. 
The presented approach bridges the gap between fast, co-occurrence-based approaches lacking semantic relations and highly specialized and computationally demanding NLP approaches.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

PuSH

Prediction of Preterm Deliveries from EHG Signals Using Machine Learning

Author: A Greenough
Abir Hussain
C Buhimschi
C Rabotti
Chelsea Dobbins
Dhiya Al-Jumeily
E Charniak
G Fele-Žorž
H Leman
I Verdenik
J Gondry
J Nahar
JS Richman
L Tong
LJ Mangham
LJ Muglia
M Doret
M Hassan
M Lucovnik
M McPheeters
MO Diab
MP Vinken
NV Chawla
P Carre
Paul Fergus
Pauline Cheung
R Blagus
R Rattihalli
RE Garfield
RL Goldenberg
Shamaila Iram
T Fawcett
T Sun
TA Lasko
W Lin
WJ Lammers
WL Maner
Y Wang
Zhi Wei
Publication venue: Public Library of Science (PLoS)
Publication date: 28/10/2013

There has been some improvement in the treatment of preterm infants, which has helped to increase their chance of survival. However, the rate of premature births is still increasing globally. As a result, this group of infants is most at risk of developing severe medical conditions that can affect the respiratory, gastrointestinal, immune, central nervous, auditory and visual systems. In extreme cases, this can also lead to long-term conditions, such as cerebral palsy, mental retardation and learning difficulties, as well as poor health and growth. In the US alone, the societal and economic cost of preterm births was estimated at $26.2 billion per annum in 2005. In the UK, this value was close to £2.95 billion in 2009. Many believe that a better understanding of why preterm births occur, and a strategic focus on prevention, will help to improve the health of children and reduce healthcare costs. At present, most methods of preterm birth prediction are subjective. However, a strong body of evidence suggests that the analysis of uterine electrical signals (electrohysterography) could provide a viable way of diagnosing true labour and predicting preterm deliveries. Most electrohysterography studies focus on true labour detection during the final seven days before labour. The challenge is to utilise electrohysterography techniques to predict preterm delivery earlier in the pregnancy. This paper explores this idea further and presents a supervised machine learning approach that classifies term and preterm records, using an open source dataset containing 300 records (38 preterm and 262 term). The synthetic minority oversampling technique is used to oversample the minority preterm class, and cross-validation techniques are used to evaluate the dataset against other similar studies. Our approach shows an improvement on existing studies with 96% sensitivity, 90% specificity, and a 95% area under the curve value with 8% global error using the polynomial classifier.
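The oversampling step mentioned in this abstract can be sketched as follows. This is a minimal, standard-library rendering of the synthetic minority oversampling idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' code; the function name `smote`, the parameter `k=5`, the fixed seed, and the random 4-dimensional feature vectors are assumptions. Only the class sizes (38 preterm, 262 term) come from the abstract.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base (squared Euclidean distance),
        # excluding base itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic

# Hypothetical feature vectors standing in for the 38 preterm records.
_rng = random.Random(42)
preterm = [[_rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(38)]

# Generate enough synthetic preterm records to match the 262 term records.
new_preterm = smote(preterm, n_new=262 - len(preterm))
```

In a cross-validation setting, oversampling is usually applied to the training folds only, so that synthetic points derived from a record never end up in the fold used to test it; whether the paper did this is not stated in the abstract.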

LJMU Research Online (Liverpool John Moores University)

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

University of Queensland eSpace

University of Turku in the BioNLP'11 Shared Task

Author: A Jimeno Yepes
D McClosky
de Marneffe
E Buyko
E Charniak
Filip Ginter
H Kilicoglu
I Tsochantaridis
J Björne
J Heimonen
J Jourde
Jari Björne
JD Kim
JP Euzéby
M Miwa
MC de Marneffe
MF Porter
N Nguyen
P Stenetorp
R Bossy
S Pyysalo
S Riedel
S Van Landeghem
T Ohta
Tapio Salakoski
Y Kim
Z Ratkovic
Publication venue: BioMed Central
Publication date: 01/01/2012

Crossref

Springer - Publisher Connector

PubMed Central

Semantically linking molecular entities in literature through entity relationships

Author: A Airola
A Reverter
Bernard De Baets
C Burgess
D Jurgens
D McClosky
DLT Rohde
E Charniak
EW Sayers
H Kilicoglu
I Tsochantaridis
J Björne
Jari Björne
JD Kim
M Buckland
M de Marneffe
M Krallinger
M Miwa
M Sahlgren
MF Porter
R Leaman
S Pyysalo
S van Dongen
S Van Landeghem
Sofie Van Landeghem
T Ohta
Tapio Salakoski
The UniProt Consortium
Thomas Abeel
TK Landauer
VN Vapnik
Yves Van de Peer
Publication venue: BioMed Central
Publication date: 01/01/2012

Background: Text mining tools have gained popularity to process the vast amount of available research articles in the biomedical literature. It is crucial that such tools extract information with a sufficient level of detail to be applicable in real life scenarios. Studies of mining non-causal molecular relations contribute to this goal by formally identifying the relations between genes, promoters, complexes and various other molecular entities found in text. More importantly, these studies help to enhance integration of text mining results with database facts. Results: We describe, compare and evaluate two frameworks developed for the prediction of non-causal or 'entity' relations (REL) between gene symbols and domain terms. For the corresponding REL challenge of the BioNLP Shared Task of 2011, these systems ranked first (57.7% F-score) and second (41.6% F-score). In this paper, we investigate the performance discrepancy of 16 percentage points by benchmarking on a related and more extensive dataset, analysing the contribution of both the term detection and relation extraction modules. We further construct a hybrid system combining the two frameworks and experiment with intersection and union combinations, achieving high-precision and high-recall results, respectively. Finally, we highlight extremely high-performance results (F-score > 90%) obtained for the specific subclass of embedded entity relations that are essential for integrating text mining predictions with database facts. Conclusions: The results from this study will enable us in the near future to annotate semantic relations between molecular entities in the entire scientific literature available through PubMed. The recent release of the EVEX dataset, containing biomolecular event predictions for millions of PubMed articles, is an interesting and exciting opportunity to overlay these entity relations with event predictions on a literature-wide scale.
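The intersection and union combinations described in this abstract can be illustrated with plain sets. The sketch below is an assumption-laden toy: the relation identifiers, the gold standard, and the two system outputs are invented, and the helper `precision_recall` is hypothetical. It only shows why keeping predictions both systems agree on tends to favour precision, while pooling all predictions tends to favour recall.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical relation IDs; none of these values come from the paper.
gold = {1, 2, 3, 4, 5, 6}
system_a = {1, 2, 3, 4, 7}
system_b = {3, 4, 5, 8, 9}

# Intersection: keep only relations predicted by both systems.
p_inter, r_inter = precision_recall(system_a & system_b, gold)

# Union: keep relations predicted by either system.
p_union, r_union = precision_recall(system_a | system_b, gold)
```

On this toy data the intersection has higher precision and lower recall than the union, matching the direction of the trade-off the abstract reports, though real systems need not behave this cleanly.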

Crossref

TU Delft Repository

Springer - Publisher Connector

Ghent University Academic Bibliography

PubMed Central

Archivsystem Ask23