74 research outputs found

Getting Started in Text Mining: Part Two

Author: Andrey Rzhetsky
CE Crangle
DR Swanson
I Spasic
JD Kim
JW Huss III
KB Cohen
L Hirschman
M Fleischman
Mark B. Gerstein
Michael Seringhaus
MV Blagosklonny
NH Shah
Olga G. Troyanskaya
R Kanagasabai
R Mitkov
S Aerts
SM Douglas
W Hersh
Y Sasaki
Publication venue: Public Library of Science
Publication date: 01/07/2009

Crossref

Directory of Open Access Journals

PubMed Central

BioInfer: a corpus for information extraction in the biomedical domain

Author: A Yakushiji
CF Baker
D Lin
DD Sleator
E Alphonse
E Tsivtsivadze
F Ginter
Filip Ginter
G Hripcsak
H Shatkay
J Cohen
J Ding
J Kim
Jari Björne
JM Temkin
Jorma Boberg
Jouni Järvinen
Juho Heimonen
K Franzén
K Kipper
KB Cohen
L Hirschman
L Salwinski
M Ashburner
N Daraselia
P Kingsbury
P Szolovits
S Aubin
S Pyysalo
S Siegel
Sampo Pyysalo
T Ohta
T Pahikkala
T Wattarujeekrit
Tapio Salakoski
TH King
Y Tateisi
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Lately, there has been a great interest in the application of information extraction methods to the biomedical domain, in particular, to the extraction of relationships of genes, proteins, and RNA from scientific publications. The development and evaluation of such methods requires annotated domain corpora. RESULTS: We present BioInfer (Bio Information Extraction Resource), a new public resource providing an annotated corpus of biomedical English. We describe an annotation scheme capturing named entities and their relationships along with a dependency analysis of sentence syntax. We further present ontologies defining the types of entities and relationships annotated in the corpus. Currently, the corpus contains 1100 sentences from abstracts of biomedical research articles annotated for relationships, named entities, as well as syntactic dependencies. Supporting software is provided with the corpus. The corpus is unique in the domain in combining these annotation types for a single set of sentences, and in the level of detail of the relationship annotation. CONCLUSION: We introduce a corpus targeted at protein, gene, and RNA relationships which serves as a resource for the development of information extraction systems and their components such as parsers and domain analyzers. The corpus will be maintained and further developed with a current version being available at

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Normalizing biomedical terms by minimizing ambiguity and variability

Author: AA Morgan
B Settles
BL Humphreys
C Blaschke
D Hanisch
E Brill
G Navarro
G Ngai
G Zhou
H Fang
H Liu
JD Kim
JD Wren
John McNaught
K Samuel
KB Cohen
L Hirschman
L Tanabe
L Yeganova
M Krauthammer
MJ Schuemie
S Kulick
Sophia Ananiadou
The UniProt Consortium
WW Cohen
Y Tsuruoka
Yoshimasa Tsuruoka
Publication venue: BioMed Central
Publication date: 01/01/2008

BACKGROUND: One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of the terms. Soft string matching is a possible solution to the problem, but its inherent heavy computational cost discourages its use when the dictionaries are large or when real-time processing is required. A less computationally demanding approach is to normalize the terms by using heuristic rules, which enables us to look up a dictionary in constant time regardless of its size. The development of good heuristic rules, however, requires extensive knowledge of the terminology in question and thus is the bottleneck of the normalization approach. RESULTS: We present a novel framework for discovering a list of normalization rules from a dictionary in a fully automated manner. The rules are discovered in such a way that they minimize the ambiguity and variability of the terms in the dictionary. We evaluated our algorithm using two large dictionaries: a human gene/protein name dictionary built from BioThesaurus and a disease name dictionary built from UMLS. CONCLUSIONS: The experimental results showed that automatically discovered rules can perform comparably to carefully crafted heuristic rules in term mapping tasks, and the computational overhead of rule application is small enough that a very fast implementation is possible. This work will help improve the performance of term-concept mapping tasks in biomedical information extraction, especially when good normalization heuristics for the target terminology are not fully known.
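The normalize-then-look-up idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the three rules in `normalize` are hand-written placeholders (the paper discovers its rules automatically, which this sketch does not attempt), and the function names, the toy dictionary, and the concept identifiers are all hypothetical.

```python
import re
from collections import defaultdict


def normalize(term):
    """Map surface variants of a term to one canonical key.

    The rules below are illustrative assumptions, not the rules
    discovered by the paper's algorithm.
    """
    key = term.lower()                 # rule 1: ignore case
    key = re.sub(r"[-_/]", " ", key)   # rule 2: treat hyphen, underscore, slash as space
    key = re.sub(r"\s+", "", key)      # rule 3: drop all whitespace
    return key


def build_index(dictionary):
    """Index concept IDs by normalized term in a hash table."""
    index = defaultdict(set)
    for term, concept_id in dictionary.items():
        index[normalize(term)].add(concept_id)
    return dict(index)


def lookup(index, mention):
    """Average-case constant-time lookup, independent of dictionary size."""
    return index.get(normalize(mention), set())


# Toy dictionary with made-up concept identifiers.
toy_dictionary = {
    "NF-kappa B": "C001",
    "IL 2": "C002",
    "il-2 receptor": "C003",
}
index = build_index(toy_dictionary)

print(lookup(index, "nf kappa b"))
print(lookup(index, "IL-2"))
print(lookup(index, "unknown term"))
```

Each key maps to a set of identifiers because an overly aggressive rule can merge distinct terms under one key, which is the ambiguity the abstract says the discovered rules are chosen to minimize.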

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning

Author: A Airola
A Yakushiji
AB Clegg
Antti Airola
AP Bradley
C Giuliano
C Nédellec
CD Meyer
D Zelenko
Filip Ginter
J Björne
J Ding
J Heimonen
JA Hanley
JAK Suykens
Jari Björne
JD Kim
JG Caporaso
K Fundel
KB Cohen
L Hirschman
L Hunter
M Lease
M Miwa
MC de Marneffe
P Zweigenbaum
R Bunescu
R Rifkin
R Sætre
S Pyysalo
S Van Landeghem
Sampo Pyysalo
T Gärtner
T Mitsumori
T Pahikkala
Tapio Pahikkala
Tapio Salakoski
Y Miyao
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

The Genia Event and Protein Coreference tasks of the BioNLP Shared Task 2011

Author: A Casillas
A Vlachos
Akinori Yonezawa
C Quirk
D McClosky
D Tuggener
E Emadzadeh
H Kilicoglu
H Liu
H Poon
J Björne
JD Kim
Jin-Dong Kim
Jun'ichi Tsujii
KB Cohen
L Hirschman
M Miwa
N Chinchor
N Nguyen
Ngan Nguyen
NL Nguyen
Q Le Minh
QC Bui
S Riedel
Toshihisa Takagi
Y Kim
Yue Wang
Publication venue: BioMed Central
Publication date

Crossref

PubMed Central

Concept recognition for extracting protein interaction relations from biomedical text

Crossref

Springer - Publisher Connector

PubMed Central

Extraction of pharmacokinetic evidence of drug-drug interactions from the literature

Author: A Abi-Haidar
A Agresti
A Kolchinsky
A Lourenço
Anália Lourenço
Artemy Kolchinsky
B Percha
B Settles
BW Matthews
C Jankel
CM Bishop
DM Jessop
DS Wishart
E Leopold
ER Hajjar
F Cheng
F Leitner
F Lin
F Pedregosa
G Gonzalez
H El-Shishiny
H Shatkay
Heng-Yi Wu
I Segura-Bedmar
JD Duke
JM Hall
KB Cohen
L Hirschman
L Tari
Lang Li
LJ Jensen
LL Leape
Luis M. Rocha
M Herrero-Zazo
M Huang
M Krallinger
ME Wall
MF Porter
ML Becker
N Tatonetti
NP Tatonetti
P Baldi
PJ Bickel
R Boyce
R Nisha
RE Fan
RT McDonald
S Hennessy
Willy John Wilbur
Y Mao
Z Wang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/05/2015

Drug-drug interaction (DDI) is a major cause of morbidity and mortality and a subject of intense scientific interest. Biomedical literature mining can aid DDI research by extracting evidence for large numbers of potential interactions from published literature and clinical databases. Though DDI is investigated in domains ranging in scale from intracellular biochemistry to human populations, literature mining has not been used to extract specific types of experimental evidence, which are reported differently for distinct experimental goals. We focus on pharmacokinetic evidence for DDI, essential for identifying causal mechanisms of putative interactions and as input for further pharmacological and pharmacoepidemiology investigations. We used manually curated corpora of PubMed abstracts and annotated sentences to evaluate the efficacy of literature mining on two tasks: first, identifying PubMed abstracts containing pharmacokinetic evidence of DDIs; second, extracting sentences containing such evidence from abstracts. We implemented a text mining pipeline and evaluated it using several linear classifiers and a variety of feature transforms. The most important textual features in the abstract and sentence classification tasks were analyzed. We also investigated the performance benefits of using features derived from PubMed metadata fields, various publicly available named entity recognizers, and pharmacokinetic dictionaries. Several classifiers performed very well in distinguishing relevant and irrelevant abstracts (reaching F1 0.93, MCC 0.74, iAUC 0.99) and sentences (F1 0.76, MCC 0.65, iAUC 0.83). We found that word bigram features were important for achieving optimal classifier performance and that features derived from Medical Subject Headings (MeSH) terms significantly improved abstract classification.
We also found that some drug-related named entity recognition tools and dictionaries led to slight but significant improvements, especially in classification of evidence sentences. Based on our thorough analysis of classifiers and feature transforms and the high classification performance achieved, we demonstrate that literature mining can aid DDI discovery by supporting automatic extraction of specific types of experimental evidence.

Funding: National Institutes of Health, National Library of Medicine Program, grant 01LM011945-01 "BLR: Evidence-based Drug-Interaction Discovery: In-Vivo, In-Vitro and Clinical," a grant from the Indiana University Collaborative Research Program 2013, "Drug-Drug Interaction Prediction from Large-scale Mining of Literature and Patient Records," as well as a grant from the joint program between the Fundação Luso-Americana para o Desenvolvimento (Portugal) and National Science Foundation (USA), 2012-2014, "Network Mining For Gene Regulation And Biochemical Signaling."
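Two of the measures the abstract reports, F1 and the Matthews correlation coefficient (MCC), are standard functions of a binary confusion matrix. A minimal sketch of how they are computed, assuming made-up confusion counts (not the paper's data) and hypothetical function names; iAUC is not covered here:

```python
import math


def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; unlike F1, it also uses true negatives."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Made-up counts for a balanced toy test set of 200 abstracts.
tp, fp, fn, tn = 90, 10, 10, 90

print(f"F1  = {f1_score(tp, fp, fn):.2f}")
print(f"MCC = {mcc(tp, fp, fn, tn):.2f}")
```

Because MCC takes true negatives into account and F1 does not, the two scores can diverge on imbalanced data, which is one reason to report both.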

arXiv.org e-Print Archive

Public Library of Science (PLOS)

CiteSeerX

Universidade do Minho: RepositoriUM

Access to Research and Communications Annals

Crossref

IUPUIScholarWorks

Directory of Open Access Journals

PubMed Central

OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression

BACKGROUND: Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering. RESULTS: OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from 0.26 to 0.72 (precision 0.39 to 0.85, recall 0.16 to 0.85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances. CONCLUSION: OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for information extraction.
The open source OpenDMAP code library is freely available at http://bionlp.sourceforge.net/

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Corpus annotation for mining biomedical events from literature

BACKGROUND: Advanced Text Mining (TM) such as semantic enrichment of papers, event or relation extraction, and intelligent Question Answering have increasingly attracted attention in the bio-medical domain. For such attempts to succeed, text annotation from the biological point of view is indispensable. However, due to the complexity of the task, semantic annotation has never been tried on a large scale, apart from relatively simple term annotation. RESULTS: We have completed a new type of semantic annotation, event annotation, which is an addition to the existing annotations in the GENIA corpus. The corpus has already been annotated with POS (Parts of Speech), syntactic trees, terms, etc. The new annotation was made on half of the GENIA corpus, consisting of 1,000 Medline abstracts. It contains 9,372 sentences in which 36,114 events are identified. The major challenges during event annotation were (1) to design a scheme of annotation which meets specific requirements of text annotation, (2) to achieve biology-oriented annotation which reflects biologists' interpretation of text, and (3) to ensure the homogeneity of annotation quality across annotators. To meet these challenges, we introduced new concepts such as Single-facet Annotation and Semantic Typing, which have collectively contributed to successful completion of a large scale annotation. CONCLUSION: The resulting event-annotated corpus is the largest and one of the best in quality among similar annotation efforts. We expect it to become a valuable resource for NLP (Natural Language Processing)-based TM in the bio-medical domain.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Promoting advance planning for health care and research among older adults: A randomized controlled trial

Author: A Fagerlin
Alzheimer's Disease International
Anne-Marie Boire-Lavigne
British Committee for Standards in Haematology
C Matheis-Kraft
C Patterson
C Viens
CB Stocking
CE Schwartz
CJM Maas
D Blanchette
D McCauley Casserly
D Wendler
Danièle Blanchette
DI Shalowitz
DK Beland
DL Duay
DW Molloy
EA Van Wynen
FP Hopp
G Bravo
G Bresci
G Gade
GA Sachs
Gina Bravo
HL Muncie
HS Perkins
IA Gutheil
IM Barrio-Cantalejo
J Rello
JB Engelhardt
JC Callahan
JD Singer
JF Plouffe
Judith Lauzon
Julie Lane
KB Hirschman
KM Detering
L Ayalon
LA Mandell
LS Brunner
M Coppolino
M Guay
M Weinstein
Marcel Arcand
Marie-France Dubois
Maryse Guay
MD Cantor
MF Drummond
MF Dubois
MI Tamayo-Velazquez
MJ Silveira
National Collaborating Centre for Acute Care
P Muthappan
PA Singer
Paule Hottin
PH Ditto
PS Appelbaum
R Douglas
R Dresser
R Little
R Pierce
R Rossaint
R Schiff
RA Pearlman
RC Littell
RL Sudore
RV Patel
SF Assmann
SL Camhi
SL Theis
SP Weinrich
Suzanne Bellemare
SYH Kim
TJ Prendergast
TR Fried
V Austin-Wells
Y Bourgueil
Publication venue: BioMed Central
Publication date: 01/01/2012

BACKGROUND: Family members are often required to act as substitute decision-makers when health care or research participation decisions must be made for an incapacitated relative. Yet most families are unable to accurately predict older adult preferences regarding future health care and willingness to engage in research studies. Discussion and documentation of preferences could improve proxies' abilities to decide for their loved ones. This trial assesses the efficacy of an advance planning intervention in improving the accuracy of substitute decision-making and increasing the frequency of documented preferences for health care and research. It also investigates the financial impact on the healthcare system of improving substitute decision-making. METHODS/DESIGN: Dyads (n = 240) comprising an older adult and his/her self-selected proxy are randomly allocated to the experimental or control group, after stratification for type of designated proxy and self-report of prior documentation of healthcare preferences. At baseline, clinical and research vignettes are used to elicit older adult preferences and assess the ability of their proxy to predict those preferences. Responses are elicited under four health states, ranging from the subject's current health state to severe dementia. For each state, we estimated the public costs of the healthcare services that would typically be provided to a patient under these scenarios. Experimental dyads are visited at home, twice, by a specially trained facilitator who communicates the dyad-specific results of the concordance assessment, helps older adults convey their wishes to their proxies, and offers assistance in completing a guide entitled My Preferences that we designed specifically for that purpose. In between these meetings, experimental dyads attend a group information session about My Preferences. Control dyads attend three monthly workshops aimed at promoting healthy behaviors.
Concordance assessments are repeated at the end of the intervention and 6 months later to assess improvement in predictive accuracy and cost savings, if any. Copies of completed guides are made at the time of these assessments. DISCUSSION: This study will determine whether the tested intervention guides proxies in making decisions that concur with those of older adults, motivates the latter to record their wishes in writing, and yields savings for the healthcare system. TRIAL REGISTRATION: ISRCTN89993391 (http://www.controlled-trials.com/ISRCTN89993391)

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Savoirs UdeS