9,823 research outputs found

Large-scale event extraction from literature with multi-level gene normalization

Author: Ananiadou Sophia
Bjorne Jari
Ginter Filip
Hakala Kai
Kao Hung-Yu
Lu Zhiyong
Pyysalo Sampo
Salakoski Tapio
Van de Peer Yves
Van Landeghem Sofie
Wei Chih-Hsuan
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique genes and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/).
Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons Attribution-ShareAlike (CC BY-SA) license.

Ghent University Academic Bibliography

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

FigShare

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication date: 01/01/2007

Edinburgh Research Explorer

7th German Conference on Chemoinformatics: 25 CIC-Workshop : Goslar, Germany, 6 - 8 November 2011 ; meeting abstracts / Edited by Frank Oellien, Uli Fechner and Thomas Engel

Author: Engel Thomas
Fechner Uli
Oellien Frank
Publication date: 01/05/2012

Hochschulschriftenserver - Universität Frankfurt am Main

The potential of text mining in data integration and network biology for plant research : a case study on Arabidopsis

Author: De Bodt Stefanie
Drebert Zuzanna
Inzé Dirk
Van de Peer Yves
Van Landeghem Sofie
Publication venue: American Society of Plant Biologists (ASPB)
Publication date: 01/01/2013

Despite the availability of various data repositories for plant research, a wealth of information currently remains hidden within the biomolecular literature. Text mining provides the necessary means to retrieve these data through automated processing of texts. However, only recently has advanced text mining methodology been implemented with sufficient computational power to process texts at a large scale. In this study, we assess the potential of large-scale text mining for plant biology research in general and for network biology in particular using a state-of-the-art text mining system applied to all PubMed abstracts and PubMed Central full texts. We present extensive evaluation of the textual data for Arabidopsis thaliana, assessing the overall accuracy of this new resource for usage in plant network analyses. Furthermore, we combine text mining information with both protein-protein and regulatory interactions from experimental databases. Clusters of tightly connected genes are delineated from the resulting network, illustrating how such an integrative approach is essential to grasp the current knowledge available for Arabidopsis and to uncover gene information through guilt by association. All large-scale data sets, as well as the manually curated textual data, are made publicly available, thereby stimulating the application of text mining data in future plant biology studies.

Ghent University Academic Bibliography

PubMed Central

Building Disease-Specific Drug-Protein Connectivity Maps from Molecular Interaction Networks and PubMed Abstracts

Author: A Beyer
A Hamosh
A Zanzoni
AJ Butte
AM Cohen
Andrey Rzhetsky
BR Schatz
BW Howell
C Perez-Iratxeta
C Qiu
C Tzourio
CAR Martins
CJ Mattingly
D Hristovski
D Maglott
DR Masys
DR Riddell
DS Wishart
E Estrada
ES Chen
F Forette
G Baydas
GD Bader
H Kitano
H Li
H-M Muller
J Bosch
J Lamb
J Li
J Wang
Jake Yue Chen
JE Eichner
Jiao Li
JL Morrison
JO Korbel
JY Chen
KR Brown
LJ Jensen
M van Oijen
M Wilson
N Ertekin-Taner
N Rifai
N Tiffin
O Hanon
P Srinivasan
PJ Lu
RA George
S Wachi
TS Prasad
U Leser
X Ma
Xiaoyan Zhu
Y Benjamini
Y Garten
Y Tsuruoka
YB Lee
Publication venue: Public Library of Science
Publication date: 01/07/2009

The recently proposed concept of molecular connectivity maps enables researchers to integrate experimental measurements of genes, proteins, metabolites, and drug compounds under similar biological conditions. The study of these maps provides opportunities for future toxicogenomics and drug discovery applications. We developed a computational framework to build disease-specific drug-protein connectivity maps. We integrated gene/protein and drug connectivity information based on protein interaction networks and literature mining, without requiring gene expression profile information derived from drug perturbation experiments on disease samples. We described the development and application of this computational framework using Alzheimer's Disease (AD) as a primary example in three steps. First, molecular interaction networks were incorporated to reduce bias and improve relevance of AD seed proteins. Second, PubMed abstracts were used to retrieve enriched drug terms that are indirectly associated with AD through molecular mechanistic studies. Third and lastly, a comprehensive AD connectivity map was created by relating enriched drugs and related proteins in literature. We showed that this molecular connectivity map development approach outperformed both curated drug target databases and conventional information retrieval systems. Our initial explorations of the AD connectivity map yielded a new hypothesis that diltiazem and quinidine may be investigated as candidate drugs for AD treatment. Molecular connectivity maps derived computationally can help study molecular signature differences between different classes of drugs in specific disease contexts. To achieve overall good data coverage and quality, a series of statistical methods have been developed to overcome high levels of data noise in biological networks and literature mining results. 
Further development of computational molecular connectivity maps to cover major disease areas will likely set up a new model for drug development, in which therapeutic/toxicological profiles of candidate drugs can be checked computationally before costly clinical trials begin.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

PubMed-Scale Event Extraction for Post-Translational Modifications, Epigenetics and Protein Structural Relations

Author: Ananiadou S
Björne J
Ginter F
Ohta T
Pyysalo S
Salakoski T
Van de Peer Y
Van Landeghem S
Publication date: 01/01/2012

The University of Manchester - Institutional Repository

OntoGene in BioCreative II

Author: Clematide Simon
Hess Michael
Kaljurand Kaarel
Kappeler Thomas
Klenner Manfred
Parisot Pierre
Rinaldi Fabio
Romacker Martin
Schneider Gerold
Vachon Therese
von Allmen Jean-Marc
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

OntoGene in BioCreative II

Author: Clematide S
Hess M
Kaljurand K
Kappeler T
Klenner M
Parisot P
Rinaldi Fabio
Romacker M
Schneider G
Vachon T
von Allmen J M
Publication venue: Springer Science and Business Media LLC
Publication date: 01/09/2008

BACKGROUND: Research scientists and companies working in the domains of biomedicine and genomics are increasingly faced with the problem of efficiently locating, within the vast body of published scientific findings, the critical pieces of information that are needed to direct current and future research investment. RESULTS: In this report we describe approaches taken within the scope of the second BioCreative competition in order to solve two aspects of this problem: detection of novel protein interactions reported in scientific articles, and detection of the experimental method that was used to confirm the interaction. Our approach to the former problem is based on a high-recall protein annotation step, followed by two strict disambiguation steps. The remaining proteins are then combined according to a number of lexico-syntactic filters, which deliver high-precision results while maintaining reasonable recall. The detection of the experimental methods is tackled by a pattern matching approach, which has delivered the best results in the official BioCreative evaluation. CONCLUSION: Although the results of BioCreative clearly show that no tool is sufficiently reliable for fully automated annotations, a few of the proposed approaches (including our own) already perform at a competitive level. This makes them interesting either as standalone tools for preliminary document inspection, or as modules within an environment aimed at supporting the process of curation of biomedical literature.

ZORA

Biomedical Text Mining and Its Applications

Crossref

Directory of Open Access Journals

PubMed Central