
Large-scale event extraction from literature with multi-level gene normalization

Author: Ananiadou Sophia
Bjorne Jari
Ginter Filip
Hakala Kai
Kao Hung-Yu
Lu Zhiyong
Pyysalo Sampo
Salakoski Tapio
Van de Peer Yves
Van Landeghem Sofie
Wei Chih-Hsuan
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

Text mining for the life sciences aims to aid database curation, knowledge summarization and information retrieval through the automated processing of biomedical texts. To provide comprehensive coverage and enable full integration with existing biomolecular database records, it is crucial that text mining tools scale up to millions of articles and that their analyses can be unambiguously linked to information recorded in resources such as UniProt, KEGG, BioGRID and NCBI databases. In this study, we investigate how fully automated text mining of complex biomolecular events can be augmented with a normalization strategy that identifies biological concepts in text, mapping them to identifiers at varying levels of granularity, ranging from canonicalized symbols to unique genes and proteins and broad gene families. To this end, we have combined two state-of-the-art text mining components, previously evaluated on two community-wide challenges, and have extended and improved upon these methods by exploiting their complementary nature. Using these systems, we perform normalization and event extraction to create a large-scale resource that is publicly available, unique in semantic scope, and covers all 21.9 million PubMed abstracts and 460 thousand PubMed Central open access full-text articles. This dataset contains 40 million biomolecular events involving 76 million gene/protein mentions, linked to 122 thousand distinct genes from 5032 species across the full taxonomic tree. Detailed evaluations and analyses reveal promising results for application of this data in database and pathway curation efforts. The main software components used in this study are released under an open-source license. Further, the resulting dataset is freely accessible through a novel API, providing programmatic and customized access (http://www.evexdb.org/api/v001/). Finally, to allow for large-scale bioinformatic analyses, the entire resource is available for bulk download from http://evexdb.org/download/, under the Creative Commons Attribution-ShareAlike (CC BY-SA) license.

Ghent University Academic Bibliography

The University of Manchester - Institutional Repository

FigShare

Event based text mining for integrated network construction

Author: Saeys Yvan
Van de Peer Yves
Van Landeghem Sofie
Publication venue: Microtome Publishing
Publication date: 01/01/2010

The scientific literature is a rich and challenging data source for research in systems biology, providing numerous interactions between biological entities. Text mining techniques have been increasingly useful to extract such information from the literature in an automatic way, but up to now the main focus of text mining in the systems biology field has been restricted mostly to the discovery of protein-protein interactions. Here, we take this approach one step further, and use machine learning techniques combined with text mining to extract a much wider variety of interactions between biological entities. Each particular interaction type gives rise to a separate network, represented as a graph, all of which can be subsequently combined to yield a so-called integrated network representation. This provides a much broader view on the biological system as a whole, which can then be used in further investigations to analyse specific properties of the networ

Ghent University Academic Bibliography

A text-mining system for extracting metabolic reactions from full-text articles

Author: Czarnecki Jan M.
Nobeli Irene
Shepherd Adrian J.
Smith Adrian M.L.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Background: Increasingly biological text mining research is focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway—metabolic pathways—has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein–protein interactions. Results: When evaluated on a set of manually-curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein-protein interaction extraction task. Conclusions: We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein–protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed
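
As an illustration only, the permutation-scoring idea described in this abstract might look like the following minimal Python sketch. The keyword stems, the crude stemmer, the scoring weights and every function name are assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
from itertools import permutations

# Hypothetical stemmed keywords; the paper's actual list is not given in the abstract.
KEYWORD_STEMS = {"convert", "produc", "catalyz", "yield"}


def stem(token):
    """Crude illustrative stemmer: lowercase and strip one common suffix."""
    token = token.lower()
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def score_assignment(tokens, substrate, enzyme, product):
    """Score one role assignment from keyword presence and location (assumed weights)."""
    keyword_positions = [i for i, t in enumerate(tokens) if stem(t) in KEYWORD_STEMS]
    if not keyword_positions:
        return 0
    first = keyword_positions[0]
    score = 1                              # a keyword is present
    score += enzyme < first                # enzyme precedes the keyword
    score += first < substrate < product   # substrate, then product, follow it
    return score


def best_assignment(tokens, entity_positions):
    """Try every ordering of three entity positions and keep the top-scoring one."""
    best = max(
        permutations(entity_positions, 3),
        key=lambda roles: score_assignment(tokens, *roles),
    )
    return dict(zip(("substrate", "enzyme", "product"), best))


tokens = ["Hexokinase", "converts", "glucose", "to", "glucose-6-phosphate"]
print(best_assignment(tokens, [0, 2, 4]))
```

The sketch treats the three tagged entities as untyped positions; a real system would restrict enzyme and metabolite roles by entity type before scoring.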

Springer - Publisher Connector

Birkbeck Institutional Research Online

The tomato Prf complex is a molecular trap for bacterial effectors based on Pto transphosphorylation

Author: A Balmuth
A Bendahmane
A Keller
A-J Wu
AI Nesvizhskii
Alexandra M. E. Jones
Alexi L. Balmuth
BJ DeYoung
BJ Deyoung
BP Thomma
CMT Rommens
F Shao
FL Takken
G Sessa
HC McCann
J Ade
J Dong
J Zhou
JD Jones
Jeffery L. Dangl
John P. Rathjen
Jose R. Gutierrez
JP Rathjen
JR Gutierrez
KR Munkvold
L Shan
LN Johnson
M Bernoux
MR Swiderski
PN Dodds
R Cai
S Gimenez-Ibanez
SM Collier
T Boller
T Maekawa
T Xiang
Tatiana S. Mucyn
TS Mucyn
V Bonardi
V Ntoukakis
Vardis Ntoukakis
VM Andriotis
W Xing
X Du
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

The bacterium Pseudomonas syringae is a pathogen of many crop species and one of the model pathogens for studying plant and bacterial arms race coevolution. In the current model, plants perceive bacterial pathogens via plasma membrane receptors, and recognition leads to the activation of general defenses. In turn, bacteria inject proteins called effectors into the plant cell to prevent the activation of immune responses. AvrPto and AvrPtoB are two such proteins that inhibit multiple plant kinases. The tomato plant has reacted to these effectors by the evolution of a cytoplasmic resistance complex. This complex is composed of two proteins, Prf and Pto kinase, and is capable of recognizing the effector proteins. How the Pto kinase is able to avoid inhibition by the effector proteins is currently unknown. Our data show how the tomato plant utilizes dimerization of resistance proteins to gain advantage over the faster evolving bacterial pathogen. Here we illustrate that oligomerisation of Prf brings into proximity two Pto kinases, allowing them to avoid inhibition by the effectors by transphosphorylation and to activate immune responses.

Public Library of Science (PLOS)

CiteSeerX

Warwick Research Archives Portal Repository

The Australian National University

FigShare

Molecular Interactions. On the Ambiguity of Ordinary Statements in Biomedical Literature

Author: Jansen Ludger
Schulz Stefan
Publication date: 01/01/2009

Statements about the behavior of biochemical entities (e.g., about the interaction between two proteins) abound in the literature on molecular biology and are increasingly becoming the targets of information extraction and text mining techniques. We show that an accurate analysis of the semantics of such statements reveals a number of ambiguities that have to be taken into account in the practice of biomedical ontology engineering: Such statements can not only be understood as event reporting statements, but also as ascriptions of dispositions or tendencies that may or may not refer to collectives of interacting molecules or even to collectives of interaction events

PhilPapers

Uncoupling of p97 ATPase activity has a dominant negative effect on protein extraction

Author: Long David T.
Rycenga Halley B.
Wolfe Kelly B.
Yeh Elizabeth S.
Publication venue: Springer Science and Business Media LLC
Publication date: 17/07/2019

p97 is a highly abundant, homohexameric AAA+ ATPase that performs a variety of essential cellular functions. Characterized as a ubiquitin-selective chaperone, p97 recognizes proteins conjugated to K48-linked polyubiquitin chains and promotes their removal from chromatin and other molecular complexes. Changes in p97 expression or activity are associated with the development of cancer and several related neurodegenerative disorders. Although pathogenic p97 mutations cluster in and around p97's ATPase domains, mutant proteins display normal or elevated ATPase activity. Here, we show that one of the most common p97 mutations (R155C) retains ATPase activity, but is functionally defective. p97-R155C can be recruited to ubiquitinated substrates on chromatin, but is unable to promote substrate removal. As a result, p97-R155C acts as a dominant negative, blocking protein extraction by a similar mechanism to that observed when p97's ATPase activity is inhibited or inactivated. However, unlike ATPase-deficient proteins, p97-R155C consumes excess ATP, which can hinder high-energy processes. Together, our results shed new insight into how pathogenic mutations in p97 alter its cellular function, with implications for understanding the etiology and treatment of p97-associated diseases

IUPUIScholarWorks

A realistic assessment of methods for extracting gene/protein interactions from free text

Author: A Moschitti
AB Clegg
Adrian J Shepherd
AM Cohen
Andrew B Clegg
AS Yeh
B Settles
C Nédellec
D Rebholz-Schuhmann
H Jose
HL Johnson
J Ding
J Fluck
JD Kim
K Franzén
K Fundel
K Sagae
L Hunter
M Krallinger
N Domedel-Puig
R Bunescu
R Hoffmann
R Kabiljo
R Leaman
R Sætre
Renata Kabiljo
S Pyysalo
T Hara
WA Baumgartner
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

Background: The automated extraction of gene and/or protein interactions from the literature is one of the most important targets of biomedical text mining research. In this paper we present a realistic evaluation of gene/protein interaction mining relevant to potential non-specialist users. Hence we have specifically avoided methods that are complex to install or require reimplementation, and we coupled our chosen extraction methods with a state-of-the-art biomedical named entity tagger. Results: Our results show: that performance across different evaluation corpora is extremely variable; that the use of tagged (as opposed to gold standard) gene and protein names has a significant impact on performance, with a drop in F-score of over 20 percentage points being commonplace; and that a simple keyword-based benchmark algorithm, when coupled with a named entity tagger, outperforms two of the tools most widely used to extract gene/protein interactions. Conclusion: In terms of availability, ease of use and performance, the potential non-specialist user community interested in automatically extracting gene and/or protein interactions from free text is poorly served by current tools and systems. The public release of extraction tools that are easy to install and use, and that achieve state-of-the-art levels of performance, should be treated as a high priority by the biomedical text mining community.
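
For readers unfamiliar with keyword-based baselines of the kind this abstract mentions, the following minimal Python sketch shows one common form: report every pair of tagged protein names in a sentence that also contains an interaction keyword. The keyword list, the function name and the assumption that the tagger output is a plain set of names are all hypothetical; the abstract does not describe the exact benchmark algorithm used in the paper.

```python
from itertools import combinations

# Hypothetical interaction keywords, chosen only for illustration.
INTERACTION_KEYWORDS = {"binds", "interacts", "phosphorylates", "activates", "inhibits"}


def extract_interactions(tokens, protein_names):
    """Return candidate interacting pairs for one tokenized sentence.

    protein_names stands in for the output of a named entity tagger
    (assumed here to be a set of exact token strings).
    """
    # Require at least one interaction keyword anywhere in the sentence.
    if not any(token.lower() in INTERACTION_KEYWORDS for token in tokens):
        return []
    # Collect the distinct tagged names that occur, in a stable order.
    present = sorted({token for token in tokens if token in protein_names})
    # Every unordered pair of co-occurring names is a candidate interaction.
    return list(combinations(present, 2))


tagged = {"AvrPto", "Pto", "Prf"}
print(extract_interactions("AvrPto inhibits Pto in tomato".split(), tagged))
print(extract_interactions("AvrPto and Pto were studied".split(), tagged))
```

Because the sketch ignores word order and syntax, it trades precision for simplicity, which is the usual design choice for a baseline of this kind.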

Springer - Publisher Connector

UCL Discovery