
Universidade de Lisboa: Repositório.UL

Characterization and sequence prediction of structural variations in α-helix

Author: A Doig
A Moore
A Tendulkar
Ashish V Tendulkar
B Stapley
C Leslie
D Barlow
D Engel
J Richardson
M Brown
Pramod P Wangikar
R Johson
S Brenner
S Chakrabarti
S Dasgupta
S Kumar
T Joachims
Publication venue: BioMed Central
Publication date: 01/01/2011

On the pH-optimum of Activity and Stability of Proteins

Author: Alexov Emil
Tally Kemper
Publication venue: Clemson University Libraries
Publication date: 01/06/2010

Biological macromolecules evolved to perform their function in specific cellular environments (subcellular compartments or tissues); therefore, they should be adapted to the biophysical characteristics of the corresponding environment, one of them being the characteristic pH. Many macromolecular properties, such as activity and stability, are pH dependent. However, only activity is biologically important, while stability may not be crucial for the corresponding reaction. Here, we show that the pH-optimum of activity (the pH of maximal activity) is correlated with the pH-optimum of stability (the pH of maximal stability) on a set of 310 proteins with available experimental data. We speculate that such a correlation is needed to allow the corresponding macromolecules to tolerate the small pH fluctuations that are inevitable with cellular function. Our findings rationalize the efforts to correlate the pH of maximal stability with the characteristic pH of subcellular compartments, as only the pH of activity is subject to evolutionary pressure. In addition, our analysis confirmed the previous observation that the pH-optima of activity and stability are not correlated with the isoelectric point, pI, or with the optimal temperature.

Clemson University: TigerPrints

The Text-mining based PubChem Bioassay neighboring analysis

Author: Bryant Steve H
Han Lianyi
Suzek Tugba O
Wang Yanli
Publication venue: BioMed Central
Publication date: 01/01/2010

BACKGROUND: In recent years, the number of High Throughput Screening (HTS) assays deposited in PubChem has grown quickly. As a result, the volume of both the structured information (i.e. molecular structure, bioactivities) and the unstructured information (such as descriptions of bioassay experiments) has been increasing exponentially. It has therefore become even more demanding and challenging to efficiently assemble the bioactivity data by mining the huge amount of information to identify and interpret the relationships among the diversified bioassay experiments. In this work, we propose a text-mining based approach for bioassay neighboring analysis from the unstructured text descriptions contained in the PubChem BioAssay database. RESULTS: The neighboring analysis is achieved by evaluating the cosine scores of each bioassay pair and the fraction of overlaps among the human-curated neighbors. Our results from the cosine score distribution analysis and assay neighbor clustering analysis on all PubChem bioassays suggest that strong correlations among the bioassays can be identified from their conceptual relevance. A comparison with other existing assay neighboring methods suggests that the text-mining based bioassay neighboring approach provides meaningful linkages among the PubChem bioassays, and complements the existing methods by identifying additional relationships among the bioassay entries. CONCLUSIONS: The text-mining based bioassay neighboring analysis is efficient for correlating bioassays and studying different aspects of a biological process, which are otherwise difficult to achieve by existing neighboring procedures due to the lack of specific annotations and structured information. It is suggested that the text-mining based bioassay neighboring analysis can be used as a standalone or as a complementary tool for the PubChem bioassay neighboring process to enable efficient integration of assay results and generate hypotheses for the discovery of bioactivities of the tested reagents.
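The cosine score mentioned in this abstract can be illustrated with a minimal sketch. It assumes a plain bag-of-words term count per description; the abstract does not say how terms were extracted or weighted, so the tokenizer, the function names, and the three assay descriptions below are hypothetical and serve only to show the pairwise computation.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count its word tokens (assumed bag-of-words model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_score(a, b):
    """Cosine of the angle between two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical assay descriptions, invented for illustration only.
descriptions = {
    "assay_a": "Luciferase reporter assay for inhibitors of kinase activity",
    "assay_b": "Cell-based luciferase assay for kinase inhibitors",
    "assay_c": "Fluorescence polarization screen for protein binding",
}

vectors = {name: term_vector(text) for name, text in descriptions.items()}
names = sorted(vectors)

# Score every unordered pair of descriptions once.
scores = {
    (x, y): cosine_score(vectors[x], vectors[y])
    for i, x in enumerate(names)
    for y in names[i + 1:]
}

for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Pairs that share more vocabulary score closer to 1, so ranking pairs by this score gives a simple notion of text-based neighbors. A realistic pipeline would likely add stop-word removal and term weighting, which are not shown here.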

Mining protein function from text using term-based support vector machines

Author: Nenadic Goran
Rice Simon B
Stapley Benjamin J
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Text mining has spurred huge interest in the domain of biology. The goal of the BioCreAtIvE exercise was to evaluate the performance of current text mining systems. We participated in Task 2, which addressed assigning Gene Ontology terms to human proteins and selecting relevant evidence from full-text documents. We approached it as a modified form of the document classification task. We used a supervised machine-learning approach (based on support vector machines) to assign protein function and select passages that support the assignments. As classification features, we used a protein's co-occurring terms that were automatically extracted from documents. RESULTS: The results evaluated by curators were modest, and quite variable for different problems: in many cases we have relatively good assignment of GO terms to proteins, but the selected supporting text was typically non-relevant (precision spanning from 3% to 50%). The method appears to work best when a substantial set of relevant documents is obtained, while it works poorly on single documents and/or short passages. The initial results suggest that our approach can also mine annotations from text even when an explicit statement relating a protein to a GO term is absent. CONCLUSION: A machine learning approach to mining protein function predictions from text can yield good performance only if sufficient training data is available, and a significant amount of supporting data is used for prediction. The most promising results are for combined document retrieval and GO term assignment, which calls for the integration of methods developed in BioCreAtIvE Task 1 and Task 2.

The University of Manchester - Institutional Repository

Annotation of protein residues based on a literature analysis: cross-validation against UniProtKb

Author: A Stark
Antonio Jimeno-Yepes
BJ Polacco
BJ Stapley
C Blaschke
C Friedman
CH Wu
CJO Baker
D Bourigault
D Rebholz-Schuhmann
Dietrich Rebholz-Schuhmann
DL Wheeler
DM Kristensen
EM Marcotte
F Cerbah
F Guenthner
F Horn
G Leroy
JA Barker
JC Nebel
Kevin Nagel
LC Lee
M Ikeda
MM Babu
P Pezik
R Kanagasabai
R Witte
S Gaudan
S Yoon
TJ Oldfield
Y Miyao
Y Tateisi
Y Tsuruoka
YL Yip
Publication venue: BioMed Central
Publication date: 01/01/2009

BACKGROUND: A protein annotation database, such as the Universal Protein Resource knowledge base (UniProtKb), is a valuable resource for the validation and interpretation of predicted 3D structure patterns in proteins. Existing studies have focussed on point mutation extraction methods from biomedical literature which can be used to support the time-consuming work of manual database curation. However, these methods were limited to point mutation extraction and do not extract features for the annotation of proteins at the residue level. RESULTS: This work introduces a system that identifies protein residues in MEDLINE abstracts and annotates them with features extracted from the context written in the surrounding text. MEDLINE abstract texts have been processed to identify protein mentions in combination with taxonomic species and protein residues (F1-measure 0.52). The identified protein-species-residue triplets have been validated and benchmarked against reference data resources (UniProtKb, average F1-measure of 0.54). Then, contextual features were extracted through shallow and deep parsing and the features have been classified into predefined categories (F1-measure ranges from 0.15 to 0.67). Furthermore, the feature sets have been aligned with annotation types in UniProtKb to assess the relevance of the annotations for ongoing curation projects. Altogether, the annotations have been assessed automatically and manually against reference data resources. CONCLUSION: This work proposes a solution for the automatic extraction of functional annotation for protein residues from biomedical articles. The presented approach is an extension to other existing systems in that a wider range of residue entities are considered and that features of residues are extracted as annotations.

Text mining of full-text journal articles combined with gene expression analysis reveals a relationship between sphingosine-1-phosphate and invasiveness of a glioblastoma cell line

Author: Berrar Daniel
Bremer Eric G
DeSesa Catherine
Dubitzky Werner
Hack Catherine
Natarajan Jeyakumar
Van Brocklyn James R
Zhang Yonghong
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: Sphingosine 1-phosphate (S1P), a lysophospholipid, is involved in various cellular processes such as migration, proliferation, and survival. To date, the impact of S1P on human glioblastoma is not fully understood. In particular, the concerted role played by matrix metalloproteinases (MMP) and S1P in aggressive tumor behavior and angiogenesis remains to be elucidated. RESULTS: To gain new insights into the effect of S1P on angiogenesis and invasion of this type of malignant tumor, we used microarrays to investigate the gene expression in glioblastoma as a response to S1P administration in vitro. We compared the expression profiles for the same cell lines under the influence of epidermal growth factor (EGF), an important growth factor. We found a set of 72 genes that are significantly differentially expressed as a unique response to S1P. Based on the result of mining full-text articles from 20 scientific journals in the field of cancer research published over a period of five years, we inferred gene-gene interaction networks for these 72 differentially expressed genes. Among the generated networks, we identified a particularly interesting one. It describes a cascading event, triggered by S1P, leading to the transactivation of MMP-9 via neuregulin-1 (NRG-1), vascular endothelial growth factor (VEGF), and the urokinase-type plasminogen activator (uPA). This interaction network has the potential to shed new light on our understanding of the role played by MMP-9 in invasive glioblastomas. CONCLUSION: Automated extraction of information from biological literature promises to play an increasingly important role in biological knowledge discovery. This is particularly true for high-throughput approaches, such as microarrays, and for combining and integrating data from different sources. Text mining may hold the key to unraveling previously unknown relationships between biological entities and could develop into an indispensable instrument in the process of formulating novel and potentially promising hypotheses.

KnowledgeBank at OSU

Open Research Online (The Open University)

Ulster University's Research Portal

Automatic Extraction of Protein Point Mutations Using a Graph Bigram Association

Author: Florence Horn
Fred E Cohen
Lawrence C Lee
Philip E Bourne
Publication venue: Public Library of Science
Publication date: 01/02/2007

Protein point mutations are an essential component of the evolutionary and experimental analysis of protein structure and function. While many manually curated databases attempt to index point mutations, most experimentally generated point mutations and the biological impacts of the changes are described in the peer-reviewed published literature. We describe an application, Mutation GraB (Graph Bigram), that identifies, extracts, and verifies point mutations from biomedical literature. The principal problem of point mutation extraction is to link the point mutation with its associated protein and organism of origin. Our algorithm uses a graph-based bigram traversal to identify these relevant associations and exploits the Swiss-Prot protein database to verify this information. The graph bigram method is different from other models for point mutation extraction in that it incorporates frequency and positional data of all terms in an article to drive the point mutation–protein association. Our method was tested on 589 articles describing point mutations from the G protein–coupled receptor (GPCR), tyrosine kinase, and ion channel protein families. We evaluated our graph bigram metric against a word-proximity metric for term association on datasets of full-text literature in these three different protein families. Our testing shows that the graph bigram metric achieves a higher F-measure for the GPCRs (0.79 versus 0.76), protein tyrosine kinases (0.72 versus 0.69), and ion channel transporters (0.76 versus 0.74). Importantly, in situations where more than one protein can be assigned to a point mutation and disambiguation is required, the graph bigram metric achieves a precision of 0.84 compared with the word distance metric precision of 0.73. We believe the graph bigram search metric to be a significant improvement over previous search metrics for point mutation extraction and to be applicable to text-mining applications requiring the association of words.
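For readers comparing the precision and F-measure figures quoted in this abstract, the sketch below shows how such scores are commonly derived from extraction counts. It assumes the balanced F-measure (the harmonic mean of precision and recall, beta = 1); the abstract does not state which variant was used, and the function names and counts are hypothetical rather than taken from the paper.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from raw counts; 0.0 where a ratio is undefined."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (beta=1 gives the balanced F1)."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    weight = beta * beta
    denominator = weight * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + weight) * precision * recall / denominator


# Hypothetical counts for one extraction run, invented for illustration only:
# 8 correct mutation-protein links, 2 spurious links, 2 missed links.
p, r = precision_recall(true_pos=8, false_pos=2, false_neg=2)
print(round(p, 2), round(r, 2), round(f_measure(p, r), 2))
```

Because the harmonic mean is pulled toward the lower of the two inputs, a method cannot reach a high F-measure by trading recall away for precision, which is why the abstract reports precision separately for the disambiguation case.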

Public Library of Science (PLOS)