Enhancing the functional content of protein interaction networks
Protein interaction networks are a promising type of data for studying
complex biological systems. However, despite the rich information embedded in
these networks, they face important data quality challenges of noise and
incompleteness that adversely affect the results obtained from their analysis.
Here, we explore the use of the concept of common neighborhood similarity
(CNS), which is a form of local structure in networks, to address these issues.
Although several CNS measures have been proposed in the literature, an
understanding of their relative efficacies for the analysis of interaction
networks has been lacking. We follow the framework of graph transformation to
convert the given interaction network into a transformed network corresponding
to a variety of CNS measures evaluated. The effectiveness of each measure is
then estimated by comparing the quality of protein function predictions
obtained from its corresponding transformed network with those from the
original network. Using a large set of S. cerevisiae interactions, and a set of
136 GO terms, we find that several of the transformed networks produce more
accurate predictions than those obtained from the original network. In
particular, the measure proposed here performs especially well for
this task. Further investigation reveals that the two major factors
contributing to this improvement are the abilities of CNS measures, especially
the proposed one, to prune out noisy edges and introduce new links between
functionally related proteins.
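The graph transformation described in this abstract can be illustrated with a minimal sketch. It assumes the Jaccard coefficient of the two proteins' neighbourhoods as the CNS measure, which is only one of the measures in the literature and not the measure proposed in the paper; the function name `jaccard_transform` and the `threshold` parameter are hypothetical.

```python
from itertools import combinations


def jaccard_transform(edges, threshold=0.5):
    """Build a transformed network from common-neighbourhood similarity.

    Assumption: Jaccard similarity of neighbourhoods stands in for a CNS
    measure. Returns {(u, v): score} for every protein pair whose score
    reaches the threshold, whether or not (u, v) was an original edge.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    transformed = {}
    for u, v in combinations(sorted(neighbours), 2):
        shared = neighbours[u] & neighbours[v]
        if not shared:
            continue  # no common neighbours: the pair is left out (pruned)
        union = (neighbours[u] | neighbours[v]) - {u, v}
        score = len(shared) / len(union)
        if score >= threshold:
            transformed[(u, v)] = score
    return transformed


edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
network = jaccard_transform(edges, threshold=0.5)
```

In this toy network the original edge C-D has no supporting common neighbour and is dropped, while A-D and B-D, which share the neighbour C, appear as new links. This mirrors the two effects named in the abstract: pruning of edges and introduction of new links.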
Predicting protein functions with message passing algorithms
Motivation: In the last few years, interest in biology has increasingly
shifted towards the problem of optimal information extraction from the huge
amount of data generated via large-scale and high-throughput techniques. One of
the most relevant issues has become that of correctly and reliably
predicting the functions of observed but still functionally undetermined
proteins, starting from information coming from the network of co-observed
proteins of known function.
Method: The method proposed in this article is based on a message passing
algorithm known as Belief Propagation, which takes as input the network of
physical protein interactions and a catalog of known protein functions, and
returns, for each unclassified protein, the probability of having a chosen
function. The implementation of the algorithm allows for fast on-line analysis,
and can be easily generalized to more complex graph topologies taking into
account hyper-graphs, i.e. complexes of more than two interacting
proteins.
Comment: 12 pages, 9 eps figures, 1 additional html table
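As a rough illustration of the message passing idea, the sketch below runs sum-product Belief Propagation for a single function treated as a binary label. The abstract does not give the paper's potentials or update schedule, so the pairwise compatibility (`coupling`), the uniform prior for unclassified proteins, and the function name `belief_propagation` are all assumptions made for this example.

```python
import math


def belief_propagation(edges, known, coupling=1.0, iterations=50):
    """Sum-product BP sketch for one function (label 1 = has the function).

    Assumptions: known proteins are clamped to their label, unclassified
    proteins get a uniform prior, and interacting proteins are favoured to
    agree by a factor exp(coupling). Returns {protein: P(label = 1)}.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    def prior(node):
        if node in known:
            return [0.0, 1.0] if known[node] == 1 else [1.0, 0.0]
        return [0.5, 0.5]

    agree = math.exp(coupling)
    psi = [[agree, 1.0], [1.0, agree]]  # pairwise compatibility table

    # One message per directed edge, initialised to uniform.
    messages = {(u, v): [0.5, 0.5] for u in neighbours for v in neighbours[u]}
    for _ in range(iterations):
        updated = {}
        for (u, v) in messages:
            # Combine u's prior with all incoming messages except the one from v.
            incoming = prior(u)
            for w in neighbours[u]:
                if w != v:
                    incoming = [incoming[s] * messages[(w, u)][s] for s in (0, 1)]
            out = [sum(incoming[s] * psi[s][t] for s in (0, 1)) for t in (0, 1)]
            total = sum(out)
            updated[(u, v)] = [x / total for x in out]
        messages = updated

    beliefs = {}
    for node in neighbours:
        b = prior(node)
        for w in neighbours[node]:
            b = [b[s] * messages[(w, node)][s] for s in (0, 1)]
        beliefs[node] = b[1] / sum(b)
    return beliefs


edges = [("P1", "X"), ("P2", "X"), ("P3", "X")]
beliefs = belief_propagation(edges, known={"P1": 1, "P2": 1, "P3": 0})
```

Here the unclassified protein X has two neighbours with the function and one without, so its belief ends above 0.5. On tree-shaped networks such as this one the result is exact; on networks with loops it is an approximation.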
Global protein function prediction in protein-protein interaction networks
The determination of protein functions is one of the most challenging
problems of the post-genomic era. The sequencing of entire genomes and the
possibility of accessing gene co-expression patterns have shifted attention
from the study of single proteins or small complexes to that of the entire
proteome. In this context, the search for reliable methods for protein
function assignment is of utmost importance. Previous approaches to deduce
the unknown function of a class of proteins have exploited sequence
similarities or clustering of co-regulated genes, phylogenetic profiles,
protein-protein interactions, and protein complexes. We propose to assign
functional classes to proteins from their network of physical interactions, by
minimizing the number of interacting proteins with different categories. The
function assignment is made on a global scale and depends on the entire
connectivity pattern of the protein network. Multiple functional assignments
are made possible as a consequence of the existence of multiple equivalent
solutions. The method is applied to the yeast Saccharomyces cerevisiae
protein-protein interaction network. Robustness is tested in the presence of a
high percentage of unclassified proteins and under deletion/insertion of
interactions.
Comment: 5 pages, 2 figures, 2 supplementary tables
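The objective stated in this abstract, the number of interacting protein pairs with different categories, is easy to write down. The sketch below pairs it with a greedy local search, which is a swapped-in simplification and not the global optimisation used in the paper; the function names `count_conflicts` and `greedy_assign` are hypothetical.

```python
from collections import Counter


def count_conflicts(edges, labels):
    """Number of interactions whose two labelled endpoints disagree."""
    return sum(
        1
        for u, v in edges
        if u in labels and v in labels and labels[u] != labels[v]
    )


def greedy_assign(edges, known, max_sweeps=100):
    """Greedy local search (an assumption, not the paper's optimiser).

    Each unclassified protein repeatedly takes the category most common
    among its currently labelled neighbours, which never increases the
    conflict count. Ties go to the alphabetically first category.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    labels = dict(known)  # known categories stay fixed
    unclassified = sorted(n for n in neighbours if n not in known)
    for _ in range(max_sweeps):
        changed = False
        for node in unclassified:
            votes = Counter(labels[w] for w in neighbours[node] if w in labels)
            if not votes:
                continue  # no labelled neighbour yet
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")]
known = {"a": "transport", "e": "transport", "d": "kinase"}
labels = greedy_assign(edges, known)
conflicts = count_conflicts(edges, labels)
```

A greedy search can stop in a local minimum and returns a single assignment, so it does not reproduce the multiple equivalent solutions that the abstract uses to make multiple functional assignments.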
Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes
Complexes of physically interacting proteins constitute fundamental
functional units responsible for driving biological processes within cells. A
faithful reconstruction of the entire set of complexes is therefore essential
to understand the functional organization of cells. In this review, we discuss
the key contributions of computational methods developed to date
(approximately between 2003 and 2015) for identifying complexes from the
network of interacting proteins (PPI network). We evaluate in depth the
performance of these methods on PPI datasets from yeast, and highlight
challenges faced by these methods, in particular the detection of sparse and
small or sub-complexes and the discerning of overlapping complexes. We describe methods
for integrating diverse information including expression profiles and 3D
structures of proteins with PPI networks to understand the dynamics of complex
formation, for instance, of time-based assembly of complex subunits and
formation of fuzzy complexes from intrinsically disordered proteins. Finally,
we discuss methods for identifying dysfunctional complexes in human diseases,
an application that is proving invaluable for understanding disease mechanisms
and discovering novel therapeutic targets. We hope this review aptly
commemorates a decade of research on computational prediction of complexes and
constitutes a valuable reference for further advancements in this exciting area.
Comment: 1 Table
Sampling of conformational ensemble for virtual screening using molecular dynamics simulations and normal mode analysis
Aim: Molecular dynamics simulations and normal mode analysis are
well-established approaches to generate receptor conformational ensembles
(RCEs) for ligand docking and virtual screening. Here, we report new fast
molecular dynamics-based and normal mode analysis-based protocols combined with
conformational pocket classifications to efficiently generate RCEs. Materials
& methods: We assessed our protocols on two well-characterized protein targets:
dihydrofolate reductase, which shows local active site flexibility, and CDK2,
which shows large collective movements. The performance of the RCEs was
validated by distinguishing known ligands of dihydrofolate reductase and CDK2
among a dataset of diverse chemical decoys. Results & discussion: Our results
show that different simulation protocols can be efficient for the generation of
RCEs, depending on the kind of protein flexibility.
ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network
With the development of next generation sequencing techniques, it is fast and
cheap to determine protein sequences but relatively slow and expensive to
extract useful information from protein sequences because of limitations of
traditional biological experimental techniques. Protein function prediction has
been a long standing challenge to fill the gap between the huge amount of
protein sequences and the known function. In this paper, we propose a novel
method that converts the protein function prediction problem into a language
translation problem, from the newly proposed protein sequence language "ProLan"
to the protein function language "GOLan", and build a neural machine
translation model based on recurrent neural networks to translate "ProLan" into
"GOLan". We blindly tested our method by participating in the third Critical
Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated the
performance of our method on selected proteins whose function was released
after the CAFA competition. The good performance on the training and testing
datasets demonstrates that our newly proposed method is a promising direction
for protein function prediction. In summary, we propose, for the first time, a
method that converts the protein function prediction problem into a language
translation problem and applies a neural machine translation model to protein
function prediction.
Comment: 13 pages, 5 figures
Global Functional Atlas of Escherichia coli Encompassing Previously Uncharacterized Proteins
One-third of the 4,225 protein-coding genes of Escherichia coli K-12 remain functionally unannotated (orphans). Many map to distant clades such as Archaea, suggesting involvement in basic prokaryotic traits, whereas others appear restricted to E. coli, including pathogenic strains. To elucidate the orphans’ biological roles, we performed an extensive proteomic survey using affinity-tagged E. coli strains and generated comprehensive genomic context inferences to derive a high-confidence compendium for virtually the entire proteome consisting of 5,993 putative physical interactions and 74,776 putative functional associations, most of which are novel. Clustering of the respective probabilistic networks revealed putative orphan membership in discrete multiprotein complexes and functional modules together with annotated gene products, whereas a machine-learning strategy based on network integration implicated the orphans in specific biological processes. We provide additional experimental evidence supporting orphan participation in protein synthesis, amino acid metabolism, biofilm formation, motility, and assembly of the bacterial cell envelope. This resource provides a “systems-wide” functional blueprint of a model microbe, with insights into the biological and evolutionary significance of previously uncharacterized proteins