Enhancing the functional content of protein interaction networks
Protein interaction networks are a promising type of data for studying
complex biological systems. However, despite the rich information embedded in
these networks, they face important data quality challenges, namely noise and
incompleteness, that adversely affect the results obtained from their analysis.
Here, we explore the use of common neighborhood similarity
(CNS), a form of local structure in networks, to address these issues.
Although several CNS measures have been proposed in the literature, an
understanding of their relative efficacies for the analysis of interaction
networks has been lacking. We follow the framework of graph transformation to
convert the given interaction network into one transformed network for each
of the CNS measures evaluated. The effectiveness of each measure is
then estimated by comparing the quality of protein function predictions
obtained from its corresponding transformed network with those from the
original network. Using a large set of S. cerevisiae interactions, and a set of
136 GO terms, we find that several of the transformed networks produce more
accurate predictions than those obtained from the original network. In
particular, the measure proposed here performs especially well for
this task. Further investigation reveals that the two major factors
contributing to this improvement are the abilities of CNS measures, especially
the proposed one, to prune out noisy edges and introduce new links between
functionally related proteins.
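The graph-transformation step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It uses the Jaccard similarity of closed neighborhoods as one example CNS measure, which is not the measure proposed in the abstract. It also assumes a hypothetical score threshold that decides which pairs are linked in the transformed network. The helper names `jaccard_cns` and `transform_network` and the toy graph are invented for illustration.

```python
from itertools import combinations

# Toy interaction network as an adjacency dict (hypothetical data).
adj = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D", "F"},
    "F": {"E"},
}


def jaccard_cns(adj, u, v):
    """Jaccard similarity of the closed neighborhoods N[u] and N[v].

    Used here as one example of a common neighborhood similarity (CNS) measure.
    """
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def transform_network(adj, threshold):
    """Link every pair of proteins whose CNS score is at least `threshold`.

    Original edges that score low are dropped (pruning), and non-adjacent
    pairs that share enough neighbors gain an edge (new links).
    """
    transformed = {node: set() for node in adj}
    # Scoring all pairs is O(n^2): fine for a sketch, too slow for a proteome,
    # where only pairs within two hops can score above zero.
    for u, v in combinations(sorted(adj), 2):
        if jaccard_cns(adj, u, v) >= threshold:
            transformed[u].add(v)
            transformed[v].add(u)
    return transformed


transformed = transform_network(adj, threshold=0.5)
print(sorted(transformed["D"]))  # ['B', 'C']
```

In the toy graph, the bridging edge D-E scores 0.4 and is pruned at the 0.5 threshold, while the tightly clustered A, B, C, D keep their links. At a threshold of 0.4, the non-adjacent pair A-D (also 0.4) would gain a new edge. Predictions from the original and transformed networks could then be compared with any neighborhood-based function predictor.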
Going the distance for protein function prediction: a new distance metric for protein interaction networks
Due to an error introduced in the production process, the x-axes in the first panels of Figure 1 and Figure 7 are not formatted correctly. The correct Figure 1 can be viewed here: http://dx.doi.org/10.1371/annotation/343bf260-f6ff-48a2-93b2-3cc79af518a9
In protein-protein interaction (PPI) networks, functional similarity is often inferred from the functions of directly interacting proteins or, more generally, from some notion of network proximity among proteins in a local neighborhood. Prior methods typically measure proximity as the shortest-path distance in the network. However, this measure captures fine-grained neighborhood distinctions poorly because most proteins are close to each other, so proximity values contain many ties. We introduce diffusion state distance (DSD), a new metric based on a graph diffusion property, designed to capture finer-grained distinctions in proximity for the transfer of functional annotation in PPI networks. We present a tool that, given a PPI network as input, outputs the DSD between every pair of proteins. We show that replacing the shortest-path metric with DSD improves the performance of classical function prediction methods across the board.
MC, HZ, NMD and LJC were supported in part by National Institutes of Health (NIH) R01 grant GM080330. JP was supported in part by NIH grant R01 HD058880. This material is based upon work supported by the National Science Foundation under grant numbers CNS-0905565, CNS-1018266, CNS-1012910, and CNS-1117039, and supported by the Army Research Office under grant W911NF-11-1-0227 (to MEC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
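As a rough illustration, the sketch below computes DSD from its random-walk definition as we understand it. For each protein u, it builds He^k(u), the vector of expected visit counts to every protein during a k-step random walk from u. DSD(u, v) is then the L1 distance between He^k(u) and He^k(v). Several choices here are assumptions made for illustration: the finite walk length k, counting the starting node as a visit, and the function names. This is not the authors' released tool.

```python
import numpy as np


def expected_visits(adj, k):
    """Row u is He^k(u): expected visits to each node during a k-step random
    walk started at u (the start position, step 0, is counted here).

    Assumes every node has at least one neighbor.
    """
    degrees = adj.sum(axis=1)
    P = adj / degrees[:, None]  # row-stochastic random-walk transition matrix
    n = adj.shape[0]
    He = np.zeros((n, n))
    Pt = np.eye(n)  # P^0
    for _ in range(k + 1):
        He += Pt  # add the visit distribution after t steps
        Pt = Pt @ P
    return He


def dsd_matrix(adj, k=5):
    """DSD(u, v) = L1 norm of He^k(u) - He^k(v), for every pair of nodes."""
    He = expected_visits(adj, k)
    return np.abs(He[:, None, :] - He[None, :, :]).sum(axis=2)


# Path graph 0-1-2-3 (hypothetical toy network).
path = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

D = dsd_matrix(path, k=1)
print(D[1, 0], D[1, 2])  # 1.0 2.0
```

Nodes 0 and 2 are both at shortest-path distance 1 from node 1, which is a tie. DSD separates them (1.0 vs 2.0 for k = 1), which is the kind of finer-grained distinction the abstract describes. A classical predictor such as majority vote over the nearest neighbors could then rank neighbors by DSD instead of by hop count.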
Predicting protein functions with message passing algorithms
Motivation: In the last few years, interest in biology has been
shifting towards the problem of optimally extracting information from the huge
amount of data generated by large-scale, high-throughput techniques. One of
the most relevant problems has become that of correctly and reliably
predicting the functions of observed but still functionally undetermined
proteins, starting from information about the network of co-observed
proteins of known function.
Method: The method proposed in this article is based on a message-passing
algorithm known as Belief Propagation. It takes as input the network of
physical protein interactions and a catalog of known protein functions, and it
returns, for each unclassified protein, the probability that it has a chosen
function. The implementation of the algorithm allows for fast on-line analysis
and can easily be generalized to more complex graph topologies that take into
account hypergraphs, i.e., complexes of more than two interacting
proteins.
Comment: 12 pages, 9 eps figures, 1 additional html table
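The sketch below is a minimal loopy sum-product Belief Propagation for a single function. It assumes a pairwise binary Markov random field:

* Each protein has a variable x = 1 (has the function) or x = 0.
* Proteins with a known annotation are clamped to that value.
* Unannotated proteins get a uniform prior.
* Interacting proteins are encouraged to agree through a coupling weight.

The potentials, the `coupling` and `prior` parameters, and the function name are assumptions made for illustration. The abstract does not specify the authors' exact model, and the sketch does not cover the hypergraph generalization.

```python
import numpy as np


def belief_propagation(edges, n, known, coupling=1.0, prior=0.5,
                       iters=100, tol=1e-10):
    """Loopy sum-product BP on a pairwise binary MRF over an interaction graph.

    edges: list of (u, v) interactions between nodes 0..n-1.
    known: dict node -> 0/1 annotation for the chosen function.
    Returns an array with P(x_i = 1) for every node.
    """
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # Unary potentials phi[i] = [weight of x_i = 0, weight of x_i = 1].
    phi = np.tile([1.0 - prior, prior], (n, 1))
    for node, label in known.items():
        phi[node] = [0.0, 1.0] if label else [1.0, 0.0]  # clamp known proteins

    # Pairwise potential: equal labels get weight exp(coupling), unequal get 1.
    psi = np.array([[np.exp(coupling), 1.0],
                    [1.0, np.exp(coupling)]])

    # messages[(i, j)] is the message from i to j over x_j; start uniform.
    messages = {(i, j): np.array([0.5, 0.5])
                for i in range(n) for j in neighbors[i]}

    for _ in range(iters):
        new_messages = {}
        delta = 0.0
        for (i, j) in messages:
            # Combine i's prior with messages from all neighbors except j.
            incoming = phi[i].copy()
            for k in neighbors[i]:
                if k != j:
                    incoming *= messages[(k, i)]
            msg = incoming @ psi  # sum over x_i of incoming[x_i] * psi[x_i, x_j]
            msg /= msg.sum()
            delta = max(delta, np.abs(msg - messages[(i, j)]).max())
            new_messages[(i, j)] = msg
        messages = new_messages  # synchronous ("flooding") update
        if delta < tol:
            break

    # Belief = prior times all incoming messages, normalized per node.
    beliefs = phi.copy()
    for i in range(n):
        for k in neighbors[i]:
            beliefs[i] *= messages[(k, i)]
    beliefs /= beliefs.sum(axis=1, keepdims=True)
    return beliefs[:, 1]


# Chain 0-1-2-3 where only protein 0 is known to have the function.
p = belief_propagation([(0, 1), (1, 2), (2, 3)], n=4, known={0: 1})
print(np.round(p, 3))
```

On a tree-shaped network such as this chain, BP gives exact marginals, and the probability of having the function decays with distance from the annotated protein. On real interaction networks, which contain cycles, BP is an approximation and may need damping to converge.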