Metric learning pairwise kernel for graph inference
Much recent work in bioinformatics has focused on the inference of various
types of biological networks, representing gene regulation, metabolic
processes, protein-protein interactions, etc. A common setting involves
inferring network edges in a supervised fashion from a set of high-confidence
edges, possibly characterized by multiple, heterogeneous data sets (protein
sequence, gene expression, etc.). Here, we distinguish between two modes of
inference in this setting: direct inference based upon similarities between
nodes joined by an edge, and indirect inference based upon similarities between
one pair of nodes and another pair of nodes. We propose a supervised approach
for the direct case by translating it into a distance metric learning problem.
A relaxation of the resulting convex optimization problem leads to the support
vector machine (SVM) algorithm with a particular kernel for pairs, which we
call the metric learning pairwise kernel (MLPK). We demonstrate, using several
real biological networks, that this direct approach often improves upon the
state-of-the-art SVM for indirect inference with the tensor product pairwise
kernel.
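To make the contrast between the two pairwise kernels concrete, here is a minimal sketch in plain Python. The abstract does not state the formulas; the forms below are the ones usually given for the MLPK and the tensor product pairwise kernel (TPPK), so treat them as an assumption rather than as the authors' exact definitions. The `linear_kernel` base kernel and the toy vectors are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Base kernel on single nodes (assumed here: a plain dot product)."""
    return sum(a * b for a, b in zip(x, y))


def mlpk(p: Pair, q: Pair, base: Kernel = linear_kernel) -> float:
    """Metric learning pairwise kernel, in its commonly cited form (assumption):
    (K(a,c) - K(a,d) - K(b,c) + K(b,d))^2 for pairs (a,b) and (c,d).
    With a linear base kernel this equals ((a - b) . (c - d))^2, so it
    compares the within-pair differences (direct inference)."""
    (a, b), (c, d) = p, q
    return (base(a, c) - base(a, d) - base(b, c) + base(b, d)) ** 2


def tppk(p: Pair, q: Pair, base: Kernel = linear_kernel) -> float:
    """Tensor product pairwise kernel, in its commonly cited form (assumption):
    K(a,c)K(b,d) + K(a,d)K(b,c). It compares one pair of nodes with
    another pair of nodes (indirect inference)."""
    (a, b), (c, d) = p, q
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


# Hypothetical feature vectors for four nodes.
a, b, c, d = (1.0, 0.0), (0.0, 1.0), (2.0, 0.0), (0.0, 0.0)
k_mlpk = mlpk((a, b), (c, d))
k_tppk = tppk((a, b), (c, d))
```

Both functions are symmetric under swapping the two nodes within a pair, which is what an undirected edge requires. A Gram matrix built from either function could then be passed to any SVM implementation that accepts precomputed kernels.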
Composite structural motifs of binding sites for delineating biological functions of proteins
Most biological processes are described as a series of interactions between
proteins and other molecules, and interactions are in turn described in terms
of atomic structures. To annotate protein functions as sets of interaction
states at atomic resolution, and thereby to better understand the relation
between protein interactions and biological functions, we conducted exhaustive
all-against-all atomic structure comparisons of all known binding sites for
ligands including small molecules, proteins and nucleic acids, and identified
recurring elementary motifs. By integrating the elementary motifs associated
with each subunit, we defined composite motifs which represent
context-dependent combinations of elementary motifs. We demonstrate that
function similarity can be better inferred from composite motif similarity
than from the similarity of protein sequences or of individual binding sites.
By integrating the composite motifs associated with each protein function, we
define meta-composite motifs each of which is regarded as a time-independent
diagrammatic representation of a biological process. It is shown that
meta-composite motifs provide richer annotations of biological processes than
sequence clusters. The present results serve as a basis for bridging atomic
structures to higher-order biological phenomena by classification and
integration of binding site structures.
Comment: 34 pages, 7 figures
Topological network alignment uncovers biological function and phylogeny
Sequence comparison and alignment has had an enormous impact on our
understanding of evolution, biology, and disease. Comparison and alignment of
biological networks will likely have a similar impact. Existing network
alignments use information external to the networks, such as sequence, because
no good algorithm for purely topological alignment has yet been devised. In
this paper, we present a novel algorithm based solely on network topology that
can be used to align any two networks. We apply it to biological networks to
produce by far the most complete topological alignments of biological networks
to date. We demonstrate that both species phylogeny and detailed biological
function of individual proteins can be extracted from our alignments.
Topology-based alignments have the potential to provide a completely new,
independent source of phylogenetic information. Our alignment of the
protein-protein interaction networks of two very different species--yeast and
human--indicates that even distant species share a surprising amount of network
topology with each other, suggesting broad similarities in internal cellular
wiring across all life on Earth.
Comment: Algorithm explained in more detail. Additional analysis added.
Triangle network motifs predict complexes by complementing high-error interactomes with structural information
Background: Many high-throughput studies produce protein-protein interaction networks (PPINs) with many errors and missing information. Even for genome-wide approaches, there is often a low overlap between PPINs produced by different studies. Second-level neighbors separated by two protein-protein interactions (PPIs) were previously used for predicting protein function and finding complexes in high-error PPINs. We retrieve second-level neighbors in PPINs, and complement these with structural domain-domain interactions (SDDIs) representing binding evidence on proteins, forming PPI-SDDI-PPI triangles.
Results: We find low overlap between PPINs, SDDIs and known complexes, all well below 10%. We evaluate the overlap of PPI-SDDI-PPI triangles with known complexes from the Munich Information Center for Protein Sequences (MIPS). PPI-SDDI-PPI triangles have ~20 times higher overlap with MIPS complexes than second-level neighbors in PPINs without SDDIs. The biological interpretation for triangles is that an SDDI causes two proteins to be observed with common interaction partners in high-throughput experiments. The relatively few SDDIs overlapping with PPINs are part of highly connected SDDI components, and are more likely to be detected in experimental studies. We demonstrate the utility of PPI-SDDI-PPI triangles by reconstructing myosin-actin processes in the nucleus, cytoplasm, and cytoskeleton, which were not obvious in the original PPIN. Using other complementary datatypes in place of SDDIs to form triangles, such as PubMed co-occurrences or threading information, results in a similar ability to find protein complexes.
Conclusion: Given high-error PPINs with missing information, triangles of mixed datatypes are a promising direction for finding protein complexes. Integrating PPINs with SDDIs improves finding complexes. Structural SDDIs partially explain the high functional similarity of second-level neighbors in PPINs. We estimate that relatively little structural information would be sufficient for finding complexes involving most of the proteins and interactions in a typical PPIN.
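The triangle construction described in this abstract can be sketched as follows. This is only one reading of it, under stated assumptions: two proteins count as second-level neighbors when they share at least one PPI partner (pairs that also interact directly are not excluded), and a PPI-SDDI-PPI triangle is such a pair that is additionally joined by an SDDI. The function names and the toy protein identifiers are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an iterable of (u, v) edges; self-loops ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def second_level_pairs(ppi_edges):
    """Map each sorted protein pair (a, c) to the set of partners they share,
    i.e. proteins m with PPIs a-m and m-c (assumed definition)."""
    adj = build_adjacency(ppi_edges)
    pairs = defaultdict(set)
    for middle, neighbors in adj.items():
        for a, c in combinations(sorted(neighbors), 2):
            pairs[(a, c)].add(middle)
    return pairs


def ppi_sddi_ppi_triangles(ppi_edges, sddi_edges):
    """Keep only second-level pairs that are also linked by an SDDI."""
    sddi = {tuple(sorted(edge)) for edge in sddi_edges}
    return {
        pair: shared
        for pair, shared in second_level_pairs(ppi_edges).items()
        if pair in sddi
    }


# Hypothetical toy network: B interacts with A, C and D; A and C share an SDDI.
ppi = [("A", "B"), ("B", "C"), ("B", "D")]
sddi = [("C", "A")]
triangles = ppi_sddi_ppi_triangles(ppi, sddi)
```

In this toy network, three pairs are second-level neighbors through B, but only the pair (A, C) is supported by an SDDI, so it is the only triangle kept. Overlap with a reference set of complexes could then be measured on the retained pairs.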
- …