Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes
Complexes of physically interacting proteins constitute fundamental
functional units responsible for driving biological processes within cells. A
faithful reconstruction of the entire set of complexes is therefore essential
to understand the functional organization of cells. In this review, we discuss
the key contributions of computational methods developed to date
(approximately between 2003 and 2015) for identifying complexes from the
network of interacting proteins (PPI network). We evaluate in depth the
performance of these methods on PPI datasets from yeast, and highlight
challenges faced by these methods, in particular the detection of sparse and small
complexes or sub-complexes and the discernment of overlapping complexes. We describe methods
for integrating diverse information including expression profiles and 3D
structures of proteins with PPI networks to understand the dynamics of complex
formation, for instance, of time-based assembly of complex subunits and
formation of fuzzy complexes from intrinsically disordered proteins. Finally,
we discuss methods for identifying dysfunctional complexes in human diseases,
an application that is proving invaluable to understand disease mechanisms and
to discover novel therapeutic targets. We hope this review aptly commemorates a
decade of research on computational prediction of complexes and constitutes a
valuable reference for further advancements in this exciting area. Comment: 1 Table
Recent advances in clustering methods for protein interaction networks
The increasing availability of large-scale protein-protein interaction data has made it possible to understand the basic components and organization of cell machinery from the network level. The arising challenge is how to analyze such complex interaction data to reveal the principles of cellular organization, processes and functions. Many studies have shown that clustering protein interaction networks is an effective approach for identifying protein complexes or functional modules, which has become a major research topic in systems biology. In this review, recent advances in clustering methods for protein interaction networks will be presented in detail. The predictions of protein functions and interactions based on modules will be covered. Finally, the performance of different clustering methods will be compared and the directions for future research will be discussed.
Systematic identification of functional plant modules through the integration of complementary data sources
A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation.
Prior knowledge based mining functional modules from Yeast PPI networks with gene ontology
Background: In the literature, there are many fruitful algorithmic approaches for identifying functional modules in protein-protein interaction (PPI) networks. Because large-scale interaction data are accumulating for multiple organisms while many interactions remain unrecorded in existing PPI databases, there is still an urgent need for novel computational techniques that can analyze interaction data sets correctly and scalably. Indeed, a number of large-scale biological data sets provide indirect evidence for protein-protein interaction relationships. Results: The main aim of this paper is to present a prior-knowledge-based mining strategy to identify functional modules from PPI networks with the aid of Gene Ontology. A higher similarity value in Gene Ontology means that two gene products are more functionally related to each other, so it is better to group such gene products into one functional module. We study (i) how to encode the functional pairs into the existing PPI networks; and (ii) how to use these functional pairs as pairwise constraints to supervise the existing functional module identification algorithms. A topology-based modularity metric and complex annotations in MIPS are used to evaluate the functional modules identified by these two approaches. Conclusions: The experimental results on yeast PPI networks and GO show that the prior-knowledge-based learning methods perform better than the existing algorithms.
Transcriptome-based Gene Networks for Systems-level Analysis of Plant Gene Functions
Present day genomic technologies are evolving at an unprecedented rate, allowing interrogation of
cellular activities with increasing breadth and depth. However, we know very little about how the
genome functions and what the identified genes do. The lack of functional annotations of genes
greatly limits the post-analytical interpretation of new high throughput genomic datasets. For plant
biologists, the problem is much more severe. Less than 50% of all the identified genes in the model plant
Arabidopsis thaliana, and only about 20% of all genes in the crop model Oryza sativa have some
aspects of their functions assigned. Therefore, there is an urgent need to develop innovative
methods to predict and expand on the currently available functional annotations of plant genes.
With open-access catching the ‘pulse’ of modern day molecular research, an integration of the
copious amount of transcriptome datasets allows rapid prediction of gene functions in specific
biological contexts, which provides added evidence over traditional homology-based functional
inference. The main goal of this dissertation was to develop data analysis strategies and tools
broadly applicable in systems biology research.
Two user-friendly interactive web applications are presented: The Rice Regulatory
Network (RRN) captures an abiotic-stress conditioned gene regulatory network designed to
facilitate the identification of transcription factor targets during induction of various environmental
stresses. The Arabidopsis Seed Active Network (SANe) is a transcriptional regulatory network
that encapsulates various aspects of seed formation, including embryogenesis, endosperm
development and seed-coat formation. Further, an edge-set enrichment analysis algorithm is
proposed that uses network density as a parameter to estimate the gain or loss in correlation of
pathways between two conditionally independent coexpression networks.
Frequent Pattern Finding in Integrated Biological Networks
Biomedical research is undergoing a revolution with the advance of high-throughput technologies. A major challenge in the post-genomic era is to understand how genes, proteins and small molecules are organized into signaling pathways and regulatory networks. To simplify the analysis of large complex molecular networks, strategies are sought to break them down into small yet relatively independent network modules, e.g. pathways and protein complexes.
To find the evolutionary origins of network modules, a novel strategy has been developed to uncover duplicated pathways and protein complexes. This search was first formulated as a computational problem of finding frequent patterns in integrated graphs. The whole framework was then implemented as the software package BLUNT, which includes a parallelized version.
To evaluate the biological significance of the work, several large datasets were chosen, with each dataset targeting a different biological question. An application of BLUNT to the yeast protein-protein interaction network is described. A large number of frequent patterns were discovered and predicted to be duplicated pathways. To explore how these pathways may have diverged since duplication, the differential regulation of duplicated pathways was studied at the transcriptional level, both in terms of time and location.
As demonstrated, this algorithm can be used as a new data mining tool for large-scale biological data in general. It also provides a novel strategy to study the evolution of pathways and protein complexes in a systematic way. Understanding how pathways and protein complexes evolve will greatly benefit the fundamentals of biomedical research.
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed. Comment: 13 figures, 35 references
Defining a robust biological prior from Pathway Analysis to drive Network Inference
Inferring genetic networks from gene expression data is one of the most
challenging tasks in the post-genomic era, partly due to the vast space of
possible networks and the relatively small amount of data available. In this
field, the Gaussian Graphical Model (GGM) provides a convenient framework for the
discovery of biological networks. In this paper, we propose an original
approach for inferring gene regulation networks using a robust biological prior
on their structure in order to limit the set of candidate networks.
Pathways, which represent biological knowledge on the regulatory networks,
will be used as informative prior knowledge to drive Network Inference. This
approach is based on the selection of a relevant set of genes, called the
"molecular signature", associated with a condition of interest (for instance,
the genes involved in disease development). In this context, differential
expression analysis is a well-established strategy. However, outcome signatures
are often not consistent and show little overlap between studies. Thus, we will
dedicate the first part of our work to the improvement of the standard process
of biomarker identification to guarantee the robustness and reproducibility of
the molecular signature.
Our approach makes it possible to compare the networks inferred under two conditions
of interest (for instance, case and control networks) and helps with the
biological interpretation of results. It thus allows the identification of differential
regulations that occur in these conditions. We illustrate the proposed approach
by applying our method to a study of breast cancer's response to treatment.
Assessment of network module identification across complex diseases
Many bioinformatics methods have been proposed for reducing the complexity of large gene or protein networks into relevant subnetworks or modules. Yet, how such methods compare to each other in terms of their ability to identify disease-relevant modules in different types of network remains poorly understood. We launched the 'Disease Module Identification DREAM Challenge', an open competition to comprehensively assess module identification methods across diverse protein-protein interaction, signaling, gene co-expression, homology and cancer-gene networks. Predicted network modules were tested for association with complex traits and diseases using a unique collection of 180 genome-wide association studies. Our robust assessment of 75 module identification methods reveals top-performing algorithms, which recover complementary trait-associated modules. We find that most of these modules correspond to core disease-relevant pathways, which often comprise therapeutic targets. This community challenge establishes biologically interpretable benchmarks, tools and guidelines for molecular network analysis to study human disease biology.