7,002 research outputs found

    Analysis of High-Throughput Data - Protein-Protein Interactions, Protein Complexes and RNA Half-life

    Get PDF
    The development of high-throughput techniques has lead to a paradigm change in biology from the small-scale analysis of individual genes and proteins to a genome-scale analysis of biological systems. Proteins and genes can now be studied in their interaction with each other and the cooperation within multi-subunit protein complexes can be investigated. Moreover, time-dependent dynamics and regulation of these processes and associations can now be explored by monitoring mRNA changes and turnover. The in-depth analysis of these large and complex data sets would not be possible without sophisticated algorithms for integrating different data sources, identifying interesting patterns in the data and addressing the high variability and error rates in biological measurements. In this thesis, we developed such methods for the investigation of protein interactions and complexes and the corresponding regulatory processes. In the first part, we analyze networks of physical protein-protein interactions measured in large-scale experiments. We show that the topology of the complete interactomes can be confidently extrapolated despite high numbers of missing and wrong interactions from only partial measurements of interaction networks. Furthermore, we find that the structure and stability of protein interaction networks is not only influenced by the degree distribution of the network but also considerably by the suppression or propagation of interactions between highly connected proteins. As analysis of network topology is generally focused on large eukaryotic networks, we developed new methods to analyze smaller networks of intraviral and virus-host interactions. By comparing interactomes of related herpesviral species, we could detect a conserved core of protein interactions and could address the low coverage of the yeast two-hybrid system. In addition, common strategies in the interaction of the viruses with the host cell were identified. New affinity purification methods now make it possible to directly study associations of proteins in complexes. Due to experimental errors the individual protein complexes have to be predicted with computational methods from these purification results. As previously published methods relied more or less heavily on existing knowledge on complexes, we developed an unsupervised prediction algorithm which is independent from such additional data. Using this approach, high-quality protein complexes can be identified from the raw purification data alone for any species purification experiments are performed. To identify the direct, physical interactions within these predicted complexes and their subcomponent structure, we describe a new approach to extract the highest scoring subnetwork connecting the complex and interactions not explained by alternative paths of indirect interactions. In this way, important interactions within the complexes can be identified and their substructure can be resolved in a straightforward way. To explore the regulation of proteins and complexes, we analyzed microarray measurements of mRNA abundance, de novo transcription and decay. Based on the relationship between newly transcribed, pre-existing and total RNA, transcript half-life can be estimated for individual genes using a new microarray normalization method and a quality control can be applied. We show that precise measurements of RNA half-life can be obtained from de novo transcription which are of superior accuracy to previously published results from RNA decay. Using such precise measurements, we studied RNA half-lives in human B-cells and mouse fibroblasts to identify conserved patterns governing RNA turnover. Our results show that transcript half-lives are strongly conserved and specifically correlated to gene function. Although transcript half-life is highly similar in protein complexes and \mbox{families}, individual proteins may deviate significantly from the remaining complex subunits or family members to efficiently support the regulation of protein complexes or to create non-redundant roles of functionally similar proteins. These results illustrate several of the many ways in which high-throughput measurements lead to a better understanding of biological systems. By studying large-scale measure\-ments in this thesis, the structure of protein interaction networks and protein complexes could be better characterized, important interactions and conserved strategies for herpes\-viral infection could be identified and interesting insights could be gained into the regulation of important biological processes and protein complexes. This was made possible by the development of novel algorithms and analysis approaches which will also be valuable for further research on these topics

    Charting the protein complexome in yeast by mass spectrometry

    Get PDF
    It has become evident over the past few years that many complex cellular processes, including control of the cell cycle and ubiquitin-dependent proteolysis, are carried out by sophisticated multisubunit protein machines that are dynamic in abundance, post-translational modification state, and composition. To understand better the nature of the macromolecular assemblages that carry out the cell cycle and ubiquitin-dependent proteolysis, we have used mass spectrometry extensively over the past few years to characterize both the composition of various protein complexes and the modification states of their subunits. In this article we review some of our recent efforts, and describe a promising new approach for using mass spectrometry to dissect protein interaction networks

    Identifying protein complexes and disease genes from biomolecular networks

    Get PDF
    With advances in high-throughput measurement techniques, large-scale biological data, such as protein-protein interaction (PPI) data, gene expression data, gene-disease association data, cellular pathway data, and so on, have been and will continue to be produced. Those data contain insightful information for understanding the mechanisms of biological systems and have been proved useful for developing new methods in disease diagnosis, disease treatment and drug design. This study focuses on two main research topics: (1) identifying protein complexes and (2) identifying disease genes from biomolecular networks. Firstly, protein complexes are groups of proteins that interact with each other at the same time and place within living cells. They are molecular entities that carry out cellular processes. The identification of protein complexes plays a primary role for understanding the organization of proteins and the mechanisms of biological systems. Many previous algorithms are designed based on the assumption that protein complexes are densely connected sub-graphs in PPI networks. In this research, a dense sub-graph detection algorithm is first developed following this assumption by using clique seeds and graph entropy. Although the proposed algorithm generates a large number of reasonable predictions and its f-score is better than many previous algorithms, it still cannot identify many known protein complexes. After that, we analyze characteristics of known yeast protein complexes and find that not all of the complexes exhibit dense structures in PPI networks. Many of them have a star-like structure, which is a very special case of the core-attachment structure and it cannot be identified by many previous core-attachment-structure-based algorithms. To increase the prediction accuracy of protein complex identification, a multiple-topological-structure-based algorithm is proposed to identify protein complexes from PPI networks. Four single-topological-structure-based algorithms are first employed to detect raw predictions with clique, dense, core-attachment and star-like structures, respectively. A merging and trimming step is then adopted to generate final predictions based on topological information or GO annotations of predictions. A comprehensive review about the identification of protein complexes from static PPI networks to dynamic PPI networks is also given in this study. Secondly, genetic diseases often involve the dysfunction of multiple genes. Various types of evidence have shown that similar disease genes tend to lie close to one another in various biomolecular networks. The identification of disease genes via multiple data integration is indispensable towards the understanding of the genetic mechanisms of many genetic diseases. However, the number of known disease genes related to similar genetic diseases is often small. It is not easy to capture the intricate gene-disease associations from such a small number of known samples. Moreover, different kinds of biological data are heterogeneous and no widely acceptable criterion is available to standardize them to the same scale. In this study, a flexible and reliable multiple data integration algorithm is first proposed to identify disease genes based on the theory of Markov random fields (MRF) and the method of Bayesian analysis. A novel global-characteristic-based parameter estimation method and an improved Gibbs sampling strategy are introduced, such that the proposed algorithm has the capability to tune parameters of different data sources automatically. However, the Markovianity characteristic of the proposed algorithm means it only considers information of direct neighbors to formulate the relationship among genes, ignoring the contribution of indirect neighbors in biomolecular networks. To overcome this drawback, a kernel-based MRF algorithm is further proposed to take advantage of the global characteristics of biological data via graph kernels. The kernel-based MRF algorithm generates predictions better than many previous disease gene identification algorithms in terms of the area under the receiver operating characteristic curve (AUC score). However, it is very time-consuming, since the Gibbs sampling process of the algorithm has to maintain a long Markov chain for every single gene. Finally, to reduce the computational time of the MRF-based algorithm, a fast and high performance logistic-regression-based algorithm is developed for identifying disease genes from biomolecular networks. Numerical experiments show that the proposed algorithm outperforms many existing methods in terms of the AUC score and running time. To summarize, this study has developed several computational algorithms for identifying protein complexes and disease genes from biomolecular networks, respectively. These proposed algorithms are better than many other existing algorithms in the literature

    An effective method for refining predicted protein complexes based on protein activity and the mechanism of protein complex formation

    Get PDF
    BACKGROUND: Identifying protein complexes from protein-protein interaction network is fundamental for understanding the mechanism of cellular component and protein function. At present, many methods to identify protein complexes are mainly based on the topological characteristics or the functional similarity features, neglecting the fact that proteins must be in their active forms to interact with others and the formation of protein complex is following a just-in-time mechanism. RESULTS: This paper firstly presents a protein complex formation model based on the just-in-time mechanism. By investigating known protein complexes combined with gene expression data, we find that most protein complexes can be formed in continuous time points, and the average overlapping rate of the known complexes during the formation is large. A method is proposed to refine the protein complexes predicted by clustering algorithms based on the protein complex formation model and the properties of known protein complexes. After refinement, the number of known complexes that are matched by predicted complexes, Sensitivity, Specificity, and f-measure are significantly improved, when compared with those of the original predicted complexes. CONCLUSION: The refining method can discard the spurious proteins by protein activity and generate new complexes by just-in-time assemble mechanism, which can enhance the ability to predict complex

    The G protein-coupled receptor heterodimer network (GPCR-HetNet) and its hub components

    Get PDF
    G protein-coupled receptors (GPCRs) oligomerization has emerged as a vital characteristic of receptor structure. Substantial experimental evidence supports the existence of GPCR-GPCR interactions in a coordinated and cooperative manner. However, despite the current development of experimental techniques for large-scale detection of GPCR heteromers, in order to understand their connectivity it is necessary to develop novel tools to study the global heteroreceptor networks. To provide insight into the overall topology of the GPCR heteromers and identify key players, a collective interaction network was constructed. Experimental interaction data for each of the individual human GPCR protomers was obtained manually from the STRING and SCOPUS databases. The interaction data were used to build and analyze the network using Cytoscape software. The network was treated as undirected throughout the study. It is comprised of 156 nodes, 260 edges and has a scale-free topology. Connectivity analysis reveals a significant dominance of intrafamily versus interfamily connections. Most of the receptors within the network are linked to each other by a small number of edges. DRD2, OPRM, ADRB2, AA2AR, AA1R, OPRK, OPRD and GHSR are identified as hubs. In a network representation 10 modules/clusters also appear as a highly interconnected group of nodes. Information on this GPCR network can improve our understanding of molecular integration. GPCR-HetNet has been implemented in Java and is freely available at http://www.iiia.csic.es/similar to ismel/GPCR-Nets/index.html

    Efficient and accurate greedy search methods for mining functional modules in protein interaction networks

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Most computational algorithms mainly focus on detecting highly connected subgraphs in PPI networks as protein complexes but ignore their inherent organization. Furthermore, many of these algorithms are computationally expensive. However, recent analysis indicates that experimentally detected protein complexes generally contain Core/attachment structures.</p> <p>Methods</p> <p>In this paper, a Greedy Search Method based on Core-Attachment structure (GSM-CA) is proposed. The GSM-CA method detects densely connected regions in large protein-protein interaction networks based on the edge weight and two criteria for determining core nodes and attachment nodes. The GSM-CA method improves the prediction accuracy compared to other similar module detection approaches, however it is computationally expensive. Many module detection approaches are based on the traditional hierarchical methods, which is also computationally inefficient because the hierarchical tree structure produced by these approaches cannot provide adequate information to identify whether a network belongs to a module structure or not. In order to speed up the computational process, the Greedy Search Method based on Fast Clustering (GSM-FC) is proposed in this work. The edge weight based GSM-FC method uses a greedy procedure to traverse all edges just once to separate the network into the suitable set of modules.</p> <p>Results</p> <p>The proposed methods are applied to the protein interaction network of S. cerevisiae. Experimental results indicate that many significant functional modules are detected, most of which match the known complexes. Results also demonstrate that the GSM-FC algorithm is faster and more accurate as compared to other competing algorithms.</p> <p>Conclusions</p> <p>Based on the new edge weight definition, the proposed algorithm takes advantages of the greedy search procedure to separate the network into the suitable set of modules. Experimental analysis shows that the identified modules are statistically significant. The algorithm can reduce the computational time significantly while keeping high prediction accuracy.</p
    • ā€¦
    corecore