2,342 research outputs found

    Methods for protein complex prediction and their contributions towards understanding the organization, function and dynamics of complexes

    Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organization of cells. In this review, we discuss the key contributions of computational methods developed to date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight the challenges they face, in particular detecting sparse, small, or sub-complexes and discerning overlapping complexes. We describe methods for integrating diverse information, including expression profiles and 3D structures of proteins, with PPI networks to understand the dynamics of complex formation, for instance the time-based assembly of complex subunits and the formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable for understanding disease mechanisms and discovering novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area. Comment: 1 Tabl

    Network-based approaches to explore complex biological systems towards network medicine

    Network medicine relies on different types of networks: from the molecular level of protein–protein interactions to gene regulatory networks and correlation studies of gene expression. Among network approaches based on the analysis of the topological properties of protein–protein interaction (PPI) networks, we discuss the widespread DIAMOnD (disease module detection) algorithm. Starting from the assumption that PPI networks can be viewed as maps where diseases can be identified with localized perturbations within a specific neighborhood (i.e., disease modules), DIAMOnD performs a systematic analysis of the human PPI network to uncover new disease-associated genes by exploiting connectivity significance instead of connection density. The past few years have witnessed increasing interest in understanding the molecular mechanisms of post-transcriptional regulation, with a special emphasis on non-coding RNAs, since they are emerging as key regulators of many cellular processes in both physiological and pathological states. Recent findings show that coding genes are not the only targets that microRNAs interact with. In fact, there is a pool of different RNAs, including long non-coding RNAs (lncRNAs), competing with each other to attract microRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The framework of regulatory networks provides a powerful tool to gather new insights into ceRNA regulatory mechanisms. Here, we describe a data-driven model recently developed to explore the lncRNA-associated ceRNA activity in breast invasive carcinoma. On the other hand, a very promising example of a co-expression network is the one implemented by the software SWIM (switch miner), which combines topological properties of correlation networks with gene expression data in order to identify a small pool of genes, called switch genes, critically associated with drastic changes in cell phenotype. Here, we describe the SWIM tool along with its applications to cancer research and compare its predictions with DIAMOnD disease genes.

    Complex-based analysis of dysregulated cellular processes in cancer

    Background: Differential expression analysis of (individual) genes is often used to study their roles in diseases. However, diseases such as cancer are a result of the combined effect of multiple genes. Gene products such as proteins seldom act in isolation, but instead constitute stable multi-protein complexes performing dedicated functions. Therefore, complexes aggregate the effect of individual genes (proteins) and can be used to gain a better understanding of cancer mechanisms. Here, we observe that complexes show considerable changes in their expression, in turn directed by the concerted action of transcription factors (TFs), across cancer conditions. We seek to gain novel insights into cancer mechanisms through a systematic analysis of complexes and their transcriptional regulation. Results: We integrated large-scale protein-interaction (PPI) and gene-expression datasets to identify complexes that exhibit significant changes in their expression across different conditions in cancer. We devised a log-linear model to relate these changes to the differential regulation of complexes by TFs. The application of our model on two case studies involving pancreatic and familial breast tumour conditions revealed: (i) complexes in core cellular processes, especially those responsible for maintaining genome stability and cell proliferation (e.g. DNA damage repair and cell cycle) show considerable changes in expression; (ii) these changes include decrease and countering increase for different sets of complexes indicative of compensatory mechanisms coming into play in tumours; and (iii) TFs work in cooperative and counteractive ways to regulate these mechanisms. Such aberrant complexes and their regulating TFs play vital roles in the initiation and progression of cancer. Comment: 22 pages, BMC Systems Biolog

    Automated design of bacterial genome sequences

    Background: Organisms have evolved ways of regulating transcription to better adapt to varying environments. Could the current functional genomics data and models support the possibility of engineering a genome with completely rearranged gene organization while the cell maintains its behavior under environmental challenges? How would we proceed to design a full nucleotide sequence for such genomes? Results: As a first step towards answering such questions, recent work showed that it is possible to design alternative transcriptomic models showing the same behavior under environmental variations as the wild-type model. A second step would require providing evidence that it is possible to provide a nucleotide sequence for a genome encoding such a transcriptional model. We used computational design techniques to design a rewired global transcriptional regulation of Escherichia coli that nevertheless shows a transcriptomic response similar to that of the wild type. Afterwards, we “compiled” the transcriptional networks into nucleotide sequences to obtain the final genome sequence. Our computational evolution procedure ensures that we can maintain the genotype-phenotype mapping during the rewiring of the regulatory network. We found that it is theoretically possible to reorganize the E. coli genome into 86% fewer regulated operons. Such refactored genomes are constituted by operons that contain sets of genes sharing around 60% of their biological functions and, if evolved under highly variable environmental conditions, have regulatory networks that respond more than 20% faster to multiple external perturbations. Conclusions: This work provides the first algorithm for producing a genome sequence encoding a rewired transcriptional regulation with wild-type behavior under alternative environments.

    Network analyses of proteome evolution and diversity

    The mapping of biomolecular interactions reveals that the function of most biological components depends on a web of interrelations with other cellular components, stressing the need for a systems-level view of biological functions. In this work, I explore ways in which the integration of network and genomic information from different organizational levels can lead to a better understanding of cellular systems and components. First, studying yeast, I show that the evolutionary properties of target genes constitute the dominant determinant of transcription factor (TF) evolutionary rate and that this evolutionary modularity is limited to activating regulatory relationships. I also show that targets of fast-evolving TFs show greater evolutionary expression changes and are enriched for niche-specific functions and other TFs. This work highlights the importance of trans-regulatory network evolution in species-specific gene expression and network adaptation. Next, I show that genes either lost or gained across fungal evolution are enriched in TFs and have very different network and genomic properties from universally conserved genes, including, in sharp contrast to other networks, a greater number of transcriptional regulators. Placing genes in the context of their evolutionary life-cycle reveals principles of network integration of gained genes and evidence for the progressive network and functional marginalization of genes as an evolutionary process preceding gene loss. In the final chapter, I study how alternative splicing (AS)-driven expansion of human proteome diversity leads to system-level complexity through the AS-mediated rewiring of the protein-protein interaction network. By overlaying different network and genomic datasets onto the first large-scale isoform-resolution interactome, I found that differentiating between splice variants is essential to capturing the full extent of the network's functional modularity. I also discovered that AS-mediated rewiring preferentially affects tissue-specific genes and that topologically different patterns of rewiring have distinct functional consequences. Furthermore, I found that most rewiring can be traced to the AS of evolutionarily conserved sequence modules, which promote or block interactions and tend to overlap linear motifs and disrupt known domain-domain interactions. Together, this work demonstrates that a network-level perspective and genomic data integration are essential to understanding the evolution and functional diversity of proteomes.

    Identifying disease-associated genes based on artificial intelligence

    Identifying disease-gene associations can help improve the understanding of disease mechanisms, which has a variety of applications, such as early diagnosis and drug development. Although experimental techniques, such as linkage analysis and genome-wide association studies (GWAS), have identified a large number of associations, identifying disease genes is still challenging since experimental methods are usually time-consuming and expensive. To address these issues, computational methods have been proposed to predict disease-gene associations. Based on the characteristics of existing computational algorithms in the literature, we can roughly divide them into three categories: network-based methods, machine learning-based methods, and other methods. Whatever model is used to predict disease genes, the proper integration of multi-level biological data is the key to improving prediction accuracy. This thesis addresses some limitations of the existing computational algorithms, and integrates multi-level data via artificial intelligence techniques. The thesis starts with a comprehensive review of computational methods, databases, and evaluation methods used in predicting disease-gene associations, followed by one network-based method and four machine learning-based methods. The first chapter introduces the background information, objectives of the studies and structure of the thesis. After that, a comprehensive review is provided in the second chapter to discuss the existing algorithms as well as the databases and evaluation methods used in existing studies. With the objectives and future directions established, the thesis then presents five computational methods for predicting disease-gene associations. The first method, proposed in Chapter 3, considers the issue of non-disease gene selection. A shortest path-based strategy is used to select reliable non-disease genes from a disease gene network and a differential network. The selected genes are then used by a network-energy model to improve its performance. The second method, proposed in Chapter 4, constructs sample-based networks for case samples and uses them to predict disease genes. This strategy improves the quality of protein-protein interaction (PPI) networks, which further improves the prediction accuracy. Chapter 5 presents a generic model which applies multimodal deep belief nets (DBN) to fuse different types of data. Network embeddings extracted from PPI networks and gene ontology (GO) data are fused with the multimodal DBN to obtain cross-modality representations. Chapter 6 presents another deep learning model which uses a convolutional neural network (CNN) to integrate gene similarities with other types of data. Finally, the fifth method, proposed in Chapter 7, is a nonnegative matrix factorization (NMF)-based method. This method maps diseases and genes onto a lower-dimensional manifold, and the geodesic distance between diseases and genes is used to predict their associations. The method can predict disease genes even if the disease under consideration has no known associated genes. In summary, this thesis has proposed several artificial intelligence-based computational algorithms to address the typical issues existing in computational algorithms. Experimental results have shown that the proposed methods can improve the accuracy of disease-gene prediction.