19 research outputs found

    Organization of Physical Interactomes as Uncovered by Network Schemas

    Large-scale protein-protein interaction networks provide new opportunities for understanding cellular organization and functioning. We introduce network schemas to elucidate shared mechanisms within interactomes. Network schemas specify descriptions of proteins and the topology of interactions among them. We develop algorithms for systematically uncovering recurring, over-represented schemas in physical interaction networks. We apply our methods to the S. cerevisiae interactome, focusing on schemas consisting of proteins described via sequence motifs and molecular function annotations and interacting with one another in one of four basic network topologies. We identify hundreds of recurring and over-represented network schemas of varying complexity, and demonstrate via graph-theoretic representations how more complex schemas are organized in terms of their lower-order constituents. The uncovered schemas span a wide range of cellular activities, with many signaling- and transport-related higher-order schemas. We establish the functional importance of the schemas by showing that they correspond to functionally cohesive sets of proteins, are enriched in the frequency with which they have instances in the H. sapiens interactome, and are useful for predicting protein function. Our findings suggest that network schemas are a powerful paradigm for organizing, interrogating, and annotating cellular networks.

    Novel genes exhibit distinct patterns of function acquisition and network integration

    Background: Genes are created by a variety of evolutionary processes, some of which generate duplicate copies of an entire gene, while others rearrange pre-existing genetic elements or co-opt previously non-coding sequence to create genes with 'novel' sequences. These novel genes are thought to contribute to distinct phenotypes that distinguish organisms. The creation, evolution, and function of duplicated genes are well-studied; however, the genesis and early evolution of novel genes are not well-characterized. We developed a computational approach to investigate these issues by integrating genome-wide comparative phylogenetic analysis with functional and interaction data derived from small-scale and high-throughput experiments. Results: We examine the function and evolution of new genes in the yeast Saccharomyces cerevisiae. We observed significant differences in the functional attributes and interactions of genes created at different times and by different mechanisms. Novel genes are initially less integrated into cellular networks than duplicate genes, but they appear to gain functions and interactions more quickly than duplicates. Recently created duplicated genes show evidence of adapting existing functions to environmental changes, while young novel genes do not exhibit enrichment for any particular functions. Finally, we found a significant preference for genes to interact with other genes of similar age and origin. Conclusions: Our results suggest a strong relationship between how and when genes are created and the roles they play in the cell. Overall, genes tend to become more integrated into the functional networks of the cell with time, but the dynamics of this process differ significantly between duplicate and novel genes.

    Computational tools for the study of cellular networks

    Doctorate in Biology. With the advent of the so-called genomic era, there has been increasing awareness of the need to study biological phenomena from a systems viewpoint, because it is essential to understand how the interplay between the many cellular components gives rise to cellular functions. To this end, many recent technological efforts in biology have been directed at collecting data that encompass most cellular components. Integrating these different experimental approaches and making sense of the vast data available calls for the development of computational methods in biological research. These computational tools should be able to search for patterns that extend current knowledge by providing predictive models of biological events. The work presented in this thesis focuses on computational methodologies to study protein-protein interactions. In particular, the results show that the binding specificity of peptide-binding domains can be predicted from protein structure. SH3 domains from S. cerevisiae and human SH2 domains were used to demonstrate this method. This work also demonstrates that, once the binding specificity is known, comparative genomics and protein secondary structure information can be used to predict the binding partners of a protein in the proteome with more than 75% accuracy. Restricting predictions to regions of the proteome predicted to be disordered increases the accuracy of the predictions, suggesting that the binding sites of peptide-binding domains tend to reside in these regions. Analysis of current protein interaction maps from several species revealed that protein interactions show considerable evolutionary plasticity, and that the rate at which they change depends both on binding specificity and on the biological process in which the proteins participate. For example, human proteins involved in immune response, transport, and establishment of localization show signs of positive selection for change of interactions. In summary, this thesis presents new methods to computationally predict protein interactions, as well as new hypotheses about the evolution of these interactions.

    The Capabilities of Chaos and Complexity

    To what degree could chaos and complexity have organized a Peptide or RNA World of crude yet necessarily integrated protometabolism? How far could such protolife evolve in the absence of a heritable linear digital symbol system that could mutate, instruct, regulate, optimize and maintain metabolic homeostasis? To address these questions, chaos, complexity, self-ordered states, and organization must all be carefully defined and distinguished. In addition, their cause-and-effect relationships and mechanisms of action must be delineated. Are there any formal (non-physical, abstract, conceptual, algorithmic) components to chaos, complexity, self-ordering and organization, or are they entirely physicodynamic (physical, mass/energy interaction alone)? Chaos and complexity can produce some fascinating self-ordered phenomena. But can spontaneous chaos and complexity steer events and processes toward pragmatic benefit, select function over non-function, optimize algorithms, integrate circuits, produce computational halting, organize processes into formal systems, or control and regulate existing systems toward greater efficiency? The question is pursued of whether there might be some yet-to-be-discovered new law of biology that will elucidate the derivation of prescriptive information and control. “System” will be rigorously defined. Can a low-informational rapid succession of Prigogine’s dissipative structures self-order into bona fide organization?

    Modular Algorithms for Biomolecular Network Alignment

    Comparative analysis of biomolecular networks constructed using measurements from different conditions, tissues, and organisms offers a powerful approach to understanding the structure, function, dynamics, and evolution of complex biological systems. The rapidly advancing field of systems biology aims to understand these systems in terms of the underlying networks of interactions among the large number of molecular participants involved, including genes, proteins, and metabolites. In particular, the comparative analysis of network models representing biomolecular interactions in different species or tissues offers an important tool for identifying conserved modules, predicting functions of specific genes or proteins, and studying the evolution of biological processes, among other applications. The primary focus of this dissertation is the biomolecular network alignment problem: given two or more network models, the problem is to optimally match the nodes and links in one network with the nodes and links of the other. The Biomolecular Network Alignment (BiNA) Toolkit developed as part of this dissertation provides a set of efficient (in terms of running time complexity) and accurate (in terms of various evaluation criteria discussed in the literature) network alignment algorithms for biomolecular networks. BiNA is scalable, user-friendly, modular, and extensible for performing alignments on diverse types of biomolecular networks. The algorithm is applicable to undirected graphs in (1) their weighted and unweighted variations and (2) their labeled and unlabeled variations, and it has been applied to align multiple networks ranging from hundreds of nodes with a few thousand edges to tens of thousands of nodes with millions of edges. The dissertation provides various applications of network comparison tools, including how results from such alignments have been used to (1) construct phylogenetic trees based on protein-protein interaction networks, and (2) find biochemical pathways involved in ligand recognition in B cells.

    Computational methods to explore hierarchical and modular structure of biological networks

    Networks have been widely used to understand the structure of complex systems. From studying biological networks of protein-protein, genetic and other types of interactions, we gain insights into the functional organization of static biological systems that could hardly be measured experimentally with current state-of-the-art technology. Biological networks also serve as a principled framework that integrates multiple sources of genome-wide data sets such as gene expression arrays and sequencing. Yet a large-scale network is often intractable for intuitive visualization and computation. We developed novel network clustering algorithms to harness the power of genome-scale biological networks of all genes/proteins. In particular, our algorithms were capable of finding hidden modular structures in hierarchical stochastic block models. Since the modules are organized hierarchically, our algorithms facilitate downstream analysis and the design of in-depth validation experiments in a "divide-and-conquer" strategy. Moreover, we present empirical evidence that the hierarchical and modular structure best explains observed biological networks. We used the static clustering methods in two ways. First, we sought to extend the static methods to dynamic clustering problems, and observed general patterns in the dynamics of network modules. For example, we demonstrate the dynamics of the yeast metabolic cycle and the Arabidopsis root developmental process. Second, we propose a prioritization scheme that sorts identified network modules in order of discriminative power. From this research, we conclude that biological networks are best understood as hierarchically organized modules, and that the modules remain stable in unperturbed biological processes but can respond differently to abnormal or external perturbations such as knock-down of key enzymes.

    Discovering meaning from biological sequences: focus on predicting misannotated proteins, binding patterns, and G4-quadruplex secondary

    Proteins are the principal catalytic agents, structural elements, signal transmitters, transporters, and molecular machines in cells. Experimental determination of protein function is expensive in time and resources compared to computational methods. Hence, assigning protein function, predicting protein binding patterns, and understanding protein regulation are important problems in functional genomics and key challenges in bioinformatics. This dissertation comprises three studies. In the first two papers, we apply machine-learning methods to (1) identify misannotated sequences and (2) predict the binding patterns of proteins. The third paper is (3) a genome-wide analysis of G4-quadruplex sequences in the maize genome. The first two papers are based on two-stage classification methods. The first stage uses machine-learning approaches that combine composition-based and sequence-based features. We use either decision trees (HDTree) or support vector machines (SVM) as second-stage classifiers and show that classification performance matches or outperforms more computationally expensive approaches. For study (1), our method identified potentially misannotated sequences within a well-characterized set of proteins in a popular bioinformatics database. We identified misannotated proteins and show that these proteins have contradictory AmiGO and UniProt annotations. For study (2), we developed a three-phase approach: Phase I classifies whether a protein binds with another protein. Phase II determines whether a protein-binding protein is a hub. Phase III classifies hub proteins based on the number of binding sites and the number of concurrent binding partners. For study (3), we carried out a computational genome-wide screen to identify non-telomeric G4-quadruplex (G4Q) elements in maize to explore their potential role in gene regulation for flowering plants. Analysis of G4Q-containing genes uncovered a striking tendency for their enrichment in genes of networks and pathways associated with electron transport, sugar degradation, and hypoxia responsiveness. The maize G4Q elements may play a previously unrecognized role in coordinating global regulation of gene expression in response to hypoxia to control carbohydrate metabolism for anaerobic metabolism. Together, the three studies provide predictive methods and new insights for classifying misannotated proteins, understanding protein binding patterns, and identifying a potentially new model for gene regulation.

    Integrative Multi-Omics in Biomedical Research

    Genomics technologies revolutionised biomedical research, but the genome alone is not sufficient to capture biological complexity. Postgenomic methods, typically based on mass spectrometry, comprise the analysis of metabolites, lipids, and proteins and are an essential complement to genomics and transcriptomics. Multidimensional omics is becoming established to provide accurate and comprehensive state descriptions. This book covers the latest methodological developments for, and applications of, integrative multi-omics in biomedical research.