
    Bayesian Inference for Duplication-Mutation with Complementarity Network Models

    We observe an undirected graph $G$ without multiple edges or self-loops, which represents a protein-protein interaction (PPI) network. We assume that $G$ evolved under the duplication-mutation with complementarity (DMC) model from a seed graph $G_0$, and we also observe the binary forest $\Gamma$ that represents the duplication history of $G$. We establish a posterior density for the DMC model parameters and outline a sampling strategy for Bayesian inference based on a particle marginal Metropolis-Hastings (PMMH) algorithm. Numerical examples demonstrate high accuracy and precision in inferring the DMC model's mutation and homodimerization parameters.
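    To make the generative model concrete, here is a minimal forward simulator for DMC-style growth, offered as a sketch under stated assumptions: the rule follows the commonly cited DMC formulation (duplicate a random node; for each shared neighbor, with probability q_mod delete the edge from either the original or the copy, chosen at random; link original and copy with probability q_con), and the names q_mod, q_con, dmc_step and simulate_dmc are illustrative, not taken from the paper. It only generates a graph and a duplication record; it does not implement the posterior density or the PMMH sampler.

```python
import random


def dmc_step(adj, q_mod, q_con, rng):
    """Apply one DMC duplication step in place to an int-keyed adjacency dict."""
    # Pick an existing node uniformly at random and give its copy a fresh id.
    anchor = rng.choice(sorted(adj))
    new = max(adj) + 1
    # The copy initially inherits every neighbor of the original.
    adj[new] = set(adj[anchor])
    for v in adj[new]:
        adj[v].add(new)
    # Complementarity: for each shared neighbor, with probability q_mod,
    # delete the edge from either the original or the copy (50/50).
    for v in sorted(adj[anchor]):
        if rng.random() < q_mod:
            loser = anchor if rng.random() < 0.5 else new
            adj[loser].discard(v)
            adj[v].discard(loser)
    # Homodimerization: connect original and copy with probability q_con.
    if rng.random() < q_con:
        adj[anchor].add(new)
        adj[new].add(anchor)
    return anchor, new


def simulate_dmc(seed_adj, n_final, q_mod, q_con, seed=0):
    """Grow a seed graph to n_final nodes; return the graph and (original, copy) pairs."""
    rng = random.Random(seed)
    adj = {u: set(vs) for u, vs in seed_adj.items()}
    history = []  # toy stand-in for the duplication forest
    while len(adj) < n_final:
        history.append(dmc_step(adj, q_mod, q_con, rng))
    return adj, history
```

Running `simulate_dmc({0: {1}, 1: {0}}, 50, q_mod=0.4, q_con=0.1)` grows a single-edge seed graph to 50 nodes; in this toy setting the returned (original, copy) pairs play the role of the duplication forest.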

    Reconstruction of Network Evolutionary History from Extant Network Topology and Duplication History

    Genome-wide protein-protein interaction (PPI) data are readily available thanks to recent breakthroughs in biotechnology. However, the PPI networks of extant organisms are only snapshots of network evolution, and inferring the whole evolutionary history is a challenging problem in computational biology. In this paper, we present a likelihood-based approach to inferring network evolution history from the topology of PPI networks and the duplication relationships among paralogs. Simulations show that our approach reconstructs histories more accurately than existing approaches. Moreover, the growth parameters of several real PPI networks estimated by our method are more consistent with those predicted in the literature. Comment: 15 pages, 5 figures, submitted to ISBRA 201

    Phylogenetic transfer of knowledge for biological networks

    The evolutionary dynamics of protein-protein interaction networks inferred from the reconstruction of ancient networks

    Cellular functions are based on the complex interplay of proteins; therefore, the structure and dynamics of these protein-protein interaction (PPI) networks are key to understanding how cells function. In recent years, large-scale PPI networks of several model organisms have been investigated. Methodological improvements now allow PPI networks of multiple organisms to be analyzed simultaneously and ancestral networks to be modeled directly. This provides the opportunity to challenge existing assumptions about network evolution. We used present-day PPI networks from integrated datasets of seven model organisms and developed a theoretical and bioinformatic framework for studying the evolutionary dynamics of PPI networks. We developed a novel filtering approach that uses percolation analysis to remove low-confidence interactions based on topological constraints. We then reconstructed the ancient PPI networks of different ancestors, inferring both the ancestral proteomes and the ancestral interactions. Ancestral proteins were reconstructed using orthologous groups at different evolutionary levels. We developed a stochastic approach based on the duplication-divergence model to estimate the probabilities of ancient interactions from present-day PPI networks. The growth rates of the networks' nodes, edges, sizes and modularities indicate multiplicative growth and are consistent with the results of an independent static analysis. Our results support the duplication-divergence model of evolution and indicate that fractality and multiplicative growth are general properties of PPI network structure and dynamics.
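    The abstract does not spell out its percolation-based filter. As a hedged illustration of the general idea only, the sketch below adds edges in order of decreasing confidence score and records the size of the largest connected component after each addition (a union-find sweep); one would then inspect this curve to choose a confidence cutoff. The function name, the integer node ids 0..n-1 and the (u, v, score) edge format are assumptions, not the authors' method.

```python
def percolation_sweep(n_nodes, scored_edges):
    """Add edges from highest to lowest score; return [(score, largest_component_size)]."""
    parent = list(range(n_nodes))  # union-find forest over nodes 0..n_nodes-1
    size = [1] * n_nodes

    def find(x):
        # Find the root of x's component, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    largest = 1
    curve = []
    for u, v, score in sorted(scored_edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            # Union by size: attach the smaller component under the larger one.
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru
            size[ru] += size[rv]
            largest = max(largest, size[ru])
        curve.append((score, largest))
    return curve
```

For five nodes and edges (0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.7), (2, 3, 0.2), the curve is [(0.9, 2), (0.8, 3), (0.7, 3), (0.2, 5)]: the network only becomes fully connected once the lowest-confidence edge is admitted.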

    Genome Assembly Techniques

    Since the publication of the human genome in 2001, the price and time of DNA sequencing have dropped dramatically. The genomes of many more species have since been sequenced, and genome sequencing is an increasingly important tool for biologists. This trend will likely revolutionize biology and medicine in the near future, when the genome sequence of each individual, rather than a single model human genome, becomes readily accessible. Nevertheless, genome assembly remains a challenging computational problem, even more so with second-generation sequencing technologies, which generate more data and make the assembly process more complex. Research into assembling the growing amount of sequenced DNA quickly, cheaply and accurately is of great practical importance. In the first part of this thesis, we present two software tools developed to improve genome assemblies. First, Jellyfish is a fast k-mer counter capable of handling large data sets. k-mer frequencies are central to many tasks in genome assembly (e.g., error correction and finding read overlaps) and in other genome studies (e.g., finding highly repeated sequences such as transposons). Second, Chromosome Builder is a scaffolder and contig placement tool that aims to improve the accuracy of genome assembly. In the second part of this thesis, we explore several problems involving graphs. Graph theory can be used to solve many computational problems. For example, the genome assembly problem can be represented as finding an Eulerian path in a de Bruijn graph. The physical interactions between proteins (PPI networks), or between transcription factors and genes (regulatory networks), are naturally expressed as graphs. First, we introduce the concept of "exactly 3-edge-connected" graphs. These graphs have only a remote biological motivation but are interesting in their own right.
Second, we study ancestral network reconstruction, which aims to infer the state of ancestral species' biological networks from the networks of current species.
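    Two of the graph ideas above, k-mer counting and assembly as an Eulerian path in a de Bruijn graph, fit in one small sketch. It is a toy that assumes error-free reads from a genome without repeated (k-1)-mers; the dictionary-based counter stands in for, and is not, Jellyfish's implementation, and all function names are hypothetical. The optional min_count threshold illustrates how k-mer frequencies can be used to discard likely erroneous k-mers.

```python
from collections import Counter, defaultdict


def count_kmers(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn(kmers):
    """Each k-mer becomes an edge from its (k-1)-prefix to its (k-1)-suffix."""
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; starts at a node with out-degree - in-degree == 1 if any."""
    in_deg = Counter(v for vs in graph.values() for v in vs)
    nodes = set(graph) | set(in_deg)
    start = next((u for u in nodes if len(graph.get(u, [])) - in_deg[u] == 1),
                 next(iter(graph)))
    remaining = {u: list(vs) for u, vs in graph.items()}  # unused edges
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k, min_count=1):
    """Keep k-mers seen at least min_count times and spell out the Eulerian path."""
    counts = count_kmers(reads, k)
    kmers = sorted(km for km, c in counts.items() if c >= min_count)
    path = eulerian_path(de_bruijn(kmers))
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, `assemble(["ATGCGG", "GCGGTAC"], 4)` rebuilds "ATGCGGTAC". Real assemblers must also handle sequencing errors, repeats, reverse complements and multiple valid Eulerian paths.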

    Computationally Comparing Biological Networks and Reconstructing Their Evolution

    Biological networks, such as protein-protein interaction, regulatory, or metabolic networks, provide information about biological function beyond what can be gleaned from sequence alone. Unfortunately, most computational problems associated with these networks are NP-hard. In this dissertation, we develop algorithms to tackle numerous fundamental problems in the study of biological networks. First, we present a system for classifying the binding affinity of peptides to a diverse array of immunoglobulin antibodies. Computational approaches to this problem are integral to virtual screening and modern drug discovery. Our system is based on an ensemble of support vector machines and exhibits state-of-the-art performance; it placed first in the 2010 DREAM5 competition. Second, we investigate the problem of biological network alignment. Aligning the biological networks of different species allows for the discovery of shared structures and conserved pathways. We introduce an original procedure for network alignment based on a novel topological node signature. When evaluated under multiple metrics, the pairwise global alignments produced by our procedure are both more accurate and more robust to noise than those of previous work. Next, we explore the problem of ancestral network reconstruction. Knowing the state of ancestral networks allows us to examine how biological pathways have evolved, and how pathways in extant species have diverged from those of their common ancestor. We describe a novel framework for representing the evolutionary histories of biological networks and present efficient algorithms for reconstructing either a single parsimonious evolutionary history or an ensemble of near-optimal histories. Under multiple models of network evolution, our approaches are effective at inferring ancestral network interactions.
Additionally, the ensemble approach is robust to noisy input and can be used to impute missing interactions in experimental data. Finally, we introduce GrowCode, a framework for learning network growth models. While previous work has focused on designing growth models manually or on learning parameters for existing models, GrowCode learns fundamentally new growth models that match target networks in a flexible, user-defined way. We show that models learned by GrowCode produce networks whose target properties match those of real-world networks more closely than existing models do.

    Quantitative methods for reconstructing protein-protein interaction histories

    Protein-protein interactions (PPIs) are vital for the function of a cell, and the evolution of these interactions produces much of the phenotypic evolution of an organism. However, because the evolutionary process cannot be observed, methods are required to infer evolution from existing data. An understanding of the resulting evolutionary relationships between species can then inform PPI prediction and function assignment. This thesis further develops and applies the interaction tree method for modelling PPI evolution within and between protein families. In this approach, a phylogeny of the protein family or families of interest is used to explicitly construct a history of duplication and speciation events. Given a model relating sequence change in this phylogeny to the probability of a rewiring event, the method can then infer probabilities of interaction between the ancestral proteins described in the phylogeny. We show that the method can be adapted to infer the evolution of PPIs within obligate protein complexes, using a large set of such complexes to validate this application. The approach is then applied to reconstruct the history of the proteasome complex, using X-ray crystal structures of the complex as input, with validation showing its utility in predicting present-day complexes for which no structural data are available. The methodology is then adapted for transient PPIs. We show that the approach used in the previous chapter is inadequate here and describe a new scoring system based on a likelihood score of interaction. The predictive ability of this score is demonstrated on known two-component systems in bacteria, and its use in an interaction tree setting is demonstrated by inferring the interaction history between the histidine kinase and response regulator proteins responsible for sporulation onset in a set of bacteria.
This thesis demonstrates that, with suitable modifications, the interaction tree approach is widely applicable to modelling PPI evolution and, importantly, to predicting existing PPIs. This demonstrates the need to incorporate phylogenetic data into methods of predicting PPIs and gives some measure of the benefit of doing so.
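    One common way to attach probabilities to ancestral states on a phylogeny is Felsenstein's pruning algorithm. The sketch below applies it to a single presence/absence interaction character under an assumed symmetric two-state Markov model in which branch lengths stand in for sequence change and `rate` is a hypothetical rewiring rate. This is a simplification: the interaction tree method works on pairs of proteins from one or two family trees and uses its own model linking sequence change to rewiring, which is not reproduced here.

```python
import math


def flip_prob(t, rate):
    """P(interaction state changes) over a branch of length t, symmetric two-state chain."""
    return 0.5 - 0.5 * math.exp(-2.0 * rate * t)


def partial_likelihood(node, rate):
    """Felsenstein pruning: return (L(absent), L(present)) for the subtree at node.

    Assumed encoding: a leaf is 0 or 1 (observed interaction absent/present);
    an internal node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, int):
        return (1.0, 0.0) if node == 0 else (0.0, 1.0)
    like = [1.0, 1.0]
    for child, t in node:
        c0, c1 = partial_likelihood(child, rate)
        p = flip_prob(t, rate)
        like[0] *= (1 - p) * c0 + p * c1  # parent absent
        like[1] *= p * c0 + (1 - p) * c1  # parent present
    return tuple(like)


def root_interaction_prob(tree, rate, prior=0.5):
    """Posterior probability that the ancestral (root) interaction was present."""
    l0, l1 = partial_likelihood(tree, rate)
    return prior * l1 / (prior * l1 + (1 - prior) * l0)
```

With every leaf interacting, e.g. `root_interaction_prob([(1, 0.1), ([(1, 0.2), (1, 0.3)], 0.1)], 1.0)`, the ancestral interaction comes out more likely present than absent; longer branches or a higher rate pull the estimate back toward the prior.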

    Models and Algorithms in Biological Network Evolution with Modularity

    Networks are commonly used to represent key processes in biology; examples include transcriptional regulatory networks, protein-protein interaction (PPI) networks and metabolic networks. Databases store many such networks, observed or inferred, as graphs. Generative models for these networks have been proposed. For PPI networks, current models are based on duplication and divergence (D&D): a node (gene) is duplicated and inherits some subset of the connections of the original node. An early finding about biological networks is their modularity: a prevalent higher-level structure consisting of well-connected subgraphs with sparser connections to one another. While D&D models spontaneously generate modular structures, these structures have not been compared with those in the databases, nor are D&D models known to maintain and evolve them. Because the preferred generative models are based on D&D, network inference models are also based on the same principle. We describe NEMo (Network Evolution with Modularity), a new model that embodies modularity. It consists of two layers: the lower layer is derived from the D&D process and is therefore node- and edge-based, while the upper layer is module-aware. NEMo allows modules to appear and disappear, to split and to merge, all driven by the underlying edge-level events through a duplication-based process. We also introduce measures to compare biological networks in terms of their modular structure. We present an extensive study of six model organisms across six public databases aimed at uncovering commonalities in network structure. We then use these commonalities as a reference against which to compare the networks generated by D&D models and by our module-aware model NEMo. We find that, by restricting our data to high-confidence interactions, a number of shared structural features can be identified among the six species and six databases.
When comparing these characteristics with those extracted from the networks produced by D&D models and by NEMo, we further find that the networks generated by NEMo exhibit structural characteristics much closer to those of the PPI networks of the model organisms. We conclude that modularity in PPI networks takes a particular form, one that is better approximated by the module-aware NEMo model than by other current models. Finally, we outline ideas for a module-aware network inference model, based on parsimony, that uses an altered form of NEMo as its core component.
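    The abstract mentions comparing networks by modular structure but does not define NEMo's measures. For orientation only, the sketch below computes the standard Newman-Girvan modularity of a given partition, Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. It is a widely used baseline, not the measure introduced in the thesis; the function name and input format (an edge list plus a node-to-community dict) are assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of an undirected simple graph under a node partition."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)  # L_c: edges with both endpoints in community c
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    total_degree = defaultdict(int)  # d_c: summed degree of the nodes in community c
    for node, k in degree.items():
        total_degree[community[node]] += k
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)
```

Two triangles joined by a single edge, partitioned into the two triangles, give Q = 5/14 (about 0.357); putting every node in one community gives Q = 0.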

    Parsimonious Reconstruction of Network Evolution

    We consider the problem of reconstructing a maximally parsimonious history of network evolution under models that support gene duplication and loss and independent interaction gain and loss. We introduce a combinatorial framework for encoding network histories, and we give a fast procedure that, given a set of duplication histories, in practice finds network histories with close to the minimum number of interaction gain or loss events. In contrast to previous studies, our method does not require knowing the relative ordering of unrelated duplication events. Results on simulated histories suggest that common ancestral networks can be accurately reconstructed using this parsimony approach.
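    As a much-simplified illustration of counting interaction gain and loss events by parsimony, the sketch below runs Fitch's small-parsimony algorithm for one binary character (a given interaction present or absent) on a rooted binary duplication tree. The paper's framework handles whole network histories, including interactions between pairs of duplicated nodes and unordered duplication events; this single-character version is only a stand-in to show the counting idea, and the nested-tuple tree encoding is an assumption.

```python
def fitch(tree):
    """Fitch small parsimony for one binary character on a rooted binary tree.

    Assumed encoding: a leaf is 0 or 1 (interaction absent/present);
    an internal node is a (left, right) tuple.
    Returns (set of most-parsimonious states at this node, minimum number of changes).
    """
    if isinstance(tree, int):
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    common = left_states & right_states
    if common:
        # Children agree on some state: no extra gain/loss needed here.
        return common, left_cost + right_cost
    # Children disagree: one gain or loss event must occur below this node.
    return left_states | right_states, left_cost + right_cost + 1
```

For leaves ((1, 1), (0, 1)), `fitch` returns ({1}, 1): one loss event suffices, and the most parsimonious ancestor had the interaction.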