Bayesian Inference for Duplication-Mutation with Complementarity Network Models
We observe an undirected graph, without multiple edges or self-loops, that
represents a protein-protein interaction (PPI) network. We assume that this
graph evolved under the duplication-mutation with complementarity (DMC) model
from a seed graph, and we also observe the binary forest that represents its
duplication history. A posterior density for the DMC model parameters is
established, and we outline a sampling strategy for performing Bayesian
inference; that sampling strategy employs a particle marginal
Metropolis-Hastings (PMMH) algorithm. We test our methodology on numerical
examples, which demonstrate high accuracy and precision in the inference of the
DMC model's mutation and homodimerization parameters.
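To make the growth model concrete, here is a minimal Python sketch of forward simulation under one common formulation of DMC. This formulation is an assumption, since the abstract does not spell out the update rule: a uniformly chosen anchor node is duplicated, each inherited edge is lost from one of the two copies with probability `q_mod` (mutation), and the copy is linked to its anchor with probability `q_con` (homodimerization). The helper `dmc_grow` and its parameter names are hypothetical, and the returned `(anchor, copy)` pairs stand in for the duplication forest. It is only the generative step that posterior inference would have to invert, not the PMMH sampler itself.

```python
import random


def dmc_grow(seed_edges, n_seed, n_final, q_mod, q_con, rng=None):
    """Grow a graph under an assumed DMC process (hypothetical helper).

    Nodes are integers 0..n_final-1; the seed graph uses 0..n_seed-1.
    Returns (adj, history): adj maps node -> set of neighbours, and
    history lists (anchor, copy) pairs, one per duplication event.
    """
    rng = rng or random.Random()
    adj = {i: set() for i in range(n_seed)}
    for a, b in seed_edges:
        adj[a].add(b)
        adj[b].add(a)
    history = []
    for v in range(n_seed, n_final):
        u = rng.randrange(v)  # anchor chosen uniformly among existing nodes
        # Duplication: the copy v inherits every edge of its anchor u.
        adj[v] = set(adj[u])
        for w in adj[u]:
            adj[w].add(v)
        # Mutation: for each inherited neighbour, with probability q_mod,
        # delete the edge from one of the two copies chosen at random.
        for w in list(adj[u]):
            if rng.random() < q_mod:
                loser = u if rng.random() < 0.5 else v
                adj[loser].discard(w)
                adj[w].discard(loser)
        # Complementarity / homodimerization: link anchor and copy.
        if rng.random() < q_con:
            adj[u].add(v)
            adj[v].add(u)
        history.append((u, v))
    return adj, history


adj, history = dmc_grow([(0, 1)], n_seed=2, n_final=50,
                        q_mod=0.4, q_con=0.1, rng=random.Random(1))
```

With `q_mod = 0` and `q_con = 0`, each copy starts with exactly its anchor's neighbourhood, which is a quick way to sanity-check the duplication step in isolation.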
Reconstruction of Network Evolutionary History from Extant Network Topology and Duplication History
Genome-wide protein-protein interaction (PPI) data are readily available
thanks to recent breakthroughs in biotechnology. However, PPI networks of
extant organisms are only snapshots of network evolution, and inferring the
whole evolutionary history is a challenging problem in computational biology.
In this paper, we present a likelihood-based approach to inferring network
evolution history from the topology of PPI networks and the duplication
relationships among the paralogs. Simulations show that our approach outperforms
existing ones in reconstruction accuracy. Moreover, the growth parameters of
several real PPI networks estimated by our method are more consistent with
those predicted in the literature.
Comment: 15 pages, 5 figures, submitted to ISBRA 201
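One way to picture a likelihood-based reverse step: take candidate paralog pairs and score how plausibly one arose from the other by a single duplication, then undo the most plausible one. The sketch below assumes a DMC-style rule (a neighbour shared by both copies survived mutation with probability `1 - q_mod`, a neighbour of only one copy lost the other edge with probability `q_mod / 2`, and the copies are linked with probability `q_con`) and a greedy choice among candidates. The functions are hypothetical, ignore the probability of choosing the anchor node, and are not the authors' algorithm.

```python
import math


def dmc_pair_loglik(adj, u, v, q_mod, q_con):
    """Toy log-likelihood that u and v arose from one DMC duplication.

    adj maps node -> set of neighbours (undirected, no self-loops).
    Assumes 0 < q_mod < 1 and 0 < q_con < 1 so every log is finite.
    """
    if not (0 < q_mod < 1 and 0 < q_con < 1):
        raise ValueError("q_mod and q_con must lie strictly between 0 and 1")
    nu, nv = adj[u] - {v}, adj[v] - {u}
    shared = len(nu & nv)      # edges that survived mutation on both copies
    exclusive = len(nu ^ nv)   # edges deleted from exactly one copy
    ll = shared * math.log(1 - q_mod) + exclusive * math.log(q_mod / 2)
    # The u-v edge reflects complementarity (homodimerization).
    ll += math.log(q_con) if v in adj[u] else math.log(1 - q_con)
    return ll


def best_reverse_step(adj, candidate_pairs, q_mod, q_con):
    """Greedily pick the candidate pair whose merge is most likely."""
    return max(candidate_pairs,
               key=lambda pair: dmc_pair_loglik(adj, *pair, q_mod, q_con))


adj = {0: {2, 3}, 1: {2, 3}, 2: {0, 1}, 3: {0, 1, 4}, 4: {3}}
pair = best_reverse_step(adj, [(0, 1), (2, 4)], q_mod=0.3, q_con=0.1)
```

Here nodes 0 and 1 share both neighbours, so they score far higher than 2 and 4, which share none; in a full reconstruction the candidate pairs would come from the known duplication relationships among paralogs.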
Quantitative methods for reconstructing protein-protein interaction histories
Protein-protein interactions (PPIs) are vital for cellular function, and the
evolution of these interactions underlies much of an organism's phenotypic
evolution. However, as the evolutionary process cannot be observed, methods are
required to infer evolution from existing data. An understanding of the resulting
evolutionary relationships between species can then provide information for PPI
prediction and function assignment.
This thesis further develops and applies the interaction tree method for modelling
PPI evolution within and between protein families. In this approach, a
phylogeny of the protein family or families of interest is used to explicitly construct a history
of duplication and speciation events. Given a model relating sequence change
in this phylogeny to the probability of a rewiring event, the method can
then infer probabilities of interaction between the ancestral proteins described in
the phylogeny.
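Inferring ancestral interaction probabilities on an interaction tree can be illustrated with a two-state model and Felsenstein-style pruning. In this sketch, which is an assumption rather than the thesis's own model, each tree node is a pair of proteins, leaves carry observed present-day interaction states (1 = interacts), branch length `t` stands in for sequence change, and the probability of a rewiring event along a branch is `(1 - exp(-2 * rate * t)) / 2` under a symmetric Markov chain. The dictionary layout and function names are hypothetical.

```python
import math


def p_flip(t, rate):
    """Probability that the interaction state changes along a branch of
    length t under an assumed symmetric two-state Markov model."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * t))


def partial_likelihood(node, rate):
    """Return [L(state=0), L(state=1)] for the subtree rooted at `node`."""
    if "state" in node:  # leaf: observed present-day interaction, 0 or 1
        return [1.0 if node["state"] == s else 0.0 for s in (0, 1)]
    like = [1.0, 1.0]
    for child in node["children"]:
        child_like = partial_likelihood(child, rate)
        q = p_flip(child["t"], rate)
        for s in (0, 1):
            # Either the state was kept along the branch or it was rewired.
            like[s] *= (1 - q) * child_like[s] + q * child_like[1 - s]
    return like


def root_interaction_probability(root, rate, prior=0.5):
    """Posterior probability that the ancestral pair at the root interacted."""
    l0, l1 = partial_likelihood(root, rate)
    return prior * l1 / (prior * l1 + (1 - prior) * l0)


both_interact = {"children": [{"t": 0.1, "state": 1}, {"t": 0.1, "state": 1}]}
p_root = root_interaction_probability(both_interact, rate=1.0)
```

The code accepts any number of children per node, so it does not assume a strictly binary interaction tree; internal nodes would typically also carry a branch length `t` linking them to their own parent.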
The method is shown to be adaptable to inferring the evolution of PPIs
within obligate protein complexes, and a large set of such complexes is used to validate
this application. This approach is then applied to reconstruct the history of the
proteasome complex, using X-ray crystallography structures of the complex as
input, with validation showing its utility in predicting present-day complexes for
which no structural data are available.
The methodology is then adapted for application to transient PPIs. The approach
used in the previous chapter is shown to be inadequate here, and a new scoring
system based on a likelihood score of interaction is described. The predictive ability
of this score is shown by predicting known two-component systems in bacteria, and
its use in an interaction tree setting is demonstrated through inference of the
interaction history between the histidine kinase and response regulator proteins
responsible for sporulation onset in a set of bacteria.
This thesis demonstrates that, with suitable modifications, the interaction tree
approach is widely applicable to modelling PPI evolution and, importantly, to
predicting existing PPIs. This demonstrates the need to incorporate phylogenetic
data into methods of predicting PPIs and gives some measure of the benefit of
doing so.
Gene Duplicability-Connectivity-Complexity across Organisms and a Neutral Evolutionary Explanation
Gene duplication has long been acknowledged by biologists as a major evolutionary force shaping genomic architectures
and characteristics across the Tree of Life. Much research has been conducted on elucidating the fate of duplicated genes
in a variety of organisms, as well as on the factors that affect a gene's duplicability, that is, the tendency of certain genes to retain
more duplicates than others. In particular, two studies have examined the correlation between a gene's duplicability and its
degree in a protein-protein interaction network in yeast, mouse, and human, and another has examined the correlation
between gene duplicability and complexity (length, number of domains, etc.) in yeast. In this paper, we extend these
studies to six species, and two trends emerge. First, the duplicability-connectivity correlation increases in line
with genome size and with the phylogenetic relationships of the species. Second, the duplicability-complexity
correlation appears to be constant across the species. We argue that the observed correlations can be explained
by neutral evolutionary forces acting on the genomic regions containing the genes. For the duplicability-connectivity
correlation, we show through simulations that an increasing trend can be obtained by adjusting parameters to approximate
the genomic characteristics of the respective species. Our results call for more research into the factors, adaptive and non-adaptive
alike, that determine a gene's duplicability.
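The abstract does not say which correlation coefficient was used. As an illustration only, a rank correlation such as Spearman's rho suits count data like paralog numbers and node degrees; the sketch below computes it as the Pearson correlation of average ranks, so tied values share the mean of their rank positions. The function names are hypothetical, and the inputs would be per-gene duplicate counts paired with the same genes' PPI degrees.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho; raises ZeroDivisionError if either input is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


duplicates = [0, 1, 1, 3, 5]   # hypothetical paralog counts per gene
degree = [2, 4, 3, 9, 12]      # hypothetical PPI degrees of the same genes
rho = spearman(duplicates, degree)
```

Ranking makes the statistic insensitive to the heavy-tailed degree distributions typical of PPI networks, which is one reason to prefer it over a raw Pearson correlation in a sketch like this.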
Models and Algorithms in Biological Network Evolution with Modularity
Networks are commonly used to represent key processes in biology; examples include transcriptional regulatory networks, protein-protein interaction (PPI) networks, metabolic networks, etc. Databases store many such networks, observed or inferred, as graphs. Generative models for these networks have been proposed. For PPI networks, current models are based on duplication and divergence (D&D): a node (gene) is duplicated and inherits some subset of the connections of the original node. An early finding about biological networks is modularity: a prevalent higher-level structure consisting of well-connected subgraphs with sparser connections to other such subgraphs. While D&D models spontaneously generate modular structures, these structures have not been compared with those in the databases, nor are D&D models known to maintain and evolve them. Because the preferred generative models are based on D&D, network inference models are based on the same principle. We describe NEMo (Network Evolution with Modularity), a new model that embodies modularity. It consists of two layers: the lower layer is derived from the D&D process and is thus node- and edge-based, while the upper layer is module-aware. NEMo allows modules to appear and disappear, to fission and to merge, all driven by the underlying edge-level events using a duplication-based process. We also introduce measures to compare biological networks in terms of their modular structure. We present an extensive study of six model organisms across six public databases aimed at uncovering commonalities in network structure. We then use these commonalities as a reference against which to compare the networks generated by D&D models and by our module-aware model NEMo. We find that, by restricting our data to high-confidence interactions, a number of shared structural features can be identified among the six species and six databases. 
When comparing these characteristics with those extracted from the networks produced by D&D models and our NEMo model, we further find that the networks generated by NEMo exhibit structural characteristics much closer to those of the PPI networks of the model organisms. We conclude that modularity in PPI networks takes a particular form, one that is better approximated by the module-aware NEMo model than by other current models. Finally, from a parsimony perspective, we outline ideas for a module-aware network inference model that uses an altered form of NEMo as its core component.
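NEMo's own measures for comparing modular structure are not specified here. As a stand-in, the standard Newman-Girvan modularity Q scores how well a given partition into modules captures dense within-module connectivity; the sketch below computes it from an edge list and a node-to-module mapping. The function name and input layout are assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected graph.

    edges: iterable of (a, b) pairs without self-loops or duplicates.
    community: dict mapping each node to a module label.
    Q = sum over modules c of [L_c / m - (D_c / (2m))^2], where L_c is the
    number of edges inside c, D_c the total degree of c, m the edge count.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    inside = defaultdict(int)
    degree_sum = defaultdict(int)
    for a, b in edges:
        degree_sum[community[a]] += 1
        degree_sum[community[b]] += 1
        if community[a] == community[b]:
            inside[community[a]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridging edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, two_modules)
```

Partitioning this graph into its two triangles gives Q = 5/14 (about 0.357), while putting every node in a single module gives Q = 0, which is the baseline for "no modular structure beyond chance".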
The evolution of the huntingtin-associated protein 40 (HAP40) in conjunction with huntingtin
Background: The huntingtin-associated protein 40 (HAP40) interacts abundantly with huntingtin (HTT), the protein that is altered in Huntington's disease (HD). We therefore analysed the evolution of HAP40 and its interaction with HTT. Results: We found that in amniotes HAP40 is encoded by a single-exon gene, whereas in all other organisms it is expressed from multi-exon genes. HAP40 co-occurs with HTT in unikonts, including filastereans such as Capsaspora owczarzaki and the amoebozoan Dictyostelium discoideum, but both proteins are absent from fungi. Outside unikonts, a few species, such as the free-living amoeboflagellate Naegleria gruberi, contain putative HTT and HAP40 orthologs. Biochemically, we show that the interaction between HTT and HAP40 extends to fish, and bioinformatic analyses provide evidence for evolutionary conservation of this interaction. The closest homologue of HAP40 in current protein databases is the family of soluble N-ethylmaleimide-sensitive factor attachment proteins (SNAPs). Conclusion: Our results indicate that the transition from a multi-exon to a single-exon gene appears to have taken place by retroposition during the divergence of amphibians and amniotes, followed by the loss of the parental multi-exon gene. Furthermore, the two proteins probably originated at the root of eukaryotes. The conservation of the interaction between HAP40 and HTT and their likely coevolution strongly indicate that this interaction is functionally important.
The UlaG protein family defines novel structural and functional motifs grafted on an ancient RNase fold
Background: Bacterial populations are highly successful at colonizing new habitats and adapting to changing environmental conditions, partly due to their capacity to evolve novel virulence and metabolic pathways in response to stress conditions and to shuffle them by horizontal gene transfer (HGT). A common theme in the evolution of new functions is gene duplication followed by functional divergence. UlaG, a unique manganese-dependent metallo-β-lactamase (MBL) enzyme involved in L-ascorbate metabolism by commensal and symbiotic enterobacteria, provides a model for studying the emergence of new catalytic activities through modification of an ancient fold. Furthermore, UlaG is the founding member of the so-called UlaG-like (UlaGL) protein family, a recently established and poorly characterized family comprising divalent (and perhaps trivalent) metal-binding MBLs that catalyze transformations on phosphorylated sugars and nucleotides. Results: Here we combined protein structure-guided and sequence-only molecular phylogenetic analyses to dissect the molecular evolution of UlaG and to study its phylogenomic distribution, its relatedness to present-day UlaGL protein sequences, and its functional conservation. Phylogenetic analyses indicate that UlaGL sequences are present in Bacteria and Archaea, with bona fide orthologs found mainly in mammalian and plant-associated Gram-negative and Gram-positive bacteria. The incongruence between the UlaGL tree and known species trees indicates exchange by HGT and suggests that the UlaGL-encoding genes provided a growth advantage under changing conditions. Our structure-aided search for more distantly related protein sequences revealed that UlaGL sequences share a common evolutionary origin with present-day RNA-processing and RNA-metabolizing MBL enzymes widespread in Bacteria, Archaea, and Eukarya. 
This observation suggests an ancient origin for the UlaGL family within the broader trunk of the MBL superfamily by duplication, neofunctionalization and fixation. Conclusions: Our results suggest that the forerunner of UlaG was present as an RNA-metabolizing enzyme in the last common ancestor, and that the modern descendants of that ancestral gene have a wide phylogenetic distribution and varied functional roles. We propose that the UlaGL family evolved new metabolic roles among bacterial and possibly archaeal phyla in the setting of a close association with metazoans, such as in the mammalian gastrointestinal tract or in animal and plant pathogens, as well as in environmental settings. Accordingly, the major evolutionary forces shaping the UlaGL family include vertical inheritance and lineage-specific duplication and acquisition of novel metabolic functions, followed by HGT and numerous lineage-specific gene loss events.
- …