562 research outputs found

Improved homology-driven computational validation of protein-protein interactions motivated by the evolutionary gene duplication and divergence hypothesis

Background: Protein-protein interaction (PPI) data sets generated by high-throughput experiments are contaminated by large numbers of erroneous PPIs. Therefore, computational methods for PPI validation are necessary to improve the quality of such data sets. Against the background of the theory that most extant PPIs arose as a consequence of gene duplication, the sensitive search for homologous PPIs, i.e. for PPIs descending from a common ancestral PPI, should be a successful strategy for PPI validation. Results: To validate an experimentally observed PPI, we combine FASTA and PSI-BLAST to perform a sensitive sequence-based search for pairs of interacting homologous proteins within a large, integrated PPI database. A novel scoring scheme that incorporates both quality and quantity of all observed matches allows us (1) to consider also tentative paralogs and orthologs in this analysis and (2) to combine search results from more than one homology detection method. ROC curves illustrate the high efficacy of this approach and its improvement over other homology-based validation methods. Conclusion: New PPIs are primarily derived from preexisting PPIs and not invented de novo. Thus, the hallmark of true PPIs is the existence of homologous PPIs. The sensitive search for homologous PPIs within a large body of known PPIs is an efficient strategy to separate biologically relevant PPIs from the many spurious PPIs reported by high-throughput experiments.
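The abstract above evaluates its validation scores with ROC curves. As a minimal sketch of how such a curve is computed in general (not the authors' scoring scheme, which is not specified here), the following assumes each candidate PPI has a numeric score and a known true/false label. The function names `roc_points` and `auc`, the toy scores, and the labels are all hypothetical, and tied scores are not handled.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping a threshold from high to low score.

    Assumes distinct scores and at least one positive and one negative label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Rank candidates from the most to the least confident score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _score, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical toy data: 1 = true PPI, 0 = spurious PPI.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = auc(roc_points(toy_scores, toy_labels))
```

An area of 1.0 would mean every true PPI outranks every spurious one, while 0.5 corresponds to random ranking.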

Paris Lodron University of Salzburg

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

PTOMSM: A modified version of Topological Overlap Measure used for predicting Protein-Protein Interaction Network

Author: Xun Huang
Publication date: 25/12/2009

A variety of methods have been developed to integrate diverse biological data to predict novel interaction relationships between proteins. However, traditional integration can only generate protein interaction pairs within existing relationships. Therefore, we propose a modified version of the Topological Overlap Measure to identify not only extant direct PPI links, but also novel protein interactions that can be indirectly inferred from various relationships between proteins. Our method is more powerful than a naïve Bayesian-network-based integration in PPI prediction, and could generate more reliable candidate PPIs. Furthermore, we examined the influence of the sizes of training and test datasets on prediction, and further demonstrated the effectiveness of PTOMSM in predicting PPIs. More importantly, this method can be extended naturally to predict other types of biological networks, and may be combined with Bayesian methods to further improve the prediction.
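For orientation, the standard (unmodified) topological overlap measure on an unweighted network can be sketched as below. This is not the PTOMSM modification, whose details the abstract does not give. It uses the commonly cited definition, overlap(i, j) = (shared neighbours + a_ij) / (min(k_i, k_j) + 1 - a_ij), where a_ij is the adjacency entry and k_i the degree of node i. The function name `topological_overlap` and the four-protein toy network are hypothetical.

```python
def topological_overlap(adj):
    """Topological overlap matrix for a symmetric 0/1 adjacency matrix.

    Assumes no self-loops (zero diagonal).
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    tom = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                tom[i][j] = 1.0  # a node fully overlaps with itself
                continue
            # Number of neighbours that i and j have in common.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(n) if u != i and u != j)
            denom = min(degree[i], degree[j]) + 1 - adj[i][j]
            tom[i][j] = (shared + adj[i][j]) / denom
    return tom


# Hypothetical toy network: proteins 0 and 1 are not directly linked,
# but both interact with proteins 2 and 3.
toy_adj = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
toy_tom = topological_overlap(toy_adj)
```

In this toy network the unlinked pair (0, 1) gets a higher overlap (2/3) than the directly linked pair (0, 2) (1/2), which illustrates how shared neighbours can suggest interactions that are not observed directly.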

Crossref

Nature Precedings

Darwin and Fisher meet at biotech : on the potential of computational molecular evolution in industry

Author: Anisimova Maria
Publication venue: BioMed Central
Publication date: 01/01/2015

Today computational molecular evolution is a vibrant research field that benefits from the availability of large and complex new generation sequencing data, ranging from full genomes and proteomes to microbiomes, metabolomes and epigenomes. The grounds for this progress were established long before the discovery of the DNA structure. Specifically, Darwin's theory of evolution by means of natural selection not only remains relevant today, but also provides a solid basis for computational research with a variety of applications. But long-term progress in biology was ensured by the mathematical sciences, as exemplified by Sir R. Fisher in the early 20th century. Now this is true more than ever: the size and complexity of the data require biologists to work in close collaboration with experts in computational sciences, modeling and statistics.

Repository for Publications and Research Data

Springer - Publisher Connector

PubMed Central

ZHAW digitalcollection

The Evolution of Function in the Rab family of Small GTPases

Author: Diekmann Yoan
Publication venue: Universidade Nova de Lisboa. Instituto de Tecnologia Química e Biológica.
Publication date: 01/04/2014

Dissertation presented to obtain the PhD degree in Computational Biology. The question of how protein function evolves is a fundamental problem with profound implications for both functional and evolutionary studies on proteins. Here, we review some of the work that has addressed or contributed to this question. We identify and comment on three different levels relevant for the evolution of protein function. First, biochemistry. This is the focus of our discussion, as protein function itself commonly receives least attention in studies on protein evolution.(...

Repositório da Universidade Nova de Lisboa

Computationally Comparing Biological Networks and Reconstructing Their Evolution

Author: Patro Robert
Publication date: 01/01/2012

Biological networks, such as protein-protein interaction, regulatory, or metabolic networks, provide information about biological function, beyond what can be gleaned from sequence alone. Unfortunately, most computational problems associated with these networks are NP-hard. In this dissertation, we develop algorithms to tackle numerous fundamental problems in the study of biological networks. First, we present a system for classifying the binding affinity of peptides to a diverse array of immunoglobulin antibodies. Computational approaches to this problem are integral to virtual screening and modern drug discovery. Our system is based on an ensemble of support vector machines and exhibits state-of-the-art performance. It placed 1st in the 2010 DREAM5 competition. Second, we investigate the problem of biological network alignment. Aligning the biological networks of different species allows for the discovery of shared structures and conserved pathways. We introduce an original procedure for network alignment based on a novel topological node signature. The pairwise global alignments of biological networks produced by our procedure, when evaluated under multiple metrics, are both more accurate and more robust to noise than those of previous work. Next, we explore the problem of ancestral network reconstruction. Knowing the state of ancestral networks allows us to examine how biological pathways have evolved, and how pathways in extant species have diverged from that of their common ancestor. We describe a novel framework for representing the evolutionary histories of biological networks and present efficient algorithms for reconstructing either a single parsimonious evolutionary history, or an ensemble of near-optimal histories. Under multiple models of network evolution, our approaches are effective at inferring the ancestral network interactions. 
Additionally, the ensemble approach is robust to noisy input, and can be used to impute missing interactions in experimental data. Finally, we introduce a framework, GrowCode, for learning network growth models. While previous work focuses on developing growth models manually, or on procedures for learning parameters for existing models, GrowCode learns fundamentally new growth models that match target networks in a flexible and user-defined way. We show that models learned by GrowCode produce networks whose target properties match those of real-world networks more closely than existing models.

CiteSeerX

Digital Repository at the University of Maryland

Protein interactions across and between eukaryotic kingdoms: networks, inference strategies, integration of functional data and evolutionary dynamics

Author: Pevzner Samuel J
Publication venue: Boston University
Publication date: 01/01/2013

Thesis (Ph.D.)--Boston University. How cellular elements coordinate their function is a fundamental question in biology. A crucial step towards understanding cellular systems is the mapping of physical interactions between protein, DNA, RNA and other macromolecules or metabolites. Genome-scale technologies have yielded protein-protein interaction networks for several eukaryotic species and have provided insight into biological processes and evolution, but many of the currently available networks are biased. Towards a true human protein-protein interaction network, we examined literature-based aggregations of low-throughput experiments, high-throughput experimental networks validated using different strategies, and predicted interaction networks to infer how the underlying interactome may differ from current maps. Using systematically mapped interactome networks, which appear to be the least biased, we explored the functional organization of Arabidopsis thaliana and characterized the asymmetric divergence of duplicated paralogous proteins through their interaction profiles. To further dissect the relationship between interactions and function enforced by evolution, we investigated a first-of-its-kind systematic cross-species human-yeast hybrid interactome network. Although the cross-species network is topologically similar to conventional intra-species networks, we found signatures of dynamic changes in interaction propensities due to countervailing evolutionary forces. Collectively, these analyses of human, plant and yeast interactome networks bridge separate experiments to characterize bias, function and evolution across eukaryotic kingdoms.

Boston University Institutional Repository (OpenBU)

Robust Algorithms for Detecting Hidden Structure in Biological Data

Author: Sloutsky Roman
Publication venue: Washington University Open Scholarship
Publication date: 15/08/2017

Biological data, such as molecular abundance measurements and protein sequences, harbor complex hidden structure that reflects their underlying biological mechanisms. For example, high-throughput abundance measurements provide a snapshot of the global state of a living cell, while homologous protein sequences encode the residue-level logic of the proteins' function and provide a snapshot of the evolutionary trajectory of the protein family. In this work I describe algorithmic approaches and analysis software I developed for uncovering hidden structure in both kinds of data. Clustering is an unsupervised machine learning technique commonly used to map the structure of data collected in high-throughput experiments, such as quantification of gene expression by DNA microarrays or short-read sequencing. Clustering algorithms always yield a partitioning of the data, but relying on a single partitioning solution can lead to spurious conclusions. In particular, noise in the data can cause objects to fall into the same cluster by chance rather than due to meaningful association. In the first part of this thesis I demonstrate approaches to clustering data robustly in the presence of noise and apply robust clustering to analyze the transcriptional response to injury in a neuron cell. In the second part of this thesis I describe identifying hidden specificity determining residues (SDPs) from alignments of protein sequences descended through gene duplication from a common ancestor (paralogs) and apply the approach to identify numerous putative SDPs in bacterial transcription factors in the LacI family. Finally, I describe and demonstrate a new algorithm for reconstructing the history of duplications by which paralogs descended from their common ancestor.
This algorithm addresses the complexity of such reconstruction due to indeterminate or erroneous homology assignments made by sequence alignment algorithms and to the vast prevalence of divergence through speciation over divergence through gene duplication in protein evolution.

Washington University St. Louis: Open Scholarship

Probabilistic Reconstruction and Comparative Systems Biology of Microbial Metabolism

Author: Plata Caviedes German
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2013

With the number of sequenced microbial species soon to be in the tens of thousands, we are in a unique position to investigate microbial function, ecology, and evolution on a large scale. In this dissertation I first describe the use of hundreds of in silico models of bacterial metabolic networks to study the long-term evolution of growth and gene-essentiality phenotypes. The results show that, over billions of years of evolution, the conservation of bacterial phenotypic properties drops by a similar fraction per unit time following an exponential decay. The analysis provides a framework to generate and test hypotheses related to the phenotypic evolution of different microbial groups and for comparative analyses based on phenotypic properties of species. Mapping of genome sequences to phenotypic predictions, such as used in the analysis just described, critically relies on accurate functional annotations. In this context, I next describe GLOBUS, a probabilistic method for genome-wide biochemical annotations. GLOBUS uses Gibbs sampling to calculate probabilities for each possible assignment of genes to metabolic functions based on sequence information and both local and global genomic context data. Several important functional predictions made by GLOBUS were experimentally validated in Bacillus subtilis and hundreds more were obtained across other species. Complementary to the automated annotation method, I also describe the manual reconstruction and constraints-based analysis of the metabolic network of the malaria parasite Plasmodium falciparum. After careful reconciliation of the model with available biochemical and phenotypic data, the high-quality reconstruction allowed the prediction and in vivo validation of a novel potential antimalarial target. The model was also used to contextualize different types of genome-scale data such as gene expression and metabolomics measurements.
Finally, I present two projects related to population genetics aspects of sequence and genome evolution. The first project addresses the question of why highly expressed proteins evolve slowly, showing that, at least for Escherichia coli, this is more likely to be a consequence of selection for translational efficiency than selection to avoid misfolded protein toxicity. The second project investigates genetic robustness mediated by gene duplicates in the context of large natural microbial populations. The analysis shows that, under these conditions, the ability of duplicated yeast genes to effectively compensate for the loss of their paralogs is not a monotonic function of their sequence divergence.
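The exponential-decay statement in the abstract above (phenotypic conservation dropping by a similar fraction per unit time) can be written compactly. The symbols are assumptions introduced here for illustration and do not come from the dissertation: C(t) is the conservation of a phenotypic property at divergence time t, C_0 its initial value, and \lambda a decay rate.

```latex
C(t) = C_0 \, e^{-\lambda t},
\qquad
\frac{C(t + \Delta t)}{C(t)} = e^{-\lambda \Delta t}
```

The ratio on the right does not depend on t, so under this model the same fraction of conservation is lost over every interval of length \Delta t.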

Columbia University Academic Commons

Evolutionary genomics : statistical and computational methods

Author: Anisimova Maria
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.

ZHAW digitalcollection

Directory of Open Access Books (DOAB)

The evolution of protein kinase specificity

Author: Bradley David
Publication venue: University of Cambridge
Publication date: 12/03/2019

All research conducted at EMBL-EBI under the supervision of Dr. Pedro Beltrao. Work on the PhD project was paused temporarily in the Spring of 2017 for me to undertake a 3-month internship at EMBO Press (in Heidelberg). Protein phosphorylation represents one of the most important post-translational modifications (PTMs) for cell signalling, and is catalysed by a group of enzymes called protein kinases. Through this activity they serve as key regulators of almost all cellular processes. This is achieved at any time by a network of different kinases that are transiently active. The fidelity of cell systems control therefore requires that each kinase targets only a restricted set of substrates. This specificity is achieved partly by contextual factors that separate kinases spatially and temporally, but also by sequence features that are encoded in the kinase domain itself. For this thesis I focus on elements of kinase specificity that are encoded in the active site of the enzyme. During these investigations I have tried to address three main questions: 1) How is specificity for residues surrounding the phosphorylation site determined in the kinase? 2) How did these specificities evolve? and 3) To what extent does kinase evolution correlate with the evolution of its substrates? First, I developed a sequence-based method for the automated detection of kinase specificity determining residues (SDRs). The putative determinants were then rationalised using available structural data, and in two specific cases were validated experimentally. I also used mutation data from The Cancer Genome Atlas (TCGA) to demonstrate that kinase SDRs are often targeted during cancer. Second, a global analysis of SDR evolution was performed for kinases following gene duplication and speciation, revealing that SDRs often diverge between paralogues but not between orthologues.
This global analysis is followed by a detailed case study of G-protein coupled receptor kinase (GRK) evolution using ancestral sequence reconstructions. Third, I inferred global substrate preferences in a taxonomically broad range of species using phosphoproteome data. I then related the evolution of substrate motif sequences to that of their cognate effector kinases where possible. The results strongly suggest that many of the motifs emerged in a universal eukaryotic ancestor. I finish by summarising the major findings of this doctoral research, which to my knowledge represents the most comprehensive analysis to date of protein kinase specificity and its evolution. BBSR

Apollo (Cambridge)