    Historical contingency and entrenchment in protein evolution under purifying selection

    The fitness contribution of an allele at one genetic site may depend on alleles at other sites, a phenomenon known as epistasis. Epistasis can profoundly influence the process of evolution in populations under selection, and can shape the course of protein evolution across divergent species. Whereas epistasis between adaptive substitutions has been the subject of extensive study, relatively little is known about epistasis under purifying selection. Here we use mechanistic models of thermodynamic stability in a ligand-binding protein to explore the structure of epistatic interactions between substitutions that fix in protein sequences under purifying selection. We find that the selection coefficients of mutations that are nearly-neutral when they fix are highly contingent on the presence of preceding mutations. Conversely, mutations that are nearly-neutral when they fix are subsequently entrenched due to epistasis with later substitutions. Our evolutionary model includes insertions and deletions, as well as point mutations, and so it allows us to quantify epistasis within each of these classes of mutations, and also to study the evolution of protein length. We find that protein length remains largely constant over time, because indels are more deleterious than point mutations. Our results imply that, even under purifying selection, protein sequence evolution is highly contingent on history and so it cannot be predicted by the phenotypic effects of mutations assayed in the wild-type sequence.
    Comment: 42 pages, 13 figures

    Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli.

    A significant obstacle in training predictive cell models is the lack of integrated data sources. We develop semi-supervised normalization pipelines and perform experimental characterization (growth, transcriptional, proteome) to create Ecomics, a consistent, quality-controlled multi-omics compendium for Escherichia coli with cohesive meta-data information. We then use this resource to train a multi-scale model that integrates four omics layers to predict genome-wide concentrations and growth dynamics. The genetic and environmental ontology reconstructed from the omics data is substantially different from, and complementary to, the genetic and chemical ontologies. The integration of different layers confers an incremental increase in the prediction performance, as does the information about the known gene regulatory and protein-protein interactions. The predictive performance of the model ranges from 0.54 to 0.87 for the various omics layers, which far exceeds various baselines. This work provides an integrative framework of omics-driven predictive modelling that is broadly applicable to guide biological discovery.

    The SWISS-MODEL Repository: new features and functionalities

    The SWISS-MODEL Repository is a database of annotated 3D protein structure models generated by the SWISS-MODEL homology-modelling pipeline. As of September 2005, the repository contained 675 000 models for 604 000 different protein sequences of the UniProt database. Regular updates ensure that the content of the repository reflects the current state of sequence and structure databases, integrating new or modified target sequences, and making use of new template structures. Each Repository entry consists of one or more 3D models accompanied by detailed information about the target protein and the model building process: functional annotation, a detailed template selection log, target-template alignment, summary of the model building and model quality assessment. The SWISS-MODEL Repository is freely accessible at http://swissmodel.expasy.org/repositor

    Systematic analysis of primary sequence domain segments for the discrimination between class C GPCR subtypes

    G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes it necessary to use their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on the results of previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments.
    Peer Reviewed. Postprint (author's final draft)