
    Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone

    Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides accurate sequence-based information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. We thank F. Abascal and M. L. Tress for helpful discussions. This work was supported by Spanish Ministry of Economy and Competitiveness Projects BFU2015-71241-R and BIO2012-40205, cofunded by the European Regional Development Fund.

    Analysis of several key factors influencing deep learning-based inter-residue contact prediction

    Motivation: Deep learning has become the dominant technology for protein contact prediction. However, the factors that affect the performance of deep learning in contact prediction have not been systematically investigated. Results: We analyzed the results of our three deep learning-based contact prediction methods (MULTICOM-CLUSTER, MULTICOM-CONSTRUCT and MULTICOM-NOVEL) in the CASP13 experiment and identified several key factors [i.e. deep learning technique, multiple sequence alignment (MSA), distance distribution prediction and domain-based contact integration] that influenced the contact prediction accuracy. We compared our convolutional neural network (CNN)-based contact prediction methods with three coevolution-based methods on 75 CASP13 targets consisting of 108 domains. We demonstrated that the CNN-based multi-distance approach was able to leverage global coevolutionary coupling patterns comprising multiple correlated contacts for more accurate contact prediction than the local coevolution-based methods, leading to a substantial increase in precision of 19.2 percentage points. We also tested different alignment methods and domain-based contact prediction with the deep learning contact predictors. The comparison of the three methods showed that deeper sequence alignments and the integration of domain-based contact prediction with the full-length contact prediction improved the performance of contact prediction. Moreover, we demonstrated that the domain-based contact prediction based on a novel ab initio approach of parsing domains from MSAs alone, without using known protein structures, was a simple, fast approach to improve contact prediction. Finally, we showed that predicting the distribution of inter-residue distances in multiple distance intervals could capture more structural information and improve binary contact prediction. Availability and implementation: https://github.com/multicom-toolbox/DNCON2/
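The link between a multi-interval distance prediction and a binary contact call, mentioned at the end of the abstract above, can be illustrated with a minimal sketch. This is not the authors' implementation: the bin edges, the 8 Å contact threshold, the minimum sequence separation, and the function names `contact_probability` and `top_k_precision` are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical bin edges in angstroms (an assumption, not taken from the paper).
# Bin k covers [BIN_EDGES[k], BIN_EDGES[k + 1]); the last bin is open-ended.
BIN_EDGES = np.array([0.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0])
CONTACT_THRESHOLD = 8.0  # assumed contact cutoff in angstroms


def contact_probability(dist_probs, bin_edges=BIN_EDGES, threshold=CONTACT_THRESHOLD):
    """Collapse an (L, L, n_bins) distance distribution into (L, L) contact probabilities."""
    dist_probs = np.asarray(dist_probs, dtype=float)
    # Upper edge of every bin; the final bin extends to infinity.
    upper_edges = np.append(bin_edges[1:], np.inf)
    # A bin counts as "in contact" if it lies entirely below the threshold.
    in_contact = upper_edges <= threshold
    return dist_probs[..., in_contact].sum(axis=-1)


def top_k_precision(contact_probs, true_contacts, k, min_separation=6):
    """Fraction of true contacts among the k highest-scoring residue pairs.

    Only pairs (i, j) with j - i >= min_separation are ranked.
    """
    n = contact_probs.shape[0]
    i, j = np.triu_indices(n, k=min_separation)
    # Indices of the k pairs with the largest predicted contact probability.
    order = np.argsort(contact_probs[i, j])[::-1][:k]
    return float(np.mean(true_contacts[i[order], j[order]]))
```

Summing the probability mass of the short-distance bins gives one contact score per residue pair, which can then be ranked and scored against a reference contact map in the usual top-k fashion.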

    Inverse Statistical Physics of Protein Sequences: A Key Issues Review

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related, protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions and of how statistical-mechanics-inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next few years. Comment: 18 pages, 7 figures

    Allosteric signalling in the outer membrane translocation domain of PapC usher

    PapC ushers are outer-membrane proteins enabling assembly and secretion of P pili in uropathogenic E. coli. Their translocation domain is a large β-barrel occluded by a plug domain, which is displaced to allow the translocation of pilus subunits across the membrane. Previous studies suggested that this gating mechanism is controlled by a β-hairpin and an α-helix. To investigate the role of these elements in allosteric signal communication, we developed a method combining evolutionary and molecular dynamics studies of the native translocation domain and mutants lacking the β-hairpin and/or α-helix. Analysis of a hybrid residue interaction network suggests distinct regions (residue 'communities') within the translocation domain (especially around β12-β14) linking these elements, thereby modulating PapC gating. Antibiotic sensitivity and electrophysiology experiments on a set of alanine-substitution mutants confirmed functional roles for four of these communities. This study illuminates the gating mechanism of PapC ushers and its importance in maintaining outer-membrane permeability.

    Statistical Analysis of Protein Sequences: A Coevolutionary Study of Molecular Chaperones

    Recent advances in DNA sequencing technologies have led to the accumulation of enormous quantities of genetic information in public databases. This rapid growth of available biological datasets calls for quantitative analysis tools and concomitantly opens the door to new analysis paradigms. In particular, the analysis of correlated mutations and their structural interpretation has enjoyed a revival in recent years. A natural formulation for such approaches is provided by the statistical physics of disordered systems. This thesis is articulated around different projects aimed at studying a particular biological system of interest, the Hsp70 molecular chaperones, through the lens provided by methods rooted in statistical physics. In a first project, we focus on correlated mutations within the Hsp70 family. Our analysis reveals the existence of a biologically important macro-molecular arrangement of these chaperones and we investigate its phylogenetic origin. A second project investigates the interactions between the Hsp70 chaperones and one of their main co-chaperones, J-proteins. Through the combined use of coevolutionary analysis and molecular simulations at both coarse-grained and atomistic levels, we construct a structural and dynamical model of this interaction which rationalizes previous experimental evidence. In a subsequent study, we specifically focus on the J-protein co-chaperones. Through phylogenetic and coevolutionary methods, we investigate the origin of recently discovered interactions which form the basis of the disaggregation machinery in higher eukaryotes. Finally, in a fourth project, we shift our attention to the analysis of proteins involved in the iron-sulfur cluster assembly pathway. Analysis of residue coevolution in the different proteins composing this pathway reveals multiple structural insights at several scales.