
    Statistical Analysis of Protein Sequences: A Coevolutionary Study of Molecular Chaperones

    Recent advances in DNA sequencing technologies have led to the accumulation of enormous quantities of genetic information in public databases. This rapid growth of available biological datasets calls for quantitative analysis tools and concomitantly opens the door to new analysis paradigms. In particular, the analysis of correlated mutations and their structural interpretation has enjoyed a revival in recent years. A natural formulation for such approaches is provided by the statistical physics of disordered systems. This thesis is organized around several projects that study a particular biological system of interest, the Hsp70 molecular chaperones, through the lens of methods rooted in statistical physics. In a first project, we focus on correlated mutations within the Hsp70 family. Our analysis reveals the existence of a biologically important macro-molecular arrangement of these chaperones, and we investigate its phylogenetic origin. A second project investigates the interactions between the Hsp70 chaperones and one of their main co-chaperones, the J-proteins. Through the combined use of coevolutionary analysis and molecular simulations at both coarse-grained and atomistic levels, we construct a structural and dynamical model of this interaction that rationalizes previous experimental evidence. In a subsequent study, we focus specifically on the J-protein co-chaperones. Through phylogenetic and coevolutionary methods, we investigate the origin of recently discovered interactions that form the basis of the disaggregation machinery in higher eukaryotes. Finally, in a fourth project, we shift our attention to proteins involved in the iron-sulfur cluster assembly pathway. Analysis of residue coevolution in the different proteins composing this pathway reveals multiple structural insights at several scales.

    Addressing Conditioning Data in Multiple-Point Statistics Simulation Algorithms Based on a Multiple Grid Approach

    Multiple-point statistics (MPS) allows simulations reproducing structures of a conceptual model given by a training image (TI) to be generated within a stochastic framework. In classical implementations, fixed search templates are used to retrieve the patterns from the TI. A multiple grid approach allows the large-scale structures present in the TI to be captured while keeping the search template small. The technique consists of decomposing the simulation grid into several grid levels: one grid level is composed of every second node of the grid level one rank finer. Each grid level is then simulated in succession, from the coarse level to the fine level (the simulation grid itself), using the correspondingly rescaled search template. For a conditional simulation, a basic method (as in snesim) to honor the hard data consists of assigning the data to the closest nodes of the current grid level before simulating it. This paper presents in detail another method (implemented in impala), which assigns the hard data to the closest nodes of the simulation grid (fine level) and then spreads them up to the coarse grid by using simulations based on the MPS inferred from the TI. We study the effect of conditioning and show that the first method leads to systematic biases depending on the location of the conditioning data relative to the grid levels, whereas the second method properly handles conditional simulations with a multiple grid approach.
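
    The grid-level decomposition and the basic conditioning step described in this abstract can be illustrated with a minimal one-dimensional sketch. It is not code from snesim or impala: the function names `grid_level_nodes` and `snap_to_level`, the 1D setting, and the tie-breaking rule (ties go to the higher index) are assumptions made purely for illustration.

```python
def grid_level_nodes(n, level):
    """Return the node indices of a 1D grid of n nodes that belong to `level`.

    Level 0 is the simulation grid itself (assumed convention); each coarser
    level keeps every second node of the level one rank finer.
    """
    step = 2 ** level
    return list(range(0, n, step))


def snap_to_level(index, n, level):
    """Return the node of `level` closest to a hard datum located at `index`.

    This mimics the basic method of relocating data to the current grid
    level. Ties are resolved toward the higher index (an assumption).
    """
    step = 2 ** level
    nearest = ((index + step // 2) // step) * step
    last = ((n - 1) // step) * step  # last node that exists on this level
    return min(nearest, last)


if __name__ == "__main__":
    n = 9
    for level in (2, 1, 0):  # coarse to fine
        nodes = grid_level_nodes(n, level)
        moved = {i: snap_to_level(i, n, level) for i in (3, 4, 6)}
        print(level, nodes, moved)
```

    In this sketch a datum at node 4 stays in place on every level, while data at nodes 3 and 6 are displaced on the coarsest level. The displacement therefore depends on where the datum sits relative to the grid levels, which is the kind of location dependence the abstract associates with the first method.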

    The architecture of EMC reveals a path for membrane protein insertion

    Approximately 25% of eukaryotic genes code for integral membrane proteins that are assembled at the endoplasmic reticulum. An abundant and widely conserved multi-protein complex termed EMC has been implicated in membrane protein biogenesis, but its mechanism of action is poorly understood. Here, we define the composition and architecture of human EMC using biochemical assays, crystallography of individual subunits, site-specific photocrosslinking, and cryo-EM reconstruction. Our results suggest that EMC’s cytosolic domain contains a large, moderately hydrophobic vestibule that can bind a substrate’s transmembrane domain (TMD). The cytosolic vestibule leads into a lumenally sealed, lipid-exposed intramembrane groove large enough to accommodate a single substrate TMD. A gap between the cytosolic vestibule and intramembrane groove provides a potential path for substrate egress from EMC. These findings suggest how EMC facilitates energy-independent membrane insertion of TMDs, explain why only short lumenal domains are translocated by EMC, and constrain models of EMC’s proposed chaperone function.

    Combinatorial expression of GPCR isoforms affects signalling and drug responses

    G-protein-coupled receptors (GPCRs) are membrane proteins that modulate physiology across human tissues in response to extracellular signals. GPCR-mediated signalling can differ because of changes in the sequence [1,2] or expression [3] of the receptors, leading to signalling bias when comparing diverse physiological systems [4]. An underexplored source of such bias is the generation of functionally diverse GPCR isoforms with different patterns of expression across different tissues. Here we integrate data from human tissue-level transcriptomes, GPCR sequences and structures, proteomics, single-cell transcriptomics, population-wide genetic association studies and pharmacological experiments. We show how a single GPCR gene can diversify into several isoforms with distinct signalling properties, and how unique isoform combinations expressed in different tissues can generate distinct signalling states. Depending on their structural changes and expression patterns, some of the detected isoforms may influence cellular responses to drugs and represent new targets for developing drugs with improved tissue selectivity. Our findings highlight the need to move from a canonical to a context-specific view of GPCR signalling that considers how combinatorial expression of isoforms in a particular cell type, tissue or organism collectively influences receptor signalling and drug responses.

    Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting

    Extracting structural information from sequence co-variation has become common practice in computational biology in recent years, mainly due to the availability of large sequence alignments of protein families. However, using sequence-based approaches to identify features that are specific to sub-classes and not shared by all members of the family has remained an elusive problem. Here we present a coevolution-based method to differentially analyze subfamily-specific structural features by a continuous sequence reweighting (SR) approach. We introduce the underlying principles and test the method's predictive capabilities on the Response Regulator family, whose subfamilies have previously been shown to display distinct, specific homo-dimerization patterns. Our results show that this reweighting scheme is effective in assigning structural features known a priori to subfamilies, even when sequence data are relatively scarce. Furthermore, sequence reweighting makes it possible to assess whether individual structural contacts pertain to specific subfamilies, and it thus paves the way for the identification of specificity-determining contacts from sequence variation data.
