
    ALGORITHMS AND COMPUTATIONAL TOOLS FOR THE STUDY OF INTRINSICALLY DISORDERED PROTEINS

    National audience. Intrinsically Disordered Proteins (IDPs) are involved in many biological processes. Their inherent plasticity facilitates highly specialized tasks in cell regulation and signalling, and their malfunction is linked to severe pathologies. Understanding the functional roles of IDPs requires their structural characterization, which is extremely challenging and needs a tight coupling of experimental and computational methods. In contrast to structured/globular proteins, IDPs cannot be represented by a single conformation; their models must be based on ensembles of conformations representing the distribution of states that the protein adopts in solution. While purely random-coil ensembles can be reliably constructed with available bioinformatics tools, these tools fail to reproduce the conformational equilibrium present in partially structured regions.
    In this thesis, we propose several computational methods that, combined with experimental data, provide a better structural characterization of IDPs. These methods fall into two main categories: methods to construct conformational ensemble models, and methods to simulate conformational transitions.
    Contributing to the first category, we propose a new approach to generate realistic conformational ensembles that improves on previously existing methods by reproducing the partially structured regions in IDPs. This method exploits structural information encoded in a database of three-residue fragments (tripeptides) extracted from high-resolution, experimentally solved protein structures. We have shown that conformational ensembles generated by our method accurately reproduce structural descriptors obtained from NMR and SAXS experiments for a benchmark set of nine IDPs. Also exploiting the tripeptide database, we have developed an algorithm to predict the propensity of some fragments inside IDPs to form secondary structure elements. This new method provides more accurate results than the most commonly used predictors on our benchmark set of well-characterized IDPs.
    Contributing to the second category, we have developed an original approach to model the folding mechanism of secondary structural elements. The computation of conformational transitions is formulated as a discrete path search problem using the tripeptide database. To evaluate the approach, we applied the strategy to two small synthetic polypeptides mimicking two common structural motifs in proteins. The folding mechanisms extracted are very similar to those obtained with traditional, computationally expensive approaches. Finally, we have developed a more general method to compute transition paths between a (possibly large) set of conformations of an IDP. This method builds on a multi-tree variant of the TRRT algorithm, developed at LAAS-CNRS, which had provided good results for small and medium-sized biomolecules. To apply this method to IDPs, we proposed a hybrid strategy for parallelizing the algorithm, enabling efficient execution on computer clusters.
    In addition to this methodological work, I have been actively involved in multidisciplinary work with biophysicists and biologists, applying these methods to the investigation of important biological systems, in particular the huntingtin protein, the causative agent of Huntington’s disease.
    In conclusion, the work carried out during my PhD thesis has enabled a better understanding of the relationship between the sequence and structural properties of IDPs, paving the way to novel applications.
    For example, this deeper understanding of sequence-structure relationships will enable us to anticipate structural perturbations caused by sequence mutations and, subsequently, to rationally design IDPs with tailored properties for biotechnological applications.
    Intrinsically Disordered Proteins (IDPs) are essential in many biological processes. Their inherent plasticity facilitates specialized tasks, complementary to those of globular proteins, in cell regulation and signalling, and their malfunction is linked to severe pathologies. Understanding their functional role requires characterizing the structure of IDPs and of the complexes they form. Modelling IDPs is extremely difficult and requires a tight coupling of experimental and computational methods. In contrast to structured/globular proteins, IDPs cannot be represented by a single conformation; their models must be based on ensembles of conformations representing the distribution of states that the protein adopts in solution. Multiple bioinformatics tools can identify, a priori, partially structured elements within IDPs. However, the structural features detected by these programs depend strongly on the methodology used, and different methods often produce contradictory results. While purely random-coil ensembles can be constructed reliably, the conformational equilibrium present in partially structured regions is poorly reproduced.
    In this thesis, we propose several computational methods that, combined with experimental data, enable a better structural characterization of IDPs. These methods can be grouped into two broad categories: methods to build conformational ensemble models and methods to simulate conformational transitions.
    Contributing to the first category, we present a new approach to generate realistic conformational ensembles, which improves on existing approaches and can reproduce the partially structured regions of IDPs. This method exploits the structural information encoded in a database of three-residue fragments (tripeptides) extracted from experimentally determined high-resolution protein structures. We have shown that the conformational ensembles generated by our method faithfully reproduce structural descriptors obtained from NMR and SAXS experiments. As a necessary component of the ensemble-building algorithm, we developed an algorithm to predict the propensity of certain fragments within IDPs to form secondary structure elements. This new method, which also exploits the tripeptide database, provides more accurate results than the most commonly used predictors on several well-characterized IDPs. Although the structural predictor was mainly developed to complement our ensemble modelling method, it can also be very useful as a standalone tool.
    Contributing to the second category, we have developed an original approach to model the folding mechanism of secondary structural elements. The computation of the conformational transitions leading to the formation of structural elements is formulated as a discrete path search problem using the tripeptide database. To evaluate the approach, we applied the strategy to two small synthetic polypeptides mimicking two common structural motifs in proteins. The folding mechanisms extracted are very similar to those obtained with traditional, computationally expensive approaches. Finally, we have developed a more general method to compute transition paths between a (possibly large) set of IDP conformations. This method builds on a multi-tree variant of the Transition-based Rapidly-exploring Random Tree algorithm (Multi-TRRT), recently developed at LAAS-CNRS, which has given good results for small and medium-sized biomolecules. To apply this method to IDPs, we proposed a hybrid strategy for parallelizing the algorithm, enabling efficient execution on computing clusters.
    Beyond this methodological work, I also took an active part in multidisciplinary work, in collaboration with biophysicists and biologists, in which I applied these methods to the study of important biological systems, in particular the huntingtin protein, the causative agent of Huntington's disease.
    In conclusion, the work carried out in this PhD thesis has led to a better understanding of the relationship between the sequence and structure of IDPs, opening the way to new applications. Thanks to this deeper understanding of sequence-structure relationships, it will be possible to anticipate structural perturbations caused by sequence mutations, as well as to rationally design IDPs with specific properties for biotechnological applications.
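    The Multi-TRRT method mentioned in this abstract builds on the transition test of TRRT, which biases tree expansion toward low-cost regions of the conformational space. The sketch below is a simplified, illustrative version of such a test, not the LAAS-CNRS implementation: the cost values, the fixed adaptation factor and the failure budget (`factor`, `max_fails`) are assumptions, and neither the multi-tree structure nor the hybrid parallelization is shown.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class TransitionTest:
    """Simplified TRRT-style transition test (illustrative sketch only).

    Downhill or flat moves are always accepted; uphill moves are accepted
    with probability exp(-(c_to - c_from) / (k * T)). The temperature T
    adapts: it is divided by `factor` after an accepted uphill move and
    multiplied by `factor` after more than `max_fails` consecutive rejections.
    """
    k: float = 1.0
    temperature: float = 1.0
    factor: float = 2.0   # hypothetical adaptation factor
    max_fails: int = 10   # hypothetical budget of consecutive rejections
    fails: int = 0
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def __call__(self, c_from: float, c_to: float) -> bool:
        if c_to <= c_from:
            return True  # never reject a move that lowers (or keeps) the cost
        p = math.exp(-(c_to - c_from) / (self.k * self.temperature))
        if self.rng.random() < p:
            self.temperature /= self.factor  # cool down: stricter on later uphill moves
            self.fails = 0
            return True
        self.fails += 1
        if self.fails > self.max_fails:
            self.temperature *= self.factor  # heat up to help escape a cost basin
            self.fails = 0
        return False


# Usage: filter candidate tree extensions by their (hypothetical) costs.
check = TransitionTest()
print(check(5.0, 3.0))  # downhill move -> True
```

    Making the temperature adaptive, rather than fixed, is what lets such a test stay selective in smooth regions while still crossing barriers after repeated failures.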

    Investigating the Formation of Structural Elements in Proteins Using Local Sequence-Dependent Information and a Heuristic Search Algorithm

    International audience. Structural elements inserted in proteins are essential to define folding/unfolding mechanisms and partner recognition events governing signaling processes in living organisms. Here, we present an original approach to model the folding mechanism of these structural elements. Our approach is based on the exploitation of local, sequence-dependent structural information encoded in a database of three-residue fragments extracted from a large set of high-resolution experimentally determined protein structures. The computation of conformational transitions leading to the formation of the structural elements is formulated as a discrete path search problem using this database. To solve this problem, we propose a heuristically guided depth-first search algorithm. The domain-dependent heuristic function aims at minimizing the length of the path in terms of angular distances, while maximizing the local density of the intermediate states, which is related to their probability of existence. We have applied the strategy to two small synthetic polypeptides mimicking two common structural motifs in proteins. The folding mechanisms extracted are very similar to those obtained when using traditional, computationally expensive approaches. These results show that the proposed approach, thanks to its simplicity and computational efficiency, is a promising research direction.
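    The discrete path search described in this abstract can be illustrated with a small, hypothetical sketch. States are tuples of per-residue (phi, psi) values; each move changes one residue to another value allowed by its (assumed) tripeptide-database entry; children are explored depth-first in the order of a score combining the remaining angular distance to the goal with a local-density bonus. The `density` table, the `weight` and the depth limit are assumptions, and the authors' actual heuristic may differ.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Angles = Tuple[float, float]   # (phi, psi) in degrees for one residue
State = Tuple[Angles, ...]     # one discrete conformation of the chain


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def state_distance(s: State, t: State) -> float:
    """Sum of per-residue phi and psi angular distances between two states."""
    return sum(angular_diff(a[0], b[0]) + angular_diff(a[1], b[1])
               for a, b in zip(s, t))


def heuristic_dfs(start: State, goal: State,
                  candidates: Sequence[List[Angles]],
                  density: Dict[Tuple[int, Angles], float],
                  weight: float = 10.0,
                  max_depth: int = 50) -> Optional[List[State]]:
    """Depth-first search from start to goal, trying the best-scored child first.

    candidates[i] lists the hypothetical (phi, psi) values allowed for residue i;
    density[(i, angles)] is a hypothetical local-density score for that state.
    """
    path = [start]
    visited = {start}

    def score(child: State, i: int) -> float:
        # Lower is better: short remaining angular distance, high local density.
        return state_distance(child, goal) - weight * density.get((i, child[i]), 0.0)

    def dfs(state: State, depth: int) -> bool:
        if state == goal:
            return True
        if depth >= max_depth:
            return False
        children = []
        for i, options in enumerate(candidates):
            for angles in options:
                if angles != state[i]:
                    child = state[:i] + (angles,) + state[i + 1:]
                    if child not in visited:
                        children.append((score(child, i), child))
        children.sort(key=lambda c: c[0])
        for _, child in children:
            visited.add(child)
            path.append(child)
            if dfs(child, depth + 1):
                return True
            path.pop()  # backtrack
        return False

    return path if dfs(start, 0) else None


start = ((-120.0, 130.0), (-120.0, 130.0))   # extended
goal = ((-60.0, -45.0), (-60.0, -45.0))      # helical
options = [[(-120.0, 130.0), (-60.0, -45.0)]] * 2
print(len(heuristic_dfs(start, goal, options, density={})))  # 3
```

    With two residues and two allowed states each, the search reaches the all-helical goal in two single-residue moves; a larger `weight` would favor routes through denser, and presumably more probable, intermediates.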

    Realistic Ensemble Models of Intrinsically Disordered Proteins Using a Structure-Encoding Coil Database

    International audience. Intrinsically Disordered Proteins (IDPs) play fundamental roles in signaling, regulation and cell homeostasis by specifically interacting with their partners. The structural characterization of these interacting regions remains challenging and requires the integration of extensive experimental information. Here we present an approach that exploits the structural information encoded in tripeptide fragments from coil regions of high-resolution structures. Our results indicate that a simple building approach that disregards the sequence context provides a good structural representation of fully disordered regions. Conversely, the description of partially structured motifs calls for the consideration of sequence-dependent structural preferences. By using NMR residual dipolar couplings and SAXS data for multiple IDPs, we demonstrate that the appropriate combination of these two building strategies produces ensemble models that correctly describe the secondary structural classes and the population of partially structured regions. This study paves the way for the extension of structure prediction and protein design to disordered proteins.
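    The combination of a context-free and a sequence-dependent building strategy can be sketched as follows. This is a hypothetical illustration: the libraries are plain dictionaries of (phi, psi) pairs (`residue_lib` keyed by residue, `tripeptide_lib` keyed by three-letter sequence context), the set of "structured" residue indices is assumed to be given, and only backbone dihedrals are sampled; conversion to Cartesian coordinates and clash checking are omitted.

```python
import random
from typing import Dict, List, Set, Tuple

Angles = Tuple[float, float]  # (phi, psi) in degrees


def build_conformer(seq: str,
                    residue_lib: Dict[str, List[Angles]],
                    tripeptide_lib: Dict[str, List[Angles]],
                    structured: Set[int],
                    rng: random.Random) -> List[Angles]:
    """Sample one backbone (phi, psi) list for seq.

    Residues in `structured` use sequence-dependent tripeptide statistics
    (context seq[i-1:i+2]); all others use the context-free residue library.
    Terminal residues and unseen contexts fall back to the residue library.
    """
    conformer = []
    for i, aa in enumerate(seq):
        key = seq[i - 1:i + 2] if 0 < i < len(seq) - 1 else None
        if i in structured and key in tripeptide_lib:
            pool = tripeptide_lib[key]
        else:
            pool = residue_lib[aa]
        conformer.append(rng.choice(pool))
    return conformer


def build_ensemble(seq: str,
                   residue_lib: Dict[str, List[Angles]],
                   tripeptide_lib: Dict[str, List[Angles]],
                   structured: Set[int],
                   n: int,
                   seed: int = 0) -> List[List[Angles]]:
    """Draw n independent conformers with a reproducible random generator."""
    rng = random.Random(seed)
    return [build_conformer(seq, residue_lib, tripeptide_lib, structured, rng)
            for _ in range(n)]


residue_lib = {"A": [(-65.0, 140.0)]}        # hypothetical coil statistics
tripeptide_lib = {"AAA": [(-60.0, -45.0)]}   # hypothetical context statistics
ens = build_ensemble("AAA", residue_lib, tripeptide_lib, structured={1}, n=5)
print(ens[0])  # [(-65.0, 140.0), (-60.0, -45.0), (-65.0, 140.0)]
```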

    Structural Characterization of Highly Flexible Proteins by Small-Angle Scattering.

    International audience. Intrinsically Disordered Proteins (IDPs) are fundamental actors of biological processes. Their inherent plasticity facilitates very specialized tasks in cell regulation and signalling, and their malfunction is linked to severe pathologies. Understanding the functional role of disorder requires the structural characterization of IDPs and the complexes they form. Small-angle scattering of X-rays (SAXS) and neutrons (SANS) has notably contributed to this structural understanding. In this review we summarize the most relevant developments in the field of SAS studies of disordered proteins. Emphasis is given to ensemble methods and how SAS data can be combined with computational approaches or other biophysical information such as NMR. The unique capabilities of SAS enable its application to extremely challenging disordered systems such as low-complexity regions, amyloidogenic proteins and transient biomolecular complexes. This reinforces the fundamental role of SAS in the structural and dynamic characterization of this elusive family of proteins.
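    A common way to combine SAS data with an ensemble model is to average the theoretical scattering curves of the conformers and compare the average with the experimental curve. The sketch below assumes the per-conformer curves I_k(q) were already computed on the experimental q grid by an external prediction program (not shown); it fits the single scale factor analytically and reports a reduced chi-square normalized by N - 1, a convention that varies between programs.

```python
from typing import List, Optional, Sequence


def ensemble_average(curves: Sequence[Sequence[float]],
                     weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-conformer intensity curves on a common q grid."""
    n = len(curves)
    w = list(weights) if weights is not None else [1.0 / n] * n
    return [sum(wk * curve[j] for wk, curve in zip(w, curves))
            for j in range(len(curves[0]))]


def reduced_chi_square(i_exp: Sequence[float], sigma: Sequence[float],
                       i_calc: Sequence[float]) -> float:
    """Chi-square between experimental and calculated curves after fitting a scale.

    The scale c minimizing sum(((I_exp - c * I_calc) / sigma)**2) has the closed
    form c = sum(I_exp * I_calc / sigma**2) / sum(I_calc**2 / sigma**2).
    """
    num = sum(e * c / s ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    den = sum(c * c / s ** 2 for c, s in zip(i_calc, sigma))
    scale = num / den
    resid = sum(((e - scale * c) / s) ** 2 for e, c, s in zip(i_exp, i_calc, sigma))
    return resid / (len(i_exp) - 1)


curves = [[2.0, 4.0, 6.0], [2.0, 4.0, 6.0]]  # hypothetical per-conformer curves
avg = ensemble_average(curves)
print(reduced_chi_square([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], avg))  # 0.0
```

    Ensemble-selection methods typically go one step further and optimize the `weights` (or choose a subset of conformers) to minimize this quantity.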

    Interdomain linkers tailor the stability of immunoglobulin repeats in polyproteins

    International audience. Linkers in polyproteins are considered mere spacers between two adjacent domains. However, a series of studies using single-molecule force spectroscopy have recently reported distinct thermodynamic stabilities of I27 in polyproteins with varying linkers, indicating a vital role of linkers in domain stability. A flexible glycine-rich linker (-(GGG)n, n≥3) unfolded at lower forces than the commonly used Arg-Ser (RS)-based linker. Interdomain interactions among I27 domains connected by Gly-rich linkers were suggested to reduce domain stability. However, a negative impact of interdomain interactions on domain stability is thermodynamically counter-intuitive and called for a thorough investigation. Here, using an array of ensemble equilibrium experiments and in silico measurements on I27 singlets and doublets with the two aforementioned linkers, we show that interdomain interactions in fact raise the stability of the polyprotein with the RS linker. More surprisingly, the highly flexible Gly-rich linker does not interfere with the stability of the polyprotein. Overall, we conclude that flexible linkers are preferred in polyproteins for maintaining domain independence.

    Access to atomic resolution structural information of homo-repeats by NMR: The huntingtin case

    Work presented at the 257th National Meeting of the American Chemical Society (ACS), held from 31 March to 4 April 2019. Peer reviewed.

    Evidence of the reduced abundance of proline cis conformation in protein poly-proline tracts

    International audience. Proline adopts the cis conformation in proteins more often than other proteinogenic amino acids; it thereby influences structure and modulates function, and it has been the focus of several high-resolution structural studies. However, until now, technical and methodological limitations have hampered the site-specific investigation of the conformational preferences of prolines within poly-proline (poly-P) homo-repeats in their protein context. Here, we apply site-specific isotopic labeling to obtain high-resolution NMR data on the cis/trans equilibrium of prolines within the poly-P repeats of huntingtin exon 1, the causative agent of Huntington's disease. Screening prolines at different positions in long (poly-P 11) and short (poly-P 3) poly-P tracts, we found that while the first proline of poly-P tracts adopts similar levels of cis conformation as isolated prolines, a length-dependent reduction in the abundance of cis conformers is observed for terminal prolines. Interestingly, the cis isomer could not be detected in inner prolines, in line with percentages derived from a large database of proline-centered tripeptides extracted from crystallographic structures. These results suggest a strong cooperative effect within poly-Ps that enhances their stiffness by diminishing the stability of the cis conformation. This rigidity is key to rationalizing the protection against aggregation that the poly-P tract confers on huntingtin. Furthermore, the study provides new avenues to probe the structural properties of poly-P tracts in protein design as scaffolds or nanoscale rulers.
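    Cis percentages like those derived from the tripeptide database can be estimated by classifying the omega dihedral of the peptide bond preceding each proline. The sketch below is an assumption-laden illustration: it takes a plain list of omega values (degrees) and uses a 30° cutoff around 0° for the cis state, a common convention but not necessarily the one used in the study.

```python
from typing import Iterable


def cis_fraction(omegas: Iterable[float], cutoff: float = 30.0) -> float:
    """Fraction of peptide bonds whose omega lies within `cutoff` degrees of 0 (cis)."""
    # Wrap every angle into [-180, 180) before testing how close it is to 0.
    wrapped = [((w + 180.0) % 360.0) - 180.0 for w in omegas]
    if not wrapped:
        return 0.0
    return sum(abs(w) < cutoff for w in wrapped) / len(wrapped)


# Hypothetical omega values of the X-Pro bond in four proline-centred tripeptides.
print(cis_fraction([180.0, -175.0, 5.0, 178.0]))  # 0.25
```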

    Multi-site-specific isotopic labeling accelerates high-resolution structural investigations of pathogenic huntingtin exon-1

    Huntington's disease neurodegeneration occurs when the number of consecutive glutamines in huntingtin exon-1 (HTTExon1) exceeds a pathological threshold of 35. The sequence homogeneity of HTTExon1 reduces the signal dispersion in NMR spectra, hampering its structural characterization. By simultaneously introducing three isotopically labeled glutamines in a site-specific manner in multiple concatenated samples, 18 glutamines of a pathogenic HTTExon1 with 36 glutamines were unambiguously assigned. Chemical shift analyses indicate that α-helical structure persists in the homorepeat and that no toxic conformation emerges around the pathological threshold. Using the same type of samples, the recognition mechanism of the Hsc70 molecular chaperone was investigated, indicating that it binds to the N17 region of HTTExon1 and induces the partial unfolding of the poly-Q. The proposed strategy facilitates high-resolution structural and functional studies of low-complexity regions.

    Predicting secondary structure propensities in IDPs using simple statistics from three-residue fragments

    International audience. Intrinsically Disordered Proteins (IDPs) play key functional roles facilitated by their inherent plasticity. In most cases, IDPs recognize their partners through partially structured elements inserted in fully disordered chains. The identification and characterization of these elements is fundamental to understanding the functional mechanisms of IDPs. Although several computational methods have been developed to identify order within disordered chains, most current secondary structure predictors focus on globular proteins and are not necessarily appropriate for IDPs. Here, we present a comprehensible method, called Local Structural Propensity Predictor (LS2P), to predict secondary structure elements from IDP sequences. LS2P performs statistical analyses on a database of three-residue fragments extracted from coil regions of high-resolution protein structures. In addition to identifying scarcely populated helical and extended regions, the method pinpoints short stretches triggering β-turn formation or promoting α-helices. The simplicity of the method enables a direct connection between experimental observations and structural features encoded in IDP sequences.
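    The idea of deriving propensities from simple tripeptide statistics can be sketched as the excess population of a Ramachandran region in a residue's sequence context over its context-free baseline. This is not LS2P's actual statistic: the rectangular region boundaries, the dictionary libraries (`residue_lib`, `tripeptide_lib`) and the difference-of-fractions score are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Angles = Tuple[float, float]  # (phi, psi) in degrees


def region(phi: float, psi: float) -> str:
    """Coarse Ramachandran classification with hypothetical box boundaries."""
    if -100.0 <= phi <= -30.0 and -80.0 <= psi <= -10.0:
        return "helix"
    if -180.0 <= phi <= -90.0 and 90.0 <= psi <= 180.0:
        return "extended"
    return "other"


def fraction(pool: List[Angles], label: str) -> float:
    """Fraction of (phi, psi) pairs in `pool` that fall in the given region."""
    if not pool:
        return 0.0
    return sum(region(phi, psi) == label for phi, psi in pool) / len(pool)


def propensities(seq: str, residue_lib: Dict[str, List[Angles]],
                 tripeptide_lib: Dict[str, List[Angles]],
                 label: str = "helix") -> List[float]:
    """Per-residue excess of `label` in the tripeptide context over the residue baseline.

    Positive values mean the local sequence context enriches that region;
    terminal residues and unseen tripeptides get 0.0.
    """
    out = []
    for i, aa in enumerate(seq):
        key = seq[i - 1:i + 2]
        if 0 < i < len(seq) - 1 and key in tripeptide_lib:
            baseline = fraction(residue_lib.get(aa, []), label)
            out.append(fraction(tripeptide_lib[key], label) - baseline)
        else:
            out.append(0.0)
    return out


residue_lib = {"A": [(-65.0, 140.0), (-60.0, -45.0)]}
tripeptide_lib = {"AAA": [(-60.0, -45.0), (-62.0, -40.0), (-65.0, 140.0), (-58.0, -47.0)]}
print(propensities("AAA", residue_lib, tripeptide_lib))  # [0.0, 0.25, 0.0]
```

    Because every number traces back to counts in the fragment database, a score like this stays easy to relate to experimental observables, which is the kind of direct connection the abstract emphasizes.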

    The structure of pathogenic huntingtin exon 1 defines the bases of its aggregation propensity

    Huntington’s disease is a neurodegenerative disorder caused by a CAG expansion in the first exon of the HTT gene, resulting in an extended poly-glutamine (poly-Q) tract in huntingtin (httex1). The structural changes occurring in the poly-Q as its length increases remain poorly understood due to its intrinsic flexibility and strong compositional bias. The systematic application of site-specific isotopic labeling has enabled residue-specific NMR investigations of the poly-Q tract of pathogenic httex1 variants with 46 and 66 consecutive glutamines. Integrative data analysis reveals that the poly-Q tract adopts long α-helical conformations propagated and stabilized by glutamine side chain to backbone hydrogen bonds. We show that α-helical stability is a stronger signature in defining aggregation kinetics and the structure of the resulting fibrils than the number of glutamines. Our observations provide a structural perspective on the pathogenicity of expanded httex1 and pave the way to a deeper understanding of poly-Q-related diseases.