20 research outputs found

    Molecular Mechanisms for the Evolution of DNA Specificity in a Transcription Factor Family

    Transcription factors (TFs) bind to specific DNA sequences near target genes to precisely coordinate their regulation. Despite the central role of transcription factors in development and homeostasis, the mechanisms by which TFs have evolved to bind and regulate distinct DNA sequences are poorly understood. This dissertation details the highly collaborative work to determine the genetic, biochemical and biophysical mechanisms by which distinct DNA-binding specificities evolved in the steroid receptor (SR) family of transcription factors. Using ancestral protein reconstruction, we resurrected and functionally characterized the historical transition in DNA-binding specificity between ancient SR proteins. We found that DNA-binding specificity evolved by changes in the energetic components of binding; interactions at the protein-DNA interface were weakened while inter-protein cooperativity was greatly improved. We identified a group of fourteen historical substitutions that were sufficient to recapitulate the derived protein's binding function. Three of these substitutions, which we defined as function-switching, were sufficient to change DNA specificity; however, their introduction greatly decreased binding affinity and was deleterious for protein function. A group of eleven permissive substitutions, which had no effect on DNA specificity, allowed the protein to tolerate the deleterious effects of the function-switching substitutions. They non-specifically increased binding affinity by improving interactions at the protein-DNA interface and increasing inter-protein cooperativity. We then dissected the functional role of individual substitutions in both the function-switching and permissive groups. We first determined the binding affinity of all possible combinations of function-switching substitutions for a library of DNA sequences.
This allowed us to functionally characterize the sequence space that separated the ancestral and derived DNA-binding specificities, as well as to identify the genetic determinants of DNA specificity. Lastly, we dissected the effects of the permissive substitutions on the energetics of DNA binding to determine the mechanisms by which they exerted their permissive effect. Together, this work provides insight into the molecular determinants of DNA specificity and identifies the molecular mechanisms by which these interactions changed during the evolution of novel specificity in an important transcription factor family. This dissertation includes previously published and unpublished co-authored material. 2016-01-1
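
The combinatorial design in this abstract (every combination of the three function-switching substitutions) can be illustrated with a minimal sketch. The labels `sub1` to `sub3` are hypothetical placeholders, since the abstract does not name the substitutions, and the function only enumerates genotypes; it does not model binding affinity.

```python
from itertools import product

# Hypothetical labels: the abstract does not name the three substitutions.
SUBSTITUTIONS = ("sub1", "sub2", "sub3")


def all_combinations(substitutions):
    """Return every subset of the substitutions, from none to all of them."""
    combos = []
    # Each mask marks which substitutions are present in one genotype.
    for mask in product((False, True), repeat=len(substitutions)):
        combos.append(tuple(s for s, present in zip(substitutions, mask) if present))
    return combos


genotypes = all_combinations(SUBSTITUTIONS)
```

With three substitutions this yields 2^3 = 8 genotypes, including the empty tuple (no substitutions) and the tuple carrying all three.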

    Tools and annotations for variation

    Since the completion of the Human Genome Project, many next-generation sequencing (NGS), or high-throughput sequencing, platforms have emerged. One of the applications of NGS technology, variant discovery, can serve as a basis for precision medicine. Large sequencing projects are generating huge amounts of genetic variation data, which are stored in databases, either large central databases such as dbSNP, or gene- or disease-centered locus-specific databases (LSDBs). There are many variation databases with many different formats and varying quality. Apart from storage and analysis pipeline capacity problems, the interpretation of the variation is also an issue. Computational methods for predicting the effects of variants have been and are being developed, since experimental assessment of variation effects is often not feasible. Benchmark datasets are needed for the development and performance assessment of such prediction methods. We studied quality-related aspects of variant databases and benchmark datasets. The online tool called VariOtator was developed to aid in the consistent use of the Variation Ontology, which was specifically developed to describe variation. Standardization is one aspect of database quality; the use of an ontology for variant annotation will contribute to its enhancement. BTKbase is a locus-specific database containing information on variants in BTK, the gene involved in X-linked agammaglobulinemia (XLA), a primary immunodeficiency. If available, phenotypic data, i.e. the variant effects, are also provided. Statistics on variants and variation types showed that there is a wide spectrum of variants and variation types, and that the distribution of protein variants in the different BTK domains is not even. The VariSNP database containing datasets with neutral (non-pathogenic) variants was generated by selecting variants from dbSNP and filtering for variants found in the ClinVar, PhenCode and SwissProt databases.
Variants in these three databases are considered to be disease-related. The VariSNP database contains 13 datasets following the functional classification of dbSNP, and is updated on a regular basis. To study the sensitivity to variation in different protein and disease groups, we predicted the pathogenicity of all possible single amino acid substitutions (SAASs) in all proteins in these groups, using the well-performing prediction method PON-P2. Large differences in the proportions of harmful, benign and unknown variants were found, and distinctive patterns of SAAS types were found, both in the original and variant amino acids. Representativeness is one quality aspect of variation benchmark datasets, and relates to the representation of the space of variants and their effects. We studied the coverage and distribution of protein features, including structure (CATH) and enzyme classification (EC), Pfam domains and Gene Ontology terms, in established benchmark datasets. None of the datasets is fully representative. Coverage of the features is in general better in the larger datasets, and better in the neutral datasets. At the higher levels of the CATH and EC classifications, all datasets were unbiased, but for the lower levels and other features, all datasets were biased.
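
To make "all possible single amino acid substitutions" concrete, here is a minimal sketch that enumerates them for one protein sequence. The function name and the `M1A`-style labels are assumptions chosen for illustration; the sketch only generates the variants and does not call PON-P2 or any other predictor.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def single_substitutions(sequence):
    """Yield every single amino acid substitution as 'original+position+variant'."""
    for position, original in enumerate(sequence, start=1):
        for variant in AMINO_ACIDS:
            # Skip the identity case, which is not a substitution.
            if variant != original:
                yield f"{original}{position}{variant}"


variants = list(single_substitutions("MK"))
```

A sequence of length n made of standard residues gives 19 × n variants, so the two-residue example above produces 38.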

    Doctor of Philosophy

    Chronic myeloid leukemia (CML) is caused by the constitutive kinase activity of the fusion oncoprotein BCR-ABL. Conventional therapy in CML utilizes tyrosine kinase inhibitors (TKIs), small molecules that target the ATP-binding pocket in the BCR-ABL kinase domain. Despite their success in treating this disease, continued use of TKIs can lead to drug resistance due to point mutations in the kinase domain. Additionally, non-specific (non-BCR-ABL) kinase inhibition by these TKIs can cause toxic off-target effects. To function as an aberrant kinase, BCR-ABL must first homo-oligomerize via a coiled-coil (CC) domain located at its N-terminus. Thus, inhibiting BCR-ABL oligomerization abolishes its function as an oncoprotein. Designing an inhibitor of the 72-amino acid BCR-ABL CC domain is the focus of this dissertation. To engineer a construct capable of inhibiting oligomerization, strategically designed mutations were incorporated into an isolated BCR-ABL CC domain with the goal of promoting higher affinity binding to endogenous BCR-ABL while at the same time disfavoring binding to our isolated CC construct. The designed construct, called CCmut3, was tested in vitro in leukemia cells containing both wild-type and mutant BCR-ABL. Overall, in vitro treatment with CCmut3 resulted in a decrease in BCR-ABL kinase activity, induction of apoptosis, and a reduction in the proliferation and transformative ability of CML cells. Next, combining CCmut3 with ponatinib, a recently approved BCR-ABL TKI, was also explored. This combination resulted in improved BCR-ABL inhibition and a lowering of the dose of ponatinib necessary for efficacy. Finally, the later chapters in this dissertation focus on possible methods by which the deliverability and stability of CCmut3 can be improved. Truncation and helical capping were both attempted; however, neither provided an inhibitory advantage over the full-length CCmut3 construct.
Thus, current designs focus on creating a hydrocarbon-stapled and truncated CCmut3 peptide, expected to result in a translatable product for wild-type and therapy-resistant CML.

    PON-SC - program for identifying steric clashes caused by amino acid substitutions

    Background: Amino acid substitutions due to DNA nucleotide replacements are frequently disease-causing because they affect functionally important sites. If the substituting amino acid does not fit into the protein, it causes structural alterations that are often harmful. Clashes of amino acids cause local or global structural changes. Testing the structural compatibility of variations has been difficult due to the lack of a dedicated method that could handle the vast amounts of variation data produced by next-generation sequencing technologies. Results: We developed a method, PON-SC, for detecting protein structural clashes due to amino acid substitutions. The method utilizes a side chain rotamer library and tests whether any of the common rotamers can be fitted into the protein structure. The tool was tested both with variants that cause clashes and with variants that do not, and was found to have an accuracy of 0.71 over five test datasets. Conclusions: We developed a fast method for residue side chain clash detection. In addition to the prediction, the method provides a visualization of the variant in the three-dimensional structure.
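
The rotamer test described in this abstract can be sketched as follows. This is an illustrative simplification and not PON-SC's actual implementation: the single distance cutoff, the function names and the coordinate format (tuples of x, y, z in angstroms) are all assumptions, and a real tool would use per-element van der Waals radii and a proper rotamer library.

```python
from math import dist

# Assumed clash threshold in angstroms; hypothetical, chosen for illustration only.
CLASH_DISTANCE = 3.0


def rotamer_clashes(rotamer_atoms, environment_atoms, cutoff=CLASH_DISTANCE):
    """Return True if any side-chain atom is closer than cutoff to any environment atom."""
    return any(dist(a, b) < cutoff for a in rotamer_atoms for b in environment_atoms)


def substitution_fits(rotamers, environment_atoms, cutoff=CLASH_DISTANCE):
    """Return True if at least one candidate rotamer can be placed without a clash."""
    return any(not rotamer_clashes(r, environment_atoms, cutoff) for r in rotamers)


# Toy example: one neighbouring atom at the origin and two candidate rotamers.
environment = [(0.0, 0.0, 0.0)]
candidates = [[(1.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)]]
fits = substitution_fits(candidates, environment)
```

In the toy example the first rotamer clashes and the second does not, so the substitution is reported as fitting; a variant is flagged as clashing only when every candidate rotamer collides with its surroundings.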

    Structural studies in DNA


    Computational Investigations of Backbone Dynamics in Intrinsically Disordered Proteins

    Intrinsically disordered proteins (IDPs), due to their dynamic nature, play important roles in molecular recognition, signalling, regulation, or binding of nucleic acids. IDPs have been extensively studied computationally in terms of binary disorder/order classification. This approach has proven to be fruitful and enabled researchers to estimate the amount of disorder in prokaryotic and eukaryotic genomes. Other computational methods, such as molecular dynamics or other simulation techniques, require a starting structure. However, there are no approaches permitting insight into the behaviour of disordered ensembles from sequence alone. Such a method would facilitate the study of proteins of unknown structure, help to obtain a better classification of the disordered regions, and aid the design of disorder-to-order transitions. In this work, I develop FRAGFOLD-IDP, a method to address this issue. Using FRAGFOLD, a fragment-based structure prediction approach, I generate ensembles of IDPs and show that the features extracted from them correspond well with the backbone dynamics of NMR ensembles deposited in the PDB. FRAGFOLD-IDP predictions significantly improve over a naïve approach and give better insight into the dynamics of the disordered ensembles. The results also show that it is not necessary to predict the correct fold of the protein to reliably assign per-residue fluctuations to the sequence in question. This suggests that disorder is a local property and does not depend on the protein fold. Next, I validate FRAGFOLD-IDP on the disorder classification task and show that the method performs comparably to machine learning-based approaches designed specifically for this task. I also found that FRAGFOLD-IDP produces results on par with DynaMine, a machine learning approach to predict the NMR order parameters, and that the results of both methods are not correlated.
Thus, I constructed a consensus neural network predictor, which takes the results of FRAGFOLD-IDP, DynaMine and physicochemical features to predict per-residue fluctuations, improving upon both input methods.

    Structural and functional characterization of eIF4E1 and eIF4E2 complexes involved in translational control

    Protein synthesis is one of the costliest processes in the cell. Therefore, the initiation of translation is a tightly regulated process. One major control mechanism targets the activity or formation of the so-called eIF4F (eukaryotic initiation factor 4F) complex bound to the 5’ cap structure of an mRNA. This heterotrimeric complex, consisting of the RNA helicase eIF4A, the cap-binding protein eIF4E and the scaffold subunit eIF4G, is ultimately required for the recruitment of the 43S PIC (pre-initiation complex) to the mRNA, leading to subsequent scanning and initiation. The formation of the eIF4F complex is under the control of a group of inhibitory proteins known as eIF4E-binding proteins (4E-BPs), which bind to eIF4E and prevent its interaction with eIF4G. 4E-BPs comprise a group of functionally distinct proteins and include global translational repressors such as the three human proteins 4E-BP1-3, or large, multidomain proteins that likely act on an mRNA-specific level. Alternatively, the assembly of the eIF4F complex can be prevented by the eIF4E-homologous protein (4EHP or eIF4E2), which competes with eIF4E in binding to the 5’ cap structure of an mRNA. Compared to the global repression by 4E-BPs, the latter mechanism only acts on a message-specific level. Comprehensive molecular insight into eIF4E- and 4EHP-complexes involved in the regulation of translation initiation was lacking. My doctoral work provides a fundamental structural and mechanistic understanding of the formation of these regulatory complexes. In my initial studies, I characterized the binding of various 4E-BPs to eIF4E and provided the first structural insights into an extended eIF4E-binding mode of different 4E-BPs. The structures revealed a conserved mode of interaction with eIF4E, despite the lack of sequence conservation.
Additionally, in a collaborative project, I observed that the eIF4E-binding mode characteristic of 4E-BP complexes is also present in eIF4E-eIF4G complexes, expanding the knowledge on the mechanism of translation initiation and its regulation. Another part of my doctoral studies focused on 4E-BPs with very specific functions and architecture. Specifically, I investigated the binding mode of an invertebrate-specific 4E-BP called Mextli. My studies unveiled an unexpected variation and evolutionary plasticity in the eIF4E-binding mode of Mextli homologs across species, which confer distinct functional properties to the respective eIF4E-complexes. I also studied 4EHP, the second member of the eIF4E protein family, and its specific interaction partners, the Grb10-interacting GYF domain-containing (GIGYF) proteins 1 and 2, and obtained the first crystal structures of these 4EHP-specific binding partners bound to 4EHP. The molecular details of the 4EHP-GIGYF translational repressor complex explain why GIGYF proteins bind to 4EHP and not to eIF4E. Overall, my doctoral studies revealed new insights into eIF4E-related complexes and their diverse roles in posttranscriptional gene regulation.