1,396 research outputs found

    Interrogating and Predicting Tolerated Sequence Diversity in Protein Folds: Application to E. elaterium Trypsin Inhibitor-II Cystine-Knot Miniprotein

    Cystine-knot miniproteins (knottins) are promising molecular scaffolds for protein engineering applications. Members of the knottin family have multiple loops capable of displaying conformationally constrained polypeptides for molecular recognition. While previous studies have illustrated the potential of engineering knottins with modified loop sequences, a thorough exploration into the tolerated loop lengths and sequence space of a knottin scaffold has not been performed. In this work, we used the Ecballium elaterium trypsin inhibitor II (EETI) as a model member of the knottin family and constructed libraries of EETI loop-substituted variants with diversity in both amino acid sequence and loop length. Using yeast surface display, we isolated properly folded EETI loop-substituted clones and applied sequence analysis tools to assess the tolerated diversity of both amino acid sequence and loop length. In addition, we used covariance analysis to study the relationships between individual positions in the substituted loops, based on the expectation that correlated amino acid substitutions will occur between interacting residue pairs. We then used the results of our sequence and covariance analyses to successfully predict loop sequences that facilitated proper folding of the knottin when substituted into EETI loop 3. The sequence trends we observed in properly folded EETI loop-substituted clones will be useful for guiding future protein engineering efforts with this knottin scaffold. Furthermore, our findings demonstrate that the combination of directed evolution with sequence and covariance analyses can be a powerful tool for rational protein engineering
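The abstract does not say which covariation statistic was used. As a minimal sketch, assuming mutual information between alignment columns stands in for the covariance analysis, the idea of scoring position pairs in a set of equal-length loop sequences could look like the following. The function names and the toy sequences are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two equal-length residue columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def pairwise_covariation(sequences):
    """Score every pair of positions in a set of equal-length loop sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences must have the same length")
    columns = list(zip(*sequences))
    return {
        (i, j): mutual_information(columns[i], columns[j])
        for i, j in combinations(range(length), 2)
    }


# Hypothetical toy loops: positions 0 and 2 always change together,
# while position 1 varies independently of both.
toy_loops = ["KAE", "KGE", "DAR", "DGR"]
scores = pairwise_covariation(toy_loops)
```

High-scoring pairs are candidates for interacting residues. With real libraries, loops of different lengths would need to be analysed separately or aligned first, and small sample sizes inflate mutual information, so a background correction is usually needed.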

    The landscape of extreme genomic variation in the highly adaptable Atlantic killifish

    © The Author(s), 2017. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Genome Biology and Evolution 9 (2017): 659-676, doi:10.1093/gbe/evx023.
    Understanding and predicting the fate of populations in changing environments require knowledge about the mechanisms that support phenotypic plasticity and the adaptive value and evolutionary fate of genetic variation within populations. Atlantic killifish (Fundulus heteroclitus) exhibit extensive phenotypic plasticity that supports large population sizes in highly fluctuating estuarine environments. Populations have also evolved diverse local adaptations. To yield insights into the genomic variation that supports their adaptability, we sequenced a reference genome and 48 additional whole genomes from a wild population. Evolution of genes associated with cell cycle regulation and apoptosis is accelerated along the killifish lineage, which is likely tied to adaptations for life in highly variable estuarine environments. Genome-wide standing genetic variation, including nucleotide diversity and copy number variation, is extremely high. The highest diversity genes are those associated with immune function and olfaction, whereas genes under greatest evolutionary constraint are those associated with neurological, developmental, and cytoskeletal functions. Reduced genetic variation is detected for tight junction proteins, which in killifish regulate paracellular permeability that supports their extreme physiological flexibility. Low-diversity genes engage in more regulatory interactions than high-diversity genes, consistent with the influence of pleiotropic constraint on molecular evolution. High genetic variation is crucial for continued persistence of species given the pace of contemporary environmental change. Killifish populations harbor among the highest levels of nucleotide diversity yet reported for a vertebrate species, and thus may serve as a useful model system for studying evolutionary potential in variable and changing environments.
    This work was primarily supported by a grant from the National Science Foundation (collaborative research grants DEB-1265282, DEB-1120512, DEB-1120013, DEB-1120263, DEB-1120333, DEB-1120398 to J.K.C., D.L.C., M.E.H., S.I.K., M.F.O., J.R.S., W.W., and A.W.). Further support was provided by the National Institute of Environmental Health Sciences (1R01ES021934-01 to A.W., P42ES7373 to T.H.H., P42ES007381 to M.E.H., and R01ES019324 to J.R.S.), the National Institute of General Medical Sciences (P20GM103423 and P20GM104318 to B.L.K.), and the National Science Foundation (DBI-0640462 and XSEDE-MCB100147 to D.G.)
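For readers unfamiliar with the statistic, nucleotide diversity is conventionally defined as the average number of pairwise differences per site among sampled sequences. The sketch below implements that textbook definition on a few aligned toy sequences. It assumes complete, phased sequences of equal length, and it is not the estimation pipeline used in the paper, which the abstract does not describe. The function name and the toy data are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average number of pairwise differences per site among aligned sequences."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(haplotypes, 2))
    # Count mismatched sites for every pair of sequences.
    total_diffs = sum(
        sum(1 for x, y in zip(a, b) if x != y) for a, b in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment, not data from the study.
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
pi = nucleotide_diversity(toy_alignment)
```

Whole-genome estimates from short-read data typically also have to account for missing data and genotype uncertainty, which this sketch ignores.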

    Identification of specificity determining residues in peptide recognition domains using an information theoretic approach applied to large-scale binding maps

    Background: Peptide Recognition Domains (PRDs) are commonly found in signaling proteins. They mediate protein-protein interactions by recognizing and binding short motifs in their ligands. Although a great deal is known about PRDs and their interactions, prediction of PRD specificities remains largely an unsolved problem.
    Results: We present a novel approach to identifying these Specificity Determining Residues (SDRs). Our algorithm generalizes earlier information theoretic approaches to coevolution analysis, to become applicable to this problem. It leverages the growing wealth of binding data between PRDs and large numbers of random peptides, and searches for PRD residues that exhibit strong evolutionary covariation with some positions of the statistical profiles of bound peptides. The calculations involve only information from sequences, and thus can be applied to PRDs without crystal structures. We applied the approach to PDZ, SH3 and kinase domains, and evaluated the results using both residue proximity in co-crystal structures and verified binding specificity maps from mutagenesis studies.
    Discussion: Our predictions were found to be strongly correlated with the physical proximity of residues, demonstrating the ability of our approach to detect physical interactions of the binding partners. Some high-scoring pairs were further confirmed to affect binding specificity using previous experimental results. Combining the covariation results also allowed us to predict binding profiles with higher reliability than two other methods that do not explicitly take residue covariation into account.
    Conclusions: The general applicability of our approach to the three different domain families demonstrated in this paper suggests its potential in predicting binding targets and assisting the exploration of binding mechanisms.

    Proteome-Wide Prediction of Novel DNA/RNA-Binding Proteins Using Amino Acid Composition and Periodicity in the Hyperthermophilic Archaeon Pyrococcus furiosus

    Proteins play a critical role in complex biological systems, yet about half of the proteins in publicly available databases are annotated as functionally unknown. Proteome-wide functional classification using bioinformatics approaches is thus becoming an important method for revealing unknown protein functions. Using the hyperthermophilic archaeon Pyrococcus furiosus as a model species, we used the support vector machine (SVM) method to discriminate DNA/RNA-binding proteins from proteins with other functions, using amino acid composition and periodicities as feature vectors. We defined the resulting values as the composition score (CO) and the periodicity score (PD). The P. furiosus proteins were classified into three classes (I–III) on the basis of a two-dimensional correlation analysis of the CO score and PD score. As a result, approximately 87% of the functionally known proteins categorized as class I proteins (CO score + PD score > 0.6) were found to be DNA/RNA-binding proteins. Applying the two-dimensional correlation analysis to the 994 hypothetical proteins in P. furiosus, a total of 151 proteins were predicted to be novel DNA/RNA-binding protein candidates. DNA/RNA-binding activities of randomly chosen hypothetical proteins were experimentally verified. Six out of seven candidate proteins in class I possessed DNA/RNA-binding activities, supporting the efficacy of our method
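To make the two ingredients named in the abstract concrete, the sketch below computes an amino acid composition feature vector and applies the stated class I rule (CO score + PD score > 0.6). It is an illustration under assumptions: the trained SVMs that produce the CO and PD scores are not reproduced, the periodicity features are omitted, and the function names and example values are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(sequence):
    """Fraction of each of the 20 standard amino acids in a protein sequence."""
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def is_class_one(co_score, pd_score, threshold=0.6):
    """Class I rule quoted in the abstract: CO score + PD score > 0.6."""
    return co_score + pd_score > threshold


# Hypothetical toy sequence and scores, not values from the study.
features = composition_vector("MKKR")
candidate = is_class_one(co_score=0.4, pd_score=0.3)
```

In a full pipeline, vectors like `features` would be the input to a trained classifier whose output plays the role of the CO score. The boundaries for classes II and III are not given in the abstract, so they are not modelled here.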

    Learning from cadherin structures and sequences: affinity determinants and protein architecture

    Cadherins are a family of cell-surface proteins mediating adhesion that are important in development and maintenance of tissues. The family is defined by the repeating cadherin domain (EC) in their extracellular region, but they are diverse in terms of protein size, architecture and cellular function. The best-understood subfamily is the type I classical cadherins, which are found in vertebrates and have five EC domains. Among the five different type I classical cadherins, the binding interactions are highly specific in their homo- and heterophilic binding affinities, though their sequences are very similar. As previously shown, E- and N-cadherins, two prototypic members of the subfamily, differ in their homophilic K_D by about an order of magnitude, while their heterophilic affinity is intermediate. To examine the source of the binding affinity differences among type I cadherins, we used crystal structures, analytical ultracentrifugation (AUC), surface plasmon resonance (SPR), and electron paramagnetic resonance (EPR) studies. Phylogenetic analysis and binding affinity behavior show that the type I cadherins can be further divided into two subgroups, with E- and N-cadherin representing each. In addition to the affinity differences in their wild-type binding through the strand-swapped interface, a second interface also shows an affinity difference between E- and N-cadherin. This X-dimer interface, which is a weakly binding kinetic intermediate in E-cadherin, has a much stronger affinity in N-cadherin: nearly as strong as N-cadherin wild-type binding. In the swapped and X-dimer interactions of E- and N-cadherin, differences in hydrophobic surface area can mostly account for the affinity difference. However, several mutants of N-cadherin have a K_D an order of magnitude stronger even than the wild-type N-cadherin. 
In these mutants, the source of the strong affinity seems to be entropic stabilization through an equilibrium between multiple conformations with similar energies. We thus have a molecular-level understanding of vertebrate classical cadherins, with a detailed understanding of their adhesive mechanism and their binding affinity determinants. However, the adhesive mechanisms of cadherins from invertebrates, which are structurally divergent yet function in similar roles, remain unknown. We present crystal structures of the predicted N-terminal region of Drosophila N-cadherin (DN-cadherin). Of the 16 total predicted EC domains, we have crystallized the EC1-3 and EC1-4 segments. While the linker regions for the EC1-EC2 and EC3-EC4 pairs display binding of three Ca^2+ ions similar to that in vertebrate cadherins, domains EC2 and EC3 are joined in a bent orientation by a novel, previously uncharacterized Ca^2+-free linker. Based on sequence analysis of the further ECs of DN-cadherin, we predict another such Ca^2+-free linker between EC7 and EC8. Biophysical analysis demonstrates that a construct containing the first nine predicted EC domains of DN-cadherin forms homodimers with affinity similar to vertebrate classical cadherins. Intriguingly, this segment contains both the crystallized and predicted Ca^2+-free linkers, suggesting a complex binding interface. Sequence analysis of the cadherin family reveals that similar Ca^2+-free linkers are widely distributed in the ectodomains of both vertebrate and invertebrate cadherins. In cases of long cadherins, there are frequently multiple Ca^2+-free linkers in a single protein chain. It thus appears that a combination of calcium-binding and calcium-free linkers can allow cadherins to form three-dimensional arrangements that are more complex than the extended, calcium-rigidified structures in classical cadherins. 
Discovery of the Ca^2+-free linker, together with the differing numbers and arrangements of ECs and other domain types, implies that the cadherin superfamily is more structurally diverse than previously thought. Because little is known about the function and even less about the structure of the majority of the superfamily, studying the linear architecture (i.e. the precise sequence of ECs and the characteristics of the interdomain linkers) at the scale of the superfamily would give significant new insights on the structure and function of less-understood cadherins. With this motivation, we have constructed a cadherin database with relevant information on two different scales: the protein and the domain. On the whole protein level, we represent the architecture of each cadherin by recording the arrangement of ECs, different linker types, and other (non-EC) domain types in the protein. On the individual EC level, based on the sequence, we record the domain characteristics that give rise to the different structural features at the protein level. We have annotated over 9,600 proteins from 560 organisms, containing over 69,000 ECs; and built an online interface to search and access this information. Our aim is to provide a tool for understanding the protein architecture, function, and relationships among cadherins, a structurally diverse protein family. Together, these studies examine the relationships between sequence, structure and function of cadherins at different scales. In the classical cadherin study, small changes of one or two residues can dramatically alter the dimer conformations and thus lead to large differences in binding affinity between highly related cadherins, or between wild-type and mutant proteins. These seemingly small mutations can result in even higher binding affinity with the effect of entropic stabilization by multiple conformations. 
In DN-cadherin, the absence of certain calcium-binding motifs in adjacent ECs leads to a new linker type and a new interdomain orientation. This, in turn, has great implications in the global shape, and possibly the binding mechanism of the protein. The cadherin database aims to provide information at different structural levels in order to allow users to draw connections between primary sequence, domain structure and protein architecture, to ultimately learn about protein function

    Local Chromatin Features Including PU.1 and IKAROS Binding and H3K4 Methylation Shape the Repertoire of Immunoglobulin Kappa Genes Chosen for V(D)J Recombination.

    V(D)J recombination is essential for the generation of diverse antigen receptor (AgR) repertoires. In B cells, immunoglobulin kappa (Igκ) light chain recombination follows immunoglobulin heavy chain (Igh) recombination. We recently developed the DNA-based VDJ-seq assay for the unbiased quantitation of Igh VH and DH repertoires. Integration of VDJ-seq data with genome-wide datasets revealed that two chromatin states at the recombination signal sequence (RSS) of VH genes are highly predictive of recombination in mouse pro-B cells. It is unknown whether local chromatin states contribute to Vκ gene choice during Igκ recombination. Here we adapt VDJ-seq to profile the Igκ VκJκ repertoire and present a comprehensive readout in mouse pre-B cells, revealing highly variable Vκ gene usage. Integration with genome-wide datasets for histone modifications, DNase hypersensitivity, transcription factor binding and germline transcription identified PU.1 binding at the RSS, which was unimportant for Igh, as highly predictive of whether a Vκ gene will recombine or not, suggesting that it plays a binary, all-or-nothing role, priming genes for recombination. Thereafter, the frequency with which these genes recombine was shaped both by the presence and level of enrichment of several other chromatin features, including H3K4 methylation and IKAROS binding. Moreover, in contrast to the Igh locus, the chromatin landscape of the promoter, as well as of the RSS, contributes to Vκ gene recombination. Thus, multiple facets of local chromatin features explain much of the variation in Vκ gene usage. Together, these findings reveal shared and divergent roles for epigenetic features and transcription factors in AgR V(D)J recombination and provide avenues for further investigation of chromatin signatures that may underpin V(D)J-mediated chromosomal translocations

    Regulation of POU transcription factor activity by OBF1 and Sox2

    For a cell to exert a specialized function certain genes have to be expressed, others repressed. Transcription factors, regulating this expression, do not function alone, but are often part of multi-protein complexes. Regulating a single gene with more than one transcription factor is an efficient way to integrate responses to a variety of signals using a limited number of proteins. DNA binding proteins often interact with each other and with non-DNA binding proteins in a specific arrangement. The assembly of these complexes is often highly cooperative and promotes high levels of transcriptional synergy. The center of my thesis is the family of POU transcription factors. Specifically, I elaborate the interaction within the POU protein family, with members of other transcription factor families and with cofactors. In all cases, the assembly of the correct array of polypeptides on the DNA requires specific protein-protein and protein-DNA interactions. As an example of POU factors interacting with each other and with a cofactor I investigated the properties of a protein-DNA complex with the B-cell-specific cofactor OBF1 and the Oct1 dimer. Depending on the DNA sequence they bind to, Oct1 dimers are arranged in configurations that are either accessible (PORE sequence) or inaccessible (MORE sequence) to OBF1. In Chapter 3 I show that the expression of Osteopontin, which contains a PORE sequence in its enhancer region, depends on the presence of OBF1 in B-cells. OBF1 alleviates DNA sequence requirements of the Oct1 dimer on PORE-related sequences in vitro. Furthermore, OBF1 enhances POU dimer-DNA interactions and overrides Oct1 interface mutations, which abolish PORE-mediated dimerization without OBF1. Based on the biochemical data, I propose a novel Oct1 dimer arrangement when OBF1 is bound. As an example of Oct factors interacting with members of another transcription factor family I studied the interactions of Sox2 with Oct1 and Oct4, respectively. 
POU and Sox transcription factors exemplify partnerships established between various transcriptional regulators during early embryonic development. The combination of Oct4 and Sox2 on DNA is considered to direct the establishment of the first three lineages in the mammalian embryo. Although functional cooperativity between key regulator proteins is pivotal for milestone decisions in mammalian development, little is known about the underlying molecular mechanisms. The data in Chapter 4 validate experimental high-resolution structure determination, followed by model building. The study shows that Oct4 and Sox2 are able to dimerize on DNA in distinct conformational arrangements. The binding site characteristics of their target genes are responsible for the correct spatial alignment of the Velcro-like interaction domains on their surface. Interestingly, these surfaces frequently have redundant functions and are instrumental in recruiting various interacting protein partners. In Chapter 5 I investigated how Sox2 and Oct4 regulate transcription of a target gene. The first intron of Osteopontin contains a Sox-binding site and a unique PORE to which Oct4 can bind either as a monomer or as a dimer. The study reveals that Sox2-specific repression depends on an upstream Sox site and an intact PORE, although neither the Sox nor the PORE sites are negative elements on their own. A mechanism is proposed for how Sox2 represses Oct4-mediated activation of Osteopontin

    Characterising antibody immunity and ageing in a short-lived teleost

    Ageing individuals exhibit a pervasive decline in adaptive immune function, with important implications for health and lifespan. Systemic changes observed in the structure and diversity of antibody repertoires with age are thought to play an important role in this immunosenescent phenotype; however, the relatively long lifespan of most vertebrate model organisms makes thorough investigation of the ageing repertoire challenging. As a naturally short-lived vertebrate, the turquoise killifish (Nothobranchius furzeri) offers an exciting new opportunity to study the ageing of the adaptive immune system in general and antibody repertoires in particular. In this thesis, I used a combination of existing genomic assemblies and new sequencing data to assemble and characterise the immunoglobulin heavy chain (IGH) locus sequence in the turquoise killifish and compare it to those of closely related species, revealing a history of dynamic locus evolution and repeated duplication and loss of the specialised mucosal isotype IGHZ. The N. furzeri locus itself lacks IGHZ, making it one of the few known teleost species not to possess this isotype. These results support a high rate of evolution in teleost IGH loci and set a strong foundation for the study of comparative evolutionary immunology in cyprinodontiform fishes. Having characterised the IGH locus sequence in N. furzeri, I used it to establish targeted immunoglobulin sequencing in this species, enabling quantitative interrogation of the antibody repertoire. Applying this protocol to whole-body killifish samples revealed complex and individualised antibody repertoires which decline rapidly in within-individual diversity and increase in between-individual variability with age, demonstrating that turquoise killifish exhibit a rapid repertoire-ageing phenotype in line with their short lifespans. 
This loss of diversity with age was particularly strong in isolated gut samples, a phenomenon that may be related to the constant strong antigenic exposure experienced at mucosal surfaces and has not been previously investigated in a vertebrate model. Taken together, these results establish the turquoise killifish as a novel model for vertebrate immunosenescence and lay the groundwork for future interrogation of -- and intervention in -- adaptive-immune ageing
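The abstract does not state which diversity measure was used. One common way to quantify within-individual repertoire diversity is Shannon entropy over clone abundances, reported as an effective number of clones. The sketch below assumes that measure, and the clone counts are hypothetical illustrations rather than data from the thesis.

```python
from math import exp, log


def shannon_entropy(clone_counts):
    """Shannon entropy (in nats) of a repertoire given per-clone read counts."""
    total = sum(clone_counts)
    if total <= 0:
        raise ValueError("clone counts must sum to a positive number")
    return -sum((c / total) * log(c / total) for c in clone_counts if c > 0)


def effective_clone_number(clone_counts):
    """Number of equally abundant clones that would give the same entropy."""
    return exp(shannon_entropy(clone_counts))


# Hypothetical repertoires: an even one and one dominated by a single clone.
even_repertoire = [25, 25, 25, 25]
skewed_repertoire = [85, 5, 5, 5]
even_diversity = effective_clone_number(even_repertoire)
skewed_diversity = effective_clone_number(skewed_repertoire)
```

A repertoire dominated by a few expanded clones yields a lower effective clone number than an even one of the same size, which is the kind of loss of within-individual diversity the abstract describes.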

    The ARF tumor suppressor targets PPM1G/PP2Cγ to counteract NF-κB transcription tuning cell survival and the inflammatory response

    Inducible transcriptional programs mediate the regulation of key biological processes and organismal functions. Despite their complexity, cells have evolved mechanisms to precisely control gene programs in response to environmental cues to regulate cell fate and maintain normal homeostasis. Upon stimulation with proinflammatory cytokines such as tumor necrosis factor-α (TNF), the master transcriptional regulator nuclear factor (NF)-κB utilizes the PPM1G/PP2Cγ phosphatase as a coactivator to normally induce inflammatory and cell survival programs. However, how PPM1G activity is precisely regulated to control NF-κB transcription magnitude and kinetics remains unknown. Here, we describe a mechanism by which the ARF tumor suppressor binds PPM1G to negatively regulate its coactivator function in the NF-κB circuit, thereby promoting insult resolution. ARF becomes stabilized upon binding to PPM1G and forms a ternary protein complex with PPM1G and NF-κB at target gene promoters in a stimuli-dependent manner to provide tunable control of the NF-κB transcriptional program. Consistently, loss of ARF in colon epithelial cells leads to up-regulation of NF-κB antiapoptotic genes upon TNF stimulation and renders cells partially resistant to TNF-induced apoptosis in the presence of agents blocking the antiapoptotic program. Notably, patient tumor data analysis validates these findings by revealing that loss of ARF strongly correlates with sustained expression of inflammatory and cell survival programs. Collectively, we propose that PPM1G emerges as a therapeutic target in a variety of cancers arising from ARF epigenetic silencing or loss of ARF function, as well as in tumors bearing oncogenic NF-κB activation.
    Affiliations: Hyder, Usman (University of Texas, United States); McCann, Jennifer L. (University of Texas, United States); Wang, Jinli (University of Texas, United States); Fung, Victor (University of Texas, United States); Bayo Fina, Juan Miguel (Universidad Austral. Facultad de Ciencias Biomédicas. Instituto de Investigaciones en Medicina Traslacional. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigaciones en Medicina Traslacional, Argentina); D'Orso, Iván (University of Texas, United States)