10 research outputs found

    Sequence specificity despite intrinsic disorder: How a disease-associated Val/Met polymorphism rearranges tertiary interactions in a long disordered protein

    Get PDF
    The role of electrostatic interactions and mutations that change charge states in intrinsically disordered proteins (IDPs) is well-established, but many disease-associated mutations in IDPs are charge-neutral. The Val66Met single nucleotide polymorphism (SNP) in precursor brain-derived neurotrophic factor (BDNF) is one of the earliest SNPs to be associated with neuropsychiatric disorders, and the underlying molecular mechanism is unknown. Here we report on over 250 μs of fully-atomistic, explicit solvent, temperature replica-exchange molecular dynamics (MD) simulations of the 91 residue BDNF prodomain, for both the V66 and M66 sequence. The simulations were able to correctly reproduce the location of both local and non-local secondary structure changes due to the Val66Met mutation, when compared with NMR spectroscopy. We find that the change in local structure is mediated via entropic and sequence specific effects. We developed a hierarchical sequence-based framework for analysis and conceptualization, which first identifies blobs of 4-15 residues representing local globular regions or linkers. We use this framework within a novel test for enrichment of higher-order (tertiary) structure in disordered proteins; the size and shape of each blob is extracted from MD simulation of the real protein (RP), and used to parameterize a self-avoiding heterogenous polymer (SAHP). The SAHP version of the BDNF prodomain suggested a protein segmented into three regions, with a central long, highly disordered polyampholyte linker separating two globular regions. This effective segmentation was also observed in full simulations of the RP, but the Val66Met substitution significantly increased interactions across the linker, as well as the number of participating residues. The Val66Met substitution replaces β-bridging between V66 and V94 (on either side of the linker) with specific side-chain interactions between M66 and M95. The protein backbone in the vicinity of M95 is then free to form β-bridges with residues 31-41 near the N-terminus, which condenses the protein. A significant role for Met/Met interactions is consistent with previously-observed non-local effects of the Val66Met SNP, as well as established interactions between the Met66 sequence and a Met-rich receptor that initiates neuronal growth cone retraction

    A global high-density chromatin interaction network reveals functional long-range and trans-chromosomal relationships

    Get PDF
    BACKGROUND: Chromatin contacts are essential for gene-expression regulation; however, obtaining a high-resolution genome-wide chromatin contact map is still prohibitively expensive owing to large genome sizes and the quadratic scale of pairwise data. Chromosome conformation capture (3C)-based methods such as Hi-C have been extensively used to obtain chromatin contacts. However, since the sparsity of these maps increases with an increase in genomic distance between contacts, long-range or trans-chromatin contacts are especially challenging to sample. RESULTS: Here, we create a high-density reference genome-wide chromatin contact map using a meta-analytic approach. We integrate 3600 human, 6700 mouse, and 500 fly Hi-C experiments to create species-specific meta-Hi-C chromatin contact maps with 304 billion, 193 billion, and 19 billion contacts in respective species. We validate that meta-Hi-C contact maps are uniquely powered to capture functional chromatin contacts in both cis and trans. We find that while individual dataset Hi-C networks are largely unable to predict any long-range coexpression (median 0.54 AUC), meta-Hi-C networks perform comparably in both cis and trans (0.65 AUC vs 0.64 AUC). Similarly, for long-range expression quantitative trait loci (eQTL), meta-Hi-C contacts outperform all individual Hi-C experiments, providing an improvement over the conventionally used linear genomic distance-based association. Assessing between species, we find patterns of chromatin contact conservation in both cis and trans and strong associations with coexpression even in species for which Hi-C data is lacking. CONCLUSIONS: We have generated an integrated chromatin interaction network which complements a large number of methodological and analytic approaches focused on improved specificity or interpretation. This high-depth "super-experiment" is surprisingly powerful in capturing long-range functional relationships of chromatin interactions, which are now able to predict coexpression, eQTLs, and cross-species relationships. The meta-Hi-C networks are available at https://labshare.cshl.edu/shares/gillislab/resource/HiC/

    Contiguously hydrophobic sequences are functionally significant throughout the human exome

    Get PDF
    SignificanceProteins rely on the hydrophobic effect to maintain structure and interactions with the environment. Surprisingly, natural selection on amino acid hydrophobicity has not been detected using modern genetic data. Analyses that treat each amino acid separately do not reveal significant results, which we confirm here. However, because the hydrophobic effect becomes more powerful as more hydrophobic molecules are introduced, we tested whether unbroken stretches of hydrophobic amino acids are under selection. Using genetic variant data from across the human genome, we find evidence that selection increases with the length of the unbroken hydrophobic sequence. These results could lead to improvements in a wide range of genomic tools as well as insights into protein-aggregation disease etiology and protein evolutionary history

    Characterization of intrinsically disordered regions in proteins informed by human genetic diversity

    Get PDF
    All proteomes contain both proteins and polypeptide segments that don't form a defined three-dimensional structure yet are biologically active-called intrinsically disordered proteins and regions (IDPs and IDRs). Most of these IDPs/IDRs lack useful functional annotation limiting our understanding of their importance for organism fitness. Here we characterized IDRs using protein sequence annotations of functional sites and regions available in the UniProt knowledgebase ("UniProt features": active site, ligand-binding pocket, regions mediating protein-protein interactions, etc.). By measuring the statistical enrichment of twenty-five UniProt features in 981 IDRs of 561 human proteins, we identified eight features that are commonly located in IDRs. We then collected the genetic variant data from the general population and patient-based databases and evaluated the prevalence of population and pathogenic variations in IDPs/IDRs. We observed that some IDRs tolerate 2 to 12-times more single amino acid-substituting missense mutations than synonymous changes in the general population. However, we also found that 37% of all germline pathogenic mutations are located in disordered regions of 96 proteins. Based on the observed-to-expected frequency of mutations, we categorized 34 IDRs in 20 proteins (DDX3X, KIT, RB1, etc.) as intolerant to mutation. Finally, using statistical analysis and a machine learning approach, we demonstrate that mutation-intolerant IDRs carry a distinct signature of functional features. Our study presents a novel approach to assign functional importance to IDRs by leveraging the wealth of available genetic data, which will aid in a deeper understating of the role of IDRs in biological processes and disease mechanisms

    Sequence specificity despite intrinsic disorder: how a disease-associated Val/Met polymorphism rearranges tertiary interactions in a long disordered protein

    No full text
    The role of electrostatic interactions and mutations that change charge states in intrinsically disordered proteins (IDPs) is well-established, but many disease-associated mutations in IDPs are charge-neutral. The Val66Met single nucleotide polymorphism (SNP) in precursor brain-derived neurotrophic factor (BDNF) is one of the earliest SNPs to be associated with neuropsychiatric disorders, and the underlying molecular mechanism is unknown. Here we report on over 250 μs of fully-atomistic, explicit solvent, temperature replica exchange molecular dynamics (MD) simulations of the 91 residue BDNF prodomain, for both the V66 and M66 sequence. The simulations were able to correctly reproduce the location of both local and non-local secondary changes due to the Val66Met mutation when compared with NMR spectroscopy. We find that the change in local structure is mediated via entropic and sequence specific effects. We developed a hierarchical sequence-based framework for analysis and conceptualization, which first identifies “blobs” of 5-15 residues representing local globular regions or linkers. We use this framework within a novel test for enrichment of higher-order (tertiary) structure in disordered proteins; the size and shape of each blob is extracted from MD simulation of the real protein (RP), and used to parameterize a self-avoiding heterogenous polymer (SAHP). The SAHP version of the BDNF prodomain suggested a protein segmented into three regions, with a central long, highly disordered polyampholyte linker separating two globular regions. This effective segmentation was also observed in full simulations of the RP, but the Val66Met substitution significantly increased interactions across the linker, as well as the number of participating residues. The Val66Met substitution replaces β-bridging between Val66 and Val94 (on either side of the linker) with specific side-chain interactions between Met66 and Met95.The protein backbone in the vicinity of Met95 is then free to form β-bridges with residues 31-41 near the N-terminus, which condenses the protein. A significant role for Met/Met interactions is consistent with previously-observed non-local effects of the Val66Met SNP, as well as established interactions between the Met66 sequence and a Met-rich receptor that initiates neuronal growth cone retraction.</p

    High depth genome wide chromatin contact matrix

    No full text
    These high-depth meta-Hi-C chromatin contact matrices are surprisingly powerful in capturing long-range functional relationships of chromatin interactions, which are now able to predict coexpression, eQTLs, and cross-species relationships.  For building the meta-Hi-C matrix, we uniformly processed 3619, 6732, and 487 Hi-C runs for Human, Mouse, and Fly respectively. The runs were obtained after querying Sequence Read Archive with field limitations of given species and Hi-C as experiment strategy.  A genome-wide chromatin contact matrix was created for each run after mapping the reads to the same reference genome for each species. Reads were aligned to the hg38, mm10, and dm6 genomes in Human, Mouse and Fly respectively. All chromatin contact matrices for a species were aggregated to create the meta-Hi-C matrix.  The genome-wide Hi-C matrices were divided into cis (intrachromsomal) and trans (interchromosomal). In cis, each chromosome is stored in a seperate matrix. In trans,  the matrix is genome-wide but with no cis contacts (cis contacts are always 0 in this case). Here the cis contact matrices are available at 10KB and 1KB resolution for all three species. The trans contact matrices are available at 10KB resolution for all three species and additionally at 1KB resolution for Fly. The matrices are stored as *.h5 in HiCMatrix format (https://github.com/deeptools/HiCMatrix). HiCExplorer (https://hicexplorer.readthedocs.io/en/latest/) can be used to process these files or to convert them to other formats if desired.  Meta-Hi-C matrices at several other resolutions are available for download via online tool at https://gillisweb.cshl.edu/HiC/ or direct download at https://labshare.cshl.edu/shares/gillislab/resource/HiC/</p
    corecore