
    Sequence-based Multiscale Model (SeqMM) for High-throughput chromosome conformation capture (Hi-C) data analysis

    In this paper, I introduce a Sequence-based Multiscale Model (SeqMM) for biomolecular data analysis. Combining it with the spectral graph method, I reveal the essential difference between global-scale and local-scale models in structure clustering, i.e., they optimize different distances: Euclidean (spatial) versus sequential (genomic). More specifically, clusters from global-scale models optimize Euclidean distance relations. Local-scale models, on the other hand, yield clusters that optimize genomic distance relations. For biomolecular data, Euclidean distances and sequential distances are two independent variables, which can never be optimized simultaneously in data clustering. However, the sequence scale in my SeqMM can work as a tuning parameter that balances these two variables and delivers different clusterings for different purposes. Further, my SeqMM is used to explore the hierarchical structures of chromosomes. I find that at the global scale, the Fiedler vector from my SeqMM bears a great similarity to the principal vector from principal component analysis, and can be used to study genomic compartments. In TAD analysis, I find that TADs evaluated at different scales are not consistent and vary considerably. Particularly when the sequence scale is small, the calculated TAD boundaries are dramatically different. Even for regions with high contact frequencies, TAD regions show no obvious consistency. However, when the scale value increases further, although TADs are still quite different, TAD boundaries in these high-contact-frequency regions become more and more consistent. Finally, I find that for a fixed local scale, my method can deliver very robust TAD boundaries across different cluster numbers. Comment: 22 pages, 13 figures
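The Fiedler-vector clustering mentioned in this abstract can be illustrated with a minimal sketch. It is not the SeqMM implementation: the toy contact matrix, the function names, and the hard band mask used as a stand-in for a "sequence scale" are all assumptions made for illustration, since the abstract does not define the model's kernel.

```python
import numpy as np


def band_limit(weights, seq_scale):
    """Zero out weights between loci more than `seq_scale` bins apart.

    Hypothetical stand-in for a sequence-scale parameter; not SeqMM's kernel.
    """
    w = np.asarray(weights, dtype=float)
    idx = np.arange(w.shape[0])
    near = np.abs(idx[:, None] - idx[None, :]) <= seq_scale
    return np.where(near, w, 0.0)


def fiedler_vector(weights):
    """Eigenvector for the second-smallest eigenvalue of L = D - W."""
    w = np.asarray(weights, dtype=float)
    laplacian = np.diag(w.sum(axis=1)) - w
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return eigvecs[:, 1]


# Toy symmetric contact matrix (assumed data): two blocks of three loci,
# strongly connected inside each block and weakly linked between blocks.
toy = np.full((6, 6), 0.01)
toy[:3, :3] = 1.0
toy[3:, 3:] = 1.0
np.fill_diagonal(toy, 0.0)

# The sign of the Fiedler vector gives a two-way split of the loci.
labels = fiedler_vector(toy) > 0
```

Calling `fiedler_vector(band_limit(toy, seq_scale))` with a small `seq_scale` restricts the graph to near-diagonal contacts, which is one simple way to see how a locality parameter could change the resulting partition.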

    The development of biomolecular Raman optical activity spectroscopy

    Following its first observation over 40 years ago, Raman optical activity (ROA), which may be measured as a small difference in the intensity of vibrational Raman scattering from chiral molecules in right- and left-circularly polarized incident light or, equivalently, the intensity of a small circularly polarized component in the scattered light using incident light of fixed polarization, has evolved into a powerful chiroptical spectroscopy for studying a large range of biomolecules in aqueous solution. The long and tortuous path leading to the first observations of ROA in biomolecules in 1989, in which the author was closely involved from the very beginning, is documented, followed by a survey of subsequent developments and applications up to the present day. Among other things, ROA provides information about motif and fold, as well as secondary structure, of proteins; solution structure of carbohydrates; polypeptide and carbohydrate structure of intact glycoproteins; new insight into structural elements present in unfolded protein sequences; and protein and nucleic acid structure of intact viruses. Quantum chemical simulations of observed Raman optical activity spectra provide the complete three-dimensional structure, together with information about conformational dynamics, of smaller biomolecules. Biomolecular ROA measurements are now routine thanks to a commercial instrument based on a novel design becoming available in 2004.

    Distinct conformational stability and functional activity of four highly homologous endonuclease colicins

    The conserved colicin DNases E2, E7, E8, and E9 are a family of microbial toxins that kill bacteria through random degradation of the chromosomal DNA. In the present work, we compare side by side the conformational stabilities of these four highly homologous colicin DNases. Our results indicate that, at room temperature and neutral pH, the apo-forms of these colicins are in a dynamic conformational equilibrium between at least two quite distinct conformers. We show that the thermal stabilities of the apo-proteins differ by up to 20 °C. The observed differences correlate with the observed conformational behavior, that is, the tendency of the protein to form either an open, less stable or closed, more stable conformation in solution, as deduced by both tryptophan accessibility studies and electrospray ionization mass spectrometry. Given these surprising structural differences, we next probed the catalytic activity of the four DNases and also observed a significant variation in relative activities. However, no unequivocal link between the activity of the protein and its thermal and structural stability could easily be made. The observed differences in conformational and functional properties of the four colicin DNases are surprising given that they are a closely related (≥65% identity) family of enzymes containing a highly conserved (ββα-Me) active site motif. The different behavior of the apo-enzymes must therefore depend on more subtle changes in amino acid sequence, most likely in the exosite region (residues 72-98) that is required for specific high-affinity binding of the cognate immunity protein.

    MAS NMR detection of hydrogen bonds for protein secondary structure characterization

    Hydrogen bonds are essential for protein structure and function, making experimental access to long-range interactions between amide protons and heteroatoms invaluable. Here we show that measuring distance restraints involving backbone hydrogen atoms and carbonyl- or α-carbons enables the identification of secondary structure elements based on hydrogen bonds, provides long-range contacts and validates spectral assignments. To this end, we apply specifically tailored, proton-detected 3D (H)NCOH and (H)NCAH experiments under fast magic angle spinning (MAS) conditions to microcrystalline samples of SH3 and GB1. We observe through-space, semi-quantitative correlations between protein backbone carbon atoms and multiple amide protons, enabling us to determine hydrogen bonding patterns and thus to identify β-sheet topologies and α-helices in proteins. Our approach shows the value of fast MAS and suggests new routes in probing both secondary structure and the role of functionally-relevant protons in all targets of solid-state MAS NMR.

    Protein folding on the ribosome studied using NMR spectroscopy

    NMR spectroscopy is a powerful tool for the investigation of protein folding and misfolding, providing a characterization of molecular structure, dynamics and exchange processes, across a very wide range of timescales and with near atomic resolution. In recent years NMR methods have also been developed to study protein folding as it might occur within the cell, in a de novo manner, by observing the folding of nascent polypeptides in the process of emerging from the ribosome during synthesis. Despite the 2.3 MDa molecular weight of the bacterial 70S ribosome, many nascent polypeptides, and some ribosomal proteins, have sufficient local flexibility that sharp resonances may be observed in solution-state NMR spectra. In providing information on dynamic regions of the structure, NMR spectroscopy is therefore highly complementary to alternative methods such as X-ray crystallography and cryo-electron microscopy, which have successfully characterized the rigid core of the ribosome particle. However, the low working concentrations and limited sample stability associated with ribosome-nascent chain complexes mean that such studies still present significant technical challenges to the NMR spectroscopist. This review will discuss the progress that has been made in this area, surveying all NMR studies that have been published to date, and with a particular focus on strategies for improving experimental sensitivity.

    Sequence composition and environment effects on residue fluctuations in protein structures

    The spectrum and scale of fluctuations in protein structures affect a range of cellular phenomena, including the stability of protein structures or their fragments, allosteric transitions and energy transfer. The study presents a statistical-thermodynamic analysis of the relationship between the sequence composition and the distribution of residue fluctuations in protein-protein complexes. A one-node-per-residue elastic network model is developed that accounts for the nonhomogeneous protein mass distribution and for the inter-atomic interactions through a renormalized inter-residue potential. Two factors, the protein mass distribution and the residue environment, were found to determine the scale of residue fluctuations. Surface residues undergo larger fluctuations than core residues, in agreement with experimental observations. Ranking residues over the normalized scale of fluctuations yields a distinct classification of amino acids into three groups. The structural instability in proteins possibly relates to the high content of the highly fluctuating residues and a deficiency of the weakly fluctuating residues in irregular secondary structure elements (loops), chameleon sequences and disordered proteins. A strong correlation between residue fluctuations and the sequence composition of protein loops supports this hypothesis. Comparing fluctuations of binding site residues (interface residues) with other surface residues shows that, on average, the interface is more rigid than the rest of the protein surface and that Gly, Ala, Ser, Cys, Leu and Trp have a propensity to form more stable docking patches on the interface. The findings have broad implications for understanding mechanisms of protein association and stability of protein structures. Comment: 8 pages, 4 figures
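To make the idea of a one-node-per-residue elastic network concrete, here is a minimal sketch of the standard Gaussian Network Model, swapped in as a simpler relative of the model in this abstract. It omits the mass weighting and the renormalized inter-residue potential the abstract describes; the 7 Å cutoff, the function name, and the straight-chain coordinates are assumptions chosen for illustration.

```python
import numpy as np


def gnm_fluctuations(coords, cutoff=7.0):
    """Relative mean-square fluctuations from a plain Gaussian Network Model.

    coords: (N, 3) array with one node per residue (e.g. C-alpha positions).
    cutoff: contact distance in the same units as coords (assumed value).
    Returns the diagonal of the pseudo-inverse of the Kirchhoff matrix,
    which is proportional to the predicted fluctuation of each node.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Nodes are in contact when closer than the cutoff (self-pairs excluded).
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    # Kirchhoff matrix: contact counts on the diagonal, -1 for each contact.
    kirchhoff = np.diag(contact.sum(axis=1)) - contact.astype(float)
    return np.diag(np.linalg.pinv(kirchhoff))


# Assumed toy input: five nodes on a straight line, 3.8 units apart,
# so each node only contacts its immediate neighbours.
chain = np.array([[3.8 * i, 0.0, 0.0] for i in range(5)])
fluct = gnm_fluctuations(chain)
```

In this toy chain the end nodes have the fewest contacts and the largest predicted fluctuations, which mirrors, in a much cruder model, the abstract's observation that less constrained residues fluctuate more.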

    Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences

    A new algorithm is presented for vocabulary analysis (word detection) in texts of human origin. It performs at 60%–70% overall accuracy and greater than 80% accuracy for longer words, and approximately 85% sensitivity on Alice in Wonderland, a considerable improvement on previous methods. When applied to protein sequences, it detects short sequences analogous to words in human texts, i.e. intolerant to changes in spelling (mutation), and relatively context-independent in their meaning (function). Some of these are homonyms of up to 7 amino acids, which can assume different structures in different proteins. Others are ultra-conserved stretches of up to 18 amino acids within proteins of less than 40% overall identity, reflecting extreme constraint or convergent evolution. Different species are found to have qualitatively different major peptide vocabularies, e.g. some are dominated by large gene families, while others are rich in simple repeats or dominated by internally repetitive proteins. This suggests the possibility of a peptide vocabulary signature, analogous to genome signatures in DNA. Homonyms may be useful in detecting convergent evolution and positive selection in protein evolution. Ultra-conserved words may be useful in identifying structures intolerant to substitution over long periods of evolutionary time.