142,740 research outputs found

    Determination of complex small molecule structures using molecular alignment simulation.

    Correct structural assignment of small molecules and natural products is critical for drug discovery and organic chemistry. Anisotropy-based NMR spectroscopy is a powerful tool for structural assignment of organic molecules, but relies on utilization of a medium that disrupts the isotropic motion of molecules in organic solvents. Here, we establish a quantitative correlation between the atomic structure of the alignment medium, the molecular structure of the small molecule and molecule-specific anisotropic NMR parameters. The quantitative correlation uses an accurate three-dimensional molecular alignment model that predicts residual dipolar couplings of small molecules aligned by poly(γ-benzyl-L-glutamate). The technique facilitates reliable determination of the correct stereoisomer and enables unequivocal, rapid determination of complex molecular structures from extremely sparse NMR data.

    Structural Studies on Flexible Small Molecules Based on NMR in Oriented Media. Methodology and Application to Natural Products

    This thesis describes the development and application of structural elucidation methodologies based on NMR in aligned media. Nuclear magnetic resonance is arguably the most important technique for the structural analysis of organic molecules in solution. In the last decade, Residual Dipolar Coupling (RDC) analysis emerged as a powerful tool for the determination of the three-dimensional structure of organic molecules in solution, complementing and even outperforming the approach based on the classical NMR observables such as NOE or 3J couplings. While application of RDCs to the structural analysis of proteins developed rapidly, their use with “small” molecules (typically organic compounds and natural products with MW < 1000 Da) is still scarce. From the spectroscopic point of view, two features of small molecules pose the main obstacles to the application of RDC to their analysis: the scarcity of observable couplings and the complexity stemming from conformational flexibility in solution. Besides, sample preparation with the optimal degree of alignment is still an issue for most classes of compounds. In this thesis, all these topics are addressed and new experimental and computational advancements are presented. i) Sample preparation. Weak alignment in water and aligning properties of polyacrylamide gels. ii) New observables. Long-range proton–carbon RDCs. iii) Analysis of flexible organic molecules.

    NOBAI: a web server for character coding of geometrical and statistical features in RNA structure

    The Numeration of Objects in Biology: Alignment Inferences (NOBAI) web server provides a web interface to the applications in the NOBAI software package. This software codes topological and thermodynamic information related to the secondary structure of RNA molecules as multi-state phylogenetic characters, builds character matrices directly in NEXUS format and provides sequence randomization options. The web server is an effective tool that facilitates the search for evolutionary history embedded in the structure of functional RNA molecules. The NOBAI web server is accessible at ‘http://www.manet.uiuc.edu/nobai/nobai.php’. This web site is free and open to all users and there is no login requirement.

    ModeRNA: a tool for comparative modeling of RNA 3D structure

    RNA is a large group of functionally important biomacromolecules. In striking analogy to proteins, the function of RNA depends on its structure and dynamics, which in turn is encoded in the linear sequence. However, while there are numerous methods for computational prediction of protein three-dimensional (3D) structure from sequence, with comparative modeling being the most reliable approach, there are very few such methods for RNA. Here, we present ModeRNA, a software tool for comparative modeling of RNA 3D structures. As an input, ModeRNA requires a 3D structure of a template RNA molecule, and a sequence alignment between the target to be modeled and the template. It must be emphasized that a good alignment is required for successful modeling, and for large and complex RNA molecules the development of a good alignment usually requires manual adjustments of the input data based on previous expertise of the respective RNA family. ModeRNA can model post-transcriptional modifications, a functionally important feature analogous to post-translational modifications in proteins. ModeRNA can also model DNA structures or use them as templates. It is equipped with many functions for merging fragments of different nucleic acid structures into a single model and analyzing their geometry. Windows and UNIX implementations of ModeRNA with comprehensive documentation and a tutorial are freely available

    A bioinformatics framework for RNA structure mining, motif discovery and polyadenylation analysis

    RNA molecules play various important roles in the cell, and their functionality depends not only on sequence information but to a large extent on their structure. The development of computational and predictive approaches to study RNA molecules is extremely valuable. In this research, a tool named RADAR was developed that provides a multitude of functionality for RNA data analysis and research. It aligns structure-annotated RNA sequences so that both sequence and structure information are taken into consideration. This tool is capable of performing pair-wise structure alignment, multiple structure alignment, database search and clustering. In addition, it provides two salient features: (i) constrained alignment of RNA secondary structures, and (ii) prediction of consensus structure for a set of RNA sequences. This tool is also hosted on the web, where it can be freely accessed, and the software can be downloaded from http://datalab.njitedu/biodata/rna/RSmatch/server.htm . The RADAR software has been applied to various datasets (genomes of various mammals, viruses and parasites) and our experimental results show that this approach is capable of detecting functionally important regions. As an application of RADAR, a systematic data mining approach was developed, termed GLEAN-UTR, to identify small stem loop RNA structure elements in the untranslated regions (UTRs) that are conserved between human and mouse orthologs and exist in multiple genes with common Gene Ontology terms. This study resulted in 90 distinct RNA structure groups containing 748 structures, with 3' Histone stem loop (HSL3) and Iron Response element (IRE) among the top hits. Further, the role played by structure in mRNA polyadenylation was investigated. Polyadenylation is an important step towards the maturation of almost all cellular mRNAs in eukaryotes.
Studies have identified several cis-elements besides the widely known polyadenylation signal (PAS) element (AATAAA or ATTAAA or a close variant) which may have a role to play in poly(A) site identification. In this study the differences in structural stability of sequences surrounding poly(A) sites were investigated, and it was found that for genes containing a single poly(A) site, the surrounding sequence is more stable than the surrounding sequences of alternative poly(A) sites. This indicates that structure may provide an evolutionary advantage for single poly(A) sites that prevents multiple poly(A) sites from arising. In addition, the study found that the structural stability of the region surrounding a polyadenylation site correlates with its distance from the next gene, with shorter distances corresponding to greater structural stability.
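
As a small illustration of the PAS element mentioned in this abstract, the sketch below scans a DNA string for the two canonical hexamers named there (AATAAA and ATTAAA). It is not part of RADAR or GLEAN-UTR; the function name `find_pas_sites` and the example sequence are hypothetical, and close variants of the signal are deliberately ignored.

```python
# Canonical polyadenylation signal hexamers named in the abstract.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def find_pas_sites(sequence, hexamers=PAS_HEXAMERS):
    """Return sorted (position, hexamer) pairs for every exact hexamer match.

    Positions are 0-based offsets into the sequence. Overlapping matches
    are reported because the search restarts one base after each hit.
    """
    seq = sequence.upper()
    hits = []
    for hexamer in hexamers:
        start = seq.find(hexamer)
        while start != -1:
            hits.append((start, hexamer))
            start = seq.find(hexamer, start + 1)
    return sorted(hits)


# Made-up sequence containing one copy of each hexamer.
example = "GGCAATAAAGTCCATTAAACC"
print(find_pas_sites(example))
```

A realistic analysis would also restrict the scan to the region upstream of an annotated poly(A) site and allow near-variant hexamers; both are left out here to keep the sketch short.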

    Water structure in solution and crystal molecular dynamics simulations compared to protein crystal structures

    The function of proteins is influenced not only by the atomic structure but also by the detailed structure of the solvent surrounding it. Computational studies of protein structure also critically depend on the water structure around the protein. Herein we compare the water structure obtained from molecular dynamics (MD) simulations of galectin-3 in complex with two ligands to crystallographic water molecules observed in the corresponding crystal structures. We computed MD trajectories both in a water box, which mimics a protein in solution, and in a crystallographic unit cell, which mimics a protein in a crystal. The calculations were compared to crystal structures obtained at both cryogenic and room temperature. Two types of analyses of the MD simulations were performed. First, the positions of the crystallographic water molecules were compared to peaks in the MD density after alignment of the protein in each snapshot. The results of this analysis indicate that all simulations reproduce the crystallographic water structure rather poorly. However, if we define the crystallographic water sites based on their distances to nearby protein atoms and follow these sites throughout the simulations, the MD simulations reproduce the crystallographic water sites much better. This shows that the failure of MD simulations to reproduce the water structure around proteins in crystal structures observed both in this and previous studies is caused by the problem of identifying water sites for a flexible and dynamic protein (traditionally done by overlaying the structures). Our local clustering approach solves the problem and shows that the MD simulations reasonably reproduce the water structure observed in crystals. 
Furthermore, analysis of the crystal MD simulations indicates a few water molecules that are close to unmodeled electron density peaks in the crystal structures, suggesting that crystal MD could be used as a complementary tool for identifying and modelling water in protein crystallography

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
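
The covering-hypersphere idea in this abstract can be sketched as a two-stage range search: a coarse pass over cluster centers, then a fine pass inside the clusters that survive. The code below is a minimal sketch under stated assumptions, not the implementation behind Ammolite, MICA, or esFragBag. The greedy clustering, the function names, and the 2D grid data are all illustrative choices, and the distance is assumed to be a true metric so that the triangle inequality holds.

```python
import math


def build_clusters(points, cluster_radius, dist):
    """Greedily cover the data with balls of radius cluster_radius.

    Each point joins the first existing center within cluster_radius;
    otherwise it becomes the center of a new cluster.
    """
    clusters = []  # list of (center, members) pairs
    for p in points:
        for center, members in clusters:
            if dist(center, p) <= cluster_radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(query, radius, clusters, cluster_radius, dist):
    """Return all stored points within radius of query.

    Coarse stage: keep only clusters whose center lies within
    radius + cluster_radius of the query. By the triangle inequality,
    no cluster containing a true hit can be discarded.
    Fine stage: compare the query against members of the kept clusters.
    """
    hits = []
    for center, members in clusters:
        if dist(query, center) <= radius + cluster_radius:
            hits.extend(m for m in members if dist(query, m) <= radius)
    return hits


# Illustrative data: a 10 x 10 integer grid in the plane.
points = [(x, y) for x in range(10) for y in range(10)]
clusters = build_clusters(points, cluster_radius=2.0, dist=math.dist)

query = (4.2, 4.7)
found = range_search(query, 1.5, clusters, 2.0, math.dist)
brute = [p for p in points if math.dist(query, p) <= 1.5]
print(sorted(found) == sorted(brute))
```

The number of distance computations in the coarse stage is the number of clusters, which is the quantity the abstract calls metric entropy; the fine stage stays cheap only when few clusters fall near the query, which is where the low fractal dimension condition comes in.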

    Genome maps across 26 human populations reveal population-specific patterns of structural variation.

    Large structural variants (SVs) in the human genome are difficult to detect and study by conventional sequencing technologies. With long-range genome analysis platforms, such as optical mapping, one can identify large SVs (>2 kb) across the genome in one experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that phylogenetic population patterns of large SVs are similar to those of single nucleotide variations in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric areas. In addition, we discover ~60 Mb of non-redundant genome content missing in the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome.