
    RNA and protein 3D structure modeling: similarities and differences

    In analogy to proteins, the function of RNA depends on its structure and dynamics, which are encoded in the linear sequence. While there are numerous methods for computational prediction of protein 3D structure from sequence, there have been very few such methods for RNA. This review discusses template-based and template-free approaches for macromolecular structure prediction, with special emphasis on comparison between the already tried-and-tested methods for protein structure modeling and the very recently developed “protein-like” modeling methods for RNA. We highlight analogies between many successful methods for modeling of these two types of biological macromolecules and argue that RNA 3D structure can be modeled using “protein-like” methodology. We also highlight the areas where the differences between RNA and proteins require the development of RNA-specific solutions.

    Exploration of the Disambiguation of Amino Acid Types to Chi-1 Rotamer Types in Protein Structure Prediction and Design

    A protein’s global fold provides insight into function; however, function specificity is often detailed in sidechain orientation. Thus, determining the rotamer conformations is often crucial in the contexts of protein structure/function prediction and design. For all non-glycine and non-alanine types, chi-1 rotamers occupy a small number of discrete states. Herein, we explore the possibility of describing evolution from the perspective of the sidechains’ structure versus the traditional twenty amino acid types. To validate our hypothesis that this perspective is more crucial to our understanding of evolutionary relationships, we investigate its use in evolutionary substitution matrices for sequence alignments for fold recognition purposes and in computational protein design, with specific focus on designing beta sheet environments, where previous studies have been done on amino acid types alone. Throughout this study, we also propose the concept of the “chi-1 rotamer sequence” that describes the chi-1 rotamer composition of a protein. We also present attempts to predict these sequences and real-value torsion angles from amino acid sequence information. First, we describe our development of log-odds scoring matrices for sequence alignments. Log-odds substitution matrices are widely used in sequence alignments for their ability to determine evolutionary relationships between proteins. Traditionally, databases of sequence information guide the construction of these matrices, which illustrates their power in discovering distant or weak homologs. Weak homologs, typically those that share low sequence identity (< 30%), are often difficult to identify when only using basic amino acid sequence alignment. While protein threading approaches have addressed this issue, many of these approaches include sequence-based information or profiles guided by amino acid-based substitution matrices, namely BLOSUM62.
Here, we generated a structure-based substitution matrix derived from TM-align structural alignments that captures both the sequence mutation rate within same protein family folds and the chi-1 rotamer that represents each amino acid. These rotamer substitution matrices (ROTSUMs) discover new homologs and improved alignments in the PDB that traditional substitution matrices, based solely on sequence information, cannot identify. Certain tools and algorithms to estimate rotamer torsion angles have been developed but typically require knowledge of backbone coordinates and/or experimental data to help guide the prediction. Herein, we developed a fragment-based algorithm, Rot1Pred, to determine the chi-1 states in each position of a given amino acid sequence, yielding a chi-1 rotamer sequence. This approach employs fragment matching of the query sequence to sequence-structure fragment pairs in the PDB to predict the query’s sidechain structure information. Real-value torsion angles were also predicted and compared against SCWRL4. Results show that overall and for most amino acid types, Rot1Pred can calculate chi-1 torsion angles significantly closer to native angles compared to SCWRL4 when evaluated on I-TASSER generated model backbones. Finally, we’ve developed and explored chi-1-rotamer-based statistical potentials and evolutionary profiles constructed for de novo computational protein design. Previous analyses which aim to energetically describe the preference of amino acid types in beta sheet environments (parallel vs antiparallel packing or N- and C-terminal beta strand capping) have been performed with amino acid types, although no explicit rotamer representation is given in their scoring functions. In our study, we construct statistical functions which describe chi-1 rotamer preferences in these environments and illustrate their improvement over previous methods.
These specialized knowledge-based energy functions have generated sequences whose I-TASSER predicted models are structurally similar to their input structures yet have low sequence identity.
PhD, Chemical Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/145951/1/jarrettj_1.pd

    11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.


    Predicting and Testing Helix-Mimetic Inhibitors of the p53-Mdm2 Interaction

    Aberrant protein-protein interactions (PPIs) are found in many disease states. Consequently, there is a need for PPI inhibitors for use as research tools and pharmaceutical lead compounds. Computational methods could greatly assist with the search for new PPIs. Oligobenzamides are novel PPI inhibitors which can theoretically be produced to display any sequence of side chains. Understanding the nature of oligobenzamide binding is important for identification of the most efficient strategy of predicting oligobenzamide inhibitors. The prediction of oligobenzamide affinities using thermodynamic integration and implicit solvent methods is described. Affinities of oligobenzamides for Mdm2 predicted using implicit solvent methods bore a moderate correlation with measured affinities. Examination of MM-PBSA results using analysis of variance revealed that it is not necessary to run simulations with every member of a large combinatorial library in order to predict their relative affinities because within a particular binding site, the degree of interaction between the side chains is small. However, it could be useful to separate molecules based on their predicted binding pose because oligobenzamides can bind to Mdm2 in many different ways, depending on the choice of side chains. This insight will be valuable for future attempts to predict oligobenzamide affinities. The 1H-15N HSQC NMR spectrum peaks of 15N-labelled Mdm2 L33E were assigned to facilitate the future validation of binding poses. An oligoamide was shown using NMR to bind in the correct place. However, NMR testing revealed that oligobenzamides can aggregate in aqueous solution despite being soluble. A novel FRET-based method was also developed which can be used to test potential inhibitors with a low solubility and high absorbance during their development. 
It was adapted for a microwell plate to facilitate future high-throughput screening, and an assay involving Cherry-labelled Mdm2 was tested which could be developed into an in vivo assay in the future.

    Near-Native Protein Loop Sampling Using Nonparametric Density Estimation Accommodating Sparcity

    Unlike the core structural elements of a protein like regular secondary structure, loop regions are difficult for template-based modeling (TBM) due to their variability in sequence and structure as well as the sparse sampling from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated based on samples from these distributions and were enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For the canonical loops like the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well as or better than the best templates, demonstrating that our automated method recaptures these canonical loops without inclusion of any IgG-specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct comparison of sampling against the Loopy algorithm, our method demonstrates the ability to sample nearer-native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, in the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM for 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that our DPM-HMM produces an advantage by consistently sampling near-native loop structures.
The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/

    Coarse-grained simulations of RNA and DNA duplexes

    Although RNAs perform many cellular functions, little is known about the dynamics and thermodynamics of these molecules. In principle, all-atom molecular dynamics simulations can investigate these issues, but with current computer facilities, these simulations have been limited to small RNAs and to short times. HiRe-RNA, a recently proposed high-resolution coarse-grained model for RNA that captures many geometric details such as base pairing and stacking, is able to fold RNA molecules to near-native structures in a short computational time. So far it had been applied to simple hairpins, and here we present its application to duplexes of a couple dozen nucleotides and show how with our model and with Replica Exchange Molecular Dynamics (REMD) we can easily predict the correct double helix from a completely random configuration and study the dissociation curve. To show the versatility of our model, we present an application to a double-stranded DNA molecule as well. A reconstruction algorithm allows us to obtain full-atom structures from the coarse-grained model. Through atomistic Molecular Dynamics (MD) we can compare the dynamics starting from a representative structure of a low-temperature replica or from the experimental structure, and show how the two are statistically identical, highlighting the validity of a coarse-grained approach for structured RNAs and DNAs.
    Comment: 28 pages, 11 figures

    Chemoinformatics approaches for new drugs discovery

    Chemoinformatics uses computational methods and technologies to solve chemical problems. It works on molecular structures, their representations, properties and related data. The first and most important phase in this field is the translation of interconnected atomic systems into in-silico models, ensuring complete and correct chemical information transfer. In the last 20 years, chemical databases have evolved from molecular repositories into research tools for new drug identification, while modern high-throughput technologies allow for a continuous increase in chemical library size, as highlighted by publicly available repositories such as PubChem [http://pubchem.ncbi.nlm.nih.gov/], ZINC [http://zinc.docking.org/] and ChemSpider [http://www.chemspider.com/]. The fundamental requirements of chemical libraries are molecular uniqueness, absence of ambiguity, chemical correctness (related to atoms, bonds, chemical orthography), and standardized storage and registration formats. The aim of this work is the development of chemoinformatics tools and data for the drug discovery process. The first part of the research project focused on analysis of the accessible commercial chemical space, looking for molecular redundancy and checking the correctness of in-silico models in order to identify a unique and univocal molecular descriptor for chemical library indexing. This made it possible to achieve 0% redundancy on a library of 42 million compounds. The protocol was implemented as MMsDusty, a web-based tool for cleaning molecular databases. The major protocol developed is MMsINC, a chemoinformatics platform based on a starting number of 4 million non-redundant, high-quality, annotated and biomedically relevant chemical structures; the library is now being expanded up to 460 million compounds. MMsINC is able to perform various types of queries, such as substructure or similarity searches and descriptor filtering.
MMsINC is interfaced with the PDB (Protein Data Bank) [http://www.rcsb.org/pdb/home/home.do] and related to approved drugs. The second developed protocol is called pepMMsMIMIC, a peptidomimetic screening tool based on multiconformational chemical libraries; the screening process uses pharmacophoric fingerprint similarity to identify small molecules able to geometrically and chemically mimic endogenous peptides or proteins. The last part of this project led to the implementation of an optimized and exhaustive conformational space analysis protocol for small-molecule libraries; this is crucial for the high-quality 3D molecular model prediction required in chemoinformatics applications. The torsional exploration was optimized in the range of the most frequent dihedral angles seen in X-ray solved small-molecule structures of the CSD (Cambridge Structural Database); applying this to a library of 89 million structures generated a library of 2.6 × 10^7 high-quality conformers. The tools, protocols and platforms developed in this work allow for chemoinformatics analysis and screening of large chemical libraries, achieving high-quality, correct and unique chemical data and in-silico models.

    Stability and Mechanical Properties of W1-xMoxB4.2 (x=0.0-1.0) From First Principles

    Heavy transition-metal tetraborides (e.g., tungsten tetraboride, molybdenum tetraboride, and molybdenum-doped tungsten tetraboride) exhibit superior mechanical properties, but solving their complex crystal structures has been a long-standing challenge. Recent experimental x-ray and neutron diffraction measurements combined with first-principles structural searches have identified a complex structure model for tungsten tetraboride that contains a boron trimer as an unusual structural unit with a stoichiometry of 1:4.2. In this paper, we expand the study to binary MoB4.2 and ternary W1-xMoxB4.2 (x=0.0-1.0) compounds to assess their thermodynamic stability and mechanical properties using a tailor-designed crystal structure search method in conjunction with first-principles energetic calculations. Our results reveal that an orthorhombic MoB4.2 structure in Cmcm symmetry matches well the experimental x-ray diffraction patterns. For the synthesized ternary Mo-doped tungsten tetraborides, a series of W1-xMoxB4.2 structures are theoretically designed using a random substitution approach by replacing W atoms with Mo atoms in the Cmcm binary crystal structure. This approach leads to the discovery of several W1-xMoxB4.2 structures that are energetically superior and stable against decomposition into binary WB4.2 and MoB4.2. The structural and mechanical properties of these low-energy W1-xMoxB4.2 structures largely follow Vegard's law. As the composition parameter x varies over 0.0-1.0, the superior mechanical properties of W1-xMoxB4.2 stay in a narrow range. This unusual phenomenon stems from the strong covalent network with directional bonding configurations formed by boron atoms to resist elastic deformation.
The findings offer insights into the fundamental structural and physical properties of ternary W1-xMoxB4.2 in relation to the binary WB4.2/MoB4.2 compounds, which open a promising avenue for further rational optimization of the functional performance of transition-metal borides that can be synthesized under favorable experimental conditions for wide applications.

    ModeRNA: a tool for comparative modeling of RNA 3D structure

    RNA is a large group of functionally important biomacromolecules. In striking analogy to proteins, the function of RNA depends on its structure and dynamics, which in turn are encoded in the linear sequence. However, while there are numerous methods for computational prediction of protein three-dimensional (3D) structure from sequence, with comparative modeling being the most reliable approach, there are very few such methods for RNA. Here, we present ModeRNA, a software tool for comparative modeling of RNA 3D structures. As an input, ModeRNA requires a 3D structure of a template RNA molecule, and a sequence alignment between the target to be modeled and the template. It must be emphasized that a good alignment is required for successful modeling, and for large and complex RNA molecules the development of a good alignment usually requires manual adjustments of the input data based on previous expertise of the respective RNA family. ModeRNA can model post-transcriptional modifications, a functionally important feature analogous to post-translational modifications in proteins. ModeRNA can also model DNA structures or use them as templates. It is equipped with many functions for merging fragments of different nucleic acid structures into a single model and analyzing their geometry. Windows and UNIX implementations of ModeRNA with comprehensive documentation and a tutorial are freely available.

    Enumeration, conformation sampling and population of libraries of peptide macrocycles for the search of chemotherapeutic cardioprotection agents

    Peptides are uniquely endowed with features that allow them to perturb previously difficult-to-drug biomolecular targets. Peptide macrocycles in particular have seen a flurry of recent interest due to their enhanced bioavailability, tunability and specificity. Although these properties make them attractive hit-candidates in early stage drug discovery, knowing which peptides to pursue is non‐trivial due to the magnitude of the peptide sequence space. Computational screening approaches show promise in their ability to address the size of this search space but suffer from their inability to accurately interrogate the conformational landscape of peptide macrocycles. We developed an in‐silico compound enumerator that was tasked with populating a conformationally laden peptide virtual library. This library was then used in the search for cardio‐protective agents that may be administered to reduce tissue damage during reperfusion after ischemia (heart attacks). Our enumerator successfully generated a library of 15.2 billion compounds, requiring the use of compression algorithms, conformational sampling protocols and management of aggregated compute resources in the context of a local cluster. In the absence of experimental biophysical data, we performed biased sampling during alchemical molecular dynamics simulations in order to observe cyclophilin‐D perturbation by cyclosporine A and its mitochondrial targeted analogue. Reliable intermediate state averaging through a WHAM analysis of the biased dynamic pulling simulations confirmed that the cardio‐protective activity of cyclosporine A was due to its mitochondrial targeting. Parallel-tempered solution molecular dynamics in combination with efficient clustering isolated the essential dynamics of a cyclic peptide scaffold.
The rapid enumeration of skeletons from these essential dynamics gave rise to a conformation-laden virtual library of all 15.2 billion unique cyclic peptides (given the limits imposed on peptide sequence). Analysis of this library showed the exact extent of physicochemical properties covered, relative to the bare scaffold precursor. Molecular docking of a subset of the virtual library against cyclophilin‐D showed significant improvements in affinity to the target (relative to cyclosporine A). The conformation-laden virtual library, accessed by our methodology, provided derivatives that were able to make many interactions per peptide with the cyclophilin‐D target. Machine learning methods showed promise in the training of Support Vector Machines for synthetic feasibility prediction for this library. The synergy between enumeration and conformational sampling greatly improves the performance of this library during virtual screening, even when only a subset is used.