
    Serverification of Molecular Modeling Applications: the Rosetta Online Server that Includes Everyone (ROSIE)

    The Rosetta molecular modeling software package provides experimentally tested and rapidly evolving tools for the 3D structure prediction and high-resolution design of proteins, nucleic acids, and a growing number of non-natural polymers. Despite its free availability to academic users and improving documentation, use of Rosetta has largely remained confined to developers and their immediate collaborators because the code is difficult to use, requires large computational resources, and lacks servers for most of its applications. Here, we present a unified web framework for Rosetta applications called ROSIE (Rosetta Online Server that Includes Everyone). ROSIE provides (a) a common user interface for Rosetta protocols, (b) a stable application programming interface for developers to add new protocols, (c) a flexible back-end for leveraging computer cluster resources shared by RosettaCommons member institutions, and (d) centralized administration by the RosettaCommons to ensure continuous maintenance. This paper describes the ROSIE server infrastructure, a step-by-step 'serverification' protocol for use by Rosetta developers, and the deployment of the first nine ROSIE applications by six separate developer teams: Docking, RNA de novo, ERRASER, Antibody, Sequence Tolerance, Supercharge, Beta peptide design, NCBB design, and VIP redesign. As the number and diversity of these applications illustrate, ROSIE offers a general and speedy paradigm for serverification of Rosetta applications that incurs negligible cost to developers and lowers barriers to Rosetta use for the broader biological community. ROSIE is available at http://rosie.rosettacommons.org

    PSPP: A Protein Structure Prediction Pipeline for Computing Clusters

    BACKGROUND: Protein structures are critical for understanding the mechanisms of biological systems and, subsequently, for drug and vaccine design. Unfortunately, protein sequence data outnumber structural data by more than 200 to 1. This gap can be partially filled by computational protein structure prediction. While structure prediction Web servers are a notable option, they often restrict the number of sequence queries and/or provide a limited set of prediction methodologies. We therefore present a standalone protein structure prediction software package, suitable for high-throughput structural genomics applications, that performs all three classes of prediction methodology: comparative modeling, fold recognition, and ab initio. This software can be deployed on a user's own high-performance computing cluster.
METHODOLOGY/PRINCIPAL FINDINGS: The pipeline consists of a Perl core that integrates more than 20 individual software packages and databases, most of which are freely available from other research laboratories. Query protein sequences are first divided into domains by either domain boundary recognition or Bayesian statistics. The structures of the individual domains are then predicted using template-based modeling or ab initio modeling. The predicted models are scored with a statistical potential and an all-atom force field. The top-scoring ab initio models are annotated by structural comparison against the Structural Classification of Proteins (SCOP) fold database. In addition, secondary structure, solvent accessibility, transmembrane helices, and structural disorder are predicted. The results are generated in text, tab-delimited, and hypertext markup language (HTML) formats. So far, the pipeline has been used to study viral and bacterial proteomes.
CONCLUSIONS: Unlike protein structure prediction Web servers, the standalone pipeline introduced here allows users to devote their own computing assets to processing a potentially unlimited number of queries and to performing resource-intensive ab initio structure prediction.

    In silico selection of RNA aptamers

    In vitro selection of RNA aptamers that bind to a specific ligand usually begins with a random pool of RNA sequences. We propose a computational approach for designing a starting pool of RNA sequences for the selection of RNA aptamers that bind a specific analyte. Our approach consists of three steps: (i) selection of RNA sequences based on their secondary structure, (ii) generation of a library of three-dimensional (3D) structures of RNA molecules, and (iii) high-throughput virtual screening of this library to select aptamers with binding affinity for a desired small molecule. We developed a set of criteria for selecting sequences with potential binding affinity from a pool of random sequences, as well as a protocol for RNA 3D structure prediction. For validation, we tested the performance of in silico selection on a set of six known aptamer–ligand complexes. For the ligands in the test set, the structures of the native aptamer sequences ranked among the top 5% of the selected structures. The proposed approach reduces the RNA sequence search space by four to five orders of magnitude, significantly accelerating the experimental screening and selection of high-affinity aptamers.

    Bioinformatics

    This book covers research areas relevant to bioinformatics, including biological networks, next-generation sequencing, high-performance computing, molecular modeling, structural bioinformatics, and intelligent data analysis. Each section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the research presented here.

    Clustering System and Clustering Support Vector Machine for Local Protein Structure Prediction

    Protein tertiary structure plays a central role in determining a protein's possible functional sites and its chemical interactions with other proteins. Experimental methods for determining protein structure are time-consuming and expensive, and as a result of high-throughput sequencing techniques, the gap between known protein sequences and known structures has widened substantially. These limitations motivate the development of computational algorithms for protein structure prediction. In this work, a clustering system is used to predict local protein structure. First, recurring sequence clusters are identified with an improved K-means clustering algorithm, and carefully constructed sequence clusters are used to predict local protein structure. After obtaining the sequence clusters and motifs, we study how sequence variation within a cluster influences its structural similarity. This analysis shows that clusters with tight sequence variation have high structural similarity, whereas clusters with wide sequence variation have poor structural similarity. Based on this finding, the clustering system is used to predict the tertiary structure of local sequence segments. Test results indicate that the highest-quality clusters give highly reliable predictions and high-quality clusters give reliable predictions. To improve the performance of the clustering system for local protein structure prediction, we propose a novel computational model called Clustering Support Vector Machines (CSVMs). In our previous work, the sequence-to-structure relationship was explored with the conventional K-means algorithm, which may not capture nonlinear sequence-to-structure relationships effectively.
We therefore consider using a Support Vector Machine (SVM) to capture the nonlinear sequence-to-structure relationship. However, SVMs do not scale well to huge datasets with millions of samples, which motivates CSVMs. Combining the theory of granular computing with advanced statistical learning methodology, CSVMs train a separate SVM for each information granule partitioned intelligently by the clustering algorithm. Our experimental results show that CSVMs noticeably improve the accuracy of local structure prediction compared with the clustering system introduced previously.
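
The CSVM idea described above, clustering first and then fitting one SVM per cluster, can be sketched in a few dozen lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes plain numeric feature vectors rather than sequence profiles, uses deterministic farthest-first seeding for K-means, and swaps in a linear SVM trained by full-batch subgradient descent on the hinge loss where the original work may use kernel SVMs. The names `kmeans`, `train_linear_svm`, and `ClusteringSVM` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50) -> Tuple[List[Vector], List[int]]:
    """Lloyd's K-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def train_linear_svm(xs: List[Vector], ys: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 1000) -> Tuple[Vector, float]:
    """Full-batch subgradient descent on the L2-regularised hinge loss (labels +1/-1)."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]
        gb = 0.0
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) < 1:  # margin violated: hinge term is active
                for j in range(dim):
                    gw[j] -= y * x[j] / n
                gb -= y / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


class ClusteringSVM:
    """Partition the data with K-means, then fit one linear SVM per cluster (granule)."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.centroids: List[Vector] = []
        self.models: List[Tuple[Vector, float]] = []

    def _local(self, x: Sequence[float], c: int) -> Vector:
        # Cluster-local coordinates: centre the input on its cluster's centroid.
        return [v - m for v, m in zip(x, self.centroids[c])]

    def fit(self, xs: List[Vector], ys: List[int]) -> "ClusteringSVM":
        self.centroids, assign = kmeans(xs, self.k)
        self.models = []
        for c in range(self.k):
            idx = [i for i, a in enumerate(assign) if a == c]
            if not idx:  # empty granule: fall back to a zero model
                self.models.append(([0.0] * len(xs[0]), 0.0))
                continue
            local = [self._local(xs[i], c) for i in idx]
            self.models.append(train_linear_svm(local, [ys[i] for i in idx]))
        return self

    def predict(self, x: Sequence[float]) -> int:
        # Route the query to its nearest granule, then apply that granule's SVM.
        c = min(range(self.k), key=lambda j: sq_dist(x, self.centroids[j]))
        w, b = self.models[c]
        return 1 if dot(w, self._local(x, c)) + b >= 0 else -1


# Toy data: two well-separated clusters whose labels depend on x in opposite directions.
X = [[-1.0, -0.5], [-1.0, 0.5], [-1.5, 0.0], [1.0, -0.5], [1.0, 0.5], [1.5, 0.0],
     [9.0, 9.5], [9.0, 10.5], [8.5, 10.0], [11.0, 9.5], [11.0, 10.5], [11.5, 10.0]]
Y = [-1, -1, -1, 1, 1, 1, 1, 1, 1, -1, -1, -1]
model = ClusteringSVM(k=2).fit(X, Y)
print([model.predict(q) for q in ([0.8, 0.1], [10.8, 10.1])])  # → [1, -1]
```

The toy labels flip orientation between the two clusters, so each local model learns its own decision boundary. Routing each query to its nearest centroid is what lets a set of small per-granule SVMs stand in for a single SVM trained on the whole dataset, which is the scalability argument the abstract makes.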

    Dynamic particle swarm optimization of biomolecular simulation parameters with flexible objective functions

    Molecular simulations are a powerful tool for complementing and interpreting ambiguous experimental data on biomolecules to obtain structural models. Such data-assisted simulations often rely on parameters whose choice is highly non-trivial and crucial to performance. The key challenge is weighting experimental information relative to the underlying physical model. We introduce FLAPS, a self-adapting variant of dynamic particle swarm optimization, to overcome this parameter selection problem. FLAPS is suited to optimizing composite objective functions that depend on both the optimization parameters and additional, a priori unknown weighting parameters, which substantially influence the search-space topology. These weighting parameters are learned at runtime, yielding a dynamically evolving and iteratively refined search-space topology. As a practical example, we show how FLAPS can be used to find functional parameters for small-angle X-ray scattering-guided protein simulations.
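
To make the idea of weighting parameters learned at runtime concrete, here is a minimal Python sketch of a particle swarm whose composite objective is re-weighted every generation. As a labeled assumption, each objective term is z-scored against all evaluations made so far (so its weight is 1/std), and stored personal bests are re-scored under the new weights because the landscape itself moves. This is not presented as FLAPS's published update rule; `flaps_like_pso`, `term_stats`, `fitness`, and `toy_terms` are hypothetical names.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Maps a parameter vector to the raw values of each objective term (e.g. energy, data misfit).
Terms = Callable[[Sequence[float]], List[float]]


def term_stats(history: List[List[float]]) -> List[Tuple[float, float]]:
    """Mean and population standard deviation of each term over all evaluations so far."""
    n = len(history)
    stats = []
    for column in zip(*history):
        mu = sum(column) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in column) / n)
        stats.append((mu, sd if sd > 0 else 1.0))  # guard against zero spread
    return stats


def fitness(raw: List[float], stats: List[Tuple[float, float]]) -> float:
    """Composite objective: sum of z-scored terms, i.e. each term weighted by 1/std."""
    return sum((v - mu) / sd for v, (mu, sd) in zip(raw, stats))


def flaps_like_pso(terms: Terms, bounds: List[Tuple[float, float]], n_particles: int = 20,
                   n_iter: int = 100, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
                   seed: int = 0) -> Tuple[List[float], List[float]]:
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_raw = [terms(p) for p in pos]
    history = list(best_raw)  # raw term vectors of every evaluation so far
    for _ in range(n_iter):
        # Re-learn the weights, then re-score stored bests under them.
        stats = term_stats(history)
        scores = [fitness(r, stats) for r in best_raw]
        g = min(range(n_particles), key=scores.__getitem__)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (best_pos[g][d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to bounds
            raw = terms(pos[i])
            history.append(raw)
            score = fitness(raw, stats)
            if score < scores[i]:
                best_pos[i], best_raw[i], scores[i] = pos[i][:], raw, score
    stats = term_stats(history)
    scores = [fitness(r, stats) for r in best_raw]
    g = min(range(n_particles), key=scores.__getitem__)
    return best_pos[g], best_raw[g]


def toy_terms(p: Sequence[float]) -> List[float]:
    # Hypothetical stand-ins for, e.g., a force-field energy and a SAXS misfit,
    # on very different scales but both minimised at (1, -2).
    return [(p[0] - 1.0) ** 2, 10.0 * (p[1] + 2.0) ** 2]


best, raw = flaps_like_pso(toy_terms, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

Z-scoring keeps a term that happens to have a large numeric range from dominating the composite objective, which is one simple way to tie the weights to data gathered during the run. In a real SAXS-guided setting the terms would not share a minimum, and the choice of weighting scheme would then decide where the optimum lands.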

    Deep Evolutionary Generative Molecular Modeling for RNA Aptamer Drug Design

    This research presents the Deep Aptamer Evolutionary Model (DAPTEV Model). Typical drug development processes are costly, time-consuming, and often rely on manual research. Aptamers are short, single-stranded oligonucleotides (RNA/DNA) that, much like antibodies, bind to and inhibit target proteins and other types of molecules. Compared with small-molecule drugs, aptamers can bind to their targets with high affinity (binding strength) and specificity (designed to interact uniquely with the target). The typical development process for aptamers relies on a manual technique known as Systematic Evolution of Ligands by Exponential Enrichment (SELEX), which is costly and slow and often yields modest results. The focus of this research is a deep learning approach for generating and evolving aptamer sequences to support aptamer-based drug development. These sequences must be unique, have at least some level of structural complexity, and have high affinity and specificity for the intended target. Moreover, once trained, the deep learning system, a variational autoencoder (VAE), must be able to be queried for new sequences without further training. The research is currently applied to the receptor-binding domain (RBD) of the SARS-CoV-2 (Covid-19) spike protein; however, the design was deliberately kept general to support future viral applications. Each individual run took five and a half days to complete, and over the course of two months, three runs were performed for three different models. Comparisons of sequences, scores, and statistics showed that the deep learning model was able to produce structurally complex aptamers with strong binding affinities and specificities to the target Covid-19 RBD. Furthermore, owing to the nature of VAEs, the model can be queried for new aptamers of similar quality based on its previous training.
Results suggest that VAE-based deep learning methods are capable of optimizing aptamer-target binding affinities and specificities (multi-objective learning) and are a strong tool to aid aptamer-based drug development.
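
The claim that a trained VAE can be queried for new sequences without retraining follows from the standard VAE objective. The block below gives the textbook formulation (encoder $q_\phi$, decoder $p_\theta$, standard normal prior over a $d$-dimensional latent $z$); the abstract does not state DAPTEV's exact loss, architecture, or sequence encoding, so treat this as general background rather than the authors' model. Training maximizes $\mathcal{L}$.

```latex
\begin{align}
  \mathcal{L}(\theta,\phi; x) &= \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
      - D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,\mathcal{N}(0, I)\big), \\
  D_{\mathrm{KL}} &= \tfrac{1}{2}\sum_{j=1}^{d}\big(\mu_j^2 + \sigma_j^2 - \log\sigma_j^2 - 1\big),
      \qquad q_\phi(z \mid x) = \mathcal{N}\big(\mu, \operatorname{diag}(\sigma^2)\big), \\
  z &= \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)
      \quad \text{(reparameterization, used during training)}, \\
  \tilde{x} &\sim p_\theta(x \mid z), \quad z \sim \mathcal{N}(0, I)
      \quad \text{(querying for new sequences, no further training)}.
\end{align}
```

Because the KL term pulls every encoding toward the prior, decoding fresh draws $z \sim \mathcal{N}(0, I)$ yields new samples from the learned sequence distribution with no further gradient updates.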

    Investigating Cryptic Binding Sites by Molecular Dynamics Simulations

    This Account highlights recent advances and discusses major challenges in the investigation of cryptic (hidden) binding sites by molecular simulations. Cryptic binding sites are not visible in protein targets crystallized without a ligand and become visible crystallographically only upon ligand binding. These sites have been shown to be druggable and might provide a rare opportunity to target difficult proteins. However, because of their hidden nature, they are difficult to find through experimental screening. Computational methods based on atomistic molecular simulations remain one of the best approaches for identifying and characterizing cryptic binding sites. However, not all methods are equally efficient: some are well suited to quickly probing protein dynamics but do not provide thermodynamic or druggability information, while others that can provide such data are demanding in terms of time and resources. Here, we review the recent contributions of mixed-solvent simulations, metadynamics, Markov state models, and other enhanced sampling methods to the identification and characterization of cryptic sites. We discuss how these methods have provided valuable information on site-opening mechanisms, predicted previously unknown sites that were then used to design new ligands, and yielded the free energy landscapes and kinetics associated with site opening and ligand binding. We highlight the potential and importance of such predictions in drug discovery, especially for difficult (“undruggable”) targets. We also discuss the major challenges in the field and their possible solutions.
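
Markov state models, one of the methods reviewed above, turn simulation data into populations, free energies, and kinetics. The following is a minimal Python sketch, assuming the trajectory has already been discretized into states (here a hypothetical two-state labeling, 0 = pocket closed and 1 = pocket open) and using the simple row-normalized count estimator rather than the reversible maximum-likelihood estimators that production MSM packages provide. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def transition_matrix(dtraj: Sequence[int], n_states: int, lag: int = 1) -> List[List[float]]:
    """Row-normalised transition counts at lag time `lag` (non-reversible estimator)."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1.0
    # Assumes every state is left at least once, so no row sum is zero.
    return [[c / sum(row) for c in row] for row in counts]


def stationary_distribution(T: List[List[float]], iters: int = 10_000,
                            tol: float = 1e-12) -> List[float]:
    """Power iteration pi <- pi T until the change falls below `tol`."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def free_energies(pi: List[float], kT: float = 1.0) -> List[float]:
    """F_i = -kT ln(pi_i), shifted so the most populated state sits at zero."""
    f = [-kT * math.log(p) for p in pi]
    fmin = min(f)
    return [x - fmin for x in f]


def implied_timescale_two_state(T: List[List[float]], lag: int = 1) -> float:
    """For two states, lambda_2 = T00 + T11 - 1 and t = -lag / ln(lambda_2); needs 0 < lambda_2 < 1."""
    lam2 = T[0][0] + T[1][1] - 1.0
    return -lag / math.log(lam2)


# Hypothetical discretised trajectory: 0 = pocket closed, 1 = pocket open.
dtraj = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
T = transition_matrix(dtraj, n_states=2)
pi = stationary_distribution(T)  # closed ~0.6, open ~0.4
F = free_energies(pi)            # open state ~ln(1.5) kT above closed
t_imp = implied_timescale_two_state(T, lag=1)
```

The free-energy gap between the open and closed states gives the thermodynamic cost of pocket opening, and the implied timescale measures how slowly the two states interconvert. Real studies cluster high-dimensional coordinates into many states and validate the lag time before trusting either number.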