1,286 research outputs found

    SIMS: A Hybrid Method for Rapid Conformational Analysis

    Proteins are at the root of many biological functions, often performing complex tasks as the result of large changes in their structure. Describing the exact details of these conformational changes, however, remains a central challenge for computational biology due to the enormous computational requirements of the problem. This has engendered the development of a rich variety of useful methods designed to answer specific questions at different levels of spatial, temporal, and energetic resolution. These methods fall largely into two classes: physically accurate but computationally demanding methods, and fast, approximate methods. We introduce here a new hybrid modeling tool, the Structured Intuitive Move Selector (SIMS), designed to bridge the divide between these two classes while allowing the benefits of both to be seamlessly integrated into a single framework. This is achieved by applying a modern motion planning algorithm, borrowed from the field of robotics, in tandem with a well-established protein modeling library. SIMS can combine precise energy calculations with approximate or specialized conformational sampling routines to produce rapid, yet accurate, analysis of the large-scale conformational variability of protein systems. Several key advancements are shown, including the abstract use of generically defined moves (conformational sampling methods) and an expansive probabilistic conformational exploration. We apply SIMS to three example problems and demonstrate a rapid solution for each. These include the automatic determination of "active" residues for the hinge-based system Cyanovirin-N, exploring conformational changes involving long-range coordinated motion between non-sequential residues in Ribose-Binding Protein, and the rapid discovery of a transient conformational state of Maltose-Binding Protein, previously only determined by Molecular Dynamics. For all cases we provide energetic validations using well-established energy fields, demonstrating this framework as a fast and accurate tool for the analysis of a wide range of protein flexibility problems.

    Techniques for modeling and analyzing RNA and protein folding energy landscapes

    RNA and protein molecules undergo a dynamic folding process that is important to their function. Computational methods are critical for studying this folding process because it is difficult to observe experimentally. In this work, we introduce new computational techniques to study RNA and protein energy landscapes, including a method to approximate an RNA energy landscape with a coarse graph (map) and new tools for analyzing graph-based approximations of RNA and protein energy landscapes. These analysis techniques can be used to study RNA and protein folding kinetics such as population kinetics, folding rates, and the folding of particular subsequences. In particular, a map-based Master Equation (MME) method can be used to analyze the population kinetics of the maps, while another map analysis tool, map-based Monte Carlo (MMC) simulation, can extract stochastic folding pathways from the map. To validate the results, I compare our methods with other computational methods and with experimental studies of RNA and protein. I first compare our MMC and MME methods for RNA with other computational methods working on the complete energy landscape and show that the approximate map captures the major features of a much larger (e.g., by orders of magnitude) complete energy landscape. Moreover, I show that the methods scale well to large molecules, e.g., RNA with 200+ nucleotides. Then, I correlate the computational results with experimental findings. I present comparisons with two experimental cases to show how I can predict kinetics-based functional rates of ColE1 RNAII and MS2 phage RNA and their mutants using our MME and MMC tools, respectively. I also show that the MME and MMC tools can be applied to map-based approximations of protein energy landscapes and present kinetics analysis results for several proteins.
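    The idea of analyzing population kinetics on a map with a master equation can be illustrated with a minimal sketch. It is not the thesis's MME implementation: the three-node map, the rate constants, and the explicit-Euler integrator below are all assumptions chosen for illustration. Each node holds a population p_i, and the master equation dp_i/dt = sum_j (k_ji p_j - k_ij p_i) moves probability along the map's edges.

```python
def master_equation_step(p, rates, dt):
    """One explicit-Euler step of dp_i/dt = sum_j (k_ji * p_j - k_ij * p_i)."""
    dp = [0.0] * len(p)
    for (i, j), k in rates.items():
        flow = k * p[i] * dt  # probability moving from node i to node j
        dp[i] -= flow
        dp[j] += flow
    return [pi + dpi for pi, dpi in zip(p, dp)]


def simulate(p0, rates, dt, steps):
    """Integrate the population vector forward for `steps` Euler steps."""
    p = list(p0)
    for _ in range(steps):
        p = master_equation_step(p, rates, dt)
    return p


# Hypothetical map: 0 = unfolded, 1 = intermediate, 2 = folded.
# Keys are directed edges (from, to); values are made-up rate constants.
rates = {(0, 1): 1.0, (1, 0): 0.5, (1, 2): 2.0, (2, 1): 0.1}

# Start with the whole population unfolded and integrate to t = 50.
population = simulate([1.0, 0.0, 0.0], rates, dt=0.01, steps=5000)
```

    For this linear chain the long-time populations follow detailed balance, so the folded state should approach 40/43 of the total. A real map would have many more nodes and would typically be handled with a matrix eigendecomposition rather than step-by-step integration.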

    Identifying genome-wide transcription units from histone modifications using EPIGENE

    With the successful completion of the human genome project and the rapid development of sequencing technologies, transcriptome annotation across multiple human cell types and tissues is now available. Accurate transcriptome annotation is critical for understanding the functional as well as the regulatory roles of genomic regions. Current methods for identifying genome-wide active transcription units (TUs) use RNA sequencing (RNA-seq). However, this approach requires large quantities of mRNA, making the identification of highly unstable regulatory RNAs (like microRNA precursors) difficult. As a result of this difficulty in identifying inherently unstable TUs, the transcriptome landscape across all cells and tissues remains incomplete. This problem can be alleviated by chromatin-based approaches due to a well-established correlation between transcription and histone modification. Here, I present EPIGENE, a novel chromatin segmentation method for identifying genome-wide active TUs using transcription-associated histone modifications. Unlike existing chromatin segmentation approaches, EPIGENE uses a constrained, semi-supervised multivariate Hidden Markov Model (HMM) that models the observed combination of histone modifications using a product of independent Bernoulli random variables to identify the chromatin state sequence underlying an active TU. Using EPIGENE, I successfully predicted genome-wide TUs across multiple human cell lines. EPIGENE-predicted TUs were enriched for RNA Polymerase II (Pol II) at the transcription start site (TSS) and in the gene body, indicating that they are indeed transcribed. Comprehensive validation using existing annotations revealed that 93% of EPIGENE TUs can be explained by existing gene annotations and 5% of EPIGENE TUs in HepG2 can be explained by microRNA annotations. EPIGENE predicted TUs more precisely than existing chromatin segmentation and RNA-seq based approaches across multiple human cell lines. Using EPIGENE, I also identified 232 novel TUs in K562 and 43 novel cell-specific TUs in K562, HepG2, and IMR90, all of which were supported by Pol II ChIP-seq and nascent RNA-seq evidence.
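    The emission model described above, a product of independent Bernoulli variables over histone marks, can be sketched with a plain two-state HMM. This is not EPIGENE's code: the state names, the two marks, and every probability below are assumptions, and the constrained, semi-supervised training is left out entirely. Only the emission formula and a standard Viterbi decoding are shown.

```python
import math


def log_emission(obs, probs):
    """Log-probability of a binary mark vector under independent Bernoullis."""
    return sum(math.log(p if x else 1.0 - p) for x, p in zip(obs, probs))


def viterbi(observations, log_init, log_trans, emit_probs):
    """Most likely state sequence for a sequence of binary mark vectors."""
    n_states = len(log_init)
    score = [log_init[s] + log_emission(observations[0], emit_probs[s])
             for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for obs in observations[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s]
                         + log_emission(obs, emit_probs[s]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical model: state 0 = background, state 1 = transcribed.
# emit_probs[s][m] = probability that mark m is present in state s.
emit_probs = [[0.1, 0.1], [0.9, 0.8]]
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]

# Four genomic bins, each with presence/absence calls for two marks.
bins = [(0, 0), (1, 1), (1, 1), (0, 0)]
path = viterbi(bins, log_init, log_trans, emit_probs)  # [0, 1, 1, 0]
```

    Working in log space avoids underflow on long sequences, which matters when the bins span a whole genome.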

    Rapid Sampling of Molecular Motions with Prior Information Constraints

    Proteins are active, flexible machines that perform a range of different functions. Innovative experimental approaches may now provide limited partial information about conformational changes along motion pathways of proteins. There is therefore a need for computational approaches that can efficiently incorporate prior information into motion prediction schemes. In this paper, we present PathRover, a general setup designed for the integration of prior information into the motion planning algorithm of rapidly exploring random trees (RRT). Each suggested motion pathway comprises a sequence of low-energy clash-free conformations that satisfy an arbitrary number of prior information constraints. These constraints can be derived from experimental data or from expert intuition about the motion. The incorporation of prior information is very straightforward and significantly narrows down the vast search in the typically high-dimensional conformational space, leading to a dramatic reduction in running time. To allow the use of state-of-the-art energy functions and conformational sampling, we have integrated this framework into Rosetta, an accurate protocol for diverse types of structural modeling. The suggested framework can serve as an effective complementary tool for molecular dynamics, Normal Mode Analysis, and other prevalent techniques for predicting motion in proteins. We applied our framework to three different model systems. We show that a limited set of experimentally motivated constraints may effectively bias the simulations toward diverse predicates in an outright fashion, from distance constraints to enforcement of loop closure. In particular, our analysis sheds light on mechanisms of protein domain swapping and on the role of different residues in the motion.
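    How a prior-information constraint prunes an RRT search can be shown with a toy sketch. This is not PathRover and does not use Rosetta: the two-dimensional space, the `is_valid` predicate, and the corridor constraint are assumptions standing in for a conformational space and for the energy, clash, and prior-information checks.

```python
import math
import random


def rrt(start, is_valid, bounds, step=0.1, iterations=500, seed=0):
    """Grow a rapidly exploring random tree from `start`.

    `is_valid(q)` is a stand-in for all acceptance checks; nodes that
    fail it are never added. Returns a dict mapping node -> parent.
    """
    rng = random.Random(seed)
    parent = {start: None}
    for _ in range(iterations):
        # Sample a random target point in the toy space.
        target = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        # Find the tree node closest to the target.
        nearest = min(parent, key=lambda q: math.dist(q, target))
        d = math.dist(nearest, target)
        if d == 0.0:
            continue
        # Move at most `step` from the nearest node toward the target.
        scale = min(step, d) / d
        new = tuple(a + scale * (b - a) for a, b in zip(nearest, target))
        if new not in parent and is_valid(new):
            parent[new] = nearest
    return parent


def path_to_root(parent, node):
    """Follow parent links from `node` back to the root; return root-first."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def in_corridor(q):
    """Hypothetical prior-information constraint: stay within |y| <= 0.3."""
    return abs(q[1]) <= 0.3


tree = rrt((0.0, 0.0), in_corridor, bounds=[(-1.0, 1.0), (-1.0, 1.0)])
pathway = path_to_root(tree, list(tree)[-1])
```

    Because rejected samples never enter the tree, every pathway read out with `path_to_root` satisfies the constraint at each step, which is the sense in which prior information narrows the search.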

    (Dis)assembly path planning for complex objects and applications to structural biology [Algorithmes pour le (dés)assemblage d'objets complexes et applications à la biologie structurale]

    Understanding and predicting structure-function relationships in proteins with fully in silico approaches remains a great challenge today. Despite recent developments of computational methods for studying molecular motions and interactions, dealing with macromolecular flexibility largely remains out of reach of existing molecular modeling tools. The aim of this thesis is to develop a novel approach based on motion planning algorithms originating from robotics to better deal with macromolecular flexibility in protein interaction studies. We have extended a recent sampling-based algorithm, ML-RRT, for (dis)assembly path planning of complex articulated objects. This algorithm is based on a partition of the configuration parameters into active and passive subsets, which are then treated in a decoupled manner. The presented extensions make it possible to consider different levels of mobility for the passive parts, which can be pushed or pulled by the motion of the active parts. This algorithmic tool has been successfully applied to study protein conformational changes induced by the diffusion of a ligand inside the protein. Building on the extension of ML-RRT, we have developed a novel method for simultaneous (dis)assembly sequencing and path planning. The new method, called Iterative-ML-RRT, computes not only the paths for extracting all the parts from a complex assembled object, but also the preferred order that the disassembly process has to follow. We have applied this general approach to the study of disassembly pathways of macromolecular complexes, using a scoring function based on the interaction energy. The results described in this thesis show not only the efficacy but also the generality of the proposed algorithms.
