55 research outputs found

    De Novo Protein Structure Modeling from Cryoem Data Through a Dynamic Programming Algorithm in the Secondary Structure Topology Graph

    Proteins are the molecules that carry out vital functions and make up more than half of the dry weight of every cell. In nature, a protein folds into a unique and energetically favorable three-dimensional (3-D) structure that is critical to its biological function. In contrast to other methods for protein structure determination, electron cryomicroscopy (CryoEM) can produce volumetric maps of proteins that are poorly soluble, large, and hard to crystallize. Furthermore, it studies proteins in their native environment. Unfortunately, many of the volumetric maps produced by current CryoEM techniques are at medium resolution (~5 to 10 Å), at which it is hard to determine the atomic structure of the protein. However, the resolution of the volumetric maps is improving steadily, and recent work has obtained atomic models at higher resolutions (~3 Å). De novo protein modeling is the process of building the structure of a protein from its CryoEM volumetric map. The medium-resolution maps generated by CryoEM pose a new challenge. At medium resolution, the location and orientation of secondary structure elements (SSEs) can be identified visually and computationally. However, the order and direction of the SSEs detected from the CryoEM volumetric map (called the protein topology) are not visible. To determine the protein structure, the topology of the SSEs has to be worked out before the backbone can be built. Consequently, the topology problem has become a bottleneck for protein modeling using CryoEM. In this dissertation, we focus on establishing an effective computational framework to derive the atomic structure of a protein from medium-resolution CryoEM volumetric maps. This framework includes a topology graph component, which effectively ranks the topologies of the SSEs, and a model building component.
    To generate a small subset of candidate topologies, the problem is translated into a layered graph representation. Generating such a set is infeasible with a brute-force method, so we developed a dynamic programming algorithm (TopoDP) for the new representation to overcome the problem of the large search space. Our approach shows improved accuracy, speed, and memory use compared with existing methods. The topology graph component effectively reduces the topological space using the geometrical features of the secondary structures through a constrained K-shortest paths method in our layered graph. The model building component involves bending helices and constructing loops using the skeleton of the volumetric map. Forward-backward CCD is applied to bend the helices and model the loops.
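The layered-graph idea in the abstract above can be illustrated with a small sketch. This is not TopoDP itself: the abstract does not give the algorithm's details, so the state layout, the function name `best_topology`, and the cost callbacks `node_cost` and `edge_cost` are all hypothetical. The sketch assumes that layer i stands for the i-th sequence segment, that each node is a (stick, direction) pair, and that a valid path uses every density stick at most once. It returns only the single cheapest path, not the K shortest.

```python
from typing import Callable, Dict, List, Tuple

Node = Tuple[int, int]  # (density stick index, direction 0 or 1); hypothetical encoding


def best_topology(
    n_segments: int,
    n_sticks: int,
    node_cost: Callable[[int, Node], float],
    edge_cost: Callable[[Node, Node], float],
) -> Tuple[float, List[Node]]:
    """Cheapest path through a layered graph that uses each stick at most once.

    Dynamic programming over states (set of used sticks, last node), so the
    work grows with 2**n_sticks rather than with the number of orderings.
    """
    if n_segments < 1 or n_segments > n_sticks:
        raise ValueError("need 1 <= n_segments <= n_sticks")

    # dp maps (used-stick bitmask, last stick, last direction) -> (cost, path)
    dp: Dict[Tuple[int, int, int], Tuple[float, List[Node]]] = {}
    for s in range(n_sticks):
        for d in (0, 1):
            dp[(1 << s, s, d)] = (node_cost(0, (s, d)), [(s, d)])

    for layer in range(1, n_segments):
        nxt: Dict[Tuple[int, int, int], Tuple[float, List[Node]]] = {}
        for (mask, s, d), (cost, path) in dp.items():
            for t in range(n_sticks):
                if mask & (1 << t):
                    continue  # stick t is already assigned to an earlier segment
                for e in (0, 1):
                    c = cost + edge_cost((s, d), (t, e)) + node_cost(layer, (t, e))
                    key = (mask | (1 << t), t, e)
                    if key not in nxt or c < nxt[key][0]:
                        nxt[key] = (c, path + [(t, e)])
        dp = nxt

    # Best complete assignment over all final states
    return min(dp.values(), key=lambda v: v[0])
```

In a real setting `node_cost` might compare a segment's length with a stick's length and `edge_cost` might penalize loops too short to span two sticks; both are left to the caller here because the abstract does not specify them.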

    Intensity-Based Skeletonization of CryoEM Gray-Scale Images Using a True Segmentation-Free Algorithm

    Cryo-electron microscopy is an experimental technique that can produce 3D gray-scale images of protein molecules. In contrast to other experimental techniques, cryo-electron microscopy is capable of visualizing large molecular complexes such as viruses and ribosomes. At medium resolution, the positions of the atoms are not visible and the process cannot proceed. The medium-resolution images produced by cryo-electron microscopy are used to derive the atomic structure of the proteins in de novo modeling. The skeletons of the 3D gray-scale images are used to interpret important information that is helpful in de novo modeling. Unfortunately, not all features of the image can be captured using a single segmentation. In this paper, we present a segmentation-free approach to extract gray-scale curve-like skeletons. The approach relies on a novel representation of the 3D image, where the image is modeled as a graph and a set of volume trees. A test containing 36 synthesized maps and one authentic map shows that our approach can improve the performance of the two tested tools used in de novo modeling. The improvements were 62 and 13 percent for Gorgon and DP-TOSS, respectively.

    An Effective Computational Method Incorporating Multiple Secondary Structure Predictions in Topology Determination for Cryo-EM Images

    A key idea in de novo modeling of a medium-resolution density image obtained from cryo-electron microscopy is to compute the optimal mapping between the secondary structure traces observed in the density image and those predicted from the protein sequence. When secondary structures are not determined precisely, either from the image or from the amino acid sequence of the protein, the computational problem becomes more complex. We present an efficient method that addresses the secondary structure placement problem in the presence of multiple secondary structure predictions and computes the optimal mapping. We tested the method using 12 simulated images from α-proteins and two Cryo-EM images of α-β proteins. We observed that the rank of the true topologies is consistently improved by using multiple secondary structure predictions instead of a single prediction. The results show that the algorithm is robust and works well even when errors or misses in the predicted secondary structures are present in the image or the sequence. The results also show that the algorithm is efficient and is able to handle proteins with as many as 33 helices.

    Proceedings, MSVSCC 2013

    Proceedings of the 7th Annual Modeling, Simulation & Visualization Student Capstone Conference held on April 11, 2013 at VMASC in Suffolk, Virginia

    In-silico Investigation of Ion-Pumping Rotary A- and V-type ATPases: Structural and Dynamical Aspects

    Advances in molecular biosciences have revolutionised the way we perceive and pursue current biological research. Dynamic, complex biomacromolecules constitute the essential components of cells. Proteins in particular have been characterised as the workhorse molecules of life. Either as single chains or as complexes of associated units, proteins participate in every biological process with a specific structural and/or functional role. Ion-pumping rotary ATPases are a large family of important membrane-bound protein nanomachines. In the current work we investigate structural and dynamical aspects of the A- and V-type rotary ATPases related to functional dynamics and, in Chapter 3, propose a multiscale computational framework for their in-silico biophysical characterisation and for the interpretation of low-resolution experimental data from electron microscopy. In Chapter 4 we present, for the first time, results from explicit-solvent atomistic molecular dynamics simulations of the prokaryotic A-type peripheral stator stalk and central rotor axle, both critical subunits involved in the mechanical coupling of the rotary ATPases. Our simulation data reveal the presence of flexibility heterogeneity and demonstrate the dynamic nature of the peripheral stator stalk as a source of conformational variability in intact ATPase particles. In Chapter 5 we show the presence of structural plasticity in the eukaryotic peripheral stator stalk of the V-ATPase and discuss possible implications for V-ATPase regulation. Overall, the wealth of information obtained from molecular dynamics simulations allows atomistic information to be exploited within the multiscale framework of Chapter 3 for the mechanical characterisation of rotary ATPases in future studies.
    In particular, atomistic data could serve as high-resolution information for future parameterisation of simplified coarse-grain models for all ATPase subunits and the construction of molecular models for the intact ATPases. We anticipate that our approach will contribute to elucidating the molecular origin of rotary ATPases’ conformational flexibility and its implications for the holoenzyme’s function and kinetic efficiency.

    Structural and functional studies on the eukaryotic chaperonin TRiC/CCT and its cooperating chaperone Hgh1

    Network Models for Materials and Biological Systems

    The properties of materials depend heavily on the spatial distribution and connectivity of their constituent parts. This applies equally to materials such as diamond and glasses as it does to biomolecules that are the product of billions of years of evolution. In science, insight is often gained through simple models with characteristics that are the result of the few features that have purposely been retained. Common to all research in this thesis is the use of network-based models to describe the properties of materials. This work begins with the description of a technique for decoupling boundary effects from intrinsic properties of nanomaterials that maps the atomic distribution of nanomaterials of diverse shape and size but common atomic geometry onto a universal curve. This is followed by an investigation of correlated density fluctuations in the large length scale limit in amorphous materials through the analysis of large continuous random network models. The difficulty of estimating this limit from finite models is overcome by the development of a technique that uses the variance in the number of atoms in finite subregions to perform the extrapolation to large length scales. The technique is applied to models of amorphous silicon and vitreous silica and compared with results from recent experiments. The latter part of this work applies network-based models to biological systems. The first application models force-induced protein unfolding as crack propagation on a constraint network consisting of interactions such as hydrogen bonds that cross-link and stabilize a folded polypeptide chain. Unfolding pathways generated by the model are compared with molecular dynamics simulation and experiment for a diverse set of proteins, demonstrating that the model is able to capture not only native state behavior but also partially unfolded intermediates far from the native state.
    This study concludes with the extension of the latter model in the development of an efficient algorithm for predicting protein structure through the flexible fitting of atomic models to low-resolution cryo-electron microscopy data. By optimizing the fit to synthetic data through directed sampling and context-dependent constraint removal, predictions are made with accuracies within the expected variability of the native state. Dissertation/Thesis, Ph.D. Physics 201
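The abstract above mentions using the variance in the number of atoms in finite subregions. A minimal sketch of that measurement follows, under assumptions not stated in the source: a periodic cubic box, spherical sampling windows placed uniformly at random, and atom coordinates already wrapped into [0, box). The function names `count_in_window` and `number_variance` are hypothetical, and the extrapolation to large length scales described in the thesis is not attempted here.

```python
import random
from statistics import mean, pvariance
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def count_in_window(points: Sequence[Point], center: Point, radius: float, box: float) -> int:
    """Count points within `radius` of `center` in a periodic cubic box of side `box`."""
    r2 = radius * radius
    n = 0
    for p in points:
        d2 = 0.0
        for a, b in zip(p, center):
            d = abs(a - b)
            d = min(d, box - d)  # minimum-image distance along this axis
            d2 += d * d
        if d2 <= r2:
            n += 1
    return n


def number_variance(
    points: Sequence[Point],
    box: float,
    radius: float,
    n_windows: int = 200,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean and variance of the atom count over randomly placed spherical windows."""
    rng = random.Random(seed)  # seeded so repeated runs sample the same windows
    counts: List[int] = []
    for _ in range(n_windows):
        center = (rng.uniform(0, box), rng.uniform(0, box), rng.uniform(0, box))
        counts.append(count_in_window(points, center, radius, box))
    return mean(counts), pvariance(counts)
```

One would typically call `number_variance` for a series of radii no larger than half the box side and examine how the variance scales with window size; how the thesis performs that step is not described in the abstract.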

    Efficient Algorithms for Prokaryotic Whole Genome Assembly and Finishing

    De-novo genome assembly from DNA fragments is primarily based on sequence overlap information. In addition, mate-pair reads or paired-end reads provide linking information for joining gaps and bridging repeat regions. Genome assemblers in general assemble long contiguous sequences (contigs) using both overlapping reads and linked reads until the assembly runs into an ambiguous repeat region. These contigs are further bridged into scaffolds using linked read information. However, errors can be made in both phases of assembly due to a high error threshold for overlap acceptance and linking based on too few mate reads. Identical as well as similar repeat regions can often cause errors in overlap and mate-pair evidence. In addition, the problem of setting the correct threshold to minimize errors and optimize assembly of reads is not trivial and often requires a time-consuming trial-and-error process to obtain optimal results. The typical trial-and-error approach with multiple assemblers can be computationally intensive and is very inefficient, especially when users must learn how to use a wide variety of assemblers, many of which may be serial, require long execution times, and fail to return usable or accurate results. Further, we show that the comparison of assembly results may not provide the users with a clear winner under all circumstances. Therefore, we propose a novel scaffolding tool, Correlative Algorithm for Repeat Placement (CARP), capable of joining short low-error contigs using mate-pair reads, computationally resolved repeat structures, and synteny with one or more reference organisms. The CARP tool requires a set of repeat sequences, such as insertion sequences (IS), that can be found computationally without assembling the genome. Development of methods to identify such repeating regions directly from raw sequence reads or draft genomes led to the development of the ISQuest software package.
    ISQuest identifies bacterial ISs and their sequence elements (inverted and direct repeats) in raw read data or contigs using flexible search parameters. ISQuest is capable of finding ISs in hundreds of partially assembled genomes within hours, making it a valuable high-throughput tool for a global search of IS and repeat elements. The CARP tool matches very low-error contigs with strong overlap using the ambiguous partial repeat sequence at the ends of each contig, annotated using the repeat sequences discovered by ISQuest. These matches are verified by synteny with genomes of one or more reference organisms. We show that the CARP tool can be used to verify regions with low mate-pair evidence, independently find new joins, and significantly reduce the number of scaffolds. Finally, we demonstrate a novel viewer that presents to the user the computationally derived joins along with the evidence used to make the joins. The viewer allows the user to independently assess their confidence in the joins made by the finishing tools and make an informed decision on whether to invest the resources necessary to confirm a particular portion of the assembly. Further, we allow users to manually record join evidence, re-order contigs, and track the assembly finishing process.

    The structural role of SARS-CoV-2 genetic background in the emergence and success of spike mutations: The case of the spike A222V mutation

    The S:A222V point mutation, within the G clade, was characteristic of the 20E (EU1) SARS-CoV-2 variant identified in Spain in early summer 2020. This mutation has since reappeared in the Delta subvariant AY.4.2, raising questions about its specific effect on viral infection. We report combined serological, functional, structural and computational studies characterizing the impact of this mutation. Our results reveal that S:A222V promotes increased RBD opening and slightly increases ACE2 binding compared with the parent S:D614G clade. Finally, S:A222V does not reduce sera neutralization capacity, suggesting it does not affect vaccine effectiveness.

    Combining computer simulations and deep learning to understand and predict protein structural dynamics

    Molecular dynamics simulations provide a means to characterize the ensemble of structures that a protein adopts in solution. These structural ensembles provide crucial information about how proteins function, and they also reveal potential drug binding sites that are not observable from static protein structures (i.e. cryptic pockets). However, analyzing these high-dimensional datasets to understand protein function remains challenging. Additionally, finding cryptic pockets using simulation data is slow and expensive, which limits the appeal of computationally screening for cryptic pockets to a narrow set of circumstances. In this thesis, I develop deep-learning-based methods to overcome these challenges. First, I develop a deep learning algorithm, called DiffNets, to deal with the high dimensionality of structural ensembles. DiffNets takes structural ensembles from similar systems with different biochemical properties and learns to highlight structural features that distinguish the systems, ultimately connecting structural signatures to their associated biochemical properties. Using DiffNets, I provide structural insights that explain how naturally occurring genetic variants of the oxytocin receptor alter signaling. Additionally, DiffNets helps reveal how a SARS-CoV-2 protein involved in immune evasion becomes activated. Next, I use MD simulations to hunt for cryptic pockets across the SARS-CoV-2 proteome, which led to the discovery of more than 50 new potential druggable sites. Because this effort required an extraordinary amount of resources, I developed a deep learning approach to predict sites of cryptic pockets from single protein structures. This approach reduces the time to identify whether a protein has a cryptic pocket by ~10,000-fold compared to the next best method.