947 research outputs found

    Cryo-EM map interpretation and protein model-building using iterative map segmentation.

    A procedure for building protein chains into maps produced by single-particle electron cryo-microscopy (cryo-EM) is described. The procedure is similar to the way an experienced structural biologist might analyze a map, focusing first on secondary structure elements such as helices and sheets, then varying the contour level to identify connections between these elements. Since the high density in a map typically follows the main-chain of the protein, the main-chain connection between secondary structure elements can often be identified as the unbranched path between them with the highest minimum value along the path. This chain-tracing procedure is then combined with finding side-chain positions based on the presence of density extending away from the main path of the chain, allowing generation of a Cα model. The Cα model is converted to an all-atom model and is refined against the map. We show that this procedure is as effective as other existing methods for interpretation of cryo-EM maps and that it is considerably faster and produces models with fewer chain breaks than our previous methods that were based on approaches developed for crystallographic maps
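The chain-tracing idea in this abstract, picking the connection "with the highest minimum value along the path", is a maximin (widest-path) search. The following is a minimal sketch of that idea only, not the authors' implementation. It assumes a small 2D grid of density values with 4-connected neighbours, whereas a real map is a 3D volume; the function name `maximin_path` and the toy grid are hypothetical.

```python
import heapq


def maximin_path(density, start, goal):
    """Find the path from start to goal whose lowest density value is as
    high as possible (assumption: 2D grid, 4-connected neighbours)."""
    rows, cols = len(density), len(density[0])
    # best[cell] = highest achievable "minimum density" on any path to cell
    best = {start: density[start[0]][start[1]]}
    parent = {start: None}
    heap = [(-best[start], start)]  # max-heap via negated bottleneck values

    while heap:
        neg_bottleneck, cell = heapq.heappop(heap)
        bottleneck = -neg_bottleneck
        if bottleneck < best[cell]:
            continue  # stale heap entry
        if cell == goal:
            break
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                candidate = min(bottleneck, density[nr][nc])
                if candidate > best.get((nr, nc), float("-inf")):
                    best[(nr, nc)] = candidate
                    parent[(nr, nc)] = cell
                    heapq.heappush(heap, (-candidate, (nr, nc)))

    if goal not in best:
        return None
    # Walk the parent links back from the goal to recover the path.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return best[goal], path[::-1]


# Toy example: a ridge of high density runs through the middle column.
toy_map = [
    [9, 8, 1],
    [1, 7, 1],
    [1, 6, 9],
]
bottleneck, path = maximin_path(toy_map, (0, 0), (2, 2))
```

Lowering the contour level until two secondary-structure elements first become connected is equivalent to finding this bottleneck value, which is why a priority-queue search that always expands the highest-bottleneck cell first is a natural fit.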

    On deep generative modelling methods for protein-protein interaction

    Proteins form the basis for almost all biological processes; identifying the interactions that proteins have with themselves, the environment, and each other is critical to understanding their biological function in an organism, and thus the impact of drugs designed to affect them. Consequently, a significant body of research and development focuses on methods to analyse and predict protein structure and interactions. Due to the breadth of possible interactions and the complexity of structures, in silico methods are used to propose models of both interaction and structure that can then be verified experimentally. However, the computational complexity of protein interaction means that full physical simulation of these processes requires exceptional computational resources and is often infeasible. Recent advances in deep generative modelling have shown promise in correctly capturing complex conditional distributions. These models derive their basic principles from statistical mechanics and thermodynamic modelling. While the learned functions of these methods are not guaranteed to be physically accurate, they result in a similar sampling process to that suggested by the thermodynamic principles of protein folding and interaction. However, limited research has been applied to extending these models to work over the space of 3D rotation, limiting their applicability to protein models. In this thesis we develop an accelerated sampling strategy for faster sampling of potential docking locations; we then address the rotational diffusion limitation by extending diffusion models to the space SO(3), and finally present a framework for applying this rotational diffusion model to rigid docking of proteins.

    A proposal for a coordinated effort for the determination of brainwide neuroanatomical connectivity in model organisms at a mesoscopic scale

    In this era of complete genomes, our knowledge of neuroanatomical circuitry remains surprisingly sparse. Such knowledge is, however, critical both for basic and clinical research into brain function. Here we advocate for a concerted effort to fill this gap, through systematic, experimental mapping of neural circuits at a mesoscopic scale of resolution suitable for comprehensive, brain-wide coverage, using injections of tracers or viral vectors. We detail the scientific and medical rationale and briefly review existing knowledge and experimental techniques. We define a set of desiderata, including brain-wide coverage; validated and extensible experimental techniques suitable for standardization and automation; a centralized, open-access data repository; compatibility with existing resources; and tractability with current informatics technology. We discuss a hypothetical but tractable plan for mouse, additional efforts for the macaque, and technique development for human. We estimate that the mouse connectivity project could be completed within five years with a comparatively modest budget. Comment: 41 pages

    A Graph-Based Algorithm to Determine Protein Structure from Cryo-EM Data

    Cryo-electron microscopy (cryo-EM) provides 3D density maps of proteins, but these maps do not have sufficiently high resolution to directly yield atomic-scale models. Previous work has shown that features known as secondary structures can be located in these density maps. A second source of information about proteins is sequence analysis, which predicts locations of secondary structures along the protein sequence but does not provide any information about the 3D shape of the protein. This thesis presents a graph-based algorithm to find the correspondence between the secondary structures in the density map and those in the sequence. This provides an ordering of secondary structures in the 3D density map, which can be used in building an atomic-scale model of the protein.

    Efficient case-based reasoning through feature weighting, and its application in protein crystallography

    Data preprocessing is critical for machine learning, data mining, and pattern recognition. In particular, selecting relevant and non-redundant features in high-dimensional data is important to efficiently construct models that accurately describe the data. In this work, I present SLIDER, an algorithm that weights features to reflect relevance in determining similarity between instances. Accurate weighting of features improves the similarity measure, which is useful in learning algorithms like nearest neighbor and case-based reasoning. SLIDER performs a greedy search for optimum weights in an exponentially large space of weight vectors. Exhaustive search being intractable, the algorithm reduces the search space by focusing on pivotal weights at which representative instances are equidistant to truly similar and different instances in Euclidean space. SLIDER then evaluates those weights heuristically, based on effectiveness in properly ranking pre-determined matches of a set of cases, relative to mismatches. I analytically show that by choosing feature weights that minimize the mean rank of matches relative to mismatches, the separation between the distributions of Euclidean distances for matches and mismatches is increased. This leads to a better distance metric, and consequently increases the probability of retrieving true matches from a database. I also discuss how SLIDER is used to improve the efficiency and effectiveness of case retrieval in a case-based reasoning system that automatically interprets electron density maps to determine the three-dimensional structures of proteins. Electron density patterns for regions in a protein are represented by numerical features, which are used in a distance metric to efficiently retrieve matching patterns by searching a large database.
These pre-selected cases are then evaluated by more expensive methods to identify truly good matches – this strategy speeds up the retrieval of matching density regions, thereby enabling fast and accurate protein model-building. This two-phase case retrieval approach is potentially useful in many case-based reasoning systems, especially those with computationally expensive case matching and large case libraries
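The two-phase retrieval described in this abstract can be illustrated with a short sketch: a cheap feature-weighted Euclidean distance pre-selects candidates, and a costlier scorer is run only on that shortlist. This is an assumption-laden illustration, not SLIDER itself (the weight search is not shown); `weighted_distance`, `two_phase_retrieve`, the toy cases, and the stand-in `expensive_score` are all hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Feature-weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def two_phase_retrieve(query, cases, weights, expensive_score, k=2):
    """Phase 1: shortlist the k nearest cases under the cheap weighted metric.
    Phase 2: run the expensive scorer on the shortlist only (higher is better)."""
    shortlist = sorted(
        cases, key=lambda case: weighted_distance(query, case, weights)
    )[:k]
    return max(shortlist, key=lambda case: expensive_score(query, case))


# Toy library: the second feature is noise, so a weight of 0 ignores it.
cases = [(0.0, 0.0), (1.0, 10.0), (5.0, 5.0), (1.2, -8.0)]
query = (1.0, 0.0)


def expensive_score(q, case):
    # Stand-in for a costly comparison; here it only looks at feature 0.
    return -abs(q[0] - case[0])


good_weights = two_phase_retrieve(query, cases, (1.0, 0.0), expensive_score)
flat_weights = two_phase_retrieve(query, cases, (1.0, 1.0), expensive_score)
```

With the noisy feature down-weighted, the true match `(1.0, 10.0)` survives the shortlist; with uniform weights it is filtered out before the expensive scorer ever sees it, which is the failure mode that good feature weights are meant to avoid.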

    Determining Alpha-Helix Correspondence for Protein Structure Prediction from Cryo-EM Density Maps, Master's Thesis, May 2007

    Determining protein structure is an important problem for structural biologists, and it has received a significant amount of attention in recent years. In this thesis, we describe a novel shape-modeling approach as an intermediate step towards recovering 3D protein structures from volumetric images. The input to our method is a sequence of alpha-helices that make up a protein, and a low-resolution volumetric image of the protein where possible locations of alpha-helices have been detected. Our task is to identify the correspondence between the two sets of helices, which will shed light on how the protein folds in space. The central theme of our approach is to cast the correspondence problem as that of shape matching between the 3D volume and the 1D sequence. We model both shapes as attributed relational graphs, and formulate a constrained inexact graph matching problem. To compute the matching, we developed an optimal algorithm based on A* search with several choices of heuristic functions. As demonstrated on a suite of real protein data, the shape-modeling approach is capable of correctly identifying helix correspondences in noise-abundant volumes with minimal or no user intervention.

    Machine Learning Methods for Medical and Biological Image Computing

    Medical and biological imaging technologies provide valuable visual information about the structure and function of an organ, from the level of individual molecules to the whole object. The brain is the most complex organ in the body, and it attracts increasingly intense research attention with the rapid development of medical and biological imaging technologies. The massive amount of high-dimensional brain imaging data being generated creates strong demand for computational methods that analyze those images efficiently. Current computational methods based on hand-crafted features do not scale with the increasing number of brain images, hindering the pace of scientific discovery in neuroscience. In this thesis, I propose computational methods using high-level features for automated analysis of brain images at different levels. At the brain function level, I develop a deep learning based framework for completing and integrating multi-modality neuroimaging data, which increases the diagnosis accuracy for Alzheimer’s disease. At the cellular level, I propose to use three-dimensional convolutional neural networks (CNNs) for segmenting volumetric neuronal images, which improves the performance of digital reconstruction of neuron structures. I design a novel CNN architecture such that model training and test image prediction can be implemented in an end-to-end manner. At the molecular level, I build a voxel CNN classifier to capture discriminative features of the input along three spatial dimensions, which facilitates the identification of secondary structures of proteins from electron microscopy images. In order to classify genes specifically expressed in different brain cell types, I propose to use invariant image feature descriptors to capture local gene expression information from cellular-resolution in situ hybridization images.
I build image-level representations by applying regularized learning and vector quantization on generated image descriptors. The developed computational methods in this dissertation are evaluated using images from medical and biological experiments in comparison with baseline methods. Experimental results demonstrate that the developed representations, formulations, and algorithms are effective and efficient in learning from brain imaging data

    De Novo Protein Structure Modeling from Cryoem Data Through a Dynamic Programming Algorithm in the Secondary Structure Topology Graph

    Proteins are the molecules that carry out vital functions and make up more than half of the dry weight of every cell. A protein in nature folds into a unique and energetically favorable three-dimensional (3-D) structure that is critical to its biological function. In contrast to other methods for protein structure determination, electron cryomicroscopy (CryoEM) is able to produce volumetric maps of proteins that are poorly soluble, large, and hard to crystallize. Furthermore, it studies proteins in their native environment. Unfortunately, the volumetric maps generated by current CryoEM techniques are at medium resolution (~5 to 10 Å), at which it is hard to determine the atomic structure of the protein. However, the resolution of the volumetric maps is improving steadily, and recent works have obtained atomic models at higher resolutions (~3 Å). De novo protein modeling is the process of building the structure of the protein using its CryoEM volumetric map. The medium-resolution volumetric maps generated by CryoEM therefore pose a new challenge. At medium resolution, the location and orientation of secondary structure elements (SSEs) can be visually and computationally identified. However, the order and direction (called the protein topology) of the SSEs detected from the CryoEM volumetric map are not visible. In order to determine the protein structure, the topology of the SSEs has to be figured out and then the backbone can be built. Consequently, the topology problem has become a bottleneck for protein modeling using CryoEM. In this dissertation, we focus on establishing an effective computational framework to derive the atomic structure of a protein from medium-resolution CryoEM volumetric maps. This framework includes a topology graph component, which ranks the topologies of the SSEs effectively, and a model building component.
In order to generate a small subset of candidate topologies, the problem is translated into a layered graph representation. We developed a dynamic programming algorithm (TopoDP) for the new representation to overcome the problem of the large search space. Our approach shows improved accuracy, speed, and memory use when compared with existing methods. Generating such a set using a brute-force method is infeasible. Therefore, the topology graph component effectively reduces the topological space using the geometrical features of the secondary structures through a constrained K-shortest paths method in our layered graph. The model building component involves the bending of helices and loop construction using the skeleton of the volumetric map. Forward-backward CCD is applied to bend the helices and model the loops.
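To make the layered-graph idea concrete, here is a minimal sketch of dynamic programming over a layered graph: it finds the single cheapest path that picks one node per layer. It is not TopoDP. In particular, it assumes plain additive edge costs and does not enforce the constraint that each detected SSE is used only once, nor does it enumerate K shortest paths. The name `best_layered_path` and the toy costs are hypothetical.

```python
def best_layered_path(layers, edge_cost):
    """Cheapest path choosing one node per layer.

    layers: list of lists of node labels (unique within each layer).
    edge_cost: function (u, v) -> cost of going from u to v in the next layer.
    Returns (total_cost, path).
    """
    cost = {node: 0.0 for node in layers[0]}  # best cost to reach each node
    back_pointers = []  # one dict per transition: node -> best predecessor

    for prev_layer, cur_layer in zip(layers, layers[1:]):
        new_cost, pointers = {}, {}
        for v in cur_layer:
            # Best predecessor of v given the optimal costs so far.
            u = min(prev_layer, key=lambda p: cost[p] + edge_cost(p, v))
            new_cost[v] = cost[u] + edge_cost(u, v)
            pointers[v] = u
        back_pointers.append(pointers)
        cost = new_cost

    # Trace back from the cheapest node in the final layer.
    end = min(cost, key=cost.get)
    path = [end]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    return cost[end], path[::-1]


# Toy graph: three layers, costs given as a lookup table.
toy_layers = [["a1", "a2"], ["b1", "b2"], ["c1"]]
toy_costs = {
    ("a1", "b1"): 1, ("a1", "b2"): 5,
    ("a2", "b1"): 2, ("a2", "b2"): 1,
    ("b1", "c1"): 4, ("b2", "c1"): 1,
}
total, path = best_layered_path(toy_layers, lambda u, v: toy_costs[(u, v)])
```

The work per layer is proportional to the number of edges between adjacent layers, so the search never enumerates full paths explicitly; that is the general reason a layered formulation avoids brute-force enumeration.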

    Integration of Mass Spectrometry Data for Structural Biology

    Mass spectrometry (MS) is increasingly being used to probe the structure and dynamics of proteins and the complexes they form with other macromolecules. There are now several specialized MS methods, each with unique sample preparation, data acquisition, and data processing protocols. Collectively, these methods are referred to as structural MS and include cross-linking, hydrogen-deuterium exchange, hydroxyl radical footprinting, native, ion mobility, and top-down MS. Each of these provides a unique type of structural information, ranging from composition and stoichiometry through to residue level proximity and solvent accessibility. Structural MS has proved particularly beneficial in studying protein classes for which analysis by classic structural biology techniques proves challenging such as glycosylated or intrinsically disordered proteins. To capture the structural details for a particular system, especially larger multiprotein complexes, more than one structural MS method with other structural and biophysical techniques is often required. Key to integrating these diverse data are computational strategies and software solutions to facilitate this process. We provide a background to the structural MS methods and briefly summarize other structural methods and how these are combined with MS. We then describe current state of the art approaches for the integration of structural MS data for structural biology. We quantify how often these methods are used together and provide examples where such combinations have been fruitful. To illustrate the power of integrative approaches, we discuss progress in solving the structures of the proteasome and the nuclear pore complex. We also discuss how information from structural MS, particularly pertaining to protein dynamics, is not currently utilized in integrative workflows and how such information can provide a more accurate picture of the systems studied. 
We conclude by discussing new developments in the MS and computational fields that will further enable in-cell structural studies