79,553 research outputs found

    LabelHash: A Flexible and Extensible Method for Matching Structural Motifs

    Get PDF
    There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity. Below, we will briefly describe LabelHash, a new method for partial structure comparison. In partial structure comparison, the goal is to find the best geometric and chemical similarity between a set of 3D points called a _motif_ and a subset of a set of 3D points called the _target_. Both the motif and targets are represented as sets of labeled 3D points. A motif is ideally composed of the functionally most-relevant residues in a binding site. The labels denote the type of residue. Motif points can have multiple labels to denote that substitutions are allowed. Any subset of the target that has labels that are compatible with the motif’s labels is called a _match_. The aim is to find statistically significant matches to a structural motif. Our method preprocesses a background database of targets such as a non-redundant subset of the Protein Data Bank in such a way that we can look up in constant time partial matches to a motif. Using a variant of the previously described match augmentation algorithm (1), we obtain complete matches to our motif. The nonparametric statistical model developed by (2,3) corrects for any bias introduced by our algorithm. This bias is introduced by excluding matches that do not satisfy certain geometric constraints for efficiency reasons

    Structure alignment based on coding of local geometric measures

    Get PDF
    BACKGROUND: A structure alignment method based on a local geometric property is presented and its performance is tested in pairwise and multiple structure alignments. In this approach, the writhing number, a quantity originating from integral formulas of Vassiliev knot invariants, is used as a local geometric measure. This measure is used in a sliding window to calculate the local writhe down the length of the protein chain. By encoding the distribution of writhing numbers across all the structures in the protein databank (PDB), protein geometries are represented in a 20-letter alphabet. This encoding transforms the structure alignment problem into a sequence alignment problem and allows the well-established algorithms of sequence alignment to be employed. Such geometric alignments offer distinct advantages over structural alignments in Cartesian coordinates as it better handles structural subtleties associated with slight twists and bends that distort one structure relative to another. RESULTS: The performance of programs for pairwise local alignment (TLOCAL) and multiple alignment (TCLUSTALW) are readily adapted from existing code for Smith-Waterman pairwise alignment and for multiple sequence alignment using CLUSTALW. The alignment algorithms employed a blocked scoring matrix (TBLOSUM) generated using the frequency of changes in the geometric alphabet of a block of protein structures. TLOCAL was tested on a set of 10 difficult proteins and found to give high quality alignments that compare favorably to those generated by existing pairwise alignment programs. A set of protein comparison involving hinged structures was also analyzed and TLOCAL was seen to compare favorably to other alignment methods. TCLUSTALW was tested on a family of protein kinases and reveal conserved regions similar to those previously identified by a hand alignment. CONCLUSION: These results show that the encoding of the writhing number as a geometric measure allow high quality structure alignments to be generated using standard algorithms of sequence alignment. This approach provides computationally efficient algorithms that allow fast database searching and multiple structure alignment. Because the geometric measure can employ different window sizes, the method allows the exploration of alignments on different, well-defined length scales

    Efficient search and comparison algorithms for 3D protein binding site retrieval and structure alignment from large-scale databases

    Get PDF
    Finding similar 3D structures is crucial for discovering potential structural, evolutionary, and functional relationships among proteins. As the number of known protein structures has dramatically increased, traditional methods can no longer provide the life science community with the adequate informatics capability needed to conduct large-scale and complex analyses. A suite of high-throughput and accurate protein structure search and comparison methods is essential. To meet the needs of the community, we develop several bioinformatics methods for protein binding site comparison and global structure alignment. First, we developed an efficient protein binding site search that is based on extracting geometric features both locally and globally. The main idea of this work was to capture spatial relationships among landmarks of binding site surfaces and bfuild a vocabulary of visual words to represent the characteristics of the surfaces. A vector model was then used to speed up the search of similar surfaces that share similar visual words with the query interface. Second, we developed an approach for accurate protein binding site comparison. Our algorithm provides an accurate binding site alignment by applying a two-level heuristic process which progressively refines alignment results from coarse surface point level to accurate residue atom level. This setting allowed us to explore different combinations of pairs of corresponding residues, thus improving the alignment quality of the binding site surfaces. Finally, we introduced a parallel algorithm for global protein structure alignment. Specifically, to speed up the time-consuming structure alignment process of protein 3D structures, we designed a parallel protein structure alignment framework to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, the framework is capable of parallelizing traditional structure alignment algorithms. Our findings can be applied in various research areas, such as prediction of protein inte

    Reduced representation of protein structure: implications on efficiency and scope of detection of structural similarity

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Computational comparison of two protein structures is the starting point of many methods that build on existing knowledge, such as structure modeling (including modeling of protein complexes and conformational changes), molecular replacement, or annotation by structural similarity. In a commonly used strategy, significant effort is invested in matching two sets of atoms. In a complementary approach, a global descriptor is assigned to the overall structure, thus losing track of the substructures within.</p> <p>Results</p> <p>Using a small set of geometric features, we define a reduced representation of protein structure, together with an optimizing function for matching two representations, to provide a pre-filtering stage in a database search. We show that, in a straightforward implementation, the representation performs well in terms of resolution in the space of protein structures, and its ability to make new predictions.</p> <p>Conclusions</p> <p>Perhaps unexpectedly, a substantial discriminating power already exists at the level of main features of protein structure, such as directions of secondary structural elements, possibly constrained by their sequential order. This can be used toward efficient comparison of protein (sub)structures, allowing for various degrees of conformational flexibility within the compared pair, which in turn can be used for modeling by homology of protein structure and dynamics.</p

    pClay: A Precise Parallel Algorithm for Comparing Molecular Surfaces

    Get PDF
    Comparing binding sites as geometric solids can reveal conserved features of protein structure that bind similar molecular fragments and varying features that select different partners. Due to the subtlety of these features, algorithmic efficiency and geometric precision are essential for comparison accuracy. For these reasons, this paper presents pClay, the first structure comparison algorithm to employ fine-grained parallelism to enhance both throughput and efficiency. We evaluated the parallel performance of pClay on both multicore workstation CPUs and a 61-core Xeon Phi, observing scaleable speedup in many thread configurations. Parallelism unlocked levels of precision that were not practical with existing methods. This precision has important applications, which we demonstrate: A statistical model of steric variations in binding cavities, trained with data at the level of precision typical of existing work, can overlook 46% of authentic steric influences on specificity (p <= .02). The same model, trained with more precise data from pClay, overlooked 0% using the same standard of statistical significance. These results demonstrate how enhanced efficiency and precision can advance the detection of binding mechanisms that influence specificity

    Graph-Based Approaches to Protein StructureComparison - From Local to Global Similarity

    Get PDF
    The comparative analysis of protein structure data is a central aspect of structural bioinformatics. Drawing upon structural information allows the inference of function for unknown proteins even in cases where no apparent homology can be found on the sequence level. Regarding the function of an enzyme, the overall fold topology might less important than the specific structural conformation of the catalytic site or the surface region of a protein, where the interaction with other molecules, such as binding partners, substrates and ligands occurs. Thus, a comparison of these regions is especially interesting for functional inference, since structural constraints imposed by the demands of the catalyzed biochemical function make them more likely to exhibit structural similarity. Moreover, the comparative analysis of protein binding sites is of special interest in pharmaceutical chemistry, in order to predict cross-reactivities and gain a deeper understanding of the catalysis mechanism. From an algorithmic point of view, the comparison of structured data, or, more generally, complex objects, can be attempted based on different methodological principles. Global methods aim at comparing structures as a whole, while local methods transfer the problem to multiple comparisons of local substructures. In the context of protein structure analysis, it is not a priori clear, which strategy is more suitable. In this thesis, several conceptually different algorithmic approaches have been developed, based on local, global and semi-global strategies, for the task of comparing protein structure data, more specifically protein binding pockets. The use of graphs for the modeling of protein structure data has a long standing tradition in structural bioinformatics. Recently, graphs have been used to model the geometric constraints of protein binding sites. The algorithms developed in this thesis are based on this modeling concept, hence, from a computer scientist's point of view, they can also be regarded as global, local and semi-global approaches to graph comparison. The developed algorithms were mainly designed on the premise to allow for a more approximate comparison of protein binding sites, in order to account for the molecular flexibility of the protein structures. A main motivation was to allow for the detection of more remote similarities, which are not apparent by using more rigid methods. Subsequently, the developed approaches were applied to different problems typically encountered in the field of structural bioinformatics in order to assess and compare their performance and suitability for different problems. Each of the approaches developed during this work was capable of improving upon the performance of existing methods in the field. Another major aspect in the experiments was the question, which methodological concept, local, global or a combination of both, offers the most benefits for the specific task of protein binding site comparison, a question that is addressed throughout this thesis

    Protein-protein docking using region-based 3D Zernike descriptors

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Protein-protein interactions are a pivotal component of many biological processes and mediate a variety of functions. Knowing the tertiary structure of a protein complex is therefore essential for understanding the interaction mechanism. However, experimental techniques to solve the structure of the complex are often found to be difficult. To this end, computational protein-protein docking approaches can provide a useful alternative to address this issue. Prediction of docking conformations relies on methods that effectively capture shape features of the participating proteins while giving due consideration to conformational changes that may occur.</p> <p>Results</p> <p>We present a novel protein docking algorithm based on the use of 3D Zernike descriptors as regional features of molecular shape. The key motivation of using these descriptors is their invariance to transformation, in addition to a compact representation of local surface shape characteristics. Docking decoys are generated using geometric hashing, which are then ranked by a scoring function that incorporates a buried surface area and a novel geometric complementarity term based on normals associated with the 3D Zernike shape description. Our docking algorithm was tested on both bound and unbound cases in the ZDOCK benchmark 2.0 dataset. In 74% of the bound docking predictions, our method was able to find a near-native solution (interface C-<it>α</it>RMSD ≤ 2.5 Å) within the top 1000 ranks. For unbound docking, among the 60 complexes for which our algorithm returned at least one hit, 60% of the cases were ranked within the top 2000. Comparison with existing shape-based docking algorithms shows that our method has a better performance than the others in unbound docking while remaining competitive for bound docking cases.</p> <p>Conclusion</p> <p>We show for the first time that the 3D Zernike descriptors are adept in capturing shape complementarity at the protein-protein interface and useful for protein docking prediction. Rigorous benchmark studies show that our docking approach has a superior performance compared to existing methods.</p

    TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions

    Full text link
    Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the entangled geometric complexity and biological complexity. We introduce topology, i.e., element specific persistent homology (ESPH), to untangle geometric complexity and biological complexity. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains crucial biological information via a multichannel image representation. It is able to reveal hidden structure-function relationships in biomolecules. We further integrate ESPH and convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the limitations to deep learning arising from small and noisy training sets, we present a multitask topological convolutional neural network (MT-TCNN). We demonstrate that the present TopologyNet architectures outperform other state-of-the-art methods in the predictions of protein-ligand binding affinities, globular protein mutation impacts, and membrane protein mutation impacts.Comment: 20 pages, 8 figures, 5 table

    Identification of functionally related enzymes by learning-to-rank methods

    Full text link
    Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes
    corecore