4 research outputs found

    MS3ALIGN: an efficient molecular surface aligner using the topology of surface curvature

    Background: Aligning similar molecular structures is an important step in the process of bio-molecular structure and function analysis. Molecular surfaces are simple representations of molecular structure that are easily constructed from various forms of molecular data such as 3D atomic coordinates (PDB) and Electron Microscopy (EM) data. Methods: We present a Multi-Scale Morse-Smale Molecular-Surface Alignment tool, MS3ALIGN, which aligns molecular surfaces based on significant protrusions on the molecular surface. The input is a pair of molecular surfaces represented as triangle meshes. A key advantage of MS3ALIGN is its computational efficiency, which is achieved because it processes only a few carefully chosen protrusions on the molecular surface. Furthermore, the alignments are partial in nature and therefore allow inexact surfaces to be aligned. Results: The method is evaluated in four settings. First, we establish performance using known alignments with varying overlap and noise values. Second, we compare the method with SurfComp, an existing surface alignment method. We show that we are able to determine alignments reported by SurfComp, as well as report relevant alignments not found by SurfComp. Third, we validate the ability of MS3ALIGN to determine alignments in the case of structurally dissimilar binding sites. Fourth, we demonstrate the ability of MS3ALIGN to align iso-surfaces derived from cryo-electron microscopy scans. Conclusions: We have presented an algorithm that aligns molecular surfaces based on the topology of surface curvature

    Efficient search and comparison algorithms for 3D protein binding site retrieval and structure alignment from large-scale databases

    Finding similar 3D structures is crucial for discovering potential structural, evolutionary, and functional relationships among proteins. As the number of known protein structures has dramatically increased, traditional methods can no longer provide the life science community with the adequate informatics capability needed to conduct large-scale and complex analyses. A suite of high-throughput and accurate protein structure search and comparison methods is essential. To meet the needs of the community, we developed several bioinformatics methods for protein binding site comparison and global structure alignment. First, we developed an efficient protein binding site search that is based on extracting geometric features both locally and globally. The main idea of this work was to capture spatial relationships among landmarks of binding site surfaces and build a vocabulary of visual words to represent the characteristics of the surfaces. A vector model was then used to speed up the search for similar surfaces that share similar visual words with the query interface. Second, we developed an approach for accurate protein binding site comparison. Our algorithm provides an accurate binding site alignment by applying a two-level heuristic process which progressively refines alignment results from the coarse surface point level to the accurate residue atom level. This setting allowed us to explore different combinations of pairs of corresponding residues, thus improving the alignment quality of the binding site surfaces. Finally, we introduced a parallel algorithm for global protein structure alignment. Specifically, to speed up the time-consuming alignment of protein 3D structures, we designed a parallel protein structure alignment framework to exploit the parallelism of Graphics Processing Units (GPUs). As a general-purpose GPU platform, the framework is capable of parallelizing traditional structure alignment algorithms. 
Our findings can be applied in various research areas, such as prediction of protein inte

    PROTEIN SURFACE SIMILARITIES EVALUATION FOR FUNCTIONAL ANNOTATION STUDIES

    One of the main goals of bioinformatics is to assign functions to proteins of unknown function by identifying homologies with proteins of known function. Several approaches are currently available; the best choice depends on the evolutionary distance that separates the protein of interest from its homologs. Recently, attention has focused on molecular surfaces, since they do not depend on the three-dimensional structure and allow the identification of similarities that other methods cannot detect. Furthermore, molecular surfaces are the interface of interaction between molecules, and their geometrical and physical description will lead to an understanding of the molecular recognition process, since the geometrical component has a fundamental role in the early stage of complex formation. This aspect would have a major impact on drug design and on the understanding of side effects due to interactions between proteins. In this thesis, a protocol for identifying similarities on molecular surfaces was developed and optimized. In this process, molecular surfaces are calculated according to the Lee and Richards model and then represented as triangular meshes. The surfaces are subsequently transformed into a set of object-oriented images using a computer vision approach. This type of representation has the advantage of being independent of the position of the objects represented, so similar surfaces can be described by similar images. The search for similarities is then performed by identifying correspondences between pairs of similar images, filtering matches according to geometrical criteria, and then clustering correspondences into high-similarity groups. These groups are then used to align surfaces so that results can be evaluated both by visual inspection and through appropriate indexes. 
This process can be applied to functional annotation, through the identification of similarities between surfaces of homologous proteins, and to the study of protein interactions, through the identification of complementary areas between interacting proteins. The whole similarity detection process depends on the configuration of 15 parameters that balance the computation time against the quality of the results. The parameter estimation problem was addressed using a genetic algorithm, which represents different parameter configurations as a population in which individuals that align surfaces satisfactorily are rewarded with a high fitness score. The effectiveness of the algorithm was then improved by introducing a neighbor heuristic, which reduced the computational time required for correspondence clustering on surfaces. Particular attention was paid to displaying results and to constructing indexes that quantify their quality. For visualization, a display system was implemented based on the Visualization Toolkit (VTK) libraries to represent aligned surfaces as objects in three-dimensional space, enabling the user to interact with the scene by changing the point of view or enlarging details. For results evaluation, two indexes had a fundamental role. The first, called the overlap index, measures the percentage of vertices of two surfaces that are closer than 1 Å after the alignment. This index is particularly useful for evaluating surface similarity, since similar aligned surfaces will have a large number of vertices closer than this distance. The second index, called RMSD, is the Root Mean Square Deviation of the alpha carbons of two aligned proteins, and it is used in the case of a complementarity search. 
This index measures how far the aligned protein is from its correct position in the crystal complex. Concerning results evaluation, we noticed that taking the electrostatic potential into account allows good scores to be assigned in cases of strong geometrical similarity in the context of functional annotation, thus facilitating the identification of homologous surfaces. The method has been validated both in the search for similarities and in the search for complementarities. For the similarity search, we analyzed a sample of 13 known proteins with a PROSITE domain in order to identify the presence of such domains on molecular surfaces. To do this, we first reduced the structures in the Protein Data Bank to a group of representative structures. We then calculated the molecular surface of each representative protein and created a dataset of patches corresponding to the PROSITE functional domains. The test was then performed by trying to align the surfaces of the 13 known proteins to the patch dataset of functional domains. The results showed that in most cases we are able to properly align a functional domain to a protein surface with the same functional domain, and that this evidence was easily identifiable both from the parameters used for results evaluation and by visually inspecting the alignments. The method was then tested for the complementarity search, by trying to reconstruct the protein-protein complexes in a well-known dataset used to validate docking methods. When searching for similarities it is important to describe surfaces in detail in order to increase accuracy, but high precision is counterproductive when searching for complementarity, since the interaction between proteins is not determined only by geometrical features but also involves the formation of favorable electrostatic interactions and rearrangements of side chains. 
Thus molecular surfaces were calculated as smoothed surfaces, in which most details are lost but interacting surfaces are detected more easily. Results showed that the algorithm is able to align complexes with scores comparable to those of currently available programs. Considering this experimental design, and that the method does not take the electrostatic potential into account, we can assume that the results obtained are particularly interesting, since the proposed method provides a wider set of conformations than other algorithms, upon which the analysis can be extended in order to identify a better prediction. In conclusion, the proposed system is able to identify similarities on molecular surfaces through the analysis of images of local description. The results show that the implemented system is effective in identifying similar surface areas in the context of functional annotation. With regard to the search for complementarities, the algorithm seems promising, even though the best complex proposed is not always biologically correct. From this point of view, further analysis is needed to improve the method for protein interaction studies
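
The two evaluation indexes named in this abstract, the overlap index and the RMSD, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the brute-force nearest-vertex search, and the choice to count vertices from both surfaces symmetrically are assumptions made for illustration. The abstract only states that the overlap index is the percentage of vertices closer than 1 Å after alignment and that the RMSD is computed over the alpha carbons of the two aligned proteins.

```python
import math

def overlap_index(verts_a, verts_b, cutoff=1.0):
    """Percentage of vertices, counted over both surfaces, whose nearest
    vertex on the other surface lies closer than `cutoff` (in angstroms).
    Assumption: symmetric counting; the abstract does not specify this."""
    if not verts_a or not verts_b:
        raise ValueError("both surfaces need at least one vertex")

    def count_close(src, dst):
        # Brute-force nearest-neighbor search, O(len(src) * len(dst)).
        return sum(
            1 for p in src
            if min(math.dist(p, q) for q in dst) < cutoff
        )

    close = count_close(verts_a, verts_b) + count_close(verts_b, verts_a)
    return 100.0 * close / (len(verts_a) + len(verts_b))

def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired coordinates, for example
    matching alpha carbons of a predicted and a reference placement."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

# Toy data: one vertex pair within 1 angstrom, one pair far apart.
surface_a = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
surface_b = [(0.5, 0.0, 0.0), (9.0, 0.0, 0.0)]
print(overlap_index(surface_a, surface_b))
print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))
```

For real meshes with many thousands of vertices, a spatial index such as a k-d tree would replace the brute-force search; the quadratic loop is kept here only to keep the sketch dependency-free.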