    Identification of functionally related enzymes by learning-to-rank methods

    Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually starts from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We use the Enzyme Commission (EC) classification hierarchy to obtain annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.
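
The idea of learning a ranking from EC-annotated enzymes can be illustrated with a minimal sketch. It is not the authors' algorithm: kernel ridge regression is swapped in here as a simple stand-in for a kernel-based ranking method, the RBF kernel over made-up 2-D feature vectors stands in for a cavity-based similarity measure, and all names (`ec_relevance`, `fit_kernel_ridge`, `rank_by_score`, the toy data) are hypothetical. Relevance to a query is taken to be the number of leading EC levels an enzyme shares with it.

```python
import numpy as np


def ec_relevance(ec_a, ec_b):
    """Count the leading EC levels two EC numbers share (0 to 4)."""
    shared = 0
    for x, y in zip(ec_a.split("."), ec_b.split(".")):
        if x != y:
            break
        shared += 1
    return shared


def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel between rows of A and rows of B (assumed similarity measure)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)


def fit_kernel_ridge(K, y, lam=0.1):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)


def rank_by_score(K_test, alpha):
    """Score candidates and return their indices from most to least relevant."""
    scores = K_test @ alpha
    return np.argsort(-scores), scores


# Toy, made-up data: four annotated training enzymes and two unannotated candidates.
query_ec = "1.1.1.1"
train_ec = ["1.1.1.1", "1.1.1.2", "1.1.3.4", "2.7.1.1"]
X_train = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [3.0, 0.0]])
X_cand = np.array([[0.1, 0.0], [2.9, 0.0]])

# Relevance labels come from the EC hierarchy, not from the similarity itself.
y = np.array([ec_relevance(query_ec, ec) for ec in train_ec], dtype=float)

alpha = fit_kernel_ridge(rbf_kernel(X_train, X_train), y)
order, scores = rank_by_score(rbf_kernel(X_cand, X_train), alpha)
print("candidate ranking:", order.tolist())
```

The point of the sketch is only the contrast drawn in the abstract: a plain similarity search would rank candidates by similarity to the query alone, whereas here the scores are fitted to functional annotations of the training enzymes.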

    Novel pharmacophore clustering methods for protein binding site comparison

    Proteins perform diverse functions within cells. Some of these functions depend on the protein being involved in a protein complex, interacting with other proteins or with other entities (ligands) through specific binding sites on their surface. Comparison of protein binding sites has potential benefits in many research fields, including drug promiscuity studies, polypharmacology and immunology. While multiple methods have been proposed for comparing binding sites, they tend to focus on comparing very similar proteins and have only been developed for small specific datasets or very targeted applications. None of these methods makes use of the powerful representation afforded by 3D complex-based pharmacophores. A pharmacophore model provides a description of a binding site, consisting of a group of chemical features arranged in three-dimensional space, that can be used to represent biological activities. Two different pharmacophore comparison and clustering methods based on the Iterative Closest Point (ICP) algorithm are proposed: a 3-dimensional ICP pharmacophore clustering method, and an N-dimensional ICP pharmacophore clustering method. These methods are complemented by a series of data pre-processing methods for input data preparation. The implementation of the methods takes computational representations (pharmacophores) of single molecules or protein complexes as input and produces distance matrices that can be visualised as dendrograms. The methods integrate both alignment-dependent and alignment-independent concepts. Both clustering methods were successfully evaluated using a dataset of 31 globulin-binding steroids and an antibody-antigen dataset of 41 entries, and were able to handle a larger dataset of 159 protein homodimers. For the steroid dataset, the resulting classification of ligands shows good correspondence with a classification based on binding affinity. For the antibody-antigen dataset, the classification of antigens reflected both antigen type and binding antibody. The application to homodimers demonstrated the ability of both clustering methods to handle a larger dataset, and the possibility of visualising N-D pairwise comparisons using structural superposition of binding sites.
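
The pipeline described above (ICP alignment of pharmacophores, then a pairwise distance matrix) can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: each pharmacophore is reduced to a bare array of 3-D feature coordinates (feature types are ignored), correspondences are brute-force nearest neighbours, the rigid fit uses the Kabsch SVD solution, and the distance between two pharmacophores is assumed to be the mean of the two directed post-alignment RMSDs. The function names and the toy point sets are hypothetical.

```python
import numpy as np


def best_rigid_transform(src, dst):
    """Kabsch fit: rotation R and translation t minimising |R @ src_i + t - dst_i|."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, dst_c - R @ src_c


def nearest_indices(a, b):
    """For every point in a, the index of its closest point in b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def icp_rmsd(src, dst, max_iter=50, tol=1e-8):
    """Align src onto dst with basic ICP; return the RMSD to nearest dst points."""
    moved, prev = src.copy(), np.inf
    for _ in range(max_iter):
        matched = dst[nearest_indices(moved, dst)]
        R, t = best_rigid_transform(moved, matched)
        moved = moved @ R.T + t
        err = np.sqrt(((moved - matched) ** 2).sum(axis=1).mean())
        if abs(prev - err) < tol:
            break
        prev = err
    matched = dst[nearest_indices(moved, dst)]
    return float(np.sqrt(((moved - matched) ** 2).sum(axis=1).mean()))


def icp_distance_matrix(clouds):
    """Symmetric matrix of averaged directed ICP RMSDs between all point sets."""
    n = len(clouds)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = 0.5 * (icp_rmsd(clouds[i], clouds[j]) + icp_rmsd(clouds[j], clouds[i]))
            D[i, j] = D[j, i] = d
    return D


# Toy, made-up feature coordinates: b is a slightly rotated and shifted copy of a.
a = np.array([[0, 0, 0], [2, 0, 0], [0, 3, 0], [0, 0, 4], [1, 1, 1]], dtype=float)
ang = np.deg2rad(5.0)
Rz = np.array([[np.cos(ang), -np.sin(ang), 0], [np.sin(ang), np.cos(ang), 0], [0, 0, 1]])
b = a @ Rz.T + np.array([0.1, -0.1, 0.05])
c = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [3, 3, 3]], dtype=float)

D = icp_distance_matrix([a, b, c])
print(np.round(D, 3))
```

A matrix like `D` could then be converted to condensed form with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage` and `dendrogram` to draw the kind of tree the abstract mentions. Basic ICP only finds a local optimum, so a sketch like this is reliable only when the point sets start out roughly aligned.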