3,235 research outputs found

    Classification of GPCRs using family specific motifs

    The classification of G-Protein Coupled Receptor (GPCR) sequences is an important problem that arises from the need to close the gap between the large number of orphan receptors and the relatively small number of annotated receptors. Equally important is characterizing GPCR Class A subfamilies and gaining insight into ligand interaction, since GPCR Class A encompasses a very large number of drug-targeted receptors. This thesis proposes a method for Class A subfamily classification using sequence-derived motifs, which characterizes the subfamilies by discovering receptor-ligand interaction sites. The motifs that best characterize a subfamily are selected by the proposed Distinguishing Power Evaluation (DPE) technique. Experiments performed on GPCR sequence databases show that the proposed method outperforms state-of-the-art classification techniques for GPCR Class A subfamily prediction. An important contribution of this thesis is the discovery of key receptor-ligand interaction sites, which is very important for drug design.

    Prediction and classification for GPCR sequences based on ligand specific features

    Functional identification of G-Protein Coupled Receptors (GPCRs) is one of the current focus areas of pharmaceutical research. Although thousands of GPCR sequences are known, many of them are orphan sequences (the activating ligand is unknown). Therefore, classification methods for automated characterization of orphan GPCRs are imperative. In this study, for predicting Level 1 subfamilies of GPCRs, a novel method for obtaining class-specific features, based on the existence of activating-ligand-specific patterns, has been developed and utilized for a majority voting classification. Exploiting the fact that there is a non-promiscuous relationship between the specific binding of GPCRs to their ligands and their functional classification, our method classifies Level 1 subfamilies of GPCRs with a high predictive accuracy between 87% and 99% in a three-fold cross-validation test. The method also tells us which motifs are significant for class determination, which has important design implications. The presented machine learning approach bridges the gulf between the excess amount of GPCR sequence data and their poor functional characterization.

    On the hierarchical classification of G Protein-Coupled Receptors

    Motivation: G protein-coupled receptors (GPCRs) play an important role in many physiological systems by transducing an extracellular signal into an intracellular response. Over 50% of all marketed drugs are targeted towards a GPCR. There is considerable interest in developing an algorithm that could effectively predict the function of a GPCR from its primary sequence. Such an algorithm is useful not only in identifying novel GPCR sequences but also in characterizing the interrelationships between known GPCRs. Results: An alignment-free approach to GPCR classification has been developed using techniques drawn from data mining and proteochemometrics. A dataset of over 8000 sequences was constructed to train the algorithm. This represents one of the largest GPCR datasets currently available. A predictive algorithm was developed based upon the simplest reasonable numerical representation of the protein's physicochemical properties. A selective top-down approach was developed, which used a hierarchical classifier to assign sequences to subdivisions within the GPCR hierarchy. The predictive performance of the algorithm was assessed against several standard data mining classifiers and further validated against Support Vector Machine-based GPCR prediction servers. The selective top-down approach achieves significantly higher accuracy than standard data mining methods in almost all cases.

    Structure and functional motifs of GCR1, the only plant protein with a GPCR fold?

    Whether GPCRs exist in plants is a fundamental biological question. Interest in deorphanizing new G protein coupled receptors (GPCRs) arises because of their importance in signaling. Within plants, this is controversial as genome analysis has identified 56 putative GPCRs, including GCR1, which is reportedly a remote homologue to class A, B and E GPCRs. Of these, GCR2 is not a GPCR; more recently it has been proposed that none are, not even GCR1. We have addressed this disparity between genome analysis and biological evidence through a structural bioinformatics study, involving fold recognition methods, from which only GCR1 emerges as a strong candidate. To further probe GCR1, we have developed a novel helix alignment method, which has been benchmarked against the class A - class B - class F GPCR alignments. In addition, we have presented a mutually consistent set of alignments of GCR1 homologues to class A, class B and class F GPCRs, and shown that GCR1 is closer to class A and/or class B GPCRs than class A, class B or class F GPCRs are to each other. To further probe GCR1, we have aligned transmembrane helix 3 of GCR1 to each of the 6 GPCR classes. Variability comparisons provide additional evidence that GCR1 homologues have the GPCR fold. From the alignments and a GCR1 comparative model we have identified motifs that are common to GCR1, class A, B and E GPCRs. We discuss the possibilities that emerge from this controversial evidence that GCR1 has a GPCR fold

    GPCRTree: online hierarchical classification of GPCR function

    Background: G protein-coupled receptors (GPCRs) play important physiological roles transducing extracellular signals into intracellular responses. Approximately 50% of all marketed drugs target a GPCR. There remains considerable interest in effectively predicting the function of a GPCR from its primary sequence. Findings: Using techniques drawn from data mining and proteochemometrics, an alignment-free approach to GPCR classification has been devised. It uses a simple representation of a protein's physical properties. GPCRTree, a publicly-available internet server, implements an algorithm that classifies GPCRs at the class, sub-family and sub-subfamily level. Conclusion: A selective top-down classifier was developed which assigns sequences within a GPCR hierarchy. Compared to other publicly available GPCR prediction servers, GPCRTree is considerably more accurate at every level of classification. The server has been available online since March 2008 at URL: http://igrid-ext.cryst.bbk.ac.uk/gpcrtree

    Systematic analysis of primary sequence domain segments for the discrimination between class C GPCR subtypes

    G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes it necessary to use their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on the results of previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments. Peer reviewed. Postprint (author's final draft).

    The Origin of GPCRs: Identification of Mammalian like Rhodopsin, Adhesion, Glutamate and Frizzled GPCRs in Fungi

    G protein-coupled receptors (GPCRs) in humans are classified into the five main families named Glutamate, Rhodopsin, Adhesion, Frizzled and Secretin according to the GRAFS classification. Previous results show that these mammalian GRAFS families are well represented in the Metazoan lineages, but they have not been shown to be present in Fungi. Here, we systematically mined 79 fungal genomes and provide the first evidence that four of the five main mammalian families of GPCRs, namely Rhodopsin, Adhesion, Glutamate and Frizzled, are present in Fungi, and found 142 novel sequences among them. Significantly, we provide strong evidence that the Rhodopsin family emerged from the cAMP receptor family in an event close to the split of Opisthokonts and not in Placozoa, as earlier assumed. The Rhodopsin family then expanded greatly in Metazoans, while the cAMP receptor family is found in 3 invertebrate species and lost in the vertebrates. We estimate that the Adhesion and Frizzled families evolved before the split of Unikonts from a common ancestor of all major eukaryotic lineages. Also, the study highlights that the fungal Adhesion receptors do not have N-terminal domains, whereas the fungal Glutamate receptors have a broad repertoire of mammalian-like N-terminal domains. Further, mining of the close unicellular relatives of the Metazoan lineage, Salpingoeca rosetta and Capsaspora owczarzaki, obtained a rich group of both the Adhesion and Glutamate families, which in particular provided insight into the early emergence of the N-terminal domains of the Adhesion family. We identified 619 Fungi-specific GPCRs across 79 genomes and revealed that the phyla Blastocladiomycota and Chytridiomycota have Metazoan-like GPCRs rather than the GPCRs specific for Fungi. Overall, this study provides the first evidence of the presence of four of the five main GRAFS families in Fungi and clarifies the early evolutionary history of the GPCR superfamily.

    Fast protein classification by using the most significant pairs

    This study introduces a new approach to speed up the protein classification process. The basic idea is to rewrite the sequences of each family using the most significant pairs, where 400 different pairs can appear in protein sequences. The sequence length could be reduced to 0.86, 0.91 and 0.95 by using the 100, 200 and 300 most significant pairs, respectively. The average time reduction is 0.53 %, 0.33 % and 0.22 % for 100, 200, and 300 pairs, respectively. In all three cases, the suggested procedure can be adopted to reduce testing time. However, to match the classification rate of the previous profile HMM, at least 300 pairs must be used.
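
The pair-rewriting idea in this abstract can be illustrated with a minimal sketch. The 400 figure follows from the 20 standard amino acids (20 × 20 ordered pairs). Everything else here is an assumption made for illustration: the abstract does not define "significance", so raw pair frequency stands in for it, and the greedy left-to-right rewriting rule and the function names `top_pairs` and `rewrite` are hypothetical rather than the authors' procedure.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# All ordered residue pairs: 20 * 20 = 400
ALL_PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def top_pairs(sequences, k):
    """Return the k most frequent overlapping residue pairs.

    Assumption: frequency is used as a stand-in for 'significance'.
    """
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
    return [pair for pair, _ in counts.most_common(k)]


def rewrite(seq, pairs):
    """Greedy left-to-right rewrite (hypothetical rule).

    A selected pair becomes a single token; any other residue
    is kept as a one-letter token.
    """
    selected = set(pairs)
    tokens = []
    i = 0
    while i < len(seq):
        if seq[i:i + 2] in selected:
            tokens.append(seq[i:i + 2])
            i += 2
        else:
            tokens.append(seq[i])
            i += 1
    return tokens
```

Under these assumptions, a rewritten sequence has fewer tokens than the original has residues whenever at least one selected pair occurs in it, which is the kind of length reduction the abstract reports.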