
2,698 research outputs found

Prediction and classification for GPCR sequences based on ligand specific features

Author: F. Horn
G.E. Tusnády
K.R. Sreekumar
M. Bouvier
R. Karchin
S. Altshul
T. Gudermann
W. Pearson
Y. Huang
Publication venue: Lecture Notes in Computer Science,
Publication date: 01/01/2006

Functional identification of G-Protein Coupled Receptors (GPCRs) is one of the current focus areas of pharmaceutical research. Although thousands of GPCR sequences are known, many of them are orphan sequences (the activating ligand is unknown). Therefore, classification methods for automated characterization of orphan GPCRs are imperative. In this study, for predicting Level 1 subfamilies of GPCRs, a novel method for obtaining class-specific features, based on the existence of activating-ligand-specific patterns, has been developed and utilized for a majority voting classification. Exploiting the fact that there is a non-promiscuous relationship between the specific binding of GPCRs to their ligands and their functional classification, our method classifies Level 1 subfamilies of GPCRs with a high predictive accuracy between 87% and 99% in a three-fold cross-validation test. The method also tells us which motifs are significant for class determination, which has important design implications. The presented machine learning approach bridges the gulf between the excess amount of GPCR sequence data and their poor functional characterization.
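
The majority-voting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the motif strings, class labels, and the function name `vote_for_class` are hypothetical, and exact substring matching stands in for whatever pattern matching the paper actually uses.

```python
from collections import Counter

# Hypothetical class-specific motifs; in practice these would be
# mined from training sequences of each subfamily.
CLASS_MOTIFS = {
    "amine": ["DRY", "NPL"],
    "peptide": ["CWLP", "NPII"],
}

def vote_for_class(sequence, class_motifs):
    """Each motif found in the sequence casts one vote for its class."""
    votes = Counter()
    for label, motifs in class_motifs.items():
        for motif in motifs:
            if motif in sequence:
                votes[label] += 1
    if not votes:
        return None  # no motif matched; leave the sequence unclassified
    # Ties are broken arbitrarily in this sketch.
    return votes.most_common(1)[0][0]
```

Because every vote is tied to a named motif, a sketch like this also shows which patterns drove a given prediction, in line with the abstract's remark that the method reveals which motifs matter for class determination.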

Crossref

Sabanci University Research Database

Functional classification of G-Protein coupled receptors, based on their specific ligand coupling patterns

Author: Bakır Burcu
Sezerman Uğur
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/04/2006

Functional identification of G-Protein Coupled Receptors (GPCRs) is one of the current focus areas of pharmaceutical research. Although thousands of GPCR sequences are known, many of them remain orphan sequences (the activating ligand is unknown). Therefore, classification methods for automated characterization of orphan GPCRs are imperative. In this study, for predicting Level 2 subfamilies of Amine GPCRs, a novel method for obtaining fixed-length feature vectors, based on the existence of activating-ligand-specific patterns, has been developed and utilized for a Support Vector Machine (SVM)-based classification. Exploiting the fact that there is a non-promiscuous relationship between the specific binding of GPCRs to their ligands and their functional classification, our method classifies Level 2 subfamilies of Amine GPCRs with a high predictive accuracy of 97.02% in a ten-fold cross-validation test. The presented machine learning approach bridges the gulf between the excess amount of GPCR sequence data and their poor functional characterization.
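
One way to read "fixed-length feature vectors based on the existence of patterns" is a binary presence vector, sketched below. The pattern list and the function name `pattern_features` are hypothetical, and the abstract does not say how the patterns are selected or encoded, so this is an assumption for illustration only.

```python
# Hypothetical ligand-specific patterns; the real pattern set would be
# derived from the training data.
PATTERNS = ["DRY", "NPL", "CWLP"]

def pattern_features(sequence, patterns):
    """Return a 0/1 vector with one entry per pattern.

    The vector length equals len(patterns) regardless of the
    sequence length, which is what makes it usable as SVM input.
    """
    return [1 if pattern in sequence else 0 for pattern in patterns]
```

Vectors of this shape could then be passed to any standard SVM implementation; the choice of kernel and parameters is not specified in the abstract.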

Sabanci University Research Database

On the hierarchical classification of G Protein-Coupled Receptors

Author: A. A. Freitas
A. Secker
Attwood
Bhasin
Bissantz
Cardoso
Christopoulos
D. R. Flower
Das
Davies
Flower
Foord
Gether
Gloriam
Guo
Horn
Hébert
J. Timmis
Karchin
Keerthi
Klabunde
Kolakowski
Lapinsh
M. Mendao
M. N. Davies
Milligan
Papasaikas
Prabhu
Sandberg
Schiöth
Publication venue: 'Oxford University Press (OUP)'
Publication date: 22/10/2007

Motivation: G protein-coupled receptors (GPCRs) play an important role in many physiological systems by transducing an extracellular signal into an intracellular response. Over 50% of all marketed drugs are targeted towards a GPCR. There is considerable interest in developing an algorithm that could effectively predict the function of a GPCR from its primary sequence. Such an algorithm is useful not only in identifying novel GPCR sequences but also in characterizing the interrelationships between known GPCRs. Results: An alignment-free approach to GPCR classification has been developed using techniques drawn from data mining and proteochemometrics. A dataset of over 8000 sequences was constructed to train the algorithm. This represents one of the largest GPCR datasets currently available. A predictive algorithm was developed based upon the simplest reasonable numerical representation of the protein's physicochemical properties. A selective top-down approach was developed, which used a hierarchical classifier to assign sequences to subdivisions within the GPCR hierarchy. The predictive performance of the algorithm was assessed against several standard data mining classifiers and further validated against Support Vector Machine-based GPCR prediction servers. The selective top-down approach achieves significantly higher accuracy than standard data mining methods in almost all cases.
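
The selective top-down approach described above can be sketched roughly as follows, assuming that "selective" means choosing, at each node of the hierarchy, whichever candidate classifier scores best on held-out data. The `Node` class and both function names are hypothetical, and the classifiers are plain callables rather than the data mining algorithms used in the paper.

```python
class Node:
    """A node in the class hierarchy with a local classifier over its children."""

    def __init__(self, label, classifier=None, children=None):
        self.label = label
        self.classifier = classifier    # callable: features -> child label
        self.children = children or {}  # child label -> Node

def select_classifier(candidates, validation_set):
    """Pick the candidate with the highest accuracy on (features, label) pairs."""
    def accuracy(clf):
        hits = sum(clf(x) == y for x, y in validation_set)
        return hits / len(validation_set)
    return max(candidates, key=accuracy)

def classify_top_down(root, features):
    """Descend from the root; each node's classifier chooses the next child."""
    path = [root.label]
    node = root
    while node.children:
        child_label = node.classifier(features)
        node = node.children[child_label]
        path.append(node.label)
    return path
```

The returned path lists one label per hierarchy level, so a sequence is assigned to a class, then a sub-family, and so on. A mistake at an upper level propagates downward in this design, which is one reason per-node classifier selection can matter.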

CiteSeerX

Crossref

Aberystwyth Research Portal

Kent Academic Repository

GPCRTree: online hierarchical classification of GPCR function

Author: Alex A Freitas
Andrew Secker
Darren R Flower
David S Moss
Edward Clark
Jon Timmis
Mark Halling-Brown
Matthew N Davies
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

Background: G protein-coupled receptors (GPCRs) play important physiological roles transducing extracellular signals into intracellular responses. Approximately 50% of all marketed drugs target a GPCR. There remains considerable interest in effectively predicting the function of a GPCR from its primary sequence. Findings: Using techniques drawn from data mining and proteochemometrics, an alignment-free approach to GPCR classification has been devised. It uses a simple representation of a protein's physical properties. GPCRTree, a publicly-available internet server, implements an algorithm that classifies GPCRs at the class, sub-family and sub-subfamily level. Conclusion: A selective top-down classifier was developed which assigns sequences within a GPCR hierarchy. Compared to other publicly available GPCR prediction servers, GPCRTree is considerably more accurate at every level of classification. The server has been available online since March 2008 at URL: http://igrid-ext.cryst.bbk.ac.uk/gpcrtree

CiteSeerX

Crossref

Aberystwyth Research Portal

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Aston Publications Explorer

Birkbeck Institutional Research Online

Kent Academic Repository

Systematic analysis of primary sequence domain segments for the discrimination between class C GPCR subtypes

Author: Alquézar Mancho René
Giraldo Arjonilla Jesús
König Caroline
Vellido Alcacena Alfredo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018

G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. The limited knowledge about the complete 3D crystal structure of Class C receptors makes it necessary to use their primary amino acid sequences for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments with regard to their ability to differentiate between seven class C GPCR subtypes according to their topological location in the extracellular, transmembrane, or intracellular domains. We build on the results of previous research that provided preliminary evidence of the potential use of separated domains of complete class C GPCR sequences as the basis for subtype classification. The use of the extracellular N-terminus domain alone was shown to result in a minor decrease in subtype discrimination in comparison with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

An Olfactory Receptor Pseudogene whose Function emerged in Humans

Author: Catherine Ronin
Chiquito J. Crasto
Gautam Bahl
Maryse Gremigni
Olivier Clot-Faybesse
Peter Lai
Valery Matarazzo
Publication date: 02/11/2007

The human olfactory receptor hOR17-210 is identified as a pseudogene in the human genome. Experimental data have shown, however, that the gene product of cloned hOR17-210 cDNA was able to bind an odorant-binding protein and is narrowly tuned for excitation by cyclic ketones. Supported by experimental results, we used the bioinformatics methods of sequence analysis, computational protein modeling and docking to show that functionality in this receptor is retained due to sequence-structure features not previously observed in mammalian ORs. This receptor does not possess the first two transmembrane helical domains (of the seven typically seen in GPCRs). It does, however, possess an additional TM that has not been observed in other human olfactory receptors. By incorporating these novel structural features, we created two putative models for this receptor. We also docked into these models odor ligands that were experimentally shown to bind hOR17-210. We show how and why structural modifications of OR17-210 do not hinder this receptor's functionality. Our studies reveal that novel gene rearrangements that result in sequence and structural diversity have a bearing on OR and GPCR function and evolution.

Crossref

Nature Precedings

Virtual screening of GPCRs: An in silico chemogenomics approach

Author: Hoffmann Brice
Jacob Laurent
Stoven Véronique
Vert Jean-Philippe
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

The G-protein coupled receptor (GPCR) superfamily is currently the largest class of therapeutic targets. In silico prediction of interactions between GPCRs and small molecules in the transmembrane ligand-binding site is therefore a crucial step in the drug discovery process, which remains a daunting task due to the difficulty of characterizing the 3D structure of most GPCRs and to the limited number of known ligands for some members of the superfamily. Chemogenomics, which attempts to characterize interactions between all members of a target class and all small molecules simultaneously, has recently been proposed as an interesting alternative to traditional docking or ligand-based virtual screening strategies.

arXiv.org e-Print Archive

MODBASE, a database of annotated comparative protein structure models and associated resources.

Author: Barkan David T
Carter Hannah
Davis Fred P
Eramian David
Eswar Narayanan
Karchin Rachel
Kelly Libusha
Mankoo Parminder
Marti-Renom Marc A
Pieper Ursula
Sali Andrej
Webb Ben M
Publication venue: eScholarship, University of California
Publication date: 23/10/2008

MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).

PubMed Central

eScholarship - University of California

Algebraic shortcuts for leave-one-out cross-validation in supervised network inference

Author: Airola Antti
De Baets Bernard
Pahikkala Tapio
Stock Michiel
Waegeman Willem
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2020

Supervised machine learning techniques have traditionally been very successful at reconstructing biological networks, such as protein-ligand interaction, protein-protein interaction and gene regulatory networks. Many supervised techniques for network prediction use linear models on a possibly nonlinear pairwise feature representation of edges. Recently, much emphasis has been placed on the correct evaluation of such supervised models. It is vital to distinguish between using a model to predict new interactions in a given network and using it to predict interactions for a new vertex not present in the original network. This distinction matters because (i) the performance might differ dramatically between the prediction settings and (ii) tuning the model hyperparameters to obtain the best possible model depends on the setting of interest. Specific cross-validation schemes need to be used to assess the performance in such different prediction settings. In this work we discuss a state-of-the-art kernel-based network inference technique called two-step kernel ridge regression. We show that this regression model can be trained efficiently, with a time complexity scaling with the number of vertices rather than the number of edges. Furthermore, this framework leads to a series of cross-validation shortcuts that allow one to rapidly estimate the model performance for any relevant network prediction setting. This allows computational biologists to fully assess the capabilities of their models.
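
For orientation, the kind of identity that makes such shortcuts possible is the classical leave-one-out formula for ordinary (single-kernel) kernel ridge regression, written out below. It is the textbook result, not the two-step variants derived in the paper, and the symbols K (kernel matrix), λ (regularization parameter), H (hat matrix) and y (label vector) are notation assumed here.

```latex
% Classical kernel ridge regression (single kernel K, regularization \lambda > 0)
H = K (K + \lambda I)^{-1}, \qquad \hat{y} = H y

% Leave-one-out prediction for sample i, obtained without retraining:
\hat{y}_i^{(-i)} = \frac{\hat{y}_i - H_{ii}\, y_i}{1 - H_{ii}}
\quad\Longleftrightarrow\quad
y_i - \hat{y}_i^{(-i)} = \frac{y_i - \hat{y}_i}{1 - H_{ii}}
```

All n leave-one-out residuals follow from a single fit and the diagonal of H, rather than from n separate retrainings.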

Ghent University Academic Bibliography