163 research outputs found

    Systematic analysis of primary sequence domain segments for the discrimination between class C GPCR subtypes

    G-protein-coupled receptors (GPCRs) are a large and diverse super-family of eukaryotic cell membrane proteins that play an important physiological role as transmitters of extracellular signals. In this paper, we investigate Class C, a member of this super-family that has attracted much attention in pharmacology. Because little is known about the complete 3D crystal structure of Class C receptors, their primary amino acid sequences must be used for analytical purposes. Here, we provide a systematic analysis of distinct receptor sequence segments, grouped by their topological location in the extracellular, transmembrane, or intracellular domains, with regard to their ability to differentiate between seven class C GPCR subtypes. We build on previous research that provided preliminary evidence that separated domains of complete class C GPCR sequences can serve as the basis for subtype classification. Using the extracellular N-terminus domain alone was shown to cause only a minor decrease in subtype discrimination compared with the complete sequence, despite discarding much of the sequence information. In this paper, we describe the use of Support Vector Machine-based classification models to evaluate the subtype-discriminating capacity of the specific topological sequence segments. Peer reviewed. Postprint (author's final draft).

    Analysis of class C G-protein coupled receptors using supervised classification methods

    G protein-coupled receptors (GPCRs) are cell membrane proteins with a key role in regulating the function of cells. This is the result of their ability to transmit extracellular signals, which makes them relevant for pharmacology and has led, over the last decade, to active research in the field of proteomics. The current thesis specifically targets class C of GPCRs, which are relevant in therapies for various central nervous system disorders, such as Alzheimer’s disease, anxiety, Parkinson’s disease and schizophrenia. The investigation of protein functionality often relies on knowledge of three-dimensional (3-D) crystal structures, which determine the receptor’s ability to bind the ligands responsible for activating certain functionalities in the protein. The structural information is therefore paramount, but it is not always known or easily unravelled, as is the case for eukaryotic cell membrane proteins such as GPCRs. In the absence of information about the 3-D structure, research is often restricted to the analysis of the primary amino acid sequences of the proteins, which are commonly known and available from curated databases. Much research on sequence analysis has focused on the quantitative analysis of aligned sequences, although, recently, alternative approaches using machine learning techniques for the analysis of alignment-free sequences have been proposed. In this thesis, we focus on the differentiation of class C GPCRs into functionally and structurally related subgroups based on the alignment-free analysis of their sequences using supervised classification models. In the first part of the thesis, the main topic is the construction of supervised classification models for unaligned protein sequences based on physicochemical transformations and n-gram representations of their amino acid sequences.
These models are useful to assess the internal data quality of the externally labeled dataset and to manage the label noise problem from a data curation perspective. In its second part, the thesis focuses on the analysis of the sequences to discover subtype- and region-specific sequence motifs. For that, we carry out a systematic analysis of the topological sequence segments with supervised classification models and evaluate the subtype discrimination capability of each region. In addition, we apply different types of feature selection techniques to the n-gram representation of the amino acid sequence segments to find subtype- and region-specific motifs. Finally, we compare the findings of this motif search with the partially known 3D crystallographic structures of class C GPCRs.
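The n-gram representation mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's own pipeline: the function names, the choice of relative frequencies, and the toy fragment are assumptions made here for illustration. Each unaligned sequence is mapped to a fixed-length vector of n-gram frequencies, which could then be passed to a supervised classifier.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams in an unaligned amino acid sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def ngram_vector(sequence, n=2):
    """Return relative n-gram frequencies over all 20**n n-grams, in a fixed order.

    n-grams containing non-standard residue codes are counted in the total
    but have no slot in the vector, so the vector may sum to less than 1.
    """
    counts = ngram_counts(sequence, n)
    total = sum(counts.values()) or 1  # avoid division by zero for short input
    vocabulary = ("".join(gram) for gram in product(AMINO_ACIDS, repeat=n))
    return [counts[gram] / total for gram in vocabulary]


# Hypothetical toy fragment, not a real receptor sequence.
toy_fragment = "MKTLLAAKTL"
toy_counts = ngram_counts(toy_fragment, 2)
toy_vector = ngram_vector(toy_fragment, 2)
```

Because every sequence maps to a vector of the same length regardless of its own length, no alignment step is needed before classification.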

    A computational intelligence analysis of G protein-coupled receptor sequences for pharmacoproteomic applications

    Arguably, drug research has contributed more to the progress of medicine during the past decades than any other scientific factor. One of the main areas of drug research is related to the analysis of proteins. The world of pharmacology is becoming increasingly dependent on advances in the fields of genomics and proteomics. This dependency brings about the challenge of finding robust methods to analyze the complex data they generate. Such a challenge invites us to go one step further than traditional statistics and resort to approaches under the conceptual umbrella of artificial intelligence, including machine learning (ML), statistical pattern recognition and soft computing methods. Sound statistical principles are essential to trust the evidence base built through the use of such approaches. Statistical ML methods are thus at the core of the current thesis. More than 50% of drugs currently available target only four key protein families, of which almost 30% correspond to the G Protein-Coupled Receptors (GPCR) superfamily. This superfamily regulates the function of most cells in living organisms and is at the centre of the investigations reported in the current thesis. Not much is known about the 3D structure of these proteins. Fortunately, plenty of information regarding their amino acid sequences is readily available. The automatic grouping and classification of GPCRs into families, and of these into subtypes, based on sequence analysis may significantly contribute to ascertaining the pharmaceutically relevant properties of this protein superfamily. There is no biologically relevant manner of representing the symbolic sequences describing proteins using real-valued vectors. This does not preclude the possibility of analyzing them using principled methods. These may come, amongst others, from the field of statistical ML. In particular, kernel methods can be used for this purpose.
Moreover, the visualization of high-dimensional protein sequence data can be a key exploratory tool for finding meaningful information that might be obscured by their intrinsic complexity. That is why the objective of the research described in this thesis is twofold: first, the design of adequate visualization-oriented artificial intelligence-based methods for the analysis of GPCR sequential data, and second, the application of the developed methods to relevant pharmacoproteomic problems such as GPCR subtyping and protein alignment-free analysis.

    Theoretical study of the interaction of agonists with the 5-HT2A receptor

    The 5-HT2A receptor (5-HT2AR) is a biogenic amine receptor that belongs to class A of the G protein-coupled receptors. It is characterized by a low affinity for serotonin (5-HT) and for other primary amines. Introduction of an ortho-methoxybenzyl substituent at the amine nitrogen increases the partial agonistic activity by a factor of 40 to 1400 compared with 5-HT. The present study aimed to analyse, at a structure-based level, the QSAR of a series of 51 5-HT2AR partial agonistic arylethylamines tested in vascular in vitro assays on rats, and to suggest ligand binding sites. The compounds belong to three different structural classes: (1) indoles, (2) methoxybenzenes and (3) quinazolinediones. Following a hierarchical strategy, different methods were applied, all of which contribute to the investigation of ligand-receptor interactions: fragment regression analysis (FRA), receptor modeling, docking studies and 3D QSAR approaches (comparative molecular field analysis, CoMFA, and comparative molecular similarity index analysis, CoMSIA). An initial FRA indicated that methoxy substituents at indole and phenyl derivatives increase the activity and may be involved in polar interactions with the 5-HT2AR. The large contribution of lipophilic substituents in the para position of phenethylamines suggests a fit to a specific hydrophobic pocket. Secondary benzylamines are more than one order of magnitude more active than their NH2 analogs. An ortho-OH or -OMe substituent at the benzyl moiety further increases activity. Homology models of the human and rat 5-HT2AR were generated using the crystal structures of bovine rhodopsin and of the beta2-adrenoceptor as templates. The derivation of the putative binding sites for the arylethylamines was based on the results from FRA and on mutagenesis data. Both templates led to 5-HT2AR models with a similar topology of the binding pocket within the transmembrane domains TM3, TM5, TM6 and TM7.
Docking studies with representative members of the three structural classes suggested that the aryl moieties, and particularly para-substituents in phenyl derivatives, fit into a hydrophobic pocket formed by Phe243(5.47), Phe244(5.48) and Phe340(6.52). The 5-methoxy substituents in indole and phenyl compounds form H bonds with Ser239(5.43). In each case, an additional H bond with Ser159(3.36) may be assumed. The cationic amine interacts with the conserved Asp155(3.32). The benzyl group of secondary arylethylamines is inserted into another hydrophobic pocket formed by Phe339(6.51), Trp367(7.40) and Tyr370(7.43). In this region, the docking poses depend on the template used for model generation, leading to different interactions, especially of ortho-substituents. The docking studies with the beta2-adrenoceptor-based rat 5-HT2AR model provided templates for a structure-based alignment of the whole series, which was used in 3D QSAR analyses of the partial agonistic activity. Both approaches, CoMFA and CoMSIA, led to highly predictive models with low complexity (cross-validated q² of 0.72 and 0.81 at 4 and 3 components, respectively). The results were largely compatible with the binding site and confirm the docking studies and the suggested ligand-receptor interactions. Steric and hydrophobic field effects on the potency indicate a hydrophobic pocket around the aryl moiety and near the para position of phenyl derivatives and account for the increased activity of secondary benzylamines. The effects of electrostatic and H-bond acceptor fields suggest a favourable influence of negative charges around the aryl moiety, corresponding to the increase in potency caused by methoxy substituents in the 2-, 4-, 5- and 6-positions of phenethylamines and by the quinazolinedione oxygens. This is in accord with the role of Ser159(3.36) and Ser239(5.43) as H bond donors. At the benzyl moiety, the negative charge and the acceptor potential of 2-hydroxy and 2-methoxy substituents are advantageous.
Agonists stabilize or induce active receptor states not reflected by the existing crystal structures. Based on models of different rhodopsin states, a homology modeling and ligand docking study was performed on the corresponding 5-HT2AR states, which are suggested to be specific to agonist and partial agonist binding, respectively. The models indicate collective conformational changes of TM domains during activation. The different 5-HT2AR states are similar with respect to the amino acids interacting with the arylethylamines, but show individual topologies of the binding sites. The interconversion of states by TM movements may be accompanied by co-translations and rotations of the ligands. In the case of the secondary amines considered, the tight fit of the benzyl substituent into a hydrophobic pocket containing key residues in TM6 probably impedes complete receptor activation by inhibiting the rotation of this helix. High affinity of a partial agonist is therefore often at the expense of its ability to fully activate a receptor.
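The cross-validated q² quoted for the CoMFA and CoMSIA models is conventionally defined as 1 - PRESS / SS, where PRESS is the sum of squared differences between observed activities and their cross-validated predictions, and SS is the sum of squared deviations of the observed activities from their mean. A minimal sketch under that assumption follows; the function name and the toy values are hypothetical, and using the overall mean of the observed activities in SS is one common convention rather than something the abstract states.

```python
def q_squared(observed, predicted_cv):
    """Cross-validated q2 = 1 - PRESS / SS.

    observed:     measured activities, one per compound
    predicted_cv: prediction for each compound from a model that was
                  fitted without that compound (e.g. leave-one-out)
    """
    if len(observed) != len(predicted_cv) or len(observed) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_observed = sum(observed) / len(observed)
    # PRESS: predictive residual sum of squares
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted_cv))
    # SS: total sum of squares around the observed mean
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values are constant; q2 is undefined")
    return 1.0 - press / ss_total


# Hypothetical toy activities, not data from the study.
toy_observed = [5.0, 6.0, 7.0, 8.0]
toy_predicted = [5.5, 5.5, 7.5, 7.5]
toy_q2 = q_squared(toy_observed, toy_predicted)
```

A q² of 1 would mean perfect cross-validated prediction, while values at or below 0 mean the model predicts no better than the mean of the observed activities.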

    Combinatorial expression of GPCR isoforms affects signalling and drug responses

    G-protein-coupled receptors (GPCRs) are membrane proteins that modulate physiology across human tissues in response to extracellular signals. GPCR-mediated signalling can differ because of changes in the sequence [1,2] or expression [3] of the receptors, leading to signalling bias when comparing diverse physiological systems [4]. An underexplored source of such bias is the generation of functionally diverse GPCR isoforms with different patterns of expression across different tissues. Here we integrate data from human tissue-level transcriptomes, GPCR sequences and structures, proteomics, single-cell transcriptomics, population-wide genetic association studies and pharmacological experiments. We show how a single GPCR gene can diversify into several isoforms with distinct signalling properties, and how unique isoform combinations expressed in different tissues can generate distinct signalling states. Depending on their structural changes and expression patterns, some of the detected isoforms may influence cellular responses to drugs and represent new targets for developing drugs with improved tissue selectivity. Our findings highlight the need to move from a canonical to a context-specific view of GPCR signalling that considers how combinatorial expression of isoforms in a particular cell type, tissue or organism collectively influences receptor signalling and drug responses.

    Profiling patterns of interhelical associations in membrane proteins.

    A novel set of methods has been developed to characterize polytopic membrane proteins at the topological, organellar and functional level, in order to reduce the existing functional gap in the membrane proteome. Firstly, a novel clustering tool named PROCLASS was implemented to facilitate the manual curation of large sets of proteins, in readiness for feature extraction. TMLOOP and TMLOOP writer were implemented to refine current topological models by predicting membrane dipping loops. TMLOOP applies weighted predictive rules in a collective motif method to overcome the inherent limitations of single motif methods. The approach achieved 92.4% sensitivity and 100% specificity, and 1,392 topological models described in the Swiss-Prot database were refined. The subcellular location (TMLOCATE) and molecular function (TMFUN) prediction methods rely on the TMDEPTH feature extraction method along with data mining techniques. TMDEPTH uses refined topological models and amino acid sequences to calculate pairs of residues located at a similar depth in the membrane. Evaluation of TMLOCATE showed a normalized accuracy of 75% in discriminating between proteins belonging to the main organelles. At a sequence similarity threshold of 40%, TMFUN predicted main functional classes with a sensitivity of 64.1-71.4%, and 70% of the olfactory GPCRs were correctly predicted. At a sequence similarity threshold of 90%, main functional classes were predicted with a sensitivity of 75.6-92.8%, and class A GPCRs were sub-classified with a sensitivity of 84.5-92.9%. These results reflect a direct association between the spatial arrangement of residues in the transmembrane regions and the capacity of polytopic membrane proteins to carry out their functions.
The developed methods have for the first time categorically shown that the transmembrane regions hold essential information associated with a wide range of functional properties such as filtering and gating processes, subcellular location and molecular function.

    Transmembrane protein topology prediction using support vector machines

    Background: Alpha-helical transmembrane (TM) proteins are involved in a wide range of important biological processes such as cell signaling, transport of membrane-impermeable molecules, cell-cell communication, cell recognition and cell adhesion. Many are also prime drug targets, and it has been estimated that more than half of all drugs currently on the market target membrane proteins. However, due to the experimental difficulties involved in obtaining high quality crystals, this class of protein is severely under-represented in structural databases. In the absence of structural data, sequence-based prediction methods allow TM protein topology to be investigated. Results: We present a support vector machine-based (SVM) TM protein topology predictor that integrates both signal peptide and re-entrant helix prediction, benchmarked with full cross-validation on a novel data set of 131 sequences with known crystal structures. The method achieves topology prediction accuracy of 89%, while signal peptides and re-entrant helices are predicted with 93% and 44% accuracy respectively. An additional SVM trained to discriminate between globular and TM proteins detected zero false positives, with a low false negative rate of 0.4%. We present the results of applying these tools to a number of complete genomes. Source code, data sets and a web server are freely available from http://bioinf.cs.ucl.ac.uk/psipred/. Conclusion: The high accuracy of TM topology prediction, which includes detection of both signal peptides and re-entrant helices, combined with the ability to effectively discriminate between TM and globular proteins, makes this method ideally suited to whole genome annotation of alpha-helical transmembrane proteins.

    Elucidation of Mechanisms Modulating the Conformation and Function of β-Arrestins by G Protein-Coupled Receptors

    Arrestins are cytosolic G protein-coupled receptor (GPCR) binding proteins that regulate several facets of GPCR signaling. Once bound to agonist-occupied receptors, arrestins recruit elements of the clathrin-dependent endocytic machinery, resulting in removal of GPCRs from the plasma membrane. The fate of internalized receptors is determined by the stability of the GPCR-arrestin complex, which is itself dictated by several factors, including ligand structure, receptor structure, and arrestin post-translational modifications. We hypothesized that information about ligand and receptor structure is encoded in the conformation of the intracellular domains of an activated receptor and transferred allosterically to receptor-bound arrestin to dictate which of its many cellular functions it will perform. To test this hypothesis we developed a panel of arrestin3 intramolecular FlAsH-BRET biosensors that allow detection of conformational shifts between the arrestin N-terminus and six positions within the protein. Measuring the effect of receptor activation on arrestin conformation generates an arrestin3 ‘conformational signature’ in a live cell, real time, multiwell plate format. Using a panel of structurally distinct angiotensin type 1A receptor (AT1AR) ligands, we show that GPCR-arrestin complex avidity correlates directly with the ligand-induced Δ Net BRET of an arrestin3 FlAsH-BRET sensor located within the arrestin3 C-terminal globular domain. We further hypothesized that perturbation of arrestin3 post-translational modifications that influence complex stability would similarly be reflected by loss of the conformational shifts of arrestin characteristic of stable complex formation. Ubiquitination of arrestin3 at lysines 11 and 12 is necessary to stabilize complexes with the AT1AR, but not the vasopressin type 2 receptor (V2R).
We found that introduction of an arrestin3 K11/12R mutation, which changes the AT1AR-arrestin interaction from stable to transient, reduced the arrestin3 C-terminal FlAsH-BRET shift produced by AT1AR, but not by the V2R, whose trafficking is unaffected by the mutation. We further tested the impact of the K11/12R mutation on two previously unstudied receptors, the bradykinin type 2 receptor (B2R) and the type 1 parathyroid hormone receptor (PTH1R). The mutation resulted in loss of the arrestin3 FlAsH-BRET signal induced by B2R, but not PTH1R. Examination of arrestin trafficking by confocal microscopy demonstrated that the K11/12R mutation altered B2R, but not PTH1R, trafficking. We conclude that activation-induced changes in arrestin3 conformation, observable through intramolecular FlAsH-BRET, reflect the impact of ligand structure and post-translational modification on its intracellular functions. Biophysical probes such as these, which predict the function of intracellular signaling proteins upon receptor activation, may have application in drug discovery efforts to identify “biased” ligands that tailor GPCR efficacy to elicit specific downstream signaling events.