894 research outputs found

    The mutational landscape of human olfactory G protein-coupled receptors

    Olfactory receptors (ORs) constitute a large family of sensory proteins that enable us to recognize a wide range of chemical volatiles in the environment. In contrast to the extensive information about human olfactory thresholds for thousands of odorants, studies of the genetic influence on olfaction are limited to a few examples. To annotate on a broad scale the impact of mutations at the structural level, we analyzed a compendium of 119,069 natural variants in human ORs collected from the public domain. OR mutations were categorized according to their genomic and protein contexts, as well as their frequency of occurrence in several human populations. The functional impact of the natural changes was estimated from the increasing knowledge of the structure and function of the G protein-coupled receptor (GPCR) family, to which ORs belong. Our analysis reveals an extraordinary diversity of natural variations in the olfactory gene repertoire between individuals and populations, with a significant number of changes occurring in structurally conserved regions. Particular attention is paid to mutations in positions linked to the conserved GPCR activation mechanism that could imply phenotypic variation in olfactory perception. An interactive web application (hORMdb, Human Olfactory Receptor Mutation Database) was developed for the management and visualization of this mutational dataset. We performed topological annotations and population analysis of natural variants of human olfactory receptors and provide an interactive application to explore human OR mutation data. We envisage that the utility of this information will increase as the amount of available pharmacological data for these receptors grows. This effort, together with ongoing research on genetic changes in other sensory receptors, could shape an emerging sensegenomics field of knowledge, which should be considered by food and cosmetic consumer product manufacturers for the benefit of the general population. Funding: Agencia Estatal de Investigación (https://doi.org/10.13039/501100011033), PID2019-109240RB-I00.

    A computational intelligence analysis of G protein-coupled receptor sequences for pharmacoproteomic applications

    Arguably, drug research has contributed more to the progress of medicine during the past decades than any other scientific factor. One of the main areas of drug research is the analysis of proteins. Pharmacology is becoming increasingly dependent on advances in genomics and proteomics. This dependency brings the challenge of finding robust methods to analyze the complex data these fields generate. That challenge invites us to go one step beyond traditional statistics and resort to approaches under the conceptual umbrella of artificial intelligence, including machine learning (ML), statistical pattern recognition and soft computing methods. Sound statistical principles are essential to trust the evidence base built through the use of such approaches. Statistical ML methods are thus at the core of the current thesis. More than 50% of drugs currently available target only four key protein families, of which almost 30% correspond to the G protein-coupled receptor (GPCR) superfamily. This superfamily regulates the function of most cells in living organisms and is at the centre of the investigations reported in the current thesis. Not much is known about the 3D structure of these proteins. Fortunately, plenty of information regarding their amino acid sequences is readily available. The automatic grouping and classification of GPCRs into families, and of these into subtypes, based on sequence analysis may contribute significantly to ascertaining the pharmaceutically relevant properties of this protein superfamily. There is no biologically relevant manner of representing the symbolic sequences describing proteins using real-valued vectors. This does not preclude the possibility of analyzing them using principled methods. These may come, amongst others, from the field of statistical ML. In particular, kernel methods can be used for this purpose. Moreover, the visualization of high-dimensional protein sequence data can be a key exploratory tool for finding meaningful information that might be obscured by their intrinsic complexity. That is why the objective of the research described in this thesis is twofold: first, the design of adequate visualization-oriented artificial intelligence-based methods for the analysis of GPCR sequential data, and second, the application of the developed methods to relevant pharmacoproteomic problems such as GPCR subtyping and protein alignment-free analysis.

    Predicting a small molecule-kinase interaction map: A machine learning approach

    Background: We present a machine learning approach to the problem of protein ligand interaction prediction. We focus on a set of binding data obtained from 113 different protein kinases and 20 inhibitors. It was attained through ATP site-dependent binding competition assays and constitutes the first available dataset of this kind. We extract information about the investigated molecules from various data sources to obtain an informative set of features. Results: A Support Vector Machine (SVM) as well as a decision tree algorithm (C5/See5) is used to learn models based on the available features which in turn can be used for the classification of new kinase-inhibitor pair test instances. We evaluate our approach using different feature sets and parameter settings for the employed classifiers. Moreover, the paper introduces a new way of evaluating predictions in such a setting, where different amounts of information about the binding partners can be assumed to be available for training. Results on an external test set are also provided. Conclusions: In most of the cases, the presented approach clearly outperforms the baseline methods used for comparison. Experimental results indicate that the applied machine learning methods are able to detect a signal in the data and predict binding affinity to some extent. For SVMs, the binding prediction can be improved significantly by using features that describe the active site of a kinase. For C5, besides diversity in the feature set, alignment scores of conserved regions turned out to be very useful.

    Transcriptomic analysis of crustacean neuropeptide signaling during the moult cycle in the green shore crab, Carcinus maenas

    Background: Ecdysis is an innate behaviour programme by which all arthropods moult their exoskeletons. The complex suite of interacting neuropeptides that orchestrate ecdysis is well studied in insects, but details of the crustacean ecdysis cassette are fragmented and our understanding of this process is comparatively crude, preventing a meaningful evolutionary comparison. To begin to address this issue we identified transcripts coding for neuropeptides and their putative receptors in the central nervous system (CNS) and Y-organs (YO) within the crab, Carcinus maenas, and mapped their expression profiles across accurately defined stages of the moult cycle using RNA-sequencing. We also studied gene expression within the epidermally-derived YO, the only defined role for which is the synthesis of ecdysteroid moulting hormones, to elucidate peptides and G protein-coupled receptors (GPCRs) that might have a function in ecdysis. Results: Transcriptome mining of the CNS transcriptome yielded neuropeptide transcripts representing 47 neuropeptide families and 66 putative GPCRs. Neuropeptide transcripts that were differentially expressed across the moult cycle included carcikinin, crustacean hyperglycemic hormone-2, and crustacean cardioactive peptide, whilst a single putative neuropeptide receptor, proctolin R1, was differentially expressed. Carcikinin mRNA in particular exhibited dramatic increases in expression pre-moult, suggesting a role in ecdysis regulation. Crustacean hyperglycemic hormone-2 mRNA expression was elevated post- and pre-moult whilst that for crustacean cardioactive peptide, which regulates insect ecdysis and plays a role in stereotyped motor activity during crustacean ecdysis, was elevated in pre-moult. In the YO, several putative neuropeptide receptor transcripts were differentially expressed across the moult cycle, as was the mRNA for the neuropeptide, neuroparsin-1. Whilst differential gene expression of putative neuropeptide receptors was expected, the discovery and differential expression of neuropeptide transcripts was surprising. Analysis of GPCR transcript expression between YO and epidermis revealed 11 to be upregulated in the YO; these are now candidates for peptide control of ecdysis. Conclusions: The data presented represent a comprehensive survey of the deduced C. maenas neuropeptidome and putative GPCRs. Importantly, we have described the differential expression profiles of these transcripts across accurately staged moult cycles in tissues key to the ecdysis programme. This study provides important avenues for the future exploration of functionality of receptor-ligand pairs in crustaceans.

    Sequence variation in G-protein-coupled receptors: analysis of single nucleotide polymorphisms

    We assessed the disease-causing potential of single nucleotide polymorphisms (SNPs) based on a simple set of sequence-based features. We focused on SNPs from the dbSNP database in G-protein-coupled receptors (GPCRs), a large class of important transmembrane (TM) proteins. Apart from the location of the SNP in the protein, we evaluated the predictive power of three major classes of features to differentiate between disease-causing mutations and neutral changes: (i) properties derived from amino-acid scales, such as volume and hydrophobicity; (ii) position-specific phylogenetic features reflecting evolutionary conservation, such as normalized site entropy, residue frequency and SIFT score; and (iii) substitution-matrix scores, such as those derived from the BLOSUM62, GRANTHAM and PHAT matrices. We validated our approach using a control dataset consisting of known disease-causing mutations and neutral variations. Logistic regression analyses indicated that position-specific phylogenetic features that describe the conservation of an amino acid at a specific site are the best discriminators of disease mutations versus neutral variations, and integration of all our features improves discrimination power. Overall, we identify 115 SNPs in GPCRs from dbSNP that are likely to be associated with disease and thus are good candidates for genotyping in association studies.

    Automatic Extraction of Protein Point Mutations Using a Graph Bigram Association

    Protein point mutations are an essential component of the evolutionary and experimental analysis of protein structure and function. While many manually curated databases attempt to index point mutations, most experimentally generated point mutations and the biological impacts of the changes are described in the peer-reviewed published literature. We describe an application, Mutation GraB (Graph Bigram), that identifies, extracts, and verifies point mutations from biomedical literature. The principal problem of point mutation extraction is to link the point mutation with its associated protein and organism of origin. Our algorithm uses a graph-based bigram traversal to identify these relevant associations and exploits the Swiss-Prot protein database to verify this information. The graph bigram method differs from other models for point mutation extraction in that it incorporates frequency and positional data of all terms in an article to drive the point mutation–protein association. Our method was tested on 589 articles describing point mutations from the G protein–coupled receptor (GPCR), tyrosine kinase, and ion channel protein families. We evaluated our graph bigram metric against a word-proximity metric for term association on datasets of full-text literature in these three protein families. Our testing shows that the graph bigram metric achieves a higher F-measure for the GPCRs (0.79 versus 0.76), protein tyrosine kinases (0.72 versus 0.69), and ion channel transporters (0.76 versus 0.74). Importantly, in situations where more than one protein can be assigned to a point mutation and disambiguation is required, the graph bigram metric achieves a precision of 0.84 compared with the word distance metric precision of 0.73. We believe the graph bigram search metric to be a significant improvement over previous search metrics for point mutation extraction and to be applicable to text-mining applications requiring the association of words.

    Novel algorithms for protein sequence analysis

    Each protein is characterized by its unique sequential order of amino acids, the so-called protein sequence. Biology's paradigm is that this order of amino acids determines the protein's architecture and function. In this thesis, we introduce novel algorithms to analyze protein sequences. Chapter 1 begins with the introduction of amino acids, proteins and protein families. Then fundamental techniques from computer science related to the thesis are briefly described. Making a multiple sequence alignment (MSA) and constructing a phylogenetic tree are traditional means of sequence analysis. Information entropy, feature selection and sequential pattern mining, all drawn from computer science, provide alternative ways to analyze protein sequences. In Chapter 2, information entropy was used to measure the conservation at a given position of the alignment. From an alignment which is grouped into subfamilies, two types of information entropy values are calculated for each position in the MSA. One is the average entropy for a given position among the subfamilies, the other is the entropy for the same position in the entire multiple sequence alignment. This so-called two-entropies analysis, or TEA for short, yields a scatter-plot in which all positions are represented with their two entropy values as x- and y-coordinates. The different locations of the positions (or dots) in the scatter-plot are indicative of various conservation patterns and may suggest different biological functions. The globally conserved positions show up at the lower left corner of the graph, which suggests that these positions may be essential for the folding or for the main functions of the protein superfamily. In contrast, the positions conserved neither between subfamilies nor within each individual subfamily appear at the upper right corner. The positions conserved within each subfamily but divergent among subfamilies are in the upper left corner. 
They may participate in biological functions that divide subfamilies, such as recognition of an endogenous ligand in G protein-coupled receptors. The TEA method requires a definition of protein subfamilies as an input. However, such a definition is a challenging problem in itself, particularly because it is crucial for the subsequent prediction of specificity positions. In Chapter 3, we automated the TEA method described in Chapter 2 by tracing the evolutionary pressure from the root to the branches of the phylogenetic tree. At each level of the tree, a TEA plot is produced to capture the signal of the evolutionary pressure. A consensus TEA-O plot is composed from the whole series of plots to provide a condensed representation. Positions related to functions that evolved early (conserved) or later (specificity) are close to the lower left or upper left corner of the TEA-O plot, respectively. This novel approach allows an unbiased, user-independent analysis of residue relevance in a protein family. We tested the TEA-O method on a synthetic dataset as well as on 'real' data, i.e., LacI and GPCR datasets. The ROC plots for the real data showed that TEA-O works perfectly well on all datasets and much better than other considered methods such as evolutionary trace, SDPpred and TreeDet. While positions were treated independently from each other in Chapters 2 and 3 in predicting specificity positions, in Chapter 4 multi-RELIEF considers both sequence similarity and distance in 3D structure in the specificity scoring function. The multi-RELIEF method was developed based on RELIEF, a state-of-the-art machine-learning technique for feature weighting. It estimates the expected 'local' functional specificity of residues from an alignment divided into multiple classes. Optionally, 3D structure information is exploited by increasing the weight of residues that have high-weight neighbors. 
Using ROC curves over a large body of experimental reference data, we showed that multi-RELIEF identifies specificity residues for the seven test sets used. In addition, incorporating structural information improved the prediction for specificity of interaction with small molecules. Comparison of multi-RELIEF with four other state-of-the-art algorithms indicates its robustness and best overall performance. In Chapters 2, 3 and 4, we relied heavily on multiple sequence alignment to identify conserved and specificity positions. As mentioned before, the construction of such an alignment is not self-evident. Following the principle of sequential pattern mining, in Chapter 5 we proposed a new algorithm that directly identifies frequent, biologically meaningful patterns from unaligned sequences. Six algorithms were designed and implemented to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. In Chapters 2 to 5, we aimed to identify functional residues from either aligned or unaligned protein sequences. In Chapter 6, we introduce an alignment-independent procedure to cluster protein sequences, which may be used to predict protein function. Traditionally, phylogeny reconstruction is based on multiple sequence alignment. The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. 
In cheminformatics, constructing a similarity tree of ligands is usually alignment free. Feature spaces are routine means to convert compounds into binary fingerprints. Then distances among compounds can be obtained and similarity trees are constructed via clustering techniques. We explored building feature spaces for phylogeny reconstruction either using the so-called k-mer method or via sequential pattern mining with additional filtering and combining operations. Both approaches produced satisfactory trees when compared with alignment-based methods. We found that when k equals 3, the phylogenetic tree built from the k-mer fingerprints is as good as that from one of the alignment-based methods, in which PAM and neighbor joining are used for computing distance and constructing a tree, respectively (NJ-PAM). As for the sequential pattern mining approach, the quality of the phylogenetic tree is better than that from the same alignment-based method (NJ-PAM) if we set the support value to 10% and use maximum patterns only as descriptors. Finally, in Chapter 7, general conclusions about the research described in this thesis are drawn. They are supplemented with an outlook on further research lines. We are convinced that the described algorithms can be useful in, e.g., genomic analyses, and provide further ideas for novel algorithms in this respect. Funding: Leiden University, NWO (Horizon Breakthrough project 050-71-041) and the Dutch Top Institute Pharma (D1-105).
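
    The two-entropies analysis described in the abstract above can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the toy alignment, the use of unnormalized Shannon entropy in bits, and the treatment of every character (including gaps) as an ordinary symbol are all assumptions made for illustration. The axis assignment (x = average subfamily entropy, y = overall entropy) is inferred from the corner descriptions in the abstract.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def two_entropies(subfamilies):
    """Return one (average subfamily entropy, overall entropy) pair per column.

    `subfamilies` maps a subfamily name to a list of aligned sequences;
    all sequences are assumed to have the same length.
    """
    all_seqs = [seq for seqs in subfamilies.values() for seq in seqs]
    length = len(all_seqs[0])
    points = []
    for i in range(length):
        # Entropy of the column across the entire alignment (y-coordinate).
        overall = column_entropy([seq[i] for seq in all_seqs])
        # Entropy of the same column inside each subfamily, then averaged (x-coordinate).
        per_family = [
            column_entropy([seq[i] for seq in seqs])
            for seqs in subfamilies.values()
        ]
        points.append((sum(per_family) / len(per_family), overall))
    return points


# Hypothetical toy alignment: two subfamilies, three columns.
toy = {"A": ["ACD", "ACE"], "B": ["AGD", "AGF"]}
points = two_entropies(toy)
for position, (x, y) in enumerate(points, start=1):
    print(position, round(x, 3), round(y, 3))
```

    In this toy alignment the first column is identical everywhere (lower left), the second is fixed within each subfamily but differs between them (upper left, a candidate specificity position), and the third varies both within and between subfamilies (toward the upper right).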

    Cross genome phylogenetic analysis of human and Drosophila G protein-coupled receptors: application to functional annotation of orphan receptors

    BACKGROUND: The cell-membrane G-protein coupled receptors (GPCRs) are one of the largest known superfamilies and are the focus of intense pharmaceutical research due to their key role in cell physiology and disease. A large number of putative GPCRs are 'orphans' with no identified natural ligands. The first step in understanding the function of orphan GPCRs is to identify their ligands. Phylogenetic clustering methods were used to elucidate the chemical nature of receptor ligands, which led to the identification of natural ligands for many orphan receptors. We have clustered human and Drosophila receptors with known ligands and orphans through cross-genome phylogenetic analysis and hypothesized that co-clustered members are more closely related, which would ease ligand identification, as related receptors share ligands of similar structure or class. RESULTS: Cross-genome phylogenetic analyses were performed to identify eight major groups of GPCRs, dividing them into 32 clusters of 371 human and 113 Drosophila proteins (excluding olfactory, taste and gustatory receptors), and reveal unexpected levels of evolutionary conservation across human and Drosophila GPCRs. We also observe that members of the human chemokine receptors, involved in immune response, and most nucleotide-lipid receptors (except opsins) do not have counterparts in Drosophila. Similarly, a group of Drosophila GPCRs (methuselah receptors), associated with aging, is not present in humans. CONCLUSION: Our analysis suggests ligand class association for 52 unknown Drosophila receptors and 95 unknown human GPCRs. A higher level of phylogenetic organization was revealed in which clusters with common domain architecture, cellular localization, ligand structure or chemistry, or a shared function are evident across human and Drosophila genomes. Such analyses will prove valuable for identifying the natural ligands of Drosophila and human orphan receptors, which can lead to a better understanding of the physiological and pathological roles of these receptors.

    Comparative genomics of early animal evolution

    The explosion of genomics permits investigations into the origin and early evolution of the Metazoa at the molecular level. In this thesis, I am particularly interested in investigating the molecular foundation of the animal senses (i.e. how animals perceive their world). To understand the directionality of evolutionary innovation, a well-developed phylogenetic framework is necessary. On the one hand, the combination of molecular and morphological data sets has revolutionized our views of metazoan relationships over the past decades; on the other hand, a number of nodes on the metazoan tree remain uncertain. Uncertainty is particularly high with reference to the taxa generally named “early branching metazoans”. Unfortunately, understanding the relationships among these taxa is key to understanding the evolution of sensory perception (Nielsen 2008). In this thesis I will investigate both animal phylogenetics (to attempt to resolve the phylogeny among the early branching Metazoa) and the evolution of the metazoan sensory receptors. The G protein-coupled receptor (GPCR) superfamily is the main family of metazoan surface receptors. In this thesis, after an initial introduction (Chapter 1), I address and substantially clarify the relationship among the early branching animals (Chapter 2) using novel genomic data and publicly available expressed sequence tags (ESTs). I then move forward (Chapter 3) to use network-based methods to study the early evolution of the GPCR superfamily in Eukaryotes and animals. Finally (Chapter 4), I focus on the study of a specific subset of GPCRs (the α-group, Rhodopsin-like receptors). This GPCR group is particularly interesting as it includes the best studied and, arguably, one of the most interesting among the GPCR families: the Opsin family. Opsins are key proteins used in the process of light detection, and the origin and early evolution of this family are still substantially unknown. Chapter 4 addresses both these problems. The thesis is then concluded by a general discussion (Chapter 5) and a future-directions section (Chapter 6). Overall, this thesis provides new insights into the origin and early evolution of the Metazoa and their senses.