6 research outputs found

    APPRIS: selecting functionally important isoforms.

    APPRIS (https://appris.bioinfo.cnio.es) is a well-established database housing annotations for protein isoforms for a range of species. APPRIS selects principal isoforms based on protein structure and function features and on cross-species conservation. Most coding genes produce a single main protein isoform, and the principal isoforms chosen by the APPRIS database best represent this main cellular isoform. Human genetic data, experimental protein evidence and the distribution of clinical variants all support the relevance of APPRIS principal isoforms. APPRIS annotations and principal isoforms have now been expanded to 10 model organisms. In this paper we highlight the most recent updates to the database. APPRIS annotations have been generated for two new species, cow and chicken, the protein structural information has been augmented with reliable models from the EMBL-EBI AlphaFold database, and we have substantially expanded the confirmatory proteomics evidence available for the human genome. The most significant change in APPRIS has been the implementation of TRIFID functional isoform scores. TRIFID functional scores are assigned to all splice isoforms, and APPRIS uses the TRIFID functional scores and proteomics evidence to determine principal isoforms when core methods cannot. Funding: National Human Genome Research Institute of the National Institutes of Health [2 U41 HG007234]; Spanish Ministry of Science, Innovation and Universities [PGC2018-097019-B-I00]; Carlos III Institute of Health-Fondo de Investigacion Sanitaria [PRB3 (IPT17/0019-ISCIII-SGEFI/ERDF, ProteoRed)]; ‘la Caixa’ Banking Foundation [HR17-00247]. Funding for open access charge: National Human Genome Research Institute.

    APPRIS 2017: principal isoforms for multiple gene sets

    The APPRIS database (http://appris-tools.org) uses protein structural and functional features and information from cross-species conservation to annotate splice isoforms in protein-coding genes. APPRIS selects a single protein isoform, the ‘principal’ isoform, as the reference for each gene based on these annotations. A single main splice isoform reflects the biological reality for most protein-coding genes, and APPRIS principal isoforms are the best predictors of these main protein isoforms. Here, we present the updates to the database: new developments that include the addition of three new species (chimpanzee, Drosophila melanogaster and Caenorhabditis elegans), the expansion of APPRIS to cover the RefSeq gene set and the UniProtKB proteome for six species, and refinements in the core methods that make up the annotation pipeline. In addition, APPRIS now provides a measure of reliability for individual principal isoforms and updates with each release of the GENCODE/Ensembl and RefSeq reference sets. The individual GENCODE/Ensembl, RefSeq and UniProtKB reference gene sets for six organisms have been merged to produce common sets of splice variants. Funding: National Institutes of Health [U41 HG007234, 2U41 HG007234]; Spanish Ministry of Economics and Competitiveness [BIO2015-67580-P]; Spanish National Institute of Bioinformatics (www.inab.org) [INB-ISCIII, PRB2 to J.M.R.]; ProteoRed [IPT13/0001-ISCIII-SGEFI/FEDER to J.V.]; Joint BSC-IRB-CRG Program in Computational Biology and Award Severo Ochoa [SEV 2015-0493 to A.V.]. Funding for open access charge: U.S. Department of Health and Human Services; National Institutes of Health; National Human Genome Research Institute [2U41 HG007234]. Peer Reviewed. Postprint (published version).

    Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone

    Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides accurate sequence-based information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach. We thank F. Abascal and M. L. Tress for helpful discussions. This work was supported by Spanish Ministry of Economy and Competitiveness Projects BFU2015-71241-R and BIO2012-40205, cofunded by the European Regional Development Fund.

    La coevolución en regiones de interacción entre proteínas: estudio y desarrollo de métodos computacionales

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 14-02-2020.
Cellular functions are based on intricate networks of molecular interactions. Physical interactions between proteins are among the most important and prevalent of these interactions. The correct association of proteins imposes strong constraints on the evolution of the corresponding sequences. 
In this context, the term coevolution encompasses the evolutionary interdependence between interacting proteins due to existing structural constraints, among other factors. Several methods have been developed to predict contacts between proteins from sequence covariation in multiple sequence alignments. In the last decade, the development of new computational methods and the increase of available sequences have improved the contact prediction performance remarkably. The main objectives of this thesis are a better understanding of sequence coevolution at protein interfaces and the improvement of contact prediction between proteins, with a focus on two of the main challenges in this field: the impossibility of systematically predicting contacts in eukaryotes and the insufficient number of sequences for many protein families. In the first part of this thesis, we present the development of a computational approach to study the relation between coevolution and structural conservation at protein interfaces over a large evolutionary scale. The comparison of the coevolutionary signal detected in prokaryotic alignments to the structural divergence between prokaryotic and eukaryotic homologs shows that the coevolutionary signal is associated with high structural conservation. This finding enables the correct projection of contact predictions from prokaryotes, where there is abundant sequence data, to distant but evolutionarily related eukaryotic complexes. Thus, it is possible to extend the scope of application of coevolutionary methods to eukaryotic complexes. In the second part, we study the limiting factors of contact prediction between proteins: the reduced number of sequences available, the biases induced by sequence conservation and the lack of independence between sequences due to the underlying phylogeny. Our results show that correct predictions in cases with few sequences are hard to recover using current methodologies. 
Here we propose a method that uses empirical null distributions obtained through randomizations of the input alignments to estimate a specific threshold for each case that makes the signal more comparable between cases. This method significantly improves the quality of the predictions and recovers correct predictions even for alignments with few sequences. This work underlines the crucial role of coevolution in protein evolution, in processes such as sequence divergence and structural conservation, as well as its potential to build three-dimensional models for a considerable number of protein-protein interactions. These are areas in which there is still room for improvement, especially in handling the phylogenetic relations among sequences.
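
    The idea of a case-specific threshold derived from an empirical null distribution can be illustrated with a minimal Python sketch. It is not the thesis's implementation: mutual information is used here only as a stand-in covariation score, shuffling one alignment column is just one possible randomization, and the function names, the 95% quantile and the number of shuffles are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two equally long alignment columns.

    Stand-in covariation score; the thesis may use a different one.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log(p_ab / (p_a * p_b)), written with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def null_threshold(col_a, col_b, n_shuffles=1000, quantile=0.95, seed=0):
    """Threshold taken from an empirical null built for this column pair.

    Shuffling col_b breaks the pairing between sequences while keeping
    each column's residue composition (hypothetical randomization scheme).
    """
    rng = random.Random(seed)
    shuffled = list(col_b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null_scores.append(mutual_information(col_a, shuffled))
    null_scores.sort()
    index = min(int(quantile * n_shuffles), n_shuffles - 1)
    return null_scores[index]


def is_significant(col_a, col_b, **kwargs):
    """True if the observed score exceeds the pair-specific null threshold."""
    observed = mutual_information(col_a, col_b)
    return observed > null_threshold(col_a, col_b, **kwargs)
```

    Because the threshold is recomputed from each pair's own randomized columns, a score is judged against what that particular alignment depth and composition produce by chance rather than against one global cutoff. This sketch does not address the phylogenetic dependence between sequences that the abstract names as an open problem.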

    APPRIS WebServer and WebServices
