8 research outputs found

    RepeatsDB in 2021: Improved data and extended classification for protein tandem repeat structures

    The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification, which combines top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) that require sequence similarity and describe repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.
    Affiliation: Paladin, Lisanna. Università di Padova; Italy
    Affiliation: Bevilacqua, Martina. Università di Padova; Italy
    Affiliation: Errigo, Sara. Università di Padova; Italy
    Affiliation: Piovesan, Damiano. Università di Padova; Italy
    Affiliation: Mičetić, Ivan. Università di Padova; Italy
    Affiliation: Necci, Marco. Università di Padova; Italy
    Affiliation: Monzon, Alexander Miguel. Università di Padova; Italy
    Affiliation: Fabre, Maria Laura. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Biotecnología y Biología Molecular. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Instituto de Biotecnología y Biología Molecular; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Departamento de Ciencias Biológicas; Argentina
    Affiliation: López, José Luis. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Biotecnología y Biología Molecular. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Instituto de Biotecnología y Biología Molecular; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Departamento de Ciencias Biológicas; Argentina
    Affiliation: Nilsson, Juliet Fernanda. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Biotecnología y Biología Molecular. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Instituto de Biotecnología y Biología Molecular; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Departamento de Ciencias Biológicas; Argentina
    Affiliation: Ríos, Javier Sebastián. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina
    Affiliation: Lorenzano Menna, Pablo. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina
    Affiliation: Cabrera, Maia Diana Eliana. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina
    Affiliation: González Buitrón, Martín. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina
    Affiliation: Gonçalves Kulik, Mariane. Johannes Gutenberg Universitat Mainz; Germany
    Affiliation: Fernández Alberti, Sebastián. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina
    Affiliation: Fornasari, Maria Silvina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina
    Affiliation: Parisi, Gustavo Daniel. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina
    Affiliation: Lagares, Antonio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Biotecnología y Biología Molecular. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Instituto de Biotecnología y Biología Molecular; Argentina. Universidad Nacional de La Plata. Facultad de Ciencias Agrarias y Forestales. Departamento de Ciencias Biológicas; Argentina
    Affiliation: Hirsh, Layla. Pontificia Universidad Católica de Perú; Peru
    Affiliation: Andrade Navarro, Miguel A. Johannes Gutenberg Universitat Mainz; Germany
    Affiliation: Kajava, Andrey V. Centre National de la Recherche Scientifique; France
    Affiliation: Tosatto, Silvio C E. Università di Padova; Italy

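    Identification and evaluation of repetitive patterns in the proteome of Trypanosoma cruzi (original title: Identificação e avaliação de padrões repetitivos no proteoma de Trypanosoma Cruzi)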

    Advisor: Prof. Dr. Roberto Tadeu Raittz. Co-advisor: Prof. Dr. Wanderson Duarte da Rocha. Dissertation (master's) - Universidade Federal do Paraná, Setor de Educação Profissional e Tecnológica, Programa de Pós-Graduação em Bioinformática. Defense: Curitiba, 28/04/2019. Includes references: p. 92-98.
    Abstract: Repeated regions are crucial to the survival of Trypanosoma cruzi, since they have been assigned an important role in evading the immune system of mammalian hosts. Many of their functions and mechanisms remain unknown, and a definitive cure for the disease caused by this parasite, Chagas disease, still eludes us. In addition, diagnosis of the disease is still limited. Here, we applied new bioinformatics techniques to annotate and perform exploratory analysis of tandem repeats (TRs). We also performed codon preference analysis on the complete proteome and applied transcriptome coverage values to assess possible differences in codon selection at different stages of the life cycle. We found that some amino acids diverged, but the great majority share the preferences of the genome, while between the stages of the life cycle the patterns are always the same. When the preferences of the TR regions were compared with the global ones, we found that in transmembrane proteins they show distinct characteristics, which may indicate a way to suppress the expression of these genes. We then annotated B-cell epitopes in these TR regions and applied transcriptome data to them, looking for the best candidates for new diagnostic test targets. In addition to some well-known antigens, we were able to identify other promising candidates for experimental testing. As the last task in this process, we applied the lessons learned from identifying tandem repeats to build an AI model able to classify sequences with and without TRs, reaching an accuracy of 80%. This model will allow the identification of conserved TRs in other pathogenic organisms, which will be good targets for B-cell epitope annotation and future diagnostic tests.

    One Step Closer to the Understanding of the Relationship IDR-LCR-Structure

    Intrinsically disordered regions (IDRs) in protein sequences are emerging as functionally important elements for interaction and regulation. Although IDRs are generally flexible, we previously showed, by observing experimentally obtained structures, that they contain regions of reduced sequence complexity with an increased propensity to form structure. Here we expand the universe of cases by taking advantage of structural predictions from AlphaFold. Our studies focus on low complexity regions (LCRs) found within IDRs, where these LCRs have only one or two residue types (polyX and polyXY, respectively). In addition to confirming previous observations that polyE and polyEK have a tendency towards helical structure, we find a similar tendency for other LCRs such as polyQ and polyER, most of them including charged residues. We analyzed the position of polyXY-containing IDRs within proteins, which allowed us to show that polyAG and polyAK accumulate at the N-terminus, with the latter showing increased helical propensity at that location. Functional enrichment analysis of polyXY with helical propensity indicated functions requiring interaction with RNA and DNA. Our work adds evidence for the function of LCRs in interaction-dependent structuring of disordered regions, encouraging the development of tools for the prediction of their dynamic structural properties.

    Low Complexity Induces Structure in Protein Regions Predicted as Intrinsically Disordered

    There is increasing evidence that many intrinsically disordered regions (IDRs) in proteins play key functional roles through interactions with other proteins or nucleic acids. These interactions often exhibit context-dependent structural behavior. We hypothesize that low complexity regions (LCRs), often found within IDRs, could have a role in inducing local structure in IDRs. To test this, we predicted IDRs in the human proteome and analyzed their structures or those of homologous sequences in the Protein Data Bank (PDB). We then identified two types of simple LCRs within IDRs: regions with only one type of amino acid (polyX or homorepeats) or with only two types (polyXY). We were able to assign structural information from the PDB more often to these LCRs than to the surrounding IDRs (polyX 61.8% > polyXY 50.5% > IDRs 39.7%). The most frequently observed polyX and polyXY within IDRs contained E (Glu) or G (Gly). Structural analyses of these sequences and of homologs indicate that polyEK regions induce helical conformations, while the other most frequent LCRs induce coil structures. Our work proposes bioinformatics methods to help in the study of the structural behavior of IDRs and provides a solid basis suggesting a structuring role of LCRs within them.

    Structured Tandem Repeats in Protein Interactions

    Tandem repeats (TRs) in protein sequences are consecutive, highly similar sequence motifs. Some types of TRs fold into structural units that pack together in ensembles, forming either an (open) elongated domain or a (closed) propeller, where the last unit of the ensemble packs against the first one. Here, we examine TR proteins (TRPs) to see how their sequence, structure, and evolutionary properties favor them for a function as mediators of protein interactions. Our observations suggest that TRPs bind other proteins using large, structured surfaces like globular domains; in particular, open-structured TR ensembles are favored by flexible termini and the possibility to tightly coil against their targets. While, intuitively, open ensembles of TRs seem prone to evolve due to their potential to accommodate insertions and deletions of units, these evolutionary events are unexpectedly rare, suggesting that they are advantageous for the emergence of the ancestral sequence but are fixed early. We hypothesize that their flexibility makes it easier for further proteins to adapt to interact with them, which would explain their large number of protein interactions. We provide insight into the properties of open TR ensembles, which make them scaffolds for alternative protein complexes to organize genes, RNA and proteins.
    Affiliation: Mac Donagh, Juan. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Marchesini, Abril. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Biotecnología y Biología Molecular. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Instituto de Biotecnología y Biología Molecular; Argentina
    Affiliation: Spiga, Agostina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Nacional de Quilmes. Departamento de Ciencia y Tecnología; Argentina
    Affiliation: Fallico, Maximiliano José. Universidad Nacional de La Plata. Facultad de Ciencias Exactas. Laboratorio de Investigación y Desarrollo de Bioactivos; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata; Argentina
    Affiliation: Arrias, Paula Nazarena. Università di Padova; Italy
    Affiliation: Monzon, Alexander Miguel. Dipartimento di Ingegneria dell'Informazione; Università degli Studi di Padova
    Affiliation: Vagiona, Aimilia Christina. Johannes Gutenberg Universitat Mainz; Germany
    Affiliation: Gonçalves Kulik, Mariane. Johannes Gutenberg Universitat Mainz; Germany
    Affiliation: Mier, Pablo. Johannes Gutenberg Universitat Mainz; Germany
    Affiliation: Andrade Navarro, Miguel A. Johannes Gutenberg Universitat Mainz; Germany
