28 research outputs found

    Amplitude spectrum distance: measuring the global shape divergence of protein fragments

    Get PDF
    International audienceBackground: In structural bioinformatics, there is an increasing interest in identifying and understanding the evolution of local protein structures regarded as key structural or functional protein building blocks. A central need is then to compare these, possibly short, fragments by measuring efficiently and accurately their (dis)similarity. Progress towards this goal has given rise to scores enabling to assess the strong similarity of fragments. Yet, there is still a lack of more progressive scores, with meaningful intermediate values, for the comparison, retrieval or clustering of distantly related fragments. Results: We introduce here the Amplitude Spectrum Distance (ASD), a novel way of comparing protein fragments based on the discrete Fourier transform of their C α distance matrix. Defined as the distance between their amplitude spectra, ASD can be computed efficiently and provides a parameter-free measure of the global shape dissimilarity of two fragments. ASD inherits from nice theoretical properties, making it tolerant to shifts, insertions, deletions, circular permutations or sequence reversals while satisfying the triangle inequality. The practical interest of ASD with respect to RMSD, RMSDd , BC and TM scores is illustrated through zinc finger retrieval experiments and concrete structure examples. The benefits of ASD are also illustrated by two additional clustering experiments: domain linkers fragments and complementarity-determining regions of antibodies.Conclusions: Taking advantage of the Fourier transform to compare fragments at a global shape level, ASD is an objective and progressive measure taking into account the whole fragments. Its practical computation time and its properties make ASD particularly relevant for applications requiring meaningful measures on distantly related protein fragments, such as similar fragments retrieval asking for high recalls as shown in the experiments, or for any application taking also advantage of triangle inequality, such as fragments clustering. ASD program and source code are freely available at: http://www.irisa.fr/dyliss/public/ASD/

    Fragments structuraux : comparaison, prédictibilité à partir de la séquence et application à l'identification de protéines de virus

    Get PDF
    This thesis investigates the local characterization of protein families at both structural and sequential level. We introduce contact fragments (CF) as parts of protein structure that conciliate spatial locality together with sequential neighborhood. We show that the predictability of CF from the sequence is better than that of contiguous fragments and of structurally distant pairs of fragments. In order to structurally compare CF, we introduce ASD, a novel alignment-free dissimilarity measure that respects triangular inequality while being tolerant to sequence shifts and indels. We show that ASD outperforms classical scores for fragment comparison on practical experiments such that unsupervised classification and structural mining. Ultimately, by integrating the identification of CF from the sequence into a statistical machine learning framework, we developed VIRALpro, a tool that enables the detection of sequences of viral structural proteins.Cette thèse propose de nouveaux outils pour la caractérisation locale de familles de protéines au niveau de la séquence et de la structure. Nous introduisons les fragments en contact (CF) comme des portions de structure conciliant localité spatiale et voisinage séquentiel. Nous montrons qu'ils bénéficient d'une meilleure prédictibilité de structure depuis la séquence que des fragments contigus ou encore que des paires de fragments qui ne seraient pas en contact en structure. Pour comparer structuralement ces CF, nous introduisons l'ASD, une nouvelle mesure de similarité ne nécessitant pas d'alignement préalable, respectant l'inégalité triangulaire tout en étant tolérante aux décalages de séquences et aux indels. Nous montrons notamment que l'ASD offre des meilleures performances que les scores classiques de comparaison de fragments sur des tâches concrètes de classification non-supervisée et de fouille structurale. Enfin, grâce à des techniques d'apprentissage automatique, nous mettrons en œuvre la détection de CF à partir de la séquence pour l'identification de protéines de virus avec l'outil VIRALpro développé au cours de cette thèse

    Identifying distant homologous viral sequences in metagenomes using protein structure information

    Get PDF
    International audienceIt is estimated that marine viruses daily kill about 20% of the ocean biomass. Identifying them in water samples is thus a biological issue of great importance. The metagenomic approach for virus identication is a challenging task since their sequences carry a lot of mutations making them hardly possible to find by standard homology searches. The PEPS VAG project aims at establishing a novel methodology that uses structures of proteins as extra-information in order to annotate metagenomes without relying on homology of sequences. In the context of the first experiments made on the metagenome of station 23 of the TARA Ocean Project, we use the structures of capsid protein to infer the sequence signature of their fold, in order to find them in the metagenome. We present here the methodology, the first experiments and the on-going improvements

    Structural fragments : comparison, predictability from the sequence and application to the identification of viral structural proteins

    No full text
    Cette thèse propose de nouveaux outils pour la caractérisation locale de familles de protéines au niveau de la séquence et de la structure. Nous introduisons les fragments en contact (CF) comme des portions de structure conciliant localité spatiale et voisinage séquentiel. Nous montrons qu'ils bénéficient d'une meilleure prédictibilité de structure depuis la séquence que des fragments contigus ou encore que des paires de fragments qui ne seraient pas en contact en structure. Pour comparer structuralement ces CF, nous introduisons l'ASD, une nouvelle mesure de similarité ne nécessitant pas d'alignement préalable, respectant l'inégalité triangulaire tout en étant tolérante aux décalages de séquences et aux indels. Nous montrons notamment que l'ASD offre des meilleures performances que les scores classiques de comparaison de fragments sur des tâches concrètes de classification non-supervisée et de fouille structurale. Enfin, grâce à des techniques d'apprentissage automatique, nous mettrons en œuvre la détection de CF à partir de la séquence pour l'identification de protéines de virus avec l'outil VIRALpro développé au cours de cette thèse.This thesis investigates the local characterization of protein families at both structural and sequential level. We introduce contact fragments (CF) as parts of protein structure that conciliate spatial locality together with sequential neighborhood. We show that the predictability of CF from the sequence is better than that of contiguous fragments and of structurally distant pairs of fragments. In order to structurally compare CF, we introduce ASD, a novel alignment-free dissimilarity measure that respects triangular inequality while being tolerant to sequence shifts and indels. We show that ASD outperforms classical scores for fragment comparison on practical experiments such that unsupervised classification and structural mining. Ultimately, by integrating the identification of CF from the sequence into a statistical machine learning framework, we developed VIRALpro, a tool that enables the detection of sequences of viral structural proteins

    VIRALpro: a tool to identify viral capsid and tail sequences.

    No full text

    Structural conservation of remote homologues: better and further in contact fragments

    Get PDF
    International audienceWe address here a basic question on sequence-structure relationships in proteins: does a protein sequence depict a structure with a uniform faithfulness all along the sequence ? We investigate this question by defining contact fragments and show that their sequence homologs are significantly more faithful to structure than randomly chosen fragments

    VIRALpro: a tool to identify viral capsid and tail sequences

    No full text
    International audienceABSTRACTMotivation: Not only sequence data continues to outpace annotation information, but the problem is further exacerbated when organisms are underrepresented in the annotation databases. This is the case with non human-pathogenic viruses which occur frequently in metagenomic projects. Thus there is a need for tools capable of detecting and classifying viral sequences. Results: We describe VIRALpro a new effective tool for identifying capsid and tail protein sequences, which are the cornerstones toward viral sequence annotation and viral genome classification.Availability: The data, software, and corresponding web server are available from: http://scratch.proteomics.ics.uci.edu as part of the SCRATCH suite

    VIRALpro: a tool to identify viral capsid and tail sequences

    No full text
    MotivationNot only sequence data continue to outpace annotation information, but also the problem is further exacerbated when organisms are underrepresented in the annotation databases. This is the case with non-human-pathogenic viruses which occur frequently in metagenomic projects. Thus, there is a need for tools capable of detecting and classifying viral sequences.ResultsWe describe VIRALpro a new effective tool for identifying capsid and tail protein sequences, which are the cornerstones toward viral sequence annotation and viral genome classification.Availability and implementationThe data, software and corresponding web server are available from http://scratch.proteomics.ics.uci.edu as part of the SCRATCH [email protected] or [email protected] informationSupplementary data are available at Bioinformatics online
    corecore