44 research outputs found

    Amplitude spectrum distance: measuring the global shape divergence of protein fragments

    Background: In structural bioinformatics, there is increasing interest in identifying and understanding the evolution of local protein structures regarded as key structural or functional building blocks of proteins. A central need is therefore to compare these fragments, which may be short, by measuring their (dis)similarity efficiently and accurately. Progress towards this goal has produced scores that assess strong similarity between fragments. Yet there is still a lack of more progressive scores, with meaningful intermediate values, for the comparison, retrieval or clustering of distantly related fragments. Results: We introduce the Amplitude Spectrum Distance (ASD), a novel way of comparing protein fragments based on the discrete Fourier transform of their Cα distance matrix. Defined as the distance between their amplitude spectra, ASD can be computed efficiently and provides a parameter-free measure of the global shape dissimilarity of two fragments. ASD has useful theoretical properties: it is tolerant to shifts, insertions, deletions, circular permutations and sequence reversals while satisfying the triangle inequality. The practical interest of ASD with respect to the RMSD, RMSDd, BC and TM scores is illustrated through zinc finger retrieval experiments and concrete structure examples. The benefits of ASD are also illustrated by two additional clustering experiments, on domain linker fragments and on complementarity-determining regions of antibodies. Conclusions: Taking advantage of the Fourier transform to compare fragments at a global shape level, ASD is an objective and progressive measure that takes whole fragments into account. Its practical computation time and its properties make ASD particularly relevant for applications requiring meaningful measures on distantly related protein fragments, such as retrieval of similar fragments where high recall is required, as shown in the experiments, or for any application that also benefits from the triangle inequality, such as fragment clustering. The ASD program and source code are freely available at: http://www.irisa.fr/dyliss/public/ASD/
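    The idea described in this abstract, comparing the amplitude spectra of the Fourier-transformed Cα distance matrices, can be illustrated with a minimal NumPy sketch. This is not the published ASD definition or the authors' program: the abstract gives neither the normalization nor the way fragments of different lengths are handled, so the zero-padding to a common `size`, the Euclidean norm, and the function names `distance_matrix`, `amplitude_spectrum` and `spectrum_distance` are all assumptions made for illustration.

```python
import numpy as np


def distance_matrix(coords):
    """Pairwise Euclidean distances between C-alpha coordinates (n x 3 array)."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def amplitude_spectrum(coords, size):
    """Modulus of the 2D DFT of the distance matrix.

    Assumption (not from the abstract): the matrix is zero-padded to
    size x size so that fragments of different lengths are comparable.
    """
    d = distance_matrix(coords)
    return np.abs(np.fft.fft2(d, s=(size, size)))


def spectrum_distance(coords_a, coords_b, size=None):
    """Euclidean distance between two amplitude spectra (illustrative only)."""
    if size is None:
        # Hypothetical default: pad to the longer of the two fragments.
        size = max(len(coords_a), len(coords_b))
    sa = amplitude_spectrum(coords_a, size)
    sb = amplitude_spectrum(coords_b, size)
    return float(np.linalg.norm(sa - sb))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frag_a = rng.normal(size=(8, 3))   # made-up coordinates, not real structures
    frag_b = rng.normal(size=(6, 3))
    print(spectrum_distance(frag_a, frag_b, size=16))
    # Reversing the residue order leaves the amplitude spectrum unchanged,
    # so this prints a value that is zero up to floating-point error.
    print(spectrum_distance(frag_a, frag_a[::-1], size=16))
```

    In this sketch, the triangle inequality holds whenever the same `size` is used for every comparison, because the result is then an ordinary Euclidean distance between fixed-length vectors. The insensitivity to sequence reversal comes from the fact that the modulus of a DFT does not change under circular shifts or index reversal of a real-valued input.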

    Fragments structuraux : comparaison, prédictibilité à partir de la séquence et application à l'identification de protéines de virus

    This thesis investigates the local characterization of protein families at both the structural and the sequence level. We introduce contact fragments (CF) as parts of a protein structure that combine spatial locality with sequential neighborhood. We show that CF are more predictable from the sequence than contiguous fragments or structurally distant pairs of fragments. To compare CF structurally, we introduce ASD, a novel alignment-free dissimilarity measure that satisfies the triangle inequality while being tolerant to sequence shifts and indels. We show that ASD outperforms classical scores for fragment comparison in practical experiments such as unsupervised classification and structural mining. Finally, by integrating the identification of CF from the sequence into a statistical machine learning framework, we developed VIRALpro, a tool that detects sequences of viral structural proteins.

    Identifying distant homologous viral sequences in metagenomes using protein structure information

    It is estimated that marine viruses kill about 20% of the ocean biomass every day. Identifying them in water samples is thus a biological issue of great importance. Identifying viruses with a metagenomic approach is challenging because their sequences carry many mutations, which makes them very hard to find by standard homology searches. The PEPS VAG project aims to establish a novel methodology that uses protein structures as additional information in order to annotate metagenomes without relying on sequence homology. In the context of the first experiments on the metagenome of station 23 of the TARA Ocean Project, we use the structures of capsid proteins to infer the sequence signature of their fold, in order to find them in the metagenome. We present here the methodology, the first experiments and the ongoing improvements.

    Structural fragments : comparison, predictability from the sequence and application to the identification of viral structural proteins

    No full text
    This thesis investigates the local characterization of protein families at both the structural and the sequence level. We introduce contact fragments (CF) as parts of a protein structure that combine spatial locality with sequential neighborhood. We show that CF are more predictable from the sequence than contiguous fragments or structurally distant pairs of fragments. To compare CF structurally, we introduce ASD, a novel alignment-free dissimilarity measure that satisfies the triangle inequality while being tolerant to sequence shifts and indels. We show that ASD outperforms classical scores for fragment comparison in practical experiments such as unsupervised classification and structural mining. Finally, by integrating the identification of CF from the sequence into a statistical machine learning framework, we developed VIRALpro, a tool that detects sequences of viral structural proteins.

    VIRALpro: a tool to identify viral capsid and tail sequences.

    No full text

    Structural conservation of remote homologues: better and further in contact fragments

    We address here a basic question on sequence-structure relationships in proteins: does a protein sequence depict a structure with uniform faithfulness all along the sequence? We investigate this question by defining contact fragments and show that their sequence homologs are significantly more faithful to structure than randomly chosen fragments.