3 research outputs found

    Comparison of protein structures by growing neighborhood alignments

    BACKGROUND: The design of protein structure comparison algorithms is an important research problem with far-reaching implications. In this article, we describe a protein structure comparison scheme that can detect correct alignments even in difficult cases, e.g. non-topological similarities. The proposed method computes protein structure alignments by comparing small substructures, called neighborhoods. Two types of neighborhoods, sequence and structure, are defined, and two algorithms arising from the scheme are detailed. A new method for computing equivalences with non-topological similarities from pairwise similarity scores is described. A novel and fast technique for comparing sequence neighborhoods is also developed. RESULTS: The experimental results show that the current programs perform better on Fischer and Novotny's benchmark datasets than state-of-the-art programs, e.g. DALI, CE and SSM. Our programs were also found to calculate correct alignments for proteins with large numbers of indels and internal repeats. Finally, the sequence-neighborhood-based program was used in extensive fold and non-topological similarity detection experiments. The accuracy of the fold detection experiments with the new measure of similarity was found to be similar to or better than that of the standard algorithm CE. CONCLUSION: A new scheme, resulting in two algorithms, has been developed, implemented and tested. The programs developed are accessible at

    A novel method to compare protein structures using local descriptors

    BACKGROUND: Protein structure comparison is one of the most widely performed tasks in bioinformatics. However, currently used methods have problems with the so-called "difficult similarities", including considerable shifts and distortions of structure, sequential swaps and circular permutations. There is a demand for efficient and automated systems capable of overcoming these difficulties, which may lead to the discovery of previously unknown structural relationships. RESULTS: We present a novel method for protein structure comparison based on the formalism of local descriptors of protein structure: DEscriptor Defined Alignment (DEDAL). Local similarities identified by pairs of similar descriptors are extended into global structural alignments. We demonstrate the method's capability by aligning structures in difficult benchmark sets: curated alignments in the SISYPHUS database, as well as SISY and RIPC sets, including non-sequential and non-rigid-body alignments. On the most difficult RIPC set of sequence alignment pairs the method achieves an accuracy of 77% (the second best method tested achieves 60% accuracy). CONCLUSIONS: DEDAL is fast enough to be used in whole proteome applications, and by lowering the threshold of detectable structure similarity it may shed additional light on molecular evolution processes. It is well suited to improving automatic classification of structure domains, helping analyze protein fold space, or to improving protein classification schemes. DEDAL is available online at http://bioexploratorium.pl/EP/DEDAL.

    Improving sequence similarity search for protein homology induction using structural data and machine learning methods

    We present a workflow for improving protein homology detection that combines protein sequence and structure comparison data. Sequence similarity measures are the most commonly used tool for homology detection. Evolutionary divergence can lead to homologous proteins having very little (< 35%) sequence similarity. In the middle range of the sequence similarity results, this divergence leads to errors in homology classification. The zone where the maximum error occurs is referred to as the protein twilight zone. The workflow involves reclassifying twilight zone proteins in the PSI-BLAST results into 'true positives' and 'true negatives'. The reclassification is done using a classifier built from the structural data. Several parametric, non-parametric and committee classifiers were compared on standard metrics. Classifiers were built using structure comparison data and subsequently used for reclassifying the twilight zone proteins. We provide statistical data supporting the separability of the two classes, followed by classification results for the various classifiers. Classifiers were tested on different combinations of structural features extracted from the structure comparison. Our tests show that the approach can be successfully used to improve a homology detection workflow by reducing the errors that occur in the 'twilight zone' when plain sequence comparison is used as the metric.
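    The reclassification step described in this abstract can be pictured with a minimal sketch. It is not the authors' implementation: the `Hit` record, the `struct_score` feature, the vote thresholds, and the decision to accept hits at or above 35% identity unchanged are all assumptions made for illustration, and a simple majority vote of threshold rules stands in for the committee classifiers the abstract mentions.

```python
from dataclasses import dataclass

# Assumed cutoff: the abstract only notes that homologs can show < 35% identity.
TWILIGHT_MAX_IDENTITY = 35.0


@dataclass
class Hit:
    name: str
    seq_identity: float  # percent identity reported by the sequence search
    struct_score: float  # hypothetical structure-comparison score in [0, 1]


def committee_vote(hit, thresholds=(0.4, 0.5, 0.6)):
    """Majority vote of simple threshold rules on the structure score."""
    votes = sum(hit.struct_score >= t for t in thresholds)
    return votes > len(thresholds) / 2


def reclassify(hits):
    """Map each hit name to True (homolog) or False (non-homolog).

    Hits outside the assumed twilight zone keep their sequence-based call;
    hits inside it are re-labelled from structural evidence alone.
    """
    labels = {}
    for hit in hits:
        if hit.seq_identity >= TWILIGHT_MAX_IDENTITY:
            labels[hit.name] = True
        else:
            labels[hit.name] = committee_vote(hit)
    return labels


# Made-up example hits, not data from the paper.
hits = [Hit("a", 62.0, 0.2), Hit("b", 28.0, 0.7), Hit("c", 22.0, 0.3)]
labels = reclassify(hits)
```

    In practice the committee members would be trained classifiers over several structural features rather than fixed thresholds on one score; the sketch only shows where structural evidence enters the decision.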