24 research outputs found

    SARA: a server for function annotation of RNA structures

    Recent interest in non-coding RNA transcripts has resulted in a rapid increase in the number of RNA structures deposited in the Protein Data Bank. However, the characterization and functional classification of the RNA structure and function space have been addressed only partially. Here, we introduce the SARA program for pairwise alignment of RNA structures as a web server for structure-based RNA function assignment. The SARA server relies on the SARA program, which aligns two RNA structures using a unit-vector root-mean-square approach. The likely accuracy of a SARA alignment is assessed by three P-values, which estimate the statistical significance of the sequence, secondary-structure and tertiary-structure identity scores, respectively. Our benchmarks, based on a set of 419 RNA structures with known SCOR structural class, indicate that at a negative logarithm of the mean P-value greater than or equal to 2.5, SARA can assign the correct or a similar SCOR class to 81.4% and 95.3% of the benchmark set, respectively. The SARA server is freely accessible via the World Wide Web at http://sgu.bioinfo.cipf.es/services/SARA/
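
    A minimal sketch, in plain Python, of the two quantitative ideas in this abstract: the unit-vector RMS (URMS) distance and the mean-P-value cutoff. It is not SARA's code. Assumptions: each structure is reduced to one backbone point per nucleotide (which atom SARA uses is not stated here), the residue correspondence is already fixed and gap-free (SARA also searches for the alignment itself), the optimal rotation is found with Horn's quaternion method, and the logarithm is taken as base 10 over the arithmetic mean of the three P-values. All function names are hypothetical.

```python
import math


def unit_vectors(coords):
    """Unit vectors between consecutive points (assumed: one backbone point per nucleotide)."""
    vecs = []
    for (x1, y1, z1), (x2, y2, z2) in zip(coords, coords[1:]):
        dx, dy, dz = x2 - x1, y2 - y1, z2 - z1
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        vecs.append((dx / norm, dy / norm, dz / norm))
    return vecs


def jacobi_eigenvalues(a, sweeps=50, tol=1e-24):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [list(row) for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def urmsd(vecs_a, vecs_b):
    """RMSD between two equal-length unit-vector sets after the optimal rotation (Horn)."""
    if len(vecs_a) != len(vecs_b) or not vecs_a:
        raise ValueError("need two non-empty vector lists of equal length")
    n = len(vecs_a)
    # Correlation matrix corr[i][j] = sum_k a_k[i] * b_k[j]
    corr = [[sum(u[i] * v[j] for u, v in zip(vecs_a, vecs_b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = corr
    horn = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = max(jacobi_eigenvalues(horn))
    # Every vector has length 1, so the minimal sum of squared deviations is 2n - 2*lam
    return math.sqrt(max(0.0, (2.0 * n - 2.0 * lam) / n))


def passes_cutoff(p_seq, p_ss, p_tert, cutoff=2.5):
    """True if -log10 of the mean of the three P-values reaches the cutoff (assumed form)."""
    mean_p = (p_seq + p_ss + p_tert) / 3.0
    return -math.log10(mean_p) >= cutoff
```

Because unit vectors between consecutive points do not change under translation, only a rotation has to be optimized, which is why no centering step appears.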

    Reduced representation of protein structure: implications on efficiency and scope of detection of structural similarity

    BACKGROUND: Computational comparison of two protein structures is the starting point of many methods that build on existing knowledge, such as structure modeling (including modeling of protein complexes and conformational changes), molecular replacement, or annotation by structural similarity. In a commonly used strategy, significant effort is invested in matching two sets of atoms. In a complementary approach, a global descriptor is assigned to the overall structure, thus losing track of the substructures within it. RESULTS: Using a small set of geometric features, we define a reduced representation of protein structure, together with an optimizing function for matching two representations, to provide a pre-filtering stage in a database search. We show that, in a straightforward implementation, the representation performs well in terms of its resolution in the space of protein structures and its ability to make new predictions. CONCLUSIONS: Perhaps unexpectedly, substantial discriminating power already exists at the level of the main features of protein structure, such as the directions of secondary structural elements, possibly constrained by their sequential order. This can be used for efficient comparison of protein (sub)structures, allowing for various degrees of conformational flexibility within the compared pair, which in turn can be used for homology modeling of protein structure and dynamics.
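
    The abstract does not give the feature set or the optimizing function, so the following is only a hypothetical illustration of the idea it describes: each secondary structural element (SSE) is reduced to an axis direction, a structure becomes the matrix of pairwise angles between those directions, and two structures with the same number of SSEs are compared in sequential order. Function names and the scoring rule are assumptions, not the authors' method.

```python
import math


def direction(start, end):
    """Unit vector along the axis of one secondary structure element (SSE)."""
    d = [e - s for s, e in zip(start, end)]
    norm = math.sqrt(sum(c * c for c in d))
    return [c / norm for c in d]


def angle_matrix(sses):
    """Pairwise angles (radians) between SSE axes; unchanged by a rigid rotation."""
    dirs = [direction(start, end) for start, end in sses]
    n = len(dirs)
    angles = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            cosine = sum(a * b for a, b in zip(dirs[i], dirs[j]))
            angles[i][j] = math.acos(max(-1.0, min(1.0, cosine)))  # clamp rounding error
    return angles


def sse_mismatch(sses_a, sses_b):
    """Mean absolute angle difference, pairing SSEs strictly in sequential order."""
    if len(sses_a) != len(sses_b):
        raise ValueError("this sketch only compares SSE lists of equal length")
    ma, mb = angle_matrix(sses_a), angle_matrix(sses_b)
    n = len(ma)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 0.0
    return sum(abs(ma[i][j] - mb[i][j]) for i, j in pairs) / len(pairs)
```

Because the angle matrix ignores rigid rotations, no superposition is needed, which is what makes this kind of representation cheap enough for a pre-filtering stage. A real filter would also have to handle unequal SSE counts, for example with an order-preserving alignment.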

    The distance-profile representation and its application to detection of distantly related protein families

    BACKGROUND: Detecting homology between remotely related protein families is an important problem in computational biology, since the biological properties of uncharacterized proteins can often be inferred from those of homologous proteins. Many existing approaches address this problem by measuring the similarity between proteins through sequence or structural alignment. However, these methods do not exploit collective aspects of the protein space; their scores are often noisy, and they frequently fail to recognize distantly related protein families. RESULTS: We describe an algorithm that improves on the state of the art in homology detection by using global information on the proximity of entities in the protein space. Our method relies on a vectorial representation of proteins and protein families and uses structure-specific association measures between proteins and template structures to form a high-dimensional feature vector for each query protein. These vectors are then processed and transformed into sparse feature vectors that are treated as statistical fingerprints of the query proteins. The new representation induces a new metric between proteins, measured by the statistical difference between their corresponding probability distributions. CONCLUSION: Using several performance measures, we show that the new tool considerably improves the recognition of distant homologies compared with existing approaches such as PSI-BLAST and FUGUE.
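
    A minimal sketch of the representation described above, under stated assumptions: each query protein has one association score per template structure (the actual structure-specific measures are not given here), scores below a hypothetical `cutoff` are dropped to make the vector sparse, the rest are normalized into a probability distribution, and the Jensen-Shannon divergence stands in for the "statistical difference" between two distributions. Template names and function names are illustrative only.

```python
import math


def to_profile(scores, cutoff):
    """Sparse probability vector from association scores to template structures.

    Scores below `cutoff` are dropped (hypothetical sparsification rule);
    the remaining scores are normalized to sum to 1.
    """
    kept = {t: s for t, s in scores.items() if s >= cutoff}
    total = sum(kept.values())
    return {t: s / total for t, s in kept.items()} if total > 0 else {}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range 0..1) between two sparse distributions."""
    div = 0.0
    for t in set(p) | set(q):
        pt, qt = p.get(t, 0.0), q.get(t, 0.0)
        mt = 0.5 * (pt + qt)  # mixture distribution at template t
        if pt > 0:
            div += 0.5 * pt * math.log2(pt / mt)
        if qt > 0:
            div += 0.5 * qt * math.log2(qt / mt)
    return div
```

Two proteins whose profiles concentrate on the same templates get a small divergence even when their direct pairwise alignment score is weak, which is the kind of collective signal the abstract refers to.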

    Protein family comparison using statistical models and predicted structural information

    BACKGROUND: This paper presents a simple method to increase the sensitivity of protein family comparisons by incorporating secondary structure (SS) information. We build on the effective information-theoretic approach to profile-profile comparison described in [Yona & Levitt 2002]. Our method augments profile columns with PSIPRED secondary structure predictions and assesses statistical similarity using information-theoretic principles. RESULTS: Our tests show that this tool detects more similarities between distantly homologous protein families than the previous method based on primary sequence alone. A much larger improvement in performance is observed when the actual secondary structure is used instead of predictions. CONCLUSIONS: Integrating primary and secondary structure information can substantially improve the detection of relationships between remotely related protein families.
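
    As a simplified stand-in, not the scoring function of [Yona & Levitt 2002] or of this paper, the sketch below augments a 20-residue profile column with a three-state secondary structure distribution (helix, strand, coil, as produced by predictors such as PSIPRED) and scores two columns with 1 minus the Jensen-Shannon divergence. The weight `w` given to the secondary structure part and the function names are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SS_STATES = "HEC"  # helix, strand, coil


def augment(aa_probs, ss_probs, w=0.3):
    """Concatenate a residue-profile column with a 3-state SS distribution.

    `w` (hypothetical) is the share of probability mass given to the SS part,
    so the augmented column still sums to 1 if both inputs do.
    """
    col = [(1.0 - w) * aa_probs.get(a, 0.0) for a in AMINO_ACIDS]
    col += [w * ss_probs.get(s, 0.0) for s in SS_STATES]
    return col


def column_similarity(c1, c2):
    """1 - Jensen-Shannon divergence (base 2): 1 for identical columns, 0 for disjoint ones."""
    js = 0.0
    for p, q in zip(c1, c2):
        m = 0.5 * (p + q)
        if p > 0:
            js += 0.5 * p * math.log2(p / m)
        if q > 0:
            js += 0.5 * q * math.log2(q / m)
    return 1.0 - js
```

Two columns with identical residue preferences but different predicted states (helix versus strand) now score below 1, which is the extra discrimination the augmentation is meant to provide.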

    Efficient protein alignment algorithm for protein search

    © 2010 Lu et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License

    BIOZON: a hub of heterogeneous biological data

    Biological entities are strongly interrelated and mutually dependent. Therefore, there is a growing need to corroborate and integrate data from different resources and aspects of biological systems in order to analyze them effectively. Biozon is a unified biological database that integrates heterogeneous data types such as proteins, structures, domain families, protein–protein interactions and cellular pathways, and establishes the relationships between them. All data are integrated onto a single graph schema centered around the non-redundant set of biological objects that are shared by each source. This integration results in a highly connected graph structure that provides a more complete picture of the known context of a given object than can be determined from any one source. Currently, Biozon integrates roughly 2 million protein sequences, 42 million DNA or RNA sequences, 32 000 protein structures, 150 000 interactions and more from sources such as GenBank, UniProt, the Protein Data Bank (PDB) and BIND. Biozon augments source data with locally derived data such as 5 billion pairwise protein alignments and 8 million structural alignments. The user may form complex cross-type queries on the graph structure, add similarity relations to form fuzzy queries, and rank the results based on an analysis of the edge structure similar to Google PageRank, online at
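
    The abstract only says that Biozon ranks results by analyzing the edge structure in a way "similar to Google PageRank". For orientation, here is a minimal power-iteration PageRank over a small directed graph; it is the textbook algorithm, not Biozon's ranking, and the node names used with it are hypothetical.

```python
def pagerank(edges, damping=0.85, iters=100, tol=1e-12):
    """Power-iteration PageRank over a directed graph given as {node: [targets]}."""
    nodes = set(edges) | {v for targets in edges.values() for v in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleportation term shared equally by every node
        new = {v: (1.0 - damping) / n for v in nodes}
        # Mass held by nodes with no outgoing edges is spread uniformly
        dangling = sum(rank[v] for v in nodes if not edges.get(v))
        for v in nodes:
            new[v] += damping * dangling / n
        # Each node passes its rank equally along its outgoing edges
        for u, targets in edges.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

Because dangling mass is redistributed rather than lost, the scores always sum to 1; an object that many other objects point to (for example a structure linked from several sequences) ends up ranked highest.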

    Cloning and characterization of a pectin lyase gene from Colletotrichum lindemuthianum and comparative phylogenetic/structural analyses with genes from phytopathogenic and saprophytic/opportunistic microorganisms

    BACKGROUND: Microorganisms produce cell-wall-degrading enzymes as part of their strategies for plant invasion and nutrition. Among these, pectin lyases (PNLs) catalyze the depolymerization of esterified pectin by a β-elimination mechanism. PNLs are grouped together with pectate lyases (PL) in Family 1 of the polysaccharide lyases, as they share a conserved parallel β-helix structure. The best-characterized fungal pectin lyases are those from saprophytic/opportunistic fungi of the genera Aspergillus and Penicillium and from some pathogens such as Colletotrichum gloeosporioides. The organism used in the present study, Colletotrichum lindemuthianum, is a phytopathogenic fungus that can be subdivided into physiological races with different capacities to infect its host, Phaseolus vulgaris. These include the non-pathogenic and pathogenic strains known as races 0 and 1472, respectively. RESULTS: Here we report the isolation and sequence analysis of the Clpnl2 gene, which encodes pectin lyase 2 of C. lindemuthianum, and its expression in pathogenic and non-pathogenic races of C. lindemuthianum grown on different carbon sources. In addition, we performed a phylogenetic analysis of the deduced amino acid sequence of Clpnl2 based on reported PNL sequences from other sources, and compared the three-dimensional structure of Clpnl2, as predicted by homology modeling, with those of other organisms. Both analyses revealed an early separation of bacterial pectin lyases from those found in fungi and oomycetes. Furthermore, two groups could be distinguished among the enzymes from fungi and oomycetes: one comprising enzymes mostly from saprophytic/opportunistic fungi and the other formed mainly by enzymes from pathogenic fungi and oomycetes. Clpnl2 fell into the latter group, together with the pectin lyase from C. gloeosporioides. CONCLUSIONS: The Clpnl2 gene of C. lindemuthianum shares the characteristic elements of genes encoding pectin lyases. A time-course analysis revealed significant differences between the two fungal races in the expression of Clpnl2. According to the results, pectin lyases from bacteria and fungi separated early during evolution. Likewise, the enzymes from fungi and oomycetes diverged in accordance with their differing lifestyles. It is possible that the diversity and nature of the carbon substrates assimilated by these organisms played a decisive role in this phenomenon.

    New Algorithms for Protein Structure Comparison and Protein Structure Prediction

    Proteins adopt a great variety of 3D conformations, which can be used to infer their evolutionary relationships and to classify them into more general groups; algorithms for protein structure alignment, protein similarity search and protein structure prediction are therefore very helpful to protein biologists. We developed new algorithms for problems in this field. The algorithms were tested on structures from the Protein Data Bank (PDB) and the SCOP (Structural Classification of Proteins) database. The experimental results show that our tools are more efficient than some well-known systems at finding similar protein structures and predicting protein structures.

    BIOZON: a system for unification, management and analysis of heterogeneous biological data

    BACKGROUND: Integration of heterogeneous data types is a challenging problem, especially in biology, where the number of databases and data types is increasing rapidly. Among the problems one has to face are integrity, consistency, redundancy, connectivity, expressiveness and updatability. DESCRIPTION: Here we present a system (Biozon) that addresses these problems and offers biologists a new knowledge resource to navigate and explore. Biozon unifies multiple biological databases consisting of a variety of data types (such as DNA sequences, proteins, interactions and cellular pathways). It is fundamentally different from previous efforts in that it uses a single extensive and tightly connected graph schema wrapped with a hierarchical ontology of documents and relations. Beyond warehousing existing data, Biozon computes and stores novel derived data, such as similarity relationships and functional predictions. The integration of similarity data allows knowledge to propagate through inference and fuzzy searches. Sophisticated query methods that span multiple data types were implemented, and first-of-a-kind biological ranking systems were explored and integrated. CONCLUSION: The Biozon system is an extensive knowledge resource of heterogeneous biological data. Currently, it holds more than 100 million biological documents and 6.5 billion relations between them. The database is accessible through an advanced web interface that supports complex queries, "fuzzy" searches, data materialization and more, online at
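
    To illustrate how stored similarity relations can turn an exact query into a "fuzzy" one, here is a hypothetical sketch: a tiny typed graph in which a pathway query returns its members plus objects linked to them by a similarity edge at or above a score threshold. The relation names (`member_of`, `similar_to`), the undirected treatment of edges and the threshold are assumptions; Biozon's actual schema and query interface are not shown in this abstract.

```python
from collections import defaultdict


class Graph:
    """Tiny typed graph: each edge carries a relation name and an optional score."""

    def __init__(self):
        self.out = defaultdict(list)  # node -> [(relation, other_node, score)]

    def add(self, src, rel, dst, score=None):
        # Relations are stored in both directions (an assumption of this sketch)
        self.out[src].append((rel, dst, score))
        self.out[dst].append((rel, src, score))

    def neighbors(self, node, rel, min_score=None):
        for r, other, s in self.out[node]:
            if r != rel:
                continue
            if min_score is None or (s is not None and s >= min_score):
                yield other


def fuzzy_members(g, pathway, min_score):
    """Exact members of `pathway`, plus objects similar to a member (score >= min_score)."""
    exact = set(g.neighbors(pathway, "member_of"))
    similar = {n for p in exact for n in g.neighbors(p, "similar_to", min_score)}
    return exact, similar - exact
```

Raising `min_score` makes the fuzzy part of the answer stricter; setting it very high reduces the query to the exact one.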

    Efficient algorithms and architectures for protein 3-D structure comparison

    Protein Structure Comparison (PSC) is a well-developed field of computational proteomics that attracts active interest because it is widely used in structural biology and drug discovery. The rapidly increasing computational demand of all-to-all protein structure comparison results mainly from three factors: rapidly expanding structural proteomics databases, the high computational complexity of pairwise PSC algorithms, and the trend toward using multiple criteria for comparison and combining their results (multi-criteria PSC, MCPSC), since no single PSC method is commonly accepted as the best. In this thesis we have developed a software framework that exploits many-core and multi-core CPUs to implement efficient parallel MCPSC schemes on modern processors, based on three popular PSC methods, namely TMalign, CE and USM. We evaluate and compare the performance and efficiency of two parallel MCPSC implementations on Intel's experimental many-core Single-Chip Cloud Computer (SCC), a network-on-chip processor, and on Intel's Core i7 multi-core processor. Further, we have developed a dataset processing pipeline and implemented it in a Python utility called pyMCPSC, which allows users to perform MCPSC efficiently on multi-core CPUs. pyMCPSC, which combines five PSC methods and five different consensus scoring schemes, facilitates the analysis of similarities in protein domain datasets and can easily be extended to incorporate more PSC methods into the consensus scoring as they become available.
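
    A minimal sketch of multi-criteria consensus scoring, assuming one simple scheme (the mean of per-method min-max-normalized scores); the abstract does not name pyMCPSC's five schemes, and none of pyMCPSC's actual interface is reproduced here. The two scoring functions are toy stand-ins over secondary-structure strings; in a real pipeline each method would typically wrap an external program such as TMalign or CE, which is why a thread pool is enough to keep several cores busy. For CPU-bound scoring written in Python, a process pool would be the better choice.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def length_similarity(s1, s2):
    """Toy stand-in for a PSC method: similarity from length difference."""
    return 1.0 / (1.0 + abs(len(s1) - len(s2)))


def composition_similarity(s1, s2):
    """Toy stand-in for a PSC method: Jaccard overlap of symbols."""
    return len(set(s1) & set(s2)) / len(set(s1) | set(s2))


def min_max(scores):
    """Rescale one method's scores to [0, 1] so different methods are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 1.0 for k, v in scores.items()}


def run_method(method, structures):
    """All-to-all comparison with one method, over pairs in a fixed order."""
    return {(a, b): method(structures[a], structures[b])
            for a, b in combinations(sorted(structures), 2)}


def consensus(methods, structures, workers=4):
    """Mean of min-max-normalized scores across methods (one possible scheme)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_method = list(pool.map(lambda m: run_method(m, structures), methods))
    normalized = [min_max(s) for s in per_method]
    return {pair: sum(s[pair] for s in normalized) / len(normalized)
            for pair in normalized[0]}
```

Normalizing each method before averaging keeps a method with a wide score range from dominating the consensus; rank-based schemes are a common alternative when score distributions differ strongly.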