SIMAP: the similarity matrix of proteins

By Thomas Rattei, Roland Arnold, Patrick Tischler, Dominik Lindner, Volker Stümpflen and H. Werner Mewes

Abstract

Similarity Matrix of Proteins (SIMAP) provides a database based on a pre-computed similarity matrix covering the similarity space formed by >4 million amino acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and is updated incrementally. For sequence similarity searches and pairwise alignments, we implemented a grid-enabled software system, which is based on FASTA heuristics and the Smith–Waterman algorithm. Our ProtInfo system allows querying by protein sequences covered by the SIMAP dataset as well as by fragments of these sequences, highly similar sequences and title words. Each sequence in the database is supplemented with pre-calculated features generated by detailed sequence analyses. By providing WWW interfaces as well as web-services, we offer the SIMAP resource as an efficient and comprehensive tool for sequence similarity searches.

Topics: Article
Publisher: Oxford University Press
OAI identifier: oai:pubmedcentral.nih.gov:1347468
Provided by: PubMed Central

