89 research outputs found

SIMAP—structuring the network of protein similarities

Author: B. Wachinger
Bendtsen
Benson
Deshpande
Emanuelsson
Enright
F. Hamberger
Henikoff
J. Krebs
J. Krumsiek
Kaplan
Kriventseva
Krogh
Mulder
P. Tischler
Pruitt
R. Arnold
Schmidt
T. Rattei
V. Stumpflen
W. Mewes
Wu
Publication venue: Oxford University Press
Publication date: 01/01/2008

Protein sequences are the most important source of evolutionary and functional information for new proteins. In order to facilitate the computationally intensive tasks of sequence analysis, the Similarity Matrix of Proteins (SIMAP) database aims to provide a comprehensive and up-to-date dataset of the pre-calculated sequence similarity matrix and sequence-based features like InterPro domains for all proteins contained in the major public sequence databases. As of September 2007, SIMAP covers ∼17 million proteins and more than 6 million non-redundant sequences and provides a complete annotation based on InterPro 16. Novel features of SIMAP include a new, portlet-based web portal providing multiple, structured views on retrieved proteins and integration of protein clusters and a unique search method for similar domain architectures. Access to SIMAP is freely provided for academic use through the web portal for individuals at http://mips.gsf.de/simap/ and through Web Services for programmatic access at http://mips.gsf.de/webservices/services/SimapService2.0?wsdl

Crossref

University of Birmingham Research Portal

PubMed Central

PuSH

SIMAP: the similarity matrix of proteins

Author: Arnold Roland
Lindner Dominik
Mewes H. Werner
Rattei Thomas
Stümpflen Volker
Tischler Patrick
Publication venue: Oxford University Press
Publication date: 01/01/2005

Similarity Matrix of Proteins (SIMAP) provides a database based on a pre-computed similarity matrix covering the similarity space formed by >4 million amino acid sequences from public databases and completely sequenced genomes. The database is capable of handling very large datasets and is updated incrementally. For sequence similarity searches and pairwise alignments, we implemented a grid-enabled software system, which is based on FASTA heuristics and the Smith–Waterman algorithm. Our ProtInfo system allows querying by protein sequences covered by the SIMAP dataset as well as by fragments of these sequences, highly similar sequences and title words. Each sequence in the database is supplemented with pre-calculated features generated by detailed sequence analyses. By providing WWW interfaces as well as web-services, we offer the SIMAP resource as an efficient and comprehensive tool for sequence similarity searches.
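
To make the idea of a pre-computed all-against-all similarity matrix concrete, here is a minimal sketch in Python. It is not the SIMAP implementation: the function names, the match/mismatch/gap values and the linear gap penalty are illustrative assumptions, and the FASTA pre-filtering heuristic is omitted. It only shows how Smith–Waterman local alignment scores could fill a symmetric matrix for a small set of sequences.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of strings a and b.

    Assumption: simple match/mismatch scoring with a linear gap penalty;
    a production system would use a substitution matrix and tuned gap costs.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity_matrix(seqs):
    """Symmetric all-against-all score matrix (hypothetical helper)."""
    n = len(seqs)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            score = smith_waterman_score(seqs[i], seqs[j])
            matrix[i][j] = matrix[j][i] = score
    return matrix


if __name__ == "__main__":
    toy = ["AAMKVAA", "TTMKVTT", "MKV"]
    for row in similarity_matrix(toy):
        print(row)
```

Because the score is symmetric, only the upper triangle is computed; once stored, later queries become lookups rather than fresh alignments, which is the point of pre-calculation.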

Crossref

University of Birmingham Research Portal

PubMed Central

PuSH

SIMAP--the database of all-against-all protein sequence similarities and annotations with new interfaces and increased coverage

Author: Arnold Roland
Goldenberg Florian
Mewes Hans-Werner
Rattei Thomas
Publication venue: Oxford University Press (OUP)
Publication date: 26/10/2013

The Similarity Matrix of Proteins (SIMAP, http://mips.gsf.de/simap/) database has been designed to massively accelerate computationally expensive protein sequence analysis tasks in bioinformatics. It provides pre-calculated sequence similarities interconnecting the entire known protein sequence universe, complemented by pre-calculated protein features and domains, similarity clusters and functional annotations. SIMAP covers all major public protein databases as well as many consistently re-annotated metagenomes from different repositories. As of September 2013, SIMAP contains >163 million proteins corresponding to ∼70 million non-redundant sequences. SIMAP uses the sensitive FASTA search heuristics, the Smith–Waterman alignment algorithm, the InterPro database of protein domain models and the BLAST2GO functional annotation algorithm. SIMAP assists biologists by facilitating the interactive exploration of the protein sequence universe. Web-Service and DAS interfaces allow connecting SIMAP with any other bioinformatic tool and resource. All-against-all protein sequence similarity matrices of project-specific protein collections are generated on request. Recent improvements allow SIMAP to cover the rapidly growing sequenced protein sequence universe. New Web-Service interfaces enhance the connectivity of SIMAP. Novel tools for interactive extraction of protein similarity networks have been added. Open access to SIMAP is provided through the web portal; the portal also contains instructions and links for software access and flat file downloads

University of Birmingham Research Portal

PubMed Central

PuSH

A simple stochastic model for the evolution of protein lengths

Author: A. Meir
C. A. Voigt
C. Destri
C. Miccio
R. L. Graham
T. Ohta
Publication venue: American Physical Society (APS)
Publication date: 26/03/2007

We analyse a simple discrete-time stochastic process for the theoretical modeling of the evolution of protein lengths. At every step of the process a new protein is produced as a modification of one of the proteins already existing, and its length is assumed to be a random variable which depends only on the length of the originating protein. Thus a Random Recursive Tree (RRT) is produced over the natural integers. If (quasi) scale invariance is assumed, the length distribution in a single history tends to a lognormal form with a specific signature of the deviations from exact gaussianity. Comparison with the very large SIMAP protein database shows good agreement. Comment: 12 pages, 4 figures
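
The process described in this abstract can be sketched as a short simulation. This is a hedged illustration, not the authors' model specification: the starting length, the lognormal multiplicative factor and its width `sigma` are assumptions chosen to mimic scale invariance (the relative change does not depend on the parent's length), and all function names are hypothetical.

```python
import math
import random


def simulate_lengths(n_proteins, initial_length=300, sigma=0.3, seed=0):
    """Grow a random recursive tree of protein lengths.

    Each new protein copies a uniformly chosen existing protein and
    rescales its length by a random factor (assumed lognormal here).
    """
    rng = random.Random(seed)
    lengths = [initial_length]
    for _ in range(n_proteins - 1):
        parent = rng.choice(lengths)              # uniform choice of ancestor
        factor = math.exp(rng.gauss(0.0, sigma))  # scale-invariant change
        lengths.append(max(1, round(parent * factor)))  # stay on the naturals
    return lengths


def log_length_stats(lengths):
    """Mean and variance of log-lengths (hypothetical helper)."""
    logs = [math.log(x) for x in lengths]
    mean = sum(logs) / len(logs)
    var = sum((v - mean) ** 2 for v in logs) / len(logs)
    return mean, var


if __name__ == "__main__":
    sample = simulate_lengths(5000, seed=1)
    print(log_length_stats(sample))
```

Under these assumptions the logarithm of a protein's length is a sum of independent increments along its ancestry in the tree, which is why a roughly lognormal shape is expected; a histogram of the log-lengths is the natural way to inspect it.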

arXiv.org e-Print Archive

Crossref

Gene3D: comprehensive structural and functional annotation of genomes

Author: A. Reid
Altschul
Ashburner
Berman
C. Orengo
C. Yeats
Eddy
Finn
Gough
Guldener
J. Lees
Kanehisa
Kersey
Lee
Marsden
Mulder
Murzin
N. Martin
P. Kellam
Rattei
Rost
Ruepp
Tatusov
Tian
X. Liu
Publication venue: Oxford University Press (OUP)
Publication date: 27/10/2007

Gene3D provides comprehensive structural and functional annotation of most available protein sequences, including the UniProt, RefSeq and Integr8 resources. The main structural annotation is generated through scanning these sequences against the CATH structural domain database profile-HMM library. CATH is a database of manually derived PDB-based structural domains, placed within a hierarchy reflecting topology, homology and conservation and is able to infer more ancient and divergent homology relationships than sequence-based approaches. This data is supplemented with Pfam-A, other non-domain structural predictions (i.e. coiled coils) and experimental data from UniProt. In order to enhance the investigations possible with this data, we have also incorporated a variety of protein annotation resources, including protein–protein interaction data, GO functional assignments, KEGG pathways, FUNCAT functional descriptions and links to microarray expression data. All of this data can be accessed through a newly re-designed website that has a focus on flexibility and clarity, with searches that can be restricted to a single genome or across the entire sequence database. Currently Gene3D contains over 3.5 million domain assignments for nearly 5 million proteins including 527 completed genomes. This is available at: http://gene3d.biochem.ucl.ac.uk

Crossref

UCL Discovery

PubMed Central

Birkbeck Institutional Research Online

The Australian National University

Spiral - Imperial College Digital Repository

PEDANT covers all complete RefSeq genomes

Author: Arnold Roland
Frishman Dmitrij
Güldener Ulrich
Jost Ralf
Kastenmüller Gabi
Mewes Hans-Werner
Münsterkötter Martin
Nenova Karamfilka
Pongratz Norbert
Rattei Thomas
Tischler Patrick
Volz Andreas
Walter Mathias C.
Wölling Andreas
Publication venue: Oxford University Press
Publication date: 01/01/2008

The PEDANT genome database provides exhaustive annotation of nearly 3000 publicly available eukaryotic, eubacterial, archaeal and viral genomes with more than 4.5 million proteins by a broad set of bioinformatics algorithms. In particular, all completely sequenced genomes from the NCBI's Reference Sequence collection (RefSeq) are covered. The PEDANT processing pipeline has been sped up by an order of magnitude through the utilization of precalculated similarity information stored in the similarity matrix of proteins (SIMAP) database, making it possible to process newly sequenced genomes immediately as they become available. PEDANT is freely accessible to academic users at http://pedant.gsf.de. For programmatic access Web Services are available at http://pedant.gsf.de/webservices.jsp

University of Birmingham Research Portal

PubMed Central

PuSH

Explosive Percolation in the Human Protein Homology Network

Author: A. Garas
D. Medini
F. Radicchi
H. A. Makse
H. D. Rozenfeld
H.D. Rozenfeld
J. Spencer
L. K. Gallos
M.A. Serrano
M.S. Granovetter
R. Cohen
R.M. Ziff
S.N. Dorogovtsev
T. Bohman
T. Kamada
T.F. Smith
Y.S. Cho
Publication venue: Springer Science and Business Media LLC
Publication date: 20/11/2009

We study the explosive character of the percolation transition in a real-world network. We show that the emergence of a spanning cluster in the Human Protein Homology Network (H-PHN) exhibits similar features to an Achlioptas-type process and is markedly different from regular random percolation. The underlying mechanism of this transition can be described by slow-growing clusters that remain isolated until the later stages of the process, when the addition of a small number of links leads to the rapid interconnection of these modules into a giant cluster. Our results indicate that the evolutionary-based process that shapes the topology of the H-PHN through duplication-divergence events may occur in sudden steps, similarly to what is seen in first-order phase transitions. Comment: 13 pages, 6 figures
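
For readers unfamiliar with the comparison drawn in this abstract, the sketch below contrasts regular random percolation with a generic Achlioptas-type process (the product rule) on an initially empty graph. It does not reproduce the paper's analysis of the H-PHN; the node count, edge count, class and function names are illustrative assumptions.

```python
import random


class DisjointSet:
    """Union-find that tracks the size of the largest cluster."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.largest = 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.largest = max(self.largest, self.size[ra])


def largest_cluster_fraction(n, n_edges, rule="random", seed=0):
    """Add n_edges links and return the largest cluster size divided by n.

    rule="random":  every candidate link is accepted.
    rule="product": two candidate links are drawn and the one joining the
                    clusters with the smaller size product is kept.
    """
    rng = random.Random(seed)
    ds = DisjointSet(n)

    def weight(edge):
        return ds.size[ds.find(edge[0])] * ds.size[ds.find(edge[1])]

    for _ in range(n_edges):
        edge = (rng.randrange(n), rng.randrange(n))
        if rule == "product":
            other = (rng.randrange(n), rng.randrange(n))
            if weight(other) < weight(edge):
                edge = other
        ds.union(*edge)
    return ds.largest / n


if __name__ == "__main__":
    n = 5000
    for rule in ("random", "product"):
        print(rule, largest_cluster_fraction(n, int(0.7 * n), rule))
```

At the same number of added links, the product rule keeps clusters small for longer, so the giant cluster appears later and more abruptly than under random percolation; this delayed, sudden merging is the behaviour the abstract calls explosive.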

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Research Papers in Economics

MIPS: analysis and annotation of proteins from whole genomes in 2005

Author: Frishman D.
Mayer K. F. X.
Mewes H. W.
Münsterkötter M.
Noubibou O.
Oesterheld M.
Pagel P.
Rattei T.
Ruepp A.
Stümpflen V.
Publication venue: Oxford University Press
Publication date: 28/12/2005

The Munich Information Center for Protein Sequences (MIPS at the GSF), Neuherberg, Germany, provides resources related to genome information. Manually curated databases for several reference organisms are maintained. Several of these databases are described elsewhere in this and other recent NAR database issues. In a complementary effort, a comprehensive set of >400 genomes automatically annotated with the PEDANT system is maintained. The main goal of our current work on creating and maintaining genome databases is to extend gene-centered information to information on interactions within a generic comprehensive framework. We have concentrated our efforts along three lines: (i) the development of suitable comprehensive data structures and database technology, communication and query tools to include a wide range of different types of information, enabling the representation of complex information such as functional modules or networks (Genome Research Environment System); (ii) the development of databases covering computable information such as the basic evolutionary relations among all genes, namely SIMAP, the sequence similarity matrix, and the CABiNet network analysis framework; and (iii) the compilation and manual annotation of information related to interactions such as protein–protein interactions or other types of relations (e.g. MPCDB, MPPI, CYGD). All databases described and the detailed descriptions of our projects can be accessed through the MIPS WWW server.

B2G-FAR, a species-centered GO annotation repository

Author: Al-Shahrour
Altschul
Ana Conesa
Arnold
Ashburner
Barrell
Camon
Conesa
Espinoza
Götz
Holt
Huerta-Cepas
Joaquín Dopazo
Kersey
Marc-André Jehl
Marti-Renom
Myhre
Patricia Sebastián-León
Patrick Tischler
Quevillon
Rattei
Riley
Roland Arnold
Samuel Martín-Rodríguez
Sjölander
Stefan Götz
The Gene Ontology Consortium
The Uniprot Consortium
Thomas Rattei
Wise
Publication venue: Oxford University Press
Publication date: 01/04/2011

Motivation: Functional genomics research has expanded enormously in the last decade thanks to the cost reduction in high-throughput technologies and the development of computational tools that generate, standardize and share information on gene and protein function such as the Gene Ontology (GO). Nevertheless, many biologists, especially those working with non-model organisms, still suffer from non-existent or low-coverage functional annotation, or simply struggle with retrieving, summarizing and querying these data.

Crossref

University of Birmingham Research Portal

PubMed Central