Search CORE

686 research outputs found

InterPro, progress and status in 2005

InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro)
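The per-type counts quoted for release 8.0 can be checked against the stated total of 11 007 entries. The following is a minimal sketch that uses only the figures given in the abstract; the variable names are illustrative and not part of any InterPro tooling.

```python
# Entry counts by type, as quoted in the abstract for InterPro release 8.0.
entry_counts = {
    "domain": 2573,
    "family": 8166,
    "repeat": 201,
    "active site": 26,
    "binding site": 21,
    "post-translational modification site": 20,
}

# The per-type counts should add up to the total reported in the abstract.
total = sum(entry_counts.values())

# Share of each entry type, as a percentage of all entries (one decimal place).
shares = {name: round(100 * count / total, 1) for name, count in entry_counts.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The six counts sum to the reported total, and families account for roughly three quarters of all entries.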

HAL Descartes

The University of Manchester - Institutional Repository

ProdInra

Hal-Diderot

Archive ouverte UNIGE

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

PubMed Central

Open Research Exeter

Oxford University Research Archive

MDC Repository

Explore Bristol Research

Recommended from our members

InterPro in 2019: improving coverage, classification and access to protein sequence annotations

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities

Apollo (Cambridge)

Archivio istituzionale della ricerca - Università di Padova

InterPro in 2022.

Author: Bateman Alex
Bileschi Maxwell L
Blum Matthias
Bork Peer
Bridge Alan
Chuguransky Sara
Colwell Lucy
Gough Julian
Grego Tiago
Haft Daniel H
Letunić Ivica
Marchler-Bauer Aron
Mi Huaiyu
Natale Darren A
Orengo Christine A
Pandurangan Arun P
Paysan-Lafosse Typhaine
Pinto Beatriz Lázaro
Rivoire Catherine
Salazar Gustavo A
Sigrist Christian JA
Sillitoe Ian
Thanki Narmada
Thomas Paul D
Tosatto Silvio CE
Wu Cathy H
Publication venue: Oxford University Press (OUP)
Publication date: 09/11/2022

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction

UCL Discovery

The Universal Protein Resource (UniProt)

Author: Apweiler Rolf
Bairoch Amos
Barker Winona C.
Boeckmann Brigitte
Ferro Serenella
Gasteiger Elisabeth
Huang Hongzhan
Lopez Rodrigo
Magrane Michele
Martin Maria J.
Natale Darren A.
O'Donovan Claire
Redaschi Nicole
Wu Cathy H.
Yeh Lai-Su L.
Publication venue: Oxford University Press
Publication date: 17/12/2004

The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records were manually annotated or updated; we introduced a new comment line topic, TOXIC DOSE, to store information on the acute toxicity of a toxin; the UniProt keyword list was augmented with additional keywords; we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file of the strains and their synonyms. Many new database cross-references were introduced and we started to make use of Digital Object Identifiers. In collaboration with the Macromolecular Structure Database group at EBI, we also achieved improved integration with structural databases through residue-level mapping of sequences from the Protein Data Bank entries onto corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every two weeks

Crossref

PubMed Central

Archive ouverte UNIGE

New and continuing developments at PROSITE.

Author: Bougueleret L.
Bridge A.
Cerutti L.
Cuche B.A.
de Castro E.
Hulo N.
Sigrist C.J.
Xenarios I.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2013

PROSITE (http://prosite.expasy.org/) consists of documentation entries describing protein domains, families and functional sites, as well as associated patterns and profiles to identify them. It is complemented by ProRule, a collection of rules that increases the discriminatory power of these profiles and patterns by providing additional information about functionally and/or structurally critical amino acids. PROSITE signatures, together with ProRule, are used for the annotation of domains and features of UniProtKB/Swiss-Prot entries. Here, we describe recent developments that allow users to perform whole-proteome annotation as well as a number of filtering options that can be combined to perform powerful targeted searches for biological discovery. The latest version of PROSITE (release 20.85, of 30 August 2012) contains 1308 patterns, 1039 profiles and 1041 ProRules
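To illustrate what a PROSITE pattern looks like in practice, the sketch below translates the basic pattern syntax into a Python regular expression and scans a sequence with it. The function name `prosite_to_regex` and the test sequence are hypothetical and are not part of PROSITE's own software. The sketch assumes only the core syntax (`x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repetition, and `<` / `>` as terminal anchors outside brackets) and does not cover profiles or ProRule.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE-style pattern into a Python regular expression.

    Hypothetical helper: anchors written inside square brackets are not handled.
    """
    pattern = pattern.strip().rstrip(".")  # patterns conventionally end with a period
    parts = []
    for element in pattern.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Split off an optional repetition such as (3) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":  # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):  # excluded residues
            core = "[^" + core[1:-1] + "]"
        # Bracketed sets like [ST] and single letters are already valid regex.
        quantifier = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + core + quantifier + suffix)
    return "".join(parts)

# PS00001 (N-glycosylation site) is written N-{P}-[ST]-{P} in PROSITE syntax.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
sequence = "AANGSAANPTA"  # made-up sequence for illustration
hits = [(m.start(), m.group()) for m in re.finditer(regex, sequence)]
print(regex, hits)
```

Note that `re.finditer` reports non-overlapping matches only; a scanner that must report overlapping sites would need a lookahead-based search instead.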

CiteSeerX

Serveur académique lausannois

Statistical analysis of genomic protein family and domain controlled annotations for functional investigation of classified gene lists

Author: Bellistri Elisa
Franceschini Andrea
Masseroli Marco
Pinciroli Francesco
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: The increasing protein family and domain based annotations constitute important information to understand protein functions and gain insight into relations among their codifying genes. To allow analysis of gene proteomic annotations, we implemented novel modules within GFINDer, a Web system we previously developed that dynamically aggregates functional and phenotypic annotations of user-uploaded gene lists and allows performing their statistical analysis and mining. Results: Exploiting protein information in Pfam and InterPro databanks, we developed and added in GFINDer original modules specifically devoted to the exploration and analysis of functional signatures of gene protein products. They allow annotating numerous user-classified nucleotide sequence identifiers with controlled information on related protein families, domains and functional sites, classifying them according to such protein annotation categories, and statistically analyzing the obtained classifications. In particular, when uploaded nucleotide sequence identifiers are subdivided in classes, the Statistics Protein Families&Domains module allows estimating relevance of Pfam or InterPro controlled annotations for the uploaded genes by highlighting protein signatures significantly more represented within user-defined classes of genes. In addition, the Logistic Regression module allows identifying protein functional signatures that better explain the considered gene classification. Conclusion: Novel GFINDer modules provide genomic protein family and domain analyses supporting better functional interpretation of gene classes, for instance defined through statistical and clustering analyses of gene expression results from microarray experiments. They can hence help in understanding fundamental biological processes and complex cellular mechanisms influenced by protein domain composition, and contribute to unveiling new biomedical knowledge about the codifying genes.
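The abstract does not name the statistical test used to find signatures that are "significantly more represented" within a gene class. One common choice for this kind of over-representation question is a one-sided hypergeometric test, sketched below as an assumption rather than as GFINDer's actual method. The function name, parameter names and example numbers are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k: int, class_size: int, annotated_total: int, population: int) -> float:
    """Return P(X >= k) for a hypergeometric draw.

    population      -- total number of uploaded genes
    annotated_total -- genes in the population carrying the annotation
    class_size      -- genes in the user-defined class
    k               -- genes in the class carrying the annotation
    """
    upper = min(class_size, annotated_total)
    favourable = sum(
        comb(annotated_total, i) * comb(population - annotated_total, class_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(population, class_size)

# Made-up example: 10 genes in total, 5 annotated with a given domain,
# and a class of 5 genes in which all 5 carry the annotation.
p_value = hypergeom_upper_tail(k=5, class_size=5, annotated_total=5, population=10)
print(p_value)
```

A small tail probability suggests the annotation is enriched in the class; in practice a correction for testing many annotations at once would also be needed.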

Archivio istituzionale della ricerca - Politecnico di Milano

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

New and continuing developments at PROSITE

Author: Bougueleret Lydie
Bridge Alan
Cerutti Lorenzo
Cuche Béatrice A.
de Castro Edouard
Hulo Nicolas
Sigrist Christian J. A.
Xenarios Ioannis
Publication date: 02/08/2017

RERO DOC Digital Library

The Proteome Analysis database: a tool for the in silico analysis of whole proteomes

Author: Apweiler Rolf
Fleischmann Wolfgang
Kanapin Alexander
Karavidopoulou Youla
Kersey Paul
Kriventseva Evgenia
Mittard Virginie
Mulder Nicola
Phan Isabelle
Pruess Manuela
Servant Florence
Publication date: 02/08/2017

The Proteome Analysis database (http://www.ebi.ac.uk/proteome/) has been developed by the Sequence Database Group at EBI utilizing existing resources and providing comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes. Three main projects, InterPro, CluSTr and GO Slim, are used to give an overview of the families, domains, sites and functions of the proteins from each of the complete genomes. Complete proteome analysis is available for a total of 89 proteome sets. A specifically designed application enables InterPro proteome comparisons for any one proteome against one or more of the other proteomes in the database

RERO DOC Digital Library

The PRINTS database: a fine-grained protein sequence annotation and analysis resource—its status in 2012

Author: A. Coletta
A. L. Mitchell
A. Pavlopoulou
A. Theodosiou
C. Roma-Mateo
G. Muirhead
I. Popov
P. B. Philippou
T. K. Attwood
Publication venue: Oxford University Press
Publication date: 01/01/2012

The PRINTS database, now in its 21st year, houses a collection of diagnostic protein family ‘fingerprints’. Fingerprints are groups of conserved motifs, evident in multiple sequence alignments, whose unique inter-relationships provide distinctive signatures for particular protein families and structural/functional domains. As such, they may be used to assign uncharacterized sequences to known families, and hence to infer tentative functional, structural and/or evolutionary relationships. The February 2012 release (version 42.0) includes 2156 fingerprints, encoding 12 444 individual motifs, covering a range of globular and membrane proteins, modular polypeptides and so on. Here, we report the current status of the database, and introduce a number of recent developments that help both to render a variety of our annotation and analysis tools easier to use and to make them more widely available

Crossref

PubMed Central

The University of Manchester - Institutional Repository

Dokuz Eylul University Research Information System