Search CORE

6 research outputs found

The InterPro protein families and domains database: 20 years on

Author: Bateman A
Blum M
Bork P
Bridge A
Chang H-Y
Chuguransky S
Finn RD
Gough J
Grego T
Haft DH
Kandasaamy S
Letunic I
Marchler-Bauer A
Mi H
Mitchell A
Natale DA
Necci M
Nuka G
Orengo CA
Pandurangan AP
Paysan-Lafosse T
Qureshi M
Raj S
Richardson L
Rivoire C
Salazar GA
Sigrist CJA
Sillitoe I
Thanki N
Thomas PD
Tosatto SCE
Williams L
Wu CH
Publication date: 06/11/2020

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.

UCL Discovery

InterPro in 2019: improving coverage, classification and access to protein sequence annotations

Author: Attwood TK
Babbitt PC
Blum M
Bork P
Bridge A
Brown SD
Chang H-Y
El-Gebali S
Finn RD
Fraser MI
Gough J
Haft DR
Huang H
Letunic I
Lopez R
Luciani A
Madeira F
Marchler-Bauer A
Mi H
Mitchell AL
Natale DA
Necci M
Nuka G
Orengo C
Pandurangan AP
Paysan-Lafosse T
Pesseat S
Potter SC
Qureshi MA
Rawlings ND
Redaschi N
Richardson LJ
Rivoire C
Salazar GA
Sangrador-Vegas A
Sigrist CJA
Sillitoe I
Sutton GG
Thanki N
Thomas PD
Tosatto SCE
Yong S-Y
Publication date: 06/11/2018

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.

UCL Discovery

IMGT®, the international ImMunoGeneTics information system® 25 years on

Author: Aouinti S
Carillon E
Duroux Patrice
Duvergey H
Folch G
Giudicelli V
Hadi-Saljoqi S
Houles A
Jabado-Michaloud J
Kossida S
Lefranc G
Lefranc MP
Paysan-Lafosse T
Sasorith S
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2015

International audience

HAL Descartes

Genome3D: integrating a collaborative data pipeline to expand the depth and breadth of consensus protein structure annotation

Author: Andreeva A
Blundell TL
Buchan DWA
Finn RD
Gough J
Jones D
Kelley LA
Lam SD
Murzin AG
Orengo C
Pandurangan AP
Paysan-Lafosse T
Salazar GA
Sillitoe I
Skwark MJ
Sternberg MJE
Velankar S
Publication venue: Oxford University Press (OUP)
Publication date: 07/11/2019

Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits, including instant validation of data and removing the need to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements make it easier to open Genome3D up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.

UCL Discovery

Spiral - Imperial College Digital Repository

PDBe-KB: a community-driven resource for structural and functional annotations

Author: Al-Lazikani B.
Anyango S.
Armstrong D.
Barton G. J.
Berka K.
Berrisford J.
Blundell T.
Borkakoti N.
Dana J.
Das S.
Deshpande M.
Dey S.
Fernandez E. V.
Fraternali F.
Gibson T.
Gutmanas A.
Helmer-Citterich M.
Hoksza David
Huang L. C.
Jain R.
Jubb H.
Kannan N.
Kannas C.
Koca J.
Krivak R.
Kumar M.
Levy E. D.
MacGowan S.
Madeira F.
Madhusudhan M. S.
Martell H. J.
McGreig J. E.
Micco P. D.
Mir S.
Mukhopadhyay A.
Nair S. S.
Orengo C.
Parca L.
Paysan-Lafosse T.
Pravda L.
Radusky L.
Ribeiro A.
Serrano L.
Sillitoe I.
Singh G.
Skoda P.
Sternberg M.
Svobodova R.
Thornton J.
Tyzack J.
Valencia A.
Varadi M.
Velankar S.
Vranken W.
Wass M.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2019

The Protein Data Bank in Europe-Knowledge Base (PDBe-KB, https://pdbe-kb.org) is a community-driven, collaborative resource for literature-derived, manually curated and computationally predicted structural and functional annotations of macromolecular structure data, contained in the Protein Data Bank (PDB). The goal of PDBe-KB is two-fold: (i) to increase the visibility and reduce the fragmentation of annotations contributed by specialist data resources, and to make these data more findable, accessible, interoperable and reusable (FAIR) and (ii) to place macromolecular structure data in their biological context, thus facilitating their use by the broader scientific community in fundamental and applied research. Here, we describe the guidelines of this collaborative effort, the current status of contributed data, and the PDBe-KB infrastructure, which includes the data exchange format, the deposition system for added value annotations, the distributable database containing the assembled data, and programmatic access endpoints. We also describe a series of novel web pages (the PDBe-KB aggregated views of structure data), which combine information on macromolecular structures from many PDB entries. We have recently released the first set of pages in this series, which provide an overview of available structural and functional information for a protein of interest, referenced by a UniProtKB accession.

Open Repository and Bibliography - Luxembourg

Protein Data Bank: The single global archive for 3D macromolecular structure data

Author: Anyango S.
Armstrong D. R.
Baskaran K.
Bekker G. -J.
Berman H. M.
Berrisford J. M.
Bhikadiya C.
Bi C.
Burley S. K.
Chen L.
Cho H.
Christie C.
Conroy M. J.
Dana J. M.
Deshpande M.
DI COSTANZO Luigi
Duarte J. M.
Dutta S.
Feng Z.
Fujiwara T.
Gaborova R.
Gane P.
Ghosh S.
Goodsell D. S.
Green R. K.
Gupta D.
Guranovic V.
Gutmanas A.
Guzenko D.
Hoch J. C.
Hudson B. P.
Ikegawa Y.
Ioannidis Y. E.
Iwata T.
Kengaku Y.
Kim J. Y.
Kleywegt G. J.
Kobayashi N.
Koca J.
Kudou T.
Kurisu G.
Liang Y.
Livny M.
Lowe R.
Mak L.
Markley J. L.
Maziuk D.
Mir S.
Mukhopadhyay A.
Nadzirin N.
Nair S.
Nakagawa A.
Nakamura H.
Patwardhan A.
Paysan-Lafosse T.
Peisach E.
Periskova I.
Pravda L.
Randle C.
Romero P. R.
Rose A.
Salih O.
Sato J.
Sehnal D.
Sekharan M.
Shao C.
Sicong Yao
Suzuki H.
Tao Y. -P.
Ulrich E. L.
Valasatava Y.
Varadi M.
Varekova R.
Velankar S.
Voigt M.
Wedell J. R.
Westbrook J.
Yamashita R.
Yokochi M.
Young J.
Zardecki C.
Zhuravleva M.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2019

The Protein Data Bank (PDB) is the single global archive of experimentally determined three-dimensional (3D) structure data of biological macromolecules. Since 2003, the PDB has been managed by the Worldwide Protein Data Bank (wwPDB; wwpdb.org), an international consortium that collaboratively oversees deposition, validation, biocuration, and open access dissemination of 3D macromolecular structure data. The PDB Core Archive houses 3D atomic coordinates of more than 144 000 structural models of proteins, DNA/RNA, and their complexes with metals and small molecules and related experimental data and metadata. Structure and experimental data/metadata are also stored in the PDB Core Archive using the readily extensible wwPDB PDBx/mmCIF master data format, which will continue to evolve as data/metadata from new experimental techniques and structure determination methods are incorporated by the wwPDB. Impacts of the recently developed universal wwPDB OneDep deposition/validation/biocuration system and various methods-specific wwPDB Validation Task Forces on improving the quality of structures and data housed in the PDB Core Archive are described together with current challenges and future plans.

Archivio della ricerca - Università degli studi di Napoli Federico II