2 research outputs found

The InterPro protein families and domains database: 20 years on

Author: Bateman A
Blum M
Bork P
Bridge A
Chang H-Y
Chuguransky S
Finn RD
Gough J
Grego T
Haft DH
Kandasaamy S
Letunic I
Marchler-Bauer A
Mi H
Mitchell A
Natale DA
Necci M
Nuka G
Orengo CA
Pandurangan AP
Paysan-Lafosse T
Qureshi M
Raj S
Richardson L
Rivoire C
Salazar GA
Sigrist CJA
Sillitoe I
Thanki N
Thomas PD
Tosatto SCE
Williams L
Wu CH
Publication date: 06/11/2020

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models that describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.
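
The integration step described in the abstract (several member-database signatures for the same family, domain or site combined into one entry that carries shared annotation such as GO terms) can be pictured with a small sketch. This is only an illustration under stated assumptions: the class names, the `integrate` function, and all accessions and database names below are hypothetical placeholders, not InterPro's actual data model, API, or identifiers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Signature:
    """A predictive model from one member database (hypothetical shape)."""
    accession: str   # placeholder accession, not a real identifier
    source_db: str   # placeholder member-database name


@dataclass
class IntegratedEntry:
    """One combined entry grouping equivalent signatures (hypothetical shape)."""
    accession: str
    signatures: list = field(default_factory=list)
    go_terms: set = field(default_factory=set)


def integrate(signatures, entry_of, go_terms_of):
    """Group signatures under the entry they are mapped to.

    entry_of:    signature accession -> entry accession (assumed to be curated)
    go_terms_of: entry accession -> iterable of GO term labels
    Signatures with no mapping are left out, i.e. treated as unintegrated.
    """
    entries = {}
    for sig in signatures:
        entry_acc = entry_of.get(sig.accession)
        if entry_acc is None:
            continue  # no equivalent entry known for this signature
        entry = entries.setdefault(entry_acc, IntegratedEntry(entry_acc))
        entry.signatures.append(sig)
        entry.go_terms |= set(go_terms_of.get(entry_acc, ()))
    return entries


# Placeholder data: two signatures from different databases describe the
# same family; a third has no mapping and stays unintegrated.
signatures = [
    Signature("SIG_A1", "db_one"),
    Signature("SIG_B7", "db_two"),
    Signature("SIG_C3", "db_three"),
]
entry_of = {"SIG_A1": "ENTRY_1", "SIG_B7": "ENTRY_1"}
go_terms_of = {"ENTRY_1": ["GO_TERM_X"]}

entries = integrate(signatures, entry_of, go_terms_of)
```

In this sketch the mapping from signature to entry is passed in as plain data, which mirrors the idea that deciding which signatures are equivalent is a separate step from attaching descriptions or GO terms to the combined entry.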

UCL Discovery

Perspectives on tracking data reuse across biodata resources

Author: Ahmad S
Aimo L
Argoud-Puy G
Arighi CN
Auchincloss AH
Axelsen KB
Bansal P
Baratin D
Bastian FB
Bateman A
Batista Neto TM
Bolleman JT
Boutet E
Bowler-Barnett EH
Breuza L
Bridge AJ
Buys M
Bye-A-Jee H
Casals-Casas C
Chen C
Chen Y
Cook CE
Coudert E
Cuche B
D'Eustachio P
da Costa Gonzales LJ
de Castro E
Denny P
Dogan T
Ebenezer T
Estreicher A
Famiglietti ML
Fan J
Feuermann M
Gasteiger E
Gehant S
Gil BC
Gos A
Gruaz N
Harrison M
Hermjakob H
Huang H
Hulo C
Hussein A
Hyka-Nouspikel N
Ibrahim KT
Ignatchenko A
Insana G
Ishtiaq R
Joshi V
Jungo F
Jyothi D
Kandasaamy S
Kerhornou A
Kim M
Laiho K
Le Mercier P
Lehvaslaiho M
Li D
Lieberherr D
Lock A
Lord P
Luciani A
Luo J
Lussi Y
Magrane M
Marin J
Martin M-J
Masson P
McGarvey P
Morgat A
Natale DA
Orchard S
Pedruzzi I
Peters B
Pilbout S
Pourcel L
Poux S
Pozzato M
Pruess M
Raposo P
Redaschi N
Rice DL
Rivoire C
Ross K
Saidi R
Santos R
Sigrist CJA
Speretta E
Stephenson J
Sternberg PW
Su AI
Sundaram S
Sveshnikova A
Thakur M
Thomas PD
Totoo P
Tyagi N
Vasudev P
Vinayaka CR
Wang Y
Warner K
Wijerathne S
Wu CH
Zaru R
Zhang J
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2024

© The Author(s) 2024. Published by Oxford University Press. Motivation: Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge. Results: The article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023, attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but critical information for reproducibility is sometimes lacking. Challenges include costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include: development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources.

Newcastle University E-Prints