
6 research outputs found

Perspectives on tracking data reuse across biodata resources.

Author: Bastian F.B.
Bateman A.
Buys M.
Cook C.E.
D'Eustachio P.
Harrison M.
Hermjakob H.
Li D.
Lord P.
Natale D.A.
Peters B.
Ross K.E.
Sternberg P.W.
Su A.I.
Thakur M.
Thomas P.D.
Publication date: 01/01/2024

Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge. The article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023, attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but critical information for reproducibility is sometimes lacking. Challenges include costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include: development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources. Summaries of survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEzbOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).

Serveur académique lausannois

Perspectives on tracking data reuse across biodata resources

Author: Ahmad S
Aimo L
Argoud-Puy G
Arighi CN
Auchincloss AH
Axelsen KB
Bansal P
Baratin D
Bastian FB
Bateman A
Batista Neto TM
Bolleman JT
Boutet E
Bowler-Barnett EH
Breuza L
Bridge AJ
Buys M
Bye-A-Jee H
Casals-Casas C
Chen C
Chen Y
Cook CE
Coudert E
Cuche B
D'Eustachio P
da Costa Gonzales LJ
de Castro E
Denny P
Dogan T
Ebenezer T
Estreicher A
Famiglietti ML
Fan J
Feuermann M
Gasteiger E
Gehant S
Gil BC
Gos A
Gruaz N
Harrison M
Hermjakob H
Huang H
Hulo C
Hussein A
Hyka-Nouspikel N
Ibrahim KT
Ignatchenko A
Insana G
Ishtiaq R
Joshi V
Jungo F
Jyothi D
Kandasaamy S
Kerhornou A
Kim M
Laiho K
Le Mercier P
Lehvaslaiho M
Li D
Lieberherr D
Lock A
Lord P
Luciani A
Luo J
Lussi Y
Magrane M
Marin J
Martin M-J
Masson P
McGarvey P
Morgat A
Natale DA
Orchard S
Pedruzzi I
Peters B
Pilbout S
Pourcel L
Poux S
Pozzato M
Pruess M
Raposo P
Redaschi N
Rice DL
Rivoire C
Ross K
Saidi R
Santos R
Sigrist CJA
Speretta E
Stephenson J
Sternberg PW
Su AI
Sundaram S
Sveshnikova A
Thakur M
Thomas PD
Totoo P
Tyagi N
Vasudev P
Vinayaka CR
Wang Y
Warner K
Wijerathne S
Wu CH
Zaru R
Zhang J
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2024

© The Author(s) 2024. Published by Oxford University Press. Motivation: Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge. Results: The article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023, attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but critical information for reproducibility is sometimes lacking. Challenges include costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include: development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources.

Newcastle University E-Prints

Experimental and computational investigation of enzyme functional annotations uncovers misannotation in the EC 1.1.3.15 enzyme class

Author: Engqvist Martin
Rembeza Elzbieta
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2021

Only a small fraction of genes deposited to databases have been experimentally characterised. The majority of proteins have their function assigned automatically, which can result in erroneous annotations. The reliability of current annotations in public databases is largely unknown; experimental attempts to validate the accuracy within individual enzyme classes are lacking. In this study we performed an overview of functional annotations to the BRENDA enzyme database. We first applied a high-throughput experimental platform to verify functional annotations to an enzyme class of S-2-hydroxyacid oxidases (EC 1.1.3.15). We chose 122 representative sequences of the class and screened them for their predicted function. Based on the experimental results, predicted domain architecture and similarity to previously characterised S-2-hydroxyacid oxidases, we inferred that at least 78% of sequences in the enzyme class are misannotated. We experimentally confirmed four alternative activities among the misannotated sequences and showed that misannotation in the enzyme class increased over time. Finally, we performed a computational analysis of annotations to all enzyme classes in the BRENDA database, and showed that nearly 18% of all sequences are annotated to an enzyme class while sharing no similarity or domain architecture to experimentally characterised representatives. We showed that even well-studied enzyme classes of industrial relevance are affected by the problem of functional misannotation.

Chalmers Research

On patterns and re-use in bioinformatics databases

Author: Bell MJ
Lord P
Publication venue: 'Oxford University Press (OUP)'

Newcastle University E-Prints

On patterns and re-use in bioinformatics databases

Author: Attwood
Baumgartner
Bell
Bolleman
Fernández-Suárez
Gross
Haft
Hunter
Jonathan Wren
Lane
Leinonen
Lord
Michael J Bell
Missier
Phillip Lord
Punta
Richardson
Sigrist
Viégas
Wooley
Publication venue: 'Oxford University Press (OUP)'

Crossref