3 research outputs found

Incentivising Use of Structured Language in Biological Descriptions: Author-Driven Phenotype Data and Ontology Production

Author: Chen Hsin-Liang
Cui Hong
Ford Bruce
Macklin James A.
Penev Lyubomir
Reznicek Anton
Sachs Joel
Starr Julian
Publication venue: Scholars' Mine
Publication date: 07/11/2018

Phenotypes are used for a multitude of purposes such as defining species, reconstructing phylogenies, diagnosing diseases or improving crop and animal productivity, but most of this phenotypic data is published in free-text narratives that are not computable. This means that the complex relationship between the genome, the environment and phenotypes is largely inaccessible to analysis and important questions related to the evolution of organisms, their diseases or their response to climate change cannot be fully addressed. It takes great effort to manually convert free-text narratives to a computable format before they can be used in large-scale analyses. We argue that this manual curation approach is not a sustainable solution to produce computable phenotypic data for three reasons: 1) it does not scale to all of biodiversity; 2) it does not stop the publication of free-text phenotypes that will continue to need manual curation in the future and, most importantly, 3) it does not solve the problem of inter-curator variation (curators interpret/convert a phenotype differently from each other). Our empirical studies have shown that inter-curator variation is as high as 40% even within a single project. With this level of variation, it is difficult to imagine that data integrated from multiple curation projects can be of high quality. The key causes of this variation have been identified as semantic vagueness in original phenotype descriptions and difficulties in using standardised vocabularies (ontologies). We argue that the authors describing phenotypes are the key to the solution. Given the right tools and appropriate attribution, the authors should be in charge of developing a project’s semantics and ontology. This will speed up ontology development and improve the semantic clarity of phenotype descriptions from the moment of publication. A proof-of-concept project on this idea was funded by NSF ABI in July 2017.
We seek readers' input or critique of the proposed approaches to help achieve community-based computable phenotype data production in the near future. Results from this project will be accessible through https://biosemantics.github.io/author-driven-production

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Theory and Practice of Data Citation

Author: Silvello Gianmaria
Publication venue: Wiley
Publication date: 24/06/2017

Citations are the cornerstone of knowledge propagation and the primary means of assessing the quality of research, as well as directing investments in science. Science is increasingly becoming "data-intensive", where large volumes of data are collected and analyzed to discover complex patterns through simulations and experiments, and most scientific reference works have been replaced by online curated datasets. Yet, given a dataset, there is no quantitative, consistent and established way of knowing how it has been used over time, who contributed to its curation, what results have been yielded or what value it has. The development of a theory and practice of data citation is fundamental for considering data as first-class research objects with the same relevance and centrality as traditional scientific products. Many works in recent years have discussed data citation from different viewpoints: illustrating why data citation is needed, defining the principles and outlining recommendations for data citation systems, and providing computational methods for addressing specific issues of data citation. The current panorama is many-faceted and an overall view that brings together diverse aspects of this topic is still missing. Therefore, this paper aims to describe the lay of the land for data citation, both from the theoretical (the why and what) and the practical (the how) angle.
Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association for Information Science and Technology (JASIST), 201

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Padova

Beyond ecosystem modeling: a roadmap to community cyberinfrastructure for ecological data‐model integration

Author: Campbell Eleanor E.
Cowdery Elizabeth M.
De Kauwe Martin G.
Desai Ankur
Dietze Michael C.
Duveneck Matthew J.
Fer Istem
Fisher Joshua B.
Gardella Anthony K.
Haynes Katherine D.
Hoffman Forrest M.
Johnston Miriam R.
Kooper Rob
LeBauer David S.
Mantooth Joshua
Parton William
Poulter Benjamin
Quaife Tristan
Raiho Ann
Schaefer Kevin
Serbin Shawn P.
Shiklomanov Alexey N.
Simkins James
Viskari Toni
Wilcox Kevin R.
Publication venue: Wiley
Publication date: 19/10/2020

In an era of rapid global change, our ability to understand and predict Earth's natural systems is lagging behind our ability to monitor and measure changes in the biosphere. Bottlenecks to informing models with observations have reduced our capacity to fully exploit the growing volume and variety of available data. Here, we take a critical look at the information infrastructure that connects ecosystem modeling and measurement efforts, and propose a roadmap to community cyberinfrastructure development that can reduce the divisions between empirical research and modeling and accelerate the pace of discovery. A new era of data‐model integration requires investment in accessible, scalable, transparent tools that integrate the expertise of the whole community, including both modelers and empiricists. This roadmap focuses on five key opportunities for community tools: the underlying foundations of community cyberinfrastructure; data ingest; calibration of models to data; model‐data benchmarking; and data assimilation and ecological forecasting. This community‐driven approach is key to meeting the pressing needs of science and society in the 21st century.

Central Archive at the University of Reading

Crossref

The University of Arizona

eScholarship - University of California

Explore Bristol Research