Theory and Practice of Data Citation
Citations are the cornerstone of knowledge propagation and the primary means
of assessing the quality of research, as well as directing investments in
science. Science is increasingly becoming "data-intensive", where large volumes
of data are collected and analyzed to discover complex patterns through
simulations and experiments, and most scientific reference works have been
replaced by online curated datasets. Yet, given a dataset, there is no
quantitative, consistent and established way of knowing how it has been used
over time, who contributed to its curation, what results have been yielded or
what value it has.
The development of a theory and practice of data citation is fundamental for
considering data as first-class research objects with the same relevance and
centrality as traditional scientific products. Many works in recent years have
discussed data citation from different viewpoints: illustrating why data
citation is needed, defining the principles and outlining recommendations for
data citation systems, and providing computational methods for addressing
specific issues of data citation.
The current panorama is many-faceted and an overall view that brings together
diverse aspects of this topic is still missing. Therefore, this paper aims to
describe the lay of the land for data citation, both from the theoretical (the
why and what) and the practical (the how) angle.
Comment: 24 pages, 2 tables, pre-print accepted in Journal of the Association
for Information Science and Technology (JASIST), 201
Report on the Third Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE3)
This report records and discusses the Third Workshop on Sustainable Software
for Science: Practice and Experiences (WSSSPE3). The report includes a
description of the keynote presentation of the workshop, which served as an
overview of sustainable scientific software. It also summarizes a set of
lightning talks in which speakers highlighted to-the-point lessons and
challenges pertaining to sustaining scientific software. The final and main
contribution of the report is a summary of the discussions, future steps, and
future organization for a set of self-organized working groups on topics
including developing pathways to funding scientific software; constructing
useful common metrics for crediting software stakeholders; identifying
principles for sustainable software engineering design; reaching out to
research software organizations around the world; and building communities for
software sustainability. For each group, we include a point of contact and a
landing page that can be used by those who want to join that group's future
activities. The main challenge left by the workshop is to see whether the groups
will carry out the activities they have scheduled, and how the WSSSPE
community can encourage this to happen.
Data for: A Literature Review on Methods for the Extraction of Usage Statements of Software and Data: [research data]
Software and data have become major components of modern research, which is also reflected in an increasing number of software usages. Knowledge about the software and data used would give researchers a better understanding of the results of a scientific investigation and thus foster its reproducibility. Software and data are, however, often not formally cited; instead, their usage is mentioned in the main text. To assess the state of the art in the extraction of such usage statements, we performed a literature review. We provide an overview of existing methods for identifying usage statements of software and data in scientific articles. This analysis focuses mainly on the technical approaches, the corpora employed, and the purpose of the investigation itself. We found four classes of approaches used in the literature: (1) term search, (2) manual extraction, (3) rule-based extraction, and (4) extraction based on supervised learning.
Mapping the repository landscape : harnessing similarity with RepoSim and RepoSnipy
The rapid growth of scientific software development has led to the emergence of large and complex codebases, making it challenging to search, find, and compare software repositories within the scientific research community. In this paper, we propose a solution by leveraging deep learning techniques to learn embeddings that capture semantic similarities among repositories. Our approach focuses on identifying repositories with similar semantics, even when their code fragments and documentation exhibit different syntax. To address this challenge, we introduce two complementary open-source tools: RepoSim and RepoSnipy. RepoSim is a command-line toolbox designed to represent repositories at both the source code and documentation levels. It utilizes the UniXcoder pre-trained language model, which has demonstrated remarkable performance in code-related understanding tasks. RepoSnipy is a web-based neural semantic search engine that utilizes the powerful capabilities of RepoSim and offers a user-friendly search interface, allowing researchers and practitioners to query public repositories hosted on GitHub and discover semantically similar repositories. RepoSim and RepoSnipy empower researchers, developers, and practitioners by facilitating the comparison and analysis of software repositories. They not only enable efficient collaboration and code reuse but also accelerate the development of scientific software.
Journal Production Guidance for Software and Data Citations
Software and data citations are emerging best practices in scholarly and scientific communication that provide excellent support for results validation, reproducibility, credit, and the sharing and reuse of valuable tools. Data citation has been possible with some rigor since the creation of DataCite. A registry of related rich metadata was recommended by a comprehensive report from a CODATA-ICSTI task group in 2012. CODATA-ICSTI refers to the international and interdisciplinary Committee on Data for Science and Technology and the International Council for Scientific and Technical Information, abbreviated as follows: ICST