6 research outputs found

    Automating Software Citation using GitCite

    The ability to cite software and give credit to its authors and contributors is increasingly important. While the number of online open-source software repositories has grown rapidly over the past few years, few are properly cited when used, owing to the difficulty of creating appropriate citations and the lack of automated techniques. This paper presents GitCite, a model for software citation with version control that enables citations to be inferred for any project component from a small number of explicit citations attached to subdirectories and files, together with an implementation that integrates with Git and GitHub. The implementation includes a browser extension and a local executable tool, which allow citations to be added to, modified in, or deleted from software project repositories and managed through functions such as fork, merge, and copy.
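    The inference model described in the abstract can be pictured as a nearest-ancestor lookup: a file inherits the citation attached to the closest enclosing directory that carries one. The sketch below is a hypothetical illustration of that idea only; the function name, data layout, and citation strings are assumptions, not GitCite's actual API.

    ```python
    # Hypothetical sketch of GitCite-style citation inference: a path
    # inherits the citation of its nearest ancestor (or itself) that
    # carries an explicit citation. Names and formats are illustrative.
    from pathlib import PurePosixPath

    def infer_citation(path, explicit):
        """Return the citation inherited by `path` from the closest
        path prefix with an explicit citation, or None."""
        p = PurePosixPath(path)
        for candidate in [p, *p.parents]:
            if str(candidate) in explicit:
                return explicit[str(candidate)]
        return None

    # One explicit citation on a subdirectory covers everything below it.
    explicit = {"src/solver": "Doe et al., Solver Library v2.1"}
    print(infer_citation("src/solver/core/sparse.py", explicit))
    # → Doe et al., Solver Library v2.1
    ```

    A small number of explicit entries can thus cover an entire repository, which is what makes the approach practical for large projects.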

    Data Citation: A New Provenance Challenge


    Data citation and the citation graph

    The citation graph is a computational artifact widely used to represent the domain of published literature. It represents connections between published works, such as citations and authorship, and, among other things, supports the computation of bibliometric measures such as h-indexes and impact factors. There is now increasing demand that we treat the publication of data in the same way that we treat conventional publications; in particular, we should cite data for the same reasons that we cite other publications. In this paper we discuss what is needed for the citation graph to represent data citation. We identify two challenges: to model the evolution of credit appropriately (through references) over time, and to model citation not only of a data set treated as a single object but also of its parts. We describe an extension of the current citation graph model that addresses these challenges, built on two central concepts: citable units and reference subsumption. We discuss how this extension would enable data citation to be represented within the citation graph and how it allows for improvements in current practices for bibliometric computations, both for scientific publications and for data.
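    The two concepts named in the abstract can be illustrated with a toy graph in which a citable unit may contain sub-units, and a reference to a sub-unit is also counted toward (subsumed by) its parent. This is a minimal sketch of the intuition only, not the paper's formal model; the class and names are invented for illustration.

    ```python
    # Toy illustration of "citable units" with reference subsumption:
    # citing a part of a dataset also credits the whole dataset.
    class CitableUnit:
        def __init__(self, name, parent=None):
            self.name, self.parent, self.refs = name, parent, 0

        def cite(self):
            # Credit propagates from the cited unit up to every ancestor.
            unit = self
            while unit is not None:
                unit.refs += 1
                unit = unit.parent

    dataset = CitableUnit("GenomeDB v3")
    subset = CitableUnit("GenomeDB v3 / chr21 table", parent=dataset)

    subset.cite()   # citing a part also credits the whole dataset
    dataset.cite()  # citing the whole credits only the whole
    print(dataset.refs, subset.refs)  # → 2 1
    ```

    Under this accounting, bibliometric measures computed over the graph can credit a data set for citations to any of its parts, which is the behavior the extended model aims to support.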

    Towards The Efficient Use Of Fine-Grained Provenance In Data Science Applications

    Recent years have witnessed increased demand for users to be able to interpret the results of data science pipelines, locate erroneous data items in the input, evaluate the importance of individual input data items, and acknowledge the contributions of data curators. Such applications often involve the use of provenance at a fine-grained level and require very fast response times. To address this issue, my goal is to expedite the use of fine-grained provenance in applications within both the database and machine learning domains, which are ubiquitous in contemporary data science pipelines. In applications from the database domain, I focus on the problem of data citation and provide two types of solutions, rewriting-based and provenance-based, to generate fine-grained citations to database query results by implicitly or explicitly leveraging provenance information. In applications from the ML domain, I first consider the problem of incrementally updating ML models after the deletion of a small subset of training samples. This is critical for understanding the importance of individual training samples to ML models, especially in online pipelines. For this problem, I provide two solutions, PrIU and DeltaGrad, which incrementally update ML models constructed by SGD/GD methods using provenance information collected during the training phase on the full dataset, before the deletion requests. The second application from the ML domain that I focus on is cleaning label uncertainties in the ML training dataset more efficiently and cheaply. To address this problem, I propose a solution, CHEF, which reduces the cost and overhead at each phase of the label-cleaning pipeline while maintaining overall model performance. I also propose initial ideas for removing some of the assumptions used in these solutions so as to extend them to more general scenarios.
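    The core idea behind provenance-assisted deletion can be shown in a much-simplified form: if each training sample's gradient contribution is cached during training ("provenance"), a gradient-descent step can be corrected after a deletion request without recomputing over the raw data. This sketch is an illustration of that general idea for a single GD step, not the actual PrIU or DeltaGrad algorithms; all names and values are invented.

    ```python
    # Simplified illustration of provenance-assisted deletion (NOT the
    # PrIU/DeltaGrad algorithms): cache per-sample gradients during
    # training, then correct a cached GD step after deleting sample 0.
    import numpy as np

    rng = np.random.default_rng(0)
    lr, w0 = 0.1, np.zeros(3)
    grads = rng.normal(size=(5, 3))   # cached per-sample gradients ("provenance")

    n = grads.shape[0]
    g0 = grads[0]                     # gradient of the deleted sample
    mean_all = grads.mean(axis=0)
    # Mean gradient of the remaining samples, derived from cache alone.
    mean_rest = (n * mean_all - g0) / (n - 1)

    w_full = w0 - lr * mean_all                    # step trained on all samples
    w_incr = w_full + lr * (mean_all - mean_rest)  # correct cached step
    w_retrain = w0 - lr * grads[1:].mean(axis=0)   # step retrained from scratch

    print(np.allclose(w_incr, w_retrain))  # → True
    ```

    The incremental correction reuses only the cached gradients, which is what makes this style of update much cheaper than retraining when deletion requests touch a small subset of the data.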

    Automating data citation in CiteDB
