
    Archiving and referencing source code with Software Heritage

    Software, and software source code in particular, is widely used in modern research. It must be properly archived, referenced, described and cited in order to build a stable and long-lasting corpus of scientific knowledge. In this article we show how the Software Heritage universal source code archive provides a means to fully address the first two concerns, by seamlessly archiving all publicly available software source code, and by providing intrinsic persistent identifiers that allow it to be referenced at various granularities in a way that is both convenient and effective. We call upon the research community to adopt this approach widely. Comment: arXiv admin note: substantial text overlap with arXiv:1909.1076
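
    As an illustration of what "intrinsic" means for such identifiers, the following is a minimal sketch of how a Software Heritage identifier (SWHID) for a single file's content can be derived from the bytes alone. It assumes the content-level form `swh:1:cnt:<hash>`, where the hash is computed the same way Git hashes a blob. The function name `swhid_for_content` is hypothetical and is not part of any Software Heritage tooling.

```python
import hashlib


def swhid_for_content(data: bytes) -> str:
    """Return a content-level SWHID for the given bytes (illustrative sketch).

    Assumption: the identifier is the Git blob hash of the content, i.e.
    SHA-1 over the header "blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    digest = hashlib.sha1(header + data).hexdigest()
    return f"swh:1:cnt:{digest}"


if __name__ == "__main__":
    # The identifier depends only on the bytes, not on where they are hosted.
    print(swhid_for_content(b""))
```

    Because the identifier is computed from the content itself, anyone holding a copy of the file can recompute it and check that it matches the reference, without relying on a registry.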

    How to use Software Heritage for archiving and referencing your source code: guidelines and walkthrough

    Software source code is an essential research output, and many research communities strongly encourage making the source code of the artefact available by archiving it in publicly accessible long-term archives. Software Heritage is a non-profit, long-term universal archive specifically designed for software source code, and able to store not only a software artifact, but also its full development history. It provides the ideal place to preserve research software artifacts, and offers powerful mechanisms to enhance research articles with precise references to relevant fragments of your source code. Using Software Heritage for your research software artifacts is straightforward and involves three simple steps. This document details each of these three steps, providing guidelines for making the most out of Software Heritage for your research

    Assessing the Prevalence and Archival Rate of URIs to Git Hosting Platforms in Scholarly Publications

    The definition of scholarly content has expanded to include the data and source code that contribute to a publication. While major archiving efforts to preserve conventional scholarly content, typically in PDFs (e.g., LOCKSS, CLOCKSS, Portico), are underway, no analogous effort has yet emerged to preserve the data and code referenced in those PDFs, particularly the scholarly code hosted online on Git Hosting Platforms (GHPs). Similarly, Software Heritage is working to archive public source code, but there is value in archiving the surrounding ephemera that provide important context to the code while maintaining their original URIs. In current implementations, source code and its ephemera are not preserved, which presents a problem for scholarly projects where reproducibility matters. To quantify the scope of this issue, we analyzed the use of GHP URIs in the arXiv and PMC corpora. In total, there were 253,590 URIs to GitHub, SourceForge, Bitbucket, and GitLab repositories across the 2.64 million publications. Authors have increasingly included GHP URIs in scholarly publications and, in 2021, one in five arXiv publications included a GitHub URI. Next, we analyzed the archival coverage of scholarly GHP URIs in Web archives and Software Heritage. Overall, 79.15% of GHP URIs were archived in the Web archives while only 62.06% of GHP URIs were archived in Software Heritage. We used a machine learning classifier to identify other Open Access Data and Software (OADS) URIs outside of the four GHPs previously studied. We found almost 50,000 unique OADS hostnames and more non-GHP OADS URIs than GHP URIs. The prevalence of OADS URIs and vast number of unique hostnames points to the utility of a classifier to identify OADS URIs as opposed to manual enumeration. 
Lastly, we found a statistically significant relationship between the popularity of a GitHub repository, as determined by engagement metrics, and archival coverage, indicating that less popular repositories are less likely to be archived and, thus, more vulnerable to being unrecoverable. The growing use of GHPs in scholarly publications points to an urgent and growing need for dedicated efforts to archive their holdings in order to preserve research code and its scholarly ephemera
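
The kind of URI counting described in this abstract can be sketched as follows. This is a hypothetical illustration, not the authors' pipeline: the regular expression, the `GHP_HOSTS` table, and the function name `count_ghp_uris` are all assumptions, and a real study would need more careful URI extraction from PDF or XML text.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed mapping from host name to platform (the four GHPs named in the abstract).
GHP_HOSTS = {
    "github.com": "GitHub",
    "gitlab.com": "GitLab",
    "bitbucket.org": "Bitbucket",
    "sourceforge.net": "SourceForge",
}

# Deliberately simple URI matcher; stops at whitespace and common delimiters.
URI_PATTERN = re.compile(r"https?://[^\s<>\"')\[\]]+")


def count_ghp_uris(text: str) -> Counter:
    """Count URIs in `text` that point at one of the known GHP hosts."""
    counts = Counter()
    for uri in URI_PATTERN.findall(text):
        host = (urlparse(uri).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        for domain, platform in GHP_HOSTS.items():
            # Match the bare domain or any subdomain of it.
            if host == domain or host.endswith("." + domain):
                counts[platform] += 1
                break
    return counts


if __name__ == "__main__":
    sample = (
        "Code at https://github.com/foo/bar and https://gitlab.com/a/b, "
        "see also http://example.org/x and https://foo.sourceforge.net/."
    )
    print(count_ghp_uris(sample))
```

Matching on the parsed host name rather than on a substring of the URI avoids counting links that merely mention a platform name in their path or query string.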

    Video game preservation in the UK: a survey of records management practices

    Video games are a cultural phenomenon; a medium like no other that has become one of the largest entertainment sectors in the world. While the UK boasts an enviable games development heritage, it risks losing a major part of its cultural output through an inability to preserve the games that are created by the country’s independent games developers. The issues go deeper than bit rot and other problems that affect all digital media; loss of context, copyright and legal issues, and the throwaway culture of the ‘next’ game all hinder the ability of fans and academics to preserve video games and make them accessible in the future. This study looked at the current attitudes towards preservation in the UK’s independent (‘indie’) video games industry by examining current record-keeping practices and analysing the views of games developers. The results show that there is an interest in preserving games, and possibly a desire to do so, but issues of piracy and cost prevent the industry from undertaking preservation work internally, and from allowing others to assume such responsibility. The recommendation made by this paper is not simply for preservation professionals and enthusiasts to collaborate with the industry, but to do so by advocating the commercial benefits that preservation may offer to the industry

    A Behavioral Approach to Understanding the Git Experience

    The Investigating and Archiving the Scholarly Git Experience (IASGE) project is a multi-track study focused on understanding the uses of Git by students, faculty, and staff working in academic research institutions as well as the ways source code repositories and their associated contextual ephemera can be better preserved. This research, in turn, has implications regarding how to support Git in the scholarly process, how version control systems contribute to reproducibility, and how Library and Information Science (LIS) professionals can support Git through instruction and sustainability efforts. In this paper, we focus on a subset of our larger project and take a deep look at what code hosting platforms offer researchers in terms of productivity and collaboration. For this portion, a survey, focus groups, and user experience interviews were conducted to gain an understanding of how and why scholarly researchers use Version Control Systems (VCS) as well as some of the pain points in learning and using VCS for daily work

    How Long Can We Build It? Ensuring Usability of a Scientific Code Base

    Software, and in particular source code, has become an important component of scientific publications and is therefore now subject to research data management. Maintaining source code so that it remains a usable and valuable scientific contribution is a huge task. Not all code contributions can be actively maintained forever. Eventually, there will be a significant backlog of legacy source code. In this article we analyse the requirements for applying the concept of long-term reusability to source code. We use a simple case study to identify gaps and provide a technical infrastructure based on an emulator to support automated builds of historic software in the form of source code

    Language Archive Records: Interoperability Of Referencing Practices And Metadata Models

    With the rise of the digital language archive and the plethora of referenceable content, a critical question arises: “How easy is it for authors to use existing tools to cite the content they are referencing?” This is especially important as people use archived materials as evidence within published language descriptions. Archived resource metadata is well discussed in language documentation circles; however, bibliographic metadata and its accessibility are less discussed. Discoverability metadata, a subset of archived resource metadata, serves aggregators like OLAC by declaring a resource exists. In contrast, bibliographic metadata functions within documents by declaring where to find a resource that is known to exist. In this thesis I look at the interaction between Zotero, an open source reference manager, five different archives (PARADISEC, Pangloss, SIL Language & Culture Archives, ELAR, and Kaipuleohone), and three methods of importing metadata from them into Zotero (DOI import, HTML embedded metadata, and file based import). I report on collection and audio artifact metadata provided by the archive to the author via Zotero’s interfaces: what’s included, what’s missing, and what’s misaligned. Understanding the processes by which authors collect metadata for the purpose of citation and referencing, what metadata they need, and if it is being provided, facilitates the design of useful interfaces to archives which elevate the value of archives to all groups who interact with them. I propose that interaction design is an additional factor to those presented by Chang (2010) in her well received checklist for evaluating language archives. Interaction design, the technical field concerned with designing how people interact with objects and services, is the design process by which archives manage the interactions they have with those they serve. 
I specifically argue that interaction design adds value to an archive’s brand, as perceived by the network of archive users, when it facilitates the interaction with bibliographic metadata about artifacts within holdings. This added value speaks to the sustainability of an archive within its sphere of influence. It is increasingly important in the career development of scholars to meet metric-based assessments of their influence in scholarly discussions. Reference counts, including those pointing to the evidentiary record housed in archives, play a significant role in establishing quantitative baseline metrics for scholars

    A web information system for the management and the dissemination of Cultural Heritage data.

    Safeguarding and exploiting Cultural Heritage produces numerous and heterogeneous data. Managing these data is essential for the use and dissemination of the information gathered in the field. Previously, data handling was a manual task carried out with efficient, well-tried methods. With the growth of computer science, other methods have been developed for the digital preservation and processing of Cultural Heritage information. The development of computerized data management systems to store and make use of archaeological datasets is therefore a significant task today. Especially for sites that have been excavated and studied without computerized means, it is now necessary to transfer all the data produced to computer. This preserves the information digitally (in addition to the paper documents) and offers new possibilities for exploitation, such as the immediate connection of different kinds of data for analysis, or the digital documentation of the site for its enhancement. Geographical Information Systems have proved their potential in this area, but they are not always suited to managing features at the scale of a particular archaeological site. This paper therefore presents the development of a Virtual Research Environment dedicated to the exploitation of intra-site Cultural Heritage data. The Information System produced is based on open-source software modules designed for the Internet, so users can avoid being software driven and can record and consult data from different computers. The system makes it possible to carry out exploratory analyses of the data, especially at the spatial and temporal levels. The system is compatible with every kind of Cultural Heritage site and allows management of diverse types of data. Some experiments have been carried out on sites managed by the Service of the National Sites and Monuments of Luxembourg