Finding Regressions in Projects under Version Control Systems
Version Control Systems (VCS) are frequently used to support development of
large-scale software projects. A typical VCS repository of a large project can
contain various intertwined branches consisting of a large number of commits.
If some kind of unwanted behaviour (e.g. a bug in the code) is found in the
project, it is desirable to find the commit that introduced it. Such a commit is
called a regression point. There are two main issues regarding regression
points. First, detecting whether the project after a certain commit is correct
can be very expensive as it may include large-scale testing and/or some other
forms of verification. It is thus desirable to minimise the number of such
queries. Second, there can be several regression points preceding the current
commit; perhaps a bug was introduced in a certain commit, inadvertently fixed
several commits later, and then reintroduced in a yet later commit. In order to
fix the bug, it is usually desirable to find the latest regression
point.
Currently used distributed VCSs provide methods for regression
identification; see, e.g., the git bisect tool. In this paper, we present a new
regression identification algorithm that outperforms the current tools by
decreasing the number of validity queries. At the same time, our algorithm
tends to find the latest regression point, which is a feature that is missing
in the state-of-the-art algorithms. The paper provides an experimental
evaluation of the proposed algorithm and compares it to the state-of-the-art
tool git bisect on a real data set.
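To make the baseline concrete, below is a minimal sketch of the classic bisection strategy that git bisect approximates, written over a linearised history; the names (`find_regression_point`, `is_good`) are illustrative, and this shows the approach the paper improves upon, not the paper's own algorithm. `is_good` stands in for the expensive validity query (a test suite or verification run) whose invocations both approaches try to minimise.

```python
# Minimal sketch of bisection over a *linear* commit history.
# Assumes commits[0] is good, commits[-1] is bad, and there is a single
# good-to-bad transition; `is_good` is the expensive validity query.

from typing import Callable, Sequence

def find_regression_point(commits: Sequence[str],
                          is_good: Callable[[str], bool]) -> str:
    """Return the first bad commit, using O(log n) validity queries."""
    lo, hi = 0, len(commits) - 1   # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid               # the regression lies after mid
        else:
            hi = mid               # the regression is at mid or earlier
    return commits[hi]             # first bad commit under the assumptions
```

When the history contains several regression points (a bug introduced, fixed, and reintroduced), the single-transition assumption breaks and bisection may report an earlier regression point rather than the latest one; that gap, together with the query count, is what the proposed algorithm targets.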
Essential Tools: Version Control Systems
Did you ever wish you'd made a backup copy of a file before changing it? Or before applying a collaborator's modifications? Version control systems make this easier, and do a lot more.
DataHub: Collaborative Data Science & Dataset Version Management at Scale
Relational databases have limited support for data collaboration, where teams
collaboratively curate and analyze large datasets. Inspired by software version
control systems like git, we propose (a) a dataset version control system,
giving users the ability to create, branch, merge, difference and search large,
divergent collections of datasets, and (b) a platform, DataHub, that gives
users the ability to perform collaborative data analysis building on this
version control system. We outline the challenges in providing dataset version
control at scale.
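As a rough illustration of the git-inspired model sketched above, the following hypothetical snippet treats a dataset version as an immutable set of records with parent pointers and implements branch, diff, and three-way merge as set operations; none of these names reflect DataHub's actual API.

```python
# Hypothetical dataset version graph in the spirit of the abstract above.
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Version:
    records: frozenset   # the dataset's contents at this version
    parents: tuple       # () for a root, (v,) for a commit, (a, b) for a merge

    @property
    def id(self) -> str:
        # Content-addressed version id, as in git.
        return hashlib.sha1(repr(sorted(self.records)).encode()).hexdigest()[:8]

def commit(records, parent=None):
    return Version(frozenset(records), (parent,) if parent else ())

def diff(a, b):
    """Records added and removed going from version a to version b."""
    return b.records - a.records, a.records - b.records

def merge(a, b, base):
    """Three-way merge: keep base records neither side deleted,
    plus records either side added."""
    added = (a.records - base.records) | (b.records - base.records)
    deleted = (base.records - a.records) | (base.records - b.records)
    return Version((base.records - deleted) | added, (a, b))

root = commit({"r1", "r2"})
left = commit({"r1", "r2", "r3"}, root)   # one branch adds r3
right = commit({"r2"}, root)              # another branch deletes r1
assert merge(left, right, root).records == {"r2", "r3"}
```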
Scalability of Distributed Version Control Systems
Source at https://ojs.bibsys.no/index.php/NIK/article/view/434.
Distributed version control systems are popular for storing source code, but they are notoriously ill suited for storing large binary files.
We report on the results from a set of experiments designed to characterize the behavior of some widely used distributed version control systems with respect to scaling. The experiments measured commit times and repository sizes when storing single files of increasing size, and when storing increasing numbers of single-kilobyte files.
The goal is to build a distributed storage system with characteristics similar to version control but for much larger data sets. An early prototype of such a system, Distributed Media Versioning (DMV), is briefly described and compared with Git, Mercurial, and the Git-based backup tool Bup.
We find that processing large files without splitting them into smaller parts will limit maximum file size to what can fit in RAM. Storing millions of small files will result in inefficient use of disk space. And storing files with hash-based file and directory names will result in high-latency write operations, due to having to switch between directories rather than performing a sequential write.
The next-phase strategy for DMV will be to break files into chunks by content for de-duplication, then re-aggregate the chunks into append-only log files for low-latency write operations and efficient use of disk space.
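The chunk-by-content step of that strategy is commonly realised with content-defined chunking, sketched below under illustrative assumptions (a toy rolling hash, roughly 8 KiB average chunks); the parameters and names are assumptions, not DMV's implementation. Because chunk boundaries depend on the bytes themselves rather than on fixed offsets, an insertion early in a file shifts only the chunks it touches, so identical regions still deduplicate.

```python
# Content-defined chunking sketch; parameters are illustrative, not DMV's.
import hashlib

MIN_CHUNK = 64        # never cut a chunk smaller than this many bytes
MASK = (1 << 13) - 1  # cut when the low 13 hash bits are zero (~8 KiB average)

def chunks(data: bytes):
    """Yield (sha1_hex, chunk) pairs; the hash doubles as the dedup key."""
    h, start = 0, 0
    for i, byte in enumerate(data):
        # Toy rolling hash: a byte's influence is shifted out of the 32-bit
        # register after ~32 steps, so the hash reflects only recent bytes.
        h = ((h << 1) + byte) & 0xFFFFFFFF
        if i - start + 1 >= MIN_CHUNK and (h & MASK) == 0:
            piece = data[start:i + 1]
            yield hashlib.sha1(piece).hexdigest(), piece
            start, h = i + 1, 0
    if start < len(data):
        piece = data[start:]
        yield hashlib.sha1(piece).hexdigest(), piece
```

Re-aggregation then amounts to appending each previously unseen chunk to a single pack file keyed by its hash, turning many small random writes into one sequential, append-only stream.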
Version Control of Speaker Recognition Systems
This paper discusses one of the most challenging practical engineering
problems in speaker recognition systems - the version control of models and
user profiles. A typical speaker recognition system consists of two stages: the
enrollment stage, where a profile is generated from user-provided enrollment
audio; and the runtime stage, where the voice identity of the runtime audio is
compared against the stored profiles. As technology advances, the speaker
recognition system needs to be updated for better performance. However, if the
stored user profiles are not updated accordingly, version mismatch will result
in meaningless recognition results. In this paper, we describe different
version control strategies for different types of speaker recognition systems,
according to how they are deployed in the production environment.
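One strategy of this kind, sketched below purely as an illustration (the names `Profile` and `ENGINE_VERSION`, and the strategy's details, are assumptions rather than the paper's): tag every stored profile with the engine version that produced it, and refuse to score embeddings across incompatible versions, forcing a profile migration or re-enrollment instead of returning a meaningless result.

```python
# Illustrative version check between a deployed engine and stored profiles.
from dataclasses import dataclass

ENGINE_VERSION = 3              # version of the deployed recognizer (assumed)

@dataclass
class Profile:
    user_id: str
    embedding: list             # speaker embedding from enrollment
    version: int                # engine version that produced the embedding

def score(profile: Profile, runtime_embedding: list) -> float:
    """Cosine similarity, guarded against cross-version comparisons."""
    if profile.version != ENGINE_VERSION:
        # Embeddings from different model versions live in different vector
        # spaces; comparing them would yield a meaningless score.
        raise ValueError(f"profile v{profile.version} incompatible with "
                         f"engine v{ENGINE_VERSION}: migrate or re-enroll")
    num = sum(a * b for a, b in zip(profile.embedding, runtime_embedding))
    den = (sum(a * a for a in profile.embedding) ** 0.5
           * sum(b * b for b in runtime_embedding) ** 0.5)
    return num / den
```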