Gitome: A curated dataset for GitHub README-related tasks

Abstract

<h2><strong>About</strong></h2><p>This repository contains the source code used to replicate the experimental results of the paper submitted to the 21st International Conference on Mining Software Repositories (MSR 2024):</p><p><i>"Gitome: A curated dataset for GitHub README-related tasks"</i></p><p>authored by:</p><p>Claudio Di Sipio, Juri Di Rocco, Riccardo Rubei, Phuong Than Nguyen, and Davide Di Ruscio,</p><p>Università degli Studi dell'Aquila, Italy</p><h2><strong>Data description</strong></h2><p>The dataset is structured as follows:</p><ul><li><strong>emf_metamodel.zip:</strong> the Ecore project containing the Gitome data model</li><li><strong>existing_dumps.zip:</strong> the existing datasets used to build Gitome</li><li><strong>lang_aggr_stats.csv:</strong> the language data used to compute the statistics presented in the paper</li><li><strong>langs.csv:</strong> all the languages and their frequencies</li><li><strong>output_dataset.zip:</strong> the benchmarking dataset obtained by parsing the README files</li><li><strong>repository_lists.zip:</strong> the list of repositories for each considered dataset (with possible duplicates)</li><li><strong>topics.csv:</strong> all the topics and their frequencies</li><li><strong>topics_aggr_stats.csv:</strong> the topic data used to compute the statistics presented in the paper</li><li><strong>gitome_repo.txt:</strong> the list of URLs of the considered GitHub repositories</li></ul><h2><strong>How to collect Gitome</strong></h2><p>To collect all the data stored in this archive, please refer to the supporting GitHub repository https://github.com/MDEGroup/Gitome-MSR2024.</p>
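<p>As an illustration of how the repository list described above might be consumed, the following minimal sketch extracts unique owner/repository pairs from a list of GitHub URLs. It assumes that gitome_repo.txt holds one URL per line in the form https://github.com/owner/repo; this layout is not stated in the archive description, and the function name <code>parse_repo_urls</code> is hypothetical.</p>

```python
from urllib.parse import urlparse


def parse_repo_urls(lines):
    """Return unique (owner, repo) pairs from an iterable of GitHub URLs.

    Assumption: each line holds one URL such as
    https://github.com/owner/repo (the real file layout may differ).
    """
    seen = set()
    repos = []
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank lines
        parts = [p for p in urlparse(url).path.split("/") if p]
        if len(parts) < 2:
            continue  # not an owner/repo URL
        owner, repo = parts[0], parts[1]
        if repo.endswith(".git"):
            repo = repo[:-4]  # drop a trailing ".git" suffix
        # GitHub owner and repository names are case-insensitive,
        # so compare lowercased keys but keep the first spelling seen.
        key = (owner.lower(), repo.lower())
        if key not in seen:
            seen.add(key)
            repos.append((owner, repo))
    return repos


if __name__ == "__main__":
    sample = [
        "https://github.com/MDEGroup/Gitome-MSR2024\n",
        "https://github.com/mdegroup/gitome-msr2024/\n",
        "\n",
    ]
    print(parse_repo_urls(sample))
```

<p>To apply the sketch to the actual file, one could pass an open file handle, for example <code>parse_repo_urls(open("gitome_repo.txt", encoding="utf-8"))</code>, provided the one-URL-per-line assumption holds.</p>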


    Last time updated on 18/08/2024