24 research outputs found

    git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

    Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing, e.g., collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership that is contained in the commit log of software projects, e.g. which exact lines of code have been authored by which developers. Addressing this issue, we introduce git2net, a scalable Python software that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns. Comment: MSR 2019, 12 pages, 10 figures
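The network construction described in this abstract can be illustrated with a minimal sketch. It does not use git2net's own API. The `LineEdit` record, the `coediting_edges` helper, the sample data, the edge direction (editor to original author), and the exclusion of self-edits are all assumptions made for illustration; the abstract only states that links are directed, weighted, and time-stamped.

```python
from collections import Counter
from typing import NamedTuple


class LineEdit(NamedTuple):
    """Hypothetical record of one edit to previously written code."""
    editor: str           # developer who made the change
    original_author: str  # developer who wrote the lines being changed
    timestamp: int        # commit time as Unix seconds
    lines: int            # number of lines touched by the edit


def coediting_edges(edits):
    """Build time-stamped edges and aggregated edge weights.

    Assumption: an edge points from the editor to the original author,
    and edits to one's own code are ignored.
    """
    stamped = []         # one (source, target, time, weight) tuple per edit
    weights = Counter()  # total edited lines per directed developer pair
    for e in edits:
        if e.editor == e.original_author:
            continue
        stamped.append((e.editor, e.original_author, e.timestamp, e.lines))
        weights[(e.editor, e.original_author)] += e.lines
    return stamped, weights


# Made-up sample data, not taken from any real repository.
edits = [
    LineEdit("bob", "alice", 1000, 3),
    LineEdit("alice", "alice", 1100, 5),
    LineEdit("bob", "alice", 1200, 2),
    LineEdit("alice", "bob", 1300, 1),
]
stamped, weights = coediting_edges(edits)
```

Keeping the per-edit tuples alongside the aggregated weights preserves the temporal ordering, so the same data can feed either a static weighted graph or a time-resolved analysis.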

    Online division of labour: emergent structures in Open Source Software

    The development of Open Source Software fundamentally depends on the participation and commitment of volunteer developers to progress on a particular task. Several works have presented strategies to increase the on-boarding and engagement of new contributors, but little is known about how these diverse groups of developers self-organise to work together. To understand this, one must consider that, on the one hand, platforms like GitHub provide a virtually unlimited development framework: any number of actors can potentially join to contribute in a decentralised, distributed, remote, and asynchronous manner. On the other hand, it seems reasonable that some sort of hierarchy and division of labour must be in place to meet human biological and cognitive limits, and also to achieve some level of efficiency. These latter features (hierarchy and division of labour) should translate into detectable structural arrangements when projects are represented as developer-file bipartite networks. Thus, in this paper we analyse a set of popular open source projects from GitHub, focusing on three key properties: nestedness, modularity, and in-block nestedness, which typify, respectively, the emergence of heterogeneities among contributors, the emergence of subgroups of developers working on specific subgroups of files, and a mixture of the two. These analyses show that projects indeed evolve into internally organised blocks. Furthermore, the distribution of sizes of such blocks is bounded, connecting our results to the celebrated Dunbar number in both off- and on-line environments. Our conclusions create a link between bio-cognitive constraints, group formation, and online working environments, opening up a rich scenario for future research on (online) work team assembly (e.g. size, composition, and formation).
    From a complex network perspective, our results pave the way for the study of time-resolved datasets and the design of suitable models that can mimic the growth and evolution of OSS projects.
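As a rough illustration of the developer-file bipartite representation mentioned in this abstract, the sketch below builds such a network from (developer, file) pairs and computes a simplified pairwise overlap score. The score is an assumption chosen for brevity and is not the nestedness measure used in the paper; the function names and the sample commits are hypothetical.

```python
from itertools import combinations


def developer_file_network(commits):
    """Map each developer to the set of files they have touched."""
    network = {}
    for developer, path in commits:
        network.setdefault(developer, set()).add(path)
    return network


def mean_pairwise_overlap(network):
    """Average, over developer pairs, of the share of the smaller
    file set that is also contained in the larger file set.

    A value of 1.0 means every developer's files form a subset of the
    files of any developer with a larger file set (a fully nested pattern).
    """
    scores = []
    for a, b in combinations(network, 2):
        small, large = sorted((network[a], network[b]), key=len)
        scores.append(len(small & large) / len(small))
    return sum(scores) / len(scores) if scores else 0.0


# Made-up sample data: each tuple is (developer, file touched).
commits = [
    ("ana", "core.py"), ("ana", "io.py"), ("ana", "docs.md"),
    ("ben", "core.py"), ("ben", "io.py"),
    ("cy", "core.py"),
]
network = developer_file_network(commits)
overlap = mean_pairwise_overlap(network)
```

In this toy example the file sets are strictly nested, so the score reaches its maximum; developers working on disjoint groups of files would instead pull the score towards zero, which is closer to a modular pattern.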

    THE INFLUENCE OF DEVELOPER RELATIONSHIP PATTERNS ON SOFTWARE EVOLUTION (PENGARUH POLA HUBUNGAN PENGEMBANG PADA EVOLUSI PERANGKAT LUNAK)

    Relationship patterns between individuals at work can affect both how far the work is accomplished and the quality of the resulting product. This hypothesis motivates the present study, which investigates the influence of relationship patterns in developer interactions on the evolution of a software system. Recorded data in software engineering have been widely used to study and improve the quality of the software development process. Relationship patterns among developers can be extracted from event logs using process mining techniques that combine concepts from business process management and social network analysis (SNA). The relationship patterns of developers as individuals within a community are measured quantitatively through an approach based on SNA metrics covering four patterns: (1) activities in cause-and-effect relationships (causality), (2) activities in related cases (joint cases), (3) similar activities (similar task), and (4) activities in particular cases (special event). Software evolution, in turn, is observed from the developers' output in terms of the number of new features, the number of bugs handled, feature enhancements, and support requests successfully resolved. Using the Partial Least Squares (PLS) method, it can be concluded that, in the case study used, the causality pattern has the best level of significance with respect to software evolution, with a p-value of 9.022E-1