728 research outputs found

    Promises and Perils of Mining Software Package Ecosystem Data

    Full text link
    The use of third-party packages is becoming increasingly popular and has led to the emergence of large software package ecosystems with a maze of inter-dependencies. Since the reliance on these ecosystems enables developers to reduce development effort and increase productivity, it has attracted the interest of researchers: understanding the infrastructure and dynamics of package ecosystems has given rise to approaches for better code reuse, automated updates, and the avoidance of vulnerabilities, to name a few examples. But the reality of these ecosystems also poses challenges to software engineering researchers, such as: How do we obtain the complete network of dependencies along with the corresponding versioning information? What are the boundaries of these package ecosystems? How do we consistently detect dependencies that are declared but not used? How do we consistently identify developers within a package ecosystem? How much of the ecosystem do we need to understand to analyse a single component? How well do our approaches generalise across different programming languages and package ecosystems? In this chapter, we review promises and perils of mining the rich data related to software package ecosystems available to software engineering researchers.Comment: Submitted as a Book Chapte

    Non-Linear Editor for Text-Based Screencast

    Full text link
    Screencasts, where computer screen is broadcast to a large audience on the web, are becoming popular as an online educational tool. Among various types of screencast content, popular are the contents that involve text editing, including computer programming. There are emerging platforms that support such text-based screencasts by recording every character insertion/deletion from the creator and reconstructing its playback on the viewer's screen. However, these platforms lack rich support for creating and editing the screencast itself, mainly due to the difficulty of manipulating recorded text changes; the changes are tightly coupled in sequence, thus modifying arbitrary part of the sequence is not trivial. We present a non-linear editing tool for text-based screencasts. With the proposed selective history rewrite process, our editor allows users to substitute an arbitrary part of a text-based screencast while preserving overall consistency of the rest of the text-based screencast.Comment: To appear in Adjunct Proceedings of the 30th Annual ACM Symposium on User Interface Software & Technology (UIST 2017, Poster

    git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

    Full text link
    Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g. collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership, e.g. which exact lines of code have been authored by which developers, that is contained in the commit log of software projects. Addressing this issue, we introduce git2net, a scalable python software that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns.Comment: MSR 2019, 12 pages, 10 figure
    • …
    corecore