140 research outputs found

    git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

    Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing, e.g., collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership contained in the commit log of software projects, e.g. which exact lines of code have been authored by which developers. Addressing this issue, we introduce git2net, a scalable Python software package that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns. Comment: MSR 2019, 12 pages, 10 figures

    How Early Participation Determines Long-Term Sustained Activity in GitHub Projects?

    Although the open source model bears many advantages in software development, open source projects are always hard to sustain. Previous research on open source sustainability mainly focuses on projects that have already reached a certain level of maturity (e.g., with communities, releases, and downstream projects). However, limited attention is paid to the development of (sustainable) open source projects in their infancy, and we believe an understanding of early sustainability determinants is crucial for project initiators, incubators, newcomers, and users. In this paper, we aim to explore the relationship between early participation factors and long-term project sustainability. We leverage a novel methodology combining the Blumberg model of performance and machine learning to predict the sustainability of 290,255 GitHub projects. Specifically, we train an XGBoost model based on early participation (first three months of activity) in 290,255 GitHub projects and we interpret the model using LIME. We quantitatively show that early participants have a positive effect on a project's future sustained activity if they have prior experience in OSS project incubation and demonstrate concentrated focus and steady commitment. Participation from non-code contributors and detailed contribution documentation also promote a project's sustained activity. Compared with individual projects, building a community that consists of more experienced core developers and more active peripheral developers is important for organizational projects. This study provides unique insights into the incubation and recognition of sustainable open source projects, and our interpretable prediction approach can also offer guidance to open source project initiators and newcomers. Comment: The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023)

    On Reducing the Energy Consumption of Software: From Hurdles to Requirements

    Background. As software took control over hardware in many domains, the question of the energy footprint induced by the software is becoming critical for our society, as the resources powering the underlying infrastructure are finite. Yet, beyond this growing interest, energy consumption remains a difficult concept to master for a developer. Aims. The purpose of this study is to better understand the root causes that prevent software energy consumption from being more widely considered by developers and companies. Method. To investigate this issue, this paper reports on a qualitative study we conducted in an industrial context. We applied an in-depth analysis of the interviews of 10 experienced developers and summarized a set of implications. Results. We argue that our study delivers i) insightful feedback on how green software design is considered among the interviewed developers and ii) a set of findings to build helpful tools, motivate further research, and establish better development strategies to promote green software design. Conclusion. This paper covers an industrial case study of developers' awareness of green software design and how to promote it within the company. While it might not be generalizable to every company, we believe our results deliver a common body of knowledge with implications to be considered for similar cases and further research.

    Detecting and Characterizing Propagation of Security Weaknesses in Puppet-based Infrastructure Management

    Despite being beneficial for managing computing infrastructure automatically, Puppet manifests are susceptible to security weaknesses, e.g., hard-coded secrets and use of weak cryptography algorithms. Adequate mitigation of security weaknesses in Puppet manifests is thus necessary to secure computing infrastructure that is managed with Puppet manifests. A characterization of how security weaknesses propagate and affect Puppet-based infrastructure management can inform practitioners on the relevance of the detected security weaknesses, as well as help them take necessary actions for mitigation. To that end, we conduct an empirical study with 17,629 Puppet manifests mined from 336 open source repositories. We construct Taint Tracker for Puppet Manifests (TaintPup), for which we observe 2.4 times more precision compared to that of a state-of-the-art security static analysis tool. TaintPup leverages Puppet-specific information flow analysis, which we use to characterize propagation of security weaknesses. From our empirical study, we observe security weaknesses to propagate into 4,457 resources, i.e., Puppet-specific code elements used to manage infrastructure. A single instance of a security weakness can propagate into as many as 35 distinct resources. We observe security weaknesses to propagate into 7 categories of resources, which include resources used to manage continuous integration servers and network controllers. According to our survey with 24 practitioners, propagation of security weaknesses into data storage-related resources is rated to have the most severe impact for Puppet-based infrastructure management. Comment: 14 pages, currently under review

    Software architecture social debt: managing the incommunicability factor

    Architectural technical debt is the additional project cost connected to technical issues nested in software architectures. Similarly, many practitioners have already experienced that there exists within software architectures a form of social debt, that is, the additional project cost connected to sociotechnical and organizational issues evident in or related to software architectures. This paper illustrates four recurrent antipatterns or community smells connected to such architectural social debt and outlines a means to measure the additional project cost connected to their underlying cause: decision incommunicability. Evaluating the results in multiple focus groups, this paper concludes that studying social debt and community smells at the architecture level may prove vital to rid software development communities of critical organizational flaws incurring considerable additional cost.