
    “It Takes All Kinds”: A Simulation Modeling Perspective on Motivation and Coordination in Libre Software Development Projects

    This paper presents a stochastic simulation model to study implications of the mechanisms by which individual software developers’ efforts are allocated within large and complex open source software projects. It illuminates the role of different forms of “motivations-at-the-margin” in the micro-level resource allocation process of distributed and decentralized multi-agent engineering undertakings of this kind. We parameterize the model by isolating the parameter ranges in which it generates structures of code that share certain empirical regularities found to characterize actual projects. We find that, in this range, a variety of different motivations are represented within the community of developers. There is a correspondence between the indicated mixture of motivations and the distribution of avowed motivations for engaging in FLOSS development, found in the survey responses of developers who were participants in large projects.
    Keywords: free and open source software (FLOSS), libre software engineering, maintainability, reliability, functional diversity, modularity, developers’ motivations, user-innovation, peer-esteem, reputational reward systems, agent-based modeling, stochastic simulation, stigmergy, morphogenesis.

    Episodic Peripheral Contributors and Technical Dependencies in Open Source Software (OSS) Ecosystems

    Despite the fact that OSS contributors tend to eschew traditional organizational hierarchies, researchers have found that, in many cases, OSS contributors make tightly coupled system designs and successfully coordinate highly interdependent tasks. Although researchers have explained how OSS contributors make tightly coupled code contributions, we do not know the characteristics of individuals who make such contributions. While previous studies have considered OSS projects as single, independent containers, I note that OSS projects do not constitute independent or standalone entities but reuse and, thus, depend on one another. This reuse creates complex networks of interdependencies called “software ecosystems”. In this paper, I analyze OSS contributors who have made tightly coupled code contributions using two lenses: the core-periphery lens and the habitual-episodic lens. Based on investigating three volunteer-driven OSS projects, I found OSS contributors who make tightly coupled code contributions to have different code-contribution patterns. Interestingly, I found that half of such contributors made no previous code contributions to the sampled projects but episodically authored patches (or pull requests) that increased software coupling. Based on further investigation, I suggest a multiple-fluid-container view that accommodates software ecosystems in which multiple containers (multiple OSS projects) co-evolve with each container (each OSS project) readily accessible

    Authorship Attribution of Source Code: A Language-Agnostic Approach and Applicability in Software Engineering

    Authorship attribution of source code has been an established research topic for several decades. State-of-the-art results for the authorship attribution problem look promising for the software engineering field, where they could be applied to detect plagiarized code and prevent legal issues. With this study, we first introduce a language-agnostic approach to authorship attribution of source code. Two machine learning models based on our approach match or improve over state-of-the-art results, originally achieved by language-specific approaches, on existing datasets for code in C++, Python, and Java. After that, we discuss limitations of existing synthetic datasets for authorship attribution, and propose a data collection approach that delivers datasets that better reflect aspects important for potential practical use in software engineering. In particular, we discuss the concept of work context and its importance for authorship attribution. Finally, we demonstrate that high accuracy of authorship attribution models on existing datasets drastically drops when they are evaluated on more realistic data. We conclude the paper by outlining next steps in design and evaluation of authorship attribution models that could bring the research efforts closer to practical use. Comment: 12 pages

    On the Use of Process Trails to Understand Software Development


    Methods of Disambiguating and De-anonymizing Authorship in Large Scale Operational Data

    Operational data from software development, social networks and other domains are often contaminated with incorrect or missing values. Examples include misspelled or changed names, multiple emails belonging to the same person and user profiles that vary in different systems. Such digital traces are extensively used in research and practice to study collaborating communities of various kinds. To achieve a realistic representation of the networks that represent these communities, accurate identities are essential. In this work, we aim to identify, model, and correct identity errors in data from open-source software repositories, which include more than 23M developer IDs and nearly 1B Git commits (developer activity records). Our investigation into the nature and prevalence of identity errors in software activity data reveals that they are different and occur at much higher rates than in other domains. Existing techniques relying on string comparisons can only disambiguate Synonyms, but not Homonyms, which are common in software activity traces. Therefore, we introduce measures of behavioral fingerprinting to improve the accuracy of Synonym resolution, and to disambiguate Homonyms. Fingerprints are constructed from the traces of developers’ activities, such as the style of writing in commit messages, the patterns in files modified and projects participated in by developers, and the patterns related to the timing of the developers’ activity. Furthermore, to address the lack of training data necessary for the supervised learning approaches that are used in disambiguation, we design a specific active learning procedure that minimizes the manual effort necessary to create training data in the domain of developer identity matching. We extensively evaluate the proposed approach, using over 16,000 OpenStack developers in 1200 projects, against commercial and most recent research approaches, and further on recent research on a much larger sample of over 2,000,000 IDs. Results demonstrate that our method is significantly better than both the recent research and commercial methods. We also conduct experiments to demonstrate that such erroneous data have significant impact on developer networks. We hope that the proposed approach will expedite research progress in the domain of software engineering, especially in applications for which graphs of social networks are critical.

    Developer identification methods for integrated data from various sources
