9 research outputs found

    Automatic Detection of Public Development Projects in Large Open Source Ecosystems: An Exploratory Study on GitHub

    Hosting over 10 million software projects, GitHub is one of the most important data sources for studying the behavior of developers and software projects. However, as open source datasets have grown, so have the potential threats to mining them. As a dataset grows, it becomes increasingly unrealistic for humans to confirm the quality of every sample. Some studies have investigated this problem and provided solutions to avoid threats in sample selection, but some of these solutions (e.g., finding development projects) require human intervention. As the amount of data to be processed increases, these semi-automatic solutions become less useful because the human effort they require is no longer affordable. To solve this problem, we investigated the GHTorrent dataset and proposed a method to detect public development projects. The results show that our method can effectively improve the sample selection process in two ways: (1) we provide a simple model to automatically select samples (with 0.827 precision and 0.947 recall); (2) we also offer a complex model to help researchers carefully screen samples (requiring 63.2% less effort than manually confirming all samples, while achieving 0.926 precision and 0.959 recall). Comment: Accepted by the SEKE2018 Conference

    Modeling the Effects of Diversity and Corporations on Participation Dynamics in FLOSS Ecosystems

    A multitude of societal issues associated with the development of technology have emerged over the years, including, but not limited to: insufficient personnel for maintenance; a lack of accessibility; the spread of harmful tools; and bias and discrimination against marginalized groups. I propose that a systems perspective is necessary to identify potential leverage points in technology production systems, to influence them towards increased social good, and to evaluate their effectiveness for intervention. Toward this end, I conducted a mixed-methods study of a widely adopted approach in tech production, free/libre and open source software (FLOSS) development. A survey was distributed to elicit responses from FLOSS project contributors to characterize their perceptions of diversity and corporate involvement as they relate to participation decisions and information gathering activities in online platforms. To complement this, an analysis of data from FLOSS projects on GitHub was completed to model participation dynamics. Survey results indicate that contributors attend to information that is used to infer group diversity and information about corporate decision making related to FLOSS systems. Furthermore, the influence of this information on participation decisions varies on the basis of economic needs and sociopolitical beliefs. Analyses of eighteen project ecosystems, with over 9,000 contributors, reveal that projects with no to some corporate involvement generally have broader contributor and user bases than those that are owned by a company. Taken together, these findings suggest that the internal practices of companies involved in FLOSS can be perceived as opaque and controlling, which is detrimental both to the expansion of a project's contributor base and to increasing diversity across FLOSS ecosystems. This research highlights the need to differentiate projects on the basis of corporate involvement and community ethos to design appropriate interventions. A set of recommendations and research propositions is offered to improve inclusivity, equity, and sustainability in tech development.