84 research outputs found

The perils and pitfalls of mining SourceForge

Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2004

Automatic Detection of Public Development Projects in Large Open Source Ecosystems: An Exploratory Study on GitHub

Author: Cheng Can
Li Bing
Li Zengyang
Liang Peng
Publication venue: 'KSI Research Inc.'
Publication date: 08/05/2018

Hosting over 10 million software projects, GitHub is one of the most important data sources for studying the behavior of developers and software projects. However, as open source datasets grow, so do the potential threats to mining them. As a dataset grows, it becomes increasingly unrealistic for humans to confirm the quality of every sample. Some studies have investigated this problem and proposed ways to avoid threats in sample selection, but some of these solutions (e.g., identifying development projects) require human intervention. As the amount of data to be processed increases, these semi-automatic solutions become less useful, because the human effort they require quickly becomes unaffordable. To address this problem, we investigated the GHTorrent dataset and proposed a method to detect public development projects. The results show that our method can effectively improve the sample selection process in two ways: (1) a simple model that selects samples automatically (0.827 precision and 0.947 recall); and (2) a more complex model that helps researchers screen samples carefully (requiring 63.2% less effort than manually confirming all samples, while achieving 0.926 precision and 0.959 recall).

Comment: Accepted by the SEKE2018 Conference
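Precision here is the fraction of automatically selected projects that really are development projects, and recall is the fraction of real development projects that the model selects. A minimal sketch of both metrics, assuming the model's output and a manually confirmed ground truth are available as sets of project IDs (the function name and the IDs below are hypothetical, chosen only for illustration):

```python
def precision_recall(predicted: set[str], actual: set[str]) -> tuple[float, float]:
    """Precision and recall of a set of projects flagged as development projects.

    predicted: project IDs selected by a (hypothetical) classifier.
    actual: project IDs confirmed as development projects by manual review.
    """
    true_positives = len(predicted & actual)  # selected AND truly development projects
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


predicted = {"p1", "p2", "p3", "p4"}
actual = {"p1", "p2", "p3", "p5", "p6"}
print(precision_recall(predicted, actual))  # (0.75, 0.6)
```

High recall with lower precision, as in the simple model above, means few real development projects are missed but some non-development projects slip into the sample.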

arXiv.org e-Print Archive

Crossref

Firms on SourceForge

Author: Eilhard Jan

This paper explores empirically which factors influence a firm’s decision to contribute to, and take leadership in, open source projects. The increasing participation of firms in the development of open source software (OSS) is generally perceived as a puzzle. Assuming that firms face a “Make-or-Buy” decision before using OSS, we argue that contribution is in fact the best way for them to keep control of their supplier in a context where incomplete open source licenses govern transactions. Building on this proposition, we derive predictions on the drivers of firms’ contribution and leadership in open source projects, and test them on a unique dataset of 4,808 open source projects extracted from SourceForge. Our empirical findings confirm the predictions and lend support to our hypotheses.

Open source; transaction cost; governance; firm boundaries; software

Research Papers in Economics

Structural Complexity and Decay in FLOSS Systems: An Inter-Repository Study

Author: Beecher Karl
Capiluppi Andrea
Publication date: 01/01/2009

Past software engineering literature has firmly established that software architectures and the associated code decay over time. Architectural decay is, potentially, a major issue in Free/Libre/Open Source Software (FLOSS) projects, since developers who join FLOSS projects sporadically do not always have a clear understanding of the underlying architecture, and may break the overall conceptual structure through a series of small changes to the code base. This paper investigates whether the structure of a FLOSS system and its decay can also be influenced by the repository in which it is hosted: specifically, two FLOSS repositories are studied to understand whether the complexity of the software structure in the sampled projects is comparable, or whether one repository hosts more complex systems than the other. The paper also studies whether the effort to counteract this complexity depends on the repository and on the governance it provides for the hosted projects. The results are twofold: on one side, the repository hosting larger and more active projects is shown to present more complex structures; on the other side, these larger and more complex systems benefit from more anti-regressive work to reduce this complexity.

University of Lincoln Institutional Repository

Crossref

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Pitfalls and Guidelines for Using Time-Based Git Data

Author: Chauhan Jigyasa
Dyer Robert
Flint Samuel W.
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 13/03/2022

Many software engineering research papers rely on time-based data (e.g., commit timestamps, issue report creation/update/close dates, release dates). Like most real-world data, however, time-based data is often dirty. To date, no studies have quantified how frequently such data is used by the software engineering research community, or investigated the sources of dirty time-based data and how often it occurs. Depending on the research task and method used, including such dirty data could affect the research results. This paper presents an extended survey of papers that use time-based data, published in the Mining Software Repositories (MSR) conference series. Of the 754 technical track and data papers published in MSR 2004–2021, at least 290 (38%) used time-based data. We also observed that most time-based data used in research papers comes in the form of Git commits, often from GitHub. Based on those results, we then used the Boa and Software Heritage infrastructures to help identify and quantify several sources of dirty Git timestamp data. Finally, we provide guidelines and best practices for researchers using time-based data from Git repositories.
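One simple kind of implausible timestamp is a commit date that cannot be real, such as one far earlier than Git itself (which dates from April 2005) or one later than the moment the data was collected. The following sketch flags such commits; the lower bound, the function name, and the (commit_id, Unix timestamp) input format are assumptions for illustration, not the criteria used in the paper:

```python
from datetime import datetime, timezone

# Hypothetical lower bound. Commits converted from older version-control
# systems can legitimately carry earlier dates, so this is only a heuristic.
EARLIEST_PLAUSIBLE = datetime(2005, 4, 1, tzinfo=timezone.utc)


def suspicious_timestamps(commits, collected_at):
    """Return IDs of commits dated before EARLIEST_PLAUSIBLE or after collected_at.

    commits: iterable of (commit_id, unix_timestamp) pairs.
    collected_at: timezone-aware datetime at which the data was mined.
    """
    flagged = []
    for commit_id, ts in commits:
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        if when < EARLIEST_PLAUSIBLE or when > collected_at:
            flagged.append(commit_id)
    return flagged


# 0 is the Unix epoch (1970), 1_600_000_000 is in 2020, 4_102_444_800 is 2100-01-01.
commits = [("a1", 0), ("b2", 1_600_000_000), ("c3", 4_102_444_800)]
print(suspicious_timestamps(commits, datetime(2022, 1, 1, tzinfo=timezone.utc)))  # ['a1', 'c3']
```

Flagged commits are candidates for inspection rather than proof of corruption; whether to drop them depends on the research task.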

DigitalCommons@University of Nebraska


Similarities, challenges and opportunities of Wikipedia content and open source projects

Author: Capiluppi A
Publication venue: 'Wiley'
Publication date: 04/09/2012

Copyright © 2012 John Wiley & Sons, Ltd.

Several years of research and evidence have demonstrated that Open Source Software (OSS) portals often contain a large number of software projects that simply do not evolve, developed by relatively small communities struggling to attract a sustained number of contributors. These portals have increasingly started to act as storage for abandoned projects, and researchers and practitioners should try to point out how to take advantage of such content. Similarly, other online content portals (like Wikipedia) could be harvested for valuable content. In this paper we argue that, even with differences in the required expertise, many projects reliant on content and contributions by users undergo a similar evolution and follow similar patterns: when a project fails to attract contributors, it appears not to be evolving, or to be abandoned. Far from being a negative finding, this means that even such projects could provide valuable content that should be identified and harvested based on common characteristics: by using the attributes of “usefulness” and “modularity” we isolate valuable content in both Wikipedia pages and OSS projects.

Brunel University Research Archive

Application domains in the research papers at BENEVOL: A retrospective

Author: Ajienka Nemitari
Capiluppi Andrea
Romo Bilyaminu Auwal
Publication venue: CEUR-WS.org
Publication date: 01/01/2019

Proceedings - University of Groningen

An Introduction to Software Ecosystems

Author: De Roover Coen
Mens Tom
Publication date: 28/07/2023

This chapter defines and presents different kinds of software ecosystems. The focus is on the development, tooling and analytics aspects of software ecosystems, i.e., communities of software developers and the interconnected software components (e.g., projects, libraries, packages, repositories, plug-ins, apps) they are developing and maintaining. The technical and social dependencies between these developers and software components form a socio-technical dependency network, and the dynamics of this network change over time. We classify and provide several examples of such ecosystems. The chapter also introduces and clarifies the relevant terms needed to understand and analyse these ecosystems, as well as the techniques and research methods that can be used to analyse different aspects of these ecosystems.

Comment: Preprint of chapter "An Introduction to Software Ecosystems" by Tom Mens and Coen De Roover, published in the book "Software Ecosystems: Tooling and Analytics" (eds. T. Mens, C. De Roover, A. Cleve), 2023, ISBN 978-3-031-36059-6, reproduced with permission of Springer. The final authenticated version of the book and this chapter is available online at: https://doi.org/10.1007/978-3-031-36060-

arXiv.org e-Print Archive

Dynamics of Innovation in an “Open Source” Collaboration Environment: Lurking, Laboring and Launching FLOSS Projects on SourceForge

Author: Francesco Rullani
Paul David

A systems analysis perspective is adopted to examine the critical properties of the Free/Libre/Open Source Software (FLOSS) mode of innovation, as reflected on the SourceForge platform (SF.net). This approach re-scales March’s (1991) framework and applies it to characterize the “innovation system” of a “distributed organization” of interacting agents in a virtual collaboration environment. The innovation system of the virtual collaboration environment is an emergent property of two “coupled” processes: one involves interactions among agents searching for information to use in designing novel software products, and the other involves the mobilization of individual capabilities for application in the software development projects. Micro-dynamics of this system are studied empirically by constructing transition probability matrices representing movements of 222,835 SF.net users among 7 different activity states. The estimated probabilities are found to form first-order Markov chains describing ergodic processes. This makes it possible to compute the equilibrium distribution of agents among the states, thereby suppressing transient effects and revealing persisting patterns of project-joining and project-launching.

innovation systems, collaborative development environments, industrial districts, exploration and exploitation dynamics, open source software, FLOSS, SourceForge, project-joining, project-founding, Markov chain analysis.
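For a first-order Markov chain with a row-stochastic transition matrix P, the equilibrium distribution is the probability vector pi satisfying pi = pi P; for an ergodic chain it exists, is unique, and is approached from any starting distribution. A minimal power-iteration sketch of that computation, using a made-up two-state matrix rather than the paper's seven activity states (the function name and tolerances are illustrative assumptions):

```python
def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Approximate pi with pi = pi P by repeatedly applying P to a distribution.

    P: row-stochastic matrix as a list of lists; P[i][j] is the probability of
    moving from state i to state j in one step. Convergence is guaranteed for
    an ergodic (irreducible and aperiodic) chain.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: nxt[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    raise RuntimeError("power iteration did not converge")


# Hypothetical states: 0 = "inactive", 1 = "contributing".
P = [[0.9, 0.1],
     [0.5, 0.5]]
print([round(p, 4) for p in stationary_distribution(P)])  # [0.8333, 0.1667]
```

The answer can be checked by hand: balance requires 0.1 * pi[0] = 0.5 * pi[1], so pi = (5/6, 1/6). Because the result does not depend on the starting vector, it reflects persisting patterns rather than transient effects, which is the property the abstract relies on.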

Research Papers in Economics

Antipatterns in software classification taxonomies

Author: Capiluppi Andrea
Sas Cezar
Publication venue: 'Elsevier BV'
Publication date: 01/08/2022

Empirical results in software engineering have long shown that findings are unlikely to be applicable to all software systems, or to every domain: results need to be evaluated in specified contexts, and limited to the type of systems they were extracted from. This is a known issue, and it requires the establishment of a classification of software types. This paper makes two contributions: the first is to evaluate the quality of the current landscape of software classifications; the second is to perform a case study showing how to create a classification of software types using a curated set of software systems. Our contributions show that existing, and very likely even new, classification attempts are bound to fail because of one or more issues, which we named the ‘antipatterns’ of software classification tasks. We collected 7 of these antipatterns, which emerge from both our case study and the existing classifications. These antipatterns represent recurring issues in a classification, so we discuss practical ways to help researchers avoid these pitfalls. It becomes clear that classification attempts must also face the daunting task of formulating a taxonomy of software types, with the objective of establishing a hierarchy of categories in a classification.

Proceedings - University of Groningen

University of Groningen

Dissertations of the University of Groningen