140 research outputs found
git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Data from software repositories have become an important foundation for the
empirical study of software engineering processes. A recurring theme in the
repository mining literature is the inference of developer networks capturing
e.g. collaboration, coordination, or communication from the commit history of
projects. Most of the studied networks are based on the co-authorship of
software artefacts defined at the level of files, modules, or packages. While
this approach has led to insights into the social aspects of software
development, it neglects detailed information on code changes and code
ownership, e.g. which exact lines of code have been authored by which
developers, that is contained in the commit log of software projects.
Addressing this issue, we introduce git2net, a scalable python software that
facilitates the extraction of fine-grained co-editing networks in large git
repositories. It uses text mining techniques to analyse the detailed history of
textual modifications within files. This information allows us to construct
directed, weighted, and time-stamped networks, where a link signifies that one
developer has edited a block of source code originally written by another
developer. Our tool is applied in case studies of an Open Source and a
commercial software project. We argue that it opens up a massive new source of
high-resolution data on human collaboration patterns.Comment: MSR 2019, 12 pages, 10 figure
How Early Participation Determines Long-Term Sustained Activity in GitHub Projects?
Although the open source model bears many advantages in software development,
open source projects are always hard to sustain. Previous research on open
source sustainability mainly focuses on projects that have already reached a
certain level of maturity (e.g., with communities, releases, and downstream
projects). However, limited attention is paid to the development of
(sustainable) open source projects in their infancy, and we believe an
understanding of early sustainability determinants is crucial for project
initiators, incubators, newcomers, and users.
In this paper, we aim to explore the relationship between early participation
factors and long-term project sustainability. We leverage a novel methodology
combining the Blumberg model of performance and machine learning to predict the
sustainability of 290,255 GitHub projects. Specificially, we train an XGBoost
model based on early participation (first three months of activity) in 290,255
GitHub projects and we interpret the model using LIME. We quantitatively show
that early participants have a positive effect on project's future sustained
activity if they have prior experience in OSS project incubation and
demonstrate concentrated focus and steady commitment. Participation from
non-code contributors and detailed contribution documentation also promote
project's sustained activity. Compared with individual projects, building a
community that consists of more experienced core developers and more active
peripheral developers is important for organizational projects. This study
provides unique insights into the incubation and recognition of sustainable
open source projects, and our interpretable prediction approach can also offer
guidance to open source project initiators and newcomers.Comment: The 31st ACM Joint European Software Engineering Conference and
Symposium on the Foundations of Software Engineering (ESEC/FSE 2023
On Reducing the Energy Consumption of Software: From Hurdles to Requirements
International audienceBackground. As software took control over hardware in many domains, the question of the energy footprint induced by the software is becoming critical for our society, as the resources powering the underlying infrastructure are finite. Yet, beyond this growing interest, energy consumption remains a difficult concept to master for a developer.Aims. The purpose of this study is to better understand the root causes that prevent software energy consumption to be more widely adopted by developers and companies.Method. To investigate this issue, this paper reports on a qualitative study we conducted in an industrial context. We applied an in-depth analysis of the interviews of 10 experienced developers and summarized a set of implications.Results. We argue that our study delivers i) insightful feedback on how green software design is considered among the interviewed developers and ii) a set of findings to build helpful tools, motivate further research, and establish better development strategies to promote green software design.Conclusion. This paper covers an industrial case study of developers' awareness of green software design and how to promote it within the company. While it might not be generalizable for any company, we believe our results deliver a common body of knowledge with implications to be considered for similar cases and further researches
Detecting and Characterizing Propagation of Security Weaknesses in Puppet-based Infrastructure Management
Despite being beneficial for managing computing infrastructure automatically,
Puppet manifests are susceptible to security weaknesses, e.g., hard-coded
secrets and use of weak cryptography algorithms. Adequate mitigation of
security weaknesses in Puppet manifests is thus necessary to secure computing
infrastructure that are managed with Puppet manifests. A characterization of
how security weaknesses propagate and affect Puppet-based infrastructure
management, can inform practitioners on the relevance of the detected security
weaknesses, as well as help them take necessary actions for mitigation. To that
end, we conduct an empirical study with 17,629 Puppet manifests mined from 336
open source repositories. We construct Taint Tracker for Puppet Manifests
(TaintPup), for which we observe 2.4 times more precision compared to that of a
state-of-the-art security static analysis tool. TaintPup leverages
Puppet-specific information flow analysis using which we characterize
propagation of security weaknesses. From our empirical study, we observe
security weaknesses to propagate into 4,457 resources, i.e, Puppet-specific
code elements used to manage infrastructure. A single instance of a security
weakness can propagate into as many as 35 distinct resources. We observe
security weaknesses to propagate into 7 categories of resources, which include
resources used to manage continuous integration servers and network
controllers. According to our survey with 24 practitioners, propagation of
security weaknesses into data storage-related resources is rated to have the
most severe impact for Puppet-based infrastructure management.Comment: 14 pages, currently under revie
Software architecture social debt:managing the incommunicability factor
Architectural technical debt is the additional project cost connected to technical issues nested in software architectures. Similarly, many practitioners have already experienced that there exists within software architectures a form of social debt, that is, the additional project cost connected to sociotechnical and organizational issues evident in or related to software architectures. This paper illustrates four recurrent antipatterns or community smells connected to such architectural social debt and outlines a means to measure the additional project cost connected to their underlying cause: decision incommunicability. Evaluating the results in multiple focus groups, this paper concludes that studying social debt and community smells at the architecture level may prove vital to rid software development communities of critical organizational flaws incurring considerable additional cost.</p
- âŠ