    Primary Drivers of Software Maintenance Cost Studied Using Longitudinal Data

    We examine the main drivers of software maintenance effort and cost. We use the ‘Distributed Cognition’ framework to hypothesize about how ‘discovery work’ in maintenance is affected by two types of cost drivers: system attributes (size, complexity, age, etc.) and personnel attributes (number of maintainers, location dispersion, etc.). We test our hypotheses using archival data on more than 5,000 maintenance projects carried out between 2009 and 2011 on 412 different operational systems in a large financial institution. We find that personnel attributes are significantly more influential than system attributes. In particular, a marginal change in personnel factors is associated with effort growing much faster than cost, indicating an escalating marginal cost of spreading maintenance work across more maintainers and site locations. We also find, counter to expectation, that two system attributes are negatively linked to maintenance effort and cost. Implications of these findings for research and practice are discussed.

    On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests

    Pull-based development has enabled numerous volunteers to contribute to open-source projects with fewer barriers. Nevertheless, a considerable number of pull requests (PRs) with valid contributions are abandoned by their contributors, wasting the effort and time put in by both the contributors and maintainers. To better understand the underlying dynamics of contributor-abandoned PRs, we conduct a mixed-methods study using both quantitative and qualitative methods. We curate a dataset consisting of 265,325 PRs, including 4,450 abandoned ones, from ten popular and mature GitHub projects and measure 16 features characterizing PRs, contributors, review processes, and projects. Using statistical and machine learning techniques, we find that complex PRs, novice contributors, and lengthy reviews have a higher probability of abandonment and that the rate of PR abandonment fluctuates alongside the projects' maturity or workload. To identify why contributors abandon their PRs, we also manually examine a random sample of 354 abandoned PRs. We observe that the most frequent abandonment reasons are related to the obstacles faced by contributors, followed by the hurdles imposed by maintainers during the review process. Finally, we survey the top core maintainers of the studied projects to understand their perspectives on dealing with PR abandonment and on our findings. Comment: Manuscript accepted for publication in ACM Transactions on Software Engineering and Methodology (TOSEM).

    Proceedings of the ECCS 2005 satellite workshop: embracing complexity in design - Paris 17 November 2005

    Embracing complexity in design is one of the critical issues and challenges of the 21st century. As the realization grows that design activities and artefacts display properties associated with complex adaptive systems, so grows the need to use complexity concepts and methods to understand these properties and inform the design of better artefacts. It is a great challenge because complexity science represents an epistemological and methodological shift that promises a holistic approach in the understanding and operational support of design. But design is also a major contributor to complexity research. Design science is concerned with problems that are fundamental in the sciences in general and complexity sciences in particular. For instance, design has been perceived and studied as a ubiquitous activity inherent in every human activity, as the art of generating hypotheses, as a type of experiment, or as a creative co-evolutionary process. Design science and its established approaches and practices can be a great source for advancement and innovation in complexity science. These proceedings are the result of a workshop organized as part of the activities of a UK government AHRB/EPSRC funded research cluster called Embracing Complexity in Design (www.complexityanddesign.net) and the European Conference in Complex Systems (complexsystems.lri.fr).

    Augmentative communication device design, implementation and evaluation

    The ultimate aim of this thesis was to design and implement an advanced software-based Augmentative Communication Device (ACD), or Voice Output Communication Aid (VOCA), for non-vocal Learning Disabled individuals by applying current psychological models, theories, and experimental techniques. By taking account of potential users' cognitive and linguistic abilities, a symbol-based device (Easy Speaker) was produced which outputs naturalistic digitised human speech and sound and makes use of a photorealistic symbol set. In order to increase the size of the available symbol set, a hypermedia-style dynamic screen approach was employed. The relevance of the hypermedia metaphor in relation to models of knowledge representation and language processing was explored. Laboratory-based studies suggested that potential users could learn to productively operate the software and became faster and more efficient over time when performing set conversational tasks. Studies with unimpaired individuals supported the notion that digitised speech was less cognitively demanding to decode, or listen to. With highly portable, touch-based, PC-compatible systems beginning to appear, it is hoped that the otherwise silent will be able to use the software as their primary means of communication with the speaking world. Extensive field trials over a six-month period with a prototype device and in collaboration with users' caregivers strongly suggested this might be the case. Off-device improvements were also noted, suggesting that Easy Speaker, or similar software, has the potential to be used as a communication training tool. Such training would be likely to improve overall communicative effectiveness. To conclude, a model for successful ACD development was proposed.

    Requirements Changes Rework Effects: A Case Study

    Although software managers are generally good at estimation, their experience of scheduling rework is poor. Inconsistent or incorrect effort estimation in turn increases the risk that the completion time for a project will ultimately become problematic. To continually alter software maintenance schedules while maintaining software projects is, in fact, a daunting task. Our proposed framework, validated in a case study, confirms that variables in requirements change suffer from weaknesses in coding, user involvement and user documentation. Our results clearly show that there is significant impact on rework as a result of unexpected errors found to correlate with 1) weak characteristics and attributes as described in the source lines of code, especially in data declaration and data statement, 2) lack of communication between developers and users on a change effect, and 3) unavailability of user documentation. To keep rework under control, new criteria in change request forms are proposed. These criteria are shown in the framework to need refining; thus, the more case studies that are validated, the more reliable the result will be in determining outcomes of effort rework effects.

    How Early Participation Determines Long-Term Sustained Activity in GitHub Projects?

    Although the open source model bears many advantages in software development, open source projects are always hard to sustain. Previous research on open source sustainability mainly focuses on projects that have already reached a certain level of maturity (e.g., with communities, releases, and downstream projects). However, limited attention is paid to the development of (sustainable) open source projects in their infancy, and we believe an understanding of early sustainability determinants is crucial for project initiators, incubators, newcomers, and users. In this paper, we aim to explore the relationship between early participation factors and long-term project sustainability. We leverage a novel methodology combining the Blumberg model of performance and machine learning to predict the sustainability of 290,255 GitHub projects. Specifically, we train an XGBoost model based on early participation (first three months of activity) in these projects and we interpret the model using LIME. We quantitatively show that early participants have a positive effect on a project's future sustained activity if they have prior experience in OSS project incubation and demonstrate concentrated focus and steady commitment. Participation from non-code contributors and detailed contribution documentation also promote a project's sustained activity. Compared with individual projects, building a community that consists of more experienced core developers and more active peripheral developers is important for organizational projects. This study provides unique insights into the incubation and recognition of sustainable open source projects, and our interpretable prediction approach can also offer guidance to open source project initiators and newcomers. Comment: The 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE 2023).

    Aspect of Code Cloning Towards Software Bug and Imminent Maintenance: A Perspective on Open-source and Industrial Mobile Applications

    As a part of the digital era of microtechnology, mobile application (app) development is evolving with lightning speed to enrich our lives and bring new challenges and risks. In particular, software bugs and failures cost trillions of dollars every year, including fatalities such as a software bug in a self-driving car that resulted in a pedestrian fatality in March 2018 and the recent Boeing-737 Max tragedies that resulted in hundreds of deaths. Software clones (duplicated fragments of code) are also found to be one of the crucial factors for having bugs or failures in software systems. There have been many significant studies on software clones and their relationships to software bugs for desktop-based applications. Unfortunately, while mobile apps have become an integral part of today’s era, there is a marked lack of such studies for mobile apps. In order to explore this important aspect, in this thesis, first, we studied the characteristics of software bugs in the context of mobile apps, which might not be prevalent for desktop-based apps, such as energy-related (battery drain while using apps) and compatibility-related (different behaviors of the same app on different devices) bugs/issues. Using Support Vector Machine (SVM), we classified about 3K mobile app bug reports of different open-source development sites into four categories: crash, energy, functionality and security bug. We then manually examined a subset of those bugs and found that over 50% of the bug-fixing code-changes occurred in clone code. There have been a number of studies with desktop-based software systems that clearly show the harmful impacts of code clones and their relationships to software bugs.
Given that there is a marked lack of such studies for mobile apps, in our second study, we examined 11 open-source and industrial mobile apps written in two different languages (Java and Swift) and noticed that clone code is more bug-prone than non-clone code and that industrial mobile apps have a higher code clone ratio than open-source mobile apps. Furthermore, we correlated our study outcomes with those of existing desktop-based studies and surveyed 23 mobile app developers to validate our findings. Along with validating our findings from the survey, we noticed that around 95% of the developers usually copy/paste (code cloning) code fragments from the popular crowd-sourcing platform, Stack Overflow (SO), to their projects and that over 75% of such developers experience bugs after such activities (the code cloning from SO). Existing studies with desktop-based systems also showed that while SO is one of the most popular online platforms for code reuse (and code cloning), SO code fragments are usually toxic from a software maintenance perspective. Thus, in the third study of this thesis, we studied the consequences of code cloning from SO in different open-source and industrial mobile apps. We observed that closed-source industrial apps reused even more SO code fragments than open-source mobile apps and that SO code fragments were more change-prone (such as bug) than non-SO code fragments. We also observed that SO code fragments were related to more bugs in industrial projects than open-source ones. Our studies show how we could efficiently and effectively manage clone related software bugs for mobile apps by utilizing the positive sides of code cloning while overcoming (or at least minimizing) the negative consequences of clone fragments.

    Quality Issues in Machine Learning Software Systems

    Context: An increasing demand is observed in various domains to employ Machine Learning (ML) for solving complex problems. ML models are implemented as software components and deployed in Machine Learning Software Systems (MLSSs). Problem: There is a strong need for ensuring the serving quality of MLSSs. False or poor decisions of such systems can lead to malfunction of other systems, significant financial losses, or even threats to human life. The quality assurance of MLSSs is considered a challenging task and is currently a hot research topic. Objective: This paper aims to investigate the characteristics of real quality issues in MLSSs from the viewpoint of practitioners. This empirical study aims to identify a catalog of quality issues in MLSSs. Method: We conduct a set of interviews with practitioners/experts to gather insights about their experience and practices when dealing with quality issues. We validate the identified quality issues via a survey with ML practitioners. Results: Based on the content of 37 interviews, we identified 18 recurring quality issues and 24 strategies to mitigate them. For each identified issue, we describe the causes and consequences according to the practitioners' experience. Conclusion: We believe the catalog of issues developed in this study will allow the community to develop efficient quality assurance tools for ML models and MLSSs. A replication package of our study is available on our public GitHub repository.


    Although there is a long tradition of empirical studies of software developers, few studies have focused on software maintenance. Prior work is predicated on the belief that higher levels of software comprehension are associated with higher levels of performance on modification tasks. This study provides a more complete understanding of the relationship between software comprehension and modification. We conceptualize software maintenance as interlinking comprehension and modification, and argue that the relationship between the two is moderated by cognitive fit. Specifically, cognitive fit exists when the software maintainer's dominant mental representation of the software and their mental representation of the modification task emphasize the same type of knowledge. We hypothesize that when cognitive fit exists, greater improvements in comprehension are associated with higher levels of performance on a modification task. When cognitive fit does not exist, however, the software maintainer's mental representations of the software and of the modification task do not emphasize the same type of knowledge, which may mean that attention is devoted to comprehension at the expense of modification, resulting in lower performance on the modification task. In these circumstances, comprehension and modification tasks may interfere with each other, an effect known as dual-task interference. We therefore hypothesize that performance on a modification task is moderated by the fit between the mental representation of the software and that of the modification task. We tested our theory by varying cognitive fit to create matched and mismatched conditions in a single experiment that used IT professionals as subjects. Our findings support our theory: cognitive fit moderates the relationship between comprehension and modification. 
Specifically, changes in software comprehension and modification performance are positively related when cognitive fit exists and negatively related when cognitive fit does not exist. Our findings demonstrate the need to examine more complex relationships among the numerous types of tasks involved in software development rather than examining software comprehension alone.
