5 research outputs found

    An empirical study on developer-related factors characterizing fix-inducing commits

    Get PDF
    This paper analyzes developer-related factors that could influence the likelihood for a commit to induce a fix. Specifically, we focus on factors that could potentially hinder developers\u27 ability to correctly understand the code components involved in the change to be committed as follows: (i) the coherence of the commit (i.e., how much it is focused on a specific topic); (ii) the experience level of the developer on the files involved in the commit; and (iii) the interfering changes performed by other developers on the files involved in past commits. The results of our study indicate that fix-inducing\u27 commits (i.e., commits that induced a fix) are significantly less coherent than clean\u27 commits (i.e., commits that did not induce a fix). Surprisingly, fix-inducing\u27 commits are performed by more experienced developers; yet, those are the developers performing more complex changes in the system. Finally, fix-inducing\u27 commits have a higher number of past interfering changes as compared with clean\u27 commits. Our empirical study sheds light on previously unexplored factors and presents significant results that can be used to improve approaches for defect prediction. Copyright (c) 2016 John Wiley & Sons, Ltd

    Improving Software Project Health Using Machine Learning

    Get PDF
    In recent years, systems that would previously live on different platforms have been integrated under a single umbrella. The increased use of GitHub, which offers pull-requests, issue trackingand version history, and its integration with other solutions such as Gerrit, or Travis, as well as theresponse from competitors, created development environments that favour agile methodologiesby increasingly automating non-coding tasks: automated build systems, automated issue triagingetc. In essence, source-code hosting platforms shifted to continuous integration/continuousdelivery (CI/CD) as a service. This facilitated a shift in development paradigms, adherents ofagile methodology can now adopt a CI/CD infrastructure more easily. This has also created large,publicly accessible sources of source-code together with related project artefacts: GHTorrent andsimilar datasets now offer programmatic access to the whole of GitHub. Project health encompasses traceability, documentation, adherence to coding conventions,tasks that reduce maintenance costs and increase accountability, but may not directly impactfeatures. Overfocus on health can slow velocity (new feature delivery) so the Agile Manifestosuggests developers should travel light — forgo tasks focused on a project health in favourof higher feature velocity. Obviously, injudiciously following this suggestion can undermine aproject’s chances for success. Simultaneously, this shift to CI/CD has allowed the proliferation of Natural Language orNatural Language and Formal Language textual artefacts that are programmatically accessible:GitHub and their competitors allow API access to their infrastructure to enable the creation ofCI/CD bots. This suggests that approaches from Natural Language Processing and MachineLearning are now feasible and indeed desirable. This thesis aims to (semi-)automate tasks forthis new paradigm and its attendant infrastructure by bringing to the foreground the relevant NLPand ML techniques. Under this umbrella, I focus on three synergistic tasks from this domain: (1) improving theissue-pull-request traceability, which can aid existing systems to automatically curate the issuebacklog as pull-requests are merged; (2) untangling commits in a version history, which canaid the beforementioned traceability task as well as improve the usability of determining a faultintroducing commit, or cherry-picking via tools such as git bisect; (3) mixed-text parsing, to allowbetter API mining and open new avenues for project-specific code-recommendation tools
    corecore