
    Characteristics of Useful Code Reviews: An Empirical Study at Microsoft

    Over the past decade, both open source and commercial software projects have adopted contemporary peer code review practices as a quality control mechanism. Prior research has shown that developers spend a large amount of time and effort performing code reviews. Identifying the factors that lead to useful code reviews can therefore benefit projects by increasing code review effectiveness and quality. In a three-stage mixed-methods study, we qualitatively investigated which aspects of code reviews make them useful to developers, used our findings to build and verify a classification model that distinguishes useful from not-useful code review feedback, and finally used this classifier to classify review comments, enabling us to empirically investigate the factors that lead to more effective code review feedback. In total, we analyzed 1.5 million review comments from five Microsoft projects and uncovered many factors that affect the usefulness of review feedback. For example, we found that the proportion of useful comments made by a reviewer increases dramatically during his or her first year at Microsoft but tends to plateau afterwards. In contrast, we found that the more files there are in a change, the lower the proportion of comments in the code review that will be of value to the author of the change. Based on our findings, we provide recommendations for practitioners to improve the effectiveness of code reviews.
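
    The classifier itself is not included in the abstract; as a rough illustration of the task it automates, here is a minimal sketch in Python, assuming a small hand-labeled sample and simple TF-IDF features (the paper's actual features, model, and training data are not reproduced):

        # Minimal sketch of a useful/not-useful review-comment classifier.
        # The labeled comments below are hypothetical; the study labeled
        # comments at far larger scale with richer features.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import classification_report
        from sklearn.model_selection import train_test_split

        comments = [
            "Consider extracting this block into a helper method.",
            "This null check is missing on the async path.",
            "Off-by-one here: the loop should stop at n - 1.",
            "This query will table-scan; add an index.",
            "Nice!",
            "Looks good to me.",
            "Thanks for the fix.",
            "Done.",
        ]
        labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = useful, 0 = not useful

        X_train, X_test, y_train, y_test = train_test_split(
            comments, labels, test_size=0.25, random_state=0, stratify=labels)

        vectorizer = TfidfVectorizer(ngram_range=(1, 2))
        clf = LogisticRegression().fit(vectorizer.fit_transform(X_train), y_train)
        print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))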

    Code Ownership and Software Quality: A Replication Study

    Using new and refined code ownership metrics, we were able to classify source files that contained at least one bug with a median precision of 0.74 and a median recall of 0.38. At the directory level, we achieved a precision of 0.76 and a recall of 0.60.
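
    The abstract does not define the metrics; one classic ownership measure from the study being replicated is the proportion of a file's changes made by its top contributor. A minimal sketch, assuming a simple (file, author) change log; the 5% minor-contributor cut-off is also an assumption borrowed from the original ownership work:

        # Minimal sketch of per-file ownership: the top contributor's share
        # of changes, plus a count of "minor" contributors (< 5% of changes).
        from collections import Counter, defaultdict

        # Hypothetical (file, author) pairs mined from version-control history.
        changes = [
            ("src/parser.c", "alice"), ("src/parser.c", "alice"),
            ("src/parser.c", "bob"),   ("src/net.c", "carol"),
            ("src/net.c", "dave"),     ("src/net.c", "erin"),
        ]

        per_file = defaultdict(Counter)
        for path, author in changes:
            per_file[path][author] += 1

        for path, authors in per_file.items():
            total = sum(authors.values())
            ownership = max(authors.values()) / total  # top contributor's share
            minor = sum(1 for n in authors.values() if n / total < 0.05)
            print(f"{path}: ownership={ownership:.2f} minor_contributors={minor}")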

    An Actionable Framework for Understanding and Improving Developer Experience

    Developer experience is an important concern for software organizations, as enhancing developer experience improves productivity, satisfaction, engagement, and retention. We set out to understand what affects developer experience through semi-structured interviews with 21 developers from industry, which we transcribed and iteratively coded. Our findings elucidate factors that affect developer experience and the characteristics that influence their respective importance to individual developers. We also identify strategies employed by individuals and teams to improve developer experience and the barriers that stand in their way. Lastly, we describe the coping mechanisms developers use when developer experience cannot be sufficiently improved. Our findings result in the DX Framework, an actionable conceptual framework for understanding and improving developer experience. The DX Framework provides a go-to reference for organizations that want to enable more productive and effective work environments for their developers.

    The Art of Testing Less without Sacrificing Quality

    Testing is a key element of software development processes for the management and assessment of product quality. In most development environments, software engineers are responsible for ensuring the functional correctness of code. For large, complex software products, however, there is an additional need to check that changes do not negatively impact other parts of the software and that they comply with system constraints such as backward compatibility, performance, and security. Ensuring these system constraints may require complex verification infrastructure and test procedures. Although such tests are time-consuming, expensive, and rarely find defects, they act as an insurance process to ensure the software is compliant. However, long-lasting tests increasingly conflict with strategic aims to shorten release cycles. To decrease production costs and to improve development agility, we created a generic test selection strategy called THEO that accelerates test processes without sacrificing product quality. THEO is based on a cost model that dynamically skips tests when the expected cost of running a test exceeds the expected cost of removing it. We replayed past development periods of three major Microsoft products, achieving a reduction of 50% of test executions and saving millions of dollars per year while maintaining product quality.
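
    The abstract states THEO's decision rule only in words; here is a minimal sketch of that rule in Python, with hypothetical cost terms and probabilities (the paper's actual cost model parameters and estimation procedure are not reproduced):

        # Minimal sketch of a THEO-style skip decision: skip a test while its
        # expected cost of running exceeds the expected cost of removing it.
        def should_skip(p_detect, cost_exec, cost_escaped_defect,
                        p_false_alarm=0.0, cost_inspection=0.0):
            """True if skipping the test is expected to be cheaper than running it."""
            cost_of_running = cost_exec + p_false_alarm * cost_inspection
            cost_of_removal = p_detect * cost_escaped_defect
            return cost_of_running > cost_of_removal

        # A test that is expensive to run and almost never finds defects:
        print(should_skip(p_detect=0.0005, cost_exec=40.0,
                          cost_escaped_defect=10_000.0))  # True -> skip it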

    Strategies for Avoiding Test Fixture Smells during Software Evolution

    An important challenge in creating automated tests is how to design test fixtures, i.e., the setup code that initializes the system under test before actual automated testing can start. Test designers have to choose between different approaches for the setup, trading off maintenance overhead against slow test execution. Over time, test code quality can erode and test smells can develop, such as the occurrence of overly general fixtures, obscure inline code, and dead fields. In this paper, we investigate how fixture-related test smells evolve over time by analyzing several thousand revisions of five open source systems. Our findings indicate that setup management strategies strongly influence the types of test fixture smells that emerge in code, and that several types of fixture smells often emerge at the same time. Based on this information, we recommend important guidelines for setup strategies, and suggest how tool support can be improved to help both avoid the emergence of such smells and refactor code when test smells do appear. Index Terms: test fixture smells; test evolution; maintenance
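
    To make two of the named smells concrete, here is a hypothetical illustration using Python's unittest (the study itself analyzed Java test suites, and these class and field names are invented): a general fixture whose setUp builds state that only some tests use, plus a dead field.

        # Illustration of a "general fixture" and a "dead field" smell:
        # every test pays for the whole setUp, and one field is never used.
        import unittest


        class OrderTests(unittest.TestCase):
            def setUp(self):
                self.order = {"items": [], "total": 0}
                self.customer = {"name": "alice"}  # used by only one test
                self.audit_log = []                # dead field: never used

            def test_new_order_is_empty(self):
                self.assertEqual(self.order["items"], [])

            def test_order_has_customer(self):
                self.assertEqual(self.customer["name"], "alice")


        if __name__ == "__main__":
            unittest.main()

    One remedy in the spirit of the paper's guidelines is to keep setUp minimal and move test-specific setup into the tests themselves or into helper methods.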

    CodeFlow: Improving the Code Review Process at Microsoft


    Investigating Severity Thresholds for Test Smells

    Test smells are poor design decisions implemented in test code, which can have an impact on the effectiveness and maintainability of unit tests. Even though test smell detection tools exist, how to rank the severity of the detected smells is an open research topic. In this work, we investigate severity ratings for four test smells and their perceived impact on test suite maintainability as judged by developers. To accomplish this, we first analyzed some 1,500 open-source projects to elicit severity thresholds for commonly found test smells. Then, we conducted a study with developers to evaluate our thresholds. We found that (1) current detection rules for certain test smells are considered too strict by developers and (2) our newly defined severity thresholds are in line with the participants' perception of how test smells impact the maintainability of a test suite. Preprint [https://doi.org/10.5281/zenodo.3744281], data and material [https://doi.org/10.5281/zenodo.3611111].
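
    The abstract does not spell out how thresholds are elicited; a plausible minimal sketch, assuming benchmark-style quantile cut-offs over metric values pooled across projects (the chosen percentiles, the metric, and the severity labels are assumptions, not the paper's published thresholds):

        # Minimal sketch of quantile-based severity thresholds for a smell
        # metric pooled across a benchmark of projects.
        import statistics

        # Hypothetical pooled metric values (e.g. assertion count per test
        # method) collected across many projects.
        metric_values = [1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 8, 9, 12, 20]

        pct = statistics.quantiles(metric_values, n=100)  # percentiles 1..99
        thresholds = {
            "medium": pct[69],    # 70th percentile
            "high": pct[84],      # 85th percentile
            "critical": pct[94],  # 95th percentile
        }
        print(thresholds)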

    Similarity-based prioritization of test case automation

    The importance of efficient software testing procedures is driven by ever-increasing system complexity as well as global competition. In the particular case of manual test cases at the system integration level, where thousands of test cases may be executed before release, time must be well spent in order to test the system as completely and as efficiently as possible. Automating a subset of the manual test cases, i.e., translating the manual instructions into automatically executable code, is one way of decreasing the test effort. It is also common for test cases to exhibit similarities, which can be exploited through reuse when automating a test suite. In this paper, we investigate the potential for reducing test effort by ordering the test cases before such automation, given that already automated parts of test cases can be reused. In our analysis, we investigate several approaches for prioritization in a case study at a large Swedish vehicle manufacturer. The study analyzes the effects with respect to test effort on four projects with a total of 3,919 integration test cases comprising 35,180 test steps, written in natural language. The results show that for the four projects considered, the difference in expected manual effort between the best and the worst order found is on average 12 percentage points. The results also show that our proposed prioritization method is nearly as good as more resource-demanding meta-heuristic approaches at a fraction of the computational time. Based on our results, we conclude that the order of automation is important when the set of test cases contains similar steps (instructions) that cannot be removed but can be reused. More precisely, the order matters with respect to how quickly the manual test execution effort decreases for a set of test cases that is being automated.
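
    As one way to picture the ordering problem, here is a greedy sketch in Python: at each step, automate the test case with the fewest not-yet-automated steps, so shared steps are reused as early as possible. This is an illustrative reading of similarity-based prioritization, not the paper's exact method, and the test cases and step identifiers are hypothetical:

        # Greedy similarity-based ordering for automation: test cases are
        # sets of step identifiers; once a step is automated it is reused.
        def prioritize(test_cases):
            automated, order = set(), []
            remaining = dict(test_cases)
            while remaining:
                # Cheapest next test case = fewest not-yet-automated steps.
                name = min(remaining, key=lambda t: len(remaining[t] - automated))
                automated |= remaining.pop(name)
                order.append(name)
            return order

        suite = {
            "TC1": {"login", "open_menu", "add_item"},
            "TC2": {"login", "open_menu", "remove_item"},
            "TC3": {"reboot", "self_test"},
        }
        print(prioritize(suite))  # TC1 and TC2 end up adjacent: shared steps reused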