
    Continuous Defect Prediction: The Idea and a Related Dataset

    We present the idea of our Continuous Defect Prediction (CDP) research and a related dataset that we created and share. The dataset currently contains more than 11 million data rows representing files involved in Continuous Integration (CI) builds; it synthesizes the results of CI builds with data we mine from software repositories. The dataset covers 1265 software projects, 30,022 distinct commit authors, and several software process metrics that earlier research found useful in software defect prediction. In this particular dataset we use TravisTorrent as the source of CI data. TravisTorrent synthesizes commit-level information from the Travis CI server and GitHub open-source project repositories. We extend this data to the file-change level and calculate software process metrics that may be used, for example, as features to predict risky software changes that could break the build if committed to a repository with CI enabled.
    Comment: Lech Madeyski and Marcin Kawalerowicz. "Continuous Defect Prediction: The Idea and a Related Dataset". In: 14th International Conference on Mining Software Repositories (MSR'17). Buenos Aires. 2017, pp. 515-518. doi: 10.1109/MSR.2017.46. URL: http://madeyski.e-informatyka.pl/download/MadeyskiKawalerowiczMSR17.pd
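
    As an illustration of how such a file-level dataset could be used, below is a minimal Python sketch that trains a classifier to flag build-breaking changes. The file name, feature columns, and label column are hypothetical placeholders, not the actual CDP schema.

        # Hedged sketch: train a build-breakage predictor on a CDP-style
        # file-level dataset. All file and column names are hypothetical.
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        df = pd.read_csv("cdp_file_level.csv")  # hypothetical export of the dataset

        features = ["lines_added", "lines_deleted", "author_commit_count"]  # assumed
        X, y = df[features], df["build_broken"]  # assumed binary label

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42)

        model = RandomForestClassifier(n_estimators=100, random_state=42)
        model.fit(X_train, y_train)
        print("Held-out accuracy:", model.score(X_test, y_test))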

    Defect prediction with bad smells in code

    Background: Defect prediction can be highly beneficial for software development projects when the prediction is effective and defect-prone areas are identified correctly. One of the key elements of effective software defect prediction is the proper selection of metrics used for dataset preparation. Objective: The purpose of this research is to verify whether code smell metrics, collected using the Microsoft CodeAnalysis tool and added to a basic metric set, can improve defect prediction in an industrial software development project. Results: We verified whether extending the dataset with code-smell-based metrics changes the effectiveness of defect prediction by comparing prediction results for datasets with and without code smell metrics. We observed only a small improvement in defect prediction effectiveness when the dataset extended with bad smell metrics was used: the average accuracy value increased by 0.0091 and stayed within the margin of error. However, when only code smell metrics were used for prediction (without the basic metric set), the process yielded surprisingly high accuracy (0.8249) and F-measure (0.8286). We also elaborate on the data anomalies and problems we observed when two different metric sources were used to prepare one consistent dataset. Conclusion: Extending the dataset with code smell metrics does not significantly improve prediction effectiveness, and the achieved result did not compensate for the effort needed to collect the additional metrics. However, defect prediction based on code smells alone is still highly effective and can be used especially where other metrics can hardly be collected.
    Comment: Chapter 10 in Software Engineering: Improving Practice through Research (B. Hnatkowska and M. Śmiałek, eds.), pp. 163-176, 201
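
    The comparison described above can be sketched in a few lines of Python: train the same model on the basic metric set and on the set extended with code smell metrics, then compare accuracy and F-measure. File name, column names, and classifier choice below are hypothetical placeholders, not the study's actual setup.

        # Hedged sketch of the with/without-smells comparison; all names
        # are assumed and the classifier is illustrative only.
        import pandas as pd
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import accuracy_score, f1_score
        from sklearn.model_selection import train_test_split

        df = pd.read_csv("defect_dataset.csv")                # hypothetical file
        basic = ["loc", "cyclomatic_complexity", "churn"]     # assumed basic set
        smells = ["long_method_count", "god_class_flag"]      # assumed smell metrics
        y = df["defective"]

        for name, cols in [("basic", basic), ("basic+smells", basic + smells)]:
            X_tr, X_te, y_tr, y_te = train_test_split(
                df[cols], y, test_size=0.3, random_state=1)
            pred = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict(X_te)
            print(name, accuracy_score(y_te, pred), f1_score(y_te, pred))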

    Editorial


    Editorial


    Software Metrics in Boa Large-Scale Software Mining Infrastructure: Challenges and Solutions

    In this paper, we describe our experience implementing some of the classic software engineering metrics using Boa, a large-scale software repository mining platform, and its dedicated language. We also take advantage of the Boa infrastructure to propose new software metrics and to characterize open source projects, providing reference values of software metrics based on a large number of open source projects. The presented software metrics, both well known and newly proposed in this paper, can be used to build large-scale software defect prediction models. Additionally, we present the obstacles we met while developing the metrics; our analysis can be used to improve Boa in its future releases. The implemented metrics can also serve as a foundation for more complex explorations of open source projects and as a guide on how to implement software metrics using Boa, as the source code of the metrics is freely available to support reproducible research.
    Comment: Chapter 8 of the book "Software Engineering: Improving Practice through Research" (B. Hnatkowska and M. Śmiałek, eds.), pp. 131-146, 201
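
    The paper implements its metrics in Boa's dedicated language; purely as an illustration of the kind of classic metric involved, here is a rough Python equivalent of NOM (number of methods per class), computed over a single source file rather than over Boa's repository-scale corpus.

        # Illustrative only: NOM (number of methods per class) in Python,
        # not the Boa-language implementation the paper describes.
        import ast

        def methods_per_class(source: str) -> dict[str, int]:
            """Map each class defined in `source` to its method count."""
            counts = {}
            for node in ast.walk(ast.parse(source)):
                if isinstance(node, ast.ClassDef):
                    counts[node.name] = sum(
                        isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
                        for child in node.body)
            return counts

        print(methods_per_class("class A:\n    def f(self): pass\n    def g(self): pass"))
        # -> {'A': 2}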

    On the Effectiveness of Unit Tests in Test-driven Development

    Background: Writing unit tests is one of the primary activities in test-driven development. Yet, existing reviews report little evidence supporting or refuting the effect of this development approach on test case quality. A lack of developer ability and skill to produce sufficiently good test cases is also reported as a limitation of applying test-driven development in industrial practice. Objective: We investigate the impact of test-driven development on the effectiveness of unit test cases compared to incremental test-last development in an industrial context. Method: We conducted an experiment in an industrial setting with 24 professionals, who followed the two development approaches to implement the tasks. We measure unit test effectiveness in terms of mutation score, and also measure branch and method coverage of the test suites to compare our results with the literature. Results: In terms of mutation score, we found that the test cases written for a test-driven development task have a higher defect detection ability than test cases written for an incremental test-last development task. Subjects wrote test cases that cover more branches on the test-driven development task, whereas test cases written for the incremental test-last task cover more methods. Conclusion: Our findings differ from previous studies conducted in academic settings: professionals were able to perform more effective unit testing with test-driven development. Furthermore, we observe that the coverage measures preferred in academic studies reveal different aspects of a development approach. Our results need to be validated in larger industrial contexts.
    Istanbul Technical University Scientific Research Projects (MGA-2017-40712) and the Academy of Finland (Decision No. 278354)
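
    For readers unfamiliar with the effectiveness measure used above: mutation score is the fraction of generated mutants that the test suite kills, with known-equivalent mutants excluded from the total. A minimal sketch with illustrative numbers:

        # Mutation score = killed mutants / (total mutants - equivalent mutants).
        def mutation_score(killed: int, total: int, equivalent: int = 0) -> float:
            """Fraction of non-equivalent mutants detected by the test suite."""
            return killed / (total - equivalent)

        print(mutation_score(killed=83, total=100, equivalent=4))  # 0.8645...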

    OECD Recommendation's draft concerning access to research data from public funding: A review

    Sharing research data from public funding is an important topic, especially now, during global emergencies like the COVID-19 pandemic, when we need policies that enable rapid sharing of research data. Our aim is to discuss and review the revised Draft of the OECD Recommendation Concerning Access to Research Data from Public Funding. The Recommendation is based on ethical scientific practice, but in order for it to be applicable in real settings, we suggest several enhancements to make it more actionable. In particular, the constant maintenance of provided software stipulated by the Recommendation is virtually impossible even for commercial software. Other major concerns are insufficient clarity regarding how to finance data repositories in joint private-public investments, inconsistencies between data security and user-friendliness of access, little focus on the reproducibility of submitted data, risks related to the mining of large data sets, and the protection of sensitive (particularly personal) data. In addition, we identify several risks and threats that need to be considered when designing and developing data platforms that implement the Recommendation (e.g., not only descriptions of the data formats but also the data collection methods should be available). Furthermore, the uneven level of readiness of some countries for the practical implementation of the proposed Recommendation poses a risk of its delayed or incomplete implementation.