Cross-Dataset Design Discussion Mining
Being able to identify software discussions that are primarily about design,
which we call design mining, can improve documentation and maintenance of
software systems. Existing design mining approaches have good classification
performance using natural language processing (NLP) techniques, but the
conclusion stability of these approaches is generally poor. A classifier
trained on a given dataset of software projects has so far not worked well on
different artifacts or different datasets. In this study, we replicate and
synthesize these earlier results in a meta-analysis. We then apply recent work
in transfer learning for NLP to the problem of design mining. However, for our
datasets, these deep transfer learning classifiers perform no better than less
complex classifiers. We conclude by discussing some reasons behind the transfer
learning approach to design mining.
Comment: accepted for SANER 2020, Feb, London, ON. 12 pages. Replication
package: https://doi.org/10.5281/zenodo.359012
Tools, processes and factors influencing of code review
Code review is the most effective quality assurance strategy in software development, in which reviewers aim to identify defects and improve the quality of the source code of both commercial and open-source software. Ultimately, the main purpose of code review activities is to produce better software products. Review comments are the building blocks of code review. There are many approaches to conducting reviews and analyzing source code, such as pair programming, informal inspections, and formal inspections. Reviewers are responsible for providing comments and suggestions to improve the quality of the proposed source code modifications. This work aims to succinctly describe the code review process, giving a framework of the tools and factors influencing code review to aid reviewers and authors in the code review stages and in choosing a suitable code review tool.
Toward Effective Secure Code Reviews: An Empirical Study of Security-Related Coding Weaknesses
Identifying security issues early is encouraged to reduce the latent negative
impacts on software systems. Code review is a widely-used method that allows
developers to manually inspect modified code, catching security issues during a
software development cycle. However, existing code review studies often focus
on known vulnerabilities, neglecting coding weaknesses, which can introduce
real-world security issues that are more visible through code review. The
practices of code reviews in identifying such coding weaknesses are not yet
fully investigated.
To better understand this, we conducted an empirical case study in two large
open-source projects, OpenSSL and PHP. Based on 135,560 code review comments,
we found that reviewers raised security concerns in 35 out of 40 coding
weakness categories. Surprisingly, some coding weaknesses related to past
vulnerabilities, such as memory errors and resource management, were discussed
less often than the vulnerabilities. Developers attempted to address raised
security concerns in many cases (39%-41%), but a substantial portion was merely
acknowledged (30%-36%), and some went unfixed due to disagreements about
solutions (18%-20%). This highlights that coding weaknesses can slip through
code review even when identified. Our findings suggest that reviewers can
identify various coding weaknesses leading to security issues during code
reviews. However, these results also reveal shortcomings in current code review
practices, indicating the need for more effective mechanisms or support for
increasing awareness of security issue management in code reviews.
Identifying Design Discussion Topics in Open Source Software
When contributing to a software system, developers need to understand the rationale for previous design decisions so that they can adhere to the system’s design. Not doing so can lead to erosion of the overall design quality of the system. However, because these discussions are embedded in a large volume of communication on different topics and fragmented across multiple communication channels, it is difficult, if not impossible, to retrieve the relevant discussions efficiently. Sorting these design discussions by topic can aid developers in understanding why the code is as it is, performing a retrospective analysis of prior discussions, and identifying expertise within the project. Although recent work has started investigating the discussion topics, we still lack a comprehensive understanding of what these topics are and how design quality is affected by them. In this paper, we take an initial step towards this goal. A qualitative analysis of 3,569 discussions collected from three different channels and a survey of 111 developers shows that: I) seven distinct topics are discussed during design discussions and II) 25.15% of the respondents face difficulties while retrieving these topics. We also build a supervised machine learning classifier with high precision (0.86), recall (0.85), and F1-score (0.85) to identify design discussion topics automatically and investigate the impact of these discussion topics on the design quality of the project.
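As a reading aid for the metrics reported above: the F1-score is conventionally the harmonic mean of precision and recall. The sketch below is not the authors' code; the function name `f1_score` is a hypothetical helper introduced here, and the only inputs are the precision and recall values quoted in the abstract. It simply checks that the three reported numbers are mutually consistent under that standard definition.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract (precision 0.86, recall 0.85).
reported_precision = 0.86
reported_recall = 0.85

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))
```

Rounded to two decimals, the result agrees with the reported F1-score of 0.85, assuming the abstract's figures were computed with this definition.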