149 research outputs found
Mining unstructured software data
Our thesis is that the analysis of unstructured data supports software understanding and evolution analysis, and complements the data mined from structured sources. To this aim, we implemented the necessary toolset and investigated methods for exploring, exposing, and exploiting unstructured data.To validate our thesis, we focused on development email data. We found two main challenges in using it to support program comprehension and software development: The disconnection between emails and code artifacts and the noisy and mixed-language nature of email content. We tackle these challenges proposing novel approaches. First, we devise lightweight techniques for linking email data to code artifacts. We use these techniques for creating a tool to support program comprehension with email data, and to create a new set of email based metrics to improve existing defect prediction approaches. Subsequently, we devise techniques for giving a structure to the content of email and we use this structure to conduct novel software analyses to support program comprehension. In this dissertation we show that unstructured data, in the form of development emails, is a valuable addition to structured data and, if correctly mined, can be used successfully to support software engineering activities
The effects of change decomposition on code review -- a controlled experiment
Background: Code review is a cognitively demanding and time-consuming
process. Previous qualitative studies hinted at how decomposing change sets
into multiple yet internally coherent ones would improve the reviewing process.
So far, literature provided no quantitative analysis of this hypothesis.
Aims: (1) Quantitatively measure the effects of change decomposition on the
outcome of code review (in terms of number of found defects, wrongly reported
issues, suggested improvements, time, and understanding); (2) Qualitatively
analyze how subjects approach the review and navigate the code, building
knowledge and addressing existing issues, in large vs. decomposed changes.
Method: Controlled experiment using the pull-based development model
involving 28 software developers among professionals and graduate students.
Results: Change decomposition leads to fewer wrongly reported issues,
influences how subjects approach and conduct the review activity (by increasing
context-seeking), yet impacts neither understanding the change rationale nor
the number of found defects.
Conclusions: Change decomposition reduces the noise for subsequent data
analyses but also significantly supports the tasks of the developers in charge
of reviewing the changes. As such, commits belonging to different concepts
should be separated, adopting this as a best practice in software engineering
Team Design Communication Patterns in e-Learning Design and Development
Prescriptive stage models have been found insufficient to describe the dynamic aspects of designing, especially in interdisciplinary e-learning project teams. There is a growing need for a systematic empirical analysis of team design processes that offer deeper and more detailed insights into Instructional Design (ID) than general models can offer. In this paper we detail findings that emerged from two case studies of team design meetings in the development of totally online courses at two well-established European Distance Universities. We applied an activity-based approach to an extended
verbal protocol dataset. This method proved to be adequate to describe the emerging team design process by taking into account both cognitive and social aspects of team activity in this specific context. Our findings provide evidence that design is more than problem solving, mainly because the design process is strongly related to the communication process in a team. Some interesting patterns of designing emerge, which shed light on the still implicit nature of ID performed by teams. We conclude by presenting guidelines and skills for team designing in the complex field of e-learning
O015. Evaluation of the genetic polymorphism of the α3 (CHRNA3) and α5 (CHRNA5) nicotinic receptor subunits, in patients with cluster headache
N/
Software security during modern code review: The developer’s perspective
To avoid software vulnerabilities, organizations are shifting security to earlier stages of the software development, such as at code review time. In this paper, we aim to understand the developers’ perspective on assessing software security during code review, the challenges they encounter, and the support that companies and projects provide. To this end, we conduct a two-step investigation: we interview 10 professional developers and survey 182 practitioners about software security assessment during code review. The outcome is an overview of how developers perceive software security during code review and a set of identified challenges. Our study revealed that most developers do not immediately report to focus on security issues during code review. Only after being asked about software security, developers state to always consider it during review and acknowledge its importance. Most companies do not provide security training, yet expect developers to still ensure security during reviews. Accordingly, developers report the lack of training and security knowledge as the main challenges they face when checking for security issues. In addition, they have challenges with third-party libraries and to identify interactions between parts of code that could have security implications. Moreover, security may be disregarded during reviews due to developers’ assumptions about the security dynamic of the application they develop
An Empirical Study on Compliance with Ranking Transparency in the Software Documentation of EU Online Platforms
Compliance with the European Union's Platform-to-Business (P2B) Regulation helps fostering a fair, ethical and secure online environment. However, it is challenging for online platforms, and assessing their compliance can be difficult for public authorities. This is partly due to the lack of automated tools for assessing the information (e.g., software documentation) platforms provide concerning ranking transparency. Our study tackles this issue in two ways. First, we empirically evaluate the compliance of six major platforms (Amazon, Bing, Booking, Google, Tripadvisor, and Yahoo), revealing substantial differences in their documentation. Second, we introduce and test automated compliance assessment tools based on ChatGPT and information retrieval technology. These tools are evaluated against human judgments, showing promising results as reliable proxies for compliance assessments. Our findings could help enhance regulatory compliance and align with the United Nations Sustainable Development Goal 10.3, which seeks to reduce inequality, including business disparities, on these platforms
Interpersonal Conflicts During Code Review
Code review consists of manual inspection, discussion, and judgment of source code by developers other than the code's author. Due to discussions around competing ideas and group decision-making processes, interpersonal conflicts during code reviews are expected. This study systematically investigates how developers perceive code review conflicts and addresses interpersonal conflicts during code reviews as a theoretical construct. Through the thematic analysis of interviews conducted with 22 developers, we confirm that conflicts during code reviews are commonplace, anticipated and seen as normal by developers. Even though conflicts do happen and carry a negative impact for the review, conflicts-if resolved constructively-can also create value and bring improvement. Moreover, the analysis provided insights on how strongly conflicts during code review and its context (i.e., code, developer, team, organization) are intertwined. Finally, there are aspects specific to code review conflicts that call for the research and application of customized conflict resolution and management techniques, some of which are discussed in this paper. Data and material: https://doi.org/10.5281/zenodo.584879
Toward Eliminating Hallucinations: GPT-based Explanatory AI for Intelligent Textbooks and Documentation
Traditional explanatory resources, such as user manuals and textbooks, often contain content that may not cater to the diverse backgrounds and information needs of users. Yet, developing intuitive, user-centered methods to effectively explain complex or large amounts of information is still an open research challenge. In this paper we present ExplanatoryGPT, an approach we devised and implemented to transform textual documents into interactive, intelligent resources, capable of offering dynamic, personalized explanations. Our approach uses state-of-the-art question-answering technology to generate on-demand, expandable explanations, with the aim of allowing readers to efficiently navigate and comprehend static materials. ExplanatoryGPT integrates ChatGPT, a state-of-the-art language model, with Achinstein’s philosophical theory of explanations. By combining question generation and answer retrieval algorithms with ChatGPT, our method generates interactive, user-centered explanations, while mitigating common issues associated with ChatGPT, such as hallucinations and memory shortcomings. To showcase the effectiveness of our Explanatory AI, we conducted tests using a variety of sources, including a legal textbook and documentation of some health and financial software. Specifically, we provide several examples that illustrate how ExplanatoryGPT excels over ChatGPT in generating more precise explanations, accomplished through thoughtful macro-planning of explanation content. Notably, our approach also avoids the need to provide the entire context of the explanation as a prompt to ChatGPT, a process that is often not feasible due to common memory constraints
An Empirical Study on Compliance with Ranking Transparency in the Software Documentation of EU Online Platforms:46th International Conference on Software Engineering: Software Engineering in Society
Compliance with the European Union's Platform-to-Business (P2B) Regulation helps fostering a fair, ethical and secure online environment. However, it is challenging for online platforms, and assessing their compliance can be difficult for public authorities. This is partly due to the lack of automated tools for assessing the information (e.g., software documentation) platforms provide concerning ranking transparency. Our study tackles this issue in two ways. First, we empirically evaluate the compliance of six major platforms (Amazon, Bing, Booking, Google, Tripadvisor, and Yahoo), revealing substantial differences in their documentation. Second, we introduce and test automated compliance assessment tools based on ChatGPT and information retrieval technology. These tools are evaluated against human judgments, showing promising results as reliable proxies for compliance assessments. Our findings could help enhance regulatory compliance and align with the United Nations Sustainable Development Goal 10.3, which seeks to reduce inequality, including business disparities, on these platforms. Data and materials: https://doi.org/10.5281/zenodo.10478546.</p
On Refining the SZZ Algorithm with Bug Discussion Data
Context: Researchers testing hypotheses related to factors leading to low-quality software often rely on historical data, specifically on details regarding when defects were introduced into a codebase of interest. The prevailing techniques to determine the introduction of defects revolve around variants of the SZZ algorithm. This algorithm leverages information on the lines modified during a bug-fixing commit and finds when these lines were last modified, thereby identifying bug-introducing commits. Objectives: Despite several improvements and variants, SZZ struggles with accuracy, especially in cases of unrelated modifications or that touch files not involved in the introduction of the bug in the version control systems (aka tangled commit and ghost commits). Methods: Our research investigates whether and how incorporating content retrieved from bug discussions can address these issues by identifying the related and external files and thus improve the efficacy of the SZZ algorithm. Results: To conduct our investigation, we take advantage of the links manually inserted by Mozilla developers in bug reports to signal which commits inserted bugs. Thus, we prepared the dataset, RoTEB, comprised of 12,472 bug reports. We first manually inspect a sample of 369 bug reports related to these bug-fixing or bug-introducing commits and investigate whether the files mentioned in these reports could be useful for SZZ. After we found evidence that the mentioned files are relevant, we augment SZZ with this information, using different strategies, and evaluate the resulting approach against multiple SZZ variations. Conclusion: We define a taxonomy outlining the rationale behind developers’ references to diverse files in their discussions. We observe that bug discussions often mention files relevant to enhancing the SZZ algorithm’s efficacy. Then, we verify that integrating these file references augments the precision of SZZ in pinpointing bug-introducing commits. Yet, it does not markedly influence recall. These results deepen our comprehension of the usefulness of bug discussions for SZZ. Future work can leverage our dataset and explore other techniques to further address the problem of tangled commits and ghost commits. Data & material: https://zenodo.org/records/11484723. © The Author(s) 2024
- …
