
    Mining unstructured software data

    Our thesis is that the analysis of unstructured data supports software understanding and evolution analysis, and complements the data mined from structured sources. To this end, we implemented the necessary toolset and investigated methods for exploring, exposing, and exploiting unstructured data. To validate our thesis, we focused on development email data. We found two main challenges in using it to support program comprehension and software development: the disconnection between emails and code artifacts, and the noisy, mixed-language nature of email content. We tackle these challenges by proposing novel approaches. First, we devise lightweight techniques for linking email data to code artifacts. We use these techniques to create a tool that supports program comprehension with email data, and to define a new set of email-based metrics that improve existing defect prediction approaches. Subsequently, we devise techniques for giving structure to email content, and we use this structure to conduct novel software analyses that support program comprehension. In this dissertation, we show that unstructured data, in the form of development emails, is a valuable addition to structured data and, if correctly mined, can successfully support software engineering activities.
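    The abstract does not spell out the linking heuristics. As a minimal sketch of what a "lightweight" email-to-code link can look like, assuming that a link means a known class name appears as a whole identifier in the email body, one could compile a single regular expression over the project's class names. The function names and example data below are hypothetical, and the dissertation's actual techniques may differ.

```python
import re
from typing import Dict, Iterable, Optional, Set


def build_class_pattern(class_names: Iterable[str]) -> Optional[re.Pattern]:
    """Compile one regex matching any known class name as a whole identifier."""
    names = sorted(set(class_names), key=len, reverse=True)
    if not names:
        return None
    # Longest names first, so overlapping names prefer the most specific match;
    # \b on both sides stops a name from matching inside a longer identifier.
    return re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")


def link_emails_to_classes(emails: Dict[str, str],
                           class_names: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each (hypothetical) email id to the known class names mentioned in its body."""
    pattern = build_class_pattern(class_names)
    if pattern is None:
        return {email_id: set() for email_id in emails}
    return {email_id: set(pattern.findall(body)) for email_id, body in emails.items()}


# Hypothetical example data.
classes = ["FileReader", "FileReaderImpl", "ParserFactory"]
emails = {
    "m1": "The NPE comes from FileReader.read() when the buffer is empty.",
    "m2": "See FileReaderImpl and ParserFactory for details.",
    "m3": "Thanks, merged.",
}
links = link_emails_to_classes(emails, classes)
```

    Exact, case-sensitive matching keeps the approach cheap, but it misses misspelled names and can produce false links for classes whose names are ordinary English words (for example a class called `Request`), which is exactly the kind of noise the abstract says email content carries.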

    The effects of change decomposition on code review -- a controlled experiment

    Background: Code review is a cognitively demanding and time-consuming process. Previous qualitative studies hinted that decomposing change sets into multiple, yet internally coherent, ones would improve the reviewing process. So far, the literature has provided no quantitative analysis of this hypothesis. Aims: (1) Quantitatively measure the effects of change decomposition on the outcome of code review (in terms of number of defects found, wrongly reported issues, suggested improvements, time, and understanding); (2) Qualitatively analyze how subjects approach the review and navigate the code, building knowledge and addressing existing issues, in large vs. decomposed changes. Method: A controlled experiment using the pull-based development model, involving 28 software developers, both professionals and graduate students. Results: Change decomposition leads to fewer wrongly reported issues and influences how subjects approach and conduct the review activity (by increasing context-seeking), yet it affects neither the understanding of the change rationale nor the number of defects found. Conclusions: Change decomposition not only reduces the noise for subsequent data analyses but also significantly supports the developers in charge of reviewing the changes. As such, commits belonging to different concepts should be separated, and this should be adopted as a best practice in software engineering.

    Team Design Communication Patterns in e-Learning Design and Development

    Prescriptive stage models have been found insufficient to describe the dynamic aspects of designing, especially in interdisciplinary e-learning project teams. There is a growing need for systematic empirical analyses of team design processes that offer deeper and more detailed insights into Instructional Design (ID) than general models can. In this paper, we detail findings that emerged from two case studies of team design meetings during the development of fully online courses at two well-established European distance universities. We applied an activity-based approach to an extended verbal protocol dataset. This method proved adequate for describing the emerging team design process, as it takes into account both the cognitive and the social aspects of team activity in this specific context. Our findings provide evidence that design is more than problem solving, mainly because the design process is strongly related to the communication process within a team. Some notable patterns of designing emerge, shedding light on the still implicit nature of ID performed by teams. We conclude by presenting guidelines and skills for team designing in the complex field of e-learning.

    Software security during modern code review: The developer’s perspective

    To avoid software vulnerabilities, organizations are shifting security to earlier stages of software development, such as code review. In this paper, we aim to understand the developers’ perspective on assessing software security during code review, the challenges they encounter, and the support that companies and projects provide. To this end, we conduct a two-step investigation: we interview 10 professional developers and survey 182 practitioners about software security assessment during code review. The outcome is an overview of how developers perceive software security during code review and a set of identified challenges. Our study revealed that most developers do not immediately report focusing on security issues during code review. Only after being asked about software security do developers state that they always consider it during review and acknowledge its importance. Most companies do not provide security training, yet they still expect developers to ensure security during reviews. Accordingly, developers report the lack of training and security knowledge as the main challenges they face when checking for security issues. In addition, they struggle with third-party libraries and with identifying interactions between parts of code that could have security implications. Moreover, security may be disregarded during reviews because of developers’ assumptions about the security dynamics of the applications they develop.

    An Empirical Study on Compliance with Ranking Transparency in the Software Documentation of EU Online Platforms

    Compliance with the European Union's Platform-to-Business (P2B) Regulation helps foster a fair, ethical and secure online environment. However, compliance is challenging for online platforms, and assessing it can be difficult for public authorities. This is partly due to the lack of automated tools for assessing the information (e.g., software documentation) that platforms provide concerning ranking transparency. Our study tackles this issue in two ways. First, we empirically evaluate the compliance of six major platforms (Amazon, Bing, Booking, Google, Tripadvisor, and Yahoo), revealing substantial differences in their documentation. Second, we introduce and test automated compliance assessment tools based on ChatGPT and information retrieval technology. Evaluated against human judgments, these tools show promising results as reliable proxies for compliance assessments. Our findings could help enhance regulatory compliance and align with United Nations Sustainable Development Goal 10.3, which seeks to reduce inequality, including business disparities, on these platforms.
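    The abstract does not describe the retrieval component of these tools. Purely as a generic illustration, the sketch below swaps in plain TF-IDF cosine ranking to surface documentation passages that may address a ranking-transparency requirement, which a human or a language model could then assess. The passages, the query wording, and all function names are hypothetical; this is not the authors' tool.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Lowercase and keep alphabetic tokens only (no stemming, no stop-word removal)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs: List[List[str]]) -> Tuple[List[Vector], Dict[str, float]]:
    """Raw term frequency times smoothed inverse document frequency."""
    n = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: f * idf[t] for t, f in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_passages(query: str, passages: List[str]) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs, most similar first (ties keep input order)."""
    vectors, idf = tfidf([tokenize(p) for p in passages])
    # Query terms that never occur in the passages carry no weight and are dropped.
    q_vec = {t: f * idf[t] for t, f in Counter(tokenize(query)).items() if t in idf}
    scores = [(i, cosine(q_vec, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical documentation passages and requirement query.
passages = [
    "Listings are ranked by relevance, price and guest reviews.",
    "Our offices are located in several countries.",
    "Sponsored results may appear above organic results.",
]
ranking = rank_passages("main parameters determining ranking such as price and reviews", passages)
```

    Because tokens are matched exactly, "ranking" in the query does not match "ranked" in the first passage; catching such variants would require a stemmer or embedding-based retrieval.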

    Interpersonal Conflicts During Code Review

    Code review consists of manual inspection, discussion, and judgment of source code by developers other than the code's author. Because code review involves discussions around competing ideas and group decision-making, interpersonal conflicts during code reviews are to be expected. This study systematically investigates how developers perceive code review conflicts and addresses interpersonal conflicts during code reviews as a theoretical construct. Through a thematic analysis of interviews conducted with 22 developers, we confirm that conflicts during code reviews are commonplace, anticipated, and seen as normal by developers. Even though conflicts do happen and have a negative impact on the review, if resolved constructively they can also create value and bring improvement. Moreover, the analysis provided insights into how strongly conflicts during code review are intertwined with their context (i.e., code, developer, team, organization). Finally, there are aspects specific to code review conflicts that call for the research and application of customized conflict resolution and management techniques, some of which are discussed in this paper. Data and material: https://doi.org/10.5281/zenodo.584879

    Toward Eliminating Hallucinations: GPT-based Explanatory AI for Intelligent Textbooks and Documentation

    Traditional explanatory resources, such as user manuals and textbooks, often contain content that may not cater to the diverse backgrounds and information needs of users. Yet developing intuitive, user-centered methods to effectively explain complex or large amounts of information remains an open research challenge. In this paper, we present ExplanatoryGPT, an approach we devised and implemented to transform textual documents into interactive, intelligent resources capable of offering dynamic, personalized explanations. Our approach uses state-of-the-art question-answering technology to generate on-demand, expandable explanations, with the aim of allowing readers to efficiently navigate and comprehend static materials. ExplanatoryGPT integrates ChatGPT, a state-of-the-art language model, with Achinstein’s philosophical theory of explanations. By combining question generation and answer retrieval algorithms with ChatGPT, our method generates interactive, user-centered explanations while mitigating common issues associated with ChatGPT, such as hallucinations and memory shortcomings. To showcase the effectiveness of our Explanatory AI, we conducted tests using a variety of sources, including a legal textbook and the documentation of some health and financial software. Specifically, we provide several examples that illustrate how ExplanatoryGPT outperforms ChatGPT by generating more precise explanations, accomplished through thoughtful macro-planning of explanation content. Notably, our approach also avoids the need to provide the entire context of the explanation as a prompt to ChatGPT, a process that is often not feasible due to common memory constraints.

    An Empirical Study on Compliance with Ranking Transparency in the Software Documentation of EU Online Platforms (46th International Conference on Software Engineering: Software Engineering in Society)

    Compliance with the European Union's Platform-to-Business (P2B) Regulation helps foster a fair, ethical and secure online environment. However, compliance is challenging for online platforms, and assessing it can be difficult for public authorities. This is partly due to the lack of automated tools for assessing the information (e.g., software documentation) that platforms provide concerning ranking transparency. Our study tackles this issue in two ways. First, we empirically evaluate the compliance of six major platforms (Amazon, Bing, Booking, Google, Tripadvisor, and Yahoo), revealing substantial differences in their documentation. Second, we introduce and test automated compliance assessment tools based on ChatGPT and information retrieval technology. Evaluated against human judgments, these tools show promising results as reliable proxies for compliance assessments. Our findings could help enhance regulatory compliance and align with United Nations Sustainable Development Goal 10.3, which seeks to reduce inequality, including business disparities, on these platforms. Data and materials: https://doi.org/10.5281/zenodo.10478546.

    On Refining the SZZ Algorithm with Bug Discussion Data

    Context: Researchers testing hypotheses about the factors that lead to low-quality software often rely on historical data, specifically on details regarding when defects were introduced into a codebase of interest. The prevailing techniques for determining when defects were introduced revolve around variants of the SZZ algorithm. This algorithm takes the lines modified in a bug-fixing commit and finds when these lines were last modified, thereby identifying bug-introducing commits. Objectives: Despite several improvements and variants, SZZ struggles with accuracy, especially for commits in the version control system that contain unrelated modifications or that touch files not involved in introducing the bug (i.e., tangled commits and ghost commits). Methods: Our research investigates whether and how incorporating content retrieved from bug discussions can address these issues, by identifying the related and external files, and thus improve the efficacy of the SZZ algorithm. Results: To conduct our investigation, we take advantage of the links manually inserted by Mozilla developers in bug reports to signal which commits introduced bugs. We thus prepared a dataset, RoTEB, comprising 12,472 bug reports. We first manually inspect a sample of 369 bug reports related to these bug-fixing or bug-introducing commits and investigate whether the files mentioned in these reports could be useful for SZZ. After finding evidence that the mentioned files are relevant, we augment SZZ with this information, using different strategies, and evaluate the resulting approach against multiple SZZ variants. Conclusion: We define a taxonomy outlining the rationale behind developers’ references to diverse files in their discussions. We observe that bug discussions often mention files relevant to enhancing the SZZ algorithm’s efficacy. We then verify that integrating these file references improves the precision of SZZ in pinpointing bug-introducing commits, yet it does not markedly influence recall. These results deepen our understanding of the usefulness of bug discussions for SZZ. Future work can leverage our dataset and explore other techniques to further address the problem of tangled and ghost commits. Data & material: https://zenodo.org/records/11484723. © The Author(s) 2024
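    The basic SZZ step described above, plus one simple way of using files mentioned in a bug discussion, can be sketched as follows. This is a minimal illustration under stated assumptions: blame results are precomputed into a hypothetical `BlameIndex` dictionary instead of being obtained from a real repository, the example data is made up, and the "keep only fixed files that the discussion mentions" filter is just one illustrative strategy, not necessarily any of the strategies evaluated in the study.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical stand-in for blame output on the parent revision of a fix:
# (file path, line number) -> id of the commit that last modified that line.
BlameIndex = Dict[Tuple[str, int], str]


@dataclass
class FixCommit:
    commit_id: str
    # Lines deleted or modified by the fix, per file, numbered as in the parent revision.
    changed_lines: Dict[str, List[int]]


def szz_candidates(fix: FixCommit, blame: BlameIndex,
                   allowed_files: Optional[Set[str]] = None) -> Set[str]:
    """Basic SZZ step: blame every line the fix changed and collect the commits found."""
    candidates: Set[str] = set()
    for path, lines in fix.changed_lines.items():
        if allowed_files is not None and path not in allowed_files:
            continue  # skip files excluded by the caller
        for line in lines:
            introducer = blame.get((path, line))
            if introducer is not None:
                candidates.add(introducer)
    return candidates


def szz_with_discussion(fix: FixCommit, blame: BlameIndex,
                        mentioned_files: Iterable[str]) -> Set[str]:
    """Illustrative strategy: restrict SZZ to fixed files that the bug discussion mentions."""
    relevant = set(fix.changed_lines) & set(mentioned_files)
    # Fall back to plain SZZ when the discussion names none of the files the fix touched.
    return szz_candidates(fix, blame, relevant if relevant else None)


# Hypothetical tangled fix: it repairs parser.c and also edits an unrelated changelog.
fix = FixCommit("f9", {"src/parser.c": [10, 11], "docs/CHANGES.txt": [3]})
blame: BlameIndex = {
    ("src/parser.c", 10): "a1",
    ("src/parser.c", 11): "a1",
    ("docs/CHANGES.txt", 3): "b7",
}
print(sorted(szz_candidates(fix, blame)))                         # ['a1', 'b7']
print(sorted(szz_with_discussion(fix, blame, ["src/parser.c"])))  # ['a1']
```

    Since this particular filter only discards candidates, it cannot recover bug-introducing commits that plain SZZ misses; it can only trade recall for precision.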