
    Recovering Architectural Variability of a Family of Product Variants

    A Software Product Line (SPL) aims at applying pre-planned, systematic reuse of large-grained software artifacts to increase software productivity and reduce development costs. The idea of SPL is to analyze the business domain of a family of products in order to identify the common and the variable parts between the products. However, it is common for companies to develop, in an ad-hoc manner (e.g. clone-and-own), a set of products that share some functionalities and differ in terms of others. Thus, many recent research contributions propose to re-engineer existing product variants into an SPL. Nevertheless, these contributions mostly focus on managing variability at the requirement level. Very few address variability at the architectural level despite its major importance. Starting from this observation, we propose, in this paper, an approach to reverse-engineer the architecture of a set of product variants. Our goal is to identify the variability and dependencies among architectural-element variants at the architectural level. Our work relies on Formal Concept Analysis (FCA) to analyze the variability. To validate the proposed approach, we experimented on two families of open-source product variants: Mobile Media and Health Watcher. The results show that our approach is able to identify the architectural variability and the dependencies among architectural-element variants.
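
    To make the FCA step concrete, the sketch below builds a formal context whose objects are product variants and whose attributes are the architectural elements each variant contains, then enumerates the formal concepts. It is a minimal illustration: the variant and element names are hypothetical, and the naive enumeration stands in for real FCA tooling.

```python
# Minimal Formal Concept Analysis (FCA) sketch: objects are product
# variants, attributes are the architectural elements each one contains.
# Variant and element names are hypothetical illustrations.
from itertools import combinations

context = {
    "MobileMedia-v1": {"PhotoView", "AlbumMgr"},
    "MobileMedia-v2": {"PhotoView", "AlbumMgr", "SMSTransfer"},
    "MobileMedia-v3": {"PhotoView", "AlbumMgr", "SMSTransfer", "VideoView"},
}
ALL_ELEMENTS = set().union(*context.values())

def intent(variants):
    """Elements shared by every variant in the group."""
    sets = [context[v] for v in variants]
    return set.intersection(*sets) if sets else set(ALL_ELEMENTS)

def extent(elements):
    """Variants that contain every element in the set."""
    return {v for v, els in context.items() if elements <= els}

# Naively enumerate all formal concepts as closed (extent, intent) pairs:
# for any variant group G, (extent(intent(G)), intent(G)) is a concept.
concepts = set()
variants = list(context)
for r in range(len(variants) + 1):
    for group in combinations(variants, r):
        i = intent(group)
        concepts.add((frozenset(extent(i)), frozenset(i)))

for e, i in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(e), "->", sorted(i))
```

    Commonalities across the family surface as the intent of the top concept, while variable elements appear only in the intents of smaller concepts.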

    Understanding the Use of Inheritance with Visual Patterns

    The goal of this work is to visualize inheritance in object-oriented programs to help its comprehension. We propose a single, compact view of all class hierarchies at once, using a custom Sunburst layout. It makes it possible to quickly discover interesting facts across classes while preserving the essential relationship between parent and child classes. We explain how standard inheritance metrics are mapped onto our visualization. Additionally, we define a new metric characterizing similar child classes. Using these metrics and the proposed layout, a set of common visual patterns is derived. These patterns allow the programmer to quickly understand how inheritance is used and provide answers to some essential questions that arise during program comprehension tasks. Our approach is evaluated through a case study involving examples from large programs, demonstrating its scalability.
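
    As a rough illustration of the metric side (a sketch, not the paper's tool), the code below computes two standard inheritance metrics over a hypothetical toy hierarchy: depth of inheritance (DIT) and number of children (NOC). These are the kinds of per-class values a Sunburst layout can encode as ring depth and arc subdivision.

```python
# Minimal inheritance-metric sketch over a hypothetical class hierarchy.
class Shape: pass
class Polygon(Shape): pass
class Triangle(Polygon): pass
class Rectangle(Polygon): pass
class Circle(Shape): pass

ROOTS = [Shape]  # hierarchy roots for the depth computation

def children(cls):
    """Direct subclasses of a class; NOC is their count."""
    return cls.__subclasses__()

def dit(cls, depth=0):
    """Depth of inheritance: distance from the class to its root."""
    return depth if cls in ROOTS else dit(cls.__bases__[0], depth + 1)

def walk(cls):
    """Print the values a sunburst ring (DIT) and arc (NOC) could encode."""
    print(f"{cls.__name__}: DIT={dit(cls)}, NOC={len(children(cls))}")
    for sub in children(cls):
        walk(sub)

walk(Shape)
```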

    Towards Automatically Extracting UML Class Diagrams from Natural Language Specifications

    In model-driven engineering (MDE), UML class diagrams serve as a way to plan and communicate between developers. However, creating them is complex and resource-consuming. We propose an automated approach for the extraction of UML class diagrams from natural language software specifications. To develop our approach, we create a dataset of UML class diagrams and their English specifications with the help of volunteers. Our approach is a pipeline of steps consisting of the segmentation of the input into sentences, the classification of the sentences, the generation of UML class diagram fragments from sentences, and the composition of these fragments into one UML class diagram. We develop a quantitative testing framework specific to UML class diagram extraction. Our approach yields low precision and recall but serves as a benchmark for future research. (8 pages, 7 tables, 9 figures, 2 algorithms; to be published in MODELS '22 Companion.)
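
    The sketch below mirrors the four-stage pipeline shape (segmentation, classification, fragment generation, composition); the regex segmenter and keyword classifier are hypothetical stand-ins for the trained components the paper uses.

```python
# Minimal four-stage extraction pipeline sketch; rules are hypothetical
# stand-ins for learned models.
import re
from dataclasses import dataclass, field

@dataclass
class ClassFragment:
    name: str
    attributes: set = field(default_factory=set)

def segment(spec: str) -> list[str]:
    """Stage 1: split the specification into sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", spec) if s.strip()]

def classify(sentence: str) -> str:
    """Stage 2: crude keyword rule standing in for a trained classifier."""
    return "class-bearing" if " has " in sentence else "other"

def to_fragment(sentence: str) -> ClassFragment:
    """Stage 3: derive a tiny UML class fragment from one sentence."""
    subject, _, rest = sentence.partition(" has ")
    attr = rest.rstrip(".").split()[-1]
    return ClassFragment(name=subject.split()[-1].capitalize(),
                         attributes={attr})

def compose(fragments: list[ClassFragment]) -> dict[str, set]:
    """Stage 4: merge fragments that refer to the same class."""
    model: dict[str, set] = {}
    for f in fragments:
        model.setdefault(f.name, set()).update(f.attributes)
    return model

spec = "A library has books. A library has members. Each member has a name."
frags = [to_fragment(s) for s in segment(spec) if classify(s) == "class-bearing"]
print(compose(frags))  # {'Library': {'books', 'members'}, 'Member': {'name'}}
```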

    Improving the Learning of Code Review Successive Tasks with Cross-Task Knowledge Distillation

    Code review is a fundamental process in software development that plays a pivotal role in ensuring code quality and reducing the likelihood of errors and bugs. However, code review can be complex, subjective, and time-consuming. Quality estimation, comment generation, and code refinement constitute the three key tasks of this process, and their automation has traditionally been addressed separately in the literature using different approaches. In particular, recent efforts have focused on fine-tuning pre-trained language models to aid in code review tasks, with each task considered in isolation. We believe that these tasks are interconnected and that their fine-tuning should take this interconnection into account. In this paper, we introduce a novel deep-learning architecture, named DISCOREV, which employs cross-task knowledge distillation to address these tasks simultaneously. In our approach, we utilize a cascade of models to enhance both the comment generation and code refinement models. The fine-tuning of the comment generation model is guided by the code refinement model, while the fine-tuning of the code refinement model is guided by the quality estimation model. We implement this guidance using two strategies: a feedback-based learning objective and an embedding alignment objective. We evaluate DISCOREV by comparing it to state-of-the-art methods based on independent training and fine-tuning. Our results show that our approach generates better review comments, as measured by the BLEU score, as well as more accurate code refinement according to the CodeBLEU score. (FSE'24.)
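
    To illustrate the embedding-alignment objective, here is a minimal PyTorch sketch, not DISCOREV's actual implementation: the comment generation loss is ordinary cross-entropy plus a cosine term that pulls the comment model's embedding toward the one produced by the code refinement model for the same input. All shapes and the weighting factor alpha are hypothetical.

```python
# Cross-task alignment sketch: generation loss + embedding agreement.
import torch
import torch.nn.functional as F

def alignment_guided_loss(gen_logits, target_ids, comment_emb, refine_emb,
                          alpha=0.5):
    """Cross-entropy for comment generation plus an alignment term."""
    # Token-level generation loss: (batch*seq, vocab) vs (batch*seq,).
    ce = F.cross_entropy(gen_logits.view(-1, gen_logits.size(-1)),
                         target_ids.view(-1))
    # Alignment: make the two models' embeddings agree (cosine distance).
    align = 1.0 - F.cosine_similarity(comment_emb, refine_emb, dim=-1).mean()
    return ce + alpha * align

# Toy shapes: batch=2, seq=5, vocab=100, embedding dim=32.
logits = torch.randn(2, 5, 100, requires_grad=True)
targets = torch.randint(0, 100, (2, 5))
c_emb = torch.randn(2, 32, requires_grad=True)  # comment model embedding
r_emb = torch.randn(2, 32)                      # refinement model embedding
loss = alignment_guided_loss(logits, targets, c_emb, r_emb)
loss.backward()  # gradients flow into the comment generation side only
print(float(loss))
```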

    Towards using Few-Shot Prompt Learning for Automating Model Completion

    We propose a simple yet novel approach to improving completion in domain modeling activities. Our approach exploits the power of large language models by using few-shot prompt learning, without the need to train or fine-tune those models on large datasets, which are scarce in this field. We implemented our approach and tested it on the completion of static and dynamic domain diagrams. Our initial evaluation shows that such an approach is effective and can be integrated in different ways into modeling activities.
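
    As a toy illustration of the few-shot setup (the textual model syntax and examples are hypothetical), a prompt can be assembled from a handful of partial/completed model pairs followed by the partial model to complete:

```python
# Few-shot prompt construction sketch for domain model completion.
FEW_SHOT_EXAMPLES = [
    ("class Order { id }", "class Order { id; date; total }"),
    ("class Customer { name }", "class Customer { name; email; address }"),
]

def build_prompt(partial_model: str) -> str:
    """Serialize the examples, then append the model to complete."""
    parts = ["Complete the partial domain model.\n"]
    for before, after in FEW_SHOT_EXAMPLES:
        parts.append(f"Partial: {before}\nCompleted: {after}\n")
    parts.append(f"Partial: {partial_model}\nCompleted:")
    return "\n".join(parts)

print(build_prompt("class Invoice { number }"))
```

    The assembled prompt is sent to an LLM as-is; no gradient updates or large task-specific datasets are required.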

    Toward Optimal Psychological Functioning in AI-driven Software Engineering Tasks: The SEWELL-CARE Assessment Framework

    In the field of software engineering, there has been a shift towards utilizing various artificial intelligence techniques to address challenges and create innovative tools. These solutions aim to enhance efficiency, automate tasks, and provide valuable support to developers. While the technical aspects are crucial, the well-being and psychology of the individuals performing these tasks are often overlooked. This paper argues that a holistic approach is essential, one that considers the technical, psychological, and social aspects of software engineering tasks. To address this gap, we introduce SEWELL-CARE, a conceptual framework designed to assess AI-driven software engineering tasks from multiple perspectives, with the goal of customizing tools to improve the efficiency, well-being, and psychological functioning of developers. By emphasizing both technical and human dimensions, our framework provides a nuanced evaluation that goes beyond traditional technical metrics.

    CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code

    Motivated by recent work on lifelong learning applications for language models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused on code changes. Our contribution addresses a notable research gap: the absence of a long-term temporal dimension in existing code change datasets, which limits their suitability for lifelong learning scenarios. In contrast, our dataset aims to comprehensively capture code changes across the entire release history of open-source software repositories. In this work, we introduce an initial version of CodeLL comprising 71 machine-learning-based projects mined from Software Heritage. The dataset enables the extraction and in-depth analysis of code changes spanning 2,483 releases at both the method and API levels. CodeLL allows researchers to study the behaviour of LMs in lifelong fine-tuning settings for learning code changes. Additionally, the dataset can help in studying data distribution shifts within software repositories and the evolution of API usage over time.
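
    To suggest the kind of method-level analysis such a dataset supports, the sketch below diffs method signatures between consecutive releases. The release snapshots are hypothetical; real mining would operate on full repository histories from Software Heritage.

```python
# Method-level change extraction sketch across hypothetical releases.
releases = {
    "v1.0": {"train(data)", "predict(x)"},
    "v1.1": {"train(data, epochs)", "predict(x)", "save(path)"},
    "v2.0": {"train(data, epochs)", "predict(x, batch)", "save(path)"},
}

def method_changes(releases: dict[str, set]) -> list[tuple]:
    """Added/removed methods between each pair of consecutive releases."""
    changes = []
    versions = list(releases)
    for prev, curr in zip(versions, versions[1:]):
        added = releases[curr] - releases[prev]
        removed = releases[prev] - releases[curr]
        changes.append((prev, curr, sorted(added), sorted(removed)))
    return changes

for prev, curr, added, removed in method_changes(releases):
    print(f"{prev} -> {curr}: +{added} -{removed}")
```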

    Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models

    Large Language Models (LLMs) possess impressive capabilities to generate meaningful code snippets from natural language intents in a zero-shot fashion, i.e., without the need for specific fine-tuning. With a view to unleashing their full potential, prior work has demonstrated the benefits of fine-tuning the models on task-specific data. However, the fine-tuning process demands heavy computational costs and is intractable when resources are scarce, especially for models with billions of parameters. In light of these challenges, previous studies explored In-Context Learning (ICL) as an effective strategy for generating contextually appropriate code without fine-tuning. However, ICL operates at inference time and does not involve learning task-specific parameters, which potentially limits the model's performance on downstream tasks. In this context, we foresee that Parameter-Efficient Fine-Tuning (PEFT) techniques carry a high potential for efficiently specializing LLMs to task-specific data. In this paper, we deliver a comprehensive study of the impact of PEFT techniques on LLMs in the automated code generation scenario. Our experimental results reveal the superiority and potential of such techniques over ICL across a wide range of LLMs, reducing the computational burden while improving performance. The study thus opens opportunities for broader applications of PEFT in software engineering scenarios.
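
    As one concrete PEFT example, the sketch below applies LoRA via the Hugging Face peft library, freezing the base model and training only small low-rank adapters. The model name and hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# LoRA sketch with Hugging Face `peft`: train low-rank adapters only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; any causal code LLM works similarly.
base = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-350M-mono")

config = LoraConfig(
    r=8,                          # rank of the low-rank update matrices
    lora_alpha=16,                # scaling factor for the adapter output
    lora_dropout=0.05,
    target_modules=["qkv_proj"],  # CodeGen's attention projection layers
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)

# Only the adapters are trainable: a tiny fraction of all parameters.
model.print_trainable_parameters()
```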