Recovering Architectural Variability of a Family of Product Variants
A Software Product Line (SPL) aims to apply pre-planned, systematic reuse of
large-grained software artifacts to increase software productivity and reduce
development cost. The idea of an SPL is to analyze the business domain of a
family of products to identify their common and variable parts. However,
companies commonly develop, in an ad-hoc manner (e.g., clone-and-own), a set of
products that share some functionalities and differ in others. Thus, many
recent research contributions have proposed to re-engineer existing product
variants into an SPL.
Nevertheless, these contributions mostly focus on managing variability at the
requirement level; very few address variability at the architectural level
despite its major importance. Starting from this observation, we propose, in
this paper, an approach to reverse-engineer the architecture of a set of
product variants. Our goal is to identify
the variability and dependencies among architectural-element variants at the
architectural level. Our work relies on Formal Concept Analysis (FCA) to
analyze the variability. To validate the proposed approach, we experimented on
two families of open-source product variants: Mobile Media and Health Watcher.
The results show that our approach is able to identify the architectural
variability and the dependencies.
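As a rough illustration of the FCA step, the Python sketch below (with hypothetical product and component names, not the paper's tooling) builds a formal context of product variants versus architectural elements, separates mandatory from variable elements, and naively enumerates the formal concepts:

# Minimal sketch of FCA-style variability analysis over architectural elements.
# The products and elements below are hypothetical; real inputs would come from
# the recovered architectures of the product variants.

from itertools import combinations

# Formal context: which architectural elements each product variant contains.
context = {
    "ProductA": {"MediaController", "PhotoViewer", "SmsSender"},
    "ProductB": {"MediaController", "PhotoViewer", "VideoPlayer"},
    "ProductC": {"MediaController", "VideoPlayer"},
}

all_elements = set().union(*context.values())

# Mandatory (common) elements appear in every variant; the rest are variable.
mandatory = set.intersection(*context.values())
variable = all_elements - mandatory

def formal_concepts(ctx):
    """Naive enumeration: for every subset of products, intersect their
    elements (the intent) and close the extent over that intent."""
    concepts = set()
    products = list(ctx)
    for r in range(1, len(products) + 1):
        for extent in combinations(products, r):
            intent = set.intersection(*(ctx[p] for p in extent))
            closed_extent = frozenset(p for p in ctx if intent <= ctx[p])
            concepts.add((closed_extent, frozenset(intent)))
    return concepts

if __name__ == "__main__":
    print("mandatory:", sorted(mandatory))
    print("variable:", sorted(variable))
    for extent, intent in sorted(formal_concepts(context), key=lambda c: len(c[0])):
        print(sorted(extent), "->", sorted(intent))

Concepts whose intents contain variable elements hint at dependencies between groups of variants, which is the kind of information the approach extracts at the architectural level.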
Understanding the Use of Inheritance with Visual Patterns
The goal of this work is to visualize inheritance in object-oriented programs to help its comprehension. We propose a single, compact view of all class hierarchies at once using a custom Sunburst layout. It makes it possible to quickly discover interesting facts across classes while preserving the essential relationship between parent and child classes. We explain how standard inheritance metrics are mapped into our visualization. Additionally, we define a new metric characterizing similar child classes. Using these metrics and the proposed layout, a set of common visual patterns is derived. These patterns allow the programmer to quickly understand how inheritance is used and provide answers to some essential questions when performing program comprehension tasks. Our approach is evaluated through a case study that involves examples from large programs, demonstrating its scalability.
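For context, the following minimal Python sketch computes two of the standard inheritance metrics that such a view could map onto a Sunburst layout, depth of inheritance tree (DIT) and number of children (NOC); the class hierarchy is hypothetical and the layout itself is not reproduced here:

# Sketch: computing standard inheritance metrics that a Sunburst-style view
# could map to angles and colors. The hierarchy below is hypothetical.

from collections import defaultdict

# Parent relation: child class -> parent class (None for roots).
parents = {
    "Object": None,
    "Shape": "Object",
    "Circle": "Shape",
    "Square": "Shape",
    "Widget": "Object",
}

children = defaultdict(list)
for cls, parent in parents.items():
    if parent is not None:
        children[parent].append(cls)

def dit(cls):
    """Depth of Inheritance Tree: number of ancestors up to the root."""
    depth = 0
    while parents[cls] is not None:
        cls = parents[cls]
        depth += 1
    return depth

def noc(cls):
    """Number of Children: direct subclasses."""
    return len(children[cls])

for cls in parents:
    print(f"{cls}: DIT={dit(cls)}, NOC={noc(cls)}")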
Towards Automatically Extracting UML Class Diagrams from Natural Language Specifications
In model-driven engineering (MDE), UML class diagrams serve as a way for
developers to plan and communicate. However, creating these diagrams manually
is complex and resource-consuming. We propose an automated approach for the extraction of UML
class diagrams from natural language software specifications. To develop our
approach, we create a dataset of UML class diagrams and their English
specifications with the help of volunteers. Our approach is a pipeline of steps
consisting of the segmentation of the input into sentences, the classification
of the sentences, the generation of UML class diagram fragments from sentences,
and the composition of these fragments into one UML class diagram. We develop a
quantitative testing framework specific to UML class diagram extraction. Our
approach yields low precision and recall but serves as a benchmark for future
research.
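A skeletal Python sketch of the described pipeline shape is given below; the segmentation, classification, fragment generation, and composition steps are reduced to toy heuristics, not the learned components used in the paper:

# Skeletal sketch of the described extraction pipeline with placeholder heuristics.

import re
from dataclasses import dataclass, field

@dataclass
class Fragment:
    """A tiny UML class-diagram fragment: one class with attributes."""
    name: str
    attributes: list = field(default_factory=list)

def segment(spec: str) -> list:
    """Step 1: split the specification into sentences."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", spec) if s.strip()]

def classify(sentence: str) -> str:
    """Step 2: decide whether a sentence describes a class (toy heuristic)."""
    return "class" if re.search(r"\bhas\b|\bis a\b", sentence) else "other"

def generate_fragment(sentence: str) -> Fragment:
    """Step 3: produce a diagram fragment from one sentence (toy heuristic)."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return Fragment(name=words[1].capitalize(), attributes=[words[-1]])

def compose(fragments: list) -> dict:
    """Step 4: merge fragments with the same class name into one diagram."""
    diagram = {}
    for f in fragments:
        diagram.setdefault(f.name, []).extend(f.attributes)
    return diagram

if __name__ == "__main__":
    spec = "A library has books. Each book has a title."
    fragments = [generate_fragment(s) for s in segment(spec) if classify(s) == "class"]
    print(compose(fragments))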
Improving the Learning of Code Review Successive Tasks with Cross-Task Knowledge Distillation
Code review is a fundamental process in software development that plays a
pivotal role in ensuring code quality and reducing the likelihood of errors and
bugs. However, code review can be complex, subjective, and time-consuming.
Quality estimation, comment generation, and code refinement constitute the
three key tasks of this process, and their automation has traditionally been
addressed separately in the literature using different approaches. In
particular, recent efforts have focused on fine-tuning pre-trained language
models to aid in code review tasks, with each task being considered in
isolation. We believe that these tasks are interconnected, and their
fine-tuning should consider this interconnection. In this paper, we introduce a
novel deep-learning architecture, named DISCOREV, which employs cross-task
knowledge distillation to address these tasks simultaneously. In our approach,
we utilize a cascade of models to enhance both comment generation and code
refinement models. The fine-tuning of the comment generation model is guided by
the code refinement model, while the fine-tuning of the code refinement model
is guided by the quality estimation model. We implement this guidance using two
strategies: a feedback-based learning objective and an embedding alignment
objective. We evaluate DISCOREV by comparing it to state-of-the-art methods
based on independent training and fine-tuning. Our results show that our
approach generates better review comments, as measured by the BLEU score, as
well as more accurate code refinement according to the CodeBLEU score.
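To illustrate the general idea of coupling a task loss with an alignment term, the PyTorch sketch below combines a generation objective with an embedding-alignment objective between two stand-in models; the shapes, weights, and toy encoders are illustrative assumptions, not DISCOREV's actual architecture:

# Illustrative sketch of a generation loss plus an embedding-alignment term
# between two task models. Everything here is a stand-in for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

hidden = 64
comment_model = nn.Linear(hidden, hidden)   # stands in for the comment-generation model
refine_model = nn.Linear(hidden, hidden)    # stands in for the code-refinement model

def alignment_loss(comment_emb, refine_emb):
    """Embedding-alignment objective: pull the two models' representations together."""
    return 1.0 - F.cosine_similarity(comment_emb, refine_emb, dim=-1).mean()

def training_step(batch_states, target_ids, vocab_head, alpha=0.5):
    """One step: task loss for comment generation plus alignment guided by refinement."""
    comment_emb = comment_model(batch_states)
    with torch.no_grad():                    # the guiding model is not updated here
        refine_emb = refine_model(batch_states)
    logits = vocab_head(comment_emb)         # project to vocabulary for generation
    task_loss = F.cross_entropy(logits.view(-1, logits.size(-1)), target_ids.view(-1))
    return task_loss + alpha * alignment_loss(comment_emb, refine_emb)

if __name__ == "__main__":
    vocab_head = nn.Linear(hidden, 100)
    states = torch.randn(4, 16, hidden)      # (batch, sequence, hidden)
    targets = torch.randint(0, 100, (4, 16))
    loss = training_step(states, targets, vocab_head)
    loss.backward()
    print(float(loss))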
Towards using Few-Shot Prompt Learning for Automating Model Completion
We propose a simple yet novel approach to improve completion in domain modeling activities. Our approach exploits the power of large language models by using few-shot prompt learning, without the need to train or fine-tune those models on large datasets, which are scarce in this field. We implemented our approach and tested it on the completion of static and dynamic domain diagrams. Our initial evaluation shows that such an approach is effective and can be integrated in different ways during modeling activities.
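As a minimal illustration of few-shot prompting for model completion, the Python sketch below assembles a prompt from hypothetical example pairs; the prompt wording, the example models, and the target LLM are assumptions, not the paper's setup:

# Minimal sketch of assembling a few-shot prompt for domain-model completion.
# The example models and the prompt wording are hypothetical.

FEW_SHOT_EXAMPLES = [
    ("class Library { }",
     "class Library { books: Book[*] }\nclass Book { title: String }"),
    ("class Order { }",
     "class Order { lines: OrderLine[*] }\nclass OrderLine { quantity: Integer }"),
]

def build_prompt(partial_model: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Concatenate an instruction, the few-shot pairs, and the partial model to complete."""
    parts = ["Complete the following partial domain model.\n"]
    for partial, completed in examples:
        parts.append(f"Partial:\n{partial}\nCompleted:\n{completed}\n")
    parts.append(f"Partial:\n{partial_model}\nCompleted:\n")
    return "\n".join(parts)

if __name__ == "__main__":
    # The resulting prompt would then be sent to a large language model of choice.
    print(build_prompt("class Student { }"))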
Toward Optimal Psychological Functioning in AI-driven Software Engineering Tasks: The SEWELL-CARE Assessment Framework
In the field of software engineering, there has been a shift towards
utilizing various artificial intelligence techniques to address challenges and
create innovative tools. These solutions are aimed at enhancing efficiency,
automating tasks, and providing valuable support to developers. While the
technical aspects are crucial, the well-being and psychology of the individuals
performing these tasks are often overlooked. This paper argues that a holistic
approach is essential, one that considers the technical, psychological, and
social aspects of software engineering tasks. To address this gap, we introduce
SEWELL-CARE, a conceptual framework designed to assess AI-driven software
engineering tasks from multiple perspectives, with the goal of customizing the
tools to improve the efficiency, well-being, and psychological functioning of
developers. By emphasizing both technical and human dimensions, our framework
provides a nuanced evaluation that goes beyond traditional technical metrics.
CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code
Motivated by recent work on lifelong learning applications for language
models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused
on code changes. Our contribution addresses a notable research gap marked by
the absence of a long-term temporal dimension in existing code change datasets,
limiting their suitability in lifelong learning scenarios. In contrast, our
dataset aims to comprehensively capture code changes across the entire release
history of open-source software repositories. In this work, we introduce an
initial version of CodeLL, comprising 71 machine-learning-based projects mined
from Software Heritage. This dataset enables the extraction and in-depth
analysis of code changes spanning 2,483 releases at both the method and API
levels. CodeLL enables researchers to study the behaviour of LMs in lifelong
fine-tuning settings for learning code changes. Additionally, the dataset can
help in studying data distribution shifts within software repositories and the
evolution of API usage over time.
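To give a flavor of the method-level analyses such a dataset enables, the Python sketch below diffs the methods defined in two hypothetical release snapshots; a real pipeline would instead walk the release history mined from Software Heritage:

# Sketch of method-level change extraction between two release snapshots.
# The snapshots below are inlined strings for illustration.

import ast

RELEASE_1 = """
class Client:
    def connect(self): ...
    def send(self, data): ...
"""

RELEASE_2 = """
class Client:
    def connect(self, timeout=30): ...
    def send(self, data): ...
    def close(self): ...
"""

def method_signatures(source: str) -> dict:
    """Map 'Class.method' to its argument names for one snapshot."""
    sigs = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, ast.FunctionDef):
                    sigs[f"{node.name}.{item.name}"] = [a.arg for a in item.args.args]
    return sigs

def diff_releases(old_src: str, new_src: str) -> dict:
    """Classify each method as added, removed, or changed between two releases."""
    old, new = method_signatures(old_src), method_signatures(new_src)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(m for m in old.keys() & new.keys() if old[m] != new[m]),
    }

if __name__ == "__main__":
    print(diff_releases(RELEASE_1, RELEASE_2))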
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
Large Language Models (LLMs) possess impressive capabilities to generate
meaningful code snippets from natural language intents in a zero-shot manner, i.e.,
without the need for specific fine-tuning. To unleash their full potential,
prior work has demonstrated the benefits of fine-tuning these models on
task-specific data. However, the fine-tuning process incurs heavy
computational costs and is intractable when resources are scarce, especially
for models with billions of parameters. In light of these challenges, previous
studies explored In-Context Learning (ICL) as an effective strategy to generate
contextually appropriate code without fine-tuning. However, it operates at
inference time and does not involve learning task-specific parameters,
potentially limiting the model's performance on downstream tasks. In this
context, we foresee that Parameter-Efficient Fine-Tuning (PEFT) techniques
carry a high potential for efficiently specializing LLMs to task-specific data.
In this paper, we deliver a comprehensive study of the impact of PEFT
techniques on LLMs in the automated code generation scenario. Our experimental
results reveal the superiority and potential of such techniques over ICL on a
wide range of LLMs in reducing the computational burden and improving
performance. Therefore, the study opens opportunities for broader applications
of PEFT in software engineering scenarios.
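As a hedged illustration of one PEFT technique, the sketch below applies LoRA adapters to a small causal language model using the Hugging Face peft library; the model name, target modules, and hyperparameters are illustrative choices, not the paper's experimental setup:

# Sketch of parameter-efficient fine-tuning with LoRA via the `peft` library.
# gpt2 is a small stand-in for a code LLM; values below are illustrative.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, TaskType

base = AutoModelForCausalLM.from_pretrained("gpt2")  # downloads the base model

lora_config = LoraConfig(
    r=8,                        # rank of the low-rank update matrices
    lora_alpha=16,              # scaling factor for the updates
    lora_dropout=0.05,
    target_modules=["c_attn"],  # GPT-2 attention projection; model-specific
    task_type=TaskType.CAUSAL_LM,
)

model = get_peft_model(base, lora_config)
# Only the small LoRA adapters are trainable; the base weights stay frozen,
# which is what keeps the fine-tuning computationally cheap.
model.print_trainable_parameters()

# Training would then proceed with a standard fine-tuning loop (e.g., the
# Hugging Face Trainer) on code-generation data, updating the adapters only.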