Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation
Neural Machine Translation (NMT) has reached a level of maturity to be
recognized as the premier method for translation between languages, and has
attracted interest in several research areas, including software
engineering. A key step to validate the robustness of the NMT models consists
in evaluating the performance of the models on adversarial inputs, i.e., inputs
obtained from the original ones by adding small amounts of perturbation.
However, for the specific task of code generation (i.e., generating code
from a natural language description), no approach has yet been defined to
validate the robustness of NMT models. In
this work, we address the problem by identifying a set of perturbations and
metrics tailored for the robustness assessment of such models. We present a
preliminary experimental evaluation, showing what type of perturbations affect
the model the most and deriving useful insights for future directions.
Comment: Paper accepted for publication in the proceedings of The 1st Intl. Workshop on Natural Language-based Software Engineering (NLBSE) to be held with ICSE 202
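The robustness assessment described above hinges on applying small perturbations to the natural-language input and comparing model behavior before and after. As a minimal sketch of one such character-level perturbation operator (the function and the swap strategy here are illustrative choices, not the paper's actual operators):

```python
import random


def swap_adjacent_chars(text: str, seed: int = 0) -> str:
    """Perturb a natural-language description by swapping one pair of
    adjacent characters inside a randomly chosen word -- a common
    character-level noise operator for robustness testing."""
    rng = random.Random(seed)
    words = text.split()
    # Only words with at least two characters can be perturbed.
    candidates = [i for i, w in enumerate(words) if len(w) >= 2]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(len(w) - 1)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


original = "return the maximum value in a list"
perturbed = swap_adjacent_chars(original)
```

Robustness is then measured by feeding both the original and the perturbed description to the code-generation model and comparing the outputs with metrics tailored to code.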
NLBSE’22 tool competition
We report on the organization and results of the first edition of the Tool Competition from the International Workshop on Natural Language-based Software Engineering (NLBSE’22). This year, five teams submitted multiple classification models to automatically classify issue reports as bugs, enhancements, or questions. Most of them are based on BERT (Bidirectional Encoder Representations from Transformers) and were fine-tuned and evaluated on a benchmark dataset of 800k issue reports. The goal of the competition was to improve the classification performance of a baseline model based on fastText. This report provides details of the competition, including its rules, the teams and contestant models, and the ranking of models based on their average classification performance across the issue types.
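The ranking criterion described above, average classification performance across the issue types, can be sketched in plain Python. The per-class precision/recall pairs below are made-up placeholders for one hypothetical submission, not actual competition results:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def average_f1(per_class: dict) -> float:
    """Macro-average F1 over the three issue types used in the
    competition (bug, enhancement, question)."""
    scores = [f1(p, r) for p, r in per_class.values()]
    return sum(scores) / len(scores)


# Hypothetical (precision, recall) pairs for one submission.
submission = {
    "bug": (0.90, 0.88),
    "enhancement": (0.85, 0.80),
    "question": (0.70, 0.65),
}
score = average_f1(submission)
```

Submissions are then ranked by this averaged score against the fastText baseline.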
Issue Report Validation in an Industrial Context
Effective issue triaging is crucial for software development teams to improve
software quality, and thus customer satisfaction. Validating issue reports
manually can be time-consuming, hindering the overall efficiency of the
triaging process. This paper presents an approach on automating the validation
of issue reports to accelerate the issue triaging process in an industrial
set-up. We work on 1,200 randomly selected issue reports in the banking domain,
written in Turkish, an agglutinative language, meaning that new words can be
formed with linear concatenation of suffixes to express entire sentences. We
manually label these reports for validity, and extract the relevant patterns
indicating that they are invalid. Since the issue reports we work on are
written in an agglutinative language, we use morphological analysis to extract
the features. Using the proposed feature extractors, we utilize a machine
learning based approach to predict the issue reports' validity, achieving a
0.77 F1-score.
Comment: Accepted for publication in Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'23)
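In an agglutinative language, stripping suffixes is the first step toward the morphological features the paper relies on. A toy normalizer in that spirit is sketched below; the suffix list (inspired by Turkish plural and case endings) and the greedy stripping loop are simplified illustrations, not the paper's actual morphological analyzer:

```python
# Toy suffix inventory inspired by Turkish plural (-ler/-lar) and
# case endings (-den/-dan ablative, -de/-da locative). A real system
# would use a full morphological analyzer instead.
SUFFIXES = ["den", "dan", "ler", "lar", "de", "da"]


def strip_suffixes(word: str) -> str:
    """Greedily strip known suffixes from the end of a word to
    approximate its root, as a crude stand-in for morphological
    analysis. E.g. evlerden (= ev + ler + den) reduces to ev."""
    changed = True
    while changed:
        changed = False
        for s in SUFFIXES:
            # Keep at least two characters so roots are not destroyed.
            if word.endswith(s) and len(word) > len(s) + 1:
                word = word[: -len(s)]
                changed = True
                break
    return word
```

Features derived from such roots (and from the stripped suffixes themselves) can then feed the machine-learning classifier that predicts report validity.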
On the Evaluation of NLP-based Models for Software Engineering
NLP-based models have been increasingly incorporated to address SE problems.
These models are either employed in the SE domain with little to no change, or
they are greatly tailored to source code and its unique characteristics. Many
of these approaches are considered to be outperforming or complementing
existing solutions. However, an important question arises here: "Are these
models evaluated fairly and consistently in the SE community?". To answer this
question, we reviewed how NLP-based models for SE problems are being evaluated
by researchers. The findings indicate that currently there is no consistent and
widely-accepted protocol for the evaluation of these models. While different
aspects of the same task are assessed in different studies, metrics are
defined by custom choices rather than a systematic protocol, and answers are
collected and interpreted case by case. Consequently, there is a dire need to
provide a methodological way of evaluating NLP-based models to have a
consistent assessment and preserve the possibility of fair and efficient
comparison.
Comment: To appear in the Proceedings of the 1st International Workshop on Natural Language-based Software Engineering (NLBSE), co-located with ICSE, 202
Dynamic Decentralization Domains for the Internet of Things
The Internet of Things (IoT) and edge computing are fostering a future of ecosystems hosting complex decentralized computations that are deeply integrated with our very dynamic environments. Digitalized buildings, communities of people, and cities will be the next-generation "hardware and platform," counting myriads of interconnected devices, on top of which intrinsically distributed computational processes will run and self-organize. They will spontaneously spawn, diffuse to pertinent logical/physical regions, cooperate and compete, opportunistically summon required resources, collect and analyze data, compute results, trigger distributed actions, and eventually decay. What would a programming model for such ecosystems look like? Based on research findings on self-adaptive/self-organizing systems, this article proposes design abstractions based on "dynamic decentralization domains": regions of space opportunistically formed to support situated recognition and action. We embody the approach into a Scala application program interface (API) enacting distributed execution and show its applicability in a case study of environmental monitoring.
CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers
Users rely on Issue Tracking Systems to track and manage issue reports in
their repositories. An issue is a rich source of software information that may
describe a problem, a request for new features, or merely a question about the
software product. As the number of these issues
increases, it becomes harder to manage them manually. Thus, automatic
approaches are proposed to help facilitate the management of issue reports.
This paper describes CatIss, an automatic CATegorizer of ISSue reports which
is built upon the Transformer-based pre-trained RoBERTa model. CatIss
classifies issue reports into three main categories of Bug reports,
Enhancement/feature requests, and Questions. First, the datasets provided for
the NLBSE tool competition are cleaned and preprocessed. Then, the pre-trained
RoBERTa model is fine-tuned on the preprocessed dataset. Evaluating CatIss on
about 80 thousand issue reports from GitHub indicates that it performs very
well, surpassing the competition baseline, TicketTagger, and achieving 87.2%
F1-score (micro average). Additionally, as CatIss is trained on a wide set of
repositories, it is a generic prediction model, hence applicable to any unseen
software project or to projects with little historical data. Scripts for cleaning
the datasets, training CatIss, and evaluating the model are publicly available.
Comment: To appear in the Proceedings of the 1st International Workshop on Natural Language-based Software Engineering (NLBSE), co-located with ICSE, 202
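The cleaning and preprocessing step mentioned above can be illustrated with a small sketch; the particular regex rules and the sample issue text are illustrative assumptions, not CatIss's actual pipeline:

```python
import re


def clean_issue_text(text: str) -> str:
    """Strip markdown code fences, URLs, and excess whitespace from an
    issue body before feeding it to a text classifier."""
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)  # code blocks
    text = re.sub(r"https?://\S+", " ", text)                # URLs
    text = re.sub(r"\s+", " ", text).strip()                 # whitespace
    return text


issue = "Crash on save ```trace here``` see https://example.com/log please fix"
cleaned = clean_issue_text(issue)
```

The cleaned text is then tokenized and passed to the fine-tuned RoBERTa classifier, which assigns one of the three categories.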
Vary: An IDE for Designing Algorithms and Measuring Quality
Pseudocode is one of the recommended methods for teaching students to design algorithms. Having a tool that automatically translates an algorithm written in pseudocode into a programming language would allow the student to understand the complete process of program development. In addition, introducing quality measurement of algorithms designed from the first steps of learning programming would enable the student to understand the importance of code quality for the maintenance of software processes. This work describes Vary, an integrated development environment based on Eclipse for writing and running pseudocode algorithms. The environment automatically transforms abstract pseudocode into runnable C/C++ source code that can be later executed. Computer programming learners and even computational scientists can use Vary to write and run algorithms, while taking advantage of modern development environment features. Vary is provided with an additional extension to automatically carry out algorithm analysis with SonarQube.
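A toy line-by-line translation in the spirit described above is sketched below; the pseudocode dialect (SET/PRINT statements) and the translation rules are invented for illustration, whereas Vary's actual grammar and C/C++ generation are far richer:

```python
def translate_line(line: str) -> str:
    """Translate one statement of a tiny, hypothetical pseudocode
    dialect into a line of C source. Only two statement forms are
    handled here, purely for illustration."""
    line = line.strip()
    if line.startswith("SET "):
        # SET x TO 5  ->  int x = 5;
        _, var, _, value = line.split()
        return f"int {var} = {value};"
    if line.startswith("PRINT "):
        # PRINT x  ->  printf("%d\n", x);
        _, var = line.split()
        return f'printf("%d\\n", {var});'
    raise ValueError(f"unsupported statement: {line}")


program = ["SET x TO 5", "PRINT x"]
c_lines = [translate_line(s) for s in program]
```

In a full environment the generated C lines would be wrapped in a `main` function, compiled, and executed, letting learners see their pseudocode run.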
Breaking the Silence: the Threats of Using LLMs in Software Engineering
Large Language Models (LLMs) have gained considerable traction within the
Software Engineering (SE) community, impacting various SE tasks from code
completion to test generation, from program repair to code summarization.
Despite their promise, researchers must still be careful as numerous intricate
factors can influence the outcomes of experiments involving LLMs. This paper
initiates an open discussion on potential threats to the validity of LLM-based
research including issues such as closed-source models, possible data leakage
between LLM training data and research evaluation, and the reproducibility of
LLM-based findings. In response, this paper proposes a set of guidelines
tailored for SE researchers and Language Model (LM) providers to mitigate these
concerns. The implications of the guidelines are illustrated using existing
good practices followed by LLM providers and a practical example for SE
researchers in the context of test case generation.
Comment: Accepted at the ICSE'24 conference, NIER track
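One of the threats raised above, data leakage between LLM training data and evaluation benchmarks, can be approximated with a simple textual-overlap heuristic. This is a rough sketch of the general idea, not a method proposed in the paper:

```python
def ngrams(text: str, n: int = 5) -> set:
    """Set of word n-grams occurring in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(eval_sample: str, training_doc: str, n: int = 5) -> float:
    """Fraction of the evaluation sample's n-grams that also appear in
    a suspected training document; a high ratio flags possible
    memorization/leakage and warrants closer inspection."""
    sample = ngrams(eval_sample, n)
    if not sample:
        return 0.0
    return len(sample & ngrams(training_doc, n)) / len(sample)
```

For closed-source models the training corpus is unavailable, so such checks can only be run against public corpora, which is itself one of the reproducibility concerns the paper discusses.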
Understanding issues related to personal data and data protection in open source projects on GitHub
Context: Data protection regulations such as the GDPR and the CCPA affect how
software may handle the personal data of its users and how consent for handling
of such data may be given. Prior literature focused on how this works in
operation, but lacks a perspective of the impact on the software development
process.
Objective: Within our work, we will address this gap and explore how software
development itself is impacted. We want to understand which data
protection-related issues are reported, who reports them, and how developers
react to such issues.
Method: We will conduct an exploratory study based on issues that are
reported with respect to data protection in open source software on GitHub. We
will determine the roles of the actors involved, the status of such issues, and
we use inductive coding to understand the data protection issues. We
qualitatively analyze the issues as part of the inductive coding and further
explore the reasoning for resolutions. We quantitatively analyze the relation
between the roles, resolutions, and data protection issues to understand
correlations.
Comment: Registered Report with Continuity Acceptance (CA) for submission to Empirical Software Engineering granted by RR-Committee of the MSR'2