WELL: Applying Bug Detectors to Bug Localization via Weakly Supervised Learning
Bug localization is a key software development task, where a developer
locates the portion of the source code that must be modified based on the bug
report. It is labor-intensive and time-consuming due to the increasing size and
complexity of modern software. Effectively automating this task can greatly
reduce costs by cutting down developers' effort. Researchers have already
made efforts to harness the power of deep learning (DL) to
automate bug localization. However, training DL models demands a large quantity
of annotated training data, while the buggy-location-annotated dataset with
reasonable quality and quantity is difficult to collect. This becomes an
obstacle to the effective usage of DL for bug localization. We notice that the
data pairs for bug detection, which provide weak buggy-or-not binary
classification supervision, are much easier to obtain. Inspired by weakly
supervised learning, this paper proposes WEakly supervised bug LocaLization
(WELL), an approach that transforms bug detectors into bug locators. Through a
CodeBERT model fine-tuned on bug detection, WELL is able to locate bugs in a
weakly supervised manner based on its attention weights. Evaluations of WELL on
three datasets show performance competitive with existing strongly
supervised DL solutions. WELL even outperforms current SOTA models in the tasks of
variable misuse and binary operator misuse.
Comment: (Preprint) Software Engineering; Deep Learning; Bug Detection & Localization
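The core idea of attention-based weak localization can be illustrated with a minimal sketch (not the authors' implementation; the detector, its attention scores, and the token sequence below are all hypothetical): a classifier trained only on buggy-or-not labels assigns attention weights to tokens, and the most-attended token is flagged as the suspected bug location.

```python
# Illustrative sketch of attention-based weak localization. The attention
# scores are assumed to come from a hypothetical bug detector fine-tuned
# on binary buggy-or-not labels; no real model is invoked here.

def localize_bug(tokens, attention_scores):
    """Return (index, token) of the most-attended token.

    tokens           -- list of source-code tokens
    attention_scores -- one detector attention weight per token
    """
    if len(tokens) != len(attention_scores):
        raise ValueError("one attention score is required per token")
    best = max(range(len(tokens)), key=lambda i: attention_scores[i])
    return best, tokens[best]

# Toy example: the detector attends most to the misused variable "y".
tokens = ["x", "=", "y", "+", "1"]
scores = [0.10, 0.05, 0.60, 0.05, 0.20]
idx, token = localize_bug(tokens, scores)
print(idx, token)  # → 2 y
```

The point of the sketch is that no per-token location labels are needed at training time; the localization signal falls out of the detector's attention.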
Given 2n eyeballs, all quality flaws are shallow
We demonstrate the capabilities of the Microservice Artefact Observatory (MAO), a federated software quality assessment middleware. MAO’s extensible assessment tools continuously scan for quality flaws, defects and inconsistencies in microservice artefacts and observe runtime behaviour. The federation reduces bias, increases resilience, and overcomes per-site failures, leading to a single, merged timeline of software quality. Already operated concurrently by n = 3 operators in Argentina and Switzerland, the federation is designed to become a community-wide, consensus-voting-based ground truth repository with query interfaces for large-scale software quality and evolution insights. These insights can be exploited for excluding buggy software before or after deployment, for optimised resource allocation, and for further software management tasks.
Opinion Mining for Software Development: A Systematic Literature Review
Opinion mining, sometimes referred to as sentiment analysis, has gained increasing attention in software engineering (SE) studies.
SE researchers have applied opinion mining techniques in various contexts, such as identifying developers’ emotions expressed in
code comments and extracting users’ criticisms of mobile apps. Given the large amount of relevant studies available, it can take
considerable time for researchers and developers to figure out which approaches they can adopt in their own studies and what perils
these approaches entail.
We conducted a systematic literature review involving 185 papers. More specifically, we present 1) well-defined categories of opinion
mining-related software development activities, 2) available opinion mining approaches, whether they are evaluated when adopted in
other studies, and how their performance is compared, 3) available datasets for performance evaluation and tool customization, and 4)
concerns or limitations SE researchers might need to take into account when applying/customizing these opinion mining techniques.
The results of our study serve as references to choose suitable opinion mining tools for software development activities, and provide
critical insights for the further development of opinion mining techniques in the SE domain.
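To make concrete the kind of technique the review surveys, here is a toy lexicon-based sentiment classifier for developer text. It is a sketch only: the word lists are invented for illustration, and real SE opinion mining tools use far richer lexicons and learned models.

```python
# Toy lexicon-based sentiment classification, sketched to illustrate the
# simplest family of opinion mining approaches. The POSITIVE/NEGATIVE
# word sets are illustrative assumptions, not a real SE lexicon.

POSITIVE = {"great", "clean", "elegant", "fast", "works"}
NEGATIVE = {"ugly", "broken", "slow", "hack", "fails"}

def classify_sentiment(text):
    """Label text as 'positive', 'negative', or 'neutral' by word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("this hack is ugly and slow"))  # → negative
```

Lexicon approaches like this are fast and transparent but, as the review's concerns suggest, they miss domain-specific usage (e.g., "killed the process" is not negative sentiment in SE text).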
Antipatterns in Software Classification Taxonomies
Empirical results in software engineering have long shown that
findings are unlikely to be applicable to all software systems, or to any one domain:
results need to be evaluated in specific contexts, and limited to the type of
systems that they were extracted from. This is a known issue, and requires the
establishment of a classification of software types.
This paper makes two contributions: the first is to evaluate the quality of
the current software classifications landscape. The second is to perform a case
study showing how to create a classification of software types using a curated
set of software systems.
Our contributions show that existing, and very likely even new,
classification attempts are doomed to fail due to one or more issues, which we
name the `antipatterns' of software classification tasks. We collected 7 of
these antipatterns that emerge from both our case study and the existing
classifications.
These antipatterns represent recurring issues in a classification, so we
discuss practical ways to help researchers avoid these pitfalls. It becomes
clear that classification attempts must also face the daunting task of
formulating a taxonomy of software types, with the objective of establishing a
hierarchy of categories in a classification.
Comment: Accepted for publication in the Journal of Systems and Software
Characterizing and Detecting Duplicate Logging Code Smells
Developers rely on software logs for a wide variety of tasks, such as debugging, testing, program comprehension, verification, and performance analysis. Despite the importance of logs, prior studies show that there is no industrial standard on how to write logging statements. Recent research on logs often only considers the appropriateness of a log as an individual item (e.g., one single logging statement), while logs are typically analyzed in tandem. In this thesis, we focus on studying duplicate logging statements, which are logging statements that have the same static text message. Such duplications in the text message are potential indications of logging code smells, which may affect developers’ understanding of the dynamic view of the system. We manually studied over 3K duplicate logging statements and their surrounding code in four large-scale open source systems: Hadoop, CloudStack, ElasticSearch, and Cassandra. We uncovered five patterns of duplicate logging code smells. For each instance of the code smell, we further manually identified the problematic (i.e., requiring fixes) and justifiable (i.e., not requiring fixes) cases. We then contacted developers to verify the results of our manual study. We integrated our manual study results and developers’ feedback into our automated static analysis tool, DLFinder, which automatically detects problematic duplicate logging code smells. We evaluated DLFinder on the four manually studied systems and four additional systems: Kafka, Flink, Camel and Wicket. In total, combining the results of DLFinder and our manual analysis, we reported 91 problematic code smell instances to developers, all of which have been fixed. This thesis provides an initial step toward creating a logging guideline for developers to improve the quality of logging code. DLFinder is also able to detect duplicate logging code smells with high precision and recall.
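The core detection idea, grouping logging statements by their static text message and reporting repeats, can be sketched in a few lines. This is not DLFinder itself: the regex below is a deliberately simplified stand-in that only matches string-literal messages in `log.*(...)`-style calls.

```python
import re
from collections import defaultdict

# Minimal sketch (not DLFinder) of duplicate-log-message detection:
# group logging statements by their static text message and report any
# message emitted from more than one line. The regex is an illustrative
# simplification covering only log.<level>("...") style calls.

LOG_RE = re.compile(r'(?:log|logger)\.\w+\(\s*"([^"]*)"', re.IGNORECASE)

def find_duplicate_log_messages(source):
    """Map each repeated static log message to the lines that emit it."""
    locations = defaultdict(list)
    for lineno, line in enumerate(source.splitlines(), start=1):
        for message in LOG_RE.findall(line):
            locations[message].append(lineno)
    return {msg: lines for msg, lines in locations.items() if len(lines) > 1}

code = '''
log.info("starting server");
log.warn("disk almost full");
log.info("starting server");
'''
print(find_duplicate_log_messages(code))  # → {'starting server': [2, 4]}
```

As the thesis emphasizes, a raw duplicate is only a *potential* smell; distinguishing problematic from justifiable duplicates requires the kind of contextual analysis DLFinder layers on top of this grouping step.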
Bots in software engineering: a systematic mapping study
Bots have emerged from research prototypes into deployable systems due to recent developments in machine learning, natural language processing and understanding techniques. In software engineering, bots range from simple automated scripts to decision-making autonomous systems. The spectrum of applications of bots in software engineering is so wide and diverse that a comprehensive overview and categorization of such bots is needed. Existing works analyzed only selected bots and failed to provide the overall picture. Hence it is important to categorize bots in software engineering by analyzing why, what and how bots are applied in software engineering. We approach the problem with a systematic mapping study based on the research articles published on this topic. This study focuses on the classification of bots used in software engineering, the various dimensions of their characteristics, the more frequently researched areas, potential research spaces to be explored, and the perception of bots in the developer community. This study aims to provide an introduction and a broad overview of bots used in software engineering. Discussions of the feedback and results from several studies provide interesting insights and prospective future directions.
Speculative Analysis for Quality Assessment of Code Comments
Previous studies have shown that high-quality code comments assist developers
in program comprehension and maintenance tasks. However, the semi-structured
nature of comments, unclear conventions for writing good comments, and the lack
of quality assessment tools for all aspects of comments make their evaluation
and maintenance a non-trivial problem. To achieve high-quality comments, we
need a deeper understanding of code comment characteristics and the practices
developers follow. In this thesis, we approach the problem of assessing comment
quality from three different perspectives: what developers ask about commenting
practices, what they write in comments, and how researchers support them in
assessing comment quality.
Our preliminary findings show that developers embed various kinds of
information in class comments across programming languages. Still, they face
problems in locating relevant guidelines to write consistent and informative
comments, verifying the adherence of their comments to the guidelines, and
evaluating the overall state of comment quality. To help developers and
researchers in building comment quality assessment tools, we provide: (i) an
empirically validated taxonomy of comment convention-related questions from
various community forums, (ii) an empirically validated taxonomy of comment
information types from various programming languages, (iii) a
language-independent approach to automatically identify the information types,
and (iv) a comment quality taxonomy prepared from a systematic literature
review.
Comment: 5 pages, 1 figure, conference
Can NMT Understand Me? Towards Perturbation-based Evaluation of NMT Models for Code Generation
Neural Machine Translation (NMT) has reached a level of maturity where it is
recognized as the premier method for translation between different
languages, and it has aroused interest in various research areas, including software
engineering. A key step in validating the robustness of NMT models consists
in evaluating the performance of the models on adversarial inputs, i.e., inputs
obtained from the original ones by adding small amounts of perturbation.
However, for the specific task of code generation (i.e., the
generation of code starting from a description in natural language), no
approach has yet been defined to validate the robustness of NMT models. In
this work, we address the problem by identifying a set of perturbations and
metrics tailored for the robustness assessment of such models. We present a
preliminary experimental evaluation, showing what type of perturbations affect
the model the most and deriving useful insights for future directions.
Comment: Paper accepted for publication in the proceedings of the 1st Intl.
Workshop on Natural Language-based Software Engineering (NLBSE), to be held
with ICSE 2022
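One simple member of the perturbation family the paper studies can be sketched as follows. The operator below (swapping two adjacent words in the natural language description) is an illustrative assumption, not necessarily one of the paper's tailored perturbations; the idea is that a robust code-generation model should produce equivalent code for such near-meaning-preserving inputs.

```python
# Illustrative perturbation operator for probing code-generation NMT
# models: swap two adjacent words in the natural language description.
# A robust model should generate (near-)equivalent code for the original
# and the perturbed description.

def swap_adjacent_words(description, index):
    """Return the description with the words at index and index+1 swapped."""
    words = description.split()
    if not 0 <= index < len(words) - 1:
        raise IndexError("index must have a right-hand neighbour to swap with")
    words[index], words[index + 1] = words[index + 1], words[index]
    return " ".join(words)

original = "return the maximum value in the list"
perturbed = swap_adjacent_words(original, 2)
print(perturbed)  # → return the value maximum in the list
```

In a robustness evaluation, both strings would be fed to the model under test and the two generated programs compared with metrics such as exact match or functional equivalence.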
Antivirus Performance Evaluation Against PowerShell Obfuscated Malware
In recent years, malware attacks have become increasingly sophisticated, and the methods used by attackers to evade Windows defenses have grown more complex. As a result, detecting and defending against these attacks has become an ever more pressing challenge for security professionals. Despite significant efforts to improve Windows security, attackers continue to find new ways to bypass these defenses and infiltrate systems. The techniques covered in this paper are all currently active and effective at evading Windows defenses. Our findings underscore the need for continued vigilance and the importance of staying up to date with the latest threats and countermeasures.