DeepSoft: A vision for a deep model of software
Although software analytics has experienced rapid growth as a research area,
it has not yet reached its full potential for wide industrial adoption. Most of
the existing work in software analytics still relies heavily on costly manual
feature engineering processes and mainly addresses traditional
classification problems, as opposed to predicting future events. We present a
vision for \emph{DeepSoft}, an \emph{end-to-end} generic framework for modeling
software and its development process to predict future risks and recommend
interventions. DeepSoft, partly inspired by human memory, is built upon the
powerful deep learning-based Long Short Term Memory architecture that is
capable of learning long-term temporal dependencies that occur in software
evolution. Such deep learned patterns of software can be used to address a
range of challenging problems such as code and task recommendation and
prediction. DeepSoft provides a new approach for research into modeling of
source code, risk prediction and mitigation, developer modeling, and
automatically generating code patches from bug reports. Comment: FSE 201
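The Long Short Term Memory architecture mentioned in the abstract above can be illustrated with a single recurrent step. This is a minimal sketch of the standard LSTM cell equations, not DeepSoft's implementation: the function names (`lstm_step`, `affine`) and the `params` dictionary layout are assumptions made for illustration, and the weights would normally be learned rather than supplied by hand.

```python
import math


def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, z, b):
    """Compute W @ z + b for a list-of-rows matrix W."""
    return [sum(w * zi for w, zi in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell (hypothetical parameter layout).

    params holds weight matrices Wf, Wi, Wo, Wg of shape
    [hidden][input + hidden] and bias vectors bf, bi, bo, bg.
    """
    z = x + h_prev  # concatenate the input with the previous hidden state
    f = [sigmoid(v) for v in affine(params["Wf"], z, params["bf"])]  # forget gate
    i = [sigmoid(v) for v in affine(params["Wi"], z, params["bi"])]  # input gate
    o = [sigmoid(v) for v in affine(params["Wo"], z, params["bo"])]  # output gate
    g = [math.tanh(v) for v in affine(params["Wg"], z, params["bg"])]  # candidate
    # The cell state carries long-term information across time steps.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c
```

The additive update of the cell state `c` is what lets the cell retain information over long sequences, such as a long history of changes to a software project.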
Owl Eyes: Spotting UI Display Issues via Visual Understanding
A graphical user interface (GUI) provides a visual bridge between a software
application and its end users, through which they can interact with each other.
With the development of technology and aesthetics, the visual effects of
GUIs are becoming increasingly attractive. However, such GUI complexity poses a great
challenge to GUI implementation. According to our pilot study of
crowdtesting bug reports, display issues such as text overlap, blurred screens, and
missing images often occur during GUI rendering on different devices due to
software or hardware compatibility problems. They negatively influence app
usability, resulting in poor user experience. To detect these issues, we
propose a novel approach, OwlEye, based on deep learning for modelling visual
information of the GUI screenshot. Therefore, OwlEye can detect GUIs with
display issues and also locate the detailed region of the issue in the given
GUI for guiding developers to fix the bug. We manually construct a large-scale
labelled dataset with 4,470 GUI screenshots with UI display issues and develop
a heuristics-based data augmentation method for boosting the performance of our
OwlEye. The evaluation demonstrates that our OwlEye can achieve 85% precision
and 84% recall in detecting UI display issues, and 90% accuracy in localizing
these issues. We also evaluate OwlEye with popular Android apps on Google Play
and F-droid, and successfully uncover 57 previously-undetected UI display
issues with 26 of them being confirmed or fixed so far. Comment: Accepted to 35th IEEE/ACM International Conference on Automated
Software Engineering (ASE 20
Investigating Semantic Properties of Images Generated from Natural Language Using Neural Networks
This work explores the attributes, properties, and potential uses of generative neural networks within the realm of encoding semantics. It works toward answering the questions of: If one uses generative neural networks to create a picture based on natural language, does the resultant picture encode the text's semantics in a way a computer system can process? Could such a system be more precise than current solutions at detecting, measuring, or comparing semantic properties of generated images, and thus their source text, or their source semantics?
This work is undertaken in the hope that detecting previously unknown properties, or better understanding them, could lead to new or improved methods of encoding and processing semantics in a computer system. Improvements in this space could affect many systems that make semantically based decisions. Being able to detect general or specific semantic properties, semantic similarity, or other semantic properties more effectively could improve tasks such as information retrieval, question answering, duplication (clone) detection, sentiment analysis, and others. Additionally, it could provide insight into how to better represent semantics in computer systems and thus bring us closer to general artificial intelligence.
To explore this space, this work starts with an experiment consisting of transforming pairs of texts into pairs of images via a generative neural network and exploring properties of those image pairs. The text pairs were known to either be textually and semantically identical, semantically similar, or semantically dissimilar. The resultant image pairs are then tested for similarity via a second neural network based process to investigate if the semantic similarity is preserved during the transformation process and thus, exists in the resultant image pairs in a quantifiable way.
Preliminary results showed strong evidence of resultant images encoding semantics in a measurable way. However, when the experiment was conducted on a larger dataset, and with the generative network more thoroughly trained, the results were weaker. An alternative experiment conducted on different datasets and configurations produced results that were still weaker than those of the preliminary experiments. These findings lead us to believe the promise of the preliminary results was possibly due to semantics being encoded by the vectorization of the words, and not by the generative neural network. This explanation seeks to clarify why, as the generative neural network took a larger role in the process, the results were worse, and as it took a smaller role, the results were better. Further tests conducted to evaluate this belief proved supportive.
A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports
Bug reports are an essential aspect of software development, and it is
crucial to identify and resolve them quickly to ensure the consistent
functioning of software systems. Retrieving similar bug reports from an
existing database can help reduce the time and effort required to resolve bugs.
In this paper, we compared the effectiveness of semantic textual similarity
methods for retrieving similar bug reports based on a similarity score. We
explored several embedding models such as TF-IDF (Baseline), FastText, Gensim,
BERT, and ADA. We used the Software Defects Data containing bug reports for
various software projects to evaluate the performance of these models. Our
experimental results showed that BERT generally outperformed the rest of the
models regarding recall, followed by ADA, Gensim, FastText, and TF-IDF. Our
study provides insights into the effectiveness of different embedding methods
for retrieving similar bug reports and highlights the impact of selecting the
appropriate one for this task. Our code is available on GitHub. Comment: 7 Page
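The retrieval setup described above, ranking stored bug reports by a similarity score against a new report, can be sketched with the TF-IDF baseline. This is a minimal illustration using only the standard library, not the authors' code: the whitespace tokenizer, the smoothed IDF formula, and the function names (`tfidf_vectors`, `rank_similar`) are assumptions, and the example reports in the usage note are made up.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (term -> weight dicts) and the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many reports does each term appear?
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so no term gets a zero or negative weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_similar(query, docs):
    """Return indices of docs ordered from most to least similar to query."""
    vectors, idf = tfidf_vectors(docs)
    # Terms unseen in the stored reports carry no weight.
    q_vec = {t: c * idf[t] for t, c in Counter(tokenize(query)).items() if t in idf}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]
```

For example, given the stored reports `"app crashes when opening settings page"`, `"login button does not respond"`, and `"dark mode colors are wrong"`, the query `"crashes on settings page"` ranks the first report highest because it is the only one sharing any terms with the query. Swapping `tfidf_vectors` for a learned embedding model would leave the ranking step unchanged.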
Cupid: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection
Duplicate bug report detection (DBRD) is a long-standing challenge in both
academia and industry. Over the past decades, researchers have proposed various
approaches to detect duplicate bug reports more accurately. With the recent
advancement of deep learning, researchers have also proposed several approaches
that leverage deep learning models to detect duplicate bug reports. A recent
benchmarking study on DBRD also reveals that the performance of deep
learning-based approaches is not always better than the traditional approaches.
However, traditional approaches have limitations, e.g., they are usually based
on the bag-of-words model, which cannot capture the semantics of bug reports.
To address these challenges, we seek to leverage a
state-of-the-art large language model to improve the performance of the
traditional DBRD approach.
In this paper, we propose an approach called Cupid, which combines the
best-performing traditional DBRD approach REP with the state-of-the-art large
language model ChatGPT. Specifically, we first leverage ChatGPT under the
zero-shot setting to get essential information on bug reports. We then use the
essential information as the input of REP to detect duplicate bug reports. We
conducted an evaluation comparing Cupid with three existing approaches on
three datasets. The experimental results show that Cupid achieves new
state-of-the-art results, reaching Recall Rate@10 scores ranging from 0.59 to
0.67 across all the datasets analyzed. Our work highlights the potential of
combining large language models to improve the performance of software
engineering tasks. Comment: Work in progres
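The Recall Rate@10 figures quoted above follow a metric that is commonly defined as the fraction of duplicate reports for which at least one true duplicate appears among the top k candidates. The sketch below assumes that definition; the function name and the dictionary-based input format are hypothetical and not taken from Cupid.

```python
def recall_rate_at_k(rankings, ground_truth, k):
    """Fraction of queries with at least one true duplicate in the top k.

    rankings:     query id -> list of candidate ids, best first
    ground_truth: query id -> set of ids that are true duplicates
    """
    if not rankings:
        return 0.0
    hits = 0
    for query_id, ranked in rankings.items():
        # A query counts as a hit if any true duplicate is in its top k.
        if set(ranked[:k]) & ground_truth[query_id]:
            hits += 1
    return hits / len(rankings)
```

With two queries whose true duplicates sit at ranks 2 and 3 respectively, this function gives 0.0 at k=1, 0.5 at k=2, and 1.0 at k=3.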
Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection
One goal of technical online communities is to help developers find the right
answer in one place. A single question can be asked in different ways with
different wordings, leading to the existence of duplicate posts on technical
forums. The question of how to discover and link duplicate posts has garnered
the attention of both developer communities and researchers. For example, Stack
Overflow adopts a voting-based mechanism to mark and close duplicate posts.
However, addressing these constantly emerging duplicate posts in a timely
manner continues to pose challenges. Therefore, various approaches have been
proposed to automatically detect duplicate posts on technical forums. The
existing methods suffer from limitations either due to their reliance on
handcrafted similarity metrics, which cannot sufficiently capture the semantics
of posts, or their lack of supervision to improve the performance.
Additionally, the efficiency of these methods is hindered by their dependence
on pair-wise feature generation, which can be impractical for large amounts of
data. In this work, we attempt to employ and refine the GPT-3 embeddings for
the duplicate detection task. We assume that the GPT-3 embeddings can
accurately represent the semantics of the posts. In addition, by training a
Siamese-based network based on the GPT-3 embeddings, we obtain a latent
embedding that accurately captures the duplicate relation in technical forum
posts. Our experiment on a benchmark dataset confirms the effectiveness of our
approach and demonstrates superior performance compared to baseline methods.
When applied to the dataset we constructed with a recent Stack Overflow dump,
our approach attains a Top-1, Top-5, and Top-30 accuracy of 23.1%, 43.9%, and
68.9%, respectively. With a manual study, we confirm our approach's potential
for finding unlabelled duplicates on technical forums. Comment: SANER 202
DeepWukong: Statically Detecting Software Vulnerabilities Using Deep Graph Neural Network
Static bug detection has shown its effectiveness in detecting well-defined memory errors, e.g., memory leaks, buffer overflows, and null dereferences. However, modern software systems have a wide variety of vulnerabilities. These vulnerabilities are extremely complicated with sophisticated programming logic, and these bugs are often caused by different bad programming practices, challenging existing bug detection solutions. It is hard and labor-intensive to develop precise and efficient static analysis solutions for different types of vulnerabilities, particularly for those that may not have a clear specification as the traditional well-defined vulnerabilities do. This article presents DeepWukong, a new deep-learning-based embedding approach to static detection of software vulnerabilities for C/C++ programs. Our approach makes a new attempt by leveraging recent advances in graph neural networks to embed code fragments in a compact and low-dimensional representation, producing a new code representation that preserves high-level programming logic (in the form of control- and data-flows) together with the natural language information of a program. Our evaluation studies the top 10 most common C/C++ vulnerabilities during the past 3 years. We have conducted our experiments using 105,428 real-world programs by comparing our approach with four well-known traditional static vulnerability detectors and three state-of-the-art deep-learning-based approaches. The experimental results demonstrate the effectiveness of our research and have shed light on the promising direction of combining program analysis with deep learning techniques to address the general static code analysis challenges.