396 research outputs found

    DeepSoft: A vision for a deep model of software

    Full text link
    Although software analytics has experienced rapid growth as a research area, it has not yet reached its full potential for wide industrial adoption. Most of the existing work in software analytics still relies heavily on costly manual feature engineering processes, and they mainly address the traditional classification problems, as opposed to predicting future events. We present a vision for \emph{DeepSoft}, an \emph{end-to-end} generic framework for modeling software and its development process to predict future risks and recommend interventions. DeepSoft, partly inspired by human memory, is built upon the powerful deep learning-based Long Short Term Memory architecture that is capable of learning long-term temporal dependencies that occur in software evolution. Such deep learned patterns of software can be used to address a range of challenging problems such as code and task recommendation and prediction. DeepSoft provides a new approach for research into modeling of source code, risk prediction and mitigation, developer modeling, and automatically generating code patches from bug reports.Comment: FSE 201

    Owl Eyes: Spotting UI Display Issues via Visual Understanding

    Full text link
    Graphical User Interface (GUI) provides a visual bridge between a software application and end users, through which they can interact with each other. With the development of technology and aesthetics, the visual effects of the GUI are more and more attracting. However, such GUI complexity posts a great challenge to the GUI implementation. According to our pilot study of crowdtesting bug reports, display issues such as text overlap, blurred screen, missing image always occur during GUI rendering on different devices due to the software or hardware compatibility. They negatively influence the app usability, resulting in poor user experience. To detect these issues, we propose a novel approach, OwlEye, based on deep learning for modelling visual information of the GUI screenshot. Therefore, OwlEye can detect GUIs with display issues and also locate the detailed region of the issue in the given GUI for guiding developers to fix the bug. We manually construct a large-scale labelled dataset with 4,470 GUI screenshots with UI display issues and develop a heuristics-based data augmentation method for boosting the performance of our OwlEye. The evaluation demonstrates that our OwlEye can achieve 85% precision and 84% recall in detecting UI display issues, and 90% accuracy in localizing these issues. We also evaluate OwlEye with popular Android apps on Google Play and F-droid, and successfully uncover 57 previously-undetected UI display issues with 26 of them being confirmed or fixed so far.Comment: Accepted to 35th IEEE/ACM International Conference on Automated Software Engineering (ASE 20

    Investigating Semantic Properties of Images Generated from Natural Language Using Neural Networks

    Get PDF
    This work explores the attributes, properties, and potential uses of generative neural networks within the realm of encoding semantics. It works toward answering the questions of: If one uses generative neural networks to create a picture based on natural language, does the resultant picture encode the text\u27s semantics in a way a computer system can process? Could such a system be more precise than current solutions at detecting, measuring, or comparing semantic properties of generated images, and thus their source text, or their source semantics? This work is undertaken in the hope that detecting previously unknown properties, or better understanding them, could lead to new or improved methods of encoding and processing semantics in a computer system. Improvements in this space could affect many systems that make semantically based decisions. Being able to detect general or specific semantic properties, semantic similarity, or other semantic properties more effectively could improve tasks such as information retrieval, question answering, duplication (clone) detection, sentiment analysis, and others. Additionally, it could provide insight into how to better represent semantics in computer systems and thus bring us closer to general artificial intelligence. To explore this space, this work starts with an experiment consisting of transforming pairs of texts into pairs of images via a generative neural network and exploring properties of those image pairs. The text pairs were known to either be textually and semantically identical, semantically similar, or semantically dissimilar. The resultant image pairs are then tested for similarity via a second neural network based process to investigate if the semantic similarity is preserved during the transformation process and thus, exists in the resultant image pairs in a quantifiable way. Preliminary results showed strong evidence of resultant images encoding semantics in a measurable way. However, when the experiment was conducted on a larger dataset, and with the generative network more thoroughly trained, the results are weaker. An alternative experiment conducted on different datasets and configurations produced results that are still weaker than the preliminary experiments. These findings lead us to believe the promise of the preliminary results was possibly due to semantics being encoded by the vectorization of the words, and not by the generative neural network. This explanation seeks to clarify why, as the generative neural network took a larger role in the process, the results were worse, and as it took a smaller role, the results were better. Further tests were conducted to establish this belief and proved supportive

    A Comparative Study of Text Embedding Models for Semantic Text Similarity in Bug Reports

    Full text link
    Bug reports are an essential aspect of software development, and it is crucial to identify and resolve them quickly to ensure the consistent functioning of software systems. Retrieving similar bug reports from an existing database can help reduce the time and effort required to resolve bugs. In this paper, we compared the effectiveness of semantic textual similarity methods for retrieving similar bug reports based on a similarity score. We explored several embedding models such as TF-IDF (Baseline), FastText, Gensim, BERT, and ADA. We used the Software Defects Data containing bug reports for various software projects to evaluate the performance of these models. Our experimental results showed that BERT generally outperformed the rest of the models regarding recall, followed by ADA, Gensim, FastText, and TFIDF. Our study provides insights into the effectiveness of different embedding methods for retrieving similar bug reports and highlights the impact of selecting the appropriate one for this task. Our code is available on GitHub.Comment: 7 Page

    Cupid: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection

    Full text link
    Duplicate bug report detection (DBRD) is a long-standing challenge in both academia and industry. Over the past decades, researchers have proposed various approaches to detect duplicate bug reports more accurately. With the recent advancement of deep learning, researchers have also proposed several approaches that leverage deep learning models to detect duplicate bug reports. A recent benchmarking study on DBRD also reveals that the performance of deep learning-based approaches is not always better than the traditional approaches. However, traditional approaches have limitations, e.g., they are usually based on the bag-of-words model, which cannot capture the semantics of bug reports. To address these aforementioned challenges, we seek to leverage state-of-the-art large language model to improve the performance of the traditional DBRD approach. In this paper, we propose an approach called Cupid, which combines the best-performing traditional DBRD approach REP with the state-of-the-art large language model ChatGPT. Specifically, we first leverage ChatGPT under the zero-shot setting to get essential information on bug reports. We then use the essential information as the input of REP to detect duplicate bug reports. We conducted an evaluation on comparing Cupid with three existing approaches on three datasets. The experimental results show that Cupid achieves new state-of-the-art results, reaching Recall Rate@10 scores ranging from 0.59 to 0.67 across all the datasets analyzed. Our work highlights the potential of combining large language models to improve the performance of software engineering tasks.Comment: Work in progres

    Refining GPT-3 Embeddings with a Siamese Structure for Technical Post Duplicate Detection

    Full text link
    One goal of technical online communities is to help developers find the right answer in one place. A single question can be asked in different ways with different wordings, leading to the existence of duplicate posts on technical forums. The question of how to discover and link duplicate posts has garnered the attention of both developer communities and researchers. For example, Stack Overflow adopts a voting-based mechanism to mark and close duplicate posts. However, addressing these constantly emerging duplicate posts in a timely manner continues to pose challenges. Therefore, various approaches have been proposed to detect duplicate posts on technical forum posts automatically. The existing methods suffer from limitations either due to their reliance on handcrafted similarity metrics which can not sufficiently capture the semantics of posts, or their lack of supervision to improve the performance. Additionally, the efficiency of these methods is hindered by their dependence on pair-wise feature generation, which can be impractical for large amount of data. In this work, we attempt to employ and refine the GPT-3 embeddings for the duplicate detection task. We assume that the GPT-3 embeddings can accurately represent the semantics of the posts. In addition, by training a Siamese-based network based on the GPT-3 embeddings, we obtain a latent embedding that accurately captures the duplicate relation in technical forum posts. Our experiment on a benchmark dataset confirms the effectiveness of our approach and demonstrates superior performance compared to baseline methods. When applied to the dataset we constructed with a recent Stack Overflow dump, our approach attains a Top-1, Top-5, and Top-30 accuracy of 23.1%, 43.9%, and 68.9%, respectively. With a manual study, we confirm our approach's potential of finding unlabelled duplicates on technical forums.Comment: SANER 202

    DeepWukong: Statically Detecting Software Vulnerabilities Using Deep Graph Neural Network

    Full text link
    Static bug detection has shown its effectiveness in detecting well-defined memory errors, e.g., memory leaks, buffer overflows, and null dereference. However, modern software systems have a wide variety of vulnerabilities. These vulnerabilities are extremely complicated with sophisticated programming logic, and these bugs are often caused by different bad programming practices, challenging existing bug detection solutions. It is hard and labor-intensive to develop precise and efficient static analysis solutions for different types of vulnerabilities, particularly for those that may not have a clear specification as the traditional well-defined vulnerabilities. This article presents DeepWukong, a new deep-learning-based embedding approach to static detection of software vulnerabilities for C/C++ programs. Our approach makes a new attempt by leveraging advanced recent graph neural networks to embed code fragments in a compact and low-dimensional representation, producing a new code representation that preserves high-level programming logic (in the form of control-and data-flows) together with the natural language information of a program. Our evaluation studies the top 10 most common C/C++ vulnerabilities during the past 3 years. We have conducted our experiments using 105,428 real-world programs by comparing our approach with four well-known traditional static vulnerability detectors and three state-of-the-art deep-learning-based approaches. The experimental results demonstrate the effectiveness of our research and have shed light on the promising direction of combining program analysis with deep learning techniques to address the general static code analysis challenges
    corecore