
    Aspect of Code Cloning Towards Software Bug and Imminent Maintenance: A Perspective on Open-source and Industrial Mobile Applications

    As part of the digital era of microtechnology, mobile application (app) development is evolving at lightning speed, enriching our lives while bringing new challenges and risks. In particular, software bugs and failures cost trillions of dollars every year and have even caused fatalities, such as the software bug in a self-driving car that killed a pedestrian in March 2018 and the Boeing 737 MAX tragedies that resulted in hundreds of deaths. Software clones (duplicated fragments of code) are also known to be one of the crucial factors behind bugs and failures in software systems. There have been many significant studies on software clones and their relationships to software bugs for desktop-based applications. Unfortunately, while mobile apps have become an integral part of today's era, there is a marked lack of such studies for mobile apps. To explore this important aspect, in this thesis we first studied the characteristics of software bugs that are specific to mobile apps and less prevalent in desktop-based applications, such as energy-related issues (battery drain while using apps) and compatibility-related issues (the same app behaving differently on different devices). Using a Support Vector Machine (SVM), we classified about 3K mobile app bug reports from different open-source development sites into four categories: crash, energy, functionality, and security. We then manually examined a subset of those bugs and found that over 50% of the bug-fixing code changes occurred in clone code. A number of studies of desktop-based software systems clearly show the harmful impacts of code clones and their relationships to software bugs; given the marked lack of such studies for mobile apps, in our second study we examined 11 open-source and industrial mobile apps written in two different languages (Java and Swift) and found that clone code is more bug-prone than non-clone code and that industrial mobile apps have a higher code clone ratio than open-source mobile apps. Furthermore, we correlated our study outcomes with those of existing desktop-based studies and surveyed 23 mobile app developers to validate our findings. The survey not only validated our findings but also revealed that around 95% of the developers usually copy/paste (i.e., clone) code fragments from the popular crowd-sourcing platform Stack Overflow (SO) into their projects, and that over 75% of such developers experience bugs after doing so. Existing studies of desktop-based systems have also shown that while SO is one of the most popular online platforms for code reuse (and code cloning), SO code fragments are often problematic from a software maintenance perspective. Thus, in the third study of this thesis, we studied the consequences of code cloning from SO in different open-source and industrial mobile apps. We observed that closed-source industrial apps reused even more SO code fragments than open-source mobile apps, and that SO code fragments were more change-prone (including bug-related changes) than non-SO code fragments. We also found that SO code fragments were related to more bugs in industrial projects than in open-source ones. Our studies show how clone-related software bugs in mobile apps can be managed efficiently and effectively by utilizing the positive sides of code cloning while overcoming (or at least minimizing) the negative consequences of clone fragments.
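
    To make the classification step concrete, the following is a minimal sketch (Python with scikit-learn) of how bug reports could be categorized into crash, energy, functionality, and security classes with an SVM; the sample reports, feature extraction, and model settings are illustrative assumptions, not the thesis's actual pipeline.

    # Minimal sketch: TF-IDF features over bug report text feeding a linear SVM.
    # The reports, labels, and hyperparameters below are hypothetical.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    reports = [
        "App crashes when rotating the screen",
        "Force close on launch after update",
        "Battery drains quickly while app runs in background",
        "Phone heats up and battery dies during playback",
        "Search returns no results for valid queries",
        "Settings toggle does not save user preference",
        "User token is logged in plain text",
        "Password field allows SQL injection",
    ]
    labels = ["crash", "crash", "energy", "energy",
              "functionality", "functionality", "security", "security"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
    clf.fit(reports, labels)
    print(clf.predict(["App freezes and closes when opening the camera"]))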

    Towards understanding the challenges faced by machine learning software developers and enabling automated solutions

    Modern software systems increasingly include machine learning (ML) as an integral component. However, we do not yet understand the difficulties faced by software developers when learning about ML libraries and using them within their systems. To fill that gap, this thesis reports on a detailed (manual) examination of 3,243 highly rated Q&A posts related to ten ML libraries, namely TensorFlow, Keras, scikit-learn, Weka, Caffe, Theano, MLlib, Torch, Mahout, and H2O, on Stack Overflow, a popular online technical Q&A forum. Our findings reveal an urgent need for software engineering (SE) research in this area. The second part of the thesis focuses on understanding the characteristics of Deep Neural Network (DNN) bugs. We study 2,716 high-quality posts from Stack Overflow and 500 bug-fix commits from GitHub about five popular deep learning libraries (Caffe, Keras, TensorFlow, Theano, and Torch) to understand the types of bugs, their root causes and impacts, the bug-prone stages of the deep learning pipeline, and whether there are common antipatterns in this buggy software. Our findings imply that repairing software that uses DNNs is an unmistakable SE need where automated tools could be beneficial; however, we do not yet fully understand the challenges of repairing DNNs or the patterns used when repairing them manually. The third part of this thesis therefore presents a comprehensive study of bug-fix patterns to address these questions. We studied 415 repairs from Stack Overflow and 555 repairs from GitHub for the same five deep learning libraries to understand the challenges and patterns of bug repair. Our key findings reveal that DNN bug-fix patterns are distinctive compared to traditional bug-fix patterns, and that the most common patterns are fixing data dimensions and neural network connectivity. Finally, we propose an automatic technique to detect ML Application Programming Interface (API) misuses. We started with an empirical study to understand ML API misuse, which shows that ML API misuse is prevalent and distinct compared to non-ML API misuse. Inspired by these findings, we contributed Amimla (Api Misuse In Machine Learning Apis), an approach and a tool for ML API misuse detection. Amimla relies on several technical innovations. First, we proposed an abstract representation of ML pipelines to use in misuse detection. Second, we proposed an abstract representation of neural networks for deep-learning-related APIs. Third, we developed a representation strategy for constraints on ML APIs. Finally, we developed a misuse detection strategy for both single and multiple APIs. Our experimental evaluation shows that Amimla achieves a high average accuracy of ∼80% on two benchmarks of misuses from Stack Overflow and GitHub.
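
    As an illustration of the two bug-fix patterns the study finds most common, the hedged sketch below (Keras/TensorFlow in Python) shows a typical data-dimension fix and a connectivity fix; the model, shapes, and data are invented for illustration and are not drawn from the studied repairs.

    # Hypothetical example of the two most common DNN bug-fix patterns reported:
    # fixing data dimensions and fixing network connectivity.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    x = np.random.rand(100, 28, 28)                                   # e.g. grayscale images
    y = keras.utils.to_categorical(np.random.randint(0, 10, 100), 10)

    # Buggy version (data-dimension bug): a Dense head applied to unflattened
    # 28x28 inputs, so the output shape never matches the (batch, 10) labels.
    # model = keras.Sequential([layers.Dense(10, activation="softmax",
    #                                        input_shape=(28, 28))])

    # Fixed version: Flatten reshapes the input so consecutive layers receive
    # compatible tensors (data-dimension fix), and the layers are wired through
    # an intermediate Dense layer with matching shapes (connectivity fix).
    model = keras.Sequential([
        layers.Flatten(input_shape=(28, 28)),
        layers.Dense(64, activation="relu"),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, y, epochs=1, verbose=0)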

    An Assessment of Eclipse Bugs' Priority and Severity Prediction Using Machine Learning

    The reliability and quality of software programs remain an important and challenging aspect of software design. Software developers and system operators spend a great deal of time assessing and overcoming expected and unexpected errors that might negatively affect the user experience. One of the major concerns in software development is handling bug reports, which record the severity and priority of defects. For a long time, this task was performed manually by system operators, requiring considerable effort and time. Therefore, in this paper, we present a novel automatic assessment tool using Machine Learning algorithms for assessing bug reports based on several features such as hardware, product, assignee, OS, component, target milestone, votes, and versions. The aim is to build a tool that automatically classifies software bugs according to their severity and priority and makes predictions based on the most representative features and the bug report text. To perform this task, we used Multinomial Naive Bayes, Random Forest, Bagging, AdaBoost, SVC, KNN, and Linear SVM classifiers together with Natural Language Processing techniques to analyze the Eclipse dataset. The approach shows promising results for software bug detection and prediction.
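
    A rough sketch of the kind of classifier comparison described above, assuming a text-based representation of Eclipse bug reports; the sample reports, labels, and model settings shown are placeholders rather than the paper's actual data or configuration.

    # Hypothetical comparison of two of the paper's classifiers (Multinomial
    # Naive Bayes and Random Forest) on bug report text; the data is invented.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    reports = [
        "NullPointerException on startup, workbench will not open",
        "Editor freezes when opening large files",
        "Minor typo in preferences dialog label",
        "Tooltip slightly misaligned on Linux",
    ]
    severity = ["critical", "major", "trivial", "minor"]   # hypothetical labels

    for clf in (MultinomialNB(),
                RandomForestClassifier(n_estimators=100, random_state=0)):
        model = make_pipeline(CountVectorizer(), clf)
        model.fit(reports, severity)
        print(type(clf).__name__, model.predict(["Crash when saving a project"]))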

    Method-Level Bug Severity Prediction using Source Code Metrics and LLMs

    In the past couple of decades, significant research effort has been devoted to the prediction of software bugs. However, most existing work in this domain treats all bugs the same, which is not the case in practice. It is important for a defect prediction method to estimate the severity of the identified bugs so that the higher-severity ones get immediate attention. In this study, we investigate source code metrics, source code representation using large language models (LLMs), and their combination in predicting bug severity labels on two prominent datasets. We leverage several source code metrics at method-level granularity to train eight different machine-learning models. Our results suggest that Decision Tree and Random Forest models outperform the other models across several evaluation metrics. We then use the pre-trained CodeBERT LLM to study the effectiveness of source code representations in predicting bug severity. Fine-tuning CodeBERT improves bug severity prediction significantly, by 29% to 140% across several evaluation metrics, compared to the best classic prediction model trained on source code metrics. Finally, we integrate source code metrics into CodeBERT as an additional input, using two proposed architectures, both of which further enhance the model's effectiveness.
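
    The sketch below (PyTorch with Hugging Face Transformers) shows one way a CodeBERT method embedding could be concatenated with hand-crafted source code metrics before a classification head, in the spirit of the combined architectures described above; the layer sizes, metric values, and fusion strategy are assumptions, not the paper's exact design.

    # Hedged sketch: fuse a CodeBERT [CLS] embedding with method-level metrics.
    # Metric names/values and the single linear head are illustrative assumptions.
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    encoder = AutoModel.from_pretrained("microsoft/codebert-base")

    class SeverityClassifier(nn.Module):
        def __init__(self, num_metrics=5, num_classes=4):
            super().__init__()
            self.encoder = encoder
            self.head = nn.Linear(encoder.config.hidden_size + num_metrics, num_classes)

        def forward(self, input_ids, attention_mask, metrics):
            out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
            cls = out.last_hidden_state[:, 0, :]                  # [CLS] embedding
            return self.head(torch.cat([cls, metrics], dim=-1))   # code + metrics

    method_src = "public int add(int a, int b) { return a + b; }"
    enc = tokenizer(method_src, return_tensors="pt", truncation=True)
    metrics = torch.tensor([[3.0, 1.0, 0.0, 2.0, 5.0]])   # e.g. LOC, complexity (made up)
    logits = SeverityClassifier()(enc["input_ids"], enc["attention_mask"], metrics)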

    Towards the detection and analysis of performance regression introducing code changes

    In contemporary software development, developers commonly conduct regression testing to ensure that code changes do not affect software quality. Performance regression testing is an emerging research area within the regression testing domain in software engineering; it aims to maintain the system's performance. Conducting performance regression testing is known to be expensive. It is also complex, given the growing volume of committed code and the number of team members working simultaneously. Many automated regression testing techniques have been proposed in prior research. However, challenges remain in practice in locating and resolving performance regressions. Directing regression testing to the commit level helps locate the root cause, yet it can hinder the development process. This thesis outlines motivations and solutions for locating the root causes of performance regressions. First, we challenge a deterministic state-of-the-art approach by expanding the testing data to find areas for improvement. The deterministic approach was found to be limited in searching for the best regression-locating rule. Thus, we present two stochastic approaches that develop models which learn from historical commits. The first stochastic approach views the research problem as a search-based optimization problem seeking the highest detection rate; we apply different multi-objective evolutionary algorithms and compare them. This thesis also investigates whether simplifying the search space by combining objectives achieves comparable results. The second stochastic approach addresses the severe class imbalance any system can exhibit, since code changes that introduce regressions are rare but costly. We formulate the identification of problematic commits that introduce performance regressions as a binary classification problem that handles class imbalance. Further, the thesis provides an exploratory study of the challenges developers face in resolving performance regressions. The study is based on questions posted on a technical forum dedicated to performance regression. We collected around 2k questions discussing regressions in software execution time, all of which were manually analyzed. The study resulted in a categorization of the challenges. We also discuss the difficulty level of performance regression issues within the development community. This study provides insights that help developers avoid causes of regression during software design and implementation.
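
    As a sketch of the second stochastic approach's framing, the snippet below treats identifying regression-introducing commits as an imbalance-aware binary classification problem; the commit features, imbalance ratio, and choice of class weighting are illustrative assumptions rather than the thesis's actual model.

    # Illustrative only: a class-weighted classifier over hypothetical per-commit
    # features, standing in for the imbalance-aware formulation described above.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n = 1000
    X = rng.normal(size=(n, 3))             # e.g. lines changed, files touched, churn
    y = (rng.random(n) < 0.05).astype(int)  # ~5% regression-introducing commits

    # class_weight="balanced" reweights the rare positive class so the majority
    # class does not dominate the learned decision boundary.
    clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                                 random_state=0)
    print(cross_val_score(clf, X, y, cv=5, scoring="f1").mean())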

    Analysis of Human Affect and Bug Patterns to Improve Software Quality and Security

    The impact of software is ever increasing as more and more systems become software operated. Despite the usefulness of software, many software failures have caused tremendous losses in lives and dollars. Software failures occur because of bugs (i.e., faults) in software systems. These bugs cause programs to malfunction or crash and expose security vulnerabilities exploitable by malicious hackers. Studies confirm that software defects and vulnerabilities appear in source code largely due to the mistakes and errors of developers. Human performance is affected by the underlying development process and by human affect, such as sentiment and emotion. This thesis examines these human affects of software developers, which have drawn recent interest in the community. For capturing developers' sentimental and emotional states, we have developed several software tools (i.e., SentiStrength-SE, DEVA, and MarValous). These are novel tools facilitating automatic detection of sentiments and emotions from software engineering textual artifacts. Using such automated tools, the developers' sentimental variations are studied with respect to the underlying development tasks (e.g., bug-fixing, bug-introducing), development periods (i.e., days and times), team sizes, and project sizes. We expose opportunities for exploiting developers' sentiments for higher productivity and improved software quality. While developers' sentiments and emotions can be leveraged as a proactive safeguard for identifying and minimizing software bugs, this dissertation also includes in-depth studies of the relationships among various bug patterns, such as software defects, security vulnerabilities, and code smells, to find actionable insights for minimizing software bugs and improving software quality and security. Bug patterns are exposed through mining software repositories and bug databases. These bug patterns are crucial for localizing bugs and security vulnerabilities in a software codebase and fixing them, predicting portions of software susceptible to failure or exploitation by hackers, devising techniques for automated program repair, and avoiding code constructs and coding idioms that are bug-prone. The software tools produced from this thesis are empirically evaluated using standard measurement metrics (e.g., precision, recall). The findings of all the studies are validated with appropriate tests for statistical significance. Finally, based on our experience and an in-depth analysis of the present state of the art, we expose avenues for further research and development towards a holistic approach for developing improved and secure software systems.
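
    As a toy illustration of the detection task these tools automate (and not of their actual implementations), the snippet below scores the polarity of software engineering text such as commit messages with a tiny hand-made lexicon; the word lists and scoring rule are invented for illustration.

    # Toy lexicon-based polarity scoring over SE text; not SentiStrength-SE,
    # DEVA, or MarValous, whose lexicons and models are far richer.
    POSITIVE = {"great", "fixed", "clean", "thanks", "works"}
    NEGATIVE = {"broken", "ugly", "fails", "annoying", "crash"}

    def sentiment_score(text: str) -> int:
        """Crude polarity: positive word hits minus negative word hits."""
        words = text.lower().split()
        return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

    for msg in ["Fixed the crash in the parser, works now",
                "This ugly workaround still fails on Windows"]:
        print(sentiment_score(msg), msg)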

    Use and misuse of the term "Experiment" in mining software repositories research

    The significant momentum and importance of Mining Software Repositories (MSR) in Software Engineering (SE) has fostered new opportunities and challenges for extensive empirical research. However, MSR researchers seem to struggle to characterize the empirical methods they use within the existing empirical SE body of knowledge. This is especially the case for MSR experiments. To provide evidence on the special characteristics of MSR experiments and their differences from experiments traditionally acknowledged in SE, we elicited the hallmarks that differentiate an experiment from other types of empirical studies and characterized the hallmarks and types of experiments in MSR. We analyzed MSR literature obtained from a small-scale systematic mapping study to assess the use of the term experiment in MSR. We found that 19% of the papers claiming to report an experiment are in fact not experiments at all but observational studies, so they use the term in a misleading way. Of the remaining 81% of the papers, only one refers to a genuine controlled experiment, while the others are experiments with limited control. MSR researchers tend to overlook such limitations, compromising the interpretation of the results of their studies. We provide recommendations and insights to support the improvement of MSR experiments. This work has been partially supported by the Spanish project MCI PID2020-117191RB-I00.