    What is the Connection Between Issues, Bugs, and Enhancements? (Lessons Learned from 800+ Software Projects)

    Agile teams juggle multiple tasks, so professionals are often assigned to multiple projects, especially in service organizations that monitor and maintain a large suite of software for a large user base. If we could predict changes in project conditions, then managers could better adjust the staff allocated to those projects. This paper builds such a predictor using data from 832 open source and proprietary applications. Using a time series analysis of the last 4 months of issues, we can forecast how many bug reports and enhancement requests will be generated next month. The forecasts made in this way only require a frequency count of these issue reports (and do not require a historical record of bugs found in the project). That is, this kind of predictive model is very easy to deploy within a project. We hence strongly recommend this method for forecasting future issues, enhancements, and bugs in a project. Comment: Accepted to the 2018 International Conference on Software Engineering, Software Engineering in Practice track. 10 pages, 10 figures
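
    The abstract does not say which time series model is used. As a minimal sketch under that uncertainty, the hypothetical `forecast_next_month` helper below fits an ordinary least-squares linear trend to the last four monthly issue counts and extrapolates one month ahead. Like the approach described above, it needs only per-month frequency counts, but the linear-trend model is an assumption for illustration, not the authors' forecasting method.

```python
from collections.abc import Sequence


def forecast_next_month(monthly_counts: Sequence[int], window: int = 4) -> float:
    """Forecast next month's issue count from the last `window` monthly counts.

    Hypothetical sketch: fits a least-squares line through the recent counts
    and extrapolates one step; it is not the paper's model.
    """
    if len(monthly_counts) < window:
        raise ValueError(f"need at least {window} months of counts")
    recent = [float(c) for c in monthly_counts[-window:]]
    n = len(recent)
    xs = range(n)                 # month indices 0..n-1 within the window
    x_mean = (n - 1) / 2
    y_mean = sum(recent) / n
    # Least-squares slope and intercept of count versus month index.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, recent))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Month index n is the month right after the window; counts cannot be
    # negative, so the forecast is clamped at zero.
    return max(0.0, intercept + slope * n)


# Hypothetical monthly bug-report counts for one project.
print(forecast_next_month([12, 15, 14, 18]))
```

    Bug reports and enhancement requests would each get their own call. With the hypothetical counts `[12, 15, 14, 18]`, the sketch forecasts approximately `19.0` bug reports for the next month.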

    The Integration of Machine Learning into Automated Test Generation: A Systematic Mapping Study

    Context: Machine learning (ML) may enable effective automated test generation. Objective: We characterize emerging research, examining testing practices, researcher goals, ML techniques applied, evaluation, and challenges. Methods: We perform a systematic mapping on a sample of 102 publications. Results: ML generates input for system, GUI, unit, performance, and combinatorial testing, or improves the performance of existing generation methods. ML is also used to generate test verdict, property-based, and expected-output oracles. Supervised learning (often based on neural networks) and reinforcement learning (often based on Q-learning) are common, and some publications also employ unsupervised or semi-supervised learning. (Semi-/Un-)Supervised approaches are evaluated using both traditional testing metrics and ML-related metrics (e.g., accuracy), while reinforcement learning is often evaluated using testing metrics tied to the reward function. Conclusion: Work to date shows great promise, but there are open challenges regarding training data, retraining, scalability, evaluation complexity, the ML algorithms employed (and how they are applied), benchmarks, and replicability. Our findings can serve as a roadmap and inspiration for researchers in this field. Comment: Under submission to the Software Testing, Verification and Reliability journal. (arXiv admin note: text overlap with arXiv:2107.00906. This is an earlier study that this study extends.)
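
    To make "reinforcement learning evaluated with testing metrics tied to the reward function" concrete, here is a minimal tabular Q-learning sketch for test generation. Everything in it is a hypothetical illustration rather than a technique taken from any surveyed publication: the app is a small dictionary-based screen model, each episode is one generated action sequence, and the reward is a coverage-style signal (1 for a transition no earlier step exercised, otherwise 0).

```python
import random
from collections import defaultdict

# Hypothetical app model: each screen maps user actions to the next screen.
AppModel = dict[str, dict[str, str]]


def q_learning_explore(app: AppModel, start: str, episodes: int = 50,
                       steps: int = 10, alpha: float = 0.5, gamma: float = 0.9,
                       epsilon: float = 0.2, seed: int = 0) -> set[tuple[str, str]]:
    """Generate action sequences with tabular Q-learning; return covered transitions."""
    rng = random.Random(seed)
    q = defaultdict(float)          # Q-value per (screen, action), default 0.0
    covered: set[tuple[str, str]] = set()
    for _ in range(episodes):       # each episode is one generated test sequence
        state = start
        for _ in range(steps):
            actions = list(app[state])
            if not actions:
                break               # a terminal screen ends the sequence
            # Epsilon-greedy: usually exploit the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state = app[state][action]
            # Coverage-style reward: 1 only for a transition not exercised before.
            reward = 0.0 if (state, action) in covered else 1.0
            covered.add((state, action))
            best_next = max((q[(next_state, a)] for a in app[next_state]), default=0.0)
            # Q-learning update toward reward plus discounted best future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return covered


app = {
    "home": {"open_menu": "menu", "search": "results"},
    "menu": {"settings": "settings", "back": "home"},
    "results": {"open_item": "item", "back": "home"},
    "settings": {"back": "menu"},
    "item": {},
}
covered = q_learning_explore(app, "home")
total = sum(len(actions) for actions in app.values())
print(f"covered {len(covered)} of {total} transitions")
```

    Because the reward is the novelty of a transition, transition coverage is the natural testing metric for judging such an agent, which mirrors the coupling between reward and evaluation noted in the abstract.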

    Enhancing Web Applications Observability through Instrumented Automated Browsers

    In software engineering, observability is the ability to determine the current state of a software system based on its external outputs or signals, such as metrics, logs, or traces. Web engineers rely on the web browser console as the primary tool to monitor the client side of web applications during end-to-end tests. However, this is a manual and time-consuming task, made harder by the number of different browsers available. This paper presents BrowserWatcher, an open-source browser extension providing cross-browser capabilities to observe web applications and automatically gather browser console logs in different browsers (e.g., Chrome, Firefox, or Edge). We have leveraged this extension to conduct an empirical study analyzing the browser console of the top-50 public websites, both manually and automatically. The results show that BrowserWatcher gathers all the well-known log categories, such as console logs or error traces. They also reveal that each web browser additionally includes other types of logs, which differ among browsers, thus providing distinct pieces of information for the same website. This work was supported in part by the Ministerio de Ciencia e Innovación-Agencia Estatal de Investigación, Spain (10.13039/501100011033) through the H2O Learn project under Grant PID2020-112584RB-C31, in part by the Madrid Regional Government through the e-Madrid-CM Project, Spain under Grant S2018/TCS-4307, and in part by the Comunidad de Madrid and Universidad Politécnica de Madrid, Spain through the V-PRICIT Research Programme Apoyo a la realización de Proyectos de I+D para jóvenes investigadores UPM-CAM, under Grant APOYOJOVENES-QINIM8-72-PKGQ0J. Funding for Article Processing Charge (APC): Universidad Carlos III de Madrid (Read & Publish Agreement CRUE-CSIC 2023)
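
    The cross-browser comparison in this study amounts to separating log categories that every browser reports from those only some browsers report. A minimal sketch of that comparison is below; the `(browser, category, message)` record shape, the helper name, and the category labels in the example (`network`, `css`) are all hypothetical and are not BrowserWatcher's actual output format.

```python
from collections.abc import Iterable

# Hypothetical record shape: (browser, log category, message text).
LogEntry = tuple[str, str, str]


def compare_log_categories(
    entries: Iterable[LogEntry],
) -> tuple[set[str], dict[str, set[str]]]:
    """Return (categories seen in every browser, extra categories per browser)."""
    by_browser: dict[str, set[str]] = {}
    for browser, category, _message in entries:
        by_browser.setdefault(browser, set()).add(category)
    if not by_browser:
        return set(), {}
    # Categories reported by every browser (e.g. the well-known console/error logs).
    shared = set.intersection(*by_browser.values())
    # What each browser reported beyond the shared set.
    extra = {browser: cats - shared for browser, cats in by_browser.items()}
    return shared, extra


# Hypothetical logs gathered for one website in two browsers.
entries = [
    ("chrome", "console", "app started"),
    ("chrome", "error", "TypeError: x is undefined"),
    ("chrome", "network", "GET /api 404"),
    ("firefox", "console", "app started"),
    ("firefox", "error", "TypeError: x is undefined"),
    ("firefox", "css", "Unknown property"),
]
shared, extra = compare_log_categories(entries)
print(shared, extra)
```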

    Large Language Models in Fault Localisation

    Large Language Models (LLMs) have shown promise in multiple software engineering tasks, including code generation, code summarisation, test generation and code repair. Fault localisation is essential for facilitating automatic program debugging and repair, and was demonstrated as a highlight of ChatGPT-4's launch event. Nevertheless, there has been little work on understanding LLMs' capabilities for fault localisation in large-scale open-source programs. To fill this gap, this paper presents an in-depth investigation into the capability of ChatGPT-3.5 and ChatGPT-4, the two state-of-the-art LLMs, on fault localisation. Using the widely adopted Defects4J dataset, we compare the two LLMs with existing fault localisation techniques. We also investigate the stability and explanation of LLMs in fault localisation, as well as how prompt engineering and the length of code context affect fault localisation effectiveness. Our findings demonstrate that, within a limited code context, ChatGPT-4 outperforms all the existing fault localisation methods. Additional error logs can further improve the ChatGPT models' localisation accuracy and stability, achieving on average 46.9% higher accuracy than the state-of-the-art baseline SmartFL in terms of the TOP-1 metric. However, performance declines dramatically when the code context expands to the class level, with the ChatGPT models' effectiveness becoming inferior to the existing methods overall. Additionally, we observe that ChatGPT's explainability is unsatisfactory, with an accuracy rate of only approximately 30%. These observations demonstrate that while ChatGPT can achieve effective fault localisation under certain conditions, it has evident limitations. Further research is imperative to fully harness the potential of LLMs like ChatGPT for practical fault localisation applications.
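
    The abstract reports results "in terms of TOP-1" without defining the metric. A common definition in fault localisation evaluation, assumed here, counts the bugs for which a truly faulty program element appears among the n highest-ranked suspicious elements. The sketch below implements that assumed definition (ignoring tied suspiciousness scores) plus the relative-gain arithmetic behind a phrase like "46.9% higher"; reading that figure as a relative gain is also an assumption. The bug IDs and method names are hypothetical.

```python
from collections.abc import Mapping, Sequence


def top_n(rankings: Mapping[str, Sequence[str]],
          faulty: Mapping[str, set[str]], n: int = 1) -> int:
    """Count bugs whose n most suspicious program elements include a faulty one."""
    hits = 0
    for bug_id, ranked in rankings.items():
        # `ranked` is ordered from most to least suspicious; ties are not modelled.
        if set(ranked[:n]) & faulty.get(bug_id, set()):
            hits += 1
    return hits


def relative_improvement(candidate: float, baseline: float) -> float:
    """Relative gain of candidate over baseline (0.469 would read as '46.9% higher')."""
    return (candidate - baseline) / baseline


# Hypothetical rankings and ground-truth faulty methods for three bugs.
rankings = {"bug-1": ["m3", "m1", "m7"], "bug-2": ["m2", "m5"], "bug-3": ["m9", "m4"]}
faulty = {"bug-1": {"m1"}, "bug-2": {"m2"}, "bug-3": {"m8"}}
print(top_n(rankings, faulty, n=1), top_n(rankings, faulty, n=3))
```

    In this toy data only `bug-2` has its faulty method ranked first, so Top-1 is 1, while widening to Top-3 also credits `bug-1`.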

    An Evaluation of Log Parsing with ChatGPT

    Software logs play an essential role in ensuring the reliability and maintainability of large-scale software systems, as they are often the sole source of runtime information. Log parsing, which converts raw log messages into structured data, is an important initial step towards downstream log analytics. Recent studies have applied ChatGPT, the current cutting-edge large language model (LLM), to a wide range of software engineering tasks. However, its performance in automated log parsing remains unclear. In this paper, we evaluate ChatGPT's ability to undertake log parsing by addressing two research questions: (1) Can ChatGPT effectively parse logs? (2) How does ChatGPT perform with different prompting methods? Our results show that ChatGPT can achieve promising log parsing results with appropriate prompts, especially with few-shot prompting. Based on our findings, we outline several challenges and opportunities for ChatGPT-based log parsing. Comment: 6 pages
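
    To illustrate what "converting raw log messages into structured data" means, and what a few-shot prompt for it might look like, here is a minimal sketch. `naive_template` is a hypothetical rule-based baseline that masks variable tokens with the `<*>` placeholder; it is not ChatGPT and not the paper's method. `few_shot_prompt` only assembles prompt text from (log, template) demonstration pairs and calls no model; its instruction wording is an assumption, not the prompt used in the study.

```python
import re
from collections.abc import Sequence

# Naive, hypothetical masking rules: each regex matches one kind of variable token.
VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),  # IPv4 address, optional port
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                   # hexadecimal values
    re.compile(r"\b\d+\b"),                               # standalone integers
]


def naive_template(message: str) -> str:
    """Rule-based baseline: replace variable tokens with the <*> placeholder."""
    template = message
    for pattern in VARIABLE_PATTERNS:
        template = pattern.sub("<*>", template)
    return template


def few_shot_prompt(examples: Sequence[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt from (log, template) pairs; no model is called."""
    parts = ["Extract the template of each log message, replacing variable parts with <*>."]
    for log, template in examples:
        parts.append(f"Log: {log}\nTemplate: {template}")
    parts.append(f"Log: {query}\nTemplate:")
    return "\n\n".join(parts)


demo_log = "Connection from 10.0.0.5:8080 closed after 32 ms"
demos = [(demo_log, naive_template(demo_log))]
print(few_shot_prompt(demos, "Connection from 10.0.0.9:443 closed after 7 ms"))
```

    The demonstration pair maps the raw message to `Connection from <*> closed after <*> ms`; in a few-shot setting, such pairs show the model the expected output format before it sees the query log.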