
    The Integration of Machine Learning into Automated Test Generation: A Systematic Mapping Study

    Context: Machine learning (ML) may enable effective automated test generation. Objective: We characterize emerging research, examining testing practices, researcher goals, applied ML techniques, evaluation, and challenges. Methods: We perform a systematic mapping on a sample of 102 publications. Results: ML generates input for system, GUI, unit, performance, and combinatorial testing, or improves the performance of existing generation methods. ML is also used to generate test verdict, property-based, and expected-output oracles. Supervised learning, often based on neural networks, and reinforcement learning, often based on Q-learning, are common, and some publications also employ unsupervised or semi-supervised learning. (Semi-/un-)supervised approaches are evaluated using both traditional testing metrics and ML-related metrics (e.g., accuracy), while reinforcement learning is often evaluated using testing metrics tied to the reward function. Conclusion: Work to date shows great promise, but there are open challenges regarding training data, retraining, scalability, evaluation complexity, the ML algorithms employed and how they are applied, benchmarks, and replicability. Our findings can serve as a roadmap and inspiration for researchers in this field. Comment: Under submission to the Software Testing, Verification and Reliability journal. (arXiv admin note: text overlap with arXiv:2107.00906, an earlier study that this study extends.)
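    As a hedged illustration of the reinforcement-learning strand mentioned above, the sketch below applies tabular Q-learning to test-input selection, with coverage gain as the reward. The ToyApp harness, action names, and reward values are illustrative assumptions, not taken from any surveyed tool.

```python
import random
from collections import defaultdict

ACTIONS = ["click_ok", "click_cancel", "type_text", "scroll"]  # abstract test inputs

class ToyApp:
    """Stand-in for a system under test: 'scroll' then 'click_ok'
    reaches a rarely exercised screen (purely illustrative)."""
    def reset(self):
        self.state = "home"
        return self.state

    def step(self, action):
        if self.state == "home" and action == "scroll":
            self.state = "footer"
            return self.state, 1, False   # small coverage gain
        if self.state == "footer" and action == "click_ok":
            self.state = "hidden_dialog"
            return self.state, 5, True    # rare screen: large coverage gain
        return self.state, 0, False       # nothing new covered

def q_learning_test_gen(app, episodes=200, alpha=0.5, gamma=0.9, eps=0.2):
    q = defaultdict(float)  # (state, action) -> estimated value
    for _ in range(episodes):
        state = app.reset()
        for _ in range(20):  # bounded test-sequence length
            action = (random.choice(ACTIONS) if random.random() < eps
                      else max(ACTIONS, key=lambda a: q[(state, a)]))
            next_state, reward, done = app.step(action)  # reward proxies new coverage
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = q_learning_test_gen(ToyApp())
print(max(ACTIONS, key=lambda a: q[("home", a)]))  # the agent learns to 'scroll' first
```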

    A Review on Web Application Testing and its Current Research Directions

    Testing is an important part of every software development process, one to which companies devote considerable time and effort. The rapid growth of web applications and their increasing economic significance have made web application testing an area of acute importance. Web applications tend to have short, fast release cycles, which makes their testing very challenging. The main issues in testing are cost efficiency and bug-detection efficiency. Coverage-based testing is the process of ensuring that specific program elements are exercised; coverage measurement helps determine the “thoroughness” of the testing achieved. A wide range of tools, techniques, and frameworks has emerged to ascertain the quality of web applications. A comparative study of some of the prominent tools, techniques, and models for web application testing is presented. This work highlights the current research directions of some of the web application testing techniques.
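    To make the coverage-measurement idea concrete, here is a minimal Python sketch that records which statements of a function execute during a test, using the standard sys.settrace hook; real web-testing tools use far more sophisticated instrumentation.

```python
import sys

def measure_coverage(func, *args):
    """Record which line numbers of `func` execute during one call."""
    executed = set()
    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            executed.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return executed

def demo(x):
    if x > 0:
        return "positive"
    return "non-positive"

covered = measure_coverage(demo, 5)
print(f"lines executed: {sorted(covered)}")  # only the x > 0 branch is exercised
```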

    Overcoming Language Dichotomies: Toward Effective Program Comprehension for Mobile App Development

    Mobile devices and platforms have become an established target for modern software developers due to performant hardware and a large and growing user base numbering in the billions. Despite their popularity, the software development process for mobile apps comes with a set of unique, domain-specific challenges rooted in program comprehension. Many of these challenges stem from developer difficulties in reasoning about different representations of a program, a phenomenon we define as a "language dichotomy". In this paper, we reflect upon the various language dichotomies that contribute to open problems in program comprehension and development for mobile apps. Furthermore, to help guide the research community towards effective solutions for these problems, we provide a roadmap of directions for future work. Comment: Invited keynote paper for the 26th IEEE/ACM International Conference on Program Comprehension (ICPC'18).

    Gamified Exploratory GUI Testing of Web Applications: a Preliminary Evaluation

    In software engineering, testing is a well-known phase that plays a critical role: it is needed to ensure that the designed and produced code provides the expected results, avoiding faults and crashes. Exploratory GUI testing allows the tester to manually define test cases by directly interacting with the user interface of the final system. However, testers often perform exploratory GUI testing loosely, as they perceive it as a time-consuming, repetitive, and unappealing activity. To address this issue, we defined a gamified framework for GUI testing, which we developed and integrated into the Augmented Testing tool Scout. Gamification is meant to enhance the performance of human testers by stimulating competition and encouraging them to achieve better results in terms of both efficiency and effectiveness. We performed a preliminary evaluation of the gamification layer with a small sample of testers to assess the benefits of the technique compared with the standard version of the same tool. Test sequences defined with the gamified tool achieved higher coverage (i.e., higher efficiency) and a slightly higher percentage of bugs found. Users' opinions were almost unanimously in favor of the gamified version of the tool.
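    As a rough illustration of how such a gamification layer might reward testers, the sketch below scores a session by new coverage and bugs found. The point values and names are hypothetical, not Scout's actual scoring rules.

```python
# Hypothetical scoring scheme for gamified exploratory GUI testing:
# points for newly covered GUI elements and for confirmed bugs.
def session_score(new_elements_covered, bugs_found,
                  coverage_points=10, bug_points=100):
    """Score a test session so testers can compete on a leaderboard."""
    return new_elements_covered * coverage_points + bugs_found * bug_points

sessions = {"alice": (12, 1), "bob": (20, 0)}  # (elements covered, bugs found)
leaderboard = sorted(sessions.items(),
                     key=lambda kv: session_score(*kv[1]), reverse=True)
print(leaderboard[0][0])  # 'alice': 220 points vs. bob's 200
```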

    Software Testing with Large Language Model: Survey, Landscape, and Vision

    Pre-trained large language models (LLMs) have recently emerged as a breakthrough technology in natural language processing and artificial intelligence, able to handle large-scale datasets and exhibit remarkable performance across a wide range of tasks. Meanwhile, software testing is a crucial undertaking that serves as a cornerstone for ensuring the quality and reliability of software products. As the scope and complexity of software systems continue to grow, the need for more effective software testing techniques becomes increasingly urgent, making it an area ripe for innovative approaches such as the use of LLMs. This paper provides a comprehensive review of the utilization of LLMs in software testing. It analyzes 52 relevant studies that have used LLMs for software testing, from both the software testing and the LLM perspectives. The paper presents a detailed discussion of the software testing tasks for which LLMs are commonly used, among which test case preparation and program repair are the most representative. It also analyzes the commonly used LLMs, the types of prompt engineering employed, and the techniques that accompany these LLMs. Finally, it summarizes the key challenges and potential opportunities in this direction. This work can serve as a roadmap for future research in this area, highlighting potential avenues for exploration and identifying gaps in our current understanding of the use of LLMs in software testing. Comment: 20 pages, 11 figures.
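    As a hedged sketch of the test case preparation use, the snippet below shows one common prompt-construction pattern for LLM-based unit-test generation. Here, call_llm is a placeholder for any chat-completion client, and the prompt wording is illustrative rather than taken from a specific surveyed study.

```python
def build_test_prompt(function_source: str) -> str:
    """Assemble a test-generation prompt around the code under test."""
    return (
        "You are a software testing assistant.\n"
        "Write pytest unit tests for the Python function below, "
        "covering normal behavior, edge cases, and error handling.\n\n"
        + function_source
    )

def generate_tests(function_source: str, call_llm) -> str:
    """call_llm: any user-supplied function mapping a prompt string to text."""
    return call_llm(build_test_prompt(function_source))

# Usage with a stub in place of a real model client:
print(generate_tests("def add(a, b):\n    return a + b",
                     call_llm=lambda p: f"[model would answer: {len(p)}-char prompt]"))
```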

    Assessing the Quality of Mobile Graphical User Interfaces Using Multi-Objective Optimization

    Aesthetic defects are violations of quality attributes that are symptoms of bad interface design and programming decisions. They deteriorate the perceived usability of mobile user interfaces (MUIs) and negatively impact the User eXperience (UX) with the mobile app. Most existing studies have relied on a subjective evaluation of aesthetic defects based on end-users' feedback, which makes the manual evaluation of mobile user interfaces human-centric, time-consuming, and error-prone. Therefore, recent studies have focused on defining mathematical formulas, each targeting a specific structural quality of the interface. As the UX is tightly dependent on the user profile, the combination and calibration of quality attributes, formulas, and user characteristics when defining a defect is not straightforward. In this context, we propose a fully automated framework that combines quality attributes from the literature with the user's profile to identify aesthetic defects of MUIs. More precisely, we consider mobile user interface evaluation as a multi-objective optimization problem in which the goal is to maximize the number of detected violations while minimizing the complexity of the detection rules and enhancing the interface's overall quality.
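    A minimal sketch of the multi-objective formulation described above, assuming each candidate set of detection rules is scored on three objectives: detected violations (maximize), rule complexity (minimize), and resulting interface quality (maximize). The Pareto-dominance filter shown is the standard building block of NSGA-II-style search; the objective tuples are illustrative, not the paper's actual fitness functions.

```python
def dominates(a, b):
    """a, b: (violations, complexity, quality). a dominates b if it is
    no worse on every objective and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Keep only the non-dominated candidate rule sets."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# Example: (violations detected, rule complexity, interface quality score)
rules = [(8, 3, 0.7), (8, 5, 0.7), (5, 2, 0.9)]
print(pareto_front(rules))  # (8, 5, 0.7) is dominated by (8, 3, 0.7)
```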

    FairFuzz: Targeting Rare Branches to Rapidly Increase Greybox Fuzz Testing Coverage

    In recent years, fuzz testing has proven to be one of the most effective techniques for finding correctness bugs and security vulnerabilities in practice. One particular fuzz testing tool, American Fuzzy Lop (AFL), has become popular thanks to its ease of use and bug-finding power. However, AFL remains limited in the depth of program coverage it achieves, in particular because it does not consider which parts of program inputs should not be mutated in order to maintain deep program coverage. We propose an approach, FairFuzz, that helps alleviate this limitation in two key steps. First, FairFuzz automatically prioritizes inputs exercising rare parts of the program under test. Second, it automatically adjusts the mutation of inputs so that the mutated inputs are more likely to exercise these same rare parts of the program. We conduct an evaluation on real-world programs against state-of-the-art versions of AFL, repeating experiments to obtain reliable measures of variability. We find that on certain benchmarks FairFuzz shows significant coverage increases after 24 hours compared to state-of-the-art versions of AFL, while on others it achieves high program coverage at a significantly faster rate.
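    The following simplified sketch captures FairFuzz's two key steps in Python: prioritizing seeds by the rarity of the branches they hit, and masking mutation away from byte positions needed to keep hitting a rare branch. The run_and_get_branches harness is hypothetical, and the mask computation is a simplification of FairFuzz's deterministic stages.

```python
import random
from collections import Counter

branch_hits = Counter()  # branch id -> number of inputs that have hit it

def rarest_branch(branches):
    """Among the branches a seed covers, pick the least-hit one to target."""
    return min(branches, key=lambda b: branch_hits[b])

def compute_mask(seed, target_branch, run_and_get_branches):
    """Byte positions that may be mutated while still hitting target_branch:
    flip each byte once and check whether the branch is kept (a simplified
    version of FairFuzz's mask computation)."""
    mask = []
    for i in range(len(seed)):
        mutated = seed[:i] + bytes([seed[i] ^ 0xFF]) + seed[i + 1:]
        if target_branch in run_and_get_branches(mutated):
            mask.append(i)  # this byte is free to mutate
    return mask

def mutate(seed, mask):
    """Randomly mutate a few positions allowed by the mask."""
    out = bytearray(seed)
    if mask:
        for i in random.sample(mask, max(1, len(mask) // 8)):
            out[i] = random.randrange(256)
    return bytes(out)

# Toy harness: the 'rare' branch is hit only while the input keeps its header.
def toy_run(data):
    return {"rare"} if data.startswith(b"FUZZ") else {"common"}

branch_hits.update(["common", "common", "rare"])
print(rarest_branch({"common", "rare"}))                  # -> 'rare'
mask = compute_mask(b"FUZZpayload", "rare", toy_run)
print(mask)                                               # header bytes 0-3 are protected
print(mutate(b"FUZZpayload", mask).startswith(b"FUZZ"))   # -> True
```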