    Studying the fault-detection effectiveness of GUI test cases for rapidly evolving software

    Developing Cost-Effective Model-Based Techniques for GUI Testing

    Most of today's software users interact with software through a graphical user interface (GUI), which constitutes as much as 45-60% of the total code. The correctness of the GUI is necessary to ensure the correctness of the overall software. Although GUIs have become ubiquitous, testing GUIs for functional correctness has remained a neglected research area. Existing GUI testing techniques are extremely resource-intensive, primarily because GUIs have very large input spaces and evolve frequently. This dissertation overcomes the limitations of existing techniques by developing a process, with supporting models, techniques, and tools, for continuous integration testing of evolving GUI-based applications. The key idea of this process is to create three concentric testing loops, each with specific GUI testing goals, resource usage, and targeted feedback. The innermost, fully automatic loop, called crash testing, operates on each code change of the GUI software. The second, semi-automated loop, called smoke testing, operates on each day's GUI build. The outermost loop, called comprehensive GUI testing, is executed after a major version of the GUI is available. The primary enablers of this process, also developed in this dissertation, include an abstract model of the GUI and a set of model-based techniques for test-case generation, test-oracle creation, and continuous GUI testing. The model and techniques were obtained by studying GUI faults, interactions between GUI events, and why certain event interactions lead to faults. The continuous testing process and associated techniques are shown to be useful, via several large experiments involving millions of test cases, on both in-house and open-source GUI applications.
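
    The abstract describes model-based test-case generation from an abstract GUI model. Below is a minimal sketch of the general idea, assuming an event-flow graph (EFG) representation; the graph, event names, and sequence length are illustrative assumptions, not the dissertation's actual model.

```python
# Sketch: enumerating short event sequences from a hypothetical event-flow
# graph (EFG), where an edge (a, b) means event b can immediately follow a.

# All widget/event names here are made-up illustrations.
efg = {
    "File": ["Open", "Save", "Exit"],
    "Open": ["OK", "Cancel"],
    "Save": ["OK", "Cancel"],
    "Edit": ["Copy", "Paste"],
}

def length_n_sequences(efg, n):
    """All event sequences of length n permitted by the EFG."""
    seqs = [[e] for e in efg]                 # every event starts a sequence
    for _ in range(n - 1):
        seqs = [s + [nxt] for s in seqs for nxt in efg.get(s[-1], [])]
    return seqs

# Length-2 sequences exercise every pair of directly interacting events --
# the kind of inexpensive criterion suited to a daily smoke-testing loop.
for seq in length_n_sequences(efg, 2):
    print(" -> ".join(seq))
```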

    Reverse Engineering and Testing of Rich Internet Applications

    The World Wide Web is in continuous and constant evolution: new initiatives, standards, approaches, and technologies are continually proposed for developing more effective and higher-quality Web applications. To satisfy the market's growing demand for Web applications, new technologies, frameworks, tools, and environments that allow Web and mobile applications to be developed with minimal effort and in very little time have been introduced in recent years. These new technologies have enabled a new generation of Web applications, named Rich Internet Applications (RIAs), that offer greater usability and interactivity than traditional ones. This evolution has been accompanied by some drawbacks, mostly due to a failure to apply well-known software engineering practices and approaches. As a consequence, new research questions and challenges have emerged in the field of Web and mobile application maintenance and testing. The research activity described in this thesis addresses some of these topics, with the specific aim of proposing new and effective solutions to the problems of modelling, reverse engineering, comprehending, re-documenting, and testing existing RIAs. Given the growing relevance of mobile applications in the renewed Web scenario, the problem of testing mobile applications developed for the Android operating system is also addressed, in an attempt to explore and propose new test-automation techniques for this type of application.

    Repairing GUI Test Suites Using a Genetic Algorithm

    Cause reduction for quick testing

    In random testing, it is often desirable to produce a "quick test": an extremely inexpensive test suite that can serve as a frequently applied regression suite and allow the benefits of random testing to be obtained even in very slow or oversubscribed test environments. Delta debugging is an algorithm that, given a failing test case, produces a smaller test case that also fails, and typically executes much more quickly. Delta debugging of random tests can produce effective regression suites for previously detected faults, but such suites often have little power for detecting new faults, and in some cases provide poor code coverage. This paper proposes extending delta debugging by simplifying tests with respect to code coverage, an instance of a generalization of delta debugging we call cause reduction. We show that test suites reduced in this fashion can provide very effective quick tests for real-world programs. For Mozilla's SpiderMonkey JavaScript engine, the reduced suite is more effective for finding software faults, even if its reduced runtime is not considered. The effectiveness of a reduction-based quick test persists through major changes to the software under test.
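
    A minimal sketch of cause reduction's core move: reuse a delta-debugging-style loop, but with "preserves the original coverage" as the reduction predicate instead of "still fails". The simplified one-pass loop and the coverage_of helper below are assumptions for illustration, not the paper's exact ddmin-based algorithm.

```python
# Sketch: cause reduction as generalized delta debugging. The predicate
# decides what property the reduced test must preserve.

def reduce_test(test, predicate):
    """Greedily drop steps while `predicate` (e.g. coverage preserved) holds."""
    reduced = list(test)
    i = 0
    while i < len(reduced):
        candidate = reduced[:i] + reduced[i + 1:]
        if predicate(candidate):
            reduced = candidate   # step i was unnecessary; keep the cut
        else:
            i += 1                # step i is needed to preserve the property
    return reduced

def make_coverage_predicate(test, coverage_of):
    """Predicate: the reduced test still covers everything the original did."""
    target = coverage_of(test)
    return lambda t: coverage_of(t) >= target   # set containment

# Toy coverage function for demonstration: each step covers its own "line".
coverage_of = lambda t: {s for s in t if s != "noop"}
test = ["a", "noop", "b", "noop", "c"]
print(reduce_test(test, make_coverage_predicate(test, coverage_of)))
# -> ['a', 'b', 'c']
```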

    On Improving (Non)Functional Testing

    Software testing is commonly classified into two categories: nonfunctional testing and functional testing. The goal of nonfunctional testing is to test nonfunctional requirements, such as performance and reliability. Performance testing is one of the most important types of nonfunctional testing, one goal of which is to detect cases in which an Application Under Test (AUT) exhibits unexpectedly poor performance (e.g., lower throughput) for some input data. During performance testing, a critical challenge is to understand the AUT's behavior under large numbers of combinations of input data and find the particular subset of inputs leading to performance bottlenecks. However, enumerating those particular inputs and identifying those bottlenecks are laborious and intellectually intensive. In addition, for an evolving software system, some code changes may accidentally degrade performance between two software versions; finding the problematic changes (out of a large number of committed changes) that may lead to performance regressions under certain test inputs is even more challenging. This dissertation presents a set of approaches to automatically find specific combinations of input data that expose performance bottlenecks and to analyze execution traces to identify those bottlenecks. In addition, this dissertation provides an approach that automatically estimates the impact of code changes on performance degradation between two released software versions, to identify the problematic ones likely to lead to performance regressions. Functional testing is used to test the functional correctness of AUTs. Developers commonly write test suites for AUTs to test different functionalities and locate functional faults. During functional testing, developers rely on strategies for ordering test cases to achieve certain objectives, such as exposing faults faster; this is known as Test Case Prioritization (TCP). TCP techniques are commonly classified into two categories: dynamic and static. A set of empirical studies has examined different TCP techniques, but there is a clear gap in existing studies: no study has compared static techniques against dynamic techniques and comprehensively examined the impact of test granularity, program size, fault characteristics, and the similarity of detected faults on TCP techniques. Thus, this dissertation presents an empirical study that thoroughly compares static and dynamic TCP techniques in terms of effectiveness, efficiency, and similarity of uncovered faults at different granularities on a large set of real-world programs, and further analyzes the potential impact of program size and fault characteristics on TCP evaluation. Moreover, in prior work, TCP techniques have typically been evaluated against synthetic software defects, called mutants; it is therefore currently unclear whether TCP performance on mutants is representative of performance on real faults. To answer this fundamental question, this dissertation presents the first empirical study that investigates TCP performance when applied to both real-world faults and mutation faults, to understand the representativeness of mutants.
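
    A minimal sketch of one classic dynamic TCP strategy, the "additional" greedy heuristic: repeatedly schedule the test covering the most not-yet-covered units, resetting once no remaining test adds coverage. Test names and coverage sets are illustrative; this shows one technique from the family such studies compare, not the dissertation's own tooling.

```python
# Sketch: "additional" greedy test case prioritization over coverage sets.

def additional_greedy(coverage):
    """coverage: dict mapping test name -> set of covered units."""
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        # Pick the test adding the most new coverage (ties broken by name).
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
        if remaining and all(not (u - covered) for u in remaining.values()):
            covered = set()   # no test adds coverage: reset, per the strategy
    return order

suite = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {5}, "t4": {1, 2}}
print(additional_greedy(suite))   # -> ['t1', 't2', 't3', 't4']
```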

    A Context-Sensitive Coverage Criterion for Test Suite Reduction

    Modern software is increasingly developed using multi-language implementations, large supporting libraries and frameworks, callbacks, virtual function calls, reflection, multithreading, and object- and aspect-oriented programming. The predominant example of such software is the graphical user interface (GUI), which is used as a front-end to most of today's software applications. The characteristics of GUIs and other modern software present new challenges to software testing. Because recently developed techniques for automated test case generation can generate more tests than are practical to regularly execute, one important challenge is test suite reduction. Test suite reduction seeks to decrease the size of a test suite without overly compromising its original fault detection ability. This research advances the state of the art in test suite reduction by empirically studying a coverage criterion that considers the context in which program concepts are covered. Conventional approaches to test suite reduction were developed and evaluated on batch-style applications and, due to the aforementioned considerations, are not always easily applicable to modern software. Furthermore, many existing techniques fail to consider the context in which code executes inside an event-driven paradigm, where programs wait for and interactively respond to user- and system-generated events. Consequently, they yield reduced test suites with severely impaired fault detection ability. The novel feature of this research is a test suite reduction technique based on the call stack coverage criterion, which addresses many of the challenges associated with coverage-based test suite reduction in modern applications. Results show that reducing test suites while maintaining call stack coverage yields good tradeoffs between size reduction and fault detection effectiveness compared to traditional techniques. The output of this research includes models, metrics, algorithms, and techniques based upon this approach.
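
    A minimal sketch of the greedy flavor of coverage-based suite reduction the abstract describes, with call stacks as the covered entities. The stacks and test names are illustrative assumptions; the dissertation's actual algorithms and metrics may differ.

```python
# Sketch: keep the fewest tests whose union still covers every call stack
# observed by the full suite (greedy set-cover heuristic).

def greedy_reduce(stacks_by_test):
    """stacks_by_test: dict mapping test -> set of observed call stacks."""
    required = set().union(*stacks_by_test.values())
    kept, covered = [], set()
    while covered != required:
        # Test contributing the most still-uncovered call stacks.
        best = max(stacks_by_test, key=lambda t: len(stacks_by_test[t] - covered))
        kept.append(best)
        covered |= stacks_by_test[best]
    return kept

# Call stacks as tuples of frames; all names are made-up illustrations.
suite = {
    "t1": {("main", "open"), ("main", "open", "parse")},
    "t2": {("main", "open")},                      # subsumed by t1
    "t3": {("main", "save"), ("main", "save", "write")},
}
print(greedy_reduce(suite))   # -> ['t1', 't3'] (t2 adds no new stacks)
```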

    Empirical Characterization of Software Quality

    This research focuses on the characterization of software quality in terms of the main software elements: people, process, and product. Many attributes (size, language, testing techniques, etc.) could plausibly affect the quality of software. In this thesis we aim to understand, by empirical means, the impact of attributes of the three P's (people, product, process) on software quality. Software quality can be interpreted in many ways, such as customer satisfaction, stability, or defects; in this thesis we adopt defect density as the quality measure. The research therefore focuses on empirical evidence of the impact of attributes of the three P's on software defect density. Empirical research methods (systematic literature reviews, case studies, and interviews) are used to collect this evidence, each helping to extract empirical evidence about the object under study, and statistical methods are used for data analysis. For the product attributes, we studied size, language, development mode, age, complexity, module structure, module dependency, and module quality, and their impact on project quality. For the process attributes, we studied process maturity and structure and their impact on project quality. For the people attributes, we studied experience and capability and their impact on project quality. Moreover, in the process category, we studied one testing approach, exploratory testing, and its impact on software quality. Exploratory testing is a widely used software-testing practice that combines simultaneous learning, test design, and test execution. We analyzed the weaknesses of exploratory testing and proposed a hybrid testing approach in an attempt to improve quality. Concerning the product attributes, we found significant differences in quality between open and closed source projects, Java and C projects, and large and small projects. Very small and defect-free modules have an impact on software quality. Different complexity metrics have different impacts on software quality when size is taken into account. Product complexity, as defined in Table 53, has a partial impact on software quality; however, software age and module dependencies are not factors that characterize software quality. Concerning the people attributes, we found that platform experience, application experience, and language and tool experience have a significant impact on software quality. Regarding capability, we found that programmer capability has a partial impact on software quality, whereas analyst capability has no impact. Concerning the process attributes, we found no difference in quality between projects developed under CMMI and those that are not; regarding CMMI levels, there is a difference in software quality, particularly between CMMI level 1 and CMMI level 3. Comparing process types, we found that hybrid projects are of better quality than waterfall projects. Process maturity as defined by the SEI CMM has a partial impact on software quality. Concerning exploratory testing, we found that its weaknesses induce testing technical debt; a process combining exploratory testing with scripted testing is therefore defined in an attempt to reduce that debt. The findings are useful for both researchers and practitioners in evaluating their projects.
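
    A minimal sketch of the adopted quality measure, defect density, computed here as defects per thousand lines of code (KLOC); the project figures below are made-up illustrations, not data from the study.

```python
# Sketch: defect density as a software-quality measure.

def defect_density(defects, loc):
    """Defects per KLOC: the quality measure adopted in the thesis."""
    return defects / (loc / 1000.0)

# Hypothetical projects: (defect count, lines of code).
projects = {"open-source-app": (120, 85_000), "closed-source-app": (40, 60_000)}
for name, (defects, loc) in projects.items():
    print(f"{name}: {defect_density(defects, loc):.2f} defects/KLOC")
```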

    Quantifying Flakiness and Minimizing Its Effects on Software Testing

    In software testing, test inputs are passed into a system under test (SUT), the SUT is executed, and a test oracle checks the outputs against expected values. There are cases when the same test case is executed multiple times on the same code of the SUT, and it passes during some runs and fails during others. This is the test flakiness problem, and such test cases are called flaky tests. The test flakiness problem makes test results and testing techniques unreliable. Flaky tests may be mistakenly labeled as failed, which increases not only the number of reported bugs testers need to check but also the chance of missing real faults. The test flakiness problem is gaining more attention in modern software testing practice, where complex interactions are involved in test execution, and this raises several new challenges: What metrics should be used to measure the flakiness of a test case? What factors cause or impact flakiness? And how can the effects of flakiness be reduced or minimized? This research develops a systematic approach to quantitatively analyze and minimize the effects of flakiness, and makes three major contributions. First, a novel entropy-based metric is introduced to quantify the flakiness of different layers of test outputs (such as code coverage, invariants, and GUI state). Second, the impact of a common set of factors on test results in system interactive testing is examined. Last, a new flake filter is introduced to minimize the impact of flakiness by filtering out flaky tests (and test assertions) while retaining bug-revealing ones. Two empirical studies on five open-source applications evaluate the new entropy measure, study the causes of flakiness, and evaluate the usefulness of the flake filter. In particular, the first study empirically analyzes the impact of factors including the system platform, Java version, application initial state, and tool-harness configuration. The results show a large impact on SUTs when these factors were uncontrolled, with as many as 184 lines of code coverage differing between runs of the same test cases, and up to 96% false positives with respect to fault detection. The second study evaluates the effectiveness of the flake filter on the SUTs' real faults. The results show that 3.83% of flaky assertions can impact 88.59% of test cases, and that it is possible to automatically obtain a flake filter that, in some cases, completely eliminates flakiness without compromising fault-detection ability.
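
    A minimal sketch of an entropy-style flakiness measure: Shannon entropy over the distinct outputs (e.g., coverage vectors or GUI states) observed across repeated runs of one test. The dissertation's exact metric may differ; this only illustrates the general idea.

```python
import math
from collections import Counter

def flakiness_entropy(outputs):
    """Shannon entropy of repeated-run outputs: 0.0 when every run agrees,
    higher as runs of the same test disagree more."""
    counts = Counter(outputs)
    n = len(outputs)
    # Equivalent to -sum(p * log2(p)); written this way so every term is >= 0.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

stable = ["pass"] * 10
flaky = ["pass"] * 5 + ["fail"] * 5
print(flakiness_entropy(stable))  # 0.0 (perfectly stable)
print(flakiness_entropy(flaky))   # 1.0 (maximal for two equally likely outcomes)
```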