
    A random testing approach using pushdown automata

    Since finite automata are in general strong abstractions of systems, many test cases that are automaton traces generated uniformly at random may not be concretizable. This paper proposes a method extending this testing approach to pushdown systems, which provide finer abstractions. Combinatorial techniques guarantee the uniformity of the generated traces. In addition, to improve the quality of the test suites, the combination of coverage criteria with random testing is investigated. The method is illustrated in both structural and model-based testing contexts.
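
    The uniform sampling at the core of this approach can be illustrated on the finite-automaton case that the paper generalizes: count the accepted traces of each length by dynamic programming, then draw transitions with probability proportional to those counts. The Python sketch below uses an invented toy automaton; the representation and all names are our own assumptions, not the paper's.

```python
# Minimal sketch of uniform random generation of accepted traces of
# length n from a finite automaton, via recursive counting. Toy only.
import random

def count_traces(transitions, states, accepting, n):
    """count[k][q] = number of accepted traces of length k from state q."""
    count = [{q: 0 for q in states} for _ in range(n + 1)]
    for q in states:
        count[0][q] = 1 if q in accepting else 0
    for k in range(1, n + 1):
        for q in states:
            count[k][q] = sum(count[k - 1][dst] for _, dst in transitions.get(q, []))
    return count

def uniform_trace(transitions, initial, count, n):
    """Draw one accepted trace of length n uniformly at random."""
    trace, q = [], initial
    for k in range(n, 0, -1):
        succs = transitions.get(q, [])
        # Weight each transition by the number of completions it allows.
        weights = [count[k - 1][dst] for _, dst in succs]
        label, q = random.choices(succs, weights=weights)[0]
        trace.append(label)
    return trace

# Toy automaton: state -> [(label, successor), ...]; state 2 is accepting.
transitions = {0: [("a", 1), ("b", 2)], 1: [("c", 2)], 2: [("a", 0)]}
count = count_traces(transitions, states=[0, 1, 2], accepting={2}, n=5)
print(uniform_trace(transitions, 0, count, 5))  # e.g. ['a', 'c', 'a', 'a', 'c']
```

    For pushdown systems, the counting must range over a context-free structure rather than a finite state set, which is where the paper's combinatorial techniques come in.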

    Test generation from P systems using model checking

    This paper presents some testing approaches based on model checking and using different testing criteria. First, test sets are built from different Kripke structure representations. Second, various rule coverage criteria for transitional, non-deterministic, cell-like P systems are considered in order to generate adequate test sets. Rule-based coverage criteria (simple rule coverage, context-dependent rule coverage, and variants) are defined and, for each criterion, a set of LTL (Linear Temporal Logic) formulas is provided. A codification of a P system as a Kripke structure and the sets of LTL properties are used in test generation: for each criterion, test cases are obtained from the counterexamples of the associated LTL formulas, which are automatically generated from the Kripke-structure codification of the P system. The method is illustrated with an implementation using a specific model checker, NuSMV.
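
    As a rough illustration of this pipeline, the sketch below generates a NuSMV model of an invented one-membrane P system with two rules, plus one "trap" LTL property per rule for simple rule coverage: each counterexample NuSMV reports is an execution that applies the corresponding rule, i.e., a test case. The encoding is a hypothetical toy, not the paper's codification.

```python
# Hypothetical sketch: emit a NuSMV model of a toy P system and one trap
# LTL property per rule; counterexamples serve as rule-covering tests.
rules = ["r1", "r2"]

smv = """MODULE main
VAR
  a : 0..8;                  -- multiplicity of object 'a' in the membrane
  applied : {none, r1, r2};  -- which rule the last step applied
ASSIGN
  init(a) := 3;
  init(applied) := none;
  next(applied) := case
    a >= 2 : {r1, r2};       -- both rules enabled: non-deterministic choice
    a >= 1 : r2;
    TRUE   : none;
  esac;
  next(a) := case
    next(applied) = r1 : a - 2;  -- r1 consumes two copies of 'a'
    next(applied) = r2 : a - 1;  -- r2 consumes one copy of 'a'
    TRUE               : a;
  esac;
"""
# One trap property per rule; a counterexample is a run applying that rule.
smv += "\n".join(f"LTLSPEC G (applied != {r})" for r in rules) + "\n"

with open("psystem.smv", "w") as f:
    f.write(smv)
# Run: NuSMV psystem.smv; each reported counterexample covers one rule.
```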

    Software fault detection via grammar-based test case generation

    Fault detection helps cut down failure causes by systematically locating and eliminating defects. In this thesis, we present a novel fault detection technique for structured input data that can be represented by a grammar. We take a set of well-distributed test cases as input, each of which has a set of test requirements. We show that test requirements derived from structured data can be effectively used as coverage criteria to reduce test suites. We then propose an automatic fault detection approach that locates the software bugs revealed by failed test cases. This method can be applied to testing data-input-critical software such as compilers, translators, and reactive systems. A preliminary experimental study shows that our fault detection approach can precisely locate faults in the software under test from failed test cases.
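
    A minimal sketch of the grammar-based generation step, under our own assumptions (a toy expression grammar and depth-bounded random expansion); the thesis's distribution of test cases over test requirements is not modeled here.

```python
# Sketch: generate structured test inputs by randomly expanding a
# context-free grammar, bounded by a depth limit. Toy grammar only.
import random

GRAMMAR = {
    "<expr>":   [["<term>", "+", "<expr>"], ["<term>"]],
    "<term>":   [["<factor>", "*", "<term>"], ["<factor>"]],
    "<factor>": [["(", "<expr>", ")"], ["<digit>"]],
    "<digit>":  [[d] for d in "0123456789"],
}

def derive(symbol, depth=0, max_depth=8):
    if symbol not in GRAMMAR:
        return symbol  # terminal symbol: emit as-is
    alts = GRAMMAR[symbol]
    # Past the depth limit, take the shortest alternative to force termination.
    alt = min(alts, key=len) if depth >= max_depth else random.choice(alts)
    return "".join(derive(s, depth + 1, max_depth) for s in alt)

print([derive("<expr>") for _ in range(3)])  # e.g. ['3+(4*2)', '7', '1*5']
```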

    Coverage directed algorithms for test suite construction from LR-automata

    Thesis (MSc), Stellenbosch University, 2022. Bugs in software can have disastrous results in terms of both economic cost and human lives. Parsers can have bugs, like any other type of software, and must therefore be thoroughly tested to ensure that a parser recognizes its intended language accurately. However, parsers often need to recognize many different variations and combinations of grammar structures, which can make it time-consuming and difficult to construct test suites by hand. We therefore require automated methods of test suite construction for these systems. Currently, the majority of test suite construction algorithms focus on the grammar describing the language to be recognized by the parser. In this thesis we take a different approach: we consider the LR-automaton that recognizes the target language and use the context information encoded in the automaton. Specifically, we define a new class of algorithms and coverage criteria over a variant of the LR-automaton that we define, called an LR-graph. We define methods of constructing positive test suites, using paths over this LR-graph, as well as mutations on valid paths to construct negative test suites. We evaluate the performance of our new algorithms against other state-of-the-art algorithms by comparing the coverage achieved over various systems: smaller systems used in a university-level compilers course and larger, real-world systems. Our algorithms perform well on these systems when compared to algorithms that produce test suites of equivalent size. Our evaluation has also uncovered a problem in grammar-based testing algorithms that we call bias. Bias can lead to significant variation in the coverage achieved over a system, which can in turn lead to a flawed comparison of two algorithms or unrealized performance when a test suite is used in practice. We therefore define bias and measure it for all grammar-based test suite construction algorithms used in this thesis.
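
    To make the flavor of coverage-directed construction concrete, here is a hedged sketch in Python: it covers every edge of a small directed graph (standing in for the LR-graph) with start-to-accept paths. The greedy shortest-path strategy and all names are our own illustration, not the thesis's exact algorithms.

```python
# Sketch: build a positive test suite of paths that together cover every
# edge of a toy graph, connecting each edge to start and accept via BFS.
from collections import deque

def shortest_path(edges, src, dst):
    """BFS over edges = {state: [successor, ...]}; node list or None."""
    prev, queue = {src: None}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in edges.get(u, []):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None

def edge_covering_paths(edges, start, accept):
    uncovered = {(u, v) for u, vs in edges.items() for v in vs}
    suite = []
    while uncovered:
        u, v = next(iter(uncovered))
        head = shortest_path(edges, start, u)
        tail = shortest_path(edges, v, accept)
        if head is None or tail is None:   # edge on no start-accept path
            uncovered.discard((u, v))
            continue
        path = head + tail                 # head ends at u, tail starts at v
        suite.append(path)
        uncovered -= set(zip(path, path[1:]))  # mark all traversed edges
    return suite

edges = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(edge_covering_paths(edges, start=0, accept=3))  # e.g. [[0, 1, 3], [0, 2, 3]]
```

    Negative test suites would then mutate such valid paths, e.g. by detouring through an edge the automaton forbids.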

    Evidence-driven testing and debugging of software systems

    Program debugging is the process of testing, exposing, reproducing, diagnosing, and fixing software bugs. Many techniques have been proposed to aid developers during software testing and debugging, yet researchers have found that developers hardly use or adopt them in practice. Evidently, there is a gap between the proposed methods and the state of software practice: most methods fail to address the actual needs of software developers. In this dissertation, we pose the following scientific question: how can we bridge the gap between software practice and state-of-the-art automated testing and debugging techniques? To address this challenge, we put forward the following thesis: software testing and debugging should be driven by empirical evidence collected from software practice. In particular, we posit that feedback from software practice should shape and guide (the automation of) testing and debugging activities. We focus on gathering evidence from software practice by conducting several empirical studies on real-world software testing and debugging activities, and we then build tools and methods that are well-grounded in and driven by the empirical evidence obtained from these experiments. Firstly, we conduct an empirical study on the state of debugging in practice using a survey and a human study, asking developers about their debugging needs and observing the tools and strategies they employ while testing, diagnosing, and repairing real bugs. Secondly, we evaluate the effectiveness of state-of-the-art automated fault localization (AFL) methods on real bugs and programs. Thirdly, we conduct an experiment to evaluate the causes of invalid inputs in software practice. Lastly, we study how to learn input distributions from real-world sample inputs using probabilistic grammars. To bridge the gap between software practice and the state of the art in software testing and debugging, we offer the following empirical results and techniques: (1) We collect evidence on the state of practice in program debugging and indeed find that there is a chasm between (available) debugging tools and developer needs. We elicit the actual needs and concerns of developers when testing and diagnosing real faults and provide a benchmark (called DBGBench) to aid the automated evaluation of debugging and repair tools. (2) We provide empirical evidence on the effectiveness of several state-of-the-art AFL techniques (such as statistical debugging formulas and dynamic slicing) and, building on this evidence, provide a hybrid approach that outperforms them. (3) We evaluate the prevalence and causes of invalid inputs in software practice, and we build on the lessons learned from this experiment to develop a general-purpose algorithm (called ddmax) that automatically diagnoses and repairs real-world invalid inputs. (4) We provide a method to learn the distribution of input elements in software practice using probabilistic grammars, and we employ the learned distribution to drive the generation of test inputs that are similar (or dissimilar) to sample inputs found in the wild. In summary, we propose an evidence-driven approach to software testing and debugging, based on collecting empirical evidence from software practice to guide and direct testing and debugging activities. In our evaluation, this approach proved effective in improving several debugging activities in practice: we elicit the actual debugging needs of developers, improve the effectiveness of several automated fault localization techniques, effectively debug and repair invalid inputs, and generate test inputs that are (dis)similar to real-world inputs. Our proposed methods are built on empirical evidence and improve over the state-of-the-art techniques in testing and debugging.
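
    As one concrete illustration of contribution (4), the sketch below learns a probabilistic grammar by counting how often each expansion alternative occurs in the parse trees of sample inputs; dividing by the per-nonterminal totals yields the learned distribution. The tree representation and all names are our own assumptions, not the dissertation's tooling.

```python
# Sketch: estimate production probabilities from sample parse trees,
# given as (nonterminal, children) tuples; strings are terminals.
from collections import Counter

def count_expansions(tree, counts):
    symbol, children = tree
    # Key: (nonterminal, tuple of child symbols) identifies one alternative.
    key = (symbol, tuple(c[0] if isinstance(c, tuple) else c for c in children))
    counts[key] += 1
    for c in children:
        if isinstance(c, tuple):
            count_expansions(c, counts)

def learn_probabilities(trees):
    counts = Counter()
    for t in trees:
        count_expansions(t, counts)
    totals = Counter()
    for (sym, _), n in counts.items():
        totals[sym] += n
    # Normalize per nonterminal: relative frequency of each alternative.
    return {key: n / totals[key[0]] for key, n in counts.items()}

# Two toy parse trees, for the samples "ab" and "b".
samples = [("<S>", [("<A>", ["a"]), "b"]), ("<S>", ["b"])]
print(learn_probabilities(samples))
# {('<S>', ('<A>', 'b')): 0.5, ('<A>', ('a',)): 1.0, ('<S>', ('b',)): 0.5}
```

    Sampling derivations according to these probabilities then yields inputs similar to the samples; inverting them favors dissimilar inputs.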

    Search based algorithms for test sequence generation in functional testing

    Published in Information and Software Technology (DOI: 10.1016/j.infsof.2014.07.014). Context: the generation of dynamic test sequences from a formal specification complements traditional testing methods in order to find errors in the source code. Objective: in this paper we extend one specific combinatorial test approach, the Classification Tree Method (CTM), with transition information to generate test sequences. Although we use CTM, this extension is also possible for any combinatorial testing method. Method: the generation of minimal test sequences that fulfill the demanded coverage criteria is an NP-hard problem, so search-based approaches are required to find (near) optimal test sequences. Results: the experimental analysis compares the search-based techniques with a greedy algorithm on a set of 12 hierarchical concurrent models of programs extracted from the literature. Our proposed search-based approaches (GTSG and ACOts) are able to generate test sequences by finding the shortest valid path that achieves full class (state) and transition coverage. Conclusion: the extended classification tree is useful for generating test sequences, and the experimental analysis reveals that our search-based approaches outperform the greedy deterministic approach, especially on the most complex instances. All presented algorithms are integrated into a professional tool for functional testing. Funding: Spanish Ministry of Economy and Competitiveness and FEDER under contract TIN2011-28194 and fellowship BES-2012-055967; project 8.06/5.47.4142 in collaboration with the VSB-Tech. Univ. of Ostrava, Universidad de Málaga, and Andalucía Tech; and EU Grant ICT-257574 (FITTEST project).
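
    The search-based idea can be illustrated with a deliberately simple random search over a toy state machine: sample candidate sequences, score them by transition coverage and brevity, and keep the best. This is our own stand-in; the paper's GTSG and ACOts algorithms are considerably more elaborate.

```python
# Hedged sketch: random search for a short test sequence achieving full
# transition coverage over an invented toy state machine.
import random

MODEL = {"idle": ["run"], "run": ["idle", "done"], "done": []}
ALL_TRANSITIONS = {(s, t) for s, succs in MODEL.items() for t in succs}

def random_walk(start="idle", max_len=12):
    """One candidate test sequence: a random walk to a sink or the bound."""
    seq = [start]
    while MODEL[seq[-1]] and len(seq) < max_len:
        seq.append(random.choice(MODEL[seq[-1]]))
    return seq

def fitness(seq):
    """Primary objective: transitions covered; secondary: shorter is better."""
    covered = set(zip(seq, seq[1:])) & ALL_TRANSITIONS
    return (len(covered), -len(seq))

def search(iterations=500):
    best = random_walk()
    for _ in range(iterations):
        candidate = random_walk()
        if fitness(candidate) > fitness(best):
            best = candidate
    return best

print(search())  # e.g. ['idle', 'run', 'idle', 'run', 'done']
```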

    Grammar-based fuzzing using input features

    In grammar-based fuzz testing, a formal grammar is used to produce test inputs that are syntactically valid in order to reach the business logic of a program under test. In this setting, it is advantageous to ensure a high diversity of inputs to test more of the program's behavior. How can we characterize the features that make inputs diverse and associate them with the execution of particular parts of the program? Previous work does not answer this question satisfactorily, with most attempts mainly considering superficial features defined by the structure of the grammar, such as the presence of production rules or terminal symbols, regardless of their context. We present a measure of input coverage called k-path coverage, which takes into account combinations of grammar entities up to a given context depth k and makes it possible to efficiently express, assess, and achieve input diversity. In a series of experiments, we demonstrate and evaluate how to systematically attain k-path coverage, and how it correlates with code coverage and can thus be used as its predictor. By automatically inferring explicit associations between k-path features and the coverage of individual methods, we further show how to generate inputs that specifically target the execution of given code locations. We expect the presented instrument of k-paths to prove useful in numerous additional applications, such as assessing the quality of grammars, serving as an adequacy criterion for input test suites, enabling test case prioritization, facilitating program comprehension, and perhaps beyond.
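
    A small sketch of the coverage computation as we read it: a k-path is a sequence of k grammar entities nested along one branch of a derivation tree, and coverage compares the k-paths exercised by a test suite with those the grammar admits. The tree encoding and the restriction to nonterminals are our own simplifying assumptions.

```python
# Sketch: extract all downward paths of k nonterminals from a derivation
# tree given as (symbol, children) tuples; strings are terminals.
def k_paths(tree, k):
    paths = set()
    def walk(node, prefix):
        sym, kids = node
        prefix = (prefix + (sym,))[-k:]  # sliding window of last k symbols
        if len(prefix) == k:
            paths.add(prefix)
        for c in kids:
            if isinstance(c, tuple):     # skip terminal leaves
                walk(c, prefix)
    walk(tree, ())
    return paths

# Derivation tree of "1+2" under a toy expression grammar.
tree = ("<expr>", [("<term>", [("<digit>", ["1"])]), "+",
                   ("<expr>", [("<term>", [("<digit>", ["2"])])])])
print(k_paths(tree, 2))
# {('<expr>', '<term>'), ('<term>', '<digit>'), ('<expr>', '<expr>')}
```

    Dividing the number of distinct k-paths exercised by a test suite by the number of k-paths the grammar allows then yields the coverage score.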