
    Evidence-driven testing and debugging of software systems

    Program debugging is the process of testing, exposing, reproducing, diagnosing and fixing software bugs. Many techniques have been proposed to aid developers during software testing and debugging, yet researchers have found that developers hardly use or adopt them in practice. Evidently, there is a gap between the proposed methods and the state of software practice: most methods fail to address the actual needs of software developers. In this dissertation, we pose the following scientific question: how can we bridge the gap between software practice and state-of-the-art automated testing and debugging techniques? To address this challenge, we put forward the following thesis: software testing and debugging should be driven by empirical evidence collected from software practice. In particular, we posit that feedback from software practice should shape and guide the automation of testing and debugging activities.
    In this thesis, we focus on gathering evidence from software practice by conducting several empirical studies on real-world software testing and debugging activities. We then build tools and methods that are well-grounded in, and driven by, the empirical evidence obtained from these experiments. First, we conduct an empirical study on the state of debugging in practice, using a survey and a human study: we ask developers about their debugging needs and observe the tools and strategies they employ while testing, diagnosing and repairing real bugs. Second, we evaluate the effectiveness of state-of-the-art automated fault localization (AFL) methods on real bugs and programs. Third, we conduct an experiment to determine the causes of invalid inputs in software practice. Lastly, we study how to learn input distributions from real-world sample inputs using probabilistic grammars.
    To bridge the gap between software practice and the state of the art in software testing and debugging, we proffer the following empirical results and techniques: (1) We collect evidence on the state of practice in program debugging and indeed find a chasm between (available) debugging tools and developer needs. We elicit the actual needs and concerns of developers when testing and diagnosing real faults, and provide a benchmark (called DBGBench) to aid the automated evaluation of debugging and repair tools. (2) We provide empirical evidence on the effectiveness of several state-of-the-art AFL techniques (such as statistical debugging formulas and dynamic slicing) and, building on this evidence, develop a hybrid approach that outperforms the state of the art. (3) We evaluate the prevalence and causes of invalid inputs in software practice, and build on the lessons learned from this experiment to develop a general-purpose algorithm (called ddmax) that automatically diagnoses and repairs real-world invalid inputs. (4) We provide a method to learn the distribution of input elements in software practice using probabilistic grammars, and employ the learned distribution to drive the generation of test inputs that are similar (or dissimilar) to sample inputs found in the wild.
    In summary, we propose an evidence-driven approach to software testing and debugging, based on collecting empirical evidence from software practice to guide and direct testing and debugging activities. In our evaluation, this approach proved effective at improving several debugging activities in practice: we elicit the actual debugging needs of developers, improve the effectiveness of several automated fault localization techniques, effectively debug and repair invalid inputs, and generate test inputs that are (dis)similar to real-world inputs. Our proposed methods are built on empirical evidence, and they improve over the state-of-the-art techniques in testing and debugging.
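    To make the fault-localization thread concrete: statistical debugging scores each program element by how strongly its execution correlates with failing runs. The sketch below computes the standard Ochiai and Tarantula formulas over hypothetical coverage counts; it is a textbook illustration under assumed data, not the dissertation's implementation (whose hybrid approach additionally folds in dynamic slicing).

```python
import math

def ochiai(ef, ep, nf):
    """Ochiai suspiciousness. ef/ep: failing/passing runs covering the
    element; nf: failing runs that do not cover it (ef + nf = all failures)."""
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom else 0.0

def tarantula(ef, ep, total_f, total_p):
    """Tarantula suspiciousness: the element's share of failing coverage."""
    fail = ef / total_f if total_f else 0.0
    succ = ep / total_p if total_p else 0.0
    return fail / (fail + succ) if (fail + succ) else 0.0

# Hypothetical coverage over 10 failing and 90 passing runs: element -> (ef, ep)
coverage = {"line 42": (9, 5), "line 43": (2, 60), "line 44": (10, 90)}
ranking = sorted(coverage,
                 key=lambda e: ochiai(*coverage[e], nf=10 - coverage[e][0]),
                 reverse=True)
print(ranking)  # most suspicious first: ['line 42', 'line 44', 'line 43']
```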

    Scenic: A Language for Scenario Specification and Scene Generation

    We propose a new probabilistic programming language for the design and analysis of perception systems, especially those based on machine learning. Specifically, we consider the problems of training a perception system to handle rare events, testing its performance under different conditions, and debugging failures. We show how a probabilistic programming language can help address these problems by specifying distributions encoding interesting types of inputs and sampling these to generate specialized training and test sets. More generally, such languages can be used for cyber-physical systems and robotics to write environment models, an essential prerequisite to any formal analysis. In this paper, we focus on systems like autonomous cars and robots, whose environment is a "scene", a configuration of physical objects and agents. We design a domain-specific language, Scenic, for describing "scenarios" that are distributions over scenes. As a probabilistic programming language, Scenic allows assigning distributions to features of the scene, as well as declaratively imposing hard and soft constraints over the scene. We develop specialized techniques for sampling from the resulting distribution, taking advantage of the structure provided by Scenic's domain-specific syntax. Finally, we apply Scenic in a case study on a convolutional neural network designed to detect cars in road images, improving its performance beyond that achieved by state-of-the-art synthetic data generation methods.
    Comment: 41 pages, 36 figures. Full version of a PLDI 2019 paper (extending UC Berkeley EECS Department Tech Report No. UCB/EECS-2018-8).
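    Scenic's concrete syntax is not reproduced here; the sketch below only illustrates, in plain Python over an invented feature set, the core idea the abstract describes: assign distributions to scene features, declare a hard constraint, and sample by rejection. Scenic itself uses specialized sampling that exploits its domain-specific syntax; naive rejection sampling is merely the baseline.

```python
import random

def sample_scene():
    """One candidate scene: each feature is drawn from a distribution
    (hypothetical features, standing in for a scenario specification)."""
    return {
        "ego_speed": random.uniform(0.0, 30.0),   # m/s
        "gap_ahead": random.uniform(5.0, 100.0),  # metres to the car ahead
        "weather":   random.choice(["sun", "rain", "fog"]),
    }

def satisfies(scene):
    """Hard constraint: only keep close-following scenes in poor visibility."""
    return scene["gap_ahead"] < 20.0 and scene["weather"] != "sun"

def sample_scenes(n):
    """Rejection sampling: draw until n scenes satisfy the constraints."""
    scenes = []
    while len(scenes) < n:
        scene = sample_scene()
        if satisfies(scene):
            scenes.append(scene)
    return scenes

print(sample_scenes(3))  # specialized training/test inputs for a perception system
```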

    Overcoming Language Dichotomies: Toward Effective Program Comprehension for Mobile App Development

    Mobile devices and platforms have become an established target for modern software developers due to performant hardware and a large and growing user base numbering in the billions. Despite their popularity, the software development process for mobile apps comes with a set of unique, domain-specific challenges rooted in program comprehension. Many of these challenges stem from developer difficulties in reasoning about different representations of a program, a phenomenon we define as a "language dichotomy". In this paper, we reflect upon the various language dichotomies that contribute to open problems in program comprehension and development for mobile apps. Furthermore, to help guide the research community towards effective solutions for these problems, we provide a roadmap of directions for future work.
    Comment: Invited Keynote Paper for the 26th IEEE/ACM International Conference on Program Comprehension (ICPC'18).

    Applying Formal Methods to Networking: Theory, Techniques and Applications

    Despite its great importance, modern network infrastructure is remarkable for the lack of rigor in its engineering. The Internet, which began as a research experiment, was never designed to handle the users and applications it hosts today. The lack of formalization of the Internet architecture meant limited abstractions and modularity, especially for the control and management planes, requiring a new protocol to be built from scratch for every new need. This led to an unwieldy, ossified Internet architecture resistant to any attempts at formal verification, and an Internet culture where expediency and pragmatism are favored over formal correctness. Fortunately, recent work in the space of clean-slate Internet design, especially the software-defined networking (SDN) paradigm, offers the Internet community another chance to develop the right kind of architecture and abstractions. This has also led to a great resurgence of interest in applying formal methods to the specification, verification, and synthesis of networking protocols and applications. In this paper, we present a self-contained tutorial of the formidable amount of work that has been done in formal methods, and present a survey of its applications to networking.
    Comment: 30 pages, submitted to IEEE Communications Surveys and Tutorials.
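    As a small taste of the surveyed techniques, the sketch below performs explicit-state model checking via breadth-first reachability over an invented toy handshake protocol; the model and property are hypothetical and not taken from the paper.

```python
from collections import deque

INITIAL = ("idle", "idle")  # (sender, receiver) control locations

def successors(state):
    """Transitions of a naive, purely illustrative handshake model."""
    s, r = state
    if s == "idle":
        yield ("syn_sent", r)                 # sender transmits SYN
    if s == "syn_sent" and r == "idle":
        yield (s, "syn_received")             # receiver accepts SYN
    if s == "syn_sent" and r == "syn_received":
        yield ("established", "established")  # handshake completes
    if r == "syn_received":
        yield (s, "idle")                     # receiver times out

def find_bad_state(bad):
    """Breadth-first search of the full state space for a property violation."""
    seen, queue = {INITIAL}, deque([INITIAL])
    while queue:
        state = queue.popleft()
        if bad(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None  # property holds in every reachable state

# Safety property: the sender must never be established while the receiver is idle.
print(find_bad_state(lambda st: st == ("established", "idle")))  # None
```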

    Help, help, I'm being suppressed! The significance of suppressors in software testing

    Test features are basic compositional units used to describe what a test does (and does not) involve. For example, in API-based testing, the most obvious features are function calls; in grammar-based testing, the obvious features are the elements of the grammar. The relationship between features as abstractions of tests and the behaviors produced by the tested program is surprisingly poorly understood. This paper shows how large-scale random testing, modified to use diverse feature sets, can uncover causal relationships between what a test contains and what the program being tested does. We introduce a general notion of observable behaviors as targets, where a target can be a detected fault, an executed branch or statement, or a complex coverage entity such as a state, predicate valuation, or program path. While it is obvious that targets have triggers - features without which they cannot be hit by a test - the notion of suppressors - features which make a test less likely to hit a target - has received little attention, despite having important implications for automated test generation and program understanding. For a set of subjects including C compilers, a flash file system, and JavaScript engines, we show that suppression is both common and important.
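    A minimal sketch of how suppression could be quantified from such test data follows; the feature and target names are hypothetical, and the paper's analysis is statistically far more careful than this simple ratio.

```python
def hit_rate(tests, target, predicate):
    """Fraction of tests whose feature set matches `predicate` that hit `target`."""
    selected = [hits for feats, hits in tests if predicate(feats)]
    return sum(target in hits for hits in selected) / len(selected) if selected else 0.0

def suppression_ratio(tests, feature, target):
    """Ratio well below 1.0: `feature` suppresses `target`;
    well above 1.0: `feature` behaves like a (partial) trigger."""
    with_f = hit_rate(tests, target, lambda fs: feature in fs)
    without_f = hit_rate(tests, target, lambda fs: feature not in fs)
    return with_f / without_f if without_f else float("inf")

# Hypothetical random-testing log: (API-call features used, coverage targets hit)
tests = [
    ({"open", "close"},         {"branch_17"}),
    ({"open", "unmount"},       set()),
    ({"open", "close", "read"}, {"branch_17"}),
    ({"unmount", "read"},       set()),
]
print(suppression_ratio(tests, "unmount", "branch_17"))  # 0.0 -> suppressor
```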

    Dynamically fighting bugs: prevention, detection and elimination

    Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 147-160). By Shay Artzi.
    This dissertation presents three test-generation techniques that are used to improve software quality. Each of our techniques targets bugs that are found by different stakeholders: developers, testers, and maintainers. We implemented and evaluated our techniques on real code; we present the design of each tool and experimentally compare it with available alternatives.
    Developers need to prevent regression errors when they create new functionality. This dissertation presents a technique that helps developers prevent regression errors in object-oriented programs by automatically generating unit-level regression tests. Our technique generates regression tests using models created dynamically from example executions. In our evaluation, the technique created effective regression tests and achieved good coverage, even for programs with constrained APIs.
    Testers need to detect bugs in programs. This dissertation presents a technique that helps testers detect and localize bugs in web applications. Our technique automatically creates tests that expose failures by combining dynamic test generation with explicit-state model checking. In our evaluation, the technique discovered hundreds of faults in real applications.
    Maintainers have to reproduce failing executions in order to eliminate bugs found in deployed programs. This dissertation presents a technique that helps maintainers eliminate bugs by generating tests that reproduce failing executions. Our technique automatically generates such tests by monitoring methods and storing optimized states of method arguments. In our evaluation, the technique reproduced failures with low overhead in real programs.
    Analyses need to avoid unnecessary computations in order to scale. This dissertation presents a technique that helps our other techniques scale by inferring the mutability classification of method arguments. Our technique classifies mutability by combining static analyses with a novel dynamic mutability analysis. In our evaluation, it efficiently and correctly classified most arguments for programs with more than a hundred thousand lines of code.
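    To illustrate the failure-reproduction idea in miniature: the sketch below monitors a method, snapshots the arguments of calls that raise exceptions, and replays those calls later as generated tests. It is a hypothetical Python analogue of the approach; the dissertation's technique additionally optimizes the stored argument states rather than deep-copying everything.

```python
import copy
import functools

CAPTURE_LOG = []  # snapshots of failing calls, to be replayed as tests

def monitor(fn):
    """Record deep-copied arguments of every call that ends in an exception."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        snapshot = (fn, copy.deepcopy(args), copy.deepcopy(kwargs))
        try:
            return fn(*args, **kwargs)
        except Exception:
            CAPTURE_LOG.append(snapshot)  # keep the state that led to the failure
            raise
    return wrapper

def replay_failures():
    """Re-run each captured failing call -- the generated reproduction test."""
    for fn, args, kwargs in CAPTURE_LOG:
        try:
            fn(*args, **kwargs)
        except Exception as exc:
            print(f"reproduced: {fn.__name__}{args} -> {type(exc).__name__}")

@monitor
def parse_header(data):
    return data.split(":", 1)[1]  # fails when the separator is missing

try:
    parse_header("no-separator")
except IndexError:
    pass
replay_failures()  # reproduced: parse_header('no-separator',) -> IndexError
```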

    Models and metaphors: complexity theory and through-life management in the built environment

    Complexity thinking may have both modelling and metaphorical applications in the through-life management of the built environment. These two distinct approaches are examined and compared. In the first instance, some of the sources of complexity in the design, construction and maintenance of the built environment are identified. The metaphorical use of complexity in management thinking and its application in the built environment are briefly examined. This is followed by an exploration of modelling techniques relevant to built environment concerns: non-linear and complex mathematical techniques such as fuzzy logic, cellular automata and attractors may be applicable to their analysis. Existing software tools are identified and examples of successful built environment applications of complexity modelling are given. Some issues that arise include the definition of phenomena in a mathematically usable way, the functionality of available software and the possibility of going beyond representational modelling. Further questions arising from the application of complexity thinking are discussed, including the possibilities for confusion that arise from the use of metaphor. The metaphor of a 'commentary machine' is suggested as a possible way forward, and it is suggested that an appropriate linguistic analysis can, in certain situations, reduce perceived complexity.
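    Of the modelling techniques named above, cellular automata are the simplest to demonstrate. The generic sketch below (a standard textbook example, not drawn from the paper) shows complex global patterns emerging from a purely local update rule:

```python
def step(cells, rule=30):
    """One update of an elementary cellular automaton: each cell's next state
    depends only on its three-cell neighbourhood, encoded as a bit of `rule`."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

cells = [0] * 31
cells[15] = 1  # a single active cell in the middle
for _ in range(15):
    print("".join(".#"[c] for c in cells))
    cells = step(cells)  # simple rule, intricate emergent pattern
```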