
    A comparison of tree- and line-oriented observational slicing

    Observation-based slicing and its generalization, observational slicing, are recently introduced, language-independent dynamic slicing techniques. They both construct slices based on the dependencies observed during program execution, rather than on static or dynamic dependence analysis. The original implementation of the observation-based slicing algorithm used lines of source code as its program representation. A recent variation, developed to slice modelling languages (such as Simulink), used an XML representation of an executable model. We ported the XML slicer to source code by constructing a tree representation of traditional source code through the use of srcML. This work compares the tree- and line-based slicers using four experiments involving twenty different programs, ranging from classic benchmarks to million-line production systems. The resulting slices are essentially the same size for the majority of the programs and are often identical. However, structural constraints imposed by the tree representation sometimes force the slicer to retain enclosing control structures. It can also “bog down” trying to delete single-token subtrees. This occasionally makes the tree-based slices larger and the tree-based slicer slower than a parallelised version of the line-based slicer. In addition, a Java versus C comparison finds that the two languages lead to similar slices, but Java code takes noticeably longer to slice. The initial experiments suggest two improvements to the tree-based slicer: the addition of a size threshold, for ignoring small subtrees, and subtree replacement. The former enables the slicer to run 3.4 times faster while producing slices that are only about 9% larger, while subtree replacement reduces slice size by about 8–12% and allows the tree-based slicer to produce more natural slices.
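
    The suggested size threshold lends itself to a short sketch. The following Python is a hedged illustration only, assuming an srcML tree loaded with xml.etree.ElementTree; `try_delete` (the compile-and-observe check) and `token_count` are hypothetical stand-ins for the slicer's actual machinery:

```python
def slice_pass(parent, try_delete, min_tokens=2):
    """One deletion pass over an srcML tree: try to delete each child
    subtree; where deletion is rejected, recurse into the child. Subtrees
    below the token threshold are skipped: tiny subtrees cost a full
    compile-and-run each but rarely shrink the slice."""
    for child in list(parent):
        if token_count(child) < min_tokens:
            continue                       # too small to be worth a build
        if not try_delete(parent, child):  # deletion changed behavior,
            slice_pass(child, try_delete, min_tokens)  # so try inside it

def token_count(node):
    # hypothetical helper: whitespace-separated tokens under this node
    return len("".join(node.itertext()).split())
```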

    Evaluating Lexical Approximation of Program Dependence

    Complex dependence analysis typically provides an underpinning approximation of true program dependence. We investigate the effectiveness of using lexical information to approximate such dependence, introducing two new deletion operators to Observation-Based Slicing (ORBS). ORBS provides direct observation of program dependence, computing a slice using iterative, speculative deletion of program parts. Deletions become permanent if they do not affect the slicing criterion. The original ORBS uses a bounded deletion window operator that attempts to delete consecutive lines together. Our new deletion operators attempt to delete multiple, non-contiguous lines that are lexically similar to each other. We evaluate the lexical dependence approximation by exploring the trade-off between the precision and the speed of dependence analysis performed with the new deletion operators. The deletion operators are evaluated independently, as well as collectively via a novel generalization of ORBS that exploits multiple deletion operators: Multi-operator Observation-Based Slicing (MOBS). An empirical evaluation using three Java projects, six C projects, and one multi-lingual project written in Python and C finds that the lexical information provides a useful approximation to the underlying dependence. On average, MOBS can delete 69% of lines deleted by the original ORBS, while taking only 36% of the wall clock time required by ORBS.
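
    The core ORBS loop is simple to sketch. This is a minimal, illustrative rendition rather than the authors' implementation: `observe` stands in for the compile-and-run harness (returning the trajectory at the slicing criterion, or None when the candidate does not build), and `delta` is the bound on the deletion window:

```python
def orbs_slice(lines, observe, delta=3):
    """Iteratively, speculatively delete windows of up to `delta`
    consecutive lines; a deletion becomes permanent only if the value
    observed at the slicing criterion is unchanged."""
    expected = observe(lines)              # trajectory of the criterion
    changed = True
    while changed:                         # repeat until a fixed point
        changed = False
        i = 0
        while i < len(lines):
            for size in range(delta, 0, -1):
                candidate = lines[:i] + lines[i + size:]
                result = observe(candidate)
                if result is not None and result == expected:
                    lines = candidate      # deletion becomes permanent
                    changed = True
                    break                  # retry at the same position
            else:
                i += 1                     # no window worked; move on
    return lines
```

    MOBS, as described, generalizes this loop by drawing candidate deletions from several operators, for example sets of lexically similar, non-contiguous lines, instead of a single window operator.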

    Dynamic Slicing by On-demand Re-execution

    In this paper, we propose a novel approach that offers an alternative to the prevalent paradigm for constructing dynamic slices. Dynamic slicing requires the dynamic data and control dependencies that arise in an execution. In a single execution, memory reference information is recorded and then traversed to extract dependencies. Such execute-once approaches and tools are challenged even by executions of moderate size, even of simple and short programs. We propose to shift practical time complexity from execution size to slice size. In particular, our approach executes the program multiple times, tracking targeted information at each execution. We present a concrete algorithm that follows an on-demand re-execution paradigm and uses a novel concept of frontier dependency to incrementally build a dynamic slice. To focus dependency tracking, the algorithm relies on static analysis. We show results of an evaluation on the SV-COMP benchmark and Antlr4 unit tests that provide evidence that on-demand re-execution can yield performance gains, particularly when the slice is small and the execution is large.
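
    The re-execution idea can be pictured as a worklist over unresolved dependencies. This is a heavily hedged sketch, not the paper's algorithm: `run_tracking` is a hypothetical tracer that re-executes the program once and reports, for each targeted statement, the statements it dynamically depends on, while `static_deps` approximates the statements static analysis deems relevant:

```python
def on_demand_slice(program, criterion, run_tracking, static_deps):
    """Build a dynamic slice by repeated, targeted re-execution: each
    run resolves the current frontier dependencies, whose own unresolved
    dependencies form the next frontier."""
    slice_ = set()
    frontier = {criterion}                      # not yet resolved
    while frontier:
        resolved = run_tracking(program, frontier)  # one re-execution
        frontier = set()
        for stmt, deps in resolved.items():
            slice_.add(stmt)
            for d in deps:
                if d not in slice_ and d in static_deps:
                    frontier.add(d)             # static analysis focuses tracking
    return slice_
```

    The cost profile claimed in the abstract follows from this shape: the number of re-executions tracks the slice size rather than the length of any single execution.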

    Generalized observational slicing for tree-represented modelling languages

    Model-driven software engineering raises the abstraction level, making complex systems easier to understand than if written in textual code. Nevertheless, large complicated software systems can have large models, motivating the need for slicing techniques that reduce the size of a model. We present a generalization of observation-based slicing that allows the criterion to be defined using a variety of kinds of observable behavior and does not require any complex dependence analysis. We apply our implementation of generalized observational slicing for tree-structured representations to Simulink models. The resulting slice might be the subset of the original model responsible for an observed failure or simply the sub-model semantically related to a classic slicing criterion. Unlike its predecessors, the algorithm is also capable of slicing embedded Stateflow state machines. A study of nine real-world models drawn from four different application domains demonstrates the effectiveness of our approach at dramatically reducing Simulink model sizes for realistic observation scenarios: for 9 out of 20 cases, the resulting model has fewer than 25% of the original model's elements.
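
    At its core, the tree slicer's deletion loop can be pictured over any XML-serialized model. A minimal sketch under stated assumptions: the model is plain XML readable by xml.etree.ElementTree, and `observe` is a hypothetical harness that executes the model and returns the behavior at the slicing criterion (None if the mutated model no longer runs):

```python
import xml.etree.ElementTree as ET

def observational_slice(xml_path, observe):
    """Speculatively delete subtrees of a tree-represented model,
    keeping a deletion only when the observed behavior is unchanged."""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    expected = observe(tree)
    changed = True
    while changed:
        changed = False
        for parent in list(root.iter()):
            for node in list(parent):
                pos = list(parent).index(node)
                parent.remove(node)            # speculative deletion
                if observe(tree) == expected:
                    changed = True             # deletion becomes permanent
                else:
                    parent.insert(pos, node)   # revert: behavior differed
    return tree
```

    A real slicer would also skip nodes inside already-deleted subtrees and respect the structural constraints of the modelling language; both are omitted here for brevity.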

    Slicing of Concurrent Programs and its Application to Information Flow Control

    This thesis presents a practical technique for information flow control for concurrent programs with threads and shared-memory communication. The technique guarantees confidentiality of information with respect to a reasonable attacker model and utilizes program dependence graphs (PDGs), a language-independent representation of information flow in a program.
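
    The underlying check is, at heart, reachability over the PDG: information can leak only if some publicly observable sink is reachable from a secret source along dependence edges. A toy sketch follows, with the PDG encoded as a plain adjacency map rather than the thesis's actual data structures:

```python
from collections import deque

def leaks(pdg, high_sources, low_sinks):
    """Return True if any low (public) sink is reachable from a high
    (secret) source in the program dependence graph, i.e. if the PDG
    cannot guarantee confidentiality."""
    seen = set(high_sources)
    work = deque(high_sources)
    while work:
        n = work.popleft()
        if n in low_sinks:
            return True                # dependence path from secret to public
        for m in pdg.get(n, ()):       # data and control dependence edges
            if m not in seen:
                seen.add(m)
                work.append(m)
    return False                       # no path: confidentiality holds w.r.t. the PDG
```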

    Evidence-driven testing and debugging of software systems

    Program debugging is the process of testing, exposing, reproducing, diagnosing and fixing software bugs. Many techniques have been proposed to aid developers during software testing and debugging. However, researchers have found that developers hardly use or adopt the proposed techniques in software practice. Evidently, this is because there is a gap between proposed methods and the state of software practice: most methods fail to address the actual needs of software developers. In this dissertation, we pose the following scientific question: How can we bridge the gap between software practice and the state-of-the-art automated testing and debugging techniques? To address this challenge, we put forward the following thesis: Software testing and debugging should be driven by empirical evidence collected from software practice. In particular, we posit that the feedback from software practice should shape and guide (the automation of) testing and debugging activities. In this thesis, we focus on gathering evidence from software practice by conducting several empirical studies on software testing and debugging activities in the real world. We then build tools and methods that are well grounded in and driven by the empirical evidence obtained from these experiments. Firstly, we conduct an empirical study on the state of debugging in practice using a survey and a human study. In this study, we ask developers about their debugging needs and observe the tools and strategies employed by developers while testing, diagnosing and repairing real bugs. Secondly, we evaluate the effectiveness of the state-of-the-art automated fault localization (AFL) methods on real bugs and programs. Thirdly, we conduct an experiment to evaluate the causes of invalid inputs in software practice. Lastly, we study how to learn input distributions from real-world sample inputs using probabilistic grammars. To bridge the gap between software practice and the state of the art in software testing and debugging, we offer the following empirical results and techniques: (1) We collect evidence on the state of practice in program debugging and, indeed, we find that there is a chasm between (available) debugging tools and developer needs. We elicit the actual needs and concerns of developers when testing and diagnosing real faults and provide a benchmark (called DBGBench) to aid the automated evaluation of debugging and repair tools. (2) We provide empirical evidence on the effectiveness of several state-of-the-art AFL techniques (such as statistical debugging formulas and dynamic slicing). Building on the obtained empirical evidence, we provide a hybrid approach that outperforms the state-of-the-art AFL techniques. (3) We evaluate the prevalence and causes of invalid inputs in software practice, and we draw on the lessons learned from this experiment to build a general-purpose algorithm (called ddmax) that automatically diagnoses and repairs real-world invalid inputs. (4) We provide a method to learn the distribution of input elements in software practice using probabilistic grammars, and we further employ the learned distribution to drive the generation of test inputs that are similar (or dissimilar) to sample inputs found in the wild. In summary, we propose an evidence-driven approach to software testing and debugging, based on collecting empirical evidence from software practice to guide and direct testing and debugging.
    In our evaluation, we found that our approach is effective in improving several debugging activities in practice. In particular, using our evidence-driven approach, we elicit the actual debugging needs of developers, improve the effectiveness of several automated fault localization techniques, effectively debug and repair invalid inputs, and generate test inputs that are (dis)similar to real-world inputs. Our proposed methods are built on empirical evidence and improve over the state-of-the-art techniques in testing and debugging.
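
    As a concrete example of the statistical debugging formulas evaluated in such studies, the standard Ochiai metric ranks statements by how strongly their coverage correlates with failing runs. This sketch shows the formula itself, not the dissertation's hybrid approach:

```python
import math

def ochiai(failed_cov, passed_cov, total_failed):
    """Rank statements by failed(s) / sqrt(total_failed * (failed(s) + passed(s))),
    where failed(s)/passed(s) count the failing/passing runs covering s."""
    scores = {}
    for stmt in set(failed_cov) | set(passed_cov):
        ef = failed_cov.get(stmt, 0)       # failing runs covering stmt
        ep = passed_cov.get(stmt, 0)       # passing runs covering stmt
        denom = math.sqrt(total_failed * (ef + ep))
        scores[stmt] = ef / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: -kv[1])  # most suspicious first
```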

    Tree-oriented vs. line-oriented Observation-Based Slicing

    Observation-based slicing is a recently introduced, language-independent slicing technique based on the dependencies observable from program behavior. The original algorithm processed traditional source code at the line-of-text level. A recent variation was developed to slice the tree-based XML representation of executable models. We ported the model slicer to source code using srcML to construct a tree-based representation of traditional source code. We present the results of a comparison of the two slicers using four experiments involving seventeen different programs, including classic benchmarks and larger production systems. The resulting slices had essentially the same size and quite often the same content. Where they differ, the use of tree structure traded the ability to remove unnecessary parts of a statement for the requirement of maintaining aspects of the code structure. Comparing the slicers finds that each has its advantages. For example, when the tree representation facilitates the deletion of large chunks of code, the tree slicer was over eight times faster. In contrast, when slicing C++ code it was over nine times slower because of the multitude of small trees created to support C++ syntax. Given the pros and cons of the two, the results suggest the value of their hybrid combination.
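
    The hybrid the authors allude to is only suggested, not specified, but one can sketch the shape such a combination might take: run the cheap, large-granularity tree deletions first, then let line-window deletions trim what the tree structure must retain. Everything below, including the operator interface, is hypothetical:

```python
def hybrid_slice(source, tree_ops, line_ops, observe):
    """Alternate between deletion-operator families until neither makes
    progress. Each operator takes the current source and the observe
    harness and returns (possibly reduced source, whether it deleted)."""
    progress = True
    while progress:
        progress = False
        for op in list(tree_ops) + list(line_ops):  # tree ops first: big deletions
            source, deleted = op(source, observe)
            progress = progress or deleted
    return source
```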

    Localizing State-Dependent Faults Using Associated Sequence Mining

    In this thesis we developed a new fault localization process to localize faults in object-oriented software. The process is built upon the "Encapsulation" principle and aims to locate state-dependent discrepancies in the software's behavior. We experimented with the proposed process on 50 seeded faults in 8 subject programs, and were able to locate the faulty class in 100% of the cases when objects with constant states were taken into consideration, while we missed 24% of the faults when these objects were not considered. We also developed a customized data mining technique, "Associated Sequence Mining", to be used in the localization process; experiments showed that it only provided a slight enhancement to the results of the process. The customization provided at least a 17% enhancement in time performance and is generic enough to be applicable in other domains. In addition, we developed an extensive taxonomy for object-oriented software faults based on UML models. We used the taxonomy to make decisions regarding the localization process. It provides an aid for understanding the nature of software faults and will help enhance the different tasks related to software quality assurance. The main contributions of the thesis were based on preliminary experimentation with the usability of the classification algorithms implemented in WEKA for software fault localization, which led to the conclusion that both the fault type and the mechanism implemented in the analysis algorithm significantly affect the results of the localization.
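
    The state-sequence intuition can be illustrated with a toy miner: collect short sequences of (class, state) events per run and favor classes whose sequences occur in failing runs but rarely in passing ones. This is purely illustrative and not the thesis's Associated Sequence Mining algorithm:

```python
from collections import Counter

def suspicious_classes(failing_runs, passing_runs, n=2):
    """Score classes by n-grams of (class, state) events that are
    frequent in failing runs, discounted by passing-run occurrences."""
    def ngrams(runs):
        counts = Counter()
        for run in runs:                       # run: list of (class, state)
            for i in range(len(run) - n + 1):
                counts[tuple(run[i:i + n])] += 1
        return counts
    fail, ok = ngrams(failing_runs), ngrams(passing_runs)
    score = Counter()
    for gram, f in fail.items():
        weight = f / (1 + ok.get(gram, 0))     # penalize grams also seen passing
        for cls, _state in gram:
            score[cls] += weight
    return score.most_common()                 # most suspicious classes first
```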

    Non-gaussianity and Statistical Anisotropy in Cosmological Inflationary Models

    We study the statistical descriptors for some cosmological inflationary models that allow us to obtain large levels of non-gaussianity and violations of statistical isotropy. Basically, we study two different classes of models: a model that includes only scalar field perturbations, specifically a subclass of small-field slow-roll models of inflation with canonical kinetic terms, and models that admit both vector and scalar field perturbations. We study the former to show that it is possible to attain very high, including observable, values for the levels of non-gaussianity f_{NL} and \tau_{NL} in the bispectrum B_\zeta and trispectrum T_\zeta of the primordial curvature perturbation \zeta, respectively. Such a result is obtained by taking care of loop corrections in the spectrum P_\zeta, the bispectrum B_\zeta, and the trispectrum T_\zeta. Sizeable values for f_{NL} and \tau_{NL} arise even if \zeta is generated during inflation. For the latter, we study the spectrum P_\zeta, bispectrum B_\zeta, and trispectrum T_\zeta of the primordial curvature perturbation when \zeta is generated by scalar and vector field perturbations. The tree-level and one-loop contributions from vector field perturbations are worked out, considering the possibility that the one-loop contributions may be dominant over the tree-level terms. The levels of non-gaussianity f_{NL} and \tau_{NL} are calculated and related to the level of statistical anisotropy in the power spectrum, g_\zeta. For very small amounts of statistical anisotropy in the power spectrum, the levels of non-gaussianity may be very high, in some cases exceeding the current observational limit.
    Comment: LaTeX file, 113 pages. PhD Thesis. Supervisor: Yeinzon Rodriguez
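
    For reference, the descriptors named above have conventional definitions, given here as the standard local-type and Ackerman-Carroll-Wise parametrizations rather than quoted from the thesis:

```latex
% Local-type expansion of the curvature perturbation, which defines f_NL:
\zeta(\mathbf{x}) = \zeta_g(\mathbf{x})
  + \tfrac{3}{5} f_{NL}\!\left[\zeta_g^2(\mathbf{x}) - \langle\zeta_g^2\rangle\right] + \cdots

% f_NL and tau_NL normalize the bispectrum and (part of) the trispectrum:
B_\zeta(k_1,k_2,k_3) = \tfrac{6}{5} f_{NL}\left[P_\zeta(k_1)P_\zeta(k_2) + 2\ \text{perms}\right]
T_\zeta \supset \tau_{NL}\left[P_\zeta(k_{13})P_\zeta(k_3)P_\zeta(k_4) + 11\ \text{perms}\right],
  \quad k_{13} \equiv |\mathbf{k}_1 + \mathbf{k}_3|

% Statistical anisotropy of the power spectrum, which defines g_zeta:
P_\zeta(\mathbf{k}) = P_\zeta^{\mathrm{iso}}(k)\left[1 + g_\zeta\,(\hat{\mathbf{k}}\cdot\hat{\mathbf{n}})^2\right]
```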