
    A research review of quality assessment for software

    Measures were recommended to assess the quality of software submitted to the AdaNet program. The quality factors that are important to software reuse are explored, and methods of evaluating those factors are discussed. Quality factors important to software reuse are: correctness, reliability, verifiability, understandability, modifiability, and certifiability. Certifiability is included because documented attributes of a software component, such as its efficiency, portability, and development history, constitute a class of factors that are important to some users, not important at all to others, and impossible for AdaNet to distinguish between a priori. The quality factors may be assessed in different ways. There are a few quantitative measures which have been shown to indicate software quality. However, it is believed that there exist many factors that indicate quality but have not been empirically validated due to their subjective nature. These subjective factors are characterized by the way in which they support the software engineering principles of abstraction, information hiding, modularity, localization, confirmability, uniformity, and completeness.
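    The notion of certifiability above amounts to recording documented attributes alongside a component so that individual users can judge them for themselves. A minimal sketch in Python of such a metadata record (the field names and example values are illustrative, not taken from the AdaNet program):

        from dataclasses import dataclass, field

        @dataclass
        class ReusableComponent:
            # Factors the reuse library assesses directly
            name: str
            correctness_evidence: str
            reliability_notes: str
            # Certifiability: attributes documented so users can judge them a priori
            documented_attributes: dict = field(default_factory=dict)

        component = ReusableComponent(
            name="queue_manager",
            correctness_evidence="unit test suite, 100% statement coverage",
            reliability_notes="no failures reported over 3 releases",
            documented_attributes={
                "efficiency": "O(1) enqueue/dequeue",
                "portability": "standard Ada 95, no OS bindings",
                "development_history": "3 releases, 2 field defect reports",
            },
        )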

    A Fault-Based Model of Fault Localization Techniques

    Every day, ordinary people depend on software working properly. We take it for granted; from banking software, to railroad switching software, to flight control software, to software that controls medical devices such as pacemakers or even gas pumps, our lives are touched by software that we expect to work. It is well known that the main technique/activity used to ensure the quality of software is testing. Often it is the only quality assurance activity undertaken, making it that much more important. In a typical experiment studying these techniques, a researcher will intentionally seed a fault (intentionally breaking the functionality of some source code) with the hope that the automated techniques under study will be able to identify the fault's location in the source code. These faults are picked arbitrarily; there is potential for bias in the selection of the faults. Previous researchers have established an ontology for understanding and expressing this bias, called fault size. This research captures the fault size ontology in the form of a probabilistic model. The results of applying this model to measure fault size suggest that many faults generated through program mutation (the systematic replacement of source code operators to create faults) are very large and easily found. Secondary measures generated in the assessment of the model suggest a new static analysis method, called testability, for predicting the likelihood that code will contain a fault in the future. While software testing researchers are not statisticians, they nonetheless make extensive use of statistics in their experiments to assess fault localization techniques. Researchers often select their statistical techniques without justification. This is a very worrisome situation because it can lead to incorrect conclusions about the significance of research. This research introduces an algorithm, MeansTest, which helps automate some aspects of the selection of appropriate statistical techniques. The results of an evaluation of MeansTest suggest that MeansTest performs well relative to its peers. This research then surveys recent work in software testing, using MeansTest to evaluate the significance of researchers' work. The results of the survey indicate that software testing researchers are underreporting the significance of their work.
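    The program mutation mentioned above systematically replaces source-code operators to seed artificial faults. A minimal sketch in Python of one such mutation operator (the operator swap and the toy subject program are illustrative, not taken from this work):

        import ast

        class SwapAddToSub(ast.NodeTransformer):
            # Mutation operator: replace every '+' with '-'
            def visit_BinOp(self, node):
                self.generic_visit(node)
                if isinstance(node.op, ast.Add):
                    node.op = ast.Sub()
                return node

        original = "def price(total, tax):\n    return total + tax\n"
        mutant = SwapAddToSub().visit(ast.parse(original))
        ast.fix_missing_locations(mutant)
        print(ast.unparse(mutant))  # seeded fault: 'return total - tax'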

    A survey on software testability

    Context: Software testability is the degree to which a software system or a unit under test supports its own testing. To predict and improve software testability, a large number of techniques and metrics have been proposed by both practitioners and researchers over the last several decades. Reviewing and getting an overview of the entire state of the art and state of the practice in this area is often challenging for a practitioner or a new researcher. Objective: Our objective is to summarize the body of knowledge in this area and to benefit readers (both practitioners and researchers) in preparing, measuring and improving software testability. Method: To address the above need, the authors conducted a survey in the form of a systematic literature mapping (classification) to find out what we as a community know about this topic. After compiling an initial pool of 303 papers and applying a set of inclusion/exclusion criteria, our final pool included 208 papers. Results: The area of software testability has been comprehensively studied by researchers and practitioners. Approaches for measuring and improving testability are the most frequently addressed topics in the papers. The two most often mentioned factors affecting testability are observability and controllability. Common ways to improve testability are testability transformation, improving observability, adding assertions, and improving controllability. Conclusion: This paper serves both researchers and practitioners as an "index" to the vast body of knowledge in the area of testability. The results could help practitioners measure and improve software testability in their projects.
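    As a concrete illustration of the two factors the survey highlights, the sketch below (in Python; the class and its methods are invented for illustration) shows common refactorings for controllability (injecting a fake clock), observability (exposing internal state), and an added assertion:

        import datetime

        class RateLimiter:
            def __init__(self, max_per_minute, clock=datetime.datetime.utcnow):
                self.max_per_minute = max_per_minute
                self._clock = clock            # controllability: tests can inject a fake clock
                self._timestamps = []

            def allow(self):
                now = self._clock()
                cutoff = now - datetime.timedelta(minutes=1)
                self._timestamps = [t for t in self._timestamps if t > cutoff]
                allowed = len(self._timestamps) < self.max_per_minute
                if allowed:
                    self._timestamps.append(now)
                assert len(self._timestamps) <= self.max_per_minute  # added assertion
                return allowed

            @property
            def current_load(self):
                return len(self._timestamps)   # observability: internal state exposed to tests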

    Automated Fixing of Programs with Contracts

    This paper describes AutoFix, an automatic debugging technique that can fix faults in general-purpose software. To provide high-quality fix suggestions and to enable automation of the whole debugging process, AutoFix relies on the presence of simple specification elements in the form of contracts (such as pre- and postconditions). Using contracts enhances the precision of dynamic analysis techniques for fault detection and localization, and for validating fixes. The only required user input to the AutoFix supporting tool is then a faulty program annotated with contracts; the tool produces a collection of validated fixes for the fault, ranked according to an estimate of their suitability. In an extensive experimental evaluation, we applied AutoFix to over 200 faults in four code bases of different maturity and quality (of implementation and of contracts). AutoFix successfully fixed 42% of the faults, producing, in the majority of cases, corrections of quality comparable to those competent programmers would write; the computational resources used were modest, with an average time per fix below 20 minutes on commodity hardware. These figures compare favorably to the state of the art in automated program fixing, and demonstrate that the AutoFix approach can successfully be applied to reduce the debugging burden in real-world scenarios.
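    A minimal sketch of the kind of contract AutoFix relies on, written here as plain Python assertions rather than the tool's actual input language (AutoFix works on contract-equipped code bases; this routine is illustrative only):

        def remove_first(items, value):
            # Precondition: the value to remove must be present
            assert value in items
            old_len = len(items)
            items.remove(value)
            # Postcondition: exactly one element was removed
            assert len(items) == old_len - 1
            return items

        print(remove_first([3, 1, 4, 1], 1))  # [3, 4, 1]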

    Basic block coverage for search-based unit testing and crash reproduction

    Search-based techniques have been widely used for white-box test generation. Many of these approaches rely on the approach level and branch distance heuristics to guide the search process and generate test cases with high line and branch coverage. Despite the positive results achieved by these two heuristics, they only use the information related to the coverage of explicit branches (e.g., those indicated by conditional and loop statements), but ignore potential implicit branching within basic blocks of code. If such implicit branching happens at runtime (e.g., if an exception is thrown in a branchless method), the existing fitness functions cannot guide the search process. To address this issue, we introduce a new secondary objective, called Basic Block Coverage (BBC), which takes into account the coverage level of relevant basic blocks in the control flow graph. We evaluated the impact of BBC on search-based unit test generation (using the DynaMOSA algorithm) and search-based crash reproduction (using the STDistance and WeightedSum fitness functions). Our results show that for unit test generation, BBC improves the branch coverage of the generated tests. Although small (∼1.5%), this improvement in branch coverage is systematic and leads to an increase of the output domain coverage, the implicit runtime exception coverage, and the diversity of runtime states. In terms of crash reproduction, BBC helps reproduce 3 new crashes for each of the STDistance and WeightedSum fitness functions. BBC significantly decreases the time required to reproduce 43.5% and 45.1% of the crashes using STDistance and WeightedSum, respectively. For these crashes, BBC reduces the consumed time by 71.7% (for STDistance) and 68.7% (for WeightedSum) on average.
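    The branch distance heuristic mentioned above scores how far a test input is from making a given branch condition true. A minimal sketch in Python (the offset constant K and the example predicates are illustrative, not taken from the paper):

        K = 1  # offset so a branch that was "just missed" still has non-zero distance

        def branch_distance_eq(lhs, rhs):
            # Distance for the predicate 'lhs == rhs' (0 means the branch was covered)
            return 0 if lhs == rhs else abs(lhs - rhs) + K

        def branch_distance_lt(lhs, rhs):
            # Distance for the predicate 'lhs < rhs'
            return 0 if lhs < rhs else (lhs - rhs) + K

        # An input with x = 12 is closer to covering 'if x == 10' than one with x = 50,
        # so the search keeps and mutates the closer individual.
        print(branch_distance_eq(12, 10), branch_distance_eq(50, 10))  # 3 41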

    An information theoretic notion of software testability

    CONTEXT: In software testing, Failed Error Propagation (FEP) is the situation in which a faulty program state occurs during the execution of the system under test (SUT) but this does not lead to incorrect output. It is known that FEP can adversely affect software testing, and this has resulted in associated information theoretic measures. OBJECTIVE: To devise measures that can be used to assess the testability of the SUT. By testability, we mean how likely it is that a faulty program state, occurring during testing, will lead to incorrect output. Previous work has considered a single program point rather than an entire program. METHOD: New, more fine-grained measures were devised. Experiments were used to evaluate these and the previously defined measures (Squeeziness and Normalised Squeeziness). The experiments assessed how well these measures correlated with an estimate of the probability of FEP occurring during testing. Mutants were used to estimate this probability. RESULTS: A strong rank correlation was found between several of the measures and the probability of FEP. Importantly, this included the Normalised Squeeziness of the whole SUT, which is simpler to compute, or estimate, than most of the other measures considered. Additional experiments found that the measures were relatively insensitive to the choice of mutants and of test suite. CONCLUSION: There is scope to use information theoretic measures to estimate how prone an SUT is to FEP. As a result, there is potential to use such measures to prioritise testing or to estimate how much testing an SUT might require.
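    A minimal sketch of the information-theoretic intuition behind Squeeziness: the entropy lost when a code fragment maps many input states to fewer output states, which is the situation that enables FEP. The uniform input distribution and the toy fragment below are illustrative assumptions, not taken from the paper:

        from collections import Counter
        from math import log2

        def entropy(values):
            counts = Counter(values)
            total = len(values)
            return -sum((c / total) * log2(c / total) for c in counts.values())

        inputs = list(range(16))            # assume a uniform distribution over 16 input states
        outputs = [x // 4 for x in inputs]  # a "squeezing" fragment: 16 states collapse to 4

        squeeziness = entropy(inputs) - entropy(outputs)
        print(squeeziness)                  # 4.0 - 2.0 = 2.0 bits lost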

    No Fault Found events in maintenance engineering Part 2: Root causes, technical developments and future research

    This is the second half of a two-paper series covering aspects of the no fault found (NFF) phenomenon, which is highly challenging and is becoming even more important due to the increasing complexity and criticality of technical systems. Part 1 introduced the fundamental concept of unknown failures from an organizational, behavioral and cultural standpoint. It also reported an industrial outlook on the problem and recent procedural standards, whilst discussing the financial implications and safety concerns. In this issue, the authors examine the technical aspects, reviewing the common causes of NFF failures in electronic, software and mechanical systems. This is followed by a survey of technological techniques actively being used to reduce the consequences of such instances. After discussing improvements in testability, the article identifies gaps in the literature and points out the core areas that should be focused on in the future. Special attention is paid to recent trends in knowledge sharing and troubleshooting tools, with potential research on technical diagnosis being enumerated.

    Application of social networking algorithms in program analysis: understanding execution frequencies

    There may be some parts of a program that are more commonly used at runtime, whereas there may be other parts that are less commonly used or not used at all. In this exploratory study, we propose an approach to predict how frequently or rarely different parts of a program will get used at runtime without actually running the program. Knowledge of the most frequently executed parts can help identify the most critical and the most testable parts of a program. The portions predicted to be less commonly executed tend to be the hard-to-test parts of a program. Knowing the hard-to-test parts of a program can aid the early development of test cases. In our approach we statically analyse code or static models of code (like UML class diagrams), using quantified social networking measures and web structure mining measures. These measures assign ranks to different portions of code, for use in predicting the relative frequency with which each section of code will be used. We validated these predicted rank orderings by running the program with a common set of use cases and identifying the actual rank ordering. We compared the predictions with other measures that use direct coupling or lines of code. We found that our predictions fared better, as they were statistically more correlated with the actual rank ordering than the other measures. We present a prototype tool, written as an Eclipse plugin, that implements and validates our approach. Given the source code of a Java program, our tool computes the values of the metrics required by our approach to present ranks of all classes in order of how frequently they are expected to get used. Our tool can also instrument the source code to log all the necessary information at runtime that is required to validate our predictions.
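    One of the web structure mining measures alluded to above is PageRank. A minimal sketch in Python of ranking classes in a toy static dependency graph as a proxy for expected runtime usage (the graph and the use of PageRank alone are illustrative; the thesis combines several such measures):

        deps = {                    # "A uses B" edges from a toy class diagram
            "Main":   ["Parser", "Logger"],
            "Parser": ["Logger", "Token"],
            "Token":  ["Logger"],
            "Logger": [],
        }

        def pagerank(graph, damping=0.85, iterations=50):
            rank = {node: 1.0 / len(graph) for node in graph}
            for _ in range(iterations):
                new_rank = {node: (1 - damping) / len(graph) for node in graph}
                for node, targets in graph.items():
                    if not targets:                      # dangling node: spread its rank evenly
                        for other in graph:
                            new_rank[other] += damping * rank[node] / len(graph)
                    else:
                        for target in targets:
                            new_rank[target] += damping * rank[node] / len(targets)
                rank = new_rank
            return rank

        for cls, score in sorted(pagerank(deps).items(), key=lambda kv: -kv[1]):
            print(f"{cls:7s} {score:.3f}")  # Logger ranks highest: the most depended-upon class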

    Oracle Assessment, Improvement and Placement

    The oracle problem remains one of the key challenges in software testing, for which little automated support has been developed so far. This thesis analyses the prevalence of failed error propagation in programs with real faults to address the oracle placement problem, and introduces an approach for iterative assessment and improvement of oracles. To analyse failed error propagation in programs with real faults, we have conducted an empirical study on Defects4J, a benchmark of Java programs, of which we used all 6 available projects, 384 real bugs and 528 methods fixed to correct such bugs. The results indicate that the prevalence of failed error propagation is negligible. Moreover, the results on real faults differ from the results on mutants, indicating that if failed error propagation is taken into account, mutants are not a good surrogate for real faults. When measuring failed error propagation, for each method we use the strongest possible oracle as a postcondition, which checks all externally observable program variables. The low prevalence of failed error propagation is caused by the presence of such a strong oracle, which usually is not available in practice. Therefore, there is a need for a technique to assess and improve existing, weaker oracles. We propose a technique for assessing and improving test oracles, which necessarily places the human tester in the loop and is based on reducing the incidence of both false positives and false negatives. A proof is provided showing that this approach results in an increase in the mutual information between the actual and perfect oracles. The application of the approach to five real-world subjects shows that the fault detection rate of the oracles after improvement increases, on average, by 48.6%. A further evaluation with 39 participants assessed the ability of humans to detect false positives and false negatives manually, without any tool support. The correct classification rate achieved by humans in this case is poor (29%), indicating how helpful our automated approach can be for developers. The comparison of humans’ ability to improve oracles with and without the tool, in a study with 29 other participants, also empirically validates the effectiveness of the approach.
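    A minimal sketch of the quantity used above to characterise oracle quality: the mutual information between an imperfect oracle's verdicts and the perfect oracle's verdicts over a set of executions (the verdict vectors below are invented for illustration):

        from collections import Counter
        from math import log2

        def mutual_information(xs, ys):
            n = len(xs)
            px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
            return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
                       for (x, y), c in pxy.items())

        perfect   = [1, 1, 0, 0, 1, 0, 1, 0]  # 1 = the execution is actually failing
        imperfect = [1, 0, 0, 0, 1, 1, 1, 0]  # misses one failure, raises one false alarm
        improved  = [1, 1, 0, 0, 1, 0, 1, 1]  # after fixing the false negative

        print(mutual_information(perfect, imperfect))  # lower
        print(mutual_information(perfect, improved))   # higher: closer to the perfect oracle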