Redefining and Evaluating Coverage Criteria Based on the Testing Scope
Test coverage information can help testers decide when to stop testing and how to augment their test suites when the measured coverage is not deemed sufficient. Since the notion of a test criterion was introduced in the 1970s, research on coverage testing has been very active, with much effort dedicated to defining new, more cost-effective coverage criteria or to adapting existing ones to different domains. All these studies share the premise that, after defining the entity to be covered (e.g., branches), one cannot consider a program adequately tested if some of its entities have never been exercised by any input data. However, not all entities are of interest in every context. This is particularly true for several paradigms that emerged in the last decade (e.g., component-based development, service-oriented architecture), where traditional coverage metrics might not always provide meaningful information. In this thesis we address this situation and redefine coverage criteria to focus on the program parts that are relevant to the testing scope. We instantiate this general notion of scope-based coverage by introducing three coverage criteria and demonstrate how they can be applied to different testing contexts. When applied to the context of software reuse, our approach proved useful for supporting test case prioritization, selection and minimization. Our studies showed that for prioritization we can improve the average rate of faults detected. For test case selection and minimization, we can considerably reduce the test suite size with little to no impact on fault detection effectiveness. When the source code is not available, as in the service-oriented architecture paradigm, we propose an approach that customizes coverage, measured on invocations at the service interface, based on data from similar users.
We applied this approach to a real-world application and, in our study, were able to predict the entities of interest to a given user with high precision. Finally, we introduce a first-of-its-kind coverage criterion for operational-profile-based testing that exploits program spectra obtained from usage traces. Our study showed that it correlates better than traditional coverage with the probability that the next test input will fail, which implies that our approach can provide a better stopping rule. Promising results were also observed for test case selection. Our redefinition of coverage criteria approaches the topic of coverage testing from a completely different angle. Such a novel perspective paves the way for new avenues of research towards improving the cost-effectiveness of testing that are yet to be explored.
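The thesis abstract does not spell out a coverage formula, but the general idea of scope-based coverage can be read as ordinary entity coverage restricted to an in-scope subset. The sketch below is a hedged illustration under that reading; the function name, the entity labels and the notion of a scope set are hypothetical placeholders, not the thesis's actual definitions.

```python
def scoped_coverage(covered, all_entities, scope):
    """Coverage measured only over entities relevant to the testing scope.

    covered      -- set of entities exercised by the test suite
    all_entities -- every coverable entity in the program (e.g. branches)
    scope        -- the subset deemed relevant in this context
    """
    relevant = all_entities & scope
    if not relevant:
        return 1.0  # vacuously covered: nothing in scope
    return len(covered & relevant) / len(relevant)


# Traditional coverage counts all six branches (3/6 = 50%); restricting
# to a hypothetical reuse scope of three branches yields 2/3 instead.
entities = {"b1", "b2", "b3", "b4", "b5", "b6"}
hit = {"b1", "b2", "b3"}
print(scoped_coverage(hit, entities, entities))          # full scope
print(scoped_coverage(hit, entities, {"b1", "b2", "b4"}))  # reuse scope
```

The point of the sketch is that the same test suite can look inadequate under traditional coverage yet adequate once out-of-scope entities are excluded.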
A fault syndromes simulator for random access memories
Testing and diagnosis techniques play a key role in the advance of semiconductor memory technology. The challenge of failure detection has prompted intensive investigation into efficient testing and diagnosis algorithms with better fault coverage and diagnostic resolution. At present, March test algorithms are used to detect and diagnose the faults that affect Random Access Memories; they also allow faults to be located and identified. However, the test and diagnosis process is still largely manual, so a systematic approach for developing and evaluating memory test algorithms is required. This work focuses on implementing March-based test algorithms in a software simulator tool for fast and systematic memory testing. The simulator allows a user, through a GUI, to select a March-based test algorithm depending on the desired fault coverage and diagnostic resolution. Experimental results show that testing with the simulator is more efficient than the traditional testing approach. The new simulator can display a detailed list of the coupling faults and stuck-at faults covered by each algorithm, together with their coverage percentages, after a set of test algorithms has been chosen; the percentage of diagnostic resolution is also displayed. This shows that the simulator eases the trade-off between test time, fault coverage and diagnostic resolution. Moreover, the chosen algorithm can be incorporated into memory built-in self-test and self-diagnosis schemes to achieve better fault coverage and diagnostic resolution. Universities and industry groups working on memory built-in self-test, self-repair and self-diagnosis can benefit by saving years of effort in finding an efficient algorithm to implement in their designs.
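The abstract does not name a specific March algorithm, so as an illustrative sketch (not the simulator described above) the well-known MATS+ test, {⇕(w0); ⇑(r0,w1); ⇓(r1,w0)}, can be run against a toy RAM model with injected stuck-at faults. All class and function names here are invented for illustration; MATS+ is chosen because it detects all single-cell stuck-at faults.

```python
class FaultyRAM:
    """Word-oriented RAM model with optional stuck-at faults injected."""

    def __init__(self, size, stuck_at=None):
        self.size = size
        self.cells = [0] * size
        self.stuck = stuck_at or {}  # address -> value the cell is stuck at

    def write(self, addr, value):
        # A stuck-at cell ignores the written value.
        self.cells[addr] = self.stuck.get(addr, value)

    def read(self, addr):
        return self.cells[addr]


def mats_plus(ram):
    """Run MATS+ and return the addresses where a read mismatched."""
    faults = set()
    for a in range(ram.size):            # element ⇕ (w0)
        ram.write(a, 0)
    for a in range(ram.size):            # element ⇑ (r0, w1)
        if ram.read(a) != 0:
            faults.add(a)
        ram.write(a, 1)
    for a in reversed(range(ram.size)):  # element ⇓ (r1, w0)
        if ram.read(a) != 1:
            faults.add(a)
        ram.write(a, 0)
    return sorted(faults)


print(mats_plus(FaultyRAM(8)))               # fault-free memory
print(mats_plus(FaultyRAM(8, {3: 0})))       # stuck-at-0 fault at address 3
```

A real March simulator of the kind the abstract describes would additionally classify the mismatch pattern (which March element caught it) to report diagnostic resolution, not just detection.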
Scaling up HIV self-testing in sub-Saharan Africa: a review of technology, policy and evidence.
PURPOSE OF REVIEW: HIV self-testing (HIVST) can provide complementary coverage to existing HIV testing services and improve knowledge of status among HIV-infected individuals. This review summarizes the current technology, policy and evidence landscape in sub-Saharan Africa and the priorities within a rapidly evolving field. RECENT FINDINGS: HIVST is moving towards scaled implementation, with the release of WHO guidelines, WHO prequalification of the first HIVST product, price reductions of HIVST products and a growing product pipeline. Multicountry evidence from southern and eastern Africa confirms high feasibility, acceptability and accuracy across many delivery models and populations, with minimal harms. Evidence on the effectiveness of HIVST in increasing testing coverage is strong, while evidence on demand generation for follow-on HIV prevention and treatment services and on cost-effective delivery is emerging. Despite these developments, HIVST delivery remains limited outside of pilot implementation. SUMMARY: Important technology gaps include increasing the availability of more sensitive HIVST products in low- and middle-income countries. Regulatory and postmarket surveillance systems for HIVST also require further development. Randomized trials evaluating effectiveness and cost-effectiveness under multiple distribution models, including unrestricted delivery and with a focus on linkage to HIV prevention and treatment, remain priorities. Diversification of studies, to include west and central Africa and blood-based products, should also be addressed.
Guiding Deep Learning System Testing using Surprise Adequacy
Deep Learning (DL) systems are rapidly being adopted in safety- and security-critical domains, urgently calling for ways to test their correctness and robustness. Testing of DL systems has traditionally relied on manual collection and labelling of data. Recently, a number of coverage criteria based on neuron activation values have been proposed. These criteria essentially count the number of neurons whose activation during the execution of a DL system satisfies certain properties, such as being above predefined thresholds. However, existing coverage criteria are not sufficiently fine-grained to capture subtle behaviours exhibited by DL systems. Moreover, evaluations have focused on showing correlation between adversarial examples and the proposed criteria rather than on evaluating and guiding their use for actual testing of DL systems. We propose a novel test adequacy criterion for testing of DL systems, called Surprise Adequacy for Deep Learning Systems (SADL), which is based on the behaviour of DL systems with respect to their training data. We measure the surprise of an input as the difference in the DL system's behaviour between the input and the training data (i.e., what was learnt during training), and subsequently develop this into an adequacy criterion: a good test input should be sufficiently, but not overly, surprising compared to the training data. Empirical evaluation using a range of DL systems, from simple image classifiers to autonomous driving platforms, shows that systematic sampling of inputs based on their surprise can improve the classification accuracy of DL systems against adversarial examples by up to 77.5% via retraining.
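The SADL work includes a distance-based variant of surprise, roughly: the distance from an input's activation trace to the nearest same-class training trace, normalized by the distance to the nearest other-class trace. The sketch below is a simplified, assumed form of that idea using plain Euclidean distance over small activation vectors; the function names and data are illustrative, not the paper's implementation.

```python
import math


def euclid(u, v):
    """Euclidean distance between two activation traces."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_surprise(trace, train_traces, train_labels, predicted):
    """Distance-based surprise of `trace` for an input predicted as
    class `predicted`: nearest same-class distance over nearest
    other-class distance. Low values mean the input is unsurprising."""
    same = [t for t, y in zip(train_traces, train_labels) if y == predicted]
    other = [t for t, y in zip(train_traces, train_labels) if y != predicted]
    d_same = min(euclid(trace, t) for t in same)
    d_other = min(euclid(trace, t) for t in other)
    return d_same / d_other if d_other else float("inf")


# Toy training set: one trace per class. An input near class 0's trace
# has low surprise; an input halfway between the classes scores higher.
traces = [[0.0, 0.0], [1.0, 1.0]]
labels = [0, 1]
print(distance_surprise([0.1, 0.1], traces, labels, predicted=0))
print(distance_surprise([0.5, 0.4], traces, labels, predicted=0))
```

An adequacy criterion can then be built on top by bucketing surprise values and asking that a test suite cover a range of buckets, matching the abstract's "sufficiently but not overly surprising" intuition.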
Branch-coverage testability transformation for unstructured programs
Test data generation by hand is a tedious, expensive and error-prone activity, yet testing is a vital part of the development process. Several techniques have been proposed to automate the generation of test data, but all of these are hindered by the presence of unstructured control flow. This paper addresses the problem using testability transformation. Testability transformation does not preserve the traditional meaning of the program; rather, it preserves test-adequate sets of input data. This requires new equivalence relations which, in turn, entail novel proof obligations. The paper illustrates this using the branch coverage adequacy criterion and develops a branch adequacy equivalence relation and a testability transformation for restructuring. It then presents a proof that the transformation preserves branch adequacy.
Data generator for evaluating ETL process quality
Obtaining the right set of data for evaluating the fulfillment of different quality factors in the extract-transform-load (ETL) process design is rather challenging. First, the real data might be out of reach due to privacy constraints, while manually providing a synthetic set of data is a labor-intensive task that needs to take various combinations of process parameters into account. More importantly, a single dataset usually does not represent the evolution of data throughout the complete process lifespan, hence missing a plethora of possible test cases. To facilitate this demanding task, in this paper we propose an automatic data generator (i.e., Bijoux). Starting from a given ETL process model, Bijoux extracts the semantics of data transformations, analyzes the constraints they imply over input data, and automatically generates testing datasets. Bijoux is highly modular and configurable, enabling end-users to generate datasets for a variety of interesting test scenarios (e.g., evaluating specific parts of an input ETL process design, with different input dataset sizes, different distributions of data, and different operation selectivities). We have developed a running prototype that implements the functionality of our data generation framework and report experimental findings showing the effectiveness and scalability of our approach.
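Bijoux's constraint analysis covers whole ETL models; as a minimal hedged sketch of just one ingredient, generating input data that hits a target operation selectivity, consider a single filter `amount > threshold`. The schema, the predicate and every name below are hypothetical, chosen only to make the selectivity idea concrete.

```python
import random


def generate_rows(n, selectivity, threshold=100, seed=0):
    """Generate n rows for a filter `amount > threshold` such that
    exactly round(n * selectivity) of them satisfy the predicate."""
    rng = random.Random(seed)  # seeded for reproducible test datasets
    n_pass = round(n * selectivity)
    rows = []
    for i in range(n):
        if i < n_pass:
            amount = threshold + rng.randint(1, 50)   # satisfies the filter
        else:
            amount = threshold - rng.randint(0, 50)   # fails the filter
        rows.append({"id": i, "amount": amount})
    rng.shuffle(rows)  # avoid ordering artifacts in the generated dataset
    return rows


rows = generate_rows(100, selectivity=0.3)
print(sum(r["amount"] > 100 for r in rows))  # rows passing the filter
```

A generator in the spirit of Bijoux would derive the predicate and its constraints from the ETL model itself rather than hard-coding them, and would solve constraints jointly across a chain of operations.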
Automatic evaluation of generation and parsing for machine translation with automatically acquired transfer rules
This paper presents a new method of evaluation for the generation and parsing components of transfer-based MT systems where the transfer rules have been automatically acquired from parsed, sentence-aligned bitext corpora. The method provides a means of quantifying the upper bound imposed on the MT system by the quality of the parsing and generation technologies for the target language. We include experiments to calculate this upper bound for both handcrafted and automatically induced parsing and generation technologies currently in use by transfer-based MT systems.
Evaluation and optimization of frequent association rule based classification
Deriving useful and interesting rules from a data mining system is an essential and important task. Problems commonly occur, such as the discovery of random and coincidental patterns, patterns with no significant value, and the generation of a large volume of rules from a database. Work on sustaining the interestingness of rules generated by data mining algorithms is actively and constantly being examined and developed. In this paper, a systematic way to evaluate the association rules discovered by frequent itemset mining algorithms, combining common data mining and statistical interestingness measures, is presented, together with an appropriate sequence of usage. The experiments are performed using a number of real-world datasets that represent diverse characteristics of data/items, and a detailed evaluation of the rule sets is provided. Empirical results show that, with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant and contradictory rules while preserving relatively valuable rules with high accuracy and coverage when used in the classification problem. Moreover, the results reveal important characteristics of mining frequent itemsets and the impact of the confidence measure on the classification task.