4 research outputs found
Computer-aided Semantic Signature Identification and Document Classification via Semantic Signatures
In this era of textual data explosion on the World Wide Web, it may be very hard to find documents that are similar to the documents that are of interest to us. To overcome this problem we have developed a type of semantic signature that captures the semantics of target content (text). Semantic signatures from a text/document of interest are derived using the software package semantic signature mining tool (SSMinT). This software package has been developed as a part of this thesis work in collaboration with Sri Ramya Peddada. These semantic signatures are used to search and retrieve documents with similar semantic patterns. Effects of different representations of semantic signatures on the document classification outcomes are illustrated. Retrieved document classification accuracies of Euclidean and Spherical K-means clustering algorithms are compared. A Chi-square test is presented to prove that the observed and expected numbers of documents retrieved (from a corpus) are not significantly different. From this Chi-square test it is proved that the semantic signature concept is capable of retrieving documents of interest with high probability. Our findings indicate that this concept has potential for use in commercial text/document searching applications
The Use of Automated Search in Deriving Software Testing Strategies
Testing a software artefact using every one of its possible inputs would normally cost too much, and take too long, compared to the benefits of detecting faults in the software. Instead, a testing strategy is used to select a small subset of the inputs with which to test the software. The criterion used to select this subset affects the likelihood that faults in the software will be detected. For some testing strategies, the criterion may result in subsets that are very efficient at detecting faults, but implementing the strategy -- deriving a 'concrete strategy' specific to the software artefact -- is so difficult that it is not cost-effective to use that strategy in practice.
In this thesis, we propose the use of metaheuristic search to derive concrete testing strategies in a cost-effective manner.
We demonstrate a search-based algorithm that derives concrete strategies for 'statistical testing', a testing strategy that has a good fault-detecting ability in theory, but which is costly to implement in practice. The cost-effectiveness of the search-based approach is enhanced by the rigorous empirical determination of an efficient algorithm configuration and associated parameter settings, and by the exploitation of low-cost commodity GPU cards to reduce the time taken by the algorithm. The use of a flexible grammar-based representation for the test inputs ensures the applicability of the algorithm to a wide range of software
Recommended from our members
The Effectiveness of <i>t</i>-Way Test Data Generation
Modern society is increasingly dependent on the correct functioning of software and increasingly so in areas that are considered safety related or safety critical. Therefore, there is an increasing need to be able to verify and validate that the software is in fact correct and will perform its intended function. Many approaches to this problem have been proposed; however, none seems likely to supplant the role of testing in the near future.
If we accept that there is, and will be, a continuing need to be able to test software then the question becomes one of how can this be done effectively, both in terms of ability to detect errors and in terms of cost. One avenue of research that offers prospects of improving both of these aspects is the automatic generation of test data.
There has recently been a large amount of work conducted in this area. One particularly promising direction has been the application of ideas from the field of experimental design and in particular, the field of t-way adequate factorial designs.
The area however, is not without issues; there is evidence that the technique is capable of detecting errors but that evidence is not unequivocal. Moreover, as with almost all work in the area of automatic test generation, there has been very little comparative work comparing the technique with other test data generation techniques. Worse, there has been effectively no work done that compares any automatic test data generation technique with the effectiveness of tests generated by humans. Another major issue with the technique is the number of tests that applying the technique can result in. This implies that there is a need for an automated oracle if the technique is to be successfully applied. The flaw with this is of course that in most situations the oracle is the human that is conducting the tests, a point often ignored in testing research.
The work presented here addresses both of these points. To do this I have used a code base taken from an industrial engine control system that has an existing set of high quality unit tests developed by hand. To complement this, several other techniques for automatically generating test data have been applied, namely random testing, random experimental designs and a technique for generating single factor experiments. To address the issue of being able to compare the error detection ability of all of the sets of test vectors, rather than the usual effectiveness surrogates of code coverage I have used mutation analysis on the code base to directly measure the ability of each set of test vectors to discover common coding errors. The results presented here show that test data generation techniques based on t-way factorial designs are at least as effective as handgenerated tests and superior to random testing and the factor experimental technique.
The oracle problem associated with the factorial design techniques was addressed using a test set minimisation approach. The mutation tool monitored which vectors could “kill” which code mutants. After a subset of the test vectors had been run, the most effective vectors were retained and the rest discarded. Likewise, mutants that were killed were removed from further consideration and the process repeated. Experimental results show that this minimisation procedure is effective at reducing computational overhead and is capable of producing final sets of test vectors that are comparable in size with the sets of hand-generated tests and so amenable to final hand checking