Data Structures & Algorithm Analysis in C++
This is the textbook for CSIS 215 at Liberty University.
Efficient Algorithms for Road Networks and Noisy Sorting: an Experimental and Theoretical Perspective
Experimental algorithmics, also referred to as algorithm engineering, is the principled approach of using empirical methods to complement traditional theoretical methods, both of which provide valuable insights for the analysis of algorithms. In this dissertation, we study various algorithmic problems for road networks and noisy sorting, analyzing them from both an experimental and theoretical perspective. We first study the problem of exact learning for road networks, and introduce an efficient randomized algorithm using simple distance queries, which can find missing roads and improve the quality of routing services. We provide a partial theoretical analysis based on a cluster degree parameter, d_max, then empirically show that this parameter is small for road networks by evaluating our algorithm on road network data for the U.S. and 4 European countries of various sizes. This analysis provides experimental evidence that our algorithm issues a quasilinear number of queries in expectation for road networks and similar graphs. We also study the small-world navigability of the U.S. road network, inspired by the famous experiments done by Stanley Milgram which gave rise to the "six degrees of separation" expression. We introduce the Neighborhood Preferential Attachment Model, and perform extensive experiments with this model on U.S. road networks to show that our model outperforms other small-world models in terms of the average hop length, while having a more realistic scale-free degree distribution. We then study the problem of sorting n comparable distinct elements, subject to noisy comparison errors, such that the comparison of two elements returns the wrong answer according to a fixed probability p_e < 1/2. We provide new methods for sorting with comparison errors that are data oblivious while avoiding the use of noisy binary search methods. We then experimentally compare our algorithms and other sorting algorithms.
Lastly, we study the noisy sorting problem in an external-memory setting, providing new efficient external-memory methods for sorting with comparison errors. Our algorithms achieve an optimal number of I/Os in both cache-aware and cache-oblivious settings.
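The noisy comparison model in this abstract can be illustrated with a small Python sketch. It is not the dissertation's data-oblivious method: it shows only the error model plus a naive baseline that repeats each comparison and takes a majority vote, which is reliable only when the error probability is below 1/2. The names `make_noisy_less`, `majority_less`, and `sort_with_majority` are hypothetical.

```python
import random
from functools import cmp_to_key


def make_noisy_less(p_e, rng):
    """Return a comparator that reports a < b wrongly with probability p_e."""
    def noisy_less(a, b):
        truth = a < b
        # Flip the true answer with probability p_e (the assumed noise model).
        return (not truth) if rng.random() < p_e else truth
    return noisy_less


def majority_less(noisy_less, a, b, repeats):
    """Repeat the noisy comparison and return the majority answer.

    Use an odd number of repeats so that ties cannot occur.
    """
    yes = sum(noisy_less(a, b) for _ in range(repeats))
    return 2 * yes > repeats


def sort_with_majority(items, noisy_less, repeats=15):
    """Naive baseline: an ordinary sort driven by majority-vote comparisons.

    Assumes distinct elements, so the comparator never needs to return 0.
    """
    def cmp(a, b):
        return -1 if majority_less(noisy_less, a, b, repeats) else 1
    return sorted(items, key=cmp_to_key(cmp))


if __name__ == "__main__":
    rng = random.Random(0)
    less = make_noisy_less(0.2, rng)
    print(sort_with_majority([5, 3, 8, 1, 9, 2], less))
```

Repetition multiplies the number of comparisons by `repeats`, which is the kind of overhead that more careful noisy-sorting algorithms aim to avoid.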
The Effectiveness of t-Way Test Data Generation
Modern society is increasingly dependent on the correct functioning of software and increasingly so in areas that are considered safety related or safety critical. Therefore, there is an increasing need to be able to verify and validate that the software is in fact correct and will perform its intended function. Many approaches to this problem have been proposed; however, none seems likely to supplant the role of testing in the near future.
If we accept that there is, and will be, a continuing need to be able to test software then the question becomes one of how can this be done effectively, both in terms of ability to detect errors and in terms of cost. One avenue of research that offers prospects of improving both of these aspects is the automatic generation of test data.
There has recently been a large amount of work conducted in this area. One particularly promising direction has been the application of ideas from the field of experimental design and in particular, the field of t-way adequate factorial designs.
The area, however, is not without issues: there is evidence that the technique is capable of detecting errors, but that evidence is not unequivocal. Moreover, as with almost all work in the area of automatic test generation, there has been very little work comparing the technique with other test data generation techniques. Worse, there has been effectively no work done that compares any automatic test data generation technique with the effectiveness of tests generated by humans. Another major issue with the technique is the large number of tests that applying it can produce. This implies that there is a need for an automated oracle if the technique is to be successfully applied. The flaw with this is of course that in most situations the oracle is the human that is conducting the tests, a point often ignored in testing research.
The work presented here addresses both of these points. To do this I have used a code base taken from an industrial engine control system that has an existing set of high quality unit tests developed by hand. To complement this, several other techniques for automatically generating test data have been applied, namely random testing, random experimental designs and a technique for generating single factor experiments. To compare the error detection ability of all of the sets of test vectors, rather than relying on the usual effectiveness surrogate of code coverage, I have used mutation analysis on the code base to directly measure the ability of each set of test vectors to discover common coding errors. The results presented here show that test data generation techniques based on t-way factorial designs are at least as effective as hand-generated tests and superior to random testing and the single factor experimental technique.
The oracle problem associated with the factorial design techniques was addressed using a test set minimisation approach. The mutation tool monitored which vectors could “kill” which code mutants. After a subset of the test vectors had been run, the most effective vectors were retained and the rest discarded. Likewise, mutants that were killed were removed from further consideration and the process repeated. Experimental results show that this minimisation procedure is effective at reducing computational overhead and is capable of producing final sets of test vectors that are comparable in size with the sets of hand-generated tests and so amenable to final hand checking.
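The minimisation loop described above can be sketched in Python. The abstract does not give the exact selection rule, so this sketch assumes a greedy one: within each batch, repeatedly keep the vector that kills the most still-live mutants. The function `minimise_test_set` and the `kills` callback are hypothetical stand-ins for the mutation tool.

```python
def minimise_test_set(batches, kills, mutants):
    """Keep only test vectors that kill mutants no kept vector has killed yet.

    batches -- iterable of lists of (hashable) test vectors, run one batch at a time
    kills   -- hypothetical callback: kills(vector) -> set of mutants it kills
    mutants -- all mutants under consideration
    Returns (kept_vectors, surviving_mutants).
    """
    live = set(mutants)
    kept = []
    for batch in batches:
        # Run the batch and record which live mutants each vector kills.
        killed_by = {vector: kills(vector) & live for vector in batch}
        while live and killed_by:
            # Assumed rule: retain the most effective remaining vector.
            best = max(killed_by, key=lambda v: len(killed_by[v] & live))
            newly_killed = killed_by.pop(best) & live
            if not newly_killed:
                break  # nothing left in this batch kills a live mutant
            kept.append(best)
            live -= newly_killed  # killed mutants drop out of consideration
        if not live:
            break
    return kept, live


if __name__ == "__main__":
    table = {"t1": {1, 2}, "t2": {2}, "t3": {3}, "t4": {1, 4}, "t5": set()}
    kept, survivors = minimise_test_set(
        [["t1", "t2", "t3"], ["t4", "t5"]], table.__getitem__, {1, 2, 3, 4, 5}
    )
    print(kept, survivors)
```

Because killed mutants are removed after every selection, later batches are judged only on what they add, which is what keeps the final set small enough to check by hand.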
Improving performance of genetic algorithms by using novel fitness functions
This thesis introduces Intelligent Fitness Functions and Partial Fitness Functions, both of which can improve the performance of a genetic algorithm which is limited to a fixed run time. An Intelligent Fitness Function is defined as a fitness function with a memory. The memory is used to store information about individuals so that duplicate individuals do not need to have their fitness tested. Different types of memory (long and short term) and different storage strategies (fitness based, time based and frequency based) have been tested. The results show that an intelligent fitness function with a time based long term memory improves the efficiency of a genetic algorithm the most. A Partial Fitness Function is defined as a fitness function that only partially tests the fitness of an individual at each generation. Thus only promising individuals get fully tested. Using a partial fitness function gives the genetic algorithm more evolutionary steps in the same length of time as a genetic algorithm using a normal fitness function. The results show that a genetic algorithm using a partial fitness function can achieve higher fitness levels than a genetic algorithm using a normal fitness function. Finally a genetic algorithm designed to solve a substitution cipher is compared to one equipped with an intelligent fitness function and another equipped with a partial fitness function. The genetic algorithm with the intelligent fitness function and the genetic algorithm with the partial fitness function both show a significant improvement over the genetic algorithm with a conventional fitness function.
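A fitness function with a memory, as described above, can be sketched as a cache around an ordinary fitness function. The thesis's actual storage strategies are not specified in the abstract, so the eviction rule here (drop the oldest stored individual once the memory is full) is only one possible reading of a time based strategy. The class name `MemoisedFitness` and its `capacity` parameter are hypothetical.

```python
from collections import OrderedDict


class MemoisedFitness:
    """Wrap a fitness function so duplicate individuals are not re-evaluated."""

    def __init__(self, fitness, capacity=1024):
        self._fitness = fitness          # the underlying (expensive) fitness function
        self._capacity = capacity        # maximum number of remembered individuals
        self._memory = OrderedDict()     # insertion order doubles as age
        self.evaluations = 0             # how often the real function was called

    def __call__(self, individual):
        key = tuple(individual)          # individuals are assumed to be sequences
        if key in self._memory:
            return self._memory[key]     # duplicate: answer from memory
        self.evaluations += 1
        value = self._fitness(individual)
        self._memory[key] = value
        if len(self._memory) > self._capacity:
            # Assumed time based rule: forget the oldest stored individual.
            self._memory.popitem(last=False)
        return value


if __name__ == "__main__":
    fitness = MemoisedFitness(sum, capacity=2)
    fitness([1, 2])
    fitness([1, 2])                      # served from memory
    print(fitness.evaluations)
```

Under a fixed run-time budget, every evaluation avoided this way leaves time for additional generations.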
Fast Parallel Machine Learning Algorithms for Large Datasets Using Graphic Processing Unit
This dissertation deals with developing parallel processing algorithms for Graphic Processing Unit (GPU) in order to solve machine learning problems for large datasets. In particular, it contributes to the development of fast GPU based algorithms for calculating distance (i.e. similarity, affinity, closeness) matrix. It also presents the algorithm and implementation of a fast parallel Support Vector Machine (SVM) using GPU. These application tools are developed using Compute Unified Device Architecture (CUDA), which is a popular software framework for General Purpose Computing using GPU (GPGPU). Distance calculation is the core part of all machine learning algorithms because the closer the query is to some samples (i.e. observations, records, entries), the more likely the query belongs to the class of those samples. K-Nearest Neighbors Search (k-NNS) is a popular and powerful distance based tool for solving classification problems. It is the prerequisite for training local model based classifiers. Fast distance calculation can significantly improve the speed performance of these classifiers, and GPUs can be very handy for their acceleration. Meanwhile, several GPU based sorting algorithms are also included to sort the distance matrix and find the k-nearest neighbors. The speed performances of the sorting algorithms vary depending upon the input sequences. The GPUKNN proposed in this dissertation utilizes the GPU based distance computation algorithm and automatically picks the most suitable sorting algorithm according to the characteristics of the input datasets. Every machine learning tool has its own pros and cons. The advantage of SVM is the high classification accuracy. This makes SVM possibly the best classification tool. However, as in many other machine learning algorithms, SVM's training phase slows down as the size of the input datasets increases.
The GPU version of parallel SVM based on parallel Sequential Minimal Optimization (SMO) implemented in this dissertation is proposed to reduce the time cost in both training and predicting phases. This implementation of GPUSVM is original. It utilizes many parallel processing techniques to accelerate and minimize the computations of kernel evaluation, which are considered the most time consuming operations in SVM. Although the many-core architecture of GPU performs the best in data level parallelism, multi-task (a.k.a. task level parallelism) processing is also integrated into the application to improve the speed performance of tasks such as multiclass classification and cross-validation. Furthermore, the procedure of finding worst violators is distributed to multiple blocks on the CUDA model. This reduces the time cost for each iteration of SMO during the training phase. All of these violators are shared among different tasks in multiclass classification and cross-validation to reduce the duplicate kernel computations. The speed performance results have shown that the achieved speedups of both the training phase and predicting phase range from one to three orders of magnitude compared to the state of the art LIBSVM software on some well known benchmarking datasets.
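The k-nearest-neighbour pipeline this abstract describes (compute a distance matrix, order each row, vote among the k closest samples) can be sketched sequentially in Python. This is a CPU illustration of the data flow only, not the dissertation's CUDA implementation; the function names and the choice of squared Euclidean distance are assumptions.

```python
import heapq
from collections import Counter


def distance_matrix(queries, samples):
    """Squared Euclidean distance from every query (row) to every sample (column)."""
    return [
        [sum((q - s) ** 2 for q, s in zip(query, sample)) for sample in samples]
        for query in queries
    ]


def knn_classify(queries, samples, labels, k):
    """Label each query by majority vote among its k nearest samples."""
    predictions = []
    for row in distance_matrix(queries, samples):
        # Partial sort of the row: indices of the k smallest distances.
        nearest = heapq.nsmallest(k, range(len(row)), key=row.__getitem__)
        votes = Counter(labels[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


if __name__ == "__main__":
    samples = [(0, 0), (0, 1), (5, 5), (6, 5)]
    labels = ["a", "a", "b", "b"]
    print(knn_classify([(0.2, 0.1), (5.5, 5.0)], samples, labels, k=3))
```

Each entry of the matrix is independent of the others, and each row is selected from independently, which is why both stages map naturally onto data-parallel hardware.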
Search-Based Temporal Testing of Multicore Applications
Multicore systems are increasingly common as a modern computing platform. Multicore processors not only offer better performance-to-cost ratios relative to single-core processors but also have significantly minimised space, weight, and power (SWaP) constraints. Unfortunately, they introduce challenges in verification as their shared components are potential channels for interference. The potential for interference increases the possibility of concurrency faults at runtime and consequently increases the difficulty of verification. In this thesis, search-based techniques are empirically investigated to determine their effectiveness in temporal testing, that is, searching for test inputs that may lead a task running on an embedded multicore to produce extreme (here longest) execution times, which might cause the system to violate its temporal requirements. Overall, the findings suggest that various forms of search-based approaches are effective in generating test inputs exhibiting extreme execution times on the embedded multicore environment. All previous work in temporal testing has evolved test data directly; this is not essential. In this thesis, one novel proposed approach, i.e. the use of search to discover high performing biased random sampling regimes (which we call 'dependent input sampling strategies'), has proved particularly effective. Shifting the target of search from test data itself to strategies proves particularly well motivated for attaining extreme execution times. Finally, we also present preliminary results on the use of so-called 'hyper-heuristics', which can be used to form optimal hybrids of optimisation techniques. An extensive comparison of direct approaches to establishing a baseline is followed by reports of research into indirect approaches and hyper-heuristics. The shift to strategies from direct data can be thought of as a leap in abstraction level for the underlying temporal test data generation problem.
The shift to hyper-heuristics aims to boost the level of optimisation technique abstraction. The former is more fully worked out than the latter and has proved a significant success. For the latter only preliminary results are available; as will be seen from this work as a whole, the computational requirements for research experimentation are significant.
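The direct form of search-based temporal testing, which the abstract treats as the baseline, can be sketched as a hill climber that keeps whichever test input gives the larger cost. In a real setting the cost would be a measured execution time on the target hardware; here `cost` is a hypothetical stand-in, as are `hill_climb` and `neighbours`. The thesis's strategy-level and hyper-heuristic approaches are not shown.

```python
import random


def hill_climb(cost, start, neighbours, steps, rng):
    """Search for an input that maximises cost (e.g. observed execution time).

    cost       -- hypothetical stand-in for running the task and timing it
    start      -- initial test input
    neighbours -- function returning the candidate inputs adjacent to an input
    steps      -- number of candidates to try
    rng        -- random.Random instance, so runs are repeatable
    """
    best, best_cost = start, cost(start)
    for _ in range(steps):
        candidate = rng.choice(neighbours(best))
        candidate_cost = cost(candidate)
        if candidate_cost > best_cost:   # keep only strict improvements
            best, best_cost = candidate, candidate_cost
    return best, best_cost


if __name__ == "__main__":
    # Toy cost with a single peak at input 7; a real cost would be a timing.
    toy_cost = lambda x: -(x - 7) ** 2
    step = lambda x: [x - 1, x + 1]
    print(hill_climb(toy_cost, 0, step, 500, random.Random(1)))
```

Each call to `cost` corresponds to one run of the task under test, so the step budget translates directly into experiment time, which is consistent with the abstract's remark that the computational requirements are significant.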
Efficient fault-injection-based assessment of software-implemented hardware fault tolerance
With continuously shrinking semiconductor structure sizes and lower supply voltages, the per-device susceptibility to transient and permanent hardware faults is on the rise. A class of countermeasures with growing popularity is Software-Implemented Hardware Fault Tolerance (SIHFT), which avoids expensive hardware mechanisms and can be applied application-specifically. However, SIHFT can, against intuition, cause more harm than good, because its overhead in execution time and memory space also increases the figurative “attack surface” of the system. It turns out that application-specific configuration of SIHFT is in fact a necessity rather than just an advantage.
Consequently, target programs need to be analyzed for particularly critical spots to harden. SIHFT-hardened programs need to be measured and compared throughout all development phases of the program to observe reliability improvements or deteriorations over time. Additionally, SIHFT implementations need to be tested.
The contributions of this dissertation focus on Fault Injection (FI) as an assessment technique satisfying all these requirements: analysis, measurement and comparison, and test. I describe the design and implementation of an FI tool, named Fail*, that overcomes several shortcomings in the state of the art, and enables research on the general drawbacks of simulation-based FI. As demonstrated in four case studies in the context of SIHFT research, Fail* provides novel fine-grained analysis techniques that exploit the newly gained possibility to analyze FI results from complete fault-space exploration. These analysis techniques aid SIHFT design decisions on the level of program modules, functions, variables, source-code lines, or single machine instructions.
Based on the experience from the case studies, I address the problem of large computation efforts that accompany exhaustive fault-space exploration from two different angles: Firstly, I develop a heuristic fault-space pruning technique that allows freely trading the total FI-experiment count for result accuracy, while still providing information on all possible fault-space coordinates. Secondly, I speed up individual TAP-based FI experiments by improving the fast-forwarding operation by several orders of magnitude for most workloads. Finally, I dissect current practices in FI-based evaluation of SIHFT-hardened programs, identify three widespread pitfalls in the result interpretation, and advance the state of the art by defining a novel comparison metric.