
    Highly parallel computation

    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines, and current research focuses on which architectures are best suited to which classes of problems. The architectures designated multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed.
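    A minimal sketch of the contrast the abstract draws, using NumPy's vectorised arithmetic as a stand-in for SIMD execution and Python's multiprocessing as a stand-in for MIMD execution; the saxpy kernel, array size, and worker count are illustrative assumptions, not taken from the paper.

        # Illustrative contrast between the two execution models (not from the paper):
        # SIMD: one instruction stream applied to many data elements at once.
        # MIMD: several independent instruction streams on separate data.
        import numpy as np
        from multiprocessing import Pool

        def saxpy(args):
            # An arbitrary "near-homogeneous" kernel: y = a*x + y.
            a, x, y = args
            return a * x + y

        if __name__ == "__main__":
            a = 2.0
            x = np.arange(1_000_000, dtype=np.float64)
            y = np.ones_like(x)

            # SIMD-style: a single vectorised operation over the whole array.
            simd_result = a * x + y

            # MIMD-style: independent workers, each running its own instruction
            # stream on its own chunk of the data.
            chunks = [(a, xc, yc) for xc, yc in zip(np.array_split(x, 4),
                                                    np.array_split(y, 4))]
            with Pool(4) as pool:
                mimd_result = np.concatenate(pool.map(saxpy, chunks))

            assert np.allclose(simd_result, mimd_result)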

    The development and use of variables in mathematics and computer science

    There are a wide variety of uses of variables in mathematics which we cope with in practice through conventions and tacit assumptions. Experience with computers has made us articulate, criticise and develop these assumptions much more carefully. Historically, the term 'variable quantity' was introduced in the context of describing and calculating changing quantities which corresponded to phenomena in the observable world (e.g. the velocity or fluxion of a body moving under the inverse square law). The evolution of the concept has divorced it from these roots of reference and required us to establish the formal apparatus of interpretation and valuation. While the changes considered are highly structured this may be satisfactory, but computing power invites us to cope with change in vastly more complex, unstructured situations, such as in the simulation of 'real world' processes. We relate this challenge to the distinctive differences in the use of variables in mathematics and practical computing, and we develop a general framework in which all uses of variables can be described in a unified way.
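    The distinction the abstract draws can be made concrete in a few lines; the snippet below is our own illustration under the abstract's framing, not an example from the paper: a bound mathematical variable has no identity outside the expression that binds it, while a computing variable names a mutable location that takes successive values over time.

        # Illustration only (not from the paper).

        # Mathematical usage: 'i' is a bound variable ranging over a domain;
        # it has no existence outside the expression that binds it.
        total = sum(i * i for i in range(1, 11))   # corresponds to sum_{i=1}^{10} i^2

        # Computing usage: 'velocity' names a mutable location whose value
        # changes over time, as in a simulation of an observable quantity.
        velocity = 0.0
        dt, g = 0.1, 9.81
        for _ in range(10):                  # ten time steps of free fall
            velocity = velocity + g * dt     # the *same* variable takes new values
        print(total, velocity)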

    Modeling Streams-based Variants of Ant Colony Optimisation for Parallel Systems

    Wei Cheng, Frank Penczek, Clemens Grelck, Raimund Kirner, Bernd Scheuermann, Alex Shafarenko, 'Modeling Streams-based Variants of Ant Colony Optimisation for Parallel Systems', in Proceedings: 2nd HiPEAC Workshop on Feedback-Directed Compiler Optimization for Multi-Core Architectures, Berlin, Germany, 22 January 2013.

    In this paper we present the implementation of a concurrent ant colony optimisation based solver for the combinatorial Single Machine Total Weighted Tardiness Problem (ACO-SMTWTP). We introduce S-Net, a coordination language based on dataflow principles, report on the performance of the implementation, and compare it against a sequential and a parallel implementation of the same algorithm in C. As the workload of the optimisation algorithm is highly irregular, we consider this application to be an important use-case for runtime-measurement-directed optimisations of the coordination program as much as for guiding optimisations of numerical code.
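    For orientation, here is a minimal sequential sketch of the ACO-SMTWTP construction-and-update loop that the paper parallelises; the toy instance, parameter values, and position-based pheromone model are generic ACO assumptions chosen for illustration and say nothing about the S-Net coordination itself.

        # Minimal sequential ACO for the Single Machine Total Weighted Tardiness
        # Problem (SMTWTP). Instance and parameters are illustrative; the paper's
        # contribution is coordinating such ants as a streams-based S-Net program.
        import random

        p = [3, 2, 4, 1]        # processing times
        w = [4, 5, 3, 2]        # weights
        d = [4, 6, 8, 3]        # due dates
        n = len(p)

        def twt(seq):
            # Total weighted tardiness of a job sequence.
            t, cost = 0, 0
            for j in seq:
                t += p[j]
                cost += w[j] * max(0, t - d[j])
            return cost

        tau = [[1.0] * n for _ in range(n)]   # pheromone: tau[position][job]
        best_seq, best_cost = None, float("inf")

        for _ in range(200):                  # iterations
            for _ in range(10):               # ants (run concurrently in the paper)
                remaining = list(range(n))
                seq = []
                for pos in range(n):
                    weights = [tau[pos][j] for j in remaining]
                    j = random.choices(remaining, weights=weights)[0]
                    seq.append(j)
                    remaining.remove(j)
                cost = twt(seq)
                if cost < best_cost:
                    best_seq, best_cost = seq, cost
            # Evaporate, then reinforce the best-so-far sequence.
            for pos in range(n):
                for j in range(n):
                    tau[pos][j] *= 0.9
                tau[pos][best_seq[pos]] += 1.0 / (1.0 + best_cost)

        print(best_seq, best_cost)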

    Deployment of Deep Neural Networks on Dedicated Hardware Accelerators

    Deep Neural Networks (DNNs) have established themselves as powerful tools for a wide range of complex tasks, for example computer vision or natural language processing. DNNs are notoriously demanding on compute resources, and as a result dedicated hardware accelerators are developed for all use cases, from hyperscale cloud environments for the training of DNNs to inference devices in embedded systems. These accelerators implement intrinsics for complex operations directly in hardware; a common example are intrinsics for matrix multiplication. However, there exists a gap between the ecosystems of applications for deep learning practitioners and hardware accelerators. How DNNs can efficiently utilize the specialized hardware intrinsics is still mainly defined by human hardware and software experts. Methods to automatically utilize hardware intrinsics in DNN operators are a subject of active research.

    Existing literature often works with transformation-driven approaches, which aim to establish a sequence of program rewrites and data-layout transformations such that the hardware intrinsic can be used to compute the operator. However, the complexity of this task has not yet been explored, especially for less frequently used operators like Capsule Routing. Not only is the implementation of DNN operators with intrinsics challenging; their optimization on the target device is also difficult. Hardware-in-the-loop tools are often used for this problem: they use latency measurements of implementation candidates to find the fastest one. However, specialized accelerators can have memory and programming limitations, so that not every arithmetically correct implementation is a valid program for the accelerator. These invalid implementations can lead to unnecessarily long optimization times.

    This work investigates the complexity of transformation-driven processes to automatically embed hardware intrinsics into DNN operators, explored with a custom, graph-based intermediate representation (IR). While operators like Fully Connected Layers can be handled with reasonable effort, increasing operator complexity or advanced data-layout transformations can lead to scaling issues. Building on these insights, this work proposes a novel method, based on a dataflow analysis, to embed hardware intrinsics into DNN operators. The dataflow embedding method allows the exploration of how intrinsics and operators match without explicit transformations; from the results it can derive the data layout and program structure necessary to compute the operator with the intrinsic. A prototype implementation for a dedicated hardware accelerator demonstrates state-of-the-art performance for a wide range of convolutions, while being agnostic to the data layout. For some operators in the benchmark, the presented method can also generate alternative implementation strategies to improve hardware utilization, resulting in a geo-mean speed-up of ×2.813 while reducing the memory footprint. Lastly, by curating the initial set of possible implementations for the hardware-in-the-loop optimization, the median time-to-solution is reduced by a factor of ×2.40. At the same time, the possibility of prolonged searches due to a bad initial set of implementations is reduced, improving the optimization's robustness by ×2.35.
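    The kind of data-layout transformation the thesis studies can be illustrated with the classic im2col lowering, which rewrites a convolution so that a matrix-multiplication intrinsic can compute it; the NumPy sketch below is a generic example of the transformation-driven approach, not the thesis's dataflow-based embedding method or its target accelerator.

        # Generic im2col lowering of a 2-D convolution onto a matrix-multiply
        # "intrinsic" (here plain @ / np.matmul). Shapes and names are
        # illustrative; the thesis derives such layouts via dataflow analysis.
        import numpy as np

        def conv2d_via_matmul(x, k):
            # x: (H, W) input, k: (kh, kw) kernel, valid padding, stride 1.
            H, W = x.shape
            kh, kw = k.shape
            oh, ow = H - kh + 1, W - kw + 1
            # Data-layout transformation: gather every kh*kw patch into a row.
            cols = np.empty((oh * ow, kh * kw))
            for i in range(oh):
                for j in range(ow):
                    cols[i * ow + j] = x[i:i + kh, j:j + kw].ravel()
            # The operator is now a single matrix multiplication.
            return (cols @ k.ravel()).reshape(oh, ow)

        x = np.arange(25, dtype=float).reshape(5, 5)
        k = np.array([[1.0, 0.0], [0.0, -1.0]])
        out = conv2d_via_matmul(x, k)
        # Cross-check against a direct sliding-window computation.
        ref = np.array([[(x[i:i + 2, j:j + 2] * k).sum() for j in range(4)]
                        for i in range(4)])
        assert np.allclose(out, ref)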