    A methodology for speeding up matrix vector multiplication for single/multi-core architectures

    In this paper, a new methodology for computing dense matrix-vector multiplication (MVM) is presented for both embedded processors (without a SIMD unit) and general-purpose processors (single- and multi-core, with a SIMD unit). The methodology achieves higher execution speed than the state-of-the-art ATLAS library, with speedups of 1.2 to 1.45. This is achieved by fully exploiting the combination of software characteristics (e.g., data reuse) and hardware parameters (e.g., data cache associativity), which are considered simultaneously as one problem rather than separately, giving a smaller search space and high-quality solutions. The proposed methodology produces a different schedule for different values of the (i) number of data cache levels; (ii) data cache sizes; (iii) data cache associativities; (iv) data cache and main memory latencies; (v) data array layout of the matrix; and (vi) number of cores.
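
    To make the idea of a cache-aware MVM schedule concrete, the sketch below shows loop tiling for y = Ax in Python. It is a minimal illustration under stated assumptions, not the paper's methodology: the tile width `tile` is a hypothetical parameter standing in for the values the methodology would derive from cache sizes, associativities and latencies, and pure Python shows only the loop structure that enables reuse of x, not real cache effects.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def tiled_mvm(a: Matrix, x: Sequence[float], tile: int) -> List[float]:
    """Compute y = A @ x, traversing A in column tiles of width `tile`.

    Within one tile, the slice x[j0:j1] is reused by every row of A, which is
    the data-reuse pattern a cache-aware schedule tries to keep resident.
    `tile` is a hypothetical tuning parameter, not a value from the paper.
    """
    if tile <= 0:
        raise ValueError("tile must be positive")
    n_rows, n_cols = len(a), len(x)
    y = [0.0] * n_rows
    for j0 in range(0, n_cols, tile):
        j1 = min(j0 + tile, n_cols)
        x_tile = x[j0:j1]                # slice of x reused across all rows
        for i in range(n_rows):
            row = a[i]
            acc = 0.0
            for offset, xv in enumerate(x_tile):
                acc += row[j0 + offset] * xv
            y[i] += acc                  # accumulate this tile's partial dot product
    return y


if __name__ == "__main__":
    A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
    print(tiled_mvm(A, [1.0, 1.0, 1.0], tile=2))  # [6.0, 15.0]
```

    Every tile width gives the same result; only the memory access order changes, which is why the choice of tile can be driven purely by hardware parameters.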

    Edge detection using neural network arbitration

    A human observer is able to recognise and describe most parts of an object by its contour, if this is properly traced and reflects the shape of the object itself. Machine vision systems have approached this recognition task using a similar technique, which prompted the development of many diverse edge detection algorithms. The work described in this thesis is based on the visual observation that, as the image degrades, edge maps produced by different algorithms display different properties of the original image. Our objective is to improve the edge map through arbitration between edge maps produced by edge detection algorithms that are diverse in nature, approach and performance. As image processing tools are repeatedly applied to similar images, we believe this objective can be achieved by a learning process based on sample images. It is shown that such an approach is feasible, using an artificial neural network to perform the arbitration; the network is trained on sets extracted from sample images. The arbitration system is implemented on a parallel processing platform. The performance of the system is presented through examples of diverse types of image. Comparisons with a neural network edge detector (also developed within this thesis) and with conventional edge detectors show that the proposed system offers significant advantages.
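
    One simple way to picture arbitration between edge maps is a per-pixel classifier whose inputs are the responses of the different detectors at that pixel. The sketch below assumes a single logistic unit trained by stochastic gradient descent on labelled sample pixels; the thesis's actual network architecture and training sets are not described here, so `train_arbiter` and `arbitrate` are hypothetical names and the design is only illustrative. Detector responses are assumed to be normalised to [0, 1] and the edge maps to be pixel-aligned.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _score(w: Sequence[float], feats: Sequence[float]) -> float:
    # w holds one weight per detector followed by a bias term
    return w[-1] + sum(wi * fi for wi, fi in zip(w[:-1], feats))


def train_arbiter(samples: Sequence[Tuple[Sequence[float], int]],
                  epochs: int = 2000, lr: float = 0.5,
                  seed: int = 0) -> List[float]:
    """Fit a single logistic unit mapping per-pixel detector responses
    to an edge (1) / non-edge (0) label with stochastic gradient descent."""
    rng = random.Random(seed)
    n = len(samples[0][0])
    w = [rng.uniform(-0.1, 0.1) for _ in range(n + 1)]
    for _ in range(epochs):
        for feats, label in samples:
            err = sigmoid(_score(w, feats)) - label  # d(cross-entropy)/dz
            for k in range(n):
                w[k] -= lr * err * feats[k]
            w[-1] -= lr * err
    return w


def arbitrate(edge_maps: Sequence[Sequence[Sequence[float]]],
              w: Sequence[float], threshold: float = 0.5) -> List[List[int]]:
    """Combine pixel-aligned edge maps (one per detector) into one binary map."""
    rows, cols = len(edge_maps[0]), len(edge_maps[0][0])
    return [[1 if sigmoid(_score(w, [m[r][c] for m in edge_maps])) >= threshold else 0
             for c in range(cols)]
            for r in range(rows)]
```

    Trained on samples where only the first detector tracks the true labels, the unit learns a large weight for that detector and a near-zero weight for the other, which is the kind of reliance on the better-behaved detector that arbitration is meant to learn.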

    Parallel implementation of a virtual reality system on a transputer architecture

    A virtual reality is a computer model of an environment, actual or imagined, presented to a user as realistically as possible. Stereo goggles may be used to provide the user with a view of the modelled environment from within the environment, while a data glove is used to interact with it. To simulate reality on a computer, the machine has to produce realistic images rapidly, a requirement that usually necessitates expensive equipment. This thesis presents an implementation of a virtual reality system on a transputer architecture. The system is general, and is intended to support the development of various virtual environments. The three main components of the system are the output device drivers, the input device drivers, and the virtual world kernel; the last of these is responsible for simulating the virtual world. The rendering system is described in detail. Various methods for implementing the components of the graphics pipeline are discussed, and these are then generalised to use the parallel processing facilities provided by the transputer. A number of different decomposition techniques are implemented and compared, with the emphasis on rendering speed and interaction latency. In the best case, where almost linear speedup is obtained, a world containing over 250 polygons is rendered at 32 frames/second. The bandwidth of the transputer links is the major factor limiting speedup. An input device driver that uses a Power Glove is described, together with techniques for overcoming the limitations of this device and for interacting with the virtual world. The virtual world kernel is designed to make extensive use of the parallel processing facilities provided by transputers. It can support multiple worlds concurrently, and multiple users interacting with these worlds.
Two applications that were successfully implemented using this system are described. The design of the system is compared with other recently developed virtual reality systems, and features that are common to, or advantageous in, each system are discussed. The system described in this thesis compares favourably, particularly in its use of parallel processors.
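
    The abstract does not say which decomposition techniques were compared. As an assumed example only, the sketch below shows a common screen-space (sort-first) strip decomposition: the frame's scanlines are split into contiguous strips, one per processor, and each polygon is sent to every strip its vertical extent overlaps. The function names and the inclusive (ymin, ymax) convention are hypothetical.

```python
from typing import Dict, List, Tuple


def strip_bounds(height: int, workers: int) -> List[Tuple[int, int]]:
    """Split scanlines [0, height) into `workers` contiguous strips [y0, y1)."""
    base, extra = divmod(height, workers)
    bounds, y = [], 0
    for k in range(workers):
        h = base + (1 if k < extra else 0)   # spread the remainder over the first strips
        bounds.append((y, y + h))
        y += h
    return bounds


def assign_polygons(extents: Dict[str, Tuple[int, int]],
                    height: int, workers: int) -> List[List[str]]:
    """Sort-first assignment: each polygon goes to every strip that its
    inclusive screen-space y-extent (ymin, ymax) overlaps."""
    strips = strip_bounds(height, workers)
    work: List[List[str]] = [[] for _ in range(workers)]
    for name, (ymin, ymax) in extents.items():
        for k, (y0, y1) in enumerate(strips):
            if ymin < y1 and ymax >= y0:
                work[k].append(name)
    return work


if __name__ == "__main__":
    polys = {"floor": (0, 2), "table": (3, 5), "lamp": (8, 9)}
    print(assign_polygons(polys, height=10, workers=3))
    # [['floor', 'table'], ['table'], ['lamp']]
```

    Polygons that straddle strip boundaries are sent to more than one processor, so finer decompositions increase communication; where link bandwidth limits speedup, this is one trade-off such a scheme has to balance.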

    A unified programming system for a multi-paradigm parallel architecture

    Real-time image understanding and image generation require very large amounts of computing power. One way to meet these requirements is to use the power available from parallel computing systems. However, the performance of parallel machines is highly dependent on the algorithms being executed. Both image understanding and image generation involve a wide variety of algorithms, and a parallel machine suited to some of them may be unsuited to others. This thesis describes a novel heterogeneous parallel architecture optimised for image-based applications. It achieves its performance by combining two different forms of parallel architecture, fine-grain SIMD and coarse-grain MIMD, into a single architecture. In this way, the most appropriate computing resource can be matched to each algorithm in a given application. As important as the architecture itself is a method for programming it. This thesis describes a novel multi-paradigm programming language based on C++ that allows programs using both control and data parallelism to be expressed in a single coherent framework based on object-oriented programming. To demonstrate the utility of both the architecture and the programming system, two applications are examined, one from the field of image understanding and the other from image generation. These applications combine some novel algorithms with novel implementation approaches to provide the most effective mapping onto this architecture.

    Studies of algorithms and related imaging techniques for industrial inspection

    This thesis deals with algorithms and imaging techniques for use in automated industrial inspection. The work falls into two main areas: the first deals with general problems relating to typical inspection tasks, the second with specific applications, including the analysis of seals on plastic packets. The requirements of a general object location and inspection system are discussed initially in relation to the algorithms supplied with commercial systems, which often seem ad hoc. This is followed by detailed analyses of several corner and small hole detection algorithms. The features sought in a useful algorithm are: (1) high execution speed when implemented on a general-purpose microcomputer, (2) good accuracy in locating the desired features, (3) robustness when faced with poor-quality, noisy or cluttered images, and (4) the ability to distinguish genuine features from others that appear, superficially, to be similar. A program using these feature detectors to locate partially occluded machine parts in typical images is presented. The second main area of investigation, the detection of faults in heat-sealed food packets, has hitherto been largely overlooked. The main problem with these packets is that the cellophane wrapper is highly reflective, giving rise to large areas of glare in any off-camera image. Experience has shown that careful lighting arrangement alone never totally removes this problem. However, a simple arrangement of switched light beams, combined with computer processing, can almost totally eliminate the glare. This approach has been used to inspect packets in which faults are revealed by parts of the product inside showing through holes in the wrapper. Alternatively, by careful alignment of the light sources, the surface structure of the sealed part of a packet may be revealed.
This can reveal defects either through the absence of a regular pattern or through the presence of wrinkles running across the seal. Algorithms have been developed demonstrating each of these inspection tasks. Overall, the work presented in this thesis spans several traditional areas of interest, and also develops the techniques required for packet inspection and other situations where glare is a problem.
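
    The abstract does not specify how the switched-light images are processed. One plausible approach, assumed here for illustration, relies on specular glare moving with the light source: capture one frame per light source and keep the per-pixel minimum, so a pixel saturated by glare in one frame takes its darker, glare-free value from another. `suppress_glare` is a hypothetical name, and the frames are assumed to be registered greyscale images of equal size.

```python
from typing import List, Sequence

Image = List[List[int]]


def suppress_glare(frames: Sequence[Image]) -> Image:
    """Per-pixel minimum over registered frames, each captured with a
    different light source switched on. Assumes glare is bright and
    appears at different positions for different light sources."""
    if not frames:
        raise ValueError("need at least one frame")
    rows, cols = len(frames[0]), len(frames[0][0])
    if any(len(f) != rows or any(len(row) != cols for row in f) for f in frames):
        raise ValueError("frames must all have the same shape")
    return [[min(f[r][c] for f in frames) for c in range(cols)]
            for r in range(rows)]


if __name__ == "__main__":
    lit_left = [[255, 80], [90, 100]]    # glare (255) at top-left
    lit_right = [[85, 255], [90, 255]]   # glare down the right-hand column
    print(suppress_glare([lit_left, lit_right]))  # [[85, 80], [90, 100]]
```

    A limitation of this simple rule is that a pixel affected by glare in every frame stays bright, which is one reason the placement of the light sources would still matter.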

    Distributed video through telecommunication networks using fractal image compression techniques

    The research presented in this thesis investigates the use of fractal compression techniques for a real-time video distribution system. The motivation for this work was that the method has some useful properties that satisfy many requirements for video compression; in addition, as a novel technique, fractal compression has great potential. In this thesis, we first develop an understanding of the state of the art in image and video compression and describe the mathematical concepts and basic terminology of the fractal compression algorithm. Several schemes that aim to improve the algorithm for still images are then examined. Among these, two novel contributions are described. The first is the partitioning of the image into sections, which resulted in a significant reduction of the compression time. In the second, the median metric was considered as an alternative to the RMS metric but was not finally adopted, since the RMS proved to be a more efficient measure. The extension of the fractal compression algorithm from still images to image sequences is then examined, and three different schemes for reducing temporal redundancy in the video compression algorithm are described. The techniques described significantly reduce the execution time of the compression algorithm, although real-time execution has not yet been achieved. Finally, the basic concepts of distributed programming and networks, as basic elements of a video distribution system, are presented, and the hardware and software components of a fractal video distribution system are described. The implementation of the fractal compression algorithm on a TMS320C40 is also considered for its speed benefits, and it is found that a relatively large number of processors is needed for real-time execution.
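
    For readers unfamiliar with the basic fractal (PIFS) encoder, the sketch below shows its core step: for a range block, search the down-sampled domain blocks for the one whose affine map s·d + o best matches the range block under the RMS metric mentioned above. It is a minimal sketch, not the thesis's implementation: the function names are hypothetical, the eight block isometries are omitted, and practical encoders commonly also bound |s| below 1 so that decoding converges, a bound omitted here.

```python
import math
from typing import List, Optional, Sequence, Tuple

Image = Sequence[Sequence[float]]


def fit_affine(domain: Sequence[float], target: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares contrast s and brightness o minimising
    sum((s*d + o - r)^2); returns (s, o, rms_error)."""
    n = len(domain)
    sd, sr = sum(domain), sum(target)
    sdd = sum(d * d for d in domain)
    sdr = sum(d * r for d, r in zip(domain, target))
    denom = n * sdd - sd * sd
    s = 0.0 if denom == 0 else (n * sdr - sd * sr) / denom  # flat domain: brightness only
    o = (sr - s * sd) / n
    sq_err = sum((s * d + o - r) ** 2 for d, r in zip(domain, target))
    return s, o, math.sqrt(sq_err / n)


def range_block(img: Image, r0: int, c0: int, size: int) -> List[float]:
    """Flatten the size x size block with top-left corner (r0, c0)."""
    return [img[r][c] for r in range(r0, r0 + size) for c in range(c0, c0 + size)]


def shrunk_domain(img: Image, r0: int, c0: int, size: int) -> List[float]:
    """Average each 2x2 cell of the (2*size) x (2*size) block at (r0, c0)."""
    return [(img[r0 + 2 * i][c0 + 2 * j] + img[r0 + 2 * i][c0 + 2 * j + 1]
             + img[r0 + 2 * i + 1][c0 + 2 * j] + img[r0 + 2 * i + 1][c0 + 2 * j + 1]) / 4.0
            for i in range(size) for j in range(size)]


def best_domain(img: Image, r0: int, c0: int, size: int,
                step: int = 1) -> Tuple[int, int, float, float, float]:
    """Exhaustively search domain positions; return (dr, dc, s, o, rms)
    for the domain block that best matches the range block at (r0, c0)."""
    target = range_block(img, r0, c0, size)
    rows, cols = len(img), len(img[0])
    best: Optional[Tuple[int, int, float, float, float]] = None
    for dr in range(0, rows - 2 * size + 1, step):
        for dc in range(0, cols - 2 * size + 1, step):
            s, o, rms = fit_affine(shrunk_domain(img, dr, dc, size), target)
            if best is None or rms < best[4]:
                best = (dr, dc, s, o, rms)
    if best is None:
        raise ValueError("image too small for a domain block of this size")
    return best
```

    The stored maps (dr, dc, s, o) are what the decoder iterates from an arbitrary starting image. The exhaustive double loop over domain positions is what makes encoding slow, which is why schemes that shrink the search, such as partitioning the image, can pay off.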

    A high-performance matrix-matrix multiplication methodology for CPU and GPU architectures

    Current compilers cannot generate code that competes in efficiency with hand-tuned code, even for a simple kernel like matrix–matrix multiplication (MMM). A key step in program optimization is the estimation of optimal values for parameters such as tile sizes and the number of levels of tiling. Selecting values for these scheduling parameters is a very difficult and time-consuming task, since the parameter values depend on each other; this is why they are found using search methods and empirical techniques. To overcome this problem, the scheduling sub-problems must be optimized together, as one problem and not separately. In this paper, an MMM methodology is presented in which the optimum scheduling parameters are found by theoretically reducing the search space, while the major scheduling sub-problems are addressed together as one problem, according to the hardware architecture parameters and the input size; for different hardware architecture parameters and/or input sizes, a different implementation is produced. This is achieved by fully exploiting the software characteristics (e.g., data reuse) and hardware architecture parameters (e.g., data cache sizes and associativities), giving high-quality solutions and a smaller search space. The methodology applies to a wide range of CPU and GPU architectures.
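
    The sketch below illustrates the two ingredients named above: a tiled MMM loop nest and a tile size derived from a hardware parameter. The rule in `pick_tile` is a common textbook capacity heuristic (three T×T tiles of A, B and C must fit in the cache), assumed here for contrast; it is not the paper's method, and it ignores associativity, multiple cache levels and GPU specifics, which is exactly the kind of simplification the methodology aims to avoid. Both function names are hypothetical.

```python
import math
from typing import List, Sequence


def pick_tile(cache_bytes: int, elem_bytes: int = 8) -> int:
    """Largest T such that three T x T tiles (A, B and C) fit in the cache.

    Textbook capacity heuristic only: it ignores associativity, conflict
    misses and multi-level caches.
    """
    return max(1, math.isqrt(cache_bytes // (3 * elem_bytes)))


def tiled_matmul(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]],
                 tile: int) -> List[List[float]]:
    """C = A @ B with all three loops tiled by `tile` (i-p-j order inside)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                # Multiply the (i0, p0) tile of A by the (p0, j0) tile of B.
                for i in range(i0, min(i0 + tile, n)):
                    a_row, c_row = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip, b_row = a_row[p], b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            c_row[j] += a_ip * b_row[j]
    return c


if __name__ == "__main__":
    t = pick_tile(32 * 1024)                  # e.g. a 32 KiB L1 data cache
    print(t)                                  # 36
    print(tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], tile=t))
    # [[19.0, 22.0], [43.0, 50.0]]
```

    For a 32 KiB data cache and 8-byte doubles, `pick_tile(32 * 1024)` returns 36, since 3 × 36² × 8 = 31,104 bytes fits but a tile of 37 (32,856 bytes) would not.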

    Earth imaging with microsatellites: An investigation, design, implementation and in-orbit demonstration of electronic imaging systems for earth observation on-board low-cost microsatellites.

    This research programme has studied the possibilities and difficulties of using 50 kg microsatellites to perform remote imaging of the Earth. The design constraints of these missions are quite different from those encountered in larger, conventional spacecraft. While the main attractions of microsatellites are low cost and fast response times, they present the following key limitations: payload mass under 5 kg; continuous payload power under 5 W, with peak power up to 15 W; narrow communications bandwidths (9.6/38.4 kbps); attitude control to within 5°; and no moving mechanics. The most significant factor is the limited attitude stability. Without sub-degree attitude control, conventional scanning imaging systems cannot preserve scene geometry, and are therefore poorly suited to current microsatellite capabilities. The foremost conclusion of this thesis is that electronic cameras, which capture entire scenes in a single operation, must be used to overcome the effects of the satellite's motion. The potential applications of electronic cameras, including microsatellite remote sensing, have expanded rapidly with the recent availability of high-sensitivity field-array CCD (charge-coupled device) image sensors. The research programme has established the techniques and architectures that CCD sensors, cameras and complete imaging systems need in order to fulfil scientific and commercial remote sensing requirements despite the difficult conditions on microsatellites. The author has refined these theories by designing, building and operating in orbit five generations of electronic cameras. The major objective of meteorological-scale imaging was conclusively demonstrated by the Earth imaging camera flown on the UoSAT-5 spacecraft in 1991. Improved cameras have since been carried by the KITSAT-1 (1992) and PoSAT-1 (1993) microsatellites.
PoSAT-1 also flies a medium-resolution camera (200 metres) which, despite its complete success, has highlighted certain limitations of microsatellites for high-resolution remote sensing. A reworked and extensively modularised design has been developed for the four camera systems deployed on the FASat-Alfa mission (1995). Based on the success of these missions, this thesis presents many recommendations for the design of microsatellite imaging systems. The novelty of this research programme has been the principle of designing practical camera systems to fit an existing, highly restrictive satellite platform, rather than conceiving a fictitious small satellite to support a high-performance scanning imager. This pragmatic approach has resulted in the first incontestable demonstrations of the feasibility of remote sensing of the Earth from inexpensive microsatellites.
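
    To put the quoted link rates in perspective, the short calculation below estimates the raw downlink time of one image. The 1024 × 1024, 8-bit image is a hypothetical example, not a figure from the thesis, and the estimate ignores protocol overhead, compression and limited ground-station pass time.

```python
def downlink_seconds(width: int, height: int, bits_per_pixel: int,
                     link_bps: float) -> float:
    """Raw transmission time for one uncompressed image, ignoring
    protocol overhead, compression and ground-station pass limits."""
    if link_bps <= 0:
        raise ValueError("link rate must be positive")
    return width * height * bits_per_pixel / link_bps


if __name__ == "__main__":
    for rate in (9_600, 38_400):                       # the two link rates quoted above
        t = downlink_seconds(1024, 1024, 8, rate)      # hypothetical image size
        print(f"{rate} bps: {t:.0f} s ({t / 60:.1f} min)")
```

    Under these assumptions, one uncompressed image of 8,388,608 bits takes about 874 s (roughly 14.6 minutes) at 9.6 kbps, or about 218 s at 38.4 kbps.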