Simulating a Pipelined Reconfigurable Mesh on a Linear Array with a Reconfigurable Pipelined Bus System
Due to the unidirectional nature of propagation and predictable delays, optically pipelined buses have been gaining more attention. Many models that use reconfigurable optically pipelined buses have been proposed over time. The reconfigurable nature of these models lets them change both the functionality of their components and the structure connecting those components at every step of computation. Both one-dimensional and k-dimensional models have been proposed in the literature. Although equivalence between various one-dimensional models, and between different two-dimensional models, has been established, so far there has been no attempt to explore the relationship between a one-dimensional model and a two-dimensional model. This work shows that a move from one to two or more dimensions does not increase the volume of communication between the processors, as they communicate in a pipelined manner on the same optical bus. When moving from two dimensions to one dimension, the challenge is to map the processors so that those belonging to a two-dimensional bus segment are contiguous and in the same order on the one-dimensional model. This does not increase communication overhead, as the processors, instead of communicating on two-dimensional buses, now communicate on a linear one-dimensional bus structure. To explore the relationship between one-dimensional and two-dimensional models, a commonly used model, the Linear Array with a Reconfigurable Pipelined Bus System (LARPBS), and its two-dimensional counterpart, the Pipelined Reconfigurable Mesh (PR-Mesh), are chosen. A simulation of a two-dimensional PR-Mesh on a one-dimensional LARPBS is presented to establish the complexity of the models with respect to one another, and to determine the efficiency with which the LARPBS can simulate the PR-Mesh
Parameterized Implementation of K-means Clustering on Reconfigurable Systems
Processing power of pattern classification algorithms on conventional platforms has not been able to keep up with exponentially growing datasets. However, algorithms such as k-means clustering include significant potential parallelism that could be exploited to enhance processing speed on conventional platforms. A better and effective solution to speed-up the algorithm performance is the use of a hardware assist since parallel kernels can be partitioned and concurrently run on hardware as opposed to the sequential software flow. A parameterized hardware implementation of k-means clustering is presented as a proof of concept on the Pilchard Reconfigurable computing system. The hardware implementation is shown to have speedups of about 500 over conventional implementations on a general-purpose processor. A scalability analysis is done to provide a future direction to take the current implementation of 3 classes and scale it to over N classes
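To make the parallelism mentioned in this abstract concrete, here is a minimal software sketch of k-means in plain Python. It is not the Pilchard hardware design; the function names (`assign`, `update`, `kmeans`) and the convergence test are assumptions chosen for illustration. The point to notice is that the assignment step treats every point independently, which is the kind of kernel that could be partitioned across hardware units.

```python
def assign(points, centroids):
    """Assignment step: each point independently picks its nearest centroid.

    Every iteration of the outer loop is independent of the others, so this
    is the part that lends itself to parallel execution.
    """
    labels = []
    for p in points:
        # Squared Euclidean distance to each centroid.
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            # Keep an empty cluster's centroid where it was (an assumption).
            new_centroids.append(old)
            continue
        dim = len(old)
        new_centroids.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
        )
    return new_centroids


def kmeans(points, centroids, max_iters=100):
    """Alternate assignment and update until the centroids stop moving."""
    centroids = [tuple(c) for c in centroids]
    labels = assign(points, centroids)
    for _ in range(max_iters):
        updated = update(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
        labels = assign(points, centroids)
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(pts, [(0, 0), (10, 10)]))
```

The number of classes is simply the length of the initial centroid list, so the same sketch runs unchanged for 3 classes or for more.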
Efficient reconfigurable architectures for 3D medical image compression
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US), has generated a massive amount of volumetric data. These have provided an impetus to the development of other applications, in particular telemedicine and teleradiology. In these fields, medical image compression is important since both efficient storage and transmission of data through high-bandwidth digital communication lines are of crucial importance. Despite their advantages, most 3-D medical imaging algorithms are computationally intensive, with matrix transformation as the most fundamental operation involved in the transform-based methods. Therefore, there is a real need for high-performance systems, whilst keeping architectures flexible to allow for quick upgradeability with real-time applications. Moreover, in order to obtain efficient solutions for large medical volume data, an efficient implementation of these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs), has been proposed as a viable system building block in the construction of high-performance systems at an economical price. Consequently, FPGAs seem an ideal candidate to harness and exploit their inherent advantages such as massive parallelism capabilities, multimillion gate counts, and special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for the 3-D Haar wavelet transform (HWT) have been proposed, based on transpose-based computation and partial reconfiguration, suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced. A comparative study of non-partial and partial reconfiguration implementations has shown that DPR offers many advantages and leads to a compelling solution for implementing computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped to small hardware resources, and the area, power consumption as well as maximum frequency are optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT) with three design strategies has been proposed: direct implementation of pseudo-code with a sequential or pipelined description, and a block random access memory (BRAM)-based method. An analysis with various medical imaging modalities has been carried out. The image de-noising implementation using FRAT shows promising results in reducing Gaussian white noise in medical images. In terms of hardware implementation, promising trade-offs on maximum frequency, throughput and area are also achieved. Furthermore, a novel hardware implementation of a 3-D medical image compression system with context-based adaptive variable length coding (CAVLC) has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete wavelet transform (DWT) with lifting scheme (LS) for transform blocks reveals that the 3-D IT demonstrates better computational complexity than the 3-D DWT, whilst the 3-D DWT with LS exhibits a lossless compression that is significantly useful for medical image compression. Additionally, an architecture of CAVLC that is capable of compressing high-definition (HD) images in real-time without any buffer between the quantiser and the entropy coder is proposed. Through a judicious parallelisation, promising results have been obtained with limited resources. In summary, this research tackles the issue of massive 3-D medical volume data that requires compression, as well as hardware implementation to accelerate the slowest operations in the system. Results obtained also reveal a significant achievement in terms of the architecture efficiency and applications performance. Ministry of Higher Education Malaysia (MOHE), Universiti Tun Hussein Onn Malaysia (UTHM) and the British Counci
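As a rough software illustration of the transpose-based Haar wavelet computation named in this abstract, the sketch below applies a one-level 1-D Haar step along rows, transposes, and applies it again. It is a 2-D simplification written for this listing, not the thesis's 3-D FPGA architecture; the averaging/differencing normalisation (division by 2) and all function names are assumptions.

```python
def haar_1d(x):
    """One level of the Haar transform on an even-length sequence.

    Returns pairwise averages followed by pairwise half-differences.
    """
    half = len(x) // 2
    avg = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(half)]
    diff = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(half)]
    return avg + diff


def inverse_haar_1d(y):
    """Invert haar_1d: rebuild each pair from its average and half-difference."""
    half = len(y) // 2
    out = []
    for a, d in zip(y[:half], y[half:]):
        out += [a + d, a - d]
    return out


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def haar_2d(m):
    """Transpose-based 2-D step: transform rows, transpose, transform again."""
    rows_done = [haar_1d(r) for r in m]
    cols_done = [haar_1d(c) for c in transpose(rows_done)]
    return transpose(cols_done)


if __name__ == "__main__":
    print(haar_1d([4, 2, 6, 8]))
    print(haar_2d([[1, 1], [3, 3]]))
```

Reusing one 1-D unit with a transpose between passes is the idea the sketch is meant to convey; a third pass along the remaining axis would extend it to a volume.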
Optimization of a hardware/software coprocessing platform for EEG eyeblink detection and removal
The feasibility of implementing a real-time system for removing eyeblink artifacts from electroencephalogram (EEG) recordings utilizing a hardware/software coprocessing platform was investigated. A software based wavelet and independent component analysis (ICA) eyeblink detection and removal process was extended to enable variation in its processing parameters. Exploiting the efficiency of hardware and the reconfigurability of software, it was ported to a field programmable gate array (FPGA) development platform which was found to be capable of implementing the revised algorithm, although not in real-time. The implemented hardware and software solution was applied to a collection of both simulated and clinically acquired EEG data with known artifact and waveform characteristics to assess its speed and accuracy. Configured for optimal accuracy in terms of minimal false positives and negatives as well as maintaining the integrity of the underlying EEG, especially when encountering EEG waveform patterns with an appearance similar to eyeblink artifacts, the system was capable of processing a 10 second EEG epoch in an average of 123 seconds. Configured for efficiency, but with diminished accuracy, the system required an average of 34 seconds. Varying the ICA contrast function showed that the gaussian nonlinearity provided the best combination of reliability and accuracy, albeit with a long execution time. The cubic nonlinearity was fast, but unreliable, while the hyperbolic tangent contrast function frequently diverged. It is believed that the utilization of programmable logic with increased logic capacity and processing speed may enable this approach to achieve the objective of real-time operation
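For readers unfamiliar with the three contrast nonlinearities compared in this abstract, the sketch below writes them out in the form commonly used in fixed-point ICA algorithms. Whether the authors used exactly these forms or scalings is not stated in the abstract, so the definitions and the function names here are assumptions.

```python
import math


def cubic(u):
    """Cubic nonlinearity: cheap to evaluate, but sensitive to outliers."""
    return u ** 3


def tanh_nl(u):
    """Hyperbolic tangent nonlinearity (unit slope parameter assumed)."""
    return math.tanh(u)


def gauss(u):
    """Gaussian nonlinearity: u * exp(-u^2 / 2); costs one exponential."""
    return u * math.exp(-u * u / 2.0)


if __name__ == "__main__":
    for name, g in (("cubic", cubic), ("tanh", tanh_nl), ("gauss", gauss)):
        print(name, [round(g(u), 4) for u in (-2.0, 0.0, 2.0)])
```

The cubic grows without bound for large inputs, while the other two stay bounded.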
Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. In the past two decades, there has been a huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Processing some of their applications, such as medical imaging, is computationally intensive, power hungry and requires a large amount of memory, which creates a high demand for efficient algorithm implementation, low-power architecture and acceleration. Recently, some MAAs such as the Finite Ridgelet Transform (FRIT) and Haar Wavelet Transform (HWT) have become very popular, and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms, particularly when addressing large problems, is becoming very challenging and consumes a lot of power, which leads to a number of issues including mobility and reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimisation and awareness at all levels of abstraction in the design flow.
The most important achievements of the work presented in this thesis are summarised here.
Two factorisation methodologies for HWT, called HWT Factorisation Method1 (HWTFM1) and HWT Factorisation Method2 (HWTFM2), have been explored to increase the number of zeros and reduce hardware resources. In addition, two novel, efficient and optimised architectures for the proposed methodologies, based on Distributed Arithmetic (DA) principles, have been proposed. The evaluation of the architectural results has shown that the proposed architectures reduce the arithmetic calculations (additions/subtractions) by 33% and 25% respectively compared to direct implementation of HWT, and outperform existing results. The proposed HWTFM2 is implemented on advanced and low-power FPGA devices using the Handel-C language. The FPGA implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for the Finite Radon Transform (FRAT) has also been proposed. The proposed architecture is integrated with the developed HWT architecture to build an optimised architecture for FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient implementation on different FPGA devices. The performance of the proposed FRIT architecture has been evaluated, and the results outperformed some other existing architectures. Both FRAT and FRIT architectures have been implemented on FPGAs using the Handel-C language. The evaluation of both architectures has shown that the obtained results outperformed existing results by almost 10% in terms of frequency and area. The proposed architectures are also applied to image data (256 × 256), and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes.
Two architectures for cyclic convolution based on systolic arrays, using parallelism and pipelining, have been proposed; they can be used as the main building block for the proposed FRIT architecture. The first proposed architecture is a linear systolic array with a pipelining process, and the second is a systolic array with a parallel process. The second architecture reduces the number of registers by 42% compared to the first architecture, and both architectures outperformed other existing results. The proposed pipelined architecture has been implemented on different FPGA devices with vector size (N) 4, 8, 16, 32 and word-length (W=8). The implementation results have shown a significant improvement and outperformed other existing results.
Finally, an in-depth evaluation of a high-level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called the functional level power modelling approach, has been presented. The mathematical techniques that form the basis of the proposed power modelling have been validated by a range of custom IP cores. The proposed power modelling is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behaviour of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on various FPGA platforms. Based on the results achieved, the accuracy of the proposed model is almost 99% for the Dynamic Power (DP) components of all IP cores. Thomas Gerald Gray Charitable Trus
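The cyclic convolution that this abstract uses as a building block can be stated directly from its definition, y[n] = Σ_k x[k] · h[(n − k) mod N]. The sketch below is a plain software reference for that definition, assuming equal-length inputs; it does not model the systolic-array architectures, and the function name is hypothetical.

```python
def cyclic_convolution(x, h):
    """Length-N cyclic (circular) convolution of two equal-length sequences.

    y[n] = sum over k of x[k] * h[(n - k) mod N]
    """
    n_len = len(x)
    if len(h) != n_len:
        raise ValueError("x and h must have the same length")
    return [
        sum(x[k] * h[(n - k) % n_len] for k in range(n_len))
        for n in range(n_len)
    ]


if __name__ == "__main__":
    # Vector size N = 4, one of the sizes listed in the abstract.
    print(cyclic_convolution([1, 2, 3, 4], [0, 1, 0, 0]))
```

Convolving with a one-sample delay rotates the input by one position, which is a quick sanity check on the index wrap-around. Each output sample is an independent sum of N products, which is what makes the operation a natural fit for parallel and pipelined arrays.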
Automated Nuclei Segmentation of Breast Cancer Histopathology
Automated detection and segmentation of cell nuclei is an essential step in breast cancer histopathology, improving accuracy, speed, level of automation and adaptability to new applications. The goal of this paper is to develop efficient and accurate algorithms for detecting and segmenting cell nuclei in 2-D histological images. In this paper we demonstrate the utility of our nuclear segmentation algorithm in accurate extraction of nuclear features for (a) automated grading of breast cancer, and (b) distinguishing between cancerous and benign breast histology specimens. To address this, the scheme integrates image information across three different scales: (1) low-level information based on pixel values, (2) high-level information based on relationships between pixels for object detection, and (3) domain-specific information based on relationships between histological structures. Low-level information is utilized by a Bayesian classifier to generate the likelihood that each pixel belongs to an object of interest. High-level information is extracted in two ways: (i) by a level-set algorithm, where a contour is evolved in the likelihood scenes generated by the Bayesian classifier to identify object boundaries, and (ii) by a template matching algorithm, where shape models are used to identify glands and nuclei from the low-level likelihood scenes. Structural constraints are imposed via domain-specific knowledge in order to verify whether the detected objects do indeed belong to structures of interest. The efficiency of our segmentation algorithm is evaluated by comparing breast cancer grading and benign vs. cancer discrimination accuracies with corresponding accuracies obtained via manual detection and segmentation of glands and nuclei
Parametric Dense Stereovision Implementation on a System-on Chip (SoC)
This paper proposes a novel hardware implementation of a dense recovery of stereovision 3D measurements. Traditionally 3D stereo systems have imposed the maximum number of stereo correspondences, introducing a large restriction on artificial vision algorithms. The proposed system-on-chip (SoC) provides great performance and efficiency, with a scalable architecture available for many different situations, addressing real time processing of stereo image flow. Using double buffering techniques properly combined with pipelined processing, the use of reconfigurable hardware achieves a parametrisable SoC which gives the designer the opportunity to decide its right dimension and features. The proposed architecture does not need any external memory because the processing is done as image flow arrives. Our SoC provides 3D data directly without the storage of whole stereo images. Our goal is to obtain high processing speed while maintaining the accuracy of 3D data using minimum resources. Configurable parameters may be controlled by later/parallel stages of the vision algorithm executed on an embedded processor. Considering hardware FPGA clock of 100 MHz, image flows up to 50 frames per second (fps) of dense stereo maps of more than 30,000 depth points could be obtained considering 2 Mpix images, with a minimum initial latency. The implementation of computer vision algorithms on reconfigurable hardware, explicitly low level processing, opens up the prospect of its use in autonomous systems, and they can act as a coprocessor to reconstruct 3D images with high density information in real time