Time efficient segmented technique for dynamic programming based algorithms with FPGA implementation
© 2019 World Scientific Publishing Company. Although dynamic programming (DP) is an optimization approach used to solve complex problems fast, the time it requires is still not efficient and grows polynomially with the size of the input. In this contribution, we improve the computation time of dynamic programming based algorithms by proposing a novel technique called SDP (Segmented Dynamic Programming). SDP finds the best way of splitting the compared sequences into segments and then applies the dynamic programming algorithm to each segment individually, which reduces the computation time dramatically. SDP may be applied to any dynamic programming based algorithm to improve its computation time. As case studies, we apply the SDP technique to two different dynamic programming based algorithms: Needleman-Wunsch (NW), the widely used program for optimal sequence alignment, and the LCS algorithm, which finds the Longest Common Subsequence between two input strings. The results show that applying the SDP technique in conjunction with the DP based algorithms improves the computation time by up to 80% in comparison to the sole DP algorithms, with only a small or negligible degradation in the comparison results. This degradation is controllable and depends on the number of split segments, which is an input parameter. In addition, we compare our results with the well-known heuristic FASTA sequence alignment algorithm GGSEARCH, and show that our results are much closer to the optimal results than those of GGSEARCH. The results hold independently of the sequence lengths and their level of similarity. To show the functionality of our technique in hardware and to verify the results, we implement it on the Xilinx Zynq-7000 FPGA.
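To make the segmentation idea concrete, here is a minimal Python sketch applied to LCS. It is not the authors' SDP: the abstract says SDP searches for the best split points, whereas the hypothetical helper `split_even` below simply cuts both strings into k nearly equal pieces. Because the per-segment results are concatenated in order, the segmented score is always a lower bound on the true LCS length; that gap is the accuracy traded for roughly k-fold less DP work.

```python
def lcs_length(a: str, b: str) -> int:
    """Classic O(len(a) * len(b)) LCS dynamic programme using two rolling rows."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def split_even(s: str, k: int) -> list[str]:
    """Hypothetical splitter: cut s into k contiguous, nearly equal pieces."""
    n = len(s)
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [s[bounds[i]:bounds[i + 1]] for i in range(k)]


def segmented_lcs(a: str, b: str, k: int) -> int:
    """Run the LCS DP independently on each aligned pair of segments and sum.

    Each pair costs about (len(a)/k) * (len(b)/k) cells, so total work drops
    by roughly a factor of k; matches that cross a segment boundary are lost,
    so the result never exceeds lcs_length(a, b).
    """
    return sum(lcs_length(sa, sb)
               for sa, sb in zip(split_even(a, k), split_even(b, k)))
```

For identical inputs every segment pair matches exactly, so `segmented_lcs(s, s, k)` returns `len(s)` for any k; for dissimilar inputs the difference from `lcs_length` is the controllable degradation the abstract describes.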
Efficient architectures and power modelling of multiresolution analysis algorithms on FPGA
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. In the past two decades, there has been a huge amount of interest in Multiresolution Analysis Algorithms (MAAs) and their applications. Some of these applications, such as medical imaging, are computationally intensive and power hungry and require a large amount of memory, which creates a high demand for efficient algorithm implementation, low-power architectures and acceleration. Recently, some MAAs such as the Finite Ridgelet Transform (FRIT) and the Haar Wavelet Transform (HWT) have become very popular, and they are suitable for a number of image processing applications such as detection of line singularities and contiguous edges, edge detection (useful for compression and feature detection), and medical image denoising and segmentation. Efficient hardware implementation and acceleration of these algorithms, particularly when addressing large problems, is becoming very challenging and consumes a lot of power, which leads to a number of issues including mobility and reliability concerns. To overcome the computation problems, Field Programmable Gate Arrays (FPGAs) are the technology of choice for accelerating computationally intensive applications due to their high performance. Addressing the power issue requires optimisation and awareness at all levels of abstraction in the design flow.
The most important achievements of the work presented in this thesis are summarised here.
Two factorisation methodologies for HWT, called HWT Factorisation Method 1 (HWTFM1) and HWT Factorisation Method 2 (HWTFM2), have been explored to increase the number of zeros and reduce hardware resources. In addition, two novel, efficient and optimised architectures for the proposed methodologies, based on Distributed Arithmetic (DA) principles, have been proposed. Evaluation of the architectural results has shown that the proposed architectures reduce the arithmetic operations (additions/subtractions) by 33% and 25% respectively compared to a direct implementation of HWT, and outperform existing results. The proposed HWTFM2 has been implemented on advanced, low-power FPGA devices using the Handel-C language. The FPGA implementation results have outperformed other existing results in terms of area and maximum frequency. In addition, a novel efficient architecture for the Finite Radon Transform (FRAT) has been proposed. This architecture is integrated with the developed HWT architecture to build an optimised architecture for FRIT. Strategies such as parallelism and pipelining have been deployed at the architectural level for efficient implementation on different FPGA devices. The performance of the proposed FRIT architecture has been evaluated, and the results outperformed some other existing architectures. Both the FRAT and FRIT architectures have been implemented on FPGAs using Handel-C. The evaluation of both architectures has shown that the obtained results outperformed existing results by almost 10% in terms of frequency and area. The proposed architectures are also applied to image data (256 × 256), and their Peak Signal to Noise Ratio (PSNR) is evaluated for quality purposes.
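For reference, a direct (unfactorised) Haar transform is just repeated pairwise sums and differences. The Python sketch below uses the unnormalised integer form, omitting the 1/√2 scaling, which is an assumption often made for integer hardware; it counts the additions/subtractions so that the baseline cost of 2(N - 1) operations for a length-N signal (N a power of two) is visible. It does not reproduce HWTFM1, HWTFM2 or the Distributed Arithmetic architectures.

```python
def haar_1d(x: list[int]) -> tuple[list[int], int]:
    """Full-depth, unnormalised 1-D Haar transform of a length-2**L signal.

    Returns (coefficients, ops): the coefficients are ordered
    [overall sum, coarsest detail, ..., finest details], and ops is the
    number of additions/subtractions performed.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(x)
    details: list[int] = []
    ops = 0
    while len(approx) > 1:
        half = len(approx) // 2
        sums = [approx[2 * i] + approx[2 * i + 1] for i in range(half)]
        diffs = [approx[2 * i] - approx[2 * i + 1] for i in range(half)]
        ops += len(sums) + len(diffs)
        details = diffs + details  # coarser details are placed in front
        approx = sums              # recurse on the low-pass half
    return approx + details, ops
```

`haar_1d([1, 2, 3, 4])` returns `([10, -4, -1, -1], 6)`. Applying the function to every row and then every column of an image gives a separable (standard) 2-D decomposition.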
Two architectures for cyclic convolution based on systolic arrays, using parallelism and pipelining, have been proposed; they can be used as the main building block of the proposed FRIT architecture. The first is a linear systolic array with pipelined processing, and the second is a systolic array with parallel processing. The second architecture reduces the number of registers by 42% compared to the first, and both architectures outperformed other existing results. The proposed pipelined architecture has been implemented on different FPGA devices with vector sizes (N) of 4, 8, 16 and 32 and a word length of W = 8. The implementation results have shown a significant improvement and outperformed other existing results.
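A behavioural Python sketch of what the cyclic convolution block computes, written as a data-flow model of a linear array of N multiply-accumulate processing elements (PEs). It is an illustration under assumptions, not the thesis's architecture: the PE arrangement, the preloaded circular shift register and the function names are hypothetical, and neither pipeline latency nor the register savings of the parallel variant is modelled. The `word_length` check mirrors the W = 8 setting above.

```python
def cyclic_convolution(x, h):
    """Reference definition: y[n] = sum_k h[k] * x[(n - k) mod N]."""
    n = len(x)
    if len(h) != n:
        raise ValueError("x and h must have the same length N")
    return [sum(h[k] * x[(i - k) % n] for k in range(n)) for i in range(n)]


def systolic_cyclic_convolution(x, h, word_length=8):
    """Data-flow model of a linear array of N MAC processing elements.

    PE k permanently holds h[k]. At step t a circular shift register presents
    x[(t - k) mod N] to PE k, and the partial sum ripples from PE 0 to PE N-1,
    producing one output per step. Timing is not modelled.
    """
    n = len(x)
    if len(h) != n:
        raise ValueError("x and h must have the same length N")
    lo, hi = -(1 << (word_length - 1)), (1 << (word_length - 1)) - 1
    if any(not lo <= v <= hi for v in list(x) + list(h)):
        raise ValueError(f"inputs must fit in {word_length}-bit two's complement")
    # Preload so that at t = 0 PE k sees x[(0 - k) mod N].
    shift_reg = [x[(-k) % n] for k in range(n)]
    y = []
    for _ in range(n):
        partial = 0
        for k in range(n):  # partial sum flows PE 0 -> PE N-1
            partial += h[k] * shift_reg[k]
        y.append(partial)
        # Circular shift: PE k takes PE k-1's sample; PE 0 takes the value
        # leaving PE N-1, which is x[(t + 1) mod N].
        shift_reg = [shift_reg[-1]] + shift_reg[:-1]
    return y
```

Checking the array model against the one-line reference definition for a few random 8-bit vectors is a cheap way to validate the data flow before committing it to hardware.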
Finally, an in-depth evaluation of a high-level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called the functional level power modelling approach, has been presented. The mathematical techniques that form the basis of the proposed power modelling have been validated on a range of custom IP cores. The proposed power modelling is scalable and platform independent, and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating functional level power modelling with commercially available design tools for the systematic optimisation of IP cores has also been developed. The in-depth evaluation of this tool enables us to observe the behaviour of different custom IP cores in terms of power consumption and accuracy using different design methodologies and arithmetic techniques on various FPGA platforms. Based on the results achieved, the accuracy of the proposed model is almost 99% for the Dynamic Power (DP) components of all IP cores.
Thomas Gerald Gray Charitable Trus
Mobile Hardware Based Implementation of a Novel, Efficient, Fuzzy Logic Inspired Edge Detection Technique for Analysis of Malaria Infected Microscopic Thin Blood Images
This paper proposes a novel, efficient, low-complexity algorithm for edge detection, with a cheap, easily accessible, networkable hardware implementation, specifically focused on the analysis of malaria-infected thin blood smears. The algorithm presents a new, dynamic thresholding technique based on histogram analysis that eliminates inter-cell interference. Following this, binary morphological processing is performed, which is shown to outperform the same operation on the much more complex greyscale images. Edge tracking is done via a simplified, fuzzy-logic-inspired rule system. The entire system is implemented on multiple platforms to test widespread compatibility, but was primarily developed for a battery-powered standalone Raspberry Pi with a low-power, low-resolution touchscreen and hardware buttons. The entire algorithm was pitted against the much more complex but still very well performing Canny algorithm, which, despite its age, is still one of the most comprehensive edge detection techniques available; modern variants were considered and reviewed, but given the level of outperformance, they were ultimately not viable options.
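The paper's own histogram-based thresholding and fuzzy-logic edge tracking are not described in enough detail to reproduce, so this Python sketch swaps in two standard techniques to illustrate the same pipeline shape: Otsu's method chooses a global threshold from the grey-level histogram, and the edge map is taken as the binary mask minus its 3×3 erosion (a morphological boundary). It assumes an 8-bit greyscale image stored as a list of rows, with cells darker than the background.

```python
def otsu_threshold(pixels):
    """Otsu's method: the grey level t maximising between-class variance."""
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * c for i, c in enumerate(hist))
    w_b, sum_b = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]               # pixels with level <= t
        if w_b == 0:
            continue
        w_f = total - w_b            # pixels with level > t
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b = sum_b / w_b
        m_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def erode(mask):
    """Binary erosion with a 3x3 square; pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    return [[int(all(0 <= r + dr < h and 0 <= c + dc < w and mask[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
             for c in range(w)] for r in range(h)]


def edges(image):
    """Threshold, then keep mask pixels that erosion removes (the boundary)."""
    t = otsu_threshold([v for row in image for v in row])
    mask = [[int(v <= t) for v in row] for row in image]  # assumes dark cells
    eroded = erode(mask)
    return [[m & (1 - e) for m, e in zip(mr, er)] for mr, er in zip(mask, eroded)]
```

Working on the binary mask rather than on grey levels keeps the morphology to cheap logical operations, which is the kind of saving the abstract attributes to binary processing on a low-power board.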
FPGA implementations for parallel multidimensional filtering algorithms
PhD Thesis. Collections of one- and multi-dimensional raw data introduce noise and artifacts, which need to be removed by an automated filtering system before further machine analysis. The need to automate wide-ranging filtering applications necessitates the design of generic filtering architectures, together with the development of multidimensional and extensive convolution operators. Consequently, the aim of this thesis is to investigate the problem of automated construction of a generic parallel filtering system. To serve this goal, performance-efficient FPGA implementation architectures are developed to realize parallel one- and multi-dimensional filtering algorithms. The proposed generic architectures provide a mechanism for fast FPGA prototyping of high-performance computations and for obtaining the performance indices of the implemented designs (area, speed, dynamic power, throughput and computation rate) as a complete package. These parallel filtering algorithms and their automated generic architectures tackle the major bottlenecks and limitations of existing multiprocessor systems in wordlength, input data segmentation, boundary conditions and inter-processor communication, in order to support high-throughput, real-time applications on low-power architectures using a Xilinx Virtex-6 FPGA board.
For the one-dimensional raw signal filtering case, a mathematical model and the architectural development of generalized parallel 1-D filtering algorithms are presented using the 1-D block filtering method. Five generic architectures are implemented on a Virtex-6 ML605 board, evaluated and compared. A complete set of results on area, speed, power, throughput and computation rate is obtained and discussed as performance indices for the 1-D convolution architectures. A successful application of parallel 1-D cross-correlation is demonstrated.
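The abstract does not define its 1-D block filtering method, so as an assumption the sketch below uses overlap-add, a standard way to split FIR filtering into independent blocks: each block of the input is convolved on its own (the work one parallel unit could take), and the len(h) - 1 overlapping output samples at each block boundary are summed.

```python
def fir_direct(x, h):
    """Full linear convolution of x with FIR taps h (length len(x)+len(h)-1)."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def fir_overlap_add(x, h, block_size):
    """Overlap-add block filtering.

    Each block x[start:start+block_size] is convolved independently (these
    block convolutions could run on separate units); each block's output tail
    of len(h) - 1 samples is added into the region shared with the next block.
    """
    y = [0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_size):
        partial = fir_direct(x[start:start + block_size], h)
        for k, v in enumerate(partial):
            y[start + k] += v
    return y
```

Because convolution is linear, the result does not depend on `block_size`; comparing `fir_overlap_add` with `fir_direct` for several block sizes is a quick check of the segmentation and boundary handling.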
For the two-dimensional greyscale/colour image processing cases, new parallel 2-D/3-D filtering algorithms are presented and mathematically modelled using input decimation and output image reconstruction by interpolation. Ten generic architectures are implemented on the Virtex-6 ML605 board, evaluated and compared. Key results on area, speed, power, throughput and computation rate are obtained and discussed as performance indices for the 2-D convolution architectures. Reconfigurable 2-D image processors are developed and implemented using single, dual and quad MAC FIR units. 3-D colour image processors are devised to act as 3-D colour filtering engines. A parallel 2-D cross-correlator engine is successfully developed as a parallel 2-D matched filtering algorithm for locating any MRI slice within an MRI data stack library. Twelve 3-D MRI filtering operators, including 3-D edge operators and 3-D noise smoothing operators, are plugged in and adapted to be suitable for biomedical imaging.
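One way to read "input decimation and output reconstruction by interpolation" is as a polyphase split of the work: each of factor × factor workers computes only the outputs on its own decimated grid, and the grids are then interleaved back into a full image. The Python sketch below shows that reading with zero-padded, same-size 2-D correlation (convolution with a flipped kernel); it is an assumption for illustration, and the thesis's actual decimation and interpolation scheme may differ.

```python
def filter2d_at(img, kernel, r, c):
    """One output sample of same-size 2-D correlation with zero padding."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    acc = 0
    for i in range(kh):
        for j in range(kw):
            rr, cc = r + i - kh // 2, c + j - kw // 2
            if 0 <= rr < h and 0 <= cc < w:
                acc += kernel[i][j] * img[rr][cc]
    return acc


def filter2d(img, kernel):
    """Sequential reference: every output pixel in raster order."""
    return [[filter2d_at(img, kernel, r, c) for c in range(len(img[0]))]
            for r in range(len(img))]


def filter2d_polyphase(img, kernel, factor=2):
    """Each phase (a, b) is an independent job on a decimated output grid."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for a in range(factor):
        for b in range(factor):
            # Worker (a, b): outputs at rows a, a+factor, ... and cols b, b+factor, ...
            phase = [[filter2d_at(img, kernel, r, c) for c in range(b, w, factor)]
                     for r in range(a, h, factor)]
            # Reconstruction: interleave this phase back into the full image.
            for i, r in enumerate(range(a, h, factor)):
                for j, c in enumerate(range(b, w, factor)):
                    out[r][c] = phase[i][j]
    return out
```

Since the phases share no output pixels, they need no communication while filtering; only the final interleave touches all results.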
Since three-dimensional greyscale/colour volumetric image applications are computationally intensive, a new parallel 3-D/4-D filtering algorithm is presented and mathematically modelled, using segmentation of the volumetric image data by decimation and output reconstruction by interpolation after 3-D filtering has been performed simultaneously and independently on the segments. Eight generic architectures are developed and implemented on the Virtex-6 board, including 3-D spatial and FFT convolution architectures. Fourteen 3-D MRI filtering operators, including 3-D edge operators and 3-D noise smoothing operators, are plugged in and adapted for this particular biomedical imaging application. Three successful applications are presented: 4-D colour MRI (fMRI) filtering processors, a k-space MRI volume data filter and a 3-D cross-correlator.
Iraqi Government
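The operators themselves are not named here, so as a hypothetical example this sketch implements one of the simplest 3-D noise-smoothing operators: a box (mean) filter over a (2r+1)³ neighbourhood of a volume stored as nested lists indexed [z][y][x]. The border rule, averaging only the voxels that lie inside the volume, is an assumption; zero or mirror padding are equally common choices.

```python
def smooth3d(vol, radius=1):
    """3-D box (mean) smoothing of a [z][y][x] volume.

    Near the borders only the neighbours that lie inside the volume are
    averaged, so a constant volume stays exactly constant.
    """
    d, h, w = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(d)]
    for z in range(d):
        for y in range(h):
            for x in range(w):
                total, count = 0.0, 0
                for zz in range(max(0, z - radius), min(d, z + radius + 1)):
                    for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                        for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                            total += vol[zz][yy][xx]
                            count += 1
                out[z][y][x] = total / count
    return out
```

Every output voxel depends only on a small input neighbourhood, which is what makes splitting the volume into independently filtered segments practical.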
Coarse-grained reconfigurable array architectures
Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code.
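CGRA compilers commonly map loops with modulo scheduling, and the starting point is the minimum initiation interval (MII): the larger of a resource bound and a recurrence bound. The Python sketch below computes that bound under simplifying assumptions (homogeneous functional units that can each execute any operation, and recurrence cycles given directly as latency/distance pairs); it says nothing about how a real CGRA compiler, such as the one used for ADRES, places and routes operations.

```python
from math import ceil


def res_mii(num_ops, num_fus):
    """Resource bound: each operation needs one FU slot per loop iteration."""
    return ceil(num_ops / num_fus)


def rec_mii(cycles):
    """Recurrence bound over loop-carried dependence cycles.

    Each cycle is (total latency along the cycle, total iteration distance);
    with no recurrences the bound is 1.
    """
    return max((ceil(lat / dist) for lat, dist in cycles), default=1)


def min_ii(num_ops, num_fus, cycles):
    """Lower bound on the initiation interval of a modulo schedule."""
    return max(res_mii(num_ops, num_fus), rec_mii(cycles))
```

For example, a loop body of 40 operations on a 4 × 4 array gives a resource bound of ⌈40 / 16⌉ = 3 cycles, and a loop-carried chain with latency 3 and distance 1 gives a recurrence bound of 3, so no schedule can start iterations more often than once every 3 cycles.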
A Micro Power Hardware Fabric for Embedded Computing
Field Programmable Gate Arrays (FPGAs) mitigate many of the problems encountered in the development of ASICs by offering flexibility, faster time-to-market and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Optimization techniques such as local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate the generation of a tailored architectural instance of the fabric. The fabric has been synthesized on a 160 nm cell-based ASIC fabrication process from OKI and a 130 nm process from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere, with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor.
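The abstract names simulated annealing for choosing the interconnect but not its cost model, so everything below is a hypothetical illustration: candidate links are (source PE, destination PE) pairs, the cost is the number of links kept plus a heavy penalty for every connection the target applications require but the fabric lacks, and each move toggles one link. Starting from the fully connected fabric, with a penalty larger than the total number of candidate links, guarantees that the best solution tracked is always feasible.

```python
import math
import random


def cost(links, required, penalty):
    """Links kept plus a penalty per required connection left unsupported."""
    return len(links) + penalty * len(required - links)


def anneal_interconnect(candidates, required, iters=5000, t0=2.0, alpha=0.999, seed=0):
    """Simulated annealing over subsets of candidate links.

    `required` must be a set of links drawn from `candidates`.
    Returns (best link set, its cost).
    """
    rng = random.Random(seed)
    candidates = list(candidates)
    penalty = len(candidates) + 1      # any infeasible set costs more than all links
    current = set(candidates)          # fully connected fabric: feasible start
    cur_cost = cost(current, required, penalty)
    best, best_cost = set(current), cur_cost
    temp = t0
    for _ in range(iters):
        trial = current ^ {rng.choice(candidates)}   # toggle one link in or out
        trial_cost = cost(trial, required, penalty)
        delta = trial_cost - cur_cost
        # Always accept improvements; accept worse moves with prob exp(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = trial, trial_cost
            if cur_cost < best_cost:
                best, best_cost = set(current), cur_cost
        temp *= alpha                  # geometric cooling schedule
    return best, best_cost
```

A fixed `seed` makes runs reproducible, which helps when comparing cooling schedules or cost models during design space exploration.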
Novel control approaches for the next generation computer numerical control (CNC) system for hybrid micro-machines
It is well recognised that micro-machining is a key enabling technology for manufacturing high value-added 3D micro-products, such as optics, moulds/dies and biomedical implants. These products are usually made of a wide range of engineering materials and possess complex freeform surfaces with tight tolerances on form accuracy and surface finish. In recent years, hybrid micro-machining technology has been developed to integrate several machining processes on one platform to tackle the manufacturing challenges of these micro-products. However, the complexity of system integration and the ever-increasing demand for enhanced productivity impose great challenges on current CNC systems. This thesis develops, implements and evaluates three novel control approaches to overcome the three major challenges identified, i.e. system integration, parametric interpolation and toolpath smoothing. These new control approaches provide a solid foundation for the development of next-generation CNC systems for hybrid micro-machines. There is a growing trend for hybrid micro-machines to integrate more functional modules. Machine developers tend to choose modules from different vendors to satisfy performance and cost requirements. However, those modules often have proprietary hardware and software interfaces, and the lack of plug-and-play solutions leads to tremendous difficulty in system integration. This thesis proposes a novel three-layer control architecture with a component-based approach for system integration. The interaction with hardware is encapsulated into software components, while the data flow among different components is standardised. This approach can therefore significantly enhance system flexibility. It has been successfully verified through the integration of a six-axis hybrid micro-machine.
Parametric curves have been proven to be the optimal toolpath representation for machining 3D micro-products with freeform surfaces, as they eliminate the high-frequency fluctuation of feedrate and acceleration caused by discontinuities in the first derivatives along linear or circular segmented toolpaths. Interpolation of parametric curves is essentially an optimization problem for which the time-optimal solution is extremely difficult to obtain. This thesis develops a novel real-time interpolator for parametric curves (RTIPC), which provides a near time-optimal solution. It limits the machine dynamics (axial velocities, axial accelerations and jerk) and contour error through feedrate lookahead and acceleration lookahead operations. Experiments show that the RTIPC can simplify the coding significantly and achieve up to ten times the productivity of the industry-standard linear interpolator. Furthermore, it is as efficient as the state-of-the-art Position-Velocity-Time (PVT) interpolator, while achieving much smoother motion profiles. Despite the huge advantage of parametric curves in toolpath continuity, linear segmented toolpaths are still predominantly used on the factory floor due to their straightforward coding and excellent compatibility with various CNC systems. This thesis presents a new real-time global toolpath smoothing algorithm, which bridges the gap in toolpath representation for CNC systems. This approach uses a cubic B-spline to approximate a sequence of linear segments. The approximation deviation is controlled by inserting and moving new control points on the control polygon.
Experiments show that the proposed approach can increase productivity more than threefold compared with the standard toolpath traversing algorithm, and by 40% compared with the state-of-the-art corner blending algorithm, while achieving excellent surface finish. Finally, some further improvements for CNC systems, such as adaptive cutting force control and on-line adjustment of machining parameters with metrology, are discussed in the future work section.
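For context on the parametric interpolation problem described above, here is the classical first-order Taylor interpolator, the simplest baseline for this task. It is explicitly not RTIPC: it advances the curve parameter by Δu = V·T / |C′(u)| each servo period T at constant feedrate V, with no limits on acceleration, jerk or contour error. The curve derivative is supplied as a hypothetical callable `curve_deriv` returning (dx/du, dy/du).

```python
import math


def taylor_interpolate(curve_deriv, u_end, feedrate, period, u_start=0.0):
    """First-order Taylor parametric interpolator (classical baseline, not RTIPC).

    Each servo period the parameter advances by du = V*T / |C'(u)|, so the
    tool moves approximately V*T along the curve at constant feedrate V.
    The final step is clipped so the list ends exactly at u_end.
    """
    params = [u_start]
    u = u_start
    while True:
        dx, dy = curve_deriv(u)
        u += feedrate * period / math.hypot(dx, dy)
        if u >= u_end:
            params.append(u_end)
            return params
        params.append(u)
```

On a circle of radius R = 10 mm, with V = 50 mm/s and T = 1 ms, |C′(u)| = R, so every step advances u by 0.005 rad and moves the tool a chord of almost exactly V·T = 0.05 mm. On curves whose speed |C′(u)| varies quickly, the first-order step drifts from the commanded feedrate, which is one reason more elaborate interpolators exist.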
Circuit design and analysis for on-FPGA communication systems
On-chip communication systems have emerged as a prominent subject in Very-Large-Scale-Integration (VLSI) design, as the trend of technology scaling favours logic more than interconnects. Interconnects often dictate system performance, and, therefore, research into new methodologies and system architectures that deliver high-performance communication services across the chip is mandatory. The interconnect challenge is exacerbated in Field-Programmable Gate Arrays (FPGAs), integrated circuits whose hardware can be programmed after fabrication. Communication across an FPGA will deteriorate as a result of interconnect scaling. The programmable fabrics, switches and the specific routing architecture also introduce additional latency and bandwidth degradation, further hindering intra-chip communication performance. Past research efforts mainly focused on optimizing logic elements and functional units in FPGAs; communication over programmable interconnect received little attention and is inadequately understood. This thesis is among the first to research on-chip communication systems built on top of programmable fabrics, and it proposes methodologies to maximize interconnect throughput. There are three major contributions in this thesis: (i) an analysis of on-chip interconnect fringing, which degrades the bandwidth of communication channels due to routing congestion in reconfigurable architectures; (ii) a new analogue wave signalling scheme that significantly improves interconnect throughput by exploiting the fundamental electrical characteristics of reconfigurable interconnect structures, and which can potentially mitigate the interconnect scaling challenges; and (iii) a novel Dynamic Programming (DP)-network to provide adaptive routing in network-on-chip (NoC) systems. The DP-network architecture performs runtime optimization for route planning and dynamic routing, which effectively utilizes the in-silicon bandwidth. This thesis explores a new horizon in reconfigurable system design, in which new methodologies and concepts are proposed to enhance on-FPGA communication throughput, which is of vital importance in new technology processes.
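The DP-network is a hardware architecture, but the recurrence it optimises can be sketched in software. Assuming a 2-D mesh NoC and an additive, positive per-link congestion cost (the hypothetical callable `link_cost`), each router's cost-to-destination satisfies J(n) = min over neighbours m of [cost(n, m) + J(m)], with J(dest) = 0. Relaxing this Bellman-Ford style and then always forwarding to the neighbour that attains the minimum yields an adaptive, congestion-avoiding route. This illustrates dynamic-programming routing in general, not the thesis's circuit.

```python
def neighbours(node, width, height):
    """Yield the north/south/east/west routers of `node` inside the mesh."""
    x, y = node
    for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if 0 <= nx < width and 0 <= ny < height:
            yield (nx, ny)


def dp_costs(width, height, dest, link_cost):
    """Cost-to-destination J(n) for every router, by repeated relaxation."""
    nodes = [(x, y) for x in range(width) for y in range(height)]
    cost = {n: float("inf") for n in nodes}
    cost[dest] = 0.0
    for _ in range(len(nodes) - 1):
        changed = False
        for n in nodes:
            for m in neighbours(n, width, height):
                c = link_cost(n, m) + cost[m]
                if c < cost[n]:
                    cost[n] = c
                    changed = True
        if not changed:              # converged early
            break
    return cost


def route(src, dest, width, height, link_cost):
    """Follow the DP minimum hop by hop (link costs assumed positive)."""
    cost = dp_costs(width, height, dest, link_cost)
    path = [src]
    while path[-1] != dest:
        here = path[-1]
        path.append(min(neighbours(here, width, height),
                        key=lambda m: link_cost(here, m) + cost[m]))
    return path
```

With uniform link costs on a 4 × 4 mesh the route from (0, 0) to (3, 3) is a 6-hop minimal path; making the link into router (1, 0) expensive makes traffic from (0, 0) to (2, 0) on a 3 × 2 mesh detour through the top row instead.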