579 research outputs found

    Application-Specific Number Representation

    No full text
    Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application- specific number representations. Well-known number formats include fixed-point, floating- point, logarithmic number system (LNS), and residue number system (RNS). Such different number representations lead to different arithmetic designs and error behaviours, thus produc- ing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis shows case studies that optimise the designs for area, latency or throughput from the perspective of number representations. Automated design space exploration in the first part addresses the following two major issues: ² Automation requires arithmetic unit generation. This thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs. ² Generation of arithmetic units requires specifying the bit widths for each variable. This thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods, and supports different number systems (fixed-point, floating-point, and LNS numbers). Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation, and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations

    Implementation and Applications of Logarithmic Signal Processing on an FPGA

    Get PDF
    This thesis presents two novel algorithms for converting a normalised binary floating point number into a binary logarithmic number with the single-precision of a floating point number. The thesis highlights the importance of logarithmic number systems in real-time DSP applications. A real-time cross-correlation application where logarithmic signal processing is used to simplify the complex computation is presented. The first algorithm presented in this thesis comprises two stages. A piecewise linear approximation to the original logarithmic curve is performed in the first stage and a scaled-down normalised error curve is stored in the second stage. The algorithm requires less than 20 kbits of ROM and a maximum of three small multipliers. The architecture is implemented on Xilinx's Spartan3 and Spartan6 FPGA family. Synthesis results confirm that the algorithm operates at a frequency of 42.3 MHz on a Spartan3 device and 127.8 MHz on a Spartan6. Both solutions have a pipeline latency of two clocks. The operating speed increases to 71.4 MHz and 160 MHz respectively when the pipeline latencies increase to eight clocks. The proposed algorithm is further improved by using a PWL (Piece-Wise Linear) approximation of the transform curve combined with a PWL approximation of a scaled version of the normalized segment error. A hardware approach for reducing the memory with additional XOR gates in the second stage is also presented. The architecture presented uses just one 18k bit Block RAM (BRAM) and synthesis results indicate operating frequencies of 93 and 110 MHz when implemented on the Xilinx Spartan3 and Spartan6 devices respectively. Finally a novel prototype of an FPGA-based four channel correlation velocimetry system is presented. The system operates at a higher sampling frquency than previous published work and outputs the new result after every new sample it receives. The system works at a sampling frequency of 195.31 kHz and a sample resolution of 12 bits. The prototype system calculates a delay in a range of 0 to 2.6 ms with a resolution of 5.12 us

    Algorithms and architectures for decimal transcendental function computation

    Get PDF
    Nowadays, there are many commercial demands for decimal floating-point (DFP) arithmetic operations such as financial analysis, tax calculation, currency conversion, Internet based applications, and e-commerce. This trend gives rise to further development on DFP arithmetic units which can perform accurate computations with exact decimal operands. Due to the significance of DFP arithmetic, the IEEE 754-2008 standard for floating-point arithmetic includes it in its specifications. The basic decimal arithmetic unit, such as decimal adder, subtracter, multiplier, divider or square-root unit, as a main part of a decimal microprocessor, is attracting more and more researchers' attentions. Recently, the decimal-encoded formats and DFP arithmetic units have been implemented in IBM's system z900, POWER6, and z10 microprocessors. Increasing chip densities and transistor count provide more room for designers to add more essential functions on application domains into upcoming microprocessors. Decimal transcendental functions, such as DFP logarithm, antilogarithm, exponential, reciprocal and trigonometric, etc, as useful arithmetic operations in many areas of science and engineering, has been specified as the recommended arithmetic in the IEEE 754-2008 standard. Thus, virtually all the computing systems that are compliant with the IEEE 754-2008 standard could include a DFP mathematical library providing transcendental function computation. Based on the development of basic decimal arithmetic units, more complex DFP transcendental arithmetic will be the next building blocks in microprocessors. In this dissertation, we researched and developed several new decimal algorithms and architectures for the DFP transcendental function computation. These designs are composed of several different methods: 1) the decimal transcendental function computation based on the table-based first-order polynomial approximation method; 2) DFP logarithmic and antilogarithmic converters based on the decimal digit-recurrence algorithm with selection by rounding; 3) a decimal reciprocal unit using the efficient table look-up based on Newton-Raphson iterations; and 4) a first radix-100 division unit based on the non-restoring algorithm with pre-scaling method. Most decimal algorithms and architectures for the DFP transcendental function computation developed in this dissertation have been the first attempt to analyze and implement the DFP transcendental arithmetic in order to achieve faithful results of DFP operands, specified in IEEE 754-2008. To help researchers evaluate the hardware performance of DFP transcendental arithmetic units, the proposed architectures based on the different methods are modeled, verified and synthesized using FPGAs or with CMOS standard cells libraries in ASIC. Some of implementation results are compared with those of the binary radix-16 logarithmic and exponential converters; recent developed high performance decimal CORDIC based architecture; and Intel's DFP transcendental function computation software library. The comparison results show that the proposed architectures have significant speed-up in contrast to the above designs in terms of the latency. The algorithms and architectures developed in this dissertation provide a useful starting point for future hardware-oriented DFP transcendental function computation researches

    Implementation of a real time Hough transform using FPGA technology

    Get PDF
    This thesis is concerned with the modelling, design and implementation of efficient architectures for performing the Hough Transform (HT) on mega-pixel resolution real-time images using Field Programmable Gate Array (FPGA) technology. Although the HT has been around for many years and a number of algorithms have been developed it still remains a significant bottleneck in many image processing applications. Even though, the basic idea of the HT is to locate curves in an image that can be parameterized: e.g. straight lines, polynomials or circles, in a suitable parameter space, the research presented in this thesis will focus only on location of straight lines on binary images. The HT algorithm uses an accumulator array (accumulator bins) to detect the existence of a straight line on an image. As the image needs to be binarized, a novel generic synchronization circuit for windowing operations was designed to perform edge detection. An edge detection method of special interest, the canny method, is used and the design and implementation of it in hardware is achieved in this thesis. As each image pixel can be implemented independently, parallel processing can be performed. However, the main disadvantage of the HT is the large storage and computational requirements. This thesis presents new and state-of-the-art hardware implementations for the minimization of the computational cost, using the Hybrid-Logarithmic Number System (Hybrid-LNS) for calculating the HT for fixed bit-width architectures. It is shown that using the Hybrid-LNS the computational cost is minimized, while the precision of the HT algorithm is maintained. Advances in FPGA technology now make it possible to implement functions as the HT in reconfigurable fabrics. Methods for storing large arrays on FPGA’s are presented, where data from a 1024 x 1024 pixel camera at a rate of up to 25 frames per second are processed

    Local control of multiple module converters with ratings-based load sharing

    Get PDF
    Multiple module dc-dc converters show promise in meeting the increasing demands on ef- ficiency and performance of energy conversion systems. In order to increase reliability, maintainability, and expandability, a modular approach in converter design is often desired. This thesis proposes local control of multiple module converters as an alternative to using a central controller or master controller. A power ratings-based load sharing scheme that allows for uniform and non-uniform sharing is introduced. Focus is given to an input series, output parallel (ISOP) configuration and modules with a push-pull topology. Sensorless current mode (SCM) control is digitally implemented on separate controllers for each of the modules. The benefits of interleaving the switching signals of the distributed modules is presented. Simulation and experimental results demonstrate stable, ratings-based sharing in an ISOP converter with a high conversion ratio for both uniform and non-uniform load sharing cases

    Design and Implementation of a Re-Configurable Arbitrary Signal Generator and Radio Frequency Spectrum Analyser

    Get PDF
    This research is focused on the design, simulation and implementation of a reconfigurable arbitrary signal generator and the design, simulation and implementation of a radio frequency spectrum analyser based on digital signal processing. Until recently, Application Specific Integrated Circuits (ASICs) were used to produce high performance re-configurable function and arbitrary waveform generators with comprehensive modulation capabilities. However, that situation is now changing with the availability of advanced but low cost Field Programmable Gate Arrays (FPGAs), which could be used as an alternative to ASICs in these applications. The availability of high performance FPGA families opens up the opportunity to compete with ASIC solutions at a fraction of the development cost of an ASIC solution. A fast digital signal processing algorithm for digital waveform generation, using primarily but not limited to Direct Digital Synthesis (DDS) technologies, developed and implemented in a field-configurable logic, with control provided by an embedded microprocessor replacing a high cost ASIC design appeared to be a very attractive concept. This research demonstrates that such a concept is feasible in its entirety. A fully functional, low-complexity, low cost, pulse, Gaussian white noise and DDS based function and arbitrary waveform generator, capable of being amplitude, frequency and phase modulated by an internally generated or external modulating signal was implemented in a low-cost FPGA. The FPGA also included the capabilities to perform pulse width modulation and pulse delay modulation on pulse waveforms. Algorithms to up-convert the sampling rate of the external modulating signal using Cascaded Integrator Comb (CIC) filters and using interpolation method were analysed. Both solutions were implemented to compare their hardware complexities. Analysis of generating noise with user-defined distribution is presented. The ability of triggering the generator by an internally generated or an external event to generate a burst of waveforms where the time between the trigger signal and waveform output is fixed was also implemented in the FPGA. Finally, design of interface to a microprocessor to provide control of the versatile waveform generator was also included in the FPGA. This thesis summarises the literature, design considerations, simulation and implementation of the generator design. The second part of the research is focused on radio frequency spectrum analysis based on digital signal processing. Most existing spectrum analysers are analogue in nature and their complexity increases with frequency. Therefore, the possibility of using digital techniques for spectrum analysis was considered. The aim was to come up with digital system architecture for spectrum analysis and to develop and implement the new approach on a suitable digital platform. This thesis analyses the current literature on shifting algorithms to remove spurious responses and highlights its drawbacks. This thesis also analyses existing literature on quadrature receivers and presents novel adaptation of the existing architectures for application in spectrum analysis. A wide band spectrum analyser receiver with compensation for gain and phase imbalances in the Radio Frequency (RF) input range, as well as compensation for gain and phase imbalances within the Intermediate Frequency (IF) pass band complete with Resolution Band Width (RBW) filtering, Video Band Width (VBW) filtering and amplitude detection was implemented in a low cost FPGA. The ability to extract the modulating signal from a frequency or amplitude modulated RF signal was also implemented. The same family of FPGA used in the generator design was chosen to be the digital platform for this design. This research makes arguments for the new architecture and then summarises the literature, design considerations, simulation and implementation of the new digital algorithm for the radio frequency spectrum analyser

    Power-Aware Design Methodologies for FPGA-Based Implementation of Video Processing Systems

    Get PDF
    The increasing capacity and capabilities of FPGA devices in recent years provide an attractive option for performance-hungry applications in the image and video processing domain. FPGA devices are often used as implementation platforms for image and video processing algorithms for real-time applications due to their programmable structure that can exploit inherent spatial and temporal parallelism. While performance and area remain as two main design criteria, power consumption has become an important design goal especially for mobile devices. Reduction in power consumption can be achieved by reducing the supply voltage, capacitances, clock frequency and switching activities in a circuit. Switching activities can be reduced by architectural optimization of the processing cores such as adders, multipliers, multiply and accumulators (MACS), etc. This dissertation research focuses on reducing the switching activities in digital circuits by considering data dependencies in bit level, word level and block level neighborhoods in a video frame. The bit level data neighborhood dependency consideration for power reduction is illustrated in the design of pipelined array, Booth and log-based multipliers. For an array multiplier, operands of the multipliers are partitioned into higher and lower parts so that the probability of the higher order parts being zero or one increases. The gating technique for the pipelined approach deactivates part(s) of the multiplier when the above special values are detected. For the Booth multiplier, the partitioning and gating technique is integrated into the Booth recoding scheme. In addition, a delay correction strategy is developed for the Booth multiplier to reduce the switching activities of the sign extension part in the partial products. A novel architecture design for the computation of log and inverse-log functions for the reduction of power consumption in arithmetic circuits is also presented. This also utilizes the proposed partitioning and gating technique for further dynamic power reduction by reducing the switching activities. The word level and block level data dependencies for reducing the dynamic power consumption are illustrated by presenting the design of a 2-D convolution architecture. Here the similarities of the neighboring pixels in window-based operations of image and video processing algorithms are considered for reduced switching activities. A partitioning and detection mechanism is developed to deactivate the parallel architecture for window-based operations if higher order parts of the pixel values are the same. A neighborhood dependent approach (NDA) is incorporated with different window buffering schemes. Consideration of the symmetry property in filter kernels is also applied with the NDA method for further reduction of switching activities. The proposed design methodologies are implemented and evaluated in a FPGA environment. It is observed that the dynamic power consumption in FPGA-based circuit implementations is significantly reduced in bit level, data level and block level architectures when compared to state-of-the-art design techniques. A specific application for the design of a real-time video processing system incorporating the proposed design methodologies for low power consumption is also presented. An image enhancement application is considered and the proposed partitioning and gating, and NDA methods are utilized in the design of the enhancement system. Experimental results show that the proposed multi-level power aware methodology achieves considerable power reduction. Research work is progressing In utilizing the data dependencies in subsequent frames in a video stream for the reduction of circuit switching activities and thereby the dynamic power consumption
    • …
    corecore