75 research outputs found

    FPGA-based module for SURF extraction

    Get PDF
    We present a complete hardware and software solution of an FPGA-based computer vision embedded module capable of carrying out SURF image features extraction algorithm. Aside from image analysis, the module embeds a Linux distribution that allows to run programs specifically tailored for particular applications. The module is based on a Virtex-5 FXT FPGA which features powerful configurable logic and an embedded PowerPC processor. We describe the module hardware as well as the custom FPGA image processing cores that implement the algorithm's most computationally expensive process, the interest point detection. The module's overall performance is evaluated and compared to CPU and GPU based solutions. Results show that the embedded module achieves comparable disctinctiveness to the SURF software implementation running in a standard CPU while being faster and consuming significantly less power and space. Thus, it allows to use the SURF algorithm in applications with power and spatial constraints, such as autonomous navigation of small mobile robots

    Performance and area evaluations of processor-based benchmarks on FPGA devices

    Get PDF
    The computing system on SoCs is being long-term research since the FPGA technology has emerged due to its personality of re-programmable fabric, reconfigurable computing, and fast development time to market. During the last decade, uni-processor in a SoC is no longer to deal with the high growing market for complex applications such as Mobile Phones audio and video encoding, image and network processing. Due to the number of transistors on a silicon wafer is increasing, the recent FPGAs or embedded systems are advancing toward multi-processor-based design to meet tremendous performance and benefit this kind of systems are possible. Therefore, is an upcoming age of the MPSoC. In addition, most of the embedded processors are soft-cores, because they are flexible and reconfigurable for specific software functions and easy to build homogenous multi-processor systems for parallel programming. Moreover, behavioural synthesis tools are becoming a lot more powerful and enable to create datapath of logic units from high-level algorithms such as C to HDL and available for partitioning a HW/SW concurrent methodology. A range of embedded processors is able to implement on a FPGA-based prototyping to integrate the CPUs on a programmable device. This research is, firstly represent different types of computer architectures in modern embedded processors that are followed in different type of software applications (eg. Multi-threading Operations or Complex Functions) on FPGA-based SoCs; and secondly investigate their capability by executing a wide-range of multimedia software codes (Integer-algometric only) in different models of the processor-systems (uni-processor or multi-processor or Co-design), and finally compare those results in terms of the benchmarks and resource utilizations within FPGAs. All the examined programs were written in standard C and executed in a variety numbers of soft-core processors or hardware units to obtain the execution times. However, the number of processors and their customizable configuration or hardware datapath being generated are limited by a target FPGA resource, and designers need to understand the FPGA-based tradeoffs that have been considered - Speed versus Area. For this experimental purpose, I defined benchmarks into DLP / HLS catalogues, which are "data" and "function" intensive respectively. The programs of DLP will be executed in LEON3 MP and LE1 CMP multi-processor systems and the programs of HLS in the LegUp Co-design system on target FPGAs. In preliminary, the performance of the soft-core processors will be examined by executing all the benchmarks. The whole story of this thesis work centres on the issue of the execute times or the speed-up and area breakdown on FPGA devices in terms of different programs

    Comparison of different design alternatives for hardware-in-the-loop of power converters

    Full text link
    This paper aims to compare different design alternatives of hardware-in-the-loop (HIL) for emulating power converters in Field Programmable Gate Arrays (FPGAs). It proposes various numerical formats (fixed and floating-point) and different approaches (pure VHSIC Hardware Description Language (VHDL), Intellectual Properties (IPs), automated MATLAB HDL code, and High-Level Synthesis (HLS)) to design power converters. Although the proposed models are simple power electronics HIL systems, the idea can be extended to any HIL system. This study compares the design effort of different coding methods and numerical formats considering possible synthesis tools (Precision and Vivado), and it comprises an analytical discussion in terms of area and speed. The different models are synthesized as ad-hoc modules in general-purpose FPGAs, but also using the NI myRIO device as an example of a commercial tool capable of implementing HIL models. The comparison confirms that the optimum design alternative must be chosen based on the application (complexity, frequency, etc.) and designers’ constraints, such as available area, coding expertise, and design effor

    A Methodology to Design Pipelined Simulated Annealing Kernel Accelerators on Space-Borne Field-Programmable Gate Arrays

    Get PDF
    Increased levels of science objectives expected from spacecraft systems necessitate the ability to carry out fast on-board autonomous mission planning and scheduling. Heterogeneous radiation-hardened Field Programmable Gate Arrays (FPGAs) with embedded multiplier and memory modules are well suited to support the acceleration of scheduling algorithms. A methodology to design circuits specifically to accelerate Simulated Annealing Kernels (SAKs) in event scheduling algorithms is shown. The main contribution of this thesis is the low complexity scoring calculation used for the heuristic mapping algorithm used to balance resource allocation across a coarse-grained pipelined data-path. The methodology was exercised over various kernels with different cost functions and problem sizes. These test cases were benchedmarked for execution time, resource usage, power, and energy on a Xilinx Virtex 4 LX QR 200 FPGA and a BAE RAD 750 microprocessor

    A Preliminary FPGA Implementation and Analysis of Phatak’s Quotient-First Scaling Algorithm in the Reduced-Precision Residue Number System

    Get PDF
    We built and tested the first hardware implementation of Phatak’s Quotient-First Scaling (QFS) algorithm in the reduced-precision residue number system (RP-RNS). This algorithm is designed to expedite division in the Residue Number System for the special case when the divisor is known ahead of time (i.e., when the divisor can be considered to be a constant, as in the modular exponentiation required for the RSA encryption/decryption). We implemented the QFS algorithm using an FPGA and tested it for operand lengths up to 1024 bits. The RP-RNS modular exponentiation algorithm is not based on Montgomery’s method, but on quotient estimation derived from the straightforward division algorithm, with substantial amount of precomputations whose results are read from look-up tables at run-time. Phatak’s preliminary analysis indicates that under reasonable assumptions about hardware capabilities, a single modular multiplication’s (or QFS’s) execution time grows logarithmically with respect to the operand word length. We experimentally confirmed this predicted growth rate of the delay of a modular multiplication with our FPGA implementation. Though our implementation did not outperform the most recent implementations such as that by Gandino, et al., we determined that this outcome was solely a consequence of tradeoffs stemming from our decision to store the lookup tables on the FPGA

    An FPGA architecture design of a high performance adaptive notch filter

    Get PDF
    The occurrence of narrowband interference near frequencies carrying information is a common problem in modern control and signal processing applications. A very narrow notch filter is required in order to remove the unwanted signal while not compromising the integrity of the carrier signal. In many practical situations, the interference may wander within a frequency band, in which case a wider notch filter would be needed to guarantee its removal, which may also allow for the degradation of information being carried in nearby frequencies. If the interference frequency could be autonomously tracked, a narrow bandwidth notch filter could be successfully implemented for the particular frequency. Adaptive signal processing is a powerful technique that can be used in the tracking and elimination of such a signal. An application where an adaptive notch filter becomes necessary is in biomedical instrumentation, such as the electrocardiogram recorder. The recordings can become useless when in the presence of electromagnetic fields generated by power lines. Research was conducted to fully characterize the interference. Research on notch filter structures and adaptive filter algorithms has been carried out. The lattice form filter structure was chosen for its inherent stability and performance benefits. A new adaptive filter algorithm was developed targeting a hardware implementation. The algorithm used techniques from several other algorithms that were found to be beneficial. This work developed the hardware implementation of a lattice form adaptive notch filter to be used for the removal of power line interference from electrocardiogram signals. The various design tradeo s encountered were documented. The final design was targeted toward multiple field programmable gate arrays using multiple optimization efforts. Those results were then compared. The adaptive notch filter was able to successfully track and remove the interfering signal. The lattice form structure utilized by the proposed filter was verified to exhibit an inherently stable realization. The filter was subjected to various environments that modeled the different power line disturbances that could be present. The final filter design resulted in a 3 dB bandwidth of 15.8908 Hz, and a null depth of 54 dB. For the baseline test case, the algorithm achieved convergence after 270 iterations. The final hardware implementation was successfully verified against the MATLAB simulation results. A speedup of 3.8 was seen between the Xilinx Virtex-5 and Spartan-II device technologies. The final design used a small fraction of the available resources for each of the two devices that were characterized. This would allow the component to be more readily available to be added to existing projects, or further optimized by utilizing additional logic

    Hardware implementation of naive bayes classifier for malware detection

    Get PDF
    Naïve bayes classifier is a probabilistic supervised machine learning algorithm, that can be launched on most general-purpose devices to solve wide range of classification problems. However, when it comes to real time applications, the general-purpose devices are limited in term of their computational throughput, thus this algorithm couldn’t be used for that purpose. The aim of this project is to accelerate this algorithm in hardware environment to improve its performance by exploring its hidden concurrency and map it into parallel hardware as an optimized IP package, suitable for FPGA-SoC applications. Thus, it could be used as a middle box system for real time malware detection. In order for the proposed hardware to meet the requirements of this research, it should be able to handle both training, and inference part in hardware, and also should be able to receive a flow of 20 features, each of 32-bitsize, organized in 4-gram format. To meet these requirements, an enhanced version of the algorithm was developed and tested in Cprogramming. Then an equivalent design with a 5-stages pipelined architecture, and single instruction multiple data capabilities, was built in hardware to address the case. At the end, the proposed hardware found to be 65 times faster in term of its computational throughput compared to an existing design, and that with keeping the accuracy level as high as 94%, under the conditions of experiment carried
    • …
    corecore