69 research outputs found

    Hardware acceleration of the trace transform for vision applications

    Get PDF
    Computer Vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images in their pixels and bits serve no purpose of their own. It is only by interpreting the data, and extracting higher level information that a scene can be understood. The algorithms that enable this process are often complex, and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real- time processing. The Trace Transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility, which highly suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a full flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters are presented as required for a number of Trace transform implementations. Finally an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration

    Square-rich fixed point polynomial evaluation on FPGAs

    Get PDF
    Polynomial evaluation is important across a wide range of application domains, so significant work has been done on accelerating its computation. The conventional algorithm, referred to as Horner's rule, involves the least number of steps but can lead to increased latency due to serial computation. Parallel evaluation algorithms such as Estrin's method have shorter latency than Horner's rule, but achieve this at the expense of large hardware overhead. This paper presents an efficient polynomial evaluation algorithm, which reforms the evaluation process to include an increased number of squaring steps. By using a squarer design that is more efficient than general multiplication, this can result in polynomial evaluation with a 57.9% latency reduction over Horner's rule and 14.6% over Estrin's method, while consuming less area than Horner's rule, when implemented on a Xilinx Virtex 6 FPGA. When applied in fixed point function evaluation, where precision requirements limit the rounding of operands, it still achieves a 52.4% performance gain compared to Horner's rule with only a 4% area overhead in evaluating 5th degree polynomials

    ZyCAP : efficient partial reconfiguration management on the Xilinx Zynq

    Get PDF
    New hybrid FPGA platforms that couple processors with a reconfigurable fabric, such as the Xilinx Zynq, offer an alternative view of reconfigurable computing where software applications leverage hardware resources through the use of often reconfigured accelerators. For this to be feasible, reconfiguration overheads must be reduced so that the processor is not burdened with managing the process. We discuss partial reconfiguration (PR) on these architectures, and present an open source controller, ZyCAP, that overcomes the limitations of existing methods, offering more effective use of hardware resources in such architectures. ZyCAP combines high-throughput configuration with a high-level software interface that frees the processor from detailed PR management, making PR on the Zynq easy and efficient

    Minimizing DSP block usage through multi-pumping

    Get PDF
    Resource sharing in the mapping of an algorithm to an architecture allows the same resource to be scheduled for different uses in different cycles, generally at the cost of increased schedule length. Multi-pumping is a method whereby a resource is clocked at a frequency that is a multiple of the surrounding circuit, thereby offering multiple executions per global clock, and therefore sharing in the same clock cycle. This concept maps well to FPGA architectures, where hard macro blocks are typically capable of running at higher frequencies than standard logic. While this technique has been demonstrated for multipliers, modern DSP blocks are more complex with multiple computational nodes. In this paper, we apply multi-pumping to minimise DSP block usage, while taking advantage of the multiple nodes they support. The proposed approach uses, on average, 39% fewer DSP blocks, at a cost of 19% more LUTs and 7% more registers

    FPGA dynamic and partial reconfiguration : a survey of architectures, methods, and applications

    Get PDF
    Dynamic and partial reconfiguration are key differentiating capabilities of field programmable gate arrays (FPGAs). While they have been studied extensively in academic literature, they find limited use in deployed systems. We review FPGA reconfiguration, looking at architectures built for the purpose, and the properties of modern commercial architectures. We then investigate design flows, and identify the key challenges in making reconfigurable FPGA systems easier to design. Finally, we look at applications where reconfiguration has found use, as well as proposing new areas where this capability places FPGAs in a unique position for adoption

    Shaping spectral leakage for IEEE 802.11 p vehicular communications

    Get PDF
    IEEE 802.11p is a recently defined standard for the physical (PHY) and medium access control (MAC) layers for Dedicated Short-Range Communications. Four Spectrum Emission Masks (SEMs) are specified in 802.11p that are much more stringent than those for current 802.11 systems. In addition, the guard interval in 802.11p has been lengthened by reducing the bandwidth to support vehicular communication (VC) channels, and this results in a narrowing of the frequency guard. This raises a significant challenge for filtering the spectrum of 802.11p signals to meet the specifications of the SEMs. We investigate state of the art pulse shaping and filtering techniques for 802.11p, before proposing a new method of shaping the 802.11p spectral leakage to meet the most stringent, class D, SEM specification. The proposed method, performed at baseband to relax the strict constraints of the radio frequency (RF) front-end, allows 802.11p systems to be implemented using commercial off-the- shelf (COTS) 802.11a RF hardware, resulting in reduced total system cost

    Extensible FlexRay communication controller for FPGA-based automotive systems

    Get PDF
    Modern vehicles incorporate an increasing number of distributed compute nodes, resulting in the need for faster and more reliable in-vehicle networks. Time-triggered protocols such as FlexRay have been gaining ground as the standard for high-speed reliable communications in the automotive industry, marking a shift away from the event-triggered medium access used in controller area networks (CANs). These new standards enable the higher levels of determinism and reliability demanded from next-generation safety-critical applications. Advanced applications can benefit from tight coupling of the embedded computing units with the communication interface, thereby providing functionality beyond the FlexRay standard. Such an approach is highly suited to implementation on reconfigurable architectures. This paper describes a field-programmable gate array (FPGA)-based communication controller (CC) that features configurable extensions to provide functionality that is unavailable with standard implementations or off-the-shelf devices. It is implemented and verified on a Xilinx Spartan 6 FPGA, integrated with both a logic-based hardware ECU and a fully fledged processor-based electronic control unit (ECU). Results show that the platform-centric implementation generates a highly efficient core in terms of power, performance, and resource utilization. We demonstrate that the flexible extensions help enable advanced applications that integrate features such as fault tolerance, timeliness, and security, with practical case studies. This tight integration between the controller, computational functions, and flexible extensions on the controller enables enhancements that open the door for exciting applications in future vehicles
    • …
    corecore