4,878 research outputs found

    Time-Shared Execution of Realtime Computer Vision Pipelines by Dynamic Partial Reconfiguration

    Full text link
    This paper presents an FPGA runtime framework that demonstrates the feasibility of using dynamic partial reconfiguration (DPR) for time-sharing an FPGA by multiple realtime computer vision pipelines. The presented time-sharing runtime framework manages an FPGA fabric that can be round-robin time-shared by different pipelines at the time scale of individual frames. In this new use-case, the challenge is to achieve useful performance despite high reconfiguration time. The paper describes the basic runtime support as well as four optimizations necessary to achieve realtime performance given the limitations of DPR on today's FPGAs. The paper provides a characterization of a working runtime framework prototype on a Xilinx ZC706 development board. The paper also reports the performance of realtime computer vision pipelines when time-shared

    Real-Time Dense Stereo Matching With ELAS on FPGA Accelerated Embedded Devices

    Full text link
    For many applications in low-power real-time robotics, stereo cameras are the sensors of choice for depth perception as they are typically cheaper and more versatile than their active counterparts. Their biggest drawback, however, is that they do not directly sense depth maps; instead, these must be estimated through data-intensive processes. Therefore, appropriate algorithm selection plays an important role in achieving the desired performance characteristics. Motivated by applications in space and mobile robotics, we implement and evaluate a FPGA-accelerated adaptation of the ELAS algorithm. Despite offering one of the best trade-offs between efficiency and accuracy, ELAS has only been shown to run at 1.5-3 fps on a high-end CPU. Our system preserves all intriguing properties of the original algorithm, such as the slanted plane priors, but can achieve a frame rate of 47fps whilst consuming under 4W of power. Unlike previous FPGA based designs, we take advantage of both components on the CPU/FPGA System-on-Chip to showcase the strategy necessary to accelerate more complex and computationally diverse algorithms for such low power, real-time systems.Comment: 8 pages, 7 figures, 2 table

    Hardware Based Projection onto The Parity Polytope and Probability Simplex

    Full text link
    This paper is concerned with the adaptation to hardware of methods for Euclidean norm projections onto the parity polytope and probability simplex. We first refine recent efforts to develop efficient methods of projection onto the parity polytope. Our resulting algorithm can be configured to have either average computational complexity O(d)\mathcal{O}\left(d\right) or worst case complexity O(dlogd)\mathcal{O}\left(d\log{d}\right) on a serial processor where dd is the dimension of projection space. We show how to adapt our projection routine to hardware. Our projection method uses a sub-routine that involves another Euclidean projection; onto the probability simplex. We therefore explain how to adapt to hardware a well know simplex projection algorithm. The hardware implementations of both projection algorithms achieve area scalings of O(d(logd)2)\mathcal{O}(d\left(\log{d}\right)^2) at a delay of O((logd)2)\mathcal{O}(\left(\log{d}\right)^2). Finally, we present numerical results in which we evaluate the fixed-point accuracy and resource scaling of these algorithms when targeting a modern FPGA

    Hiding State in CλaSH Hardware Descriptions

    Get PDF
    Synchronous hardware can be modelled as a mapping from input and state to output and a new state, such mappings are referred to as transition functions. It is natural to use a functional language to implement transition functions. The CaSH compiler is capable of translating transition functions to VHDL. Modelling hardware using multiple components is convenient. Components in CaSH can be considered as instantiations of functions. To avoid packing and unpacking state when composing components, functions are lifted to arrows. By using arrows the chance of making errors will decrease as it is not required to manually (un)pack the state. Furthermore, the Haskell do-syntax for arrows increases the readability of hardware designs. This is demonstrated using a realistic example of a circuit which consists of multiple components

    Channel Sounding for the Masses: Low Complexity GNU 802.11b Channel Impulse Response Estimation

    Full text link
    New techniques in cross-layer wireless networks are building demand for ubiquitous channel sounding, that is, the capability to measure channel impulse response (CIR) with any standard wireless network and node. Towards that goal, we present a software-defined IEEE 802.11b receiver and CIR estimation system with little additional computational complexity compared to 802.11b reception alone. The system implementation, using the universal software radio peripheral (USRP) and GNU Radio, is described and compared to previous work. By overcoming computational limitations and performing direct-sequence spread-spectrum (DS-SS) matched filtering on the USRP, we enable high-quality yet inexpensive CIR estimation. We validate the channel sounder and present a drive test campaign which measures hundreds of channels between WiFi access points and an in-vehicle receiver in urban and suburban areas

    PGPG: An Automatic Generator of Pipeline Design for Programmable GRAPE Systems

    Get PDF
    We have developed PGPG (Pipeline Generator for Programmable GRAPE), a software which generates the low-level design of the pipeline processor and communication software for FPGA-based computing engines (FBCEs). An FBCE typically consists of one or multiple FPGA (Field-Programmable Gate Array) chips and local memory. Here, the term "Field-Programmable" means that one can rewrite the logic implemented to the chip after the hardware is completed, and therefore a single FBCE can be used for calculation of various functions, for example pipeline processors for gravity, SPH interaction, or image processing. The main problem with FBCEs is that the user need to develop the detailed hardware design for the processor to be implemented to FPGA chips. In addition, she or he has to write the control logic for the processor, communication and data conversion library on the host processor, and application program which uses the developed processor. These require detailed knowledge of hardware design, a hardware description language such as VHDL, the operating system and the application, and amount of human work is huge. A relatively simple design would require 1 person-year or more. The PGPG software generates all necessary design descriptions, except for the application software itself, from a high-level design description of the pipeline processor in the PGPG language. The PGPG language is a simple language, specialized to the description of pipeline processors. Thus, the design of pipeline processor in PGPG language is much easier than the traditional design. For real applications such as the pipeline for gravitational interaction, the pipeline processor generated by PGPG achieved the performance similar to that of hand-written code. In this paper we present a detailed description of PGPG version 1.0.Comment: 24 pages, 6 figures, accepted PASJ 2005 July 2

    R3^3SGM: Real-time Raster-Respecting Semi-Global Matching for Power-Constrained Systems

    Full text link
    Stereo depth estimation is used for many computer vision applications. Though many popular methods strive solely for depth quality, for real-time mobile applications (e.g. prosthetic glasses or micro-UAVs), speed and power efficiency are equally, if not more, important. Many real-world systems rely on Semi-Global Matching (SGM) to achieve a good accuracy vs. speed balance, but power efficiency is hard to achieve with conventional hardware, making the use of embedded devices such as FPGAs attractive for low-power applications. However, the full SGM algorithm is ill-suited to deployment on FPGAs, and so most FPGA variants of it are partial, at the expense of accuracy. In a non-FPGA context, the accuracy of SGM has been improved by More Global Matching (MGM), which also helps tackle the streaking artifacts that afflict SGM. In this paper, we propose a novel, resource-efficient method that is inspired by MGM's techniques for improving depth quality, but which can be implemented to run in real time on a low-power FPGA. Through evaluation on multiple datasets (KITTI and Middlebury), we show that in comparison to other real-time capable stereo approaches, we can achieve a state-of-the-art balance between accuracy, power efficiency and speed, making our approach highly desirable for use in real-time systems with limited power.Comment: Accepted in FPT 2018 as Oral presentation, 8 pages, 6 figures, 4 table
    corecore