892 research outputs found

    Power efficient dataflow design for a heterogeneous smart camera architecture

    Get PDF
    Visual attention modelling characterises the scene to segment regions of visual interest and is increasingly being used as a pre-processing step in many computer vision applications including surveillance and security. Smart camera architectures are an emerging technology and a foundation of security and safety frameworks in modern vision systems. In this paper, we present a dataflow design of a visual saliency based camera architecture targeting a heterogeneous CPU+FPGA platform to propose a smart camera network infrastructure. The proposed design flow encompasses image processing algorithm implementation, hardware & software integration and network connectivity through a unified model. By leveraging the properties of the dataflow paradigm, we iteratively refine the algorithm specification into a deployable solution, addressing distinct requirements at each design stage: from algorithm accuracy to hardware-software interactions, real-time execution and power consumption. Our design achieved real-time run time performance and the power consumption of the optimised asynchronous design is reported at only 0.25 Watt. The resource usages on a Xilinx Zynq platform remains significantly low

    FPGA-based smart camera mote for pervasive wireless network

    Get PDF
    International audienceSmart camera networks raise challenging issues in many fields of research, including vision processing, communication protocols, distributed algorithms or power management. The ever increasing resolution of image sensors entails huge amounts of data, far exceeding the bandwidth of current networks and thus forcing smart camera nodes to process raw data into useful information. Consequently, on-board processing has become a key issue for the expansion of such networked systems. In this context, FPGA-based platforms, supporting massive, fine grain data parallelism, offer large opportunities. Besides, the concept of a middleware, providing services for networking, data transfer, dynamic loading or hardware abstraction, has emerged as a means of harnessing the hardware and software complexity of smart camera nodes. In this paper, we prospect the development of a new kind of smart cameras, wherein FPGAs provide high performance processing and general purpose processors support middleware services. In this approach, FPGA devices can be reconfigured at run-time through the network both from explicit user request and transparent middleware decision. An embedded real-time operating system is in charge of the communication layer, and thus can autonomously decide to use a part of the FPGA as an available processing resource. The classical programmability issue, a significant obstacle when dealing with FPGAs, is addressed by resorting to a domain specific high-level programming language (CAPH) for describing operations to be implemented on FPGAs

    Rapid Prototyping of Embedded Video Processing Systems in FPGA Devices

    Get PDF
    Design of video processing circuits requires a variety of tools and knowledge, and it is difficult to find the right combination of tools for an efficient design process, specifically when considering open tools for evaluation or educational purpose. This chapter presents an overview of video processing requirements, programmable devices used for embedded video processing and the components of a video processing chain. We propose a novel design flow for generating customizable intellectual property (IP) cores used in streaming video processing applications. This design flow is based on domain-specific modules in Python language. Examples of generated cores are presented

    Profile driven dataflow optimisation of mean shift visual tracking

    Get PDF
    Profile guided optimisation is a common technique used by compilers and runtime systems to shorten execution runtimes and to optimise locality aware scheduling and memory access on heterogeneous hardware platforms. Some profiling tools trace the execution of low level code, whilst others are designed for abstract models of computation to provide rich domain-specific context in profiling reports. We have implemented mean shift, a computer vision tracking algorithm, in the RVC-CAL dataflow language and use both dynamic runtime and static dataflow profiling mechanisms to identify and eliminate bottlenecks in our naive initial version. We use these profiling reports to tune the CPU scheduler reducing runtime by 88%, and to optimise our dataflow implementation that reduces runtime by a further 43% - an overall runtime reduction of 93%. We also assess the portability of our mean shift optimisations by trading off CPU runtime against resource utilisation on FPGAs. Applying all dataflow optimisations reduces FPGA design space significantly, requiring fewer slice LUTs and less block memory

    FPGA-Based Processor Acceleration for Image Processing Applications

    Get PDF
    FPGA-based embedded image processing systems offer considerable computing resources but present programming challenges when compared to software systems. The paper describes an approach based on an FPGA-based soft processor called Image Processing Processor (IPPro) which can operate up to 337 MHz on a high-end Xilinx FPGA family and gives details of the dataflow-based programming environment. The approach is demonstrated for a k-means clustering operation and a traffic sign recognition application, both of which have been prototyped on an Avnet Zedboard that has Xilinx Zynq-7000 system-on-chip (SoC). A number of parallel dataflow mapping options were explored giving a speed-up of 8 times for the k-means clustering using 16 IPPro cores, and a speed-up of 9.6 times for the morphology filter operation of the traffic sign recognition using 16 IPPro cores compared to their equivalent ARM-based software implementations. We show that for k-means clustering, the 16 IPPro cores implementation is 57, 28 and 1.7 times more power efficient (fps/W) than ARM Cortex-A7 CPU, nVIDIA GeForce GTX980 GPU and ARM Mali-T628 embedded GPU respectively

    Integrated input modeling and memory management for image processing applications

    Get PDF
    Image processing applications often demand powerful calculations and real-time performance with low power and energy consumption. Programmable hardware provides inherent parallelism and flexibility making it a good implementation choice for this application domain. In this work we introduce a new modeling technique combining Cyclo-Static Dataflow (CSDF) base model semantics and Homogeneous Parameterized Dataflow (HPDF) meta-modeling framework, which exposes more levels of parallelism than previous models and can be used to reduce buffer sizes. We model two different applications and show how we can achieve efficient scheduling and memory organization, which is crucial for this application domain, since large amounts of data are processed, and storing intermediate results usually requires the use of off-chip resources, causing slower data access and higher power consumption. We also designed a reusable wishbone compliant memory controller module that can be used to access the Xilinx Multimedia Board’s memory chips using single accesses or burst mode
    corecore