193 research outputs found

    A dataflow IR for memory efficient RIPL compilation to FPGAs

    Field programmable gate arrays (FPGAs) are fundamentally different to fixed processor architectures because their memory hierarchies can be tailored to the needs of an algorithm. FPGA compilers for high level languages are not hindered by fixed memory hierarchies. The constraint when compiling to FPGAs is the availability of resources. In this paper we describe how the dataflow intermediary of our declarative FPGA image processing DSL called RIPL (Rathlin Image Processing Language) enables us to constrain memory. We use five benchmarks to demonstrate that memory use with RIPL is comparable to the Vivado HLS OpenCV library without the need for language pragmas to guide hardware synthesis. The benchmarks also show that RIPL is more expressive than the Darkroom FPGA image processing language.

    Biblioteca para diseño basado en modelos de algoritmos de procesado de imágenes en FPGA (Library for model-based design of image processing algorithms on FPGAs)

    This paper describes a library (XSGImgLib) that includes parameterizable blocks to implement low-level image processing tasks on FPGAs. A model-based design technique provided by Xilinx System Generator (XSG) has been used to design the blocks, which implement point operations (binarization) and neighborhood operations (linear and non-linear filtering) on grayscale images. The blocks are parameterizable for input/output data precision, window size, normalization strategy, and implementation options (area versus speed optimization). The paper includes the implementation results obtained after fixing these options and exemplifies the combination of several blocks of the library to build a complete design for image segmentation purposes. Funding: Agencia Española de Cooperación Internacional para el Desarrollo, PCID/024124/09, PCID/030769/1

    Histogram of oriented gradients front end processing: an FPGA based processor approach

    The Field Programmable Gate Array (FPGA) implementation of the commonly used Histogram of Oriented Gradients (HOG) algorithm is explored. The HOG algorithm is employed to extract features for object detection. A key focus has been to explore the use of a new FPGA-based processor which has been targeted at image processing. The paper gives details of the mapping and scheduling factors that influence the performance and the stages that were undertaken to allow the algorithm to be deployed on FPGA hardware, whilst taking into account the specific IPPro architecture features. We show that multi-core IPPro performance can exceed that of state-of-the-art FPGA designs by up to 3.2 times, with reduced design and implementation effort and increased flexibility, all on a low-cost Zynq programmable system.

    Profile Guided Dataflow Transformation for FPGAs and CPUs

    This paper proposes a new high-level approach for optimising field programmable gate array (FPGA) designs. FPGA designs are commonly implemented in low-level hardware description languages (HDLs), which lack the abstractions necessary for identifying opportunities for significant performance improvements. Using a computer vision case study, we show that modelling computation with dataflow abstractions enables substantial restructuring of FPGA designs before lowering to the HDL level, and also improves CPU performance. Using the CPU transformations, runtime is reduced by 43%. Using the FPGA transformations, clock frequency is increased from 67 MHz to 110 MHz. Our results outperform commercial low-level HDL optimisations, showcasing dataflow program abstraction as an amenable computation model for highly effective FPGA optimisation.

    Profile driven dataflow optimisation of mean shift visual tracking

    Profile guided optimisation is a common technique used by compilers and runtime systems to shorten execution runtimes and to optimise locality aware scheduling and memory access on heterogeneous hardware platforms. Some profiling tools trace the execution of low level code, whilst others are designed for abstract models of computation to provide rich domain-specific context in profiling reports. We have implemented mean shift, a computer vision tracking algorithm, in the RVC-CAL dataflow language and use both dynamic runtime and static dataflow profiling mechanisms to identify and eliminate bottlenecks in our naive initial version. We use these profiling reports to tune the CPU scheduler, reducing runtime by 88%, and to optimise our dataflow implementation, which reduces runtime by a further 43%, for an overall runtime reduction of 93%. We also assess the portability of our mean shift optimisations by trading off CPU runtime against resource utilisation on FPGAs. Applying all dataflow optimisations reduces FPGA design space significantly, requiring fewer slice LUTs and less block memory.

    Skeletons for parallel image processing: an overview of the SKiPPER project

    This paper is a general overview of the SKIPPER project, run at Blaise Pascal University between 1996 and 2002. The main goal of the SKIPPER project was to demonstrate the applicability of skeleton-based parallel programming techniques to the fast prototyping of reactive vision applications. This project has produced several versions of a full-fledged integrated parallel programming environment (PPE). These PPEs have been used to implement realistic vision applications, such as road following or vehicle tracking for assisted driving, on embedded parallel platforms on board semi-autonomous vehicles. All versions of SKIPPER share a common front-end and repertoire of skeletons (presented in previous papers) but differ in the techniques used for implementing skeletons. This paper focuses on these implementation issues, by making a comparative survey, according to a set of four criteria (efficiency, expressivity, portability, predictability), of these implementation techniques. It also gives an account of the lessons we have learned, both when dealing with these implementation issues and when using the resulting tools for prototyping vision applications.

    An embedded system supporting dynamic partial reconfiguration of hardware resources for morphological image processing

    Processors for high-performance computing applications are generally designed with a focus on high clock rates, parallelism of operations and high communication bandwidth, often at the expense of high power consumption. However, the emphasis of many embedded systems and untethered devices is on minimal hardware requirements and reduced power consumption. With the incessant growth of computational needs for embedded applications, which conflict with chip power and area constraints, the burden is put on hardware designers to come up with designs that optimize power and area requirements. This thesis investigates the efficient design of an embedded system for morphological image processing applications on Xilinx FPGAs (Field Programmable Gate Arrays) by optimizing both area and power usage while delivering high performance. The design leverages a unique capability of FPGAs called dynamic partial reconfiguration (DPR), which allows regions of the FPGA to be reprogrammed with new functionality at runtime while applications are still running in the remainder of the device. The main aim of this thesis is to design an embedded system for morphological image processing by accounting for real time and area constraints as compared to a statically configured FPGA. IP (Intellectual Property) cores are synthesized for both static and dynamic time. DPR enables instantiation of more hardware logic over a period of time on an existing device by time-multiplexing the hardware realization of functions. A comparison of power consumption is presented for the statically and dynamically reconfigured designs. Finally, a performance comparison is included for the implementation of the respective algorithms on a hardwired ARM processor as well as on another general-purpose processor. The results prove the viability of DPR for morphological image processing applications.

    A spectral estimation toolkit for Java applications

    This paper examines the capability, performance, and relevance of a high-performance advanced signal processing toolkit in Java, a programming language for Web-based applications. To demonstrate the simplicity, ease, and application use of the toolkit, a spectral estimation applet has been developed in the Java environment using advanced Internet technologies such as Remote Method Invocation (RMI). This application provides an interactive and visual approach to understanding theoretical concepts of advanced signal processing methods and shows the need to create more application applets to better understand additional concepts in signal and image processing. Furthermore, a toolkit with limited functionality and a different framework has been developed for embedded and handheld devices such as cellular phones and palm pilots. This toolkit is also shown to be useful in developing MIDlet applications on those devices.