6 research outputs found

    A High–Performance Parallel Implementation of the Chambolle Algorithm

    The determination of the optical flow is a central problem in image processing, as it allows one to describe how an image changes over time by means of a numerical vector field. The estimation of the optical flow is, however, a very complex problem, which has been tackled using many different mathematical approaches. A large body of work has recently been published on variational methods, following the technique for total variation minimization proposed by Chambolle. Still, their hardware implementations do not offer good performance in terms of frames processed per unit of time, mainly because of the complex dependency scheme among the data. In this work, we propose a highly parallel and accelerated FPGA implementation of the Chambolle algorithm, which splits the original image into a set of overlapping sub-frames and efficiently exploits the reuse of intermediate results. We validate our hardware on large frames (up to 1024 × 768), and the proposed approach largely outperforms the state-of-the-art implementations, reaching speedups of up to 76× as well as real-time frame rates even at high resolutions.

    Design Methods for Parallel Hardware Implementation of Multimedia Iterative Algorithms

    Traditionally, parallel implementations of multimedia algorithms are carried out manually, since automating this task is very difficult due to the complex dependencies that generally exist between different elements of the data set. Moreover, there is a wide family of iterative multimedia algorithms that cannot be executed with satisfactory performance on Multi-Processor Systems-on-Chip or Graphics Processing Units. For this reason, new methods are needed to design custom hardware circuits that exploit the intrinsic parallelism of multimedia algorithms. In this paper, we therefore propose a novel design method for defining hardware systems optimized for a particular class of iterative multimedia algorithms. We have successfully applied the proposed approach to several real-world case studies, such as iterative convolution filters and the Chambolle algorithm. For each of them, the proposed design method automatically implemented a parallel architecture able to meet real-time performance (up to 72 frames per second for the Chambolle algorithm), with on-chip memory requirements 2 to 3 orders of magnitude smaller than those of the state-of-the-art approaches.
