Design Methods for Parallel Hardware Implementation of Multimedia Iterative Algorithms
Traditionally, parallel implementations of multimedia algorithms are carried out manually, since automating this task is very difficult due to the complex dependencies that generally exist between different elements of the data set. Moreover, there is a wide family of iterative multimedia algorithms that cannot be executed with satisfactory performance on Multi-Processor Systems-on-Chip or Graphics Processing Units. For this reason, new methods are needed to design custom hardware circuits that exploit the intrinsic parallelism of multimedia algorithms. In this paper, we therefore propose a novel design method for the definition of hardware systems optimized for a particular class of multimedia iterative algorithms. We have successfully applied the proposed approach to several real-world case studies, such as iterative convolution filters and the Chambolle algorithm. For each of them, the proposed design method automatically implemented a parallel architecture able to meet real-time performance (up to 72 frames per second for the Chambolle algorithm), with on-chip memory requirements 2 to 3 orders of magnitude smaller than those of state-of-the-art approaches.
A High-Performance Parallel Implementation of the Chambolle Algorithm
The determination of the optical flow is a central problem in image processing, as it describes how an image changes over time by means of a numerical vector field. The estimation of the optical flow is, however, a very complex problem, which has been tackled using many different mathematical approaches. A large body of work has recently been published on variational methods, following the technique for total variation minimization proposed by Chambolle. Still, their hardware implementations do not offer good performance in terms of frames processed per time unit, mainly because of the complex dependency scheme among the data. In this work, we propose a highly parallel and accelerated FPGA implementation of the Chambolle algorithm, which splits the original image into a set of overlapping sub-frames and efficiently exploits the reuse of intermediate results. We validate our hardware on large frames (up to 1024 × 768), and the proposed approach largely outperforms the state-of-the-art implementations, reaching speedups of up to 76× as well as real-time frame rates even at high resolutions.
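The idea of splitting a frame into overlapping sub-frames can be illustrated with a minimal sketch. The function name `split_overlapping` and the parameters `tile` and `halo` are hypothetical and do not come from the paper; the sketch only computes the bounds of each sub-frame, assuming every tile is extended by a fixed border (the halo) and clipped at the image edges. How the halo width relates to the number of iterations is likewise an assumption to be checked against the actual dependency analysis.

```python
def split_overlapping(height, width, tile, halo):
    """Return (row_start, row_end, col_start, col_end) bounds of overlapping sub-frames.

    Each sub-frame covers a tile x tile core region, extended by `halo`
    pixels on every side and clipped to the image borders. End indices
    are exclusive. All names here are illustrative assumptions.
    """
    tiles = []
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            r0 = max(r - halo, 0)
            r1 = min(r + tile + halo, height)
            c0 = max(c - halo, 0)
            c1 = min(c + tile + halo, width)
            tiles.append((r0, r1, c0, c1))
    return tiles


if __name__ == "__main__":
    # A 1024 x 768 frame (width x height) cut into 256 x 256 cores
    # with a 16-pixel halo.
    for bounds in split_overlapping(768, 1024, tile=256, halo=16)[:3]:
        print(bounds)
```

A wider halo lets each sub-frame run more iterations independently before it needs data from its neighbours, at the price of recomputing the overlapping pixels.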
Parallelizing the Chambolle Algorithm for Performance-Optimized Mapping on FPGA Devices
The performance and the efficiency of recent computing platforms have been deeply influenced by the widespread adoption of hardware accelerators, such as Graphics Processing Units (GPUs) or Field Programmable Gate Arrays (FPGAs), which are often employed to support the tasks of General Purpose Processors (GPPs). One of the main advantages of these accelerators over their sequential counterparts (GPPs) is their ability to perform massively parallel computation. However, in order to exploit this competitive edge, it is necessary to extract the parallelism from the target algorithm to be executed, which is in general a very challenging task.
This difficulty is demonstrated, for instance, by the poor performance achieved on relevant multimedia algorithms, such as Chambolle, a well-known algorithm employed for optical flow estimation. The implementations of this algorithm found in the state of the art are generally based on GPUs, but barely improve on the performance that can be obtained with a powerful GPP. In this paper, we propose a novel approach to extract the parallelism from computation-intensive multimedia algorithms, which includes an analysis of their dependency schema and an assessment of their data reuse. We then perform a thorough analysis of the Chambolle algorithm, providing a formal proof of its inner data dependencies and locality properties. Next, we exploit the considerations drawn from this analysis by proposing an architectural template that takes advantage of the fine-grained parallelism of FPGA devices. Moreover, since the proposed template can be instantiated with different parameters, we also propose a design metric, the expansion rate, to help the designer estimate the efficiency and performance of the different instances, making it possible to select the right one before the implementation phase. Finally, we show, by means of experimental results, how the proposed analysis and parallelization approach leads to the design of efficient and high-performance FPGA-based implementations that are orders of magnitude faster than the state-of-the-art ones.
Floor plan design and automatic nodes deployment for indoor location and monitoring systems
Many Smart Building systems, such as indoor localization or occupancy monitoring systems, require the installation of several transmitting and receiving nodes. The quantity and the positioning of these devices heavily affect the accuracy and the total cost of the system, but tools to automate node configuration are currently lacking. We propose an open-source design tool for the specification of the building floor plan. The tool is able to suggest a near-optimal allocation of sensor nodes, depending on hardware characteristics and prices, in order to maximize the coverage area while minimizing the total cost.
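The abstract does not say how the tool computes its allocation. As a rough illustration of the coverage-versus-cost trade-off only, the following sketch uses a standard greedy set-cover heuristic, which is a swapped-in technique and not the tool's documented method. The floor plan is assumed to be a set of grid cells, every node is assumed to have the same circular range and price, and all names (`covered_cells`, `greedy_placement`, `budget`) are hypothetical.

```python
from math import hypot


def covered_cells(pos, radius, cells):
    """Cells whose centre lies within `radius` of the node at `pos`."""
    return {c for c in cells if hypot(c[0] - pos[0], c[1] - pos[1]) <= radius}


def greedy_placement(cells, candidates, radius, cost, budget):
    """Greedily pick node positions that cover the most uncovered cells.

    Stops when everything is covered, the budget is exhausted, or no
    remaining candidate adds coverage. Returns the chosen positions and
    the number of covered cells.
    """
    cells = set(cells)
    uncovered = set(cells)
    remaining = list(candidates)
    chosen = []
    spent = 0.0
    while uncovered and remaining and spent + cost <= budget:
        # Candidate that covers the most still-uncovered cells.
        best = max(remaining, key=lambda p: len(covered_cells(p, radius, uncovered)))
        gain = covered_cells(best, radius, uncovered)
        if not gain:
            break
        chosen.append(best)
        remaining.remove(best)
        uncovered -= gain
        spent += cost
    return chosen, len(cells) - len(uncovered)


if __name__ == "__main__":
    corridor = [(x, 0) for x in range(6)]
    print(greedy_placement(corridor, [(1, 0), (4, 0)], radius=1, cost=1.0, budget=2.0))
```

A real tool would also have to account for walls, heterogeneous hardware, and per-device prices, all of which this sketch deliberately ignores.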
Sink state analysis in multi-tenant smart buildings
In recent years, many research projects have focused on the design of complex Building Management Systems (BMSes) aimed at realizing the Smart Building vision. Unfortunately, multi-user management is not fully supported in modern BMSes, multi-tenant implications have not been adequately investigated, and other issues remain unsolved in state-of-the-art approaches. For instance, a behavioral analysis of the building control logic is needed to ensure the correct functioning of the Smart Building control system. In this paper, we present an approach to formally verify a Smart Building control system. Unlike existing solutions, which are mainly derived from control theory, we propose a stochastic method that takes into account the probability that each rule inside a ruleset is triggered. This method leverages the features and the capabilities of the specific scenario represented by Smart Buildings, in order to simplify building managers' duties in real-world applications. The method has been validated on a wide set of rules obtained by means of an experimental campaign. The analyses show that users generally tend to (unconsciously) specify sinks and assertive rules, a symptom that a collaborative management approach is far from being applicable in real-world applications without the help of a tool such as the one presented in this paper.
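The abstract does not define its model formally. One possible reading, given here purely as an assumption, treats the control system as a set of states connected by rule-triggered transitions, each with a trigger probability, and calls a state a sink when no transition with non-zero probability leaves it. The sketch below finds such states. The data layout and the function name `find_sinks` are hypothetical and are not taken from the paper.

```python
def find_sinks(transitions):
    """Return the sorted list of sink states.

    `transitions` maps a state to a dict {next_state: probability}.
    A state is reported as a sink when every transition with non-zero
    probability leads back to the state itself, or when it has no
    outgoing transitions at all. This definition is an assumption.
    """
    states = set(transitions)
    for outgoing in transitions.values():
        states.update(outgoing)  # include states that only appear as targets

    sinks = []
    for state in sorted(states):
        outgoing = transitions.get(state, {})
        if all(nxt == state for nxt, prob in outgoing.items() if prob > 0):
            sinks.append(state)
    return sinks


if __name__ == "__main__":
    ruleset = {
        "idle": {"heating": 0.7, "idle": 0.3},
        "heating": {"window_locked": 1.0},
        "window_locked": {"window_locked": 1.0},
    }
    print(find_sinks(ruleset))
```

In this toy ruleset, once the `window_locked` state is reached no rule can change it, which is the kind of situation a tenant may specify without noticing.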