A Hierarchical Symmetric Stereo Algorithm using Dynamic Programming
In this paper, a new hierarchical stereo algorithm is presented. The algorithm matches individual pixels in corresponding scanlines by minimizing a cost function. Several cost functions are compared. A hierarchical implementation gives the algorithm a large gain in speed and a large reduction in memory requirements. The images are downsampled an optimal number of times, and the disparity map of a lower level is used as an 'offset' disparity map at the next higher level. An important contribution is the complexity analysis of the algorithm: it is shown that the complexity is independent of the disparity range. This result is also used to determine the optimal number of downsampling levels. The speed gain makes it possible to use more complex (compute-intensive) cost functions that deliver high-quality disparity maps. Another advantage of the algorithm is that cost functions can be chosen independently of the optimisation algorithm. The algorithm is symmetric, i.e. exactly the same matches are found if the left and right images are swapped. Finally, the algorithm was carefully implemented so that a minimal amount of memory is used. It has proven both its efficiency and its quality on large images with a high disparity range. Examples are given in the paper.
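The coarse-to-fine idea in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a simple absolute-difference cost, a smoothness penalty instead of explicit occlusion handling, and it is not symmetric. The names `dp_scanline`, `hierarchical_disparity`, `radius` and `smooth` are hypothetical. What it does show is the 'offset' mechanism: each level searches only a few disparities around twice the result of the coarser level, so the work per pixel does not grow with the total disparity range.

```python
import numpy as np

def dp_scanline(left, right, offset, radius, smooth=1.0):
    """Dynamic programming over one scanline.

    Chooses d[x] = offset[x] + k with |k| <= radius, minimizing
    sum |left[x] - right[x - d[x]]| + smooth * |d[x] - d[x-1]|.
    """
    n = len(left)
    ks = np.arange(-radius, radius + 1)
    cost = np.full((n, len(ks)), 1e9)          # large cost for out-of-range matches
    for x in range(n):
        for j, k in enumerate(ks):
            xr = x - (offset[x] + k)
            if 0 <= xr < n:
                cost[x, j] = abs(float(left[x]) - float(right[xr]))
    acc = cost.copy()                           # accumulated path cost
    back = np.zeros((n, len(ks)), dtype=int)    # back-pointers
    for x in range(1, n):
        prev_d = offset[x - 1] + ks
        for j in range(len(ks)):
            d_cur = offset[x] + ks[j]
            trans = acc[x - 1] + smooth * np.abs(prev_d - d_cur)
            back[x, j] = int(np.argmin(trans))
            acc[x, j] = cost[x, j] + trans[back[x, j]]
    j = int(np.argmin(acc[-1]))
    d = np.zeros(n, dtype=int)
    for x in range(n - 1, -1, -1):              # backtrack the optimal path
        d[x] = offset[x] + ks[j]
        j = back[x, j]
    return d

def hierarchical_disparity(left, right, levels, radius=1):
    """Coarse-to-fine matching; len(left) must be divisible by 2**levels."""
    if levels == 0:
        zero = np.zeros(len(left), dtype=int)
        return dp_scanline(left, right, zero, radius)
    half_l = left.reshape(-1, 2).mean(axis=1)   # downsample by averaging pairs
    half_r = right.reshape(-1, 2).mean(axis=1)
    coarse = hierarchical_disparity(half_l, half_r, levels - 1, radius)
    offset = np.repeat(2 * coarse, 2)           # upsampled 'offset' disparity map
    return dp_scanline(left, right, offset, radius)
```

With `radius=1` and two downsampling levels, every level evaluates only three candidates per pixel, yet disparities of up to about 7 pixels remain reachable at full resolution.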
Technique for reducing complexity of recursive motion estimation algorithms.
The recursive search motion estimation algorithm offers smooth and accurate motion vector fields. Computationally, the most expensive part of the motion estimator is the evaluation of the various motion vector candidates. Evaluation is performed by comparing the blocks in two consecutive frames that the motion vector candidates point to. This paper addresses the issue of reducing the already extremely low number of motion vector evaluations. We apply pre-processing techniques to reduce the number of motion vector candidates from 7 to 5, i.e. by about 30%, without sacrificing quality. We exemplify these findings through experimental results obtained using the 3-D recursive search motion estimation algorithm. The required pre-processing overhead is negligible.
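Why the candidate count matters can be seen in a small sketch of candidate evaluation. It assumes a sum-of-absolute-differences (SAD) block comparison and uses duplicate removal as a stand-in pruning step; the abstract does not describe the paper's actual pre-processing, so `sad`, `best_candidate` and the 8x8 block size are illustrative assumptions only.

```python
import numpy as np

def sad(cur, prev, y, x, mv, bs=8):
    """SAD between the block at (y, x) in `cur` and the block displaced
    by mv = (dy, dx) in `prev`. Returns inf if the block leaves the frame."""
    dy, dx = mv
    h, w = prev.shape
    py, px = y + dy, x + dx
    if py < 0 or px < 0 or py + bs > h or px + bs > w:
        return float("inf")
    a = cur[y:y + bs, x:x + bs].astype(np.int64)
    b = prev[py:py + bs, px:px + bs].astype(np.int64)
    return float(np.abs(a - b).sum())

def best_candidate(cur, prev, y, x, candidates, bs=8):
    """Evaluate each distinct candidate once and return
    (best vector, number of block evaluations performed)."""
    # Assumed pruning step: drop repeated candidates, keeping their order.
    unique = list(dict.fromkeys(candidates))
    costs = [sad(cur, prev, y, x, mv, bs) for mv in unique]
    return unique[int(np.argmin(costs))], len(unique)
```

Each candidate costs one full block comparison, so every candidate removed before evaluation saves a fixed amount of work per block.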
Algorithm/Architecture Co-design of the Generalized Sampling Theorem Based De-Interlacer.
De-interlacing is a major determinant of image quality in a modern display processing chain. The de-interlacing method based on the generalized sampling theorem (GST), applied to motion estimation and motion compensation, provides the best de-interlacing results. With HDTV interlaced input material (1920×1080i), this method requires about 1000 GOPs and a communication bandwidth of around 10 Gbytes/sec. We analyze and simplify the algorithm and propose a processing architecture. As a result, the operation count of the motion estimator decreases by a factor of 5.5 and the bandwidth to local pixel storage by a factor of 3.3, with only mild and acceptable quality loss. We present a task break-up and a suitable heterogeneous multi-processor architecture. The task break-up is such that the computational load of the processors is balanced and the flexibility of the architecture is preserved within the application domain. To cope with the large memory bandwidth requirements, we exploit locality of reference with multi-level scratchpad memories.
Memory-centric video processing
This work presents a domain-specific memory subsystem based on a two-level memory hierarchy. It targets the domain of video post-processing applications, including video enhancement and format conversion. These applications are based on motion compensation and/or a broad class of content-adaptive filtering to provide the highest picture quality. Our approach meets the required performance and has sufficient flexibility for the application domain. It especially aims at the applications that are most challenging to implement: compute-intensive and bandwidth-demanding applications that provide the highest quality at high picture resolutions. The lowest level of the memory hierarchy, closest to the processing element, is the L0 scratchpad. It is organized specifically to enable fast retrieval of an arbitrarily positioned 2-D block of pixels to the processing element. To guarantee the performance, most of its addressing logic is hardwired, leaving the user a set of API calls for initialization and for storing/loading data to/from the L0 scratchpad. The next level of the memory hierarchy, the L1 scratchpad, minimizes the off-chip memory bandwidth requirements. The L1 scratchpad is organized specifically to enable efficient aligned block-based accesses. With lower data rates compared to the L0 scratchpad and aligned block access, software-based addressing is used to enable full flexibility. The two-level memory hierarchy exploits prefetching to further improve the performance.
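The split between aligned block access and arbitrarily positioned 2-D block retrieval can be modelled in software with a minimal sketch. This is a behavioural illustration only, not the hardware organization described in the work: the class `L1Scratchpad`, the function `read_block`, the 16x16 tile size and the fetch counter are all hypothetical, and the requested block is assumed to lie entirely inside the frame.

```python
import numpy as np

class L1Scratchpad:
    """Holds aligned tiles of a frame and counts off-chip fetches."""

    def __init__(self, frame, tile=16):
        self.frame = frame          # stands in for off-chip memory
        self.tile = tile
        self.tiles = {}             # (tile_row, tile_col) -> pixel array
        self.offchip_fetches = 0

    def aligned_tile(self, ty, tx):
        """Return the aligned tile (ty, tx), fetching it once if needed."""
        key = (ty, tx)
        if key not in self.tiles:
            t = self.tile
            self.tiles[key] = self.frame[ty * t:(ty + 1) * t,
                                         tx * t:(tx + 1) * t].copy()
            self.offchip_fetches += 1
        return self.tiles[key]

def read_block(l1, y, x, h, w):
    """Assemble an arbitrarily positioned h x w block from aligned tiles,
    as an L0-style unaligned read served from L1."""
    t = l1.tile
    out = np.empty((h, w), dtype=l1.frame.dtype)
    for ty in range(y // t, (y + h - 1) // t + 1):
        for tx in range(x // t, (x + w - 1) // t + 1):
            tile = l1.aligned_tile(ty, tx)
            # Overlap of the requested block with this tile, in frame coordinates.
            y0, y1 = max(y, ty * t), min(y + h, (ty + 1) * t)
            x0, x1 = max(x, tx * t), min(x + w, (tx + 1) * t)
            out[y0 - y:y1 - y, x0 - x:x1 - x] = \
                tile[y0 - ty * t:y1 - ty * t, x0 - tx * t:x1 - tx * t]
    return out
```

In this model, a second unaligned read that overlaps the same tiles causes no further off-chip fetches, which is the locality that a scratchpad level between the processing element and external memory is meant to exploit.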