1,313 research outputs found

    Depth-based Multi-View 3D Video Coding


    CANF-VC++: Enhancing Conditional Augmented Normalizing Flows for Video Compression with Advanced Techniques

    Video has become the predominant medium for information dissemination, driving the need for efficient video codecs. Recent learned video compression methods have shown promising results, surpassing traditional codecs in coding efficiency. However, challenges remain in integrating fragmented techniques and incorporating new tools into existing codecs. In this paper, we comprehensively review the state-of-the-art CANF-VC codec and propose CANF-VC++, an enhanced version that addresses these challenges. We systematically explore architecture design, reference frame type, training procedure, and entropy coding efficiency, leading to substantial coding improvements. CANF-VC++ achieves significant Bjøntegaard-Delta (BD) rate savings on the conventional datasets UVG, HEVC Class B, and MCL-JCV, outperforming the baseline CANF-VC and even the H.266 reference software VTM. Our work demonstrates the potential of integrating advancements in video compression and serves as inspiration for future research in the field.
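The Bjøntegaard-Delta rate mentioned in the abstract above is commonly computed by fitting log-bitrate as a cubic polynomial of PSNR for each codec and averaging the gap between the two fits over their shared PSNR range. The following is a minimal sketch of that common procedure, not the evaluation code used by the paper; the function name `bd_rate` and the sample rate/PSNR points are illustrative assumptions, and at least four points per codec are assumed.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec vs. the reference at equal PSNR.

    Negative values mean the test codec needs fewer bits for the same quality.
    """
    psnr_ref = np.asarray(psnr_ref, dtype=float)
    psnr_test = np.asarray(psnr_test, dtype=float)
    log_ref = np.log(np.asarray(rate_ref, dtype=float))
    log_test = np.log(np.asarray(rate_test, dtype=float))

    # Fit log-rate as a cubic polynomial of PSNR for each codec.
    poly_ref = np.polyfit(psnr_ref, log_ref, 3)
    poly_test = np.polyfit(psnr_test, log_test, 3)

    # Integrate both fits over the PSNR interval covered by both curves.
    lo = max(psnr_ref.min(), psnr_test.min())
    hi = min(psnr_ref.max(), psnr_test.max())
    int_ref = np.polyval(np.polyint(poly_ref), hi) - np.polyval(np.polyint(poly_ref), lo)
    int_test = np.polyval(np.polyint(poly_test), hi) - np.polyval(np.polyint(poly_test), lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_log_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

# Made-up operating points: the test codec reaches each PSNR at half the bitrate.
psnr = [32.0, 35.0, 38.0, 41.0]
ref_kbps = [1000.0, 2000.0, 4000.0, 8000.0]
test_kbps = [500.0, 1000.0, 2000.0, 4000.0]
print(round(bd_rate(ref_kbps, psnr, test_kbps, psnr), 2))
```

With these synthetic points the result should be close to -50, since the test codec uses half the bits at every quality level.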

    Graph Spectral Image Processing

    The recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Although a digital image contains pixels that reside on a regularly sampled 2D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph and apply GSP tools to process and analyze the signal in the graph spectral domain. In this article, we give an overview of recent graph spectral techniques in GSP specifically for image and video processing. The topics covered include image compression, image restoration, image filtering, and image segmentation.
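To make the idea of treating an image patch as a graph signal concrete, here is a minimal sketch under stated assumptions: pixels are joined on a 4-connected grid, edge weights use the Gaussian form exp(-(x_i - x_j)^2 / sigma^2) (one common choice, not necessarily the one used in the article), and the graph Fourier transform is taken as projection onto the eigenvectors of the combinatorial Laplacian L = D - W. The function names and the value of `sigma` are illustrative.

```python
import numpy as np

def grid_graph_laplacian(patch, sigma=10.0):
    """Combinatorial Laplacian L = D - W of a 4-connected pixel grid.

    Edge weights shrink as the intensity difference between neighbors grows,
    so the graph reflects image structure such as edges.
    """
    h, w = patch.shape
    n = h * w
    W = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < h and cc < w:
                    j = rr * w + cc
                    diff = float(patch[r, c]) - float(patch[rr, cc])
                    W[i, j] = W[j, i] = np.exp(-(diff ** 2) / (sigma ** 2))
    return np.diag(W.sum(axis=1)) - W

def graph_fourier_transform(patch, sigma=10.0):
    """Return (graph frequencies, GFT basis, GFT coefficients) for a patch."""
    L = grid_graph_laplacian(patch, sigma)
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    coeffs = eigvecs.T @ patch.reshape(-1).astype(float)
    return eigvals, eigvecs, coeffs

# A 4x4 patch with a sharp vertical edge (values are made up).
patch = np.array([[50, 50, 200, 200]] * 4)
eigvals, eigvecs, coeffs = graph_fourier_transform(patch)
reconstructed = (eigvecs @ coeffs).reshape(patch.shape)
print(np.allclose(reconstructed, patch))
```

Because the weights across the edge are nearly zero, a piecewise-constant patch like this one concentrates almost all of its energy in the lowest graph frequencies, which is the property that graph-based compression schemes try to exploit.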

    Multiple Vector-Based MEMC and Deep CNN for Video Frame Interpolation

    Thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2019. Hyuk-Jae Lee.

    Block-based hierarchical motion estimation is widely used and succeeds in generating high-quality interpolation. However, it still fails to estimate the motion of small objects when the background region moves in a different direction. This is because the motion of small objects is neglected by the down-sampling and over-smoothing operations at the top level of the image pyramid in the maximum a posteriori (MAP) method. Consequently, the motion vector of a small object cannot be detected at the bottom level, and small objects often appear deformed in the interpolated frame. This thesis proposes a novel algorithm that preserves the motion vector of small objects by adding a secondary motion vector candidate that represents their movement. This additional candidate is always propagated from the top to the bottom layer of the image pyramid. Experimental results demonstrate that the intermediate frame interpolated by the proposed algorithm has significantly better visual quality than conventional MAP-based frame interpolation.

    In motion-compensated frame interpolation, a repetition pattern in an image makes it difficult to derive an accurate motion vector because multiple similar local minima exist in the search space of the matching cost. To improve the accuracy of motion estimation in a repetition region, this thesis takes a semi-global approach that exploits both local and global characteristics of the region. A histogram of the motion vector candidates is built using a voter-based voting system, which is more reliable than an elector-based voting system. Experimental results demonstrate that the proposed method significantly outperforms the previous local approach in terms of both objective peak signal-to-noise ratio (PSNR) and subjective visual quality.

    In video frame interpolation, or motion-compensated frame rate up-conversion (MC-FRUC), motion compensation along unidirectional motion trajectories directly causes overlap and hole issues. To solve these issues, this research presents a new algorithm for bidirectional motion-compensated frame interpolation. First, the proposed method generates bidirectional motion vectors from two unidirectional motion vector fields (forward and backward) obtained from unidirectional motion estimation, by projecting the forward and backward motion vectors into the interpolated frame. A comprehensive metric that extends the distance between a projected block and an interpolated block is proposed to compute the weighting coefficients when an interpolated block has multiple projected blocks. Holes are filled using a vector median filter over the available non-hole neighboring blocks. The proposed method outperforms existing MC-FRUC methods and removes block artifacts significantly.

    Video frame interpolation with a deep convolutional neural network (CNN) is also investigated in this thesis. Optical flow estimation and video frame interpolation form a chicken-and-egg problem in which each affects the other. This thesis presents a stack of networks trained to estimate intermediate optical flows from a first synthesized intermediate frame; the final interpolated frame is then generated by a second synthesis network whose input stacks that first frame with the two frames warped by the learned intermediate optical flows. The primary benefit is that the two problems are joined in one framework that is trained jointly, using analysis-by-synthesis for optical flow estimation and, conversely, CNN-kernel-based synthesis-by-analysis. The proposed network is the first attempt to bridge the two branches of previous approaches, optical-flow-based synthesis and CNN-kernel-based synthesis, in a single network. Experiments on various challenging datasets all show that the proposed network outperforms state-of-the-art methods by significant margins for video frame interpolation, and that the estimated optical flows are accurate for challenging movements. The proposed network is also applied as a post-processing step to improve the coding efficiency of the state-of-the-art video compression standard HEVC/H.265, and experimental results confirm the efficiency of the proposed network.

    Table of contents:
    Abstract
    Table of Contents
    List of Tables
    List of Figures
    Chapter 1. Introduction
        1.1. Hierarchical Motion Estimation of Small Objects
        1.2. Motion Estimation of a Repetition Pattern Region
        1.3. Motion-Compensated Frame Interpolation
        1.4. Video Frame Interpolation with Deep CNN
        1.5. Outline of the Thesis
    Chapter 2. Previous Works
        2.1. Previous Works on Hierarchical Block-Based Motion Estimation
            2.1.1. Maximum a Posteriori (MAP) Framework
            2.1.2. Hierarchical Motion Estimation
        2.2. Previous Works on Motion Estimation for a Repetition Pattern Region
        2.3. Previous Works on Motion Compensation
        2.4. Previous Works on Video Frame Interpolation with Deep CNN
    Chapter 3. Hierarchical Motion Estimation for Small Objects
        3.1. Problem Statement
        3.2. The Alternative Motion Vector of High Cost Pixels
        3.3. Modified Hierarchical Motion Estimation
        3.4. Framework of the Proposed Algorithm
        3.5. Experimental Results
            3.5.1. Performance Analysis
            3.5.2. Performance Evaluation
    Chapter 4. Semi-Global Accurate Motion Estimation for a Repetition Pattern Region
        4.1. Problem Statement
        4.2. Objective Function and Constraints
        4.3. Elector based Voting System
        4.4. Voter based Voting System
        4.5. Experimental Results
    Chapter 5. Multiple Motion Vectors based Motion Compensation
        5.1. Problem Statement
        5.2. Adaptive Weighted Multiple Motion Vectors based Motion Compensation
            5.2.1. One-to-Multiple Motion Vector Projection
            5.2.2. A Comprehensive Metric as the Extension of Distance
        5.3. Handling Hole Blocks
        5.4. Framework of the Proposed Motion Compensated Frame Interpolation
        5.5. Experimental Results
    Chapter 6. Video Frame Interpolation with a Stack of Deep CNN
        6.1. Problem Statement
        6.2. The Proposed Network for Video Frame Interpolation
            6.2.1. A Stack of Synthesis Networks
            6.2.2. Intermediate Optical Flow Derivation Module
            6.2.3. Warping Operations
            6.2.4. Training and Loss Function
            6.2.5. Network Architecture
            6.2.6. Experimental Results
                6.2.6.1. Frame Interpolation Evaluation
                6.2.6.2. Ablation Experiments
        6.3. Extension for Quality Enhancement for Compressed Videos Task
        6.4. Extension for Improving the Coding Efficiency of HEVC based Low Bitrate Encoder
    Chapter 7. Conclusion
    References
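The hole-filling step named in the thesis abstract (a vector median filter over non-hole neighbor blocks) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the 8-neighborhood, the array layout, and the function names `vector_median` and `fill_holes` are assumptions. The vector median is taken in its usual sense, the candidate vector with the smallest summed Euclidean distance to all other candidates.

```python
import numpy as np

def vector_median(vectors):
    """Return the vector minimizing the sum of distances to all the others."""
    v = np.asarray(vectors, dtype=float)
    dists = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)
    return v[np.argmin(dists.sum(axis=1))]

def fill_holes(mv_field, is_hole):
    """Fill hole blocks with the vector median of their non-hole neighbors.

    mv_field: (H, W, 2) array of block motion vectors.
    is_hole:  (H, W) boolean array, True where no vector was projected.
    An 8-neighborhood is assumed here; holes with no valid neighbor stay as-is.
    """
    filled = mv_field.astype(float).copy()
    h, w = is_hole.shape
    for r, c in zip(*np.nonzero(is_hole)):
        neighbors = [
            mv_field[rr, cc]
            for rr in range(max(r - 1, 0), min(r + 2, h))
            for cc in range(max(c - 1, 0), min(c + 2, w))
            if not is_hole[rr, cc]
        ]
        if neighbors:
            filled[r, c] = vector_median(neighbors)
    return filled

# Toy 3x3 block field: uniform motion (2, 0), one outlier, and a hole in the center.
field = np.tile(np.array([2.0, 0.0]), (3, 3, 1))
field[0, 0] = [9.0, 9.0]
holes = np.zeros((3, 3), dtype=bool)
holes[1, 1] = True
print(fill_holes(field, holes)[1, 1])
```

Unlike a component-wise mean, the vector median always returns one of the observed neighbor vectors, so a single outlier neighbor does not pull the filled vector away from the dominant motion.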

    Statistical and Dynamical Modeling of Riemannian Trajectories with Application to Human Movement Analysis

    The data explosion in the past decade is in part due to the widespread use of rich sensors that measure various physical phenomena: gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect, which measures depth information, and so on. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e., the metric is the standard Euclidean distance governed by the L2 norm. However, in many cases this assumption is violated because the data lies on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification, and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective. In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.
    Dissertation/Thesis
    Doctoral Dissertation Electrical Engineering 201
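A small example may help show why the Euclidean assumption discussed above breaks down on a manifold. The sketch below uses the unit sphere as a stand-in Riemannian manifold (an assumption made for illustration; the dissertation's own feature manifolds are not specified here) and compares the straight-line distance with the geodesic (great-circle) distance between two points.

```python
import numpy as np

def euclidean_distance(p, q):
    """Straight-line (chordal) distance, which cuts through the sphere."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))

def geodesic_distance(p, q):
    """Great-circle distance between two points on the unit sphere."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / np.linalg.norm(p)
    q = q / np.linalg.norm(q)
    # Clip guards against round-off pushing the cosine slightly outside [-1, 1].
    return float(np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)))

north_pole = [0.0, 0.0, 1.0]
equator_point = [1.0, 0.0, 0.0]
print(euclidean_distance(north_pole, equator_point))  # sqrt(2), about 1.414
print(geodesic_distance(north_pole, equator_point))   # pi / 2, about 1.571
```

The two distances disagree because the shortest path that stays on the manifold is an arc, not a chord; methods that compare or average trajectories with the Euclidean metric implicitly leave the manifold.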

    Motion hints based video coding

    The persistent growth of video-based applications depends heavily on advances in video coding systems. Modern video codecs use the motion model itself to describe the geometric boundaries of moving objects in video sequences, and thereby spend a significant portion of their bit rate refining the motion description in regions where motion discontinuities exist. This explicit communication of motion introduces redundancy, since some aspects of the motion can be at least partially inferred from the reference frames. In this thesis, a novel bi-directional motion hints based prediction paradigm is proposed that moves away from the traditional redundant approach of careful partitioning around object boundaries by exploiting the spatial structure of the reference frames to infer appropriate boundaries for the intermediate ones. Motion hints provide a global description of motion over a specific domain. Fundamentally, this is related to the segmentation of foreground from background regions, where the foreground and background motions are the motion hints. The appeal of motion hints is that they are continuous and invertible, even though the observed motion field for a frame is discontinuous and non-invertible. Experimental results show that in low bit rate applications, the motion hints based coder achieved a rate-distortion (RD) gain of 0.81 dB, or equivalently a 13.38% savings in bit rate, over the H.264/AVC reference. In a hybrid setting, this gain increased to 0.94 dB, with a 20.41% bit rate savings. When both low and high bit rate scenarios are considered, the hybrid coder showed an RD gain of 0.80 dB, or equivalently a 16.57% savings in bit rate. Using motion hints with higher fractional-pixel accuracy, predictive coding of the motion hints, and a memory-based initialization for motion hint estimation improved the RD gain to 0.85 dB, with a 17.55% bit rate savings. The prediction framework is highly flexible in the sense that the motion model order for the hints can be content adaptive, i.e., it can accommodate different motion models such as affine and elastic. Detecting motion discontinuity macroblocks (MBs) is a challenging task, and the prediction paradigm managed to detect a significant number of such MBs. When the motion hints based prediction is used as a prediction mode for MBs, at low bit rates almost 50% of the motion discontinuity MBs chose the affine hint mode, and this number increased to 60% when the elastic hint is used.
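The claim that a motion hint is continuous and invertible is easy to see for the affine case mentioned in the abstract. The sketch below is an illustration only, not the thesis's hint estimation or coding procedure: it models one hint as x' = A x + b with made-up parameters and shows that the inverse mapping recovers the original coordinates whenever A is non-singular. The function names are hypothetical.

```python
import numpy as np

def affine_map(points, A, b):
    """Apply x' = A x + b to an (N, 2) array of pixel coordinates."""
    return np.asarray(points, dtype=float) @ np.asarray(A, dtype=float).T + np.asarray(b, dtype=float)

def affine_inverse(A, b):
    """Parameters (A_inv, b_inv) of the inverse mapping x = A_inv x' + b_inv."""
    A_inv = np.linalg.inv(np.asarray(A, dtype=float))  # raises if A is singular
    return A_inv, -A_inv @ np.asarray(b, dtype=float)

# Made-up foreground hint: slight rotation and zoom plus a translation.
A = np.array([[1.02, -0.05],
              [0.05, 1.02]])
b = np.array([3.0, -1.5])

pixels = np.array([[0.0, 0.0], [16.0, 0.0], [16.0, 16.0]])
moved = affine_map(pixels, A, b)
A_inv, b_inv = affine_inverse(A, b)
print(np.allclose(affine_map(moved, A_inv, b_inv), pixels))
```

Because each hint can be inverted in this way, the same description can map reference-frame content forward to an intermediate frame or backward from it, which a block-wise motion field with occlusions and overlaps cannot do.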