625 research outputs found

    Multiple-Vector-Based MEMC and Deep CNN for Video Frame Interpolation

    Doctoral dissertation -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2019. Advisor: 이혁재 (Hyuk-Jae Lee). Block-based hierarchical motion estimation is widely used and successful in generating high-quality interpolated frames. However, it still fails to estimate the motion of small objects when the background region moves in a different direction. This is because the motion of small objects is neglected by the down-sampling and over-smoothing operations at the top level of the image pyramid in the maximum a posteriori (MAP) method. Consequently, the motion vectors of small objects cannot be detected at the bottom level, and the small objects therefore often appear deformed in an interpolated frame. This thesis proposes a novel algorithm that preserves the motion vectors of small objects by adding a secondary motion vector candidate that represents their movement. This additional candidate is always propagated from the top to the bottom layer of the image pyramid. Experimental results demonstrate that intermediate frames interpolated by the proposed algorithm significantly improve visual quality compared with conventional MAP-based frame interpolation. In motion-compensated frame interpolation, a repetitive pattern in an image makes it difficult to derive an accurate motion vector because multiple similar local minima exist in the search space of the matching cost for motion estimation. To improve the accuracy of motion estimation in a repetition region, this thesis pursues a semi-global approach that exploits both the local and the global characteristics of the region. A histogram of motion vector candidates is built using a voter-based voting system, which is more reliable than an elector-based voting system. Experimental results demonstrate that the proposed method significantly outperforms the previous local approach in terms of both objective peak signal-to-noise ratio (PSNR) and subjective visual quality.
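The elector- vs. voter-based distinction is only summarized in the abstract; as a loose, hypothetical sketch of the idea (the thesis's actual histogram construction is more involved, and the function name is illustrative), letting every pixel "voter" cast a vote for its locally estimated motion vector and taking the histogram mode could look like:

```python
from collections import Counter

def histogram_mode_mv(pixel_mvs):
    """Illustrative voter-based vote: every pixel votes for its locally
    estimated MV candidate; the globally most-voted candidate wins,
    instead of each block ("elector") contributing a single vote."""
    votes = Counter(pixel_mvs)
    mv, _ = votes.most_common(1)[0]
    return mv

# Repetition region: local matching is ambiguous (candidates offset by
# the pattern period), but the true shift (3, 0) gathers the most votes.
pixel_mvs = [(3, 0)] * 50 + [(3, 8)] * 20 + [(3, -8)] * 15
print(histogram_mode_mv(pixel_mvs))  # → (3, 0)
```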
In video frame interpolation, or motion-compensated frame rate up-conversion (MC-FRUC), motion compensation along unidirectional motion trajectories directly causes overlap and hole problems. To solve these issues, this research presents a new algorithm for bidirectional motion-compensated frame interpolation. First, the proposed method generates bidirectional motion vectors from two unidirectional motion vector fields (forward and backward) obtained from unidirectional motion estimation; this is done by projecting the forward and backward motion vectors onto the interpolated frame. A comprehensive metric, an extension of the distance between a projected block and an interpolated block, is proposed to compute weighting coefficients when an interpolated block has multiple projected blocks. Holes are filled with a vector median filter over the available non-hole neighbor blocks. The proposed method outperforms existing MC-FRUC methods and significantly removes block artifacts. Video frame interpolation with a deep convolutional neural network (CNN) is also investigated in this thesis. Optical flow estimation and video frame interpolation are regarded as a chicken-and-egg problem: each affects the other. This thesis presents a stack of networks: a first synthesis network produces an initial intermediate frame, intermediate optical flows are estimated from that frame, and a second synthesis network generates the final interpolated frame from the initial frame stacked with the two frames warped by the learned intermediate flows. The primary benefit is that the two problems are glued into one comprehensive framework that learns jointly, using an analysis-by-synthesis technique for optical flow estimation and, conversely, CNN-kernel-based synthesis-by-analysis for frame generation.
The proposed network is the first attempt to bridge the two branches of previous approaches, optical-flow-based synthesis and CNN-kernel-based synthesis, in one comprehensive network. Experiments are carried out on various challenging datasets, all showing that the proposed network outperforms state-of-the-art methods by significant margins for video frame interpolation and that the estimated optical flows are accurate even for challenging movements. The proposed deep video frame interpolation network is also applied as a post-processing step to improve the coding efficiency of the state-of-the-art video compression standard HEVC/H.265, and experimental results prove its efficiency.
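The thesis's weighting metric for multiple projected blocks is specific to its formulation, but the vector-median hole filling it describes is well defined: the vector median of a set of candidates is the candidate minimizing the total distance to all the others. A minimal sketch (function names illustrative):

```python
import math

def vector_median(vectors):
    """Vector median: the candidate minimizing the sum of Euclidean
    distances to all other candidates."""
    def cost(v):
        return sum(math.dist(v, w) for w in vectors)
    return min(vectors, key=cost)

def fill_hole(neighbor_mvs):
    """Fill a hole block's MV from the available (non-hole) neighbor
    block MVs; None marks a neighbor that is itself a hole."""
    available = [mv for mv in neighbor_mvs if mv is not None]
    return vector_median(available) if available else (0.0, 0.0)

# Hole surrounded by three consistent neighbors and one outlier:
mvs = [(2.0, 0.0), (2.0, 1.0), (2.0, 0.5), (9.0, -8.0), None]
print(fill_hole(mvs))  # → (2.0, 0.5), the most central consistent vector
```

The outlier is rejected because its distance to every other candidate inflates its cost, which is why the vector median is a robust choice for hole filling.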

    Parallel HEVC Decoding on Multi- and Many-core Architectures : A Power and Performance Analysis

    The Joint Collaborative Team on Video Coding (JCT-VC) is developing a new standard named High Efficiency Video Coding (HEVC) that aims at reducing the bitrate of H.264/AVC by another 50%. In order to fulfill the computational demands of the new standard, in particular for high resolutions and at low power budgets, exploiting parallelism is no longer an option but a requirement. Therefore, HEVC includes several coding tools that allow each picture to be divided into several partitions that can be processed in parallel, without degrading either quality or bitrate. In this paper we adapt one of these approaches, Wavefront Parallel Processing (WPP), and show how it can be implemented on multi- and many-core processors. Our approach, named Overlapped Wavefront (OWF), processes several partitions as well as several pictures in parallel. This has the advantage that the amount of (thread-level) parallelism stays constant during execution. In addition, performance and power results are provided for three platforms: a server Intel CPU with 8 cores, a laptop Intel CPU with 4 cores, and a TILE-Gx36 with 36 cores from Tilera. The results show that our parallel HEVC decoder is capable of achieving an average frame rate of 116 fps for 4K resolution on a standard multicore CPU. The results also demonstrate that exploiting more parallelism by increasing the number of cores can substantially improve the energy efficiency measured in terms of Joules per frame.
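The OWF extension itself is not detailed in the abstract, but the WPP dependency pattern it builds on is well defined: a CTU can be processed once its left neighbor and its above-right neighbor (which carries the entropy-coder state) are done, so each CTU row starts two CTUs behind the row above. A small sketch (pure Python, illustrative only) computes the earliest wavefront step for each CTU:

```python
def wpp_schedule(rows, cols):
    """Earliest time step at which each CTU can be decoded under WPP:
    CTU (r, c) needs its left neighbor (r, c-1) and the above-right
    CTU (r-1, c+1) finished, so each row lags the one above by 2 CTUs."""
    t = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(t[r][c - 1])
            if r > 0:
                # At the last column the above-right CTU does not exist;
                # the dependency clamps to the CTU directly above.
                deps.append(t[r - 1][min(c + 1, cols - 1)])
            t[r][c] = (max(deps) + 1) if deps else 0
    return t

sched = wpp_schedule(3, 5)
for row in sched:
    print(row)
# With enough threads, t[r][c] == c + 2*r: the parallelism ramps up,
# peaks, then drains, which is the tail OWF hides by overlapping pictures.
```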

    Cubic-panorama image dataset analysis for storage and transmission


    An Efficient Motion Estimation Method for H.264-Based Video Transcoding with Arbitrary Spatial Resolution Conversion

    As wireless and wired network connectivity is rapidly expanding and the number of network users is steadily increasing, it has become more and more important to support universal access to multimedia content over the whole network. A big challenge, however, is the great diversity of network devices, from full-screen computers to small smartphones. This leads to research on transcoding, which involves efficiently reformatting compressed data from its original high resolution to a desired spatial resolution supported by the displaying device. In particular, there is great momentum in the multimedia industry for H.264-based transcoding, as H.264 has been widely employed as a mandatory player feature in applications ranging from television broadcast to video for mobile devices. While H.264 contains many new features for effective video coding with excellent rate-distortion (RD) performance, a major issue in transcoding H.264 compressed video from one spatial resolution to another is the computational complexity, specifically of the motion-compensated prediction (MCP) part. MCP is the main contributor to the excellent RD performance of H.264 video compression, yet it is very time consuming. In general, a brute-force search is used to find the best motion vectors for MCP. In the transcoding scenario, however, an immediate idea for improving the MCP efficiency of the re-encoding procedure is to utilize the motion vectors in the original compressed stream: intuitively, motion in the high-resolution scene is highly related to that in the down-scaled scene. In this thesis, we study homogeneous video transcoding from H.264 to H.264. Specifically, for video transcoding with arbitrary spatial resolution conversion, we propose a motion vector estimation algorithm based on a multiple linear regression model, which systematically utilizes the motion information in the original scenes.
We also propose a practical solution for efficiently determining a reference frame, taking advantage of the multiple-reference-frame feature of H.264. The performance of the algorithm was assessed in an H.264 transcoder. Experimental results show that, compared with a benchmark solution, the proposed method significantly reduces the transcoding complexity with little degradation of video quality.
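The regression variables used in the thesis are not spelled out in the abstract; as an illustrative sketch under that caveat (all names and the feature choice are hypothetical), predicting a down-scaled block's motion-vector component as a linear function of the co-located original-resolution MVs via least squares might look like:

```python
import numpy as np

def fit_mv_regressor(X, y):
    """Fit a multiple linear regression (with intercept) predicting a
    down-scaled block's MV component from co-located original MVs.
    X: (n_samples, n_features) original MV components, y: (n_samples,)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mv(coef, x):
    return float(np.dot(coef[:-1], x) + coef[-1])

# Toy data: target MV is the average of 4 co-located original MVs scaled
# by 1/2, mimicking an ideal 2:1 spatial down-conversion.
rng = np.random.default_rng(0)
X = rng.uniform(-8, 8, size=(200, 4))   # 4 original MV-x values per sample
y = 0.5 * X.mean(axis=1)                # ideal down-scaled MV-x
coef = fit_mv_regressor(X, y)
print(np.round(coef, 3))  # ≈ 0.125 per input MV, intercept ≈ 0
```

The fitted coefficients recover the 1/8 weighting (average of four, halved), showing how a regression can absorb the resolution ratio without hand-tuned scaling.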

    Temporal frame interpolation with pixel-based refined motion estimation and halo-effect reduction

    In this work, after a summary of the state of the art, a new temporal frame interpolation with halo reduction is proposed. First, for standard-definition television, a motion estimation with pixel resolution is suggested. The estimation is performed by block matching and is followed by a pixel-based refinement that considers the surrounding motion vectors. The halo reduction, performed with an adaptively shaped sliding window, does not require explicit detection of occlusion regions. Then, for high-definition television, in order to reduce complexity, both the pixel-resolution motion estimation and the halo reduction are generalized in the context of a hierarchical decomposition. The proposed final interpolation is generic and depends on both the temporal position of the frame and the reliability of the estimation. Several post-processing steps to improve image quality are also suggested. The proposed algorithm, integrated into an ASIC in contemporary integrated-circuit technology, runs in real time.
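As a rough sketch of the pixel-based refinement idea described above (the actual algorithm's candidate set and matching cost are not given in the abstract; the names and the per-pixel error measure here are illustrative): each pixel keeps, among its own block's MV and the four neighboring blocks' MVs, the candidate with the smallest temporal matching error.

```python
import numpy as np

def refine_per_pixel(prev, curr, block_mvs, bs):
    """Pixel-resolution refinement sketch: each pixel chooses, among its
    own block MV and the 4-neighbor block MVs, the candidate minimizing
    the absolute temporal error |curr[p] - prev[p - mv]|."""
    H, W = curr.shape
    Br, Bc = block_mvs.shape[:2]
    mv_field = np.zeros((H, W, 2), dtype=int)
    for y in range(H):
        for x in range(W):
            br, bc = y // bs, x // bs
            cands = {tuple(block_mvs[br, bc])}
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                r, c = br + dr, bc + dc
                if 0 <= r < Br and 0 <= c < Bc:
                    cands.add(tuple(block_mvs[r, c]))
            def err(mv):
                py, px = y - mv[0], x - mv[1]
                if 0 <= py < H and 0 <= px < W:
                    return abs(int(curr[y, x]) - int(prev[py, px]))
                return 255  # out-of-frame candidates are penalized
            mv_field[y, x] = min(cands, key=err)
    return mv_field
```

In a demo where the true motion is one pixel to the right and one block got a wrong MV, the wrong block's pixels recover the correct vector from the neighboring blocks' candidates.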

    Regularized motion estimation techniques and their applications to video coding

    Ankara: Department of Electrical and Electronics Engineering and the Institute of Engineering and Sciences of Bilkent University, 1996. Thesis (Master's) -- Bilkent University, 1996. Includes bibliographical references (leaves 77-80). Novel regularized motion estimation techniques and their possible applications to video coding are presented. A block matching motion estimation algorithm which extracts a better block motion field by forming and minimizing a suitable energy function is introduced. Based on an adaptive structure for block sizes, an advanced block matching algorithm is presented in which the block sizes are adaptively adjusted according to the motion. A blockwise coarse-to-fine segmentation-based motion estimation algorithm is introduced for further reduction of the number of bits spent on the coding of the block motion vectors. Motion estimation algorithms which can be used for average motion determination and for artificial frame generation by fractional motion compensation are also developed. Finally, an alternative motion estimation and compensation technique which defines feature-based motion vectors on the object boundaries and reconstructs the decoded frame from the interpolation of the compensated object boundaries is presented. All the algorithms developed in this thesis are simulated on real or synthetic images and their performance is demonstrated. Kıranyaz, Serkan. M.S.
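The thesis's energy function is not reproduced in the abstract; a generic regularized block-matching energy of the kind described, a SAD data term plus a smoothness penalty between neighboring block motion vectors, can be sketched as follows (illustrative only; MVs are assumed to keep blocks inside the frame):

```python
import numpy as np

def sad(prev, curr, y, x, mv, bs):
    """Sum of absolute differences for one bs-by-bs block under MV mv."""
    py, px = y + mv[0], x + mv[1]
    return np.abs(curr[y:y+bs, x:x+bs].astype(int)
                  - prev[py:py+bs, px:px+bs].astype(int)).sum()

def energy(prev, curr, field, bs, lam):
    """Regularized block-matching energy: data term (SAD) plus lam times
    the L1 difference between each block MV and its right/bottom
    neighbors. Minimizing this trades matching error for a smooth field."""
    Br, Bc = field.shape[:2]
    e = 0.0
    for r in range(Br):
        for c in range(Bc):
            e += sad(prev, curr, r * bs, c * bs, field[r, c], bs)
            if r + 1 < Br:
                e += lam * np.abs(field[r, c] - field[r + 1, c]).sum()
            if c + 1 < Bc:
                e += lam * np.abs(field[r, c] - field[r, c + 1]).sum()
    return e
```

For a static scene, the all-zero MV field attains energy 0, while any deviating block pays both a data and a smoothness cost, which is the regularizing effect the abstract refers to.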

    A survey on passive digital video forgery detection techniques

    Digital media devices such as smartphones, cameras, and notebooks are becoming increasingly popular. Through digital platforms such as Facebook, WhatsApp, Twitter, and others, people share digital images, videos, and audio in large quantities. Especially in a crime scene investigation, digital evidence plays a crucial role in a courtroom. Manipulating video content with high-quality software tools has become easier, which makes fabricating video content more efficient. It is therefore necessary to develop authentication methods for detecting and verifying manipulated videos. The objective of this paper is to provide a comprehensive review of passive methods for detecting video forgeries. This survey has the primary goal of studying and analyzing the existing passive techniques for detecting video forgeries. First, an overview of the basic information needed to understand video forgery detection is presented. Then the techniques used in spatial, temporal, and spatio-temporal domain analysis of videos are examined in depth, and the datasets used and their limitations are reviewed. In the following sections, standard benchmark video forgery datasets and the generalized architecture of passive video forgery detection techniques are discussed in more depth. Finally, loopholes in existing surveys are identified so that forged videos can be detected much more effectively in the future.

    On predictive RAHT for dynamic point cloud compression

    Master's dissertation -- Universidade de Brasília, Faculdade de Tecnologia, Departamento de Engenharia Elétrica, 2021. Funded by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). The increase in 3D applications has made the research and development of standards for point cloud compression necessary. Since point clouds represent a significant amount of data, compression standards are essential to transmit and store such data efficiently. For this reason, the Moving Pictures Expert Group (MPEG) started standardization activities for point cloud compression algorithms, resulting in two standards: Geometry-based Point Cloud Compression (G-PCC) and Video-based Point Cloud Compression (V-PCC). G-PCC was designed to address static point clouds, those consisting of objects and scenes, and dynamically acquired point clouds, typically obtained by LiDAR technology. In contrast, V-PCC was addressed to dynamic point clouds, those consisting of several temporal frames similar to a video sequence. In the compression of dynamic point clouds, algorithms that estimate and compensate motion play an essential role: they allow temporal redundancies among successive frames to be exploited, significantly reducing the number of bits required to store and transmit the dynamic scenes. Although motion estimation algorithms have been studied, such algorithms for point clouds are still very complex and demand plenty of computational power, making them unsuitable for practical time-constrained applications. Therefore, an efficient motion estimation solution for point clouds is still an open research problem. Based on that, the work presented in this dissertation focuses on exploring the use of a simple inter-frame prediction alongside the region-adaptive hierarchical (or Haar) transform (RAHT).
Our goal is to improve RAHT's attribute compression performance on dynamic point clouds using a low-complexity inter-frame prediction algorithm. We devise simple schemes that combine the latest version of RAHT, which includes an intra-frame predictive step, with a low-complexity inter-frame prediction. As previously mentioned, inter-frame prediction algorithms based on motion estimation are still too complex for point clouds. For this reason, we use an inter-frame prediction based on the spatial proximity of neighboring voxels in successive frames. The nearest-neighbor inter-frame prediction simply matches each voxel in the current point cloud frame to its nearest voxel in the immediately previous frame. Since it is a straightforward algorithm, it can be implemented efficiently for time-constrained applications. Finally, we devised two adaptive approaches that combine the nearest-neighbor prediction with the intra-frame predictive RAHT. The first approach is referred to as fragment-based multiple decision, and the second as level-based multiple decision. Both schemes outperform the use of only intra-frame prediction alongside RAHT in the compression of dynamic point clouds. The fragment-based algorithm slightly outperforms intra-frame-only prediction, with average Bjontegaard-delta (BD) PSNR-Y gains of 0.44 dB and average bitrate savings of 10.57%. The level-based scheme achieves more substantial gains, with average BD PSNR-Y gains of 0.97 dB and average bitrate savings of 21.73%.
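A minimal sketch of the nearest-neighbor inter-frame prediction described above (brute force; the dissertation's implementation details, such as any acceleration structure, are not given in the abstract):

```python
import numpy as np

def nn_inter_prediction(prev_xyz, prev_attr, curr_xyz):
    """Nearest-neighbor inter-frame prediction sketch: each voxel of the
    current frame is matched to the spatially closest voxel of the
    previous frame, whose attribute becomes the prediction."""
    # Brute-force NN search; a real codec would use an octree or k-d tree.
    d2 = ((curr_xyz[:, None, :] - prev_xyz[None, :, :]) ** 2).sum(axis=2)
    nn = d2.argmin(axis=1)
    return prev_attr[nn]

# Toy frame pair: the object moved by +1 along x, attributes unchanged.
prev_xyz = np.array([[0, 0, 0], [4, 0, 0], [8, 0, 0]], dtype=float)
prev_attr = np.array([100, 150, 200])
curr_xyz = prev_xyz + np.array([1.0, 0.0, 0.0])
pred = nn_inter_prediction(prev_xyz, prev_attr, curr_xyz)
print(pred)  # → [100 150 200]; the attribute residuals to code are zero
```

Because the match needs no motion search, the prediction stays cheap even when the geometry shifts between frames, which is the low-complexity property the dissertation exploits.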

    Efficient compression of motion compensated residuals

    EThOS - Electronic Theses Online Service, United Kingdom.