42 research outputs found

    Motion hints based video coding

    Full text link
    The persistent growth of video-based applications is heavily dependent on the advancements in video coding systems. Modern video codecs use the motion model itself to describe the geometric boundaries of moving objects in video sequences and thereby spend a significant portion of their bit rate refining the motion description in regions where motion discontinuities exist. This explicit communication of motion introduces redundancy, since some aspects of the motion can at least partially be inferred from the reference frames. In this thesis work, a novel bi-directional motion hints based prediction paradigm is proposed that moves away from the traditional redundant approach of careful partitioning around object boundaries by exploiting the spatial structure of the reference frames to infer appropriate boundaries for the intermediate ones. Motion hint provide a global description of motion over specific domain. Fundamentally this is related to the segmentation of foreground from background regions where the foreground and background motions are the motion hints. The appealing thing about motion hints is that they are continuous and invertible, even though the observed motion field for a frame is discontinuous and non-invertible. Experimental results show that at low bit rate applications, the motion hints based coder achieved a rate-distortion (RD) gain of 0.81 dB, or equivalently 13.38% savings in bit rate over the H.264/AVC reference. In a hybrid setting, this gain increased to 0.94 dB and 20.41% bit rebate is obtained. If both low and high bit rate scenarios are considered then the hybrid coder showed a RD performance of 0.80 dB, or equivalently 16.57% savings in bit rate. The usage of higher fractional pixel accurate motion hint, predictive coding of motion hint, a memory-based initialization for motion hint estimation improved the RD gain to 0.85 dB and 17.55% of bit rebate. The prediction framework is highly flexible in the sense that the motion model order for the hints can be content adaptive i.e. it can accommodate different motion models like affine, elastic, etc. Detecting motion discontinuity macroblocks (MBs) is a challenging task and the prediction paradigm managed to detect a significant number of such MBs. If the motion hints based prediction is used as a prediction mode for MBs, at low bit rates almost 50% of the motion discontinuity MBs chose to use affine hint mode and this number increased to 60% if elastic hint is used

    Análise do HEVC escalável : desempenho e controlo de débito

    Get PDF
    Mestrado em Engenharia Eletrónica e TelecomunicaçõesEsta dissertação apresenta um estudo da norma de codificação de vídeo de alta eficiência (HEVC) e a sua extensão para vídeo escalável, SHVC. A norma de vídeo SHVC proporciona um melhor desempenho quando codifica várias camadas em simultâneo do que quando se usa o codificador HEVC numa configuração simulcast. Ambos os codificadores de referência, tanto para a camada base como para a camada superior usam o mesmo modelo de controlo de débito, modelo R-λ, que foi otimizado para o HEVC. Nenhuma otimização de alocação de débito entre camadas foi até ao momento proposto para o modelo de testes (SHM 8) para a escalabilidade do HEVC (SHVC). Derivamos um novo modelo R-λ apropriado para a camada superior e para o caso de escalabilidade espacial, que conduziu a um ganho de BD-débito de 1,81% e de BD-PSNR de 0,025 em relação ao modelo de débito-distorção existente no SHM do SHVC. Todavia, mostrou-se também nesta dissertação que o proposto modelo de R-λ não deve ser usado na camada inferior (camada base) no SHVC e por conseguinte no HEVC.This dissertation provides a study of the High Efficiency Video Coding standard (HEVC) and its scalable extension, SHVC. The SHVC provides a better performance when encoding several layers simultaneously than using an HEVC encoder in a simulcast configuration. Both reference encoders, in the base layer and in the enhancement layer use the same rate control model, R-λ model, which was optimized for HEVC. No optimal bitrate partitioning amongst layers is proposed in scalable HEVC (SHVC) test model (SHM 8). We derived a new R-λ model for the enhancement layer and for the spatial case which led to a DB-rate gain of 1.81% and DB-PSNR gain of 0.025 in relation to the rate-distortion model of SHM-SHVC. Nevertheless, we also show in this dissertation that the proposed model of R-λ should not be used neither in the base layer nor in HEVC

    Towards Computational Efficiency of Next Generation Multimedia Systems

    Get PDF
    To address throughput demands of complex applications (like Multimedia), a next-generation system designer needs to co-design and co-optimize the hardware and software layers. Hardware/software knobs must be tuned in synergy to increase the throughput efficiency. This thesis provides such algorithmic and architectural solutions, while considering the new technology challenges (power-cap and memory aging). The goal is to maximize the throughput efficiency, under timing- and hardware-constraints

    Low-complexity scalable and multiview video coding

    Get PDF

    Data-driven visual quality estimation using machine learning

    Get PDF
    Heutzutage werden viele visuelle Inhalte erstellt und sind zugänglich, was auf Verbesserungen der Technologie wie Smartphones und das Internet zurückzuführen ist. Es ist daher notwendig, die von den Nutzern wahrgenommene Qualität zu bewerten, um das Erlebnis weiter zu verbessern. Allerdings sind nur wenige der aktuellen Qualitätsmodelle speziell für höhere Auflösungen konzipiert, sagen mehr als nur den Mean Opinion Score vorher oder nutzen maschinelles Lernen. Ein Ziel dieser Arbeit ist es, solche maschinellen Modelle für höhere Auflösungen mit verschiedenen Datensätzen zu trainieren und zu evaluieren. Als Erstes wird eine objektive Analyse der Bildqualität bei höheren Auflösungen durchgeführt. Die Bilder wurden mit Video-Encodern komprimiert, hierbei weist AV1 die beste Qualität und Kompression auf. Anschließend werden die Ergebnisse eines Crowd-Sourcing-Tests mit einem Labortest bezüglich Bildqualität verglichen. Weiterhin werden auf Deep Learning basierende Modelle für die Vorhersage von Bild- und Videoqualität beschrieben. Das auf Deep Learning basierende Modell ist aufgrund der benötigten Ressourcen für die Vorhersage der Videoqualität in der Praxis nicht anwendbar. Aus diesem Grund werden pixelbasierte Videoqualitätsmodelle vorgeschlagen und ausgewertet, die aussagekräftige Features verwenden, welche Bild- und Bewegungsaspekte abdecken. Diese Modelle können zur Vorhersage von Mean Opinion Scores für Videos oder sogar für anderer Werte im Zusammenhang mit der Videoqualität verwendet werden, wie z.B. einer Bewertungsverteilung. Die vorgestellte Modellarchitektur kann auf andere Videoprobleme angewandt werden, wie z.B. Videoklassifizierung, Vorhersage der Qualität von Spielevideos, Klassifikation von Spielegenres oder der Klassifikation von Kodierungsparametern. Ein wichtiger Aspekt ist auch die Verarbeitungszeit solcher Modelle. Daher wird ein allgemeiner Ansatz zur Beschleunigung von State-of-the-Art-Videoqualitätsmodellen vorgestellt, der zeigt, dass ein erheblicher Teil der Verarbeitungszeit eingespart werden kann, während eine ähnliche Vorhersagegenauigkeit erhalten bleibt. Die Modelle sind als Open Source veröffentlicht, so dass die entwickelten Frameworks für weitere Forschungsarbeiten genutzt werden können. Außerdem können die vorgestellten Ansätze als Bausteine für neuere Medienformate verwendet werden.Today a lot of visual content is accessible and produced, due to improvements in technology such as smartphones and the internet. This results in a need to assess the quality perceived by users to further improve the experience. However, only a few of the state-of-the-art quality models are specifically designed for higher resolutions, predict more than mean opinion score, or use machine learning. One goal of the thesis is to train and evaluate such machine learning models of higher resolutions with several datasets. At first, an objective evaluation of image quality in case of higher resolutions is performed. The images are compressed using video encoders, and it is shown that AV1 is best considering quality and compression. This evaluation is followed by the analysis of a crowdsourcing test in comparison with a lab test investigating image quality. Afterward, deep learning-based models for image quality prediction and an extension for video quality are proposed. However, the deep learning-based video quality model is not practically usable because of performance constrains. For this reason, pixel-based video quality models using well-motivated features covering image and motion aspects are proposed and evaluated. These models can be used to predict mean opinion scores for videos, or even to predict other video quality-related information, such as a rating distributions. The introduced model architecture can be applied to other video problems, such as video classification, gaming video quality prediction, gaming genre classification or encoding parameter estimation. Furthermore, one important aspect is the processing time of such models. Hence, a generic approach to speed up state-of-the-art video quality models is introduced, which shows that a significant amount of processing time can be saved, while achieving similar prediction accuracy. The models have been made publicly available as open source so that the developed frameworks can be used for further research. Moreover, the presented approaches may be usable as building blocks for newer media formats

    The quality of experience of emerging display technologies

    Get PDF
    As new display technologies emerge and become part of everyday life, the understanding of the visual experience they provide becomes more relevant. The cognition of perception is the most vital component of visual experience; however, it is not the only cognition that contributes to the complex overall experience of the end-user. Expectations can create significant cognitive bias that may even override what the user genuinely perceives. Even if a visualization technology is somewhat novel, expectations can be fuelled by prior experiences gained from using similar displays and, more importantly, even a single word or an acronym may induce serious preconceptions, especially if such word suggests excellence in quality. In this interdisciplinary Ph.D. thesis, the effect of minimal, one-word labels on the Quality of Experience (QoE) is investigated in a series of subjective tests. In the studies carried out on an ultra-high-definition (UHD) display, UHD video contents were directly compared to their HD counterparts, with and without labels explicitly informing the test participants about the resolution of each stimulus. The experiments on High Dynamic Range (HDR) visualization addressed the effect of the word “premium” on the quality aspects of HDR video, and also how this may affect the perceived duration of stalling events. In order to support the findings, additional tests were carried out comparing the stalling detection thresholds of HDR video with conventional Low Dynamic Range (LDR) video. The third emerging technology addressed by this thesis is light field visualization. Due to its novel nature and the lack of comprehensive, exhaustive research on the QoE of light field displays and content parameters at the time of this thesis, instead of investigating the labeling effect, four phases of subjective studies were performed on light field QoE. The first phases started with fundamental research, and the experiments progressed towards the concept and evaluation of the dynamic adaptive streaming of light field video, introduced in the final phase

    Tatouage du flux compressé MPEG-4 AVC

    Get PDF
    La présente thèse aborde le sujet de tatouage du flux MPEG-4 AVC sur ses deux volets théoriques et applicatifs en considérant deux domaines applicatifs à savoir la protection du droit d auteur et la vérification de l'intégrité du contenu. Du point de vue théorique, le principal enjeu est de développer un cadre de tatouage unitaire en mesure de servir les deux applications mentionnées ci-dessus. Du point de vue méthodologique, le défi consiste à instancier ce cadre théorique pour servir les applications visées. La première contribution principale consiste à définir un cadre théorique pour le tatouage multi symboles à base de modulation d index de quantification (m-QIM). La règle d insertion QIM a été généralisée du cas binaire au cas multi-symboles et la règle de détection optimale (minimisant la probabilité d erreur à la détection en condition du bruit blanc, additif et gaussien) a été établie. Il est ainsi démontré que la quantité d information insérée peut être augmentée par un facteur de log2m tout en gardant les mêmes contraintes de robustesse et de transparence. Une quantité d information de 150 bits par minutes, soit environ 20 fois plus grande que la limite imposée par la norme DCI est obtenue. La deuxième contribution consiste à spécifier une opération de prétraitement qui permet d éliminer les impactes du phénomène du drift (propagation de la distorsion) dans le flux compressé MPEG-4 AVC. D abord, le problème a été formalisé algébriquement en se basant sur les expressions analytiques des opérations d encodage. Ensuite, le problème a été résolu sous la contrainte de prévention du drift. Une amélioration de la transparence avec des gains de 2 dB en PSNR est obtenueThe present thesis addresses the MPEG-4 AVC stream watermarking and considers two theoretical and applicative challenges, namely ownership protection and content integrity verification.From the theoretical point of view, the thesis main challenge is to develop a unitary watermarking framework (insertion/detection) able to serve the two above mentioned applications in the compressed domain. From the methodological point of view, the challenge is to instantiate this theoretical framework for serving the targeted applications. The thesis first main contribution consists in building the theoretical framework for the multi symbol watermarking based on quantization index modulation (m-QIM). The insertion rule is analytically designed by extending the binary QIM rule. The detection rule is optimized so as to ensure minimal probability of error under additive white Gaussian noise distributed attacks. It is thus demonstrated that the data payload can be increased by a factor of log2m, for prescribed transparency and additive Gaussian noise power. A data payload of 150 bits per minute, i.e. about 20 times larger than the limit imposed by the DCI standard, is obtained. The thesis second main theoretical contribution consists in specifying a preprocessing MPEG-4 AVC shaping operation which can eliminate the intra-frame drift effect. The drift represents the distortion spread in the compressed stream related to the MPEG encoding paradigm. In this respect, the drift distortion propagation problem in MPEG-4 AVC is algebraically expressed and the corresponding equations system is solved under drift-free constraints. The drift-free shaping results in gain in transparency of 2 dB in PSNREVRY-INT (912282302) / SudocSudocFranceF

    Optimization of Pattern Matching Algorithms for Multi- and Many-Core Platforms

    Get PDF
    Image and video compression play a major role in the world today, allowing the storage and transmission of large multimedia content volumes. However, the processing of this information requires high computational resources, hence the improvement of the computational performance of these compression algorithms is very important. The Multidimensional Multiscale Parser (MMP) is a pattern-matching-based compression algorithm for multimedia contents, namely images, achieving high compression ratios, maintaining good image quality, Rodrigues et al. [2008]. However, in comparison with other existing algorithms, this algorithm takes some time to execute. Therefore, two parallel implementations for GPUs were proposed by Ribeiro [2016] and Silva [2015] in CUDA and OpenCL-GPU, respectively. In this dissertation, to complement the referred work, we propose two parallel versions that run the MMP algorithm in CPU: one resorting to OpenMP and another that converts the existing OpenCL-GPU into OpenCL-CPU. The proposed solutions are able to improve the computational performance of MMP by 3 and 2:7 , respectively. The High Efficiency Video Coding (HEVC/H.265) is the most recent standard for compression of image and video. Its impressive compression performance, makes it a target for many adaptations, particularly for holoscopic image/video processing (or light field). Some of the proposed modifications to encode this new multimedia content are based on geometry-based disparity compensations (SS), developed by Conti et al. [2014], and a Geometric Transformations (GT) module, proposed by Monteiro et al. [2015]. These compression algorithms for holoscopic images based on HEVC present an implementation of specific search for similar micro-images that is more efficient than the one performed by HEVC, but its implementation is considerably slower than HEVC. In order to enable better execution times, we choose to use the OpenCL API as the GPU enabling language in order to increase the module performance. With its most costly setting, we are able to reduce the GT module execution time from 6.9 days to less then 4 hours, effectively attaining a speedup of 45

    H2-Stereo: High-Speed, High-Resolution Stereoscopic Video System

    Full text link
    High-speed, high-resolution stereoscopic (H2-Stereo) video allows us to perceive dynamic 3D content at fine granularity. The acquisition of H2-Stereo video, however, remains challenging with commodity cameras. Existing spatial super-resolution or temporal frame interpolation methods provide compromised solutions that lack temporal or spatial details, respectively. To alleviate this problem, we propose a dual camera system, in which one camera captures high-spatial-resolution low-frame-rate (HSR-LFR) videos with rich spatial details, and the other captures low-spatial-resolution high-frame-rate (LSR-HFR) videos with smooth temporal details. We then devise a Learned Information Fusion network (LIFnet) that exploits the cross-camera redundancies to enhance both camera views to high spatiotemporal resolution (HSTR) for reconstructing the H2-Stereo video effectively. We utilize a disparity network to transfer spatiotemporal information across views even in large disparity scenes, based on which, we propose disparity-guided flow-based warping for LSR-HFR view and complementary warping for HSR-LFR view. A multi-scale fusion method in feature domain is proposed to minimize occlusion-induced warping ghosts and holes in HSR-LFR view. The LIFnet is trained in an end-to-end manner using our collected high-quality Stereo Video dataset from YouTube. Extensive experiments demonstrate that our model outperforms existing state-of-the-art methods for both views on synthetic data and camera-captured real data with large disparity. Ablation studies explore various aspects, including spatiotemporal resolution, camera baseline, camera desynchronization, long/short exposures and applications, of our system to fully understand its capability for potential applications
    corecore