Super-resolution assessment and detection
Super Resolution (SR) techniques are powerful digital manipulation tools that have significantly impacted various industries due to their ability to enhance the resolution of lower-quality images and videos. Yet, applying SR models to real-world data poses numerous challenges, which blind SR models aim to overcome by emulating complex real-world degradations. In this thesis, we investigate these SR techniques, with a particular focus on comparing the performance of blind models to their non-blind counterparts under various conditions. Despite recent progress, the proliferation of SR techniques raises concerns about their potential misuse. These methods can easily manipulate real digital content and create misrepresentations, which highlights the need for robust SR detection mechanisms. In our study, we analyze the limitations of current SR detection techniques and propose a new detection system that achieves higher accuracy in distinguishing real from upscaled videos. Moreover, we conduct several experiments to gain insights into the strengths and weaknesses of the detection models, providing a better understanding of their behavior and limitations. In particular, we target 4K videos, which are rapidly becoming the standard resolution in various fields such as streaming services, gaming, and content creation. As part of our research, we have created and utilized a unique dataset in 4K resolution, specifically designed to facilitate the investigation of SR techniques and their detection.
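As context for the detection task described above, one classical (and naive) cue is that upscaled content tends to lack genuine high-frequency detail. The sketch below estimates the share of spectral energy above a radial frequency cutoff; this is an illustrative baseline only, not the detection system the thesis proposes, and the frames and cutoff value are assumptions.

```python
import numpy as np

def highfreq_energy_ratio(frame, cutoff=0.25):
    """Fraction of spectral power above a normalized radial frequency cutoff.

    A low ratio can hint that a frame was upscaled (a naive baseline cue,
    not the detector proposed in the thesis).
    """
    spec = np.abs(np.fft.fftshift(np.fft.fft2(frame))) ** 2
    h, w = frame.shape
    yy, xx = np.mgrid[-h // 2:h - h // 2, -w // 2:w - w // 2]
    radius = np.sqrt((yy / (h / 2)) ** 2 + (xx / (w / 2)) ** 2)
    return float(spec[radius > cutoff].sum() / spec.sum())

# Toy comparison: white noise vs. a naive 2x nearest-neighbor upscale of it.
rng = np.random.default_rng(0)
native = rng.random((64, 64))
upscaled = np.kron(native[::2, ::2], np.ones((2, 2)))
print(highfreq_energy_ratio(native), highfreq_energy_ratio(upscaled))
```

On such synthetic frames the upscaled version yields a noticeably lower ratio, which is the intuition behind frequency-domain upscaling cues; real SR models are designed to defeat exactly this kind of simple statistic.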
The quality of experience of emerging display technologies
As new display technologies emerge and become part of everyday life, understanding the visual experience they provide becomes more relevant. Perception is the most vital cognitive component of visual experience; however, it is not the only cognitive process that contributes to the complex overall experience of the end-user. Expectations can create significant cognitive bias that may even override what the user genuinely perceives. Even when a visualization technology is novel, expectations can be fuelled by prior experiences gained from using similar displays and, more importantly, even a single word or acronym may induce serious preconceptions, especially if that word suggests excellence in quality. In this interdisciplinary Ph.D. thesis, the effect of minimal, one-word labels on the Quality of Experience (QoE) is investigated in a series of subjective tests. In the studies carried out on an ultra-high-definition (UHD) display, UHD video contents were directly compared to their HD counterparts, with and without labels explicitly informing the test participants about the resolution of each stimulus. The experiments on High Dynamic Range (HDR) visualization addressed the effect of the word “premium” on the quality aspects of HDR video, and also how it may affect the perceived duration of stalling events. To support these findings, additional tests compared the stalling detection thresholds of HDR video with those of conventional Low Dynamic Range (LDR) video. The third emerging technology addressed by this thesis is light field visualization. Due to its novel nature and the lack of comprehensive, exhaustive research on the QoE of light field displays and content parameters at the time of this thesis, four phases of subjective studies were performed on light field QoE instead of investigating the labeling effect. The first phase covered fundamental research, and the experiments progressed towards the concept and evaluation of dynamic adaptive streaming of light field video, introduced in the final phase.
Perceptual quality of 4K-resolution video content compared to HD
With the introduction of 4K UHD video and display resolution, questions arise on the perceptual differences between 4K UHD and upsampled HD video content. In this paper, a striped pair comparison has been performed on a diverse set of 4K UHD video sources. The goal was to subjectively assess the perceived sharpness of 4K UHD and downscaled/upscaled HD video. A striped pair comparison was applied in order to make the test as straightforward as possible for a non-expert participant population. Under these conditions and over this set of sequences, for 54.8% of the sequences (17 out of 31), 4K UHD content could, on average, be identified as sharper than its downsampled and upsampled HD alternative. The probabilities with which 4K UHD could be differentiated from downscaled/upscaled HD range from 83.3% for the easiest-to-assess sequence down to 39.7% for the most difficult sequence. Although significance tests demonstrate a positive sharpness difference for camera-quality 4K UHD content compared to the HD downscaled/upscaled variations, the effect is highly content-dependent, and all circumstances were chosen in favor of the 4K UHD representation. The results of this test can contribute to the research process of developing metrics that indicate the visibility of high-resolution features within specific content.
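The per-sequence significance testing referenced above can be illustrated with a one-sided exact binomial test against chance (50%) on pair-comparison votes. The observer count below is a hypothetical value for illustration; the abstract only reports per-sequence percentages (83.3% down to 39.7%).

```python
from math import comb

def p_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): one-sided exact test against chance."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical sequence: 25 of 30 observers judged the 4K version sharper
# (observer count assumed; not stated in the abstract).
p_value = p_at_least(25, 30)
print(f"p = {p_value:.4f}")  # -> p = 0.0002, well below 0.05
```

A sequence where roughly half the observers pick 4K (like the hardest one at 39.7%) would yield a p-value near or above 0.5, i.e. indistinguishable from chance under this test.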
Towards Real World HDRTV Reconstruction: A Data Synthesis-based Approach
Existing deep-learning-based HDRTV reconstruction methods assume one kind of tone mapping operator (TMO) as the degradation procedure to synthesize SDRTV-HDRTV pairs for supervised training. In this paper, we argue that, although traditional TMOs exploit efficient dynamic range compression priors, they have several drawbacks in modeling realistic degradation: information over-preservation, color bias, and possible artifacts, making the trained reconstruction networks hard to generalize to real-world cases. To solve this problem, we propose a learning-based data synthesis approach that learns the properties of real-world SDRTVs by integrating several tone mapping priors into both network structures and loss functions. Specifically, we design a conditioned two-stream network that uses prior tone mapping results as guidance to synthesize SDRTVs through both global and local transformations. To train the data synthesis network, we formulate a novel self-supervised content loss that constrains different aspects of the synthesized SDRTVs in regions with different brightness distributions, and an adversarial loss that pushes the details to be more realistic. To validate the effectiveness of our approach, we synthesize SDRTV-HDRTV pairs with our method and use them to train several HDRTV reconstruction networks. We then collect two inference datasets containing labeled and unlabeled real-world SDRTVs, respectively. Experimental results demonstrate that networks trained with our synthesized data generalize significantly better to these two real-world datasets than existing solutions.
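For concreteness, here is a minimal sketch of one traditional global TMO of the kind the paper argues against as a degradation model: the classical Reinhard operator. This illustrates the "one fixed TMO" baseline the paper criticizes, not the paper's learned synthesis network; the key value and toy frame are assumptions.

```python
import numpy as np

def reinhard_tmo(hdr, key=0.18, eps=1e-6):
    """Global Reinhard tone mapping of a linear HDR frame (H, W, 3) to [0, 1].

    A classical hand-crafted TMO; the paper argues such fixed operators
    over-preserve information relative to real-world SDRTV degradations.
    """
    # Luminance with Rec. 709 weights.
    lum = 0.2126 * hdr[..., 0] + 0.7152 * hdr[..., 1] + 0.0722 * hdr[..., 2]
    log_avg = np.exp(np.mean(np.log(lum + eps)))   # log-average luminance
    scaled = key * lum / log_avg                   # expose to the chosen key
    display = scaled / (1.0 + scaled)              # compress to [0, 1)
    ratio = (display / (lum + eps))[..., None]     # per-pixel luminance ratio
    return np.clip(hdr * ratio, 0.0, 1.0)

hdr_frame = np.random.rand(4, 4, 3).astype(np.float32) * 100.0  # toy linear HDR
sdr_frame = reinhard_tmo(hdr_frame)
```

Pairing such a fixed operator with HDR ground truth is exactly the supervised-training setup whose realism the paper questions.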
Bitrate Ladder Prediction Methods for Adaptive Video Streaming: A Review and Benchmark
HTTP adaptive streaming (HAS) has emerged as a widely adopted approach for over-the-top (OTT) video streaming services, due to its ability to deliver a seamless streaming experience. A key component of HAS is the bitrate ladder, which provides the encoding parameters (e.g., bitrate-resolution pairs) used to encode the source video. The representations in the bitrate ladder allow the client's player to dynamically adjust the quality of the video stream to network conditions by selecting the most appropriate representation from the ladder. The most straightforward and lowest-complexity approach uses a fixed, one-size-fits-all bitrate ladder of pre-determined bitrate-resolution pairs for all videos. Conversely, the most reliable technique intensively encodes all resolutions over a wide range of bitrates to build the convex hull, thereby optimizing the bitrate ladder for each specific video. Several techniques have been proposed to predict content-based ladders without performing a costly exhaustive encoding search. This paper provides a comprehensive review of these methods, covering both conventional and learning-based approaches. Furthermore, we conduct a benchmark study focusing exclusively on learning-based approaches for predicting content-optimized bitrate ladders across multiple codec settings. The considered methods are evaluated on our proposed large-scale dataset, which includes 300 UHD video shots encoded at various bitrate points with software and hardware implementations of three state-of-the-art codecs: AVC/H.264, HEVC/H.265, and VVC/H.266. Our analysis provides baseline methods and insights that will be valuable for future research in the field of bitrate ladder prediction. The source code of the proposed benchmark and the dataset will be made publicly available upon acceptance of the paper.
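The client-side selection step described above can be sketched as follows: given a ladder and an estimated throughput, pick the highest-bitrate representation the network can sustain. The ladder values below are illustrative placeholders, not the one-size-fits-all ladder or dataset from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Representation:
    bitrate_kbps: int
    resolution: str

# Hypothetical fixed ("one-size-fits-all") ladder; values are illustrative only.
LADDER = [
    Representation(400, "426x240"),
    Representation(1200, "854x480"),
    Representation(3500, "1920x1080"),
    Representation(8000, "3840x2160"),
]

def select_representation(ladder, throughput_kbps):
    """Pick the highest-bitrate rung the estimated throughput supports,
    falling back to the lowest rung when even that exceeds the estimate."""
    feasible = [r for r in ladder if r.bitrate_kbps <= throughput_kbps]
    return max(feasible, key=lambda r: r.bitrate_kbps) if feasible else ladder[0]

print(select_representation(LADDER, 5000).resolution)  # -> 1920x1080
```

Content-optimized ladder prediction, the subject of the benchmark, replaces the fixed `LADDER` above with per-video bitrate-resolution pairs predicted to lie near the video's convex hull.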