
    Bitrate Ladder Prediction Methods for Adaptive Video Streaming: A Review and Benchmark

    HTTP adaptive streaming (HAS) has emerged as a widely adopted approach for over-the-top (OTT) video streaming services, thanks to its ability to deliver a seamless streaming experience. A key component of HAS is the bitrate ladder, which provides the encoding parameters (e.g., bitrate-resolution pairs) used to encode the source video. The representations in the bitrate ladder allow the client's player to dynamically adjust the quality of the video stream to network conditions by selecting the most appropriate representation. The simplest and lowest-complexity approach uses a fixed bitrate ladder of pre-determined bitrate-resolution pairs for all videos, known as one-size-fits-all. Conversely, the most reliable technique exhaustively encodes all resolutions over a wide range of bitrates to build the convex hull, thereby optimizing the bitrate ladder for each specific video. Several techniques have been proposed to predict content-optimized ladders without performing this costly exhaustive-search encoding. This paper provides a comprehensive review of these methods, covering both conventional and learning-based approaches. Furthermore, we conduct a benchmark study focusing exclusively on learning-based approaches for predicting content-optimized bitrate ladders across multiple codec settings. The considered methods are evaluated on our proposed large-scale dataset of 300 UHD video shots encoded at various bitrate points with software and hardware implementations of three state-of-the-art codecs: AVC/H.264, HEVC/H.265, and VVC/H.266. Our analysis provides baseline methods and insights that will be valuable for future research on bitrate ladder prediction. The source code of the proposed benchmark and the dataset will be made publicly available upon acceptance of the paper.
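
    As a rough illustration of the exhaustive approach the benchmark compares against, the sketch below builds a per-title ladder by keeping only the Pareto-optimal bitrate-quality points across resolutions. The (bitrate, VMAF) values are hypothetical placeholders, and reducing the convex hull to a discrete Pareto frontier is a simplification of the full computation.

```python
# Hypothetical per-resolution encode results; in practice these come from
# exhaustively encoding each resolution over a wide range of bitrates.
encodes = [
    {"res": "960x540",   "bitrate_kbps": 1000, "vmaf": 78.0},
    {"res": "960x540",   "bitrate_kbps": 2000, "vmaf": 84.0},
    {"res": "1920x1080", "bitrate_kbps": 2000, "vmaf": 82.0},
    {"res": "1920x1080", "bitrate_kbps": 4000, "vmaf": 91.0},
    {"res": "3840x2160", "bitrate_kbps": 4000, "vmaf": 88.0},
    {"res": "3840x2160", "bitrate_kbps": 8000, "vmaf": 95.0},
]

def build_ladder(encodes):
    """Keep Pareto-optimal points: a point survives only if it improves
    quality over every cheaper (lower-bitrate) surviving point."""
    ladder = []
    for e in sorted(encodes, key=lambda e: (e["bitrate_kbps"], -e["vmaf"])):
        if not ladder or e["vmaf"] > ladder[-1]["vmaf"]:
            ladder.append(e)
    return ladder

for rung in build_ladder(encodes):
    print(rung["bitrate_kbps"], rung["res"], rung["vmaf"])
```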

    Color correction for stereo and multiview coding

    Various multimedia applications use multi-view video (MVV), which is obtained by capturing the same scene with multiple cameras from varying viewpoints. As a result, illumination and color variations can be observed among the different views. These color inconsistencies can significantly reduce compression efficiency and rendering quality. Various methods have been proposed in the literature to compensate for these color mismatches. In this chapter, we review the commonly used color correction techniques for MVV, with a focus on coding applications. Experimental evaluations of the most prominent methods are provided to allow the reader to compare their performance in terms of visual effect, coding performance, and rendering quality.
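
    One classical baseline reviewed in this line of work is per-channel histogram matching of a side view to a reference view. The sketch below is a minimal NumPy version; treating channels independently and matching full-image histograms are simplifying assumptions for illustration, not the chapter's recommended method.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap `source` values so their histogram matches `reference`."""
    src_values, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts) / source.size      # source CDF
    ref_cdf = np.cumsum(ref_counts) / reference.size   # reference CDF
    # Map each source value to the reference value with the closest CDF.
    mapped = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped[src_idx].reshape(source.shape)

def correct_view(view, reference):
    """Match each color channel of `view` to `reference` independently."""
    return np.stack([match_histogram(view[..., c], reference[..., c])
                     for c in range(view.shape[-1])], axis=-1)
```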

    No-reference perceptual blur metric for stereoscopic images

    In this paper, we propose a no-reference perceptual blur metric for 3D stereoscopic images. The proposed approach computes a perceptual local blurriness map for each image of the stereo pair. To take the disparity/depth masking effect into account, we modulate the perceptual score at each position of the blurriness maps according to its location in the scene. Under the assumption that, in the case of asymmetric stereoscopic image quality, 3D perception mechanisms place more emphasis on the view providing the most important and contrasted information, the two local blurriness maps are combined using weighting factors based on local information content. Thanks to the inclusion of these psychophysical findings, the proposed metric efficiently handles both symmetric and asymmetric distortions. Experimental results show that the proposed metric correlates better with human perception than state-of-the-art metrics.
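
    The two-stage structure (per-view local blurriness maps, then a fusion weighted by local information content) can be sketched as follows. The concrete operators here, Laplacian energy for sharpness and local variance for information content, are stand-ins chosen for brevity rather than the paper's exact models, and the depth-based modulation step is omitted.

```python
import numpy as np
from scipy.ndimage import laplace, uniform_filter

def local_sharpness(img, size=9):
    """Local Laplacian energy: low values indicate local blur."""
    return uniform_filter(laplace(img.astype(np.float64)) ** 2, size)

def local_information(img, size=9):
    """Local variance as a crude proxy for local information content."""
    img = img.astype(np.float64)
    mean = uniform_filter(img, size)
    return np.maximum(uniform_filter(img ** 2, size) - mean ** 2, 0.0)

def stereo_blur_score(left, right):
    """Fuse the two views, favoring the more informative one per pixel."""
    info_l, info_r = local_information(left), local_information(right)
    w = info_l / (info_l + info_r + 1e-12)
    fused = w * local_sharpness(left) + (1.0 - w) * local_sharpness(right)
    return float(fused.mean())  # higher = sharper pair (less blur)
```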

    Color calibration of multi-view video plus depth for advanced 3D video

    The multi-view video plus depth (MVD) format is considered the next-generation standard for advanced 3D video systems. MVD consists of multiple color videos with a depth value associated with each texture pixel. Relying on this representation and using depth-image-based rendering techniques, new viewpoints can be generated for multi-view video applications. However, since MVD is captured from different viewing angles with different cameras, significant illumination and color differences can be observed between views. These color mismatches degrade the performance of view rendering algorithms by introducing visible artifacts, leading to reduced view synthesis quality. To cope with this issue, we propose an effective method for correcting color inconsistencies in MVD. First, to avoid occlusion problems and perform the correction as accurately as possible, we consider only the overlapping region when computing the color mapping function; these common regions are determined using a reliable feature matching technique. In addition, to maintain temporal coherence, the correction is applied over a temporal sliding window. Experimental results show that the proposed method reduces the color difference between views and improves the view rendering process, providing high-quality results.
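
    A minimal sketch of the first stage might look like the following: find reliable correspondences between two views, then fit a per-channel color mapping on the matched (overlapping) content only. ORB matching and a linear gain/offset fit are simplified stand-ins for the paper's feature matching technique and mapping function, and the temporal sliding window is omitted.

```python
import cv2
import numpy as np

def fit_color_mapping(view, reference, max_matches=200):
    """Fit a per-channel gain/offset from `view` to `reference`, sampled
    only at reliably matched keypoints (the overlapping content)."""
    orb = cv2.ORB_create()
    gray_v = cv2.cvtColor(view, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)
    kp_v, des_v = orb.detectAndCompute(gray_v, None)
    kp_r, des_r = orb.detectAndCompute(gray_r, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_v, des_r), key=lambda m: m.distance)
    pts_v = np.array([kp_v[m.queryIdx].pt for m in matches[:max_matches]], int)
    pts_r = np.array([kp_r[m.trainIdx].pt for m in matches[:max_matches]], int)
    return [np.polyfit(view[pts_v[:, 1], pts_v[:, 0], c].astype(float),
                       reference[pts_r[:, 1], pts_r[:, 0], c].astype(float), 1)
            for c in range(3)]  # one [gain, offset] pair per channel

def apply_color_mapping(view, coeffs):
    """Apply the fitted gain/offset to every pixel of `view`."""
    out = np.stack([np.polyval(coeffs[c], view[..., c].astype(float))
                    for c in range(3)], axis=-1)
    return np.clip(out, 0, 255).astype(np.uint8)
```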

    Perceptually Driven Non-Uniform Asymmetric Coding of Stereoscopic 3D Video

    Asymmetric stereoscopic video coding has proven its effectiveness in reducing the bandwidth required for stereoscopic 3D delivery without degrading visual quality. This approach, in which the left and right views are encoded with different levels of quality, relies on the perceptual theory of binocular suppression. However, to ensure comfortable 3D viewing, the just-noticeable level of asymmetry, i.e., the maximum quality gap between views, has to be carefully defined: both subjectively and empirically fixed thresholds of asymmetry have shown either poor adjustment to content or dependency on the experimental design. This paper describes a new non-uniform asymmetric stereoscopic video coding method that adaptively adjusts the level of asymmetry for each region of the image based on its perceptual significance. The proposed method uses a fully automated model that dynamically determines the bounds of asymmetry within which the 3D viewing experience is not altered. This is achieved by exploiting several HVS-inspired models, namely the binocular just noticeable difference (BJND), the visual saliency map, and depth information. Simulation results show that the proposed method yields up to 26% bitrate savings and provides better 3D visual quality than state-of-the-art asymmetric coding methods.
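
    The core idea, a per-region level of asymmetry driven by perceptual significance, can be sketched as a block-wise QP offset map for the lower-quality view. The linear mapping from saliency to offset and the fixed cap below are illustrative assumptions; the paper derives its bounds from BJND, saliency, and depth models.

```python
import numpy as np

def qp_offset_map(saliency, block=64, max_offset=6):
    """Per-block QP offsets for the lower-quality view: 0 where the block
    is salient, up to `max_offset` where it is not. `saliency` is an HxW
    map normalized to [0, 1]."""
    rows, cols = saliency.shape[0] // block, saliency.shape[1] // block
    offsets = np.zeros((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            patch = saliency[r * block:(r + 1) * block,
                             c * block:(c + 1) * block]
            offsets[r, c] = round((1.0 - patch.mean()) * max_offset)
    return offsets
```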

    Quality Assessment of Out-of-Focus Blurred Images Based on Objects Depth Ordering and Saliency

    Blur is one of the most frequently encountered visual distortions in images. It can be deliberately introduced to highlight certain objects, or caused by acquisition and processing; both cases usually induce spatially varying or out-of-focus blur. Despite its wide occurrence, only a few dedicated image quality metrics can be found in the literature, and most of them assume uniformly blurred images. In this paper, we therefore propose a quality assessment framework that handles both types of blur and predicts their inherent level of annoyance. To this end, a local perceptual blurriness map providing the level of blur at each location in the image is first generated. Then, depth ordering is estimated from the image to characterize the placement of objects in the scene. Next, visual saliency is computed to take into account the visual importance of each object. Finally, the local perceptual blurriness map is weighted using both the depth-ordering and saliency maps to produce the final blur score. Experimental results show that the proposed metric achieves good prediction performance compared to state-of-the-art metrics.
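
    The final pooling step can be sketched as a weighted average of the blurriness map, where the weights combine saliency and depth ordering. All three input maps are assumed to be precomputed and normalized, and the multiplicative weighting is a simplification of the paper's combination scheme.

```python
import numpy as np

def blur_annoyance_score(blur_map, depth_order, saliency, eps=1e-12):
    """Weighted-average blur score (higher = more annoying). `blur_map`
    and `saliency` are in [0, 1]; `depth_order` is in [0, 1] with 1 for
    the nearest (foreground) objects."""
    weights = saliency * depth_order
    return float((blur_map * weights).sum() / (weights.sum() + eps))
```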

    Stereoscopic 3D image quality assessment based on cyclopean view and depth map

    This paper presents a full-reference quality assessment metric for stereoscopic images based on perceptual binocular characteristics. To ensure that the predicted 3D quality of experience is reliable and as close as possible to 3D human perception, the proposed stereoscopic image quality assessment (SIQA) method relies on the cyclopean image. Our approach is motivated by the fact that, in the case of asymmetric quality, 3D perception mechanisms place more emphasis on the view providing the most important and contrasted information. We integrated these psychophysical findings into the proposed 3D-IQA framework through a weighting factor based on local information content. In addition, to take the disparity/depth masking effect into account, we modulate the quality score of each pixel of the cyclopean image according to its location in the scene. Experimental results show that the proposed metric correlates better with human judgment than state-of-the-art metrics.
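
    Cyclopean-image formation, the centerpiece of the method, can be sketched as a per-pixel weighted combination of the left view and the disparity-compensated right view. Using local variance as the information-content weight is a simplification (SIQA models typically use Gabor filter energy), and the depth-based modulation of the final score is omitted here.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_energy(img, size=9):
    """Local variance, standing in for Gabor-energy information content."""
    img = img.astype(np.float64)
    mean = uniform_filter(img, size)
    return np.maximum(uniform_filter(img ** 2, size) - mean ** 2, 0.0)

def cyclopean_image(left, right, disparity):
    """Combine `left` and the disparity-compensated `right` view per pixel.
    `disparity` holds the horizontal shift from left to right coordinates."""
    h, w = left.shape
    xs = np.clip(np.arange(w)[None, :] - disparity.astype(int), 0, w - 1)
    right_comp = right[np.arange(h)[:, None], xs]
    e_l, e_r = local_energy(left), local_energy(right_comp)
    w_l = e_l / (e_l + e_r + 1e-12)  # emphasize the more informative view
    return w_l * left + (1.0 - w_l) * right_comp
```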

    EGB: Image Quality Assessment based on Ensemble of Gradient Boosting

    Multimedia services constantly strive to deliver better image quality to users. To meet this need, they must have an effective and reliable tool to assess perceptual image quality. This is particularly true for image restoration (IR) algorithms, whose development depends heavily on the image quality assessment (IQA) metric used. For instance, recent advances in IR algorithms, mainly due to the adoption of generative adversarial network (GAN)-based methods, have clearly shown the need for a reliable IQA metric highly correlated with human judgment. In this paper, we propose an ensemble of gradient boosting (EGB) metric based on selected feature similarities and ensemble learning. First, we analyzed the capability of features extracted by different layers of a deep convolutional neural network (CNN) to characterize the perceptual quality distance between reference and distorted/processed images, and observed that a subset of these layers is more relevant to the IQA task. Accordingly, we exploit these selected layers to compute feature similarities, which are then fed to a regression network to predict the image quality score. The regression network consists of three gradient boosting regression models whose outputs are combined to derive the final quality score. Experiments were performed on the perceptual image processing algorithms (PIPAL) dataset, which was used in the NTIRE 2021 perceptual image quality assessment challenge. The results show that the proposed metric significantly outperforms state-of-the-art IQA methods.
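
    The regression side of such a metric can be sketched with scikit-learn: per-layer feature-similarity scores feed three gradient boosting regressors whose predictions are averaged. The feature dimensionality, hyperparameters, and random placeholder data below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((500, 8))  # placeholder: 8 per-layer similarity features
y = rng.random(500)       # placeholder: subjective quality scores (MOS)

# Three boosted regressors with different depths, averaged at prediction.
models = [GradientBoostingRegressor(n_estimators=300, max_depth=d,
                                    random_state=s).fit(X, y)
          for d, s in [(3, 0), (4, 1), (5, 2)]]

def predict_quality(features):
    """Final score = mean of the three gradient boosting predictions."""
    return np.mean([m.predict(features) for m in models], axis=0)

print(predict_quality(X[:3]))
```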

    Detect and Defend Against Adversarial Examples in Deep Learning Using Natural Scene Statistics and Adaptive Denoising

    Despite the enormous performance of deep neural networks (DNNs), recent studies have shown their vulnerability to adversarial examples (AEs), i.e., carefully perturbed inputs designed to fool the targeted DNN. The literature is rich with effective attacks for crafting such AEs, and many defense strategies have been developed to mitigate this vulnerability. However, these defenses have proven effective only against specific attacks and do not generalize well across attacks. In this paper, we propose a framework for defending DNN classifiers against adversarial samples. The proposed method is a two-stage framework involving a separate detector and a denoising block. The detector aims to detect AEs by characterizing them through natural scene statistics (NSS), as we demonstrate that these statistical features are altered by the presence of adversarial perturbations. The denoiser is based on a block-matching 3D (BM3D) filter, fed with an optimal threshold value estimated by a convolutional neural network (CNN), to project samples detected as AEs back onto their data manifold. We conducted a complete evaluation on three standard datasets, namely MNIST, CIFAR-10, and Tiny-ImageNet. The experimental results show that the proposed defense outperforms state-of-the-art defense techniques by improving robustness against a set of attacks under black-box, gray-box, and white-box settings. The source code is available at: https://github.com/kherchouche-anouar/2DAE
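
    The detector's NSS idea can be sketched with BRISQUE-style mean-subtracted contrast-normalized (MSCN) coefficients, whose statistics are disturbed by adversarial perturbations. The moment-based summary and the SVM detector below are simplified stand-ins for the paper's full feature set and classifier.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn(img, sigma=7 / 6, eps=1e-8):
    """Mean-subtracted contrast-normalized coefficients of a gray image."""
    img = img.astype(np.float64)
    mu = gaussian_filter(img, sigma)
    var = np.maximum(gaussian_filter(img ** 2, sigma) - mu ** 2, 0.0)
    return (img - mu) / (np.sqrt(var) + eps)

def nss_features(img):
    """Crude NSS summary: central moments of the MSCN distribution."""
    m = mscn(img).ravel()
    d = m - m.mean()
    return np.array([m.mean(), m.var(), (d ** 3).mean(), (d ** 4).mean()])

# Hypothetical usage: train a binary detector on clean vs. adversarial sets.
# from sklearn.svm import SVC
# X = np.stack([nss_features(im) for im in clean_images + adv_images])
# y = np.array([0] * len(clean_images) + [1] * len(adv_images))
# detector = SVC(kernel="rbf").fit(X, y)
```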