    PEA265: Perceptual Assessment of Video Compression Artifacts

    The most widely used video encoders share a common hybrid coding framework that includes block-based motion estimation/compensation and block-based transform coding. Despite their high coding efficiency, the encoded videos often exhibit visually annoying artifacts, denoted as Perceivable Encoding Artifacts (PEAs), which significantly degrade the visual Quality of Experience (QoE) of end users. To monitor and improve visual QoE, it is crucial to develop subjective and objective measures that can identify and quantify various types of PEAs. In this work, we make the first attempt to build a large-scale subject-labelled database composed of H.265/HEVC compressed videos containing various PEAs. The database, namely the PEA265 database, includes four types of spatial PEAs (i.e., blurring, blocking, ringing, and color bleeding) and two types of temporal PEAs (i.e., flickering and floating), each containing at least 60,000 image or video patches with positive and negative labels. To objectively identify these PEAs, we train Convolutional Neural Networks (CNNs) using the PEA265 database. It appears that the state-of-the-art ResNeXt is capable of identifying each type of PEA with high accuracy. Furthermore, we define PEA pattern and PEA intensity measures to quantify the PEA levels of compressed video sequences. We believe that the PEA265 database and our findings will benefit the future development of video quality assessment methods and perceptually motivated video encoders. Comment: 10 pages, 15 figures, 4 tables.
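
    As an illustration of the detection setup this abstract describes, below is a minimal sketch of training a per-artifact patch classifier with a ResNeXt backbone. The dataset layout, patch size, and hyperparameters are assumptions made for the example, not the authors' published configuration.

```python
# Minimal sketch: binary classifier for one PEA type (e.g., blocking),
# assuming patches are stored as positive/ and negative/ image folders.
# Paths, input size, and hyperparameters are illustrative only.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # standard ResNeXt input size
    transforms.ToTensor(),
])

# Hypothetical directory layout: pea265/blocking/{positive,negative}/*.png
train_set = datasets.ImageFolder("pea265/blocking", transform=transform)
loader = DataLoader(train_set, batch_size=64, shuffle=True)

model = models.resnext50_32x4d(weights=None)   # ResNeXt backbone
model.fc = nn.Linear(model.fc.in_features, 2)  # positive vs. negative patch
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(10):
    for patches, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(patches), labels)
        loss.backward()
        optimizer.step()
```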

    Multi-Frame Quality Enhancement for Compressed Video

    The past few years have witnessed great success in applying deep learning to enhance the quality of compressed images and video. The existing approaches mainly focus on enhancing the quality of a single frame, ignoring the similarity between consecutive frames. In this paper, we show that heavy quality fluctuation exists across compressed video frames, and thus low-quality frames can be enhanced using the neighboring high-quality frames, a task we call Multi-Frame Quality Enhancement (MFQE). Accordingly, this paper proposes an MFQE approach for compressed video, as a first attempt in this direction. In our approach, we first develop a Support Vector Machine (SVM) based detector to locate Peak Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame Convolutional Neural Network (MF-CNN) is designed to enhance the quality of compressed video, in which the non-PQF and its two nearest PQFs serve as the input. The MF-CNN compensates for motion between the non-PQF and the PQFs through the Motion Compensation subnet (MC-subnet). Subsequently, the Quality Enhancement subnet (QE-subnet) reduces the compression artifacts of the non-PQF with the help of its nearest PQFs. Finally, experiments validate the effectiveness and generality of our MFQE approach in advancing the state-of-the-art quality enhancement of compressed video. The code of our MFQE approach is available at https://github.com/ryangBUAA/MFQE.git. Comment: to appear in CVPR 2018.
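
    As a sketch of the pipeline structure this abstract describes (not the released implementation), the code below wires together the two stages: an SVM flags PQFs from per-frame features, and each non-PQF is then enhanced jointly with its two nearest PQFs. The feature extractor and the MF-CNN are placeholders.

```python
# Sketch of the MFQE pipeline structure: an SVM detects Peak Quality
# Frames (PQFs), and each non-PQF is enhanced together with its two
# nearest PQFs. The MF-CNN itself is a placeholder callable here.
import numpy as np
from sklearn.svm import SVC

def detect_pqfs(frame_features: np.ndarray, svm: SVC) -> np.ndarray:
    """Return indices of frames the SVM classifies as PQFs."""
    labels = svm.predict(frame_features)  # 1 = PQF, 0 = non-PQF
    return np.flatnonzero(labels == 1)

def nearest_pqfs(idx: int, pqf_indices: np.ndarray):
    """Nearest PQF before and after a non-PQF (clamped at sequence ends)."""
    before = pqf_indices[pqf_indices < idx]
    after = pqf_indices[pqf_indices > idx]
    prev_pqf = before[-1] if len(before) else after[0]
    next_pqf = after[0] if len(after) else before[-1]
    return prev_pqf, next_pqf

def enhance_sequence(frames, frame_features, svm, mf_cnn):
    pqf_indices = detect_pqfs(frame_features, svm)
    enhanced = list(frames)
    for i in range(len(frames)):
        if i in pqf_indices:
            continue  # PQFs pass through unchanged
        p, n = nearest_pqfs(i, pqf_indices)
        # MF-CNN: motion-compensate both PQFs toward frame i, then enhance.
        enhanced[i] = mf_cnn(frames[p], frames[i], frames[n])
    return enhanced
```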

    Visual Distortions in 360-degree Videos

    Omnidirectional (or 360°) images and videos are emergent signals used in many areas, such as robotics and virtual/augmented reality. In particular, for virtual reality applications, they allow an immersive experience in which the user can interactively navigate through a scene with three degrees of freedom, wearing a head-mounted display. Current approaches for capturing, processing, delivering, and displaying 360° content, however, present many open technical challenges and introduce several types of distortions in the visual signal. Some of the distortions are specific to the nature of 360° images and often differ from those encountered in classical visual communication frameworks. This paper provides a first comprehensive review of the most common visual distortions that alter 360° signals going through the different processing elements of the visual communication pipeline. While their impact on viewers' visual perception and the immersive experience at large is still unknown (and thus remains an open research topic), this review serves the purpose of proposing a taxonomy of the visual distortions that can be encountered in 360° signals. Their underlying causes in the end-to-end 360° content distribution pipeline are identified. This taxonomy is essential as a basis for comparing different processing techniques, such as visual enhancement, encoding, and streaming strategies, and for allowing the effective design of new algorithms and applications. It is also a useful resource for the design of psycho-visual studies aiming to characterize human perception of 360° content in interactive and immersive applications.
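
    One family of distortions the review covers stems from sphere-to-plane projection. As an illustration chosen here (not an example taken from the paper), the sketch below computes the per-row oversampling of the equirectangular projection: each pixel row covers a spherical strip whose area shrinks with the cosine of latitude, which is also the weighting behind spherically uniform metrics such as WS-PSNR.

```python
# Sketch: non-uniform sampling of the equirectangular projection (ERP).
# Every pixel row spans the full 360 degrees of longitude with the same
# number of pixels, but the spherical strip it covers shrinks by
# cos(latitude), so rows near the poles are heavily oversampled.
import numpy as np

def erp_row_weights(height: int) -> np.ndarray:
    """Relative spherical area covered by each ERP pixel row."""
    # Latitude of each row center, from near +pi/2 (top) to -pi/2 (bottom).
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    return np.cos(lat)

w = erp_row_weights(960)
print(f"equator-to-pole sampling ratio: {w.max() / w.min():.0f}x")
```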


    Impact of media-related SIFs on QoE for H.265/HEVC video streaming

    Long Term Evolution (LTE) is the fastest-deployed mobile broadband technology, driven by demand for an improved user experience. It has distinguished itself from other mobile broadband technologies in its ability to handle the growth of video traffic, which has become an important part of users' mobile broadband experience. This growing trend of video consumption implies that media-related system influence factors (SIFs) should be identified and well understood in order to determine how they affect the user's quality of experience (QoE). Therefore, this paper aims to provide a deeper understanding of media-related SIFs and their impact on QoE for video streaming. The experimental study comprised two phases: emulation of H.265/High Efficiency Video Coding (HEVC) coded video streaming over an LTE network, and an end-user survey for collecting mean opinion scores (MOS). Results obtained from statistical analysis indicate a strong and statistically significant impact of individual media-related SIFs, and of their interactions, on QoE for video streaming.
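
    The kind of main-effect and interaction analysis described here lends itself to a two-way ANOVA. The sketch below shows one way such an analysis could be run; the factor names (resolution, bitrate) and the ratings file are hypothetical placeholders, not the paper's actual factors or data.

```python
# Sketch: two-way ANOVA testing individual media-related SIFs and their
# interaction on MOS. The CSV layout and factor names are hypothetical.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical ratings file: one row per (subject, condition) with
# columns MOS, resolution, and bitrate.
ratings = pd.read_csv("mos_ratings.csv")

# Main effects of each SIF plus their interaction term.
model = ols("MOS ~ C(resolution) * C(bitrate)", data=ratings).fit()
print(sm.stats.anova_lm(model, typ=2))  # F statistic and p-value per term
```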

    Generalized Rate-Distortion Functions of Videos

    Consumers watch enormous amounts of digital video every day via various kinds of video services, delivered through terrestrial, cable, and satellite communication systems or over-the-top Internet connections. To offer the best possible services using the limited capacity of video distribution systems, these video services require a precise understanding of the relationship between the perceptual quality of a video and its media attributes, which we term the generalized rate-distortion (GRD) function. In this thesis, we focus on accurately estimating the GRD function with a minimal number of measurement queries. We first explore the GRD behavior of compressed digital videos in a two-dimensional space of bitrate and resolution. Our analysis of real-world GRD data reveals that all GRD functions share similar regularities, yet exhibit considerable variations across different combinations of content and encoder types. Based on this analysis, we define the theoretical space of the GRD function, which not only establishes the form a GRD model should take, but also determines the constraints these functions must satisfy. We propose two computational GRD models. In the first model, we assume that the quality scores are precise, and develop a robust axial-monotonic Clough-Tocher (RAMCT) interpolation method to approximate the GRD function from a moderate number of measurements. In the second model, we show that the GRD function space is a convex set residing in a Hilbert space, and that a GRD function can be estimated by solving a projection problem onto the convex set. By analyzing GRD functions that arise in practice, we approximate the infinite-dimensional theoretical space by a low-dimensional one, based on which an empirical GRD model with few parameters is proposed. To further reduce the number of queries, we present a novel sampling scheme based on a probabilistic model and an information measure; the proposed sampling method generates a sequence of queries by minimizing the overall informativeness of the remaining samples. To evaluate the performance of the GRD estimation methods, we collect a large-scale database consisting of more than 4,000 real-world GRD functions, namely the Waterloo generalized rate-distortion (Waterloo GRD) database. Extensive comparison experiments are carried out on this database. The superiority of the two proposed GRD models over state-of-the-art approaches is attested both quantitatively and visually. It is also validated that the proposed sampling algorithm consistently reduces the number of queries needed by various GRD estimation algorithms. Finally, we show the broad application scope of the proposed GRD models by exemplifying three applications: rate-distortion curve prediction, per-title encoding profile generation, and video encoder comparison.
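
    As a rough illustration of estimating a GRD surface from a handful of measurements, the sketch below fits a low-parameter quality surface over bitrate and resolution. The logistic functional form and the sample points are illustrative assumptions, not the RAMCT or empirical models proposed in the thesis.

```python
# Sketch: fitting a low-parameter quality surface q(bitrate, resolution)
# to sparse measurements, in the spirit of a GRD model. The functional
# form and the data points below are illustrative assumptions only.
import numpy as np
from scipy.optimize import curve_fit

def grd_surface(x, a, c, k, b0):
    log_bitrate, log_res = x
    qmax = a + c * log_res  # saturation level grows with resolution
    return qmax / (1.0 + np.exp(-k * (log_bitrate - b0)))

# Hypothetical sparse measurements: (bitrate in kbps, height in px) -> score.
bitrates = np.array([300, 1000, 3000, 8000, 300, 1000, 3000, 8000])
heights  = np.array([540, 540, 540, 540, 1080, 1080, 1080, 1080])
scores   = np.array([35.0, 62.0, 80.0, 88.0, 25.0, 55.0, 83.0, 95.0])

x = (np.log(bitrates), np.log(heights))
params, _ = curve_fit(grd_surface, x, scores, p0=[0.0, 15.0, 1.0, 6.5])
print(dict(zip(["a", "c", "k", "b0"], np.round(params, 2))))
```

    Once fitted, such a surface can be queried for, e.g., the lowest bitrate that reaches a target quality at each resolution, which is the kind of question per-title encoding profile generation answers.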