24,208 research outputs found
Perceptually-Driven Video Coding with the Daala Video Codec
The Daala project is a royalty-free video codec that attempts to compete with
the best patent-encumbered codecs. Part of our strategy is to replace core
tools of traditional video codecs with alternative approaches, many of them
designed to take perceptual aspects into account, rather than optimizing for
simple metrics like PSNR. This paper documents some of our experiences with
these tools, which ones worked and which did not. We evaluate which tools are
easy to integrate into a more traditional codec design, and show results in the
context of the codec being developed by the Alliance for Open Media.Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital
Image Processing (ADIP), 201
Graph Spectral Image Processing
Recent advent of graph signal processing (GSP) has spurred intensive studies
of signals that live naturally on irregular data kernels described by graphs
(e.g., social networks, wireless sensor networks). Though a digital image
contains pixels that reside on a regularly sampled 2D grid, if one can design
an appropriate underlying graph connecting pixels with weights that reflect the
image structure, then one can interpret the image (or image patch) as a signal
on a graph, and apply GSP tools for processing and analysis of the signal in
graph spectral domain. In this article, we overview recent graph spectral
techniques in GSP specifically for image / video processing. The topics covered
include image compression, image restoration, image filtering and image
segmentation
Solving Inverse Problems with Piecewise Linear Estimators: From Gaussian Mixture Models to Structured Sparsity
A general framework for solving image inverse problems is introduced in this
paper. The approach is based on Gaussian mixture models, estimated via a
computationally efficient MAP-EM algorithm. A dual mathematical interpretation
of the proposed framework with structured sparse estimation is described, which
shows that the resulting piecewise linear estimate stabilizes the estimation
when compared to traditional sparse inverse problem techniques. This
interpretation also suggests an effective dictionary motivated initialization
for the MAP-EM algorithm. We demonstrate that in a number of image inverse
problems, including inpainting, zooming, and deblurring, the same algorithm
produces either equal, often significantly better, or very small margin worse
results than the best published ones, at a lower computational cost.Comment: 30 page
The AV1 Constrained Directional Enhancement Filter (CDEF)
This paper presents the constrained directional enhancement filter designed
for the AV1 royalty-free video codec. The in-loop filter is based on a
non-linear low-pass filter and is designed for vectorization efficiency. It
takes into account the direction of edges and patterns being filtered. The
filter works by identifying the direction of each block and then adaptively
filtering with a high degree of control over the filter strength along the
direction and across it. The proposed enhancement filter is shown to improve
the quality of the Alliance for Open Media (AOM) AV1 and Thor video codecs in
particular in low complexity configurations.Comment: 5 page
PEA265: Perceptual Assessment of Video Compression Artifacts
The most widely used video encoders share a common hybrid coding framework
that includes block-based motion estimation/compensation and block-based
transform coding. Despite their high coding efficiency, the encoded videos
often exhibit visually annoying artifacts, denoted as Perceivable Encoding
Artifacts (PEAs), which significantly degrade the visual Qualityof- Experience
(QoE) of end users. To monitor and improve visual QoE, it is crucial to develop
subjective and objective measures that can identify and quantify various types
of PEAs. In this work, we make the first attempt to build a large-scale
subjectlabelled database composed of H.265/HEVC compressed videos containing
various PEAs. The database, namely the PEA265 database, includes 4 types of
spatial PEAs (i.e. blurring, blocking, ringing and color bleeding) and 2 types
of temporal PEAs (i.e. flickering and floating). Each containing at least
60,000 image or video patches with positive and negative labels. To objectively
identify these PEAs, we train Convolutional Neural Networks (CNNs) using the
PEA265 database. It appears that state-of-theart ResNeXt is capable of
identifying each type of PEAs with high accuracy. Furthermore, we define PEA
pattern and PEA intensity measures to quantify PEA levels of compressed video
sequence. We believe that the PEA265 database and our findings will benefit the
future development of video quality assessment methods and perceptually
motivated video encoders.Comment: 10 pages,15 figures,4 table
- …