Steerable Discrete Cosine Transform
In image compression, classical block-based separable transforms tend to be
inefficient when image blocks contain arbitrarily shaped discontinuities. For
this reason, transforms incorporating directional information are an appealing
alternative. In this paper, we propose a new approach to this problem, namely a
discrete cosine transform (DCT) that can be steered in any chosen direction.
Such a transform, called the steerable DCT (SDCT), allows pairs of basis vectors
to be rotated flexibly and enables precise matching of the directionality of
each image block, achieving improved coding efficiency. The optimal rotation
angles for the SDCT can be expressed as the solution of a suitable
rate-distortion (RD) problem. We propose iterative methods to search for such a
solution, and we develop a fully fledged image encoder to compare our
techniques in practice with other competing transforms. Analytical and
numerical results prove that the SDCT outperforms both the DCT and
state-of-the-art directional transforms.
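The abstract does not spell out how a pair of basis vectors is rotated. A minimal sketch, assuming the simplest pairing, in which the 2-D DCT basis vectors with swapped frequency indices (i, j) and (j, i) are rotated together by one angle per pair. The names `dct_matrix`, `steered_basis`, `sdct` and the `angles` dictionary are hypothetical, and the RD search that chooses the angles is not shown.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k is the k-th 1-D basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2)          # DC row scaling for orthonormality
    return c * np.sqrt(2 / n)

def steered_basis(n: int, angles: dict) -> np.ndarray:
    """Rows are the n*n 2-D basis vectors after steering.

    angles maps a pair (i, j) with i < j to a rotation angle in radians;
    pairs not listed keep angle 0, i.e. the plain separable DCT basis.
    """
    c = dct_matrix(n)
    # Separable 2-D DCT basis vector (i, j) = outer(c[i], c[j]), flattened.
    basis = np.array([np.outer(c[i], c[j]).ravel()
                      for i in range(n) for j in range(n)])
    out = basis.copy()
    for i in range(n):
        for j in range(i + 1, n):
            theta = angles.get((i, j), 0.0)
            a, b = basis[i * n + j], basis[j * n + i]
            # A 2x2 rotation of the pair keeps the whole basis orthonormal.
            out[i * n + j] = np.cos(theta) * a + np.sin(theta) * b
            out[j * n + i] = -np.sin(theta) * a + np.cos(theta) * b
    return out

def sdct(block: np.ndarray, angles: dict) -> np.ndarray:
    """Project a square block onto the steered basis."""
    n = block.shape[0]
    return (steered_basis(n, angles) @ block.ravel()).reshape(n, n)
```

With an empty `angles` dictionary the result equals the ordinary separable DCT `C @ X @ C.T`; because every pair rotation is orthogonal, any choice of angles preserves the block's energy, so only the RD cost of the coefficients changes.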
A case study in identifying acceptable bitrates for human face recognition tasks
Face recognition from images or video footage requires a certain level of recorded image quality. This paper derives acceptable bitrates (relating to levels of compression and, consequently, quality) for footage containing human faces, using an industry implementation of the H.264/MPEG-4 AVC standard and the Closed-Circuit Television (CCTV) recording systems on London buses. The London buses application is used as a case study for setting up a methodology and implementing suitable data analysis for face recognition from recorded footage that has been degraded by compression. The majority of CCTV recorders on buses use a proprietary format based on the H.264/MPEG-4 AVC video coding standard, exploiting both spatial and temporal redundancy. Low bitrates are favored in the CCTV industry because they save storage and transmission bandwidth, but they compromise the usefulness of the recorded imagery. In this context, usefulness is determined by whether enough facial information remains in the compressed image to allow a specialist to recognize a person. The investigation includes four steps: (1) development of a video dataset representative of typical CCTV bus scenarios; (2) selection and grouping of video scenes based on local (facial) and global (entire scene) content properties; (3) psychophysical investigations to identify the key scenes that are most affected by compression, using an industry implementation of H.264/MPEG-4 AVC; (4) testing of CCTV recording systems on buses with the key scenes, followed by further psychophysical investigations. The results showed a dependency on scene content properties. Very dark scenes and scenes with high levels of spatial-temporal busyness were the most challenging to compress, requiring higher bitrates to maintain useful information.
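The storage pressure behind low CCTV bitrates is simple arithmetic: at a constant bitrate, storage grows linearly with recording time and camera count. A minimal sketch, assuming constant-bitrate recording and decimal gigabytes; `storage_gb` is a hypothetical helper, and the bitrates in the usage loop are illustrative values, not figures from the study.

```python
def storage_gb(bitrate_kbps: float, hours: float, cameras: int = 1) -> float:
    """Decimal gigabytes needed to record `cameras` streams at a constant bitrate.

    Assumes constant bitrate (CBR); H.264 recorders often use variable
    bitrate, so treat this as a first-order estimate only.
    """
    seconds = hours * 3600
    bits = bitrate_kbps * 1_000 * seconds * cameras
    return bits / 8 / 1e9  # bits -> bytes -> GB (10**9 bytes)

# Hypothetical bitrates: halving the bitrate halves the storage budget.
for kbps in (2000, 1000, 500):
    print(kbps, "kbit/s ->", round(storage_gb(kbps, hours=24), 2), "GB per camera-day")
```

The linear relation is why lowering the bitrate is tempting for operators; the study's point is that the scene content determines how far it can be lowered before faces stop being recognizable.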
Enhanced error-resilient video transport over MIMO systems using multiple descriptions
Expectation Propagation (Minka, 2001) is a widely successful algorithm for variational inference. EP is an iterative algorithm that can be used to approximate complicated distributions, most often posterior distributions arising in Bayesian settings. Its most typical use is to find a Gaussian approximation to posterior distributions, and in many applications of this type, EP performs extremely well. Surprisingly, despite its widespread use, there are very few theoretical guarantees for Gaussian EP. A basic requirement of statistical inference methods is that they should perform well in the limit of infinite data, and here we show that this is indeed the case for EP. In the classical large-data limit, where the Bernstein-von Mises theorem applies, we prove that EP is exact, meaning that it recovers the correct Gaussian posterior. We further prove that, in the same limit, EP behaves like a simpler algorithm we call averaged-EP (aEP), and that aEP in turn behaves similarly to the Newton algorithm. This correspondence yields interesting insights into the dynamic behavior of EP; for example, it may diverge under poor initialization, just like the Newton algorithm. EP is a simple algorithm to state, but a difficult one to study. Our results should facilitate further research into the theoretical properties of this important method.
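To make the iteration concrete, here is a minimal one-dimensional Gaussian EP loop: for each site it forms the cavity (the approximation with that site removed), computes the mean and variance of the tilted distribution, and updates the site so the approximation matches those moments. This is a textbook-style sketch, not the authors' analysis; the function name `gaussian_ep`, the grid-quadrature moment computation, and the rule of skipping updates with a non-positive cavity precision are assumptions made for illustration.

```python
import numpy as np

def gaussian_ep(log_sites, prior_var, n_sweeps=10, grid_half_width=12.0, n_grid=4001):
    """Minimal 1-D Gaussian EP with moment matching on a quadrature grid.

    log_sites: callables returning log t_i(theta) for an array of theta values.
    Returns (mean, variance) of the Gaussian approximation q(theta).
    """
    n = len(log_sites)
    # Natural parameters: r = precision * mean, lam = precision.
    site_r, site_lam = np.zeros(n), np.zeros(n)
    q_r, q_lam = 0.0, 1.0 / prior_var          # zero-mean Gaussian prior
    u = np.linspace(-grid_half_width, grid_half_width, n_grid)
    for _ in range(n_sweeps):
        for i, log_t in enumerate(log_sites):
            # Cavity: remove site i from q; it must be a proper Gaussian.
            cav_r, cav_lam = q_r - site_r[i], q_lam - site_lam[i]
            if cav_lam <= 0:
                continue  # assumption: skip this update instead of diverging
            cav_mean, cav_sd = cav_r / cav_lam, 1.0 / np.sqrt(cav_lam)
            theta = cav_mean + cav_sd * u
            # Tilted distribution: cavity Gaussian times the exact site.
            log_w = -0.5 * u ** 2 + log_t(theta)
            w = np.exp(log_w - log_w.max())
            w /= w.sum()
            mean = np.sum(w * theta)
            var = np.sum(w * (theta - mean) ** 2)
            # Moment matching: q takes the tilted mean and variance.
            new_r, new_lam = mean / var, 1.0 / var
            site_r[i], site_lam[i] = new_r - cav_r, new_lam - cav_lam
            q_r, q_lam = new_r, new_lam
    return q_r / q_lam, 1.0 / q_lam

# Gaussian likelihood sites: EP should recover the exact conjugate posterior.
ys = [0.5, 1.5, -0.2, 2.0]
sites = [lambda th, y=y: -0.5 * (y - th) ** 2 for y in ys]
print(gaussian_ep(sites, prior_var=4.0))
```

With unit-variance Gaussian sites and prior variance 4, the exact posterior has precision 0.25 + 4 = 4.25 and mean 3.8 / 4.25, which the loop reproduces; non-Gaussian sites (for example a logistic likelihood) plug in the same way and are where the approximation becomes non-trivial.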
HD-VideoBench: A benchmark for evaluating high definition digital video applications
HD-VideoBench is a benchmark devoted to high definition (HD) digital video processing. It includes a set of video encoders and decoders (codecs) for the MPEG-2, MPEG-4 and H.264 video standards. The applications were carefully selected, taking into account the quality and portability of the code, the representativeness of the video application domain, the availability of high-performance optimizations, and distribution under a free license. Additionally, HD-VideoBench defines a set of input sequences and codec configuration parameters that are appropriate for the HD video domain.
VideoWall Bench: A Benchmark for Evaluating Hardware Accelerated Video Decoding on Linux
VideoWall Bench is a benchmark script for measuring hardware-accelerated video decoding capabilities on Linux. Intel introduced the Video Acceleration API (VA-API), which provides access to graphics hardware for hardware acceleration. VA-API provides a set of video decoders (codecs) for the H.264 video standard. The script implements a benchmarking method based on decoding multiple videos simultaneously in a video-wall layout. With this method, users can stress the multiple-video decoding capabilities of a platform while measuring processor usage during decoding. VideoWall Bench benchmarks video decoding performance by measuring processor utilization, memory utilization, total frames per second (FPS), and time fluctuation in the decoding process. Additionally, VideoWall Bench includes a set
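Two of the listed metrics, total FPS and time fluctuation, can be derived from per-stream frame timestamps. A minimal sketch, assuming each stream reports sorted frame-completion times in seconds and interpreting "time fluctuation" as the standard deviation of inter-frame intervals; `decode_metrics` is a hypothetical helper, not part of VideoWall Bench, and processor and memory utilization are left out because they require OS-specific sampling.

```python
from statistics import pstdev

def decode_metrics(stream_timestamps):
    """Aggregate metrics for several streams decoded at once (video-wall style).

    stream_timestamps: one sorted list of frame-completion times (seconds) per
    stream, each with at least two frames. Returns (total_fps, fluctuation_s).
    """
    start = min(ts[0] for ts in stream_timestamps)
    end = max(ts[-1] for ts in stream_timestamps)
    # Count inter-frame intervals so a steady 25 fps stream reports 25 fps.
    intervals = [b - a for ts in stream_timestamps for a, b in zip(ts, ts[1:])]
    total_fps = len(intervals) / (end - start)
    # Assumed definition: population std. dev. of all inter-frame intervals.
    fluctuation = pstdev(intervals)
    return total_fps, fluctuation

# Two hypothetical streams, each decoded at a steady 25 fps for one second.
wall = [[i * 0.04 for i in range(26)] for _ in range(2)]
print(decode_metrics(wall))
```

Summing intervals across all tiles gives the platform-wide throughput that a video wall is designed to stress, while the fluctuation term exposes stalls that an average FPS figure would hide.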
Towards Hybrid-Optimization Video Coding
Video coding is essentially a mathematical optimization problem over rate and
distortion. To solve this complex optimization problem, two popular video
coding frameworks have been developed: block-based hybrid video coding and
end-to-end learned video coding. If we rethink video coding from the
perspective of optimization, we find that the two existing frameworks represent
two directions of optimization solutions. Block-based hybrid coding represents
the discrete optimization solution because its coding modes are mutually
unrelated and mathematically discrete. It searches for the best one among
multiple starting points (i.e., modes). However, the search is not efficient
enough. On the other hand, end-to-end learned coding represents the continuous
optimization solution because gradient descent operates on a continuous
function. It optimizes a group of model parameters efficiently with a numerical
algorithm. However, because it is limited to a single starting point, it easily
falls into a local optimum. To better solve the optimization problem, we
propose to regard video coding as a hybrid discrete and continuous optimization
problem and to use both search and numerical algorithms to solve it. Our idea is
to provide multiple discrete starting points in the global space, efficiently
find the local optimum around each point with a numerical algorithm, and
finally search for the global optimum among those local optima. Guided by this
hybrid optimization idea, we design a hybrid optimization video coding
framework, which is built entirely on continuous deep networks and also
contains some discrete modes. We conduct a comprehensive set of experiments.
Compared to the continuous optimization framework, our method outperforms pure
learned video coding methods. Meanwhile, compared to the discrete optimization
framework, our method achieves performance comparable to the HEVC reference
software HM16.10 in PSNR.
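The hybrid idea (several discrete starting points, a continuous local optimizer around each, then a search over the resulting local optima) can be illustrated on a toy problem. A minimal sketch, assuming a hypothetical one-dimensional non-convex function `rd_cost` as a stand-in for a rate-distortion cost; the paper's framework uses deep networks and real coding modes, which this does not model, and all names here are illustrative.

```python
import math

def rd_cost(x):
    """Hypothetical non-convex stand-in for a rate-distortion cost J."""
    return math.sin(3 * x) + 0.1 * x * x

def rd_grad(x):
    """Derivative of rd_cost."""
    return 3 * math.cos(3 * x) + 0.2 * x

def local_descent(x, lr=0.05, steps=500):
    """Continuous part: plain gradient descent from one starting point."""
    for _ in range(steps):
        x -= lr * rd_grad(x)
    return x

def hybrid_optimize(starting_points):
    """Discrete part: refine every starting 'mode', then keep the best local optimum."""
    candidates = [local_descent(x0) for x0 in starting_points]
    return min(candidates, key=rd_cost)

single = local_descent(2.0)                     # one start: ends in a local minimum
best = hybrid_optimize([-3.0, -1.0, 1.0, 3.0])  # several discrete starts
print(f"single start: x={single:.3f}, J={rd_cost(single):.3f}")
print(f"hybrid:       x={best:.3f}, J={rd_cost(best):.3f}")
```

Gradient descent alone converges to whichever basin contains its starting point, whereas evaluating several refined candidates lets the discrete search pick the lowest-cost basin, mirroring the trade-off the abstract describes between pure learned coding and mode search.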