Perceptually-Driven Video Coding with the Daala Video Codec
The Daala project is a royalty-free video codec that attempts to compete with
the best patent-encumbered codecs. Part of our strategy is to replace core
tools of traditional video codecs with alternative approaches, many of them
designed to take perceptual aspects into account, rather than optimizing for
simple metrics like PSNR. This paper documents some of our experiences with
these tools, which ones worked and which did not. We evaluate which tools are
easy to integrate into a more traditional codec design, and show results in the
context of the codec being developed by the Alliance for Open Media.
Comment: 19 pages, Proceedings of SPIE Workshop on Applications of Digital
Image Processing (ADIP), 201
Combining open- and closed-loop architectures for H.264/AVC-to-SVC transcoding
Scalable video coding (SVC) allows encoded bitstreams to be adapted. However, most bitstreams do not incorporate this scalability, so they have to be adapted multiple times to accommodate varying network conditions or end-user devices. Each adaptation incurs an additional loss of quality due to transcoding. To overcome this issue, we propose a single transcoding step from H.264/AVC to SVC, after which the resulting bitstream can be freely adapted without any additional quality reduction. Open-loop transcoding architectures can be used for H.264/AVC-to-SVC transcoding with low complexity, although these architectures suffer from drift artifacts. Closed-loop transcoding, on the other hand, has a higher complexity. To overcome the drawbacks of both systems, we propose combining both techniques.
Coding local and global binary visual features extracted from video sequences
Binary local features represent an effective alternative to real-valued
descriptors, leading to comparable results for many visual analysis tasks,
while being characterized by significantly lower computational complexity and
memory requirements. When dealing with large collections, a more compact
representation based on global features is often preferred, which can be
obtained from local features by means of, e.g., the Bag-of-Visual-Word (BoVW)
model. Several applications, including for example visual sensor networks and
mobile augmented reality, require visual features to be transmitted over a
bandwidth-limited network, thus calling for coding techniques that aim at
reducing the required bit budget, while attaining a target level of efficiency.
In this paper we investigate a coding scheme tailored to both local and global
binary features, which aims at exploiting both spatial and temporal redundancy
by means of intra- and inter-frame coding. In this respect, the proposed coding
scheme can be conveniently adopted to support the Analyze-Then-Compress (ATC)
paradigm. That is, visual features are extracted from the acquired content,
encoded at remote nodes, and finally transmitted to a central controller that
performs visual analysis. This is in contrast with the traditional approach, in
which visual content is acquired at a node, compressed and then sent to a
central unit for further processing, according to the Compress-Then-Analyze
(CTA) paradigm. In this paper we experimentally compare ATC and CTA by means of
rate-efficiency curves in the context of two different visual analysis tasks:
homography estimation and content-based retrieval. Our results show that the
novel ATC paradigm based on the proposed coding primitives can be competitive
with CTA, especially in bandwidth-limited scenarios.
Comment: submitted to IEEE Transactions on Image Processing
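The intra-/inter-frame idea for binary descriptors can be illustrated with a minimal Python sketch. This is not the coding scheme from the paper: it assumes descriptors are stored as integers, uses an XOR residual against descriptors from the previous frame, and uses the Hamming weight of the residual as a crude stand-in for the entropy-coded cost. The function names `encode_descriptor` and `decode_descriptor` are hypothetical.

```python
def hamming_weight(value):
    """Number of set bits in a non-negative integer."""
    return bin(value).count("1")


def encode_descriptor(current, references):
    """Choose intra or inter mode for one binary descriptor.

    Assumption: fewer set bits in the residual means a cheaper code.
    A real coder would compare entropy-coded rates instead.
    """
    mode, ref_index, residual = "intra", None, current
    for i, ref in enumerate(references):
        candidate = current ^ ref          # bits that differ from the reference
        if hamming_weight(candidate) < hamming_weight(residual):
            mode, ref_index, residual = "inter", i, candidate
    return mode, ref_index, residual


def decode_descriptor(mode, ref_index, residual, references):
    """Invert encode_descriptor using the same reference list."""
    if mode == "intra":
        return residual
    return residual ^ references[ref_index]


previous_frame = [0b1011001110001111, 0b0000111100001111]
current = 0b1011001110001011               # one bit away from the first reference
mode, ref_index, residual = encode_descriptor(current, previous_frame)
restored = decode_descriptor(mode, ref_index, residual, previous_frame)
```

When a descriptor changes little between frames, the inter residual is sparse, which is the temporal redundancy the abstract refers to.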
Enabling error-resilient internet broadcasting using motion compensated spatial partitioning and packet FEC for the Dirac video codec
Video transmission over wireless or wired networks requires protection from channel errors, since compressed video bitstreams are very sensitive to transmission errors because of the use of predictive coding and variable-length coding. In this paper, a simple, low-complexity and patent-free error-resilient coding scheme is proposed. It is based upon the idea of using spatial partitioning on the motion-compensated residual frame without employing the transform coefficient coding. The proposed scheme is intended for the open-source Dirac video codec, in order to enable the codec to be used for Internet broadcasting. By partitioning the wavelet transform coefficients of the motion-compensated residual frame into groups and independently processing each group using arithmetic coding and Forward Error Correction (FEC), robustness to transmission errors over a packet-erasure wired network can be achieved. Using the Rate-Compatible Punctured Code (RCPC) and Turbo Code (TC) as the FEC, the proposed technique provides gracefully decreasing perceptual quality over packet loss rates up to 30%. The PSNR performance is much better than that of conventional data-partitioning-only methods. Simulation results show that the use of multiple partitioning of wavelet coefficients in Dirac can achieve up to 8 dB PSNR gain over its existing un-partitioned method.
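As a rough illustration of per-group protection against packet erasures, the Python sketch below adds one XOR parity packet to each group of equal-length packets. This is a deliberately simpler stand-in, not the RCPC or Turbo codes named in the abstract: a single parity packet can repair only one lost packet per group. The names `add_parity` and `recover` are hypothetical.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(packets):
    """Return the group with one XOR parity packet appended."""
    parity = bytes(len(packets[0]))        # all-zero packet of the same length
    for packet in packets:
        parity = xor_bytes(parity, packet)
    return list(packets) + [parity]


def recover(received):
    """Return the data packets of a group; a lost packet is marked as None."""
    missing = [i for i, packet in enumerate(received) if packet is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet cannot repair several erasures")
    received = list(received)
    if missing:
        length = len(next(p for p in received if p is not None))
        repaired = bytes(length)
        for packet in received:
            if packet is not None:
                repaired = xor_bytes(repaired, packet)
        received[missing[0]] = repaired    # XOR of the survivors equals the lost packet
    return received[:-1]                   # drop the parity packet


group = [b"abcd", b"efgh", b"ijkl"]
coded = add_parity(group)
damaged = list(coded)
damaged[1] = None                          # simulate one erased packet
repaired_group = recover(damaged)
```

Because each group is coded and protected independently, a loss in one group does not prevent the other groups from being decoded.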
Predictive Coding For Animation-Based Video Compression
We address the problem of efficiently compressing video for conferencing-type
applications. We build on recent approaches based on image animation, which can
achieve good reconstruction quality at very low bitrate by representing face
motions with a compact set of sparse keypoints. However, these methods encode
video in a frame-by-frame fashion, i.e. each frame is reconstructed from a
reference frame, which limits the reconstruction quality when the bandwidth is
larger. Instead, we propose a predictive coding scheme which uses image
animation as a predictor, and codes the residual with respect to the actual
target frame. The residuals can in turn be coded in a predictive manner, thus
efficiently removing temporal dependencies. Our experiments indicate a
significant bitrate gain, in excess of 70% compared to the HEVC video standard
and over 30% compared to VVC, on a dataset of talking-head videos.
Comment: Accepted paper: ICIP 202
Graph-based transform with weighted self-loops for predictive transform coding based on template matching
This paper introduces the GBT-L, a novel class of Graph-based Transform within the context of block-based predictive transform coding. The GBT-L is constructed using a 2D graph with unit edge weights and weighted self-loops at every vertex. The weighted self-loops are selected based on the residual values to be transformed. To avoid signalling any additional information required to compute the inverse GBT-L, we also introduce a coding framework that uses a template-based strategy to predict residual blocks in the pixel and residual domains. Evaluation results on several video frames and medical images, in terms of the percentage of preserved energy and mean square error, show that the GBT-L can outperform the DST, DCT and the Graph-based Separable Transform
Design of a digital compression technique for shuttle television
The performance and hardware complexity of data compression algorithms applicable to color television signals were studied to assess the feasibility of digital compression techniques for shuttle communications applications. For return link communications, it is shown that a nonadaptive two-dimensional DPCM technique compresses the bandwidth of field-sequential color TV to about 13 MBPS and requires less than 60 watts of secondary power. For forward link communications, a facsimile coding technique is recommended which provides high-resolution slow-scan television on a 144 KBPS channel. The onboard decoder requires about 19 watts of secondary power.
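A nonadaptive two-dimensional DPCM loop can be sketched in a few lines of Python. This is a generic illustration, not the design evaluated in the report: the predictor (mean of the left and upper neighbours), the uniform quantizer, the step size of 8 and the start value of 128 are all assumptions chosen for clarity.

```python
def clamp(value):
    """Keep a sample inside the 8-bit range."""
    return max(0, min(255, value))


def predict(recon, y, x):
    """Fixed 2D predictor built only from already reconstructed samples."""
    if y == 0 and x == 0:
        return 128                          # assumed start value (mid-grey)
    if y == 0:
        return recon[y][x - 1]              # first row: left neighbour only
    if x == 0:
        return recon[y - 1][x]              # first column: upper neighbour only
    return (recon[y][x - 1] + recon[y - 1][x]) // 2


def dpcm_encode(image, step=8):
    """Return quantizer indices and the reconstruction the decoder will see."""
    height, width = len(image), len(image[0])
    recon = [[0] * width for _ in range(height)]
    indices = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            pred = predict(recon, y, x)
            # Uniform quantizer: nearest multiple of `step` to the residual.
            q = (image[y][x] - pred + step // 2) // step
            indices[y][x] = q
            recon[y][x] = clamp(pred + q * step)
    return indices, recon


def dpcm_decode(indices, step=8):
    """Rebuild the image from quantizer indices with the same predictor."""
    height, width = len(indices), len(indices[0])
    recon = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            recon[y][x] = clamp(predict(recon, y, x) + indices[y][x] * step)
    return recon


image = [[(7 * x + 13 * y) % 256 for x in range(8)] for y in range(8)]
indices, encoder_recon = dpcm_encode(image)
decoded = dpcm_decode(indices)
max_error = max(abs(a - b) for row_a, row_b in zip(image, decoded)
                for a, b in zip(row_a, row_b))
```

The encoder predicts from its own reconstruction rather than from the original samples, so the decoder stays in step with it and the quantization error does not accumulate.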