39,950 research outputs found
Weighted universal image compression
We describe a general coding strategy leading to a family of universal image compression systems designed to give good performance in applications where the statistics of the source to be compressed are not available at design time or vary over time or space. The basic approach considered uses a two-stage structure in which the single source code of traditional image compression systems is replaced with a family of codes designed to cover a large class of possible sources. To illustrate this approach, we consider the optimal design and use of two-stage codes containing collections of vector quantizers (weighted universal vector quantization), bit allocations for JPEG-style coding (weighted universal bit allocation), and transform codes (weighted universal transform coding). Further, we demonstrate the benefits to be gained from the inclusion of perceptual distortion measures and optimal parsing. The strategy yields two-stage codes that significantly outperform their single-stage predecessors. On a sequence of medical images, weighted universal vector quantization outperforms entropy coded vector quantization by over 9 dB. On the same data sequence, weighted universal bit allocation outperforms a JPEG-style code by over 2.5 dB. On a collection of mixed text and image data, weighted universal transform coding outperforms a single, data-optimized transform code (which gives performance almost identical to that of JPEG) by over 6 dB.
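The two-stage structure described above can be illustrated with a minimal sketch. It assumes squared-error distortion and picks the codebook by distortion alone, ignoring the rate term that an optimal design would also weigh. The function names (`nearest`, `two_stage_encode`, `two_stage_decode`) and the toy codebooks are hypothetical and are not taken from the paper.

```python
def nearest(codebook, block):
    """Return (index, squared_error) of the codeword closest to block."""
    best_i, best_d = 0, float("inf")
    for i, codeword in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(codeword, block))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def two_stage_encode(codebooks, blocks):
    """First stage: choose the codebook with the lowest total distortion.
    Second stage: encode every block with the chosen codebook.
    Returns (codebook_index, list_of_codeword_indices)."""
    best = None
    for k, codebook in enumerate(codebooks):
        results = [nearest(codebook, b) for b in blocks]
        total = sum(d for _, d in results)
        if best is None or total < best[1]:
            best = (k, total, [i for i, _ in results])
    return best[0], best[2]


def two_stage_decode(codebooks, k, indices):
    """Look up each codeword in the codebook named by the first-stage index."""
    return [codebooks[k][i] for i in indices]


# Toy family of two codebooks, one for dark blocks and one for bright blocks.
codebooks = [[(0, 0), (10, 10)], [(100, 100), (200, 200)]]
blocks = [(98, 101), (205, 199)]
k, indices = two_stage_encode(codebooks, blocks)
reconstruction = two_stage_decode(codebooks, k, indices)
```

The decoder needs only the first-stage index `k` plus the second-stage indices, which is what lets a single system cover several source types.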
A Generative Model of Natural Texture Surrogates
Natural images can be viewed as patchworks of different textures, where the
local image statistics are roughly stationary within a small neighborhood but
otherwise varies from region to region. In order to model this variability, we
first applied the parametric texture algorithm of Portilla and Simoncelli to
image patches of 64×64 pixels in a large database of natural images such that
each image patch is then described by 655 texture parameters which specify
certain statistics, such as variances and covariances of wavelet coefficients
or coefficient magnitudes within that patch.
To model the statistics of these texture parameters, we then developed
suitable nonlinear transformations of the parameters that allowed us to fit
their joint statistics with a multivariate Gaussian distribution. We find that
the first 200 principal components contain more than 99% of the variance and
are sufficient to generate textures that are perceptually extremely close to
those generated with all 655 components. We demonstrate the usefulness of the
model in several ways: (1) We sample ensembles of texture patches that can be
directly compared to samples of patches from the natural image database and can
to a high degree reproduce their perceptual appearance. (2) We further
developed an image compression algorithm which generates surprisingly accurate
images at bit rates as low as 0.14 bits/pixel. Finally, (3) we demonstrate how
our approach can be used for an efficient and objective evaluation of samples
generated with probabilistic models of natural images. Comment: 34 pages, 9 figures
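The statement that a subset of principal components carries more than 99% of the variance corresponds to a cumulative-variance check on the covariance eigenvalues. The sketch below shows only that check. The function name `components_for_variance` is hypothetical and the eigenvalues are made-up illustrative numbers, not values from the paper.

```python
def components_for_variance(eigenvalues, target=0.99):
    """Return the smallest number of leading principal components whose
    eigenvalues sum to at least `target` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return count
    return len(ordered)


# Illustrative spectrum only (assumed values).
spectrum = [50.0, 30.0, 15.0, 4.5, 0.5]
n_99 = components_for_variance(spectrum, target=0.99)
n_90 = components_for_variance(spectrum, target=0.90)
```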
Query by String word spotting based on character bi-gram indexing
In this paper we propose a segmentation-free query by string word spotting
method. Both the documents and query strings are encoded using a recently
proposed word representation that projects images and strings into a common
attribute space based on a pyramidal histogram of characters (PHOC). These
attribute models are learned using linear SVMs over the Fisher Vector
representation of the images along with the PHOC labels of the corresponding
strings. In order to search through the whole page, document regions are
indexed per character bi-gram using a similar attribute representation. On top
of that, we propose an integral image representation of the document using a
simplified version of the attribute model for efficient computation. Finally, we
introduce a re-ranking step in order to boost retrieval performance. We show
state-of-the-art results for segmentation-free query by string word spotting in
single-writer and multi-writer standard datasets. Comment: To be published in ICDAR201
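For the string side of the common attribute space, a PHOC vector can be built directly from the query text. The following is a simplified sketch: it assumes an alphabet of lower-case letters and digits, assumes pyramid levels 1 to 3, and omits any bi-gram part. A character is assigned to a region when at least half of its span falls inside that region. The name `phoc` and all parameter choices here are assumptions, not the paper's exact configuration.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits  # assumed alphabet


def phoc(word, levels=(1, 2, 3)):
    """Binary pyramidal histogram of characters for `word`.
    At level L the word is split into L equal regions; each region gets one
    bit per alphabet symbol, set when that symbol occurs in the region."""
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            # Work in integer units of 1/(n*level) to avoid rounding issues.
            r_lo, r_hi = region * n, (region + 1) * n
            bits = [0] * len(ALPHABET)
            for i, ch in enumerate(word):
                if ch not in ALPHABET:
                    continue
                c_lo, c_hi = i * level, (i + 1) * level
                overlap = min(c_hi, r_hi) - max(c_lo, r_lo)
                # Character width is `level` units; require >= 50% overlap.
                if 2 * overlap >= level:
                    bits[ALPHABET.index(ch)] = 1
            vec.extend(bits)
    return vec


v = phoc("ab", levels=(1, 2))
```

With levels `(1, 2)` the vector has three regions of 36 bits each: the whole word, its first half, and its second half.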
Distributed Representation of Geometrically Correlated Images with Compressed Linear Measurements
This paper addresses the problem of distributed coding of images whose
correlation is driven by the motion of objects or positioning of the vision
sensors. It concentrates on the problem where images are encoded with
compressed linear measurements. We propose a geometry-based correlation model
in order to describe the common information in pairs of images. We assume that
the constitutive components of natural images can be captured by visual
features that undergo local transformations (e.g., translation) in different
images. We first identify prominent visual features by computing a sparse
approximation of a reference image with a dictionary of geometric basis
functions. We then pose a regularized optimization problem to estimate the
corresponding features in correlated images given by quantized linear
measurements. The estimated features have to comply with the compressed
information and to represent consistent transformation between images. The
correlation model is given by the relative geometric transformations between
corresponding features. We then propose an efficient joint decoding algorithm
that estimates the compressed images such that they stay consistent with both
the quantized measurements and the correlation model. Experimental results show
that the proposed algorithm effectively estimates the correlation between
images in multi-view datasets. In addition, the proposed algorithm provides
effective decoding performance that compares advantageously to independent
coding solutions as well as state-of-the-art distributed coding schemes based
on disparity learning.
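The measurement model in this abstract, quantized linear measurements plus the requirement that an estimate stay consistent with them, can be sketched as follows. The sketch assumes a random Gaussian measurement matrix and a uniform scalar quantizer; the abstract specifies neither. All function names are hypothetical.

```python
import random


def measure(x, m, seed=0):
    """Take m linear measurements y = Phi x with a random Gaussian Phi
    (an assumed choice of measurement matrix)."""
    rng = random.Random(seed)
    phi = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(m)]
    y = [sum(p * v for p, v in zip(row, x)) for row in phi]
    return phi, y


def quantize(y, step):
    """Uniform scalar quantizer: map each measurement to a cell index."""
    return [round(v / step) for v in y]


def dequantize(q, step):
    """Reconstruct each measurement at the centre of its cell."""
    return [k * step for k in q]


def is_consistent(phi, x_est, q, step):
    """Check that the measurements of x_est fall inside the quantization
    cells recorded in q, i.e. the estimate complies with the coded data."""
    y_est = [sum(p * v for p, v in zip(row, x_est)) for row in phi]
    return all(abs(v - k * step) <= step / 2 + 1e-12
               for v, k in zip(y_est, q))


x = [1.0, 0.0, 2.0, 0.0, 0.5, 0.0, 0.0, 3.0]  # toy signal
step = 0.5
phi, y = measure(x, m=4)
q = quantize(y, step)
y_hat = dequantize(q, step)
```

A joint decoder would search for estimates that pass a check like `is_consistent` while also satisfying the correlation model; that search is not shown here.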
Loss-resilient Coding of Texture and Depth for Free-viewpoint Video Conferencing
Free-viewpoint video conferencing allows a participant to observe the remote
3D scene from any freely chosen viewpoint. An intermediate virtual viewpoint
image is commonly synthesized using two pairs of transmitted texture and depth
maps from two neighboring captured viewpoints via depth-image-based rendering
(DIBR). To maintain high quality of synthesized images, it is imperative to
contain the adverse effects of network packet losses that may arise during
texture and depth video transmission. Towards this end, we develop an
integrated approach that exploits the representation redundancy inherent in the
multiple streamed videos: a voxel in the 3D scene visible to two captured views
is sampled and coded twice in the two views. In particular, at the receiver we
first develop an error concealment strategy that adaptively blends
corresponding pixels in the two captured views during DIBR, so that pixels from
the more reliable transmitted view are weighted more heavily. We then couple it
with a sender-side optimization of reference picture selection (RPS) during
real-time video coding, so that blocks containing samples of voxels that are
visible in both views are more error-resiliently coded in one view only, given that
adaptive blending will erase errors in the other view. Further, synthesized
view distortion sensitivities to texture versus depth errors are analyzed, so
that relative importance of texture and depth code blocks can be computed for
system-wide RPS optimization. Experimental results show that the proposed
scheme can outperform the use of a traditional feedback channel by up to 0.82
dB on average at 8% packet loss rate, and by as much as 3 dB for particular
frames.
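The receiver-side idea of weighting the more reliable view more heavily can be sketched as a reliability-weighted average of corresponding pixels. The abstract does not give the actual weighting rule, so the formula, the reliability scores, and the name `blend_pixel` below are all assumptions made for illustration.

```python
def blend_pixel(left, right, rel_left, rel_right):
    """Blend corresponding pixels from two captured views, weighting each
    by a non-negative reliability score (hypothetical weighting rule).
    Falls back to a plain average when neither view is trusted."""
    total = rel_left + rel_right
    if total == 0:
        return (left + right) / 2
    return (rel_left * left + rel_right * right) / total


equal = blend_pixel(100, 200, 1.0, 1.0)      # both views equally reliable
left_only = blend_pixel(100, 200, 1.0, 0.0)  # right view judged corrupted
skewed = blend_pixel(100, 200, 3.0, 1.0)     # left view trusted three times more
```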