304 research outputs found
Learning to compress and search visual data in large-scale systems
The problem of high-dimensional and large-scale representation of visual data
is addressed from an unsupervised learning perspective. The emphasis is put on
discrete representations, where the description length can be measured in bits
and hence the model capacity can be controlled. The algorithmic infrastructure
is developed based on the synthesis and analysis prior models whose
rate-distortion properties, as well as capacity vs. sample complexity
trade-offs are carefully optimized. These models are then extended to
multi-layers, namely the RRQ and the ML-STC frameworks, where the latter is
further evolved as a powerful deep neural network architecture with fast and
sample-efficient training and discrete representations. For the developed
algorithms, three important applications are developed. First, the problem of
large-scale similarity search in retrieval systems is addressed, where a
double-stage solution is proposed leading to faster query times and shorter
database storage. Second, the problem of learned image compression is targeted,
where the proposed models can capture more redundancies from the training
images than the conventional compression codecs. Finally, the proposed
algorithms are used to solve ill-posed inverse problems. In particular, the
problems of image denoising and compressive sensing are addressed with
promising results.Comment: PhD thesis dissertatio
Efficient algorithms for scalable video coding
A scalable video bitstream specifically designed for the needs of various client terminals,
network conditions, and user demands is much desired in current and future video transmission
and storage systems. The scalable extension of the H.264/AVC standard (SVC) has
been developed to satisfy the new challenges posed by heterogeneous environments, as
it permits a single video stream to be decoded fully or partially with variable quality, resolution,
and frame rate in order to adapt to a specific application. This thesis presents
novel improved algorithms for SVC, including: 1) a fast inter-frame and inter-layer coding
mode selection algorithm based on motion activity; 2) a hierarchical fast mode selection
algorithm; 3) a two-part Rate Distortion (RD) model targeting the properties of different
prediction modes for the SVC rate control scheme; and 4) an optimised Mean Absolute
Difference (MAD) prediction model.
The proposed fast inter-frame and inter-layer mode selection algorithm is based on the
empirical observation that a macroblock (MB) with slow movement is more likely to be
best matched by one in the same resolution layer. However, for a macroblock with fast
movement, motion estimation between layers is required. Simulation results show that
the algorithm can reduce the encoding time by up to 40%, with negligible degradation in
RD performance.
The proposed hierarchical fast mode selection scheme comprises four levels and makes
full use of inter-layer, temporal and spatial correlation aswell as the texture information of
each macroblock. Overall, the new technique demonstrates the same coding performance
in terms of picture quality and compression ratio as that of the SVC standard, yet produces
a saving in encoding time of up to 84%. Compared with state-of-the-art SVC fast mode
selection algorithms, the proposed algorithm achieves a superior computational time reduction
under very similar RD performance conditions.
The existing SVC rate distortion model cannot accurately represent the RD properties of
the prediction modes, because it is influenced by the use of inter-layer prediction. A separate
RD model for inter-layer prediction coding in the enhancement layer(s) is therefore
introduced. Overall, the proposed algorithms improve the average PSNR by up to 0.34dB
or produce an average saving in bit rate of up to 7.78%. Furthermore, the control accuracy
is maintained to within 0.07% on average.
As aMADprediction error always exists and cannot be avoided, an optimisedMADprediction
model for the spatial enhancement layers is proposed that considers the MAD from
previous temporal frames and previous spatial frames together, to achieve a more accurateMADprediction.
Simulation results indicate that the proposedMADprediction model
reduces the MAD prediction error by up to 79% compared with the JVT-W043 implementation
Deep Joint Source-Channel Coding for Wireless Image Transmission
We propose a joint source and channel coding (JSCC) technique for wireless image transmission that does not rely on explicit codes for either compression or error correction; instead, it directly maps the image pixel values to the complex-valued channel input symbols. We parameterize the encoder and decoder functions by two convolutional neural networks (CNNs), which are trained jointly, and can be considered as an autoencoder with a non-trainable layer in the middle that represents the noisy communication channel. Our results show that the proposed deep JSCC scheme outperforms digital transmission concatenating JPEG or JPEG2000 compression with a capacity achieving channel code at low signal-to-noise ratio (SNR) and channel bandwidth values in the presence of additive white Gaussian noise (AWGN). More strikingly, deep JSCC does not suffer from the “cliff effect,” and it provides a graceful performance degradation as the channel SNR varies with respect to the SNR value assumed during training. In the case of a slow Rayleigh fading channel, deep JSCC learns noise resilient coded representations and significantly outperforms separation-based digital communication at all SNR and channel bandwidth values
Livrable D4.2 of the PERSEE project : Représentation et codage 3D - Rapport intermédiaire - Définitions des softs et architecture
51Livrable D4.2 du projet ANR PERSEECe rapport a été réalisé dans le cadre du projet ANR PERSEE (n° ANR-09-BLAN-0170). Exactement il correspond au livrable D4.2 du projet. Son titre : Représentation et codage 3D - Rapport intermédiaire - Définitions des softs et architectur
A fully embedded two-stage coder for hyperspectral near-lossless compression
This letter proposes a near-lossless coder for hyperspectral images. The coding technique is fully embedded and minimizes the distortion in the l2 norm initially and in the l∞ norm subsequently. Based on a two-stage near-lossless compression scheme, it includes a lossy and a near-lossless layer. The novelties are: the observation of the convergence of the entropy of the residuals in the original domain and in the spectral-spatial transformed domain; and an embedded near-lossless layer. These contributions enable a progressive transmission while optimising both SNR and PAE performance. The embeddedness is accomplished by bitplane encoding plus arithmetic encoding. Experimental results suggest that the proposed method yields a highly competitive coding performance for hyperspectral images, outperforming multi-component JPEG2000 for l∞ norm and pairing its performance for l2 norm, and also outperforming M-CALIC in the near-lossless case -for PAE ≥5-
- …