Biologically Inspired Dynamic Textures for Probing Motion Perception
Perception is often described as a predictive process based on an optimal
inference with respect to a generative model. We study here the principled
construction of a generative model specifically crafted to probe motion
perception. In that context, we first provide an axiomatic, biologically driven
derivation of the model. This model synthesizes random dynamic textures which
are defined by stationary Gaussian distributions obtained by the random
aggregation of warped patterns. Importantly, we show that this model can
equivalently be described as a stochastic partial differential equation. This
characterization of motion in images allows us to recast motion-energy
models into a principled Bayesian inference framework. Finally, we apply these
textures in order to psychophysically probe speed perception in humans. In this
framework, while the likelihood is derived from the generative model, the prior
is estimated from the observed results and accounts for the perceptual bias in
a principled fashion.
Comment: Twenty-ninth Annual Conference on Neural Information Processing
Systems (NIPS), Dec 2015, Montreal, Canada
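A minimal sketch of how a stationary Gaussian dynamic texture can be synthesized: white noise is filtered in the Fourier domain by an envelope concentrated around a spatial frequency band and around the plane ft + speed * fx = 0, which corresponds to translation along x. The Gaussian envelope shapes and the parameter names (`f0`, `bw`, `bv`) are assumptions made for illustration; this is not the axiomatic derivation or the stochastic PDE formulation described in the abstract.

```python
import numpy as np

def gaussian_dynamic_texture(n=32, frames=16, speed=1.0, f0=0.125,
                             bw=0.05, bv=0.5, seed=0):
    """Return an (n, n, frames) movie drawn from a stationary Gaussian field.

    All envelope parameters are illustrative assumptions:
    f0 = preferred spatial frequency (cycles/pixel), bw = its bandwidth,
    bv = spread around the speed plane, speed in pixels per frame.
    """
    rng = np.random.default_rng(seed)
    # Frequency grids, broadcastable to shape (n, n, frames).
    fy = np.fft.fftfreq(n)[:, None, None]
    fx = np.fft.fftfreq(n)[None, :, None]
    ft = np.fft.fftfreq(frames)[None, None, :]
    radius = np.sqrt(fx**2 + fy**2)

    # Spatial band-pass envelope centred on f0.
    spatial = np.exp(-0.5 * ((radius - f0) / bw) ** 2)
    # Energy concentrated near the plane ft + speed * fx = 0.
    spread = bv * np.maximum(radius, 1e-6)
    motion = np.exp(-0.5 * ((ft + speed * fx) / spread) ** 2)

    envelope = spatial * motion
    envelope[0, 0, 0] = 0.0  # remove the DC term so the movie has zero mean

    # Linear filtering of white Gaussian noise gives a stationary Gaussian field.
    noise = rng.standard_normal((n, n, frames))
    movie = np.fft.ifftn(np.fft.fftn(noise) * envelope).real
    return movie / movie.std()
```

Calling `gaussian_dynamic_texture(speed=2.0)` yields a texture drifting faster along x; widening `bv` blurs the speed distribution.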
You Can Mask More For Extremely Low-Bitrate Image Compression
Learned image compression (LIC) methods have experienced significant progress
during recent years. However, these methods are primarily dedicated to
optimizing the rate-distortion (R-D) performance at medium and high bitrates (>
0.1 bits per pixel (bpp)), while research on extremely low bitrates is limited.
Besides, existing methods fail to explicitly explore the image structure and
texture components crucial for image compression, treating them equally
alongside uninformative components in networks. This can cause severe
perceptual quality degradation, especially under low-bitrate scenarios. In this
work, inspired by the success of pre-trained masked autoencoders (MAE) in many
downstream tasks, we propose to rethink its mask sampling strategy from
structure and texture perspectives for high redundancy reduction and
discriminative feature representation, further unleashing the potential of LIC
methods. Therefore, we present a dual-adaptive masking approach (DA-Mask) that
samples visible patches based on the structure and texture distributions of
original images. We combine DA-Mask and pre-trained MAE in masked image
modeling (MIM) as an initial compressor that abstracts informative semantic
context and texture representations. Such a pipeline cooperates well with
LIC networks to achieve further secondary compression while preserving
promising reconstruction quality. Consequently, we propose a simple yet
effective masked compression model (MCM), the first framework that unifies MIM
and LIC end-to-end for extremely low-bitrate image compression. Extensive
experiments have demonstrated that our approach outperforms recent
state-of-the-art methods in R-D performance, visual quality, and downstream
applications, at very low bitrates. Our code is available at
https://github.com/lianqi1008/MCM.git.
Comment: Under review
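The idea of sampling visible patches according to image content, rather than uniformly, can be sketched as follows. This is only an illustration: it uses mean gradient magnitude per patch as a stand-in for the structure and texture distributions, and the function name and parameters are hypothetical. It is not the DA-Mask implementation from the linked repository.

```python
import numpy as np

def sample_visible_patches(image, patch=8, keep_ratio=0.25, seed=0):
    """Return a boolean (rows, cols) grid; True marks a visible (unmasked) patch.

    Assumption: patches with higher mean gradient magnitude are more
    informative and are therefore sampled with higher probability.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    gh, gw = h // patch, w // patch

    # Per-pixel gradient magnitude as a crude structure cue.
    gy, gx = np.gradient(image.astype(float))
    energy = np.hypot(gx, gy)[:gh * patch, :gw * patch]

    # Average the cue inside each patch and turn it into a distribution.
    scores = energy.reshape(gh, patch, gw, patch).mean(axis=(1, 3)).ravel()
    probs = (scores + 1e-8) / (scores + 1e-8).sum()

    # Draw the visible patches without replacement.
    n_keep = max(1, int(round(keep_ratio * gh * gw)))
    visible = rng.choice(gh * gw, size=n_keep, replace=False, p=probs)

    mask = np.zeros(gh * gw, dtype=bool)
    mask[visible] = True
    return mask.reshape(gh, gw)
```

A lower `keep_ratio` masks more of the image, which is the regime the abstract targets for extremely low bitrates.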
IBVC: Interpolation-driven B-frame Video Compression
Learned B-frame video compression aims to adopt bi-directional motion
estimation and motion compensation (MEMC) coding for middle frame
reconstruction. However, previous learned approaches often directly extend
neural P-frame codecs to B-frames, relying on bi-directional optical-flow
estimation or video frame interpolation. They suffer from inaccurate quantized
motions and inefficient motion compensation. To address these issues, we
propose a simple yet effective structure called Interpolation-driven B-frame
Video Compression (IBVC). Our approach only involves two major operations:
video frame interpolation and artifact reduction compression. IBVC introduces a
bit-rate-free MEMC based on interpolation, which avoids optical-flow
quantization and additional compression distortions. Later, to reduce duplicate
bit-rate consumption and focus on unaligned artifacts, a residual guided
masking encoder is deployed to adaptively select the meaningful contexts with
interpolated multi-scale dependencies. In addition, a conditional
spatio-temporal decoder is proposed to eliminate location errors and artifacts
instead of using MEMC coding in other methods. The experimental results on
B-frame coding demonstrate that IBVC has significant improvements compared to
the relevant state-of-the-art methods. Meanwhile, our approach can save bit
rates compared with the random access (RA) configuration of H.266 (VTM). The
code will be available at https://github.com/ruhig6/IBVC.
Comment: Submitted to IEEE TCSVT
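To illustrate what residual-guided selection could look like, the sketch below marks the blocks where an interpolated frame disagrees most with the target frame, so that only those regions would be passed on for coding. The block-wise mean absolute error criterion and all names are assumptions for illustration; the learned masking encoder in IBVC is not specified in the abstract.

```python
import numpy as np

def residual_guided_mask(target, interpolated, block=8, keep_ratio=0.25):
    """Return a boolean (rows, cols) grid; True marks a block selected for coding.

    Assumption: blocks are ranked by mean absolute residual between the
    target frame and its interpolated prediction.
    """
    residual = np.abs(target.astype(float) - interpolated.astype(float))
    h, w = residual.shape
    gh, gw = h // block, w // block

    # Mean absolute error inside each block.
    block_err = residual[:gh * block, :gw * block]
    block_err = block_err.reshape(gh, block, gw, block).mean(axis=(1, 3))

    # Keep exactly n_keep blocks with the largest error.
    n_keep = max(1, int(round(keep_ratio * gh * gw)))
    top = np.argsort(block_err.ravel())[-n_keep:]

    mask = np.zeros(gh * gw, dtype=bool)
    mask[top] = True
    return mask.reshape(gh, gw)
```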
Texture Structure Analysis
Texture analysis plays an important role in applications like automated pattern inspection, image and video compression, content-based image retrieval, remote sensing, medical imaging and document processing, to name a few. Texture Structure Analysis is the process of studying the structure present in the textures. This structure can be expressed in terms of perceived regularity. The human visual system (HVS) uses the perceived regularity as one of the important pre-attentive cues in low-level image understanding. Similar to the HVS, image processing and computer vision systems can make fast and efficient decisions if they can quantify this regularity automatically. In this work, the problem of quantifying the degree of perceived regularity when looking at an arbitrary texture is introduced and addressed. One key contribution of this work is in proposing an objective no-reference perceptual texture regularity metric based on visual saliency. Other key contributions include an adaptive texture synthesis method based on texture regularity, and a low-complexity reduced-reference visual quality metric for assessing the quality of synthesized textures. In order to use the best performing visual attention model on textures, the performance of the most popular visual attention models to predict the visual saliency on textures is evaluated. Since there is no publicly available database with ground-truth saliency maps on images with exclusive texture content, a new eye-tracking database is systematically built. Using the Visual Saliency Map (VSM) generated by the best visual attention model, the proposed texture regularity metric is computed. The proposed metric is based on the observation that VSM characteristics differ between textures of differing regularity. The proposed texture regularity metric is based on two texture regularity scores, namely a textural similarity score and a spatial distribution score.
In order to evaluate the performance of the proposed regularity metric, a texture regularity database called RegTEX is built as part of this work. It is shown through subjective testing that the proposed metric has a strong correlation with the Mean Opinion Score (MOS) for the perceived regularity of textures. The proposed method is also shown to be robust to geometric and photometric transformations and outperforms some of the popular texture regularity metrics in predicting the perceived regularity. The impact of the proposed metric on the performance of many image-processing applications is also presented. The influence of the perceived texture regularity on the perceptual quality of synthesized textures is demonstrated through building a synthesized textures database named SynTEX. It is shown through subjective testing that textures with different degrees of perceived regularities exhibit different degrees of vulnerability to artifacts resulting from different texture synthesis approaches. This work also proposes an algorithm for adaptively selecting the appropriate texture synthesis method based on the perceived regularity of the original texture. A reduced-reference texture quality metric for texture synthesis is also proposed as part of this work. The metric is based on the change in perceived regularity and the change in perceived granularity between the original and the synthesized textures. The perceived granularity is quantified through a new granularity metric that is proposed in this work. It is shown through subjective testing that the proposed quality metric, using just 2 parameters, has a strong correlation with the MOS for the fidelity of synthesized textures and outperforms the state-of-the-art full-reference quality metrics on 3 different texture databases.
Finally, the ability of the proposed regularity metric to predict the perceived degradation of textures due to compression and blur artifacts is also established.
Dissertation/Thesis, Ph.D. Electrical Engineering 201
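Agreement between an objective metric and Mean Opinion Scores is commonly summarized with linear (Pearson) and rank (Spearman) correlation. The sketch below shows one way to compute both; it is an assumption about the evaluation procedure, not taken from the thesis, and the rank helper ignores ties for brevity.

```python
import numpy as np

def pearson(x, y):
    """Linear correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def ranks(values):
    """Rank positions (0 = smallest). Ties are NOT averaged in this sketch."""
    return np.argsort(np.argsort(np.asarray(values)))

def spearman(x, y):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

Given a list of metric scores and the matching list of MOS values, `pearson(scores, mos)` measures linear agreement and `spearman(scores, mos)` measures monotonic agreement.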