A Perceptually Based Comparison of Image Similarity Metrics
The assessment of how well one image matches another forms a critical component both of models of human visual processing and of many image analysis systems. Two of the most commonly used norms for quantifying image similarity are L1 and L2, which are specific instances of the Minkowski metric. However, there is often not a principled reason for selecting one norm over the other. One way to address this problem is by examining which metric better captures the perceptual notion of image similarity. This can be used to derive inferences regarding the similarity criteria the human visual system uses, as well as to evaluate and design metrics for use in image-analysis applications. With this goal, we examined perceptual preferences for images retrieved on the basis of the L1 versus the L2 norm. These images were either small fragments without recognizable content, or larger patterns with recognizable content created by vector quantization. In both conditions the participants showed a small but consistent preference for images matched with the L1 metric. These results suggest that, in the domain of natural images of the kind we have used, the L1 metric may better capture human notions of image similarity.
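A minimal sketch of the comparison the abstract describes, assuming simple grayscale patches; this is an illustration of the two Minkowski-metric instances, not the authors' experimental code:

```python
import numpy as np

def minkowski_distance(a: np.ndarray, b: np.ndarray, p: int) -> float:
    """Minkowski distance between two equally sized image patches."""
    diff = np.abs(a.astype(float) - b.astype(float))
    return float(np.sum(diff ** p) ** (1.0 / p))

# Hypothetical retrieval setup: find the candidate patch closest to a query.
rng = np.random.default_rng(0)
query = rng.integers(0, 256, size=(8, 8))
candidates = [rng.integers(0, 256, size=(8, 8)) for _ in range(5)]

# The nearest neighbour can differ depending on the norm, which is why
# the choice between L1 (p=1) and L2 (p=2) matters for retrieval.
best_l1 = min(range(5), key=lambda i: minkowski_distance(query, candidates[i], 1))
best_l2 = min(range(5), key=lambda i: minkowski_distance(query, candidates[i], 2))
```

Because L2 squares differences, it penalises a few large pixel errors more heavily than many small ones, whereas L1 weights all errors linearly; the study asks which behaviour aligns better with human judgements.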
A Generative Model of Natural Texture Surrogates
Natural images can be viewed as patchworks of different textures, where the
local image statistics are roughly stationary within a small neighborhood but
otherwise vary from region to region. In order to model this variability, we
first applied the parametric texture algorithm of Portilla and Simoncelli to
image patches of 64×64 pixels in a large database of natural images, so that
each image patch is described by 655 texture parameters which specify
certain statistics, such as variances and covariances of wavelet coefficients
or coefficient magnitudes, within that patch.
To model the statistics of these texture parameters, we then developed
suitable nonlinear transformations of the parameters that allowed us to fit
their joint statistics with a multivariate Gaussian distribution. We find that
the first 200 principal components contain more than 99% of the variance and
are sufficient to generate textures that are perceptually extremely close to
those generated with all 655 components. We demonstrate the usefulness of the
model in several ways: (1) We sample ensembles of texture patches that can be
directly compared to samples of patches from the natural image database and can
to a high degree reproduce their perceptual appearance. (2) We further
developed an image compression algorithm which generates surprisingly accurate
images at bit rates as low as 0.14 bits/pixel. (3) Finally, we demonstrate how
our approach can be used for an efficient and objective evaluation of samples
generated with probabilistic models of natural images.
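The dimensionality-reduction step can be sketched as follows. This is a toy reconstruction under stated assumptions: random synthetic vectors stand in for the 655 Portilla-Simoncelli texture parameters per patch, and the 99%-variance cut-off is applied exactly as the abstract describes:

```python
import numpy as np

# Synthetic stand-in for the dataset: one 655-dimensional parameter
# vector per patch (real vectors would come from the texture analysis).
rng = np.random.default_rng(1)
n_patches, n_params = 1000, 655
X = rng.normal(size=(n_patches, n_params)) @ rng.normal(size=(n_params, n_params)) * 0.1

# PCA via SVD of the centred data; keep components covering 99% of variance.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
var = S ** 2 / (n_patches - 1)
explained = np.cumsum(var) / var.sum()
k = int(np.searchsorted(explained, 0.99)) + 1

# Sample new parameter vectors from the Gaussian fitted in the reduced
# PCA space, then map back to the full 655-dimensional parameter space.
z = rng.normal(size=(5, k)) * (S[:k] / np.sqrt(n_patches - 1))
samples = z @ Vt[:k] + mean
```

In the paper's setting, nonlinear transformations are applied to the parameters first so that a multivariate Gaussian is an adequate fit; that step is omitted here.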
JND-Based Perceptual Video Coding for 4:4:4 Screen Content Data in HEVC
The JCT-VC standardized Screen Content Coding (SCC) extension in the HEVC HM
RExt + SCM reference codec offers an impressive coding efficiency performance
when compared with HM RExt alone; however, it is not significantly perceptually
optimized. For instance, it does not include advanced HVS-based perceptual
coding methods, such as JND-based spatiotemporal masking schemes. In this
paper, we propose a novel JND-based perceptual video coding technique, SC-PAQ,
for HM RExt + SCM. The proposed method is designed to further improve the
compression performance of HM RExt + SCM when applied to YCbCr 4:4:4 SC video data. In the
proposed technique, luminance masking and chrominance masking are exploited to
perceptually adjust the Quantization Step Size (QStep) at the Coding Block (CB)
level. Compared with HM RExt 16.10 + SCM 8.0, the proposed method considerably
reduces bitrates (Kbps), with a maximum reduction of 48.3%. In addition to
this, the subjective evaluations reveal that SC-PAQ achieves visually lossless
coding at very low bitrates. (Preprint: 2018 IEEE International Conference on
Acoustics, Speech and Signal Processing, ICASSP 2018.)
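A hedged sketch of the kind of luminance-masking adjustment the abstract refers to; the threshold shape follows the common JND observation that distortion is less visible in very dark and very bright regions, but the constants and the piecewise rule here are invented for illustration and are not the SC-PAQ method:

```python
import numpy as np

def adjusted_qstep(block: np.ndarray, base_qstep: float) -> float:
    """Scale the quantization step size (QStep) for one coding block
    according to its mean luma (assumed 8-bit, 0..255). Illustrative only."""
    mean_luma = float(block.mean())
    if mean_luma < 60:
        # Dark regions tolerate more distortion: allow a coarser QStep.
        scale = 1.0 + (60 - mean_luma) / 60 * 0.5
    elif mean_luma > 170:
        # Very bright regions also mask distortion.
        scale = 1.0 + (mean_luma - 170) / 85 * 0.5
    else:
        # Mid-grey regions are where the eye is most sensitive.
        scale = 1.0
    return base_qstep * scale
```

Raising QStep in masked blocks spends fewer bits where the eye cannot see the loss, which is how such schemes reduce bitrate without visible degradation.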
Information recovery from rank-order encoded images
The time to detection of a visual stimulus by the primate eye is recorded at
100–150 ms. This near-instantaneous recognition is in spite of the considerable
processing required by the several stages of the visual pathway to recognise and
react to a visual scene. How this is achieved is still a matter of speculation.
Rank-order codes have been proposed as a means of encoding by the primate
eye in the rapid transmission of the initial burst of information from the sensory
neurons to the brain. We study the efficiency of rank-order codes in encoding
perceptually-important information in an image. VanRullen and Thorpe built a
model of the ganglion cell layers of the retina to simulate and study the viability
of rank-order as a means of encoding by retinal neurons. We validate their model
and quantify the information retrieved from rank-order encoded images in terms
of the visually-important information recovered. Towards this goal, we apply
a slightly modified version of the "perceptual information preservation
algorithm" proposed by Petrovic and Xydeas. We observe low information recovery due
to losses suffered during the rank-order encoding and decoding processes. We
propose to minimise these losses to recover maximum information in minimum
time from rank-order encoded images. We first maximise information recovery by
using the pseudo-inverse of the filter-bank matrix to minimise losses during
rank-order decoding. We then apply the biological principle of lateral inhibition
to minimise losses during rank-order encoding. In doing so, we propose the
Filter-overlap Correction algorithm. To test the performance of rank-order codes in
a biologically realistic model, we design and simulate a model of the foveal-pit
ganglion cells of the retina keeping close to biological parameters. We use this
as a rank-order encoder and analyse its performance relative to VanRullen and
Thorpe's retinal model.
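The pseudo-inverse decoding step can be illustrated in a few lines. This is a sketch under stated assumptions: the filters here are random stand-ins for the retinal filter bank, and the rank-order aspect (transmitting coefficients in order of magnitude) is omitted so that only the least-squares reconstruction is shown:

```python
import numpy as np

rng = np.random.default_rng(2)
n_pixels, n_filters = 64, 48                 # toy sizes, not the model's
F = rng.normal(size=(n_filters, n_pixels))   # filter-bank matrix, one filter per row
image = rng.normal(size=n_pixels)

coeffs = F @ image                           # encoding: project image onto filters
decoded = np.linalg.pinv(F) @ coeffs         # decoding: least-squares reconstruction
```

When the bank's filters overlap (their rows are not orthogonal), naive decoding by the transpose double-counts shared image content; the pseudo-inverse gives the minimum-norm least-squares solution and so removes that decoding loss, which is the motivation the abstract gives for using it.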
Multi-Modal Perception for Selective Rendering
A major challenge in generating high-fidelity virtual environments (VEs) is to provide realism at interactive rates. The high-fidelity simulation of light and sound is still unachievable in real time, as such physical accuracy is very computationally demanding. Only recently has visual perception been used in high-fidelity rendering to improve performance through a series of novel exploitations: parts of the scene that are not currently being attended to by the viewer are rendered at a much lower quality, without the difference being perceived. This paper investigates the effect spatialised directional sound has on the visual attention of a user towards rendered images. These perceptual artefacts are utilised in selective rendering pipelines via the use of multi-modal maps. The multi-modal maps are tested through psychophysical experiments to examine their applicability to selective rendering algorithms, with a series of fixed-cost rendering functions, and are found to perform significantly better than image saliency maps naively applied to multi-modal virtual environments.
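One plausible form such a multi-modal map could take is an image saliency map modulated by an attention field centred where a spatialised sound appears to originate. The Gaussian combination rule and the weights below are assumptions for illustration, not the paper's method:

```python
import numpy as np

def multimodal_map(saliency: np.ndarray, sound_xy: tuple,
                   sigma: float = 20.0, audio_weight: float = 0.5) -> np.ndarray:
    """Blend an image saliency map with a hypothetical audio-attention map
    peaked at the apparent sound source (x, y) in pixel coordinates."""
    h, w = saliency.shape
    ys, xs = np.mgrid[0:h, 0:w]
    audio = np.exp(-((xs - sound_xy[0]) ** 2 + (ys - sound_xy[1]) ** 2)
                   / (2 * sigma ** 2))
    combined = (1 - audio_weight) * saliency + audio_weight * audio
    return combined / combined.max()  # normalise so the peak drives full quality
```

A selective renderer would then allocate per-pixel rendering effort in proportion to this map, spending full quality only where combined visual and auditory attention is highest.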