A practical guide and software for analysing pairwise comparison experiments
The most popular strategies for capturing subjective judgments from humans involve the construction of a unidimensional relative measurement scale, representing order preferences or judgments about a set of objects or conditions. This information is generally captured by means of direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. Pairwise comparisons are becoming increasingly popular because of the simplicity of the experimental procedure. However, this strategy requires non-trivial data analysis to aggregate the comparisons into a quality scale and analyse the results in order to take full advantage of the collected data. This paper explains the process of translating pairwise comparison data into a measurement scale, discusses the benefits and limitations of such scaling methods, and introduces publicly available software written in Matlab. We improve on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and statistical testing, and introducing a prior that reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment. Code is available at https://github.com/mantiuk/pwcm
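As an illustration of the scaling step, the sketch below fits a Thurstone Case V style observer model to a matrix of comparison counts by maximum likelihood. It is a minimal Python stand-in for the Matlab toolbox, without the prior, outlier analysis, or confidence intervals described above; the function name `scale_pairwise` and the example counts are our own.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def scale_pairwise(C):
    """Maximum-likelihood quality scale from a pairwise comparison matrix.

    C[i, j] = number of times condition i was chosen over condition j.
    Assumes P(i beats j) = Phi(q_i - q_j), a Thurstone-style Gaussian
    observer model (one common convention for the noise scale).
    """
    n = C.shape[0]

    def neg_log_likelihood(q_free):
        q = np.concatenate(([0.0], q_free))   # fix q_0 = 0 to remove the shift ambiguity
        d = q[:, None] - q[None, :]           # all pairwise score differences
        p = norm.cdf(d).clip(1e-9, 1 - 1e-9)  # win probabilities, clipped for stability
        return -np.sum(C * np.log(p))

    res = minimize(neg_log_likelihood, np.zeros(n - 1), method="L-BFGS-B")
    return np.concatenate(([0.0], res.x))     # scale values, first condition at 0

# Example: 3 conditions, 20 comparisons per pair (made-up counts)
C = np.array([[0, 15, 18],
              [5,  0, 12],
              [2,  8,  0]])
print(scale_pairwise(C))
```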
Predicting visible flicker in temporally changing images
Novel display algorithms such as low-persistence displays, black frame insertion, and temporal resolution multiplexing introduce temporal change into images at 40-180 Hz, on the boundary of the temporal integration of the visual system. This can lead to flicker, a highly objectionable artifact known to induce viewer discomfort. The critical flicker frequency (CFF) alone does not model this phenomenon well, as flicker sensitivity varies with contrast and spatial frequency; a content-aware model is required. In this paper, we introduce a visual model for predicting flicker visibility in temporally changing images. The model performs a multi-scale analysis on the difference between consecutive frames, normalizing values with the spatio-temporal contrast sensitivity function as approximated by the pyramid of visibility. The output of the model is a 2D detection probability map. We ran a subjective flicker marking experiment to fit the model parameters, and then analyze the difference between two display algorithms, black frame insertion and temporal resolution multiplexing, to demonstrate the application of our model.
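To make the model's structure concrete, here is a heavily simplified sketch of the pipeline described above: band-pass decomposition of the frame difference, normalization by a spatio-temporal sensitivity term, a psychometric function per scale, and probability summation across scales. The `csf_placeholder` function and the parameter values are toy stand-ins of our own, not the fitted pyramid-of-visibility model from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def csf_placeholder(spatial_freq, temporal_freq):
    """Toy stand-in for a spatio-temporal CSF: sensitivity falls off
    with both spatial and temporal frequency (NOT the paper's model)."""
    return 100.0 * np.exp(-0.3 * spatial_freq) * np.exp(-0.05 * temporal_freq)

def flicker_probability_map(frame_a, frame_b, ppd, refresh_hz, levels=4, beta=3.5):
    """Sketch of a multi-scale flicker-visibility predictor.

    frame_a, frame_b : consecutive frames in linear luminance (cd/m^2)
    ppd              : display resolution in pixels per visual degree
    refresh_hz       : temporal frequency of the alternation
    Returns a 2D detection probability map.
    """
    diff = frame_a - frame_b
    mean_lum = 0.5 * (frame_a + frame_b) + 1e-6
    p_no_detect = np.ones_like(diff)
    for level in range(levels):
        sigma = 2.0 ** level
        # difference-of-Gaussians band of the frame difference
        band = gaussian_filter(diff, sigma) - gaussian_filter(diff, 2 * sigma)
        contrast = np.abs(band) / mean_lum       # local contrast of the change
        freq = ppd / (4.0 * sigma)               # rough peak frequency of this band
        sensitivity = csf_placeholder(freq, refresh_hz)
        # Weibull-style psychometric function: detection probability per band
        p_detect = 1.0 - np.exp(-(contrast * sensitivity) ** beta)
        p_no_detect *= 1.0 - p_detect            # probability summation over scales
    return 1.0 - p_no_detect
```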
Transformation Consistency Regularization – A Semi-supervised Paradigm for Image-to-Image Translation
Scarcity of labeled data has motivated the development of semi-supervised
learning methods, which learn from large portions of unlabeled data alongside a
few labeled samples. Consistency Regularization between model's predictions
under different input perturbations, particularly has shown to provide
state-of-the art results in a semi-supervised framework. However, most of these
method have been limited to classification and segmentation applications. We
propose Transformation Consistency Regularization, which delves into a more
challenging setting of image-to-image translation, which remains unexplored by
semi-supervised algorithms. The method introduces a diverse set of geometric
transformations and enforces the model's predictions for unlabeled data to be
invariant to those transformations. We evaluate the efficacy of our algorithm
on three different applications: image colorization, denoising and
super-resolution. Our method is significantly data efficient, requiring only
around 10 - 20% of labeled samples to achieve similar image reconstructions to
its fully-supervised counterpart. Furthermore, we show the effectiveness of our
method in video processing applications, where knowledge from a few frames can
be leveraged to enhance the quality of the rest of the movie
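A minimal sketch of the consistency term, assuming a PyTorch image-to-image model and using 90-degree rotations as the geometric transforms (the paper employs a more diverse set); the name `tcr_loss` and the L2 penalty are our own choices, and in training this term would be added to the usual supervised loss on the labeled samples.

```python
import torch
import torchvision.transforms.functional as TF

def tcr_loss(model, unlabeled, angles=(0, 90, 180, 270)):
    """Transformation consistency loss on an unlabeled batch (sketch).

    Enforces T(f(x)) ~= f(T(x)) for geometric transforms T, i.e. the
    model's output should be equivariant to the transformation.
    unlabeled : (N, C, H, W) tensor of unlabeled images.
    """
    base = model(unlabeled)                    # prediction on untransformed input
    loss = 0.0
    for angle in angles[1:]:
        pred_of_transformed = model(TF.rotate(unlabeled, angle))  # f(T(x))
        transformed_pred = TF.rotate(base, angle)                 # T(f(x))
        loss = loss + torch.mean((pred_of_transformed - transformed_pred) ** 2)
    return loss / (len(angles) - 1)

# Usage sketch: total = supervised_loss + lambda_tcr * tcr_loss(model, unlabeled)
```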
Robust estimation of exposure ratios in multi-exposure image stacks
Merging multi-exposure image stacks into a high dynamic range (HDR) image
requires knowledge of accurate exposure times. When exposure times are
inaccurate, for example, when they are extracted from a camera's EXIF metadata,
the reconstructed HDR images reveal banding artifacts at smooth gradients. To
remedy this, we propose to estimate exposure ratios directly from the input
images. We formulate the estimation of exposure ratios as an optimization problem in which pixels are selected from pairs of exposures to minimize the estimation error caused by camera noise. When pixel values are represented in the logarithmic domain, the problem can be solved efficiently using a linear solver. We demonstrate that the estimation can easily be made robust to pixel misalignment caused by camera or object motion by collecting pixels from multiple spatial tiles. The proposed automatic exposure estimation and alignment eliminates banding artifacts in popular datasets and is essential for applications that require physically accurate reconstructions, such as measuring the modulation transfer function of a display. The code for the method is available.
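The log-domain formulation reduces each pairwise ratio to a simple shift, which the sketch below estimates from well-exposed pixels. It omits the noise-aware pixel selection and per-tile pooling of the paper, using a plain median over valid pixels for robustness instead; the function name and thresholds are our own.

```python
import numpy as np

def estimate_exposure_ratios(stack, lo=0.05, hi=0.90):
    """Estimate exposure ratios between consecutive images (sketch).

    stack : list of linear (not gamma-encoded) images normalized to [0, 1],
            ordered from shortest to longest exposure.
    For pixels well exposed in BOTH images of a pair, b ~= r * a where r is
    the exposure ratio, so in the log domain r becomes a constant shift:
    log(b) - log(a) ~= log(r). The least-squares solution is the mean log
    difference; a median is used here for extra robustness to outliers.
    """
    ratios = []
    for a, b in zip(stack[:-1], stack[1:]):
        valid = (a > lo) & (a < hi) & (b > lo) & (b < hi)  # drop noisy/clipped pixels
        log_ratio = np.log(b[valid]) - np.log(a[valid])
        ratios.append(np.exp(np.median(log_ratio)))
    return np.array(ratios)
```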
Noise-Aware Merging of High Dynamic Range Image Stacks Without Camera Calibration
A near-optimal reconstruction of the radiance of a High Dynamic Range scene
from an exposure stack can be obtained by modeling the camera noise
distribution. The latent radiance is then estimated using Maximum Likelihood
Estimation. However, this requires a well-calibrated noise model of the camera, which is difficult to obtain in practice. We show that an unbiased estimate of comparable variance can be obtained with a simpler Poisson noise estimator, which does not require knowledge of camera-specific noise parameters. We demonstrate this empirically for four different cameras, ranging from a smartphone camera to a full-frame mirrorless camera. Our experimental results are consistent for simulated as well as real images, and across different camera settings.
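Under a pure Poisson photon-noise model, the maximum-likelihood radiance estimate has a particularly simple closed form: the sum of unsaturated pixel values divided by the sum of their exposure times. A minimal sketch, assuming linear pixel values normalized so that 1.0 is the saturation level:

```python
import numpy as np

def merge_poisson(stack, exposure_times, saturation=0.95):
    """Noise-aware HDR merge with a Poisson estimator (sketch).

    stack          : (K, H, W) array of linear pixel values in [0, 1]
    exposure_times : length-K exposure times in seconds
    No camera-specific noise calibration is needed: for Poisson counts
    n_k ~ Pois(radiance * t_k), the MLE is sum(n_k) / sum(t_k) over the
    unsaturated exposures of each pixel.
    """
    stack = np.asarray(stack, dtype=np.float64)
    t = np.asarray(exposure_times, dtype=np.float64).reshape(-1, 1, 1)
    usable = stack < saturation                            # exclude clipped pixels
    total_signal = np.sum(np.where(usable, stack, 0.0), axis=0)
    total_time = np.sum(np.where(usable, t, 0.0), axis=0)
    return total_signal / np.maximum(total_time, 1e-12)    # estimated radiance map
```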
Distilling Style from Image Pairs for Global Forward and Inverse Tone Mapping
Many image enhancement or editing operations, such as forward and inverse
tone mapping or color grading, do not have a unique solution, but instead a
range of solutions, each representing a different style. Despite this, existing
learning-based methods attempt to learn a unique mapping, disregarding this
style. In this work, we show that information about the style can be distilled
from collections of image pairs and encoded into a 2- or 3-dimensional vector.
This gives us not only an efficient representation but also an interpretable
latent space for editing the image style. We represent the global color mapping
between a pair of images as a custom normalizing flow, conditioned on a
polynomial basis of the pixel color. We show that such a network is more
effective than PCA or VAE at encoding image style in low-dimensional space and
lets us obtain an accuracy close to 40 dB, which is about a 7-10 dB improvement over state-of-the-art methods. Published in the European Conference on Visual Media Production (CVMP '22).
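For intuition about the polynomial-basis representation of a global color mapping, the sketch below fits such a mapping between an image pair by least squares. It is only a simplified stand-in: the paper instead learns an invertible normalizing flow conditioned on this kind of basis, which is what yields the low-dimensional, interpretable style vector. All names here are our own.

```python
import numpy as np

def poly_basis(rgb):
    """Degree-2 polynomial basis of pixel color (one common choice)."""
    r, g, b = rgb[:, 0], rgb[:, 1], rgb[:, 2]
    return np.stack([np.ones_like(r), r, g, b,
                     r * r, g * g, b * b, r * g, r * b, g * b], axis=1)

def fit_global_color_mapping(src, dst):
    """Least-squares fit of a global color mapping between an image pair.

    src, dst : (N, 3) arrays of corresponding linear RGB pixels.
    Returns a (10, 3) coefficient matrix M so that poly_basis(src) @ M
    approximates dst, i.e. the mapping is global: it depends only on the
    pixel's color, not its position.
    """
    A = poly_basis(src)
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M

def apply_mapping(M, rgb):
    """Apply a fitted global color mapping to (N, 3) RGB pixels."""
    return poly_basis(rgb) @ M
```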
Exploiting the limitations of spatio-temporal vision for more efficient VR rendering
Ever-higher virtual reality (VR) display resolutions and good-quality anti-aliasing make rendering in VR prohibitively expensive. The generation of these complex frames 90 times per second in a binocular setup demands substantial computational power. Wireless transmission of the frames from the GPU to the VR headset poses another challenge, requiring high-bandwidth dedicated links.
The effect of display brightness and viewing distance: A dataset for visually lossless image compression
Visibility of image artifacts depends on the viewing conditions, such as display brightness and the distance to the display. However, most image and video quality metrics operate under the assumption of a single standard viewing condition, without considering luminance or viewing distance. To address this limitation, we isolate brightness and distance as the components impacting the visibility of artifacts and collect a new dataset for visually lossless image compression. The dataset includes images encoded with JPEG and WebP at the quality level that makes compression artifacts imperceptible to an average observer. The visibility thresholds are collected under two luminance conditions: 10 cd/m², simulating a dimmed mobile phone, and 220 cd/m², a typical peak luminance of modern computer displays; and two distance conditions: 30 and 60 pixels per visual degree. The dataset was used to evaluate the ability of existing image quality and visibility metrics to account for display brightness and viewing distance. Our experiments include two deep neural network architectures proposed to control image compression for visually lossless coding.
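Pixels per visual degree couples display resolution with viewing distance, so the two distance conditions above can be reproduced on any display. A small worked example (the 24-inch monitor figures are our own illustrative assumption):

```python
import math

def pixels_per_degree(distance_m, pixel_pitch_m):
    """Pixels per visual degree for a display viewed at a given distance.

    One visual degree spans roughly 2 * d * tan(0.5 deg) on the screen;
    dividing that span by the pixel pitch gives the pixels it contains.
    """
    span = 2.0 * distance_m * math.tan(math.radians(0.5))
    return span / pixel_pitch_m

# Example: a 24" 1920x1080 monitor has a pixel pitch of about 0.277 mm,
# so the 60 ppd condition corresponds to viewing it from roughly 0.95 m.
print(pixels_per_degree(0.95, 0.000277))   # ~60 ppd
```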