A practical guide and software for analysing pairwise comparison experiments
The most popular strategies for capturing subjective judgments from humans involve the construction of a unidimensional relative measurement scale, representing order preferences or judgments about a set of objects or conditions. This information is generally captured by means of direct scoring, either in the form of a Likert or cardinal scale, or by comparative judgments in pairs or sets. Pairwise comparisons in particular are becoming increasingly popular because of the simplicity of the experimental procedure. However, this strategy requires non-trivial data analysis to aggregate the comparisons into a quality scale and analyse the results, in order to take full advantage of the collected data. This paper explains the process of translating pairwise comparison data into a measurement scale, discusses the benefits and limitations of such scaling methods, and introduces publicly available software written in Matlab. We improve on existing scaling methods by introducing outlier analysis, providing methods for computing confidence intervals and statistical testing, and introducing a prior that reduces estimation error when the number of observers is low. Most of our examples focus on image quality assessment.
Code available at https://github.com/mantiuk/pwcm
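The core scaling step can be illustrated as a maximum-likelihood fit of a Thurstone Case V model. The Python sketch below is not the pwcm toolbox itself; the function name, the unit-variance Gaussian convention, and the anchoring of the first condition at zero are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def scale_pairwise(C):
    """Thurstone Case V scaling sketch (hypothetical helper, not part of pwcm).
    C[i, j] holds the number of times condition i was chosen over condition j."""
    n = C.shape[0]

    def neg_log_likelihood(q_free):
        q = np.concatenate(([0.0], q_free))  # anchor first score: scale is relative
        # P(i preferred over j) under unit-variance Gaussian judgment noise
        P = norm.cdf((q[:, None] - q[None, :]) / np.sqrt(2.0))
        return -np.sum(C * np.log(P + 1e-12))

    res = minimize(neg_log_likelihood, np.zeros(n - 1), method="L-BFGS-B")
    return np.concatenate(([0.0], res.x))    # one quality score per condition
```

Confidence intervals on the resulting scores can then be estimated, e.g. by bootstrapping the comparison matrix and re-running the fit.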
Predicting visible flicker in temporally changing images
Novel display algorithms such as low-persistence displays, black frame insertion, and temporal resolution multiplexing introduce temporal change into images at 40-180 Hz, on the boundary of the temporal integration of the visual system. This can lead to flicker, a highly objectionable artifact known to induce viewer discomfort. The critical flicker frequency (CFF) alone does not model this phenomenon well, as flicker sensitivity varies with contrast and spatial frequency; a content-aware model is required. In this paper, we introduce a visual model for predicting flicker visibility in temporally changing images. The model performs a multi-scale analysis on the difference between consecutive frames, normalizing values with the spatio-temporal contrast sensitivity function as approximated by the pyramid of visibility. The output of the model is a 2D detection probability map. We ran a subjective flicker-marking experiment to fit the model parameters, then analyzed the difference between two display algorithms, black frame insertion and temporal resolution multiplexing, to demonstrate the application of our model.
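The pipeline described above can be sketched in Python as follows. All constants, the difference-of-Gaussians decomposition, and the Weibull psychometric function are placeholder assumptions; the paper fits its own parameters to the marking experiment.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def flicker_map(frame_a, frame_b, fps, ppd, n_levels=5,
                c0=2.0, cf=0.05, cw=0.02, beta=3.5):
    """Illustrative flicker-detection map; all constants are placeholders.
    frame_a, frame_b: consecutive luminance frames; fps: refresh rate in Hz;
    ppd: display resolution in pixels per visual degree."""
    # crude frame-difference contrast, normalized by local mean luminance
    mean_lum = gaussian_filter(0.5 * (frame_a + frame_b), sigma=4) + 1e-6
    contrast = (frame_b - frame_a) / mean_lum
    w = fps / 2.0                        # temporal frequency of alternation (Hz)
    p_map = np.zeros_like(contrast)
    for lvl in range(n_levels):
        # band-pass decomposition via differences of Gaussians
        band = (gaussian_filter(contrast, 2.0 ** lvl)
                - gaussian_filter(contrast, 2.0 ** (lvl + 1)))
        f = ppd / 2.0 ** (lvl + 2)       # rough peak spatial frequency (cpd)
        # pyramid of visibility: log sensitivity is linear in f and w
        S = 10.0 ** (c0 - cf * f - cw * w)
        # Weibull psychometric function -> per-pixel detection probability
        p = 1.0 - np.exp(-np.abs(band * S) ** beta)
        p_map = np.maximum(p_map, p)     # probability summation approximated by max
    return p_map
```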
Transformation Consistency Regularization – A Semi-supervised Paradigm for Image-to-Image Translation
Scarcity of labeled data has motivated the development of semi-supervised learning methods, which learn from large amounts of unlabeled data alongside a few labeled samples. Consistency regularization, which enforces agreement between a model's predictions under different input perturbations, has in particular been shown to provide state-of-the-art results in a semi-supervised framework. However, most of these methods have been limited to classification and segmentation applications. We propose Transformation Consistency Regularization, which addresses the more challenging setting of image-to-image translation, so far unexplored by semi-supervised algorithms. The method introduces a diverse set of geometric transformations and enforces the model's predictions for unlabeled data to be invariant to those transformations. We evaluate the efficacy of our algorithm on three different applications: image colorization, denoising, and super-resolution. Our method is significantly more data-efficient, requiring only around 10-20% of labeled samples to achieve image reconstructions similar to those of its fully supervised counterpart. Furthermore, we show the effectiveness of our method in video processing applications, where knowledge from a few frames can be leveraged to enhance the quality of the rest of the movie.
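A minimal PyTorch sketch of such a training objective, assuming differentiable geometric transforms such as flips and 90-degree rotations; the function name and loss choices are illustrative, not the paper's exact formulation.

```python
import torch.nn.functional as F

def tcr_loss(model, x_lab, y_lab, x_unlab, transforms, lam=1.0):
    """Supervised term plus transformation-consistency term (illustrative).
    `transforms` is a list of geometric ops, e.g. flips or 90-degree rotations."""
    loss_sup = F.l1_loss(model(x_lab), y_lab)   # few labeled samples
    y_unlab = model(x_unlab)
    loss_con = 0.0
    for t in transforms:
        # the prediction for a transformed input should equal
        # the transformed prediction for the original input
        loss_con = loss_con + F.l1_loss(model(t(x_unlab)), t(y_unlab))
    return loss_sup + lam * loss_con / len(transforms)
```

Here `t` might be, for instance, `torch.rot90` applied to the spatial dimensions of the batch.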
High-fidelity imaging: the computational models of the human visual system in high dynamic range video compression, visible difference prediction and image processing
As new displays and cameras offer enhanced color capabilities, there is a need to extend the precision of digital content. High Dynamic Range (HDR) imaging encodes images and video with higher than normal bit-depth precision, enabling representation of the complete color gamut and the full visible range of luminance. This thesis addresses three problems of HDR imaging: the measurement of visible distortions in HDR images, lossy compression for HDR video, and artifact-free image processing. To measure distortions in HDR images, we develop a visual difference predictor for HDR images that is based on a computational model of the human visual system. To address the problem of HDR image encoding and compression, we derive a perceptually motivated color space for HDR pixels that can efficiently encode all perceivable colors and distinguishable shades of brightness. We use the derived color space to extend the MPEG-4 video compression standard for encoding HDR movie sequences. We also propose a backward-compatible HDR MPEG compression algorithm that encodes both a low-dynamic range and an HDR video sequence into a single MPEG stream. Finally, we propose a framework for image processing in the contrast domain. The framework transforms an image into multi-resolution physical contrast images (maps), which are then rescaled in just-noticeable-difference (JND) units. The application of the framework is demonstrated with a contrast-enhancing tone mapping and a color to gray conversion that preserves color saliency.
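The perceptually motivated encoding of HDR pixels rests on mapping physical luminance into approximately perceptually uniform units. A common construction, sketched below under an assumed threshold model (not the thesis' CSF-derived one), integrates the reciprocal of the detection threshold over luminance (Fechnerian integration):

```python
import numpy as np

def encode_luminance(L, L_min=1e-4, L_max=1e4, n=4096):
    """Map luminance (cd/m^2) to approximately perceptually uniform units.
    The threshold model below is a hypothetical stand-in, not the
    calibrated model used in the thesis."""
    Ls = np.logspace(np.log10(L_min), np.log10(L_max), n)
    # assumed detection thresholds: Weber-like at high luminance,
    # with relative thresholds rising in the dark
    thresholds = 0.01 * Ls * (1.0 + (0.1 / Ls) ** 0.5)
    # one unit of the output corresponds to roughly one JND
    u = np.concatenate(([0.0], np.cumsum(np.diff(Ls) / thresholds[:-1])))
    return np.interp(L, Ls, u)
```

Quantizing such a JND-scaled signal with a fixed number of bits is what allows all distinguishable shades of brightness to be encoded efficiently.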
Noise-Aware Merging of High Dynamic Range Image Stacks Without Camera Calibration
A near-optimal reconstruction of the radiance of a High Dynamic Range scene from an exposure stack can be obtained by modeling the camera noise distribution. The latent radiance is then estimated using maximum likelihood estimation. However, this requires a well-calibrated noise model of the camera, which is difficult to obtain in practice. We show that an unbiased estimator of comparable variance can be obtained with a simpler Poisson noise estimator, which does not require knowledge of camera-specific noise parameters. We demonstrate this empirically for four different cameras, ranging from a smartphone camera to a full-frame mirrorless camera. Our experimental results are consistent for simulated as well as real images, and across different camera settings.
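For pixel values expressed as photon counts, the Poisson maximum-likelihood estimate of radiance reduces to a ratio of sums. A minimal numpy sketch, assuming linear RAW frames already normalized to photon-count estimates and with saturated pixels masked out beforehand:

```python
import numpy as np

def merge_exposure_stack(frames, exposure_times):
    """Poisson maximum-likelihood merge (illustrative sketch).
    frames: list of linear images in estimated photon counts, one per exposure;
    exposure_times: matching exposure times in seconds."""
    counts = np.stack(frames, axis=0)          # shape: (N, H, W)
    t = np.asarray(exposure_times, dtype=float)
    # MLE of radiance under Poisson noise: total counts over total exposure,
    # requiring no camera-specific read-noise or gain calibration
    return counts.sum(axis=0) / t.sum()
```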
Distilling Style from Image Pairs for Global Forward and Inverse Tone Mapping
Many image enhancement or editing operations, such as forward and inverse tone mapping or color grading, do not have a unique solution but rather a range of solutions, each representing a different style. Despite this, existing learning-based methods attempt to learn a unique mapping, disregarding this style. In this work, we show that information about the style can be distilled from collections of image pairs and encoded into a 2- or 3-dimensional vector. This gives us not only an efficient representation but also an interpretable latent space for editing the image style. We represent the global color mapping between a pair of images as a custom normalizing flow, conditioned on a polynomial basis of the pixel color. We show that such a network is more effective than PCA or VAE at encoding image style in a low-dimensional space and lets us achieve an accuracy close to 40 dB, about a 7-10 dB improvement over state-of-the-art methods.
Published in the European Conference on Visual Media Production (CVMP '22).
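The idea of conditioning a global color mapping on a low-dimensional style vector can be sketched as follows. This PyTorch sketch replaces the paper's conditional normalizing flow with a plain polynomial mapping whose coefficients are produced from the style vector; the class name and the simplified per-channel basis are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GlobalColorMap(nn.Module):
    """Global color mapping conditioned on a style vector (illustrative;
    the paper uses a conditional normalizing flow instead)."""
    def __init__(self, style_dim=3, degree=2):
        super().__init__()
        self.degree = degree
        n_basis = 3 * degree + 1  # per-channel monomials plus a bias term
        # small MLP maps style vector -> coefficients of the polynomial map
        self.coef_net = nn.Sequential(
            nn.Linear(style_dim, 32), nn.ReLU(),
            nn.Linear(32, 3 * n_basis))

    def forward(self, rgb, style):
        # rgb: (N, 3) pixel colors; style: (N, style_dim)
        feats = [torch.ones_like(rgb[:, :1])]
        for d in range(1, self.degree + 1):
            feats.append(rgb ** d)                  # per-channel monomials
        phi = torch.cat(feats, dim=1)               # (N, 3*degree + 1)
        W = self.coef_net(style).view(-1, 3, phi.shape[1])
        return torch.bmm(W, phi.unsqueeze(-1)).squeeze(-1)  # mapped colors
```

Distilling a style from an image pair then amounts to optimizing the low-dimensional style vector that best explains the observed color mapping.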
Single-frame Regularization for Temporally Stable CNNs
Convolutional neural networks (CNNs) can model complicated non-linear relations between images. However, they are notoriously sensitive to small changes in the input. Most CNNs trained to describe image-to-image mappings generate temporally unstable results when applied to video sequences, leading to flickering artifacts and other inconsistencies over time. In order to use CNNs for video material, previous methods have relied on estimating dense frame-to-frame motion information (optical flow) in the training and/or the inference phase, or on exploring recurrent learning structures. We take a different approach to the problem, posing temporal stability as a regularization of the cost function. The regularization is formulated to account for different types of motion that can occur between frames, so that temporally stable CNNs can be trained without the need for video material or expensive motion estimation. The training can be performed as a fine-tuning operation, without architectural modifications of the CNN. Our evaluation shows that the training strategy leads to large improvements in temporal smoothness. Moreover, for small datasets the regularization can boost generalization performance to a much larger extent than is possible with naïve augmentation strategies.
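The regularization can be sketched as a penalty comparing the prediction for a perturbed frame with the equally perturbed prediction, where a small random translation stands in for inter-frame motion. This PyTorch sketch is a simplified assumption; the paper formulates several regularization terms for different motion types.

```python
import torch
import torch.nn.functional as F

def temporal_stability_loss(model, x, alpha=0.5, max_shift=4):
    """Single-frame stability regularizer (simplified sketch).
    A random translation of the input stands in for frame-to-frame motion."""
    dy, dx = torch.randint(-max_shift, max_shift + 1, (2,)).tolist()
    shift = lambda img: torch.roll(img, shifts=(dy, dx), dims=(-2, -1))
    y = model(x)
    # the prediction of the shifted frame should match the shifted prediction
    # (pixels wrapped around by roll are ignored here for simplicity)
    return alpha * F.mse_loss(model(shift(x)), shift(y))
```

During fine-tuning, this term is simply added to the original task loss; no video data or motion estimation is needed.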
HDR-VDP-3: A multi-metric for predicting image differences, quality and contrast distortions in high dynamic range and regular content
High-Dynamic-Range Visual-Difference-Predictor version 3, or HDR-VDP-3, is a
visual metric that can fulfill several tasks, such as full-reference
image/video quality assessment, prediction of visual differences between a pair
of images, or prediction of contrast distortions. Here we present a high-level
overview of the metric, position it with respect to related work, explain the
main differences compared to version 2.2, and describe how the metric was
adapted for the HDR Video Quality Measurement Grand Challenge 2023.