    Selecting stimuli parameters for video quality studies based on perceptual similarity distances

    This work presents a methodology for optimizing the selection of multiple parameter levels of an image acquisition, degradation, or post-processing process applied to stimuli intended for use in a subjective image or video quality assessment (QA) study. It is known that processing parameters (e.g. compression bit-rate) or technical quality measures (e.g. peak signal-to-noise ratio, PSNR) are often non-linearly related to human quality judgment, and the model of either relationship may not be known in advance. Selecting parameter levels directly from such parameters or measures may therefore lead to an inaccurate estimate of the relationship between the parameter and subjective quality judgments, i.e. the system’s quality model. To overcome this, we propose a method for modeling the relationship between parameter levels and perceived quality distances using a paired comparison parameter selection procedure in which subjects judge the perceived similarity in quality. Our goal is to enable the selection of evenly sampled parameter levels within the considered quality range for use in a subjective QA study. This approach is tested on two applications: (1) selection of compression levels for a laparoscopic surgery video QA study, and (2) selection of dose levels for an interventional X-ray QA study. Subjective scores, obtained from the follow-up single stimulus QA experiments conducted with expert subjects who evaluated the selected bit-rates and dose levels, were roughly equidistant in the perceptual quality space, as intended. These results suggest that a similarity judgment task can help select parameter values corresponding to desired subjective quality levels.
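    The goal of evenly sampled levels can be illustrated with a minimal sketch. It assumes that a monotone perceptual scale has already been estimated for a few candidate parameter values (the abstract does not give the estimation procedure, so the numbers below are hypothetical) and uses plain piecewise-linear inversion, which is an illustrative choice rather than the authors' method. The function name `equidistant_levels` is also hypothetical.

```python
from bisect import bisect_left


def equidistant_levels(params, scale, n_levels):
    """Pick n_levels parameter values evenly spaced on a perceptual scale.

    params: increasing parameter values (e.g. bit-rates).
    scale:  perceptual scale value for each parameter, strictly increasing.
    The mapping is inverted by piecewise-linear interpolation (an assumption).
    """
    if n_levels < 2 or len(params) != len(scale) or len(params) < 2:
        raise ValueError("need at least two levels and matching inputs")
    if any(b <= a for a, b in zip(scale, scale[1:])):
        raise ValueError("scale must be strictly increasing")

    lo, hi = scale[0], scale[-1]
    # Target quality values, equally spaced between the two extremes.
    targets = [lo + i * (hi - lo) / (n_levels - 1) for i in range(n_levels)]

    levels = []
    for t in targets:
        # Find the segment [scale[j-1], scale[j]] that contains the target.
        j = min(max(bisect_left(scale, t), 1), len(scale) - 1)
        s0, s1 = scale[j - 1], scale[j]
        p0, p1 = params[j - 1], params[j]
        levels.append(p0 + (t - s0) * (p1 - p0) / (s1 - s0))
    return levels


# Hypothetical data: quality saturates as the bit-rate grows.
bitrates = [1, 2, 4, 8, 16]
quality = [0.0, 2.0, 3.0, 3.5, 4.0]
print(equidistant_levels(bitrates, quality, 5))  # [1.0, 1.5, 2.0, 4.0, 16.0]
```

    In this toy example the selected bit-rates cluster at the low end, where quality changes fastest, instead of being spaced evenly in bit-rate.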

    Influence of study design on digital pathology image quality evaluation: the need to define a clinical task

    Despite the current rapid advance in technologies for whole slide imaging, there is still no scientific consensus on the recommended methodology for image quality assessment of digital pathology slides. For medical images in general, it has been recommended to assess image quality in terms of doctors’ success rates in performing a specific clinical task while using the images (clinical image quality, cIQ). However, digital pathology is a new modality, and even identifying the appropriate task is difficult. In a common alternative approach, humans are asked to do a simpler task such as rating overall image quality (perceived image quality, pIQ), but that carries the risk of findings that are not clinically relevant, because the relationship between pIQ and cIQ is unknown. In this study, we explored three different experimental protocols: (1) conducting a clinical task (detecting inclusion bodies), (2) rating image similarity and preference, and (3) rating the overall image quality. Additionally, within protocol 1, overall quality ratings were also collected (task-aware pIQ). The experiments were done by diagnostic veterinary pathologists in the context of evaluating the quality of hematoxylin and eosin-stained digital pathology slides of animal tissue samples under several common image alterations: additive noise, blurring, change in gamma, change in color saturation, and JPG compression. While the small size of our experiments prevents drawing strong conclusions, the results suggest the need to define a clinical task. Importantly, the pIQ data collected under protocols 2 and 3 did not always rank the image alterations the same as their cIQ from protocol 1, warning against using conventional pIQ to predict cIQ. At the same time, there was a correlation between the cIQ and task-aware pIQ ratings from protocol 1, suggesting that the clinical experiment context (set by specifying the clinical task) may affect human visual attention and bring focus to their criteria of image quality. Further research is needed to assess whether and for which purposes (e.g., preclinical testing) task-aware pIQ ratings could substitute cIQ for a given clinical task.

    An intuitive control space for material appearance

    Many different techniques for measuring material appearance have been proposed in the last few years. These have produced large public datasets, which have been used for accurate, data-driven appearance modeling. However, although these datasets have allowed us to reach an unprecedented level of realism in visual appearance, editing the captured data remains a challenge. In this paper, we present an intuitive control space for predictable editing of captured BRDF data, which allows for artistic creation of plausible novel material appearances, bypassing the difficulty of acquiring novel samples. We first synthesize novel materials, extending the existing MERL dataset up to 400 mathematically valid BRDFs. We then design a large-scale experiment, gathering 56,000 subjective ratings on the high-level perceptual attributes that best describe our extended dataset of materials. Using these ratings, we build and train networks of radial basis functions to act as functionals mapping the perceptual attributes to an underlying PCA-based representation of BRDFs. We show that our functionals are excellent predictors of the perceived attributes of appearance. Our control space enables many applications, including intuitive material editing of a wide range of visual properties, guidance for gamut mapping, analysis of the correlation between perceptual attributes, and novel appearance similarity metrics. Moreover, our methodology can be used to derive functionals applicable to classic analytic BRDF representations. We release our code and dataset publicly to support and encourage further research in this direction.

    A Similarity Measure for Material Appearance

    We present a model to measure the similarity in appearance between different materials, which correlates with human similarity judgments. We first create a database of 9,000 rendered images depicting objects with varying materials, shape and illumination. We then gather data on perceived similarity from crowdsourced experiments; our analysis of over 114,840 answers suggests that indeed a shared perception of appearance similarity exists. We feed this data to a deep learning architecture with a novel loss function, which learns a feature space for materials that correlates with such perceived appearance similarity. Our evaluation shows that our model outperforms existing metrics. Last, we demonstrate several applications enabled by our metric, including appearance-based search for material suggestions, database visualization, clustering and summarization, and gamut mapping. Comment: 12 pages, 17 figures

    Subjective image quality assessment with boosted triplet comparisons

    In subjective full-reference image quality assessment, a reference image is distorted at increasing distortion levels. The differences between perceptual image qualities of the reference image and its distorted versions are evaluated, often using degradation category ratings (DCR). However, the DCR has been criticized since differences between rating categories on this ordinal scale might not be perceptually equidistant, and observers may have different understandings of the categories. Pair comparisons (PC) of distorted images, followed by Thurstonian reconstruction of scale values, overcome these problems. In addition, PC is more sensitive than DCR, and it can provide scale values in fractional, just noticeable difference (JND) units that express a precise perceptual interpretation. Still, the comparison of images of nearly the same quality can be difficult. We introduce boosting techniques embedded in more general triplet comparisons (TC) that increase the sensitivity even more. Boosting amplifies the artefacts of distorted images, enlarges their visual representation by zooming, increases the visibility of the distortions by a flickering effect, or combines some of the above. Experimental results show the effectiveness of boosted TC for seven types of distortion (color diffusion, jitter, high sharpen, JPEG 2000 compression, lens blur, motion blur, multiplicative noise). For our study, we crowdsourced over 1.7 million responses to triplet questions. We give a detailed analysis of the data in terms of scale reconstructions, accuracy, detection rates, and sensitivity gain. Generally, boosting increases the discriminatory power and allows the number of subjective ratings to be reduced without sacrificing the accuracy of the resulting relative image quality values. Our technique paves the way to fine-grained image quality datasets, allowing for more distortion levels, yet with high-quality subjective annotations. We also provide the details for Thurstonian scale reconstruction from TC and our annotated dataset, KonFiG-IQA, containing 10 source images, processed using 7 distortion types at 12 or even 30 levels, uniformly spaced over a span of 3 JND units.
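    As a rough illustration of Thurstonian reconstruction from pair comparisons, the sketch below applies the classical Thurstone Case V estimate to a matrix of preference counts. It is not the triplet-comparison reconstruction described in the paper. The function name `thurstone_case_v`, the add-half smoothing of proportions, and the example counts are all assumptions made for illustration, and the resulting values are in z-score units rather than calibrated JND units.

```python
from statistics import NormalDist


def thurstone_case_v(wins):
    """Estimate scale values from pair-comparison counts (Thurstone Case V).

    wins[i][j] is the number of times stimulus i was preferred over j.
    Returns one scale value per stimulus, shifted so the first is 0.
    """
    n = len(wins)
    inv_cdf = NormalDist().inv_cdf  # inverse of the standard normal CDF
    scale = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # self-comparison contributes z(0.5) = 0
            trials = wins[i][j] + wins[j][i]
            # Smoothed proportion (assumption) keeps p away from 0 and 1,
            # where the inverse CDF is undefined.
            p = (wins[i][j] + 0.5) / (trials + 1.0)
            total += inv_cdf(p)
        scale.append(total / n)
    return [s - scale[0] for s in scale]


# Hypothetical counts for a reference (index 0) and two distorted versions.
counts = [
    [0, 8, 10],
    [2, 0, 8],
    [0, 2, 0],
]
print(thurstone_case_v(counts))
```

    With these counts the reference receives the value 0 and the two distorted versions receive increasingly negative values, reflecting how often each was judged worse.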

    Instruments for the evaluation of image quality experience (Kuvanlaatukokemuksen arvionnin instrumentit)

    This dissertation describes the instruments available for image quality evaluation, develops new methods for subjective image quality evaluation and provides image and video databases for the assessment and development of image quality assessment (IQA) algorithms. The contributions of the thesis are based on six original publications. The first publication introduced the VQone toolbox for subjective image quality evaluation. It created a platform for free-form experimentation with standardized image quality methods and was the foundation for later studies. The second publication focused on the dilemma of reference in subjective experiments by proposing a new method for image quality evaluation: the absolute category rating with dynamic reference (ACR-DR). The third publication presented a database (CID2013) in which 480 images were evaluated by 188 observers using the ACR-DR method proposed in the prior publication. Providing databases of image files along with their quality ratings is essential in the field of IQA algorithm development. The fourth publication introduced a video database (CVD2014) based on having 210 observers rate 234 video clips. The temporal aspect of the stimuli creates peculiar artifacts and degradations, as well as challenges to experimental design and video quality assessment (VQA) algorithms. When the CID2013 and CVD2014 databases were published, most state-of-the-art I/VQAs had been trained on and tested against databases created by degrading an original image or video with a single distortion at a time. The novel aspect of CID2013 and CVD2014 was that they consisted of multiple concurrent distortions. To facilitate communication and understanding among professionals in various fields of image quality as well as among non-professionals, an attribute lexicon of image quality, the image quality wheel, was presented in the fifth publication of this thesis. 
Reference wheels and terminology lexicons have a long tradition in sensory evaluation contexts, such as taste experience studies, where they are used to facilitate communication among interested stakeholders; however, such an approach has not been common in visual experience domains, especially in studies on image quality. The sixth publication examined how the free descriptions given by the observers influenced the ratings of the images. Understanding how various elements, such as perceived sharpness and naturalness, affect subjective image quality can help to understand the decision-making processes behind image quality evaluation. Knowing the impact of each preferential attribute can then be used for I/VQA algorithm development; certain I/VQA algorithms already incorporate low-level human visual system (HVS) models.
The dissertation examines and develops new methods for image quality assessment, and provides image and video databases for testing and developing image quality assessment (IQA) algorithms. What is experienced as beautiful and pleasant is a psychologically interesting question. The work is also relevant to industry in the development of camera image quality. The dissertation comprises six publications that examine the topic from different perspectives. In Publication I, an application was developed for collecting people's ratings of presented images, freely available to researchers. It made it possible to test standardized image quality assessment methods and to develop new methods based on them, laying the foundation for later studies. In Publication II, a new image quality assessment method was developed. The method uses a serial presentation of images that gave observers an impression of the variation in image quality before the actual rating. This was found to reduce the variance of the results and to discriminate smaller differences in image quality. Publication III describes a database containing the quality ratings given by 188 people for 480 images, together with the associated image files. Databases are a valuable tool in developing algorithms for automatic image quality rating. They are needed, for example, as training material in the development of AI-based algorithms and for comparing the performance of different algorithms with one another. The better an algorithm's prediction correlates with the quality ratings given by people, the better its performance can be said to be. Publication IV presents a database containing the quality ratings given by 210 people for 234 video clips, together with the associated video files. Because of the temporal dimension, the artifacts in video stimuli differ from those in images, which poses its own challenges for video quality assessment (VQA) algorithms. The stimuli in earlier databases were created, for example, by gradually blurring a single image, so they contain only one-dimensional distortions. The databases presented here differ from earlier ones and contain multiple simultaneous distortions, whose interaction may have a significant effect on image quality. Publication V presents the image quality wheel, a lexicon of image quality concepts compiled by analyzing 39,415 verbal descriptions of image quality produced by 146 people. Lexicons have a long tradition in sensory evaluation research, but none had previously been developed for image quality. Study VI examined how the concepts given by the raters affect the assessment of image quality. For example, the rated sharpness or naturalness of images helps to understand the decision-making processes behind quality assessment. This knowledge can be used, for example, in the development of image and video quality assessment (I/VQA) algorithms.

    Content-aware objective video quality assessment

    Since the end-user of video-based systems is often a human observer, prediction of user-perceived video quality (PVQ) is an important task for increasing user satisfaction. Despite the large variety of objective video quality measures (VQMs), their lack of generalizability remains a problem. This is mainly due to the strong dependency between PVQ and video content. Although this problem is well known, few existing VQMs directly account for the influence of video content on PVQ. Recently, we proposed a method to predict PVQ by introducing relevant video content features in the computation of video distortion measures. The method is based on analyzing the level of spatiotemporal activity in the video and using it to set the parameters of the anthropomorphic video distortion models. We focus on the experimental evaluation of the proposed methodology based on a total of five public databases, four different objective VQMs, and 105 content-related indexes. Additionally, relying on the proposed method, we introduce an approach for selecting the levels of video distortions for the purpose of subjective quality assessment studies. Our results suggest that when adequately combined with content-related indexes, even very simple distortion measures (e.g., peak signal-to-noise ratio) are able to achieve high performance, i.e., high correlation between the VQM and the PVQ. In particular, we have found that by incorporating video content features, it is possible to increase the performance of the VQM by up to 20% relative to its non-content-aware baseline.
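    For orientation, the following sketch shows the two generic building blocks the abstract refers to: PSNR as a simple distortion measure, and a Pearson correlation as one common way to quantify agreement between a VQM and subjective scores. It does not implement the content-aware weighting proposed in the paper. The function names, the flat-list pixel representation, and the 8-bit peak value of 255 are assumptions for illustration.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(distorted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def pearson(x, y):
    """Pearson linear correlation coefficient between two sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical example: a two-pixel signal with an error of 10 on each pixel.
print(round(psnr([100, 100], [110, 90]), 2))  # 28.13
```

    Given per-video PSNR values and the corresponding mean subjective scores, `pearson(psnr_values, subjective_scores)` would give a performance figure of the kind the abstract describes.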