21 research outputs found

    Scene-based imperceptible-visible watermarking for HDR video content

    This paper presents High Dynamic Range - Imperceptible Visible Watermarking for HDR video content (HDR-IVW-V), a scene-detection-based method for robust copyright protection of HDR videos using a visually imperceptible watermarking methodology. HDR-IVW-V employs scene detection to reduce both computational complexity and undesired visual attention to watermarked regions. Visual imperceptibility is achieved by finding the region of a frame with the highest hiding capacity, in which the Human Visual System (HVS) cannot recognize the embedded watermark. The embedded watermark remains visually imperceptible as long as the normal color calibration parameters are maintained. HDR-IVW-V is evaluated on PQ-encoded HDR video content, successfully attaining visual imperceptibility, robustness to tone mapping operations, and image quality preservation.

    Watermarking of HDR images in the spatial domain with HVS-imperceptibility

    This paper presents a watermarking method in the spatial domain with HVS-imperceptibility for High Dynamic Range (HDR) images. The proposed method combines the content readability afforded by invisible watermarking with the visual ownership identification afforded by visible watermarking. The HVS-imperceptibility is guaranteed thanks to a Luma Variation Tolerance (LVT) curve, which is associated with the transfer function (TF) used for HDR encoding and provides the information needed to embed an imperceptible watermark in the spatial domain. The LVT curve is based on the inaccuracies between the non-linear digital representation of the linear luminance acquired by an HDR sensor and the brightness perceived by the Human Visual System (HVS) from the linear luminance displayed on an HDR screen. The embedded watermarks remain imperceptible to the HVS as long as the TF is not altered or the normal calibration and colorimetry conditions of the HDR screen remain unchanged. Extensive qualitative and quantitative evaluations on several HDR images encoded by two widely used TFs confirm the strong HVS-imperceptibility capabilities of the method, as well as the robustness of the embedded watermarks to tone mapping, lossy compression, and common signal processing operations.

    Perceptual Visibility Model for Temporal Contrast Changes in Periphery

    Modeling perception is critical for many applications and developments in computer graphics to optimize and evaluate content generation techniques. Most of the work to date has focused on central (foveal) vision. However, this is insufficient for novel wide-field-of-view display devices, such as virtual and augmented reality headsets. Furthermore, the perceptual models proposed for the fovea do not readily extend to the off-center, peripheral visual field, where human perception is drastically different. In this paper, we focus on modeling the temporal aspect of visual perception in the periphery. We present new psychophysical experiments that measure the sensitivity of human observers to different spatio-temporal stimuli across a wide field of view. We use the collected data to build a perceptual model for the visibility of temporal changes at different eccentricities in complex video content. Finally, we discuss, demonstrate, and evaluate several problems that can be addressed using our technique. First, we show how our model enables injecting new content into the periphery without distracting the viewer, and we discuss the link between the model and human attention. Second, we demonstrate how foveated rendering methods can be evaluated and optimized to limit the visibility of temporal aliasing

    MediaSync: Handbook on Multimedia Synchronization

    This book provides an approachable overview of the most recent advances in the fascinating field of media synchronization (mediasync), gathering contributions from the most representative and influential experts. Understanding the challenges of this field in the current multi-sensory, multi-device, and multi-protocol world is not an easy task. The book revisits the foundations of mediasync, including theoretical frameworks and models, highlights ongoing research efforts, like hybrid broadband broadcast (HBB) delivery and users' perception modeling (i.e., Quality of Experience or QoE), and paves the way for the future (e.g., towards the deployment of multi-sensory and ultra-realistic experiences). Although many advances around mediasync have been devised and deployed, this area of research is getting renewed attention to overcome remaining challenges in the next-generation (heterogeneous and ubiquitous) media ecosystem. Given the significant advances in this research area, its current relevance and the multiple disciplines it involves, the availability of a reference book on mediasync becomes necessary. This book fills the gap in this context. In particular, it addresses key aspects and reviews the most relevant contributions within the mediasync research space, from different perspectives. Mediasync: Handbook on Multimedia Synchronization is the perfect companion for scholars and practitioners that want to acquire strong knowledge about this research area, and also approach the challenges behind ensuring the best mediated experiences, by providing the adequate synchronization between the media elements that constitute these experiences

    An ensemble architecture for forgery detection and localization in digital images

    This thesis presents a unified ensemble approach for forgery detection and localization in digital images. The focus of the research is on two of the most common but effective forgery techniques: copy-move and splicing. The ensemble architecture combines a set of forgery detection and localization methods in order to achieve improved performance with respect to standalone approaches.
The main contributions of this work are as follows. First, an extensive review of the current state of the art in forgery detection, with a focus on deep learning-based approaches, is presented in Chapters 1 and 2. An important insight derived from it is the following: these approaches, although promising, cannot be easily compared in terms of performance because they are typically evaluated on custom datasets due to the lack of precisely annotated data. Moreover, these datasets are often not publicly available. We then designed a keypoint-based copy-move detection algorithm, which is described in Chapter 3. Compared to existing keypoint-based approaches, we added a density-based clustering step to filter out noisy keypoint matches. This method has been demonstrated to perform well on two benchmark datasets and outperforms one of the most cited state-of-the-art methods. In Chapter 4, a novel architecture is proposed to predict the 3D light direction in a given image. This approach leverages the idea of combining a data-driven method with a physical illumination model, which allows for improved regression performance. In order to address the scarcity of data for training highly parameterized deep learning architectures, especially for the task of intrinsic image decomposition, we developed two data generation algorithms that were used to produce two datasets - one synthetic and one of real images - to train and evaluate our approach. The proposed light direction estimation model has then been employed to design a novel splicing detection approach, discussed in Chapter 5, in which light direction inconsistencies between different regions in the image are used to highlight potential splicing attacks. The proposed ensemble scheme for forgery detection is described in the last chapter. It includes a "FusionForgery" module that combines the outputs of the previously proposed "base" methods and assigns a binary label (forged vs. pristine) to the input image. When an image is predicted as forged, our method also tries to further specialize the decision between splicing and copy-move attacks. If the image is predicted as copy-moved, an attempt is also made to reconstruct the source regions used in the copy-move attack. The performance of the proposed approach has been assessed by training and testing it on a synthetic dataset, generated by us, comprising both copy-move and splicing attacks. The ensemble approach outperforms all of the individual "base" methods, demonstrating the validity of the proposed strategy.

    The quality of experience of emerging display technologies

    As new display technologies emerge and become part of everyday life, the understanding of the visual experience they provide becomes more relevant. The cognition of perception is the most vital component of visual experience; however, it is not the only cognition that contributes to the complex overall experience of the end-user. Expectations can create significant cognitive bias that may even override what the user genuinely perceives. Even if a visualization technology is somewhat novel, expectations can be fuelled by prior experiences gained from using similar displays and, more importantly, even a single word or an acronym may induce serious preconceptions, especially if such word suggests excellence in quality. In this interdisciplinary Ph.D. thesis, the effect of minimal, one-word labels on the Quality of Experience (QoE) is investigated in a series of subjective tests. In the studies carried out on an ultra-high-definition (UHD) display, UHD video contents were directly compared to their HD counterparts, with and without labels explicitly informing the test participants about the resolution of each stimulus. The experiments on High Dynamic Range (HDR) visualization addressed the effect of the word “premium” on the quality aspects of HDR video, and also how this may affect the perceived duration of stalling events. In order to support the findings, additional tests were carried out comparing the stalling detection thresholds of HDR video with conventional Low Dynamic Range (LDR) video. The third emerging technology addressed by this thesis is light field visualization. Due to its novel nature and the lack of comprehensive, exhaustive research on the QoE of light field displays and content parameters at the time of this thesis, instead of investigating the labeling effect, four phases of subjective studies were performed on light field QoE. 
The first phases started with fundamental research, and the experiments progressed towards the concept and evaluation of the dynamic adaptive streaming of light field video, introduced in the final phase

    Subjective image quality assessment with boosted triplet comparisons.

    In subjective full-reference image quality assessment, a reference image is distorted at increasing distortion levels. The differences between perceptual image qualities of the reference image and its distorted versions are evaluated, often using degradation category ratings (DCR). However, the DCR has been criticized since differences between rating categories on this ordinal scale might not be perceptually equidistant, and observers may have different understandings of the categories. Pair comparisons (PC) of distorted images, followed by Thurstonian reconstruction of scale values, overcome these problems. In addition, PC is more sensitive than DCR, and it can provide scale values in fractional, just noticeable difference (JND) units that have a precise perceptual interpretation. Still, the comparison of images of nearly the same quality can be difficult. We introduce boosting techniques embedded in more general triplet comparisons (TC) that increase the sensitivity even more. Boosting amplifies the artefacts of distorted images, enlarges their visual representation by zooming, increases the visibility of the distortions by a flickering effect, or combines some of the above. Experimental results show the effectiveness of boosted TC for seven types of distortion (color diffusion, jitter, high sharpen, JPEG 2000 compression, lens blur, motion blur, multiplicative noise). For our study, we crowdsourced over 1.7 million responses to triplet questions. We give a detailed analysis of the data in terms of scale reconstructions, accuracy, detection rates, and sensitivity gain. Generally, boosting increases the discriminatory power and allows reducing the number of subjective ratings without sacrificing the accuracy of the resulting relative image quality values. Our technique paves the way to fine-grained image quality datasets, allowing for more distortion levels, yet with high-quality subjective annotations.
We also provide the details for Thurstonian scale reconstruction from TC and our annotated dataset, KonFiG-IQA, containing 10 source images, processed using 7 distortion types at 12 or even 30 levels, uniformly spaced over a span of 3 JND units.
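As a rough illustration of what Thurstonian reconstruction from pair comparisons involves, the following is a minimal sketch of the classical Thurstone Case V least-squares solution. It is not the reconstruction procedure used in the paper above (which works on triplet comparisons); the function name `thurstone_case_v`, the add-one smoothing, and the example counts are assumptions made for illustration, and the resulting values are in z-score units rather than calibrated JND units.

```python
from statistics import NormalDist


def thurstone_case_v(wins):
    """Estimate scale values from a pair-comparison count matrix.

    wins[i][j] is the number of times item i was preferred over item j.
    Returns one scale value per item, shifted so the lowest value is 0.
    """
    n = len(wins)
    normal = NormalDist()  # standard normal distribution
    scores = []
    for i in range(n):
        z_sum = 0.0
        for j in range(n):
            if i == j:
                continue  # self-comparison contributes z = 0
            total = wins[i][j] + wins[j][i]
            # Add-one smoothing (an assumption of this sketch) keeps the
            # proportion strictly inside (0, 1) so inv_cdf stays finite.
            p = (wins[i][j] + 1) / (total + 2)
            z_sum += normal.inv_cdf(p)
        # Case V: the scale value is the row mean of the z-scores.
        scores.append(z_sum / n)
    lowest = min(scores)
    return [s - lowest for s in scores]


# Hypothetical counts for three images, each pair compared 10 times.
example_wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
scale_values = thurstone_case_v(example_wins)
print(scale_values)
```

The item that wins most of its comparisons receives the largest scale value. A maximum-likelihood fit would typically be preferred when many proportions are close to 0 or 1.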

    Discrete Wavelet Transforms

    The discrete wavelet transform (DWT) algorithms have a firm position in processing of signals in several areas of research and industry. As DWT provides both octave-scale frequency and spatial timing of the analyzed signal, it is constantly used to solve and treat more and more advanced problems. The present book, Discrete Wavelet Transforms: Algorithms and Applications, reviews the recent progress in discrete wavelet transform algorithms and applications. The book covers a wide range of methods (e.g. lifting, shift invariance, multi-scale analysis) for constructing DWTs. The book chapters are organized into four major parts. Part I describes the progress in hardware implementations of the DWT algorithms. Applications include multitone modulation for ADSL and equalization techniques, a scalable architecture for FPGA implementation, a lifting-based algorithm for VLSI implementation, a comparison between DWT- and FFT-based OFDM, and a modified SPIHT codec. Part II addresses image processing algorithms such as a multiresolution approach for edge detection, low bit rate image compression, low-complexity implementation of CQF wavelets, and compression of multi-component images. Part III focuses on watermarking DWT algorithms. Finally, Part IV describes shift-invariant DWTs, the DC lossless property, DWT-based analysis and estimation of colored noise, and an application of the wavelet Galerkin method. The chapters of the present book consist of both tutorial and highly advanced material. Therefore, the book is intended to be a reference text for graduate students and researchers to obtain state-of-the-art knowledge on specific applications.
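To make the idea of a DWT concrete, here is a minimal sketch of a single-level Haar transform and its inverse. The Haar wavelet is chosen only because it is the simplest case; the function names `haar_dwt` and `haar_idwt` are illustrative and are not taken from the book.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT.

    Returns (approximation, detail) coefficient lists, each half the
    length of the input. The input length must be even.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    scale = 1.0 / math.sqrt(2.0)
    # Approximation: scaled sums of neighbouring pairs (low-pass branch).
    approx = [(signal[i] + signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    # Detail: scaled differences of neighbouring pairs (high-pass branch).
    detail = [(signal[i] - signal[i + 1]) * scale
              for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Invert haar_dwt and recover the original samples."""
    scale = 1.0 / math.sqrt(2.0)
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) * scale)
        signal.append((a - d) * scale)
    return signal


samples = [4.0, 6.0, 10.0, 12.0]
approx, detail = haar_dwt(samples)
restored = haar_idwt(approx, detail)
print(approx, detail, restored)
```

Applying `haar_dwt` again to the approximation coefficients gives the next, coarser level, which is how the octave-scale decomposition mentioned above arises.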

    Compression, Modeling, and Real-Time Rendering of Realistic Materials and Objects

    The realism of a scene basically depends on the quality of the geometry, the illumination and the materials that are used. Whereas many sources for the creation of three-dimensional geometry exist and numerous algorithms for the approximation of global illumination were presented, the acquisition and rendering of realistic materials remains a challenging problem. Realistic materials are very important in computer graphics, because they describe the reflectance properties of surfaces, which are based on the interaction of light and matter. In the real world, an enormous diversity of materials can be found, comprising very different properties. One important objective in computer graphics is to understand these processes, to formalize them and to finally simulate them. For this purpose various analytical models do already exist, but their parameterization remains difficult as the number of parameters is usually very high. Also, they fail for very complex materials that occur in the real world. Measured materials, on the other hand, are prone to long acquisition time and to huge input data size. Although very efficient statistical compression algorithms were presented, most of them do not allow for editability, such as altering the diffuse color or mesostructure. In this thesis, a material representation is introduced that makes it possible to edit these features. This makes it possible to re-use the acquisition results in order to easily and quickly create variations of the original material. These variations may be subtle, but also substantial, allowing for a wide spectrum of material appearances. The approach presented in this thesis is not based on compression, but on a decomposition of the surface into several materials with different reflection properties. Based on a microfacet model, the light-matter interaction is represented by a function that can be stored in an ordinary two-dimensional texture.
Additionally, depth information, local rotations, and the diffuse color are stored in these textures. As a result of the decomposition, some of the original information is inevitably lost, therefore an algorithm for the efficient simulation of subsurface scattering is presented as well. Another contribution of this work is a novel perception-based simplification metric that includes the material of an object. This metric comprises features of the human visual system, for example trichromatic color perception or reduced resolution. The proposed metric allows for a more aggressive simplification in regions where geometric metrics do not simplif

    Multimedia Forensics

    This book is open access. Media forensics has never been more relevant to societal life. Not only does media content represent an ever-increasing share of the data traveling on the net and the preferred means of communication for most users, it has also become an integral part of most innovative applications in the digital information ecosystem that serves various sectors of society, from entertainment, to journalism, to politics. Undoubtedly, the advances in deep learning and computational imaging contributed significantly to this outcome. The underlying technologies that drive this trend, however, also pose a profound challenge in establishing trust in what we see, hear, and read, and make media content the preferred target of malicious attacks. In this new threat landscape powered by innovative imaging technologies and sophisticated tools, based on autoencoders and generative adversarial networks, this book fills an important gap. It presents a comprehensive review of state-of-the-art forensics capabilities that relate to media attribution, integrity and authenticity verification, and counter forensics. Its content is developed to provide practitioners, researchers, photo and video enthusiasts, and students a holistic view of the field.