
    Robust watermarking methods for the protection of 3D digital imagery

    The explosion in stereoscopic video distribution increases the concerns over its copyright protection. Watermarking can be considered the most flexible property-right protection technology. In practice, the challenge for a watermarking approach is to reach the trade-off between transparency, robustness, data payload, and computational cost. While the capturing and displaying of 3D content are based solely on the two left/right views, alternative representations, such as disparity maps, should also be considered during transmission/storage. A specific study on the insertion domain that is optimal with respect to the above-mentioned properties is therefore also required. The present thesis tackles these challenges. First, a new disparity map estimation method (3D video-New Three Step Search, 3DV-NTSS) is designed. The performance of 3DV-NTSS was evaluated in terms of the visual quality of the reconstructed image and the computational cost. Compared with state-of-the-art methods (NTSS and FS-MPEG), average gains of 2 dB in PSNR and 0.1 in SSIM are obtained, while the computational cost is reduced by average factors between 1.3 and 13. Second, a comparative study of the main classes of watermarking methods inherited from 2D, and of their related optimal insertion domains, is carried out. Four insertion methods are considered, belonging to the SS, SI, and hybrid (Fast-IProtect) families. The experiments brought to light that Fast-IProtect, performed in the new disparity map domain (3DV-NTSS), is generic enough to serve a large variety of applications. The statistical relevance of the results is given by the 95% confidence limits, with underlying relative errors lower than er < 0.1.
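    As a companion illustration, the sketch below shows the classic three-step search on which NTSS-style methods build: block matching that probes the eight neighbours of the current best displacement and halves the step size each round. It is a generic Python sketch, not the thesis's 3DV-NTSS; all function names and parameters (block size b, initial step) are illustrative.

        import numpy as np

        def block_sad(left, right, y, x, dy, dx, b):
            """Sum of absolute differences between a b x b block in the
            left view and the block displaced by (dy, dx) in the right view."""
            h, w = right.shape
            if y + dy < 0 or x + dx < 0 or y + dy + b > h or x + dx + b > w:
                return np.inf  # candidate block falls outside the image
            ref = left[y:y + b, x:x + b].astype(np.float64)
            cand = right[y + dy:y + dy + b, x + dx:x + dx + b].astype(np.float64)
            return np.abs(ref - cand).sum()

        def three_step_search(left, right, y, x, b=8, step=4):
            """Classic three-step search: evaluate the 9 candidates around the
            current centre, move to the best one, halve the step, repeat."""
            cy, cx = 0, 0  # current displacement estimate
            while step >= 1:
                candidates = [(cy + ddy, cx + ddx)
                              for ddy in (-step, 0, step)
                              for ddx in (-step, 0, step)]
                cy, cx = min(candidates,
                             key=lambda d: block_sad(left, right, y, x, d[0], d[1], b))
                step //= 2
            return cy, cx  # for rectified stereo, cx is the block's disparity

        left = np.random.randint(0, 256, (64, 64))
        right = np.roll(left, -3, axis=1)  # toy horizontal shift between views
        print(three_step_search(left, right, 24, 24))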

    A blind stereoscopic image quality evaluator with segmented stacked autoencoders considering the whole visual perception route

    Most current blind stereoscopic image quality assessment (SIQA) algorithms cannot achieve reliable accuracy. One reason is that they lack deep architectures; another is that they are designed on a relatively weak biological basis compared with current findings on the human visual system (HVS). In this paper, we propose a Deep Edge and COlor Signal INtegrity Evaluator (DECOSINE) based on the whole visual perception route from the eyes to the frontal lobe, with a particular focus on edge and color signal processing in retinal ganglion cells (RGC) and the lateral geniculate nucleus (LGN). Furthermore, to model the complex and deep structure of the visual cortex, a Segmented Stacked Auto-encoder (S-SAE) is used, which has not been utilized for SIQA before. The S-SAE compensates for a weakness of deep-learning-based SIQA metrics, namely their very long training times. Experiments are conducted on popular SIQA databases, and the superiority of DECOSINE in terms of prediction accuracy and monotonicity is demonstrated. The experimental results show that our model of the whole visual perception route and the utilization of the S-SAE are effective for SIQA.
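    For readers unfamiliar with stacked autoencoders, the minimal PyTorch sketch below shows greedy layer-wise pretraining of a plain stacked autoencoder. It is not the paper's segmented S-SAE; the dimensions, epoch count, and names are illustrative, and in an SIQA setting a quality regressor would be trained on the resulting deep features.

        import torch
        import torch.nn as nn

        def train_layer(data, in_dim, hid_dim, epochs=20, lr=1e-3):
            """Train one autoencoder layer to reconstruct its input, then
            return the encoder and the encoded data for the next layer."""
            enc = nn.Sequential(nn.Linear(in_dim, hid_dim), nn.Sigmoid())
            dec = nn.Linear(hid_dim, in_dim)
            opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=lr)
            loss_fn = nn.MSELoss()
            for _ in range(epochs):
                opt.zero_grad()
                loss = loss_fn(dec(enc(data)), data)
                loss.backward()
                opt.step()
            with torch.no_grad():
                encoded = enc(data)
            return enc, encoded

        # Greedy layer-wise stacking: each layer is trained on the codes of
        # the previous one; the chained encoders form the deep feature extractor.
        features = torch.rand(256, 64)   # toy input feature vectors
        dims = [64, 32, 16]
        encoders, codes = [], features
        for i in range(len(dims) - 1):
            enc, codes = train_layer(codes, dims[i], dims[i + 1])
            encoders.append(enc)
        deep_features = codes            # would feed a quality regressor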

    Reduced reference image and video quality assessments: review of methods

    With the growing demand for image- and video-based applications, the requirement for consistent quality assessment metrics for images and video has increased. Different approaches have been proposed in the literature to estimate the perceptual quality of images and videos. These approaches can be divided into three main categories: full reference (FR), reduced reference (RR), and no-reference (NR). In RR methods, instead of providing the original image or video as a reference, certain features (e.g., texture, edges) of the original image or video are provided for quality assessment. During the last decade, RR-based quality assessment has been a popular research area for a variety of applications such as social media, online games, and video streaming. In this paper, we present a review and classification of the latest research work on RR-based image and video quality assessment. We also summarize the different databases used in the field of 2D and 3D image and video quality assessment. This paper should help specialists and researchers stay well informed about recent progress in RR-based image and video quality assessment. The review and classification presented here will also be useful for gaining an understanding of multimedia quality assessment and the state-of-the-art approaches used for the analysis. In addition, it will help the reader select appropriate quality assessment methods and parameters for their respective applications.
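    The RR idea, transmitting a compact feature signature of the reference instead of the reference itself, can be sketched in a few lines. The example below is a hedged illustration rather than any published metric: it uses a normalised gradient-magnitude histogram as the signature and scores a distorted image by feature distance.

        import numpy as np

        def rr_features(img, bins=16):
            """Compact reduced-reference signature: a normalised histogram of
            gradient magnitudes, standing in for texture/edge statistics."""
            gy, gx = np.gradient(img.astype(np.float64))
            mag = np.hypot(gx, gy)
            hist, _ = np.histogram(mag, bins=bins, range=(0.0, 255.0), density=True)
            return hist

        def rr_quality(ref_feats, dist_img, bins=16):
            """Quality score from feature distance alone; the full reference
            image is never needed on the receiver side."""
            d = np.abs(ref_feats - rr_features(dist_img, bins)).sum()
            return 1.0 / (1.0 + d)  # higher = closer to the reference statistics

        # Sender side: extract and transmit only `bins` numbers per image.
        ref = np.random.rand(64, 64) * 255
        feats = rr_features(ref)
        # Receiver side: score the possibly degraded image against the signature.
        noisy = np.clip(ref + np.random.normal(0, 20, ref.shape), 0, 255)
        print(rr_quality(feats, noisy))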

    Cubic-panorama image dataset analysis for storage and transmission

    Quality of Experience in Immersive Video Technologies

    Over the last decades, several technological revolutions have impacted the television industry, such as the shifts from black & white to color and from standard to high definition. Nevertheless, considerable further improvements can still be achieved to provide a better multimedia experience, for example with ultra-high definition, high dynamic range & wide color gamut, or 3D. These so-called immersive technologies aim at providing better, more realistic, and emotionally stronger experiences. To measure quality of experience (QoE), subjective evaluation is the ultimate means, since it relies on a pool of human subjects. However, reliable and meaningful results can only be obtained if experiments are properly designed and conducted following a strict methodology. In this thesis, we build a rigorous framework for subjective evaluation of new types of image and video content. We propose different procedures and analysis tools for measuring QoE in immersive technologies. As immersive technologies capture more information than conventional technologies, they have the ability to provide more details, enhanced depth perception, as well as better color, contrast, and brightness. To measure the impact of immersive technologies on the viewers' QoE, we apply the proposed framework for designing experiments and analyzing collected subjects' ratings. We also analyze eye movements to study human visual attention during immersive content playback. Since immersive content carries more information than conventional content, efficient compression algorithms are needed for storage and transmission using existing infrastructures. To determine the bandwidth required for high-quality transmission of immersive content, we use the proposed framework to conduct meticulous evaluations of recent image and video codecs in the context of immersive technologies. Subjective evaluation is time consuming, expensive, and not always feasible. Consequently, researchers have developed objective metrics to automatically predict quality. To measure the performance of objective metrics in assessing immersive content quality, we perform several in-depth benchmarks of state-of-the-art and commonly used objective metrics. For this aim, we use ground-truth quality scores, which are collected under our subjective evaluation framework. To improve QoE, we propose different systems for stereoscopic and autostereoscopic 3D displays in particular. The proposed systems can help reduce the artifacts generated at the visualization stage, which impact picture quality, depth quality, and visual comfort. To demonstrate the effectiveness of these systems, we use the proposed framework to measure viewers' preference between these systems and standard 2D & 3D modes. In summary, this thesis tackles the problems of measuring, predicting, and improving QoE in immersive technologies. To address these problems, we build a rigorous framework and apply it through several in-depth investigations. We put essential concepts of multimedia QoE under this framework. These concepts are not only of fundamental nature, but have also shown their impact in very practical applications. In particular, the JPEG, MPEG, and VCEG standardization bodies have adopted these concepts to select technologies proposed for standardization and to validate the resulting standards in terms of compression efficiency.
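    Two recurring building blocks mentioned above, computing mean opinion scores with confidence intervals from subjective ratings and benchmarking an objective metric against them, can be sketched as follows. This reflects standard subjective-testing practice, not the thesis's exact framework, and the numbers are toy data.

        import numpy as np
        from scipy import stats

        def mos_with_ci(ratings, confidence=0.95):
            """Mean opinion score and its confidence half-interval for one
            stimulus, using the t-distribution as in common practice."""
            r = np.asarray(ratings, dtype=np.float64)
            half = stats.t.ppf((1 + confidence) / 2, len(r) - 1) * stats.sem(r)
            return r.mean(), half

        print(mos_with_ci([4, 5, 4, 3, 5, 4]))  # e.g. (4.17, 0.79)

        # Benchmark a metric: correlate its predictions with ground-truth MOS.
        mos = np.array([4.2, 3.1, 2.5, 1.8, 4.6])
        pred = np.array([0.91, 0.75, 0.55, 0.40, 0.93])
        plcc = stats.pearsonr(pred, mos)[0]    # prediction accuracy
        srocc = stats.spearmanr(pred, mos)[0]  # prediction monotonicity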

    On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator

    Deployed image classification pipelines typically depend on images captured in real-world environments, which means the images might be affected by different sources of perturbation (e.g., sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks, and it has therefore attracted wide interest within the computer vision community. We propose a transformation step that attempts to enhance the generalization ability of CNN models in the presence of noise that is unseen during training but present in the test set. Concretely, the delineation maps of given images are computed using the CORF push-pull inhibition operator. This operation transforms an input image into a representation that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion-MNIST dataset with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, but it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
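    The sketch below is a toy approximation in the spirit of push-pull inhibition, not an implementation of the CORF operator (which combines oriented filter responses): an excitatory centre-surround response is suppressed by a rectified opposite-polarity response with a wider receptive field, attenuating noise while preserving genuine edges. Parameter names and values are illustrative.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def push_pull_response(img, sigma=1.0, alpha=0.8, widen=2.0):
            """Toy push-pull inhibition (NOT the CORF operator): the push
            channel is a difference-of-Gaussians (DoG) response; the pull
            channel is the rectified opposite-polarity response of a wider
            DoG, which suppresses the push response where noise fires both."""
            dog = gaussian_filter(img, sigma) - gaussian_filter(img, 2.0 * sigma)
            dog_wide = (gaussian_filter(img, widen * sigma)
                        - gaussian_filter(img, 2.0 * widen * sigma))
            push = np.maximum(dog, 0.0)        # preferred-polarity response
            pull = np.maximum(-dog_wide, 0.0)  # opposite polarity, wider field
            return np.maximum(push - alpha * pull, 0.0)

        # The delineation map replaces the raw image at the CNN input.
        img = np.random.rand(64, 64)
        delineation = push_pull_response(img)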

    3D-in-2D Displays for ATC.

    This paper reports on the efforts and accomplishments of the 3D-in-2D Displays for ATC project at the end of Year 1. We describe the invention of 10 novel 3D/2D visualisations, most of which were implemented using the ARToolkit augmented-reality library. These prototype implementations of visualisation and interaction elements can be viewed in the accompanying video. We have identified six candidate design concepts that we will research and develop further. These designs correspond to the early feasibility-studies stage of maturity as defined by the NASA Technology Readiness Level framework. We developed the Combination Display Framework from a review of the literature, and used it to analyse display designs in terms of the display techniques used and how they are combined. The insights gained from this framework then guided our inventions and the human-centred innovation process we use to invent iteratively. Our designs are based on an understanding of user work practices. We also developed a simple ATC simulator that we used for rapid experimentation and evaluation of design ideas. If the project continues, we expect the effort in Years 2 and 3 to focus on maturing the concepts and deploying them in operational laboratory settings.

    Estimating Head Measurements from 3D Point Clouds

    Human head measurements are valuable in ergonomics, acoustics, medicine, computer vision, and computer graphics, among other fields. Such measurements are usually obtained through entirely or partially manual procedures, which is cumbersome since the accuracy depends on the expertise of the person taking the measurements. Moreover, manually acquired measurements contain less information from which new measurements can be deduced once the subject is no longer accessible. To overcome these disadvantages, an approach to automatically estimate measurements from 3D point clouds, which are long-term representations of humans, has been developed and is described in this manuscript. The 3D point clouds were acquired using an Asus Xtion Pro Live RGB-D sensor and KinFu (the open-source implementation of KinectFusion). Qualitative and quantitative evaluations of the estimated measurements are presented. Furthermore, the feasibility of the developed approach was evaluated through a case study in which the estimated measurements were used to appraise the influence of anthropometric data on the computation of the interaural time difference. Considering the promising results obtained from estimating measurements from 3D models acquired with the Asus Xtion Pro Live sensor and KinFu (together with results reported in the literature) and the development of new RGB-D sensors, a study of the influence of seven different RGB-D sensors on the reconstruction obtained with KinFu is also presented. This study contains qualitative and quantitative evaluations of reconstructions of four diverse objects captured at distances ranging from 40 cm to 120 cm; this range was established according to the operational range of the sensors. Furthermore, a collection of the obtained reconstructions is available as a dataset at http://uni-tuebingen.de/en/138898
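    As a hedged illustration of measurement extraction from point clouds (not the thesis's method), the sketch below PCA-aligns a cloud and reports its extents along the principal axes; real anthropometric measurements require landmark detection rather than bounding extents, and all names here are illustrative.

        import numpy as np

        def head_extents(points):
            """Toy measurement extraction: centre the cloud, rotate it onto
            its principal axes (via SVD of the centred points), and report
            the extent along each axis. Axes are ordered by decreasing
            variance, so which extent is length/breadth/height depends on
            the head's shape and pose."""
            pts = np.asarray(points, dtype=np.float64)
            centered = pts - pts.mean(axis=0)
            _, _, vt = np.linalg.svd(centered, full_matrices=False)
            aligned = centered @ vt.T  # project onto principal axes
            return aligned.max(axis=0) - aligned.min(axis=0)

        # Points as loaded from a KinFu reconstruction (N x 3, in metres or cm).
        cloud = np.random.rand(1000, 3)
        print(head_extents(cloud))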