80 research outputs found

    Metrics for Stereoscopic Image Compression

    Metrics for automatically predicting the compression settings for stereoscopic images, so as to minimize file size while still maintaining an acceptable level of image quality, are investigated. This research evaluates whether symmetric or asymmetric compression produces a better quality of stereoscopic image. Initially, how Peak Signal to Noise Ratio (PSNR) measures the quality of varyingly compressed stereoscopic image pairs was investigated. Two trials with human subjects, following the ITU-R BT.500-11 Double Stimulus Continuous Quality Scale (DSCQS), were undertaken to measure the quality of symmetric and asymmetric stereoscopic image compression. Computational models of the Human Visual System (HVS) were then investigated and a new stereoscopic image quality metric designed and implemented. The metric point-matches regions of high spatial frequency between the left and right views of the stereo pair and accounts for HVS sensitivity to contrast and luminance changes in these regions. The PSNR results show that symmetric, as opposed to asymmetric, stereo image compression produces significantly better results. The human factors trial suggested that, in general, symmetric compression of stereoscopic images should be used. The new metric, Stereo Band Limited Contrast, has been demonstrated to be a better predictor of human image quality preference than PSNR and can be used to predict a perceptual threshold level for stereoscopic image compression. The threshold is the maximum compression that can be applied without the perceived image quality being altered. Overall, it is concluded that symmetric, as opposed to asymmetric, stereo image encoding should be used for stereoscopic image compression. As PSNR measures of image quality are correctly criticized for correlating poorly with perceived visual quality, the new HVS-based metric was developed. This metric produces a useful threshold that provides a practical starting point for deciding the level of compression to use
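As a rough illustration of the PSNR baseline mentioned in this abstract, the sketch below computes PSNR for a single view and for a stereo pair. The pooling rule (averaging the two per-view mean squared errors before converting to decibels) and the names `mse`, `psnr`, and `stereo_psnr` are assumptions made for illustration; the abstract does not say how the thesis combined the left and right views.

```python
import math


def mse(reference, distorted):
    """Mean squared error between two equally sized grayscale images (lists of rows)."""
    total = 0
    count = 0
    for row_ref, row_dist in zip(reference, distorted):
        for p_ref, p_dist in zip(row_ref, row_dist):
            total += (p_ref - p_dist) ** 2
            count += 1
    return total / count


def psnr_from_mse(err, peak=255):
    """Convert a mean squared error to PSNR in dB; identical images give infinity."""
    if err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / err)


def psnr(reference, distorted, peak=255):
    """PSNR of one view against its uncompressed reference."""
    return psnr_from_mse(mse(reference, distorted), peak)


def stereo_psnr(left_ref, left_dist, right_ref, right_dist, peak=255):
    """PSNR of a stereo pair, pooling the two per-view MSEs (assumed pooling rule)."""
    pooled = (mse(left_ref, left_dist) + mse(right_ref, right_dist)) / 2
    return psnr_from_mse(pooled, peak)


if __name__ == "__main__":
    ref = [[0, 0], [0, 0]]
    dist = [[10, 10], [10, 10]]
    # Asymmetric case: left view untouched, right view distorted.
    print(stereo_psnr(ref, ref, ref, dist))
```

Under this pooling rule an asymmetric pair with one untouched view scores higher than a pair in which both views carry the same distortion, which is one reason PSNR alone cannot settle the symmetric versus asymmetric question.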

    Compression and Subjective Quality Assessment of 3D Video

    In recent years, three-dimensional television (3D TV) has been broadly considered as the successor to the existing traditional two-dimensional television (2D TV) sets. With its capability of offering a dynamic and immersive experience, 3D video (3DV) is expected to expand conventional video in several applications in the near future. However, 3D content requires more than a single view to deliver the depth sensation to the viewers and this, inevitably, increases the bitrate compared to the corresponding 2D content. This need drives the research trend in the video compression field towards more advanced and more efficient algorithms. Currently, Advanced Video Coding (H.264/AVC) is the state-of-the-art video coding standard, which has been developed by the Joint Video Team of ISO/IEC MPEG and ITU-T VCEG. This codec has been widely adopted in various applications and products such as TV broadcasting, video conferencing, mobile TV, and Blu-ray Disc. One important extension of H.264/AVC, namely Multiview Video Coding (MVC), was an attempt at multiple-view compression that takes into consideration the inter-view dependency between different views of the same scene. H.264/AVC with its MVC extension (H.264/MVC) can be used for encoding either conventional stereoscopic video, including only two views, or multiview video, including more than two views. In spite of the high performance of H.264/MVC, a typical multiview video sequence requires a huge amount of storage space, which is proportional to the number of offered views. The available views are still limited, and research has been devoted to synthesizing an arbitrary number of views using the multiview video and depth map (MVD) format. This process is mandatory for auto-stereoscopic displays (ASDs), where many views are required at the viewer side and there is no way to transmit such a relatively huge number of views with currently available broadcasting technology. 
Therefore, to satisfy the growing hunger for 3D related applications, it is mandatory to further decrease the bitstream by introducing new and more efficient algorithms for compressing multiview video and depth maps. This thesis tackles the 3D content compression targeting different formats i.e. stereoscopic video and depth-enhanced multiview video. Stereoscopic video compression algorithms introduced in this thesis mostly focus on proposing different types of asymmetry between the left and right views. This means reducing the quality of one view compared to the other view aiming to achieve a better subjective quality against the symmetric case (the reference) and under the same bitrate constraint. The proposed algorithms to optimize depth-enhanced multiview video compression include both texture compression schemes as well as depth map coding tools. Some of the introduced coding schemes proposed for this format include asymmetric quality between the views. Knowing that objective metrics are not able to accurately estimate the subjective quality of stereoscopic content, it is suggested to perform subjective quality assessment to evaluate different codecs. Moreover, when the concept of asymmetry is introduced, the Human Visual System (HVS) performs a fusion process which is not completely understood. Therefore, another important aspect of this thesis is conducting several subjective tests and reporting the subjective ratings to evaluate the perceived quality of the proposed coded content against the references. Statistical analysis is carried out in the thesis to assess the validity of the subjective ratings and determine the best performing test cases

    Perceptual Quality-of-Experience of Stereoscopic 3D Images and Videos

    With the fast development of 3D acquisition, communication, processing and display technologies, automatic quality assessment of 3D images and videos has become ever more important. Nevertheless, recent progress on 3D image quality assessment (IQA) and video quality assessment (VQA) remains limited. The purpose of this research is to investigate various aspects of human visual quality-of-experience (QoE) when viewing stereoscopic 3D images/videos and to develop objective quality assessment models that automatically predict visual QoE of 3D images/videos. Firstly, we create a new subjective 3D-IQA database that has two features that are lacking in the literature, i.e., the inclusion of both 2D and 3D images, and the inclusion of mixed distortion types. We observe strong distortion-type-dependent bias when using the direct average of 2D image quality to predict 3D image quality. We propose a binocular-rivalry-inspired multi-scale model to predict the quality of stereoscopic images, and the results show that the proposed model eliminates the prediction bias, leading to significantly improved quality predictions. Secondly, we carry out two subjective studies on depth perception of stereoscopic 3D images. The first one follows a traditional framework where subjects are asked to rate depth quality directly on distorted stereopairs. The second one uses a novel approach, where the stimuli are synthesized independent of the background image content and the subjects are asked to identify depth changes and label the polarities of depth. Our analysis shows that the second approach is much more effective at singling out the contributions of stereo cues in depth perception. We introduce the notion of a depth perception difficulty index (DPDI) and propose a novel computational model for DPDI prediction. The results show that the proposed model leads to highly promising DPDI prediction performance. 
Thirdly, we carry out subjective 3D-VQA experiments on two databases that contain various asymmetrically compressed stereoscopic 3D videos. We then compare different mixed-distortions asymmetric stereoscopic video coding schemes with symmetric coding methods and verify their potential coding gains. We propose a model to account for the prediction bias from using direct averaging of 2D video quality to predict 3D video quality. The results show that the proposed model leads to significantly improved quality predictions and can help us predict the coding gain of mixed-distortions asymmetric video compression. Fourthly, we investigate the problem of objective quality assessment of Multi-view-plus-depth (MVD) images, with a main focus on the pre-depth-image-based-rendering (pre-DIBR) case. We find that existing IQA methods, when applied post-DIBR, are difficult to employ as a guiding criterion in the optimization of MVD video coding and transmission systems. We propose a novel pre-DIBR method based on information content weighting of both texture and depth images, which demonstrates competitive performance against state-of-the-art IQA models applied post-DIBR

    Asymmetric 3D video coding based on regions of perceptual relevance

    This dissertation presents a study and experimental research on asymmetric coding of stereoscopic video. A review of 3D technologies, video formats and coding is first presented, and then particular emphasis is given to asymmetric coding of 3D content and to subjective methods for evaluating the performance of asymmetric coding. The research objective was defined to be an extension of the current concept of asymmetric coding for stereo video. To achieve this objective, the first step consists in defining regions in the spatial dimension of the auxiliary view with different perceptual relevance within the stereo pair, which are identified by a binary mask. These regions are then encoded with better quality (lower quantisation) for the most relevant ones and worse quality (higher quantisation) for those with lower perceptual relevance. The actual estimation of the relevance of a given region is based on a measure of disparity according to the absolute difference between views. To allow encoding of a stereo sequence using this method, a reference H.264/MVC encoder (JM) has been modified to allow additional configuration parameters and inputs. The final encoder is still standard compliant. In order to show the viability of the method, subjective assessment tests were performed over a wide range of objective qualities of the auxiliary view. The results of these tests support three main findings. First, it is shown that the proposed method can be more efficient than traditional asymmetric coding when encoding stereo video at higher qualities/rates. The method can also be used to extend the threshold at which uniform asymmetric coding methods start to have an impact on the subjective quality perceived by the observers. Finally, the issue of eye dominance is addressed. Results from stereo still images displayed over a short period of time showed it has little or no impact on the proposed method
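The region-based idea in this abstract can be sketched as follows: a per-block binary mask is derived from the absolute difference between the base and auxiliary views, and each block is then assigned a quantisation parameter. This is a minimal sketch under stated assumptions, not the dissertation's encoder modification: the block size, the threshold, the QP values, the choice to treat high-difference blocks as the more relevant ones, and the function names `block_relevance_mask` and `qp_map` are all hypothetical.

```python
def block_relevance_mask(base, aux, block=2, threshold=10):
    """Per-block binary mask: 1 where the mean |base - aux| exceeds the threshold.

    Assumption: blocks with a large inter-view difference are treated as the
    perceptually relevant ones; the abstract does not state the direction.
    """
    height, width = len(aux), len(aux[0])
    mask = []
    for by in range(0, height, block):
        mask_row = []
        for bx in range(0, width, block):
            diffs = [
                abs(base[y][x] - aux[y][x])
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            ]
            mask_row.append(1 if sum(diffs) / len(diffs) > threshold else 0)
        mask.append(mask_row)
    return mask


def qp_map(mask, qp_relevant=26, qp_other=38):
    """Lower QP (finer quantisation) for relevant blocks, higher QP elsewhere."""
    return [[qp_relevant if flag else qp_other for flag in row] for row in mask]


if __name__ == "__main__":
    base = [[0, 0, 100, 100], [0, 0, 100, 100]]
    aux = [[0, 0, 50, 50], [0, 0, 50, 50]]
    mask = block_relevance_mask(base, aux)
    print(mask, qp_map(mask))
```

In a real encoder the block size would normally match the macroblock size (16×16 in H.264), and the resulting map would be fed to the encoder as an extra input alongside the auxiliary view.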

    Automatic 2D-to-3D conversion of single low depth-of-field images

    This research presents a novel approach to the automatic rendering of 3D stereoscopic disparity image pairs from single 2D low depth-of-field (LDOF) images. Initially a depth map is produced through the assignment of depth to every delineated object and region in the image. Subsequently the left and right disparity images are produced through depth-image-based rendering (DIBR). The objects and regions in the image are initially assigned to one of six proposed groups or labels. Labelling is performed in two stages. The first involves the delineation of the dominant object-of-interest (OOI). The second involves the global object and region grouping of the non-OOI regions. The matting of the OOI is also performed in two stages. Initially the in-focus foreground or region-of-interest (ROI) is separated from the out-of-focus background. This is achieved through the correlation of edge, gradient and higher-order statistics (HOS) saliencies. Refinement of the ROI is performed using k-means segmentation and CIEDE2000 colour-difference matching. Subsequently the OOI is extracted from within the ROI through analysis of the dominant gradients and edge saliencies together with k-means segmentation. Depth is assigned to each of the six labels by correlating Gestalt-based principles with vanishing point estimation, gradient plane approximation and depth from defocus (DfD). To minimise some of the dis-occlusions that are generated through the 3D warping sub-process within the DIBR process, the depth map is pre-smoothed using an asymmetric bilateral filter. Hole-filling of the remaining dis-occlusions is performed through nearest-neighbour horizontal interpolation, which incorporates depth as well as direction of warp. To minimise the effects of the lateral striations, specific directional Gaussian and circular averaging smoothing is applied independently to each view, with additional average filtering applied to the border transitions. 
Each stage of the proposed model is benchmarked against data from several significant publications. Novel contributions are made in the sub-speciality fields of ROI estimation, OOI matting, LDOF image classification, Gestalt-based region categorisation, vanishing point detection, relative depth assignment and hole-filling or inpainting. An important contribution is made towards the overall knowledge base of automatic 2D-to-3D conversion techniques, through the collation of existing information, expansion of existing methods and development of newer concepts
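The warping and hole-filling steps of DIBR described in this abstract can be illustrated on a single scanline. The sketch below is a simplified stand-in, not the thesis's algorithm: it assumes an 8-bit depth map in which larger values are nearer, a linear depth-to-disparity mapping, and a fill rule that copies the nearest valid horizontal neighbour with the smaller depth (the background side). It uses depth only and does not model the direction-of-warp term the abstract mentions. The names `warp_row` and `fill_holes` are hypothetical.

```python
def warp_row(colors, depths, max_disparity=2, direction=1):
    """Shift each pixel horizontally by a disparity proportional to its depth.

    Assumed convention: depth in [0, 255], larger means nearer to the camera.
    When two pixels land on the same target, the nearer one wins (z-buffer).
    Returns the warped row (None marks a dis-occlusion hole) and its depths.
    """
    width = len(colors)
    out = [None] * width
    out_depth = [-1] * width
    for x in range(width):
        shift = direction * round(max_disparity * depths[x] / 255)
        target = x + shift
        if 0 <= target < width and depths[x] > out_depth[target]:
            out[target] = colors[x]
            out_depth[target] = depths[x]
    return out, out_depth


def fill_holes(row, row_depth):
    """Fill each hole from its nearest valid horizontal neighbour.

    Of the left and right candidates, the one with the smaller depth value
    (assumed to be background) is copied into the hole.
    """
    width = len(row)
    filled = list(row)
    for x in range(width):
        if row[x] is not None:
            continue
        left = next((i for i in range(x - 1, -1, -1) if row[i] is not None), None)
        right = next((i for i in range(x + 1, width) if row[i] is not None), None)
        candidates = [i for i in (left, right) if i is not None]
        if candidates:
            source = min(candidates, key=lambda i: row_depth[i])
            filled[x] = row[source]
    return filled


if __name__ == "__main__":
    colors = [10, 10, 200, 200, 10, 10]
    depths = [0, 0, 255, 255, 0, 0]
    warped, warped_depth = warp_row(colors, depths)
    print(warped)
    print(fill_holes(warped, warped_depth))
```

Filling from the background side avoids stretching the foreground object into the exposed region, which is the usual motivation for making hole-filling depth-aware.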

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference seen in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored and retrieved from a pre-attentional store during this task

    Codage de cartes de profondeur par deformation de courbes elastiques

    In multiple-view video plus depth, depth maps can be represented by means of grayscale images and the corresponding temporal sequence can be thought of as a standard grayscale video sequence. However, depth maps have different properties from natural images: they present large areas of smooth surfaces separated by sharp edges. Arguably the most important information lies in object contours; as a consequence, an interesting approach consists in performing a lossless coding of the contour map, possibly followed by a lossy coding of per-object depth values. In this context, we propose a new technique for the lossless coding of object contours, based on the elastic deformation of curves. A continuous evolution of elastic deformations between two reference contour curves can be modelled, and an elastically deformed version of the reference contours can be sent to the decoder with an extremely small coding cost and used as side information to improve the lossless coding of the actual contour. After the main discontinuities have been captured by the contour description, the depth field inside each region is rather smooth. We proposed and tested two different techniques for the coding of the depth field inside each region. The first technique performs the shape-adaptive wavelet transform followed by the shape-adaptive version of SPIHT. The second technique performs a prediction of the depth field from its subsampled version and the set of coded contours. It is generally recognized that a high quality view rendering at the receiver side is possible only by preserving the contour information, since distortions on edges during the encoding step would cause a noticeable degradation of the synthesized view and of the 3D perception. 
We investigated this claim by conducting a subjective quality assessment test to compare an object-based technique and a hybrid block-based technique for the coding of depth maps

    Une méthode pour l'évaluation de la qualité des images 3D stéréoscopiques.

    In a context of ever-growing interest in stereoscopic systems, but where no standardized algorithmic methods of stereoscopic quality assessment exist, our work stands as a step forward in the understanding of the human perception and judgment mechanisms related to the multidimensional concept of stereoscopic image quality. We used a series of tools in order to perform in-depth investigations in this direction: we proposed an adapted framework to structure the process of stereoscopic quality assessment, we implemented a stereoscopic system in our laboratory for performing various tests, we created three stereoscopic datasets with precise structures, and we performed several experimental studies using these datasets. The large amount of experimental data obtained was used to propose a first mathematical framework for explaining the overall percept of stereoscopic quality as a function of the physical parameters of the stereoscopic images under study