271 research outputs found

    Subjective assessment of super multiview video with coding artifacts

    The subjective assessment of super multiview (SMV) video considers two main perceptual factors: image quality and visual comfort at the viewpoint transition. While previous works only covered raw content with high levels of visual comfort, this work goes further by targeting the subjective assessment of SMV content with coding artifacts. The analysis yields important conclusions regarding the relationship between these two factors, indicating that 1) the perceived image quality is independent of the viewpoint change speed, and 2) the perceived visual comfort at the viewpoint transition is independent of the image quality. These conclusions allow existing subjective perception models, designed for raw SMV content, to be extended to coded content.
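
    The reported independence between the two factors would show up as near-zero correlation in the subjective scores. A minimal sketch of that kind of check, using hypothetical mean opinion scores (the numbers and variable names below are illustrative, not from the study):

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical mean opinion scores (MOS) for image quality, collected at
# increasing viewpoint-change speeds; independence shows as |r| near 0.
speeds = [0.5, 1.0, 1.5, 2.0, 2.5]
quality_mos = [4.1, 4.0, 4.2, 4.0, 4.1]

r = pearson(speeds, quality_mos)
print(f"correlation between speed and perceived quality: r = {r:.2f}")
```
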

    OR-NeRF: Object Removing from 3D Scenes Guided by Multiview Segmentation with Neural Radiance Fields

    The emergence of Neural Radiance Fields (NeRF) for novel view synthesis has increased interest in 3D scene editing. An essential editing task is removing objects from a scene while ensuring visual plausibility and multiview consistency. However, current methods face challenges such as time-consuming object labeling, limited capability to remove specific targets, and compromised rendering quality after removal. This paper proposes a novel object-removal pipeline, named OR-NeRF, that can remove objects from 3D scenes given user-supplied points or text prompts on a single view, achieving better performance in less time than previous works. Our method spreads user annotations to all views through 3D geometry and sparse correspondence, ensuring 3D consistency with less processing burden. The recent 2D segmentation model Segment Anything (SAM) is then applied to predict masks, and a 2D inpainting model is used to generate color supervision. Finally, our algorithm applies depth supervision and perceptual loss to maintain consistency in geometry and appearance after object removal. Experimental results demonstrate that our method achieves better editing quality with less time than previous works, both qualitatively and quantitatively. Comment: project site: https://ornerf.github.io/ (codes available)
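
    The final training objective the abstract describes combines three terms: color supervision from the inpainted views, depth supervision, and a perceptual loss. A toy sketch of such a combined objective on plain Python lists (the function name, the feature inputs, and the weights are my own assumptions, not the paper's implementation):

```python
def removal_loss(rendered_rgb, inpainted_rgb,
                 rendered_depth, target_depth,
                 feat_rendered, feat_inpainted,
                 w_depth=0.1, w_perc=0.05):
    """Toy combined objective: color supervision from the 2D inpainting
    result, depth supervision for geometric consistency, and a perceptual
    (feature-space) term. Weights here are assumed, not from the paper."""
    mse = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    l1 = lambda a, b: sum(abs(x - y) for x, y in zip(a, b)) / len(a)
    return (mse(rendered_rgb, inpainted_rgb)
            + w_depth * l1(rendered_depth, target_depth)
            + w_perc * mse(feat_rendered, feat_inpainted))
```

In a real NeRF pipeline each argument would be a tensor batch and the perceptual features would come from a frozen network such as VGG; the scalar version only shows how the three terms are weighted and summed.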

    Analysis and improvement of a virtual image quality model for multiview video systems

    3D video technologies have advanced considerably in recent years, offering an ever-wider range of possibilities on increasingly affordable display platforms. Stereoscopic displays, however, have not fully taken hold in the market or in users' homes; viewing problems such as visual fatigue and the need to wear glasses to enjoy the content are among the causes. As a solution, autostereoscopic video systems, i.e. glasses-free 3D displays, are improving every day, as is the case of Super Multiview Video (SMV). SMV is a 3D multiview video system that lets the user view the scene from different viewpoints. However, this type of video is very costly to produce, both economically and computationally, since a very large number of views must be captured, encoded and transmitted. As a solution to this problem, the synthesis of virtual views from other real views of the same scene is proposed, so that some cameras can be dispensed with and the same content generated while reducing the amount of information to transmit. View synthesis can itself cause viewing problems, however, since artifacts (errors) may appear in the synthesized views and degrade perception. To address these problems, models have been proposed that predict the quality of the synthesized video from scene characteristics such as depth information or focal distance. One of these models is the MVPDM (Multiview Perceptual Disparity Model), which predicts certain objective or subjective quality measures from the perceptual disparity, a parameter that models the user's subjective perception based on the disparity of the scene.
    Building on this model, previous works predict the image quality of virtual views from this perceptual disparity. These quality-prediction models have been tested on a limited amount of data, obtaining good results. The goal of this Bachelor's thesis (TFG) is to enlarge the model's training data, obtaining a greater quantity and variety of data to see how the model generalizes. First, we augmented the data in space and time and visualized the variation of PSNR and perceptual disparity in each case. We then carried out an analysis similar to that of the previous work on the new training data and compared the results with those of the earlier work, observing similar results and concluding that the model generalizes to a larger and more varied dataset.
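
    A quality predictor of the kind described, mapping perceptual disparity to an objective measure such as PSNR, can be sketched as an ordinary least-squares fit. The data points and the linear form below are illustrative assumptions; the MVPDM itself may use a different functional form:

```python
def fit_linear(x, y):
    """Ordinary least-squares fit of y ≈ a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    b = my - a * mx
    return a, b

# Hypothetical training pairs: perceptual disparity -> synthesized-view PSNR (dB).
# Higher disparity typically makes synthesis harder, so quality drops.
disparity = [2.0, 4.0, 6.0, 8.0]
psnr = [38.0, 36.5, 35.0, 33.5]

a, b = fit_linear(disparity, psnr)
print(f"predicted PSNR at disparity 5.0: {a * 5.0 + b:.2f} dB")
```

Enlarging the training set, as the thesis does, amounts to refitting (a, b) on more varied pairs and checking that the fit, and its prediction error, stays stable.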

    Depth map compression via 3D region-based representation

    In 3D video, view synthesis is used to create new virtual views between encoded camera views. Errors in the coding of the depth maps introduce geometry inconsistencies in synthesized views. In this paper, a new 3D plane representation of the scene is presented which improves the performance of current standard video codecs in the view synthesis domain. Two image segmentation algorithms are proposed for generating a color and depth segmentation. Using both partitions, depth maps are segmented into regions without sharp discontinuities without having to explicitly signal all depth edges. The resulting regions are represented using a planar model in the 3D world scene. This 3D representation allows an efficient encoding while preserving the 3D characteristics of the scene. The 3D planes open up the possibility to code multiview images with a unique representation.
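
    The planar model for a segmented region can be sketched as a least-squares fit of z = a·x + b·y + c over the region's (x, y, depth) samples; only the three coefficients would then need to be coded per region. The function name and solver choice below are my own, not the paper's:

```python
def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) samples,
    via the 3x3 normal equations solved by Gaussian elimination."""
    # Accumulate the augmented normal-equation matrix [A^T A | A^T z]
    M = [[0.0] * 4 for _ in range(3)]
    for x, y, z in points:
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                M[i][j] += row[i] * row[j]
            M[i][3] += row[i] * z
    # Forward elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for j in range(col, 4):
                M[r][j] -= f * M[col][j]
    # Back-substitution
    m = [0.0] * 3
    for i in (2, 1, 0):
        m[i] = (M[i][3] - sum(M[i][j] * m[j] for j in range(i + 1, 3))) / M[i][i]
    return tuple(m)  # (a, b, c)

# Samples lying exactly on z = 2x + 3y + 1 recover those coefficients.
a, b, c = fit_plane([(0.0, 0.0, 1.0), (1.0, 0.0, 3.0),
                     (0.0, 1.0, 4.0), (1.0, 1.0, 6.0)])
```
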

    Objective quality metric for 3D virtual views

    In the free-viewpoint television (FTV) framework, due to hardware and bandwidth constraints, only a limited number of viewpoints are generally captured, coded and transmitted; therefore, a large number of views needs to be synthesized at the receiver to grant a truly immersive 3D experience. It is thus evident that estimating the quality of the synthesized views is of paramount importance. Moreover, quality assessment of the synthesized view is very challenging since the corresponding original views are generally not available either on the encoder side (not captured) or the decoder side (not transmitted). To tackle these issues, this paper presents an algorithm to estimate the quality of the synthesized images in the absence of the corresponding reference images. The algorithm is based upon the cyclopean eye theory. The statistical characteristics of an estimated cyclopean image are compared with the synthesized image to measure its quality. The prediction accuracy and reliability of the proposed technique are tested on a standard video dataset compressed with HEVC, showing excellent correlation with state-of-the-art full-reference image and video quality metrics. Index Terms: Quality assessment, depth image based rendering, view synthesis, FTV, HEVC
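
    Comparing "statistical characteristics" of the estimated cyclopean image against the synthesized view can be illustrated with a simple block-wise mean/variance distance; this is only a toy stand-in for the paper's metric, and the function name and weighting are assumptions:

```python
def stat_distance(cyclopean_blocks, synthesized_blocks):
    """Toy no-reference-style score: per-block mean and standard-deviation
    distance between an estimated cyclopean image and a synthesized view.
    Lower score suggests higher predicted quality. Purely illustrative."""
    def stats(block):
        m = sum(block) / len(block)
        v = sum((p - m) ** 2 for p in block) / len(block)
        return m, v
    d = 0.0
    for cb, sb in zip(cyclopean_blocks, synthesized_blocks):
        (mc, vc), (ms, vs) = stats(cb), stats(sb)
        d += abs(mc - ms) + abs(vc - vs) ** 0.5
    return d / len(cyclopean_blocks)
```

In practice the blocks would be pixel patches of the two images; the key idea shown is that no pristine reference view is needed, only the cyclopean estimate built at the receiver.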

    Dense light field coding: a survey

    Light Field (LF) imaging is a promising solution for providing more immersive and closer to reality multimedia experiences to end-users with unprecedented creative freedom and flexibility for applications in different areas, such as virtual and augmented reality. Due to the recent technological advances in optics, sensor manufacturing and available transmission bandwidth, as well as the investment of many tech giants in this area, it is expected that soon many LF transmission systems will be available to both consumers and professionals. Recognizing this, novel standardization initiatives have recently emerged in both the Joint Photographic Experts Group (JPEG) and the Moving Picture Experts Group (MPEG), triggering the discussion on the deployment of LF coding solutions to efficiently handle the massive amount of data involved in such systems. Since then, the topic of LF content coding has become a booming research area, attracting the attention of many researchers worldwide. In this context, this paper provides a comprehensive survey of the most relevant LF coding solutions proposed in the literature, focusing on angularly dense LFs. Special attention is placed on a thorough description of the different LF coding methods and on the main concepts related to this relevant area. Moreover, comprehensive insights are presented into open research challenges and future research directions for LF coding.