855 research outputs found

    Quality Evaluation and Nonuniform Compression of Geometrically Distorted Images Using the Quadtree Distortion Map

    Get PDF
    The paper presents an analysis of the effects of lossy compression algorithms applied to images affected by geometrical distortion. It will be shown that the encoding-decoding process results in a nonhomogeneous image degradation in the geometrically corrected image, due to the different amount of information associated to each pixel. A distortion measure named quadtree distortion map (QDM) able to quantify this aspect is proposed. Furthermore, QDM is exploited to achieve adaptive compression of geometrically distorted pictures, in order to ensure a uniform quality on the final image. Tests are performed using JPEG and JPEG2000 coding standards in order to quantitatively and qualitatively assess the performance of the proposed method

    Learning geometric and lighting priors from natural images

    Get PDF
    Comprendre les images est d’une importance cruciale pour une pléthore de tâches, de la composition numérique au ré-éclairage d’une image, en passant par la reconstruction 3D d’objets. Ces tâches permettent aux artistes visuels de réaliser des chef-d’oeuvres ou d’aider des opérateurs à prendre des décisions de façon sécuritaire en fonction de stimulis visuels. Pour beaucoup de ces tâches, les modèles physiques et géométriques que la communauté scientifique a développés donnent lieu à des problèmes mal posés possédant plusieurs solutions, dont généralement une seule est raisonnable. Pour résoudre ces indéterminations, le raisonnement sur le contexte visuel et sémantique d’une scène est habituellement relayé à un artiste ou un expert qui emploie son expérience pour réaliser son travail. Ceci est dû au fait qu’il est généralement nécessaire de raisonner sur la scène de façon globale afin d’obtenir des résultats plausibles et appréciables. Serait-il possible de modéliser l’expérience à partir de données visuelles et d’automatiser en partie ou en totalité ces tâches ? Le sujet de cette thèse est celui-ci : la modélisation d’a priori par apprentissage automatique profond pour permettre la résolution de problèmes typiquement mal posés. Plus spécifiquement, nous couvrirons trois axes de recherche, soient : 1) la reconstruction de surface par photométrie, 2) l’estimation d’illumination extérieure à partir d’une seule image et 3) l’estimation de calibration de caméra à partir d’une seule image avec un contenu générique. Ces trois sujets seront abordés avec une perspective axée sur les données. Chacun de ces axes comporte des analyses de performance approfondies et, malgré la réputation d’opacité des algorithmes d’apprentissage machine profonds, nous proposons des études sur les indices visuels captés par nos méthodes.Understanding images is needed for a plethora of tasks, from compositing to image relighting, including 3D object reconstruction. These tasks allow artists to realize masterpieces or help operators to safely make decisions based on visual stimuli. For many of these tasks, the physical and geometric models that the scientific community has developed give rise to ill-posed problems with several solutions, only one of which is generally reasonable. To resolve these indeterminations, the reasoning about the visual and semantic context of a scene is usually relayed to an artist or an expert who uses his experience to carry out his work. This is because humans are able to reason globally on the scene in order to obtain plausible and appreciable results. Would it be possible to model this experience from visual data and partly or totally automate tasks? This is the topic of this thesis: modeling priors using deep machine learning to solve typically ill-posed problems. More specifically, we will cover three research axes: 1) surface reconstruction using photometric cues, 2) outdoor illumination estimation from a single image and 3) camera calibration estimation from a single image with generic content. These three topics will be addressed from a data-driven perspective. Each of these axes includes in-depth performance analyses and, despite the reputation of opacity of deep machine learning algorithms, we offer studies on the visual cues captured by our methods

    Angular variation as a monocular cue for spatial percepcion

    Get PDF
    Monocular cues are spatial sensory inputs which are picked up exclusively from one eye. They are in majority static features that provide depth information and are extensively used in graphic art to create realistic representations of a scene. Since the spatial information contained in these cues is picked up from the retinal image, the existence of a link between it and the theory of direct perception can be conveniently assumed. According to this theory, spatial information of an environment is directly contained in the optic array. Thus, this assumption makes possible the modeling of visual perception processes through computational approaches. In this thesis, angular variation is considered as a monocular cue, and the concept of direct perception is adopted by a computer vision approach that considers it as a suitable principle from which innovative techniques to calculate spatial information can be developed. The expected spatial information to be obtained from this monocular cue is the position and orientation of an object with respect to the observer, which in computer vision is a well known field of research called 2D-3D pose estimation. In this thesis, the attempt to establish the angular variation as a monocular cue and thus the achievement of a computational approach to direct perception is carried out by the development of a set of pose estimation methods. Parting from conventional strategies to solve the pose estimation problem, a first approach imposes constraint equations to relate object and image features. In this sense, two algorithms based on a simple line rotation motion analysis were developed. These algorithms successfully provide pose information; however, they depend strongly on scene data conditions. To overcome this limitation, a second approach inspired in the biological processes performed by the human visual system was developed. It is based in the proper content of the image and defines a computational approach to direct perception. The set of developed algorithms analyzes the visual properties provided by angular variations. The aim is to gather valuable data from which spatial information can be obtained and used to emulate a visual perception process by establishing a 2D-3D metric relation. Since it is considered fundamental in the visual-motor coordination and consequently essential to interact with the environment, a significant cognitive effect is produced by the application of the developed computational approach in environments mediated by technology. In this work, this cognitive effect is demonstrated by an experimental study where a number of participants were asked to complete an action-perception task. The main purpose of the study was to analyze the visual guided behavior in teleoperation and the cognitive effect caused by the addition of 3D information. The results presented a significant influence of the 3D aid in the skill improvement, which showed an enhancement of the sense of presence.Las señales monoculares son entradas sensoriales capturadas exclusivamente por un solo ojo que ayudan a la percepción de distancia o espacio. Son en su mayoría características estáticas que proveen información de profundidad y son muy utilizadas en arte gráfico para crear apariencias reales de una escena. Dado que la información espacial contenida en dichas señales son extraídas de la retina, la existencia de una relación entre esta extracción de información y la teoría de percepción directa puede ser convenientemente asumida. De acuerdo a esta teoría, la información espacial de todo le que vemos está directamente contenido en el arreglo óptico. Por lo tanto, esta suposición hace posible el modelado de procesos de percepción visual a través de enfoques computacionales. En esta tesis doctoral, la variación angular es considerada como una señal monocular, y el concepto de percepción directa adoptado por un enfoque basado en algoritmos de visión por computador que lo consideran un principio apropiado para el desarrollo de nuevas técnicas de cálculo de información espacial. La información espacial esperada a obtener de esta señal monocular es la posición y orientación de un objeto con respecto al observador, lo cual en visión por computador es un conocido campo de investigación llamado estimación de la pose 2D-3D. En esta tesis doctoral, establecer la variación angular como señal monocular y conseguir un modelo matemático que describa la percepción directa, se lleva a cabo mediante el desarrollo de un grupo de métodos de estimación de la pose. Partiendo de estrategias convencionales, un primer enfoque implanta restricciones geométricas en ecuaciones para relacionar características del objeto y la imagen. En este caso, dos algoritmos basados en el análisis de movimientos de rotación de una línea recta fueron desarrollados. Estos algoritmos exitosamente proveen información de la pose. Sin embargo, dependen fuertemente de condiciones de la escena. Para superar esta limitación, un segundo enfoque inspirado en los procesos biológicos ejecutados por el sistema visual humano fue desarrollado. Está basado en el propio contenido de la imagen y define un enfoque computacional a la percepción directa. El grupo de algoritmos desarrollados analiza las propiedades visuales suministradas por variaciones angulares. El propósito principal es el de reunir datos de importancia con los cuales la información espacial pueda ser obtenida y utilizada para emular procesos de percepción visual mediante el establecimiento de relaciones métricas 2D- 3D. Debido a que dicha relación es considerada fundamental en la coordinación visuomotora y consecuentemente esencial para interactuar con lo que nos rodea, un efecto cognitivo significativo puede ser producido por la aplicación de métodos de L estimación de pose en entornos mediados tecnológicamente. En esta tesis doctoral, este efecto cognitivo ha sido demostrado por un estudio experimental en el cual un número de participantes fueron invitados a ejecutar una tarea de acción-percepción. El propósito principal de este estudio fue el análisis de la conducta guiada visualmente en teleoperación y el efecto cognitivo causado por la inclusión de información 3D. Los resultados han presentado una influencia notable de la ayuda 3D en la mejora de la habilidad, así como un aumento de la sensación de presencia

    Geometry-Aware Latent Representation Learning for Modeling Disease Progression of Barrett's Esophagus

    Full text link
    Barrett's Esophagus (BE) is the only precursor known to Esophageal Adenocarcinoma (EAC), a type of esophageal cancer with poor prognosis upon diagnosis. Therefore, diagnosing BE is crucial in preventing and treating esophageal cancer. While supervised machine learning supports BE diagnosis, high interobserver variability in histopathological training data limits these methods. Unsupervised representation learning via Variational Autoencoders (VAEs) shows promise, as they map input data to a lower-dimensional manifold with only useful features, characterizing BE progression for improved downstream tasks and insights. However, the VAE's Euclidean latent space distorts point relationships, hindering disease progression modeling. Geometric VAEs provide additional geometric structure to the latent space, with RHVAE assuming a Riemannian manifold and S\mathcal{S}-VAE a hyperspherical manifold. Our study shows that S\mathcal{S}-VAE outperforms vanilla VAE with better reconstruction losses, representation classification accuracies, and higher-quality generated images and interpolations in lower-dimensional settings. By disentangling rotation information from the latent space, we improve results further using a group-based architecture. Additionally, we take initial steps towards S\mathcal{S}-AE, a novel autoencoder model generating qualitative images without a variational framework, but retaining benefits of autoencoders such as stability and reconstruction quality
    • …
    corecore