176 research outputs found

    Characterization of Energy and Performance Bottlenecks in an Omni-directional Camera System

    Get PDF
    abstract: Generating real-world content for VR is challenging in terms of capturing and processing at high resolution and high frame rates. The content needs to represent a truly immersive experience, where the user can look around in a 360-degree view and perceive the depth of the scene. Existing solutions capture locally and offload the compute load to a server, but offloading large amounts of raw camera feeds incurs long latencies and poses difficulties for real-time applications. By capturing and computing on the edge, we can closely integrate the systems and optimize for low latency. However, moving traditional stitching algorithms to a battery-constrained device requires at least a three-orders-of-magnitude reduction in power. We believe that close integration of the capture and compute stages will reduce overall system power. We approach the problem by building a hardware prototype and characterizing the end-to-end system bottlenecks in power and performance. The prototype has 6 IMX274 cameras and uses an Nvidia Jetson TX2 development board for capture and computation. We found that capture is bottlenecked by sensor power and data rates across interfaces, whereas compute is limited by the total number of computations per frame. Our characterization shows that redundant capture and redundant computation lead to high power, a huge memory footprint, and high latency. Existing systems lack hardware-software co-design, leading to excessive data transfers across the interfaces and expensive computations within the individual subsystems. Finally, we propose mechanisms to optimize the system for low power and low latency. We emphasize the importance of co-designing the different subsystems to reduce and reuse data. For example, reusing the motion vectors of the ISP stage reduces the memory footprint of the stereo correspondence stage. Our estimates show that pipelining and parallelization on a custom FPGA can achieve real-time stitching.
    Dissertation/Thesis (Prototype). Masters Thesis, Electrical Engineering, 201
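    As a rough illustration of the data-rate bottleneck discussed above, the following back-of-the-envelope Python sketch estimates the aggregate raw bandwidth of a six-camera rig; the resolution, frame rate, and bit depth are our own illustrative assumptions, not figures from the thesis.

        # Back-of-the-envelope aggregate raw data rate for a six-camera rig.
        # Assumed values: IMX274-class sensors at 3840x2160, 30 fps, 10-bit raw
        # (illustrative, not from the thesis).
        NUM_CAMERAS = 6
        WIDTH, HEIGHT = 3840, 2160
        FPS = 30
        BITS_PER_PIXEL = 10

        bits_per_second = NUM_CAMERAS * WIDTH * HEIGHT * FPS * BITS_PER_PIXEL
        print(f"Aggregate raw feed: {bits_per_second / 1e9:.1f} Gbit/s")  # ~14.9 Gbit/s

    At rates like this, every extra copy across an interface costs real power, which is why the thesis argues for reducing and reusing data between subsystems.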

    Recurrent Scene Parsing with Perspective Understanding in the Loop

    Full text link
    Objects may appear at arbitrary scales in perspective images of a scene, posing a challenge for recognition systems that process images at a fixed resolution. We propose a depth-aware gating module that adaptively selects the pooling field size in a convolutional network architecture according to the object scale (inversely proportional to the depth), so that small details are preserved for distant objects while larger receptive fields are used for those nearby. The depth gating signal is provided by stereo disparity or estimated directly from monocular input. We integrate this depth-aware gating into a recurrent convolutional neural network to perform semantic segmentation. Our recurrent module iteratively refines the segmentation results, leveraging the depth and semantic predictions from the previous iterations. Through extensive experiments on four popular large-scale RGB-D datasets, we demonstrate that this approach achieves competitive semantic segmentation performance with a model that is substantially more compact. We carry out extensive analysis of this architecture, including variants that operate on monocular RGB but use depth as side-information during training, unsupervised gating as a generic attentional mechanism, and multi-resolution gating. We find that gated pooling for joint semantic segmentation and depth yields state-of-the-art results for quantitative monocular depth estimation.
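    A minimal PyTorch sketch may make the gating idea concrete. The branch count, pooling sizes, and soft per-pixel gating below are our own illustrative choices, not the paper's exact architecture: features are pooled at several field sizes, and a gate computed from the depth map weights the branches per pixel.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class DepthAwareGatedPooling(nn.Module):
            """Pool features at several field sizes; a gate predicted from the
            depth map softly selects the field size per pixel (larger fields
            for nearby objects, smaller fields for distant ones)."""
            def __init__(self, channels, pool_sizes=(1, 3, 5, 7)):
                super().__init__()
                self.pool_sizes = pool_sizes
                # 1x1 conv maps the 1-channel depth map to one logit per branch.
                self.gate = nn.Conv2d(1, len(pool_sizes), kernel_size=1)

            def forward(self, features, depth):
                # One pooled version of the features per candidate field size.
                branches = [
                    F.avg_pool2d(features, k, stride=1, padding=k // 2)
                    for k in self.pool_sizes
                ]
                # Per-pixel soft selection among branches, driven by depth.
                weights = torch.softmax(self.gate(depth), dim=1)  # (B, K, H, W)
                return sum(w.unsqueeze(1) * b
                           for w, b in zip(weights.unbind(dim=1), branches))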

    Imaging methods for understanding and improving visual training in the geosciences

    Get PDF
    Experience in the field is a critical educational component for every student studying geology. However, it is typically difficult to ensure that every student gets the necessary experience because of monetary and scheduling limitations. Thus, we proposed to create a virtual field trip based on an existing 10-day field trip to California taken as part of an undergraduate geology course at the University of Rochester. To assess the effectiveness of this approach, we also proposed to analyze the learning and observation processes of both students and experts during the real and virtual field trips. At sites intended for inclusion in the virtual field trip, we captured gigapixel-resolution panoramas by taking hundreds of images using custom-built robotic imaging systems. We gathered data to analyze the learning process by fitting each geology student and expert with a portable eye-tracking system that records a video of their eye movements and a video of the scene they are observing. An important component of analyzing the eye-tracking data is mapping the gaze of each observer into a common reference frame. We have made progress towards developing a software tool that helps automate this procedure by using image feature tracking and registration methods to map the scene video frames from each eye-tracker onto a reference panorama for each site; a sketch of this step appears below. For the purpose of creating the virtual field trip, we have a large-scale semi-immersive display system that consists of four tiled projectors, which have been colorimetrically and photometrically calibrated, and a curved widescreen display surface. We use this system to present the previously captured panoramas, which simulates the experience of visiting the sites in person. In terms of broader geology education and outreach, we have created an interactive website that uses Google Earth as the interface for visually exploring the panoramas captured for each site.
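    A hedged OpenCV sketch of that registration step, under our own assumptions (ORB features, a RANSAC homography, and the parameter values are illustrative choices; a homography is adequate when the eye-tracker camera approximately rotates about the panorama's capture point):

        import cv2
        import numpy as np

        def map_gaze_to_panorama(frame_gray, pano_gray, gaze_xy):
            """Register one scene-video frame to the reference panorama and
            transfer a gaze point (pixel coords in the frame) into it."""
            orb = cv2.ORB_create(nfeatures=2000)
            kp_f, des_f = orb.detectAndCompute(frame_gray, None)
            kp_p, des_p = orb.detectAndCompute(pano_gray, None)

            matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
            matches = sorted(matcher.match(des_f, des_p), key=lambda m: m.distance)

            # Estimate frame -> panorama homography from the best matches.
            src = np.float32([kp_f[m.queryIdx].pt for m in matches[:200]]).reshape(-1, 1, 2)
            dst = np.float32([kp_p[m.trainIdx].pt for m in matches[:200]]).reshape(-1, 1, 2)
            H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

            gaze = np.float32([[gaze_xy]])                   # shape (1, 1, 2)
            return cv2.perspectiveTransform(gaze, H)[0, 0]   # gaze in panorama coords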

    Real-Time Virtual Viewpoint Generation on the GPU for Scene Navigation

    Full text link

    Architectural Digital Photogrammetry

    Get PDF
    This study exploits the texturing techniques of a common modelling software package to create virtual models of existing architecture from oriented panoramas. Panoramic image-based interactive modelling is introduced as the meeting point of photography, topography, photogrammetry, and modelling techniques. It is an interactive system for generating photorealistic, textured 3D models of architectural structures and urban scenes. The technique is suitable for architectural survey because it is not a «point by point» survey, and it exploits the geometrical constraints in the architecture to simplify modelling. Many factors prove critical to modelling quality and accuracy, such as the manner and position of shooting the photos, the stitching of the multi-image panoramas, the orientation, the texturing techniques, and so on. During the last few years, many image-based modelling programs have been released. In this research, however, photo-modelling programs were deliberately not used: the intent was to confront the fundamentals of photogrammetry and to go beyond the limitations of such software by avoiding its automatism, and instead to exploit the powerful commands of a program such as 3DsMax to obtain the final representation of the architecture. Such representations can be used in different fields (from detailed architectural survey to architectural representation in cinema and video games), with correspondingly varying accuracy and quality. After a theoretical study of the technique, it was applied in four applications to different types of close-range survey. This practice exposed the practical problems in the whole process (from photographing all the way to modelling) and suggested methods to improve it and to avoid complications. The technique was compared with laser scanning to study its accuracy. We found that the accuracy of the technique is not only linked to the size of the surveyed object, but that the size also changes the way the survey should be approached. Since the 3D modelling program is not dedicated to image-based modelling, texturing problems were encountered; we analyzed how the program handles the bitmap, how to project it, how the projection can be made interactive, and what the limitations are.

    Multi-Projective Camera-Calibration, Modeling, and Integration in Mobile-Mapping Systems

    Get PDF
    Optical systems are vital parts of most modern systems such as mobile mapping systems, autonomous cars, unmanned aerial vehicles (UAVs), and game consoles. Multi-camera systems (MCS) are commonly employed for precise mapping, including aerial and close-range applications. In the first part of this thesis, a simple and practical calibration model and a calibration scheme for multi-projective cameras (MPC) are presented. The calibration scheme is enabled by implementing a camera test field equipped with customized coded targets as FGI's camera calibration room. The first hypothesis was that a test field is necessary to calibrate an MPC. Two commercially available MPCs with 6 and 36 cameras were successfully calibrated in FGI's calibration room. The calibration results suggest that the proposed model is able to estimate the parameters of the MPCs with high geometric accuracy and reveals the internal structure of the MPCs. In the second part, the applicability of an MPC calibrated by the proposed approach was investigated in a mobile mapping system (MMS). The second hypothesis was that a system calibration is necessary to achieve high geometric accuracy in a multi-camera MMS. The MPC model was updated to consider mounting parameters with respect to the GNSS and IMU, and a system calibration scheme for an MMS was proposed. The results showed that the proposed system calibration approach was able to produce accurate results by direct georeferencing of multi-images in an MMS. Geometric assessments suggested that centimeter-level accuracy is achievable by employing the proposed approach. A novel correspondence map is demonstrated for MPCs that helps to create metric panoramas. In the third part, the problem of real-time trajectory estimation of a UAV equipped with a projective camera was studied. The main objective of this part was to address the problem of real-time monocular simultaneous localization and mapping (SLAM) for a UAV. An angular framework was discussed to address the gimbal-lock singularity. The results suggest that the proposed solution is an effective and rigorous monocular SLAM for aerial cases where the object is near-planar. In the last part, the problem of tree-species classification by a UAV equipped with hyperspectral and RGB cameras was studied. The objective of this study was to investigate different aspects of the precise tree-species classification problem by employing state-of-the-art methods. A 3D convolutional neural network (3D-CNN) and a multi-layer perceptron (MLP) were proposed and compared. Both classifiers were highly successful in their tasks, while the 3D-CNN was superior in performance. The classification results were the most accurate published in comparison with other works.
    [Summary translated from Finnish:] Optical imaging devices play a central role in modern machine-vision-based systems such as autonomous cars, unmanned aerial vehicles (UAVs), and game consoles. Such applications typically exploit multi-camera systems. The first part of this dissertation develops a simple and practical mathematical model and a calibration method for multi-camera systems. Coded targets are artificial patterns that can be printed, for example, on A4 paper sheets and measured automatically by computer algorithms. The mathematical model is determined using a three-dimensional camera calibration room in which the developed coded targets are installed. Two commercial multi-camera systems, consisting of 6 and 36 individual cameras, were successfully calibrated with the proposed method. The results showed that the method produced accurate estimates of the geometric parameters of the multi-camera systems and that the estimated parameters corresponded well to the internal structure of the cameras. The second part of the work investigated the use of a multi-camera system calibrated with the proposed method in a mobile mapping system (MMS). The goal was to develop and study mapping measurements of high geometric accuracy. The multi-camera model was extended with parameters relating to the positioning and attitude sensors (GNSS/IMU) of the navigation equipment, and a system calibration method was proposed for the mobile mapping system. Centimeter-level accuracy was achieved in direct georeferencing measurements with the calibrated system. The work also presented a correspondence map for multi-images, which enables the creation of metric panoramas from the images of a multi-camera system. The third part studied the real-time estimation of the trajectory of a UAV using a single-camera method. The main goal was to develop a real-time monocular simultaneous localization and mapping (SLAM) method. A matching method based on multi-resolution image pyramids and propagated rectangular regions was proposed; it lowered the cost of matching while keeping the matching accuracy unchanged. A new angular framework was implemented to handle the gimbal-lock situation. The results showed that the proposed solution was efficient and accurate in situations where the target is nearly planar, and a performance evaluation showed that the developed method met the time and accuracy targets set for real-time UAV trajectory estimation. The last part of the work studied tree-species classification using a UAV system equipped with hyperspectral and RGB cameras. The goal was to investigate the use of new machine-learning methods for precise tree-species classification and to compare the performance of hyperspectral and RGB data. A 3D convolutional neural network (3D-CNN) and a multi-layer perceptron (MLP) were compared. Both classifiers produced good classification accuracy, but the 3D-CNN produced the most accurate results; the achieved accuracy was better than previously published results on comparable data. The combination of hyperspectral and RGB data gave the best accuracy, but the RGB camera alone also produced accurate results and is an affordable and effective data source for many classification applications.
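    A minimal numpy sketch of the direct-georeferencing chain implied by the mounting parameters; the notation is our own and purely illustrative: a point in the camera frame passes through the camera-to-IMU mounting transform and then through the GNSS/IMU pose into world coordinates.

        import numpy as np

        def direct_georeference(p_cam, R_cam_imu, t_cam_imu, R_imu_world, t_imu_world):
            """Chain a camera-frame point through the mounting transform
            (camera -> IMU body) and the navigation pose (IMU body -> world).
            Rotations are 3x3 matrices, translations are 3-vectors."""
            p_body = R_cam_imu @ p_cam + t_cam_imu        # camera frame -> IMU body frame
            p_world = R_imu_world @ p_body + t_imu_world  # body frame -> world frame
            return p_world

        # Illustrative example: identity mounting, IMU 10 m above the world origin.
        p = direct_georeference(
            p_cam=np.array([0.0, 0.0, 5.0]),
            R_cam_imu=np.eye(3), t_cam_imu=np.zeros(3),
            R_imu_world=np.eye(3), t_imu_world=np.array([0.0, 0.0, 10.0]),
        )
        print(p)  # [ 0.  0. 15.]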

    Cuboid-maps for indoor illumination modeling and augmented reality rendering

    Get PDF
    This thesis proposes a novel approach for indoor scene illumination modeling and augmented reality rendering. Our key observation is that an indoor scene is well represented by a set of rectangular spaces, where important illuminants reside on their boundary faces, such as a window on a wall or a ceiling light. Given a perspective image or a panorama and detected rectangular spaces as inputs, we estimate their cuboid shapes and infer illumination components for each face of the cuboids by a simple convolutional neural architecture. The process turns an image into a set of cuboid environment maps, each of which is a simple extension of a traditional cube-map. For augmented reality rendering, we simply take a linear combination of the inferred environment maps and the input image, producing surprisingly realistic illumination effects. This approach is simple and efficient, avoids flickering, and achieves quantitatively more accurate and qualitatively more realistic effects than competing, substantially more complicated systems.
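    The rendering step lends itself to a one-line sketch. A hedged numpy version, under our own assumptions (per-map scalar weights and float images in [0, 1]; the paper's exact weighting scheme may differ), of the linear combination of the input image and the inferred environment maps:

        import numpy as np

        def blend_illumination(input_image, env_maps, weights):
            """Linear combination of the input image and a set of inferred
            cuboid environment maps, all float arrays of the same shape.
            `weights` holds one scalar per map (illustrative scheme)."""
            out = input_image.astype(np.float64).copy()
            for env, w in zip(env_maps, weights):
                out += w * env.astype(np.float64)
            return np.clip(out, 0.0, 1.0)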

    Deep Learning for 3D Visual Perception

    Get PDF
    3D visual perception refers to the set of problems that involve gathering information through a visual sensor and estimating the three-dimensional position and structure of the objects and formations around the sensor. Capabilities such as ego-motion estimation or map building are essential for higher-level tasks such as autonomous driving or augmented reality. In this thesis we have tackled several challenges in 3D perception, all of them useful from the perspective of SLAM (Simultaneous Localization and Mapping), which is itself a 3D perception problem.

    Simultaneous Localization and Mapping (SLAM) seeks to track the position of a device (for example a robot, a phone, or virtual reality glasses) with respect to the map that it is simultaneously building while the platform explores the environment. SLAM is a highly relevant technology in applications such as virtual reality, augmented reality, and autonomous driving. Visual SLAM is the term used for the SLAM problem solved using only visual sensors. Many of the pieces of the ideal SLAM system are by now well known, mature, and in many cases present in applications. However, other pieces still pose significant research challenges. In particular, the ones we have worked on in this thesis are the estimation of the 3D structure around a camera from a single image, the recognition of already-visited places under drastic appearance changes, high-level reconstruction, and SLAM in dynamic environments, all of them using deep neural networks.

    Monocular depth estimation is the task of perceiving the distance from the camera to each pixel in the image, using only the information obtained from a single image. This is an ill-posed problem, so it is very difficult to infer the exact depth of the points in a single image; it requires knowledge of what is being seen and of the sensor being used. For example, if we know that a certain car model has a certain height and we also know the type of camera used (focal length, pixel size...), we can say that if that car spans a certain height in the image, for example 50 pixels, it is at a certain distance from the camera. We present the first single-view depth estimation work able to obtain reasonable performance with multiple camera types, such as a phone or a video camera.

    We also present how to estimate, from a single image, the structure of a room, i.e., the room layout. For this second work, we take advantage of spherical images taken by a panoramic camera in an equirectangular representation. From these images we recover the room layout; our objective is to recognize the cues in the image that define the structure of a room. We focus on recovering the simplest version, namely the lines that separate floor, walls, and ceiling.

    Long-term localization and mapping requires dealing with appearance changes in the environment; the effect on an image of taking it in winter or in summer can be very large. We introduce a multi-view model, invariant to appearance changes, that solves the place recognition problem robustly.
    Visual place recognition aims to identify a place that we have already visited by associating visual cues seen in two images, one taken in the past and one taken in the present. Ideally, it should be invariant to changes in viewpoint, illumination, dynamic objects, and long-term appearance changes such as day and night, the seasons, or the weather.

    For long-term operation we also present DynaSLAM, a SLAM system that distinguishes the static and dynamic parts of the scene. It makes sure to estimate its position based only on the static parts, and it reconstructs the map of the static parts only, so that if we visit a scene again, our map is not affected by the presence of new dynamic objects or the disappearance of previous ones.

    In summary, in this thesis we contribute to several 3D perception problems, all of which address problems in Visual SLAM.
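    The car example in the monocular-depth discussion above is the standard pinhole relation: an object of known real height H that spans h pixels in an image with focal length f (in pixels) lies at depth Z = f * H / h. A small worked sketch with illustrative numbers (the focal length and car height are our own assumptions):

        # Pinhole depth from a known object height (illustrative numbers).
        f_px = 1000.0   # assumed focal length, in pixels
        H_m = 1.5       # assumed real-world height of the car, in metres
        h_px = 50.0     # observed height of the car in the image, in pixels

        Z = f_px * H_m / h_px
        print(f"Estimated depth: {Z:.1f} m")  # 30.0 m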