451 research outputs found

    3D object reconstruction using computer vision : reconstruction and characterization applications for external human anatomical structures

    Doctoral thesis. Informatics Engineering. Faculdade de Engenharia, Universidade do Porto. 201

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium of presence and remote collaboration. However, capturing a visual representation of locations for use in VEs is usually a tedious process that requires either manual modelling of environments or the use of specific hardware. Capturing environment dynamics is not straightforward either, and it is usually performed with specific tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. Panoramas lie between 3D VEs and 2D images: they are as accessible as 2D images while preserving the surrounding representation offered by 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render and stream data coming from heterogeneous cameras, with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type has an impact on reasoning about events within videos in panoramic context.
These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first experiment, on telecommunication, compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localization tasks. To support this experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events. The study explored three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution to spatio-temporal exploration of remote locations. Our approach presents a richer visual representation in terms of space and time than standard tools, showing that providing panoramic contexts to video collections makes spatio-temporal tasks easier. Videos in context are therefore a suitable alternative to more difficult and often expensive solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism and remote assistance.

    Real-Time, Multiple Pan/Tilt/Zoom Computer Vision Tracking and 3D Positioning System for Unmanned Aerial System Metrology

    The study of structural characteristics of Unmanned Aerial Systems (UASs) continues to be an important field of research for developing state-of-the-art nano/micro systems. Development of a metrology system using computer vision (CV) tracking and 3D point extraction would provide an avenue for making these theoretical developments. This work provides a portable, scalable system capable of real-time tracking, zooming, and 3D position estimation of a UAS using multiple cameras. Current state-of-the-art photogrammetry systems use retro-reflective markers or single point lasers to obtain object poses and/or positions over time. Using a CV pan/tilt/zoom (PTZ) system has the potential to circumvent their limitations. The system developed in this paper exploits parallel processing and the GPU for CV tracking, using optical flow and known camera motion, in order to capture a moving object using two PTU cameras. The parallel-processing technique developed in this work is versatile, making it possible to test other CV methods with a PTZ system using known camera motion. Using known camera poses, the object's 3D position is estimated, and focal lengths are estimated so that the object fills a desired portion of the image. This system is tested against truth data obtained using an industrial system.
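The abstract above does not say how the 3D position is computed from the two cameras. As an illustration only, the sketch below uses the midpoint method, one common way to triangulate a point from two viewing rays. It assumes each camera's known pose has already been turned into a ray (an origin and a direction in a shared world frame); the function name `triangulate_midpoint` and its helpers are hypothetical and not taken from the paper.

```python
def _sub(a, b):
    """Component-wise difference of two 3-vectors."""
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate a 3D point from two rays c1 + s*d1 and c2 + t*d2.

    Finds the closest point on each ray to the other ray and returns the
    midpoint between them. Raises ValueError if the rays are parallel.
    """
    w = _sub(c1, c2)
    a = _dot(d1, d1)
    b = _dot(d1, d2)
    c = _dot(d2, d2)
    d = _dot(d1, w)
    e = _dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique closest point")
    # Ray parameters that minimise the distance between the two rays.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(c1, d1)]
    p2 = [o + t * k for o, k in zip(c2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


if __name__ == "__main__":
    # Two cameras 4 units apart, both looking at the point (1, 2, 5).
    point = triangulate_midpoint([0, 0, 0], [1, 2, 5], [4, 0, 0], [-3, 2, 5])
    print(point)
```

With noisy measurements the two rays generally do not intersect, which is why the midpoint of their closest approach is used rather than an exact intersection.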

    Spatial Interaction for Immersive Mixed-Reality Visualizations

    Growing amounts of data, both in personal and professional settings, have caused an increased interest in data visualization and visual analytics. Especially for inherently three-dimensional data, immersive technologies such as virtual and augmented reality and advanced, natural interaction techniques have been shown to facilitate data analysis. Furthermore, in such use cases, the physical environment often plays an important role, both by directly influencing the data and by serving as context for the analysis. Therefore, there has been a trend to bring data visualization into new, immersive environments and to make use of the physical surroundings, leading to a surge in mixed-reality visualization research. One of the resulting challenges, however, is the design of user interaction for these often complex systems. In my thesis, I address this challenge by investigating interaction for immersive mixed-reality visualizations regarding three core research questions: 1) What are promising types of immersive mixed-reality visualizations, and how can advanced interaction concepts be applied to them? 2) How does spatial interaction benefit these visualizations and how should such interactions be designed? 3) How can spatial interaction in these immersive environments be analyzed and evaluated? To address the first question, I examine how various visualizations such as 3D node-link diagrams and volume visualizations can be adapted for immersive mixed-reality settings and how they stand to benefit from advanced interaction concepts. For the second question, I study how spatial interaction in particular can help to explore data in mixed reality. There, I look into spatial device interaction in comparison to touch input, the use of additional mobile devices as input controllers, and the potential of transparent interaction panels. 
Finally, to address the third question, I present my research on how user interaction in immersive mixed-reality environments can be analyzed directly in the original, real-world locations, and how this can provide new insights. Overall, with my research, I contribute interaction and visualization concepts, software prototypes, and findings from several user studies on how spatial interaction techniques can support the exploration of immersive mixed-reality visualizations.

    Avian surface reconstruction in free-flight with application to flight stability analysis of a barn owl and peregrine falcon

    Birds primarily create and control the forces necessary for flight through changing the shape and orientation of their wings and tail. Their wing geometry is characterised by complex variation in parameters such as camber, twist, sweep and dihedral. To characterise this complexity, a multi-stereo photogrammetry setup was developed for accurately measuring surface geometry in high-resolution during free-flight. The natural patterning of the birds was used as the basis for phase correlation-based image matching, allowing indoor or outdoor use while being non-intrusive for the birds. The accuracy of the method was quantified and shown to be sufficient for characterising the geometric parameters of interest, but with a reduction in accuracy close to the wing edge and in some localized regions. To demonstrate the method's utility, surface reconstructions are presented for a barn owl (Tyto alba) and peregrine falcon (Falco peregrinus) during three instants of gliding flight per bird. The barn owl flew with a consistent geometry, with positive wing camber and longitudinal anhedral. Based on flight dynamics theory this suggests it was longitudinally statically unstable during these flights. The peregrine flew with a consistent glide angle, but at a range of airspeeds with varying geometry. Unlike the barn owl, its glide configuration did not provide a clear indication of longitudinal static stability/instability. 
Aspects of the geometries adopted by both birds appeared to be related to control corrections and this method would be well suited for future investigations in this area, as well as for other quantitative studies into avian flight dynamics.
Flight O1 - original uncompressed tif images for flight O1 of the barn owl (O1_images.zip)
Flight O2 - original uncompressed tif images for flight O2 of the barn owl (O2_images.zip)
Flight O3 - original uncompressed tif images for flight O3 of the barn owl (O3_images.zip)
Flight P1 - original uncompressed tif images for flight P1 of the peregrine (P1_images.zip)
Flight P2 - original uncompressed tif images for flight P2 of the peregrine (P2_images.zip)
Flight P3 - original uncompressed tif images for flight P3 of the peregrine (P3_images.zip)
READM
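The abstract above names phase correlation as the basis of its image matching but gives no implementation details. The sketch below shows the core idea on a 1D signal, not the authors' 2D image pipeline: the cross-power spectrum of two signals is normalised to unit magnitude, and the peak of its inverse transform gives the shift between them. It uses a naive DFT from the standard library to stay self-contained; the function names `dft` and `phase_correlation_shift` are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (inverse includes the 1/n factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def phase_correlation_shift(a, b):
    """Estimate the circular shift s such that b[i] == a[(i - s) % n]."""
    fa, fb = dft(a), dft(b)
    cross = []
    for x, y in zip(fa, fb):
        p = y * x.conjugate()
        mag = abs(p)
        # Keep only the phase; drop bins with no energy to avoid dividing by zero.
        cross.append(p / mag if mag > 1e-12 else 0)
    corr = dft(cross, inverse=True)
    # The correlation peak sits at the index equal to the shift.
    return max(range(len(corr)), key=lambda i: corr[i].real)


if __name__ == "__main__":
    a = [0, 1, 3, 7, 2, 0, 0, 0]
    b = [0, 0, 0, 0, 1, 3, 7, 2]  # a shifted right by 3 samples
    print(phase_correlation_shift(a, b))
```

Because only the phase is kept, the peak location does not depend on overall brightness or contrast, which is one reason the technique suits matching on natural texture.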

    Cubic-panorama image dataset analysis for storage and transmission

    Técnicas de coste reducido para el posicionamiento del paciente en radioterapia percutánea utilizando un sistema de imágenes ópticas

    Patient positioning is an important part of radiation therapy, which is one of the main treatments for malignant tissue in the human body. Currently, the most common patient positioning methods expose healthy tissue to additional harmful radiation. Other non-invasive positioning methods are either not very accurate or too costly for an average hospital. In this thesis, we explore the possibility of developing a system composed of affordable hardware and advanced computer vision algorithms that facilitates patient positioning. Our algorithms are based on affordable RGB-D sensors, image features, ArUco planar markers, and other geometry registration methods. Furthermore, we take advantage of consumer-level computing hardware to make our systems widely accessible. More specifically, we avoid approaches that depend on dedicated GPU hardware for general-purpose computing, since they are more costly. In different publications, we explore the use of these tools to increase the accuracy of reconstruction and localization of the patient in their pose. We also take into account the visualization of the patient's target position with respect to their current position in order to assist the person who performs patient positioning. Furthermore, we make use of augmented reality in conjunction with a real-time 3D tracking algorithm for better interaction between the program and the operator. We also solve more fundamental problems concerning ArUco markers that could be used in the future to improve our systems. These include high-quality multi-camera calibration and mapping using ArUco markers, plus detection of these markers with event cameras, which are very useful in the presence of fast camera movement.
In the end, we conclude that it is possible to increase the accuracy of 3D reconstruction and localization by combining current computer vision algorithms with fiducial planar markers with RGB-D sensors. This is reflected in the low amount of error we have achieved in our experiments for patient positioning, pushing forward the state of the art for this application.

    Learning-based depth and pose prediction for 3D scene reconstruction in endoscopy

    Colorectal cancer is the third most common cancer worldwide. Early detection and treatment of pre-cancerous tissue during colonoscopy is critical to improving prognosis. However, navigating within the colon and inspecting the endoluminal tissue comprehensively are challenging, and success in both varies based on the endoscopist's skill and experience. Computer-assisted interventions in colonoscopy show much promise in improving navigation and inspection. For instance, 3D reconstruction of the colon during colonoscopy could promote more thorough examinations and increase adenoma detection rates which are associated with improved survival rates. Given the stakes, this thesis seeks to advance the state of research from feature-based traditional methods closer to a data-driven 3D reconstruction pipeline for colonoscopy. More specifically, this thesis explores different methods that improve subtasks of learning-based 3D reconstruction. The main tasks are depth prediction and camera pose estimation. As training data is unavailable, the author, together with her co-authors, proposes and publishes several synthetic datasets and promotes domain adaptation models to improve applicability to real data. We show, through extensive experiments, that our depth prediction methods produce more robust results than previous work. Our pose estimation network trained on our new synthetic data outperforms self-supervised methods on real sequences. Our box embeddings allow us to interpret the geometric relationship and scale difference between two images of the same surface without the need for feature matches that are often unobtainable in surgical scenes. Together, the methods introduced in this thesis help work towards a complete, data-driven 3D reconstruction pipeline for endoscopy

    Using encoder-decoder architecture for material segmentation based on beam profile analysis

    Recognition and segmentation of materials has proven to be a challenging problem because of the wide divergence in appearance within and between categories. Many recent material segmentation approaches treat materials as yet another set of labels, like objects. However, materials are fundamentally different from objects, as they have no basic shape or defined spatial extent. Our approach largely sets this aside and relies mainly on limited implicit context (local appearance) during training, because our training images have almost no global image context: (I) the materials used have no inherent shape or defined spatial extent (for example, apple, orange and potato have approximately the same spherical shape); and (II) the images were taken against a black background, which largely removes the spatial features of the materials. We introduce a new material segmentation dataset, which was captured with a Beam Profile Analysis sensing device. The dataset contains 10 material categories, and its samples are image pairs consisting of grayscale images with and without the laser spots (grayscale and laser images), in addition to annotated segmented images. To the best of our knowledge, this is the first material segmentation dataset for Beam Profile Analysis images. As a second step, we propose a deep learning approach to perform material segmentation on our dataset; our proposed CNN is an encoder-decoder model based on the DeeplabV3+ model. Our main goal is to obtain segmented material maps and to discover how the laser spots contribute to the segmentation results; therefore, we perform a comparative analysis across different types of architectures to observe this contribution. We built our experiments on three main types of models, each using a different type of input; for each model, we implemented various backbone architectures.
Our experimental results show that the laser spots contribute effectively to the segmentation results. The GrayLaser model achieves a significant accuracy improvement compared to the other models: its fine-tuned architecture reaches 94% MIoU, while the one trained from scratch reaches 62% MIoU.
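The abstract reports accuracy as MIoU without defining it. The sketch below shows one widely used definition, assumed here rather than confirmed by the source: per-class intersection over union, averaged over the classes that appear in either the prediction or the ground truth. Labels are given as flat lists of integer class IDs; the function name `mean_iou` is hypothetical.

```python
def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union across classes for flat label lists.

    Classes that are absent from both pred and truth are skipped so that
    they do not distort the average.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    ious = []
    for k in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == k and t == k)
        union = sum(1 for p, t in zip(pred, truth) if p == k or t == k)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    predicted = [0, 0, 1, 1]
    annotated = [0, 1, 1, 1]
    # Class 0: IoU 1/2; class 1: IoU 2/3; class 2 is absent and skipped.
    print(mean_iou(predicted, annotated, num_classes=3))
```

Other conventions exist, for example counting absent classes as zero or accumulating intersections and unions over the whole dataset before dividing, so reported MIoU values are only comparable when the same convention is used.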