
    The development of a hybrid virtual reality/video view-morphing display system for teleoperation and teleconferencing

    Thesis (S.M.)--Massachusetts Institute of Technology, System Design & Management Program, 2000. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 84-89). The goal of this study is to extend the desktop panoramic static image viewer concept (e.g., Apple QuickTime VR; IPIX) to support immersive real-time viewing, so that an observer wearing a head-mounted display can make free head movements while viewing dynamic scenes rendered in real-time stereo using video data obtained from a set of fixed cameras. Computational experiments by Seitz and others have demonstrated the feasibility of morphing image pairs to render stereo scenes from novel, virtual viewpoints. The user can interact with both morphed real-world video images and supplementary artificial virtual objects (“Augmented Reality”). The inherent congruence of the real and artificial coordinate frames of this system reduces the registration errors commonly found in Augmented Reality applications. In addition, the user’s eyepoint is computed locally, so any scene lag resulting from head movement will be less than that of alternative technologies using remotely controlled ground cameras. For space applications, this can significantly reduce the apparent lag due to satellite communication delay. This hybrid VR/view-morphing display (“Virtual Video”) has many important NASA applications, including remote teleoperation, crew onboard training, private family and medical teleconferencing, and telemedicine. The technical objective of this study was to develop a proof-of-concept system, on a 3D graphics PC workstation, for one of Virtual Video’s component technologies, Immersive Omnidirectional Video. The management goal was to define a system process for planning, managing, and tracking the integration, test, and validation of this phased, 3-year, multi-university research and development program. by William E. Hutchison. S.M.
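    The view-morphing technique cited here (Seitz and Dyer) prewarps an image pair to a common plane, linearly interpolates corresponding pixels, and postwarps the result to the desired virtual viewpoint. As a rough illustration of the interpolation step only, the following minimal sketch (ours, not the thesis's code) blends two already-rectified views given a precomputed dense disparity map; hole filling and the pre/postwarp stages are omitted.

        import numpy as np

        def morph_views(left, right, disparity, alpha):
            """Linearly interpolate two rectified views (a simplified take on
            view morphing). left/right: HxWx3 float images; disparity: HxW
            array where left pixel (x, y) matches right pixel (x - d, y);
            alpha in [0, 1], 0 -> left view, 1 -> right view."""
            h, w = disparity.shape
            out = np.zeros_like(left)
            ys, xs = np.mgrid[0:h, 0:w]
            # Each left pixel slides part-way toward its match in the right image.
            xt = np.rint(xs - alpha * disparity).astype(int)
            xr = np.clip(np.rint(xs - disparity).astype(int), 0, w - 1)
            valid = (xt >= 0) & (xt < w)
            # Blend colours from both source views at the interpolated position.
            out[ys[valid], xt[valid]] = ((1 - alpha) * left[ys[valid], xs[valid]]
                                         + alpha * right[ys[valid], xr[valid]])
            return out  # unfilled pixels (holes) would need inpainting

    Rendering a stereo pair for an HMD amounts to evaluating such an interpolation at two slightly offset virtual eyepoints per frame.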

    Bio-Inspired Multi-Spectral Image Sensor and Augmented Reality Display for Near-Infrared Fluorescence Image-Guided Surgery

    Background: Cancer remains a major public health problem worldwide and poses a huge economic burden. Near-infrared (NIR) fluorescence image-guided surgery (IGS) utilizes molecular markers and imaging instruments to identify and locate tumors during surgical resection. Unfortunately, current state-of-the-art NIR fluorescence imaging systems are bulky and costly, and lack both fluorescence sensitivity under surgical illumination and co-registration accuracy between multimodal images. Additionally, monitor-based display units are disruptive to the surgical workflow and are suboptimal at indicating the 3-dimensional position of labeled tumors. These major obstacles have prevented the wide acceptance of NIR fluorescence imaging as the standard of care for cancer surgery. The goal of this dissertation is to enhance cancer treatment by developing novel image sensors and presenting the information to the physician on a holographic augmented reality (AR) display in intraoperative settings. Method: By mimicking the visual system of the Morpho butterfly, several single-chip color-NIR fluorescence image sensors and systems were developed with CMOS technologies and pixelated interference filters. Using a holographic AR goggle platform, an NIR fluorescence IGS display system was developed. Optoelectronic evaluation was performed on the prototypes to assess the performance of each component, and small and large animal models were used to verify the overall effectiveness of the integrated systems at cancer detection. Result: The single-chip bio-inspired multispectral logarithmic image sensor I developed surpasses state-of-the-art NIR fluorescence imaging instruments on the main performance indicators. The image sensors achieve up to 140 dB dynamic range. The sensitivity under surgical illumination reaches 6108 V/(mW/cm²), up to 25 times higher than existing instruments, and the signal-to-noise ratio is up to 56 dB, 11 dB greater. These enable high-sensitivity fluorescence imaging under surgical illumination. The pixelated interference filters enable temperature-independent co-registration accuracy between multimodal images. Pre-clinical trials with small animal models demonstrate that the sensor can achieve up to 95% sensitivity and 94% specificity with tumor-targeted NIR molecular probes. The holographic AR goggle provides the physician with a non-disruptive 3-dimensional display in the clinical setup. This is the first display system that co-registers a virtual image with the user's eyes and allows video-rate image transmission. The imaging system was tested in the veterinary science operating room on canine patients with naturally occurring cancers. In addition, a time-domain pulse-width-modulation address-event-representation multispectral image sensor and a handheld multispectral camera prototype were developed. Conclusion: The major problems of current state-of-the-art NIR fluorescence imaging systems are successfully solved. Owing to the enhanced performance and user experience, the bio-inspired sensors and augmented reality display system will give medical care providers much-needed technology to enable more accurate, value-based healthcare.
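    For orientation, the dynamic-range and SNR figures quoted above follow from the standard decibel definitions. The short sketch below (ours, using the conventional 20·log10 voltage-ratio form) shows why a 140 dB range implies a ten-million-to-one illumination ratio, which is what motivates a logarithmic rather than linear pixel response.

        import math

        def dynamic_range_db(v_max, v_min):
            # Optical dynamic range of an image sensor, in decibels.
            return 20 * math.log10(v_max / v_min)

        def snr_db(signal, noise):
            # Signal-to-noise ratio in decibels.
            return 20 * math.log10(signal / noise)

        # 140 dB corresponds to a max/min ratio of 10**7: far beyond what a
        # linear-response pixel can span without saturating or starving.
        print(dynamic_range_db(10**7, 1))  # -> 140.0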

    Toward Automated Aerial Refueling: Relative Navigation with Structure from Motion

    The USAF's use of UAS has expanded from reconnaissance to hunter/killer missions. As the UAS mission expands further into aerial combat, demands for better performance and larger payloads will come at the expense of range and loiter time. Additionally, the Air Force Future Operating Concept calls for "formations of uninhabited refueling aircraft...[that] enable refueling operations partway inside threat areas." However, a lack of accurate relative positioning information prevents safely maintaining close formation flight and contact between a tanker and a UAS. The inclusion of cutting-edge vision systems on present refueling platforms may provide the information necessary to support an AAR mission by estimating the position of a trailing aircraft and providing inputs to a UAS controller capable of maintaining a given position. This research examines the ability of structure from motion (SfM) to generate relative navigation information. Previous AAR research efforts involved the use of differential GPS, LiDAR, and vision systems; this research aims to leverage current and future imaging technology to complement those solutions. The algorithm used in this thesis generates a point cloud by determining 3D structure from a sequence of 2D images, then utilizes PCA to register the point cloud to a reference model. The algorithm was tested in a real-world environment using a 1:7 scale F-15 model. Additionally, this thesis studies common 3D rigid registration algorithms in an effort to characterize their performance in the AAR domain. Three algorithms are tested for runtime and registration accuracy on four data sets.
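    PCA-based registration of the kind described aligns the centroids and principal axes of the sensed point cloud and the reference model. A minimal sketch under simplifying assumptions (clean clouds, no outliers, eigenvector sign and ordering ambiguities glossed over) might look like the following; it is illustrative, not the thesis's implementation.

        import numpy as np

        def pca_register(cloud, model):
            """Coarse rigid registration of an Nx3 point cloud to an Mx3
            reference model via PCA: returns R, t with R @ p + t mapping
            cloud points into the model frame. Real systems must resolve
            the sign/order ambiguity of principal axes, ignored here."""
            mu_c, mu_m = cloud.mean(axis=0), model.mean(axis=0)
            # Principal axes from the covariance of each centred point set.
            _, vec_c = np.linalg.eigh(np.cov((cloud - mu_c).T))
            _, vec_m = np.linalg.eigh(np.cov((model - mu_m).T))
            R = vec_m @ vec_c.T              # rotate cloud axes onto model axes
            if np.linalg.det(R) < 0:         # reject reflections
                vec_c[:, 0] *= -1
                R = vec_m @ vec_c.T
            t = mu_m - R @ mu_c
            return R, t

    A coarse alignment like this is typically followed by a fine registration step (e.g., ICP) before the pose estimate is handed to a controller.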

    3D Human Face Reconstruction and 2D Appearance Synthesis

    3D human face reconstruction has been an active research area for decades due to its wide applications, such as animation, recognition, and 3D-driven appearance synthesis. Although commodity depth sensors have become widely available in recent years, image-based face reconstruction remains highly valuable, as images are much easier to access and store. In this dissertation, we first propose three image-based face reconstruction approaches, each based on different assumptions about the input. In the first approach, face geometry is extracted from multiple key frames of a video sequence with different head poses; under this assumption, the camera must be calibrated. As the first approach is limited to videos, our second approach focuses on a single image. This approach also refines the geometry by adding fine-grained detail using shading cues, for which we propose a novel albedo estimation and linear optimization algorithm. In the third approach, we further loosen the constraints on the input to arbitrary in-the-wild images. Our proposed approach can robustly reconstruct high-quality models even with extreme expressions and large poses. We then explore the applicability of our face reconstructions in four applications: video face beautification, generating personalized facial blendshapes from image sequences, face video stylization, and video face replacement. We demonstrate the great potential of our reconstruction approaches in these real-world applications. In particular, with the recent surge of interest in VR/AR, it is increasingly common to see people wearing head-mounted displays, yet the large occlusion of the face is a big obstacle to communicating in a face-to-face manner. In another application, we therefore explore hardware/software solutions for synthesizing the face image in the presence of HMDs. We design two setups (experimental and mobile) that integrate two near-IR cameras and one color camera to solve this problem; with our algorithm and prototype, we can achieve photo-realistic results. We further propose a deep neural network that treats HMD removal as a face inpainting problem. This approach does not need special hardware and runs in real time with satisfying results.
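    The shading cue mentioned above rests on the Lambertian image-formation model, I = albedo · max(0, n·l). The toy sketch below shows only this textbook relation, recovering per-pixel albedo from known normals and a known distant light; the thesis's own albedo estimation and linear optimization are more involved.

        import numpy as np

        def lambertian_albedo(image, normals, light):
            """Per-pixel albedo under I = rho * max(0, n . l), given an HxW
            intensity image, HxWx3 unit normals, and a distant light
            direction. The textbook relation only, not the thesis's solver."""
            shading = np.maximum(normals @ light, 1e-6)  # n . l, clamped
            return image / shading

        # Usage sketch with synthetic data:
        h, w = 4, 4
        n = np.zeros((h, w, 3)); n[..., 2] = 1.0   # flat surface facing camera
        l = np.array([0.0, 0.0, 1.0])              # frontal light
        img = 0.8 * np.maximum(n @ l, 0)           # render with albedo 0.8
        print(np.allclose(lambertian_albedo(img, n, l), 0.8))  # True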

    Semi-automatic 3D reconstruction of urban areas using epipolar geometry and template matching

    In this work we describe a novel technique for semi-automatic three-dimensional (3D) reconstruction of urban areas from airborne stereo-pair images, whose output is VRML or DXF. The main challenge is to compute the relevant information (building heights and volumes, roof descriptions, and textures) algorithmically, because producing it manually for large urban areas is very time consuming and thus expensive. The algorithm requires some initial calibration input and is able to compute the above-mentioned building characteristics from the stereo pair, given the 2D CAD data and the digital elevation model of the same area, with no knowledge of the camera pose or its intrinsic parameters. To achieve this, we use epipolar geometry, homography computation, and automatic feature extraction, and we solve the feature correspondence problem in the stereo pair using template matching.
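    To illustrate how epipolar geometry constrains the template-matching search, here is a hedged OpenCV sketch (ours, not the paper's code): it estimates the fundamental matrix from pre-matched features, then searches for a query point's correspondent only along its epipolar line in the second image.

        import cv2
        import numpy as np

        def match_along_epiline(left, right, pts_l, pts_r, query, win=15):
            """Find the correspondent of 'query' (x, y in the left image) by
            template matching restricted to its epipolar line in the right
            image. pts_l/pts_r are pre-matched Nx2 float32 feature arrays.
            Assumes the query is away from borders and the epipolar line is
            not vertical (b != 0); no sub-pixel refinement."""
            F, _ = cv2.findFundamentalMat(pts_l, pts_r, cv2.FM_RANSAC)
            # Epipolar line a*x + b*y + c = 0 in the right image.
            a, b, c = cv2.computeCorrespondEpilines(
                query.reshape(1, 1, 2), 1, F).reshape(3)
            x, y = int(query[0]), int(query[1])
            h = win // 2
            tmpl = left[y - h:y + h + 1, x - h:x + h + 1]
            best, best_score = None, -1.0
            for xr in range(h, right.shape[1] - h):
                yr = int(round(-(a * xr + c) / b))  # y on the epipolar line
                if h <= yr < right.shape[0] - h:
                    patch = right[yr - h:yr + h + 1, xr - h:xr + h + 1]
                    score = cv2.matchTemplate(
                        patch, tmpl, cv2.TM_CCOEFF_NORMED)[0, 0]
                    if score > best_score:
                        best, best_score = (xr, yr), score
            return best, best_score

    Rectifying the pair first would reduce this search to a single scanline, which is the usual production shortcut.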

    Inverse rendering for scene reconstruction in general environments

    Demand for high-quality 3D content has been exploding recently, owing to advances in 3D displays and 3D printing. However, due to insufficient 3D content, the potential of 3D display and printing technology has not been realized to its full extent. Techniques for capturing the real world, which are able to generate 3D models from captured images or videos, are therefore a hot research topic in computer graphics and computer vision. Despite significant progress, many methods are still highly constrained and require many prerequisites to succeed. Marker-less performance capture is one such dynamic scene reconstruction technique that is still confined to studio environments. The requirements involved, such as a multi-view camera setup, specially engineered lighting, or green-screen backgrounds, prevent these methods from being widely used by the film industry or even by ordinary consumers. In the area of scene reconstruction from images or videos, this thesis proposes new techniques that succeed in general environments, even using as few as two cameras. Contributions are made in terms of reducing the constraints of marker-less performance capture on lighting, background, and the required number of cameras. The primary theoretical contribution lies in the investigation of light transport mechanisms for high-quality 3D reconstruction in general environments. Several steps are taken to approach the goal of scene reconstruction in general environments. First, the concept of employing inverse rendering for scene reconstruction is demonstrated on static scenes, where a high-quality multi-view 3D reconstruction method under general unknown illumination is developed. Then, this concept is extended to dynamic scene reconstruction from multi-view video, where detailed 3D models of dynamic scenes can be captured under general and even varying lighting, and in front of a general scene background without a green screen. Finally, efforts are made to reduce the number of cameras employed: new performance capture methods using as few as two cameras are proposed to capture high-quality 3D geometry in general environments, even outdoors.
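    As a toy illustration of the inverse-rendering idea, namely fitting scene parameters so that a forward render reproduces the captured image, the following PyTorch sketch recovers a per-pixel albedo and a single distant light under a Lambertian model by gradient descent on a photometric loss. It is deliberately far simpler than the thesis's multi-view, general-illumination formulation.

        import torch

        def inverse_render(image, normals, steps=500, lr=0.05):
            """Minimal inverse-rendering loop: given an HxW intensity image
            and HxWx3 unit normals, optimize per-pixel albedo and one distant
            light so that a Lambertian forward render matches the image."""
            albedo = torch.full_like(image, 0.5, requires_grad=True)
            light = torch.tensor([0.0, 0.0, 1.0], requires_grad=True)
            opt = torch.optim.Adam([albedo, light], lr=lr)
            for _ in range(steps):
                opt.zero_grad()
                l = light / light.norm()                  # unit light direction
                shading = torch.clamp((normals * l).sum(-1), min=0.0)
                render = albedo * shading                 # forward model
                loss = ((render - image) ** 2).mean()     # photometric loss
                loss.backward()
                opt.step()
            return albedo.detach(), (light / light.norm()).detach()

    The same pattern scales up: richer forward models (multi-view geometry, general illumination) plug into the same render-compare-update loop.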

    INFORMATION TECHNOLOGY FOR NEXT-GENERATION OF SURGICAL ENVIRONMENTS

    Minimally invasive surgeries (MIS) are fundamentally constrained by image quality, access to the operative field, and the visualization environment on which the surgeon relies for real-time information. Although minimally invasive access benefits the patient, it also leads to more challenging procedures, which require better skills and training. Endoscopic surgeries rely heavily on 2D interfaces, introducing additional challenges due to the loss of depth perception, the lack of 3-dimensional imaging, and the reduction of degrees of freedom. By using state-of-the-art technology within a distributed computational architecture, it is possible to incorporate multiple sensors, hybrid display devices, and 3D visualization algorithms within a flexible surgical environment. Such environments can assist the surgeon with valuable information that goes far beyond what is currently available. In this thesis, we will discuss how 3D visualization and reconstruction, stereo displays, high-resolution display devices, and tracking techniques are key elements in the next generation of surgical environments.

    Videos in Context for Telecommunication and Spatial Browsing

    The research presented in this thesis explores the use of videos embedded in panoramic imagery to transmit spatial and temporal information describing remote environments and their dynamics. Virtual environments (VEs) through which users can explore remote locations are rapidly emerging as a popular medium for presence and remote collaboration. However, capturing visual representations of locations for use in VEs is usually a tedious process that requires either manual modelling of environments or the use of specific hardware. Capturing environment dynamics is not straightforward either, and is usually performed with dedicated tracking hardware. Similarly, browsing large unstructured video collections with available tools is difficult, as the abundance of spatial and temporal information makes them hard to comprehend. On a spectrum between 3D VEs and 2D images, panoramas lie in between: they offer the accessibility of 2D images while preserving the surrounding representation of 3D virtual environments. For this reason, panoramas are an attractive basis for videoconferencing and browsing tools, as they can relate several videos temporally and spatially. This research explores methods to acquire, fuse, render, and stream data coming from heterogeneous cameras with the help of panoramic imagery. Three distinct but interrelated questions are addressed. First, the thesis considers how spatially localised video can be used to increase the spatial information transmitted during video-mediated communication, and whether this improves the quality of communication. Second, the research asks whether videos in panoramic context can be used to convey spatial and temporal information about a remote place and the dynamics within it, and whether this improves users' performance in tasks that require spatio-temporal thinking. Finally, the thesis considers whether display type has an impact on reasoning about events within videos in panoramic context. These research questions were investigated over three experiments, covering scenarios common to computer-supported cooperative work and video browsing. To support the investigation, two distinct video+context systems were developed. The first telecommunication experiment compared our videos-in-context interface with fully panoramic video and conventional webcam video conferencing in an object placement scenario. The second experiment investigated the impact of videos in panoramic context on the quality of spatio-temporal thinking during localisation tasks. To support the experiment, a novel interface to video collections in panoramic context was developed and compared with common video-browsing tools. The final experimental study investigated the impact of display type on reasoning about events, exploring three adaptations of our video-collection interface to three display types. The overall conclusion is that videos in panoramic context offer a valid solution for spatio-temporal exploration of remote locations. Our approach presents a richer visual representation of space and time than standard tools, showing that providing panoramic context to video collections makes spatio-temporal tasks easier. To this end, videos in context are a suitable alternative to more difficult, and often expensive, solutions. These findings are beneficial to many applications, including teleconferencing, virtual tourism, and remote assistance.
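    A core geometric operation behind videos embedded in panoramas is mapping each video pixel's viewing ray into the panorama's spherical coordinates. The sketch below is our assumption of how such an embedding could work, not the thesis's system: it forward-maps a perspective frame into an equirectangular panorama given a camera-to-world rotation R and a horizontal field of view.

        import numpy as np

        def embed_frame(pano, frame, R, fov_deg=60.0):
            """Paste a perspective video frame into an equirectangular
            panorama. R is the 3x3 camera-to-world rotation. A forward-mapping
            sketch; production code would inverse-map and interpolate to
            avoid holes in the pasted region."""
            ph, pw = pano.shape[:2]
            fh, fw = frame.shape[:2]
            f = (fw / 2) / np.tan(np.radians(fov_deg) / 2)  # focal length, px
            ys, xs = np.mgrid[0:fh, 0:fw]
            # Per-pixel view rays in camera coordinates, rotated to world frame.
            rays = np.stack([xs - fw / 2, ys - fh / 2,
                             np.full_like(xs, f, dtype=float)], -1)
            rays = rays @ R.T
            rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
            lon = np.arctan2(rays[..., 0], rays[..., 2])    # longitude
            lat = np.arcsin(np.clip(rays[..., 1], -1, 1))   # latitude
            # Spherical coordinates -> equirectangular pixel coordinates.
            px = ((lon / (2 * np.pi) + 0.5) * pw).astype(int) % pw
            py = ((lat / np.pi + 0.5) * ph).astype(int).clip(0, ph - 1)
            pano[py, px] = frame
            return pano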

    Visual Human-Computer Interaction
