    Integrating Shape-from-Shading & Stereopsis

    This thesis is concerned with inferring scene shape by combining two specific techniques: shape-from-shading and stereopsis. Shape-from-shading calculates shape using the lighting equation, which maps surface orientation and lighting information to irradiance. As irradiance and lighting information are provided, this is the problem of inverting a many-to-one function to recover surface orientation. Surface orientation may then be integrated to obtain depth. Stereopsis matches pixels between two images of the same scene taken from different locations; this is the correspondence problem. Depth can then be calculated via triangulation, using camera calibration information. These methods both fail for certain inputs; the advantage of combining them is that where one fails the other may continue to work. Notably, shape-from-shading requires a smoothly shaded surface without texture, whilst stereopsis requires texture: each works where the other does not. The first work of this thesis tackles the problem directly. A novel modular solution is proposed to combine both methods; the combination is itself performed using Gaussian belief propagation. This modular approach highlights missing and weak modules; the rest of the thesis is then concerned with providing a new module and an improved module. The improved module, given in the second research chapter, consists of a new shape-from-shading algorithm. It again uses belief propagation, but this time with directional statistics to represent surface orientation. Message passing is performed using a novel analytical method, which makes this algorithm particularly fast. In the final research chapter a new module is provided, to estimate the light source direction. Without such a module the user of the system has to provide it, which is tedious and error prone, and impedes automation. It is a probabilistic method that uniquely estimates the light source direction using a stereo pair as input.
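    As a rough illustration of the two forward models the thesis inverts, the sketch below assumes a Lambertian lighting equation and a rectified stereo pair; the function names and parameter values are illustrative, not taken from the thesis.

```python
import numpy as np

def lambertian_irradiance(normal, light_dir, albedo=1.0):
    """Forward lighting equation: surface orientation and lighting map to
    irradiance. Shape-from-shading inverts this many-to-one mapping."""
    n = normal / np.linalg.norm(normal)
    l = light_dir / np.linalg.norm(light_dir)
    return albedo * max(np.dot(n, l), 0.0)

def depth_from_disparity(disparity, focal_length_px, baseline_m):
    """Triangulation for a rectified stereo pair: once the correspondence
    problem yields a disparity, depth follows from calibration."""
    return focal_length_px * baseline_m / disparity

# A surface tilted 45 degrees away from a frontal light: irradiance ~0.707
print(lambertian_irradiance(np.array([0.0, np.sin(np.pi / 4), np.cos(np.pi / 4)]),
                            np.array([0.0, 0.0, 1.0])))
# 10 px disparity, f = 500 px, baseline = 0.1 m  ->  5 m depth
print(depth_from_disparity(10.0, 500.0, 0.1))
```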

    Vision-based and marker-less surgical tool detection and tracking: a review of the literature

    In recent years, tremendous progress has been made in surgical practice, for example with Minimally Invasive Surgery (MIS). To overcome the challenges arising from indirect eye-to-hand manipulation, robotic and computer-assisted systems have been developed. Having real-time knowledge of the pose of surgical tools with respect to the surgical camera and the underlying anatomy is a key ingredient for such systems. In this paper, we present a review of the literature dealing with vision-based and marker-less surgical tool detection. This paper includes three primary contributions: (1) identification and analysis of data-sets used for developing and testing detection algorithms, (2) in-depth comparison of surgical tool detection methods, from the feature extraction process to the model learning strategy, highlighting existing shortcomings, and (3) analysis of validation techniques employed to obtain detection performance results and to establish comparisons between surgical tool detectors. The papers included in the review were selected through PubMed and Google Scholar searches using the keywords "surgical tool detection", "surgical tool tracking", "surgical instrument detection" and "surgical instrument tracking", limiting results to the years 2000 to 2015. Our study shows that, despite significant progress over the years, the lack of established surgical tool data-sets and of a reference format for performance assessment and method ranking is preventing faster improvement.

    Colour videos with depth: acquisition, processing and evaluation

    The human visual system lets us perceive the world around us in three dimensions by integrating evidence from depth cues into a coherent visual model of the world. The equivalent in computer vision and computer graphics are geometric models, which provide a wealth of information about represented objects, such as depth and surface normals. Videos do not contain this information, but only provide per-pixel colour information. In this dissertation, I hence investigate a combination of videos and geometric models: videos with per-pixel depth (also known as RGBZ videos). I consider the full life cycle of these videos: from their acquisition, via filtering and processing, to stereoscopic display. I propose two approaches to capture videos with depth. The first is a spatiotemporal stereo matching approach based on the dual-cross-bilateral grid – a novel real-time technique derived by accelerating a reformulation of an existing stereo matching approach. This is the basis for an extension which incorporates temporal evidence in real time, resulting in increased temporal coherence of disparity maps – particularly in the presence of image noise. The second acquisition approach is a sensor fusion system which combines data from a noisy, low-resolution time-of-flight camera and a high-resolution colour video camera into a coherent, noise-free video with depth. The system consists of a three-step pipeline that aligns the video streams, efficiently removes and fills invalid and noisy geometry, and finally uses a spatiotemporal filter to increase the spatial resolution of the depth data and strongly reduce depth measurement noise. I show that these videos with depth empower a range of video processing effects that are not achievable using colour video alone. These effects critically rely on the geometric information, like a proposed video relighting technique which requires high-quality surface normals to produce plausible results. In addition, I demonstrate enhanced non-photorealistic rendering techniques and the ability to synthesise stereoscopic videos, which allows these effects to be applied stereoscopically. These stereoscopic renderings inspired me to study stereoscopic viewing discomfort. The result of this is a surprisingly simple computational model that predicts the visual comfort of stereoscopic images. I validated this model using a perceptual study, which showed that it correlates strongly with human comfort ratings. This makes it ideal for automatic comfort assessment, without the need for costly and lengthy perceptual studies.
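    A minimal single-frame sketch in the spirit of the sensor-fusion step follows, assuming a joint bilateral upsampling filter guided by the colour image; the dissertation's actual filter is spatiotemporal and this sketch omits the temporal term, so names and parameters are illustrative only.

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, color_hi, scale, sigma_s=2.0, sigma_r=0.1):
    """Upsample a low-resolution depth map guided by a high-resolution colour
    image: spatially close pixels with similar colour dominate the weighted
    average, preserving depth edges while suppressing measurement noise.
    Assumes color_hi is float-valued in [0, 1] and H, W divide by `scale`."""
    H, W = color_hi.shape[:2]
    out = np.zeros((H, W))
    r = int(2 * sigma_s) * scale  # window radius in high-res pixels
    for y in range(H):
        for x in range(W):
            acc, wsum = 0.0, 0.0
            for dy in range(-r, r + 1, scale):
                for dx in range(-r, r + 1, scale):
                    yy, xx = y + dy, x + dx
                    if not (0 <= yy < H and 0 <= xx < W):
                        continue
                    d = depth_lo[yy // scale, xx // scale]
                    # Spatial Gaussian weight (domain term)
                    ws = np.exp(-(dx * dx + dy * dy) / (2 * (sigma_s * scale) ** 2))
                    # Colour-similarity Gaussian weight (range term)
                    wr = np.exp(-np.sum((color_hi[y, x] - color_hi[yy, xx]) ** 2)
                                / (2 * sigma_r ** 2))
                    acc += ws * wr * d
                    wsum += ws * wr
            out[y, x] = acc / wsum if wsum > 0 else 0.0
    return out
```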

    Griff-in-die-Kiste (Bin Picking): New Approaches to a Classic Problem

    The automation of handling tasks has been an important scientific topic since the development of the first industrial robots. The first step in the chain of scientific challenges to be solved is the automatic grasping of objects. One of the most famous examples in this context is the well-known "bin-picking" problem. Picking up objects scrambled in a box is an easy task for humans, but its automation is very complex. Besides the localization of the object, meaning the estimation of the object's pose (orientation and position), it has to be ensured that a collision-free path can be found to safely grasp the objects. For over 50 years, researchers have published approaches towards generic solutions to this problem, but unfortunately no industry-applicable, generic system has been developed yet. In this thesis, three different approaches to solving the bin-picking problem are described. More precisely, different solutions to the pose estimation problem are introduced, each paired with additional functionalities to complete it for application in a bin-picking station. It is described how modern sensors can be used for efficient bin-picking, as well as how classic sensor concepts can be applied for novel bin-picking techniques. Three complete systems are described and compared. First, 3D point clouds, generated using a laser scanner, are used as the basis. Employing the known Random Sample Matching algorithm and modifications of it, paired with a very efficient depth-map-based collision avoidance mechanism, results in a very robust bin-picking approach. In the second approach, all computations are done on depth maps. This allows the use of 2D image analysis techniques to fulfil the tasks and results in real-time data analysis. Combined with force/torque and acceleration sensors, a near time-optimal bin-picking system emerges. As a third option, surface normal maps are employed as a basis for pose estimation. In contrast to known approaches, the normal maps are not used for 3D data computation but directly for the object localization problem. This enables the application of a new class of sensors for bin-picking. All three methods are compared, and the advantages and disadvantages of each approach are discussed.
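    A minimal sketch of a depth-map-based collision test in the spirit of the first system is given below; the thesis's actual mechanism is more elaborate, and the function and parameter names here are hypothetical.

```python
import numpy as np

def gripper_collides(depth_map, gripper_footprint, grasp_px, grasp_depth,
                     clearance=0.005):
    """Test a candidate grasp against the scene depth map: the gripper
    collides if any scene point inside its 2D footprint lies closer to the
    camera than the depth the gripper must descend to, plus a safety margin.
    depth_map:         HxW array of depths in metres (camera looks into the bin)
    gripper_footprint: boolean mask covering the gripper's projected outline
    grasp_px:          (row, col) of the grasp point in the depth image
    grasp_depth:       descent depth of the gripper body in metres
    Assumes the footprint window lies fully inside the image."""
    h, w = gripper_footprint.shape
    r0 = grasp_px[0] - h // 2
    c0 = grasp_px[1] - w // 2
    patch = depth_map[r0:r0 + h, c0:c0 + w]
    # Scene points closer than the gripper's descent depth would be struck.
    return np.any(patch[gripper_footprint] < grasp_depth - clearance)
```

    Because the test is a single windowed comparison on the depth image, it avoids building or querying any 3D collision geometry, which is what makes the depth-map formulation so efficient.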

    Merging the Real and the Virtual: An Exploration of Interaction Methods to Blend Realities

    We investigate, build, and design interaction methods to merge the real with the virtual. An initial investigation looks at spatial augmented reality (SAR) and its effects on pointing with a real mobile phone. A study reveals a set of trade-offs between the raycast, viewport, and direct pointing techniques. To further investigate the manipulation of virtual content within a SAR environment, we design an interaction technique that utilizes the distance at which a user holds a mobile phone away from their body. Our technique enables pushing virtual content from a mobile phone to an external SAR environment, interacting with that content, rotating, scaling, and translating it, and pulling the content back into the mobile phone. This is all done in a way that ensures seamless transitions between the real environment of the mobile phone and the virtual SAR environment. To investigate the issues that occur when the physical environment is hidden by a fully immersive virtual reality (VR) HMD, we design and investigate a system that merges a real-time 3D reconstruction of the real world with a virtual environment. This allows users to freely move, manipulate, observe, and communicate with people and objects situated in their physical reality without losing their sense of immersion or presence inside a virtual world. A study with VR users demonstrates the affordances provided by the system and how it can be used to enhance current VR experiences. We then move to AR, to investigate the limitations of optical see-through HMDs and the problem of communicating the internal state of the virtual world to unaugmented users. To address these issues and enable new ways to visualize, manipulate, and share virtual content, we propose a system that combines an optical see-through HMD with a wearable SAR projector. Demonstrations showcase ways to utilize the projected and head-mounted displays together, such as expanding the field of view, distributing content across depth surfaces, and enabling bystander collaboration. We then turn to videogames to investigate how spectatorship of these virtual environments can be enhanced through expanded video rendering techniques. We extract and combine additional data to form a cumulative 3D representation of the live game environment for spectators, which enables each spectator to individually control a personal view into the stream while in VR. A study shows that users prefer spectating in VR when compared with a comparable desktop rendering.
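    A minimal sketch of the distance-triggered push/pull idea is given below; the thresholds, hysteresis, and event names are hypothetical, as the abstract does not specify the technique's actual parameters.

```python
# Hypothetical state machine: extending the phone past PUSH_AT pushes
# content to the SAR environment; bringing it back past PULL_AT pulls it.
PUSH_AT = 0.45  # metres from body (illustrative threshold)
PULL_AT = 0.25  # hysteresis gap avoids flicker at the boundary

class PhoneToSAR:
    def __init__(self):
        self.on_phone = True  # content currently shown on the handset

    def update(self, phone_body_distance: float) -> str:
        if self.on_phone and phone_body_distance > PUSH_AT:
            self.on_phone = False
            return "push: content hands off to the projected SAR surface"
        if not self.on_phone and phone_body_distance < PULL_AT:
            self.on_phone = True
            return "pull: content returns to the phone screen"
        return "idle"

# Example: extend the arm, then retract it
session = PhoneToSAR()
for d in (0.2, 0.5, 0.3, 0.2):
    print(d, session.update(d))
```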

    Real-Time 4D Ultrasound Reconstruction for Image-Guided Intracardiac Interventions

    Image-guided therapy addresses the lack of direct vision associated with minimally invasive interventions performed on the beating heart, but requires effective intraoperative imaging. Gated 4D ultrasound reconstruction using a tracked 2D probe generates a time-series of 3D images representing the beating heart over the cardiac cycle. These images have a relatively high spatial resolution and wide field of view, and ultrasound is easily integrated into the intraoperative environment. This thesis presents a real-time 4D ultrasound reconstruction system incorporated within an augmented reality environment for surgical guidance, whose incremental visualization reduces common acquisition errors. The resulting 4D ultrasound datasets are intended for visualization or registration to preoperative images. A human factors experiment demonstrates the advantages of real-time ultrasound reconstruction, and accuracy assessments performed both with a dynamic phantom and intraoperatively reveal RMS localization errors of 2.5–2.7 mm and 0.8 mm, respectively. Finally, clinical applicability is demonstrated by both porcine and patient imaging.
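    A minimal sketch of the gating step behind such a reconstruction follows, assuming ECG R-peak timestamps are available; the binning scheme and names are illustrative, not the thesis's implementation.

```python
import numpy as np

def gate_frames(frames, timestamps, r_peaks, n_phases=20):
    """Assign tracked 2D ultrasound frames to cardiac-phase bins.
    A frame's phase is its normalised position between the two surrounding
    ECG R-peaks; frames in the same bin are later compounded into one 3D
    volume, yielding a 4D (3D + cardiac phase) time-series."""
    bins = [[] for _ in range(n_phases)]
    r_peaks = np.asarray(r_peaks)
    for frame, t in zip(frames, timestamps):
        i = np.searchsorted(r_peaks, t) - 1  # index of preceding R-peak
        if i < 0 or i + 1 >= len(r_peaks):
            continue  # frame falls outside a complete cardiac cycle
        phase = (t - r_peaks[i]) / (r_peaks[i + 1] - r_peaks[i])
        bins[min(int(phase * n_phases), n_phases - 1)].append(frame)
    return bins
```

    Binning incrementally as frames arrive is what enables the real-time, incremental visualization the thesis describes: partially filled bins can be rendered immediately, so gaps in coverage are visible during acquisition rather than after it.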

    Medical image synthesis using generative adversarial networks: towards photo-realistic image synthesis

    This work addresses photo-realism in synthetic medical images. We introduce a modified generative adversarial network, StencilGAN: a perceptually-aware generative adversarial network that synthesizes images based on overlaid labelled masks. This technique can be a prominent solution to the scarcity of resources in the healthcare sector.
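    A minimal sketch of mask-conditioned generation in the spirit of this approach is shown below, assuming a pix2pix-style generator in PyTorch; the layer sizes, label count, and class names are illustrative and do not reproduce StencilGAN's architecture.

```python
import torch
import torch.nn as nn

class MaskConditionedGenerator(nn.Module):
    """Toy encoder-decoder mapping a one-hot stack of label masks to an
    image: the core idea of synthesising from overlaid labelled masks."""
    def __init__(self, n_labels=5, out_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_labels, 64, 4, stride=2, padding=1),   # H -> H/2
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),        # H/2 -> H/4
            nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, out_channels, 4, stride=2, padding=1),
            nn.Tanh(),  # output image scaled to [-1, 1]
        )

    def forward(self, masks):  # masks: (B, n_labels, H, W)
        return self.net(masks)

# One-hot mask stack for a 64x64 synthetic scan with 5 tissue labels
masks = torch.zeros(1, 5, 64, 64)
masks[0, 0] = 1.0  # background label everywhere, for illustration
fake = MaskConditionedGenerator()(masks)  # -> (1, 1, 64, 64)
```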