886 research outputs found

    Comparative Study of Model-Based and Learning-Based Disparity Map Fusion Methods

    Creating an accurate depth map has several valuable applications, including augmented/virtual reality, autonomous navigation, indoor/outdoor mapping, object segmentation, and aerial topography. Current hardware solutions for precise 3D scanning are relatively expensive. To combat hardware costs, software alternatives based on stereoscopic images have previously been proposed. However, software solutions are less accurate than hardware solutions, such as laser scanning, and are subject to a variety of irregularities. Notably, disparity maps generated from stereo images typically fall short in cases of occlusion, near object boundaries, and on repetitive-texture or texture-less regions. Several post-processing methods are examined in an effort to combine strong algorithm results and alleviate erroneous disparity regions. These methods include basic statistical combinations, histogram-based voting, edge detection guidance, support vector machines (SVMs), and bagged trees. Individual errors and average errors are compared between the newly introduced fusion methods and the existing disparity algorithms. Several acceptable solutions are identified to bridge the gap between 3D scanning and stereo imaging. It is shown that fusing disparity maps can result in lower error rates than individual algorithms across the dataset while maintaining a high level of robustness.
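As an illustration of what a "basic statistical combination" of disparity maps might look like, the sketch below takes the per-pixel median of several maps. It is not taken from the paper: the function name `fuse_disparities_median`, the convention that `0.0` marks an invalid pixel, and the choice of the median are all assumptions made for this example.

```python
import warnings

import numpy as np


def fuse_disparities_median(maps, invalid=0.0):
    """Fuse disparity maps by per-pixel median, skipping invalid pixels.

    `maps` is a sequence of equally shaped 2D arrays. Pixels equal to
    `invalid` (an assumed convention) are ignored. A pixel that is invalid
    in every map stays invalid in the result.
    """
    # np.stack copies the inputs, so the caller's arrays are not modified.
    stack = np.stack([np.asarray(m, dtype=np.float64) for m in maps])
    stack[stack == invalid] = np.nan
    with warnings.catch_warnings():
        # nanmedian warns on all-NaN pixels; those are handled just below.
        warnings.simplefilter("ignore", category=RuntimeWarning)
        fused = np.nanmedian(stack, axis=0)
    return np.nan_to_num(fused, nan=invalid)


if __name__ == "__main__":
    a = np.array([[10.0, 30.0, 0.0]])
    b = np.array([[12.0, 0.0, 0.0]])
    c = np.array([[50.0, 34.0, 0.0]])
    print(fuse_disparities_median([a, b, c]))
```

The median is used here because a single outlier (such as the value 50 in the first pixel) does not pull the fused estimate away from the agreeing maps, which a mean would.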

    Automated Visual Database Creation For A Ground Vehicle Simulator

    This research focuses on extracting road models from stereo video sequences taken from a moving vehicle. The proposed method combines color-histogram-based segmentation, active contours (snakes), and morphological processing to extract road boundary coordinates for conversion into Matlab or Multigen OpenFlight compatible polygonal representations. Color segmentation uses an initial truth frame to develop a color probability density function (PDF) of the road versus the terrain. Subsequent frames are segmented using a maximum a posteriori probability (MAP) criterion, and the resulting templates are used to update the PDFs. Color segmentation worked well where there was minimal shadowing and occlusion by other cars. A snake algorithm was used to find the road edges, which were converted to 3D coordinates using stereo disparity and vehicle position information. The resulting 3D road models were accurate to within 1 meter.
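The conversion of an image point to 3D coordinates from stereo disparity can be sketched with the standard rectified pinhole stereo relation Z = f·B/d. This is a generic illustration, not the authors' code: the function name and parameters (`f` in pixels, `baseline` in meters, principal point `cx`, `cy`) are assumptions, and the result is expressed in the camera frame only, so the further transformation using vehicle position is left out.

```python
def disparity_to_point(u, v, d, f, baseline, cx, cy):
    """Back-project pixel (u, v) with disparity d to camera-frame (X, Y, Z).

    Assumes a rectified stereo pair with focal length `f` (pixels),
    baseline `baseline` (meters), and principal point (`cx`, `cy`).
    """
    if d <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    z = f * baseline / d   # depth along the optical axis
    x = (u - cx) * z / f   # lateral offset
    y = (v - cy) * z / f   # vertical offset
    return (x, y, z)


if __name__ == "__main__":
    # Hypothetical calibration values, chosen only for illustration.
    print(disparity_to_point(u=420, v=240, d=20.0, f=800.0,
                             baseline=0.5, cx=320.0, cy=240.0))
```

Because depth is inversely proportional to disparity, a fixed disparity error produces a depth error that grows roughly with the square of the distance, which is one reason accuracy figures for stereo-derived models are usually quoted for a given range.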

    Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery

    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite to the registration of multi-modal patient-specific data for enhancing the surgeon’s navigation capabilities by observing beyond exposed tissue surfaces and for providing intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This paper reviews the state-of-the-art methods for optical intra-operative 3D reconstruction in laparoscopic surgery and discusses the technical challenges and future perspectives towards clinical translation. With the recent paradigm shift of surgical practice towards MIS and new developments in 3D optical imaging, this is a timely discussion about technologies that could facilitate complex CAS procedures in dynamic and deformable anatomical regions.

    Stereo visual simultaneous localisation and mapping for an outdoor wheeled robot: a front-end study

    For many mobile robotic systems, navigating an environment is a crucial step in autonomy, and Visual Simultaneous Localisation and Mapping (vSLAM) has seen increased effective usage in this capacity. However, vSLAM is strongly dependent on the context in which it is applied, often using heuristics and special cases to provide efficiency and robustness. It is thus crucial to identify the important parameters and factors regarding a particular context, as this heavily influences the necessary algorithms, processes, and hardware required for the best results. In this body of work, a generic front-end stereo vSLAM pipeline is tested in the context of a small-scale outdoor wheeled robot that occupies less than 1 m³ of volume. The scale of the vehicle constrained the available processing power, Field Of View (FOV), actuation systems, and image distortions present. A dataset was collected with a custom platform that consisted of a Point Grey Bumblebee (discontinued) stereo camera and an Nvidia Jetson TK1 processor. A stereo front-end feature tracking framework was described and evaluated both in simulation and experimentally where appropriate. It was found that scale adversely affected lighting conditions, FOV, baseline, and processing power available, all crucial factors to improve upon. The stereo constraint was effective for robustness criteria, but ineffective in terms of processing power and metric reconstruction. The pipeline produced an overall absolute odometry error of 0.25-3 m on the dataset but was unable to run in real time.

    Towards Highly-Integrated Stereovideoscopy for in vivo Surgical Robots

    When compared to traditional surgery, laparoscopic procedures result in better patient outcomes: shorter recovery, reduced post-operative pain, and less trauma to incisioned tissue. Unfortunately, laparoscopic procedures require specialized training for surgeons, as these minimally-invasive procedures provide an operating environment that has limited dexterity and limited vision. Advanced surgical robotics platforms can make minimally-invasive techniques safer and easier for the surgeon to complete successfully. The most common type of surgical robotics platforms -- the laparoscopic robots -- accomplish this with multi-degree-of-freedom manipulators that are capable of a diversified set of movements when compared to traditional laparoscopic instruments. Also, these laparoscopic robots allow for advanced kinematic translation techniques that allow the surgeon to focus on the surgical site, while the robot calculates the best possible joint positions to complete any surgical motion. An important component of these systems is the endoscopic system used to transmit a live view of the surgical environment to the surgeon. Coupled with 3D high-definition endoscopic cameras, the entirety of the platform, in effect, eliminates the peculiarities associated with laparoscopic procedures, which allows less-skilled surgeons to complete minimally-invasive surgical procedures quickly and accurately. A much newer approach to performing minimally-invasive surgery is the idea of using in-vivo surgical robots -- small robots that are inserted directly into the patient through a single, small incision; once inside, an in-vivo robot can perform surgery at arbitrary positions, with a much wider range of motion. While laparoscopic robots can harness traditional endoscopic video solutions, these in-vivo robots require a fundamentally different video solution that is as flexible as possible and free of bulky cables or fiber optics. 
This requires a miniaturized videoscopy system that incorporates an image sensor with a transceiver; because of severe size constraints, this system should be deeply embedded into the robotics platform. Here, early results are presented from the integration of a miniature stereoscopic camera into an in-vivo surgical robotics platform. A 26 mm × 24 mm stereo camera was designed and manufactured. The proposed device features USB connectivity and 1280 × 720 resolution at 30 fps. Resolution testing indicates the device performs much better than similarly-priced analog cameras. Suitability of the platform for 3D computer vision tasks, including stereo reconstruction, is examined. The platform was also tested in a living porcine model at the University of Nebraska Medical Center. Results from this experiment suggest that while the platform performs well in controlled, static environments, further work is required to obtain usable results in true surgeries. In conclusion, several ideas for improvement are presented, along with a discussion of core challenges associated with the platform. Adviser: Lance C. Pérez

    3D object reconstruction using computer vision : reconstruction and characterization applications for external human anatomical structures

    Doctoral thesis. Informatics Engineering. Faculdade de Engenharia, Universidade do Porto. 201

    Perception of Unstructured Environments for Autonomous Off-Road Vehicles

    Autonomous vehicles require the capability of perception as a necessary prerequisite for controllable and safe interaction, in order to sense and understand their environment. Perception for structured indoor and outdoor environments covers economically lucrative areas such as autonomous passenger transport and industrial robotics, whereas the perception of unstructured environments is strongly underrepresented in the research field of environment perception. The unstructured environments analyzed here pose a particular challenge, since the natural, grown geometries they contain mostly have no homogeneous structure, and similar textures and hard-to-separate objects dominate. This complicates the capture and interpretation of these environments, so perception methods must be designed and optimized specifically for this application area. In this dissertation, novel and optimized perception methods for unstructured environments are proposed and combined in a holistic, three-level pipeline for autonomous off-road vehicles: low-level, mid-level, and high-level perception. The proposed classical methods and machine learning (ML) methods for perception complement each other. Furthermore, the combination of perception and validation methods at each level enables reliable perception of the possibly unknown environment, with loosely and tightly coupled validation methods being combined to ensure a sufficient yet flexible evaluation of the proposed perception methods. All methods were developed as individual modules within the perception and validation pipeline proposed in this work, and their flexible combination enables different pipeline designs for a variety of off-road vehicles and use cases as needed. 
Low-level perception provides a tightly coupled confidence assessment for raw 2D and 3D sensor data in order to detect sensor failures and ensure sufficient accuracy of the sensor data. In addition, novel calibration and registration approaches for multi-sensor perception systems are presented, which use only the structure of the environment to register the captured sensor data: a semi-automatic registration approach for registering multiple 3D Light Detection and Ranging (LiDAR) sensors, and a confidence-based framework that combines different registration methods and enables the registration of different sensors with different measurement principles. The combination of multiple registration methods validates the registration results in a tightly coupled manner. Mid-level perception enables the 3D reconstruction of unstructured environments with two methods for estimating disparity from stereo images: a classical, correlation-based method for hyperspectral images, which requires a limited amount of test and validation data, and a second method that estimates disparity from grayscale images with convolutional neural networks (CNNs). Novel disparity error metrics and an evaluation toolbox for 3D reconstruction from stereo images complement the proposed disparity estimation methods and enable their loosely coupled validation. High-level perception focuses on the interpretation of single 3D point clouds for traversability analysis, object detection, and obstacle avoidance. A domain transfer analysis for state-of-the-art methods for semantic 3D segmentation provides recommendations for segmentation that is as accurate as possible in new target domains without generating new training data. 
The presented training approach for 3D segmentation methods with CNNs can further reduce the required amount of training data. Methods for explainable artificial intelligence before and after modeling enable a loosely coupled validation of the proposed high-level methods with dataset assessment and model-agnostic explanations for CNN predictions. Contaminated site remediation and military logistics are the two main use cases in unstructured environments addressed in this work. These application scenarios also show how the gap between the development of individual methods and their integration into the processing chain for autonomous off-road vehicles, with localization, mapping, planning, and control, can be closed. In summary, the proposed pipeline offers flexible perception solutions for autonomous off-road vehicles, and the accompanying validation ensures accurate and trustworthy perception of unstructured environments.

    Object Detection with Deep Learning to Accelerate Pose Estimation for Automated Aerial Refueling

    Remotely piloted aircraft (RPAs) cannot currently refuel during flight because the latency between the pilot and the aircraft is too great to safely perform aerial refueling maneuvers. However, an automated aerial refueling (AAR) system removes this limitation by allowing the tanker to directly control the RPA. The first step in creating an AAR system is for the tanker to quickly find the relative position and orientation (pose) of the approaching aircraft. Previous work at AFIT demonstrates that stereo camera systems provide robust pose estimation capability. This thesis first extends that work by examining the effects of the cameras' resolution on the quality of pose estimation. Next, it demonstrates a deep learning approach to accelerate the pose estimation process. The results show that this pose estimation process is precise and fast enough to safely perform AAR.

    Pedestrian detection and tracking using stereo vision techniques

    Automated pedestrian detection, counting and tracking has received significant attention from the computer vision community of late. Many of the person detection techniques described so far in the literature work well in controlled environments, such as laboratory settings with a small number of people. This allows various assumptions to be made that simplify this complex problem. The performance of these techniques, however, tends to deteriorate when presented with unconstrained environments where pedestrian appearances, numbers, orientations, movements, occlusions and lighting conditions violate these convenient assumptions. Recently, 3D stereo information has been proposed as a technique to overcome some of these issues and to guide pedestrian detection. This thesis presents such an approach, whereby after obtaining robust 3D information via a novel disparity estimation technique, pedestrian detection is performed via a 3D point clustering process within a region-growing framework. This clustering process avoids using hard thresholds by using biometrically inspired constraints and a number of plan view statistics. This pedestrian detection technique requires no external training and is able to robustly handle challenging real-world unconstrained environments from various camera positions and orientations. In addition, this thesis presents a continuous detect-and-track approach, with additional kinematic constraints and explicit occlusion analysis, to obtain robust tracking of pedestrians over time. These approaches are experimentally validated using challenging datasets consisting of both synthetic data and real-world sequences gathered from a number of environments. In each case, the techniques are evaluated using both 2D and 3D ground-truth methodologies.

    Rectification Strategies for a Binary Coded Structured Light 3D Scanner

    Making a computer able to see exactly as a human being does was for many years one of the most interesting and challenging tasks involving lots of experts and pioneers in fields such as Computer Science and Artificial Intelligence. As a result, a whole field called Computer Vision has emerged becoming very soon a part of our daily life. The successful methodologies of this discipline have been applied in countless areas of application and their use is still in continuous expansion. On the other hand, in an increasing number of applications extracting information from simple 2D images is not enough and what is more requested instead is to use three-dimensional imaging techniques in order to reconstruct the 3D shape of the imaged objects and scene. The techniques developed in this context include both active systems, where some form of illumination is projected onto the scene, and passive systems, where the natural illumination of the scene is used. Among the active systems, one of the most reliable approaches for recovering the surface of objects is the use of structured light. This technique is based on projecting a light pattern and viewing the illuminated scene from one or more points of view. Since the pattern is coded, correspondences between image points and points of the projected pattern can be easily found. In particular, the performances of this kind of 3D scanner are determined by two key aspects, the accuracy and the acquisition time. This thesis aims to design and experiment some rectification strategies for a prototype of binary coded structured light 3D scanner. 
Rectification is a commonly used technique for stereo vision systems which, in the case of structured light, facilitates the establishment of correspondences between a projected pattern and an acquired image and reduces the number of pattern images to be projected, ultimately speeding up acquisition.
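To make concrete why a coded pattern lets correspondences "be easily found", the sketch below decodes a stack of thresholded bit-plane images into a per-pixel projector column index. It is an illustration under stated assumptions, not the thesis's implementation: it assumes plain binary coding with the most significant bit projected first, images already thresholded to 0/1, and a hypothetical function name `decode_binary_patterns`.

```python
import numpy as np


def decode_binary_patterns(bit_planes):
    """Decode thresholded pattern images into projector column indices.

    `bit_planes` has shape (n_bits, height, width) with values 0 or 1,
    ordered most significant bit first (an assumed convention). Returns a
    (height, width) array whose entries are the decoded column codes.
    """
    planes = np.asarray(bit_planes, dtype=np.uint32)
    n_bits = planes.shape[0]
    # Weight of each plane: 2^(n_bits-1), ..., 2, 1.
    weights = (1 << np.arange(n_bits - 1, -1, -1)).astype(np.uint32)
    # Sum weight * bit over the pattern axis for every pixel.
    return np.tensordot(weights, planes, axes=1)


if __name__ == "__main__":
    # Three patterns are enough to label 8 projector columns.
    planes = np.array([
        [[0, 0, 0, 0, 1, 1, 1, 1]],
        [[0, 0, 1, 1, 0, 0, 1, 1]],
        [[0, 1, 0, 1, 0, 1, 0, 1]],
    ])
    print(decode_binary_patterns(planes))
```

In a rectified camera-projector pair a corresponding point lies on the same image row, so a code along one axis is sufficient, which is consistent with the reduction in projected patterns described above.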