
    Design of an Active Stereo Vision 3D Scene Reconstruction System Based on the Linear Position Sensor Module

    Active vision systems and passive vision systems currently exist for three-dimensional (3D) scene reconstruction. Active systems use a laser that interacts with the scene. Passive systems implement stereo vision, using two cameras and geometry to reconstruct the scene. Each type of system has advantages and disadvantages in resolution, speed, and scene depth. It may be possible to combine the advantages of both systems, as well as new hardware technologies such as position sensitive devices (PSDs) and field programmable gate arrays (FPGAs), to create a real-time, mid-range 3D scene reconstruction system. Active systems usually reconstruct long-range scenes so that a measurable amount of time can pass for the laser to travel to the scene and back. Passive systems usually reconstruct close-range scenes but must overcome the correspondence problem. If PSDs are placed in a stereo vision configuration and a laser is directed at the scene, the correspondence problem can be eliminated. The laser can scan the entire scene as the PSDs continually pick up points, and the scene can be reconstructed. By eliminating the correspondence problem, much of the computation time of stereo vision is removed, allowing larger scenes, possibly at mid-range, to be modeled. To give good resolution at a real-time frame rate, points would have to be recorded very quickly. PSDs are analog devices that give the position of a light spot and have very fast response times. The cameras in the system can be replaced by PSDs to help achieve real-time refresh rates and better resolution. A contribution of this thesis is to design a 3D scene reconstruction system by placing two PSDs in a stereo vision configuration and using FPGAs to perform the calculations needed to achieve real-time frame rates for mid-range scenes. The linear position sensor module (LPSM) made by Noah Corp is based on a PSD and outputs a position in terms of voltage. The LPSM is characterized for this application by testing it with lasers of different power while also varying environmental variables such as background light, scene type, and scene distance. It is determined that the LPSM is sensitive to red-wavelength lasers. When the laser is reflected off diffuse surfaces, it must output at least 500 mW to be picked up by the LPSM and the scene must be within 15 inches, or the reflected intensity will not meet the intensity requirements of the LPSM. The establishment of these performance boundaries is a contribution of the thesis, along with characterizing and testing the LPSM as a vision sensor in the proposed scene reconstruction system. Once performance boundaries are set, the LPSM is used to model calibrated objects. LPSM sensitivity to changes in power intensity seems to cause considerable error. The change in power appears to be a function of depth due to the dispersion of the laser beam. The model is improved by using a correction factor to find the position of the light spot. Using a better-focused laser may improve the results. Another option is to place two PSDs in the same configuration and test whether the intensity problem is intrinsic to all PSDs or unique to the LPSM.
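
    The key geometric idea, triangulating the single laser spot seen simultaneously by two position sensors, can be sketched as follows. This is a minimal illustration only; the baseline, focal length and voltage-to-position scaling below are assumed values, not parameters from the thesis or the LPSM datasheet.

```python
# Minimal sketch of laser-spot triangulation with two position sensors in a
# rectified stereo configuration. All numeric values are illustrative
# assumptions, not values taken from the thesis.

def psd_voltage_to_position(voltage, scale_m_per_volt=1e-3):
    """Convert a PSD output voltage to a spot position on the sensor (metres)."""
    return voltage * scale_m_per_volt

def triangulate_point(x_left, x_right, y_left, baseline_m=0.1, focal_m=0.025):
    """Triangulate one 3D point from the laser-spot positions on two sensors.

    Because the laser illuminates a single spot, the left/right measurements
    correspond by construction -- no correspondence search is needed.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("spot must lie in front of both sensors")
    z = focal_m * baseline_m / disparity      # depth along the optical axis
    x = z * x_left / focal_m                  # lateral position
    y = z * y_left / focal_m                  # vertical position
    return x, y, z

# Example: scan the laser, read both sensors, and accumulate a point cloud.
points = []
for v_left, v_right, v_up in [(0.80, 0.40, 0.10), (0.75, 0.30, 0.12)]:
    xl = psd_voltage_to_position(v_left)
    xr = psd_voltage_to_position(v_right)
    yl = psd_voltage_to_position(v_up)
    points.append(triangulate_point(xl, xr, yl))
```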

    Real Time Structured Light and Applications

    Anatomically informed image reconstruction for time of flight positron emission tomography

    Positron emission tomography (PET) has an important role in disease diagnosis, drug development and patient management. PET images are accompanied by computed tomography (CT) or magnetic resonance (MR) images to provide complementary structural information. The GE SIGNA PET/MR is a state-of-the-art clinical scanner that aims to combine time-of-flight PET (TOF-PET) with anatomical and soft-tissue MR imaging. This work aims at modelling the mathematical and physical processes of TOF-PET data for the GE SIGNA PET/MR within an open-source software package, Software for Tomographic Image Reconstruction (STIR). The work further examines the developments made to implement the acquisition model using typical (ordered-subsets expectation maximisation, OSEM) and advanced (TOF-OSEM and TOF-kernelised expectation maximisation, TOF-KEM) iterative algorithms. TOF-PET improves conventional PET imaging as it localises each event along the line of response (LOR) within a small region, with an uncertainty determined by the timing resolution of the detectors, and it demonstrates robustness in the presence of small errors, inconsistencies or patient motion in the acquired data. The GE SIGNA PET/MR has a timing resolution of 390 ps. The aim of this work is to exploit TOF-PET and further include anatomical information from MR images to facilitate robust PET reconstructions. All the developments made in this thesis were compared with the vendor's reconstruction software (GE-toolbox). Real phantom and clinical datasets were used for the analysis. The emission data and data corrections calculated with the developments made in STIR were in excellent agreement with the GE-toolbox, despite the absence of dead-time and decay effects in the current developments. Reconstructions using the OSEM and TOF-OSEM algorithms demonstrated good agreement with the GE-toolbox in quantitative, resolution-based and structural analyses. TOF-KEM reconstructions demonstrated a slight improvement in quantification compared with TOF-OSEM in STIR. The thesis demonstrates the first instance of real-data reconstruction of TOF-PET data using the TOF-OSEM and TOF-KEM algorithms. The developments made in this thesis provide a platform to investigate the effects of a novel reconstruction algorithm, TOF-KEM, on dose and scan-time reduction using real clinical datasets.
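
    The core of the iterative algorithms named above is the multiplicative EM update, in which the image is repeatedly scaled by the ratio of measured to forward-projected data over ordered subsets of the measurements. The sketch below shows this update with a small dense system matrix; it is a generic illustration of OSEM only and does not use STIR, TOF weighting, or the GE SIGNA geometry.

```python
import numpy as np

# Minimal sketch of an ordered-subsets EM (OSEM) update for emission tomography,
# using a dense system matrix A (detector bins x image voxels). Real TOF-PET
# reconstruction relies on on-the-fly projectors and corrections; this only
# illustrates the iterative update itself.

def osem(A, y, n_iters=4, n_subsets=4, eps=1e-12):
    n_bins, n_voxels = A.shape
    x = np.ones(n_voxels)                      # uniform initial image
    subsets = np.array_split(np.arange(n_bins), n_subsets)
    for _ in range(n_iters):
        for s in subsets:
            A_s, y_s = A[s], y[s]
            sens = A_s.sum(axis=0) + eps       # subset sensitivity image
            fwd = A_s @ x + eps                # forward projection of current image
            x *= (A_s.T @ (y_s / fwd)) / sens  # multiplicative EM update
    return x

# Toy example: 2 voxels, 4 detector bins, noiseless data.
A = np.array([[1.0, 0.2], [0.3, 0.9], [0.8, 0.1], [0.1, 1.0]])
x_true = np.array([4.0, 2.0])
y = A @ x_true
print(osem(A, y, n_iters=20, n_subsets=2))     # converges towards x_true
```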

    Towards adaptive and autonomous humanoid robots: from vision to actions

    Although robotics research has seen advances over the last decades, robots are still not in widespread use outside industrial applications. Yet a range of proposed scenarios have robots working together with, helping, and coexisting with humans in daily life. In all of these, a clear need arises to deal with a more unstructured, changing environment. I herein present a system that aims to overcome the limitations of highly complex robotic systems in terms of autonomy and adaptation. The main focus of the research is to investigate the use of visual feedback for improving the reaching and grasping capabilities of complex robots. To facilitate this, a combined integration of computer vision and machine learning techniques is employed. From a robot vision point of view, combining domain knowledge from both image processing and machine learning techniques can expand the capabilities of robots. I present a novel framework called Cartesian Genetic Programming for Image Processing (CGP-IP). CGP-IP can be trained to detect objects in the incoming camera streams and has been successfully demonstrated on many different problem domains. The approach is fast, scalable and robust, and requires only small training sets (it was tested with 5 to 10 images per experiment). Additionally, it can generate human-readable programs that can be further customized and tuned. While CGP-IP is a supervised-learning technique, I show an integration on the iCub that allows for the autonomous learning of object detection and identification. Finally, this dissertation includes two proof-of-concept integrations of the motion and action sides. First, reactive reaching and grasping is shown: the robot avoids obstacles detected in the visual stream while reaching for the intended target object. This integration also enables the robot to be used in non-static environments, i.e. the reaching is adapted on the fly from the visual feedback received, e.g. when an obstacle is moved into the trajectory. The second integration highlights the capabilities of these frameworks by improving visual detection through object manipulation actions.
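
    CGP-IP itself is not detailed in the abstract, but the underlying idea of Cartesian Genetic Programming can be sketched briefly: a genotype encodes a feed-forward graph of image-processing primitives, and a (1+4) evolutionary strategy mutates it towards a target segmentation. The primitive set, genotype size and fitness measure below are simplified stand-ins for illustration; the actual framework uses a much richer, OpenCV-based function set.

```python
import random
import numpy as np

# Minimal sketch of the Cartesian Genetic Programming idea behind CGP-IP.
# Everything here (primitives, genotype size, fitness) is an illustrative
# simplification, not the thesis implementation.

PRIMITIVES = [
    lambda a, b: np.clip(a + b, 0, 1),
    lambda a, b: np.clip(a - b, 0, 1),
    lambda a, b: np.maximum(a, b),
    lambda a, b: (a > b.mean()).astype(float),   # crude adaptive threshold
]

def random_genotype(n_nodes=8, n_inputs=1):
    """Each node: (function index, input index a, input index b)."""
    nodes = []
    for i in range(n_nodes):
        f = random.randrange(len(PRIMITIVES))
        a = random.randrange(n_inputs + i)       # connect only to earlier nodes
        b = random.randrange(n_inputs + i)
        nodes.append((f, a, b))
    return nodes

def evaluate(genotype, image):
    values = [image]
    for f, a, b in genotype:
        values.append(PRIMITIVES[f](values[a], values[b]))
    return values[-1]                            # last node is the output

def fitness(genotype, image, target_mask):
    out = evaluate(genotype, image) > 0.5
    return np.mean(out == target_mask)           # pixel-wise accuracy

def mutate(genotype, rate=0.2):
    child = []
    for i, (f, a, b) in enumerate(genotype):
        if random.random() < rate:
            f = random.randrange(len(PRIMITIVES))
            a = random.randrange(1 + i)
            b = random.randrange(1 + i)
        child.append((f, a, b))
    return child

# (1+4) evolution on a toy 16x16 "detect the bright blob" task.
image = np.zeros((16, 16)); image[4:9, 4:9] = 1.0
target = image > 0.5
parent = random_genotype()
for _ in range(200):
    children = [mutate(parent) for _ in range(4)]
    parent = max([parent] + children, key=lambda g: fitness(g, image, target))
```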

    The Estimation and Correction of Rigid Motion in Helical Computed Tomography

    X-ray CT is a tomographic imaging tool used in medicine and industry. Although technological developments have significantly improved the performance of CT systems, the accuracy of images produced by state-of-the-art scanners is still often limited by artefacts due to object motion. To tackle this problem, a number of motion estimation and compensation methods have been proposed. However, no methods with a demonstrated ability to correct for rigid motion in helical CT scans appear to exist. The primary aims of this thesis were to develop and evaluate effective methods for the estimation and correction of arbitrary six degree-of-freedom rigid motion in helical CT. As a first step, a method was developed to accurately estimate object motion during CT scanning with an optical tracking system, which provided sub-millimetre positional accuracy. Subsequently, a motion correction method analogous to one previously developed for SPECT was adapted to CT. The principle is to restore projection consistency by modifying the source-detector orbit in response to the measured object motion and to reconstruct from the modified orbit with an iterative reconstruction algorithm. The feasibility of this method was demonstrated with a rapidly moving brain phantom, and the efficacy of correcting for a range of human head motions acquired from healthy volunteers was evaluated in simulations. The methods developed were found to provide accurate and artefact-free motion-corrected images for most types of head motion likely to be encountered in clinical CT imaging, provided that the motion was accurately known. The method was also applied to CT data acquired on a hybrid PET/CT scanner, demonstrating its versatility. Its clinical value may be significant in reducing the need for repeat scans (and repeat radiation doses), anaesthesia and sedation in patient groups prone to motion, including young children.
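
    The orbit-modification principle described above can be sketched as a simple geometric transformation: each measured rigid pose of the object is applied inversely to the source and detector positions for that projection, so the data are treated as if the object were stationary and the scanner moved around it. The function names and pose convention below are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

# Minimal sketch of modifying the source-detector orbit from measured rigid
# motion. Each projection has an object pose (R, t) with x_lab = R @ x_obj + t.

def object_to_reference_frame(point_lab, R, t):
    """Map a lab-frame point into the (moving) object's reference frame."""
    return R.T @ (point_lab - t)

def modified_orbit(source_positions, detector_positions, poses):
    """Return per-projection source/detector positions expressed in the object frame."""
    new_src, new_det = [], []
    for src, det, (R, t) in zip(source_positions, detector_positions, poses):
        new_src.append(object_to_reference_frame(src, R, t))
        new_det.append(np.array([object_to_reference_frame(p, R, t) for p in det]))
    return new_src, new_det
```

    Because the modified orbit is generally no longer a regular helix, reconstruction has to accept an arbitrary projection geometry, which is one reason an iterative reconstruction algorithm is used from the modified orbit.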

    On-the-fly dense 3D surface reconstruction for geometry-aware augmented reality.

    Augmented Reality (AR) is an emerging technology that makes seamless connections between virtual space and the real world by superimposing computer-generated information onto the real-world environment. AR can provide additional information in a more intuitive and natural way than any other information-delivery method that humans have ever invented. Camera tracking is the enabling technology for AR and has been well studied over the last few decades. Apart from the tracking problem, sensing and perception of the surrounding environment are also very important and challenging problems. Although there are existing hardware solutions, such as Microsoft Kinect and HoloLens, that can sense and build the environmental structure, they are either too bulky or too expensive for AR. In this thesis, challenging real-time dense 3D surface reconstruction technologies are studied and reformulated to move basic position-aware AR towards geometry-aware AR, with an outlook to context-aware AR. We initially propose to reconstruct the dense environmental surface from the sparse points produced by Simultaneous Localisation and Mapping (SLAM), but this approach is prone to fail in challenging Minimally Invasive Surgery (MIS) scenes with deformation and surgical smoke. We subsequently adopt stereo vision with SLAM for more accurate and robust results. With the success of deep learning technology in recent years, we present learning-based single-image reconstruction and achieve state-of-the-art results. Moreover, we propose context-aware AR, one step further from purely geometry-aware AR towards high-level conceptual interaction modelling in complex AR environments for an enhanced user experience. Finally, a learning-based smoke removal method is proposed to ensure accurate and robust reconstruction under extreme conditions such as the presence of surgical smoke.
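
    As a minimal illustration of the geometry-aware step, the sketch below back-projects a per-frame depth map into 3D points and places them in the world frame using a camera pose such as SLAM would provide. The intrinsics and pose are made-up example values, and the sketch does not represent the stereo or learning-based pipelines developed in the thesis.

```python
import numpy as np

# Minimal sketch: depth map + camera pose -> world-frame surface points,
# the basic geometric step behind geometry-aware AR. All parameters are
# illustrative assumptions.

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map (metres) into camera-frame 3D points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def to_world(points_cam, R_wc, t_wc):
    """Transform camera-frame points into the world frame using the tracked pose."""
    return points_cam @ R_wc.T + t_wc

# Example: a flat 4x4 depth map one metre in front of an identity-pose camera.
depth = np.full((4, 4), 1.0)
pts = to_world(backproject_depth(depth, fx=500, fy=500, cx=2, cy=2),
               R_wc=np.eye(3), t_wc=np.zeros(3))
```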

    Second Annual Conference on Astronomical Data Analysis Software and Systems. Abstracts

    Abstracts from the conference are presented. The topics covered include the following: next generation software systems and languages; databases, catalogs, and archives; user interfaces/visualization; real-time data acquisition/scheduling; and IRAF/STSDAS/PROS status reports.

    Design of a High-Speed Architecture for Stabilization of Video Captured Under Non-Uniform Lighting Conditions

    Video captured under shaky conditions may contain vibrations. A robust algorithm that stabilizes the video by compensating for vibrations arising from the physical setting of the camera is presented in this dissertation. A very high-performance hardware architecture on Field Programmable Gate Array (FPGA) technology is also developed for the implementation of the stabilization system. Stabilization of video sequences captured under non-uniform lighting conditions begins with a nonlinear enhancement process. This improves the visibility of the scene captured by physical sensing devices, which have limited dynamic range; this limitation causes the saturated region of the image to shadow out the rest of the scene. It is therefore desirable to recover a more uniform scene in which the shadows are eliminated to a certain extent. Stabilization of video requires the estimation of global motion parameters. By obtaining reliable background motion, the video can be spatially transformed to the reference sequence, thereby eliminating the unintended motion of the camera. A reflectance-illuminance model for video enhancement is used in this research to improve the visibility and quality of the scene. With fast color space conversion, the computational complexity is reduced to a minimum. The basic video stabilization model is formulated and configured for hardware implementation. Such a model involves the evaluation of reliable features for tracking, motion estimation, and an affine transformation to map the display coordinates of the stabilized sequence. The multiplications, divisions and exponentiations are replaced by simple arithmetic and logic operations using improved log-domain computations in the hardware modules. On Xilinx's Virtex II 2V8000-5 FPGA platform, the prototype system consumes 59% of the logic slices, 30% of the flip-flops, 34% of the lookup tables, 35% of the embedded RAMs and two ZBT frame buffers. The system is capable of rendering 180.9 million pixels per second (mpps) and consumes approximately 30.6 watts of power at 1.5 volts. With a 1024×1024 frame, this throughput is equivalent to 172 frames per second (fps). Future work will optimize the performance-resource trade-off to meet the specific needs of applications, and will further extend the model to the extraction and tracking of moving objects, as the model inherently encapsulates the attributes of spatial distortion and motion prediction to reduce complexity. With these parameters to narrow down the processing range, it is possible to achieve a minimum of 20 fps on desktop computers with Intel Core 2 Duo or Quad Core CPUs and 2 GB of DDR2 memory without dedicated hardware.
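
    The log-domain trick mentioned above, replacing multiplication, division and exponentiation with addition, subtraction and a single multiply on logarithms, can be illustrated with the floating-point sketch below. It is a generic demonstration of the technique; the dissertation's hardware modules use fixed-point log-domain arithmetic with lookup tables and shifts rather than these library calls.

```python
import math

# Minimal sketch of log-domain arithmetic: operands are held as
# (sign, log2 magnitude), so multiply -> add, divide -> subtract,
# power -> multiply. Zero operands are not handled in this sketch.

def to_log(x):
    return math.copysign(1.0, x), math.log2(abs(x))

def from_log(sign, lx):
    return sign * 2.0 ** lx

def log_mul(a, b):
    (sa, la), (sb, lb) = to_log(a), to_log(b)
    return from_log(sa * sb, la + lb)          # multiply -> add

def log_div(a, b):
    (sa, la), (sb, lb) = to_log(a), to_log(b)
    return from_log(sa * sb, la - lb)          # divide -> subtract

def log_pow(a, n):
    sa, la = to_log(a)
    return from_log(sa ** n, la * n)           # power -> multiply

assert abs(log_mul(3.5, -2.0) - (-7.0)) < 1e-9
assert abs(log_div(10.0, 4.0) - 2.5) < 1e-9
assert abs(log_pow(2.0, 10) - 1024.0) < 1e-9
```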

    Passive Visual Sensing in Automatic Arc Welding
