3D object reconstruction using computer vision : reconstruction and characterization applications for external human anatomical structures
Doctoral thesis. Informatics Engineering. Faculdade de Engenharia. Universidade do Porto. 201
Development of a Robotic Positioning and Tracking System for a Research Laboratory
Measurement of residual stress using neutron or synchrotron diffraction relies on the accurate alignment of the sample in relation to the gauge volume of the instrument. Automatic sample alignment can be achieved using kinematic models of the positioning system provided the relevant kinematic parameters are known, or can be determined, to a suitable accuracy.
The main problem addressed in this thesis is improving the repeatability and accuracy of sample positioning for strain scanning, through the use of techniques from robotic calibration theory to generate kinematic models of both off-the-shelf and custom-built positioning systems. The approach is illustrated using a positioning system in use on the ENGIN-X instrument at the UK’s ISIS pulsed neutron source, comprising a traditional XYZΩ table augmented with a triple-axis manipulator. Accuracies better than 100 microns were achieved for this compound system. Although discussed here in terms of sample positioning systems, these methods are entirely applicable to other moving instrument components, such as beam-shaping jaws and detectors.
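As a toy illustration of the calibration idea, consider a positioning model that is linear in its unknown parameters and fitted by least squares to externally measured positions. The two-axis model, parameter values and noise level below are invented for illustration and are not taken from the thesis, which deals with full kinematic chains:

```python
import numpy as np

# Toy kinematic model of an XY table: each axis has an unknown scale
# error and zero offset.  Commanded -> actual position:
#   x = sx * qx + ox,   y = sy * qy + oy
# Calibration recovers (sx, ox, sy, oy) from externally measured positions.

rng = np.random.default_rng(0)
true_params = np.array([1.002, 0.12, 0.998, -0.08])   # sx, ox, sy, oy (mm)

q = rng.uniform(0, 100, size=(20, 2))                 # commanded axis values
measured = np.column_stack([
    true_params[0] * q[:, 0] + true_params[1],
    true_params[2] * q[:, 1] + true_params[3],
]) + rng.normal(0, 0.005, size=(20, 2))               # 5 um measurement noise

# The model is linear in the parameters, so ordinary least squares suffices:
# row [qx 1 0 0] predicts x, row [0 0 qy 1] predicts y.
A = np.zeros((40, 4))
A[0::2, 0], A[0::2, 1] = q[:, 0], 1.0
A[1::2, 2], A[1::2, 3] = q[:, 1], 1.0
b = measured.reshape(-1)
est, *_ = np.linalg.lstsq(A, b, rcond=None)

residual_rms = np.sqrt(np.mean((A @ est - b) ** 2))
print(est.round(4), residual_rms)   # parameters close to true_params, RMS ~5 um
```

A real calibration replaces the linear model with the (nonlinear) forward kinematics of the positioner and iterates, but the fitting principle is the same.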
Several factors can lead to inaccurate positioning on a neutron or synchrotron diffractometer. It is therefore essential to validate the accuracy of positioning, especially during experiments which require a high level of accuracy. In this thesis, a stereo camera system is developed to monitor the sample and other moving parts of the diffractometer. The camera metrology system is designed to measure the positions of retroreflective markers attached to any object being monitored. A fully automated camera calibration procedure is developed with an emphasis on accuracy. The potential accuracy of this system is demonstrated and problems that limit accuracy are discussed. It is anticipated that the camera system will be used to correct the positioning system when the error is small, or to notify the user when the error is significant.
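The marker-measurement step can be sketched as classic two-view (DLT) triangulation: given the calibrated projection matrices of the two cameras and a marker's pixel coordinates in each view, its 3D position follows from a small linear system. The camera geometry below is hypothetical:

```python
import numpy as np

# Sketch: triangulating a retroreflective marker from two calibrated
# cameras.  P1, P2 are hypothetical 3x4 projection matrices; in a real
# system they come from the camera calibration procedure.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])          # reference camera
R = np.eye(3)
t = np.array([[-500.0], [0.0], [0.0]])                 # 500 mm baseline
P2 = np.hstack([R, t])

marker = np.array([120.0, -40.0, 1500.0])              # ground-truth point (mm)

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

u1, u2 = project(P1, marker), project(P2, marker)

# Linear (DLT) triangulation: each view contributes two rows of A X = 0.
def triangulate(P1, u1, P2, u2):
    A = np.array([
        u1[0] * P1[2] - P1[0],
        u1[1] * P1[2] - P1[1],
        u2[0] * P2[2] - P2[0],
        u2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

recovered = triangulate(P1, u1, P2, u2)
print(recovered)   # ~[120, -40, 1500]
```

With noisy detections the same linear solution serves as the initial guess for a nonlinear refinement that minimises reprojection error.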
Examination of scanner precision by analysing orthodontic parameters
Background: 3D modelling in orthodontics is becoming an increasingly widespread technique in practice. One of the significant questions already being asked concerns the precision of the scanner used for generating surfaces on a 3D model of the jaw. Materials and methods: This research was conducted by generating a set of identical 3D models on an Atos optical 3D scanner and a Lazak Scan laboratory scanner, whose precision was established by measuring a set of orthodontic parameters (54 overall) in all three orthodontic planes. In this manner we explored their precision in space, since they are used for generating spatial models, i.e. 3D jaws. Results: There were significant differences between parameters scanned with the Atos and the Lazak Scan. The smallest difference was 0.017 mm and the largest 1.109 mm. Conclusion: This research reveals that both scanners (Atos and Lazak Scan), which are general-purpose scanners, can, based on the precision parameters, be used in orthodontics. Early analyses indicate that the reference scanner in terms of precision is the Atos.
Three-dimensional reconstruction of real environments using laser and intensity data
The objective of the work presented in this thesis is to generate complete, high-resolution
three-dimensional models of real world scenes (3D geometric and
texture information) from passive intensity images and active range sensors.
Most 3D reconstruction systems are based either on range finders or on digital
cameras, but little work has tried to combine these two kinds of sensor.
Depth extraction from intensity images is complex. On the other hand, digital
photographs provide additional information about the scenes that can be used
to help the modelling process, in particular to define accurate surface boundary
conditions. This makes active and passive sensors complementary in many
ways, and this is the basic idea that motivates the work in this thesis.
In the first part of the thesis, we concentrate on the registration between data
coming from active range sensors and passive digital cameras, and on the
development of tools to make this step easier, more user-independent and
more precise. In the end, with this technique, a texture map for the models is
computed based on several digital photographs. This will lead to 3D models
where 3D geometry is extracted from range data, whereas texture information
comes from digital photographs. With these models, photo realistic quality is
achieved: a kind of high-resolution 3D photograph of a real scene.
In the second part of the thesis, we go further in combining the two
datasets. The digital photographs are used as an additional source of
three-dimensional information that can be valuable for defining accurate
surface boundary conditions (where range data are less reliable), for filling
holes in the data, and for increasing 3D point density in areas of interest.
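The texture-mapping step described in the first part can be sketched as a pinhole projection of each model vertex into a registered photograph; the colour at the resulting pixel is then sampled as that vertex's texture. The intrinsic and extrinsic values below are illustrative only:

```python
import numpy as np

# Sketch of the texture-mapping step: once range data and a photograph are
# registered, each 3D vertex is projected through the (assumed known)
# camera to find its texture coordinate.  K, R, t are illustrative values.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])                  # camera intrinsics
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])      # extrinsics from registration

vertices = np.array([[0.0, 0.0, 0.0],
                     [0.1, 0.0, 0.0],
                     [0.0, 0.1, 0.0]])           # model vertices (metres)

def texture_coords(X):
    """Project a 3D vertex to pixel coordinates in the photograph."""
    cam = R @ X + t                              # world -> camera frame
    uvw = K @ cam
    return uvw[:2] / uvw[2]                      # perspective division

uv = np.array([texture_coords(v) for v in vertices])
print(uv)    # pixel positions used to sample colour for each vertex
```

A full pipeline additionally blends several photographs per surface and checks visibility, but the per-vertex projection above is the core operation.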
An investigation into common challenges of 3D scene understanding in visual surveillance
Nowadays, video surveillance systems are ubiquitous. Most installations simply consist of CCTV cameras connected to a central control room and rely on human operators to interpret what they see on the screen in order to, for example, detect a crime (either during or after an event). Some modern computer vision systems aim to automate the process, at least to some degree, and various algorithms have been somewhat successful in certain limited areas. However, such systems remain inefficient in general circumstances and present real challenges yet to be solved. These challenges include the ability to recognise and ultimately predict and prevent abnormal behaviour, or even reliably recognise objects, for example in order to detect left luggage or suspicious objects. This thesis first aims to study the state of the art and identify the major challenges and possible requirements of future automated and semi-automated CCTV technology in the field. It then presents the application of a suite of 2D and highly novel 3D methodologies that go some way to overcoming current limitations. The methods presented here are based on the analysis of object features directly extracted from the geometry of the scene, and start with a consideration of mainly existing techniques, such as the use of lines, vanishing points (VPs) and planes, applied to real scenes. Then, an investigation is presented into the use of richer 2.5D/3D surface normal data. In all cases the aim is to combine both 2D and 3D data to obtain a better understanding of the scene, aimed ultimately at capturing what is happening within the scene in order to move towards automated scene analysis. Although this thesis focuses on the widespread application of video surveillance, the example case of a railway station environment is used to represent typical real-world challenges; the principles can be readily extended elsewhere, such as to airports, motorways, households and shopping malls.
The context of this research work, together with an overall presentation of existing methods used in video surveillance and their challenges, is described in chapter 1. Common computer vision techniques such as VP detection, camera calibration, 3D reconstruction and segmentation can be applied in an effort to extract meaning in video surveillance applications. According to the literature, these methods have been well researched, and their use is assessed in the context of current surveillance requirements in chapter 2. While existing techniques can perform well in some contexts, such as an architectural environment composed of simple geometrical elements, their robustness and performance in feature extraction and object recognition tasks are not sufficient to solve the key challenges encountered in a general video surveillance context. This is largely due to issues such as variable lighting, weather conditions, shadows, and the general complexity of the real-world environment. Chapter 3 presents the research and contribution on those topics, namely methods to extract optimal features for a specific CCTV application, together with their strengths and weaknesses, to highlight that the proposed algorithm obtains better results than most due to its specific design. The comparison of current surveillance systems and methods from the literature shows that 2D data are nevertheless used almost universally. Indeed, industrial systems as well as the research community have been intensively improving 2D feature extraction methods for as long as image analysis and scene understanding have been of interest. This constant progress makes 2D feature extraction almost effortless nowadays, thanks to a large variety of techniques. Moreover, even if 2D data do not allow all challenges in video surveillance or other applications to be solved, they are still used as a starting stage towards scene understanding and image analysis.
Chapter 4 then explores 2D feature extraction via vanishing point detection and segmentation methods. A combination of the most common techniques and a novel approach is proposed to extract vanishing points from video surveillance environments. Moreover, segmentation techniques are explored with the aim of determining how they can complement vanishing point detection and lead towards 3D data extraction and analysis. In spite of the contribution above, 2D data are insufficient for all but the simplest applications aimed at obtaining an understanding of a scene, where the aim is the robust detection of, say, left luggage or abnormal behaviour without significant a priori information about the scene geometry. Therefore, more information is required in order to design a more automated and intelligent algorithm that obtains richer information from the scene geometry, and so a better understanding of what is happening within it. This can be achieved by the use of 3D data (in addition to 2D data), allowing the opportunity for object “classification” and, from this, the inference of a map of functionality describing feasible and unfeasible object functionality in a given environment. Chapter 5 presents how 3D data can be beneficial for this task and the various solutions investigated to recover 3D data, as well as some preliminary work towards plane extraction. It is apparent that VPs and planes give useful information about a scene’s perspective and can assist in 3D data recovery within a scene. However, neither VPs nor plane detection techniques alone allow the recovery of more complex generic object shapes (for example those composed of spheres, cylinders, etc.), and any simple model will suffer in the presence of non-Manhattan features, e.g. those introduced by the presence of an escalator. For this reason, a novel photometric stereo-based surface normal retrieval methodology is introduced to capture the 3D geometry of the whole scene or part of it.
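The vanishing-point idea explored in chapter 4 can be illustrated with homogeneous coordinates, where a line through two image points is their cross product and two lines meet at the cross product of the lines. The segment endpoints below are synthetic:

```python
import numpy as np

# Minimal sketch of vanishing-point estimation: image lines that are
# parallel in the scene meet at a common vanishing point.  In homogeneous
# coordinates a line through two points is their cross product, and two
# lines intersect at the cross product of the lines.

def line_through(p, q):
    return np.cross(np.append(p, 1.0), np.append(q, 1.0))

def intersection(l1, l2):
    x = np.cross(l1, l2)
    return x[:2] / x[2]

# Two image line segments that should converge at VP = (500, 300):
vp_true = np.array([500.0, 300.0])
l1 = line_through(np.array([0.0, 0.0]), vp_true)
l2 = line_through(np.array([0.0, 600.0]), vp_true)

vp = intersection(l1, l2)
print(vp)    # close to [500, 300]

# A robust estimator (e.g. RANSAC over many detected segments) would vote
# among pairwise intersections instead of trusting a single pair.
```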
Chapter 6 describes how photometric stereo allows the recovery of 3D information in order to obtain a better understanding of a scene, while also partially overcoming some current surveillance challenges, such as difficulty in resolving fine detail, particularly at large standoff distances, and in isolating and recognising more complex objects in real scenes. Here, items of interest may be obscured by complex environmental factors that are subject to rapid change, making, for example, the detection of suspicious objects and behaviour highly problematic. Innovative use is made of an untapped latent capability offered within modern surveillance environments to introduce a form of environmental structuring to good advantage, in order to achieve a richer form of data acquisition. This chapter also explores the novel application of photometric stereo in such diverse applications, shows how our algorithm can be incorporated into an existing surveillance system, and considers a typical real commercial application. One of the most important aspects of this research work is its application. Indeed, while most of the research literature has been based on relatively simple structured environments, the approach here has been designed to be applied to real surveillance environments, such as railway stations, airports and waiting rooms, where surveillance cameras may be fixed or, in the future, form part of a mobile, free-roaming robotic surveillance device that must continually reinterpret its changing environment. So, as mentioned previously, while the main focus has been to apply this algorithm to railway station environments, the work has been approached in a way that allows adaptation to many other applications, such as autonomous robotics, and to motorway, shopping centre, street and home environments. All of these applications require a better understanding of the scene for security or safety purposes.
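The principle behind photometric stereo, as used in chapter 6, can be sketched for the Lambertian case: with at least three known light directions, the scaled normal at a pixel is recovered by least squares from the observed intensities. The light directions, albedo and normal below are synthetic:

```python
import numpy as np

# Lambertian photometric stereo at one pixel.  With >= 3 known unit light
# directions L and observed intensities I,
#   I = L @ (rho * n)
# so the albedo-scaled normal rho*n is recovered by least squares.

L = np.array([[0.0, 0.0, 1.0],
              [0.7, 0.0, 0.714],
              [0.0, 0.7, 0.714]])
L /= np.linalg.norm(L, axis=1, keepdims=True)     # unit light directions

n_true = np.array([0.3, -0.2, 0.933])
n_true /= np.linalg.norm(n_true)                  # ground-truth unit normal
rho_true = 0.8                                    # albedo

I = L @ (rho_true * n_true)                       # ideal observed intensities

g, *_ = np.linalg.lstsq(L, I, rcond=None)         # g = rho * n
rho = np.linalg.norm(g)
n = g / rho
print(n.round(3), round(rho, 3))                  # recovers n_true and 0.8
```

In a surveillance setting the "lights" would be existing scene illuminants whose directions are calibrated once, and the per-pixel solve is repeated over the whole image to obtain a normal map.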
Finally, chapter 7 presents a global conclusion and directions for future work.
Parallelized computational 3D video microscopy of freely moving organisms at multiple gigapixels per second
To study the behavior of freely moving model organisms such as zebrafish
(Danio rerio) and fruit flies (Drosophila) across multiple spatial scales, it
would be ideal to use a light microscope that can resolve 3D information over a
wide field of view (FOV) at high speed and high spatial resolution. However, it
is challenging to design an optical instrument to achieve all of these
properties simultaneously. Existing techniques for large-FOV microscopic
imaging and for 3D image measurement typically require many sequential image
snapshots, thus compromising speed and throughput. Here, we present 3D-RAPID, a
computational microscope based on a synchronized array of 54 cameras that can
capture high-speed 3D topographic videos over a 135-cm^2 area, achieving up to
230 frames per second at throughputs exceeding 5 gigapixels (GPs) per second.
3D-RAPID features a 3D reconstruction algorithm that, for each synchronized
temporal snapshot, simultaneously fuses all 54 images seamlessly into a
globally-consistent composite that includes a coregistered 3D height map. The
self-supervised 3D reconstruction algorithm itself trains a
spatiotemporally-compressed convolutional neural network (CNN) that maps raw
photometric images to 3D topography, using stereo overlap redundancy and
ray-propagation physics as the only supervision mechanism. As a result, our
end-to-end 3D reconstruction algorithm is robust to generalization errors and
scales to arbitrarily long videos from arbitrarily sized camera arrays. The
scalable hardware and software design of 3D-RAPID addresses a longstanding
problem in the field of behavioral imaging, enabling parallelized 3D
observation of large collections of freely moving organisms at high
spatiotemporal throughputs, which we demonstrate in ants (Pogonomyrmex
barbatus), fruit flies, and zebrafish larvae.
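A quick sanity check of the quoted figures: 54 cameras at up to 230 frames per second delivering more than 5 gigapixels per second implies roughly 0.4 megapixels per camera per frame:

```python
# Back-of-the-envelope check of the quoted 3D-RAPID figures.
cameras = 54
fps = 230
throughput = 5e9                      # pixels per second (quoted lower bound)

pixels_per_frame = throughput / (cameras * fps)
print(round(pixels_per_frame / 1e6, 2), "MP per camera frame")
```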
Compact Environment Modelling from Unconstrained Camera Platforms
Mobile robotic systems need to perceive their surroundings in order to act independently. In this work a perception framework is developed which interprets the data of a binocular camera in order to transform it into a compact, expressive model of the environment. This model enables a mobile system to move in a targeted way and interact with its surroundings. It is shown how the developed methods also provide a solid basis for technical assistive aids for visually impaired people.
Camera positioning for 3D panoramic image rendering
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. Virtual camera realisation and the proposition of a trapezoidal camera architecture are the two broad contributions of this thesis. Firstly, multiple cameras and their arrangement constitute a critical component which affects the integrity of visual content acquisition for multi-view video. Currently, linear, convergence and divergence arrays are the prominent camera topologies adopted. However, the large number of cameras required and their synchronisation are two of the prominent challenges usually encountered. The use of virtual cameras can significantly reduce the number of physical cameras used with respect to any of the known
camera structures, hence reducing some of the other implementation issues. This thesis explores the use of image-based rendering with and without geometry in implementations leading to the realisation of virtual cameras. The virtual camera implementation was carried out from the perspective of a depth map (geometry) and the use of multiple image samples (no geometry). Prior to the virtual camera realisation, the generation of depth maps was investigated using region match measures widely known for solving the image point correspondence problem. The constructed depth maps have been compared with those generated
using the dynamic programming approach. In both the geometry and no-geometry approaches, the virtual cameras lead to the rendering of views from a textured depth map, the construction of a 3D panoramic image of a scene by stitching multiple image samples and superposing them, and the computation
of a virtual scene from a stereo pair of panoramic images. The quality of these rendered images was assessed through objective or subjective analysis in the Imatest software. Furthermore, metric reconstruction of a scene was performed by re-projecting the pixel points from multiple image samples with
a single centre of projection. This was done using a sparse bundle adjustment algorithm. The statistical summary obtained after the application of this algorithm provides a gauge of the efficiency of the optimisation step. The optimised data were then visualised in the Meshlab software environment, hence providing the reconstructed scene. Secondly, with any of the well-established camera arrangements, all cameras are usually constrained to the same horizontal plane. Therefore, occlusion becomes an extremely challenging problem, and a robust camera set-up is required in order to resolve the hidden parts of scene objects.
To adequately meet the visibility condition for scene objects, given that occlusion of the same scene objects can occur, a multi-plane camera structure is highly desirable. Therefore, this thesis also explores a trapezoidal camera structure for image acquisition. The approach here is to assess the feasibility and potential of several physical cameras of the same model sparsely arranged on the edges of an efficient trapezoid graph. This is implemented in both Matlab and Maya. The depth maps rendered in Matlab are of better quality.
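The metric reconstruction mentioned above relies on sparse bundle adjustment; the quantity it minimises is the reprojection error, sketched here for a single pinhole camera with illustrative parameters (not those of the thesis):

```python
import numpy as np

# Sketch of the residual minimised by sparse bundle adjustment: the
# difference between each observed pixel and the reprojection of its 3D
# point through the current camera estimate.
f, cx, cy = 700.0, 320.0, 240.0
K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])

def reprojection_residuals(R, t, points3d, observations):
    """Pixel error of each 3D point for one camera pose; bundle
    adjustment minimises the sum of squares over all views."""
    cam = points3d @ R.T + t             # world -> camera frame
    proj = cam @ K.T
    pixels = proj[:, :2] / proj[:, 2:3]  # perspective division
    return pixels - observations

R, t = np.eye(3), np.array([0.0, 0.0, 5.0])
pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.5, 0.2]])

obs = (pts @ R.T + t) @ K.T
obs = obs[:, :2] / obs[:, 2:3]           # ideal (noise-free) observations

res = reprojection_residuals(R, t, pts, obs)
print(np.abs(res).max())                 # zero at the true pose

# A perturbed pose produces nonzero residuals, which the optimiser
# (e.g. Levenberg-Marquardt over all cameras and points) drives down.
res_bad = reprojection_residuals(R, t + np.array([0.01, 0.0, 0.0]), pts, obs)
print(np.linalg.norm(res_bad, axis=1))
```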
Automatic registration of 3D models to laparoscopic video images for guidance during liver surgery
Laparoscopic liver interventions offer significant advantages over open surgery, such as less pain and trauma and a shorter recovery time for the patient. However, they also bring challenges for the surgeons, such as the lack of tactile feedback, a limited field of view and occluded anatomy. Augmented reality (AR) can potentially help during laparoscopic liver interventions by displaying sub-surface structures (such as tumours or vasculature). The initial registration between the 3D model extracted from the CT scan and the laparoscopic video feed is essential for an AR system, which should be efficient, robust, intuitive to use and minimally disruptive to the surgical procedure. Challenges for registration methods in laparoscopic interventions include the deformation of the liver due to gas insufflation in the abdomen, partial visibility of the organ, and the lack of prominent geometrical or texture-wise landmarks. These challenges are discussed in detail and an overview of the state of the art is provided. This research project aims to provide the tools to move towards a completely automatic registration. Firstly, the importance of pre-operative planning is discussed, along with the characteristics of the liver that can be used to constrain a registration method. Secondly, maximising the amount of information obtained before the surgery, a semi-automatic surface-based method is proposed to recover the initial rigid registration irrespective of the position of the shapes. Finally, a fully automatic 3D-2D rigid global registration is proposed which estimates a global alignment of the pre-operative 3D model using a single intra-operative image. Incorporating the different liver contours can help constrain the registration further, especially for partial surfaces. Having a robust, efficient AR system which requires no manual interaction from the surgeon will aid the translation of such approaches to the clinic.
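As a sketch of the surface-based rigid registration idea, the classic Kabsch/Procrustes solution recovers the least-squares rotation and translation between two point sets, assuming correspondences are known; real intra-operative registration must additionally handle unknown correspondences, partial overlap and deformation. The point values below are synthetic:

```python
import numpy as np

def rigid_align(src, dst):
    """Least-squares rotation R and translation t with R @ src + t ~ dst
    (Kabsch / Procrustes), assuming known point correspondences."""
    sc, dc = src.mean(0), dst.mean(0)
    H = (src - sc).T @ (dst - dc)        # cross-covariance of centred sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # proper rotation (no reflection)
    t = dc - R @ sc
    return R, t

rng = np.random.default_rng(1)
model = rng.uniform(-50, 50, size=(30, 3))    # synthetic surface points (mm)

theta = np.deg2rad(20)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([5.0, -3.0, 12.0])
scene = model @ R_true.T + t_true             # the observed surface

R_est, t_est = rigid_align(model, scene)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))   # True True
```

Iterative-closest-point style methods alternate this closed-form step with re-estimating correspondences when they are not known in advance.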
Map-Based Localization for Unmanned Aerial Vehicle Navigation
Unmanned Aerial Vehicles (UAVs) require precise pose estimation when navigating in indoor and GNSS-denied / GNSS-degraded outdoor environments. The possibility of crashing in these environments is high, as spaces are confined, with many moving obstacles. There are many solutions for localization in GNSS-denied environments, and many different technologies are used. Common solutions involve setting up or using existing infrastructure, such as beacons, Wi-Fi, or surveyed targets. These solutions were avoided because the cost should be proportional to the number of users, not the coverage area. Heavy and expensive sensors, for example a high-end IMU, were also avoided. Given these requirements, a camera-based localization solution was selected for the sensor pose estimation. Several camera-based localization approaches were investigated. Map-based localization methods were shown to be the most efficient because they close loops using a pre-existing map, thus the amount of data and the amount of time spent collecting data are reduced as there is no need to re-observe the same areas multiple times. This dissertation proposes a solution to address the task of fully localizing a monocular camera onboard a UAV with respect to a known environment (i.e., it is assumed that a 3D model of the environment is available) for the purpose of navigation for UAVs in structured environments.
Incremental map-based localization involves tracking a map through an image sequence. When the map is a 3D model, this task is referred to as model-based tracking. A by-product of the tracker is the relative 3D pose (position and orientation) between the camera and the object being tracked. State-of-the-art solutions advocate that tracking geometry is more robust than tracking image texture, because edges are more invariant to changes in object appearance and lighting. However, model-based trackers have been limited to tracking small, simple objects in small environments. An assessment was performed in tracking larger, more complex building models in larger environments. A state-of-the-art model-based tracker called ViSP (Visual Servoing Platform) was applied to tracking outdoor and indoor buildings using a UAV's low-cost camera. The assessment revealed weaknesses at large scales. Specifically, ViSP failed when tracking was lost and needed to be manually re-initialized. Failure occurred when there was a lack of model features in the camera's field of view, and because of rapid camera motion. Experiments revealed that ViSP achieved positional accuracies similar to single point positioning solutions obtained from single-frequency (L1) GPS observations, with standard deviations around 10 metres. These errors were considered large, given that the geometric accuracy of the 3D model used in the experiments was 10 to 40 cm. The first contribution of this dissertation proposes to increase the performance of the localization system by combining ViSP with map-building incremental localization, also referred to as simultaneous localization and mapping (SLAM). Experimental results in both indoor and outdoor environments show that sub-metre positional accuracies were achieved, while reducing the number of tracking losses throughout the image sequence.
It is shown that by integrating model-based tracking with SLAM, not only does SLAM improve model tracking performance, but the model-based tracker alleviates the computational expense of SLAM's loop closing procedure, improving runtime performance. Experiments also revealed that ViSP was unable to handle occlusions when a complete 3D building model was used, resulting in large errors in its pose estimates. The second contribution of this dissertation is a novel map-based incremental localization algorithm that improves tracking performance and increases pose estimation accuracy over ViSP. The novelty of this algorithm is the implementation of an efficient matching process that identifies corresponding linear features from the UAV's RGB image data and a large, complex, and untextured 3D model. The proposed model-based tracker improved positional accuracies from 10 m (obtained with ViSP) to 46 cm in outdoor environments, and from an unattainable result using ViSP to 2 cm in large indoor environments.
The main disadvantage of any incremental algorithm is that it requires the camera pose of the first frame. Initialization is often a manual process. The third contribution of this dissertation is a map-based absolute localization algorithm that automatically estimates the camera pose when no prior pose information is available. The method benefits from vertical line matching to accomplish a registration procedure of the reference model views with a set of initial input images via geometric hashing. Results demonstrate that sub-metre positional accuracies were achieved, and a proposed enhancement of conventional geometric hashing produced more correct matches: 75% of the correct matches were identified, compared with 11%. Further, the number of incorrect matches was reduced by 80%.
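The geometric hashing used for absolute initialization can be sketched in its minimal 2D form: model points are stored in coordinates relative to every ordered basis pair, and at query time the same encoding votes for the basis that explains the observed points. Real implementations quantise bins and tolerate noise; this toy version, with invented point values, is exact:

```python
import numpy as np
from collections import defaultdict
from itertools import permutations

def encode(points, b0, b1):
    """Express all points in the similarity-invariant frame defined by
    the ordered basis pair (b0, b1)."""
    origin = points[b0]
    ex = points[b1] - origin
    ey = np.array([-ex[1], ex[0]])               # perpendicular axis
    rel = points - origin
    scale = ex @ ex
    return np.column_stack([rel @ ex, rel @ ey]) / scale

def build_table(model):
    """Hash every point's invariant coordinates under every basis pair."""
    table = defaultdict(list)
    for b0, b1 in permutations(range(len(model)), 2):
        for coords in encode(model, b0, b1):
            table[tuple(np.round(coords, 6))].append((b0, b1))
    return table

model = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [1.0, 2.0]])
table = build_table(model)

# Query: the same points rotated, scaled and translated.
theta = np.deg2rad(33)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
query = (model @ R.T) * 1.7 + np.array([10.0, -4.0])

votes = defaultdict(int)
for coords in encode(query, 0, 1):               # try query basis (0, 1)
    for basis in table.get(tuple(np.round(coords, 6)), []):
        votes[basis] += 1

best = max(votes, key=votes.get)
print(best, votes[best])   # basis (0, 1) receives a vote from every point
```

Because the encoding is invariant to rotation, uniform scale and translation, the correct model basis accumulates a vote from every query point, while other bases only collide on the trivial keys.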