
    Camera communications and machine learning for visible-light positioning in indoor spaces

    In this dissertation, a 3D indoor visible light positioning system based on machine learning and optical camera communications is presented. The system uses LED (light-emitting diode) luminaires as reference points and a rolling-shutter complementary metal-oxide semiconductor (CMOS) sensor as the receiver. The LED luminaires are modulated using On-Off Keying (OOK) with unique frequencies. You Only Look Once version 5 (YOLOv5) is used to classify and estimate the position of each visible LED luminaire in the image. The pose of the receiver is estimated using a perspective-n-point (PnP) algorithm. The system was validated using a real-world-sized setup containing eight LED luminaires and achieved an average positioning error of 3.5 cm. The average time to compute the camera pose is approximately 52 ms, which makes the system suitable for real-time positioning. To the best of our knowledge, this is the first application of the YOLOv5 algorithm in the field of VLP for indoor environments. (MSc in Electronics and Telecommunications Engineering.)
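    The luminaire identification step above relies on the rolling-shutter effect: each sensor row samples the OOK signal at a slightly different time, turning the modulation frequency into a stripe pattern. A minimal sketch of how such a frequency could be recovered and matched to a luminaire follows; the frequency table and 20 kHz row-scan rate are assumptions for illustration, not values from the dissertation.

```python
import cmath
import math

# Hypothetical luminaire ID -> modulation frequency (Hz) table; the real
# system's frequency plan is not given in the abstract.
LUMINAIRE_FREQS = {1: 1000.0, 2: 2000.0, 3: 3000.0, 4: 4000.0}

def dominant_frequency(samples, sample_rate):
    """Return the strongest non-DC frequency in `samples` via a naive DFT."""
    n = len(samples)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2):
        x_k = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        if abs(x_k) > best_mag:
            best_k, best_mag = k, abs(x_k)
    return best_k * sample_rate / n

def identify_luminaire(samples, sample_rate):
    """Match the detected stripe frequency to the nearest known luminaire."""
    f = dominant_frequency(samples, sample_rate)
    return min(LUMINAIRE_FREQS, key=lambda lid: abs(LUMINAIRE_FREQS[lid] - f))

# Simulate one rolling-shutter image column: each row samples a 2 kHz OOK
# square wave at an assumed 20 kHz row-scan rate.
ROW_RATE = 20000.0
column = [1.0 if math.sin(2 * math.pi * 2000.0 * i / ROW_RATE) >= 0 else 0.0
          for i in range(200)]
```

    In a full pipeline, YOLOv5 would first localize each luminaire's bounding box, and only the pixel columns inside that box would be fed to the frequency detector.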

    Determination of Elevations for Excavation Operations Using Drone Technologies

    Using deep learning to rapidly estimate depth information from a single image has been studied in many situations, but it is new to construction-site elevation determination, where challenges include, but are not limited to, the lack of datasets. This dissertation presents the research results of utilizing drone ortho-imaging and deep learning to estimate construction-site elevations for excavation operations. It provides two flexible options for fast elevation determination: a low-high ortho-image-pair-based method and a single-frame ortho-image-based method. The success of this research project advanced the use of ortho-imaging in construction surveying, extended CNNs (convolutional neural networks) to work with large-scale images, and contributed to dense image pixel matching across scales. This research project had three major tasks. First, the high-resolution ortho-image and elevation-map datasets were acquired using the low-high ortho-image-pair-based 3D-reconstruction method. In detail, a vertical drone path is first designed to capture a 2:1-scale ortho-image pair of a construction site at two different altitudes. Then, to simultaneously match the pixel pairs and determine elevations, a pixel-matching and virtual-elevation algorithm was developed that provides candidate pixel pairs in each virtual plane, and four-scale patch feature descriptors are used to match them. Experimental results show that 92% of pixels in the pixel grid were strongly matched, with elevation accuracy within ±5 cm. Second, the acquired high-resolution datasets were applied to train and test the ortho-image encoder and elevation-map decoder, where max-pooling and up-sampling layers link the ortho-image and elevation-map in the same pixel coordinates.
    This convolutional encoder-decoder was supplemented with algorithms for overlapping disassembly of the input ortho-image and assembly of the output elevation-map, which crop the high-resolution datasets into multiple small-patch datasets for model training and testing. Experimental results indicated that 128×128-pixel small patches had the best elevation-estimation performance: 21.22% of the selected points matched the ground truth exactly, and 31.21% matched within ±5 cm. Finally, vegetation was identified in high-resolution ortho-images and removed from the corresponding elevation-maps using the developed CNN-based image-classification model and a vegetation-removal algorithm. Experimental results concluded that the developed CNN model, using 32×32-pixel ortho-image and class-label small-patch datasets, had 93% accuracy in identifying objects and localizing objects' edges.
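    The disassembly/assembly idea above can be sketched compactly: a large image is cut into overlapping small patches for the network, and the predicted patches are stitched back, averaging wherever they overlap. The patch size and stride below are illustrative, not the dissertation's 128×128 configuration.

```python
def disassemble(image, patch, stride):
    """Cut a 2D list into (row, col, patch) tuples with the given stride."""
    h, w = len(image), len(image[0])
    patches = []
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            patches.append((r, c, [row[c:c + patch] for row in image[r:r + patch]]))
    return patches

def assemble(patches, h, w, patch):
    """Stitch patches back into an h x w grid, averaging overlapping pixels."""
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for r, c, p in patches:
        for i in range(patch):
            for j in range(patch):
                acc[r + i][c + j] += p[i][j]
                cnt[r + i][c + j] += 1
    return [[acc[i][j] / cnt[i][j] for j in range(w)] for i in range(h)]

# Round trip: disassembling and reassembling reproduces the original image.
img = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
rebuilt = assemble(disassemble(img, 4, 2), 8, 8, 4)
```

    In the actual pipeline, each cropped patch would pass through the encoder-decoder before reassembly; the averaging in overlapped regions also smooths seams between adjacent predictions.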

    Automatic Affine and Elastic Registration Strategies for Multi-dimensional Medical Images

    Medical images have been used increasingly for diagnosis, treatment planning, monitoring disease processes, and other medical applications. A large variety of medical imaging modalities exists, including CT, X-ray, MRI, ultrasound, etc. Frequently a group of images needs to be compared to one another and/or combined for research or cumulative purposes. In many medical studies, multiple images are acquired from subjects at different times or with different imaging modalities. Misalignment inevitably occurs, causing anatomical and/or functional feature shifts within the images. Computerized image registration (alignment) approaches can offer automatic and accurate image alignment without extensive user involvement and provide tools for visualizing combined images. This dissertation focuses on providing automatic image registration strategies. After a thorough review of existing image registration techniques, we identified two registration strategies that enhance the current field: (1) an automated rigid-body and affine registration using voxel similarity measurements based on a sequential hybrid genetic algorithm, and (2) an automated deformable registration approach based upon a linear elastic finite element formulation. Both methods streamline the registration process: they are completely automatic and require no user intervention. The proposed registration strategies were evaluated with numerous 2D and 3D MR images covering a variety of tissue structures, orientations, and dimensions. Multiple registration pathways were provided, with guidelines for their application. The sequential genetic algorithm mimics the pathway of an expert performing registration manually. Experiments demonstrated that the sequential genetic algorithm registration provides high alignment accuracy and is reliable for brain tissues.
    It avoids the local minima/maxima traps of conventional optimization techniques and does not require any preprocessing such as thresholding, smoothing, segmentation, or the definition of base points or edges. The elastic model was shown to be highly effective at accurately aligning areas of interest that are automatically extracted from the images, such as brains. By using a finite element method to obtain the displacement of each element node through a boundary mapping, this method provides accurate image registration with excellent boundary alignment of each pair of slices and consequently aligns the entire volume automatically. This dissertation presented numerous volume alignments. Surface geometries were created directly from the aligned segmented images using the Multiple Material Marching Cubes algorithm. Using the proposed registration strategies, multiple subjects were aligned to a standard MRI reference, which is aligned to a segmented reference atlas. Consequently, multiple subjects are aligned to the segmented atlas and a full fMRI analysis is possible.
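    The "sequential hybrid" idea (a global genetic search followed by local refinement) can be illustrated on a toy 1-D registration problem: recover an integer shift between two signals by maximizing a voxel-similarity score (negative SSD), first with a tiny genetic stage and then with a hill-climb. The signals, population size, and shift range below are purely illustrative, not the dissertation's data or parameters.

```python
import math
import random

random.seed(0)
N = 30
fixed = [math.exp(-((i - 10) ** 2) / 8.0) for i in range(N)]   # reference signal
moving = [math.exp(-((i - 13) ** 2) / 8.0) for i in range(N)]  # same bump, shifted by 3

def score(t):
    """Negative sum of squared differences after shifting `fixed` by t."""
    shifted = [fixed[i - t] if 0 <= i - t < N else 0.0 for i in range(N)]
    return -sum((a - b) ** 2 for a, b in zip(shifted, moving))

# Genetic stage: a small population of candidate shifts, kept diverse by mutation.
pop = [random.randint(-6, 6) for _ in range(8)]
for _ in range(15):
    pop.sort(key=score, reverse=True)
    parents = pop[:4]                                  # truncation selection
    pop = parents + [max(-6, min(6, p + random.choice((-1, 1))))
                     for p in parents]                 # +/-1 mutation, clamped

# Local stage: hill-climb from the best genetic candidate to the optimum.
best = max(pop, key=score)
while True:
    nxt = max((best - 1, best, best + 1), key=score)
    if nxt == best:
        break
    best = nxt
```

    A real implementation would optimize six (rigid) or twelve (affine) parameters over a voxel-similarity measure such as mutual information, but the two-stage structure is the same: the genetic stage escapes local optima, and the local stage sharpens the answer.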

    A deep learning algorithm for contour detection in synthetic 2D biplanar X-ray images of the scapula: towards improved 3D reconstruction of the scapula

    Three-dimensional (3D) reconstruction from X-ray images using statistical shape models (SSM) provides a cost-effective way of increasing the diagnostic utility of two-dimensional (2D) X-ray images, especially in low-resource settings. The landmark-constrained model fitting approach is one way to obtain patient-specific models from a statistical model. This approach requires an accurate selection of corresponding features, usually landmarks, from the bi-planar X-ray images. However, X-ray images are 2D representations of 3D anatomy with superimposed structures, which confounds this approach. The literature shows that detection and use of contours to locate corresponding landmarks within bi-planar X-ray images can address this limitation. The aim of this research project was to train and validate a deep learning algorithm for detecting the contour of the scapula in synthetic 2D bi-planar X-ray images. Synthetic bi-planar X-ray images were obtained from scapula mesh samples with annotated landmarks generated from a validated SSM obtained from the Division of Biomedical Engineering, University of Cape Town. The first objective of the project was the training of two convolutional neural network models: the first model was trained to predict the lateral (LAT) scapula image given the anterior-posterior (AP) image, and the second to predict the AP image given the LAT image. The trained models achieved average Dice coefficients of 0.926 and 0.964 for the predicted LAT and AP images, respectively. However, the trained models did not generalise to segmented real X-ray images of the scapula. The second objective was to perform landmark-constrained model fitting using the corresponding landmarks embedded in the predicted images. To achieve this objective, the 2D landmark locations were transformed into 3D coordinates using the direct linear transformation.
    The 3D point localization yielded average errors of (0.35, 0.64, 0.72) mm in the X, Y and Z directions, respectively, and a combined coordinate error of 1.16 mm. The reconstructed landmarks were used to reconstruct meshes with average surface-to-surface distances of 3.22 mm and 1.72 mm for 3 and 6 landmarks, respectively. The third objective was to reconstruct the scapula mesh using matching points on the scapula contour in the bi-planar images. The average surface-to-surface distances of the meshes reconstructed from 8 matching contour points and from 6 corresponding landmarks of the same meshes were 1.40 mm and 1.91 mm, respectively. In summary, the deep learning models were able to learn the mapping between the bi-planar images of the scapula. Increasing the number of corresponding landmarks from the bi-planar images resulted in better 3D reconstructions. However, obtaining these corresponding landmarks was non-trivial, necessitating the use of matching points selected from the scapula contours. The results from the latter approach signal a need to explore contour-matching methods to obtain more corresponding points in order to improve scapula 3D reconstruction using landmark-constrained model fitting.
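    The direct linear transformation step described above can be sketched as follows: each view contributes two linear constraints on the 3D landmark, and the stacked system is solved in a least-squares sense. The two projection matrices below are simple illustrative cameras, not the calibration of the actual bi-planar X-ray setup.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def triangulate(views):
    """views: list of (P, (u, v)) with P a 3x4 projection matrix.

    Builds the standard DLT rows u*P[2]-P[0] and v*P[2]-P[1], then solves
    the normal equations of A[:, :3] X = -A[:, 3] with Cramer's rule.
    """
    rows = []
    for P, (u, v) in views:
        rows.append([u * P[2][k] - P[0][k] for k in range(4)])
        rows.append([v * P[2][k] - P[1][k] for k in range(4)])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(-r[3] * r[i] for r in rows) for i in range(3)]
    d = det3(ata)
    sol = []
    for i in range(3):
        m = [row[:] for row in ata]
        for k in range(3):
            m[k][i] = atb[k]          # replace column i with the RHS
        sol.append(det3(m) / d)
    return sol

P1 = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]    # reference camera
P2 = [[1, 0, 0, -1], [0, 1, 0, 0], [0, 0, 1, 0]]   # camera translated 1 unit in X
# Pixel observations of the same 3D point (0.5, 0.3, 2.0) in both views.
point = triangulate([(P1, (0.25, 0.15)), (P2, (-0.25, 0.15))])
```

    With noisy landmark detections, the least-squares solution no longer satisfies the constraints exactly, which is how per-axis errors like the (0.35, 0.64, 0.72) mm figures above arise.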

    Boosting for Generic 2D/3D Object Recognition

    Generic object recognition is an important function of the human visual system. For an artificial vision system to emulate human perception abilities, it should also be able to perform generic object recognition. In this thesis, we address the generic object recognition problem and present different approaches and models which tackle different aspects of this difficult problem. First, we present a model for generic 2D object recognition from complex 2D images. The model exploits only appearance-based information, in the form of a combination of texture and color cues, for binary classification of 2D object classes. Learning is accomplished in a weakly supervised manner using Boosting. However, we live in a 3D world, and the ability to recognize 3D objects is very important for any vision system. Therefore, we present a model for generic recognition of 3D objects from range images. Our model makes use of a combination of simple local shape descriptors extracted from range images for recognizing 3D object categories, as shape is an important cue provided by range images. Moreover, we present a novel dataset for generic object recognition that provides 2D and range images of different object classes captured using a Time-of-Flight (ToF) camera. As the surrounding world contains thousands of different object categories, recognizing many different object classes is important as well. Therefore, we extend our generic 3D object recognition model to deal with the multi-class learning and recognition task. Finally, we extend the multi-class recognition model by introducing a novel model which uses a combination of appearance-based information extracted from 2D images and range-based (shape) information extracted from range images for multi-class generic 3D object recognition, obtaining promising results.
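    Boosting builds a strong classifier as a weighted vote of weak learners trained on reweighted data. A minimal AdaBoost sketch with decision stumps follows, on a 1-D toy set whose positive class is an interval so that no single stump suffices; the data and round count are illustrative and unrelated to the thesis experiments.

```python
import math

X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [-1, -1, 1, 1, 1, 1, -1, -1]   # positives lie in the interval [3, 6]

def stump(threshold, polarity):
    """Weak learner: sign of (x - threshold), flipped by polarity."""
    return lambda x: polarity if x > threshold else -polarity

def best_stump(w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for t in [x + 0.5 for x in X]:
        for pol in (1, -1):
            h = stump(t, pol)
            err = sum(wi for wi, x, y in zip(w, X, Y) if h(x) != y)
            if err < best_err:
                best, best_err = h, err
    return best, best_err

def adaboost(rounds):
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        h, err = best_stump(w)
        err = max(err, 1e-12)                  # guard against a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Up-weight misclassified samples, down-weight correct ones.
        w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, X, Y)]
        s = sum(w)
        w = [wi / s for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

clf = adaboost(10)
accuracy = sum(clf(x) == y for x, y in zip(X, Y)) / len(X)
```

    The thesis applies the same principle with far richer weak learners built on texture, color, and local shape descriptors, but the reweight-and-vote loop is unchanged.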

    Advancements in multi-view processing for reconstruction, registration and visualization.

    The ever-increasing diffusion of digital cameras and the advancements in computer vision, image processing and storage capabilities have led, in recent years, to the wide diffusion of digital image collections. A set of digital images is usually referred to as a multi-view image set when the pictures cover different views of the same physical object or location. In multi-view datasets, correlations between images are exploited in many different ways to increase our capability to gather enhanced understanding of a scene. For example, a collection can be enhanced by leveraging the camera positions and orientations, or with information about the 3D structure of the scene. The range of applications of multi-view data is wide, encompassing diverse fields such as image-based reconstruction, image-based localization, navigation of virtual environments, collective photographic retouching, computational photography, object recognition, etc. For all these reasons, the development of new algorithms to effectively create, process, and visualize this type of data is an active research trend. The thesis presents four advancements related to different aspects of multi-view data processing:
    - Image-based 3D reconstruction: we present a pre-processing algorithm, a special color-to-gray conversion developed to improve the accuracy of image-based reconstruction algorithms. In particular, we show how different dense stereo matching results can be enhanced by applying a domain separation approach that pre-computes a single optimized numerical value for each image location.
    - Image-based appearance reconstruction: we present a multi-view processing algorithm that enhances the quality of the color transfer from multi-view images to a geo-referenced 3D model of a location of interest. The proposed approach computes virtual shadows and automatically segments shadowed regions in the input images, preventing those pixels from being used in subsequent texture synthesis.
    - 2D-to-3D registration: we present an unsupervised localization and registration system that can recognize a site framed in a multi-view dataset and register it to a pre-existing 3D representation. The system is highly accurate, validates its results in a completely unsupervised manner, and is accurate enough to view input images seamlessly superimposed on the 3D location of interest.
    - Visualization: we present PhotoCloud, a real-time client-server system for interactive exploration of high-resolution 3D models and up to several thousand photographs aligned over this 3D data. PhotoCloud supports any 3D model that can be rendered in a depth-coherent way and arbitrary multi-view image collections. Moreover, it tolerates 2D-to-2D and 2D-to-3D misalignments, and it provides scalable visualization of generic integrated 2D and 3D datasets by exploiting data duality. A set of effective 3D navigation controls, tightly integrated with innovative thumbnail bars, enhances user navigation.
    These advancements were developed in tourism and cultural-heritage application contexts, but they are not limited to these.
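    To make the color-to-gray idea concrete, here is a generic illustrative stand-in (not the thesis's actual conversion): project each RGB pixel onto the dominant principal component of the image's own color distribution, so that regions differing only in chroma remain distinguishable where a fixed luminance formula would merge them — exactly the failure mode that hurts dense stereo matching.

```python
import math

def pca_gray(pixels):
    """Map a list of (r, g, b) pixels to scalars via the top PCA axis."""
    n = len(pixels)
    mean = [sum(p[k] for p in pixels) / n for k in range(3)]
    centered = [[p[k] - mean[k] for k in range(3)] for p in pixels]
    # 3x3 color covariance matrix of this particular image.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(3)]
           for i in range(3)]
    v = [1.0, 1.0, 1.0]                      # power-iteration start vector
    for _ in range(50):
        w = [sum(cov[i][k] * v[k] for k in range(3)) for i in range(3)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    return [sum(p[k] * v[k] for k in range(3)) for p in pixels]

# Two colors with (nearly) identical Rec.601 luminance but different chroma:
# a fixed-weight gray conversion collapses them, the PCA projection does not.
a, b = (0.6, 0.2, 0.2), (0.2, 0.4, 0.2193)
lum = lambda p: 0.299 * p[0] + 0.587 * p[1] + 0.114 * p[2]
gray = pca_gray([a] * 4 + [b] * 4)
```

    The thesis's domain-separation approach optimizes a per-location value rather than a single global projection, but the motivation is the same: preserve contrast that standard grayscale conversion discards.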