33 research outputs found

    Integration of Absolute Orientation Measurements in the KinectFusion Reconstruction pipeline

    Full text link
    In this paper, we show how absolute orientation measurements provided by low-cost but high-fidelity IMU sensors can be integrated into the KinectFusion pipeline. We show that this integration improves the runtime, robustness, and quality of the 3D reconstruction. In particular, we use the orientation data to seed and regularize the ICP registration technique. We also present a technique to filter the pairs of matched 3D points based on the distribution of their distances. This filter is implemented efficiently on the GPU. Estimating the distribution of the distances helps control the number of iterations necessary for the convergence of the ICP algorithm. Finally, we show experimental results that highlight improvements in robustness, a speed-up of almost 12%, and a gain in tracking quality of 53% for the ATE metric on the Freiburg benchmark. Comment: CVPR Workshop on Visual Odometry and Computer Vision Applications Based on Location Clues 201
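    As an illustration of the pair-filtering step described above, the following sketch (hypothetical Python with NumPy, not the authors' GPU implementation; the cutoff k is an assumed parameter) rejects matched pairs whose distance is an outlier under a simple Gaussian model of the pair-distance distribution:

    ```python
    import numpy as np

    def filter_pairs_by_distance(src, dst, k=2.5):
        """Reject matched 3D point pairs whose distance is an outlier
        under a Gaussian model of the pair-distance distribution.
        src, dst: (N, 3) arrays of matched points; k is an assumed cutoff."""
        d = np.linalg.norm(src - dst, axis=1)   # per-pair distances
        mu, sigma = d.mean(), d.std()           # estimate the distribution
        keep = d <= mu + k * sigma              # inlier mask
        return src[keep], dst[keep]

    # Toy usage: 1000 matched pairs, 10 of them gross outliers.
    rng = np.random.default_rng(0)
    src = rng.normal(size=(1000, 3))
    dst = src + rng.normal(scale=0.01, size=(1000, 3))
    dst[:10] += 5.0                             # simulate bad matches
    fs, fd = filter_pairs_by_distance(src, dst)
    print(len(fs), "pairs kept of", len(src))
    ```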

    One-To-One Scale Modeling For 3D Printing

    Get PDF
    Current methods of 3D shape acquisition do not take into account the size of the object unless expensive laser scanning equipment is used. This thesis details the design and implementation of a process for creating true-to-size 3D models for 3D printing. It explores a pipeline using depth information gathered from a Microsoft Kinect together with various open-source software and freeware, and describes the algorithms and techniques utilized to create the 3D models. VisualSFM's point cloud generator, Meshlab's Poisson surface reconstructor, and Microsoft's 3D Builder are some of the pipeline components used in the research. Results obtained at different stages of the process are presented and discussed
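    Structure-from-motion reconstructions such as VisualSFM's are recovered only up to an unknown scale, so producing a true-to-size model requires at least one known reference dimension. The sketch below is an illustrative assumption of how that can be done, not the thesis's actual procedure:

    ```python
    import numpy as np

    def scale_to_metric(points, p_a, p_b, true_length_m):
        """Uniformly rescale an up-to-scale reconstruction so that the
        distance between two reference points matches a measured length.
        points: (N, 3) cloud in arbitrary SfM units;
        p_a, p_b: reference marker coordinates in the same units;
        true_length_m: tape-measured marker distance in metres."""
        model_length = np.linalg.norm(np.asarray(p_a, float) - np.asarray(p_b, float))
        return points * (true_length_m / model_length)

    # Toy usage: markers 2.0 model units apart, 0.5 m apart in reality.
    cloud = np.random.rand(100, 3) * 2.0
    scaled = scale_to_metric(cloud, [0, 0, 0], [2, 0, 0], 0.5)
    ```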

    Comparison of depth cameras for three-dimensional reconstruction in medicine

    Get PDF
    KinectFusion is a typical three-dimensional reconstruction technique which enables the generation of individual three-dimensional human models from consumer depth cameras for understanding body shapes. The aim of this study was to compare three-dimensional reconstruction results obtained using KinectFusion from data collected with two different types of depth camera (time-of-flight and stereoscopic cameras), and to compare these results with those of a commercial three-dimensional scanning system in order to determine which type of depth camera gives better reconstructions. Torso mannequins and machined aluminium cylinders were used as the test objects for this study. Two depth cameras, the Microsoft Kinect V2 and the Intel RealSense D435, were selected as representatives of time-of-flight and stereoscopic cameras, respectively, to capture scan data for the reconstruction of three-dimensional point clouds by KinectFusion techniques. The results showed that both time-of-flight and stereoscopic cameras, using the developed rotating camera rig, provided repeatable body scanning data with minimal operator-induced error. However, the time-of-flight camera generated more accurate three-dimensional point clouds than the stereoscopic sensor. This suggests that applications requiring the generation of accurate three-dimensional human models by KinectFusion techniques should consider using a time-of-flight camera, such as the Microsoft Kinect V2, as the image capturing sensor
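    A typical way to quantify such a comparison is the nearest-neighbour RMSE of a scan against the commercial-scanner reference; the sketch below illustrates the idea (function name and the prior-registration precondition are assumptions, not the study's exact protocol):

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    def cloud_to_reference_rmse(scan, reference):
        """Root-mean-square nearest-neighbour distance from a scanned
        cloud to a reference cloud, both (N, 3) arrays in the same
        frame (registration assumed to be done beforehand)."""
        d, _ = cKDTree(reference).query(scan)   # per-point distances
        return np.sqrt(np.mean(d ** 2))

    # Toy usage: a noisy copy of a cylinder-like reference surface.
    t = np.linspace(0, 2 * np.pi, 2000)
    ref = np.c_[np.cos(t), np.sin(t), np.linspace(0, 1, 2000)]
    scan = ref + np.random.normal(scale=0.002, size=ref.shape)
    print(f"RMSE: {cloud_to_reference_rmse(scan, ref):.4f} m")
    ```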

    Sensor architectures and technologies for upper limb 3D surface reconstruction: A review

    Get PDF
    3D digital models of the upper limb anatomy represent the starting point for the design process of bespoke devices, such as orthoses and prostheses, which can be modeled on the actual patient's anatomy by using CAD (Computer Aided Design) tools. Ongoing research on optical scanning methodologies has enabled the development of technologies that allow the surface reconstruction of the upper limb anatomy through procedures characterized by minimum discomfort for the patient. However, the 3D optical scanning of upper limbs is a complex task that requires solving problematic aspects, such as the difficulty of keeping the hand in a stable position and the presence of artefacts due to involuntary movements. The scientific literature has investigated different approaches in this regard, either by integrating commercial devices into customized sensor architectures or by developing innovative 3D acquisition techniques. The present work presents an overview of the state of the art of optical technologies and sensor architectures for the surface acquisition of upper limb anatomies. The review analyzes the working principles underlying existing devices and proposes a categorization of the approaches based on handling, pre/post-processing effort, and potential for real-time scanning. An in-depth analysis of the strengths and weaknesses of the approaches proposed by the research community is also provided, to give valuable support in selecting the most appropriate solution for the specific application to be addressed

    Estimating Head Measurements from 3D Point Clouds

    Get PDF
    Human head measurements are valuable in ergonomics, acoustics, medicine, computer vision, and computer graphics, among other fields. Such measurements are usually obtained through entirely or partially manual tasks, which is a cumbersome practice since the level of accuracy depends on the expertise of the person who takes the measurements. Moreover, manually acquired measurements contain less information from which new measurements can be deduced when the subject is no longer accessible. To overcome these disadvantages, an approach to automatically estimate measurements from 3D point clouds, which are long-term representations of humans, has been developed and is described in this manuscript. The 3D point clouds were acquired using an Asus Xtion Pro Live RGB-D sensor and KinFu (the open-source implementation of KinectFusion). Qualitative and quantitative evaluations of the estimated measurements are presented. Furthermore, the feasibility of the developed approach was evaluated through a case study in which the estimated measurements were used to appraise the influence of anthropometric data on the computation of the interaural time difference. Considering the promising results obtained from the estimation of measurements from 3D models acquired with the Asus Xtion Pro Live sensor and KinFu (plus the results reported in the literature) and the development of new RGB-D sensors, a study of the influence of seven different RGB-D sensors on the reconstruction obtained with KinFu is also presented. This study contains qualitative and quantitative evaluations of reconstructions of four diverse objects captured at distances ranging from 40 cm to 120 cm, a range established according to the operational range of the sensors. Furthermore, a collection of the obtained reconstructions is available as a dataset at http://uni-tuebingen.de/en/138898
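    For context, the interaural time difference mentioned in the case study is often approximated with Woodworth's spherical-head model, in which the head radius is exactly the kind of anthropometric measurement estimated here (whether the thesis uses this particular model is an assumption):

    ```latex
    % Woodworth's spherical-head model of the interaural time difference.
    % a = head radius (m), c = speed of sound (approx. 343 m/s),
    % theta = source azimuth in radians, valid for |theta| <= pi/2.
    \mathrm{ITD}(\theta) = \frac{a}{c}\,\bigl(\theta + \sin\theta\bigr)
    ```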

    Performance evaluation of depth completion neural networks for various RGB-D camera technologies

    Get PDF
    RGB-D cameras are devices used today in various applications and research fields that require three-dimensional knowledge of the environment, expressed as a depth image in which each pixel represents the distance from the camera of the object it belongs to. The most popular acquisition techniques include active stereoscopy, which triangulates two camera views, and structured-light cameras, which do the same with a camera image and a laser projector. Another popular technology that does not require triangulation, used in LiDAR cameras, is ToF (Time of Flight): depth detection is based on the detection time of an emitted signal, such as an IR signal, throughout the camera's field of view. The major difficulties encountered with the use of RGB-D cameras stem from the image acquisition environment and the characteristics of the camera itself: poorly defined edges and variations in lighting conditions can lead to noisy or incomplete depth maps, which negatively impact the performance of computer vision and robotics applications that rely on accurate depth information. Several depth enhancement techniques have been proposed in recent years, many of them making use of neural networks for depth completion. The goal of the depth completion task is to generate a dense depth prediction, continuous over the entire image, from knowledge of the RGB image and the raw depth image acquired by the RGB-D sensor. Depth completion methods feed the RGB and sparse depth inputs through encoder-decoder architectures, with recent work adding refinement stages and additional information, such as semantic data, to improve accuracy and to analyze object edges and occluded items. However, the only methods used at this time are those that rely on a small receptive field, such as CNNs and local spatial propagation networks. When the holes of invalid pixels in the depth map are too large, this limited receptive field has the disadvantage of producing incorrect predictions. In this thesis, a performance evaluation of the current depth completion state of the art on a real indoor scenario is proposed. Several RGB-D sensors were taken into account for the experimental evaluation, highlighting the pros and cons of different technologies for depth measurement with cameras. The various acquisitions were carried out in different environments and with cameras using different technologies, to analyze the criticality of the depths obtained first directly from the cameras and then after applying the state-of-the-art depth completion networks. According to the findings of this thesis work, state-of-the-art networks are not yet mature enough to be used in scenarios that are too dissimilar from those used in their training. In particular, the following limitation was discovered: deep networks trained on outdoor scenes are not effective when analyzing indoor scenes. In such cases, a straightforward approach based on morphological operators is more accurate
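    The morphological baseline referred to in the conclusion is described only in general terms; one plausible instantiation (an assumption on our part, using OpenCV's standard operators) fills small invalid regions with a grayscale closing:

    ```python
    import cv2
    import numpy as np

    def complete_depth_morphological(depth, kernel_size=5):
        """Fill small holes (zero-valued pixels) in a raw depth map
        with a grayscale morphological closing, a simple baseline to
        learned depth completion. depth: (H, W) float32 metres,
        0 where invalid; kernel_size is an assumed parameter."""
        kernel = np.ones((kernel_size, kernel_size), np.uint8)
        closed = cv2.morphologyEx(depth, cv2.MORPH_CLOSE, kernel)
        # Only overwrite pixels that were invalid in the raw map.
        return np.where(depth > 0, depth, closed)

    # Toy usage: a synthetic 2 m depth map with a small hole.
    d = np.full((100, 100), 2.0, np.float32)
    d[40:44, 40:44] = 0.0                       # invalid region
    print(complete_depth_morphological(d)[41, 41])  # ~2.0 after filling
    ```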

    Laser and photogrammetric geotechnologies applied to the 3D modeling of complex scenes in forensic infographics

    Get PDF
    The three-dimensional reconstruction of scenes and objects for subsequent analysis is a subject of research in several disciplines. One of the disciplines in which obtaining 3D models is necessary is forensic engineering, and more specifically the field of infographics. Forensic infographics is a technique that allows the virtual reconstruction of events through computing and the handling of digital images. The great advantage offered by laser and photogrammetric geotechnologies for modeling complex scenes in forensic infographics is that they are non-invasive and non-destructive techniques. That is, they provide a documentary record of the clues and evidence present at the scene without ever altering their spatial positions or physical properties, while also lending rigor, exhaustiveness, and realism to the reconstruction of the event. This doctoral thesis demonstrates that various geotechnologies, such as conventional digital cameras (including smartphones), "gaming sensor" scanners, and mobile indoor mapping systems, are well suited to the visual inspection of a crime scene for its subsequent three-dimensional graphical representation. More specifically, this doctoral thesis presents the following contributions:
    - A solution based on the integration of close-range photogrammetry and computer vision is proposed as an efficient alternative for the 3D reconstruction of complex objects and scenes in forensic infographics, guaranteeing flexibility (working with any type of camera), automation (moving from 2D images to 3D point clouds), and quality (resolutions superior to laser systems) in the results.
    - A simple, low-cost technological solution based on active "gaming sensor" scanning devices is developed and validated, enabling dimensional analysis and three-dimensional modeling of the forensic scene at short distances.
    - A novel mobile laser system for mapping indoor spaces (indoor mapping) is tested and validated, ideal for large and complex forensic scenes.
    - A strategy is advanced for progressing from point clouds, whether laser and/or photogrammetric, to CAD (Computer Aided Design) models through the segmentation of those point clouds based on principal component analysis (PCA), which represents a direct contribution to the field of forensic infographics
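    The PCA-based segmentation named in the last contribution typically classifies points by the eigenvalues of their local covariance; the generic sketch below (not the thesis's exact pipeline; the neighbourhood size k is an assumed parameter) labels points as linear, planar, or volumetric:

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    def pca_shape_labels(points, k=20):
        """Label each point by local covariance eigenvalues:
        0 = linear, 1 = planar, 2 = volumetric. A generic sketch of
        PCA-based segmentation, with k an assumed neighbourhood size."""
        _, idx = cKDTree(points).query(points, k=k)
        labels = np.empty(len(points), dtype=int)
        for i, nb in enumerate(idx):
            l3, l2, l1 = np.sort(np.linalg.eigvalsh(np.cov(points[nb].T)))
            linearity = (l1 - l2) / l1          # one dominant axis
            planarity = (l2 - l3) / l1          # two dominant axes
            sphericity = l3 / l1                # all axes comparable
            labels[i] = int(np.argmax([linearity, planarity, sphericity]))
        return labels

    # Toy usage: points on a flat patch should come out mostly planar (1).
    pts = np.c_[np.random.rand(500, 2), np.zeros(500)]
    print(np.bincount(pca_shape_labels(pts), minlength=3))
    ```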

    Automated Pipe Spool Recognition in Cluttered Point Clouds

    Get PDF
    Construction management is inextricably linked to the awareness and control of 3D geometry. Progress tracking, quality assurance/quality control, and the location, movement, and assembly of materials are all critical processes that rely on the ability to monitor 3D geometry. Therefore, advanced capabilities in site metrology and computer vision will be the foundation for the next generation of assessment tools that empower project leaders, planners, and workers. 3D imaging devices enable the capture of the existing geometric conditions of a construction site, or of a fabricated mechanical or structural assembly, objectively, accurately, quickly, and with greater detail and continuity than any manual measurement method. Within the construction literature, these devices have been applied in systems that compare as-built scans to 3D CAD design files in order to inspect the geometric compliance of a fabricated assembly with contractually stipulated tolerances. However, before comparisons of this type can be made, the particular object of interest needs to be isolated from the background objects and clutter captured by the indiscriminate 3D imaging device. Thus far, object-of-interest extraction from cluttered construction data has remained a manual process. This thesis explores the process of automated information extraction in order to improve the availability of information about 3D geometries on construction projects and to improve the execution of component inspection and progress tracking. Specifically, the scope of the research is limited to automatically recognizing and isolating pipe spools from their cluttered point cloud scans. Two approaches are developed and evaluated. The contributions of the work are as follows: (1) A number of challenges involved in applying RANdom SAmple Consensus (RANSAC) to pipe spool recognition are identified. (2) An effective spatial search and pipe spool extraction algorithm based on local data-level curvature estimation, density-based clustering, and bag-of-features matching is presented. The algorithm is validated on two case studies and is shown to successfully extract pipe spools from cluttered point clouds and to successfully differentiate between the specific pipe spool of interest and other similar pipe spools in the same search space. Finally, (3) the accuracy of curvature estimation using data collected by low-cost range cameras is tested, and the viability of low-cost range cameras for object search, localization, and extraction is critically assessed
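    The density-based clustering stage of the second approach can be illustrated with a short sketch (a simplified stand-in using scikit-learn's DBSCAN; the eps and min_samples values are assumptions that would be tuned to the scan resolution in practice):

    ```python
    import numpy as np
    from sklearn.cluster import DBSCAN

    def cluster_candidates(points, eps=0.05, min_samples=30):
        """Split a cluttered scan into candidate objects with
        density-based clustering; DBSCAN labels sparse points -1,
        which conveniently discards scattered clutter."""
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
        return [points[labels == c] for c in set(labels) if c != -1]

    # Toy usage: two dense blobs plus sparse noise points.
    rng = np.random.default_rng(1)
    blob_a = rng.normal([0, 0, 0], 0.01, (500, 3))
    blob_b = rng.normal([1, 0, 0], 0.01, (500, 3))
    noise = rng.uniform(-1, 2, (50, 3))
    parts = cluster_candidates(np.vstack([blob_a, blob_b, noise]))
    print(len(parts), "candidate clusters")     # expected: 2
    ```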

    Optical Methods in Sensing and Imaging for Medical and Biological Applications

    Get PDF
    The recent advances in optical sources and detectors have opened up new opportunities for sensing and imaging techniques which can be successfully used in biomedical and healthcare applications. This book, entitled ‘Optical Methods in Sensing and Imaging for Medical and Biological Applications’, focuses on various aspects of the research and development related to these areas. The book will be a valuable source of information presenting the recent advances in optical methods and novel techniques, as well as their applications in the fields of biomedicine and healthcare, to anyone interested in this subject

    Perception-driven approaches to real-time remote immersive visualization

    Get PDF
    In remote immersive visualization systems, real-time 3D perception through RGB-D cameras, combined with modern Virtual Reality (VR) interfaces, enhances the user's sense of presence in a remote scene through a rendered 3D reconstruction. This is particularly valuable when there is a need to visualize, explore, and perform tasks in environments that are inaccessible, too hazardous, or distant. However, a remote visualization system requires that the entire pipeline, from 3D data acquisition to VR rendering, satisfy demanding speed, throughput, and visual-realism requirements. Especially when using point clouds, network latency and throughput limitations create a fundamental quality gap between the acquired data of the physical world and the displayed data, which degrades the sense of presence and can provoke cybersickness. This thesis presents state-of-the-art research that addresses these problems by taking the human visual system as inspiration, from sensor data acquisition to VR rendering. The human visual system does not have uniform vision across the field of view; it has the sharpest visual acuity at the center of the field of view, and the acuity falls off towards the periphery. Peripheral vision provides lower resolution, guiding eye movements so that central vision visits all the crucial parts of the scene. As a first contribution, the thesis develops remote visualization strategies that exploit this acuity fall-off to facilitate the processing, transmission, buffering, and VR rendering of 3D reconstructed scenes while simultaneously reducing throughput requirements and latency. As a second contribution, the thesis investigates attentional mechanisms for selecting and drawing user engagement to specific information in the dynamic spatio-temporal environment. It proposes a strategy that analyzes the remote scene in terms of its 3D structure, its layout, and the spatial, functional, and semantic relationships between the objects in the scene, focusing on models of human visual perception. The strategy allocates a greater proportion of computational resources to objects of interest and creates a more realistic visualization. As a supplementary contribution, a new volumetric point-cloud density-based Peak Signal-to-Noise Ratio (PSNR) metric is proposed to evaluate the introduced techniques. An in-depth evaluation of the presented systems, a comparative examination of the proposed point cloud metric, user studies, and experiments demonstrate that the methods introduced in this thesis are visually superior while significantly reducing latency and throughput
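    The proposed density-based PSNR variant is not reproduced here, but the conventional point-cloud geometry PSNR it extends has the standard form sketched below (taking the peak as the reference bounding-box diagonal is a common convention, assumed rather than taken from the thesis):

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    def point_cloud_psnr(reference, degraded):
        """Conventional geometry PSNR between two point clouds.
        Peak = reference bounding-box diagonal (assumed convention);
        MSE = mean squared nearest-neighbour distance."""
        d, _ = cKDTree(reference).query(degraded)
        mse = np.mean(d ** 2)
        peak = np.linalg.norm(reference.max(0) - reference.min(0))
        return 10.0 * np.log10(peak ** 2 / mse)

    # Toy usage: a jittered copy of a random reference cloud.
    rng = np.random.default_rng(2)
    ref = rng.random((5000, 3))
    deg = ref + rng.normal(scale=0.001, size=ref.shape)
    print(f"{point_cloud_psnr(ref, deg):.1f} dB")
    ```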