
    3D-BEVIS: Bird's-Eye-View Instance Segmentation

    Recent deep learning models achieve impressive results on 3D scene analysis tasks by operating directly on unstructured point clouds. Much progress has been made in object classification and semantic segmentation; the task of instance segmentation, however, is less explored. In this work, we present 3D-BEVIS, a deep learning framework for 3D semantic instance segmentation on point clouds. Following the idea of previous proposal-free instance segmentation approaches, our model learns a feature embedding and groups the obtained feature space into semantic instances. Current point-based methods scale linearly with the number of points by processing local sub-parts of a scene individually. However, to perform instance segmentation by clustering, globally consistent features are required. Therefore, we propose to combine local point geometry with global context information from an intermediate bird's-eye-view representation.
    Comment: camera-ready version for GCPR '19
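
    A minimal sketch of the grouping step described above: per-point embeddings (assumed already produced by a network) are clustered into instances. DBSCAN stands in here for whatever clustering the paper actually uses; all names and parameters are illustrative.

        # Cluster per-point feature embeddings into semantic instances.
        import numpy as np
        from sklearn.cluster import DBSCAN

        def embeddings_to_instances(embeddings, eps=0.5, min_points=50):
            """Cluster per-point embeddings (N, D) into instance labels (N,).

            Points falling in no dense region of the feature space get -1.
            """
            return DBSCAN(eps=eps, min_samples=min_points).fit_predict(embeddings)

        # Toy usage: two well-separated blobs in a 16-D embedding space.
        rng = np.random.default_rng(0)
        emb = np.concatenate([rng.normal(0.0, 0.1, (500, 16)),
                              rng.normal(3.0, 0.1, (500, 16))])
        print(np.unique(embeddings_to_instances(emb)))  # [0 1]: two instances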

    Joint geometry and color point cloud denoising based on graph wavelets

    A point cloud is an effective 3D geometrical representation of data, pairing each point with attributes such as transparency, normal and color. The imperfect acquisition process of a 3D point cloud usually generates a significant amount of noise, so point cloud denoising has received a lot of attention. Most existing techniques perform point cloud denoising based only on the geometry of the neighbouring points; very few works consider the denoising of the color attributes of a point cloud or take advantage of the correlation between geometry and color. In this article, we introduce a novel non-iterative set-up for point cloud denoising based on the spectral graph wavelet transform (SGW) that jointly exploits geometry and color to denoise both attributes in the graph spectral domain. The framework is based on the construction of a joint geometry and color graph that compacts the energy of smooth graph signals in the low-frequency bands. The noise is then removed from the spectral graph wavelet coefficients by applying data-driven adaptive soft-thresholding. Extensive simulation results show that the proposed denoising technique significantly outperforms state-of-the-art methods on both subjective and objective quality metrics.
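
    To make the thresholding principle concrete, here is a simplified sketch that soft-thresholds the spectral coefficients of a graph signal. It uses a plain graph Fourier transform and a fixed threshold in place of the paper's spectral graph wavelets and data-driven thresholds, so it is a stand-in rather than the authors' method.

        import numpy as np

        def soft_threshold(c, t):
            # Shrink coefficients toward zero: sign(c) * max(|c| - t, 0).
            return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

        def gft_denoise(W, x, t):
            """W: (N, N) symmetric adjacency; x: (N,) noisy signal; t: threshold."""
            L = np.diag(W.sum(axis=1)) - W        # combinatorial graph Laplacian
            lam, U = np.linalg.eigh(L)            # graph Fourier basis
            coeffs = U.T @ x                      # forward transform
            return U @ soft_threshold(coeffs, t)  # shrink and invert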

    RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion

    The raw depth image captured by indoor depth sensors usually has an extensive range of missing depth values, due to inherent limitations such as the inability to perceive transparent objects and the limited distance range. The incomplete depth map with missing values burdens many downstream vision tasks, and a growing number of depth completion methods have been proposed to alleviate this issue. While most existing methods can generate accurate dense depth maps from sparse and uniformly sampled depth maps, they are not suitable for complementing large contiguous regions of missing depth values, which is common and critical in images captured in indoor environments. To overcome these challenges, we design a novel two-branch end-to-end fusion network named RDFC-GAN, which takes a pair of RGB and incomplete depth images as input to predict a dense and completed depth map. The first branch employs an encoder-decoder structure which, by adhering to the Manhattan world assumption and utilizing normal maps from RGB-D information as guidance, regresses the local dense depth values from the raw depth map. In the other branch, we propose an RGB-depth fusion CycleGAN to transfer the RGB image to a fine-grained textured depth map. We adopt adaptive fusion modules named W-AdaIN to propagate features across the two branches, and we append a confidence fusion head to fuse the two branch outputs into the final depth map. Extensive experiments on NYU-Depth V2 and SUN RGB-D demonstrate that our proposed method clearly improves depth completion performance, especially in the more realistic setting of indoor environments, with the help of our proposed pseudo depth maps in training.
    Comment: Haowen Wang and Zhengping Che contributed equally. Under review. An earlier version was accepted by CVPR 2022 (arXiv:2203.10856).
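
    As a rough illustration of AdaIN-style fusion, the mechanism a W-AdaIN module builds on, the sketch below re-styles one branch's features with the channel-wise statistics of the other's. This is a simplified assumption, not the paper's W-AdaIN implementation.

        import torch

        def adain(x, y, eps=1e-5):
            # x, y: (B, C, H, W) feature maps from the two branches.
            mu_x = x.mean(dim=(2, 3), keepdim=True)
            sd_x = x.std(dim=(2, 3), keepdim=True) + eps
            mu_y = y.mean(dim=(2, 3), keepdim=True)
            sd_y = y.std(dim=(2, 3), keepdim=True) + eps
            # Normalize x, then re-scale and re-shift with y's statistics.
            return sd_y * (x - mu_x) / sd_x + mu_y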

    Robust Change Detection Based on Neural Descriptor Fields

    The ability to reason about changes in the environment is crucial for robots operating over extended periods of time. Agents are expected to capture changes during operation so that actions can be followed to ensure a smooth progression of the working session. However, varying viewing angles and accumulated localization errors make it easy for robots to falsely detect changes in the surrounding world due to low observation overlap and drifted object associations. In this paper, based on the recently proposed category-level Neural Descriptor Fields (NDFs), we develop an object-level online change detection approach that is robust to partially overlapping observations and noisy localization results. Utilizing the shape completion capability and SE(3)-equivariance of NDFs, we represent objects with compact shape codes encoding full object shapes from partial observations. The objects are then organized in a spatial tree structure based on object centers recovered from NDFs for fast queries of object neighborhoods. By associating objects via shape code similarity and comparing local object-neighbor spatial layouts, our proposed approach demonstrates robustness to low observation overlap and localization noise. We conduct experiments on both synthetic and real-world sequences and achieve improved change detection results compared to multiple baseline methods. Project webpage: https://yilundu.github.io/ndf_change
    Comment: 8 pages, 8 figures, and 2 tables. Accepted to IROS 2022.
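
    A hedged sketch of the association step as described: object centers go into a spatial tree for fast neighbourhood queries, and candidates are matched by shape-code similarity. The radius and similarity threshold are illustrative assumptions, not values from the paper.

        import numpy as np
        from scipy.spatial import cKDTree

        def associate(centers_a, codes_a, centers_b, codes_b,
                      radius=0.5, sim_min=0.8):
            """Return (i, j) pairs of objects associated between two scans."""
            tree_b = cKDTree(centers_b)            # spatial tree over centers
            pairs = []
            for i, (c, z) in enumerate(zip(centers_a, codes_a)):
                for j in tree_b.query_ball_point(c, r=radius):
                    zj = codes_b[j]                # candidate neighbour's code
                    sim = z @ zj / (np.linalg.norm(z) * np.linalg.norm(zj))
                    if sim >= sim_min:             # shape codes agree: match
                        pairs.append((i, j))
            return pairs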

    3D data fusion from multiple sensors and its applications

    The introduction of depth cameras to the mass market helped make computer vision applicable to many real-world applications, such as human interaction in virtual environments, autonomous driving, robotics and 3D reconstruction. All these problems were originally tackled with standard cameras, but the intrinsic ambiguity of bidimensional images led to the development of depth camera technologies. Stereo vision was first introduced to provide an estimate of the 3D geometry of the scene. Structured light depth cameras were developed to use the same concepts as stereo vision but overcome some of the problems of passive technologies. Finally, Time-of-Flight (ToF) depth cameras solve the same depth estimation problem with a different technology. This thesis focuses on the acquisition of depth data from multiple sensors and presents techniques to efficiently combine the information of different acquisition systems. The three main technologies developed for depth estimation are first reviewed, presenting the operating principles and practical issues of each family of sensors. The use of multiple sensors is then investigated, providing practical solutions to the problems of 3D reconstruction and gesture recognition. Data from stereo vision systems and ToF depth cameras are combined to produce a higher-quality depth map, with a confidence measure of the depth data from the two systems guiding the fusion. The lack of datasets with data from multiple sensors is addressed by proposing a system for the collection of data and ground-truth depth, and a tool to generate synthetic data from standard cameras and ToF depth cameras. For gesture recognition, a depth camera is paired with a Leap Motion device to boost recognition performance. A set of features from the two devices is used in a classification framework based on Support Vector Machines and Random Forests.
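
    A minimal sketch of confidence-guided depth fusion of the kind described, assuming the stereo and ToF depth maps and their per-pixel confidences are already registered to a common viewpoint; the weighting scheme is illustrative, while the thesis defines its own confidence measures.

        import numpy as np

        def fuse_depth(d_stereo, c_stereo, d_tof, c_tof, eps=1e-6):
            """All inputs are (H, W) arrays; confidences lie in [0, 1]."""
            # Per-pixel convex combination weighted by sensor confidence.
            return (c_stereo * d_stereo + c_tof * d_tof) / (c_stereo + c_tof + eps)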

    Point-based Gesture Recognition Techniques

    Gesture recognition is a computing process that attempts to recognize and interpret human gestures through the use of mathematical algorithms. In this paper, we describe point-based gesture recognition together with point-cloud nearest-neighbour search and sampling techniques, and we relate these techniques to previous studies.
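
    As an example of the point-cloud sampling techniques surveyed here, below is a sketch of farthest-point sampling, a common scheme that iteratively picks the point farthest from everything selected so far; the paper itself does not prescribe this exact routine.

        import numpy as np

        def farthest_point_sampling(points, k):
            """points: (N, 3) array; returns indices of k well-spread samples."""
            chosen = [0]                               # arbitrary seed point
            dist = np.linalg.norm(points - points[0], axis=1)
            for _ in range(k - 1):
                idx = int(np.argmax(dist))             # farthest remaining point
                chosen.append(idx)
                # Keep, per point, the distance to its nearest chosen sample.
                dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
            return np.array(chosen)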

    3D modeling by low-cost range cameras: methods and potentialities

    Nowadays the demand for 3D models for the documentation and visualization of objects and environments is continually increasing. However, traditional 3D modeling techniques and systems (i.e. photogrammetry and laser scanners) can be very expensive and/or onerous, as they often need qualified technicians and specific post-processing phases. Thus, it is important to find new instruments able to provide low-cost 3D data in real time and in a user-friendly way. Range cameras seem one of the most promising tools to achieve this goal: they are low-cost 3D scanners, able to easily collect dense point clouds at a high frame rate, at short range (a few meters) from the imaged objects. Such sensors, though, still remain a relatively new 3D measurement technology, not yet exhaustively studied. Thus, it is essential to assess the metric quality of the depth data retrieved by these devices. This thesis fits precisely into this background: the aim is to evaluate the potential of range cameras for geomatic applications and to provide useful indications for their practical use. Therefore, the three most popular and/or promising low-cost range cameras, namely the Microsoft Kinect v1, the Microsoft Kinect v2 and the Occipital Structure Sensor, were first characterized from a geomatic point of view in order to assess the metric quality of their depth data. These investigations showed that such sensors offer a depth precision and accuracy ranging from a few millimeters to a few centimeters, depending both on the operating principle of the device (Structured Light or Time of Flight) and on the depth itself. On this basis, two different models were identified for precision and accuracy vs. depth: parabolic for the Structured Light sensors (the Kinect v1 and the Structure Sensor) and linear for the Time of Flight sensor (the Kinect v2). The accuracy models were then shown to be globally consistent with the corresponding precision models for all three sensors. Furthermore, the proposed calibration model was validated for the Structure Sensor: with calibration, the overall RMSE decreased from 27 to 16 mm. Finally, four case studies were carried out in order to evaluate:
    • the performance of the Kinect v2 sensor for monitoring oscillatory motions (relevant for structural and/or industrial monitoring), demonstrating a good ability of the system to detect movements and displacements;
    • the feasibility of integrating the Kinect v2 with a classical stereo system, highlighting the need to integrate range cameras into classical 3D photogrammetric systems, especially to overcome limitations in acquisition completeness;
    • the potential of the Structure Sensor for the 3D surveying of indoor environments, showing a more than sufficient accuracy for most applications;
    • the potential of the Structure Sensor for documenting small archaeological finds, where the metric accuracy is rather good while the textured models show some misalignments.
    In conclusion, although the experimental results demonstrated that range cameras can give good and encouraging results, the performance of traditional 3D modeling techniques in terms of accuracy and precision is still superior and must be preferred when the accuracy requirements are strict. But for a very wide and continuously increasing range of applications, where the required accuracy is at the level of a few millimeters (very close range) to a few centimeters, range cameras can be a valuable alternative, especially when non-expert users are involved. Furthermore, the technology on which these sensors are based is continually evolving, driven also by the new generation of AR/VR kits, and their geometric performance will certainly soon improve.
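
    To make the two error-vs-depth models above concrete, the sketch below fits a parabolic model (structured light) and a linear model (time of flight) to synthetic error data; np.polyfit and the numbers are stand-ins for the thesis's actual fitting procedure and measurements.

        import numpy as np

        rng = np.random.default_rng(1)
        depth = np.linspace(0.5, 4.0, 20)                       # metres
        sl_err = 2.0 + 1.5 * depth**2 + rng.normal(0, 0.3, 20)  # mm, synthetic
        tof_err = 2.0 + 1.0 * depth + rng.normal(0, 0.3, 20)    # mm, synthetic

        sl_model = np.polyfit(depth, sl_err, deg=2)    # parabolic: structured light
        tof_model = np.polyfit(depth, tof_err, deg=1)  # linear: time of flight
        print("SL coefficients (a, b, c):", sl_model)  # a*d**2 + b*d + c
        print("ToF coefficients (m, q):", tof_model)   # m*d + q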

    In-Field Estimation of Orange Number and Size by 3D Laser Scanning

    The estimation of the fruit load of an orchard prior to harvest is useful for planning harvest logistics and trading decisions. Manual fruit counting and determination of the harvesting capacity of the field are expensive and time-consuming. The automatic counting of fruits and their geometric characterization from 3D LiDAR models can be an interesting alternative. Field research was conducted in the province of Cordoba (Southern Spain) on 24 ‘Salustiana’ variety orange trees—Citrus sinensis (L.) Osbeck—(12 pruned and 12 unpruned). The harvest size and fruit count of each tree were recorded. Likewise, the unit weight and diameter of the fruits were determined (N = 160). The orange trees were also modelled with a 3D LiDAR with colour capture for subsequent segmentation and fruit detection using a K-means algorithm. For the pruned trees, a significant regression was obtained between the real and modelled fruit numbers (R2 = 0.63, p = 0.01). This was not the case for the unpruned trees (p = 0.18) due to leaf occlusion. The mean diameter estimated by the algorithm (72.15 ± 22.62 mm) did not differ significantly (p = 0.35) from the diameter measured on the fruits (72.68 ± 5.728 mm). Even though the use of 3D LiDAR scans is time-consuming, the harvest size estimation obtained in this research is very accurate.
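
    A hedged sketch of the colour-based segmentation step: K-means on RGB separates orange-coloured returns from foliage in the coloured LiDAR cloud, and a fruit cluster's diameter is approximated from its spatial extent. The cluster count, the orange-selection rule and the diameter estimate are assumptions for illustration, not the study's exact pipeline.

        import numpy as np
        from sklearn.cluster import KMeans

        def detect_fruit_points(xyz, rgb, n_clusters=3):
            """xyz: (N, 3) positions; rgb: (N, 3) colours in [0, 1]."""
            labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(rgb)
            # Assume the cluster with the highest red-minus-green mean is fruit.
            scores = [rgb[labels == k, 0].mean() - rgb[labels == k, 1].mean()
                      for k in range(n_clusters)]
            return xyz[labels == int(np.argmax(scores))]

        def approx_diameter(cluster_xyz):
            # Mean axis-aligned extent as a crude per-fruit diameter estimate.
            return float((cluster_xyz.max(axis=0) - cluster_xyz.min(axis=0)).mean())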

    A Survey of Surface Reconstruction from Point Clouds

    The area of surface reconstruction has seen substantial progress in the past two decades. The traditional problem addressed by surface reconstruction is to recover the digital representation of a physical shape that has been scanned, where the scanned data contains a wide variety of defects. While much of the earlier work has been focused on reconstructing a piece-wise smooth representation of the original shape, recent work has taken on more specialized priors to address significantly challenging data imperfections, where the reconstruction can take on different representations – not necessarily the explicit geometry. We survey the field of surface reconstruction, and provide a categorization with respect to priors, data imperfections, and reconstruction output. By considering a holistic view of surface reconstruction, we show a detailed characterization of the field, highlight similarities between diverse reconstruction techniques, and provide directions for future work in surface reconstruction.