53 research outputs found
3D-BEVIS: Bird's-Eye-View Instance Segmentation
Recent deep learning models achieve impressive results on 3D scene analysis
tasks by operating directly on unstructured point clouds. Much progress has
been made in object classification and semantic segmentation. However,
the task of instance segmentation is less explored. In this work, we present
3D-BEVIS, a deep learning framework for 3D semantic instance segmentation on
point clouds. Following the idea of previous proposal-free instance
segmentation approaches, our model learns a feature embedding and groups the
obtained feature space into semantic instances. Current point-based methods
scale linearly with the number of points by processing local sub-parts of a
scene individually. However, to perform instance segmentation by clustering,
globally consistent features are required. Therefore, we propose to combine
local point geometry with global context information from an intermediate
bird's-eye view representation.
Comment: camera-ready version for GCPR '1
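As a rough illustration of the proposal-free grouping step described in the abstract, the sketch below clusters per-point feature embeddings by greedy distance thresholding. The embedding values, the threshold `tau`, and the greedy strategy are illustrative assumptions, not the paper's learned metric or actual clustering procedure.

```python
import numpy as np

def cluster_embeddings(emb, tau=0.5):
    """Greedily group per-point feature embeddings into instances.

    emb: (N, D) array of learned per-point features. A point joins the
    nearest existing cluster if its embedding lies within distance tau
    of that cluster's first member; otherwise it starts a new cluster.
    """
    labels = -np.ones(len(emb), dtype=int)
    centers = []
    for i, e in enumerate(emb):
        if centers:
            d = np.linalg.norm(np.asarray(centers) - e, axis=1)
            j = int(np.argmin(d))
            if d[j] < tau:
                labels[i] = j
                continue
        centers.append(e)
        labels[i] = len(centers) - 1
    return labels

# two well-separated embedding blobs should yield two instances
emb = np.vstack([np.zeros((5, 3)), np.full((5, 3), 3.0)])
labels = cluster_embeddings(emb, tau=1.0)
```

For clustering to work this way, the features must be globally consistent across the scene, which is exactly why the paper injects global context from the bird's-eye view.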
Joint geometry and color point cloud denoising based on graph wavelets
A point cloud is an effective 3D geometric representation of data paired with different attributes, such as the transparency, normal and color of each point. The imperfect acquisition process of a 3D point cloud usually generates a significant amount of noise; hence, point cloud denoising has received a lot of attention. Most existing techniques perform point cloud denoising based only on the geometry of the neighbouring points; very few works consider the denoising of the color attributes of a point cloud or take advantage of the correlation between geometry and color. In this article, we introduce a novel non-iterative setup for point cloud denoising based on the spectral graph wavelet transform (SGW) that jointly exploits geometry and color to denoise both attributes in the graph spectral domain. The designed framework is based on the construction of a joint geometry and color graph that compacts the energy of smooth graph signals in the low-frequency bands. The noise is then removed from the spectral graph wavelet coefficients by applying data-driven adaptive soft-thresholding. Extensive simulation results show that the proposed denoising technique significantly outperforms state-of-the-art methods on both subjective and objective quality metrics.
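The soft-thresholding step applied to the wavelet coefficients can be illustrated with the standard soft-thresholding operator. The coefficient values and the fixed threshold below are made up; the paper's data-driven, adaptive threshold selection is not reproduced here.

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Soft-thresholding: shrink coefficients toward zero by t,
    zeroing out anything with magnitude below t (assumed noise)."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

# toy spectral graph wavelet coefficients and an illustrative threshold
c = np.array([-2.0, -0.3, 0.1, 1.5])
shrunk = soft_threshold(c, 0.5)  # small coefficients vanish, large ones shrink
```

In the actual pipeline this operator would be applied band by band to SGW coefficients of the joint geometry-and-color graph signal before inverting the transform.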
RDFC-GAN: RGB-Depth Fusion CycleGAN for Indoor Depth Completion
The raw depth image captured by indoor depth sensors usually has an extensive
range of missing depth values due to inherent limitations such as the inability
to perceive transparent objects and the limited distance range. The incomplete
depth map with missing values burdens many downstream vision tasks, and a
rising number of depth completion methods have been proposed to alleviate this
issue. While most existing methods can generate accurate dense depth maps from
sparse and uniformly sampled depth maps, they are not suitable for
complementing large contiguous regions of missing depth values, which is common
and critical in images captured in indoor environments. To overcome these
challenges, we design a novel two-branch end-to-end fusion network named
RDFC-GAN, which takes a pair of RGB and incomplete depth images as input to
predict a dense and completed depth map. The first branch employs an
encoder-decoder structure, by adhering to the Manhattan world assumption and
utilizing normal maps from RGB-D information as guidance, to regress the local
dense depth values from the raw depth map. In the other branch, we propose an
RGB-depth fusion CycleGAN to transfer the RGB image to the fine-grained
textured depth map. We adopt adaptive fusion modules named W-AdaIN to propagate
the features across the two branches, and we append a confidence fusion head to
fuse the two outputs of the branches for the final depth map. Extensive
experiments on NYU-Depth V2 and SUN RGB-D demonstrate that our proposed method
clearly improves the depth completion performance, especially in a more
realistic setting of indoor environments, with the help of our proposed pseudo
depth maps in training.
Comment: Haowen Wang and Zhengping Che contributed equally. Under review. An earlier version has been accepted by CVPR 2022 (arXiv:2203.10856
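The W-AdaIN modules are not detailed in the abstract. As a hedged sketch, plain adaptive instance normalization (AdaIN), on which such fusion modules are commonly built, re-styles one branch's features with the per-channel statistics of the other branch; the array shapes and values below are illustrative only.

```python
import numpy as np

def adain(x, y, eps=1e-5):
    """Adaptive instance normalization for (C, H, W) feature maps:
    normalize x per channel, then rescale/shift it with the
    per-channel mean and std of y."""
    mu_x = x.mean(axis=(1, 2), keepdims=True)
    sd_x = x.std(axis=(1, 2), keepdims=True) + eps
    mu_y = y.mean(axis=(1, 2), keepdims=True)
    sd_y = y.std(axis=(1, 2), keepdims=True) + eps
    return sd_y * (x - mu_x) / sd_x + mu_y

# toy RGB-branch and depth-branch feature maps
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 4))
y = rng.normal(loc=2.0, size=(2, 4, 4))
z = adain(x, y)  # x's content, y's per-channel statistics
```

By construction the output inherits y's channel statistics, which is one simple way to propagate information between two network branches.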
Robust Change Detection Based on Neural Descriptor Fields
The ability to reason about changes in the environment is crucial for robots
operating over extended periods of time. Agents are expected to capture changes
during operation so that actions can be followed to ensure a smooth progression
of the working session. However, varying viewing angles and accumulated
localization errors make it easy for robots to falsely detect changes in the
surrounding world due to low observation overlap and drifted object
associations. In this paper, based on the recently proposed category-level
Neural Descriptor Fields (NDFs), we develop an object-level online change
detection approach that is robust to partially overlapping observations and
noisy localization results. Utilizing the shape completion capability and
SE(3)-equivariance of NDFs, we represent objects with compact shape codes
encoding full object shapes from partial observations. The objects are then
organized in a spatial tree structure based on object centers recovered from
NDFs for fast queries of object neighborhoods. By associating objects via shape
code similarity and comparing local object-neighbor spatial layout, our
proposed approach demonstrates robustness to low observation overlap and
localization noise. We conduct experiments on both synthetic and real-world
sequences and achieve improved change detection results compared to multiple
baseline methods. Project webpage: https://yilundu.github.io/ndf_change
Comment: 8 pages, 8 figures, and 2 tables. Accepted to IROS 2022.
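The association step, matching objects across observations by shape-code similarity among spatial neighbors, might be sketched as below. The radius, similarity threshold, and brute-force neighbor search are simplifying assumptions; the paper organizes object centers in a spatial tree for fast neighborhood queries.

```python
import numpy as np

def associate(codes_a, codes_b, centers_a, centers_b, r=0.5, s_min=0.9):
    """Match objects between observations A and B: a candidate pair must
    have centers within radius r and shape codes with cosine
    similarity >= s_min. Returns a list of (index_a, index_b) pairs."""
    matches = []
    for i, (c, p) in enumerate(zip(codes_a, centers_a)):
        d = np.linalg.norm(centers_b - p, axis=1)  # distances to B centers
        for j in np.where(d < r)[0]:
            sim = codes_b[j] @ c / (np.linalg.norm(codes_b[j]) * np.linalg.norm(c))
            if sim >= s_min:
                matches.append((i, int(j)))
    return matches

# toy example: two objects, slightly shifted centers, similar shape codes
centers_a = np.array([[0.0, 0.0], [5.0, 0.0]])
centers_b = np.array([[0.1, 0.0], [5.1, 0.0]])
codes_a = np.array([[1.0, 0.0], [0.0, 1.0]])
codes_b = np.array([[1.0, 0.1], [0.0, 1.0]])
matches = associate(codes_a, codes_b, centers_a, centers_b)
```

Because the shape codes encode full object shapes recovered from partial views, matching on code similarity tolerates low observation overlap better than matching raw partial geometry would.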
3D data fusion from multiple sensors and its applications
The introduction of depth cameras in the mass market helped make computer vision applicable to many real-world applications, such as human interaction in virtual environments, autonomous driving, robotics and 3D reconstruction. All these problems were originally tackled by means of standard cameras, but the intrinsic ambiguity of bidimensional images led to the development of depth camera technologies. Stereo vision was first introduced to provide an estimate of the 3D geometry of the scene. Structured light depth cameras were developed to exploit the same concepts as stereo vision while overcoming some of the problems of passive technologies. Finally, Time-of-Flight (ToF) depth cameras solve the same depth estimation problem by using a different technology.
This thesis focuses on the acquisition of depth data from multiple sensors and presents techniques to efficiently combine the information of different acquisition systems. The three main technologies developed to provide depth estimation are first reviewed, presenting the operating principles and practical issues of each family of sensors. The use of multiple sensors is then investigated, providing practical solutions to the problems of 3D reconstruction and gesture recognition. Data from stereo vision systems and ToF depth cameras are combined to provide a higher-quality depth map. A confidence measure of the depth data from the two systems is used to guide the depth data fusion. The lack of datasets with data from multiple sensors is addressed by proposing a system for the collection of data and ground truth depth, and a tool to generate synthetic data from standard cameras and ToF depth cameras. For gesture recognition, a depth camera is paired with a Leap Motion device to boost the performance of the recognition task. A set of features from the two devices is used in a classification framework based on Support Vector Machines and Random Forests.
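The confidence-guided fusion of stereo and ToF depth maps could be sketched as a per-pixel weighted average; the confidence maps below are toy values, not the thesis's actual confidence measures.

```python
import numpy as np

def fuse_depth(d_stereo, d_tof, c_stereo, c_tof, eps=1e-6):
    """Per-pixel confidence-weighted fusion of two depth maps:
    pixels trust whichever sensor reports higher confidence."""
    w = c_stereo + c_tof + eps  # eps avoids division by zero
    return (c_stereo * d_stereo + c_tof * d_tof) / w

# toy 2x2 maps: stereo is confident in the top row, ToF in the bottom
d1 = np.full((2, 2), 1.0)
d2 = np.full((2, 2), 3.0)
c1 = np.array([[1.0, 1.0], [0.0, 0.0]])
c2 = np.array([[0.0, 0.0], [1.0, 1.0]])
fused = fuse_depth(d1, d2, c1, c2)
```

In practice the confidence maps would themselves be estimated from sensor-specific cues (e.g. matching cost for stereo, amplitude for ToF), which is where most of the real design effort lies.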
Point-based Gesture Recognition Techniques
Gesture recognition is a computing process that attempts to recognize and interpret human gestures through the use of mathematical algorithms. In this paper, we describe point-based gesture recognition along with point cloud nearest-neighbor search and sampling techniques. We also examine these techniques in relation to previous studies.
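One widely used point cloud sampling technique in this setting is farthest point sampling; the sketch below is a generic implementation under that assumption, not tied to any specific method from the paper.

```python
import numpy as np

def farthest_point_sampling(pts, k):
    """Select k points that greedily maximize mutual spacing: start from
    point 0, then repeatedly pick the point farthest from all chosen ones."""
    idx = [0]
    d = np.linalg.norm(pts - pts[0], axis=1)  # distance to nearest chosen point
    for _ in range(k - 1):
        nxt = int(np.argmax(d))
        idx.append(nxt)
        d = np.minimum(d, np.linalg.norm(pts - pts[nxt], axis=1))
    return np.array(idx)

# 1D toy cloud: the farthest pair (0 and 10) should be chosen first
pts = np.array([[0.0], [1.0], [2.0], [10.0]])
idx = farthest_point_sampling(pts, 2)
```

Such sampling keeps a gesture's spatial extent well covered with few points, after which nearest-neighbor features can be computed on the reduced set.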
3D modeling by low-cost range cameras: methods and potentialities
Nowadays the demand for 3D models for the documentation and visualization of objects and environments is continually increasing. However, the traditional 3D modeling techniques and systems (i.e. photogrammetry and laser scanners) can be very expensive and/or onerous, as they often need qualified technicians and specific post-processing phases. Thus, it is important to find new instruments able to provide low-cost 3D data in real time and in a user-friendly way.
Range cameras seem one of the most promising tools to achieve this goal: they are low-cost 3D scanners, able to easily collect dense point clouds at high frame rate, in a short range (few meters) from the imaged objects.
Such sensors, though, still remain a relatively new 3D measurement technology, not yet exhaustively studied. Thus, it is essential to assess the metric quality of the depth data retrieved by these devices.
This thesis is set precisely in this context: the aim is to evaluate the potentialities of range cameras for geomatic applications and to provide useful indications for their practical use. Therefore, the three most popular and/or promising low-cost range cameras, namely the Microsoft Kinect v1, the Microsoft Kinect v2 and the Occipital Structure Sensor, were first characterized from a geomatic point of view in order to assess the metric quality of the depth data they retrieve.
These investigations showed that such sensors exhibit a depth precision and accuracy ranging from a few millimeters to a few centimeters, depending both on the operational principle adopted by the single device (Structured Light or Time-of-Flight) and on the depth itself.
On this basis, two different models were identified for precision and accuracy vs. depth: parabolic for the Structured Light sensors (the Kinect v1 and the Structure Sensor) and linear for the Time-of-Flight sensor (the Kinect v2). The accuracy models were then shown to be globally consistent with the corresponding precision models for all three sensors.
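The two error-vs-depth model families can be illustrated with a simple least-squares fit; the depth and precision values below are synthetic stand-ins, not the thesis's measured data.

```python
import numpy as np

# hypothetical precision-vs-depth samples (depth in m, precision in mm)
depth = np.array([1.0, 2.0, 3.0, 4.0])
prec_sl = 2.0 * depth**2 + 1.0   # structured-light-like: grows parabolically
prec_tof = 3.0 * depth + 0.5     # ToF-like: grows linearly

# fit a parabolic model for the SL sensors, a linear one for the ToF sensor
p2 = np.polyfit(depth, prec_sl, 2)
p1 = np.polyfit(depth, prec_tof, 1)
```

With real measurements, residuals from both fits would be compared to decide which model family each sensor follows.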
Furthermore, the proposed calibration model was validated for the Structure Sensor: with calibration, the overall RMSE decreased from 27 to 16 mm.
Finally four case studies were carried out in order to evaluate:
• the performances of the Kinect v2 sensor for monitoring oscillatory motions (relevant for structural and/or industrial monitoring), demonstrating a good ability of the system to detect movements and displacements;
• the integration feasibility of the Kinect v2 with a classical stereo system, highlighting the need to integrate range cameras into classical 3D photogrammetric systems, especially to overcome limitations due to acquisition completeness;
• the potentialities of the Structure Sensor for the 3D surveying of indoor environments, showing a more than sufficient accuracy for most applications;
• the potentialities of the Structure Sensor to document small archaeological finds, where the metric accuracy seems rather good, while the textured models show some misalignments.
In conclusion, although the experimental results demonstrated that range cameras can give good and encouraging results, the performance of traditional 3D modeling techniques in terms of accuracy and precision is still superior and must be preferred when the accuracy requirements are restrictive.
However, for a very wide and continuously increasing range of applications, where the required accuracy can range from a few millimeters (very close range) to a few centimeters, range cameras can be a valuable alternative, especially when non-expert users are involved. Furthermore, the technology on which these sensors are based is continually evolving, driven also by the new generation of AR/VR kits, and their geometric performance will certainly improve soon.
In-Field Estimation of Orange Number and Size by 3D Laser Scanning
The estimation of the fruit load of an orchard prior to harvest is useful for planning harvest logistics and trading decisions. Manual fruit counting and determination of the harvesting capacity of the field are expensive and time-consuming. The automatic counting of fruits and their geometric characterization with 3D LiDAR models can be an interesting alternative. Field research was conducted in the province of Cordoba (Southern Spain) on 24 ‘Salustiana’ variety orange trees—Citrus sinensis (L.) Osbeck—(12 pruned and 12 unpruned). The size and number of the fruits at harvest were recorded. Likewise, the unitary weight of the fruits and their diameter were determined (N = 160). The orange trees were also modelled with 3D LiDAR with colour capture for subsequent segmentation and fruit detection using a K-means algorithm. In the case of pruned trees, a significant regression was obtained between the real and modelled fruit number (R2 = 0.63, p = 0.01). The opposite occurred for the unpruned ones (p = 0.18) due to leaf occlusion. The mean diameters provided by the algorithm (72.15 ± 22.62 mm) did not differ significantly (p = 0.35) from those measured on the fruits (72.68 ± 5.728 mm). Even though the use of 3D LiDAR scans is time-consuming, the harvest size estimation obtained in this research is very accurate.
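The K-means segmentation of the coloured point cloud might be sketched as follows; the RGB values and the deterministic initialization are illustrative assumptions, not the study's actual data or implementation.

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain K-means on feature vectors X (N, D): alternate between
    assigning points to the nearest center and recomputing centers."""
    # deterministic init for the sketch: evenly spaced samples as centers
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# hypothetical RGB samples: orange-fruit colours vs green foliage
orange = np.tile([230.0, 120.0, 30.0], (5, 1)) + np.arange(5)[:, None]
green = np.tile([40.0, 120.0, 40.0], (5, 1)) + np.arange(5)[:, None]
X = np.vstack([orange, green])
labels, centers = kmeans(X, 2)
```

Clustering on colour separates fruit points from foliage; fruit diameters can then be estimated from the geometry of each fruit cluster.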
A Survey of Surface Reconstruction from Point Clouds
The area of surface reconstruction has seen substantial progress in the past two decades. The traditional problem addressed by surface reconstruction is to recover the digital representation of a physical shape that has been scanned, where the scanned data contains a wide variety of defects. While much of the earlier work focused on reconstructing a piecewise-smooth representation of the original shape, recent work has taken on more specialized priors to address significantly challenging data imperfections, where the reconstruction can take on different representations, not necessarily the explicit geometry. We survey the field of surface reconstruction and provide a categorization with respect to priors, data imperfections, and reconstruction output. By considering a holistic view of surface reconstruction, we show a detailed characterization of the field, highlight similarities between diverse reconstruction techniques, and provide directions for future work in surface reconstruction.