285 research outputs found
Hierarchical Salient Object Detection for Assisted Grasping
Visual scene decomposition into semantic entities is one of the major
challenges when creating a reliable object grasping system. Recently, we
introduced a bottom-up hierarchical clustering approach which is able to
segment objects and parts in a scene. In this paper, we introduce a transform
from such a segmentation into a corresponding, hierarchical saliency function.
In comprehensive experiments we demonstrate its ability to detect salient
objects in a scene. Furthermore, this hierarchical saliency defines the most
salient corresponding region (scale) for every point in an image. Based on
this, an easy-to-use pick-and-place manipulation system was developed and
demonstrated as a proof of concept. Comment: Accepted for ICRA 201
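The per-point "most salient scale" idea above can be sketched as a lookup over a segmentation hierarchy: for each pixel, compare the saliency of its containing region at every level and keep the winner. This is a minimal illustration under assumed inputs (per-level label maps and per-region saliency scores); the paper's actual clustering and saliency transform are not reproduced here.

```python
import numpy as np

def most_salient_scale(label_maps, saliency_per_region):
    """For every pixel, pick the hierarchy level whose containing
    region has the highest saliency score.

    label_maps: list of (H, W) integer label images, one per hierarchy
                level (level 0 = finest parts, last = whole objects).
    saliency_per_region: list of dicts mapping region label -> saliency,
                one dict per level (every label assumed present).
    Returns (H, W) arrays: winning level index and its saliency value.
    """
    h, w = label_maps[0].shape
    best_level = np.zeros((h, w), dtype=int)
    best_sal = np.full((h, w), -np.inf)
    for level, (labels, sal) in enumerate(zip(label_maps, saliency_per_region)):
        # Map each pixel's region label to that region's saliency score.
        sal_map = np.vectorize(sal.get)(labels).astype(float)
        update = sal_map > best_sal
        best_level[update] = level
        best_sal[update] = sal_map[update]
    return best_level, best_sal
```

A pixel whose fine-level part scores higher than its enclosing object keeps the fine scale, and vice versa, which is what makes the saliency function hierarchical.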
3D Face Reconstruction from Light Field Images: A Model-free Approach
Reconstructing 3D facial geometry from a single RGB image has recently
instigated wide research interest. However, it is still an ill-posed problem
and most methods rely on prior models hence undermining the accuracy of the
recovered 3D faces. In this paper, we exploit the Epipolar Plane Images (EPI)
obtained from light field cameras and learn CNN models that recover horizontal
and vertical 3D facial curves from the respective horizontal and vertical EPIs.
Our 3D face reconstruction network (FaceLFnet) comprises a densely connected
architecture to learn accurate 3D facial curves from low resolution EPIs. To
train the proposed FaceLFnets from scratch, we synthesize photo-realistic light
field images from 3D facial scans. The curve-by-curve 3D face estimation
approach allows the networks to learn from only 14K images of 80 identities,
which still comprise over 11 million EPIs/curves. The estimated facial curves
are merged into a single point cloud to which a surface is fitted to get the
final 3D face. Our method is model-free, requires only a few training samples
to learn FaceLFnet and can reconstruct 3D faces with high accuracy from single
light field images under varying poses, expressions and lighting conditions.
Comparisons on the BU-3DFE and BU-4DFE datasets show that our method reduces
reconstruction errors by over 20% compared to the recent state of the art.
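The horizontal and vertical EPIs that feed the networks are just 2D slices of the 4D light field. The sketch below shows one common slicing convention, assuming a (U, V, H, W) grid of grayscale sub-aperture views and taking slices through the central view row/column; the paper's exact sampling may differ.

```python
import numpy as np

def extract_epis(light_field):
    """Slice horizontal and vertical EPIs out of a 4D light field.

    light_field: array of shape (U, V, H, W) -- a U x V grid of
    sub-aperture views, each H x W. A horizontal EPI fixes a vertical
    view index and an image row, stacking that row across the horizontal
    views; a vertical EPI fixes a horizontal view index and an image
    column, stacking it across the vertical views.
    """
    u_res, v_res, h, w = light_field.shape
    v_c, u_c = v_res // 2, u_res // 2  # slice through the central views
    horizontal = [light_field[:, v_c, y, :] for y in range(h)]  # each (U, W)
    vertical = [light_field[u_c, :, :, x] for x in range(w)]    # each (V, H)
    return horizontal, vertical
```

The slope of a scene point's trace in such a slice encodes its depth, which is why per-EPI regression of a facial curve is well posed even without a prior face model.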
Aggressive saliency-aware point cloud compression
The increasing demand for accurate representations of 3D scenes, combined
with the rise of immersive technologies, has made point clouds extremely popular.
However, quality point clouds require a large amount of data and therefore the
need for compression methods is imperative. In this paper, we present a novel,
geometry-based, end-to-end compression scheme, that combines information on the
geometrical features of the point cloud and the user's position, achieving
remarkable results for aggressive compression schemes demanding very small bit
rates. After separating visible and non-visible points, four saliency maps are
calculated, utilizing the point cloud's geometry and distance from the user,
the visibility information, and the user's focus point. A combination of these
maps yields a final saliency map that indicates the overall significance of
each point, so that different regions are quantized with different numbers
of bits during the encoding process. The decoder reconstructs the point cloud
making use of delta coordinates and solving a sparse linear system. Evaluation
studies and comparisons with the geometry-based point cloud compression (G-PCC)
algorithm by the Moving Picture Experts Group (MPEG), carried out for a variety
of point clouds, demonstrate that the proposed method achieves significantly
better results for small bit rates.
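The decoding step described above (delta coordinates plus a sparse linear system) can be sketched as a least-squares solve: the encoder's graph Laplacian maps positions to deltas, and a few anchor points pin down the otherwise-undetermined global placement. This is a generic Laplacian-reconstruction sketch, not the paper's exact formulation; the anchor weighting `w` is an assumed free parameter.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def reconstruct_from_deltas(laplacian, deltas, anchor_idx, anchor_pos, w=10.0):
    """Recover point positions from Laplacian (delta) coordinates.

    Solves, per spatial axis, the over-determined sparse system
        [ L     ] x = [ deltas          ]
        [ w * S ]     [ w * anchor_pos  ]
    in least squares, where S selects the anchor rows. laplacian is the
    (N, N) sparse graph Laplacian used by the encoder, deltas is (N, 3).
    """
    n = laplacian.shape[0]
    # Selection matrix with weight w at each anchored point.
    anchor_rows = sparse.csr_matrix(
        (np.full(len(anchor_idx), w), (np.arange(len(anchor_idx)), anchor_idx)),
        shape=(len(anchor_idx), n))
    a = sparse.vstack([laplacian, anchor_rows]).tocsr()
    recovered = np.empty((n, 3))
    for axis in range(3):
        b = np.concatenate([deltas[:, axis], w * anchor_pos[:, axis]])
        recovered[:, axis] = lsqr(a, b)[0]
    return recovered
```

Since deltas encode only local shape, coarsely quantizing them in low-saliency regions degrades fine detail there while the solve still keeps the global geometry consistent.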
Fast Graph-Based Object Segmentation for RGB-D Images
Object segmentation is an important capability for robotic systems, in
particular for grasping. We present a graph-based approach for the
segmentation of simple objects from RGB-D images. We are interested in
segmenting objects with large variety in appearance, from lack of texture to
strong textures, for the task of robotic grasping. The algorithm does not rely
on image features or machine learning. We propose a modified Canny edge
detector for extracting robust edges by using depth information and two simple
cost functions for combining color and depth cues. The cost functions are used
to build an undirected graph, which is partitioned using the concept of
internal and external differences between graph regions. The partitioning is
fast with O(NlogN) complexity. We also discuss ways to deal with missing depth
information. We test the approach on different publicly available RGB-D object
datasets, such as the Rutgers APC RGB-D dataset and the RGB-D Object Dataset,
and compare the results with other existing methods.
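The partitioning criterion based on internal and external differences is the Felzenszwalb–Huttenlocher scheme: sort the edges once (the O(N log N) cost) and merge two components whenever the connecting edge is no heavier than either component's internal difference plus a size-dependent tolerance. A minimal sketch, with the tolerance parameter `k` and the edge list assumed as inputs (the paper's color/depth cost functions would supply the edge weights):

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n  # max edge weight inside each component

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b, w):
        a, b = self.find(a), self.find(b)
        if self.size[a] < self.size[b]:
            a, b = b, a
        self.parent[b] = a
        self.size[a] += self.size[b]
        self.internal[a] = max(self.internal[a], self.internal[b], w)

def segment_graph(n_nodes, edges, k=1.0):
    """Partition an undirected weighted graph Felzenszwalb-style.

    edges: list of (weight, u, v). Merge components when the external
    difference (the connecting edge) is <= both internal differences
    plus k/|C|; sorting the edges dominates the runtime.
    """
    uf = UnionFind(n_nodes)
    for w, u, v in sorted(edges):
        a, b = uf.find(u), uf.find(v)
        if a != b and w <= min(uf.internal[a] + k / uf.size[a],
                               uf.internal[b] + k / uf.size[b]):
            uf.union(a, b, w)
    return [uf.find(i) for i in range(n_nodes)]
```

Larger `k` favors larger components; in the RGB-D setting two such passes (or a combined cost) fuse the color and depth cues.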
A Unified BEV Model for Joint Learning of 3D Local Features and Overlap Estimation
Pairwise point cloud registration is a critical task for many applications,
which heavily depends on finding correct correspondences from the two point
clouds. However, low overlap between the input point clouds can easily cause
registration to fail, leading to incorrect overlap estimates and mismatched
correspondences, especially in scenes where non-overlapping regions contain
similar structures. In this paper, we present a unified bird's-eye view (BEV)
model for joint learning of 3D local features and overlap estimation to
fulfill pairwise registration and loop closure. Feature description is
performed by a sparse UNet-like network operating on the BEV representation;
3D keypoints are extracted by a detection head for 2D locations and a
regression head for heights. For overlap detection, a cross-attention module
exchanges contextual information between the input point clouds, followed by a
classification head to estimate the overlapping region. We evaluate our unified
model extensively on the KITTI dataset and Apollo-SouthBay dataset. The
experiments demonstrate that our method significantly outperforms existing
methods on overlap estimation, especially in scenes with small overlaps. It
also achieves top registration performance on both datasets in terms of
translation and rotation errors. Comment: 8 pages. Accepted by ICRA-202
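The BEV representation underlying the model is a rasterization of the point cloud into a top-down grid. A minimal sketch with assumed grid ranges and two hand-picked per-cell channels (occupancy and maximum height); the paper's network would consume a richer, learned variant of such a grid:

```python
import numpy as np

def points_to_bev(points, x_range=(-50, 50), y_range=(-50, 50), cell=0.5):
    """Rasterize a point cloud into a bird's-eye-view grid.

    points: (N, 3) array of x, y, z coordinates. Each grid cell stores
    an occupancy flag and the maximum height of the points falling in
    it; points outside the configured ranges are discarded.
    Returns a (2, ny, nx) float32 tensor: [occupancy, max z].
    """
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    bev = np.zeros((2, ny, nx), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    for x, y, z in zip(ix[keep], iy[keep], points[keep, 2]):
        bev[0, y, x] = 1.0
        bev[1, y, x] = max(bev[1, y, x], z)
    return bev
```

Working in this 2D grid is what lets a 2D detection head localize keypoints while a separate regression head recovers the lost height coordinate.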