Fast Multi-frame Stereo Scene Flow with Motion Segmentation
We propose a new multi-frame method for efficiently computing scene flow
(dense depth and optical flow) and camera ego-motion for a dynamic scene
observed from a moving stereo camera rig. Our technique also segments out
moving objects from the rigid scene. In our method, we first estimate the
disparity map and the 6-DOF camera motion using stereo matching and visual
odometry. We then identify regions inconsistent with the estimated camera
motion and compute per-pixel optical flow only at these regions. This flow
proposal is fused with the camera motion-based flow proposal using fusion moves
to obtain the final optical flow and motion segmentation. This unified
framework benefits all four tasks (stereo, optical flow, visual odometry and
motion segmentation), leading to overall higher accuracy and efficiency. Our
method is currently ranked third on the KITTI 2015 scene flow benchmark.
Furthermore, our CPU implementation runs in 2-3 seconds per frame which is 1-3
orders of magnitude faster than the top six methods. We also report a thorough
evaluation on challenging Sintel sequences with fast camera and object motion,
where our method consistently outperforms OSF [Menze and Geiger, 2015], which
is currently ranked second on the KITTI benchmark.
Comment: 15 pages. To appear at IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2017). Our results were submitted to KITTI 2015 Stereo
Scene Flow Benchmark in November 201
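The step of finding regions inconsistent with the estimated camera motion can be illustrated with a minimal sketch. It assumes a rectified pinhole stereo rig and is not the authors' implementation: the function names, the parameters (`f`, `baseline`, `cx`, `cy`) and the 1-pixel threshold are all hypothetical. It back-projects a pixel using its disparity, applies the rigid camera motion `(R, t)`, re-projects, and compares the resulting flow with an observed flow.

```python
def rigid_flow(u, v, disparity, f, baseline, cx, cy, R, t):
    """Flow induced at pixel (u, v) by camera motion alone (static-scene assumption).

    R is a 3x3 rotation (nested lists) and t a 3-vector mapping points from the
    first camera frame into the second. All names here are illustrative.
    """
    Z = f * baseline / disparity            # depth from stereo disparity
    p = ((u - cx) * Z / f, (v - cy) * Z / f, Z)   # back-projected 3D point
    # Apply the rigid motion: p' = R p + t
    Xn, Yn, Zn = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    # Re-project into the second frame and subtract the original pixel
    return (f * Xn / Zn + cx - u, f * Yn / Zn + cy - v)


def is_inconsistent(observed_flow, predicted_flow, threshold=1.0):
    """True if the observed flow deviates from the rigid prediction by more
    than `threshold` pixels (an assumed, hand-picked value)."""
    du = observed_flow[0] - predicted_flow[0]
    dv = observed_flow[1] - predicted_flow[1]
    return (du * du + dv * dv) ** 0.5 > threshold


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    # Camera moves 1 unit forward, so static points move 1 unit closer.
    flow = rigid_flow(420.0, 240.0, 25.0, 500.0, 0.5, 320.0, 240.0,
                      identity, (0.0, 0.0, -1.0))
    print(flow, is_inconsistent((15.0, 0.0), flow))
```

Pixels flagged by a test of this kind are the only ones that would need a separate per-pixel optical flow estimate; the abstract describes fusing the two flow proposals with fusion moves, which this sketch does not attempt.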
HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements in enabling much
greater flexibility in creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at
Siggraph'1
PetroSurf3D - A Dataset for high-resolution 3D Surface Segmentation
The development of powerful 3D scanning hardware and reconstruction
algorithms has strongly promoted the generation of 3D surface reconstructions
in different domains. An area of special interest for such 3D reconstructions
is the cultural heritage domain, where surface reconstructions are generated to
digitally preserve historical artifacts. While reconstruction quality nowadays
is sufficient in many cases, the robust analysis (e.g. segmentation, matching,
and classification) of reconstructed 3D data is still an open topic. In this
paper, we target the automatic and interactive segmentation of high-resolution
3D surface reconstructions from the archaeological domain. To foster research
in this field, we introduce a fully annotated and publicly available
large-scale 3D surface dataset including high-resolution meshes, depth maps and
point clouds as a novel benchmark dataset to the community. We provide baseline
results for our existing random forest-based approach and for the first time
investigate segmentation with convolutional neural networks (CNNs) on the data.
Results show that both approaches have complementary strengths and weaknesses
and that the provided dataset represents a challenge for future research.
Comment: CBMI Submission; Dataset and more information can be found at
http://lrs.icg.tugraz.at/research/petroglyphsegmentation
Learning from Millions of 3D Scans for Large-scale 3D Face Recognition
Deep networks trained on millions of facial images are believed to be closely
approaching human-level performance in face recognition. However, open world
face recognition still remains a challenge. Although 3D face recognition has
an inherent edge over its 2D counterpart, it has not benefited from the recent
developments in deep learning due to the unavailability of large training as
well as large test datasets. Recognition accuracies have already saturated on
existing 3D face datasets due to their small gallery sizes. Unlike 2D
photographs, 3D facial scans cannot be sourced from the web causing a
bottleneck in the development of deep 3D face recognition networks and
datasets. Against this backdrop, we propose a method for generating a large corpus
of labeled 3D face identities and their multiple instances for training and a
protocol for merging the most challenging existing 3D datasets for testing. We
also propose the first deep CNN model designed specifically for 3D face
recognition and trained on 3.1 Million 3D facial scans of 100K identities. Our
test dataset comprises 1,853 identities with a single 3D scan in the gallery
and another 31K scans as probes, which is several orders of magnitude larger
than existing ones. Without fine tuning on this dataset, our network already
outperforms state of the art face recognition by over 10%. We fine tune our
network on the gallery set to perform end-to-end large scale 3D face
recognition which further improves accuracy. Finally, we show the efficacy of
our method for the open world face recognition problem.
Comment: 11 page
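The gallery/probe protocol mentioned in this abstract (one scan per identity in the gallery, remaining scans as probes) can be sketched as nearest-neighbour matching on feature vectors. This is a generic illustration, not the paper's network or matching rule: the cosine similarity, the `identify` function and the `reject_below` threshold for the open-world case are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def identify(probe, gallery, reject_below=None):
    """Return (identity, score) of the most similar gallery entry.

    `gallery` maps an identity label to a single feature vector. If
    `reject_below` is set and the best score falls under it, the probe is
    treated as an unknown identity (open-world case) and None is returned
    as the label. The threshold value is a hypothetical tuning parameter.
    """
    best_id, best_score = None, -1.0
    for identity, feature in gallery.items():
        score = cosine(probe, feature)
        if score > best_score:
            best_id, best_score = identity, score
    if reject_below is not None and best_score < reject_below:
        return None, best_score
    return best_id, best_score


if __name__ == "__main__":
    gallery = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0]}
    print(identify([0.9, 0.1, 0.0], gallery))
    print(identify([0.0, 0.0, 1.0], gallery, reject_below=0.5))
```

In practice the feature vectors would come from a trained network; here they are toy values chosen only to make the matching logic visible.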
Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis
We introduce a data-driven approach to complete partial 3D shapes through a
combination of volumetric deep neural networks and 3D shape synthesis. From a
partially-scanned input shape, our method first infers a low-resolution -- but
complete -- output. To this end, we introduce a 3D-Encoder-Predictor Network
(3D-EPN) which is composed of 3D convolutional layers. The network is trained
to predict and fill in missing data, and operates on an implicit surface
representation that encodes both known and unknown space. This allows us to
predict global structure in unknown areas at high accuracy. We then correlate
these intermediary results with 3D geometry from a shape database at test time.
In a final pass, we propose a patch-based 3D shape synthesis method that
imposes the 3D geometry from these retrieved shapes as constraints on the
coarsely-completed mesh. This synthesis process enables us to reconstruct
fine-scale detail and generate high-resolution output while respecting the
global mesh structure obtained by the 3D-EPN. Although our 3D-EPN outperforms
state-of-the-art completion methods, the main contribution of our work lies in
the combination of a data-driven shape predictor and analytic 3D shape
synthesis. In our results, we show extensive evaluations on a newly-introduced
shape completion benchmark for both real-world and synthetic data.
Object segmentation in depth maps with one user click and a synthetically trained fully convolutional network
With more and more household objects built on planned obsolescence and
consumed by a fast-growing population, hazardous waste recycling has become a
critical challenge. Given the large variability of household waste, current
recycling platforms mostly rely on human operators to analyze the scene,
typically composed of many object instances piled up in bulk. Helping them by
robotizing the unitary extraction is a key challenge to speed up this tedious
process. Whereas supervised deep learning has proven very efficient for such
object-level scene understanding, e.g., generic object detection and
segmentation in everyday scenes, it requires large sets of per-pixel
labeled images, which are hardly available in numerous application contexts,
including industrial robotics. We thus propose a step towards a practical
interactive application for generating an object-oriented robotic grasp,
requiring as inputs only one depth map of the scene and one user click on the
next object to extract. More precisely, we address in this paper the intermediate
problem of object segmentation in top views of piles of bulk objects given a
pixel location, called the seed, provided interactively by a human operator. We
propose a twofold framework for generating edge-driven instance segments.
First, we repurpose a state-of-the-art fully convolutional object contour
detector for seed-based instance segmentation by introducing the notion of
edge-mask duality with a novel patch-free and contour-oriented loss function.
Second, we train one model using only synthetic scenes, instead of manually
labeled training data. Our experimental results show that considering edge-mask
duality for training an encoder-decoder network, as we suggest, outperforms a
state-of-the-art patch-based network in the present application context.
Comment: This is a pre-print of an article published in Human Friendly
Robotics, 10th International Workshop, Springer Proceedings in Advanced
Robotics, vol 7. The final authenticated version is available online at:
https://doi.org/10.1007/978-3-319-89327-3_16, Springer Proceedings in
Advanced Robotics, Siciliano Bruno, Khatib Oussama, In press, Human Friendly
Robotics, 10th International Workshop,
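The link between object contours and a seed-selected instance mask can be illustrated with a classical stand-in: given a binary contour map and one clicked pixel, a flood fill returns the region enclosed by the contours around that pixel. This is a minimal sketch under the assumption of a clean, closed contour map; it is not the paper's encoder-decoder network or its loss function, and `segment_from_seed` is a hypothetical name.

```python
from collections import deque


def segment_from_seed(edges, seed):
    """Return a binary mask of the region containing `seed`, bounded by edges.

    `edges` is a 2D list where truthy cells mark contour pixels, and `seed`
    is a (row, col) tuple. Uses 4-connected breadth-first flood fill.
    """
    rows, cols = len(edges), len(edges[0])
    mask = [[0] * cols for _ in range(rows)]
    r0, c0 = seed
    if edges[r0][c0]:
        return mask  # a seed placed on a contour selects nothing
    mask[r0][c0] = 1
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and not edges[nr][nc] and not mask[nr][nc]:
                mask[nr][nc] = 1
                queue.append((nr, nc))
    return mask


if __name__ == "__main__":
    # A vertical contour in column 2 separates two objects.
    edges = [[0, 0, 1, 0, 0] for _ in range(5)]
    for row in segment_from_seed(edges, (2, 0)):
        print(row)
```

A flood fill of this kind leaks through any gap in the contours, which is one reason a learned model that predicts the mask directly from the seed and depth map can be preferable.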