88 research outputs found
DepthCut: Improved Depth Edge Estimation Using Multiple Unreliable Channels
In the context of scene understanding, a variety of methods exist to
estimate different information channels from mono or stereo images, including
disparity, depth, and normals. Although several advances have been reported in
recent years for these tasks, the estimated information is often imprecise,
particularly near depth discontinuities or creases. Studies have, however, shown
that precisely such depth edges carry critical cues for the perception of
shape, and play important roles in tasks like depth-based segmentation or
foreground selection. Unfortunately, the currently extracted channels often
carry conflicting signals, making it difficult for subsequent applications to
effectively use them. In this paper, we focus on the problem of obtaining
high-precision depth edges (i.e., depth contours and creases) by jointly
analyzing such unreliable information channels. We propose DepthCut, a
data-driven fusion of the channels using a convolutional neural network trained
on a large dataset with known depth. The resulting depth edges can be used for
segmentation, decomposing a scene into depth layers with relatively flat depth,
or improving the accuracy of the depth estimate near depth edges by
constraining its gradients to agree with these edges. Quantitatively, we
compare against 15 variants of baselines and demonstrate that our depth edges
result in an improved segmentation performance and an improved depth estimate
near depth edges compared to data-agnostic channel fusion. Qualitatively, we
demonstrate that the depth edges result in superior segmentation and depth
orderings.
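The data-driven fusion described above can be illustrated with a toy stand-in: per-channel gradient magnitudes combined by a learned weighted sum and a sigmoid. DepthCut's actual fusion is a trained convolutional network; the channel names, weights, and noise levels below are illustrative assumptions, not the paper's model.

```python
import numpy as np

def channel_edges(channel):
    """Per-channel edge strength via finite-difference gradient magnitude."""
    gy, gx = np.gradient(channel)
    return np.hypot(gx, gy)

def fuse_depth_edges(channels, weights, bias=0.0):
    """Fuse unreliable per-channel edge maps into one depth-edge probability
    map. A linear combination plus sigmoid stands in for the trained CNN."""
    stacked = np.stack([channel_edges(c) for c in channels])  # (C, H, W)
    logits = np.tensordot(weights, stacked, axes=1) + bias    # (H, W)
    return 1.0 / (1.0 + np.exp(-logits))                      # in (0, 1)

# Example: two noisy estimates of the same step edge (hypothetical channels)
h, w = 32, 32
step = np.zeros((h, w)); step[:, w // 2:] = 1.0
rng = np.random.default_rng(0)
disparity = step + 0.05 * rng.standard_normal((h, w))
normals_z = step + 0.10 * rng.standard_normal((h, w))
edges = fuse_depth_edges([disparity, normals_z], weights=np.array([0.7, 0.3]))
print(edges.shape)  # (32, 32)
```

In the paper the fusion weights are learned from a large dataset with known depth; here they are fixed by hand purely to show the data flow.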
SceneNet: Understanding Real World Indoor Scenes With Synthetic Data
Scene understanding is a prerequisite to many high level tasks for any
automated intelligent machine operating in real world environments. Recent
attempts with supervised learning have shown promise in this direction but also
highlighted the need for an enormous quantity of supervised data: performance
increases in proportion to the amount of data used. However, this quickly
becomes prohibitive when considering the manual labour needed to collect such
data. In this work, we focus our attention on depth based semantic per-pixel
labelling as a scene understanding problem and show the potential of computer
graphics to generate virtually unlimited labelled data from synthetic 3D
scenes. By carefully synthesizing training data with appropriate noise models
we show comparable performance to state-of-the-art RGBD systems on the NYUv2
dataset despite using only depth data as input, and set a benchmark for
depth-based segmentation on the SUN RGB-D dataset. Additionally, we offer a
route to generating synthesized frame or video data, and an analysis of the
different factors influencing performance gains.
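The "appropriate noise models" step above can be sketched as corrupting a clean synthetic depth map with sensor-like noise before training. The sketch below uses a simple Kinect-style model (axial noise growing quadratically with distance, plus invalid-pixel dropout); the coefficients and the dropout rate are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

def add_depth_noise(depth, rng, sigma_base=0.0012, sigma_slope=0.0019,
                    dropout=0.02):
    """Corrupt a clean synthetic depth map (in metres) with sensor-like noise:
    distance-dependent axial noise plus random missing-value dropout.
    All parameters here are illustrative, not calibrated values."""
    sigma = sigma_base + sigma_slope * (depth - 0.4) ** 2   # std dev in metres
    noisy = depth + rng.standard_normal(depth.shape) * sigma
    mask = rng.random(depth.shape) < dropout
    noisy[mask] = 0.0                                       # 0 marks invalid
    return noisy

rng = np.random.default_rng(0)
clean = np.full((48, 64), 2.0)          # toy scene: flat wall at 2 m
noisy = add_depth_noise(clean, rng)
print(noisy.shape)  # (48, 64)
```

Training on such corrupted renders, rather than perfect synthetic depth, is what narrows the gap to real sensor data.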
CaloriNet: From silhouettes to calorie estimation in private environments
We propose a novel deep fusion architecture, CaloriNet, for the online
estimation of energy expenditure for free living monitoring in private
environments, where RGB data is discarded and replaced by silhouettes. Our
fused convolutional neural network architecture is trainable end-to-end, to
estimate calorie expenditure, using temporal foreground silhouettes alongside
accelerometer data. The network is trained and cross-validated on a publicly
available dataset, SPHERE_RGBD + Inertial_calorie. Results show
state-of-the-art minimum error on the estimation of energy expenditure
(calories per minute), outperforming alternative, standard and single-modal
techniques.
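The two-branch fusion above can be sketched at the shape level: one branch for the temporal silhouette stack, one for the accelerometer window, concatenated and regressed to a single calorie value. CaloriNet's real branches are convolutional and trained end-to-end; the layer sizes, input dimensions, and random weights below are assumptions made only to show the data flow.

```python
import numpy as np

rng = np.random.default_rng(1)

def branch_features(x, w):
    """Toy feature extractor: one linear layer with ReLU. A stand-in for a
    trained convolutional branch."""
    return np.maximum(x @ w, 0.0)

# Hypothetical inputs: a flattened temporal silhouette stack and a window of
# tri-axial accelerometer samples (sizes are assumptions, not the paper's).
silhouette = rng.random(64 * 64)     # flattened silhouette stack
accel = rng.random(100 * 3)          # 100 samples x 3 axes

w_sil = 0.01 * rng.standard_normal((silhouette.size, 16))
w_acc = 0.01 * rng.standard_normal((accel.size, 16))
w_out = 0.01 * rng.standard_normal(32)

# Late fusion: concatenate branch features, then regress a scalar estimate.
fused = np.concatenate([branch_features(silhouette, w_sil),
                        branch_features(accel, w_acc)])
calories_per_minute = float(fused @ w_out)
print(fused.shape)  # (32,)
```

With untrained random weights the output is meaningless; the point is only the fused architecture, in which each modality contributes its own feature vector before the final regression.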
Visual Tracking Based on Human Feature Extraction from Surveillance Video for Human Recognition
A multimodal human identification system based on face and body recognition can provide effective biometric authentication. Facial recognition features are extracted using several techniques, including Eigenfaces and Principal Component Analysis (PCA). The face- and body-based authentication systems are implemented using artificial neural networks (ANNs) and genetic optimization techniques as classifiers, and are merged into a single multimodal biometric system through feature fusion and score fusion. The Kinect sensor SDK allows human bodies to be identified with high accuracy and efficiency. Biometrics aims to mimic the pattern recognition process to identify people, and is a more dependable and secure option than traditional authentication methods based on secrets and tokens. Biometric technologies identify people automatically from their physiological and behavioral traits; these traits must satisfy several criteria, particularly universality, efficacy, and applicability.
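The Eigenfaces/PCA feature extraction mentioned above can be sketched compactly. The version below uses the small Gram-matrix trick common in Eigenface implementations (eigendecompose the n-by-n image covariance rather than the pixel covariance); the toy data and dimensions are assumptions for illustration only.

```python
import numpy as np

def eigenfaces(faces, k):
    """Top-k eigenfaces of a (n_images, n_pixels) matrix via PCA, using the
    (n x n) Gram-matrix trick since n_images << n_pixels."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    gram = centered @ centered.T                   # (n, n)
    vals, vecs = np.linalg.eigh(gram)              # eigenvalues ascending
    order = np.argsort(vals)[::-1][:k]             # take the k largest
    components = centered.T @ vecs[:, order]       # map back to pixel space
    components /= np.linalg.norm(components, axis=0)
    return mean, components                        # (n_pixels,), (n_pixels, k)

def project(face, mean, components):
    """Feature vector used for matching: coordinates in eigenface space."""
    return (face - mean) @ components

# Toy data: 10 random "faces" of 64x64 pixels (illustrative only)
rng = np.random.default_rng(0)
faces = rng.random((10, 64 * 64))
mean, comps = eigenfaces(faces, k=5)
feat = project(faces[0], mean, comps)
print(feat.shape)  # (5,)
```

The low-dimensional projections are what a downstream classifier (an ANN in the system described above) would consume, and they also feed naturally into feature-level fusion with body features.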
3D Object Discovery and Modeling Using Single RGB-D Images Containing Multiple Object Instances
Unsupervised object modeling is important in robotics, especially for
handling a large set of objects. We present a method for unsupervised 3D object
discovery, reconstruction, and localization that exploits multiple instances of
an identical object contained in a single RGB-D image. The proposed method does
not rely on segmentation, scene knowledge, or user input, and thus is easily
scalable. Our method aims to find recurrent patterns in a single RGB-D image by
utilizing appearance and geometry of the salient regions. We extract keypoints
and match them in pairs based on their descriptors. We then generate triplets
of the keypoints matching with each other using several geometric criteria to
minimize false matches. The relative poses of the matched triplets are computed
and clustered to discover sets of triplet pairs with similar relative poses.
Triplets belonging to the same set are likely to belong to the same object and
are used to construct an initial object model. Detecting the remaining
instances with this initial model using RANSAC allows the model to be further
expanded and refined. The automatically generated object models are both compact and
descriptive. We show quantitative and qualitative results on RGB-D images with
various objects including some from the Amazon Picking Challenge. We also
demonstrate the use of our method in an object picking scenario with a robotic
arm.
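The descriptor-matching stage of the pipeline above can be sketched as mutual-nearest-neighbour pairing with simple geometric filters. The paper builds triplets and clusters relative poses on top of such pairs; the thresholds and toy data below are assumptions for illustration.

```python
import numpy as np

def match_keypoints(desc, pts, dist_tol=0.1, min_sep=0.05):
    """Pair keypoints whose descriptors are mutual nearest neighbours,
    rejecting weak matches (descriptor distance > dist_tol) and
    near-coincident points (3D separation < min_sep)."""
    n = len(desc)
    d = np.linalg.norm(desc[:, None] - desc[None, :], axis=2)
    np.fill_diagonal(d, np.inf)                    # forbid self-matches
    pairs = []
    for i in range(n):
        j = int(np.argmin(d[i]))
        if int(np.argmin(d[j])) == i and i < j:    # mutual, deduplicated
            if np.linalg.norm(pts[i] - pts[j]) > min_sep and d[i, j] < dist_tol:
                pairs.append((i, j))
    return pairs

# Toy scene: two instances of a two-keypoint object along the x-axis.
# Keypoints 0,1 belong to instance A; 2,3 to instance B; corresponding
# keypoints have nearly identical (1-D) descriptors.
desc = np.array([[0.00], [1.00], [0.01], [0.99]])
pts = np.array([[0.0, 0, 0], [0.1, 0, 0], [1.0, 0, 0], [1.1, 0, 0]])
print(match_keypoints(desc, pts))  # [(0, 2), (1, 3)]
```

Each such pair links a keypoint on one instance to its counterpart on another; grouping pairs into triplets and clustering their relative poses is what isolates the repeated object.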