Hand-Object Interaction Detection with Fully Convolutional Networks
Schröder M, Ritter H. Hand-Object Interaction Detection with Fully Convolutional Networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. 2017: 18-25.
Detecting hand-object interactions is a challenging problem with many applications in the human-computer interaction domain. We present a real-time method that automatically detects hand-object interactions in RGBD sensor data and tracks the object’s rigid pose over time. The detection is performed using a fully convolutional neural network, which is purposefully trained to discern the relationship between hands and objects and which predicts pixel-wise class probabilities. This output is used in a probabilistic pixel labeling strategy that explicitly accounts for the uncertainty of the prediction. Based on the labeling of object pixels, the object is tracked over time using model-based registration. We evaluate the accuracy and generalizability of our approach and make our annotated RGBD dataset as well as our trained models publicly available.
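The uncertainty-aware labeling step can be sketched roughly as follows. This is a minimal toy illustration, not the paper's implementation: the class set (background/hand/object) and the confidence threshold are assumptions, and the real system operates on full FCN output maps.

```python
import numpy as np

def softmax(logits, axis=-1):
    # numerically stable softmax over the class axis
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def label_pixels(logits, threshold=0.7):
    """Probabilistic pixel labeling with an explicit 'uncertain' outcome.

    logits: (H, W, C) raw per-pixel network outputs for C classes.
    Returns (labels, probs), where labels is (H, W) ints and -1 marks
    pixels whose top class probability falls below the threshold.
    """
    probs = softmax(logits)
    labels = probs.argmax(axis=-1)
    confident = probs.max(axis=-1) >= threshold
    labels = np.where(confident, labels, -1)
    return labels, probs

# Toy example: a 2x2 image, 3 classes (0=background, 1=hand, 2=object).
logits = np.array([[[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]],
                   [[0.0, 0.0, 5.0], [1.0, 1.0, 1.1]]])
labels, probs = label_pixels(logits)
# The last pixel's probabilities are nearly uniform, so it is marked -1.
```

Labeled object pixels (here, class 2) would then feed the model-based registration that tracks the object's rigid pose.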
Forecasting Hands and Objects in Future Frames
This paper presents an approach to forecast the future presence and location of human hands and objects. Given an image frame, the goal is to predict what objects will appear in a future frame (e.g., 5 seconds later) and where they will be located, even when they are not visible in the current frame. The key idea is that (1) an intermediate representation of a convolutional object recognition model abstracts scene information in its frame and that (2) we can predict (i.e., regress) such representations corresponding to the future frames based on that of the current frame. We design a new two-stream convolutional neural network (CNN) architecture for videos by extending the state-of-the-art convolutional object detection network, and present a new fully convolutional regression network for predicting future scene representations. Our experiments confirm that combining the regressed future representation with our detection network allows reliable estimation of future hands and objects in videos. We obtain much higher accuracy compared to the state-of-the-art future object presence forecasting method on a public dataset.
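The core idea of regressing future representations can be illustrated with a deliberately simplified stand-in. The paper uses a fully convolutional regression network on intermediate CNN feature maps; the toy below substitutes a linear least-squares regressor on flattened feature vectors and synthetic data, purely to show the "predict future features from current features" structure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for intermediate CNN representations: each row is the
# flattened feature vector of one frame. W_true plays the role of the
# (unknown) mapping from a frame's features to the features ~5s later.
d = 8
W_true = rng.normal(size=(d, d))
F_now = rng.normal(size=(200, d))      # current-frame features
F_future = F_now @ W_true              # matching future-frame features (noise-free toy)

# Fit the regressor: minimize ||F_now @ W - F_future||^2.
W_hat, *_ = np.linalg.lstsq(F_now, F_future, rcond=None)

# At test time: predict the future representation of a new frame, then
# run the (omitted here) detection head on the predicted features to
# localize hands and objects in the not-yet-observed frame.
f_new = rng.normal(size=(1, d))
f_pred = f_new @ W_hat
```

In the actual architecture the regression preserves spatial structure (it is fully convolutional), which is what lets the unchanged detection network operate directly on the predicted future representation.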
Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression
We design a new approach that allows robot learning of new activities from unlabeled human example videos. Given videos of humans executing the same activity from a human's viewpoint (i.e., first-person videos), our objective is to make the robot learn the temporal structure of the activity as its future regression network, and learn to transfer such a model to its own motor execution. We present a new deep learning model: we extend the state-of-the-art convolutional object detection network for the representation/estimation of human hands in training videos, and newly introduce the concept of using a fully convolutional network to regress (i.e., predict) the intermediate scene representation corresponding to the future frame (e.g., 1-2 seconds later). Combining these allows direct prediction of future locations of human hands and objects, which enables the robot to infer the motor control plan using our manipulation network. We experimentally confirm that our approach makes learning of robot activities from unlabeled human interaction videos possible, and demonstrate that our robot is able to execute the learned collaborative activities in real-time directly based on its camera input.
Incremental Learning for Robot Perception through HRI
Scene understanding and object recognition are difficult to achieve yet crucial skills for robots. Recently, Convolutional Neural Networks (CNNs) have shown success in this task. However, there is still a gap between their performance on image datasets and real-world robotics scenarios. We present a novel paradigm for incrementally improving a robot's visual perception through active human interaction. In this paradigm, the user introduces novel objects to the robot by means of pointing and voice commands. Given this information, the robot visually explores the object and adds images of it to re-train the perception module. Our base perception module builds on recent developments in object detection and recognition using deep learning. Our method combines state-of-the-art CNNs from off-line batch learning with human guidance, robot exploration, and incremental on-line learning.
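The interaction loop described above can be sketched in miniature. This is a hypothetical illustration: the real system fine-tunes a deep detector on images the robot collects, whereas the sketch below uses a nearest-centroid classifier over feature vectors so that the "introduce object, explore, re-train, recognize" cycle fits in a few lines.

```python
import numpy as np

class IncrementalPerception:
    """Toy stand-in for a re-trainable perception module: a
    nearest-centroid classifier over image feature vectors."""

    def __init__(self):
        self.centroids = {}  # object label -> mean feature vector

    def add_object(self, label, feature_views):
        # feature_views: features of images gathered while the robot
        # visually explores the newly introduced object
        self.centroids[label] = np.mean(feature_views, axis=0)

    def recognize(self, feature):
        # return the known object whose centroid is nearest
        return min(self.centroids,
                   key=lambda l: np.linalg.norm(self.centroids[l] - feature))

robot = IncrementalPerception()
robot.add_object("cup", np.array([[1.0, 0.0], [0.9, 0.1]]))
# User points at a new object and names it; the robot explores it and
# incrementally updates its perception module with the new class.
robot.add_object("ball", np.array([[0.0, 1.0], [0.1, 0.9]]))
```

Each human-guided introduction only adds (or refines) one class, leaving previously learned objects untouched, which is the essence of the incremental on-line learning described in the abstract.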