Time-Contrastive Networks: Self-Supervised Learning from Video
We propose a self-supervised approach for learning representations and
robotic behaviors entirely from unlabeled videos recorded from multiple
viewpoints, and study how this representation can be used in two robotic
imitation settings: imitating object interactions from videos of humans, and
imitating human poses. Imitation of human behavior requires a
viewpoint-invariant representation that captures the relationships between
end-effectors (hands or robot grippers) and the environment, object attributes,
and body pose. We train our representations using a metric learning loss, where
multiple simultaneous viewpoints of the same observation are attracted in the
embedding space, while being repelled from temporal neighbors which are often
visually similar but functionally different. In other words, the model
simultaneously learns to recognize what is common between different-looking
images, and what is different between similar-looking images. This signal
causes our model to discover attributes that do not change across viewpoint,
but do change across time, while ignoring nuisance variables such as
occlusions, motion blur, lighting and background. We demonstrate that this
representation can be used by a robot to directly mimic human poses without an
explicit correspondence, and that it can be used as a reward function within a
reinforcement learning algorithm. While representations are learned from an
unlabeled collection of task-related videos, robot behaviors such as pouring
are learned by watching a single 3rd-person demonstration by a human. Reward
functions obtained by following the human demonstrations under the learned
representation enable efficient reinforcement learning that is practical for
real-world robotic systems. Video results, open-source code and dataset are
available at https://sermanet.github.io/imitat
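The metric learning objective described in this abstract can be sketched as a triplet loss in which two simultaneous viewpoints of the same moment are pulled together while a temporal neighbor is pushed away. The function below is an illustrative sketch only; the names and the margin value are assumptions, not the authors' released code.

```python
import numpy as np

def tcn_triplet_loss(anchor, positive, negative, margin=0.2):
    """Time-contrastive triplet loss (illustrative sketch).

    anchor, positive: embeddings of the same instant seen from two viewpoints
    negative: embedding of a temporal neighbor from the anchor's viewpoint
    """
    d_pos = np.sum((anchor - positive) ** 2)  # pull co-occurring views together
    d_neg = np.sum((anchor - negative) ** 2)  # push temporal neighbors apart
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this loss over many triplets encourages embeddings that are invariant across viewpoints but discriminative across time, which is the signal the abstract describes.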
Scene extraction in motion pictures
This paper addresses the challenge of bridging the semantic gap between the rich meaning users desire when they query to locate and browse media and the shallowness of the media descriptions that can be computed in today's content management systems. To facilitate high-level, semantics-based content annotation and interpretation, we tackle the problem of automatically decomposing motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines from film production to determine when a scene change occurs. We then investigate the rules and conventions of Film Grammar that would guide and shape an algorithmic solution for determining a scene. Two different techniques using intershot analysis are proposed as solutions in this paper. In addition, we present refinement mechanisms, such as film-punctuation detection founded on Film Grammar, to further improve the results. These refinement techniques demonstrate significant improvements in overall performance. Furthermore, we analyze errors in the context of film-production techniques, which offers useful insight into the limitations of our method.
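One simple form of intershot analysis is to group consecutive shots into a scene while their visual signatures stay similar. The sketch below uses color-histogram intersection for this; the function names and the threshold are illustrative assumptions and do not reproduce the paper's actual algorithm or its Film Grammar refinements.

```python
import numpy as np

def group_shots_into_scenes(shot_histograms, threshold=0.5):
    """Greedy scene grouping from per-shot color histograms (sketch).

    A new scene starts when a shot's histogram intersection with the
    previous shot falls below `threshold`. Histograms are assumed to be
    L1-normalized, so the intersection lies in [0, 1].
    """
    scenes = [[0]]  # first shot opens the first scene
    for i in range(1, len(shot_histograms)):
        a, b = shot_histograms[i - 1], shot_histograms[i]
        similarity = np.minimum(a, b).sum()  # histogram intersection
        if similarity < threshold:
            scenes.append([i])      # visual break: start a new scene
        else:
            scenes[-1].append(i)    # continue the current scene
    return scenes
```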
A Comparative Evaluation of Heart Rate Estimation Methods using Face Videos
This paper presents a comparative evaluation of methods for remote heart rate
estimation from face videos, i.e., given a video sequence of the face as
input, methods that process it to obtain a robust estimate of the subject's
heart rate at each moment. Four alternatives from the literature are tested,
three based on hand-crafted approaches and one based on deep learning. The
methods are compared using RGB videos from the COHFACE database. Experiments
show that the learning-based method achieves much better accuracy than the
hand-crafted ones. The low error rate achieved by the learning-based model
makes it practical for real scenarios, e.g. in medical or sports
environments. Comment: Accepted at the IEEE International Workshop on Medical
Computing (MediComp) 2020
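A minimal hand-crafted baseline of the kind this comparison covers tracks the mean green-channel intensity of the face region over time and picks the dominant frequency in the plausible heart-rate band. This sketch is a generic illustration under those assumptions, not any of the four evaluated methods.

```python
import numpy as np

def estimate_heart_rate(green_trace, fps):
    """Naive rPPG heart-rate estimate from a green-channel trace (sketch).

    green_trace: mean green intensity of the face region, one value per frame
    fps: video frame rate in Hz
    Returns the dominant frequency in the 0.7-4 Hz band (42-240 BPM), in BPM.
    """
    x = np.asarray(green_trace, dtype=float)
    x = x - x.mean()                          # remove the DC component
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
    power = np.abs(np.fft.rfft(x)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)    # physiologically plausible range
    dominant = freqs[band][np.argmax(power[band])]
    return dominant * 60.0                    # Hz -> beats per minute
```

Such baselines are sensitive to motion, lighting, and occlusion, which is consistent with the abstract's finding that the learning-based method is substantially more accurate.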
Acquisition, Modeling, and Augmentation of Reflectance for Synthetic Optical Flow Reference Data
This thesis is concerned with the acquisition, modeling, and augmentation of material reflectance to simulate high-fidelity synthetic data for computer vision tasks.
The topic is covered in three chapters: I commence with exploring the upper limits of reflectance acquisition.
I analyze state-of-the-art BTF reflectance field renderings and show that they can be applied to optical flow performance analysis with closely matching performance to real-world images.
Next, I present two methods for fitting efficient BRDF reflectance models to measured BTF data.
Both methods combined retain all relevant reflectance information as well as the surface normal details on a pixel level.
I further show that the resulting synthesized images are suited for optical flow performance analysis, with a virtually identical performance for all material types.
Finally, I present a novel method for augmenting real-world datasets with physically plausible precipitation effects, including ground surface wetting, water droplets on the windshield, and water spray and mists.
This is achieved by projecting the real-world image data onto a reconstructed virtual scene, manipulating the scene and the surface reflectance, and performing unbiased light transport simulation of the precipitation effects.
Learning 3D Human Pose from Structure and Motion
3D human pose estimation from a single image is a challenging problem,
especially for in-the-wild settings due to the lack of 3D annotated data. We
propose two anatomically inspired loss functions and use them with a
weakly-supervised learning framework to jointly learn from large-scale
in-the-wild 2D and indoor/synthetic 3D data. We also present a simple temporal
network that exploits temporal and structural cues present in predicted pose
sequences to temporally harmonize the pose estimations. We carefully analyze
the proposed contributions through loss surface visualizations and sensitivity
analysis to facilitate deeper understanding of their working mechanism. Our
complete pipeline improves the state-of-the-art by 11.8% and 12% on Human3.6M
and MPI-INF-3DHP, respectively, and runs at 30 FPS on a commodity graphics
card. Comment: ECCV 2018. Project page: https://www.cse.iitb.ac.in/~rdabral/3DPose
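An anatomically inspired loss of the kind this abstract mentions can penalize predicted bone lengths that deviate from a reference skeleton. The sketch below mirrors that idea in spirit only; the function names, the (parent, child) bone encoding, and the squared-error form are assumptions, not the paper's exact loss.

```python
import numpy as np

def bone_length_loss(pred_joints, bones, ref_lengths):
    """Bone-length consistency loss for 3D pose (illustrative sketch).

    pred_joints: (J, 3) array of predicted 3D joint positions
    bones: list of (parent, child) joint index pairs
    ref_lengths: reference skeleton length for each bone, same order as bones
    """
    loss = 0.0
    for (p, c), ref in zip(bones, ref_lengths):
        length = np.linalg.norm(pred_joints[c] - pred_joints[p])
        loss += (length - ref) ** 2           # penalize anatomically implausible bones
    return loss / len(bones)
```

Because bone lengths are observable in 2D-annotated in-the-wild data up to scale, structure losses like this are a common way to supervise 3D predictions weakly.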