HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show that it enables significantly greater flexibility in creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at Siggraph'1
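The view- and pose-dependent texturing idea can be illustrated with a minimal sketch: per-view textures are blended with weights that decay as each captured view direction diverges from the novel view direction. The function name, the Gaussian angular weighting, and the `sigma` parameter are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def view_dependent_blend(textures, view_dirs, novel_dir, sigma=0.3):
    """Blend per-view textures with weights that fall off with the angular
    distance between each captured view direction and the novel view.
    A minimal sketch of view-dependent texturing (assumed form)."""
    novel = novel_dir / np.linalg.norm(novel_dir)
    dirs = view_dirs / np.linalg.norm(view_dirs, axis=1, keepdims=True)
    # angle between each captured view and the requested novel view
    ang = np.arccos(np.clip(dirs @ novel, -1.0, 1.0))
    w = np.exp(-(ang / sigma) ** 2)
    w /= w.sum()
    # weighted sum over the view axis: (V,) x (V, H, W, C) -> (H, W, C)
    return np.tensordot(w, textures, axes=(0, 0))
```

With a tight angular falloff, a novel view aligned with one captured view reproduces that view's texture almost exactly, which is the behavior such schemes rely on to stay sharp under pose changes.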
FaceVR: Real-Time Facial Reenactment and Eye Gaze Control in Virtual Reality
We introduce FaceVR, a novel method for gaze-aware facial reenactment in the Virtual Reality (VR) context. The key component of FaceVR is a robust algorithm to perform real-time facial motion capture of an actor who is wearing a head-mounted display (HMD), as well as a new data-driven approach for eye tracking from monocular videos. In addition to these face reconstruction components, FaceVR incorporates photo-realistic re-rendering in real time, thus allowing artificial modifications of face and eye appearances. For instance, we can alter facial expressions, change gaze directions, or remove the VR goggles in realistic re-renderings. In a live setup with a source and a target actor, we apply these newly introduced algorithmic components. We assume that the source actor is wearing a VR device, and we capture his facial expressions and eye movement in real time. For the target video, we mimic a similar tracking process; however, we use the source input to drive the animations of the target video, thus enabling gaze-aware facial reenactment. To render the modified target video on a stereo display, we augment our capture and reconstruction process with stereo data. In the end, FaceVR produces compelling results for a variety of applications, such as gaze-aware facial reenactment, reenactment in virtual reality, removal of VR goggles, and re-targeting of somebody's gaze direction in a video conferencing call.
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control---thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low quality commodity RGB
cameras.
Comment: Accepted to SIGGRAPH 201
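The step of combining CNN-regressed root-relative 3D joints with 2D detections to recover a stable global pose can be sketched as a small linear least-squares problem plus a temporal filter. This is a simplified stand-in for the paper's energy-based kinematic skeleton fitting: the linearized pinhole model, the function names, and the exponential smoothing are assumptions for illustration.

```python
import numpy as np

def estimate_global_translation(joints3d_rel, joints2d, f, c):
    """Find the root translation t = (tx, ty, tz) that best aligns the
    projected root-relative 3D joints with the 2D predictions.
    Projection (X+tx)/(Z+tz) = x linearizes to tx - x*tz = x*Z - X,
    giving two linear equations per joint (sketch, not VNect itself)."""
    xy = (joints2d - c) / f  # normalized camera coordinates
    n = joints3d_rel.shape[0]
    A = np.zeros((2 * n, 3))
    b = np.zeros(2 * n)
    for i in range(n):
        X, Y, Z = joints3d_rel[i]
        x, y = xy[i]
        A[2 * i] = [1.0, 0.0, -x]; b[2 * i] = x * Z - X
        A[2 * i + 1] = [0.0, 1.0, -y]; b[2 * i + 1] = y * Z - Y
    t, *_ = np.linalg.lstsq(A, b, rcond=None)
    return t

def smooth(prev, cur, alpha=0.8):
    """Exponential filter for temporally stable per-frame output."""
    return cur if prev is None else alpha * cur + (1 - alpha) * prev
```

Given exact 2D/3D correspondences the linear system recovers the translation exactly; with noisy CNN output, the least-squares solution plus filtering approximates the temporally consistent behavior the abstract describes.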
Ego-Downward and Ambient Video based Person Location Association
Using an ego-centric camera for localization and tracking is highly desirable
for urban navigation and indoor assistive systems when GPS is unavailable or
insufficiently accurate. Traditional hand-designed feature tracking and
estimation approaches fail without visible features. Recently, several works
have explored using context features for localization. However, all of these
suffer severe accuracy loss when no visual context information is available.
To provide a possible solution to this problem, this paper proposes a camera
system with both an ego-downward and a third-person static view to perform
localization and tracking with a learning-based approach. In addition, we
propose a novel action and motion verification model for cross-view
verification and localization. We performed comparative experiments on our
collected dataset, which considers same-dressing, gender, and background
diversity. Results indicate that the proposed model achieves improved
accuracy. Finally, we tested the model on multi-person scenarios and
obtained an average accuracy
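The cross-view verification idea, deciding whether an ego-downward sequence and an ambient-camera sequence show the same person by comparing their motion, can be sketched with a toy hand-crafted embedding. The displacement-based embedding below stands in for the paper's learned verification model; every name and the threshold are assumptions.

```python
import numpy as np

def motion_embedding(traj):
    """Embed a 2D motion sequence as a normalized vector of frame-to-frame
    displacements (a toy stand-in for a learned motion encoder)."""
    d = np.diff(np.asarray(traj, dtype=float), axis=0).ravel()
    n = np.linalg.norm(d)
    return d / n if n > 0 else d

def same_person(ego_traj, ambient_traj, thresh=0.8):
    """Cross-view verification sketch: declare a match when the two views'
    motion embeddings have cosine similarity above thresh. Assumes both
    sequences cover the same frames."""
    a, b = motion_embedding(ego_traj), motion_embedding(ambient_traj)
    return float(a @ b) >= thresh
```

Identical motion gives cosine similarity 1 and verifies; opposite motion gives -1 and is rejected. A learned model replaces the hand-crafted embedding but the verification logic is the same.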
3D Gaze Point Localization and Visualization Using LiDAR-based 3D Reconstructions
We present a novel pipeline for localizing a free-roaming eye tracker within a LiDAR-based 3D reconstructed scene with high accuracy. By utilizing a combination of reconstruction algorithms that leverage the strengths of global versus local capture methods, together with user-assisted refinement, we reduce the drift errors associated with Dense Simultaneous Localization and Mapping (D-SLAM) techniques. Our framework supports region-of-interest (ROI) annotation and gaze-statistics generation, as well as the ability to visualize gaze in 3D from an immersive first-person or third-person perspective. This approach gives unique insights into viewers' problem-solving and search-task strategies and has high applicability in indoor static environments such as crime scenes.
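Once the eye tracker is localized in the reconstructed scene, a 3D gaze point can be obtained by casting the gaze ray into the scene geometry and keeping the nearest surface hit. The sketch below uses the standard Möller–Trumbore ray-triangle intersection on a triangle soup; it illustrates the geometric step only, not the paper's pipeline, and all names are assumptions.

```python
import numpy as np

def gaze_point_on_mesh(origin, direction, triangles):
    """Cast the gaze ray from the tracked eye position into the scene and
    return the nearest triangle intersection (Moller-Trumbore), or None.
    `triangles` is an iterable of (v0, v1, v2) numpy vertex arrays."""
    best_t, best_p = np.inf, None
    d = direction / np.linalg.norm(direction)
    for v0, v1, v2 in triangles:
        e1, e2 = v1 - v0, v2 - v0
        h = np.cross(d, e2)
        a = e1 @ h
        if abs(a) < 1e-9:
            continue  # ray parallel to the triangle plane
        s = origin - v0
        u = (s @ h) / a
        q = np.cross(s, e1)
        v = (d @ q) / a
        t = (e2 @ q) / a
        # inside the triangle (barycentric) and in front of the origin
        if 0.0 <= u and 0.0 <= v and u + v <= 1.0 and 1e-9 < t < best_t:
            best_t, best_p = t, origin + t * d
    return best_p
```

In practice the reconstructed LiDAR mesh would be queried through a spatial acceleration structure (e.g. a BVH) rather than a linear scan, but the per-triangle test is the same.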
Application of augmented reality and robotic technology in broadcasting: A survey
As an innovative technology, Augmented Reality (AR) has been gradually deployed in the broadcast, videography, and cinematography industries. Virtual graphics generated by AR are dynamic and overlaid on the surface of the environment, so that the original appearance can be greatly enhanced in comparison with traditional broadcasting. In addition, AR enables broadcasters to interact with augmented virtual 3D models on a broadcasting scene in order to enhance the performance of broadcasting. Recently, advanced robotic technologies have been deployed in camera shooting systems to create robotic cameramen so that the performance of AR broadcasting can be further improved, which is highlighted in this paper.
EgoHumans: An Egocentric 3D Multi-Human Benchmark
We present EgoHumans, a new multi-view multi-human video benchmark to advance
the state-of-the-art of egocentric human 3D pose estimation and tracking.
Existing egocentric benchmarks either capture single subject or indoor-only
scenarios, which limit the generalization of computer vision algorithms for
real-world applications. We propose a novel 3D capture setup to construct a
comprehensive egocentric multi-human benchmark in the wild with annotations to
support diverse tasks such as human detection, tracking, 2D/3D pose estimation,
and mesh recovery. We leverage consumer-grade wearable camera-equipped glasses
for the egocentric view, which enables us to capture dynamic activities like
playing tennis, fencing, volleyball, etc. Furthermore, our multi-view setup
generates accurate 3D ground truth even under severe or complete occlusion. The
dataset consists of more than 125k egocentric images, spanning diverse scenes
with a particular focus on challenging and unchoreographed multi-human
activities and fast-moving egocentric views. We rigorously evaluate existing
state-of-the-art methods and highlight their limitations in the egocentric
scenario, specifically on multi-human tracking. To address such limitations, we
propose EgoFormer, a novel approach with a multi-stream transformer
architecture and explicit 3D spatial reasoning to estimate and track the human
pose. EgoFormer significantly outperforms prior art by 13.6% IDF1 on the
EgoHumans dataset.
Comment: Accepted to ICCV 2023 (Oral)
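The IDF1 metric behind the reported 13.6% tracking gain scores how consistently predicted identities follow ground-truth identities: IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN), where IDTP counts per-frame matches under the globally optimal one-to-one track assignment. The sketch below is a simplified point-track version (real MOT evaluation matches boxes by IoU, e.g. via py-motmetrics); track representation and the distance threshold are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def idf1(gt_tracks, pred_tracks, dist_thresh=1.0):
    """Simplified IDF1. Each track is a dict {frame: (x, y)}; a gt/pred
    pair 'matches' in a frame when their points lie within dist_thresh.
    The Hungarian assignment maximizes total matched frames (IDTP)."""
    G, P = list(gt_tracks), list(pred_tracks)
    overlap = np.zeros((len(G), len(P)), dtype=int)
    for i, g in enumerate(G):
        for j, p in enumerate(P):
            for frame in g.keys() & p.keys():
                if np.linalg.norm(np.subtract(g[frame], p[frame])) <= dist_thresh:
                    overlap[i, j] += 1
    rows, cols = linear_sum_assignment(-overlap)  # maximize overlap
    idtp = overlap[rows, cols].sum()
    idfn = sum(len(g) for g in G) - idtp   # gt frames left unmatched
    idfp = sum(len(p) for p in P) - idtp   # pred frames left unmatched
    return 2 * idtp / (2 * idtp + idfp + idfn)
```

Perfect tracking scores 1.0; a single identity swap halfway through two otherwise perfect tracks drops the score to 0.5, which is why IDF1 is sensitive to exactly the multi-human identity errors the benchmark targets.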