MetaSpace II: Object and full-body tracking for interaction and navigation in social VR
MetaSpace II (MS2) is a social Virtual Reality (VR) system where multiple
users can not only see and hear but also interact with each other, grasp and
manipulate objects, walk around in space, and get tactile feedback. MS2 allows
walking in physical space by tracking each user's skeleton in real time and
allows users to feel objects by employing passive haptics, i.e., when users touch or
manipulate an object in the virtual world, they simultaneously also touch or
manipulate a corresponding object in the physical world. To enable these
elements in VR, MS2 creates a correspondence in spatial layout and object
placement by building the virtual world on top of a 3D scan of the real world.
Through the association between the real and virtual world, users are able to
walk freely while wearing a head-mounted device, avoid obstacles like walls and
furniture, and interact with people and objects. Most current VR environments
are designed for a single-user experience in which interactions with virtual
objects are mediated by hand-held input devices or hand gestures. Additionally,
users are shown only a representation of their hands floating in front of the
camera, as seen from a first-person perspective. We believe that representing
each user as a full-body avatar controlled by the natural movements of the
person in the real world (see Figure 1d) can greatly enhance believability and
a user's sense of immersion in VR.
Comment: 10 pages, 9 figures. Video: http://living.media.mit.edu/projects/metaspace-ii
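The real-to-virtual correspondence described above amounts to registering the tracking system's coordinate frame with the frame of the 3D-scanned virtual scene. Below is a minimal, hypothetical sketch (not the authors' code) of applying a calibrated rigid transform so that a tracked physical point lands at the matching location in the virtual world; the matrix T_phys_to_virt and its values are assumptions for illustration only.

```python
import numpy as np

# Hypothetical calibration result: rigid transform (rotation + translation)
# taking tracker/physical coordinates into the 3D-scanned virtual scene.
T_phys_to_virt = np.array([
    [0.0, -1.0, 0.0, 1.20],   # assumed values for illustration only
    [1.0,  0.0, 0.0, 0.35],
    [0.0,  0.0, 1.0, 0.00],
    [0.0,  0.0, 0.0, 1.00],
])

def to_virtual(p_physical):
    """Map a 3D point tracked in the physical room into virtual-world coordinates."""
    p_h = np.append(np.asarray(p_physical, dtype=float), 1.0)  # homogeneous coordinates
    return (T_phys_to_virt @ p_h)[:3]

# A tracked hand position (metres, physical frame) lands on the matching virtual
# object because both worlds share the same spatial layout.
print(to_virtual([0.5, 0.2, 1.0]))
```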
Multi-party holomeetings: toward a new era of low-cost volumetric holographic meetings in virtual reality
Fueled by advances in multi-party communications, the growing maturity and adoption of immersive technologies, and the COVID-19 pandemic, a new wave of social virtual reality (VR) platforms has emerged to support socialization, interaction, and collaboration among multiple remote users who are integrated into shared virtual environments. Social VR aims to increase levels of (co-)presence and interaction quality by overcoming the limitations of 2D windowed representations in traditional multi-party video conferencing tools, although most existing solutions rely on 3D avatars to represent users. This article presents a social VR platform that supports real-time volumetric holographic representations of users, based on point clouds captured by off-the-shelf RGB-D sensors, and it analyzes the platform's potential for conducting interactive holomeetings (i.e., holoconferencing scenarios). This work evaluates the platform's performance and readiness for conducting meetings with up to four users, and it provides insights into aspects of the user experience when using single-camera, low-cost capture systems in scenarios with both frontal and side viewpoints. Overall, the obtained results confirm the platform's maturity and the potential of holographic communications for conducting interactive multi-party meetings, even when using low-cost, single-camera capture systems in scenarios where users are sitting or have limited translational movement along the X, Y, and Z axes within the 3D virtual environment (commonly known as 3 Degrees of Freedom plus, 3DoF+).
The authors would like to thank the members of the EU H2020 VR-Together consortium for their valuable contributions, especially Marc Martos and Mohamad Hjeij for their support in developing and evaluating tasks. This work has been partially funded by the EU's Horizon 2020 program, under agreement nº 762111 (VR-Together project); by ACCIÓ (Generalitat de Catalunya), under agreement COMRDI18-1-0008 (ViVIM project); and by Cisco Research and the Silicon Valley Community Foundation, under the grant Extended Reality Multipoint Control Unit (ID: 1779376). The work by Mario Montagud has been additionally funded by Spain's Agencia Estatal de Investigación under grant RYC2020-030679-I (AEI / 10.13039/501100011033) and by Fondo Social Europeo. The work of David Rincón was supported by Spain's Agencia Estatal de Investigación within the Ministerio de Ciencia e Innovación under Project PID2019-108713RB-C51 MCIN/AEI/10.13039/501100011033.
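As a rough illustration of the capture side described above, the sketch below uses the open-source Open3D library (not necessarily the platform's actual stack) to turn a single RGB-D frame from a commodity sensor into a colored point cloud that could then be streamed to remote participants; file names and intrinsics are placeholders.

```python
import open3d as o3d

# Placeholder inputs: one color and one depth image from an off-the-shelf RGB-D sensor.
color = o3d.io.read_image("frame_color.png")
depth = o3d.io.read_image("frame_depth.png")

# Combine them into an RGB-D image (depth in millimetres, truncated at 3 m here).
rgbd = o3d.geometry.RGBDImage.create_from_color_and_depth(
    color, depth, depth_scale=1000.0, depth_trunc=3.0, convert_rgb_to_intensity=False)

# Assumed pinhole intrinsics; a real system would use the sensor's own calibration.
intrinsic = o3d.camera.PinholeCameraIntrinsic(
    o3d.camera.PinholeCameraIntrinsicParameters.PrimeSenseDefault)

# Back-project to a colored point cloud -- the per-frame volumetric representation of the user.
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd, intrinsic)
pcd = pcd.voxel_down_sample(voxel_size=0.01)  # reduce bandwidth before transmission
o3d.visualization.draw_geometries([pcd])
```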
AFFECT-PRESERVING VISUAL PRIVACY PROTECTION
The prevalence of wireless networks and the convenience of mobile cameras enable many new video applications beyond security and entertainment. From behavioral diagnosis to wellness monitoring, cameras are increasingly used for observation in various educational and medical settings. Videos collected for such applications are considered protected health information under privacy laws in many countries. Visual privacy protection techniques, such as blurring or object removal, can be used to mitigate privacy concerns, but they also obliterate important visual cues of affect and social behavior that are crucial for the target applications. In this dissertation, we propose to balance privacy protection and data utility by preserving privacy-insensitive information, such as pose and expression, which is useful in many applications involving visual understanding.
The Intellectual Merits of the dissertation include a novel framework for visual privacy protection that manipulates the facial image and body shape of individuals and which: (1) conceals the identity of individuals; (2) preserves the utility of the data, such as expression and pose information; and (3) balances the utility of the data against the strength of the privacy protection.
The Broader Impacts of the dissertation focus on the significance of privacy protection for visual data and on the inadequacy of current privacy-enhancing technologies in preserving the affect and behavioral attributes of visual content, which are highly useful for behavior observation in educational and medical settings. The work in this dissertation represents one of the first attempts to achieve both goals simultaneously.
NSF: Neural Surface Fields for Human Modeling from Monocular Depth
Obtaining personalized 3D animatable avatars from a monocular camera has
several real-world applications in gaming, virtual try-on, animation, VR/XR,
and beyond. However, it is very challenging to model dynamic and fine-grained
clothing deformations from such sparse data. Existing methods for modeling 3D
humans from depth data have limitations in terms of computational efficiency,
mesh coherency, and flexibility in resolution and topology. For instance,
reconstructing shapes using implicit functions and extracting explicit meshes
per frame is computationally expensive and cannot ensure coherent meshes across
frames. Moreover, predicting per-vertex deformations on a pre-designed human
template with a discrete surface lacks flexibility in resolution and topology.
To overcome these limitations, we propose a novel method, `NSF: Neural
Surface Fields', for modeling 3D clothed humans from monocular depth. NSF
defines a neural field solely on the base surface, which models a continuous and
flexible displacement field. NSF can be adapted to base surfaces of different
resolution and topology at inference time, without retraining.
Compared to existing approaches, our method eliminates the expensive per-frame
surface extraction while maintaining mesh coherency, and is capable of
reconstructing meshes at arbitrary resolution without retraining. To foster
research in this direction, we release our code on the project page:
https://yuxuan-xue.com/nsf.
Comment: Accepted to ICCV 2023; Homepage: https://yuxuan-xue.com/nsf
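To make the core idea concrete, here is a highly simplified, hypothetical PyTorch sketch (our own illustration under stated assumptions, not the released NSF code): a small MLP defines a continuous displacement field over points on a base surface, so it can be queried at the vertices of a base mesh of any resolution or topology without retraining.

```python
import torch
import torch.nn as nn

class SurfaceDisplacementField(nn.Module):
    """Toy stand-in for a neural surface field: maps a point on the base
    surface (plus a pose/conditioning code) to a 3D displacement."""
    def __init__(self, pose_dim=16, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, surface_points, pose_code):
        # surface_points: (N, 3) points on the base surface
        # pose_code:      (pose_dim,) conditioning vector, broadcast to all points
        cond = pose_code.expand(surface_points.shape[0], -1)
        return self.mlp(torch.cat([surface_points, cond], dim=-1))

field = SurfaceDisplacementField()
pose_code = torch.zeros(16)

# The same field can be evaluated on base meshes of different resolution or
# topology: only the query points change, no retraining is needed.
coarse_vertices = torch.rand(1_000, 3)   # placeholder base mesh vertices
dense_vertices = torch.rand(50_000, 3)
deformed_coarse = coarse_vertices + field(coarse_vertices, pose_code)
deformed_dense = dense_vertices + field(dense_vertices, pose_code)
print(deformed_coarse.shape, deformed_dense.shape)
```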
COMPARATIVE EVALUATION OF KEYPOINT DETECTORS FOR 3D DIGITAL AVATAR RECONSTRUCTION
Three-dimensional personalized human avatars have been successfully utilized in shopping, entertainment, education, and health applications. However, it is still challenging to automatically obtain an avatar that is both complete and highly detailed. One approach is to use general-purpose, photogrammetry-based algorithms on a series of overlapping images of the person. We argue that the quality of avatar reconstruction can be increased by modifying parts of the photogrammetry-based algorithm pipeline to be more specifically tailored to the human body shape. In this context, we perform an extensive, standalone evaluation of eleven algorithms for keypoint detection, which is the first phase of the photogrammetry-based reconstruction pipeline. We include the well-established, patented Distinctive Image Features from Scale-Invariant Keypoints (SIFT) and Speeded-Up Robust Features (SURF) detection algorithms as a baseline, since they are widely incorporated into photogrammetry-based software. All experiments are conducted on a dataset of 378 images of the human body captured in a controlled, multi-view stereo setup. Our findings are that binary detectors substantially outperform the commonly used SIFT-like detectors in the avatar reconstruction task, both in detection speed and in the number of detected keypoints.
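For readers who want to reproduce a scaled-down version of such a comparison, the following sketch (our own illustration, not the paper's evaluation code) times several OpenCV keypoint detectors, including SIFT as a baseline and binary detectors such as ORB, BRISK, and AKAZE, on a single image and reports the keypoint counts; the image path is a placeholder. SURF is omitted here because it requires the non-default opencv-contrib build.

```python
import time
import cv2

image = cv2.imread("body_view_001.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder image

detectors = {
    "SIFT":  cv2.SIFT_create(),               # float-descriptor baseline
    "ORB":   cv2.ORB_create(nfeatures=5000),  # binary detectors below
    "BRISK": cv2.BRISK_create(),
    "AKAZE": cv2.AKAZE_create(),
}

for name, det in detectors.items():
    start = time.perf_counter()
    keypoints = det.detect(image, None)
    elapsed = time.perf_counter() - start
    print(f"{name:5s}: {len(keypoints):6d} keypoints in {elapsed * 1000:.1f} ms")
```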
Real-time marker-less multi-person 3D pose estimation in RGB-Depth camera networks
This paper proposes a novel system to estimate and track the 3D poses of
multiple persons in calibrated RGB-Depth camera networks. The multi-view 3D
pose of each person is computed by a central node which receives the
single-view outcomes from each camera of the network. Each single-view outcome
is computed by using a CNN for 2D pose estimation and extending the resulting
skeletons to 3D by means of the sensor depth. The proposed system is
marker-less and multi-person, is independent of the background, and makes no
assumptions about people's appearance or initial pose. The system provides
real-time outcomes, making it well suited for applications requiring user
interaction. Experimental results show the effectiveness of this work with
respect to a baseline multi-view approach in different scenarios. To foster
research and applications based on this work, we released the source code in
OpenPTrack, an open-source project for RGB-D people tracking.
Comment: Submitted to the 2018 IEEE International Conference on Robotics and Automation
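The per-camera step of extending 2D skeletons to 3D using the sensor depth is essentially a pinhole back-projection. A minimal sketch of that step is shown below (our own illustration with assumed intrinsics, not the OpenPTrack implementation).

```python
import numpy as np

# Assumed pinhole intrinsics of the depth-registered camera (pixels).
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5

def lift_keypoints(keypoints_2d, depth_image, depth_scale=0.001):
    """Back-project 2D joint detections (u, v) to 3D camera coordinates
    using the depth value sampled at each joint."""
    joints_3d = []
    for u, v in keypoints_2d:
        z = depth_image[int(round(v)), int(round(u))] * depth_scale  # metres
        if z <= 0:  # missing depth: mark the joint as invalid
            joints_3d.append((np.nan, np.nan, np.nan))
            continue
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        joints_3d.append((x, y, z))
    return np.array(joints_3d)

# Toy example: a flat depth map at 2 m and two detected joints.
depth = np.full((480, 640), 2000, dtype=np.uint16)
print(lift_keypoints([(320.0, 240.0), (400.0, 200.0)], depth))
```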
Motion capture based on RGBD data from multiple sensors for avatar animation
With recent advances in technology and the emergence of affordable RGB-D sensors for a
wider range of users, markerless motion capture has become an active field of research
both in computer vision and computer graphics.
In this thesis, we designed a POC (proof of concept) for a new tool that enables us
to perform motion capture using a variable number of commodity RGB-D sensors of
different brands and technical specifications in environments with unconstrained
layouts. The main goal of this work is to provide a tool with motion capture
capabilities using a handful of RGB-D sensors, without imposing strong requirements
in terms of lighting, background, or extent of the motion capture area. Of course,
the number of RGB-D sensors needed is inversely proportional to their resolution and
directly proportional to the size of the area to track.
Built on top of the OpenNI 2 library, we made this POC compatible with most of the
non-high-end RGB-D sensors currently available on the market. Because a single
computer lacks the resources to support more than a couple of sensors working
simultaneously, we use a setup composed of multiple computers. To keep data coherent
and synchronized across sensors and computers, our tool makes use of a semi-automatic
calibration method and a message-oriented network protocol.
From the color and depth data given by a sensor, we can also obtain a 3D point cloud
representation of the environment. By combining point clouds from multiple sensors, we
can collect a complete and animated 3D point cloud that can be visualized from any
viewpoint. Given a 3D avatar model and its corresponding attached skeleton, we can use
an iterative optimization method (e.g., the Simplex method) to find a fit between each
point cloud frame and a skeleton configuration, resulting in a 3D avatar animation when
such skeleton configurations are used as keyframes.
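The fitting step described above can be prototyped with an off-the-shelf derivative-free optimizer. The sketch below (a simplified illustration under our own assumptions, not the thesis code) uses SciPy's Nelder-Mead (simplex) method to fit a trivially parameterized "skeleton" to one point-cloud frame by minimizing joint-to-cloud nearest-neighbor distances.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import cKDTree

# Toy skeleton: a root position plus one joint angle producing 3 joints.
def skeleton_joints(params):
    root = params[:3]
    angle = params[3]
    shoulder = root + np.array([0.0, 0.3, 0.0])
    hand = shoulder + 0.3 * np.array([np.cos(angle), np.sin(angle), 0.0])
    return np.vstack([root, shoulder, hand])

def fit_frame(pointcloud, initial_params):
    """Fit skeleton parameters to one point-cloud frame via Nelder-Mead (simplex)."""
    tree = cKDTree(pointcloud)

    def cost(params):
        dists, _ = tree.query(skeleton_joints(params))  # joint-to-cloud distances
        return float(np.sum(dists ** 2))

    result = minimize(cost, initial_params, method="Nelder-Mead")
    return result.x

# Synthetic frame: noisy points around a known pose, recovered from a rough guess.
true_pose = np.array([0.0, 1.0, 2.0, 0.8])
cloud = skeleton_joints(true_pose).repeat(50, axis=0) + 0.01 * np.random.randn(150, 3)
print(fit_frame(cloud, initial_params=np.array([0.1, 0.9, 2.1, 0.0])))
```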
HIGH QUALITY HUMAN 3D BODY MODELING, TRACKING AND APPLICATION
Geometric reconstruction of dynamic objects is a fundamental task in computer vision and graphics, and high-fidelity modeling of the human body is considered a core part of this problem. Traditional human shape and motion capture techniques require an array of surrounding cameras or require subjects to wear reflective markers, resulting in limited working space and portability. In this dissertation, a complete pipeline is designed, from geometric modeling of a detailed 3D full human body and capture of shape dynamics over time using a flexible setup, to guiding clothes/person re-targeting with such data-driven models. The mechanical movement of the human body can be considered an articulated motion, which makes it easy to drive skin animation but difficult, in the reverse process, to recover parameters from images without manual intervention. We therefore present a novel parametric model, GMM-BlendSCAPE, which jointly takes a linear skinning model and the prior art of BlendSCAPE (Blend Shape Completion and Animation for PEople) into consideration, and we develop a Gaussian Mixture Model (GMM) to infer both body shape and pose from incomplete observations. We show the increased accuracy of joint and skin-surface estimation using our model compared to skeleton-based motion tracking. To model the detailed body, we start by capturing high-quality partial 3D scans with a single-view commercial depth camera. Based on GMM-BlendSCAPE, we can then reconstruct multiple complete static models across large pose differences via our novel non-rigid registration algorithm. With vertex correspondences established, these models can be further converted into a personalized drivable template and used for robust pose tracking in a similar GMM framework. Moreover, we design a general-purpose, real-time non-rigid deformation algorithm to accelerate this registration. Last but not least, we demonstrate a novel virtual clothes try-on application based on our personalized model, utilizing both image and depth cues to synthesize and re-target clothes for single-view videos of different people.
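The GMM-based fitting idea can be illustrated with its core building block: treating template vertices as Gaussian centroids and computing soft correspondences (responsibilities) for observed scan points, which a registration loop would then use to update shape and pose. The sketch below is our own simplified illustration of that E-step under an isotropic-noise assumption, not the dissertation's GMM-BlendSCAPE implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gmm_responsibilities(scan_points, template_vertices, sigma=0.02, outlier_weight=0.1):
    """E-step of a GMM-style registration: probability that each observed scan
    point was generated by each template vertex (isotropic Gaussians), with a
    constant term absorbing outliers and missing data."""
    sq_dist = cdist(scan_points, template_vertices, metric="sqeuclidean")  # (N, M)
    p = np.exp(-sq_dist / (2.0 * sigma ** 2))
    denom = p.sum(axis=1, keepdims=True) + outlier_weight
    return p / denom                                                       # (N, M)

# Toy example: a noisy, incomplete scan of a 4-vertex "template".
template = np.random.rand(4, 3)
scan = template[[0, 1, 1, 2, 3]] + 0.01 * np.random.randn(5, 3)
resp = gmm_responsibilities(scan, template)
print(resp.shape, resp.argmax(axis=1))  # soft correspondences; argmax gives hard ones
```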
MM-Fi: Multi-Modal Non-Intrusive 4D Human Dataset for Versatile Wireless Sensing
4D human perception plays an essential role in a myriad of applications, such
as home automation and metaverse avatar simulation. However, existing solutions,
which mainly rely on cameras and wearable devices, are either privacy-intrusive
or inconvenient to use. To address these issues, wireless sensing has emerged
as a promising alternative, leveraging LiDAR, mmWave radar, and WiFi signals
for device-free human sensing. In this paper, we propose MM-Fi, the first
multi-modal non-intrusive 4D human dataset with 27 daily or rehabilitation
action categories, to bridge the gap between wireless sensing and high-level
human perception tasks. MM-Fi consists of over 320k synchronized frames of five
modalities from 40 human subjects. Various annotations are provided to support
potential sensing tasks, e.g., human pose estimation and action recognition.
Extensive experiments have been conducted to compare the sensing capability of
individual modalities and of modality combinations across multiple tasks. We
envision that MM-Fi can contribute to wireless sensing research with respect to
action recognition, human pose estimation, multi-modal learning, cross-modal
supervision, and interdisciplinary healthcare research.
Comment: Accepted by NeurIPS 2023 Datasets and Benchmarks Track. Project page: https://ntu-aiot-lab.github.io/mm-f
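As a sanity check of what working with such a dataset might look like, the sketch below defines a hypothetical PyTorch Dataset that pairs synchronized per-frame files from several modalities with pose annotations; all directory names, file formats, and field names are our own assumptions, not the actual MM-Fi layout.

```python
from pathlib import Path
import numpy as np
from torch.utils.data import Dataset

class MultiModalFrames(Dataset):
    """Hypothetical loader pairing synchronized per-frame modality files
    (e.g. mmWave/LiDAR point clouds, WiFi CSI) with 3D pose annotations."""
    def __init__(self, root, modalities=("mmwave", "lidar", "wifi_csi")):
        self.root = Path(root)
        self.modalities = modalities
        # Assumed layout: <root>/pose/<frame_id>.npy holds the per-frame keypoints.
        self.frame_ids = sorted(p.stem for p in (self.root / "pose").glob("*.npy"))

    def __len__(self):
        return len(self.frame_ids)

    def __getitem__(self, idx):
        frame_id = self.frame_ids[idx]
        sample = {m: np.load(self.root / m / f"{frame_id}.npy") for m in self.modalities}
        sample["pose_3d"] = np.load(self.root / "pose" / f"{frame_id}.npy")
        return sample

# Usage (assuming the hypothetical layout exists on disk):
# dataset = MultiModalFrames("MM-Fi/subject01/action03")
# print(len(dataset), dataset[0]["pose_3d"].shape)
```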