Learning to Dress 3D People in Generative Clothing
Three-dimensional human body models are widely used in the analysis of human
pose and motion. Existing models, however, are learned from minimally-clothed
3D scans and thus do not generalize to the complexity of dressed people in
common images and videos. Additionally, current models lack the expressive
power needed to represent the complex non-linear geometry of pose-dependent
clothing shapes. To address this, we learn a generative 3D mesh model of
clothed people from 3D scans with varying pose and clothing. Specifically, we
train a conditional Mesh-VAE-GAN to learn the clothing deformation from the
SMPL body model, making clothing an additional term in SMPL. Our model is
conditioned on both pose and clothing type, giving the ability to draw samples
of clothing to dress different body shapes in a variety of styles and poses. To
preserve wrinkle detail, our Mesh-VAE-GAN extends patchwise discriminators to
3D meshes. Our model, named CAPE, represents global shape and fine local
structure, effectively extending the SMPL body model to clothing. To our
knowledge, this is the first generative model that directly dresses 3D human
body meshes and generalizes to different poses. The model, code and data are
available for research purposes at https://cape.is.tue.mpg.de.
Comment: CVPR 2020 camera ready.
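The idea of making clothing "an additional term in SMPL" can be illustrated with a minimal numeric sketch. Everything below is a hypothetical stand-in (the placeholder `smpl_body` and `clothing_displacement` functions, the latent size, and the conditioning scale are all invented, not CAPE's actual decoder); it only shows the additive structure: clothed vertices are body vertices plus a pose- and clothing-conditioned displacement decoded from a sampled latent code.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VERTS = 6890      # SMPL mesh resolution
LATENT_DIM = 8      # hypothetical clothing latent size

def smpl_body(pose, shape):
    """Stand-in for the SMPL body model: returns (N_VERTS, 3) vertices."""
    return np.outer(np.linspace(0.0, 1.7, N_VERTS), [0.0, 1.0, 0.0])

def clothing_displacement(z, pose, clothing_type):
    """Stand-in for the conditional decoder: maps a latent sample z, the
    pose, and a clothing-type label to per-vertex offsets (invented math)."""
    scale = 0.01 * (1 + clothing_type)          # hypothetical conditioning
    return scale * np.tanh(z[:3])[None, :] * np.ones((N_VERTS, 3))

# "clothing as an additional term in SMPL": clothed = body + displacement
pose, shape = np.zeros(72), np.zeros(10)
z = rng.standard_normal(LATENT_DIM)             # draw one clothing sample
body = smpl_body(pose, shape)
clothed = body + clothing_displacement(z, pose, clothing_type=1)

assert clothed.shape == body.shape              # same topology, offset verts
```

Because the displacement is a separate additive term, drawing different `z` samples dresses the same body in different clothing while the underlying SMPL parameters stay untouched.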
Multi-person Implicit Reconstruction from a Single Image
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image. Existing multi-person methods suffer from two main drawbacks: they are often model-based and therefore cannot capture accurate 3D models of people with loose clothing and hair; or they require manual intervention to resolve occlusions or interactions. Our method addresses both limitations by introducing the first end-to-end learning approach to perform model-free implicit reconstruction for realistic 3D capture of multiple clothed people in arbitrary poses (with occlusions) from a single image. Our network simultaneously estimates the 3D geometry of each person and their 6DOF spatial locations to obtain a coherent multi-human reconstruction. In addition, we introduce a new synthetic dataset that depicts images with a varying number of inter-occluded humans and a variety of clothing and hair styles. We demonstrate robust, high-resolution reconstructions on images of multiple humans with complex occlusions, loose clothing and a large variety of poses and scenes. Our quantitative evaluation on both synthetic and real-world datasets demonstrates state-of-the-art performance, with significant improvements in the accuracy and completeness of the reconstructions over competing approaches.
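The combination of per-person implicit geometry with 6DOF spatial placement can be sketched in a few lines. The occupancy function and rigid poses below are toy stand-ins (a sphere per person instead of a learned network, identity rotations, invented translations), but they show how a coherent multi-person scene is composed as the union of implicit shapes, each queried in its own local frame.

```python
import numpy as np

def sphere_occupancy(x, radius=0.5):
    """Toy per-person implicit function: inside-sphere occupancy in the
    person's canonical frame (real methods learn this from data)."""
    return (np.linalg.norm(x, axis=-1) < radius).astype(float)

def make_pose(R, t):
    """6DOF placement: maps world points into the person's local frame
    (for orthonormal R, right-multiplication applies R^T row-wise)."""
    return lambda x: (x - t) @ R

def scene_occupancy(x, people):
    """Coherent multi-human scene: the union (pointwise max) of the
    per-person occupancies, each evaluated in its own local frame."""
    return np.max([occ(to_local(x)) for occ, to_local in people], axis=0)

I3 = np.eye(3)
people = [(sphere_occupancy, make_pose(I3, np.array([0.0, 0.0, 0.0]))),
          (sphere_occupancy, make_pose(I3, np.array([2.0, 0.0, 0.0])))]

queries = np.array([[0.0, 0.0, 0.0],   # inside person 1
                    [2.0, 0.0, 0.0],   # inside person 2
                    [1.0, 0.0, 0.0]])  # empty space between them
occ = scene_occupancy(queries, people)
```

Estimating the translations (and rotations) jointly with the geometry is what makes the composed reconstruction spatially coherent rather than a pile of per-person crops.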
SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation
Human-centric video frame interpolation has great potential for improving
people's entertainment experiences and finding commercial applications in the
sports analysis industry, e.g., synthesizing slow-motion videos. Although there
are multiple benchmark datasets available in the community, none is dedicated
to human-centric scenarios. To bridge this gap, we introduce
SportsSloMo, a benchmark consisting of more than 130K video clips and 1M video
frames of high-resolution (720p) slow-motion sports videos crawled from
YouTube. We re-train several state-of-the-art methods on our benchmark, and the
results show a decrease in their accuracy compared to other datasets. This
highlights the difficulty of our benchmark: it poses significant challenges
even for the best-performing methods, as human bodies
are highly deformable and occlusions are frequent in sports videos. To improve
the accuracy, we introduce two loss terms that incorporate human-aware priors,
adding auxiliary supervision for panoptic segmentation and human keypoint
detection, respectively. The loss terms are model-agnostic and can be easily
plugged into any video frame interpolation approach. Experimental results
validate the effectiveness of the proposed loss terms, leading to consistent
performance improvements over 5 existing models and establishing strong
baselines on our benchmark. The dataset and code can be found at:
https://neu-vi.github.io/SportsSlomo/.
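The model-agnostic, human-aware loss terms can be sketched as follows. This is a hypothetical illustration, not the paper's actual objective: the `l1` distance, the weights `w_seg`/`w_kp`, and the tensor shapes are all assumptions. It shows only the structure that makes the terms pluggable: a base frame-reconstruction loss plus weighted auxiliary terms on segmentation and keypoint outputs.

```python
import numpy as np

def l1(a, b):
    """Mean absolute error between two arrays."""
    return float(np.mean(np.abs(a - b)))

def human_aware_loss(pred_frame, gt_frame,
                     pred_seg, gt_seg, pred_kp, gt_kp,
                     w_seg=0.1, w_kp=0.1):
    """Sketch of a human-aware objective: frame reconstruction plus
    auxiliary supervision for segmentation and keypoints. The weights
    are invented; the auxiliary terms attach to any model's outputs."""
    return (l1(pred_frame, gt_frame)
            + w_seg * l1(pred_seg, gt_seg)
            + w_kp * l1(pred_kp, gt_kp))

# toy tensors standing in for network outputs and labels
rng = np.random.default_rng(0)
f_pred, f_gt = rng.random((8, 8, 3)), rng.random((8, 8, 3))
s_pred, s_gt = rng.random((8, 8)), rng.random((8, 8))
k_pred, k_gt = rng.random((17, 2)), rng.random((17, 2))

loss = human_aware_loss(f_pred, f_gt, s_pred, s_gt, k_pred, k_gt)
```

Because the auxiliary terms touch only the model's outputs, not its internals, the same two lines can be added to the training loop of any interpolation network.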
SHARE: Single-view Human Adversarial REconstruction
The accuracy of 3D Human Pose and Shape reconstruction (HPS) from an image is
progressively improving. Yet, no known method is robust across all image
distortions. To address issues caused by variations in camera pose, we introduce
SHARE, a novel fine-tuning method that utilizes adversarial data augmentation
to enhance the robustness of existing HPS techniques. We perform a
comprehensive analysis of the impact of camera poses on HPS reconstruction
outcomes. We first generate large-scale image datasets captured systematically
from diverse camera perspectives. We then establish a mapping between camera
poses and reconstruction errors as a continuous function that characterizes the
relationship between camera pose and HPS quality. Leveraging this
representation, we introduce RoME (Regions of Maximal Error), a novel sampling
technique for our adversarial fine-tuning method.
The SHARE framework is generalizable across various single-view HPS methods,
and we demonstrate its performance on HMR, SPIN, PARE, CLIFF and ExPose. Our
results show a reduction in mean joint error across single-view HPS
techniques for images captured from multiple camera positions, without
compromising baseline performance. In many challenging cases, our method
surpasses existing models, highlighting its practical significance for diverse
real-world applications.
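The idea of drawing fine-tuning cameras from regions of maximal error can be sketched as below. The `error_surface` is a hypothetical stand-in for the fitted pose-to-error function, and the grid plus error-proportional sampling are assumptions for illustration, not the paper's exact RoME procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def error_surface(elev_deg, azim_deg):
    """Hypothetical continuous pose-to-error map; the real one is fit
    to measured HPS errors over systematically varied cameras."""
    return 0.1 + np.abs(np.sin(np.radians(elev_deg)))

# evaluate the surface on a grid of candidate camera poses
elev = np.linspace(-80, 80, 17)
azim = np.linspace(0, 350, 36)
E, A = np.meshgrid(elev, azim, indexing="ij")
err = error_surface(E, A)

# RoME-style sampling (sketch): draw fine-tuning poses with probability
# proportional to predicted error, so high-error regions dominate
p = (err / err.sum()).ravel()
idx = rng.choice(p.size, size=100, p=p)
sampled_elev = E.ravel()[idx]

# steep-|elevation| poses, where this toy surface is worst, dominate
assert np.mean(np.abs(sampled_elev)) > np.mean(np.abs(elev))
```

Concentrating fine-tuning data where the error function peaks is what makes the augmentation adversarial: the model is retrained precisely on the camera poses it currently handles worst.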
3D Segmentation of Humans in Point Clouds with Synthetic Data
Segmenting humans in 3D indoor scenes has become increasingly important with
the rise of human-centered robotics and AR/VR applications. To this end, we
propose the task of joint 3D human semantic segmentation, instance segmentation
and multi-human body-part segmentation. Few works have attempted to directly
segment humans in cluttered 3D scenes, which is largely due to the lack of
annotated training data of humans interacting with 3D scenes. We address this
challenge and propose a framework for generating training data of synthetic
humans interacting with real 3D scenes. Furthermore, we propose a novel
transformer-based model, Human3D, which is the first end-to-end model for
segmenting multiple human instances and their body-parts in a unified manner.
The key advantage of our synthetic data generation framework is its ability to
generate diverse and realistic human-scene interactions, with highly accurate
ground truth. Our experiments show that pre-training on synthetic data improves
performance on a wide variety of 3D human segmentation tasks. Finally, we
demonstrate that Human3D outperforms even task-specific state-of-the-art 3D
segmentation methods.
Comment: project page: https://human-3d.github.io
Towards Geometric Understanding of Motion
The motion of the world is inherently dependent on the spatial structure of the world and its geometry. Therefore, classical optical flow methods try to model this geometry to solve for the motion. However, recent deep learning methods take a completely different approach. They try to predict optical flow by learning from labelled data. Although deep networks have shown state-of-the-art performance on classification problems in computer vision, they have not been as effective in solving optical flow. The key reason is that deep learning methods do not explicitly model the structure of the world in a neural network, and instead expect the network to learn about the structure from data. We hypothesize that it is difficult for a network to learn about motion without any constraint on the structure of the world. Therefore, we explore several approaches to explicitly model the geometry of the world and its spatial structure in deep neural networks.
The spatial structure in images can be captured by representing it at multiple scales. To represent multiple scales of images in deep neural nets, we introduce a Spatial Pyramid Network (SPyNet). Such a network can leverage global information for estimating large motions and local information for estimating small motions. We show that SPyNet significantly improves over previous optical flow networks while also being the smallest and fastest neural network for motion estimation. SPyNet achieves a 97% reduction in model parameters over previous methods and is more accurate.
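The coarse-to-fine pyramid structure described above can be sketched as follows. Here `residual_flow` is a zero-returning stand-in for the learned per-level network, so only the pyramid plumbing is shown (downsampling the images, upsampling and rescaling the flow, refining with a residual at each level), not the learning itself; all shapes and the level count are illustrative.

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 average pooling."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample_flow(flow):
    """Double resolution and scale flow vectors to match the finer grid."""
    return 2.0 * np.repeat(np.repeat(flow, 2, axis=0), 2, axis=1)

def residual_flow(img1, img2, init_flow):
    """Stand-in for the per-level network, which predicts a flow
    *residual* from the images and the upsampled coarse flow.
    Returns zeros here, so only the structure is demonstrated."""
    return np.zeros_like(init_flow)

def pyramid_flow(img1, img2, levels=3):
    """Coarse-to-fine estimation over a spatial pyramid: global motion
    is resolved at coarse scales, local detail refined at fine scales."""
    pyr1, pyr2 = [img1], [img2]
    for _ in range(levels - 1):
        pyr1.append(downsample(pyr1[-1]))
        pyr2.append(downsample(pyr2[-1]))
    flow = np.zeros(pyr1[-1].shape + (2,))      # coarsest level: zero init
    for i1, i2 in zip(reversed(pyr1), reversed(pyr2)):
        if flow.shape[:2] != i1.shape:
            flow = upsample_flow(flow)
        flow = flow + residual_flow(i1, i2, flow)
    return flow

flow = pyramid_flow(np.zeros((16, 16)), np.zeros((16, 16)))
```

Predicting only a small residual per level is what keeps each per-level network tiny, which is where the large parameter reduction comes from.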
The spatial structure of the world extends to people and their motion. Humans have a very well-defined structure, and this information is useful in estimating optical flow for humans. To leverage this information, we create a synthetic dataset for human optical flow using a statistical human body model and motion capture sequences. We use this dataset to train deep networks and see significant improvement in the ability of the networks to estimate human optical flow.
The structure and geometry of the world affects the motion. Therefore, learning about the structure of the scene together with the motion can benefit both problems. To facilitate this, we introduce Competitive Collaboration, where several neural networks are constrained by geometry and can jointly learn about structure and motion in the scene without any labels. To this end, we show that jointly learning single view depth prediction, camera motion, optical flow and motion segmentation using Competitive Collaboration achieves state-of-the-art results among unsupervised approaches.
Our findings support our hypothesis that explicit constraints on the structure and geometry of the world lead to better methods for motion estimation.
Modeling Humans at Rest with Applications to Robot Assistance
Humans spend a large part of their lives resting. Machine perception of this class of body poses would be beneficial to numerous applications, but it is complicated by line-of-sight occlusion from bedding. Pressure sensing mats are a promising alternative, but data is challenging to collect at scale. To overcome this, we use modern physics engines to simulate bodies resting on a soft bed with a pressure sensing mat. This method can efficiently generate data at scale for training deep neural networks. We present a deep model trained on this data that infers 3D human pose and body shape from a pressure image, and show that it transfers well to real-world data. We also present a model that infers pose, shape and contact pressure from a depth image facing the person in bed, and it does so in the presence of blankets. This model similarly benefits from synthetic data, which is created by simulating blankets on the bodies in bed. We evaluate this model on real-world data and compare it to an existing method that requires RGB, depth, thermal and pressure imagery in the input. Our model only requires an input depth image, yet it is 12% more accurate. Our methods are relevant to applications in healthcare, including patient acuity monitoring and pressure injury prevention. We demonstrate this work in the context of robotic caregiving assistance, by using it to control a robot to move to locations on a person's body in bed.
Ph.D. thesis.
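The simulation-to-pressure-image idea can be caricatured in a few lines. The rasterizer below is a toy stand-in for the physics engine: the mat dimensions, grid resolution, contact points, and loads are all invented. It shows only how body contact points and their loads become a pressure image of the kind used for training.

```python
import numpy as np

def pressure_image(contact_xy, loads, grid=(64, 27), size=(1.9, 0.8)):
    """Toy stand-in for physics simulation: rasterize body contact
    points (in meters) and per-point load onto a pressure-mat grid.
    Grid resolution and mat size are invented, not a real sensor's."""
    img = np.zeros(grid)
    rows = np.clip((contact_xy[:, 0] / size[0] * grid[0]).astype(int),
                   0, grid[0] - 1)
    cols = np.clip((contact_xy[:, 1] / size[1] * grid[1]).astype(int),
                   0, grid[1] - 1)
    np.add.at(img, (rows, cols), loads)   # accumulate overlapping loads
    return img

# a supine "body": a few contact points (head, shoulders, hips, heels)
pts = np.array([[0.15, 0.4], [0.45, 0.25], [0.45, 0.55],
                [1.0, 0.4], [1.75, 0.3], [1.75, 0.5]])
loads = np.array([8.0, 12.0, 12.0, 30.0, 5.0, 5.0])  # kg, made up

img = pressure_image(pts, loads)
assert img.sum() == loads.sum()           # all load lands on the mat
```

A real pipeline simulates soft-body contact rather than point loads, but the output has the same form: a low-resolution image whose per-cell values a pose-regression network can consume.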
Autonome Autos
Traffic is culture. It determines what is located where and along which routes, who encounters whom and who does not: it forms the basis of the networks that people and things form with one another. With the automation of traffic, the establishment of driver-assistance systems, and the development of self-driving cars, it is not only the relations between human and non-human road users that are called into question. Not only the ethical and legal foundations of road traffic, but also the basic conditions of everyday interaction must be renegotiated. The contributors to this volume analyze, from a media and cultural studies perspective, the transformations of mobility, the mobility transition, and the car as a multiply charged object.