Multi-View Face Recognition From Single RGBD Models of the Faces
This work takes important steps towards solving the following problem of current interest: Assuming that each individual in a population can be modeled by a single frontal RGBD face image, is it possible to carry out face recognition for such a population using multiple 2D images captured from arbitrary viewpoints? Although the general problem as stated above is extremely challenging, it encompasses subproblems that can be addressed today. The subproblems addressed in this work relate to: (1) generating a large set of viewpoint-dependent face images from a single RGBD frontal image for each individual; (2) using hierarchical approaches based on view-partitioned subspaces to represent the training data; and (3) based on these hierarchical approaches, using a weighted voting algorithm to integrate the evidence collected from multiple images of the same face as recorded from different viewpoints. We evaluate our methods on three datasets: a dataset of 10 people that we created and two publicly available datasets which include a total of 48 people. In addition to providing important insights into the nature of this problem, our results show that we are able to successfully recognize faces with accuracies of 95% or higher, outperforming existing state-of-the-art face recognition approaches based on deep convolutional neural networks.
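The weighted voting step in (3) can be illustrated with a minimal sketch. It assumes each probe image yields one predicted identity and one confidence weight. The function name `weighted_vote`, the sample identities, and the weights are all hypothetical, and the abstract does not specify how the authors compute their weights.

```python
from collections import defaultdict


def weighted_vote(predictions):
    """Combine per-image predictions into a single identity.

    predictions: list of (identity, weight) pairs, one per probe image.
    Returns the identity whose weights sum to the largest total.
    """
    totals = defaultdict(float)
    for identity, weight in predictions:
        totals[identity] += weight
    # On a tie, max() returns the identity that was seen first.
    return max(totals, key=totals.get)


# Hypothetical evidence from four views of the same face.
views = [("alice", 0.9), ("bob", 0.6), ("alice", 0.4), ("bob", 0.5)]
result = weighted_vote(views)
print(result)
```

Summing weights rather than counting votes lets a confident view outweigh several uncertain ones, which is one plausible reason to prefer weighted over majority voting here.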
Alleviating Human-level Shift : A Robust Domain Adaptation Method for Multi-person Pose Estimation
Human pose estimation has been widely studied with much focus on supervised
learning requiring sufficient annotations. However, in real applications, a
pretrained pose estimation model usually needs to be adapted to a novel domain with
no labels or sparse labels. Such domain adaptation for 2D pose estimation
has not been explored. The main reason is that a pose, by nature, has a typical
topological structure and needs fine-grained features at local keypoints, whereas
existing adaptation methods do not consider the topological structure of the
object of interest and align whole images only coarsely. Therefore, we
propose a novel domain adaptation method for multi-person pose estimation to
conduct the human-level topological structure alignment and fine-grained
feature alignment. Our method consists of three modules: Cross-Attentive
Feature Alignment (CAFA), Intra-domain Structure Adaptation (ISA) and
Inter-domain Human-Topology Alignment (IHTA) module. The CAFA adopts a
bidirectional spatial attention module (BSAM) that focuses on fine-grained local
feature correlation between two humans to adaptively aggregate consistent
features for adaptation. We adopt ISA only in semi-supervised domain adaptation
(SSDA) to exploit the corresponding keypoint semantic relationship for reducing
the intra-domain bias. Most importantly, we propose IHTA to learn a more
domain-invariant human topological representation for reducing the inter-domain
discrepancy. We model the human topological structure with a graph convolutional
network (GCN); passing messages over this graph allows high-order relations to be
considered. This structure-preserving, GCN-based alignment benefits
inference on occluded or extreme poses. Extensive experiments are conducted on
two popular benchmarks, and the results demonstrate the competency of our method
compared with existing supervised approaches.
Comment: Accepted By ACM MM'202
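To illustrate how message passing on a skeleton graph brings in high-order relations, here is a minimal sketch of mean-aggregation propagation over a toy three-joint chain. It is not the paper's IHTA module: a real GCN layer would also apply learned weights and a nonlinearity, and the function name `propagate`, the joint layout, and the feature values are assumptions.

```python
def propagate(features, edges):
    """One round of mean-aggregation message passing with self-loops.

    features: list of per-joint feature vectors (lists of floats).
    edges: list of undirected (joint_a, joint_b) index pairs.
    """
    n = len(features)
    neighbours = [{i} for i in range(n)]  # each joint also keeps its own feature
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    dim = len(features[0])
    return [
        [sum(features[j][d] for j in neighbours[i]) / len(neighbours[i])
         for d in range(dim)]
        for i in range(n)
    ]


# Hypothetical chain: shoulder (0) - elbow (1) - wrist (2).
edges = [(0, 1), (1, 2)]
features = [[1.0], [0.0], [0.0]]  # only the shoulder carries a signal

one_hop = propagate(features, edges)
two_hop = propagate(one_hop, edges)
print(one_hop, two_hop)
```

After one round the wrist is still unaffected by the shoulder. After two rounds it receives part of the shoulder signal through the elbow, which is the sense in which stacked propagation captures relations beyond direct neighbours.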
H3WB: Human3.6M 3D WholeBody Dataset and Benchmark
We present a benchmark for 3D human whole-body pose estimation, which
involves identifying accurate 3D keypoints on the entire human body, including
face, hands, body, and feet. Currently, because no fully annotated and
accurate 3D whole-body dataset exists, deep networks are either trained
separately on specific body parts and combined during inference, or they
rely on pseudo-ground truth provided by parametric body models, which are not as
accurate as detection-based methods. To overcome these issues, we introduce the
Human3.6M 3D WholeBody (H3WB) dataset, which provides whole-body annotations
for the Human3.6M dataset using the COCO Wholebody layout. H3WB comprises 133
whole-body keypoint annotations on 100K images, made possible by our new
multi-view pipeline. We also propose three tasks: i) 3D whole-body pose lifting
from 2D complete whole-body pose, ii) 3D whole-body pose lifting from 2D
incomplete whole-body pose, and iii) 3D whole-body pose estimation from a
single RGB image. Additionally, we report several baselines from popular
methods for these tasks. Furthermore, we provide automated 3D whole-body
annotations of TotalCapture and experimentally show that, when used with H3WB, they
help to improve performance. Code and dataset are available at
https://github.com/wholebody3d/wholebody3d
Comment: Accepted by ICCV 202
High-Quality Animatable Dynamic Garment Reconstruction from Monocular Videos
Much progress has been made in reconstructing garments from an image or a
video. However, none of the existing works meets the expectations of digitizing
high-quality animatable dynamic garments that can be adjusted to various unseen
poses. In this paper, we propose the first method to recover high-quality
animatable dynamic garments from monocular videos without depending on scanned
data. To generate reasonable deformations for various unseen poses, we propose
a learnable garment deformation network that formulates the garment
reconstruction task as a pose-driven deformation problem. To alleviate the
ambiguity of estimating 3D garments from monocular videos, we design a
multi-hypothesis deformation module that learns spatial representations of
multiple plausible deformations. Experimental results on several public
datasets demonstrate that our method can reconstruct high-quality dynamic
garments with coherent surface details, which can be easily animated under
unseen poses. The code will be provided for research purposes.
End-to-end Weakly-supervised Multiple 3D Hand Mesh Reconstruction from Single Image
In this paper, we consider the challenging task of simultaneously locating
and recovering multiple hands from a single 2D image. Previous studies either
focus on single hand reconstruction or solve this problem in a multi-stage way.
Moreover, the conventional two-stage pipeline first detects hand areas, and
then estimates 3D hand pose from each cropped patch. To reduce the
computational redundancy in preprocessing and feature extraction, we propose a
concise but efficient single-stage pipeline. Specifically, we design a
multi-head auto-encoder structure for multi-hand reconstruction, where each
head network shares the same feature map and outputs the hand center, pose and
texture, respectively. Besides, we adopt a weakly-supervised scheme to
alleviate the burden of expensive 3D real-world data annotations. To this end,
we propose a series of losses optimized by a stage-wise training scheme, where
a multi-hand dataset with 2D annotations is generated based on the publicly
available single hand datasets. In order to further improve the accuracy of the
weakly supervised model, we adopt several feature consistency constraints in
both single and multiple hand settings. Specifically, the keypoints of each
hand estimated from local features should be consistent with the re-projected
points predicted from global features. Extensive experiments on public
benchmarks including FreiHAND, HO3D, InterHand2.6M and RHD demonstrate that our
method outperforms the state-of-the-art model-based methods in both
weakly-supervised and fully-supervised settings.
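The consistency constraint between locally estimated keypoints and re-projected points can be sketched as a mean squared 2D distance. The weak-perspective camera in `project` is an assumption made only for this illustration, since the abstract does not state the projection model. The function names and the sample coordinates are likewise hypothetical.

```python
def project(points_3d, scale, offset):
    """Weak-perspective projection (assumed camera model): drop z, scale, shift."""
    return [(scale * x + offset[0], scale * y + offset[1]) for x, y, _ in points_3d]


def consistency_loss(local_2d, reprojected_2d):
    """Mean squared distance between two equally sized sets of 2D keypoints."""
    total = sum(
        (u - p) ** 2 + (v - q) ** 2
        for (u, v), (p, q) in zip(local_2d, reprojected_2d)
    )
    return total / len(local_2d)


# Hypothetical 3D keypoints predicted from global features.
keypoints_3d = [(0.0, 0.0, 1.0), (1.0, 2.0, 3.0)]
reprojected = project(keypoints_3d, scale=2.0, offset=(1.0, 1.0))

# Hypothetical 2D keypoints estimated from local features; the second is off by one pixel.
local = [(1.0, 1.0), (3.0, 6.0)]
loss = consistency_loss(local, reprojected)
print(loss)
```

A loss of this form needs only 2D agreement between the two branches, which is consistent with the weakly-supervised setting described above, where 3D annotations are scarce.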
People detection, tracking and biometric data extraction using a single camera for retail usage
This thesis proposes a framework that analyzes video sequences from a single RGB camera by extracting useful soft-biometric data about tracked people. The aim is to focus on data that could be utilized in a retail environment. The framework can be broken down into three smaller components: a people detector, a people tracker, and a soft-biometrics extractor. The people detector employs various deep learning architectures to estimate bounding boxes of individuals. The tracking solution follows the well-known online tracking-by-detection approach and is built to be robust in crowded scenes by incorporating appearance and state features in the matching phase. Besides computing appearance descriptors for matching, we extract additional information about each person, namely age, gender, emotion, height, and trajectory, when possible. The whole framework is validated on a dataset that was created for this purpose.
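The matching phase of tracking-by-detection, which combines an appearance cue with a state cue, can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: greedy matching stands in for whatever assignment method the author uses, and the blend weight `alpha`, the 100-pixel distance normalisation, the `max_cost` gate, and the dictionary layout are all hypothetical. Feature vectors are assumed to be non-zero.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def associate(tracks, detections, alpha=0.5, max_cost=0.7):
    """Greedily match tracks to detections using a blended cost.

    Each track or detection is a dict with "pos" (x, y) and "feat" (list of floats).
    Returns {track_index: detection_index} for accepted matches.
    """
    pairs = []
    for ti, track in enumerate(tracks):
        for di, det in enumerate(detections):
            # State cue: centre distance, clipped to [0, 1] (assumed 100 px scale).
            state = min(math.dist(track["pos"], det["pos"]) / 100.0, 1.0)
            appearance = cosine_distance(track["feat"], det["feat"])
            pairs.append((alpha * appearance + (1.0 - alpha) * state, ti, di))

    matches, used_tracks, used_dets = {}, set(), set()
    for cost, ti, di in sorted(pairs):
        if cost > max_cost:
            break  # every remaining pair is even more expensive
        if ti in used_tracks or di in used_dets:
            continue
        matches[ti] = di
        used_tracks.add(ti)
        used_dets.add(di)
    return matches


tracks = [
    {"pos": (10, 10), "feat": [1.0, 0.0]},
    {"pos": (200, 50), "feat": [0.0, 1.0]},
]
detections = [
    {"pos": (205, 52), "feat": [0.1, 0.9]},
    {"pos": (12, 11), "feat": [0.9, 0.1]},
]
matches = associate(tracks, detections)
print(matches)
```

Blending the two cues means a detection that is spatially close but looks different, as can happen when people cross paths in a crowd, is less likely to steal another person's track.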