Exploring Shape Embedding for Cloth-Changing Person Re-Identification via 2D-3D Correspondences
Cloth-Changing Person Re-Identification (CC-ReID) is a common and realistic
problem since fashion constantly changes over time and people's aesthetic
preferences are not set in stone. While most existing cloth-changing ReID
methods focus on learning cloth-agnostic identity representations from coarse
semantic cues (e.g., silhouettes and part segmentation maps), they neglect the
continuous shape distributions at the pixel level. In this paper, we propose
Continuous Surface Correspondence Learning (CSCL), a new shape embedding
paradigm for cloth-changing ReID. CSCL establishes continuous correspondences
between a 2D image plane and a canonical 3D body surface via pixel-to-vertex
classification, which naturally aligns a person image to the surface of a 3D
human model and simultaneously obtains pixel-wise surface embeddings. We
further extract fine-grained shape features from the learned surface embeddings
and then integrate them with global RGB features via a carefully designed
cross-modality fusion module. The shape embedding paradigm based on 2D-3D
correspondences remarkably enhances the model's global understanding of human
body shape. To promote the study of ReID under clothing change, we construct 3D
Dense Persons (DP3D), which is the first large-scale cloth-changing ReID
dataset that provides densely annotated 2D-3D correspondences and a precise 3D
mesh for each person image, while containing diverse cloth-changing cases over
all four seasons. Experiments on both cloth-changing and cloth-consistent ReID
benchmarks validate the effectiveness of our method.
Comment: Accepted by ACM MM 202
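To make the core idea concrete, here is a minimal PyTorch sketch of pixel-to-vertex classification against a canonical body surface. It assumes a SMPL-style mesh with 6890 vertices; the module name, the 64-dimensional embedding size, and the soft-assignment readout are illustrative assumptions, not the authors' CSCL implementation.

```python
# Minimal sketch of pixel-to-vertex classification, assuming a SMPL-style
# canonical mesh; names and dimensions are illustrative, not the paper's code.
import torch
import torch.nn as nn

N_VERTS = 6890   # assumption: vertex count of a SMPL-style canonical mesh
EMB_DIM = 64     # assumption: dimensionality of the learned vertex embeddings

class PixelToVertexHead(nn.Module):
    """Classifies each pixel feature to a canonical body vertex and returns
    pixel-wise surface embeddings (hypothetical module, for illustration)."""
    def __init__(self, in_ch: int):
        super().__init__()
        # One learnable embedding per canonical mesh vertex.
        self.vertex_emb = nn.Embedding(N_VERTS, EMB_DIM)
        # Projects backbone features into the same embedding space.
        self.pixel_proj = nn.Conv2d(in_ch, EMB_DIM, kernel_size=1)

    def forward(self, feat):                      # feat: (B, C, H, W)
        pix = self.pixel_proj(feat)               # (B, EMB_DIM, H, W)
        # Logits = similarity between each pixel and every vertex embedding.
        logits = torch.einsum('bchw,vc->bvhw', pix, self.vertex_emb.weight)
        # Soft assignment over vertices yields pixel-wise surface embeddings.
        assign = logits.softmax(dim=1)            # (B, N_VERTS, H, W)
        surf = torch.einsum('bvhw,vc->bchw', assign, self.vertex_emb.weight)
        return logits, surf                       # logits train the classifier

# Usage: head = PixelToVertexHead(256); l, s = head(torch.randn(2, 256, 32, 16))
```

The returned surface embeddings would then feed a fine-grained shape branch before fusion with global RGB features, per the abstract.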
Robust Temporally Coherent Laplacian Protrusion Segmentation of 3D Articulated Bodies
In motion analysis and understanding, it is important to be able to fit a
suitable model or structure to the temporal series of observed data, in order
to describe motion patterns compactly and to discriminate between them. In an
unsupervised context, i.e., when no prior model of the moving object(s) is
available, such a structure has to be learned from the data in a bottom-up
fashion. In recent times, volumetric approaches, in which motion is captured
by a number of cameras and a voxel-set representation of the body is built
from the camera views, have gained ground owing to attractive features such as
inherent view invariance and robustness to occlusions. Automatic, unsupervised
segmentation of moving bodies along entire sequences, in a temporally coherent
and robust way, has the potential to provide a means of constructing a
bottom-up model of the moving body and of tracking motion cues that may later
be exploited for motion classification. Spectral methods such as locally linear
embedding (LLE) can be useful in this context, as they preserve the
"protrusions" of articulated shapes, i.e., high-curvature regions of the 3D
volume, while improving their separation in a lower-dimensional space, thereby
making them easier to cluster. In this paper we therefore propose a spectral
approach to unsupervised and temporally coherent body-protrusion segmentation
along time sequences. Volumetric shapes are clustered in an embedding space,
clusters are propagated in time to ensure coherence, and they are merged or
split to accommodate changes in the body's topology. Experiments on both
synthetic and real sequences of dense voxel-set data are reported. They support
the ability of the proposed method to cluster body parts consistently over
time in a totally unsupervised fashion, its robustness to sampling density and
shape quality, and its potential for bottom-up model construction.
Comment: 31 pages, 26 figures
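As a rough sketch of the spectral pipeline the abstract describes (embed the voxel set with LLE, then cluster protrusions in the embedding space), the snippet below uses scikit-learn. The neighbor count, the 3-component embedding, and the use of k-means are illustrative assumptions rather than the paper's settings, and the temporal propagation and merge/split steps are only noted in comments.

```python
# Minimal sketch: LLE embedding of a voxel set, then per-frame clustering.
# Parameter values are illustrative assumptions, not the paper's settings.
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.cluster import KMeans

def segment_protrusions(voxels: np.ndarray, n_parts: int = 5,
                        n_neighbors: int = 10) -> np.ndarray:
    """voxels: (N, 3) occupied voxel centers of one frame.
    Returns a per-voxel body-part label."""
    # LLE preserves high-curvature regions (protrusions) while spreading
    # them apart in the low-dimensional space, making them easier to cluster.
    lle = LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=3)
    emb = lle.fit_transform(voxels)               # (N, 3) embedded points
    # Cluster in the embedding space; in the paper, clusters are additionally
    # propagated over time and merged/split to handle topology changes.
    return KMeans(n_clusters=n_parts, n_init=10).fit_predict(emb)
```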
NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Research on depth-based human activity analysis has achieved outstanding
performance and demonstrated the effectiveness of 3D representations for action
recognition. The existing depth-based and RGB+D-based action recognition
benchmarks have a number of limitations, including the lack of large-scale
training samples and of a realistic number of distinct class categories, as
well as limited diversity in camera views, environmental conditions, and human subjects.
In this work, we introduce a large-scale dataset for RGB+D human action
recognition, which is collected from 106 distinct subjects and contains more
than 114 thousand video samples and 8 million frames. This dataset contains 120
different action classes including daily, mutual, and health-related
activities. We evaluate the performance of a series of existing 3D activity
analysis methods on this dataset, and show the advantage of applying deep
learning methods for 3D-based human action recognition. Furthermore, we
investigate a novel one-shot 3D activity recognition problem on our dataset
and propose a simple yet effective Action-Part Semantic Relevance-aware (APSR)
framework for this task, which yields promising results for the
recognition of novel action classes. We believe the introduction of this
large-scale dataset will enable the community to apply, adapt, and develop
various data-hungry learning techniques for depth-based and RGB+D-based human
activity understanding. [The dataset is available at:
http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
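The one-shot protocol can be illustrated with a minimal nearest-exemplar classifier: each novel class is represented by the embedding of its single exemplar sample, and a query clip takes the label of the most similar exemplar. The feature extractor is assumed to exist, and this sketch is not the APSR framework itself, only the evaluation setting it addresses.

```python
# Minimal sketch of one-shot recognition by nearest exemplar in feature space.
# The upstream feature extractor is assumed; this is not the APSR framework.
import numpy as np

def one_shot_classify(query_feat: np.ndarray,
                      exemplar_feats: np.ndarray,
                      exemplar_labels: np.ndarray) -> int:
    """query_feat: (D,) embedding of a query clip.
    exemplar_feats: (K, D) one embedding per novel class.
    Returns the predicted class label."""
    # Cosine similarity between the query and each class exemplar.
    q = query_feat / (np.linalg.norm(query_feat) + 1e-8)
    e = exemplar_feats / (np.linalg.norm(exemplar_feats, axis=1,
                                         keepdims=True) + 1e-8)
    sims = e @ q                                  # (K,) similarities
    return int(exemplar_labels[np.argmax(sims)])
```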