Unsupervised Feature Learning of Human Actions as Trajectories in Pose Embedding Manifold
An unsupervised human action modeling framework can provide useful
pose-sequence representation, which can be utilized in a variety of pose
analysis applications. In this work we propose a novel temporal pose-sequence
modeling framework, which can embed the dynamics of 3D human-skeleton joints to
a continuous latent space in an efficient manner. In contrast to the
end-to-end frameworks explored in previous works, we disentangle the task of individual
pose representation learning from the task of learning actions as a trajectory
in pose embedding space. In order to realize a continuous pose embedding
manifold with improved reconstructions, we propose an unsupervised, manifold
learning procedure named Encoder GAN (EnGAN). Further, we use the pose
embeddings generated by EnGAN to model human actions using a bidirectional RNN
auto-encoder architecture, PoseRNN. We introduce a first-order gradient loss to
explicitly enforce temporal regularity in the predicted motion sequence. A
hierarchical feature fusion technique is also investigated for simultaneous
modeling of local skeleton joints along with global pose variations. We
demonstrate state-of-the-art transferability of the learned representation
against other supervised and unsupervised motion embeddings for the
task of fine-grained action recognition on the SBU interaction dataset. Further, we
show the qualitative strengths of the proposed framework by visualizing
skeleton pose reconstructions and interpolations in pose-embedding space, and
low dimensional principal component projections of the reconstructed pose
trajectories.
Comment: Accepted at WACV 201
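The first-order gradient loss described above can be sketched as follows; the L1/L2 choices and the weight `lam` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def first_order_gradient_loss(pred, target):
    """Penalize mismatch between consecutive-frame differences of the
    predicted and ground-truth pose sequences (temporal regularity)."""
    # pred, target: (T, D) pose sequences over T frames
    d_pred = np.diff(pred, axis=0)   # first-order temporal gradient
    d_tgt = np.diff(target, axis=0)
    return np.mean(np.abs(d_pred - d_tgt))

def sequence_loss(pred, target, lam=0.1):
    """Reconstruction term plus the temporal-gradient regularizer."""
    recon = np.mean((pred - target) ** 2)
    return recon + lam * first_order_gradient_loss(pred, target)
```

Note that the gradient term is invariant to a constant offset of the whole sequence: it only constrains frame-to-frame motion, which is exactly the temporal-regularity role described above.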
Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence
Understanding the geometry and pose of objects in 2D images is a fundamental
necessity for a wide range of real world applications. Driven by deep neural
networks, recent methods have brought significant improvements to object pose
estimation. However, they suffer from the scarcity of keypoint/pose-annotated
real images and hence cannot exploit the object's 3D structural information
effectively. In this work, we propose a data-efficient method which utilizes
the geometric regularity of intraclass objects for pose estimation. First, we
learn pose-invariant local descriptors of object parts from simple 2D RGB
images. These descriptors, along with keypoints obtained from renders of a
fixed 3D template model, are then used to generate keypoint correspondence maps
for a given monocular real image. Finally, a pose estimation network predicts
3D pose of the object using these correspondence maps. This pipeline is further
extended to a multi-view approach, which assimilates keypoint information from
correspondence sets generated from multiple views of the 3D template model.
Fusion of multi-view information significantly improves geometric comprehension
of the system which in turn enhances the pose estimation performance.
Furthermore, the correspondence framework used to learn the
pose-invariant keypoint descriptors also allows us to effectively alleviate the
data-scarcity problem. This enables our method to achieve state-of-the-art
performance on multiple real-image viewpoint estimation datasets, such as
Pascal3D+ and ObjectNet3D. To encourage reproducible research, we have released
the codes for our proposed approach.
Comment: Accepted in ECCV-W; Code available at
https://github.com/val-iisc/pose_estimatio
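A keypoint correspondence map of the kind described can be sketched as cosine similarity between template-keypoint descriptors and dense real-image descriptors; the flattened-location layout and the softmax normalization here are assumptions for illustration:

```python
import numpy as np

def correspondence_maps(real_desc, keypoint_desc):
    """real_desc: (N, D) dense descriptors at N image locations;
    keypoint_desc: (K, D) descriptors at K template-render keypoints.
    Returns a (K, N) map: each row scores one keypoint against every
    image location via cosine similarity, normalized to a distribution."""
    r = real_desc / (np.linalg.norm(real_desc, axis=1, keepdims=True) + 1e-8)
    k = keypoint_desc / (np.linalg.norm(keypoint_desc, axis=1, keepdims=True) + 1e-8)
    sim = k @ r.T
    e = np.exp(sim - sim.max(axis=1, keepdims=True))  # softmax over locations
    return e / e.sum(axis=1, keepdims=True)
```

A downstream pose network would consume these K maps (reshaped to the image grid) as evidence of where each template keypoint lies in the real image.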
AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation
Supervised deep learning methods have shown promising results for the task of
monocular depth estimation; but acquiring ground truth is costly, and prone to
noise as well as inaccuracies. While synthetic datasets have been used to
circumvent the above problems, the resulting models do not generalize well to
natural scenes due to the inherent domain shift. Recent adversarial approaches
for domain adaptation have performed well in mitigating the differences between
the source and target domains. But these methods are mostly limited to a
classification setup and do not scale well for fully-convolutional
architectures. In this work, we propose AdaDepth - an unsupervised domain
adaptation strategy for the pixel-wise regression task of monocular depth
estimation. The proposed approach avoids the above limitations through a)
adversarial learning and b) explicit imposition of content consistency on the
adapted target representation. Our unsupervised approach performs competitively
with other established approaches on depth estimation tasks and achieves
state-of-the-art results in a semi-supervised setting.
Comment: CVPR 201
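The two ingredients named above, adversarial feature alignment and content consistency on the adapted target representation, can be sketched as a combined generator-side objective; the L1 content term and the balance weight `lam` are assumptions:

```python
import numpy as np

def adadepth_generator_loss(disc_target_score, feat_adapted, feat_frozen, lam=1.0):
    """disc_target_score: discriminator outputs in (0, 1) on adapted target
    features (1 = 'looks like source').  feat_frozen: the same target
    features before adaptation, held fixed.  The adversarial term pushes
    target features toward the source distribution; the content term keeps
    them close to what the frozen encoder produced."""
    adv = -np.mean(np.log(disc_target_score + 1e-8))
    content = np.mean(np.abs(feat_adapted - feat_frozen))
    return adv + lam * content
```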
UM-Adapt: Unsupervised Multi-Task Adaptation Using Adversarial Cross-Task Distillation
Aiming towards human-level generalization, there is a need to explore
adaptable representation learning methods with greater transferability. Most
existing approaches independently address task-transferability and cross-domain
adaptation, resulting in limited generalization. In this paper, we propose
UM-Adapt - a unified framework to effectively perform unsupervised domain
adaptation for spatially-structured prediction tasks, simultaneously
maintaining a balanced performance across individual tasks in a multi-task
setting. To realize this, we propose two novel regularization strategies: a)
Contour-based content regularization (CCR) and b) exploitation of inter-task
coherency using a cross-task distillation module. Furthermore, avoiding a
conventional ad-hoc domain discriminator, we re-utilize the cross-task
distillation loss as the output of an energy function to adversarially minimize the
input domain discrepancy. Through extensive experiments, we demonstrate
superior generalizability of the learned representations simultaneously for
multiple tasks under domain-shifts from synthetic to natural environments.
UM-Adapt yields state-of-the-art transfer learning results on ImageNet
classification and comparable performance on PASCAL VOC 2007 detection task,
even with a smaller backbone-net. Moreover, the resulting semi-supervised
framework outperforms the current fully-supervised multi-task learning
state-of-the-art on both the NYUD and Cityscapes datasets.
Comment: ICCV 2019 (Oral
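The discriminator-free alignment can be sketched by treating the cross-task distillation loss itself as an energy: low when each task's direct prediction agrees with its reconstruction from the other tasks (source-like inputs), high when they disagree. The dictionary interface below is an illustrative assumption:

```python
import numpy as np

def cross_task_energy(direct_preds, distilled_preds):
    """direct_preds[k]: task k predicted directly from the input;
    distilled_preds[k]: task k reconstructed from the other tasks'
    outputs via the distillation module.  Their total discrepancy acts
    as an energy that the feature extractor minimizes on target data."""
    return sum(float(np.mean(np.abs(direct_preds[k] - distilled_preds[k])))
               for k in direct_preds)
```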
iSPA-Net: Iterative Semantic Pose Alignment Network
Understanding and extracting 3D information of objects from monocular 2D
images is a fundamental problem in computer vision. In the task of 3D object
pose estimation, recent data-driven, deep-neural-network-based approaches suffer
from scarcity of real images with 3D keypoint and pose annotations. Drawing
inspiration from human cognition, where the annotators use a 3D CAD model as
structural reference to acquire ground-truth viewpoints for real images; we
propose an iterative Semantic Pose Alignment Network, called iSPA-Net. Our
approach focuses on exploiting semantic 3D structural regularity to solve the
task of fine-grained pose estimation by predicting viewpoint difference between
a given pair of images. Such image comparison based approach also alleviates
the problem of data scarcity and hence enhances scalability of the proposed
approach for novel object categories with minimal annotation. The fine-grained
object pose estimator is also aided by correspondences between learned spatial
descriptors of the input image pair. The proposed pose alignment framework
can refine its initial pose estimate over consecutive
iterations by utilizing an online rendering setup together with a
non-uniform bin classification of pose-difference. This enables iSPA-Net to
achieve state-of-the-art performance on various real image viewpoint estimation
datasets. Further, we demonstrate the effectiveness of the approach for multiple
applications. First, we show results for active object viewpoint localization
to capture images from a similar pose, using only a single image as the pose
reference. Second, we demonstrate the ability of the learned semantic
correspondence to perform unsupervised part-segmentation transfer using only a
single part-annotated 3D template model per object class. To encourage
reproducible research, we have released the codes for our proposed algorithm.
Comment: Accepted at ACMMM 2018. Code available at
https://github.com/val-iisc/iSPA-Ne
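The non-uniform bin classification of pose-difference can be sketched with bin edges that are fine near zero (where iterative refinement needs resolution) and coarse for large differences; the geometric spacing is an assumption about the design, not the paper's exact schedule:

```python
import numpy as np

def nonuniform_bin_edges(max_angle=180.0, n_bins=8):
    """Edges from 0 to max_angle, finer near 0 (geometric spacing)."""
    return max_angle * (np.geomspace(1.0, 2.0, n_bins + 1) - 1.0)

def bin_pose_difference(delta_deg, edges):
    """Map an absolute pose difference (degrees) to a bin index."""
    return int(np.digitize(abs(delta_deg), edges[1:-1]))
```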
Aligning Silhouette Topology for Self-Adaptive 3D Human Pose Recovery
Articulation-centric 2D/3D pose supervision forms the core training objective
in most existing 3D human pose estimation techniques. Except for synthetic
source environments, acquiring such rich supervision for each real target
domain at deployment is highly inconvenient. However, we realize that standard
foreground silhouette estimation techniques (on static camera feeds) remain
unaffected by domain-shifts. Motivated by this, we propose a novel target
adaptation framework that relies only on silhouette supervision to adapt a
source-trained model-based regressor. However, in the absence of any auxiliary
cue (multi-view, depth, or 2D pose), an isolated silhouette loss fails to
provide a reliable pose-specific gradient and must be employed in tandem
with a topology-centric loss. To this end, we develop a series of
convolution-friendly spatial transformations in order to disentangle a
topological-skeleton representation from the raw silhouette. Such a design
paves the way to devise a Chamfer-inspired spatial topological-alignment loss
via distance field computation, while effectively avoiding any
gradient-hindering spatial-to-pointset mapping. Experimental results demonstrate our
superiority over prior arts in self-adapting a source-trained model to
diverse unlabeled target domains, such as a) in-the-wild datasets, b)
low-resolution image domains, and c) adversarially perturbed image domains (via
UAP).
Comment: NeurIPS 202
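The Chamfer-inspired topological-alignment loss can be sketched on binary skeleton masks using distance transforms; the symmetric form and the normalization are assumptions about the exact formulation:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def topological_alignment_loss(pred_skel, target_skel):
    """pred_skel, target_skel: binary (H, W) skeleton masks.  Each mask is
    scored against the distance field of the other (zero on that mask's
    pixels, growing with distance), so the loss stays in image space and
    avoids an explicit mask-to-pointset conversion."""
    d_target = distance_transform_edt(1 - target_skel)
    d_pred = distance_transform_edt(1 - pred_skel)
    a = (pred_skel * d_target).sum() / max(pred_skel.sum(), 1)
    b = (target_skel * d_pred).sum() / max(target_skel.sum(), 1)
    return a + b
```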
GAN-Tree: An Incrementally Learned Hierarchical Generative Framework for Multi-Modal Data Distributions
Despite the remarkable success of generative adversarial networks, their
performance seems less impressive for diverse training sets, requiring learning
of discontinuous mapping functions. Though multi-mode prior or multi-generator
models have been proposed to alleviate this problem, such approaches may fail
depending on the empirically chosen initial mode components. In contrast to
such bottom-up approaches, we present GAN-Tree, which follows a hierarchical
divisive strategy to address such discontinuous multi-modal data. Devoid of any
assumption on the number of modes, GAN-Tree utilizes a novel mode-splitting
algorithm to effectively split a parent mode into semantically cohesive
child modes, facilitating unsupervised clustering. Further, it also enables
incremental addition of new data modes to an already trained GAN-Tree, by
updating only a single branch of the tree structure. As compared to prior
approaches, the proposed framework offers a higher degree of flexibility in
choosing a large variety of mutually exclusive and exhaustive tree nodes called
GAN-Set. Extensive experiments on synthetic and natural image datasets
including ImageNet demonstrate the superiority of GAN-Tree against prior
state-of-the-art methods.
Comment: ICCV 2019 (code available at https://github.com/val-iisc/GANTree
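The binary split of a parent mode can be sketched in latent space; here plain 2-means with farthest-point initialization stands in for the paper's mode-split algorithm, which is an explicit simplification:

```python
import numpy as np

def split_mode(latents, n_iter=25):
    """Split one parent mode's latent codes into two child modes.
    Returns per-sample child assignments (0/1) and the two child centers."""
    # farthest-point initialization: first sample and the sample farthest from it
    d0 = ((latents - latents[0]) ** 2).sum(axis=1)
    centers = latents[[0, int(d0.argmax())]].astype(float)
    for _ in range(n_iter):
        d = ((latents[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        for k in (0, 1):
            if (assign == k).any():
                centers[k] = latents[assign == k].mean(axis=0)
    return assign, centers
```

In the full framework, each child would then get its own generator/prior fine-tuned from the parent, so new modes can later be added by re-splitting a single branch.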
Universal Source-Free Domain Adaptation
There is a strong incentive to develop versatile learning techniques that can
transfer the knowledge of class-separability from a labeled source domain to an
unlabeled target domain in the presence of a domain-shift. Existing domain
adaptation (DA) approaches are not equipped for practical DA scenarios as a
result of their reliance on the knowledge of source-target label-set
relationship (e.g. Closed-set, Open-set or Partial DA). Furthermore, almost all
prior unsupervised DA works require coexistence of source and target samples
even during deployment, making them unsuitable for real-time adaptation. Devoid
of such impractical assumptions, we propose a novel two-stage learning process.
1) In the Procurement stage, we aim to equip the model for future source-free
deployment, assuming no prior knowledge of the upcoming category-gap and
domain-shift. To achieve this, we enhance the model's ability to reject
out-of-source distribution samples by leveraging the available source data, in
a novel generative classifier framework. 2) In the Deployment stage, the goal
is to design a unified adaptation algorithm capable of operating across a wide
range of category-gaps, with no access to the previously seen source samples.
To this end, in contrast to the usage of complex adversarial training regimes,
we define a simple yet effective source-free adaptation objective by utilizing
a novel instance-level weighting mechanism named the Source Similarity Metric
(SSM). A thorough evaluation shows the practical usability of the proposed
learning framework with superior DA performance even over state-of-the-art
source-dependent approaches.
Comment: CVPR 2020. Code available at https://github.com/val-iisc/usfd
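The instance-level weighting idea can be sketched as follows, assuming the Procurement-stage classifier exposes both source-class logits and out-of-source ("negative") logits; this is a hedged sketch, not the paper's exact SSM definition:

```python
import numpy as np

def source_similarity_metric(logits, n_source_classes):
    """logits: (N, C) where the first n_source_classes columns are source
    classes and the rest model out-of-source regions.  Weight = source
    probability mass minus negative mass, so target samples resembling the
    source get weights near +1 and target-private samples near -1."""
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return p[:, :n_source_classes].sum(axis=1) - p[:, n_source_classes:].sum(axis=1)
```

Scaling each target sample's adaptation loss by such a weight lets one objective handle closed-set, open-set, and partial category-gaps without accessing source data.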
Class-Incremental Domain Adaptation
We introduce a practical Domain Adaptation (DA) paradigm called
Class-Incremental Domain Adaptation (CIDA). Existing DA methods tackle
domain-shift but are unsuitable for learning novel target-domain classes.
Meanwhile, class-incremental (CI) methods enable learning of new classes in
the absence of source training data but fail under a domain-shift without labeled
supervision. In this work, we effectively identify the limitations of these
approaches in the CIDA paradigm. Motivated by theoretical and empirical
observations, we propose an effective method, inspired by prototypical
networks, that enables classification of target samples into both shared and
novel (one-shot) target classes, even under a domain-shift. Our approach yields
superior performance as compared to both DA and CI methods in the CIDA
paradigm.
Comment: ECCV 202
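The prototypical-network inspiration can be sketched as nearest-prototype classification: shared classes keep prototypes from adaptation, and one labeled example per novel target class suffices to add a prototype (the one-shot case). The interface below is an illustrative simplification:

```python
import numpy as np

def class_prototypes(feats, labels):
    """Mean feature per class (the prototype)."""
    classes = np.unique(labels)
    protos = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def assign_to_prototypes(queries, classes, protos):
    """Label each query by its nearest prototype in feature space."""
    d = ((queries[:, None, :] - protos[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]
```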
Towards Inheritable Models for Open-Set Domain Adaptation
There has been tremendous progress in Domain Adaptation (DA) for visual
recognition tasks. Particularly, open-set DA has gained considerable attention
wherein the target domain contains additional unseen categories. Existing
open-set DA approaches demand access to a labeled source dataset along with
unlabeled target instances. However, this reliance on co-existing source and
target data is highly impractical in scenarios where data-sharing is restricted
due to its proprietary nature or privacy concerns. Addressing this, we
introduce a practical DA paradigm where a source-trained model is used to
facilitate adaptation in the future absence of the source dataset. To this
end, we formalize knowledge inheritability as a novel concept and propose a
simple yet effective solution to realize inheritable models suitable for the
above practical paradigm. Further, we present an objective way to quantify
inheritability to enable the selection of the most suitable source model for a
given target domain, even in the absence of the source data. We provide
theoretical insights followed by a thorough empirical evaluation demonstrating
state-of-the-art open-set domain adaptation performance.
Comment: CVPR 2020 (Oral). Code available at
https://github.com/val-iisc/inheritun
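An objective inheritability measure of the kind described could, for instance, score a candidate source model by how confidently it handles the unlabeled target data; the confidence-fraction proxy below is an assumption for illustration, not the paper's exact criterion:

```python
import numpy as np

def inheritability_score(target_probs, tau=0.9):
    """target_probs: (N, C) source-model class probabilities on unlabeled
    target samples.  Returns the fraction predicted with confidence >= tau;
    a higher score suggests the model transfers more of its knowledge to
    this target domain, guiding source-free model selection."""
    return float((target_probs.max(axis=1) >= tau).mean())
```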