Transductive Multi-view Embedding for Zero-Shot Recognition and Annotation
Abstract. Most existing zero-shot learning approaches exploit transfer learning via an intermediate-level semantic representation such as visual attributes or semantic word vectors. Such a semantic representation is shared between an annotated auxiliary dataset and a target dataset with no annotation. A projection from a low-level feature space to the semantic space is learned from the auxiliary dataset and is applied without adaptation to the target dataset. In this paper we identify an inherent limitation with this approach. That is, due to having disjoint and potentially unrelated classes, the projection functions learned from the auxiliary dataset/domain are biased when applied directly to the target dataset/domain. We call this problem the projection domain shift problem and propose a novel framework, transductive multi-view embedding, to solve it. It is ‘transductive’ in that unlabelled target data points are explored for projection adaptation, and ‘multi-view’ in that both low-level feature (view) and multiple semantic representations (views) are embedded to rectify the projection shift. We demonstrate through extensive experiments that our framework (1) rectifies the projection shift between the auxiliary and target domains, (2) exploits the complementarity of multiple semantic representations, (3) achieves state-of-the-art recognition results on image and video benchmark datasets, and (4) enables novel cross-view annotation tasks.
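The pipeline this abstract criticizes (project features into a semantic space, then match against class prototypes) can be illustrated with a minimal sketch. Everything below is a hypothetical toy: the matrices `W` and `W_biased`, the prototype vectors, and the class names are assumptions made for illustration and are not taken from the paper. It shows only how a projection that fits the auxiliary classes poorly on the target domain can flip a nearest-prototype decision; it does not implement the transductive multi-view embedding itself.

```python
def project(W, x):
    """Apply a linear projection W (rows = semantic dimensions) to feature vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def nearest_prototype(z, prototypes):
    """Return the class whose semantic prototype is closest to z (squared Euclidean)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(prototypes, key=lambda name: dist2(z, prototypes[name]))

# Hypothetical semantic prototypes (e.g. attribute vectors) of unseen target classes.
prototypes = {"zebra": [1.0, 0.0], "pig": [0.0, 1.0]}

# Hypothetical projection that happens to suit the target domain.
W = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical projection fitted on auxiliary classes only, biased on the target domain.
W_biased = [[0.3, 0.6], [0.6, 0.3]]

x_target = [0.9, 0.2]  # low-level feature of one unlabelled target sample
label = nearest_prototype(project(W, x_target), prototypes)
biased_label = nearest_prototype(project(W_biased, x_target), prototypes)
```

With these toy numbers the same target sample is assigned to different classes under the two projections, which is the kind of error the abstract attributes to projection domain shift.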
DDAM-PS: Diligent Domain Adaptive Mixer for Person Search
Person search (PS) is a challenging computer vision problem where the
objective is to achieve joint optimization for pedestrian detection and
re-identification (ReID). Although previous advancements have shown promising
performance in the field under fully and weakly supervised learning settings,
there exists a major gap in investigating the domain adaptation ability of PS
models. In this paper, we propose a diligent domain adaptive mixer (DDAM) for
person search (DDAM-PS) framework that aims to bridge this gap and improve
knowledge transfer from the labeled source domain to the unlabeled target
domain. Specifically, we introduce a novel DDAM module that generates moderate
mixed-domain representations by combining source and target domain
representations. The proposed DDAM module encourages domain mixing to minimize
the distance between the two extreme domains, thereby enhancing the ReID task.
To achieve this, we introduce two bridge losses and a disparity loss. The
objective of the two bridge losses is to guide the moderate mixed-domain
representations to maintain an appropriate distance from both the source and
target domain representations. The disparity loss aims to prevent the moderate
mixed-domain representations from being biased towards either the source or
target domains, thereby avoiding overfitting. Furthermore, we address the
conflict between the two subtasks, localization and ReID, during domain
adaptation. To handle this cross-task conflict, we forcefully decouple the
norm-aware embedding, which aids in better learning of the moderate
mixed-domain representation. We conduct experiments to validate the
effectiveness of our proposed method. Our approach demonstrates favorable
performance on the challenging PRW and CUHK-SYSU datasets. Our source code is
publicly available at https://github.com/mustansarfiaz/DDAM-PS.
Comment: Accepted in WACV-2024.
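The abstract does not give formulas for the mixing step or the losses, so the following is only a hedged sketch of the general idea: a convex combination of a source and a target representation, with distance-based penalties. The function names, the squared-distance bridge terms, and the absolute-difference disparity term are all assumptions chosen for illustration; they are not the DDAM-PS formulation.

```python
def mix(src, tgt, a):
    """Convex combination a*src + (1-a)*tgt of two feature vectors, with 0 <= a <= 1."""
    return [a * s + (1.0 - a) * t for s, t in zip(src, tgt)]

def dist2(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))

def bridge_losses(src, tgt, mixed):
    """Distance of the mixed representation to each domain (illustrative choice)."""
    return dist2(mixed, src), dist2(mixed, tgt)

def disparity_loss(src, tgt, mixed):
    """Penalize a mixed representation that sits closer to one domain than the other."""
    d_s, d_t = bridge_losses(src, tgt, mixed)
    return abs(d_s - d_t)

src_feat = [0.0, 0.0]  # hypothetical source-domain feature
tgt_feat = [2.0, 0.0]  # hypothetical target-domain feature

balanced = mix(src_feat, tgt_feat, 0.5)  # halfway between the two domains
skewed = mix(src_feat, tgt_feat, 0.9)    # pulled towards the source domain
```

Under these assumptions, `disparity_loss` is zero for the balanced mixture and grows as the mixture drifts towards either domain, which matches the role the abstract describes for the disparity loss.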
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting
and representing features such that a source classifier performs well on the
target domain, and inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.
Comment: 20 pages, 5 figures
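As a minimal sketch of the sample-based idea described above, the snippet below weights each source observation by the density ratio w(x) = p_target(x) / p_source(x) and uses the weights in a self-normalized risk estimate. It assumes one-dimensional Gaussian source and target densities with known parameters, which is a simplification made here for illustration; in practice the ratio has to be estimated, and the review covers many ways of doing so.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def importance_weights(xs, src, tgt):
    """w(x) = p_target(x) / p_source(x) for each source sample x.

    src and tgt are (mean, std) pairs, assumed known for this illustration.
    """
    return [gaussian_pdf(x, *tgt) / gaussian_pdf(x, *src) for x in xs]

def weighted_risk(losses, weights):
    """Self-normalized importance-weighted average of per-sample losses."""
    total = sum(weights)
    return sum(l * w for l, w in zip(losses, weights)) / total

source_x = [-1.0, 0.0, 1.0, 2.0]   # hypothetical source samples
source_loss = [0.9, 0.5, 0.2, 0.1]  # hypothetical per-sample losses of a classifier
w = importance_weights(source_x, src=(0.0, 1.0), tgt=(1.0, 1.0))
risk = weighted_risk(source_loss, w)
```

Because the assumed target mean lies to the right of the source mean, samples with larger x receive larger weights, so the weighted risk emphasizes the region where target data is expected.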
Causally Regularized Learning with Agnostic Data Selection Bias
Most previous machine learning algorithms are proposed based on the i.i.d.
hypothesis. However, this ideal assumption is often violated in real
applications, where selection bias may arise between the training and testing
processes. Moreover, in many scenarios, the testing data is not even available
during the training process, which makes traditional methods such as transfer
learning infeasible because they need prior knowledge of the test distribution. Therefore,
how to address the agnostic selection bias for robust model learning is of
paramount importance for both academic research and real applications. In this
paper, under the assumption that causal relationships among variables are
robust across domains, we incorporate causal techniques into predictive modeling
and propose a novel Causally Regularized Logistic Regression (CRLR) algorithm
that jointly optimizes global confounder balancing and weighted logistic
regression. Global confounder balancing helps to identify causal features,
whose causal effects on the outcome are stable across domains; performing
logistic regression on those causal features then constructs a robust predictive
model against the agnostic bias. To validate the effectiveness of our CRLR
algorithm, we conduct comprehensive experiments on both synthetic and real
world datasets. Experimental results clearly demonstrate that our CRLR
algorithm outperforms the state-of-the-art methods, and the interpretability of
our method can be fully depicted by the feature visualization.
Comment: Oral paper of 2018 ACM Multimedia Conference (MM'18)
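The two ingredients named in this abstract, a sample-weighted logistic loss and a confounder-balancing term on the same sample weights, can be sketched as follows. The abstract does not state the objective, so the balancing term here is an assumption: each binary feature is treated in turn as a "treatment", and the penalty is the squared difference between the weighted means of the remaining features in the treated and control groups. The function names and toy data are hypothetical, and no joint optimization is performed.

```python
import math

def weighted_logistic_loss(X, y, w, beta):
    """Sum over samples of w_i * log(1 + exp(-y_i * <x_i, beta>)), with y_i in {-1, +1}."""
    total = 0.0
    for xi, yi, wi in zip(X, y, w):
        margin = yi * sum(a * b for a, b in zip(xi, beta))
        total += wi * math.log1p(math.exp(-margin))
    return total

def balancing_penalty(X, w):
    """Imbalance of the other features between weighted treated and control groups,
    summed over every binary feature treated in turn as the 'treatment'."""
    p = len(X[0])
    penalty = 0.0
    for j in range(p):
        wt = sum(wi for xi, wi in zip(X, w) if xi[j] == 1)
        wc = sum(wi for xi, wi in zip(X, w) if xi[j] == 0)
        if wt == 0 or wc == 0:
            continue  # one group is empty, nothing to balance
        for k in range(p):
            if k == j:
                continue
            mt = sum(wi * xi[k] for xi, wi in zip(X, w) if xi[j] == 1) / wt
            mc = sum(wi * xi[k] for xi, wi in zip(X, w) if xi[j] == 0) / wc
            penalty += (mt - mc) ** 2
    return penalty

# Hypothetical biased sample: the combination [1, 1] is over-represented.
X_biased = [[0, 0], [1, 1], [1, 1], [0, 1]]
y = [-1, 1, 1, -1]
uniform = [1.0, 1.0, 1.0, 1.0]
reweighted = [1.0, 0.5, 0.5, 1.0]  # down-weights the duplicated rows

before = balancing_penalty(X_biased, uniform)
after = balancing_penalty(X_biased, reweighted)
loss_at_zero = weighted_logistic_loss(X_biased, y, reweighted, [0.0, 0.0])
```

In this toy case, down-weighting the over-represented rows lowers the balancing penalty, which is the direction a jointly optimized set of sample weights would be expected to move in.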