Discriminative Density-ratio Estimation
Covariate shift is a challenging problem in supervised learning that
results from the discrepancy between the training and test distributions. An
effective approach that has recently drawn considerable attention in the research
community is to reweight the training samples to minimize that discrepancy.
Specifically, many methods are based on developing Density-ratio (DR) estimation
techniques that apply to both regression and classification problems. Although
these methods work well for regression problems, their performance on
classification problems is not satisfactory. A key observation is
that these methods focus on matching the sample marginal distributions without
paying attention to preserving the separation between classes in the reweighted
space. In this paper, we propose a novel method for Discriminative
Density-ratio (DDR) estimation that addresses the aforementioned problem and
aims at estimating the density-ratio of joint distributions in a class-wise
manner. The proposed algorithm is an iterative procedure that alternates
between estimating the class information for the test data and estimating a new
density ratio for each class. To incorporate the estimated class information of
the test data, a soft matching technique is proposed. In addition, we employ an
effective criterion which adopts mutual information as an indicator to stop the
iterative procedure while resulting in a decision boundary that lies in a
sparse region. Experiments on synthetic and benchmark datasets demonstrate the
superiority of the proposed method in terms of both accuracy and robustness.
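The reweighting idea that the abstract above starts from can be illustrated with a minimal sketch. This is not the DDR algorithm: it is the simplest plug-in baseline, which fits a 1-D Gaussian to the training inputs and another to the test inputs and takes their ratio as the importance weight. The function names, the synthetic data, and the Gaussian assumption are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def density_ratio_weights(train_x, test_x):
    """Plug-in estimate of w(x) = p_test(x) / p_train(x) at each training point.

    Assumption: both input distributions are modelled as 1-D Gaussians.
    """
    mu_tr, sd_tr = statistics.fmean(train_x), statistics.stdev(train_x)
    mu_te, sd_te = statistics.fmean(test_x), statistics.stdev(test_x)
    return [
        gaussian_pdf(x, mu_te, sd_te) / gaussian_pdf(x, mu_tr, sd_tr)
        for x in train_x
    ]


def weighted_mean(values, weights):
    """Self-normalised importance-weighted mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical synthetic data: test inputs are shifted relative to training inputs.
rng = random.Random(0)
train_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
test_x = [rng.gauss(1.0, 1.0) for _ in range(2000)]

weights = density_ratio_weights(train_x, test_x)
plain_mean = statistics.fmean(train_x)
reweighted_mean = weighted_mean(train_x, weights)
```

With these weights, statistics computed on the training sample move toward their test-distribution values. Because the weights depend only on the inputs, they match marginals and carry no class information, which is the limitation the abstract attributes to existing DR methods on classification problems.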
Theoretic Analysis and Extremely Easy Algorithms for Domain Adaptive Feature Learning
Domain adaptation problems arise in a variety of applications, where a
training dataset from the source domain and a test dataset from the
target domain typically follow different distributions. The primary
difficulty in designing effective learning models to solve such problems lies
in how to bridge the gap between the source and target distributions. In this
paper, we provide a comprehensive analysis of feature learning algorithms used in
conjunction with linear classifiers for domain adaptation. Our analysis shows
that in order to achieve good adaptation performance, the second moments of the
source domain distribution and target domain distribution should be similar.
Based on our new analysis, a novel extremely easy feature learning algorithm
for domain adaptation is proposed. Furthermore, our algorithm is extended by
leveraging multiple layers, leading to a deep linear model. We evaluate the
effectiveness of the proposed algorithms in terms of domain adaptation tasks on
the Amazon review dataset and the spam dataset from the ECML/PKDD 2006
discovery challenge. Comment: ijca
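The abstract above states that the second moments of the source and target distributions should be similar for good adaptation. As a rough illustration of that condition only, and not the feature learning algorithm the paper proposes, the sketch below rescales each source feature so that its uncentered second moment matches the target's. This is a diagonal simplification that ignores cross-feature terms; the function names and toy data are hypothetical.

```python
import math


def second_moments(rows):
    """Per-feature uncentered second moment E[x_j^2] of a list of feature vectors."""
    n = len(rows)
    d = len(rows[0])
    return [sum(row[j] ** 2 for row in rows) / n for j in range(d)]


def match_second_moments(source, target, eps=1e-12):
    """Rescale each source feature so its second moment equals the target's.

    Assumption: only the diagonal of the second-moment matrix is aligned.
    """
    m_src = second_moments(source)
    m_tgt = second_moments(target)
    scales = [math.sqrt(t / (s + eps)) for s, t in zip(m_src, m_tgt)]
    return [[x * c for x, c in zip(row, scales)] for row in source]


# Hypothetical toy data with two features per example.
source = [[1.0, 2.0], [3.0, -4.0], [-2.0, 0.5]]
target = [[2.0, 1.0], [6.0, -1.0], [-4.0, 0.5]]

adapted = match_second_moments(source, target)
adapted_moments = second_moments(adapted)
target_moments = second_moments(target)
```

After the rescaling, `adapted_moments` and `target_moments` agree up to floating-point error, so a linear classifier trained on `adapted` sees features on the same scale as the target domain.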
Sparse Domain Adaptation in a Good Similarity-Based Projection Space
We address domain adaptation (DA) for binary classification in the challenging case where no target label is available. We propose an original approach that builds on a recent framework of Balcan et al., which allows learning linear classifiers in an explicit projection space based on good similarity functions that need not be symmetric or positive semi-definite (PSD). Following the DA framework of Ben-David et al., our method looks for a relevant projection space where the source and target distributions tend to be close. This objective is achieved by the use of an additional regularizer motivated by the notion of algorithmic robustness proposed by Xu and Mannor. Our approach is formulated as a linear program with a 1-norm regularization leading to sparse models. We provide a theoretical analysis of this sparsity and a generalization bound. From a practical standpoint, to improve the efficiency of the method, we propose an iterative version based on a reweighting scheme of the similarities to bring the distributions closer in a new projection space. Hyperparameters and reweighting quality are controlled by a reverse validation process. The evaluation of our approach on a synthetic problem and real image annotation tasks shows good adaptation performance.
Cross-domain Adaptation with Discrepancy Minimization for Text-independent Forensic Speaker Verification
Forensic audio analysis for speaker verification offers unique challenges due
to location/scenario uncertainty and diversity mismatch between reference and
naturalistic field recordings. The lack of real naturalistic forensic audio
corpora with ground-truth speaker identity represents a major challenge in this
field. It is also difficult to directly employ small-scale domain-specific data
to train complex neural network architectures due to domain mismatch and loss
in performance. Alternatively, cross-domain speaker verification for multiple
acoustic environments is a challenging task which could advance research in
audio forensics. In this study, we introduce a CRSS-Forensics audio dataset
collected in multiple acoustic environments. We pre-train a CNN-based network
using the VoxCeleb data, followed by an approach which fine-tunes part of the
high-level network layers with clean speech from CRSS-Forensics. Based on this
fine-tuned model, we align domain-specific distributions in the embedding space
with the discrepancy loss and maximum mean discrepancy (MMD). This maintains
effective performance on the clean set, while simultaneously generalizing the
model to other acoustic domains. From the results, we demonstrate that diverse
acoustic environments affect the speaker verification performance, and that our
proposed approach of cross-domain adaptation can significantly improve the
results in this scenario. Comment: To appear in INTERSPEECH 202
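The maximum mean discrepancy (MMD) mentioned in the abstract above measures how far apart two sets of embeddings are. The sketch below computes the standard biased estimate of squared MMD with a Gaussian (RBF) kernel. It shows the statistic only, not the paper's training loss or network: the kernel bandwidth, the function names, and the synthetic "clean" and "noisy" embeddings are assumptions made for illustration.

```python
import math
import random


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel between two vectors of equal length."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


# Hypothetical 2-D embeddings standing in for two acoustic domains.
rng = random.Random(0)
clean = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(100)]
noisy = [[rng.gauss(1.5, 1.0), rng.gauss(1.5, 1.0)] for _ in range(100)]

mmd_same = mmd_squared(clean, clean)      # identical samples
mmd_shifted = mmd_squared(clean, noisy)   # shifted samples
```

The estimate is zero when both samples are identical and grows as the two embedding distributions drift apart, which is why minimizing it during fine-tuning pulls domain-specific embeddings toward one another.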