Co-regularized Alignment for Unsupervised Domain Adaptation
Deep neural networks, trained with large amounts of labeled data, can fail to
generalize well when tested on examples from a \emph{target domain} whose
distribution differs from that of the training data, which is referred to as the
\emph{source domain}. It can be expensive or even infeasible to obtain the
required amount of labeled data in all possible domains. Unsupervised domain adaptation
sets out to address this problem, aiming to learn a good predictive model for
the target domain using labeled examples from the source domain but only
unlabeled examples from the target domain. Domain alignment approaches this
problem by matching the source and target feature distributions, and has been
used as a key component in many state-of-the-art domain adaptation methods.
However, matching the marginal feature distributions does not guarantee that
the corresponding class conditional distributions will be aligned across the
two domains. We propose co-regularized domain alignment for unsupervised domain
adaptation, which constructs multiple diverse feature spaces and aligns source
and target distributions in each of them individually, while encouraging the
alignments to agree with each other on the class predictions for the
unlabeled target examples. The proposed method is generic and can be used to
improve any domain adaptation method which uses domain alignment. We
instantiate it in the context of a recent state-of-the-art method and observe
that it provides significant performance improvements on several domain
adaptation benchmarks.
Comment: NIPS 2018 accepted version
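A minimal PyTorch-style sketch of the co-regularization idea described above: two hypothetical classifier heads (head_a, head_b), each trained with its own domain-alignment loss, are additionally penalized when their class predictions disagree on unlabeled target examples. The head names, the squared-difference agreement penalty, and the weight lam are illustrative assumptions rather than the paper's exact formulation, and the term that keeps the feature spaces diverse is omitted.

import torch.nn.functional as F

def coregularization_loss(head_a, head_b, target_x):
    # Penalize disagreement between the two heads' class probabilities
    # on unlabeled target inputs.
    p_a = F.softmax(head_a(target_x), dim=1)
    p_b = F.softmax(head_b(target_x), dim=1)
    return ((p_a - p_b) ** 2).sum(dim=1).mean()

def total_loss(head_a, head_b, source_x, source_y, target_x,
               align_loss_a, align_loss_b, lam=1.0):
    # Supervised source loss for both heads, plus the per-head domain-alignment
    # losses computed elsewhere (e.g. adversarially), plus the agreement penalty.
    sup = F.cross_entropy(head_a(source_x), source_y) \
        + F.cross_entropy(head_b(source_x), source_y)
    return sup + align_loss_a + align_loss_b \
        + lam * coregularization_loss(head_a, head_b, target_x)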
Issue Brief: Focus on Education
This brief focuses on the education of boys and young men of color and was prepared for the "Investing in Boys and Young Men of Color: The Promise and Opportunity" briefing held on June 11, 2014, in Washington, DC. The brief includes data providing the national context, promising program models, and strategies for moving forward.
Identical and Fraternal Twins: Fine-Grained Semantic Contrastive Learning of Sentence Representations
Contrastive learning has significantly advanced the unsupervised learning of
sentence representations. This approach clusters the augmented positive
instance with the anchor instance to shape the desired embedding space.
However, relying solely on the contrastive objective can result in sub-optimal
outcomes because it cannot differentiate subtle semantic variations between
positive pairs. Specifically, common data augmentation techniques frequently
introduce semantic distortion, leaving a semantic margin between the members of
a positive pair. The InfoNCE loss function, however, overlooks this semantic
margin and prioritizes similarity maximization between positive pairs during
training, which blunts the semantic sensitivity of the trained model. In this
paper, we introduce a novel Identical and Fraternal Twins of Contrastive
Learning (IFCL) framework,
capable of simultaneously adapting to various positive pairs generated by
different augmentation techniques. We propose a \textit{Twins Loss} to preserve
the innate margin during training and promote the potential of data enhancement
in order to overcome the sub-optimal issue. We also present proof-of-concept
experiments, combined with the contrastive objective, to validate the proposed
Twins Loss. Furthermore, we propose a hippocampus queue mechanism to store and
reuse negative instances without additional computation,
which further enhances the efficiency and performance of the IFCL. We verify
the IFCL framework on nine semantic textual similarity tasks with both English
and Chinese datasets, and the experimental results show that IFCL outperforms
state-of-the-art methods.
Comment: This article has been accepted for publication in the European Conference on Artificial Intelligence (ECAI 2023). 9 pages, 4 figures.
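A minimal sketch, in PyTorch, of the general queue-plus-contrastive recipe outlined in the abstract: an InfoNCE-style objective over sentence embeddings in which embeddings from earlier batches are kept in a fixed-size FIFO buffer and reused as negatives instead of being recomputed. The NegativeQueue class, the queue size, and the temperature are illustrative assumptions; this is not the paper's Twins Loss or its exact hippocampus queue mechanism.

import torch
import torch.nn.functional as F

class NegativeQueue:
    # FIFO buffer of past sentence embeddings, reused as negatives.
    def __init__(self, dim, size=4096):
        self.buffer = F.normalize(torch.randn(size, dim), dim=1)
        self.ptr = 0

    def enqueue(self, emb):
        n = emb.size(0)
        idx = (self.ptr + torch.arange(n)) % self.buffer.size(0)
        self.buffer[idx] = emb.detach()
        self.ptr = int((self.ptr + n) % self.buffer.size(0))

def queued_info_nce(anchor, positive, queue, temperature=0.05):
    # InfoNCE over the in-batch positive plus all queued negatives.
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    pos_logits = (anchor * positive).sum(dim=1, keepdim=True)   # (B, 1)
    neg_logits = anchor @ queue.buffer.t()                      # (B, K)
    logits = torch.cat([pos_logits, neg_logits], dim=1) / temperature
    labels = torch.zeros(anchor.size(0), dtype=torch.long)      # positive sits at index 0
    loss = F.cross_entropy(logits, labels)
    queue.enqueue(positive)   # current positives become future negatives
    return loss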
Bivariate Beta-LSTM
Long Short-Term Memory (LSTM) captures long-term dependencies through a cell
state maintained by the input and forget gate structures, each of which models
its output as a value in [0, 1] through a sigmoid function. However, because the
sigmoid function changes only gradually, the sigmoid gate is not flexible enough
to represent multi-modality or skewness. Moreover, previous models do not model
the correlation between the gates, which could serve as an inductive bias for
the relationship between the previous and the current input.
This paper proposes a new gate structure with the bivariate Beta distribution.
The proposed gate structure enables probabilistic modeling of the gates within
the LSTM cell, so that modelers can customize the cell-state flow with priors
and distributions. Moreover, we theoretically show that the proposed gate has a
higher upper bound on the gradient than the sigmoid function, and we empirically
observe that the bivariate Beta gate structure provides larger gradient values
during training. We demonstrate the effectiveness of the bivariate Beta gate
structure on sentence classification, image classification, polyphonic music
modeling, and image caption generation.
Comment: AAAI 2020
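As a rough illustration of the gate idea, here is a PyTorch-style sketch of an LSTM cell whose input and forget gates are sampled from correlated Beta distributions built from shared Gamma variables (i = G1/(G1+G3), f = G2/(G2+G3)), one standard construction of a bivariate Beta pair. The class name, the softplus parameterization of the Gamma shapes, and this particular construction are assumptions made for illustration, not necessarily the paper's exact model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaGateLSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Three Gamma shape parameters per hidden unit yield two correlated Beta gates.
        self.to_shapes = nn.Linear(input_size + hidden_size, 3 * hidden_size)
        self.to_g_o = nn.Linear(input_size + hidden_size, 2 * hidden_size)

    def forward(self, x, state):
        h, c = state
        z = torch.cat([x, h], dim=1)
        a1, a2, a3 = (F.softplus(self.to_shapes(z)) + 1e-4).chunk(3, dim=1)
        g1 = torch.distributions.Gamma(a1, torch.ones_like(a1)).rsample()
        g2 = torch.distributions.Gamma(a2, torch.ones_like(a2)).rsample()
        g3 = torch.distributions.Gamma(a3, torch.ones_like(a3)).rsample()
        i_gate = g1 / (g1 + g3)   # Beta(a1, a3)-distributed input gate
        f_gate = g2 / (g2 + g3)   # Beta(a2, a3)-distributed forget gate, correlated via shared g3
        g_pre, o_pre = self.to_g_o(z).chunk(2, dim=1)
        c_new = f_gate * c + i_gate * torch.tanh(g_pre)
        h_new = torch.sigmoid(o_pre) * torch.tanh(c_new)
        return h_new, (h_new, c_new)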