Unsupervised Domain Adaptation Using Compact Internal Representations
A major technique for tackling unsupervised domain adaptation involves
mapping data points from both the source and target domains into a shared
embedding space. The encoder that performs this mapping is trained such
that the embedding space becomes domain agnostic, allowing a classifier trained
on the source domain to generalize well on the target domain. To further
enhance the performance of unsupervised domain adaptation (UDA), we develop an
additional technique which makes the internal distribution of the source domain
more compact, thereby improving the model's ability to generalize in the target
domain. We demonstrate that by increasing the margins between data
representations for different classes in the embedding space, we can improve
the model performance for UDA. To make the internal representation more
compact, we estimate the internally learned multi-modal distribution of the
source domain as a Gaussian mixture model (GMM). Utilizing the estimated GMM, we
enhance the separation between different classes in the source domain, thereby
mitigating the effects of domain shift. We offer theoretical analysis to
support the improved performance of our method. To evaluate the effectiveness of
our approach, we conduct experiments on widely used UDA benchmark datasets. The
results indicate that our method enhances model generalizability and
outperforms existing techniques.
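The GMM estimation step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embeddings, dimensions, and class count are stand-in assumptions, and it uses the fact that with labeled source data a class-conditional GMM reduces to one Gaussian component per class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in source-domain embeddings: two classes in a 2-D embedding space
# (hypothetical data; real embeddings would come from the trained encoder).
z0 = rng.normal(-2.0, 0.5, size=(200, 2))
z1 = rng.normal(+2.0, 0.5, size=(200, 2))

def fit_gmm(class_embeddings):
    """Fit one Gaussian per class: weight = class frequency,
    component mean/covariance = per-class sample statistics."""
    n_total = sum(len(z) for z in class_embeddings)
    return [
        (len(z) / n_total, z.mean(axis=0), np.cov(z, rowvar=False))
        for z in class_embeddings
    ]

gmm = fit_gmm([z0, z1])

def sample_gmm(gmm, n, rng):
    """Draw n samples from the fitted mixture as a compact surrogate
    for the source embedding distribution."""
    weights = np.array([w for w, _, _ in gmm])
    picks = rng.choice(len(gmm), size=n, p=weights)
    return np.stack([
        rng.multivariate_normal(gmm[k][1], gmm[k][2]) for k in picks
    ])

surrogate = sample_gmm(gmm, 100, rng)
print(surrogate.shape)  # (100, 2)
```

Samples drawn from the fitted mixture can then stand in for the source embeddings when enlarging inter-class margins, without revisiting the raw source data.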
Improved Region Proposal Network for Enhanced Few-Shot Object Detection
Despite significant success of deep learning in object detection tasks, the
standard training of deep neural networks requires access to a substantial
quantity of annotated images across all classes. Data annotation is an arduous
and time-consuming endeavor, particularly when dealing with infrequent objects.
Few-shot object detection (FSOD) methods have emerged as a solution to the
limitations of classic object detection approaches based on deep learning. FSOD
methods demonstrate remarkable performance by achieving robust object detection
using a significantly smaller amount of training data. A challenge for FSOD is
that instances from novel classes that do not belong to the fixed set of
training classes appear in the background and the base model may pick them up
as potential objects. These objects behave similarly to label noise because
they are classified as one of the training dataset classes, leading to FSOD
performance degradation. We develop a semi-supervised algorithm to detect and
then utilize these unlabeled novel objects as positive samples during the FSOD
training stage to improve FSOD performance. Specifically, we develop a
hierarchical ternary classification region proposal network (HTRPN) to localize
the potential unlabeled novel objects and assign them new objectness labels to
distinguish these objects from the base training dataset classes. Our improved
hierarchical sampling strategy for the region proposal network (RPN) also
boosts the perception ability of the object detection model for large objects.
We test our approach on the COCO and PASCAL VOC benchmarks that are commonly
used in the FSOD literature. Our experimental results indicate that our method is
effective and outperforms the existing state-of-the-art (SOTA) FSOD methods.
Our implementation is provided as a supplement to support reproducibility of
the results.
Comment: arXiv admin note: substantial text overlap with arXiv:2303.1042
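The ternary objectness labelling idea can be sketched as below. The thresholds and the labelling rule are hypothetical simplifications for illustration; the paper's actual HTRPN criteria are not reproduced here.

```python
import numpy as np

# Ternary objectness labels: proposals overlapping annotated base-class
# boxes are "base", confident proposals with no base overlap are treated as
# potential unlabeled novel objects, and the rest are background.
BACKGROUND, BASE_OBJECT, NOVEL_CANDIDATE = 0, 1, 2

def ternary_labels(ious_with_base_gt, objectness_scores,
                   iou_thresh=0.5, novel_score_thresh=0.7):
    ious = np.asarray(ious_with_base_gt)
    scores = np.asarray(objectness_scores)
    labels = np.full(ious.shape, BACKGROUND)
    labels[ious >= iou_thresh] = BASE_OBJECT
    labels[(ious < iou_thresh) & (scores >= novel_score_thresh)] = NOVEL_CANDIDATE
    return labels

# Three proposals: one matching a base box, one confident but unmatched
# (a novel-object candidate), one low-confidence background region.
print(ternary_labels([0.8, 0.1, 0.2], [0.9, 0.95, 0.3]))  # [1 2 0]
```

The third label lets the training loss treat unlabeled novel objects as positives rather than misclassifying them as base classes or background, which is the label-noise failure mode the abstract describes.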
Online Continual Domain Adaptation for Semantic Image Segmentation Using Internal Representations
Semantic segmentation models trained on annotated data fail to generalize
well when the input data distribution changes over an extended time period,
requiring re-training to maintain performance. Classic unsupervised
domain adaptation (UDA) attempts to address a similar problem when there is a
target domain with no annotated data points, by transferring knowledge from
a source domain with annotated data. We develop an online UDA algorithm for
semantic segmentation of images that improves model generalization on
unannotated domains in scenarios where source data access is restricted during
adaptation. We perform model adaptation by minimizing the distributional
distance between the source latent features and the target features in a shared
embedding space. Our solution promotes a shared domain-agnostic latent feature
space between the two domains, which allows for classifier generalization on
the target dataset. To alleviate the need for access to source samples during
adaptation, we approximate the source latent feature distribution via an
appropriate surrogate distribution, in this case a Gaussian mixture model (GMM).
We evaluate our approach on well established semantic segmentation datasets and
demonstrate it compares favorably against state-of-the-art (SOTA) UDA semantic
segmentation methods.
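The source-free alignment step can be sketched as follows. The single-component surrogate, the 2-D features, and the use of a sliced Wasserstein distance as the distributional metric are all illustrative assumptions; the abstract does not fix the specific distance used.

```python
import numpy as np

rng = np.random.default_rng(1)

def sliced_wasserstein(x, y, n_projections=50, rng=rng):
    """Approximate squared sliced Wasserstein-2 distance between two
    equally sized samples via random 1-D projections."""
    d = x.shape[1]
    dists = []
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        px, py = np.sort(x @ theta), np.sort(y @ theta)
        dists.append(np.mean((px - py) ** 2))
    return float(np.mean(dists))

# Surrogate source features drawn from a stored GMM (here a single
# Gaussian component), standing in for the inaccessible source data.
source_surrogate = rng.multivariate_normal([0.0, 0.0], np.eye(2), size=256)

# Two hypothetical target feature batches: one already near the source
# distribution, one far from it.
target_near = rng.multivariate_normal([0.2, 0.0], np.eye(2), size=256)
target_far = rng.multivariate_normal([5.0, 5.0], np.eye(2), size=256)

# The adaptation objective shrinks as target features move toward the
# surrogate source distribution in the shared embedding space.
print(sliced_wasserstein(source_surrogate, target_near))
print(sliced_wasserstein(source_surrogate, target_far))
```

In the actual method this distance would be backpropagated through the target encoder so the target features drift toward the surrogate source distribution, letting the source-trained classifier generalize.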
Cognitively Inspired Cross-Modal Data Generation Using Diffusion Models
Most existing cross-modal generative methods based on diffusion models use
guidance to provide control over the latent space to enable conditional
generation across different modalities. Such methods focus on providing
guidance through separately-trained models, each for one modality. As a result,
these methods suffer from cross-modal information loss and are limited to
unidirectional conditional generation. Inspired by how humans synchronously
acquire multi-modal information and learn the correlation between modalities,
we explore a multi-modal diffusion model training and sampling scheme that uses
channel-wise image conditioning to learn cross-modality correlation during the
training phase to better mimic the learning process in the brain. Our empirical
results demonstrate that our approach can achieve data generation conditioned
on all correlated modalities.