11 research outputs found
Generative Adversarial Networks for Video-to-Video Domain Adaptation
Endoscopic videos from multiple centres often have different imaging conditions,
e.g., color and illumination, which cause models trained on one domain to
generalize poorly to another. Domain adaptation is a potential solution to this
problem; however, few existing works have focused on the translation of
video-based data. In this work, we propose a
novel generative adversarial network (GAN), namely VideoGAN, to transfer the
video-based data across different domains. As the frames of a video may have
similar content and imaging conditions, the proposed VideoGAN has an X-shaped
generator to preserve the intra-video consistency during translation.
Furthermore, a loss function, namely color histogram loss, is proposed to tune
the color distribution of each translated frame. Two colonoscopic datasets from
different centres, i.e., CVC-Clinic and ETIS-Larib, are adopted to evaluate the
domain-adaptation performance of our VideoGAN. Experimental results
demonstrate that the colonoscopic videos adapted by our VideoGAN significantly
boost the segmentation accuracy of colorectal polyps (an improvement of 5%) on
multicentre datasets. As our VideoGAN is a general network
architecture, we also evaluate its performance with the CamVid driving video
dataset on the cloudy-to-sunny translation task. Comprehensive experiments show
that the domain gap could be substantially narrowed by our VideoGAN.
Comment: Accepted by AAAI 2020
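The abstract does not spell out the exact form of the color histogram loss. As a rough illustration only, here is a minimal PyTorch sketch of one plausible differentiable variant, using soft per-channel histograms compared with an L1 distance; the bin count and kernel bandwidth are assumptions, not values from the paper:

import torch

def soft_histogram(x, bins=64, sigma=0.02):
    # x: flattened pixel values in [0, 1] for one color channel.
    centers = torch.linspace(0.0, 1.0, bins, device=x.device)
    # A Gaussian kernel softly assigns each pixel to every bin, which
    # keeps the histogram differentiable w.r.t. the generator's output.
    weights = torch.exp(-0.5 * ((x[:, None] - centers[None, :]) / sigma) ** 2)
    hist = weights.sum(dim=0)
    return hist / (hist.sum() + 1e-8)  # normalize to a distribution

def color_histogram_loss(fake, ref):
    # fake, ref: (B, 3, H, W) images in [0, 1]; match the per-channel color
    # distribution of each translated frame to a target-domain reference.
    loss = fake.new_zeros(())
    for c in range(3):
        h_fake = soft_histogram(fake[:, c].reshape(-1))
        h_ref = soft_histogram(ref[:, c].reshape(-1))
        loss = loss + (h_fake - h_ref).abs().sum()
    return loss

In training, such a term would be added to the usual adversarial and cycle-consistency losses with its own weight.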
Adversarial Bipartite Graph Learning for Video Domain Adaptation
Domain adaptation techniques, which focus on adapting models between
distributionally different domains, are rarely explored in the video
recognition area due to the significant spatial and temporal shifts across the
source (i.e., training) and target (i.e., test) domains. As such, recent works
on visual domain adaptation that leverage adversarial learning to unify the
source and target video representations and strengthen feature transferability
are not highly effective on videos. To overcome this
limitation, in this paper, we learn a domain-agnostic video classifier instead
of learning domain-invariant representations, and propose an Adversarial
Bipartite Graph (ABG) learning framework which directly models the
source-target interactions with a network topology of the bipartite graph.
Specifically, the source and target frames are sampled as heterogeneous
vertices, while the edges connecting the two types of nodes measure the
affinity between them. Through message passing, each vertex aggregates the features from
its heterogeneous neighbors, forcing the features coming from the same class to
be mixed evenly. Explicitly exposing the video classifier to such cross-domain
representations at the training and test stages makes our model less biased to
the labeled source data, which in turn results in better generalization on the
target domain. To further enhance the model capacity and verify the robustness
of the proposed architecture on difficult transfer tasks, we extend our model
to work in a semi-supervised setting using an
additional video-level bipartite graph. Extensive experiments conducted on four
benchmarks demonstrate the effectiveness of the proposed approach over
state-of-the-art methods on the task of video recognition.
Comment: Proceedings of the 28th ACM International Conference on Multimedia (MM '20)
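The abstract leaves the exact ABG layer unspecified; a minimal sketch of one round of cross-domain message passing on a source-target bipartite graph (dot-product affinities with softmax normalization are assumptions here) might look like:

import torch
import torch.nn.functional as F

def bipartite_message_passing(src, tgt, temperature=1.0):
    # src: (Ns, d) source-frame features; tgt: (Nt, d) target-frame features.
    # Edges exist only across domains, so every vertex aggregates features
    # from its heterogeneous neighbors, mixing same-class features evenly.
    affinity = src @ tgt.t() / temperature           # (Ns, Nt) edge scores
    src_out = F.softmax(affinity, dim=1) @ tgt       # source gathers from target
    tgt_out = F.softmax(affinity.t(), dim=1) @ src   # target gathers from source
    return src_out, tgt_out

A classifier trained on such aggregated features sees cross-domain mixtures at both training and test time, which is the property the paper credits for reducing bias toward the labeled source data.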
PALMAR: Towards Adaptive Multi-inhabitant Activity Recognition in Point-Cloud Technology
With the advancement of deep neural networks and computer vision-based Human
Activity Recognition (HAR), the employment of point-cloud data (PCD)
technologies (LiDAR, mmWave) has attracted a lot of interest due to their
privacy-preserving nature. Given the high promise of accurate PCD
technologies, we develop PALMAR, a multi-inhabitant activity recognition
system, by employing efficient signal processing and novel machine learning
techniques to track individual persons, towards an adaptive multi-inhabitant
tracking and HAR system. More
specifically, we propose (i) a voxelized feature representation-based real-time
PCD fine-tuning method, (ii) efficient clustering (DBSCAN and BIRCH), Adaptive
Order Hidden Markov Model-based multi-person tracking, and crossover-ambiguity
reduction techniques, and (iii) a novel adaptive deep learning-based domain
adaptation technique to improve the accuracy of HAR in the presence of data
scarcity and diversity (device, location, and population diversity). We
experimentally evaluate our framework and systems using (i) real-time PCD
collected by three devices (3D LiDAR and 79 GHz mmWave) from 6 participants,
(ii) a publicly available 3D LiDAR activity dataset (28 participants), and
(iii) an embedded hardware prototype system, which provided promising HAR
performance in the multi-inhabitant scenario (96%), with a 63% improvement in
multi-person tracking over the state-of-the-art framework, without losing significant system
performance on the edge computing device.
Comment: Accepted in IEEE International Conference on Computer Communications 2021
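As an illustration of the pipeline's front end only (the voxel size and clustering hyperparameters below are assumptions), the voxelized representation and DBSCAN-based person separation could be sketched as:

import numpy as np
from sklearn.cluster import DBSCAN

def voxelize(points, voxel_size=0.1):
    # Quantize an (N, 3) point cloud into the set of occupied voxel
    # coordinates, giving a compact per-frame feature representation.
    return np.unique(np.floor(points / voxel_size).astype(np.int32), axis=0)

def cluster_persons(points, eps=0.5, min_samples=10):
    # Group points into per-person clusters; DBSCAN labels noise as -1.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    return [points[labels == k] for k in sorted(set(labels)) if k != -1]

Each per-person cluster would then be handed to the Adaptive Order Hidden Markov Model tracking and activity-classification stages described above.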
Domain Adaptation for Time Series Forecasting via Attention Sharing
Recent years have witnessed deep neural networks gaining increasing
popularity in the field of time series forecasting. A primary reason for their
success is their ability to effectively capture complex temporal dynamics
across multiple related time series. However, the advantages of these deep
forecasters only start to emerge in the presence of a sufficient amount of
data. This poses a challenge for typical forecasting problems in practice,
where one either has a small number of time series, or limited observations per
time series, or both. To cope with the issue of data scarcity, we propose a
novel domain adaptation framework, Domain Adaptation Forecaster (DAF), that
leverages the statistical strengths from another relevant domain with abundant
data samples (source) to improve the performance on the domain of interest with
limited data (target). In particular, we propose an attention-based shared
module with a domain discriminator across domains as well as private modules
for individual domains. This allows us to jointly train the source and target
domains by generating domain-invariant latent features while retaining
domain-specific features. Extensive experiments on various domains demonstrate
that our proposed method outperforms state-of-the-art baselines on synthetic
and real-world datasets.
Comment: 19 pages, 9 figures
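The described architecture, private per-domain modules around a shared attention module with a domain discriminator, can be sketched as follows; this is a simplified PyTorch illustration with assumed layer sizes, not the paper's implementation:

import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    # Identity in the forward pass; flips gradients in the backward pass,
    # so fooling the discriminator makes the shared features domain-invariant.
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

class DAFSketch(nn.Module):
    def __init__(self, d_in, d_model=64):
        super().__init__()
        self.enc = nn.ModuleDict({d: nn.Linear(d_in, d_model) for d in ("source", "target")})
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.dec = nn.ModuleDict({d: nn.Linear(d_model, 1) for d in ("source", "target")})
        self.disc = nn.Linear(d_model, 2)  # guesses which domain a feature came from

    def forward(self, x, domain):
        h = self.enc[domain](x)          # private encoder, x: (B, T, d_in)
        h, _ = self.attn(h, h, h)        # shared attention module across domains
        forecast = self.dec[domain](h)   # private forecasting head
        domain_logits = self.disc(GradReverse.apply(h))
        return forecast, domain_logits

Both domains are trained jointly: the forecasting loss is applied per domain, while the reversed discriminator gradients push the shared latent features to be domain-invariant.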
Overcoming Label Noise for Source-free Unsupervised Video Domain Adaptation
Despite the progress seen in classification methods, current approaches for
handling videos with distribution shifts between the source and target domains
remain source-dependent, as they require access to the source data during the
adaptation stage. In this paper, we present a self-training-based source-free
video domain adaptation approach to address this challenge by bridging the gap
between the source and the target domains. We use the source pre-trained model
to generate pseudo-labels for the target domain samples, which are inevitably
noisy. Thus, we treat the problem of source-free video domain adaptation as
learning from noisy labels and argue that the samples with correct
pseudo-labels can help us in adaptation. To this end, we leverage the
cross-entropy loss as an indicator of the correctness of the pseudo-labels and
use the resulting small-loss samples from the target domain for fine-tuning the
model. We further enhance the adaptation performance by implementing a
teacher-student framework, in which the teacher, which is updated gradually,
produces reliable pseudo-labels. Meanwhile, the student undergoes fine-tuning
on the target domain videos using these generated pseudo-labels to improve its
performance. Extensive experimental evaluations show that our methods, termed
CleanAdapt and CleanAdapt + TS, achieve state-of-the-art results, outperforming
the existing approaches on various open datasets. Our source code is publicly
available at https://avijit9.github.io/CleanAdapt.
Comment: Extended version of our ICVGIP paper
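The core selection step, keeping the small-loss target samples as likely-correct pseudo-labels, together with a gradually updated teacher, can be sketched as below (the keep ratio and EMA momentum are assumed values, not taken from the paper):

import torch
import torch.nn.functional as F

@torch.no_grad()
def select_small_loss(teacher, videos, keep_ratio=0.5):
    # Pseudo-label target videos with the teacher, then keep the samples
    # whose cross-entropy loss is smallest, treating low loss as a proxy
    # for pseudo-label correctness.
    logits = teacher(videos)
    pseudo = logits.argmax(dim=1)
    loss = F.cross_entropy(logits, pseudo, reduction="none")
    k = max(1, int(keep_ratio * loss.numel()))
    idx = loss.topk(k, largest=False).indices
    return videos[idx], pseudo[idx]

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # The teacher is updated gradually as an exponential moving average
    # of the student, which stabilizes the pseudo-labels it produces.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)

The student is then fine-tuned on the selected (video, pseudo-label) pairs with a standard cross-entropy loss.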
Adversarial Cross-Domain Action Recognition with Co-Attention
Action recognition has been a widely studied topic with a heavy focus on
supervised learning involving sufficient labeled videos. However, the problem
of cross-domain action recognition, where training and testing videos are drawn
from different underlying distributions, remains largely under-explored.
Previous methods directly employ techniques for cross-domain image recognition,
which tend to suffer from the severe temporal misalignment problem. This paper
proposes a Temporal Co-attention Network (TCoN), which matches the
distributions of temporally aligned action features between source and target
domains using a novel cross-domain co-attention mechanism. Experimental results
on three cross-domain action recognition datasets demonstrate that TCoN
significantly outperforms both previous single-domain and cross-domain methods
under the cross-domain setting.
Comment: AAAI 2020
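The abstract does not detail the co-attention computation; one minimal reading (dot-product similarities, with each segment weighted by its best cross-domain match; both are assumptions) might be:

import torch
import torch.nn.functional as F

def temporal_coattention(src_seg, tgt_seg):
    # src_seg: (Ts, d) and tgt_seg: (Tt, d) per-segment features of a
    # source and a target video. Segments with a strong counterpart in
    # the other domain get higher weight, so the pooled features being
    # matched across domains come from temporally aligned action parts.
    sim = src_seg @ tgt_seg.t() / src_seg.shape[1] ** 0.5   # (Ts, Tt)
    src_w = F.softmax(sim.max(dim=1).values, dim=0)         # weight per source segment
    tgt_w = F.softmax(sim.max(dim=0).values, dim=0)         # weight per target segment
    return src_w @ src_seg, tgt_w @ tgt_seg                 # attended video features

A domain-alignment loss applied to these attended features then emphasizes temporally aligned action segments rather than background frames.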