76 research outputs found
Cross-position Activity Recognition with Stratified Transfer Learning
Human activity recognition aims to recognize the activities of daily living
by utilizing sensors on different body parts. However, when labeled
data from a certain body position (i.e., the target domain) is missing, how can
we leverage data from other positions (i.e., source domains) to learn the
activity labels of that position? When there are several source domains
available, it is often difficult to select the most similar source domain to
the target domain. With the selected source domain, we need to perform accurate
knowledge transfer between domains. Existing methods only learn the global
distance between domains while ignoring the local property. In this paper, we
propose a \textit{Stratified Transfer Learning} (STL) framework to perform both
source domain selection and knowledge transfer. STL is based on our proposed
\textit{Stratified} distance to capture the local property of domains. STL
consists of two components: Stratified Domain Selection (STL-SDS) can select
the most similar source domain to the target domain; Stratified Activity
Transfer (STL-SAT) is able to perform accurate knowledge transfer. Extensive
experiments on three public activity recognition datasets demonstrate the
superiority of STL. Furthermore, we extensively investigate the performance of
transfer learning across different degrees of similarities and activity levels
between domains. We also discuss the potential applications of STL in other
fields of pervasive computing for future research.
Comment: Submitted to Pervasive and Mobile Computing as an extension to the PerCom 18
paper; first revision. arXiv admin note: substantial text overlap with
arXiv:1801.0082
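The abstract contrasts a global inter-domain distance with the local, class-level structure that the Stratified distance captures. As a hedged illustration of that contrast (the linear-kernel MMD and the class-wise averaging below are assumptions for exposition, not STL's exact formulation):

```python
import numpy as np

def mmd_linear(Xs, Xt):
    """Global domain distance: linear-kernel MMD, i.e. the squared
    distance between the two domains' feature means."""
    return float(np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2))

def stratified_distance(Xs, ys, Xt, yt):
    """Local (stratified-style) distance: average per-class MMD over the
    classes shared by both domains, capturing structure that a single
    global MMD can miss."""
    classes = np.intersect1d(np.unique(ys), np.unique(yt))
    per_class = [mmd_linear(Xs[ys == c], Xt[yt == c]) for c in classes]
    return float(np.mean(per_class))
```

When the two domains' class clusters are mirrored, the global means coincide (MMD near zero) while the per-class distance stays large, which is exactly the local property a purely global measure ignores.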
Joint Air Quality and Weather Prediction Based on Multi-Adversarial Spatiotemporal Networks
Accurate and timely air quality and weather predictions are of great
importance to urban governance and human livelihood. Though many efforts have
been made for air quality or weather prediction, most of them simply use one
task's output as an input feature for the other, which ignores the inner
connection between the two predictive tasks. On the one hand, the accurate prediction of one task can help
improve another task's performance. On the other hand, geospatially distributed
air quality and weather monitoring stations provide additional hints for
city-wide spatiotemporal dependency modeling. Inspired by the above two
insights, in this paper, we propose the Multi-adversarial spatiotemporal
recurrent Graph Neural Networks (MasterGNN) for joint air quality and weather
predictions. Specifically, we first propose a heterogeneous recurrent graph
neural network to model the spatiotemporal autocorrelation among air quality
and weather monitoring stations. Then, we develop a multi-adversarial graph
learning framework to counter observation noise propagation introduced by
spatiotemporal modeling. Moreover, we present an adaptive training strategy by
formulating multi-adversarial learning as a multi-task learning problem.
Finally, extensive experiments on two real-world datasets show that MasterGNN
achieves the best performance compared with seven baselines on both air quality
and weather prediction tasks.
Comment: 9 pages, 6 figures
Benchmarking Robustness of Adaptation Methods on Pre-trained Vision-Language Models
Various adaptation methods, such as LoRA, prompts, and adapters, have been
proposed to enhance the performance of pre-trained vision-language models in
specific domains. However, the robustness of these adaptation methods against
distribution shifts has not been studied. In this study, we assess the
robustness of 11 widely-used adaptation methods across 4 vision-language
datasets under multimodal corruptions. Concretely, we introduce 7 benchmark
datasets, including 96 visual and 87 textual corruptions, to investigate the
robustness of different adaptation methods, the impact of available adaptation
examples, and the influence of trainable parameter size during adaptation. Our
analysis reveals that: 1) Adaptation methods are more sensitive to text
corruptions than visual corruptions. 2) Full fine-tuning does not consistently
provide the highest robustness; instead, adapters can achieve better robustness
with comparable clean performance. 3) Contrary to expectations, our findings
indicate that increasing the amount of adaptation data and the number of
trainable parameters does not guarantee enhanced robustness; instead, it can
result in even lower robustness. We
hope this study could benefit future research in the development of robust
multimodal adaptation methods. The benchmark, code, and dataset used in this
study can be accessed at \url{https://adarobustness.github.io}
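The robustness score used by the benchmark is not spelled out in the abstract; a common choice, assumed here, is accuracy under corruption relative to clean accuracy. A minimal sketch in which `model_acc` and the corruption callables are hypothetical stand-ins:

```python
def corruption_robustness(model_acc, inputs, labels, corruptions):
    """Relative robustness per corruption: corrupted accuracy divided by
    clean accuracy (1.0 means no degradation under that corruption).

    model_acc: callable (inputs, labels) -> accuracy in [0, 1].
    corruptions: dict mapping a name to a per-example corruption function.
    """
    clean = model_acc(inputs, labels)
    return {name: model_acc([corrupt(x) for x in inputs], labels) / clean
            for name, corrupt in corruptions.items()}
```

The same loop extends directly to separate visual and textual corruption sets, which is how per-modality sensitivity (finding 1 above) would be compared.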
Boosting Cross-Domain Speech Recognition with Self-Supervision
The cross-domain performance of automatic speech recognition (ASR) could be
severely hampered due to the mismatch between training and testing
distributions. Since the target domain usually lacks labeled data, and domain
shifts exist at acoustic and linguistic levels, it is challenging to perform
unsupervised domain adaptation (UDA) for ASR. Previous work has shown that
self-supervised learning (SSL) or pseudo-labeling (PL) is effective in UDA by
exploiting the self-supervisions of unlabeled data. However, these
self-supervisions also face performance degradation in mismatched domain
distributions, which previous work fails to address. This work presents a
systematic UDA framework to fully utilize the unlabeled data with
self-supervision in the pre-training and fine-tuning paradigm. On the one hand,
we apply continued pre-training and data replay techniques to mitigate the
domain mismatch of the SSL pre-trained model. On the other hand, we propose a
domain-adaptive fine-tuning approach based on the PL technique with three
unique modifications: Firstly, we design a dual-branch PL method to decrease
the sensitivity to the erroneous pseudo-labels; Secondly, we devise an
uncertainty-aware confidence filtering strategy to improve pseudo-label
correctness; Thirdly, we introduce a two-step PL approach to incorporate target
domain linguistic knowledge, thus generating more accurate target domain
pseudo-labels. Experimental results on various cross-domain scenarios
demonstrate that the proposed approach effectively boosts the cross-domain
performance and significantly outperforms previous approaches.
Comment: Accepted by IEEE/ACM Transactions on Audio, Speech and Language
Processing (TASLP), 202
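The paper's uncertainty-aware confidence filtering is only named in the abstract; a common baseline realization, assumed here as a stand-in, thresholds the maximum predicted probability of each pseudo-label:

```python
import numpy as np

def filter_pseudo_labels(probs, threshold=0.9):
    """Keep only pseudo-labels whose top-class probability clears a
    confidence threshold; the rest are discarded as unreliable.

    probs: (N, C) array of class probabilities for N unlabeled examples.
    Returns (kept indices, their argmax pseudo-labels).
    """
    confidence = probs.max(axis=1)
    keep = np.where(confidence >= threshold)[0]
    return keep, probs[keep].argmax(axis=1)
```

Raising the threshold trades pseudo-label coverage for correctness, which is the tension such filtering strategies aim to balance.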
Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR
Self-supervised pre-training could effectively improve the performance of
low-resource automatic speech recognition (ASR). However, existing
self-supervised pre-training is task-agnostic, i.e., it can be applied to
various downstream tasks. Although this enlarges its scope of application,
the capacity of the pre-trained model is not fully utilized for the ASR task,
and the learned representations may not be optimal for ASR. In this work, in
order to build a better pre-trained model for low-resource ASR, we propose a
pre-training approach called wav2vec-S, where we use task-specific
semi-supervised pre-training to refine the self-supervised pre-trained model
for the ASR task, thus more effectively utilizing the capacity of the pre-trained
model to generate task-specific representations for ASR. Experiments show that
compared to wav2vec 2.0, wav2vec-S only requires a marginal increment of
pre-training time but could significantly improve ASR performance on in-domain,
cross-domain and cross-lingual datasets. Average relative WER reductions are
24.5% and 6.6% for 1h and 10h fine-tuning, respectively. Furthermore, we show
that semi-supervised pre-training could close the representation gap between
the self-supervised pre-trained model and the corresponding fine-tuned model
through canonical correlation analysis.
Comment: Accepted by Interspeech 202
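Canonical correlation analysis between two representation matrices can be computed from orthonormal bases of the centered activations; the numpy sketch below (not the paper's code) returns the mean canonical correlation, which approaches 1 when the two representations are related by an invertible linear map:

```python
import numpy as np

def cca_similarity(X, Y, eps=1e-8):
    """Mean canonical correlation between representations X (N, d1)
    and Y (N, d2) computed on the same N inputs."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Orthonormal bases of each view's column space (drop near-null directions).
    Ux, Sx, _ = np.linalg.svd(X, full_matrices=False)
    Uy, Sy, _ = np.linalg.svd(Y, full_matrices=False)
    Ux = Ux[:, Sx > eps]
    Uy = Uy[:, Sy > eps]
    # Canonical correlations are the singular values of the basis overlap.
    rho = np.linalg.svd(Ux.T @ Uy, compute_uv=False)
    return float(rho.mean())
```

A shrinking gap between pre-trained and fine-tuned activations would show up as this score moving toward 1 across layers.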
Irregular Traffic Time Series Forecasting Based on Asynchronous Spatio-Temporal Graph Convolutional Network
Accurate traffic forecasting at intersections governed by intelligent traffic
signals is critical for the advancement of an effective intelligent traffic
signal control system. However, due to the irregular traffic time series
produced by intelligent intersections, the traffic forecasting task becomes
much more intractable and imposes three major new challenges: 1) asynchronous
spatial dependency, 2) irregular temporal dependency among traffic data, and 3)
variable-length sequence to be predicted, which severely impede the performance
of current traffic forecasting methods. To this end, we propose an Asynchronous
Spatio-tEmporal graph convolutional nEtwoRk (ASeer) to predict the traffic
states of the lanes entering intelligent intersections in a future time window.
Specifically, by linking lanes via a traffic diffusion graph, we first propose
an Asynchronous Graph Diffusion Network to model the asynchronous spatial
dependency between the time-misaligned traffic state measurements of lanes.
After that, to capture the temporal dependency within irregular traffic state
sequences, a learnable personalized time encoding is devised to embed the
continuous time for each lane. Then we propose a Transformable Time-aware
Convolution Network that learns meta-filters to derive time-aware convolution
filters with transformable filter sizes for efficient temporal convolution on
the irregular sequence. Furthermore, a Semi-Autoregressive Prediction Network
consisting of a state evolution unit and a semi-autoregressive predictor is
designed to effectively and efficiently predict variable-length traffic state
sequences. Extensive experiments on two real-world datasets demonstrate the
effectiveness of ASeer on six metrics.
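The abstract does not give the form of the learnable personalized time encoding; a Time2Vec-style embedding, with one dimension linear in the timestamp and the rest periodic, is one common realization and is assumed here purely for illustration (w and b would be learned per lane):

```python
import numpy as np

def time_encoding(t, w, b):
    """Time2Vec-style embedding of a continuous timestamp t.

    The first output dimension is linear in t; the remaining ones are
    sinusoidal, so irregular gaps between observations map to a smooth
    embedding. w, b: frequency and phase vectors of equal length.
    """
    z = w * t + b
    return np.concatenate(([z[0]], np.sin(z[1:])))
```

Because the encoding is a function of continuous time rather than of a discrete step index, it applies unchanged to the irregularly sampled sequences described above.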
FIXED: Frustratingly Easy Domain Generalization with Mixup
Domain generalization (DG) aims to learn a generalizable model from multiple
training domains such that it can perform well on unseen target domains. A
popular strategy is to augment training data to benefit generalization through
methods such as Mixup~\cite{zhang2018mixup}. While the vanilla Mixup can be
directly applied, theoretical and empirical investigations uncover several
shortcomings that limit its performance. Firstly, Mixup cannot effectively
identify the domain and class information that can be used for learning
invariant representations. Secondly, Mixup may introduce synthetic noisy data
points via random interpolation, which lowers its discrimination capability.
Based on the analysis, we propose a simple yet effective enhancement for
Mixup-based DG, namely domain-invariant Feature mIXup (FIX). It learns
domain-invariant representations for Mixup. To further enhance discrimination,
we leverage existing techniques to enlarge margins among classes to further
propose the domain-invariant Feature MIXup with Enhanced Discrimination (FIXED)
approach. We present theoretical insights about guarantees on its
effectiveness. Extensive experiments on seven public datasets across two
modalities including image classification (Digits-DG, PACS, Office-Home) and
time series (DSADS, PAMAP2, UCI-HAR, and USC-HAD) demonstrate that our approach
significantly outperforms nine state-of-the-art related methods, beating the
best performing baseline by 6.5\% on average in terms of test accuracy. Code is
available at:
https://github.com/jindongwang/transferlearning/tree/master/code/deep/fixed
Comment: First Conference on Parsimony and Learning (CPAL) 2024; code for DG
at: https://github.com/jindongwang/transferlearning/tree/master/code/DeepD
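The vanilla Mixup that FIXED builds on interpolates both inputs and one-hot labels with a Beta-distributed weight. FIXED instead mixes domain-invariant features, which requires the learned feature extractor, so only the vanilla operation is sketched here:

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Vanilla Mixup: convex combination of two examples and their
    one-hot labels, with weight lam ~ Beta(alpha, alpha)."""
    if rng is None:
        rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2
```

The shortcomings listed above stem from this operation mixing raw inputs blindly: random interpolation across domains and classes can produce noisy synthetic points, which is what mixing domain-invariant features is meant to avoid.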
- …