Stratified Transfer Learning for Cross-domain Activity Recognition
In activity recognition, it is often expensive and time-consuming to acquire
sufficient activity labels. To solve this problem, transfer learning leverages
labeled samples from the source domain to annotate a target domain that has
few or no labels. Existing approaches typically learn a
global domain shift while ignoring the intra-affinity between classes, which
will hinder the performance of the algorithms. In this paper, we propose a
novel and general cross-domain learning framework that can exploit the
intra-affinity of classes to perform intra-class knowledge transfer. The
proposed framework, referred to as Stratified Transfer Learning (STL), can
dramatically improve the classification accuracy for cross-domain activity
recognition. Specifically, STL first obtains pseudo labels for the target
domain via a majority voting technique. Then, it performs intra-class knowledge
transfer iteratively to transform both domains into the same subspaces.
Finally, the labels of the target domain are obtained via a second annotation. To
evaluate the performance of STL, we conduct comprehensive experiments on three
large public activity recognition datasets (OPPORTUNITY, PAMAP2, and UCI
DSADS), which demonstrate that STL significantly outperforms other
state-of-the-art methods in classification accuracy (an improvement of 7.68%).
Furthermore, we extensively investigate the performance of STL across different
degrees of similarity and activity levels between domains, and we discuss the
potential of STL in other pervasive computing applications to provide empirical
guidance for future research.
Comment: 10 pages; accepted by IEEE PerCom 2018; full paper (camera-ready version).
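Read as an algorithm, the abstract outlines three steps: vote for pseudo labels, align classes, re-annotate. The sketch below is a loose, hypothetical rendition of that pipeline; the choice of voters, the per-class mean shift standing in for the paper's intra-class subspace transformation, and all names are assumptions, not the authors' implementation. Integer-coded labels are assumed for the voting step.

```python
# Hedged sketch of an STL-style pipeline; simplifications, not the paper's code.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

def majority_vote_pseudo_labels(Xs, ys, Xt):
    """Step 1: pseudo-label the target domain by majority voting."""
    votes = np.stack([clf.fit(Xs, ys).predict(Xt)
                      for clf in (KNeighborsClassifier(5), SVC(),
                                  RandomForestClassifier())])
    # per-sample majority over the voters (assumes integer labels)
    return np.array([np.bincount(col).argmax() for col in votes.T])

def per_class_align(Xs, ys, Xt, yt_pseudo):
    """Step 2 (simplified): shift each pseudo-class of the target onto the
    mean of the matching source class, a stand-in for the paper's
    intra-class subspace transformation."""
    Xt = np.array(Xt, dtype=float)
    for c in np.unique(ys):
        src, tgt = Xs[ys == c], yt_pseudo == c
        if tgt.any():
            Xt[tgt] += src.mean(axis=0) - Xt[tgt].mean(axis=0)
    return Xt

def stl_sketch(Xs, ys, Xt, n_iters=3):
    yt = majority_vote_pseudo_labels(Xs, ys, Xt)
    for _ in range(n_iters):  # iterative intra-class transfer
        Xt_aligned = per_class_align(Xs, ys, Xt, yt)
        # Step 3: second annotation on the aligned features
        yt = KNeighborsClassifier(1).fit(Xs, ys).predict(Xt_aligned)
    return yt
```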
Audio-Adaptive Activity Recognition Across Video Domains
This paper strives for activity recognition under domain shift, for example
caused by change of scenery or camera viewpoint. The leading approaches reduce
the shift in activity appearance by adversarial training and self-supervised
learning. Different from these vision-focused works, we leverage activity sounds
for domain adaptation as they have less variance across domains and can
reliably indicate which activities are not happening. We propose an
audio-adaptive encoder and associated learning methods that discriminatively
adjust the visual feature representation and address shifts in the
semantic distribution. To further eliminate domain-specific features and
include domain-invariant activity sounds for recognition, an audio-infused
recognizer is proposed, which effectively models the cross-modal interaction
across domains. We also introduce the new task of actor shift, with a
corresponding audio-visual dataset, to challenge our method with situations
where the activity appearance changes dramatically. Experiments on this
dataset, EPIC-Kitchens and CharadesEgo show the effectiveness of our approach.
Comment: Accepted at CVPR 2022.
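A minimal, speculative illustration of one idea in this abstract, that audio can reliably rule activities out, is to let audio class probabilities mask the visual logits before fusion. This is not the paper's audio-adaptive encoder; the masking rule, the `floor` threshold, and all names are assumptions (keeping `floor` below 1/n_classes guarantees at least one class survives the mask).

```python
# Hypothetical audio veto over visual predictions; not the paper's model.
import torch

def audio_masked_logits(visual_logits, audio_probs, floor=0.05):
    """visual_logits: (batch, n_classes) scores from a video encoder.
    audio_probs: (batch, n_classes) class plausibilities from audio."""
    plausible = audio_probs > floor            # audio says "could be happening"
    # push implausible classes toward -inf so the softmax ignores them
    return visual_logits.masked_fill(~plausible, float('-inf'))

v = torch.randn(2, 8)                          # toy visual logits
a = torch.softmax(torch.randn(2, 8), dim=-1)   # toy audio probabilities
print(torch.softmax(audio_masked_logits(v, a), dim=-1))
```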
Cross-Domain HAR: Few Shot Transfer Learning for Human Activity Recognition
The ubiquitous availability of smartphones and smartwatches with integrated
inertial measurement units (IMUs) enables straightforward capturing of human
activities. For specific applications of sensor based human activity
recognition (HAR), however, logistical challenges and burgeoning costs make
ground-truth annotation of such data especially difficult,
resulting in limited scale and diversity of datasets. Transfer learning, i.e.,
leveraging publicly available labeled datasets to first learn useful
representations that can then be fine-tuned using limited amounts of labeled
data from a target domain, can alleviate some of the performance issues of
contemporary HAR systems. Yet such approaches can fail when the differences
between source and target conditions are too large and/or only a few samples
from a target application domain are available, both of which are typical challenges in
real-world human activity recognition scenarios. In this paper, we present an
approach for economical use of publicly available labeled HAR datasets for
effective transfer learning. We introduce a novel transfer learning framework,
Cross-Domain HAR, which follows the teacher-student self-training paradigm to
more effectively recognize activities with very limited label information. It
bridges conceptual gaps between source and target domains, including sensor
locations and types of activities. Through our extensive experimental evaluation
on a range of benchmark datasets, we demonstrate the effectiveness of our
approach for practically relevant few-shot activity recognition scenarios. We
also present a detailed analysis of how the individual components of our
framework affect downstream performance.
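The teacher-student self-training paradigm the abstract names can be condensed into a generic loop: a teacher trained on source data plus the few target labels pseudo-labels the unlabeled target data, and a student is retrained on the confident subset. The model class, threshold, and round count below are placeholders, not the Cross-Domain HAR configuration.

```python
# Generic teacher-student self-training loop; hyperparameters are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_src, y_src, X_few, y_few, X_unlab,
               rounds=3, conf_thresh=0.9):
    # teacher: fit on source data plus the few labeled target samples
    model = LogisticRegression(max_iter=1000)
    model.fit(np.vstack([X_src, X_few]), np.concatenate([y_src, y_few]))
    for _ in range(rounds):
        probs = model.predict_proba(X_unlab)
        keep = probs.max(axis=1) >= conf_thresh   # confident pseudo-labels only
        student = LogisticRegression(max_iter=1000)
        student.fit(np.vstack([X_few, X_unlab[keep]]),
                    np.concatenate([y_few, probs[keep].argmax(axis=1)]))
        model = student                           # student becomes the teacher
    return model
```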
ContrasGAN: unsupervised domain adaptation in Human Activity Recognition via adversarial and contrastive learning
Human Activity Recognition (HAR) makes it possible to drive applications directly from embedded and wearable sensors. Machine learning, and especially deep learning, has made significant progress in learning sensor features from raw sensing signals with high recognition accuracy. However, most techniques need to be trained on a large labelled dataset, which is often difficult to acquire. In this paper, we present ContrasGAN, an unsupervised domain adaptation technique that addresses this labelling challenge by transferring an activity model from one labelled domain to other unlabelled domains. ContrasGAN uses bi-directional generative adversarial networks for heterogeneous feature transfer and contrastive learning to capture distinctive features between classes. We evaluate ContrasGAN on three commonly-used HAR datasets under conditions of cross-body, cross-user, and cross-sensor transfer learning. Experimental results show superior performance of ContrasGAN on all these tasks over a number of state-of-the-art techniques, with relatively low computational cost.
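Of ContrasGAN's two ingredients, the contrastive one is the easier to sketch: a standard supervised contrastive loss that pulls same-class embeddings together and pushes other classes apart. The form below is the generic loss, assuming each class appears at least twice per batch; the bi-directional GAN feature transfer is omitted, and nothing here is the authors' code.

```python
# Generic supervised contrastive loss (sketch); assumes every class in the
# batch has at least one other sample of the same class.
import torch
import torch.nn.functional as F

def sup_contrastive_loss(feats, labels, temperature=0.1):
    z = F.normalize(feats, dim=1)
    sim = z @ z.t() / temperature                      # cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # log-probability of each sample against all others except itself
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float('-inf')), dim=1, keepdim=True)
    return -log_prob[pos].mean()

loss = sup_contrastive_loss(torch.randn(16, 64), torch.randint(0, 4, (16,)))
```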
Leveraging Smartphone Sensor Data for Human Activity Recognition
Using smartphones for human activity recognition (HAR) has a wide range of applications, including healthcare, daily fitness recording, and alerting in anomalous situations. This study focuses on human activity recognition based on smartphone-embedded sensors. The proposed human activity recognition system recognizes activities including walking, running, sitting, going upstairs, and going downstairs. Embedded sensors (a tri-axial accelerometer and a gyroscope) are employed for motion data collection. Both time-domain and frequency-domain features are extracted and analyzed. Our experimental results show that time-domain features are sufficient to recognize basic human activities. The system is implemented on the Android smartphone platform.
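As a concrete illustration of the kind of time- and frequency-domain features such a system extracts per sensor window, here is a small sketch; the exact feature set and the 50 Hz sampling rate are assumptions rather than the study's specification.

```python
# Illustrative per-window features for a tri-axial accelerometer.
import numpy as np

def window_features(acc, fs=50.0):
    """acc: (n_samples, 3) accelerometer window; fs: sampling rate in Hz."""
    feats = {}
    for i, axis in enumerate('xyz'):
        sig = acc[:, i]
        # time-domain features
        feats[f'mean_{axis}'] = sig.mean()
        feats[f'std_{axis}'] = sig.std()
        feats[f'range_{axis}'] = sig.max() - sig.min()
        # frequency-domain features from the magnitude spectrum
        spec = np.abs(np.fft.rfft(sig - sig.mean()))
        freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
        feats[f'dom_freq_{axis}'] = freqs[spec.argmax()]   # dominant frequency
        p = spec / (spec.sum() + 1e-12)
        feats[f'spec_entropy_{axis}'] = -(p * np.log(p + 1e-12)).sum()
    return feats

print(window_features(np.random.randn(128, 3)))
```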
While the focus has been on human activity recognition systems based on a supervised learning approach, an incremental clustering algorithm is also investigated. The proposed unsupervised (clustering) activity detection scheme works in an incremental manner and contains two stages. In the first stage, streamed sensor data are processed by a single-pass clustering algorithm to generate pre-clustered results for the next stage. In the second stage, the pre-clustered results are refined to form the final clusters, i.e., the clusters are built incrementally by adding one cluster at a time. Experiments on smartphone sensor data of five basic human activities show that the proposed scheme achieves results comparable to traditional clustering algorithms while working in a streaming, incremental manner.
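The first stage can be pictured as "leader" clustering: each streamed sample joins the nearest existing cluster if it is close enough, otherwise it founds a new one. The sketch below is a minimal stand-in for the single-pass algorithm, with an assumed distance threshold; the second-stage refinement is not shown.

```python
# Minimal single-pass (leader) clustering for streamed sensor windows.
import numpy as np

class SinglePassClusterer:
    def __init__(self, radius=2.0):
        self.radius = radius          # assumed distance threshold
        self.centers, self.counts = [], []

    def partial_fit(self, x):
        if self.centers:
            dists = np.linalg.norm(np.array(self.centers) - x, axis=1)
            j = int(dists.argmin())
            if dists[j] <= self.radius:
                # running-mean update keeps the pass truly single
                self.counts[j] += 1
                self.centers[j] += (x - self.centers[j]) / self.counts[j]
                return j
        self.centers.append(np.array(x, dtype=float))
        self.counts.append(1)
        return len(self.centers) - 1

clust = SinglePassClusterer()
for x in np.random.randn(200, 6):     # simulated feature stream
    clust.partial_fit(x)
print(len(clust.centers), 'pre-clusters')
```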
In order to develop activity recognition systems that are more accurate and independent of smartphone models, the effects of sensor differences across various smartphone models are investigated. We characterize the impairments that different smartphone-embedded sensor models introduce into HAR applications, and propose outlier removal, interpolation, and filtering in the pre-processing stage as mitigating techniques. Based on datasets collected from four distinct smartphones, the proposed techniques show positive effects on 10-fold cross-validation, device-to-device validation, and leave-one-out validation, yielding improved performance for smartphone-based human activity recognition.
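The three mitigation steps named above map naturally onto a short pre-processing routine. The sketch below is one plausible arrangement, assuming SciPy and placeholder parameters (clip level, target rate, cutoff); the actual settings may differ.

```python
# Sketch of outlier removal, interpolation, and filtering for one channel.
import numpy as np
from scipy.signal import butter, filtfilt

def harmonize(t, sig, target_fs=50.0, cutoff_hz=15.0, z_clip=4.0):
    """t: timestamps in seconds; sig: one raw sensor channel."""
    # 1) outlier removal: clip samples beyond z_clip standard deviations
    mu, sd = sig.mean(), sig.std()
    sig = np.clip(sig, mu - z_clip * sd, mu + z_clip * sd)
    # 2) interpolation onto a uniform grid at a device-independent rate
    t_uniform = np.arange(t[0], t[-1], 1.0 / target_fs)
    sig = np.interp(t_uniform, t, sig)
    # 3) low-pass filtering to suppress device-specific high-frequency noise
    b, a = butter(4, cutoff_hz / (target_fs / 2), btype='low')
    return t_uniform, filtfilt(b, a, sig)
```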
By combining the supervised recognition systems, the clustering-based incremental recognition scheme with its potential applications, and the techniques for alleviating sensor-difference effects, a robust human activity recognition system can be trained in either a supervised or an unsupervised way and adapted to multiple devices, with less dependence on specific sensor characteristics.
Domain generalization through audio-visual relative norm alignment in first person action recognition
First-person action recognition is becoming an increasingly researched area thanks to the rising popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. Indeed, the information extracted from learned representations suffers from an intrinsic "environmental bias". This strongly affects the ability to generalize to unseen scenarios, limiting the application of current methods to real settings where labeled data are not available during training. In this work, we introduce the first domain generalization approach for egocentric activity recognition, proposing a new audio-visual loss, called Relative Norm Alignment loss, which rebalances the contributions of the two modalities during training, over different domains, by aligning their feature-norm representations. Our approach leads to strong results in domain generalization on both EPIC-Kitchens-55 and EPIC-Kitchens-100, as demonstrated by extensive experiments, and can be extended to domain adaptation settings with competitive results.
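Taking the abstract at its word, the Relative Norm Alignment loss can be sketched as a penalty on the ratio of mean feature norms between the two streams; treat the exact form below as an assumption consistent with the description, not the paper's definition.

```python
# One plausible form of a relative norm alignment penalty.
import torch

def rna_loss(audio_feats, visual_feats):
    """audio_feats, visual_feats: (batch, dim) embeddings."""
    a = audio_feats.norm(dim=1).mean()    # mean L2 norm of the audio stream
    v = visual_feats.norm(dim=1).mean()   # mean L2 norm of the visual stream
    return (a / v - 1.0) ** 2             # zero when the two norms balance

loss = rna_loss(torch.randn(8, 256), torch.randn(8, 256))
```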
ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization
Modality selection is an important step when designing multimodal systems, especially for cross-domain activity recognition, as certain modalities are more robust to domain shift than others. However, selecting only the modalities that make a positive contribution requires a systematic approach. We tackle this problem by proposing an unsupervised modality selection method (ModSelect) that does not require any ground-truth labels. We determine the correlation between the predictions of multiple unimodal classifiers and the domain discrepancy between their embeddings. We then systematically compute modality selection thresholds that select only modalities with high correlation and low domain discrepancy. Our experiments show that ModSelect chooses only modalities with positive contributions and consistently improves performance on a Synthetic-to-Real domain adaptation benchmark, narrowing the domain gap.
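The selection rule reads as two tests per modality: its unimodal predictions should agree with the other modalities, and its source and target embeddings should stay close. A schematic version is below; the linear-kernel MMD proxy and both thresholds are placeholders rather than the systematically computed values the abstract refers to.

```python
# Schematic ModSelect-style filter; thresholds are illustrative only.
import numpy as np

def mmd_linear(X, Y):
    """Cheap domain-discrepancy proxy: squared distance of domain means."""
    return float(np.sum((X.mean(axis=0) - Y.mean(axis=0)) ** 2))

def modselect(preds, src_emb, tgt_emb, corr_thresh=0.3, disc_thresh=1.0):
    """preds: {modality: (n,) predicted labels};
    src_emb / tgt_emb: {modality: (n, d) embeddings per domain}."""
    keep = []
    for m in preds:
        others = [preds[o] for o in preds if o != m]
        corr = np.mean([np.corrcoef(preds[m], o)[0, 1] for o in others])
        disc = mmd_linear(src_emb[m], tgt_emb[m])
        if corr >= corr_thresh and disc <= disc_thresh:
            keep.append(m)              # agrees with the rest, small shift
    return keep
```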
Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living
Domain shifts, such as appearance changes, are a key challenge in real-world
applications of activity recognition models, which range from assistive
robotics and smart homes to driver observation in intelligent vehicles. For
example, while simulations are an excellent way of economical data collection,
a Synthetic-to-Real domain shift leads to a more than 60% drop in accuracy when
recognizing Activities of Daily Living (ADLs). We tackle this challenge and
introduce an activity domain generation framework which creates novel ADL
appearances (novel domains) from different existing activity modalities (source
domains) inferred from video training data. Our framework computes human poses,
heatmaps of body joints, and optical flow maps and uses them alongside the
original RGB videos to learn the essence of source domains in order to generate
completely new ADL domains. The model is optimized by maximizing the distance
between the existing source appearances and the generated novel appearances
while ensuring that the semantics of an activity are preserved through an
additional classification loss. While source data multimodality is an important
concept in this design, our setup does not rely on multi-sensor setups (i.e.,
all source modalities are inferred from a single video). The newly created
activity domains are then integrated into the training of the ADL classification
networks, resulting in models far less susceptible to changes in data
distributions. Extensive experiments on the Synthetic-to-Real benchmark
Sims4Action demonstrate the potential of the domain generation paradigm for
cross-domain ADL recognition, setting new state-of-the-art results. Our code is
publicly available at https://github.com/Zrrr1997/syn2real_DG
Comment: 8 pages, 7 figures, to be published in IROS 2022.
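The optimization the abstract describes combines two opposing pressures: generated appearances should move away from the source domains while a classifier still recognizes the activity. A hedged sketch of such a combined objective follows; the distance choice, the weighting `lam`, and all names are assumptions, not the released code at the URL above.

```python
# Sketch of a novelty-vs-semantics objective for domain generation.
import torch
import torch.nn.functional as F

def domain_generation_loss(gen_feats, src_feats, logits, labels, lam=1.0):
    """gen_feats / src_feats: (batch, dim) appearance features;
    logits: classifier output on the generated appearance; labels: (batch,)."""
    novelty = -F.mse_loss(gen_feats, src_feats)   # maximize distance to source
    semantics = F.cross_entropy(logits, labels)   # keep the activity readable
    return novelty + lam * semantics
```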