
    Stratified Transfer Learning for Cross-domain Activity Recognition

    In activity recognition, it is often expensive and time-consuming to acquire sufficient activity labels. To solve this problem, transfer learning leverages the labeled samples from the source domain to annotate the target domain, which has few or no labels. Existing approaches typically learn a global domain shift while ignoring the intra-affinity between classes, which hinders performance. In this paper, we propose a novel and general cross-domain learning framework that exploits the intra-affinity of classes to perform intra-class knowledge transfer. The proposed framework, referred to as Stratified Transfer Learning (STL), can dramatically improve classification accuracy for cross-domain activity recognition. Specifically, STL first obtains pseudo labels for the target domain via a majority voting technique. Then, it performs intra-class knowledge transfer iteratively to transform both domains into the same subspaces. Finally, the labels of the target domain are obtained via a second annotation. To evaluate STL, we conduct comprehensive experiments on three large public activity recognition datasets (OPPORTUNITY, PAMAP2, and UCI DSADS), which demonstrate that STL significantly outperforms other state-of-the-art methods in classification accuracy (an improvement of 7.68%). Furthermore, we extensively investigate the performance of STL across different degrees of similarity and activity levels between domains, and we discuss the potential of STL in other pervasive computing applications to provide empirical experience for future research. Comment: 10 pages; accepted by IEEE PerCom 2018; full paper (camera-ready version).
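
    The pipeline the abstract outlines (majority-vote pseudo-labeling, iterative intra-class transfer, second annotation) can be sketched roughly as below. This is a simplified stand-in, assuming scikit-learn-style base classifiers and a per-class mean shift in place of STL's per-class subspace transform; the classifier choices and the n_iters parameter are illustrative, not from the paper.

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.svm import SVC
        from sklearn.tree import DecisionTreeClassifier

        def majority_vote(votes):
            # votes: (n_classifiers, n_samples) array of non-negative integer labels
            return np.array([np.bincount(col).argmax() for col in votes.T])

        def pseudo_label_target(Xs, ys, Xt):
            # Step 1: pseudo-label the target domain by majority voting over
            # several base classifiers trained on the labeled source domain.
            clfs = [KNeighborsClassifier(n_neighbors=5), SVC(), DecisionTreeClassifier()]
            votes = np.stack([c.fit(Xs, ys).predict(Xt) for c in clfs])
            return majority_vote(votes)

        def stl_like(Xs, ys, Xt, n_iters=5):
            # Steps 2-3, heavily simplified: iteratively align target features to the
            # source per class (a mean shift stands in for STL's per-class subspace
            # transform), then re-annotate the transformed target.
            yt = pseudo_label_target(Xs, ys, Xt)
            Xt_aligned = Xt.astype(float).copy()
            for _ in range(n_iters):
                for c in np.unique(ys):
                    if np.any(yt == c):
                        shift = Xs[ys == c].mean(axis=0) - Xt_aligned[yt == c].mean(axis=0)
                        Xt_aligned[yt == c] += shift
                yt = KNeighborsClassifier(n_neighbors=5).fit(Xs, ys).predict(Xt_aligned)
            return yt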

    Audio-Adaptive Activity Recognition Across Video Domains

    This paper strives for activity recognition under domain shift, for example caused by a change of scenery or camera viewpoint. The leading approaches reduce the shift in activity appearance by adversarial training and self-supervised learning. Different from these vision-focused works, we leverage activity sounds for domain adaptation, as they have less variance across domains and can reliably indicate which activities are not happening. We propose an audio-adaptive encoder and associated learning methods that discriminatively adjust the visual feature representation as well as address shifts in the semantic distribution. To further eliminate domain-specific features and include domain-invariant activity sounds for recognition, an audio-infused recognizer is proposed, which effectively models the cross-modal interaction across domains. We also introduce the new task of actor shift, with a corresponding audio-visual dataset, to challenge our method with situations where the activity appearance changes dramatically. Experiments on this dataset, EPIC-Kitchens, and CharadesEgo show the effectiveness of our approach. Comment: accepted at CVPR 2022.
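
    The abstract gives only the high-level idea, but one simple way to act on "audio tells you what is not happening" is to gate visual class scores with per-class audio plausibility. The module below is purely illustrative (the heads, the gating, and the names are assumptions, not the paper's audio-infused recognizer):

        import torch
        import torch.nn as nn

        class AudioGatedRecognizer(nn.Module):
            # Illustrative sketch, not the paper's architecture: per-class audio
            # plausibility gates the visual logits, one simple way to exploit the
            # observation that sound reliably indicates which activities are NOT
            # happening.
            def __init__(self, dim_visual, dim_audio, n_classes):
                super().__init__()
                self.visual_head = nn.Linear(dim_visual, n_classes)
                self.audio_head = nn.Linear(dim_audio, n_classes)

            def forward(self, feat_visual, feat_audio):
                logits_v = self.visual_head(feat_visual)
                p_audio = torch.sigmoid(self.audio_head(feat_audio))
                return logits_v * p_audio  # damp classes the audio deems implausible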

    Cross-Domain HAR: Few Shot Transfer Learning for Human Activity Recognition

    The ubiquitous availability of smartphones and smartwatches with integrated inertial measurement units (IMUs) enables straightforward capturing of human activities. For specific applications of sensor-based human activity recognition (HAR), however, logistical challenges and burgeoning costs render especially the ground-truth annotation of such data a difficult endeavor, resulting in limited scale and diversity of datasets. Transfer learning, i.e., leveraging publicly available labeled datasets to first learn useful representations that can then be fine-tuned using limited amounts of labeled data from a target domain, can alleviate some of the performance issues of contemporary HAR systems. Yet it can fail when the differences between source and target conditions are too large and/or only a few samples from a target application domain are available, both of which are typical challenges in real-world human activity recognition scenarios. In this paper, we present an approach for the economical use of publicly available labeled HAR datasets for effective transfer learning. We introduce a novel transfer learning framework, Cross-Domain HAR, which follows the teacher-student self-training paradigm to more effectively recognize activities with very limited label information. It bridges conceptual gaps between source and target domains, including sensor locations and types of activities. Through an extensive experimental evaluation on a range of benchmark datasets, we demonstrate the effectiveness of our approach for practically relevant few-shot activity recognition scenarios. We also present a detailed analysis of how the individual components of our framework affect downstream performance.
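
    A generic teacher-student self-training loop of the kind the abstract names might look as follows; the confidence threshold tau, the loader interface, and the loss weighting are assumptions rather than the paper's exact recipe.

        import torch
        import torch.nn.functional as F

        def self_train(teacher, student, src_loader, tgt_loader, optimizer,
                       tau=0.9, epochs=10):
            # Generic teacher-student self-training: the teacher, pretrained on the
            # labeled source data, pseudo-labels unlabeled target windows; the
            # student trains on source labels plus confident pseudo-labels.
            # Assumes both loaders yield (inputs, labels); target labels are ignored.
            teacher.eval()
            student.train()
            for _ in range(epochs):
                for (x_s, y_s), (x_t, _) in zip(src_loader, tgt_loader):
                    with torch.no_grad():
                        probs = F.softmax(teacher(x_t), dim=1)
                        conf, pseudo = probs.max(dim=1)
                    keep = conf > tau                       # confidence filtering
                    loss = F.cross_entropy(student(x_s), y_s)
                    if keep.any():
                        loss = loss + F.cross_entropy(student(x_t[keep]), pseudo[keep])
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()
            return student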

    ContrasGAN: unsupervised domain adaptation in Human Activity Recognition via adversarial and contrastive learning

    Human Activity Recognition (HAR) makes it possible to drive applications directly from embedded and wearable sensors. Machine learning, and especially deep learning, has made significant progress in learning sensor features from raw sensing signals with high recognition accuracy. However, most techniques need to be trained on a large labelled dataset, which is often difficult to acquire. In this paper, we present ContrasGAN, an unsupervised domain adaptation technique that addresses this labelling challenge by transferring an activity model from one labelled domain to other unlabelled domains. ContrasGAN uses bi-directional generative adversarial networks for heterogeneous feature transfer and contrastive learning to capture distinctive features between classes. We evaluate ContrasGAN on three commonly-used HAR datasets under conditions of cross-body, cross-user, and cross-sensor transfer learning. Experimental results show a superior performance of ContrasGAN on all these tasks over a number of state-of-the-art techniques, with relatively low computational cost.
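
    The contrastive ingredient described above can be sketched as a supervised-contrastive loss over (pseudo-)labeled embeddings; the bi-directional GAN part of ContrasGAN is not reproduced here, and the temperature and masking details below are assumptions.

        import torch
        import torch.nn.functional as F

        def class_contrastive_loss(features, labels, temperature=0.1):
            # Pull same-class embeddings together and push different classes apart
            # (supervised-contrastive style); ContrasGAN pairs this idea with
            # GAN-based heterogeneous feature transfer, which is omitted here.
            z = F.normalize(features, dim=1)
            sim = z @ z.t() / temperature
            n = z.size(0)
            self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
            pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
            log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float('-inf')),
                                             dim=1, keepdim=True)
            per_anchor = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
            return per_anchor[pos_mask.any(dim=1)].mean()  # anchors with >= 1 positive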

    Leveraging Smartphone Sensor Data for Human Activity Recognition

    Using smartphones for human activity recognition (HAR) has a wide range of applications including healthcare, daily fitness recording, and alerting in anomalous situations. This study focuses on human activity recognition based on smartphone-embedded sensors. The proposed human activity recognition system recognizes activities including walking, running, sitting, going upstairs, and going downstairs. Embedded sensors (a tri-axial accelerometer and a gyroscope) are employed for motion data collection. Both time-domain and frequency-domain features are extracted and analyzed. Our experimental results show that time-domain features are sufficient to recognize basic human activities. The system is implemented on the Android smartphone platform. While the focus has been on human activity recognition systems based on a supervised learning approach, an incremental clustering algorithm is also investigated. The proposed unsupervised (clustering) activity detection scheme works incrementally in two stages. In the first stage, streamed sensor data are processed by a single-pass clustering algorithm to generate pre-clustered results for the next stage. In the second stage, the pre-clustered results are refined to form the final clusters, i.e., the clusters are built incrementally by adding one cluster at a time. Experiments on smartphone sensor data of five basic human activities show that the proposed scheme achieves results comparable to traditional clustering algorithms while working in a streaming and incremental manner. In order to develop activity recognition systems that are more accurate and independent of smartphone models, the effects of sensor differences across various smartphone models are investigated. We characterize the impairments that different smartphone-embedded sensor models introduce into HAR applications, and propose outlier removal, interpolation, and filtering in the pre-processing stage as mitigation techniques. Based on datasets collected from four distinct smartphones, the proposed mitigation techniques show positive effects in 10-fold cross-validation, device-to-device validation, and leave-one-out validation, with improved performance for smartphone-based human activity recognition. By combining the supervised recognition system, the clustering-based incremental recognition scheme, and the techniques for alleviating sensor differences, a robust human activity recognition system can be trained in either a supervised or an unsupervised way and can be adapted to multiple devices while being less dependent on specific sensor characteristics.
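
    As a concrete illustration of the time-domain and frequency-domain features mentioned above, the helper below computes a small hand-crafted feature vector from one window of tri-axial sensor data; the specific statistics and the 50 Hz sampling rate are assumptions, not the thesis's exact feature set.

        import numpy as np

        def window_features(window, fs=50.0):
            # window: (n_samples, 3) array of tri-axial accelerometer or gyroscope
            # readings for one sliding window; fs: assumed sampling rate in Hz.
            feats = []
            for axis in range(window.shape[1]):
                x = window[:, axis]
                # time-domain statistics
                feats += [x.mean(), x.std(), np.abs(x).mean(), x.max() - x.min()]
                # frequency-domain: dominant frequency and spectral energy
                spec = np.abs(np.fft.rfft(x - x.mean()))
                freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
                feats += [freqs[spec.argmax()], (spec ** 2).sum() / len(x)]
            # signal magnitude area across the three axes
            feats.append(np.abs(window).sum(axis=1).mean())
            return np.array(feats)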

    Domain generalization through audio-visual relative norm alignment in first person action recognition

    First person action recognition is becoming an increasingly researched area thanks to the rising popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. Indeed, the information extracted from learned representations suffers from an intrinsic "environmental bias". This strongly affects the ability to generalize to unseen scenarios, limiting the application of current methods to real settings where labeled data are not available during training. In this work, we introduce the first domain generalization approach for egocentric activity recognition, by proposing a new audio-visual loss, called Relative Norm Alignment loss. It rebalances the contributions of the two modalities during training, over different domains, by aligning their feature-norm representations. Our approach leads to strong domain generalization results on both EPIC-Kitchens-55 and EPIC-Kitchens-100, as demonstrated by extensive experiments, and can also be extended to domain adaptation settings with competitive results.
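
    One plausible reading of the Relative Norm Alignment idea is a penalty on the ratio of the mean feature norms of the two modalities; the sketch below reflects that reading and is not a verified re-implementation of the paper's loss.

        import torch

        def relative_norm_alignment_loss(feat_audio, feat_visual, eps=1e-8):
            # Drive the mean L2 norm of the audio features and of the visual
            # features toward each other so that neither modality dominates
            # training; purely an illustrative reading of the abstract.
            norm_a = feat_audio.norm(p=2, dim=1).mean()
            norm_v = feat_visual.norm(p=2, dim=1).mean()
            return (norm_a / (norm_v + eps) - 1.0) ** 2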

    ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization

    Modality selection is an important step when designing multimodal systems, especially in the case of cross-domain activity recognition, as certain modalities are more robust to domain shift than others. However, selecting only the modalities that have a positive contribution requires a systematic approach. We tackle this problem by proposing an unsupervised modality selection method (ModSelect), which does not require any ground-truth labels. We determine the correlation between the predictions of multiple unimodal classifiers and the domain discrepancy between their embeddings. Then, we systematically compute modality selection thresholds, which select only modalities with a high correlation and low domain discrepancy. Our experiments show that ModSelect chooses only modalities with positive contributions and consistently improves performance on a Synthetic-to-Real domain adaptation benchmark, narrowing the domain gap.
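
    A minimal sketch of the selection rule described above, assuming mean pairwise prediction correlation as the agreement measure, the distance between domain means as the discrepancy measure, and the mean over modalities as the threshold (all illustrative choices):

        import numpy as np

        def select_modalities(preds, src_emb, tgt_emb):
            # preds: dict modality -> (n_samples,) predicted class ids
            # src_emb, tgt_emb: dict modality -> (n, d) embeddings per domain
            mods = list(preds)
            # agreement: mean pairwise correlation of a modality's predictions
            # with those of every other modality
            corr = {m: np.mean([np.corrcoef(preds[m], preds[o])[0, 1]
                                for o in mods if o != m]) for m in mods}
            # discrepancy: distance between domain means (an MMD would also fit)
            disc = {m: np.linalg.norm(src_emb[m].mean(0) - tgt_emb[m].mean(0))
                    for m in mods}
            c_thr = np.mean(list(corr.values()))
            d_thr = np.mean(list(disc.values()))
            # keep modalities with high agreement and low domain discrepancy
            return [m for m in mods if corr[m] >= c_thr and disc[m] <= d_thr]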

    Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living

    Domain shifts, such as appearance changes, are a key challenge in real-world applications of activity recognition models, which range from assistive robotics and smart homes to driver observation in intelligent vehicles. For example, while simulations are an excellent way of economical data collection, a Synthetic-to-Real domain shift leads to a > 60% drop in accuracy when recognizing Activities of Daily Living (ADLs). We tackle this challenge and introduce an activity domain generation framework which creates novel ADL appearances (novel domains) from different existing activity modalities (source domains) inferred from video training data. Our framework computes human poses, heatmaps of body joints, and optical flow maps and uses them alongside the original RGB videos to learn the essence of source domains in order to generate completely new ADL domains. The model is optimized by maximizing the distance between the existing source appearances and the generated novel appearances while ensuring that the semantics of an activity are preserved through an additional classification loss. While source-data multimodality is an important concept in this design, our setup does not rely on multi-sensor hardware: all source modalities are inferred from a single video only. The newly created activity domains are then integrated into the training of the ADL classification networks, resulting in models far less susceptible to changes in data distributions. Extensive experiments on the Synthetic-to-Real benchmark Sims4Action demonstrate the potential of the domain generation paradigm for cross-domain ADL recognition, setting new state-of-the-art results. Our code is publicly available at https://github.com/Zrrr1997/syn2real_DG. Comment: 8 pages, 7 figures; to be published in IROS 2022.
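
    The optimization idea in the abstract, i.e. maximize the distance to the existing source appearance while a classification loss preserves the activity semantics, can be sketched as a simple combined objective; the pixel-space distance and the weighting lam below are assumptions, not the paper's formulation.

        import torch.nn.functional as F

        def novel_domain_loss(generated, source, class_logits, labels, lam=1.0):
            # Push the generated appearance away from the existing source appearance
            # (negative reconstruction term), while a classification loss keeps the
            # activity label recognizable in the generated clip.
            novelty = -F.mse_loss(generated, source)
            semantic = F.cross_entropy(class_logits, labels)
            return novelty + lam * semantic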