13 research outputs found

    ModSelect: Automatic Modality Selection for Synthetic-to-Real Domain Generalization

    Modality selection is an important step when designing multimodal systems, especially in the case of cross-domain activity recognition, as certain modalities are more robust to domain shift than others. However, selecting only the modalities that make a positive contribution requires a systematic approach. We tackle this problem by proposing an unsupervised modality selection method (ModSelect), which does not require any ground-truth labels. We determine the correlation between the predictions of multiple unimodal classifiers and the domain discrepancy between their embeddings. Then, we systematically compute modality selection thresholds, which select only modalities with high correlation and low domain discrepancy. Our experiments show that ModSelect selects only modalities with positive contributions and consistently improves performance on a Synthetic-to-Real domain adaptation benchmark, narrowing the domain gap. Comment: 14 pages, 6 figures, accepted at the ECCV 2022 OOD workshop.
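
    To make the selection rule concrete, the sketch below illustrates the idea in plain Python: each modality gets an agreement score (a simple proxy for the prediction correlation) and a linear-kernel MMD as the domain discrepancy, and only modalities above/below mean-based thresholds are kept. All names and the exact statistics are illustrative assumptions, not the authors' released code.

    import numpy as np

    def mmd_linear(emb_src, emb_tgt):
        """Linear-kernel Maximum Mean Discrepancy between two embedding sets."""
        return float(np.sum((emb_src.mean(axis=0) - emb_tgt.mean(axis=0)) ** 2))

    def modselect(preds, emb_src, emb_tgt):
        """preds: dict modality -> (N,) predicted class ids on unlabeled target clips.
        emb_src / emb_tgt: dict modality -> (N, D) source / target embeddings."""
        mods = list(preds)
        # Mean pairwise agreement of each modality's predictions with the others
        # (stands in for the prediction correlation used by the method).
        corr = {m: np.mean([(preds[m] == preds[o]).mean() for o in mods if o != m])
                for m in mods}
        # Per-modality domain discrepancy between source and target embeddings.
        disc = {m: mmd_linear(emb_src[m], emb_tgt[m]) for m in mods}
        # Thresholds are computed from the statistics themselves (no labels needed).
        corr_thr = np.mean(list(corr.values()))
        disc_thr = np.mean(list(disc.values()))
        return [m for m in mods if corr[m] >= corr_thr and disc[m] <= disc_thr]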

    Multimodal Generation of Novel Action Appearances for Synthetic-to-Real Recognition of Activities of Daily Living

    Domain shifts, such as appearance changes, are a key challenge in real-world applications of activity recognition models, which range from assistive robotics and smart homes to driver observation in intelligent vehicles. For example, while simulations are an economical way to collect data, a Synthetic-to-Real domain shift leads to a > 60% drop in accuracy when recognizing Activities of Daily Living (ADLs). We tackle this challenge and introduce an activity domain generation framework that creates novel ADL appearances (novel domains) from different existing activity modalities (source domains) inferred from video training data. Our framework computes human poses, heatmaps of body joints, and optical flow maps and uses them alongside the original RGB videos to learn the essence of the source domains and generate completely new ADL domains. The model is optimized by maximizing the distance between the existing source appearances and the generated novel appearances while ensuring that the semantics of an activity are preserved through an additional classification loss. While source data multimodality is an important concept in this design, our framework does not rely on multi-sensor hardware (i.e., all source modalities are inferred from a single video). The newly created activity domains are then integrated into the training of the ADL classification networks, resulting in models far less susceptible to changes in data distributions. Extensive experiments on the Synthetic-to-Real benchmark Sims4Action demonstrate the potential of the domain generation paradigm for cross-domain ADL recognition, setting new state-of-the-art results. Our code is publicly available at https://github.com/Zrrr1997/syn2real_DG. Comment: 8 pages, 7 figures, to be published in IROS 202
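
    A rough sketch of the training objective described above, under assumed tensor shapes and names (not the released syn2real_DG code): the generator is rewarded for moving away from every existing source appearance, while a classification loss on the generated clip preserves the activity label.

    import torch
    import torch.nn.functional as F

    def domain_generation_loss(generated, source_views, logits, labels, lambda_cls=1.0):
        """generated: (B, C, T, H, W) novel-domain clips from the generator.
        source_views: list of (B, C, T, H, W) tensors rendered from the source
        modalities (RGB, pose, joint heatmaps, optical flow).
        logits: activity classifier output on `generated`; labels: (B,) class ids."""
        # Maximize the distance to the existing appearances (minimize its negative).
        dist = torch.stack([F.mse_loss(generated, v) for v in source_views]).mean()
        # Keep the semantics of the activity intact.
        cls = F.cross_entropy(logits, labels)
        return -dist + lambda_cls * cls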

    Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments

    Deep learning-based models are at the forefront of most driver observation benchmarks due to their remarkable accuracy but are also associated with high computational costs. This is challenging, as resources are often limited in real-world driving scenarios. This paper introduces a lightweight framework for resource-efficient driver activity recognition. The framework enhances 3D MobileNet, a neural architecture optimized for speed in video classification, by incorporating knowledge distillation and model quantization to balance model accuracy and computational efficiency. Knowledge distillation helps maintain accuracy while reducing the model size by leveraging soft labels from a larger teacher model (I3D) instead of relying solely on the original ground-truth data. Model quantization significantly lowers memory and computation demands by using lower-precision integers for model weights and activations. Extensive testing on a public dataset for in-vehicle monitoring during autonomous driving demonstrates that this new framework achieves a threefold reduction in model size and a 1.4-fold improvement in inference time, compared to an already optimized architecture. The code for this study is available at https://github.com/calvintanama/qd-driver-activity-reco. Comment: Accepted at IROS 202
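
    The two ingredients can be sketched in a few lines of PyTorch; the model roles below (I3D teacher, 3D MobileNet student) follow the abstract, but the loss weighting, temperature, and quantization call are illustrative assumptions rather than the released configuration.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
        """Blend the KL divergence against the teacher's softened outputs (soft labels)
        with the usual cross-entropy against the ground-truth labels."""
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                        F.softmax(teacher_logits / T, dim=1),
                        reduction="batchmean") * (T * T)
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    # Post-training dynamic quantization of the trained student: weights of the
    # listed layer types are stored and computed in int8 (hypothetical model name).
    # quantized_student = torch.quantization.quantize_dynamic(
    #     student_mobilenet3d, {torch.nn.Linear}, dtype=torch.qint8)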

    TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation

    Large pre-trained transformers sit at the top of contemporary semantic segmentation benchmarks but come with high computational cost and lengthy training. To lift this constraint, we look at efficient semantic segmentation from the perspective of comprehensive knowledge distillation and aim to bridge the gap between multi-source knowledge extraction and transformer-specific patch embeddings. We put forward the Transformer-based Knowledge Distillation (TransKD) framework, which learns compact student transformers by distilling both the feature maps and the patch embeddings of large teacher transformers, bypassing the long pre-training process and reducing the FLOPs by >85.0%. Specifically, we propose two fundamental and two optimization modules: (1) Cross Selective Fusion (CSF) enables knowledge transfer between cross-stage features via channel attention and feature map distillation within hierarchical transformers; (2) Patch Embedding Alignment (PEA) performs dimensional transformation within the patchifying process to facilitate patch embedding distillation; (3) Global-Local Context Mixer (GL-Mixer) extracts both global and local information of a representative embedding; (4) Embedding Assistant (EA) acts as an embedding method to seamlessly bridge teacher and student models with the teacher's number of channels. Experiments on the Cityscapes, ACDC, and NYUv2 datasets show that TransKD outperforms state-of-the-art distillation frameworks and rivals the time-consuming pre-training method. Code is available at https://github.com/RuipingL/TransKD. Comment: Code is available at https://github.com/RuipingL/TransKD
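
    As an illustration of the patch-embedding branch, here is a minimal sketch of a Patch Embedding Alignment-style module under assumed shapes (it is not the released TransKD implementation): student patch tokens are projected to the teacher's channel width and regressed onto the teacher's tokens.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PatchEmbedAlign(nn.Module):
        """Align one stage's student patch embeddings (B, N, C_s) with the
        teacher's (B, N, C_t) through a learned linear projection."""
        def __init__(self, c_student, c_teacher):
            super().__init__()
            self.proj = nn.Linear(c_student, c_teacher)

        def forward(self, student_tokens, teacher_tokens):
            # Distill towards the (frozen) teacher embeddings.
            return F.mse_loss(self.proj(student_tokens), teacher_tokens.detach())

    # Hypothetical usage: sum the alignment loss over the hierarchical stages and
    # add it to the feature-map distillation and task losses.
    # pea_loss = sum(pea[i](stu_tokens[i], tea_tokens[i]) for i in range(num_stages))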

    Connecting Artificial Brains to Robots in a Comprehensive Simulation Framework: The Neurorobotics Platform

    Combined efforts in the fields of neuroscience, computer science, and biology have made it possible to design biologically realistic models of the brain based on spiking neural networks. For a proper validation of these models, an embodiment in a dynamic and rich sensory environment, where the model is exposed to a realistic sensory-motor task, is needed. Because these brain models are, at the current stage, too complex to meet real-time constraints, they cannot be embedded in a real-world task; the embodiment has to be simulated as well. While adequate tools exist to simulate either complex neural networks or robots and their environments, there is so far no tool that makes it easy to establish communication between brain and body models. The Neurorobotics Platform is a new web-based environment that aims to fill this gap by offering scientists and technology developers a software infrastructure that allows them to connect brain models to detailed simulations of robot bodies and environments and to use the resulting neurorobotic systems for in silico experimentation. To simplify the workflow and reduce the level of programming skill required, the platform provides editors for the specification of experimental sequences and conditions, environments, robots, and brain–body connectors. In addition, a variety of existing robots and environments are provided. This work presents the architecture of the first release of the Neurorobotics Platform, developed in subproject 10 “Neurorobotics” of the Human Brain Project (HBP). In its current state, the Neurorobotics Platform allows researchers to design and run basic experiments in neurorobotics using simulated robots and simulated environments linked to simplified versions of brain models. We illustrate the capabilities of the platform with three example experiments: a Braitenberg task implemented on a mobile robot, a sensory-motor learning task based on a robotic controller, and a visual tracking task embedding a retina model on the iCub humanoid robot. These use cases allow us to assess the applicability of the Neurorobotics Platform to robotic tasks as well as to neuroscientific experiments. The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 604102 (Human Brain Project) and from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 720270 (HBP SGA1).
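
    The brain–body connector idea can be pictured as a closed co-simulation loop. The toy Python below is purely conceptual (it does not use the Neurorobotics Platform API; all objects and names are hypothetical) and mirrors the Braitenberg use case: sensor readings are encoded as stimulation for the spiking network, and the network's output rates are decoded into motor commands.

    from dataclasses import dataclass

    @dataclass
    class MotorCommand:
        left_wheel: float
        right_wheel: float

    def sensors_to_stimuli(light_left: float, light_right: float) -> dict:
        """Encode light-sensor intensities as input rates for two stimulus populations."""
        return {"left_in": 100.0 * light_left, "right_in": 100.0 * light_right}

    def rates_to_motors(rate_left: float, rate_right: float) -> MotorCommand:
        """Braitenberg-style coupling: activity on one side drives the opposite wheel."""
        return MotorCommand(left_wheel=0.01 * rate_right, right_wheel=0.01 * rate_left)

    def simulation_step(robot, brain):
        """One tick of the loop; `robot` and `brain` are hypothetical simulator handles."""
        stim = sensors_to_stimuli(*robot.read_light_sensors())
        rates = brain.step(stim)  # advance the spiking network by one control period
        robot.apply(rates_to_motors(rates["left_out"], rates["right_out"]))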