Quantized Distillation: Optimizing Driver Activity Recognition Models for Resource-Constrained Environments
Deep learning-based models are at the forefront of most driver observation
benchmarks due to their remarkable accuracies but are also associated with high
computational costs. This is challenging, as resources are often limited in
real-world driving scenarios. This paper introduces a lightweight framework for
resource-efficient driver activity recognition. The framework enhances 3D
MobileNet, a neural architecture optimized for speed in video classification,
by incorporating knowledge distillation and model quantization to balance model
accuracy and computational efficiency. Knowledge distillation helps maintain
accuracy while reducing the model size by leveraging soft labels from a larger
teacher model (I3D), instead of relying solely on original ground truth data.
Model quantization significantly lowers memory and computation demands by using
lower precision integers for model weights and activations. Extensive testing
on a public dataset for in-vehicle monitoring during autonomous driving
demonstrates that this new framework achieves a threefold reduction in model
size and a 1.4-fold improvement in inference time, compared to an already
optimized architecture. The code for this study is available at
https://github.com/calvintanama/qd-driver-activity-reco.
Comment: Accepted at IROS 202
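As a rough, self-contained illustration of the two compression techniques this abstract combines, the PyTorch sketch below pairs a soft-label distillation loss with post-training dynamic quantization. It is not the authors' implementation; the temperature, the loss weighting, and the quantized layer set are assumptions.

```python
# Hedged sketch: soft-label knowledge distillation plus post-training
# quantization, as described in the abstract. Hyperparameters (T, alpha)
# and the quantized layer set are assumptions, not the paper's values.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend soft-label loss from the teacher with hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# After training, dynamic quantization stores weights as int8 integers,
# cutting model size and speeding up CPU inference:
# student = torch.quantization.quantize_dynamic(
#     student, {torch.nn.Linear}, dtype=torch.qint8)
```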
OAFuser: Towards Omni-Aperture Fusion for Light Field Semantic Segmentation of Road Scenes
Light field cameras can provide rich angular and spatial information to
enhance image semantic segmentation for scene understanding in the field of
autonomous driving. However, the extensive angular information of light field
cameras contains a large amount of redundant data, which is overwhelming for
the limited hardware resource of intelligent vehicles. Besides, inappropriate
compression leads to information corruption and data loss. To excavate
representative information, we propose an Omni-Aperture Fusion model (OAFuser),
which leverages dense context from the central view and discovers the angular
information from sub-aperture images to generate a semantically-consistent
result. To avoid feature loss during network propagation and simultaneously
streamline the redundant information from the light field camera, we present a
simple yet very effective Sub-Aperture Fusion Module (SAFM) to embed
sub-aperture images into angular features without any additional memory cost.
Furthermore, to address the mismatched spatial information across viewpoints,
we present the Center Angular Rectification Module (CARM), which realizes
feature re-sorting and prevents feature occlusion caused by asymmetric
information. Our
proposed OAFuser achieves state-of-the-art performance on the UrbanLF-Real and
-Syn datasets and sets a new record of 84.93% in mIoU on the UrbanLF-Real
Extended dataset, with a gain of +4.53%. The source code of OAFuser will be
made publicly available at https://github.com/FeiBryantkit/OAFuser.
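For intuition, here is a hedged sketch of the core fusion idea the abstract describes: embedding sub-aperture features into the central-view features through a shared projection and an additive merge. The module name and layer choices below are assumptions; the published SAFM may differ.

```python
# Illustrative sketch of fusing sub-aperture light-field features into the
# central-view feature map; not the published SAFM. The shared 1x1
# projection and mean-based merge are assumptions.
import torch
import torch.nn as nn

class SubApertureFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # One projection shared by all sub-aperture views keeps the
        # parameter count independent of the number of views.
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, center: torch.Tensor, subs: torch.Tensor) -> torch.Tensor:
        # center: (B, C, H, W); subs: (B, N, C, H, W) for N sub-aperture views.
        b, n, c, h, w = subs.shape
        angular = self.proj(subs.reshape(b * n, c, h, w)).reshape(b, n, c, h, w)
        # Additive fusion: no concatenation, so the fused tensor costs no
        # more memory than the central-view features alone.
        return center + angular.mean(dim=1)
```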
TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
Large pre-trained transformers top contemporary semantic segmentation
benchmarks but come with high computational cost and lengthy training. To
lift this constraint, we look at efficient semantic segmentation from the
perspective of comprehensive knowledge distillation and aim to bridge the
gap between multi-source knowledge extraction and transformer-specific
patch embeddings. We put forward the Transformer-based
Knowledge Distillation (TransKD) framework which learns compact student
transformers by distilling both feature maps and patch embeddings of large
teacher transformers, bypassing the long pre-training process and reducing the
FLOPs by >85.0%. Specifically, we propose two fundamental and two optimization
modules: (1) Cross Selective Fusion (CSF) enables knowledge transfer between
cross-stage features via channel attention and feature map distillation within
hierarchical transformers; (2) Patch Embedding Alignment (PEA) performs
dimensional transformation within the patchifying process to facilitate the
patch embedding distillation; (3) Global-Local Context Mixer (GL-Mixer)
extracts both global and local information of a representative embedding; (4)
Embedding Assistant (EA) acts as an intermediate embedding that seamlessly
bridges teacher and student models via the teacher's number of channels.
Experiments
on Cityscapes, ACDC, and NYUv2 datasets show that TransKD outperforms
state-of-the-art distillation frameworks and rivals the time-consuming
pre-training method. Code is available at https://github.com/RuipingL/TransKD.
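As a minimal sketch of the two distillation targets TransKD combines, feature maps and patch embeddings, one could imagine the losses below. The linear projection standing in for PEA's dimensional transformation and the MSE objectives are assumptions, not the paper's exact formulation.

```python
# Hedged sketch: distill both patch embeddings and feature maps from a
# teacher transformer. The projection and MSE losses are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchEmbeddingAlign(nn.Module):
    """Project student patch embeddings to the teacher's channel width."""
    def __init__(self, d_student: int, d_teacher: int):
        super().__init__()
        self.proj = nn.Linear(d_student, d_teacher)

    def forward(self, student_emb, teacher_emb):
        # student_emb: (B, N, d_student); teacher_emb: (B, N, d_teacher)
        return F.mse_loss(self.proj(student_emb), teacher_emb)

def feature_map_loss(student_feat, teacher_feat):
    # Both (B, C, H, W); assumes channel counts already match (a 1x1 conv
    # could align them otherwise). Resize if spatial sizes differ.
    if student_feat.shape[-2:] != teacher_feat.shape[-2:]:
        student_feat = F.interpolate(
            student_feat, size=teacher_feat.shape[-2:],
            mode="bilinear", align_corners=False)
    return F.mse_loss(student_feat, teacher_feat)
```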
360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View
Seeing only a tiny part of the whole does not reveal the full circumstances.
Bird's-eye-view (BEV) perception, a process of obtaining allocentric maps from
egocentric views, is restricted when using a narrow Field of View (FoV) alone.
In this work, mapping from 360° panoramas to BEV semantics, the 360BEV
task, is established for the first time to achieve holistic representations of
indoor scenes in a top-down view. Instead of relying on narrow-FoV image
sequences, a panoramic image with depth information is sufficient to generate a
holistic BEV semantic map. To benchmark 360BEV, we present two indoor datasets,
360BEV-Matterport and 360BEV-Stanford, both of which include egocentric
panoramic images and semantic segmentation labels, as well as allocentric
semantic maps. Besides delving deep into different mapping paradigms, we
propose a dedicated solution for panoramic semantic mapping, namely 360Mapper.
Through extensive experiments, our methods achieve 44.32% and 45.78% mIoU on
the two datasets, respectively, surpassing previous counterparts with gains of
+7.60% and +9.70% in mIoU. Code and datasets are available at the project page:
https://jamycheung.github.io/360BEV.html.
Comment: Accepted to WACV 202
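To make the mapping concrete, here is a rough NumPy sketch of the geometric core of panorama-to-BEV projection: lift each equirectangular pixel to 3D using its depth, then bin the ground-plane coordinates into an allocentric grid. Camera conventions, grid size, and cell resolution are assumptions; this is not the 360Mapper architecture.

```python
# Hedged sketch of panorama + depth -> top-down semantic grid.
# Conventions (y up, z forward) and grid parameters are assumptions.
import numpy as np

def panorama_to_bev(depth, labels, grid_size=200, cell=0.05):
    """depth, labels: (H, W) equirectangular maps -> (grid, grid) BEV labels."""
    h, w = depth.shape
    # Spherical angles for every pixel of the equirectangular image.
    lon = (np.arange(w) / w - 0.5) * 2 * np.pi   # azimuth in [-pi, pi)
    lat = (0.5 - np.arange(h) / h) * np.pi       # elevation in (-pi/2, pi/2]
    lon, lat = np.meshgrid(lon, lat)
    # Back-project each pixel to 3D camera coordinates.
    x = depth * np.cos(lat) * np.sin(lon)        # right
    z = depth * np.cos(lat) * np.cos(lon)        # forward
    # Bin ground-plane coordinates into the allocentric top-down grid.
    u = np.clip((x / cell).astype(int) + grid_size // 2, 0, grid_size - 1)
    v = np.clip((z / cell).astype(int) + grid_size // 2, 0, grid_size - 1)
    bev = np.zeros((grid_size, grid_size), dtype=labels.dtype)
    bev[v, u] = labels  # last write wins; real systems aggregate smarter
    return bev
```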