Coarse Temporal Attention Network (CTA-Net) for Driver’s Activity Recognition
There is significant progress in recognizing traditional human activities
from videos focusing on highly distinctive actions involving discriminative
body movements, body-object and/or human-human interactions. Driver's
activities are different since they are executed by the same subject with
similar body-part movements, resulting in subtle changes. To address this, we
propose a novel framework that exploits spatiotemporal attention to model
the subtle changes. Our model is named Coarse Temporal Attention Network
(CTA-Net), in which coarse temporal branches are introduced in a trainable
glimpse network. The goal is to allow the glimpse to capture high-level
temporal relationships, such as 'during', 'before' and 'after' by focusing on a
specific part of a video. These branches also respect the topology of the
temporal dynamics in the video, ensuring that different branches learn
meaningful spatial and temporal changes. The model then uses an innovative
attention mechanism to generate high-level action-specific contextual
information for activity recognition by exploring the hidden states of an LSTM.
The attention mechanism helps in learning to decide the importance of each
hidden state for the recognition task by weighing them when constructing the
representation of the video. Our approach is evaluated on four publicly
accessible datasets and outperforms the state-of-the-art by a
considerable margin with only RGB video as input.
Comment: Extended version of the accepted WACV 202
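
The attention step described in the abstract, weighing LSTM hidden states to build the video representation, can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the dot-product scoring below, the function name `attention_pool`, and the `score_weights` vector are assumptions made for illustration only, not the CTA-Net implementation.

```python
import math


def attention_pool(hidden_states, score_weights):
    """Weight each hidden state by a softmax-normalised score and sum them.

    hidden_states: list of per-time-step vectors (e.g. LSTM outputs).
    score_weights: hypothetical scoring vector; in a trained model it
                   would be learned, here it is supplied by the caller.
    """
    # One scalar score per time step (assumed dot-product scoring).
    scores = [sum(h_d * w_d for h_d, w_d in zip(h, score_weights))
              for h in hidden_states]

    # Numerically stable softmax over time steps.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # Video representation: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, hidden_states))
              for d in range(dim)]
    return alphas, pooled


# Toy input: three time steps of 2-dimensional hidden states.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
score_weights = [1.0, 1.0]
alphas, pooled = attention_pool(hidden_states, score_weights)
```

In this toy input the third time step receives the highest score, so it contributes most to the pooled vector; the weights always sum to one.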