Concurrence-Aware Long Short-Term Sub-Memories for Person-Person Action Recognition
Recently, Long Short-Term Memory (LSTM) has become a popular choice to model
individual dynamics for single-person action recognition because it can model
temporal information across dynamic contexts of varying range.
However, existing RNN models only focus on capturing the temporal dynamics of
the person-person interactions by naively combining the activity dynamics of
individuals or modeling them as a whole. This neglects the inter-related
dynamics of how person-person interactions change over time. To this end, we
propose a novel Concurrence-Aware Long Short-Term Sub-Memories (Co-LSTSM) to
model the long-term inter-related dynamics between two interacting people on
the bounding boxes covering people. Specifically, for each frame, two
sub-memory units store individual motion information, while a concurrent LSTM
unit selectively integrates and stores inter-related motion information between
interacting people from these two sub-memory units via a new co-memory cell.
Experimental results on the BIT and UT datasets show the superiority of
Co-LSTSM over state-of-the-art methods.
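The co-memory idea described above, where a concurrent unit selectively integrates information from two sub-memory units, can be illustrated with a rough sketch. This is not the paper's formulation: the abstract does not give the update equations, so the function name `co_memory_step`, the additive update, and the gate inputs below are all assumptions chosen only to show gated fusion of two per-person memories.

```python
import math


def sigmoid(x):
    """Logistic function used to squash gate pre-activations into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def co_memory_step(c_prev, m_a, m_b, gate_a, gate_b, forget):
    """Hypothetical co-memory update for one frame (element-wise).

    c_prev : previous co-memory state
    m_a    : sub-memory content for person A
    m_b    : sub-memory content for person B
    gate_a, gate_b, forget : gate pre-activations (assumed to be
        produced elsewhere, e.g. by learned linear layers)
    """
    return [
        sigmoid(f) * c                      # keep part of the old co-memory
        + sigmoid(ga) * math.tanh(a)        # admit person A's motion
        + sigmoid(gb) * math.tanh(b)        # admit person B's motion
        for c, a, b, ga, gb, f in zip(c_prev, m_a, m_b, gate_a, gate_b, forget)
    ]
```

With person B's gate driven strongly negative, the update depends almost entirely on person A's sub-memory, which is the kind of selective integration the abstract describes.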
Learning Contrastive Self-Distillation for Ultra-Fine-Grained Visual Categorization Targeting Limited Samples
In the field of intelligent multimedia analysis, ultra-fine-grained visual
categorization (Ultra-FGVC) plays a vital role in distinguishing intricate
subcategories within broader categories. However, this task is inherently
challenging due to the complex granularity of category subdivisions and the
limited availability of data for each category. To address these challenges,
this work proposes CSDNet, a pioneering framework that effectively explores
contrastive learning and self-distillation to learn discriminative
representations specifically designed for Ultra-FGVC tasks. CSDNet comprises
three main modules: Subcategory-Specific Discrepancy Parsing (SSDP), Dynamic
Discrepancy Learning (DDL), and Subcategory-Specific Discrepancy Transfer
(SSDT), which collectively enhance the generalization of deep models across
instance, feature, and logit prediction levels. To increase the diversity of
training samples, the SSDP module introduces augmented samples from different
viewpoints to spotlight subcategory-specific discrepancies. Simultaneously, the
proposed DDL module stores historical intermediate features in a dynamic memory
queue, which optimizes the feature learning space through iterative contrastive
learning. Furthermore, the SSDT module applies a novel
self-distillation paradigm at the logit prediction level of raw and augmented
samples, which effectively distills more subcategory-specific discrepancy
knowledge from the inherent structure of limited training data without
requiring additional annotations. Experimental results demonstrate that CSDNet
outperforms current state-of-the-art Ultra-FGVC methods, emphasizing its
powerful efficacy and adaptability in addressing Ultra-FGVC tasks.
Comment: The first two authors contributed equally to this work.
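Self-distillation at the logit level, as described for the SSDT module, is commonly implemented as a KL divergence between softened class distributions. The sketch below shows that generic loss between the logits of a raw sample and its augmented view. The direction of distillation (raw as target), the temperature value, and the function names are assumptions; the abstract does not specify them.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def self_distillation_loss(raw_logits, aug_logits, temperature=2.0):
    """KL(p_raw || p_aug) over softened class distributions.

    Assumption: the raw-sample prediction acts as the target and the
    augmented-sample prediction is pulled toward it.
    """
    log_p = log_softmax(raw_logits, temperature)
    log_q = log_softmax(aug_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

The loss is zero when both views produce the same logits and grows as their predicted distributions diverge, so no labels beyond the training images themselves are needed.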
ADPS: Asymmetric Distillation Post-Segmentation for Image Anomaly Detection
Knowledge Distillation-based Anomaly Detection (KDAD) methods rely on the
teacher-student paradigm to detect and segment anomalous regions by contrasting
the unique features extracted by both networks. However, existing KDAD methods
suffer from two main limitations: 1) the student network can effortlessly
replicate the teacher network's representations, and 2) the features of the
teacher network serve solely as a "reference standard" and are not fully
leveraged. Toward this end, we depart from the established paradigm and instead
propose an innovative approach called Asymmetric Distillation Post-Segmentation
(ADPS). Our ADPS employs an asymmetric distillation paradigm that takes
distinct forms of the same image as the input of the teacher-student networks,
driving the student network to learn discriminating representations for
anomalous regions.
Meanwhile, a customized Weight Mask Block (WMB) is proposed to generate a
coarse anomaly localization mask that transfers the distilled knowledge
acquired from the asymmetric paradigm to the teacher network. Equipped with
WMB, the proposed Post-Segmentation Module (PSM) is able to effectively detect
and segment abnormal regions with fine structures and clear boundaries.
Experimental results demonstrate that the proposed ADPS outperforms the
state-of-the-art methods in detecting and segmenting anomalies. Surprisingly,
ADPS significantly improves the Average Precision (AP) metric by 9% and 20% on the
MVTec AD and KolektorSDD2 datasets, respectively.
Comment: 11 pages, 9 figures
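For context, the baseline KDAD idea of contrasting teacher and student features can be sketched as a per-location discrepancy map. This is a generic illustration of that baseline, not the ADPS method: the asymmetric inputs, the Weight Mask Block, and the Post-Segmentation Module are not reproduced here, and the use of cosine distance and the function names are assumptions.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def anomaly_map(teacher_feats, student_feats):
    """Per-location discrepancy between two H x W grids of feature vectors.

    Higher values mean the student failed to match the teacher there,
    which is read as evidence of an anomalous region.
    """
    return [
        [cosine_distance(t, s) for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher_feats, student_feats)
    ]
```

Thresholding such a map gives a coarse localization; the abstract's first stated limitation is that a student which replicates the teacher too easily also matches it on anomalous regions, which flattens this map.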