NCGNN: Node-level Capsule Graph Neural Network
Message passing has evolved as an effective tool for designing Graph Neural
Networks (GNNs). However, most existing works naively sum or average all the
neighboring features to update node representations, which suffers from the
following limitations: (1) lack of interpretability to identify crucial node
features for GNN's prediction; (2) over-smoothing issue where repeated
averaging aggregates excessive noise, making features of nodes in different
classes over-mixed and thus indistinguishable. In this paper, we propose the
Node-level Capsule Graph Neural Network (NCGNN) to address these issues with an
improved message passing scheme. Specifically, NCGNN represents nodes as groups
of capsules, in which each capsule extracts distinctive features of its
corresponding node. For each node-level capsule, a novel dynamic routing
procedure is developed to adaptively select appropriate capsules for
aggregation from a subgraph identified by the designed graph filter.
Consequently, as only the advantageous capsules are aggregated and harmful
noise is restrained, over-mixing features of interacting nodes in different
classes tends to be avoided to relieve the over-smoothing issue. Furthermore,
since the graph filter and the dynamic routing identify a subgraph and a subset
of node features that are most influential for the prediction of the model,
NCGNN is inherently interpretable and exempt from complex post-hoc
explanations. Extensive experiments on six node classification benchmarks
demonstrate that NCGNN effectively addresses the over-smoothing issue and
outperforms the state of the art by producing better node embeddings for
classification.
Towards Accurate One-Stage Object Detection with AP-Loss
One-stage object detectors are trained by optimizing classification-loss and
localization-loss simultaneously, with the former suffering from an extreme
foreground-background class imbalance due to the large number of anchors.
This paper alleviates this issue by proposing a novel framework to replace the
classification task in one-stage detectors with a ranking task, and adopting
the Average-Precision loss (AP-loss) for the ranking problem. Due to its
non-differentiability and non-convexity, the AP-loss cannot be optimized
directly. For this purpose, we develop a novel optimization algorithm, which
seamlessly combines the error-driven update scheme in perceptron learning and
the backpropagation algorithm in deep networks. We verify the good convergence
property of the proposed algorithm theoretically and empirically. Experimental results
demonstrate notable performance improvement in state-of-the-art one-stage
detectors based on AP-loss over different kinds of classification-losses on
various benchmarks, without changing the network architectures. Code is
available at https://github.com/cccorn/AP-loss.
Comment: 13 pages, 7 figures, 4 tables, main paper + supplementary material,
accepted to CVPR 201
Frequency-Aware Transformer for Learned Image Compression
Learned image compression (LIC) has gained traction as an effective solution
for image storage and transmission in recent years. However, existing LIC
methods are redundant in latent representation due to limitations in capturing
anisotropic frequency components and preserving directional details. To
overcome these challenges, we propose a novel frequency-aware transformer (FAT)
block that for the first time achieves multiscale directional analysis for
LIC. The FAT block comprises frequency-decomposition window attention (FDWA)
modules to capture multiscale and directional frequency components of natural
images. Additionally, we introduce a frequency-modulation feed-forward network
(FMFFN) to adaptively modulate different frequency components, improving
rate-distortion performance. Furthermore, we present a transformer-based
channel-wise autoregressive (T-CA) model that effectively exploits channel
dependencies. Experiments show that our method achieves state-of-the-art
rate-distortion performance compared to existing LIC methods, and clearly
outperforms the latest standardized codec VTM-12.1 by 14.5%, 15.1%, and 13.0% in
BD-rate on the Kodak, Tecnick, and CLIC datasets, respectively.
Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners
Representation learning has been evolving from traditional supervised
training to Contrastive Learning (CL) and Masked Image Modeling (MIM). Previous
works have demonstrated their pros and cons in specific scenarios, i.e., CL and
supervised pre-training excel at capturing longer-range global patterns and
enabling better feature discrimination, while MIM can introduce more local and
diverse attention across all transformer layers. In this paper, we explore how
to obtain a model that combines their strengths. We start by examining previous
feature distillation and mask feature reconstruction methods and identify their
limitations. We find that their increasing diversity mainly derives from the
asymmetric designs, but these designs may in turn compromise the discrimination
ability. In order to better obtain both discrimination and diversity, we
propose a simple but effective Hybrid Distillation strategy, which utilizes
both the supervised/CL teacher and the MIM teacher to jointly guide the student
model. Hybrid Distill imitates the token relations of the MIM teacher to
alleviate attention collapse, and distills the feature maps of the
supervised/CL teacher to enable discrimination. Furthermore, a progressive
redundant token masking strategy is used to reduce the distillation
cost and avoid falling into local optima. Experimental results show that Hybrid
Distill can achieve superior performance on different benchmarks.