32 research outputs found

    NCGNN: Node-level Capsule Graph Neural Network

    Full text link
    Message passing has evolved as an effective tool for designing Graph Neural Networks (GNNs). However, most existing works naively sum or average all the neighboring features to update node representations, which suffers from the following limitations: (1) lack of interpretability to identify crucial node features for GNN's prediction; (2) over-smoothing issue where repeated averaging aggregates excessive noise, making features of nodes in different classes over-mixed and thus indistinguishable. In this paper, we propose the Node-level Capsule Graph Neural Network (NCGNN) to address these issues with an improved message passing scheme. Specifically, NCGNN represents nodes as groups of capsules, in which each capsule extracts distinctive features of its corresponding node. For each node-level capsule, a novel dynamic routing procedure is developed to adaptively select appropriate capsules for aggregation from a subgraph identified by the designed graph filter. Consequently, as only the advantageous capsules are aggregated and harmful noise is restrained, over-mixing features of interacting nodes in different classes tends to be avoided to relieve the over-smoothing issue. Furthermore, since the graph filter and the dynamic routing identify a subgraph and a subset of node features that are most influential for the prediction of the model, NCGNN is inherently interpretable and exempt from complex post-hoc explanations. Extensive experiments on six node classification benchmarks demonstrate that NCGNN can well address the over-smoothing issue and outperforms the state of the arts by producing better node embeddings for classification

    Towards Accurate One-Stage Object Detection with AP-Loss

    Full text link
    One-stage object detectors are trained by optimizing classification-loss and localization-loss simultaneously, with the former suffering much from extreme foreground-background class imbalance issue due to the large number of anchors. This paper alleviates this issue by proposing a novel framework to replace the classification task in one-stage detectors with a ranking task, and adopting the Average-Precision loss (AP-loss) for the ranking problem. Due to its non-differentiability and non-convexity, the AP-loss cannot be optimized directly. For this purpose, we develop a novel optimization algorithm, which seamlessly combines the error-driven update scheme in perceptron learning and backpropagation algorithm in deep networks. We verify good convergence property of the proposed algorithm theoretically and empirically. Experimental results demonstrate notable performance improvement in state-of-the-art one-stage detectors based on AP-loss over different kinds of classification-losses on various benchmarks, without changing the network architectures. Code is available at https://github.com/cccorn/AP-loss.Comment: 13 pages, 7 figures, 4 tables, main paper + supplementary material, accepted to CVPR 201

    Frequency-Aware Transformer for Learned Image Compression

    Full text link
    Learned image compression (LIC) has gained traction as an effective solution for image storage and transmission in recent years. However, existing LIC methods are redundant in latent representation due to limitations in capturing anisotropic frequency components and preserving directional details. To overcome these challenges, we propose a novel frequency-aware transformer (FAT) block that for the first time achieves multiscale directional ananlysis for LIC. The FAT block comprises frequency-decomposition window attention (FDWA) modules to capture multiscale and directional frequency components of natural images. Additionally, we introduce frequency-modulation feed-forward network (FMFFN) to adaptively modulate different frequency components, improving rate-distortion performance. Furthermore, we present a transformer-based channel-wise autoregressive (T-CA) model that effectively exploits channel dependencies. Experiments show that our method achieves state-of-the-art rate-distortion performance compared to existing LIC methods, and evidently outperforms latest standardized codec VTM-12.1 by 14.5%, 15.1%, 13.0% in BD-rate on the Kodak, Tecnick, and CLIC datasets

    Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners

    Full text link
    Representation learning has been evolving from traditional supervised training to Contrastive Learning (CL) and Masked Image Modeling (MIM). Previous works have demonstrated their pros and cons in specific scenarios, i.e., CL and supervised pre-training excel at capturing longer-range global patterns and enabling better feature discrimination, while MIM can introduce more local and diverse attention across all transformer layers. In this paper, we explore how to obtain a model that combines their strengths. We start by examining previous feature distillation and mask feature reconstruction methods and identify their limitations. We find that their increasing diversity mainly derives from the asymmetric designs, but these designs may in turn compromise the discrimination ability. In order to better obtain both discrimination and diversity, we propose a simple but effective Hybrid Distillation strategy, which utilizes both the supervised/CL teacher and the MIM teacher to jointly guide the student model. Hybrid Distill imitates the token relations of the MIM teacher to alleviate attention collapse, as well as distills the feature maps of the supervised/CL teacher to enable discrimination. Furthermore, a progressive redundant token masking strategy is also utilized to reduce the distilling costs and avoid falling into local optima. Experiment results prove that Hybrid Distill can achieve superior performance on different benchmarks
    corecore