117 research outputs found

    Less is More: Focus Attention for Efficient DETR

    Full text link
    DETR-like models have significantly boosted the performance of detectors and even outperformed classical convolutional models. However, treating all tokens equally, without discrimination, imposes a redundant computational burden in the traditional encoder structure. Recent sparsification strategies exploit a subset of informative tokens to reduce attention complexity while maintaining performance through a sparse encoder, but these methods tend to rely on unreliable model statistics. Moreover, simply reducing the token population hinders detection performance to a large extent, limiting the application of these sparse models. We propose Focus-DETR, which focuses attention on more informative tokens for a better trade-off between computational efficiency and model accuracy. Specifically, we reconstruct the encoder with dual attention, which includes a token scoring mechanism that considers both the localization and the category semantic information of the objects from multi-scale feature maps. We efficiently abandon the background queries and enhance the semantic interaction of the fine-grained object queries based on these scores. Compared with state-of-the-art sparse DETR-like detectors under the same setting, our Focus-DETR achieves comparable complexity while reaching 50.4 AP (+2.2) on COCO. The code is available at https://github.com/huawei-noah/noah-research/tree/master/Focus-DETR and https://gitee.com/mindspore/models/tree/master/research/cv/Focus-DETR. (8 pages, 6 figures; accepted to ICCV 2023.)
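The core idea of focusing attention on informative tokens can be sketched as follows: score every token, keep only the top fraction, and pass just those to the attention layers. This is a minimal illustration under assumptions, not the authors' implementation; the linear scoring head (`w_score`) and the keep ratio are hypothetical stand-ins for the paper's localization- and semantics-aware scoring mechanism.

```python
import numpy as np

def focus_tokens(tokens, w_score, keep_ratio=0.3):
    """Keep only the highest-scoring tokens before the attention layers.

    tokens:  (N, d) array of multi-scale feature tokens
    w_score: (d,) weights of a hypothetical linear scoring head
    Returns the kept tokens and their original indices.
    """
    scores = tokens @ w_score                 # one scalar score per token
    k = max(1, int(len(tokens) * keep_ratio)) # how many tokens to keep
    keep = np.argsort(scores)[-k:]            # indices of the top-k tokens
    return tokens[keep], keep

rng = np.random.default_rng(0)
tokens = rng.normal(size=(100, 16))  # 100 tokens, 16-dim features
w = rng.normal(size=16)
kept, idx = focus_tokens(tokens, w, keep_ratio=0.3)
print(kept.shape)  # (30, 16)
```

Attention cost scales quadratically with the token count, so pruning 70% of the tokens this way cuts the encoder's attention work to roughly a tenth.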

    One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation

    Full text link
    Knowledge distillation (KD) has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme. However, most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family, particularly the hint-based approaches. By using centered kernel alignment (CKA) to compare the features learned by heterogeneous teacher and student models, we observe significant feature divergence. This divergence illustrates the ineffectiveness of previous hint-based methods in cross-architecture distillation. To tackle the challenge of distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework, called OFA-KD, which significantly improves distillation performance between heterogeneous architectures. Specifically, we project intermediate features into an aligned latent space, such as the logits space, where architecture-specific information is discarded. Additionally, we introduce an adaptive target enhancement scheme to prevent the student from being disturbed by irrelevant information. Extensive experiments with various architectures, including CNNs, Transformers, and MLPs, demonstrate the superiority of our OFA-KD framework in enabling distillation between heterogeneous architectures. Specifically, when equipped with OFA-KD, student models achieve notable performance improvements, with a maximum gain of 8.0% on the CIFAR-100 dataset and 0.7% on the ImageNet-1K dataset. PyTorch code and checkpoints can be found at https://github.com/Hao840/OFAKD
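The key step, projecting a student's intermediate features into the architecture-agnostic logits space and matching the teacher's softened distribution there, can be sketched as below. This is a hedged illustration, not the paper's exact code: the linear projector `W_proj` and the temperature value are assumptions standing in for the framework's projection head.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax, computed stably."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ofa_style_loss(student_feat, W_proj, teacher_logits, T=4.0):
    """Project a student's intermediate feature into the logits space
    (where architecture-specific layout is discarded) and match the
    teacher's softened distribution with a KL divergence.
    W_proj is a hypothetical linear projector, not the paper's exact head."""
    student_logits = student_feat @ W_proj
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL(teacher || student), averaged over the batch
    return float(np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)))

rng = np.random.default_rng(1)
feat = rng.normal(size=(8, 32))   # student intermediate features (batch of 8)
W = rng.normal(size=(32, 100))    # projector into a 100-class logits space
t_logits = rng.normal(size=(8, 100))
loss = ofa_style_loss(feat, W, t_logits)
print(loss >= 0.0)  # True: KL divergence is non-negative
```

Because both models end up compared in the shared logits space, the loss is well defined even when the teacher is a Transformer and the student a CNN or MLP.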

    Data Upcycling Knowledge Distillation for Image Super-Resolution

    Full text link
    Knowledge distillation (KD) has emerged as a challenging yet promising technique for compressing deep learning models, characterized by the transfer of rich learned representations from proficient, computationally intensive teacher models to compact student models. However, only a handful of studies have attempted to compress models for single image super-resolution (SISR) through KD, and their effect on student model enhancement remains marginal. In this paper, we put forth an approach from the perspective of efficient data utilization, namely Data Upcycling Knowledge Distillation (DUKD), which transfers the teacher model's prior knowledge to the student through upcycled in-domain data derived from the training inputs. This upcycling process is realized through two efficient image zooming operations and invertible data augmentations, which introduce label-consistency regularization to the field of KD for SISR and substantially boost the student model's generalization. Owing to its versatility, DUKD can be applied across a broad spectrum of teacher-student architectures. Comprehensive experiments across diverse benchmarks demonstrate that our proposed DUKD method significantly outperforms previous art, exemplified by an increase of up to 0.5 dB in PSNR over baseline methods, and by an RCAN student model with 67% fewer parameters whose performance remains on par with that of the RCAN teacher model.
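The label-consistency idea behind invertible augmentations can be sketched in a few lines: apply an invertible transform (a horizontal flip here) to the input, run the model, invert the transform on the output, and penalize any disagreement with the plain output. The "model" below is a deliberately toy 2x nearest-neighbour upsampler, an assumption for illustration only, not an SR network from the paper.

```python
import numpy as np

def toy_sr(model_weight, lr_img):
    """Stand-in 'super-resolution model': nearest-neighbour 2x zoom
    followed by a per-pixel scaling. Not a real SR network."""
    hr = lr_img.repeat(2, axis=0).repeat(2, axis=1)
    return hr * model_weight

def consistency_loss(model_weight, lr_img):
    """Label-consistency regularization with an invertible augmentation:
    a horizontal flip commutes with upsampling, so the flipped input's
    output, flipped back, should match the plain output."""
    out = toy_sr(model_weight, lr_img)
    out_aug = toy_sr(model_weight, lr_img[:, ::-1])[:, ::-1]  # flip, run, unflip
    return float(np.mean((out - out_aug) ** 2))

rng = np.random.default_rng(2)
lr = rng.normal(size=(4, 4))
print(consistency_loss(1.0, lr))  # 0.0 -- this toy model is flip-equivariant
```

For a real student network the loss is generally nonzero, and minimizing it regularizes the student toward equivariance on the upcycled data, no ground-truth HR label required.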

    Forecasting Regional Carbon Prices in China Based on Secondary Decomposition and a Hybrid Kernel-Based Extreme Learning Machine

    No full text
    Accurately forecasting carbon prices is key to managing the associated risks in the carbon financial market. However, the traditional strategy does not adequately decompose carbon prices, and a kernel extreme learning machine (KELM) with a single kernel function struggles to adapt to the nonlinearity, nonstationarity, and multiple frequencies of regional carbon prices in China. This study constructs a model, called the VMD-ICEEMDAN-RE-SSA-HKELM model, to forecast regional carbon prices in China based on the idea of ‘decomposition–reconstruction–integration’. Variational mode decomposition (VMD) is first used to decompose carbon prices, and improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) is then used to decompose the residual term, which contains complex information. To reduce the systematic error caused by the growing number of mode components of the carbon price, range entropy (RE) is used to reconstruct the results of the secondary decomposition. Following this, a hybrid kernel extreme learning machine (HKELM) is optimized by the sparrow search algorithm (SSA) and used to forecast each subseries of carbon prices. Finally, predictions of the carbon price are obtained by linearly superimposing the forecasts of its subseries. Experimental results show that the secondary decomposition strategy proposed in this paper is superior to the traditional decomposition strategy, and that the proposed model for forecasting carbon prices has significant advantages over a reference group of comparison models.
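The "hybrid kernel" in an HKELM is conventionally a weighted mix of a local kernel (RBF) and a global kernel (polynomial), which is what lets it handle both high-frequency detail and broad trends. A minimal sketch of that construction follows; the mixing weight and kernel hyperparameters here are illustrative placeholders for the values the sparrow search algorithm would tune.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Local (RBF) kernel: sensitive to nearby samples."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def poly_kernel(X, Y, degree=2, c=1.0):
    """Global (polynomial) kernel: captures broad trends."""
    return (X @ Y.T + c) ** degree

def hybrid_kernel(X, Y, w=0.6, gamma=0.5, degree=2, c=1.0):
    """Weighted mix of a local and a global kernel, the usual construction
    behind a hybrid-kernel ELM. The weight w and the hyperparameters would
    be tuned by an optimizer such as the sparrow search algorithm; the
    values here are illustrative."""
    return w * rbf_kernel(X, Y, gamma) + (1 - w) * poly_kernel(X, Y, degree, c)

rng = np.random.default_rng(3)
X = rng.normal(size=(5, 3))   # 5 samples, 3 input features
K = hybrid_kernel(X, X)
print(K.shape)                # (5, 5)
print(np.allclose(K, K.T))    # True: a valid kernel matrix is symmetric
```

A convex combination of valid kernels is itself a valid kernel, so the mixed matrix can be dropped straight into the KELM output-weight solution.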

    Analysis of the REs of VMFs and IMFs.

    No full text
    Accurately predicting the carbon price is crucial for risk avoidance in the carbon financial market. In light of the complex characteristics of regional carbon prices in China, this paper proposes a model to forecast the carbon price based on a multi-factor hybrid kernel-based extreme learning machine (HKELM), combining secondary decomposition and ensemble learning. Variational mode decomposition (VMD) is first used to decompose the carbon price into several modes, and range entropy is then used to reconstruct these modes. The multi-factor HKELM, optimized by the sparrow search algorithm, is used to forecast the reconstructed subsequences, where both the main external factors, innovatively selected by the maximum information coefficient, and historical time-series data on carbon prices are considered as input variables to the forecasting model. Following this, improved complete ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) and range entropy are used, respectively, to decompose and reconstruct the residual term generated by VMD. Finally, a nonlinear ensemble learning method is introduced to determine the predictions of the residual term and the final carbon price. In the empirical analysis of the Guangzhou market, the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) of the model are 0.1716, 0.1218, and 0.0026, respectively. The proposed model outperforms the other comparison models in prediction accuracy. This work extends the research on forecasting theory and methods for predicting the carbon price.
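The three reported error metrics are standard and easy to state precisely; the snippet below computes them on a small made-up series (the numbers are illustrative, not the paper's data). Note that a MAPE reported as 0.0026 is expressed as a fraction rather than a percentage.

```python
import numpy as np

def rmse(y, yhat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - yhat)))

def mape(y, yhat):
    """Mean absolute percentage error, as a fraction
    (matching the ~0.0026 scale reported above)."""
    return float(np.mean(np.abs((y - yhat) / y)))

# Illustrative prices and predictions, not data from the study
y = np.array([40.0, 50.0, 60.0])
yhat = np.array([41.0, 49.0, 60.0])
print(round(rmse(y, yhat), 4))  # 0.8165
print(round(mae(y, yhat), 4))   # 0.6667
print(round(mape(y, yhat), 4))  # 0.015
```

RMSE penalizes large errors more heavily than MAE, while MAPE is scale-free, which is why papers in this area typically report all three together.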

    Implementation steps of ICEEMDAN.

    No full text

    Results of correlation between external factors and carbon price in Hubei.

    No full text
    Results of correlation between external factors and carbon price in Hubei.

    PACF results for the subseries of the carbon price.

    No full text
    PACF results for the subseries of the carbon price.
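The PACF analysis referenced in this figure record is the standard tool for choosing how many lagged prices feed each subseries model. A minimal sketch of computing partial autocorrelations via the Durbin-Levinson recursion, demonstrated on a synthetic AR(1) series rather than real carbon-price data, follows.

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelations r[0..nlags] (biased estimator)."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(nlags + 1)])

def pacf(x, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, nlags)
    phi = np.zeros((nlags + 1, nlags + 1))
    out = np.zeros(nlags + 1)
    out[0] = 1.0
    for k in range(1, nlags + 1):
        num = r[k] - np.dot(phi[k - 1, 1:k], r[1:k][::-1])
        den = 1.0 - np.dot(phi[k - 1, 1:k], r[1:k])
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        out[k] = phi[k, k]
    return out

rng = np.random.default_rng(4)
# Synthetic AR(1) series: only the lag-1 partial autocorrelation
# should be large, so a PACF plot would suggest using one lag.
x = np.zeros(2000)
for t in range(1, 2000):
    x[t] = 0.7 * x[t - 1] + rng.normal()
p = pacf(x, 5)
print(abs(p[1] - 0.7) < 0.1)                  # lag-1 PACF near the AR coefficient
print(all(abs(v) < 0.15 for v in p[2:]))      # higher lags near zero
```

Lags whose PACF values fall outside the confidence band are the ones kept as autoregressive inputs for each subseries.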