KernelWarehouse: Towards Parameter-Efficient Dynamic Convolution
Dynamic convolution learns a linear mixture of n static kernels weighted
with their sample-dependent attentions, demonstrating superior performance
compared to normal convolution. However, existing designs are
parameter-inefficient: they increase the number of convolutional parameters by
n times. This and the optimization difficulty have prevented research progress
in dynamic convolution from using a significantly larger value of n (e.g.,
n > 100 instead of the typical setting n < 10) to push forward the
performance boundary. In this paper, we propose KernelWarehouse, a more
general form of dynamic convolution, which can strike a favorable trade-off
between parameter efficiency and representation power. Its key idea is to
redefine the basic concepts of "kernels" and "assembling kernels" in
dynamic convolution from the perspective of reducing kernel dimension and
increasing kernel number significantly. In principle, KernelWarehouse enhances
convolutional parameter dependencies within the same layer and across
successive layers via tactful kernel partition and warehouse sharing, yielding
a high degree of freedom to fit a desired parameter budget. We validate our
method on ImageNet and MS-COCO datasets with different ConvNet architectures,
and show that it attains state-of-the-art results. For instance, the
ResNet18|ResNet50|MobileNetV2|ConvNeXt-Tiny model trained with KernelWarehouse
on ImageNet reaches 76.05%|81.05%|75.52%|82.51% top-1 accuracy. Thanks to its
flexible design, KernelWarehouse can even reduce the model size of a ConvNet
while improving the accuracy, e.g., our ResNet18 model with a 36.45%|65.10%
parameter reduction relative to the baseline shows a 2.89%|2.29% absolute
improvement in top-1 accuracy.
Comment: This research work was completed and submitted in early May 2023.
Code and pre-trained models are available at
https://github.com/OSVAI/KernelWarehouse
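For concreteness, the following minimal PyTorch sketch shows the vanilla dynamic-convolution formulation that KernelWarehouse generalizes: n static kernels are mixed per sample by attention weights predicted from the input. Module and variable names here are illustrative assumptions, not the authors' implementation, which additionally applies kernel partition and warehouse sharing to escape the n-fold parameter cost.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv2d(nn.Module):
    """Vanilla dynamic convolution: a per-sample linear mixture of n static kernels."""
    def __init__(self, in_ch, out_ch, k=3, n_kernels=4):
        super().__init__()
        # n static kernels; note the n-fold parameter cost the paper targets
        self.weight = nn.Parameter(torch.randn(n_kernels, out_ch, in_ch, k, k) * 0.01)
        # lightweight sample-dependent attention branch
        self.attn = nn.Linear(in_ch, n_kernels)
        self.k = k

    def forward(self, x):
        b, c, h, w = x.shape
        # attention over the n kernels, from globally pooled input features
        a = F.softmax(self.attn(x.mean(dim=(2, 3))), dim=1)      # (b, n)
        # linear mixture of the static kernels: one mixed kernel per sample
        w_mix = torch.einsum('bn,noihw->boihw', a, self.weight)  # (b, out, in, k, k)
        out_ch = w_mix.shape[1]
        # grouped-conv trick: fold the batch into groups so each sample
        # is convolved with its own mixed kernel
        y = F.conv2d(x.reshape(1, b * c, h, w),
                     w_mix.reshape(b * out_ch, c, self.k, self.k),
                     padding=self.k // 2, groups=b)
        return y.reshape(b, out_ch, h, w)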
NORM: Knowledge Distillation via N-to-One Representation Matching
Existing feature distillation methods commonly adopt the One-to-one
Representation Matching between any pre-selected teacher-student layer pair. In
this paper, we present N-to-One Representation Matching (NORM), a new two-stage
knowledge distillation method, which relies on a simple Feature Transform (FT)
module consisting of two linear layers. To preserve the intact information
learnt by the teacher network, our FT module is inserted only after the last
convolutional layer of the student network during training. The
first linear layer projects the student representation to a feature space
having N times as many feature channels as the teacher representation from the last
convolutional layer, and the second linear layer contracts the expanded output
back to the original feature space. The expanded student representation is then
sequentially split into N non-overlapping feature segments, each having the
same number of feature channels as the teacher's, and all N segments can be
readily forced to approximate the intact teacher representation
simultaneously, formulating a
novel many-to-one representation matching mechanism conditioned on a single
teacher-student layer pair. After training, such an FT module will be naturally
merged into the subsequent fully connected layer thanks to its linear property,
introducing no extra parameters or architectural modifications to the student
network at inference. Extensive experiments on different visual recognition
benchmarks demonstrate the leading performance of our method. For instance, the
ResNet18|MobileNet|ResNet50-1/4 model trained by NORM reaches
72.14%|74.26%|68.03% top-1 accuracy on the ImageNet dataset when using a
pre-trained ResNet34|ResNet50|ResNet50 model as the teacher, achieving an
absolute improvement of 2.01%|4.63%|3.03% against the individually trained
counterpart. Code is available at https://github.com/OSVAI/NORM
Comment: The paper of NORM was published at ICLR 2023. Code and models are
available at https://github.com/OSVAI/NORM
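As a rough illustration of the N-to-one matching described above, here is a minimal PyTorch sketch in which two 1x1 convolutions stand in for the two linear layers and an MSE matching loss is assumed; all names are hypothetical, not the released NORM code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureTransform(nn.Module):
    """Two linear layers (as 1x1 convs) realizing N-to-one representation matching."""
    def __init__(self, s_ch, t_ch, n=4):
        super().__init__()
        self.n = n
        self.expand = nn.Conv2d(s_ch, n * t_ch, kernel_size=1)    # first linear layer
        self.contract = nn.Conv2d(n * t_ch, s_ch, kernel_size=1)  # second linear layer

    def forward(self, f_student, f_teacher):
        # assumes student and teacher feature maps share the same spatial size
        expanded = self.expand(f_student)                 # (b, n*t_ch, h, w)
        # N non-overlapping segments, each with the teacher's channel count
        segments = expanded.chunk(self.n, dim=1)
        # every segment approximates the intact teacher representation
        kd_loss = sum(F.mse_loss(s, f_teacher) for s in segments) / self.n
        # contracted output continues through the student's own head
        return self.contract(expanded), kd_loss

Because both layers are linear, the FT module can be folded into the student's subsequent fully connected layer after training, consistent with the zero inference overhead the abstract describes.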
Omni-Dimensional Dynamic Convolution
Learning a single static convolutional kernel in each convolutional layer is
the common training paradigm of modern Convolutional Neural Networks (CNNs).
Instead, recent research in dynamic convolution shows that learning a linear
combination of convolutional kernels weighted with their input-dependent
attentions can significantly improve the accuracy of light-weight CNNs, while
maintaining efficient inference. However, we observe that existing works endow
convolutional kernels with the dynamic property through one dimension
(regarding the convolutional kernel number) of the kernel space, but the other
three dimensions (regarding the spatial size, the input channel number and the
output channel number for each convolutional kernel) are overlooked. Inspired
by this, we present Omni-dimensional Dynamic Convolution (ODConv), a more
generalized yet elegant dynamic convolution design, to advance this line of
research. ODConv leverages a novel multi-dimensional attention mechanism with a
parallel strategy to learn complementary attentions for convolutional kernels
along all four dimensions of the kernel space at any convolutional layer. As a
drop-in replacement of regular convolutions, ODConv can be plugged into many
CNN architectures. Extensive experiments on the ImageNet and MS-COCO datasets
show that ODConv brings solid accuracy boosts for various prevailing CNN
backbones including both light-weight and large ones, e.g.,
3.77%~5.71%|1.86%~3.72% absolute top-1 improvements for the MobileNetV2|ResNet
family on the ImageNet dataset. Intriguingly, thanks to its improved feature
learning ability, ODConv with even one single kernel can compete with or
outperform existing dynamic convolution counterparts with multiple kernels,
substantially reducing extra parameters. Furthermore, ODConv is also superior
to other attention modules for modulating the output features or the
convolutional weights.
Comment: Spotlight paper at ICLR 2022. Code and models are available at
https://github.com/OSVAI/ODConv
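The sketch below gives a rough PyTorch rendering of the four-dimensional attention idea: four parallel heads computed from globally pooled features modulate the kernel-number, output-channel, input-channel, and spatial dimensions of the kernels before they are mixed and applied. Head designs and names are simplified assumptions, not the official ODConv code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ODConvSketch(nn.Module):
    """Attention along all four dimensions of the convolutional kernel space."""
    def __init__(self, in_ch, out_ch, k=3, n_kernels=4, reduction=4):
        super().__init__()
        self.k, self.n = k, n_kernels
        self.weight = nn.Parameter(torch.randn(n_kernels, out_ch, in_ch, k, k) * 0.01)
        hidden = max(in_ch // reduction, 4)
        self.squeeze = nn.Sequential(nn.Linear(in_ch, hidden), nn.ReLU())
        # four parallel heads: kernel number, output channel, input channel, spatial
        self.a_n = nn.Linear(hidden, n_kernels)
        self.a_f = nn.Linear(hidden, out_ch)
        self.a_c = nn.Linear(hidden, in_ch)
        self.a_s = nn.Linear(hidden, k * k)

    def forward(self, x):
        b, c, h, w = x.shape
        z = self.squeeze(x.mean(dim=(2, 3)))                 # globally pooled features
        a_n = F.softmax(self.a_n(z), dim=1)                  # (b, n)
        a_f = torch.sigmoid(self.a_f(z))                     # (b, out)
        a_c = torch.sigmoid(self.a_c(z))                     # (b, in)
        a_s = torch.sigmoid(self.a_s(z)).view(b, 1, 1, self.k, self.k)
        # modulate every kernel dimension, then mix the n kernels per sample
        wgt = self.weight.unsqueeze(0)                       # (1, n, out, in, k, k)
        wgt = wgt * a_n.view(b, self.n, 1, 1, 1, 1)
        wgt = wgt * a_f.view(b, 1, -1, 1, 1, 1)
        wgt = wgt * a_c.view(b, 1, 1, -1, 1, 1)
        wgt = wgt * a_s.unsqueeze(1)
        wgt = wgt.sum(dim=1)                                 # (b, out, in, k, k)
        out_ch = wgt.shape[1]
        # grouped conv applies each sample's modulated kernel to that sample
        y = F.conv2d(x.reshape(1, b * c, h, w),
                     wgt.reshape(b * out_ch, c, self.k, self.k),
                     padding=self.k // 2, groups=b)
        return y.reshape(b, out_ch, h, w)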
Physics Inspired Optimization on Semantic Transfer Features: An Alternative Method for Room Layout Estimation
In this paper, we propose an alternative method to estimate room layouts of
cluttered indoor scenes. This method enjoys the benefits of two novel
techniques. The first one is semantic transfer (ST), which is: (1) a
formulation to integrate the relationship between scene clutter and room layout
into convolutional neural networks; (2) an architecture that can be end-to-end
trained; (3) a practical strategy to initialize weights for very deep networks
under unbalanced training data distribution. ST allows us to extract highly
robust features under various circumstances, and in order to address the
computational redundancy hidden in these features, we develop a principled and
efficient inference scheme named physics inspired optimization (PIO). PIO's
basic idea is to formulate some phenomena observed in ST features into
mechanics concepts. Evaluations on public datasets LSUN and Hedau show that the
proposed method is more accurate than state-of-the-art methods.
Comment: To appear in CVPR 2017. Project Page:
https://sites.google.com/view/st-pio
A Computational Study of Negative Surface Discharges: Characteristics of Surface Streamers and Surface Charges
We investigate the dynamics of negative surface discharges in air through
numerical simulations with a 2D fluid model. A geometry consisting of a flat
dielectric embedded between parallel-plate electrodes is used. Compared to
negative streamers in bulk gas, negative surface streamers are observed to have
a higher electron density, a higher electric field, and a higher propagation
velocity. On the other hand, their maximum electric field and velocity are
lower than for positive surface streamers. In our simulations, negative surface
streamers are slower for larger relative permittivity. Negative charge
accumulates on a dielectric surface when a negative streamer propagates along
it, which can lead to a high electric field inside the dielectric. If we
initially put negative surface charge on the dielectric, the growth of negative
surface discharges is delayed or inhibited. Positive surface charge has the
opposite effect.
Comment: 8 pages
A computational study of positive streamers interacting with dielectrics
We use numerical simulations to study the dynamics of surface discharges,
which are common in high-voltage engineering. We simulate positive streamer
discharges that propagate towards a dielectric surface, attach to it, and then
propagate over the surface. The simulations are performed in air with a
two-dimensional plasma fluid model, in which a flat dielectric is placed
between two plate electrodes. Electrostatic attraction is the main mechanism
that causes streamers to grow towards the dielectric. Due to the net charge in
the streamer head, the dielectric gets polarized, and the electric field
between the streamer and the dielectric is increased. Compared to streamers in
bulk gas, surface streamers have a smaller radius, a higher electric field, a
higher electron density, and a higher propagation velocity. A higher applied
voltage leads to faster inception and faster propagation of the surface
discharge. A higher dielectric permittivity leads to more rapid attachment of
the streamer to the surface and a thinner surface streamer. Secondary emission
coefficients are shown to play a modest role, which is due to relatively strong
photoionization in air. In the simulations, a high electric field is present
between the positive streamers and the dielectric surface. We show that the
magnitude and decay of this field are affected by the positive ion mobility.
Comment: 13 pages, 18 figures, 47 references