
4 research outputs found

Learning Distributions via Monte-Carlo Marginalization

Author: Basu Anup
Dong Guanfang
Zhao Chenqiu
Publication date: 11/08/2023

We propose a novel method to learn intractable distributions from their samples. The main idea is to use a parametric distribution model, such as a Gaussian Mixture Model (GMM), to approximate intractable distributions by minimizing the KL-divergence. Two challenges need to be addressed. First, the computational complexity of the KL-divergence becomes unacceptable as the dimensionality of the distributions increases. Monte-Carlo Marginalization (MCMarg) is proposed to address this issue. The second challenge is the differentiability of the optimization process, since the target distribution is intractable. We handle this problem by using Kernel Density Estimation (KDE). The proposed approach is a powerful tool for learning complex distributions, and the entire process is differentiable. Thus, it can be a better substitute for variational inference in variational auto-encoders (VAEs). Strong evidence of the benefit of our method is that the distributions learned by the proposed approach can generate better images even when based on a pre-trained VAE's decoder. Based on this point, we devise a distribution learning auto-encoder that is better than a VAE under the same network architecture. Experiments on standard datasets and synthetic data demonstrate the efficiency of the proposed approach.
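The abstract does not spell out how the marginalization is computed, so the following is only a minimal sketch of one plausible reading: project both the samples and the GMM onto random unit directions, estimate the projected sample density with a Gaussian KDE, and average the one-dimensional KL-divergences. It relies on the standard fact that a GMM projected onto a direction u is a one-dimensional GMM with means u^T mu_k and variances u^T Sigma_k u. All function names, the diagonal covariances, the bandwidth, and the grid-based integration are assumptions made for illustration, not the authors' implementation. The sketch only evaluates the objective; minimizing it by gradient descent would need an automatic differentiation framework.

```python
import math
import random


def gmm_marginal_pdf(x, weights, means, variances, direction):
    """Density at x of a diagonal-covariance GMM projected onto a unit direction."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        m = sum(d * c for d, c in zip(direction, mu))        # projected mean u^T mu
        s2 = sum(d * d * v for d, v in zip(direction, var))  # projected variance u^T Sigma u
        total += w * math.exp(-0.5 * (x - m) ** 2 / s2) / math.sqrt(2 * math.pi * s2)
    return total


def kde_pdf(x, points, bandwidth):
    """Gaussian kernel density estimate of 1-D points, evaluated at x."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def random_unit_vector(dim, rng):
    """Uniformly random direction on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


def marginal_kl(samples, weights, means, variances, direction, bandwidth=0.3, grid=200):
    """KL(KDE of projected samples || projected GMM), approximated on a 1-D grid."""
    proj = [sum(d * c for d, c in zip(direction, s)) for s in samples]
    lo, hi = min(proj) - 3 * bandwidth, max(proj) + 3 * bandwidth
    step = (hi - lo) / grid
    kl = 0.0
    for i in range(grid):
        x = lo + (i + 0.5) * step
        p = kde_pdf(x, proj, bandwidth)
        q = gmm_marginal_pdf(x, weights, means, variances, direction)
        if p > 1e-12:  # skip grid cells with negligible sample density
            kl += p * math.log(p / max(q, 1e-300)) * step
    return kl


def mc_marginal_kl(samples, weights, means, variances, n_dirs=8, seed=0):
    """Average the 1-D marginal KL over n_dirs random directions."""
    rng = random.Random(seed)
    dim = len(samples[0])
    dirs = [random_unit_vector(dim, rng) for _ in range(n_dirs)]
    return sum(marginal_kl(samples, weights, means, variances, u) for u in dirs) / n_dirs
```

Each direction costs only a one-dimensional density comparison, which is the point of replacing the high-dimensional KL-divergence with an average over marginals.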

arXiv.org e-Print Archive

Affine-Transformation-Invariant Image Classification by Differentiable Arithmetic Distribution Module

Author: Basu Anup
Dong Guanfang
Tan Zijie
Zhao Chenqiu
Publication date: 01/09/2023

Although Convolutional Neural Networks (CNNs) have achieved promising results in image classification, they are still vulnerable to affine transformations including rotation, translation, flip and shuffle. This drawback motivates us to design a module that can alleviate the impact of different affine transformations. Thus, in this work, we introduce a more robust substitute by incorporating distribution learning techniques, focusing particularly on learning the spatial distribution information of pixels in images. To rectify the non-differentiability of prior distribution learning methods that rely on traditional histograms, we adopt Kernel Density Estimation (KDE) to formulate differentiable histograms. On this foundation, we present a novel Differentiable Arithmetic Distribution Module (DADM), which is designed to extract the intrinsic probability distributions from images. The proposed approach is able to enhance the model's robustness to affine transformations without sacrificing its feature extraction capabilities, thus bridging the gap between traditional CNNs and distribution-based learning. We validate the effectiveness of the proposed approach through an ablation study and comparative experiments with LeNet.
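To illustrate what a KDE-based differentiable histogram can look like, here is a minimal sketch in which each pixel value spreads its mass over the bins through a Gaussian kernel instead of being hard-assigned to a single bin. The function name, the per-pixel normalization, and the bandwidth are assumptions for illustration; this is only the histogram building block, not the DADM described in the abstract.

```python
import math


def soft_histogram(values, centers, bandwidth):
    """Gaussian-kernel soft histogram of `values` over bin `centers`.

    Every value contributes a smooth, strictly positive weight to every bin,
    so the result varies smoothly with the inputs (unlike hard binning).
    """
    hist = [0.0] * len(centers)
    for v in values:
        weights = [math.exp(-0.5 * ((v - c) / bandwidth) ** 2) for c in centers]
        total = sum(weights)
        for i, w in enumerate(weights):
            hist[i] += w / total  # each value contributes a total mass of 1
    n = len(values)
    return [h / n for h in hist]  # normalise so the histogram sums to 1
```

Because the histogram only accumulates per-pixel contributions, reordering the pixels leaves it unchanged up to floating-point rounding, which is the kind of invariance to shuffling that the abstract is after.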

arXiv.org e-Print Archive

Is Deep Learning Network Necessary for Image Generation?

Author: Basu Anup
Dong Guanfang
Zhao Chenqiu
Publication date: 02/11/2023

Recently, images have come to be regarded as samples from a high-dimensional distribution, and deep learning has become almost synonymous with image generation. However, is a deep learning network truly necessary for image generation? In this paper, we investigate the possibility of image generation without using a deep learning network, motivated by validating the assumption that images follow a high-dimensional distribution. Since images are assumed to be samples from such a distribution, we utilize the Gaussian Mixture Model (GMM) to describe it. In particular, we employ a recent distribution learning technique named Monte-Carlo Marginalization to capture the parameters of the GMM based on image samples. Moreover, we also use the Singular Value Decomposition (SVD) for dimensionality reduction to decrease computational complexity. During our evaluation experiment, we first attempt to model the distribution of image samples directly to verify the assumption that images truly follow a distribution. We then use the SVD for dimensionality reduction. The principal components, rather than raw image data, are used for distribution learning. Compared to methods relying on deep learning networks, our approach is more explainable, and its performance is promising. Experiments show that our images have a lower FID value compared to those generated by variational auto-encoders, demonstrating the feasibility of image generation without deep learning networks.

Comment: This paper has been rejected. I am planning to combine this paper with another paper of mine to make one strong paper.

arXiv.org e-Print Archive

Frequency Regularization: Reducing Information Redundancy in Convolutional Neural Networks

Author: Anup Basu
Chenqiu Zhao
Guanfang Dong
Shupei Zhang
Zijie Tan
Publication venue: IEEE
Publication date: 01/01/2023

Convolutional neural networks have demonstrated impressive results in many computer vision tasks. However, the increasing size of these networks raises concerns about the information overload resulting from the large number of network parameters. In this paper, we propose Frequency Regularization to restrict the non-zero elements of the network parameters in the frequency domain. The proposed approach operates at the tensor level, and can be applied to almost all network architectures. Specifically, the tensors of parameters are maintained in the frequency domain, where high-frequency components can be eliminated by setting tensor elements to zero in zigzag order. Then, the inverse discrete cosine transform (IDCT) is used to reconstruct the spatial tensors for matrix operations during network training. Since high-frequency components of images are known to be less critical, a large proportion of these parameters can be set to zero when networks are trained with the proposed frequency regularization. Comprehensive evaluations on various state-of-the-art network architectures, including LeNet, AlexNet, VGG, ResNet, ViT, UNet, GAN, and VAE, demonstrate the effectiveness of the proposed frequency regularization. For a very small accuracy decrease (less than 2%), a LeNet5 with 0.4M parameters can be represented by only 776 float16 numbers (over $1100\times$ reduction), and a UNet with 34M parameters can be represented by only 759 float16 numbers (over $80000\times$ reduction). In particular, the original size of the UNet model is reduced from 366 Mb to 4.5 Kb.
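The reconstruction step described in this abstract can be sketched as follows: keep only the first few frequency coefficients of a two-dimensional parameter tensor in low-to-high frequency order, zero the rest, and apply an orthonormal inverse DCT to obtain the spatial weights. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the tensor is two-dimensional, and the ordering is a simplified anti-diagonal order rather than a true alternating zigzag (the two select the same set of coefficients whenever whole diagonals are kept).

```python
import math


def zigzag_mask(rows, cols, keep):
    """Mark the first `keep` positions in anti-diagonal (low-to-high frequency) order."""
    order = sorted(((r, c) for r in range(rows) for c in range(cols)),
                   key=lambda rc: (rc[0] + rc[1], rc[0]))
    kept = set(order[:keep])
    return [[1.0 if (r, c) in kept else 0.0 for c in range(cols)] for r in range(rows)]


def idct_1d(coeffs):
    """Orthonormal inverse DCT (the inverse of the orthonormal DCT-II) of a 1-D list."""
    n = len(coeffs)
    out = []
    for x in range(n):
        s = coeffs[0] / math.sqrt(n)  # DC term
        for k in range(1, n):
            s += math.sqrt(2.0 / n) * coeffs[k] * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
        out.append(s)
    return out


def idct_2d(freq):
    """Separable 2-D inverse DCT: along rows, then along columns."""
    rows = [idct_1d(r) for r in freq]
    cols = [idct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major layout


def reconstruct_weights(freq, keep):
    """Zero all but the `keep` lowest-frequency coefficients, then map to the spatial domain."""
    mask = zigzag_mask(len(freq), len(freq[0]), keep)
    masked = [[f * m for f, m in zip(fr, mr)] for fr, mr in zip(freq, mask)]
    return idct_2d(masked)
```

In such a scheme only the kept coefficients would need to be stored, which is where the parameter reduction reported in the abstract would come from; keeping a single coefficient, for example, reconstructs a constant tensor.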

Directory of Open Access Journals