4 research outputs found
Learning Distributions via Monte-Carlo Marginalization
We propose a novel method to learn intractable distributions from their
samples. The main idea is to use a parametric distribution model, such as a
Gaussian Mixture Model (GMM), to approximate intractable distributions by
minimizing the KL-divergence. Based on this idea, there are two challenges that
need to be addressed. First, the cost of computing the KL-divergence becomes
prohibitive as the dimensionality of the distributions increases. The Monte-Carlo
Marginalization (MCMarg) is proposed to address this issue. The second
challenge is the differentiability of the optimization process, since the
target distribution is intractable. We handle this problem by using Kernel
Density Estimation (KDE). The proposed approach is a powerful tool to learn
complex distributions and the entire process is differentiable. Thus, it can be
a better substitute for variational inference in variational auto-encoders
(VAEs). Strong evidence of the benefit of our method is that the
distributions learned by the proposed approach can generate better images even
based on a pre-trained VAE's decoder. Based on this point, we devise a
distribution learning auto-encoder which is better than VAE under the same
network architecture. Experiments on standard datasets and synthetic data
demonstrate the efficiency of the proposed approach.
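
The abstract above does not give formulas, so the following is only a minimal sketch of how a Monte-Carlo marginalization loss could be evaluated. It assumes that each Monte-Carlo step projects the samples onto one random unit direction, that the target marginal is estimated with a Gaussian KDE, and that the divergence is KL(target || model); the function names, the bandwidth and the grid size are illustrative choices, not taken from the paper. It relies on the fact that projecting a GMM onto a direction `d` gives a 1-D GMM with means `d·mu_k` and variances `dᵀ Σ_k d`.

```python
import numpy as np

def gmm_marginal_pdf(x, direction, weights, means, covs):
    """Density of a GMM projected onto a unit vector, evaluated at 1-D points x."""
    mu = means @ direction                                      # (K,) projected means
    var = np.einsum("i,kij,j->k", direction, covs, direction)   # (K,) projected variances
    z = (x[:, None] - mu[None, :]) ** 2 / var[None, :]
    comp = np.exp(-0.5 * z) / np.sqrt(2 * np.pi * var[None, :])
    return comp @ weights

def kde_pdf(x, samples, bandwidth):
    """Gaussian kernel density estimate of 1-D samples, evaluated at points x."""
    z = (x[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm

def marginal_kl(samples, weights, means, covs, rng, n_grid=200, bandwidth=0.3):
    """Approximate KL(target marginal || GMM marginal) along one random direction."""
    d = rng.normal(size=samples.shape[1])
    d /= np.linalg.norm(d)                    # random unit direction (assumed scheme)
    proj = samples @ d                        # 1-D marginal samples of the target
    grid = np.linspace(proj.min() - 3 * bandwidth, proj.max() + 3 * bandwidth, n_grid)
    p = kde_pdf(grid, proj, bandwidth)        # smooth estimate of the target marginal
    q = gmm_marginal_pdf(grid, d, weights, means, covs)
    eps = 1e-12                               # avoids log(0) on the grid
    dx = grid[1] - grid[0]
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))) * dx)
```

In an actual training loop this quantity would presumably be averaged over many directions and minimized by gradient descent in an autodiff framework; the NumPy version here only evaluates the loss.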
Affine-Transformation-Invariant Image Classification by Differentiable Arithmetic Distribution Module
Although Convolutional Neural Networks (CNNs) have achieved promising results
in image classification, they are still vulnerable to affine transformations
including rotation, translation, flip and shuffle. This drawback motivates us to
design a module which can alleviate the impact of different affine
transformations. Thus, in this work, we introduce a more robust substitute by
incorporating distribution learning techniques, focusing particularly on
learning the spatial distribution information of pixels in images. To rectify
the issue of non-differentiability of prior distribution learning methods that
rely on traditional histograms, we adopt Kernel Density Estimation (KDE) to
formulate differentiable histograms. On this foundation, we present a novel
Differentiable Arithmetic Distribution Module (DADM), which is designed to
extract the intrinsic probability distributions from images. The proposed
approach is able to enhance the model's robustness to affine transformations
without sacrificing its feature extraction capabilities, thus bridging the gap
between traditional CNNs and distribution-based learning. We validate the
effectiveness of the proposed approach through an ablation study and comparative
experiments with LeNet.
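
As a rough illustration of the differentiable-histogram idea, the sketch below replaces hard bin counts with Gaussian kernel weights, so every bin value is a smooth function of the pixel values. It is not the DADM itself: it histograms pixel intensities only, and the function name, bin layout and bandwidth are assumptions made for the example.

```python
import numpy as np

def soft_histogram(values, bin_centers, bandwidth):
    """KDE-style histogram: each value adds a Gaussian bump to every bin.

    Unlike hard binning, the result varies smoothly with `values`,
    so it can sit inside a gradient-trained model.
    """
    diff = values.reshape(-1, 1) - bin_centers.reshape(1, -1)
    weights = np.exp(-0.5 * (diff / bandwidth) ** 2)
    hist = weights.sum(axis=0)
    return hist / hist.sum()        # normalize to a probability vector

# Hypothetical 8x8 grayscale image with intensities in [0, 1].
rng = np.random.default_rng(0)
image = rng.random((8, 8))
centers = np.linspace(0.0, 1.0, 16)

h_original = soft_histogram(image, centers, bandwidth=0.05)
h_flipped = soft_histogram(image[:, ::-1], centers, bandwidth=0.05)
h_shuffled = soft_histogram(rng.permutation(image.ravel()), centers, bandwidth=0.05)
```

Because the histogram depends only on the multiset of pixel values, `h_original`, `h_flipped` and `h_shuffled` agree up to floating-point rounding, which is the kind of invariance to flips and shuffles the abstract refers to.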
Is Deep Learning Network Necessary for Image Generation?
Recently, images have come to be regarded as samples from a high-dimensional distribution,
and deep learning has become almost synonymous with image generation. However,
is a deep learning network truly necessary for image generation? In this paper,
we investigate the possibility of image generation without using a deep
learning network, motivated by validating the assumption that images follow a
high-dimensional distribution. Since images are assumed to be samples from such
a distribution, we utilize the Gaussian Mixture Model (GMM) to describe it. In
particular, we employ a recent distribution learning technique named
Monte-Carlo Marginalization to capture the parameters of the GMM based on image
samples. Moreover, we also use the Singular Value Decomposition (SVD) for
dimensionality reduction to decrease computational complexity. During our
evaluation experiment, we first attempt to model the distribution of image
samples directly to verify the assumption that images truly follow a
distribution. We then use the SVD for dimensionality reduction. The principal
components, rather than raw image data, are used for distribution learning.
Compared to methods relying on deep learning networks, our approach is more
explainable, and its performance is promising. Experiments show that our images
have a lower FID value compared to those generated by variational
auto-encoders, demonstrating the feasibility of image generation without deep
learning networks.
Comment: This paper has been rejected. I am planning to combine this paper with
another paper of mine to make one strong paper.
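
The pipeline described in this abstract (reduce with SVD, model the reduced codes with a GMM, sample, map back to pixels) can be sketched as follows. This is an assumption-laden outline: the GMM parameters are taken as given (the paper fits them with Monte-Carlo Marginalization, which is not reproduced here), and all function names are hypothetical.

```python
import numpy as np

def fit_svd(images, n_components):
    """Return the data mean and the top right-singular vectors (principal directions)."""
    mean = images.mean(axis=0)
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, vt[:n_components]

def encode(images, mean, basis):
    """Project flattened images onto the reduced basis."""
    return (images - mean) @ basis.T

def decode(codes, mean, basis):
    """Map reduced codes back to pixel space."""
    return codes @ basis + mean

def sample_gmm(weights, means, covs, n, rng):
    """Draw n samples from a GMM whose parameters were fitted elsewhere."""
    labels = rng.choice(len(weights), size=n, p=weights)
    return np.stack([rng.multivariate_normal(means[k], covs[k]) for k in labels])
```

New images would then be produced with `decode(sample_gmm(...), mean, basis)`, where the weights, means and covariances describe the distribution of `encode(images, mean, basis)`.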
Frequency Regularization: Reducing Information Redundancy in Convolutional Neural Networks
Convolutional neural networks have demonstrated impressive results in many computer vision tasks. However, the increasing size of these networks raises concerns about the information overload resulting from the large number of network parameters. In this paper, we propose Frequency Regularization to restrict the non-zero elements of the network parameters in the frequency domain. The proposed approach operates at the tensor level and can be applied to almost all network architectures. Specifically, the parameter tensors are maintained in the frequency domain, where high-frequency components can be eliminated by setting tensor elements to zero in zigzag order. Then, the inverse discrete cosine transform (IDCT) is used to reconstruct the spatial tensors for matrix operations during network training. Since high-frequency components of images are known to be less critical, a large proportion of these parameters can be set to zero when networks are trained with the proposed frequency regularization. Comprehensive evaluations on various state-of-the-art network architectures, including LeNet, AlexNet, VGG, ResNet, ViT, UNet, GAN, and VAE, demonstrate the effectiveness of the proposed frequency regularization. For a very small accuracy decrease (less than 2%), a LeNet5 with 0.4M parameters can be represented by only 776 float16 numbers (over reduction), and a UNet with 34M parameters can be represented by only 759 float16 numbers (over reduction). In particular, the original size of the UNet model is reduced from 366 Mb to 4.5 Kb.
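
To make the mechanism concrete, here is a minimal sketch for a single square 2-D parameter tensor: coefficients are stored in the DCT domain, all but the first `keep` coefficients in zigzag order are zeroed, and an inverse DCT rebuilds the spatial tensor. The JPEG-style zigzag ordering, the orthonormal DCT-II convention and the function names are assumptions for illustration; the abstract does not specify them.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ x is the DCT of x, C.T @ X inverts it."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0] /= np.sqrt(2.0)
    return c

def zigzag_mask(n, keep):
    """Boolean n x n mask keeping the first `keep` coefficients, low frequencies first."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Sort by anti-diagonal, alternating the traversal direction on each diagonal.
    order = sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                          rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))
    mask = np.zeros((n, n), dtype=bool)
    for r, c in order[:keep]:
        mask[r, c] = True
    return mask

def reconstruct(freq, mask):
    """Zero the masked-out frequencies, then apply the 2-D inverse DCT."""
    c = dct_matrix(freq.shape[0])
    return c.T @ (freq * mask) @ c

# Only the kept coefficients would need to be stored and trained.
freq_params = np.random.default_rng(0).normal(size=(8, 8))
spatial_weights = reconstruct(freq_params, zigzag_mask(8, keep=10))
```

Here `spatial_weights` is an 8 x 8 tensor determined by only 10 frequency coefficients; in training, gradients would flow back through the (linear) inverse transform to those coefficients.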