Learning Mixtures of Bernoulli Templates by Two-Round EM with Performance Guarantee
Dasgupta and Schulman showed that a two-round variant of the EM algorithm can
learn a mixture of Gaussian distributions to near-optimal precision with high
probability if the Gaussians are well separated and the dimension is
sufficiently high. In this paper, we generalize their theory to learning
mixtures of high-dimensional Bernoulli templates. Each template is a
binary vector, and a template generates examples by randomly switching its
binary components independently with a certain probability. In computer vision
applications, a binary vector is a feature map of an image, where each binary
component indicates whether a local feature or structure is present or absent
within a certain cell of the image domain. A Bernoulli template can be
considered as a statistical model for images of objects (or parts of objects)
from the same category. We show that the two-round EM algorithm can learn a
mixture of Bernoulli templates to near-optimal precision with high
probability, provided the Bernoulli templates are sufficiently different and
the number of features is sufficiently large. We illustrate the theoretical
results with synthetic and real examples.

Comment: 27 pages, 8 figures
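The generative model described above — a binary template whose components are independently flipped with a fixed probability — can be sketched in a few lines of numpy. This is an illustrative sampler only (the template, flip probability, and function name are chosen for the example, not taken from the paper); it also shows that with a small flip probability, component-wise majority voting over many samples recovers the template.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_from_template(template, flip_prob, n, rng):
    """Generate n binary examples by independently flipping each
    component of the template with probability flip_prob."""
    flips = rng.random((n, template.size)) < flip_prob
    return np.where(flips, 1 - template, template)

# Hypothetical 8-dimensional Bernoulli template (e.g. a tiny feature map).
template = np.array([1, 0, 1, 1, 0, 0, 1, 0])
X = sample_from_template(template, flip_prob=0.1, n=1000, rng=rng)

# With flip_prob well below 1/2, each component's sample mean concentrates
# near the template value, so thresholding at 0.5 recovers the template.
est = (X.mean(axis=0) > 0.5).astype(int)
```

In a mixture of such templates, EM must additionally infer which template generated each example, which is where the separation and dimensionality conditions of the theorem come in.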
Learning Spatially-Adaptive Squeeze-Excitation Networks for Image Synthesis and Image Recognition
Learning light-weight yet expressive deep networks in both image synthesis
and image recognition remains a challenging problem. Inspired by the recent
observation that data specificity is what makes multi-head self-attention
(MHSA) in the Transformer model so powerful, this paper proposes to extend
the widely adopted light-weight Squeeze-Excitation (SE) module to be
spatially adaptive, reinforcing its data specificity as a convolutional
alternative to MHSA while retaining the efficiency of SE and the inductive
bias of convolution. It presents two designs of spatially-adaptive
squeeze-excitation (SASE) modules for image synthesis and image recognition
respectively. For image synthesis tasks, the proposed SASE is tested in both
low-shot and one-shot learning settings, and shows better performance than
prior art. For image recognition tasks, the proposed SASE is used as a drop-in
replacement for convolution layers in ResNets and achieves much better accuracy
than the vanilla ResNets, and slightly better accuracy than MHSA counterparts
such as the Swin-Transformer and the Pyramid-Transformer on the ImageNet-1000
dataset, with significantly smaller models.
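To make the contrast concrete, the sketch below implements the standard SE recalibration in numpy, alongside a hypothetical spatially-adaptive variant that simply skips the global pooling so each spatial location gets its own channel gates. This is an illustration of the general idea only — the abstract does not specify the paper's actual SASE designs, and the weights here are random.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_module(x, w1, w2):
    """Standard Squeeze-Excitation: squeeze spatial dims by global average
    pooling, excite through a two-layer bottleneck, rescale channels."""
    z = x.mean(axis=(1, 2))                  # squeeze: one scalar per channel
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))  # excitation gates in (0, 1)
    return x * s[:, None, None]              # same gate at every location

def sase_module(x, w1, w2):
    """Hypothetical spatially-adaptive variant: keep the spatial positions,
    so the gates vary per location (illustrative; not the paper's design)."""
    C, H, W = x.shape
    z = x.reshape(C, H * W)                  # no global pooling
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0))  # (C, H*W) per-location gates
    return x * s.reshape(C, H, W)

C, r = 8, 2                                  # channels, bottleneck ratio
w1 = rng.normal(size=(C // r, C))
w2 = rng.normal(size=(C, C // r))
x = rng.normal(size=(C, 4, 4))               # one feature map (C, H, W)
```

Because the gates lie in (0, 1), both modules only attenuate features; the spatially-adaptive version lets that attenuation depend on where in the feature map the content sits, which is the data specificity the paper aims to reinforce.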