Discussion of "EQUI-energy sampler" by Kou, Zhou and Wong
Discussion of the "EQUI-energy sampler" by Kou, Zhou and Wong [math.ST/0507080].
Comment: Published at http://dx.doi.org/10.1214/009053606000000506 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Learning Mixtures of Bernoulli Templates by Two-Round EM with Performance Guarantee
Dasgupta and Schulman showed that a two-round variant of the EM algorithm can
learn mixtures of Gaussian distributions with near-optimal precision with high
probability if the Gaussian distributions are well separated and if the
dimension is sufficiently high. In this paper, we generalize their theory to
learning mixtures of high-dimensional Bernoulli templates. Each template is a
binary vector, and a template generates examples by randomly switching its
binary components independently with a certain probability. In computer vision
applications, a binary vector is a feature map of an image, where each binary
component indicates whether a local feature or structure is present or absent
within a certain cell of the image domain. A Bernoulli template can be
considered a statistical model for images of objects (or parts of objects)
from the same category. We show that the two-round EM algorithm can learn
mixtures of Bernoulli templates with near-optimal precision with high
probability, if the Bernoulli templates are sufficiently different and if the
number of features is sufficiently high. We illustrate the theoretical results
with synthetic and real examples.
Comment: 27 pages, 8 figures
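As a reading aid, here is a minimal sketch of the generative model described in the abstract: each example is produced by picking a template according to the mixture weights and independently flipping each of its binary components with a fixed probability. The function and parameter names (sample_bernoulli_mixture, flip_prob) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sample_bernoulli_mixture(templates, weights, flip_prob, n_samples, rng=None):
    """Draw binary examples from a mixture of Bernoulli templates.

    templates : (K, D) 0/1 array, one template per row
    weights   : (K,) mixture proportions, summing to 1
    flip_prob : probability of independently switching each binary component
    """
    rng = np.random.default_rng(rng)
    templates = np.asarray(templates)
    K, D = templates.shape
    # Choose a template for each example, then flip bits independently.
    labels = rng.choice(K, size=n_samples, p=weights)
    flips = rng.random((n_samples, D)) < flip_prob
    samples = np.logical_xor(templates[labels], flips).astype(int)
    return samples, labels

# Example: two well-separated 6-dimensional templates.
templates = np.array([[1, 1, 1, 0, 0, 0],
                      [0, 0, 0, 1, 1, 1]])
X, y = sample_bernoulli_mixture(templates, weights=[0.5, 0.5],
                                flip_prob=0.1, n_samples=100, rng=0)
```

Well-separated templates and a large number of features D are exactly the regime in which the abstract's two-round EM guarantee is stated to apply.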
Interpretable Convolutional Neural Networks
This paper proposes a method to modify traditional convolutional neural
networks (CNNs) into interpretable CNNs, in order to clarify knowledge
representations in high conv-layers of CNNs. In an interpretable CNN, each
filter in a high conv-layer represents a certain object part. We do not need
any annotations of object parts or textures to supervise the learning process.
Instead, the interpretable CNN automatically assigns each filter in a high
conv-layer with an object part during the learning process. Our method can be
applied to different types of CNNs with different structures. The clear
knowledge representation in an interpretable CNN can help people understand the
logic inside a CNN, i.e., the patterns on which the CNN bases its decisions.
Experiments showed that filters in an interpretable CNN were more semantically
meaningful than those in traditional CNNs.
Comment: In this version, we release the website of the code. Compared to the previous version, we have corrected all values of location instability in Tables 3--6 by dividing the values by sqrt(2), i.e., a = a/sqrt(2). Such revisions do NOT decrease the significance of the superior performance of our method, because we make the same correction to the location-instability values of all baselines.
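The abstract does not spell out the filter loss, so the following is only a hypothetical illustration of the general idea of part-like filters: a regularizer that rewards each high conv-layer filter for concentrating its activation around a single image location. It is not the paper's actual method, and all names are assumptions.

```python
import torch
import torch.nn.functional as F

def spatial_concentration_penalty(feat):
    """Hypothetical regularizer: encourage each filter to fire at one location.

    feat : (N, C, H, W) activations of a high conv-layer.
    Returns a scalar that is small when each filter's positive activation mass
    is concentrated around a single spatial position (a "part-like" response).
    """
    n, c, h, w = feat.shape
    act = F.relu(feat)
    # Normalize each filter's activation map into a spatial distribution.
    flat = act.flatten(2)                                   # (N, C, H*W)
    probs = (flat / (flat.sum(dim=2, keepdim=True) + 1e-8)).view(n, c, h, w)
    ys = torch.arange(h, device=feat.device, dtype=feat.dtype).view(1, 1, h, 1)
    xs = torch.arange(w, device=feat.device, dtype=feat.dtype).view(1, 1, 1, w)
    # Expected firing location of each filter.
    mean_y = (probs * ys).sum(dim=(2, 3))                   # (N, C)
    mean_x = (probs * xs).sum(dim=(2, 3))
    # Spatial variance around that location; low variance = part-like filter.
    var = (probs * ((ys - mean_y.view(n, c, 1, 1)) ** 2
                    + (xs - mean_x.view(n, c, 1, 1)) ** 2)).sum(dim=(2, 3))
    return var.mean()
```

Such a penalty would be added to the ordinary classification loss during training; the paper's actual filter loss is defined differently and requires no part annotations, as stated above.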
Synthesizing Dynamic Patterns by Spatial-Temporal Generative ConvNet
Video sequences contain rich dynamic patterns, such as dynamic texture
patterns that exhibit stationarity in the temporal domain, and action patterns
that are non-stationary in either the spatial or the temporal domain. We show that a
spatial-temporal generative ConvNet can be used to model and synthesize dynamic
patterns. The model defines a probability distribution on the video sequence,
and the log probability is defined by a spatial-temporal ConvNet that consists
of multiple layers of spatial-temporal filters to capture spatial-temporal
patterns of different scales. The model can be learned from the training video
sequences by an "analysis by synthesis" learning algorithm that iterates the
following two steps. Step 1 synthesizes video sequences from the currently
learned model. Step 2 then updates the model parameters based on the difference
between the synthesized video sequences and the observed training sequences. We
show that the learning algorithm can synthesize realistic dynamic patterns
…
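A minimal sketch of the two-step "analysis by synthesis" loop described above, under the assumption (not stated in the abstract) that Step 1 synthesizes sequences by Langevin dynamics on the ConvNet-defined score and Step 2 takes a stochastic-gradient step on the resulting maximum-likelihood objective; the scorer interface, step sizes, and batch handling are hypothetical.

```python
import torch

def analysis_by_synthesis_step(scorer, real_videos, synth_videos, optimizer,
                               langevin_steps=20, step_size=0.01):
    """One iteration of the two-step learning loop.

    scorer       : spatial-temporal ConvNet mapping a video batch (N, C, T, H, W)
                   to per-example scores f(x; theta), i.e., the log probability
                   up to a normalizing constant.
    real_videos  : batch of observed training sequences.
    synth_videos : batch of synthesized sequences carried across iterations;
                   the updated batch is returned.
    """
    # Step 1: synthesize video sequences from the currently learned model by
    # ascending the score with injected Gaussian noise (Langevin dynamics).
    synth_videos = synth_videos.detach().requires_grad_(True)
    for _ in range(langevin_steps):
        score = scorer(synth_videos).sum()
        grad, = torch.autograd.grad(score, synth_videos)
        with torch.no_grad():
            synth_videos += 0.5 * step_size ** 2 * grad
            synth_videos += step_size * torch.randn_like(synth_videos)

    # Step 2: update model parameters based on the difference between the
    # synthesized and observed sequences (the maximum-likelihood gradient).
    optimizer.zero_grad()
    loss = scorer(synth_videos.detach()).mean() - scorer(real_videos).mean()
    loss.backward()
    optimizer.step()
    return synth_videos.detach()
```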
