Compositional Model based Fisher Vector Coding for Image Classification
Derived from the gradient vector of a generative model of local features,
Fisher vector coding (FVC) has been identified as an effective coding method
for image classification. Most, if not all, FVC implementations employ the
Gaussian mixture model (GMM) to depict the generation process of local
features. However, the representative power of the GMM could be limited because
it essentially assumes that local features can be characterized by a fixed
number of feature prototypes and the number of prototypes is usually small in
FVC. To handle this limitation, in this paper we break the convention which
assumes that a local feature is drawn from one of a few Gaussian distributions.
Instead, we adopt a compositional mechanism which assumes that a local feature
is drawn from a Gaussian distribution whose mean vector is composed as the
linear combination of multiple key components and the combination weight is a
latent random variable. In this way, we can greatly enhance the representative
power of the generative model of FVC. To implement our idea, we designed two
particular generative models with such a compositional mechanism.
Comment: Fixed typos. 16 pages. Appearing in IEEE T. Pattern Analysis and
Machine Intelligence (TPAMI)
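The compositional mechanism described in this abstract can be illustrated with a small sampling sketch. This is only an illustration under stated assumptions, not either of the paper's two models: the standard normal prior on the combination weights, the isotropic noise, and the names `sample_local_feature`, `weight_std`, and `noise_std` are all hypothetical choices made here.

```python
import random


def sample_local_feature(components, weight_std=1.0, noise_std=0.1, rng=random):
    """Draw one local feature from a compositional Gaussian model.

    components: list of K key components, each a list of D floats.
    Assumption (not from the abstract): the latent combination weights u
    follow N(0, weight_std^2 I) and the noise is isotropic.
    """
    k = len(components)
    d = len(components[0])
    # Latent combination weights, one per key component.
    u = [rng.gauss(0.0, weight_std) for _ in range(k)]
    # Mean vector composed as a linear combination of the key components.
    mean = [sum(u[j] * components[j][i] for j in range(k)) for i in range(d)]
    # The observed feature is Gaussian around the composed mean.
    x = [rng.gauss(m, noise_std) for m in mean]
    return x, u, mean
```

Because the weights are continuous, the set of possible means is not limited to a fixed number of prototypes, which is the contrast with a GMM that the abstract draws.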
Supervised Dictionary Learning
It is now well established that sparse signal models are well suited to
restoration tasks and can effectively be learned from audio, image, and video
data. Recent research has been aimed at learning discriminative sparse models
instead of purely reconstructive ones. This paper proposes a new step in that
direction, with a novel sparse representation for signals belonging to
different classes in terms of a shared dictionary and multiple class-decision
functions. The linear variant of the proposed model admits a simple
probabilistic interpretation, while its most general variant admits an
interpretation in terms of kernels. An optimization framework for learning all
the components of the proposed model is presented, along with experimental
results on standard handwritten digit and texture classification tasks.
Video Compressive Sensing for Dynamic MRI
We present a video compressive sensing framework, termed kt-CSLDS, to
accelerate the image acquisition process of dynamic magnetic resonance imaging
(MRI). We are inspired by a state-of-the-art model for video compressive
sensing that utilizes a linear dynamical system (LDS) to model the motion
manifold. Given compressive measurements, the state sequence of an LDS can be
first estimated using system identification techniques. We then reconstruct the
observation matrix using a joint structured sparsity assumption. In particular,
we minimize an objective function with a mixture of wavelet sparsity and joint
sparsity within the observation matrix. We derive an efficient convex
optimization algorithm through the alternating direction method of multipliers
(ADMM), and provide a theoretical guarantee for global convergence. We
demonstrate the performance of our approach for video compressive sensing, in
terms of reconstruction accuracy. We also investigate the impact of various
sampling strategies. We apply this framework to accelerate the acquisition
process of dynamic MRI and show it achieves the best reconstruction accuracy
with the least computational time compared with existing algorithms in the
literature.
Comment: 30 pages, 9 figures
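ADMM solvers for objectives that mix an elementwise sparsity term with a joint (row-wise) sparsity term typically rely on two shrinkage operators. The sketch below shows those two generic building blocks only; the abstract does not give the exact objective or update rules of kt-CSLDS, so the function names and the threshold parameter `t` are assumptions made for illustration.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(a) - t, 0.0), a) for a in v]


def row_shrink(matrix, t):
    """Proximal operator of t * sum of row-wise l2 norms (joint sparsity).

    Each row is scaled toward zero as a group; rows whose norm is at most t
    are set entirely to zero.
    """
    out = []
    for row in matrix:
        norm = math.sqrt(sum(a * a for a in row))
        scale = max(1.0 - t / norm, 0.0) if norm > 0 else 0.0
        out.append([scale * a for a in row])
    return out
```

In an ADMM iteration, operators like these would be applied to auxiliary variables, for example to wavelet coefficients for the first and to rows of the observation matrix for the second.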
Compact Bilinear Pooling
Bilinear models have been shown to achieve impressive performance on a wide
range of visual tasks, such as semantic segmentation, fine-grained recognition
and face recognition. However, bilinear features are high dimensional,
typically on the order of hundreds of thousands to a few million, which makes
them impractical for subsequent analysis. We propose two compact bilinear
representations with the same discriminative power as the full bilinear
representation but with only a few thousand dimensions. Our compact
representations allow back-propagation of classification errors enabling an
end-to-end optimization of the visual recognition system. The compact bilinear
representations are derived through a novel kernelized analysis of bilinear
pooling, which provides insights into the discriminative power of bilinear
pooling and a platform for further research in compact pooling methods.
Experiments illustrate the utility of the proposed representations for
image classification and few-shot learning across several datasets.
Comment: Camera ready version for CVPR
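The kernel view mentioned in this abstract can be made concrete for a single descriptor: the inner product of two outer-product (bilinear) features equals the squared inner product of the descriptors. The sketch below checks that identity and shows one standard low-dimensional approximation of the same kernel, a Random Maclaurin feature map. The abstract does not say which approximations the paper uses, so treat this as an illustration; `make_random_maclaurin` and its parameters are hypothetical names.

```python
import random


def dot(u, v):
    """Plain inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def bilinear_feature(x):
    """Full bilinear feature of one descriptor: flattened outer product x x^T."""
    return [a * b for a in x for b in x]


def make_random_maclaurin(in_dim, out_dim, seed=0):
    """Return a map phi with E[<phi(x), phi(y)>] = <x, y>^2.

    Uses two independent random +/-1 projection matrices (an assumed,
    generic construction, not taken from the abstract).
    """
    rng = random.Random(seed)
    w1 = [[rng.choice((-1.0, 1.0)) for _ in range(in_dim)] for _ in range(out_dim)]
    w2 = [[rng.choice((-1.0, 1.0)) for _ in range(in_dim)] for _ in range(out_dim)]
    scale = out_dim ** -0.5

    def project(x):
        # Elementwise product of two random projections of the same input.
        return [scale * dot(r1, x) * dot(r2, x) for r1, r2 in zip(w1, w2)]

    return project
```

For a descriptor of dimension `in_dim`, the full feature has `in_dim ** 2` entries, while the compact map has `out_dim` entries chosen independently of that square. Summing either feature over image locations gives the pooled representation.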