14,138 research outputs found
Sparse Transfer Learning for Interactive Video Search Reranking
Visual reranking is effective to improve the performance of the text-based
video search. However, existing reranking algorithms can only achieve limited
improvement because of the well-known semantic gap between low level visual
features and high level semantic concepts. In this paper, we adopt interactive
video search reranking to bridge the semantic gap by introducing user's
labeling effort. We propose a novel dimension reduction tool, termed sparse
transfer learning (STL), to effectively and efficiently encode user's labeling
information. STL is particularly designed for interactive video search
reranking. Technically, it a) considers the pair-wise discriminative
information to maximally separate labeled query relevant samples from labeled
query irrelevant ones, b) achieves a sparse representation for the subspace to
encodes user's intention by applying the elastic net penalty, and c) propagates
user's labeling information from labeled samples to unlabeled samples by using
the data distribution knowledge. We conducted extensive experiments on the
TRECVID 2005, 2006 and 2007 benchmark datasets and compared STL with popular
dimension reduction algorithms. We report superior performance by using the
proposed STL based interactive video search reranking.Comment: 17 page
Latent Fisher Discriminant Analysis
Linear Discriminant Analysis (LDA) is a well-known method for dimensionality
reduction and classification. Previous studies have also extended the
binary-class case into multi-classes. However, many applications, such as
object detection and keyframe extraction cannot provide consistent
instance-label pairs, while LDA requires labels on instance level for training.
Thus it cannot be directly applied for semi-supervised classification problem.
In this paper, we overcome this limitation and propose a latent variable Fisher
discriminant analysis model. We relax the instance-level labeling into
bag-level, is a kind of semi-supervised (video-level labels of event type are
required for semantic frame extraction) and incorporates a data-driven prior
over the latent variables. Hence, our method combines the latent variable
inference and dimension reduction in an unified bayesian framework. We test our
method on MUSK and Corel data sets and yield competitive results compared to
the baseline approach. We also demonstrate its capacity on the challenging
TRECVID MED11 dataset for semantic keyframe extraction and conduct a
human-factors ranking-based experimental evaluation, which clearly demonstrates
our proposed method consistently extracts more semantically meaningful
keyframes than challenging baselines.Comment: 12 page
Bounded-Distortion Metric Learning
Metric learning aims to embed one metric space into another to benefit tasks
like classification and clustering. Although a greatly distorted metric space
has a high degree of freedom to fit training data, it is prone to overfitting
and numerical inaccuracy. This paper presents {\it bounded-distortion metric
learning} (BDML), a new metric learning framework which amounts to finding an
optimal Mahalanobis metric space with a bounded-distortion constraint. An
efficient solver based on the multiplicative weights update method is proposed.
Moreover, we generalize BDML to pseudo-metric learning and devise the
semidefinite relaxation and a randomized algorithm to approximately solve it.
We further provide theoretical analysis to show that distortion is a key
ingredient for stability and generalization ability of our BDML algorithm.
Extensive experiments on several benchmark datasets yield promising results
Regression on fixed-rank positive semidefinite matrices: a Riemannian approach
The paper addresses the problem of learning a regression model parameterized
by a fixed-rank positive semidefinite matrix. The focus is on the nonlinear
nature of the search space and on scalability to high-dimensional problems. The
mathematical developments rely on the theory of gradient descent algorithms
adapted to the Riemannian geometry that underlies the set of fixed-rank
positive semidefinite matrices. In contrast with previous contributions in the
literature, no restrictions are imposed on the range space of the learned
matrix. The resulting algorithms maintain a linear complexity in the problem
size and enjoy important invariance properties. We apply the proposed
algorithms to the problem of learning a distance function parameterized by a
positive semidefinite matrix. Good performance is observed on classical
benchmarks
Co-Localization of Audio Sources in Images Using Binaural Features and Locally-Linear Regression
This paper addresses the problem of localizing audio sources using binaural
measurements. We propose a supervised formulation that simultaneously localizes
multiple sources at different locations. The approach is intrinsically
efficient because, contrary to prior work, it relies neither on source
separation, nor on monaural segregation. The method starts with a training
stage that establishes a locally-linear Gaussian regression model between the
directional coordinates of all the sources and the auditory features extracted
from binaural measurements. While fixed-length wide-spectrum sounds (white
noise) are used for training to reliably estimate the model parameters, we show
that the testing (localization) can be extended to variable-length
sparse-spectrum sounds (such as speech), thus enabling a wide range of
realistic applications. Indeed, we demonstrate that the method can be used for
audio-visual fusion, namely to map speech signals onto images and hence to
spatially align the audio and visual modalities, thus enabling to discriminate
between speaking and non-speaking faces. We release a novel corpus of real-room
recordings that allow quantitative evaluation of the co-localization method in
the presence of one or two sound sources. Experiments demonstrate increased
accuracy and speed relative to several state-of-the-art methods.Comment: 15 pages, 8 figure
- …