Video Logo Retrieval based on local Features
Estimation of the frequency and duration of logos in videos is important and
challenging in the advertisement industry as a way of estimating the impact of
ad purchases. Since logos occupy only a small area in the videos, the popular
methods of image retrieval could fail. This paper develops an algorithm called
Video Logo Retrieval (VLR), which is an image-to-video retrieval algorithm
based on the spatial distribution of local image descriptors that measure the
distance between the query image (the logo) and a collection of video images.
VLR uses local features to overcome the weakness of global feature-based models
such as convolutional neural networks (CNN). Meanwhile, VLR is flexible and
does not require training after setting some hyper-parameters. The performance
of VLR is evaluated on two challenging open benchmark tasks (SoccerNet and
Stanford I2V) and compared with other state-of-the-art logo retrieval or
detection algorithms. Overall, VLR shows significantly higher accuracy compared
with the existing methods. Comment: Accepted by ICIP 20. Contact author: Bochen Guan ([email protected]
Geodesics on the manifold of multivariate generalized Gaussian distributions with an application to multicomponent texture discrimination
We consider the Rao geodesic distance (GD) based on the Fisher information as a similarity measure on the manifold of zero-mean multivariate generalized Gaussian distributions (MGGD). The MGGD is shown to be an adequate model for the heavy-tailed wavelet statistics in multicomponent images, such as color or multispectral images. We discuss the estimation of MGGD parameters using various methods. We apply the GD between MGGDs to color texture discrimination in several classification experiments, taking into account the correlation structure between the spectral bands in the wavelet domain. We compare the performance, both in terms of texture discrimination capability and computational load, of the GD and the Kullback-Leibler divergence (KLD). Likewise, both uni- and multivariate generalized Gaussian models are evaluated, characterized by a fixed or a variable shape parameter. The modeling of the interband correlation significantly improves classification efficiency, while the GD is shown to consistently outperform the KLD as a similarity measure
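To make the contrast between the two similarity measures concrete, here is a minimal sketch restricted to the simplest member of the family: the univariate zero-mean Gaussian. This restriction is an assumption made for illustration; the abstract itself works with multivariate generalized Gaussians and a general shape parameter. In this special case both measures have closed forms, and the function names are hypothetical.

```python
import math

def kld_zero_mean_gauss(sigma1, sigma2):
    """KL divergence KL(N(0, sigma1^2) || N(0, sigma2^2)); not symmetric."""
    ratio = (sigma1 / sigma2) ** 2
    return 0.5 * (ratio - 1.0 - math.log(ratio))

def rao_gd_zero_mean_gauss(sigma1, sigma2):
    """Rao geodesic distance between N(0, sigma1^2) and N(0, sigma2^2).

    With the mean fixed at zero, the Fisher information for sigma is
    2 / sigma^2, so the geodesic length is sqrt(2) * |ln(sigma2 / sigma1)|.
    """
    return math.sqrt(2.0) * abs(math.log(sigma2 / sigma1))

if __name__ == "__main__":
    print(kld_zero_mean_gauss(1.0, 2.0), kld_zero_mean_gauss(2.0, 1.0))
    print(rao_gd_zero_mean_gauss(1.0, 2.0), rao_gd_zero_mean_gauss(2.0, 1.0))
```

The geodesic distance is a true metric (symmetric, zero only for identical distributions), whereas the KLD depends on the order of its arguments and is often symmetrized before being used as a similarity measure.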
An Appearance-Based Framework for 3D Hand Shape Classification and Camera Viewpoint Estimation
An appearance-based framework for 3D hand shape classification and simultaneous camera viewpoint estimation is presented. Given an input image of a segmented hand, the most similar matches from a large database of synthetic hand images are retrieved. The ground truth labels of those matches, containing hand shape and camera viewpoint information, are returned by the system as estimates for the input image. Database retrieval is done hierarchically, by first quickly rejecting the vast majority of all database views, and then ranking the remaining candidates in order of similarity to the input. Four different similarity measures are employed, based on edge location, edge orientation, finger location and geometric moments. National Science Foundation (IIS-9912573, EIA-9809340
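The two-stage retrieval described above (fast rejection, then ranking of the survivors) can be sketched as follows. This is only an illustration of the general pattern: the feature vectors, the use of a truncated vector as the cheap measure, and all names and parameters are hypothetical and do not correspond to the four similarity measures named in the abstract.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hierarchical_retrieve(query, database, keep=2, top_k=1, coarse_dims=1):
    """Return labels of the top_k database entries most similar to query.

    database is a list of (label, feature_vector) pairs.
    Stage 1 compares only the first coarse_dims components (cheap) and
    keeps the `keep` closest entries. Stage 2 ranks those survivors with
    the full feature vector (more expensive).
    """
    coarse_ranked = sorted(
        database,
        key=lambda item: euclidean(query[:coarse_dims], item[1][:coarse_dims]),
    )
    survivors = coarse_ranked[:keep]
    fine_ranked = sorted(survivors, key=lambda item: euclidean(query, item[1]))
    return [label for label, _ in fine_ranked[:top_k]]

if __name__ == "__main__":
    views = [
        ("fist", (0.0, 0.0)),
        ("open", (1.0, 1.0)),
        ("point", (0.1, 5.0)),
        ("far", (9.0, 9.0)),
    ]
    print(hierarchical_retrieve((0.0, 0.2), views))
```

The trade-off is that an aggressive first stage saves time but can discard the correct match, so `keep` has to be large enough for the cheap measure's error rate.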
Learning to Navigate the Energy Landscape
In this paper, we present a novel and efficient architecture for addressing
computer vision problems that use 'Analysis by Synthesis'. Analysis by
synthesis involves the minimization of the reconstruction error which is
typically a non-convex function of the latent target variables.
State-of-the-art methods adopt a hybrid scheme where discriminatively trained
predictors like Random Forests or Convolutional Neural Networks are used to
initialize local search algorithms. While these methods have been shown to
produce promising results, they often get stuck in local optima. Our method
goes beyond the conventional hybrid architecture by not only proposing multiple
accurate initial solutions but by also defining a navigational structure over
the solution space that can be used for extremely efficient gradient-free local
search. We demonstrate the efficacy of our approach on the challenging problem
of RGB Camera Relocalization. To make the RGB camera relocalization problem
particularly challenging, we introduce a new dataset of 3D environments which
are significantly larger than those found in other publicly-available datasets.
Our experiments reveal that the proposed method is able to achieve
state-of-the-art camera relocalization results. We also demonstrate the
generalizability of our approach on Hand Pose Estimation and Image Retrieval
tasks
Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions
Visual localization enables autonomous vehicles to navigate in their
surroundings and augmented reality applications to link virtual to real worlds.
Practical visual localization approaches need to be robust to a wide variety of
viewing conditions, including day-night changes, as well as weather and seasonal
variations, while providing highly accurate 6 degree-of-freedom (6DOF) camera
pose estimates. In this paper, we introduce the first benchmark datasets
specifically designed for analyzing the impact of such factors on visual
localization. Using carefully created ground truth poses for query images taken
under a wide variety of conditions, we evaluate the impact of various factors
on 6DOF camera pose estimation accuracy through extensive experiments with
state-of-the-art localization approaches. Based on our results, we draw
conclusions about the difficulty of different conditions, showing that
long-term localization is far from solved, and propose promising avenues for
future work, including sequence-based localization approaches and the need for
better local features. Our benchmark is available at visuallocalization.net. Comment: Accepted to CVPR 2018 as a spotligh
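As a rough illustration of what 6DOF pose accuracy can mean in practice, the sketch below computes a position error and an orientation error between an estimated and a reference camera pose. The abstract does not state its error measures, so this pair (Euclidean distance between camera centres, and the angle of the relative rotation) is an assumption, chosen because it is a common convention. Function names are hypothetical, and rotations are given as 3x3 nested lists.

```python
import math

def position_error(center_est, center_gt):
    """Euclidean distance between estimated and reference camera centres."""
    return math.sqrt(sum((e - g) ** 2 for e, g in zip(center_est, center_gt)))

def rotation_error_deg(rot_est, rot_gt):
    """Angle (degrees) of the relative rotation between two rotation matrices.

    trace(R_gt^T R_est) equals the sum of elementwise products, and the
    rotation angle follows from trace = 1 + 2 * cos(angle).
    """
    trace = sum(rot_gt[i][j] * rot_est[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against round-off
    return math.degrees(math.acos(cos_angle))

if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
    print(position_error((3.0, 4.0, 0.0), (0.0, 0.0, 0.0)))
    print(rotation_error_deg(quarter_turn_z, identity))
```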
A framework for improving the performance of verification algorithms with a low false positive rate requirement and limited training data
In this paper we address the problem of matching patterns in the so-called
verification setting in which a novel, query pattern is verified against a
single training pattern: the decision sought is whether the two match (i.e.
belong to the same class) or not. Unlike previous work which has universally
focused on the development of more discriminative distance functions between
patterns, here we consider the equally important and pervasive task of
selecting a distance threshold which fits a particular operational requirement
- specifically, the target false positive rate (FPR). First, we argue on
theoretical grounds that a data-driven approach is inherently ill-conditioned
when the desired FPR is low, because by the very nature of the challenge only a
small portion of training data affects or is affected by the desired threshold.
This leads us to propose a general, statistical model-based method instead. Our
approach is based on the interpretation of an inter-pattern distance as
implicitly defining a pattern embedding which approximately distributes
patterns according to an isotropic multi-variate normal distribution in some
space. This interpretation is then used to show that the distribution of
training inter-pattern distances is the non-central chi-squared distribution,
differently parameterized for each class. Thus, to make the class-specific
threshold choice we propose a novel analysis-by-synthesis iterative algorithm
which estimates the three free parameters of the model (for each class) using
task-specific constraints. The validity of the premises of our work and the
effectiveness of the proposed method are demonstrated by applying the method to
the task of set-based face verification on a large database of pseudo-random
head motion videos. Comment: IEEE/IAPR International Joint Conference on Biometrics, 201
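The ill-conditioning argument can be illustrated with the data-driven baseline that the abstract argues against, not with the proposed model-based method. In the sketch below, a threshold is read directly off the sorted non-matching ("impostor") distances; the function name and the accept rule (accept when the distance is strictly below the threshold) are assumptions made for illustration.

```python
import math

def empirical_threshold(impostor_distances, target_fpr):
    """Pick a distance threshold from data so that the empirical FPR on the
    given impostor distances does not exceed target_fpr.

    Returns (threshold, k), where k is the number of impostor samples that
    fall strictly below the threshold (assuming distinct distances).
    """
    ordered = sorted(impostor_distances)
    # Number of false accepts the target rate allows on this sample.
    k = math.floor(target_fpr * len(ordered) + 1e-9)
    k = min(k, len(ordered) - 1)
    return ordered[k], k

if __name__ == "__main__":
    distances = list(range(1, 101))  # 100 toy impostor distances
    print(empirical_threshold(distances, 0.05))
    print(empirical_threshold(distances, 0.001))
```

Only the `k + 1` smallest impostor distances influence the result. With 100 samples and a target FPR of 0.001, `k` is 0 and the threshold is fixed by a single sample, which is why a purely empirical choice becomes unstable at low FPR.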
Pose Embeddings: A Deep Architecture for Learning to Match Human Poses
We present a method for learning an embedding that places images of humans in
similar poses nearby. This embedding can be used as a direct method of
comparing images based on human pose, avoiding potential challenges of
estimating body joint positions. Pose embedding learning is formulated under a
triplet-based distance criterion. A deep architecture is used to allow learning
of a representation capable of making distinctions between different poses.
Experiments on human pose matching and retrieval from video data demonstrate
the potential of the method
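A triplet-based distance criterion is often written as a hinge loss over an anchor, a positive (similar pose) and a negative (different pose). The abstract does not give the exact form used, so the squared-Euclidean hinge below, its margin value, and the function names are assumptions for illustration only.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Zero when the positive is closer to the anchor than the negative
    by at least `margin`; otherwise grows with the size of the violation."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

if __name__ == "__main__":
    anchor = (0.0, 0.0)
    similar_pose = (0.0, 1.0)
    different_pose = (3.0, 0.0)
    print(triplet_hinge_loss(anchor, similar_pose, different_pose))
    print(triplet_hinge_loss(anchor, different_pose, similar_pose))
```

Minimizing such a loss over many triplets pushes images of similar poses together in the embedding space, which is what allows pose comparison without estimating joint positions explicitly.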