Structured learning for information retrieval
Information retrieval is the area of study concerned with the process of searching, recovering and interpreting information from large amounts of data. In this thesis we show that many of the problems in information retrieval are instances of structured learning, where the goal is to learn predictors of complex output structures consisting of many interdependent variables. We then attack these problems using principled machine learning methods that are specifically suited to such scenarios. In the process, we develop new models, new model extensions and new algorithms that, when integrated with existing methodology, comprise a new set of tools for solving a variety of information retrieval problems.
Firstly, we cover the multi-label classification problem, where we seek to predict a set of labels associated with a given object; the output in this case is structured, as the output variables are interdependent. Secondly, we focus on document ranking, where, given a query and a set of documents associated with it, we want to rank the documents according to their relevance to the query; here, again, we have a structured output: a ranking of documents. Thirdly, we address topic models, where we are given a set of documents and attempt to find a compact representation of them by learning latent topics and associating a topic distribution with each document; the output is again structured, consisting of word and topic distributions.
For all the above problems, we obtain state-of-the-art solutions, as attested by empirical performance on publicly available real-world datasets.
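To make the notion of a structured output concrete for the document ranking case, here is a minimal sketch in which the prediction is an ordering of documents, not a single label. The term-overlap score and the names `rank_documents` and `docs` are hypothetical illustrations; the thesis learns its predictors, whereas this toy scorer is fixed by hand.

```python
def rank_documents(query_terms, documents):
    """Return document ids ordered from most to least relevant.

    The score is a toy term-overlap count (an assumption for illustration),
    not a learned relevance model.
    """
    query = set(query_terms)

    def score(item):
        _doc_id, text = item
        return len(query & set(text.lower().split()))

    # The structured output is the whole ordering, not a per-document label.
    ranked = sorted(documents.items(), key=score, reverse=True)
    return [doc_id for doc_id, _ in ranked]


docs = {
    "d1": "structured learning for ranking",
    "d2": "cooking with garlic",
    "d3": "learning to rank documents with structured output",
}
ranking = rank_documents(["structured", "learning", "ranking"], docs)
print(ranking)
```

The point of the sketch is only that the positions in the output depend on one another: moving one document up necessarily moves another down.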
Object Level Deep Feature Pooling for Compact Image Representation
Convolutional Neural Network (CNN) features have been successfully employed
in recent works as an image descriptor for various vision tasks. But the
inability of the deep CNN features to exhibit invariance to geometric
transformations and object compositions poses a great challenge for image
search. In this work, we demonstrate the effectiveness of the objectness prior
over the deep CNN features of image regions for obtaining an invariant image
representation. The proposed approach represents the image as a vector of
pooled CNN features describing the underlying objects. This representation
provides robustness to spatial layout of the objects in the scene and achieves
invariance to general geometric transformations, such as translation, rotation
and scaling. The proposed approach also leads to a compact representation of
the scene, making each image occupy a smaller memory footprint. Experiments
show that the proposed representation achieves state-of-the-art retrieval
results on a set of challenging benchmark image datasets, while maintaining a
compact representation.
Comment: Deep Vision 201
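As a rough illustration of pooling region-level features into one image descriptor, the sketch below takes one feature vector per object region and combines them. The abstract only says the features are "pooled"; element-wise max pooling followed by L2 normalization is an assumption here, and `pool_regions` and the toy feature values are hypothetical stand-ins for real CNN outputs.

```python
import math


def pool_regions(region_features):
    """Element-wise max over region descriptors, then L2 normalization.

    Max pooling is an assumed choice; the source abstract does not name
    the pooling operator.
    """
    if not region_features:
        raise ValueError("need at least one region descriptor")
    pooled = [max(column) for column in zip(*region_features)]
    norm = math.sqrt(sum(v * v for v in pooled))
    return [v / norm for v in pooled] if norm > 0 else pooled


# Toy stand-ins for CNN features of two object regions in one image.
regions = [
    [0.1, 0.9, 0.0],
    [0.7, 0.2, 0.3],
]
descriptor = pool_regions(regions)
# Reordering the regions (a different spatial layout) gives the same vector.
shuffled = pool_regions(list(reversed(regions)))
print(descriptor == shuffled)
```

Because the pooled vector has the dimensionality of a single region descriptor regardless of how many regions are found, the representation stays compact.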
Group Invariant Deep Representations for Image Instance Retrieval
Most image instance retrieval pipelines are based on comparison of vectors
known as global image descriptors between a query image and the database
images. Due to their success in large scale image classification,
representations extracted from Convolutional Neural Networks (CNN) are quickly
gaining ground on Fisher Vectors (FVs) as state-of-the-art global descriptors
for image instance retrieval. While CNN-based descriptors are generally
noted for good retrieval performance at lower bitrates, they nevertheless
present a number of drawbacks including the lack of robustness to common object
transformations such as rotations compared with their interest point based FV
counterparts.
In this paper, we propose a method for computing invariant global descriptors
from CNNs. Our method implements a recently proposed mathematical theory for
invariance in a sensory cortex modeled as a feedforward neural network. The
resulting global descriptors can be made invariant to multiple arbitrary
transformation groups while retaining good discriminativeness.
Based on a thorough empirical evaluation using several publicly available
datasets, we show that our method is able to significantly and consistently
improve retrieval results every time a new type of invariance is incorporated.
We also show that our method, which has few parameters, is not prone to
overfitting: improvements generalize well across datasets with different
properties with regard to invariances. Finally, we show that our descriptors
are able to compare favourably to other state-of-the-art compact descriptors in
similar bitranges, exceeding the highest retrieval results reported in the
literature on some datasets. A dedicated dimensionality reduction step
(quantization or hashing) may be able to further improve the competitiveness
of the descriptors.
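One generic way to obtain a descriptor that is invariant to a transformation group is to compute features on every transformed copy of the input and pool over the resulting orbit. The sketch below does this for the four 90-degree rotations of a small grid. It is a toy under stated assumptions, not the paper's pipeline: `toy_features` is a hypothetical, deliberately position-sensitive stand-in for a CNN, and max is just one possible pooling statistic.

```python
def rotate90(grid):
    """Rotate a square grid by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def toy_features(grid):
    """Hypothetical stand-in for a CNN: position-sensitive on purpose."""
    n = len(grid)
    top_row = sum(grid[0])
    left_col = sum(row[0] for row in grid)
    diagonal = sum(grid[i][i] for i in range(n))
    return [top_row, left_col, diagonal]


def orbit(grid):
    """All four rotations of the grid (the cyclic group of order 4)."""
    out = [grid]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out


def invariant_descriptor(grid, pool=max):
    """Pool each feature dimension over the rotation orbit."""
    feats = [toy_features(g) for g in orbit(grid)]
    return [pool(column) for column in zip(*feats)]


image = [[1, 2], [3, 4]]
rotated = rotate90(image)
print(toy_features(image) == toy_features(rotated))
print(invariant_descriptor(image) == invariant_descriptor(rotated))
```

Rotating the input only permutes the members of the orbit, so any order-independent pooling statistic yields the same descriptor for the original and the rotated image.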
Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search
Mobile landmark search (MLS) has recently received increasing attention for its
great practical value. However, it remains unsolved due to two important
challenges. One is high bandwidth consumption of query transmission, and the
other is the huge visual variations of query images sent from mobile devices.
In this paper, we propose a novel hashing scheme, named canonical view based
discrete multi-modal hashing (CV-DMH), to handle these problems via a novel
three-stage learning procedure. First, a submodular function is designed to
measure visual representativeness and redundancy of a view set. With it,
canonical views, which capture key visual appearances of a landmark with limited
redundancy, are efficiently discovered with an iterative mining strategy.
Second, multi-modal sparse coding is applied to transform visual features from
multiple modalities into an intermediate representation. It can robustly and
adaptively characterize visual contents of varied landmark images with certain
canonical views. Finally, compact binary codes are learned on the intermediate
representation within a tailored discrete binary embedding model, which
preserves visual relations of images measured with canonical views and removes
the associated noise. In this part, we develop a new augmented Lagrangian
multiplier (ALM) based optimization method to solve directly for the discrete
binary codes. This not only deals explicitly with the discrete constraint, but
also handles the bit-uncorrelated and balance constraints together.
Experiments on real-world landmark datasets demonstrate the superior
performance of CV-DMH over several state-of-the-art methods.
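The constraints named above can be illustrated on a toy code matrix. The sketch below uses the definitions that are standard in the hashing literature, which is an assumption about what the paper means: codes take values in {-1, +1} (discrete), each bit sums to zero over the dataset (balance), and distinct bits have zero inner product (bit-uncorrelated). It only checks these properties and performs a Hamming-distance lookup; it does not implement the ALM-based optimization, and all function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def is_balanced(codes):
    """Each bit is +1 for half of the items and -1 for the other half."""
    return all(sum(bit_column) == 0 for bit_column in zip(*codes))


def is_uncorrelated(codes):
    """Every pair of distinct bits has zero inner product over the items."""
    columns = list(zip(*codes))
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            if sum(x * y for x, y in zip(columns[i], columns[j])) != 0:
                return False
    return True


def nearest(query, codes):
    """Index of the database code closest to the query in Hamming distance."""
    return min(range(len(codes)), key=lambda i: hamming(query, codes[i]))


# Toy database: four items, two bits each, values in {-1, +1}.
codes = [
    [1, 1],
    [1, -1],
    [-1, 1],
    [-1, -1],
]
print(is_balanced(codes), is_uncorrelated(codes))
print(nearest([1, -1], codes))
```

Short binary codes of this kind are what keep the query transmission cost low: only a few bits per image need to be sent, and matching reduces to counting differing bits.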