381,092 research outputs found
Recommended from our members
Neural Generative Models and Representation Learning for Information Retrieval
Information Retrieval (IR) concerns about the structure, analysis, organization, storage, and retrieval of information. Among different retrieval models proposed in the past decades, generative retrieval models, especially those under the statistical probabilistic framework, are one of the most popular techniques that have been widely applied to Information Retrieval problems. While they are famous for their well-grounded theory and good empirical performance in text retrieval, their applications in IR are often limited by their complexity and low extendability in the modeling of high-dimensional information. Recently, advances in deep learning techniques provide new opportunities for representation learning and generative models for information retrieval. In contrast to statistical models, neural models have much more flexibility because they model information and data correlation in latent spaces without explicitly relying on any prior knowledge. Previous studies on pattern recognition and natural language processing have shown that semantically meaningful representations of text, images, and many types of information can be acquired with neural models through supervised or unsupervised training. Nonetheless, the effectiveness of neural models for information retrieval is mostly unexplored. In this thesis, we study how to develop new generative models and representation learning frameworks with neural models for information retrieval. Specifically, our contributions include three main components: (1) Theoretical Analysis: We present the first theoretical analysis and adaptation of existing neural embedding models for ad-hoc retrieval tasks; (2) Design Practice: Based on our experience and knowledge, we show how to design an embedding-based neural generative model for practical information retrieval tasks such as personalized product search; And (3) Generic Framework: We further generalize our proposed neural generative framework for complicated heterogeneous information retrieval scenarios that concern text, images, knowledge entities, and their relationships. Empirical results show that the proposed neural generative framework can effectively learn information representations and construct retrieval models that outperform the state-of-the-art systems in a variety of IR tasks
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to the problem has not only
revealed the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up for a
possible solution accordingly
Detach and Adapt: Learning Cross-Domain Disentangled Deep Representation
While representation learning aims to derive interpretable features for
describing visual data, representation disentanglement further results in such
features so that particular image attributes can be identified and manipulated.
However, one cannot easily address this task without observing ground truth
annotation for the training data. To address this problem, we propose a novel
deep learning model of Cross-Domain Representation Disentangler (CDRD). By
observing fully annotated source-domain data and unlabeled target-domain data
of interest, our model bridges the information across data domains and
transfers the attribute information accordingly. Thus, cross-domain joint
feature disentanglement and adaptation can be jointly performed. In the
experiments, we provide qualitative results to verify our disentanglement
capability. Moreover, we further confirm that our model can be applied for
solving classification tasks of unsupervised domain adaptation, and performs
favorably against state-of-the-art image disentanglement and translation
methods.Comment: CVPR 2018 Spotligh
Dissimilarity-based Ensembles for Multiple Instance Learning
In multiple instance learning, objects are sets (bags) of feature vectors
(instances) rather than individual feature vectors. In this paper we address
the problem of how these bags can best be represented. Two standard approaches
are to use (dis)similarities between bags and prototype bags, or between bags
and prototype instances. The first approach results in a relatively
low-dimensional representation determined by the number of training bags, while
the second approach results in a relatively high-dimensional representation,
determined by the total number of instances in the training set. In this paper
a third, intermediate approach is proposed, which links the two approaches and
combines their strengths. Our classifier is inspired by a random subspace
ensemble, and considers subspaces of the dissimilarity space, defined by
subsets of instances, as prototypes. We provide guidelines for using such an
ensemble, and show state-of-the-art performances on a range of multiple
instance learning problems.Comment: Submitted to IEEE Transactions on Neural Networks and Learning
Systems, Special Issue on Learning in Non-(geo)metric Space
Semantic Autoencoder for Zero-Shot Learning
Existing zero-shot learning (ZSL) models typically learn a projection
function from a feature space to a semantic embedding space (e.g.~attribute
space). However, such a projection function is only concerned with predicting
the training seen class semantic representation (e.g.~attribute prediction) or
classification. When applied to test data, which in the context of ZSL contains
different (unseen) classes without training data, a ZSL model typically suffers
from the project domain shift problem. In this work, we present a novel
solution to ZSL based on learning a Semantic AutoEncoder (SAE). Taking the
encoder-decoder paradigm, an encoder aims to project a visual feature vector
into the semantic space as in the existing ZSL models. However, the decoder
exerts an additional constraint, that is, the projection/code must be able to
reconstruct the original visual feature. We show that with this additional
reconstruction constraint, the learned projection function from the seen
classes is able to generalise better to the new unseen classes. Importantly,
the encoder and decoder are linear and symmetric which enable us to develop an
extremely efficient learning algorithm. Extensive experiments on six benchmark
datasets demonstrate that the proposed SAE outperforms significantly the
existing ZSL models with the additional benefit of lower computational cost.
Furthermore, when the SAE is applied to supervised clustering problem, it also
beats the state-of-the-art.Comment: accepted to CVPR201
Visual Dynamics: Stochastic Future Generation via Layered Cross Convolutional Networks
We study the problem of synthesizing a number of likely future frames from a
single input image. In contrast to traditional methods that have tackled this
problem in a deterministic or non-parametric way, we propose to model future
frames in a probabilistic manner. Our probabilistic model makes it possible for
us to sample and synthesize many possible future frames from a single input
image. To synthesize realistic movement of objects, we propose a novel network
structure, namely a Cross Convolutional Network; this network encodes image and
motion information as feature maps and convolutional kernels, respectively. In
experiments, our model performs well on synthetic data, such as 2D shapes and
animated game sprites, and on real-world video frames. We present analyses of
the learned network representations, showing it is implicitly learning a
compact encoding of object appearance and motion. We also demonstrate a few of
its applications, including visual analogy-making and video extrapolation.Comment: Journal preprint of arXiv:1607.02586 (IEEE TPAMI, 2019). The first
two authors contributed equally to this work. Project page:
http://visualdynamics.csail.mit.ed
- …