Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection
Modeling hypernymy, such as poodle is-a dog, is an important generalization
aid to many NLP tasks, such as entailment, coreference, relation extraction,
and question answering. Supervised learning from labeled hypernym sources, such
as WordNet, limits the coverage of these models, which can be addressed by
learning hypernyms from unlabeled text. Existing unsupervised methods either do
not scale to large vocabularies or yield unacceptably poor accuracy. This paper
introduces distributional inclusion vector embedding (DIVE), a
simple-to-implement unsupervised method of hypernym discovery via per-word
non-negative vector embeddings which preserve the inclusion property of word
contexts in a low-dimensional and interpretable space. In experimental
evaluations more comprehensive than any previous literature of which we are
aware (evaluating on 11 datasets using multiple existing as well as newly
proposed scoring functions), we find that our method provides up to double the
precision of previous unsupervised embeddings, and the highest average
performance, using a much more compact word representation, and yielding many
new state-of-the-art results.
Comment: NAACL 201
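The inclusion property mentioned in this abstract can be illustrated with a small sketch. It assumes each word already has a non-negative vector and scores a candidate pair by how much of the narrower word's mass is covered by the broader word's vector. The function name `inclusion_score` and the toy vectors are hypothetical, and this is a generic inclusion-style score, not necessarily one of the scoring functions evaluated in the paper.

```python
def inclusion_score(hypo, hyper):
    """Fraction of the candidate hyponym's mass covered by the candidate hypernym.

    Both arguments are non-negative vectors of equal length.
    """
    total = sum(hypo)
    if total == 0:
        return 0.0
    covered = sum(min(a, b) for a, b in zip(hypo, hyper))
    return covered / total


# Toy non-negative embeddings, made up for illustration only.
poodle = [2.0, 1.0, 0.0, 0.0]
dog = [3.0, 2.0, 1.0, 0.5]

forward = inclusion_score(poodle, dog)   # does "dog" include "poodle"?
backward = inclusion_score(dog, poodle)  # does "poodle" include "dog"?
print(forward, backward)
```

With these toy vectors the score is higher in the poodle-to-dog direction than in the reverse, which is the asymmetry a hypernymy detector needs.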
Unsupervised Visual and Textual Information Fusion in Multimedia Retrieval - A Graph-based Point of View
Multimedia collections are more than ever growing in size and diversity.
Effective multimedia retrieval systems are thus critical to access these
datasets from the end-user perspective and in a scalable way. We are interested
in repositories of image/text multimedia objects and we study multimodal
information fusion techniques in the context of content based multimedia
information retrieval. We focus on graph based methods which have proven to
provide state-of-the-art performance. We examine two such methods in
particular: cross-media similarities and random walk based scores. From a
theoretical viewpoint, we propose a unifying graph based framework which
encompasses the two aforementioned approaches. Our proposal allows us to
highlight the core features one should consider when using a graph based
technique for the combination of visual and textual information. We compare
cross-media and random walk based results using three different real-world
datasets. From a practical standpoint, our extended empirical analysis allows us
to provide insights and guidelines about the use of graph based methods for
multimodal information fusion in content based multimedia information
retrieval.
Comment: An extended version of the paper: Visual and Textual Information
Fusion in Multimedia Retrieval using Semantic Filtering and Graph based
Methods, by J. Ah-Pine, G. Csurka and S. Clinchant, submitted to ACM
Transactions on Information System
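As a rough illustration of a random walk based score over a fused graph, the sketch below mixes a text-similarity matrix and a visual-similarity matrix into one transition matrix and runs a random walk with restart from a query object. The function names, the mixing weight `alpha`, the `restart` probability, and the toy similarity values are all assumptions made for illustration; this is a generic sketch, not the unifying framework proposed in the paper.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1 (rows are assumed to have positive mass)."""
    return [[v / sum(row) for v in row] for row in matrix]


def fused_walk(text_sim, visual_sim, start, alpha=0.5, restart=0.3, steps=20):
    """Random walk with restart on a mix of text and visual transition matrices."""
    pt = row_normalize(text_sim)
    pv = row_normalize(visual_sim)
    n = len(start)
    # Convex combination of the two modality-specific transition matrices.
    p = [[alpha * pt[i][j] + (1 - alpha) * pv[i][j] for j in range(n)]
         for i in range(n)]
    scores = list(start)
    for _ in range(steps):
        walked = [sum(scores[i] * p[i][j] for i in range(n)) for j in range(n)]
        scores = [restart * start[j] + (1 - restart) * walked[j]
                  for j in range(n)]
    return scores


# Toy data: objects 0 and 1 share text, objects 1 and 2 look alike.
text_sim = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
visual_sim = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.9], [0.0, 0.9, 1.0]]
scores = fused_walk(text_sim, visual_sim, start=[1.0, 0.0, 0.0])
print(scores)
```

In this toy setup object 2 has no direct link to the query object 0 in either modality, yet it receives a positive score through object 1, which is the kind of cross-modal propagation graph based fusion relies on.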
Knowledge-aware Complementary Product Representation Learning
Learning product representations that reflect complementary relationships
plays a central role in e-commerce recommender systems. In the absence of the
product relationship graph, which existing methods rely on, there is a need to
detect the complementary relationships directly from noisy and sparse customer
purchase activities. Furthermore, unlike simple relationships such as
similarity, complementariness is asymmetric and non-transitive. Standard usage
of representation learning emphasizes only one set of embeddings, which is
problematic for modelling such properties of complementariness. We propose
using knowledge-aware learning with dual product embedding to solve the above
challenges. We encode contextual knowledge into product representation by
multi-task learning, to alleviate the sparsity issue. By explicitly modelling
user bias terms, we separate the noise of customer-specific preferences
from the complementariness. Furthermore, we adopt the dual embedding framework
to capture the intrinsic properties of complementariness and provide geometric
interpretation motivated by the classic separating hyperplane theory. Finally,
we propose a Bayesian network structure that unifies all the components, which
also includes several popular models as special cases. The proposed method
compares favourably to state-of-the-art methods in downstream classification and
recommendation tasks. We also develop an implementation that scales efficiently
to a dataset with millions of items and customers.
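To see why a single embedding per product struggles with asymmetry, consider the minimal sketch below. It gives every product two hypothetical vectors, one used when the product is the query and one used when it is the candidate, and scores a pair with a dot product. The names `as_query`, `as_candidate`, and `complement_score`, and all numeric values, are assumptions for illustration; the paper's actual model also involves contextual knowledge and user bias terms that are not reproduced here.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def complement_score(query, candidate, as_query, as_candidate):
    """Score how well `candidate` complements `query` using two embedding tables."""
    return dot(as_query[query], as_candidate[candidate])


# Toy dual embeddings, made up for illustration only.
as_query = {"phone": [1.0, 0.0], "case": [0.0, 0.2]}
as_candidate = {"phone": [0.0, 0.1], "case": [1.0, 0.0]}

phone_then_case = complement_score("phone", "case", as_query, as_candidate)
case_then_phone = complement_score("case", "phone", as_query, as_candidate)
print(phone_then_case, case_then_phone)
```

With one shared vector per product the dot product would be symmetric by construction. Keeping separate query and candidate vectors lets the score for "case given phone" differ from "phone given case".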