Word-Entity Duet Representations for Document Ranking
This paper presents a word-entity duet framework for utilizing knowledge
bases in ad-hoc retrieval. In this work, the query and documents are modeled by
word-based representations and entity-based representations. Ranking features
are generated by the interactions between the two representations,
incorporating information from the word space, the entity space, and the
cross-space connections through the knowledge graph. To handle the
uncertainties from the automatically constructed entity representations, an
attention-based ranking model AttR-Duet is developed. With back-propagation
from ranking labels, the model learns simultaneously how to demote noisy
entities and how to rank documents with the word-entity duet. Evaluation
results on the TREC Web Track ad-hoc task demonstrate that all of the four-way
interactions in the duet are useful, the attention mechanism successfully
steers the model away from noisy entities, and together they significantly
outperform both word-based and entity-based learning to rank systems.
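The four-way interactions described above can be illustrated with a minimal sketch. It is not the AttR-Duet model: the match function (a plain frequency count), the `entity_text` lookup that maps an entity to description words, and all function names are assumptions made for illustration only.

```python
from collections import Counter


def overlap(query_items, doc_items):
    """Sum the document frequencies of the query items (a simple match count)."""
    doc_counts = Counter(doc_items)
    return sum(doc_counts[item] for item in query_items)


def duet_features(q_words, q_entities, d_words, d_entities, entity_text):
    """Return one match feature per interaction in the word-entity duet.

    entity_text is a hypothetical mapping from an entity id to the words of
    its description; it stands in for the cross-space connection.
    """
    # Cross-space: project entities into the word space via their descriptions.
    q_entity_words = [w for e in q_entities for w in entity_text.get(e, [])]
    d_entity_words = [w for e in d_entities for w in entity_text.get(e, [])]
    return {
        "query_words_doc_words": overlap(q_words, d_words),
        "query_words_doc_entities": overlap(q_words, d_entity_words),
        "query_entities_doc_words": overlap(q_entity_words, d_words),
        "query_entities_doc_entities": overlap(q_entities, d_entities),
    }
```

In a learning-to-rank setting, the four values would be fed to a ranker as features; the attention over noisy entities mentioned in the abstract is not modeled here.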
Knowledge-rich Image Gist Understanding Beyond Literal Meaning
We investigate the problem of understanding the message (gist) conveyed by
images and their captions as found, for instance, on websites or news articles.
To this end, we propose a methodology to capture the meaning of image-caption
pairs on the basis of large amounts of machine-readable knowledge that has
previously been shown to be highly effective for text understanding. Our method
identifies the connotation of objects beyond their denotation: where most
approaches to image understanding focus on the denotation of objects, i.e.,
their literal meaning, our work addresses the identification of connotations,
i.e., iconic meanings of objects, to understand the message of images. We view
image understanding as the task of representing an image-caption pair on the
basis of a wide-coverage vocabulary of concepts such as the one provided by
Wikipedia, and cast gist detection as a concept-ranking problem with
image-caption pairs as queries. To enable a thorough investigation of the
problem of gist understanding, we produce a gold standard of over 300
image-caption pairs and over 8,000 gist annotations covering a wide variety of
topics at different levels of abstraction. We use this dataset to
experimentally benchmark the contribution of signals from heterogeneous
sources, namely image and text. The best result, a Mean Average Precision
(MAP) of 0.69, indicates that by combining both dimensions we are able to better
understand the meaning of our image-caption pairs than when using language or
vision information alone. We test the robustness of our gist detection approach
when receiving automatically generated input, i.e., using automatically
generated image tags or generated captions, and prove the feasibility of an
end-to-end automated process.
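For reference, Mean Average Precision over a set of ranked lists can be computed as in the sketch below. This is the standard definition, not the authors' evaluation code; the function names and the input layout (a list of `(ranked, relevant)` pairs) are assumptions.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranked, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

In the gist-detection setting, each image-caption pair would play the role of a query and the ranked items would be candidate concepts.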
Ad Hoc Table Retrieval using Semantic Similarity
We introduce and address the problem of ad hoc table retrieval: answering a
keyword query with a ranked list of tables. This task is not only interesting
on its own account, but is also being used as a core component in many other
table-based information access scenarios, such as table completion or table
mining. The main novel contribution of this work is a method for performing
semantic matching between queries and tables. Specifically, we (i) represent
queries and tables in multiple semantic spaces (both discrete sparse and
continuous dense vector representations) and (ii) introduce various similarity
measures for matching those semantic representations. We consider all possible
combinations of semantic representations and similarity measures and use these
as features in a supervised learning model. Using a purpose-built test
collection based on Wikipedia tables, we demonstrate significant and
substantial improvements over a state-of-the-art baseline. Comment: The Web Conference 2018 (WWW'18)
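A minimal sketch of matching a query and a table in a dense semantic space follows. The two measures shown (cosine between centroids, and an aggregate over pairwise cosines) are assumed examples of the "various similarity measures" the abstract mentions; the function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def centroid_similarity(query_vecs, table_vecs):
    """Compare the query and the table as single averaged vectors."""
    return cosine(centroid(query_vecs), centroid(table_vecs))


def pairwise_similarity(query_vecs, table_vecs, aggregate=max):
    """Compare every query term with every table term, then aggregate."""
    sims = [cosine(q, t) for q in query_vecs for t in table_vecs]
    return aggregate(sims)
```

Each (representation, measure) combination would yield one score, and those scores could then serve as features for a supervised ranker, as the abstract describes.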
DREQ: Document Re-Ranking Using Entity-based Query Understanding
While entity-oriented neural IR models have advanced significantly, they
often overlook a key nuance: the varying degrees of influence individual
entities within a document have on its overall relevance. Addressing this gap,
we present DREQ, an entity-oriented dense document re-ranking model. Uniquely,
we emphasize the query-relevant entities within a document's representation
while simultaneously attenuating the less relevant ones, thus obtaining a
query-specific entity-centric document representation. We then combine this
entity-centric document representation with the text-centric representation of
the document to obtain a "hybrid" representation of the document. We learn a
relevance score for the document using this hybrid representation. Using four
large-scale benchmarks, we show that DREQ outperforms state-of-the-art neural
and non-neural re-ranking methods, highlighting the effectiveness of our
entity-oriented representation approach. Comment: To be presented as a full paper at ECIR 2024 in Glasgow, U
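The idea of emphasizing query-relevant entities and then combining the result with a text representation can be sketched as follows. This is an illustration under assumptions, not the DREQ architecture: the softmax over dot products as the weighting scheme, concatenation as the way of combining, and all names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hybrid_representation(query_vec, entity_vecs, text_vec):
    """Return (hybrid vector, entity weights) for one document.

    Entities that score higher against the query receive larger weights,
    so less relevant entities are attenuated in the weighted sum.
    """
    weights = softmax([dot(query_vec, e) for e in entity_vecs])
    dim = len(query_vec)
    entity_centric = [
        sum(w * e[i] for w, e in zip(weights, entity_vecs)) for i in range(dim)
    ]
    # Assumed combination: concatenate entity-centric and text-centric parts.
    return entity_centric + list(text_vec), weights
```

A relevance score would then be learned on top of the hybrid vector, for example with a small feed-forward layer; that step is omitted here.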
Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling
This paper presents a Kernel Entity Salience Model (KESM) that improves text
understanding and retrieval by better estimating entity salience (importance)
in documents. KESM represents entities by knowledge-enriched distributed
representations, models the interactions between entities and words by kernels,
and combines the kernel scores to estimate entity salience. The whole model is
learned end-to-end using entity salience labels. The salience model also
improves ad hoc search accuracy, providing effective ranking features by
modeling the salience of query entities in candidate documents. Our experiments
on two entity salience corpora and two TREC ad hoc search datasets demonstrate
the effectiveness of KESM over frequency-based and feature-based methods. We
also provide examples showing how KESM conveys its text understanding ability
learned from entity salience to search.
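One way to picture "modeling interactions by kernels and combining the kernel scores" is Gaussian kernel pooling over similarity values, sketched below. The choice of Gaussian kernels, the kernel means, the width `sigma`, and the linear combination are all assumptions for illustration; the abstract does not specify them.

```python
import math


def kernel_pooling(similarities, mus, sigma=0.1):
    """Soft-count how many similarities fall near each kernel mean.

    similarities: similarity values between one entity and the words
    (or other entities) of a document.
    """
    return [
        sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in similarities)
        for mu in mus
    ]


def salience_score(similarities, mus, weights, sigma=0.1):
    """Combine kernel features linearly into one salience estimate."""
    features = kernel_pooling(similarities, mus, sigma)
    return sum(w * f for w, f in zip(weights, features))
```

In an end-to-end model the combination weights would be learned from entity salience labels rather than set by hand.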
Learning to Ask: Question-based Sequential Bayesian Product Search
Product search is generally recognized as the first and foremost stage of
online shopping and thus significant for users and retailers of e-commerce.
Most of the traditional retrieval methods use some similarity functions to
match the user's query and the document that describes a product, either
directly or in a latent vector space. However, user queries are often too
general to capture the minute details of the specific product that a user is
looking for. In this paper, we propose a novel interactive method to
effectively locate the best matching product. The method is based on the
assumption that there is a set of candidate questions for each product to be
asked. In this work, we instantiate this candidate set by making the hypothesis
that products can be discriminated by the entities that appear in the documents
associated with them. We propose a Question-based Sequential Bayesian Product
Search method, QSBPS, which directly queries users on the expected presence of
entities in the relevant product documents. The method learns the product
relevance as well as the reward of the potential questions to be asked to the
user by being trained on the search history and purchase behavior of a specific
user together with that of other users. The experimental results show that the
proposed method can greatly improve the performance of product search compared
to the state-of-the-art baselines. Comment: This paper is accepted by CIKM 201
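The question-asking loop can be sketched as a Bayesian update over candidate products. This is a simplified illustration, not QSBPS itself: the noise level, the rule of asking about the entity that splits the belief mass most evenly, and all names are assumptions, and nothing here is learned from search history.

```python
def choose_question(belief, product_entities, asked):
    """Pick the unasked entity whose 'yes' probability mass is closest to 0.5.

    Raises ValueError if every entity has already been asked.
    """
    candidates = set().union(*product_entities.values()) - set(asked)

    def yes_mass(entity):
        return sum(p for prod, p in belief.items() if entity in product_entities[prod])

    # sorted() makes tie-breaking deterministic (alphabetical).
    return min(sorted(candidates), key=lambda e: abs(yes_mass(e) - 0.5))


def update_belief(belief, product_entities, entity, answer_yes, noise=0.1):
    """Posterior over products after the user answers one entity question.

    noise is an assumed probability that the user's answer is wrong.
    """
    posterior = {}
    for prod, prior in belief.items():
        has_entity = entity in product_entities[prod]
        likelihood = (1 - noise) if has_entity == answer_yes else noise
        posterior[prod] = prior * likelihood
    total = sum(posterior.values())
    return {prod: p / total for prod, p in posterior.items()}
```

Starting from a uniform belief, repeated calls to `choose_question` and `update_belief` concentrate probability on the products consistent with the user's answers.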