2,718 research outputs found
Sequence to Sequence Learning for Query Expansion
Using sequence to sequence algorithms for query expansion has not been
explored yet in Information Retrieval literature nor in Question-Answering's.
We tried to fill this gap in the literature with a custom Query Expansion
engine trained and tested on open datasets. Starting from open datasets, we
built a Query Expansion training set using sentence-embeddings-based Keyword
Extraction. We therefore assessed the ability of the Sequence to Sequence
neural networks to capture expanding relations in the words embeddings' space.Comment: 8 pages, 2 figures, AAAI-19 Student Abstract and Poster Progra
Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search
Retrieval pipelines commonly rely on a term-based search to obtain candidate
records, which are subsequently re-ranked. Some candidates are missed by this
approach, e.g., due to a vocabulary mismatch. We address this issue by
replacing the term-based search with a generic k-NN retrieval algorithm, where
a similarity function can take into account subtle term associations. While an
exact brute-force k-NN search using this similarity function is slow, we
demonstrate that an approximate algorithm can be nearly two orders of magnitude
faster at the expense of only a small loss in accuracy. A retrieval pipeline
using an approximate k-NN search can be more effective and efficient than the
term-based pipeline. This opens up new possibilities for designing effective
retrieval pipelines. Our software (including data-generating code) and
derivative data based on the Stack Overflow collection is available online
Automatic Synonym Discovery with Knowledge Bases
Recognizing entity synonyms from text has become a crucial task in many
entity-leveraging applications. However, discovering entity synonyms from
domain-specific text corpora (e.g., news articles, scientific papers) is rather
challenging. Current systems take an entity name string as input to find out
other names that are synonymous, ignoring the fact that often times a name
string can refer to multiple entities (e.g., "apple" could refer to both Apple
Inc and the fruit apple). Moreover, most existing methods require training data
manually created by domain experts to construct supervised-learning systems. In
this paper, we study the problem of automatic synonym discovery with knowledge
bases, that is, identifying synonyms for knowledge base entities in a given
domain-specific corpus. The manually-curated synonyms for each entity stored in
a knowledge base not only form a set of name strings to disambiguate the
meaning for each other, but also can serve as "distant" supervision to help
determine important features for the task. We propose a novel framework, called
DPE, to integrate two kinds of mutually-complementing signals for synonym
discovery, i.e., distributional features based on corpus-level statistics and
textual patterns based on local contexts. In particular, DPE jointly optimizes
the two kinds of signals in conjunction with distant supervision, so that they
can mutually enhance each other in the training stage. At the inference stage,
both signals will be utilized to discover synonyms for the given entities.
Experimental results prove the effectiveness of the proposed framework
Word-Entity Duet Representations for Document Ranking
This paper presents a word-entity duet framework for utilizing knowledge
bases in ad-hoc retrieval. In this work, the query and documents are modeled by
word-based representations and entity-based representations. Ranking features
are generated by the interactions between the two representations,
incorporating information from the word space, the entity space, and the
cross-space connections through the knowledge graph. To handle the
uncertainties from the automatically constructed entity representations, an
attention-based ranking model AttR-Duet is developed. With back-propagation
from ranking labels, the model learns simultaneously how to demote noisy
entities and how to rank documents with the word-entity duet. Evaluation
results on TREC Web Track ad-hoc task demonstrate that all of the four-way
interactions in the duet are useful, the attention mechanism successfully
steers the model away from noisy entities, and together they significantly
outperform both word-based and entity-based learning to rank systems
Question Answering in Conversations: Query Refinement Using Contextual and Semantic Information
This paper introduces a query refinement method applied to questions asked by users to a system during a meeting or a conversation that they have with other users. To answer the questions, the proposed method leverages the local context of the conversation along with semantic resources, either WordNet or word embeddings from word2vec. The method first represents the local context by extracting keywords from the transcript of the conversation, which is obtained from a real-time Automatic Speech Recognition (ASR) system and may contain noise. It then expands the queries with keywords that best represent the topic of the query, i.e.\ expansion keywords accompanied by weights indicating their topical similarity to the query. Finally, semantically related terms are added, using two options: either synonymous terms drawn from WordNet or similar words based on distributed representations in a low-dimensional word embedding space learned using word2vec. To evaluate the system, we introduce a dataset (named AREX for AMI Requests for Explanations) and an evaluation metric based on relevance judgments collected by crowdsourcing. We compare our query expansion approach with other methods, over queries from the AREX dataset, showing the superiority of our method when either manual or automatic transcripts of the AMI Meeting Corpus are used
Query Expansion with Locally-Trained Word Embeddings
Continuous space word embeddings have received a great deal of attention in
the natural language processing and machine learning communities for their
ability to model term similarity and other relationships. We study the use of
term relatedness in the context of query expansion for ad hoc information
retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when
trained globally, underperform corpus and query specific embeddings for
retrieval tasks. These results suggest that other tasks benefiting from global
embeddings may also benefit from local embeddings
- …