66,774 research outputs found
A Multi-Resolution Word Embedding for Document Retrieval from Large Unstructured Knowledge Bases
Deep language models learning a hierarchical representation proved to be a
powerful tool for natural language processing, text mining and information
retrieval. However, representations that perform well for retrieval must
capture semantic meaning at different levels of abstraction or context-scopes.
In this paper, we propose a new method to generate multi-resolution word
embeddings that represent documents at multiple resolutions in terms of
context-scopes. In order to investigate its performance,we use the Stanford
Question Answering Dataset (SQuAD) and the Question Answering by Search And
Reading (QUASAR) in an open-domain question-answering setting, where the first
task is to find documents useful for answering a given question. To this end,
we first compare the quality of various text-embedding methods for retrieval
performance and give an extensive empirical comparison with the performance of
various non-augmented base embeddings with and without multi-resolution
representation. We argue that multi-resolution word embeddings are consistently
superior to the original counterparts and deep residual neural models
specifically trained for retrieval purposes can yield further significant gains
when they are used for augmenting those embeddings
SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting
Learning knowledge graph (KG) embeddings is an emerging technique for a
variety of downstream tasks such as summarization, link prediction, information
retrieval, and question answering. However, most existing KG embedding models
neglect space and, therefore, do not perform well when applied to (geo)spatial
data and tasks. For those models that consider space, most of them primarily
rely on some notions of distance. These models suffer from higher computational
complexity during training while still losing information beyond the relative
distance between entities. In this work, we propose a location-aware KG
embedding model called SE-KGE. It directly encodes spatial information such as
point coordinates or bounding boxes of geographic entities into the KG
embedding space. The resulting model is capable of handling different types of
spatial reasoning. We also construct a geographic knowledge graph as well as a
set of geographic query-answer pairs called DBGeo to evaluate the performance
of SE-KGE in comparison to multiple baselines. Evaluation results show that
SE-KGE outperforms these baselines on the DBGeo dataset for geographic logic
query answering task. This demonstrates the effectiveness of our
spatially-explicit model and the importance of considering the scale of
different geographic entities. Finally, we introduce a novel downstream task
called spatial semantic lifting which links an arbitrary location in the study
area to entities in the KG via some relations. Evaluation on DBGeo shows that
our model outperforms the baseline by a substantial margin.Comment: Accepted to Transactions in GI
Open-Retrieval Conversational Question Answering
Conversational search is one of the ultimate goals of information retrieval.
Recent research approaches conversational search by simplified settings of
response ranking and conversational question answering, where an answer is
either selected from a given candidate set or extracted from a given passage.
These simplifications neglect the fundamental role of retrieval in
conversational search. To address this limitation, we introduce an
open-retrieval conversational question answering (ORConvQA) setting, where we
learn to retrieve evidence from a large collection before extracting answers,
as a further step towards building functional conversational search systems. We
create a dataset, OR-QuAC, to facilitate research on ORConvQA. We build an
end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader
that are all based on Transformers. Our extensive experiments on OR-QuAC
demonstrate that a learnable retriever is crucial for ORConvQA. We further show
that our system can make a substantial improvement when we enable history
modeling in all system components. Moreover, we show that the reranker
component contributes to the model performance by providing a regularization
effect. Finally, further in-depth analyses are performed to provide new
insights into ORConvQA.Comment: Accepted to SIGIR'2
- …