25,641 research outputs found
Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering
Evidence retrieval is a critical stage of question answering (QA), necessary
not only to improve performance, but also to explain the decisions of the
corresponding QA method. We introduce a simple, fast, and unsupervised
iterative evidence retrieval method, which relies on three ideas: (a) an
unsupervised alignment approach to soft-align questions and answers with
justification sentences using only GloVe embeddings, (b) an iterative process
that reformulates queries focusing on terms that are not covered by existing
justifications, which (c) a stopping criterion that terminates retrieval when
the terms in the given question and candidate answers are covered by the
retrieved justifications. Despite its simplicity, our approach outperforms all
the previous methods (including supervised methods) on the evidence selection
task on two datasets: MultiRC and QASC. When these evidence sentences are fed
into a RoBERTa answer classification component, we achieve state-of-the-art QA
performance on these two datasets.Comment: Accepted at ACL 2020 as a long conference pape
Natural Language Processing for Information Retrieval and Knowledge Discovery
Natural Language Processing (NLP) is a powerful technology for the vital tasks of information retrieval (IR) and knowledge discovery (KD) which, in turn, feed the visualization systems of the present and future and enable knowledge workers to focus more of their time on the vital tasks of analysis and prediction.published or submitted for publicatio
Entity Query Feature Expansion Using Knowledge Base Links
Recent advances in automatic entity linking and knowledge base
construction have resulted in entity annotations for document and
query collections. For example, annotations of entities from large
general purpose knowledge bases, such as Freebase and the Google
Knowledge Graph. Understanding how to leverage these entity
annotations of text to improve ad hoc document retrieval is an open
research area. Query expansion is a commonly used technique to
improve retrieval effectiveness. Most previous query expansion
approaches focus on text, mainly using unigram concepts. In this
paper, we propose a new technique, called entity query feature
expansion (EQFE) which enriches the query with features from
entities and their links to knowledge bases, including structured
attributes and text. We experiment using both explicit query entity
annotations and latent entities. We evaluate our technique on TREC
text collections automatically annotated with knowledge base entity
links, including the Google Freebase Annotations (FACC1) data.
We find that entity-based feature expansion results in significant
improvements in retrieval effectiveness over state-of-the-art text
expansion approaches
Know2Look: Commonsense Knowledge for Visual Search
With the rise in popularity of social media, images accompanied by contextual text form a huge section of the web. However, search and retrieval of documents are still largely dependent on solely textual cues. Although visual cues have started to gain focus, the imperfection in object/scene detection do not lead to significantly improved results. We hypothesize that the use of background commonsense knowledge on query terms can significantly aid in retrieval of documents with associated images. To this end we deploy three different modalities - text, visual cues, and commonsense knowledge pertaining to the query - as a recipe for efficient search and retrieval
Extracting corpus specific knowledge bases from Wikipedia
Thesauri are useful knowledge structures for assisting information retrieval. Yet their production is labor-intensive, and few domains have comprehensive thesauri that cover domain-specific concepts and contemporary usage. One approach, which has been attempted without much success for decades, is to seek statistical natural language processing algorithms that work on free text. Instead, we propose to replace costly professional indexers with thousands of dedicated amateur volunteers--namely, those that are producing Wikipedia. This vast, open encyclopedia represents a rich tapestry of topics and semantics and a huge investment of human effort and judgment. We show how this can be directly exploited to provide WikiSauri: manually-defined yet inexpensive thesaurus structures that are specifically tailored to expose the topics, terminology and semantics of individual document collections. We also offer concrete evidence of the effectiveness of WikiSauri for assisting information retrieval
A Factoid Question Answering System for Vietnamese
In this paper, we describe the development of an end-to-end factoid question
answering system for the Vietnamese language. This system combines both
statistical models and ontology-based methods in a chain of processing modules
to provide high-quality mappings from natural language text to entities. We
present the challenges in the development of such an intelligent user interface
for an isolating language like Vietnamese and show that techniques developed
for inflectional languages cannot be applied "as is". Our question answering
system can answer a wide range of general knowledge questions with promising
accuracy on a test set.Comment: In the proceedings of the HQA'18 workshop, The Web Conference
Companion, Lyon, Franc
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the wellknown
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents
Word-Entity Duet Representations for Document Ranking
This paper presents a word-entity duet framework for utilizing knowledge
bases in ad-hoc retrieval. In this work, the query and documents are modeled by
word-based representations and entity-based representations. Ranking features
are generated by the interactions between the two representations,
incorporating information from the word space, the entity space, and the
cross-space connections through the knowledge graph. To handle the
uncertainties from the automatically constructed entity representations, an
attention-based ranking model AttR-Duet is developed. With back-propagation
from ranking labels, the model learns simultaneously how to demote noisy
entities and how to rank documents with the word-entity duet. Evaluation
results on TREC Web Track ad-hoc task demonstrate that all of the four-way
interactions in the duet are useful, the attention mechanism successfully
steers the model away from noisy entities, and together they significantly
outperform both word-based and entity-based learning to rank systems
- âŠ