Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering
Evidence retrieval is a critical stage of question answering (QA), necessary
not only to improve performance, but also to explain the decisions of the
corresponding QA method. We introduce a simple, fast, and unsupervised
iterative evidence retrieval method, which relies on three ideas: (a) an
unsupervised alignment approach to soft-align questions and answers with
justification sentences using only GloVe embeddings, (b) an iterative process
that reformulates queries focusing on terms that are not covered by existing
justifications, and (c) a stopping criterion that terminates retrieval when
the terms in the given question and candidate answers are covered by the
retrieved justifications. Despite its simplicity, our approach outperforms all
the previous methods (including supervised methods) on the evidence selection
task on two datasets: MultiRC and QASC. When these evidence sentences are fed
into a RoBERTa answer classification component, we achieve state-of-the-art QA
performance on these two datasets. Comment: Accepted at ACL 2020 as a long conference paper
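The three ideas above can be illustrated with a small, self-contained sketch. It is not the authors' implementation. The toy three-dimensional vectors stand in for GloVe embeddings. Several choices are assumptions made for illustration: the scoring rule (the average of each uncovered query term's best cosine match), the 0.95 coverage threshold, and the names `iterative_retrieve`, `align_score` and `best_match`.

```python
import math

# Toy 3-d vectors standing in for GloVe embeddings (hypothetical values).
EMB = {
    "heat": (1.0, 0.0, 0.0),
    "warm": (0.9, 0.1, 0.0),
    "metal": (0.0, 1.0, 0.0),
    "expands": (0.0, 0.0, 1.0),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def best_match(term, sentence):
    """Soft alignment: highest cosine between `term` and any known word of `sentence`."""
    if term not in EMB:
        return 0.0
    return max((cosine(EMB[term], EMB[w]) for w in sentence if w in EMB), default=0.0)

def align_score(terms, sentence):
    """Average soft-alignment score of the still-uncovered query terms."""
    return sum(best_match(t, sentence) for t in terms) / len(terms)

def iterative_retrieve(query_terms, candidates, threshold=0.95, max_hops=5):
    remaining = set(query_terms)
    chosen = []
    for _ in range(max_hops):
        pool = [s for s in candidates if s not in chosen]
        # Stopping criterion: every query term is covered (or no candidates are left).
        if not remaining or not pool:
            break
        best = max(pool, key=lambda s: align_score(remaining, s))
        chosen.append(best)
        # Reformulate the query: keep only terms the new justification does not cover.
        remaining = {t for t in remaining if best_match(t, best) < threshold}
    return chosen, remaining
```

With `candidates = [("metal",), ("warm", "metal"), ("expands",)]` and the query `["heat", "metal", "expands"]`, the first hop picks `("warm", "metal")` because "warm" soft-aligns with "heat". Only "expands" remains uncovered, so the second hop picks `("expands",)`, and the coverage check then ends retrieval.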
ImageSieve: Exploratory search of museum archives with named entity-based faceted browsing
Over the last few years, faceted search has emerged as an attractive alternative to the traditional "text box" search and has become one of the standard ways of interaction on many e-commerce sites. However, these applications of faceted search are limited to domains where the objects of interest have already been classified along several independent dimensions, such as price, year, or brand. While automatic approaches to generating faceted search interfaces have been proposed, it is not yet clear to what extent the automatically produced interfaces will be useful to real users, and whether their quality can match or surpass that of their manually produced predecessors. The goal of this paper is to introduce an exploratory search interface called ImageSieve, which shares many features with traditional faceted browsing but can function without traditional faceted metadata. ImageSieve uses automatically extracted and classified named entities, which play important roles in many domains (such as news collections, image archives, etc.). We describe one specific application of ImageSieve for image search. Here, named entities extracted from the descriptions of the retrieved images are used to organize a faceted browsing interface, which then helps users to make sense of and further explore the retrieved images. The results of a user study of ImageSieve demonstrate that a faceted search system based on named entities can help users explore large collections and find relevant information more effectively.
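Deriving facets from named entities instead of pre-existing metadata can be sketched roughly as follows. It assumes entity extraction and classification have already run, so the `(entity, type)` pairs per image are hypothetical input. `build_facets` and `refine` are illustrative names, not part of ImageSieve.

```python
from collections import Counter, defaultdict

def build_facets(results):
    """results: list of (image_id, [(entity, entity_type), ...]) for the retrieved images.

    Returns {entity_type: Counter(entity -> number of images mentioning it)}.
    """
    facets = defaultdict(Counter)
    for _, entities in results:
        for entity, entity_type in set(entities):  # count each entity once per image
            facets[entity_type][entity] += 1
    return facets

def refine(results, entity):
    """Selecting a facet value narrows the result list to images mentioning that entity."""
    return [image_id for image_id, entities in results if any(e == entity for e, _ in entities)]
```

Each entity type (person, location, and so on) becomes a facet, and the per-value counts tell the user how many retrieved images a click on that value would keep.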
Improving average ranking precision in user searches for biomedical research datasets
The availability of research datasets is a keystone of health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, +22.3% above the median infAP of the participants'
best submissions. Overall, it ranks in the top 2 if an aggregated metric using
the best official measures per participant is considered. The query expansion
method had a positive impact on the system's performance, improving our
baseline by up to +5.0% and +3.4% on the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm appears to be robust, in particular compared to the
Divergence From Randomness framework, showing smaller performance variations
under different training conditions. Finally, the result categorization did not
have a significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, the use of data-driven query expansion methods could be an
alternative to the complexity of biomedical terminologies.
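Embedding-based query expansion in general can be sketched as below. This is a generic nearest-neighbour scheme, not the bioCADDIE system's actual model. The toy vectors, the `k` and `min_sim` parameters, and the name `expand_query` are assumptions. Each query term adds up to `k` of its closest vocabulary words, and only if they are similar enough.

```python
import math

# Toy vectors (hypothetical), standing in for embeddings trained on biomedical text.
EMB = {
    "cancer": (1.0, 0.1, 0.0),
    "tumor": (0.95, 0.2, 0.0),
    "carcinoma": (0.9, 0.05, 0.1),
    "gene": (0.0, 1.0, 0.1),
    "expression": (0.1, 0.9, 0.3),
    "mouse": (0.0, 0.1, 1.0),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def expand_query(terms, k=1, min_sim=0.9):
    """Append up to k nearest embedding neighbours per term whose cosine is >= min_sim."""
    expanded = list(terms)
    for term in terms:
        if term not in EMB:
            continue  # out-of-vocabulary terms are kept but not expanded
        neighbours = sorted(
            ((cosine(EMB[term], vec), word) for word, vec in EMB.items() if word not in terms),
            reverse=True,
        )
        for sim, word in neighbours[:k]:
            if sim >= min_sim and word not in expanded:
                expanded.append(word)
    return expanded
```

Here `expand_query(["cancer", "mouse"])` adds "tumor" for "cancer". "mouse" gains nothing, because its nearest neighbour falls below the similarity threshold. The threshold is what keeps expansion from drifting into unrelated vocabulary.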
AQUA: an ontology driven question answering system
This paper describes AQUA, our question answering system for the Web. AQUA was designed to work over heterogeneous sources, which means that it is equipped for closed-domain as well as open-domain question answering. First, AQUA tries to answer a question using a knowledge base. If a query cannot be satisfied over the knowledge base/database, AQUA then tries to find an answer on web pages (i.e., it uses the Internet as its corpus). Our system uses Natural Language Processing (NLP), first-order logic, and Information Extraction technologies. AQUA has been tested using an ontology which describes academic life. Keywords: Ontologies, Information Extraction, Machine Learning
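The "knowledge base first, then the Web" strategy described above amounts to an ordered fallback. Below is a minimal sketch, assuming each source is a callable that returns an answer or `None`. `kb_lookup`, `web_search` and the toy `KB` dictionary are hypothetical stand-ins. AQUA's real components rely on ontologies, first-order logic and information extraction, none of which are modelled here.

```python
from typing import Callable, List, Optional, Tuple

# A source takes a question and returns an answer, or None if it cannot answer.
Source = Callable[[str], Optional[str]]

def answer(question: str, sources: List[Tuple[str, Source]]) -> Tuple[str, Optional[str]]:
    """Try each source in order and return the first one that produces an answer."""
    for name, source in sources:
        result = source(question)
        if result is not None:
            return name, result
    return "unanswered", None

# Hypothetical stand-ins for the knowledge-base and web components.
KB = {"who teaches cs101?": "lecturer_a"}

def kb_lookup(question: str) -> Optional[str]:
    return KB.get(question.lower())

def web_search(question: str) -> Optional[str]:
    return "top web snippet for: " + question
```

Listing the sources as `[("knowledge base", kb_lookup), ("web", web_search)]` reproduces the order in the abstract. Adding further back-ends only means extending the list.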
PLuTO: MT for online patent translation
PLuTO (Patent Language Translation Online) is a partially EU-funded commercialization project which specializes in the automatic retrieval and translation of patent documents. At the core of the PLuTO framework is a machine translation (MT) engine through which web-based translation services are offered. The fully integrated PLuTO architecture includes a translation engine coupling MT with translation memories (TM), and a patent search and retrieval engine. In this paper, we first describe the motivating factors behind the provision of such a service. Following this, we give an overview of the PLuTO framework as a whole, with particular emphasis on the MT components, and provide a real-world use-case scenario in which PLuTO MT services are exploited.
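One common way to couple a translation memory with MT is to reuse a stored translation when a segment is a close enough match and to call the MT engine otherwise. The abstract does not say PLuTO works exactly this way, so treat the following as an assumption-laden sketch. `difflib`'s character-level ratio stands in for a real TM fuzzy-match score, and `translate`, `MEMORY` and `fake_mt` are hypothetical names.

```python
from difflib import SequenceMatcher

def translate(segment, memory, mt_engine, threshold=0.85):
    """Return ("TM", target) for a close enough memory match, else ("MT", mt_engine(segment))."""
    best_source, best_score = None, 0.0
    for source in memory:
        # Character-level similarity as a stand-in for a real TM fuzzy-match score.
        score = SequenceMatcher(None, segment, source).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return "TM", memory[best_source]
    return "MT", mt_engine(segment)

# Hypothetical translation memory and MT stub, for illustration only.
MEMORY = {"The device comprises a housing.": "Die Vorrichtung umfasst ein Gehäuse."}

def fake_mt(segment):
    return "<machine translation of: " + segment + ">"
```

Patent text is highly repetitive, so a near-duplicate such as "The device comprises a housing" (missing its final period) still hits the memory, while an unrelated claim falls through to MT.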
Sequence to Sequence Learning for Query Expansion
Using sequence-to-sequence algorithms for query expansion has not yet been
explored in either the Information Retrieval or the Question Answering literature.
We tried to fill this gap with a custom Query Expansion
engine trained and tested on open datasets. Starting from open datasets, we
built a Query Expansion training set using sentence-embedding-based Keyword
Extraction. We then assessed the ability of sequence-to-sequence
neural networks to capture expanding relations in the word-embedding space. Comment: 8 pages, 2 figures, AAAI-19 Student Abstract and Poster Program
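The keyword-extraction step could look roughly like the sketch below. It is not the authors' pipeline. The "sentence embedding" is approximated by the mean of toy word vectors, and both the vectors and the name `extract_keywords` are hypothetical. Words are ranked by how close they lie to the sentence as a whole.

```python
import math

# Toy vectors (hypothetical); a real setup would use pretrained embeddings.
EMB = {
    "the": (0.1, 0.1, 0.1),
    "neural": (0.9, 0.1, 0.0),
    "network": (0.8, 0.3, 0.0),
    "training": (0.7, 0.2, 0.3),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def extract_keywords(sentence, k=2):
    """Rank the sentence's known words by cosine to a mean-of-word-vectors sentence embedding."""
    words = [w for w in sentence.lower().split() if w in EMB]
    if not words:
        return []
    centre = tuple(sum(dim) / len(words) for dim in zip(*(EMB[w] for w in words)))
    ranked = sorted(set(words), key=lambda w: cosine(EMB[w], centre), reverse=True)
    return ranked[:k]
```

For "The neural network training" this returns `["network", "neural"]`. The function word "the" lies farthest from the sentence centroid and ranks last.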