Improving Retrieval-Based Question Answering with Deep Inference Models
Question answering is one of the most important and difficult applications at
the border of information retrieval and natural language processing, especially
when we talk about complex science questions which require some form of
inference to determine the correct answer. In this paper, we present a two-step
method that combines information retrieval techniques optimized for question
answering with deep learning models for natural language inference in order to
tackle multi-choice question answering in the science domain. For each
question-answer pair, we use standard retrieval-based models to find relevant
candidate contexts and decompose the main problem into two different
sub-problems. First, we assign correctness scores to each candidate answer based
on the context using retrieval models from Lucene. Second, we use deep learning
architectures to compute if a candidate answer can be inferred from some
well-chosen context consisting of sentences retrieved from the knowledge base.
In the end, all these solvers are combined using a simple neural network to
predict the correct answer. This proposed two-step model outperforms the best
retrieval-based solver by over 3% in absolute accuracy.
Comment: 8 pages, 2 figures, 8 tables, accepted at IJCNN 201
Concept-based Interactive Query Expansion Support Tool (CIQUEST)
This report describes a three-year project (2000-03) undertaken in the Information Studies
Department at The University of Sheffield and funded by Resource, The Council for
Museums, Archives and Libraries. The overall aim of the research was to provide user
support for query formulation and reformulation in searching large-scale textual resources
including those of the World Wide Web. More specifically the objectives were: to investigate
and evaluate methods for the automatic generation and organisation of concepts derived from
retrieved document sets, based on statistical methods for term weighting; and to conduct
user-based evaluations on the understanding, presentation and retrieval effectiveness of
concept structures in selecting candidate terms for interactive query expansion.
The TREC test collection formed the basis for the seven evaluative experiments conducted in
the course of the project. These formed four distinct phases in the project plan. In the first
phase, a series of experiments was conducted to investigate further techniques for concept
derivation and hierarchical organisation and structure. The second phase was concerned with
user-based validation of the concept structures. Results of phases 1 and 2 informed the
design of the test system, and the user interface was developed in phase 3. The final phase
entailed a user-based summative evaluation of the CiQuest system.
The main findings demonstrate that concept hierarchies can effectively be generated from
sets of retrieved documents and displayed to searchers in a meaningful way. The approach
provides the searcher with an overview of the contents of the retrieved documents, which in
turn facilitates the viewing of documents and selection of the most relevant ones. Concept
hierarchies are a good source of terms for query expansion and can improve precision. The
extraction of descriptive phrases as an alternative source of terms was also effective. With
respect to presentation, cascading menus were easy to browse for selecting terms and for
viewing documents. In conclusion, the project dissemination programme and future work are
outlined.
Graph-Embedding Empowered Entity Retrieval
In this research, we improve upon the current state of the art in entity
retrieval by re-ranking the result list using graph embeddings. The paper shows
that graph embeddings are useful for entity-oriented search tasks. We
demonstrate empirically that encoding information from the knowledge graph into
(graph) embeddings contributes to a higher increase in effectiveness of entity
retrieval results than using plain word embeddings. We analyze the impact of
the accuracy of the entity linker on the overall retrieval effectiveness. Our
analysis further deploys the cluster hypothesis to explain the observed
advantages of graph embeddings over the more widely used word embeddings, for
user tasks involving ranking entities.
Using Generic Summarization to Improve Music Information Retrieval Tasks
In order to satisfy processing time constraints, many MIR tasks process only
a segment of the whole music signal. This practice may lead to decreasing
performance, since the most important information for the tasks may not be in
those processed segments. In this paper, we leverage generic summarization
algorithms, previously applied to text and speech summarization, to summarize
items in music datasets. These algorithms build summaries that are both
concise and diverse by selecting appropriate segments from the input signal,
which makes them good candidates to summarize music as well. We evaluate the
summarization process on binary and multiclass music genre classification
tasks, by comparing the performance obtained using summarized datasets against
the performances obtained using continuous segments (which is the traditional
method used for addressing the previously mentioned time constraints) and full
songs of the same original dataset. We show that GRASSHOPPER, LexRank, LSA,
MMR, and a Support Sets-based Centrality model improve classification
performance when compared to selected 30-second baselines. We also show that
summarized datasets lead to classification performance that is not
statistically significantly different from that of full songs. Furthermore, we make an
argument stating the advantages of sharing summarized datasets for future MIR
research.
Comment: 24 pages, 10 tables; Submitted to IEEE/ACM Transactions on Audio,
Speech and Language Processin
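Of the summarizers named in the abstract above, MMR (Maximal Marginal Relevance) is the simplest to illustrate. The following is a minimal sketch, not the authors' implementation: it assumes each music segment has already been turned into a numeric feature vector, uses the centroid of all segments as the "query", and the names mmr_select and lam are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(segments, k, lam=0.7):
    """Pick up to k segment indices by Maximal Marginal Relevance.

    Assumption: relevance is similarity to the centroid of all segments,
    and redundancy is the highest similarity to any segment already chosen.
    """
    dim = len(segments[0])
    centroid = [sum(s[i] for s in segments) / len(segments) for i in range(dim)]
    selected = []
    remaining = list(range(len(segments)))

    def score(i):
        relevance = cosine(segments[i], centroid)
        redundancy = max(
            (cosine(segments[i], segments[j]) for j in selected), default=0.0
        )
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Two identical segments and one distinct segment (toy feature vectors).
    toy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(mmr_select(toy, k=2, lam=0.5))
```

The lam parameter trades relevance against diversity: values near 1 favour segments that resemble the whole piece, while lower values penalise segments that duplicate what has already been selected.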
Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering
Evidence retrieval is a critical stage of question answering (QA), necessary
not only to improve performance, but also to explain the decisions of the
corresponding QA method. We introduce a simple, fast, and unsupervised
iterative evidence retrieval method, which relies on three ideas: (a) an
unsupervised alignment approach to soft-align questions and answers with
justification sentences using only GloVe embeddings, (b) an iterative process
that reformulates queries focusing on terms that are not covered by existing
justifications, and (c) a stopping criterion that terminates retrieval when
the terms in the given question and candidate answers are covered by the
retrieved justifications. Despite its simplicity, our approach outperforms all
the previous methods (including supervised methods) on the evidence selection
task on two datasets: MultiRC and QASC. When these evidence sentences are fed
into a RoBERTa answer classification component, we achieve state-of-the-art QA
performance on these two datasets.
Comment: Accepted at ACL 2020 as a long conference pape
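The iterative loop described in the abstract above (retrieve, drop the covered terms, stop when everything is covered) can be sketched as follows. This is an illustration under a strong simplifying assumption: exact lowercase token matching stands in for the GloVe-based soft alignment, so it is not the authors' method. The function name iterative_retrieve and the max_hops parameter are hypothetical.

```python
def iterative_retrieve(query_terms, sentences, max_hops=5):
    """Greedy, coverage-driven evidence retrieval.

    Assumption: a query term counts as covered when the same lowercase
    token appears in a chosen sentence (no embeddings are used here).
    Returns the chosen sentence indices and the terms left uncovered.
    """
    uncovered = {t.lower() for t in query_terms}
    chosen = []
    for _ in range(max_hops):
        if not uncovered:
            break  # stopping criterion: every query term is covered
        best, best_gain = None, 0
        for idx, sent in enumerate(sentences):
            if idx in chosen:
                continue
            # Score only against terms that are still uncovered, which
            # plays the role of the reformulated query.
            gain = len(uncovered & set(sent.lower().split()))
            if gain > best_gain:
                best, best_gain = idx, gain
        if best is None:
            break  # no remaining sentence covers anything new
        chosen.append(best)
        uncovered -= set(sentences[best].lower().split())
    return chosen, uncovered


if __name__ == "__main__":
    terms = ["plants", "sunlight", "energy", "sugar"]
    corpus = [
        "plants absorb sunlight",
        "energy is stored as sugar",
        "rocks are hard",
    ]
    print(iterative_retrieve(terms, corpus))
```

Scoring each hop against only the uncovered terms is what makes the procedure multi-hop: a sentence that merely repeats the first hop gains nothing on the second.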
Utilising semantic technologies for intelligent indexing and retrieval of digital images
The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches as they in principle rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this paper we present a semantically-enabled image annotation and retrieval engine that is designed to satisfy the requirements of the commercial image collections market in terms of both accuracy and efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, thus allowing for more intelligent reasoning about the image content and subsequently obtaining a more accurate set of results and a richer set of alternatives matching the original query. We also show how our well-analysed and designed domain ontology contributes to the implicit expansion of user queries as well as the exploitation of lexical databases for explicit semantic-based query expansion.
An Efficient Approximate kNN Graph Method for Diffusion on Image Retrieval
The application of diffusion in many computer vision and artificial
intelligence projects has been shown to give excellent improvements in
performance. One of the main bottlenecks of this technique is the quadratic
growth of the kNN graph size due to the large number of new connections
between nodes in the graph, resulting in long computation times. Several
strategies have been proposed to address this, but none is both effective and
efficient. Our novel technique, based on LSH projections, obtains the same
performance as the exact kNN graph after diffusion, but in less time
(approximately 18 times faster on a dataset of a hundred thousand images). The
proposed method was validated and compared with other state-of-the-art methods on
several public image datasets, including Oxford5k, Paris6k, and Oxford105k.
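The abstract above does not specify its LSH scheme, so the following is only a generic sketch of how LSH projections can avoid comparing all pairs when building a kNN graph. It assumes random-hyperplane hashing and exact Euclidean search inside each bucket; lsh_knn_graph, n_bits, and seed are hypothetical names, and this should not be read as the paper's algorithm.

```python
import random
from collections import defaultdict


def lsh_knn_graph(vectors, k, n_bits=4, seed=0):
    """Approximate kNN graph using random-hyperplane LSH buckets.

    Assumption: vectors that share an n_bits sign signature are candidate
    neighbours; exact distances are computed only inside each bucket.
    Returns a dict mapping each index to at most k neighbour indices.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    # Hash every vector to the tuple of signs of its projections.
    buckets = defaultdict(list)
    for idx, vec in enumerate(vectors):
        key = tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes
        )
        buckets[key].append(idx)

    def sq_dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(vectors[i], vectors[j]))

    graph = {}
    for members in buckets.values():
        for i in members:
            others = sorted((j for j in members if j != i),
                            key=lambda j: sq_dist(i, j))
            graph[i] = others[:k]
    return graph


if __name__ == "__main__":
    points = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
    print(lsh_knn_graph(points, k=1))
```

Fewer bits give larger buckets (better recall, more distance computations), while more bits give smaller buckets and faster but less complete neighbour lists; practical systems usually combine several hash tables to recover neighbours split across buckets.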