On the Feasibility of Automated Detection of Allusive Text Reuse
The detection of allusive text reuse is particularly challenging due to the
sparse evidence on which allusive references rely, commonly based on no or
very few shared words. Arguably, lexical semantics can help, since
uncovering semantic relations between words has the potential to increase the
support underlying the allusion and alleviate the lexical sparsity. A further
obstacle is the lack of evaluation benchmark corpora, largely due to the highly
interpretative character of the annotation process. In the present paper, we
aim to elucidate the feasibility of automated allusion detection. We approach
the matter from an Information Retrieval perspective in which referencing texts
act as queries and referenced texts as relevant documents to be retrieved, and
estimate the difficulty of benchmark corpus compilation by a novel
inter-annotator agreement study on query segmentation. Furthermore, we
investigate to what extent the integration of lexical semantic information
derived from distributional models and ontologies can aid retrieving cases of
allusive reuse. The results show that (i) despite low agreement scores, using
manual queries considerably improves retrieval performance with respect to a
windowing approach, and that (ii) retrieval performance can be moderately
boosted with distributional semantics.
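
The windowing baseline mentioned above can be pictured with a minimal sketch. The abstract does not specify window size, step, or tokenization, so the function name `window_queries` and its parameters are assumptions made only for illustration: a referencing text is cut into fixed-size, overlapping token windows, and each window would then be issued as a query against the referenced texts.

```python
def window_queries(tokens, size, step):
    """Cut a token sequence into fixed-size windows to be used as queries.

    Hypothetical helper: `size` and `step` are illustrative parameters,
    not values taken from the paper.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    # A text shorter than one window is used as a single query.
    if len(tokens) <= size:
        return [list(tokens)]
    # Only full windows are produced; trailing tokens that do not fill
    # a complete window at the given step are not emitted on their own.
    return [list(tokens[i:i + size])
            for i in range(0, len(tokens) - size + 1, step)]


referencing_text = "a b c d e".split()
queries = window_queries(referencing_text, size=3, step=1)
```

A manually segmented query, by contrast, would be a span chosen by an annotator rather than one produced mechanically by the sliding window.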
Exploring a Multidimensional Representation of Documents and Queries (extended version)
In Information Retrieval (IR), whether implicitly or explicitly, queries and
documents are often represented as vectors. However, it may be more beneficial
to consider documents and/or queries as multidimensional objects. We believe
this would allow building "truly" interactive IR systems, i.e., where
interaction is fully incorporated in the IR framework.
The probabilistic formalism of quantum physics represents events and
densities as multidimensional objects. This paper presents our first step
towards building an interactive IR framework upon this formalism, by stating
how the first interaction of the retrieval process, when the user types a
query, can be formalised. Our framework depends on a number of parameters
affecting the final document ranking. In this paper we experimentally
investigate the effect of these parameters, showing that the proposed
representation of documents and queries as multidimensional objects can compete
with standard approaches, with the additional prospect of being applied to
interactive retrieval.
ANSWERING TOPICAL INFORMATION NEEDS USING NEURAL ENTITY-ORIENTED INFORMATION RETRIEVAL AND EXTRACTION
In the modern world, search engines are an integral part of human lives. The field of Information Retrieval (IR) is concerned with finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need (query) from within large collections (usually stored on computers). The search engine then displays a ranked list of results relevant to our query. Traditional document retrieval algorithms match a query to a document using the overlap of words in both. However, the last decade has seen the focus shifting to leveraging the rich semantic information available in the form of entities. Entities are uniquely identifiable objects or things such as places, events, diseases, etc. that exist in the real or fictional world. Entity-oriented search systems leverage the semantic information associated with entities (e.g., names, types, etc.) to better match documents to queries. Web search engines would provide better search results if they understood the meaning of a query.
This dissertation advances the state-of-the-art in IR by developing novel algorithms that understand text (query, document, question, sentence, etc.) at the semantic level. To this end, this dissertation aims to understand the fine-grained meaning of entities from the context in which the entities have been mentioned, for example, “oysters” in the context of food versus ecosystems. Further, we aim to automatically learn (vector) representations of entities that incorporate this fine-grained knowledge and knowledge about the query. This work refines the automatic understanding of text passages using deep learning, a modern artificial intelligence paradigm.
This dissertation utilizes the semantic information extracted from entities to retrieve materials (text and entities) relevant to a query. The interplay between text and entities in the text is studied by addressing three related prediction problems: (1) Identify entities that are relevant for the query, (2) Understand an entity’s meaning in the context of the query, and (3) Identify text passages that elaborate the connection between the query and an entity.
The research presented in this dissertation may be integrated into a larger system designed for answering complex topical queries such as “dark chocolate health benefits”, which require the search engine to automatically understand the connections between the query and the relevant material, thus transforming the search engine into an answering engine.
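
As a rough illustration of the word-overlap matching that the abstract attributes to traditional retrieval, the following sketch ranks documents by the number of distinct query words they contain. The function names, the tokenization rule, and the sample documents are assumptions made for this example; real systems typically use weighted variants rather than raw counts.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, document):
    """Count the distinct query words that also occur in the document."""
    return len(tokenize(query) & tokenize(document))


def rank(query, documents):
    """Return the documents sorted by descending word overlap with the query."""
    return sorted(documents, key=lambda d: overlap_score(query, d), reverse=True)


# Illustrative documents, not taken from any collection in the dissertation.
docs = [
    "Oysters filter water and shape coastal ecosystems.",
    "Dark chocolate may have health benefits.",
    "Search engines rank documents for a query.",
]
ranked = rank("dark chocolate health benefits", docs)
```

A scorer of this kind cannot tell “oysters” as food from “oysters” in an ecosystem, which is the limitation that the entity-oriented approach described above is meant to address.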
Neural Approaches to Feedback in Information Retrieval
Relevance feedback on search results indicates users' search intent and preferences. Extensive studies have shown that incorporating relevance feedback (RF) on the top k (usually 10) ranked results significantly improves the performance of re-ranking. However, most existing research on user feedback focuses on word-based retrieval models. Recently, neural retrieval models have shown their efficacy in capturing relevance matching in retrieval, but little research has been conducted on neural approaches to feedback. This leads us to study different aspects of feedback with neural approaches in the dissertation.
RF techniques are seldom used in real search scenarios since they can require significant manual effort to obtain explicit judgments for search results. However, with mobile or voice-based intelligent assistants becoming more popular, user feedback on result quality could potentially be collected during users' interactions with the assistants. We study both positive and negative RF to refine the re-ranking performance. Positive feedback aims to find more relevant results given some known relevant results while negative feedback targets identifying the first relevant result. In most cases, it is more beneficial to find the first relevant result compared with finding additional relevant results. However, negative feedback is much more challenging than positive feedback since relevant results are usually similar while non-relevant results could vary considerably.
We focus on the tasks of text retrieval and product search to study the different aspects of incorporating feedback for ranking refinement with neural approaches. Our contributions are: (1) we show that iterative relevance feedback (IRF) is more effective than top-k RF on answer passages and we further improve IRF with neural approaches; (2) we propose an effective RF technique based on neural models for product search; (3) we study how to refine re-ranking with negative feedback for conversational product search; (4) we leverage negative feedback in user responses to ask clarifying questions in open-domain conversational search. Our research improves retrieval performance by incorporating feedback in interactive retrieval and approaches multi-turn conversational information-seeking tasks with a focus on positive and negative feedback.
The State-of-the-arts in Focused Search
The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for results relevant to a user’s topic of interest has gone beyond search for domain- or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange. It allows focused search to be carried out at different granularities of a structured document with XML markup. This report reviews the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems.
Using Learning to Rank Approach to Promoting Diversity for Biomedical Information Retrieval with Wikipedia
In most traditional information retrieval (IR) models, the independent
relevance assumption is made, which holds that the relevance of a document is
independent of other documents. However, the pitfall of this is the high redundancy
and low diversity of the retrieval results. This has been seen in many scenarios, especially
in biomedical IR, where the information need of one query may refer to different
aspects. Promoting diversity in IR takes the relationship between documents into
account. Unlike previous studies, we tackle this problem from a learning-to-rank
perspective. The main challenges are how to find salient features for biomedical data
and how to integrate dynamic features into the ranking model. To address these
challenges, Wikipedia is used to detect topics of documents for generating
diversity-biased features. A combined model is proposed and studied to learn a diversified
ranking result. Experimental results show that the proposed method outperforms baseline
models.