135,452 research outputs found
Intent Models for Contextualising and Diversifying Query Suggestions
The query suggestion or auto-completion mechanisms help users to type less
while interacting with a search engine. A basic approach that ranks suggestions
according to their frequency in the query logs is suboptimal. Firstly, many
candidate queries with the same prefix can be removed as redundant. Secondly,
the suggestions can also be personalised based on the user's context. These two
directions to improve the aforementioned mechanisms' quality can be in
opposition: while the latter aims to promote suggestions that address search
intents that a user is likely to have, the former aims to diversify the
suggestions to cover as many intents as possible. We introduce a
contextualisation framework that utilises a short-term context using the user's
behaviour within the current search session, such as the previous query, the
documents examined, and the candidate query suggestions that the user has
discarded. This short-term context is used to contextualise and diversify the
ranking of query suggestions, by modelling the user's information need as a
mixture of intent-specific user models. The evaluation is performed offline on
a set of approximately 1.0M test user sessions. Our results suggest that the
proposed approach significantly improves query suggestions compared to the
baseline approach.Comment: A short version of this paper was presented at CIKM 201
Evaluating the implicit feedback models for adaptive video retrieval
Interactive video retrieval systems are becoming popular. On the one hand, these systems try to reduce the effect of the semantic gap, an issue currently being addressed by the multimedia retrieval community. On the other hand, such systems enhance the quality of information seeking for the user by supporting query formulation and reformulation. Interactive systems are very popular in the textual retrieval domain. However, they are relatively unexplored in the case of multimedia retrieval. The main problem in the development of interactive retrieval systems is the evaluation cost.The traditional evaluation methodology, as used in the information retrieval domain, is not applicable. An alternative is to use a user-centred evaluation methodology. However, such schemes are expensive in terms of effort, cost and are not scalable. This problem gets exacerbated by the use of implicit indicators, which are useful and increasingly used in predicting user intentions. In this paper, we explore the effectiveness of a number of interfaces and feedback mechanisms and compare their relative performance using a simulated evaluation methodology. The results show the relatively better performance of a search interface with the combination of explicit and implicit features
Improved algorithms for topic distillation in a hyperlinked environment
Abstract This paper addresses the problem of topic distillation on the World Wide Web, namely, given a typical user query to find quality documents related to the query topic. Connectivity analysis has been shown to be useful in identifying high quality pages within a topic specific graph of hyperlinked documents. The essence of our approach is to augment a previous connectivity analysis based algorithm with content analysis. We identify three problems with the existing approach and devise algorithms to tackle them. The results of a user evaluation are reported that show an improvement of precision at 10 documents by at least 45 % over pure connectivity analysis.
ISEEQ: Information Seeking Question Generation using Dynamic Meta-Information Retrieval and Knowledge Graphs
Conversational Information Seeking (CIS) is a relatively new research area within conversational AI that attempts to seek information from end-users in order to understand and satisfy users’ needs. If realized, such a system has far-reaching benefits in the real world; for example, a CIS system can assist clinicians in pre-screening or triaging patients in healthcare. A key open sub-problem in CIS that remains unaddressed in the literature is generating Information Seeking Questions (ISQs) based on a short initial query from the end user. To address this open problem, we propose Information SEEking Question generator (ISEEQ), a novel approach for generating ISQs from just a short user query, given a large text corpus relevant to the user query. Firstly, ISEEQ uses a knowledge graph to enrich the user query. Secondly, ISEEQ uses the knowledge-enriched query to retrieve relevant context passages to ask coherent ISQs adhering to a conceptual flow. Thirdly, ISEEQ introduces a new deep generative adversarial reinforcement learning-based approach for generating ISQs. We show that ISEEQ can generate high-quality ISQs to promote the development of CIS agents. ISEEQ significantly outperforms comparable baselines on five ISQ evaluation metrics across four datasets having user queries from diverse domains. Further, we argue that ISEEQ is transferable across domains for generating ISQs, as it shows the acceptable performance when trained and tested on different pairs of domains. The qualitative human evaluation confirms ISEEQ-generated ISQs are comparable in quality to human-generated questions and outperform the best comparable baseline
ISEEQ: Information Seeking Question Generation using Dynamic Meta-Information Retrieval and Knowledge Graphs
Conversational Information Seeking (CIS) is a relatively new research area within conversational AI that attempts to seek information from end-users in order to understand and satisfy users’ needs. If realized, such a system has far-reaching benefits in the real world; for example, a CIS system can assist clinicians in pre-screening or triaging patients in healthcare. A key open sub-problem in CIS that remains unaddressed in the literature is generating Information Seeking Questions (ISQs) based on a short initial query from the end user. To address this open problem, we propose Information SEEking Question generator (ISEEQ), a novel approach for generating ISQs from just a short user query, given a large text corpus relevant to the user query. Firstly, ISEEQ uses a knowledge graph to enrich the user query. Secondly, ISEEQ uses the knowledge-enriched query to retrieve relevant context passages to ask coherent ISQs adhering to a conceptual flow. Thirdly, ISEEQ introduces a new deep generative adversarial reinforcement learning-based approach for generating ISQs. We show that ISEEQ can generate high-quality ISQs to promote the development of CIS agents. ISEEQ significantly outperforms comparable baselines on five ISQ evaluation metrics across four datasets having user queries from diverse domains. Further, we argue that ISEEQ is transferable across domains for generating ISQs, as it shows the acceptable performance when trained and tested on different pairs of domains. The qualitative human evaluation confirms ISEEQ-generated ISQs are comparable in quality to human-generated questions and outperform the best comparable baseline
Measures to Evaluate the Superiority of a Search Engine
Main objective of a search engine is to return relevant results according to user query in less time. Evaluation metrics are used to measure the superiority of a search engine in terms of quality. This is a review paper presenting a summary of different metrics used for evaluation of a search engine in terms of effectiveness, efficiency and relevancy
Learning to Attend, Copy, and Generate for Session-Based Query Suggestion
Users try to articulate their complex information needs during search
sessions by reformulating their queries. To make this process more effective,
search engines provide related queries to help users in specifying the
information need in their search process. In this paper, we propose a
customized sequence-to-sequence model for session-based query suggestion. In
our model, we employ a query-aware attention mechanism to capture the structure
of the session context. is enables us to control the scope of the session from
which we infer the suggested next query, which helps not only handle the noisy
data but also automatically detect session boundaries. Furthermore, we observe
that, based on the user query reformulation behavior, within a single session a
large portion of query terms is retained from the previously submitted queries
and consists of mostly infrequent or unseen terms that are usually not included
in the vocabulary. We therefore empower the decoder of our model to access the
source words from the session context during decoding by incorporating a copy
mechanism. Moreover, we propose evaluation metrics to assess the quality of the
generative models for query suggestion. We conduct an extensive set of
experiments and analysis. e results suggest that our model outperforms the
baselines both in terms of the generating queries and scoring candidate queries
for the task of query suggestion.Comment: Accepted to be published at The 26th ACM International Conference on
Information and Knowledge Management (CIKM2017
OCR Quality Affects Perceived Usefulness of Historical Newspaper Clippings. A User Study
Publisher Copyright: © 2022 Copyright for this paper by its authors.Effects of Optical Character Recognition (OCR) quality on historical information retrieval have so far been studied in data-oriented scenarios regarding the effectiveness of retrieval results. Such studies have either focused on the effects of artificially degraded OCR quality (see, e.g., [1-2]) or utilized test collections containing texts based on authentic low quality OCR data (see, e.g., [3]). In this paper the effects of OCR quality are studied in a user-oriented information retrieval setting. Thirty-two users evaluated subjectively query results of six topics each (out of 30 topics) based on pre-formulated queries using a simulated work task setting. To the best of our knowledge our simulated work task experiment is the first one showing empirically that users' subjective relevance assessments of retrieved documents are affected by a change in the quality of optically read text. Users of historical newspaper collections have so far commented effects of OCR'ed data quality mainly in impressionistic ways, and controlled user environments for studying effects of OCR quality on users' relevance assessments of the retrieval results have so far been missing. To remedy this The National Library of Finland (NLF) set up an experimental query environment for the contents of one Finnish historical newspaper, Uusi Suometar 1869-1918, to be able to compare users' evaluation of search results of two different OCR qualities for digitized newspaper articles. The query interface was able to present the same underlying document for the user based on two alternatives: either based on the lower OCR quality, or based on the higher OCR quality, and the choice was randomized. The users did not know about quality differences in the article texts they evaluated. The main result of the study is that improved optical character recognition quality affects perceived usefulness of historical newspaper articles significantly. The mean average evaluation score for the improved OCR results was 7.94% higher than the mean average evaluation score of the old OCR results.Peer reviewe
Exploiting the potential of large databases of electronic health records for research using rapid search algorithms and an intuitive query interface.
Objective: UK primary care databases, which contain diagnostic, demographic and prescribing information for millions of patients geographically representative of the UK, represent a significant resource for health services and clinical research. They can be used to identify patients with a specified disease or condition (phenotyping) and to investigate patterns of diagnosis and symptoms. Currently, extracting such information manually is time-consuming and requires considerable expertise. In order to exploit more fully the potential of these large and complex databases, our interdisciplinary team developed generic methods allowing access to different types of user.
Materials and methods: Using the Clinical Practice Research Datalink database, we have developed an online user-focused system (TrialViz), which enables users interactively to select suitable medical general practices based on two criteria: suitability of the patient base for the intended study (phenotyping) and measures of data quality.
Results: An end-to-end system, underpinned by an innovative search algorithm, allows the user to extract information in near real-time via an intuitive query interface and to explore this information using interactive visualization tools. A usability evaluation of this system produced positive results.
Discussion: We present the challenges and results in the development of TrialViz and our plans for its extension for wider applications of clinical research.
Conclusions: Our fast search algorithms and simple query algorithms represent a significant advance for users of clinical research databases
- …