6 research outputs found

Reducing long queries using query quality predictors

Authors: Giridhar Kumaran, Vitor R. Carvalho
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2009

Long queries frequently contain many extraneous terms that hinder retrieval of relevant documents. We present techniques to reduce long queries to more effective shorter ones that lack those extraneous terms. Our work is motivated by the observation that perfectly reducing long TREC description queries can lead to an average improvement of 30% in mean average precision. Our approach involves transforming the reduction problem into a problem of learning to rank all subsets of the original query (sub-queries) based on their predicted quality, and selecting the top sub-query. We use various measures of query quality described in the literature as features to represent sub-queries, and train a classifier. Replacing the original long query with the top-ranked sub-query chosen by the ranking classifier results in a statistically significant average improvement of 8% on our test sets. Analysis of the results shows that query reduction is well-suited for moderately-performing long queries, and that a small set of query quality predictors is well-suited for the task of ranking sub-queries.
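
The reduction idea in this abstract (enumerate the sub-queries, score each by predicted quality, keep the top one) can be sketched roughly as follows. This is only an illustration, not the authors' method: the paper trains a ranking classifier over several quality predictors, whereas this sketch swaps in a single hand-written stand-in score (average inverse document frequency). The names `sub_queries`, `avg_idf`, `reduce_query` and the statistics in `DOC_FREQ` are hypothetical and made up for the example.

```python
from itertools import combinations
import math


def sub_queries(terms):
    """Yield every non-empty subset of the query terms, preserving term order."""
    for size in range(1, len(terms) + 1):
        for combo in combinations(terms, size):
            yield combo


def avg_idf(sub_query, doc_freq, num_docs):
    """Stand-in quality predictor (an assumption): mean IDF of the terms."""
    idfs = [math.log(num_docs / (1 + doc_freq.get(t, 0))) for t in sub_query]
    return sum(idfs) / len(idfs)


def reduce_query(terms, doc_freq, num_docs):
    """Return the sub-query with the highest predicted quality."""
    return max(sub_queries(terms), key=lambda sq: avg_idf(sq, doc_freq, num_docs))


# Hypothetical document frequencies, invented for illustration only.
DOC_FREQ = {"find": 9000, "information": 8000, "about": 9500,
            "tail": 300, "latency": 120}

query = ["find", "information", "about", "tail", "latency"]
best = reduce_query(query, DOC_FREQ, num_docs=10000)
print(best)
```

A query of n terms has 2^n - 1 non-empty sub-queries, so exhaustive enumeration is only practical for moderately long queries. A single averaged score such as this one also tends to favour very short sub-queries, which is one reason to combine several predictors in a learned ranker, as the abstract describes.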

CiteSeerX

Crossref

Managing tail latency in large scale information retrieval systems

Author: Mackenzie J
Publication venue: RMIT University

As both the availability of internet access and the prominence of smart devices continue to increase, data is being generated at a rate faster than ever before. This massive increase in data production comes with many challenges, including efficiency concerns for the storage and retrieval of such large-scale data. However, users have grown to expect the sub-second response times that are common in most modern search engines, creating a problem - how can such large amounts of data continue to be served efficiently enough to satisfy end users? This dissertation investigates several issues regarding tail latency in large-scale information retrieval systems. Tail latency corresponds to the high percentile latency that is observed from a system - in the case of search, this latency typically corresponds to how long it takes for a query to be processed. In particular, keeping tail latency as low as possible translates to a good experience for all users, as tail latency is directly related to the worst-case latency and hence, the worst possible user experience. The key idea in targeting tail latency is to move from questions such as "what is the median latency of our search engine?" to questions which more accurately capture user experience such as "how many queries take more than 200ms to return answers?" or "what is the worst case latency that a user may be subject to, and how often might it occur?" While various strategies exist for efficiently processing queries over large textual corpora, prior research has focused almost entirely on improvements to the average processing time or cost of search systems. As a first contribution, we examine some state-of-the-art retrieval algorithms for two popular index organizations, and discuss the trade-offs between them, paying special attention to the notion of tail latency.
This research uncovers a number of observations that are subsequently leveraged for improved search efficiency and effectiveness. We then propose and solve a new problem, which involves processing a number of related queries together, known as multi-queries, to yield higher quality search results. We experiment with a number of algorithmic approaches to efficiently process these multi-queries, and report on the cost, efficiency, and effectiveness trade-offs present with each. Ultimately, we find that some solutions yield a low tail latency, and are hence suitable for use in real-time search environments. Finally, we examine how predictive models can be used to improve the tail latency and end-to-end cost of a commonly used multi-stage retrieval architecture without impacting result effectiveness. By combining ideas from numerous areas of information retrieval, we propose a prediction framework which can be used for training and evaluating several efficiency/effectiveness trade-off parameters, resulting in improved trade-offs between cost, result quality, and tail latency.
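
The shift from median to tail latency described in this abstract can be made concrete with a small sketch. It assumes a nearest-rank percentile definition and a handful of made-up latency samples; the helper names `percentile` and `fraction_over` are hypothetical and are not taken from the dissertation.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def fraction_over(samples, budget_ms):
    """Share of queries whose latency exceeds the given budget in milliseconds."""
    return sum(1 for s in samples if s > budget_ms) / len(samples)


# Made-up per-query latencies in milliseconds, for illustration only.
latencies_ms = [12, 15, 14, 18, 20, 16, 13, 250, 17, 19]

median = percentile(latencies_ms, 50)    # typical query
p99 = percentile(latencies_ms, 99)       # tail query
slow = fraction_over(latencies_ms, 200)  # share of queries over a 200 ms budget
print(median, p99, slow)
```

In this toy sample the median looks healthy while the 99th percentile is dominated by a single slow query, which is the kind of gap the abstract's "how many queries take more than 200ms" question is meant to expose.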

RMIT Research Repository

Automated Data Mapping Specifications via Schema Heuristics and User Interaction

Authors: Hermann Stoeckle, John Grundy, John Hosking, Robert Amor, Sebastian Bossung
Publication date: 01/01/2004

Data transformation problems are very common but they are challenging to implement for large, complex datasets. We describe a new approach for specifying data mapping transformations between XML schemas using a combination of automated schema analysis agents and selective user interaction. A graphical tool visualises parts of the two schemas to be mapped and a variety of agents analyse all or parts of the schema, voting on the likelihood of matching subsets. The user can confirm or reject suggestions, or even allow schema matches to be automatically determined, incrementally building up a fully-mapped schema. An implementation of the mapping specification can then be generated from the various inter-schema matches.
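
The agent-voting idea in this abstract might look something like the sketch below. It is an assumption-laden illustration rather than the authors' tool: the two agents (`name_agent`, `type_agent`), the element dictionaries, the averaging of votes, and the `threshold` value are all hypothetical. Only pairs whose mean vote clears the threshold are suggested; anything below it is left for the user to map by hand.

```python
from difflib import SequenceMatcher


def name_agent(src, tgt):
    """Vote (0.0 to 1.0) on how similar the two element names are."""
    return SequenceMatcher(None, src["name"].lower(), tgt["name"].lower()).ratio()


def type_agent(src, tgt):
    """Vote 1.0 when the declared types agree, otherwise 0.0."""
    return 1.0 if src["type"] == tgt["type"] else 0.0


AGENTS = [name_agent, type_agent]  # hypothetical agents, not from the paper


def suggest_matches(source, target, threshold=0.7):
    """For each source element, suggest the target element with the highest mean vote."""
    suggestions = {}
    for src in source:
        scored = [
            (sum(agent(src, tgt) for agent in AGENTS) / len(AGENTS), tgt["name"])
            for tgt in target
        ]
        score, best = max(scored)
        if score >= threshold:  # weaker candidates are left to the user
            suggestions[src["name"]] = best
    return suggestions


# Toy schema elements, invented for illustration.
source = [{"name": "custName", "type": "string"}, {"name": "dob", "type": "date"}]
target = [{"name": "customerName", "type": "string"}, {"name": "birthDate", "type": "date"}]

suggestions = suggest_matches(source, target)
print(suggestions)
```

Here `custName` is confidently paired with `customerName`, while `dob` falls below the threshold because its name shares almost nothing with `birthDate`, so it would be a candidate for the user confirmation step the abstract mentions.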

CiteSeerX

Automated Data Mapping Specification via Schema Heuristics and User Interaction

Authors: Hermann Stoeckle, John Grundy, John Hosking, Robert Amor, Sebastian Bossung
Publication venue: IEEE CS Press

Data transformation problems are very common and are challenging to implement for large and complex datasets. We describe a new approach for specifying data mapping transformations between XML schemas using a combination of automated schema analysis agents and selective user interaction. A graphical tool visualises parts of the two schemas to be mapped and a variety of agents analyse all or parts of the schema, voting on the likelihood of matching subsets. The user can confirm or reject suggestions, or even allow schema matches to be automatically determined, incrementally building up to a fully-mapped schema. An implementation of the mapping specification can then be generated.

CiteSeerX

Interactive reformulation of long queries

Author: Kumaran Giridhar
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2008

We present new ways of interacting with a user based on query analysis and reformulation. Our goal is to not only improve retrieval performance but also help the user understand the retrieval process and the collection she is searching. We do this by providing users with information reflecting the potential impact their decisions will have on the retrieval process. This way, users can make more informed choices from the options presented to them by the retrieval system. Unlike most previous work in user interaction, where a one-procedure-fits-all strategy was pursued, user interaction must be invoked only when there is potential for improvement. This is important as tedious user interaction can have an unfavorable impact on user experience. We present techniques for selective user interaction and show their utility in the context of two interaction techniques we have developed. Our results show that user interaction can be avoided in a vast number of cases without much deterioration in performance. User interaction can be made more productive by providing users with an optimally-sized set of high quality options. We present efficient techniques to determine such a set. When faced with a decision to interact with a user given a particular query, it is beneficial to determine the best interaction technique suited for that query. We solve this problem by obtaining implicit feedback from the user. By utilizing all the interaction-related techniques described in this thesis, we show through simulations and user studies that users can obtain better performance with less effort.

ScholarWorks@UMass Amherst