4,833 research outputs found

Query-Based Sampling using Only Snippets

Authors: Hiemstra Djoerd; Tigelaar Almer S.
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2009

Query-based sampling is a popular approach to model the content of an uncooperative server. It works by sending queries to the server and downloading the documents returned in the search results in full. This sample of documents then represents the server’s content. We present an approach that uses the document snippets as samples instead of downloading entire documents. This yields more stable results for the same amount of bandwidth usage as the full document approach. Additionally, we show that using snippets does not necessarily incur more latency, but can actually save time.
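The sampling loop described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the in-memory `DOCS` collection, the `search` function, the five-word snippet window, the random choice of the next query term, and the character count used as a bandwidth proxy are all hypothetical choices made for the example.

```python
import random
from collections import Counter

# Hypothetical stand-in for an uncooperative server: it only answers queries.
DOCS = [
    "federated search merges results from many independent servers",
    "query based sampling builds a model of a remote server",
    "snippets are short summaries shown in search results",
    "language models estimate term probabilities for a server",
]


def search(query, top_n=2, width=5):
    """Return up to top_n (snippet, full_text) pairs for documents containing query."""
    hits = []
    for text in DOCS:
        words = text.split()
        if query in words:
            i = words.index(query)
            start = max(0, i - width // 2)
            # The snippet is a short window of words around the query term.
            snippet = " ".join(words[start:start + width])
            hits.append((snippet, text))
    return hits[:top_n]


def sample_server(seed_term, rounds=10, use_snippets=True, rng=None):
    """Build a term-frequency model of the server from snippets or full documents."""
    rng = rng or random.Random(0)
    model = Counter()
    seen = set()      # texts already added (assumption: each text is counted once)
    chars_used = 0    # crude bandwidth proxy: characters transferred
    query = seed_term
    for _ in range(rounds):
        for snippet, full_text in search(query):
            # The result page carries the snippet in both modes.
            chars_used += len(snippet)
            if use_snippets:
                text = snippet
            else:
                # Full-document sampling also pays for the download.
                chars_used += len(full_text)
                text = full_text
            if text not in seen:
                seen.add(text)
                model.update(text.split())
        if not model:
            break  # the seed term matched nothing, so there is no vocabulary to query with
        # Assumption: the next query is a random term from the vocabulary learned so far.
        query = rng.choice(sorted(model))
    return model, chars_used


snippet_model, snippet_cost = sample_server("server", use_snippets=True)
full_model, full_cost = sample_server("server", use_snippets=False)
```

Comparing `snippet_model` and `full_model` against the true term distribution of the collection, at equal values of the cost counter, is one way to reproduce the kind of comparison the abstract describes.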

CiteSeerX

Radboud Repository

University of Twente Research Information

Overview of the TREC 2013 federated web search track

Authors: Demeester Thomas; Hiemstra D; Nguyen D; Trieschnigg D
Publication date: 01/01/2013

Ghent University Academic Bibliography

Searching and Stopping: An Analysis of Stopping Rules and Strategies

Authors: Baskaya F.; Bates M.J.; Kraft D.H.; Nickles K.R.; Smucker M.D.; Wu W.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2015

Searching naturally involves stopping points, both at a query level (how far down the ranked list should I go?) and at a session level (how many queries should I issue?). Understanding when searchers stop has been of much interest to the community because it is fundamental to how we evaluate search behaviour and performance. Research has shown that searchers find it difficult to formalise stopping criteria, and typically resort to their intuition of what is "good enough". While various heuristics and stopping criteria have been proposed, little work has investigated how well they perform, and whether searchers actually conform to any of these rules. In this paper, we undertake the first large-scale study of stopping rules, investigating how they influence overall session performance, and which rules best match actual stopping behaviour. Our work is focused on stopping at the query level in the context of ad-hoc topic retrieval, where searchers undertake search tasks within a fixed time period. We show that stopping strategies based upon the disgust or frustration point rules, both of which capture a searcher's tolerance to non-relevance, typically (i) result in the best overall performance and (ii) provide the closest approximation to actual searcher behaviour, although a fixed depth approach also performs remarkably well. Findings from this study have implications regarding how we build measures, and how we conduct simulations of search behaviours.
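The three query-level strategies named in this abstract can be illustrated with a small sketch. The abstract does not define the rules, so the operationalisations below are assumptions: the frustration point rule is taken to stop after a total number of non-relevant results, the disgust rule after a run of consecutive non-relevant results, and fixed depth after a set rank. The function names and the `judgments` list are hypothetical.

```python
def stop_fixed_depth(judgments, depth):
    """Stop at a fixed rank, or at the end of the list if it is shorter."""
    return min(depth, len(judgments))


def stop_total_nonrelevant(judgments, tolerance):
    """Assumed frustration-point rule: stop once `tolerance` non-relevant
    results have been seen in total."""
    nonrelevant = 0
    for rank, relevant in enumerate(judgments, start=1):
        if not relevant:
            nonrelevant += 1
            if nonrelevant >= tolerance:
                return rank
    return len(judgments)


def stop_contiguous_nonrelevant(judgments, tolerance):
    """Assumed disgust rule: stop once `tolerance` non-relevant results
    have been seen in a row."""
    run = 0
    for rank, relevant in enumerate(judgments, start=1):
        run = 0 if relevant else run + 1
        if run >= tolerance:
            return rank
    return len(judgments)


# Relevance of each result in rank order (True = relevant); made-up data.
judgments = [True, False, True, False, False, True]

print(stop_fixed_depth(judgments, 4))             # 4
print(stop_total_nonrelevant(judgments, 2))       # 4
print(stop_contiguous_nonrelevant(judgments, 2))  # 5
```

Each function returns the rank at which the simulated searcher stops examining results, which is the quantity a simulation would then compare against observed stopping depths.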

Crossref

University of Strathclyde Institutional Repository

Enlighten

Query-Based Sampling using Snippets

Authors: Hiemstra D.; Tigelaar Almer S.
Publication venue: ACM
Publication date: 01/01/2010

Query-based sampling is a commonly used approach to model the content of servers. Conventionally, queries are sent to a server and the documents returned in the search results are downloaded in full as a representation of the server’s content. We present an approach that uses the document snippets in the search results as samples instead of downloading the entire documents. We show this yields equal or better modeling performance for the same bandwidth consumption, depending on collection characteristics such as document length distribution and homogeneity. Query-based sampling using snippets is a useful approach for real-world systems, since it requires no extra operations beyond exchanging queries and search results.

Radboud Repository

University of Twente Research Information

Can Automatic Abstracting Improve on Current Extracting Techniques in Aiding Users to Judge the Relevance of Pages in Search Engine Results?

Author: Liang SF
Publication date: 01/01/2004

Current search engines use sentence extraction techniques to produce snippet result summaries, which users may find less than ideal for determining the relevance of pages. Unlike extracting, abstracting programs analyse the context of documents and rewrite them into informative summaries. Our project aims to produce abstracting summaries that are coherent and easy to read, thereby lessening the time users spend judging the relevance of pages. However, automatic abstracting techniques are domain-restricted. To solve this problem, we propose to employ text classification techniques. We propose a new approach that first classifies whole web documents into sixteen top-level ODP categories using machine learning and a Bayesian classifier. We then manually create sixteen templates for each category. The summarisation techniques we use include natural language processing techniques that weight words and analyse lexical chains to identify salient phrases and place them into relevant template slots to produce summaries.

Southampton (e-Prints Soton)