
    Upper Bound Approximations for BlockMaxWand

    BlockMaxWand (BMW) is a recent advance on the Wand dynamic pruning technique, which allows efficient retrieval without any effectiveness degradation to rank K. However, while BMW uses docid-sorted indices, it relies on recording the upper bound of the term weighting model scores for each block of postings in the inverted index. Such a requirement can be disadvantageous in situations such as when an index must be updated. In this work, we examine the appropriateness of upper-bound approximations – which have previously been shown suitable for Wand – in providing efficient retrieval for BMW. Experiments on the ClueWeb12 Category B13 corpus using 5000 queries from a real search engine's query log demonstrate that BMW still provides benefits w.r.t. Wand when approximate upper bounds are used, and that, if the approximations of the upper bounds are tight, BMW with approximate upper bounds can provide efficiency gains w.r.t. Wand with exact upper bounds, in particular for queries of short to medium length.
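As a rough illustration of the block upper bounds discussed above, the sketch below contrasts an exact per-block maximum with one hypothetical quantised approximation; BMW can skip a whole block whenever its bound cannot beat the current top-K threshold, so looser bounds mean fewer skips. All names and the quantisation scheme here are illustrative assumptions, not the paper's method.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Block:
    max_docid: int       # last docid held in this block of postings
    scores: List[float]  # term weighting model scores of the postings

def exact_upper_bound(block: Block) -> float:
    # BMW records the exact maximum score of each block
    return max(block.scores)

def approx_upper_bound(block: Block, global_max: float, levels: int = 8) -> float:
    # One plausible approximation: round the block maximum up to one of a
    # few quantisation levels, so index updates need not track exact maxima.
    step = global_max / levels
    return math.ceil(exact_upper_bound(block) / step) * step

def can_skip(block: Block, threshold: float,
             bound: Callable[[Block], float]) -> bool:
    # A block is skippable when its upper bound cannot beat the current
    # top-K threshold; looser bounds make this test succeed less often.
    return bound(block) <= threshold
```

With a threshold of 2.7, for instance, a block whose true maximum score is 2.5 is skipped under the exact bound but not under the quantised one, which is why the tightness of the approximation matters for efficiency.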

    Efficient & Effective Selective Query Rewriting with Efficiency Predictions

    To enhance effectiveness, a user's query can be rewritten internally by the search engine in many ways, for example by applying proximity, or by expanding the query with related terms. However, approaches that benefit effectiveness often have a negative impact on efficiency, which in turn harms user satisfaction if the query is excessively slow. In this paper, we propose a novel framework that uses the predicted execution time of various query rewritings to select between alternatives on a per-query basis, in a manner that ensures both effectiveness and efficiency. In particular, we propose to predict the execution time of ephemeral (e.g., proximity) posting lists generated from uni-gram inverted index posting lists, which is used to establish the permissible query rewriting alternatives that may execute in the allowed time. Experiments examining both the effectiveness and efficiency of the proposed approach demonstrate that a 49% decrease in mean response time (and a 62% decrease in 95th-percentile response time) can be attained without significantly hindering the effectiveness of the search engine.
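The selection step this abstract describes can be sketched as follows: given rewriting alternatives with expected effectiveness, and a predictor of their execution time, keep only those predicted to fit the time budget and take the most effective. The predictor interface and rewriting names are assumptions for illustration, not the paper's actual framework.

```python
# Hypothetical per-query selection among rewriting alternatives under a
# response-time budget; names are illustrative.

def select_rewriting(rewritings, predict_ms, budget_ms, fallback="original"):
    """Pick the most effective rewriting predicted to finish within budget.

    rewritings: list of (name, expected_effectiveness) pairs.
    predict_ms: maps a rewriting name to its predicted execution time (ms).
    """
    permissible = [(name, eff) for name, eff in rewritings
                   if predict_ms(name) <= budget_ms]
    if not permissible:
        return fallback  # no rewriting fits: keep the original query
    return max(permissible, key=lambda pair: pair[1])[0]
```

A generous budget admits the most effective (but slowest) rewriting, a tight one falls back to the original query; the interesting cases are in between, where efficiency prediction decides.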

    Queuing theory-based latency/power tradeoff models for replicated search engines

    Large-scale search engines are built upon huge infrastructures involving thousands of computers in order to achieve fast response times. However, the energy consumed (and hence the financial cost) is correspondingly high, leading to environmental damage. This paper proposes new approaches to increase energy and financial savings in large-scale search engines, while maintaining good query response times. We aim to improve current state-of-the-art models used for balancing power and latency by integrating new advanced features. On the one hand, we propose to improve the power savings by completely powering down the query servers that are not necessary when the load of the system is low, and we incorporate energy rates into the model formulation. On the other hand, we focus on how to accurately estimate the latency of the whole system by means of Queueing Theory. Experiments using actual query logs attest to the high energy (and financial) savings with respect to current baselines. To the best of our knowledge, this is the first paper to successfully apply stationary Queueing Theory models to estimate the latency of a large-scale search engine.
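The latency side of such a model can be sketched with a textbook stationary queue. Below, the mean query latency of c identical replicated servers is estimated with the M/M/c (Erlang-C) formula; this is a standard construction offered as an assumption about the flavour of model involved, not the paper's exact formulation.

```python
import math

def erlang_c(c: int, a: float) -> float:
    # Probability that an arriving query has to queue, with c servers
    # and offered load a = arrival_rate / service_rate (requires a < c).
    busy_all = (a ** c) / math.factorial(c) * (c / (c - a))
    idle_sum = sum((a ** k) / math.factorial(k) for k in range(c))
    return busy_all / (idle_sum + busy_all)

def mean_latency(arrival_rate: float, service_rate: float, c: int) -> float:
    # Mean response time = mean queueing delay + mean service time.
    a = arrival_rate / service_rate
    assert a < c, "system must be stable"
    wait = erlang_c(c, a) / (c * service_rate - arrival_rate)
    return wait + 1.0 / service_rate
```

Powering down servers (reducing c) lowers energy use but raises the estimated latency, which is exactly the tradeoff such a model has to balance at low load.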

    Efficient query processing for scalable web search

    Search engines are exceptionally important tools for accessing information in today’s world. In satisfying the information needs of millions of users, the effectiveness (the quality of the search results) and the efficiency (the speed at which the results are returned to the users) of a search engine are two goals that form a natural trade-off, as techniques that improve the effectiveness of the search engine can also make it less efficient. Meanwhile, search engines continue to rapidly evolve, with larger indexes, more complex retrieval strategies and growing query volumes. Hence, there is a need for the development of efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to make gains in efficiency. This survey comprehensively reviews the foundations of search engines, from index layouts to basic term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing strategies, while also covering the latest trends in the efficient query processing literature, including coherent and systematic reviews of techniques such as dynamic pruning and impact-sorted posting lists, as well as their variants and optimisations. Our explanations of query processing strategies, for instance the WAND and BMW dynamic pruning algorithms, are presented with illustrative figures showing how the processing state changes as the algorithms progress. Moreover, acknowledging the recent trends in applying a cascading infrastructure within search systems, this survey describes techniques for efficiently integrating effective learned models, such as those obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques, often achieved by predicting the response times of the search engine (known as query efficiency prediction), and making per-query tradeoffs between efficiency and effectiveness to ensure that the required retrieval speed targets can be met. Finally, the survey concludes with a summary of open directions in efficient search infrastructures, namely the use of signatures, real-time indexing, energy efficiency, and modern hardware and software architectures.
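As a concrete baseline for the strategies such a survey reviews, here is a minimal sketch of exhaustive DAAT top-K scoring over docid-ordered postings lists; dynamic pruning techniques such as WAND and BMW improve on this by skipping documents that cannot enter the top K. Representing postings as (docid, score) pairs is a simplification for illustration.

```python
import heapq

def daat_topk(postings_lists, k):
    # Exhaustive document-at-a-time scoring: advance a cursor per term's
    # postings list, always processing the smallest uncovered docid.
    cursors = [iter(pl) for pl in postings_lists]
    heads = [next(c, None) for c in cursors]
    heap = []  # min-heap of (score, docid) holding the current top-k
    while any(h is not None for h in heads):
        doc = min(h[0] for h in heads if h is not None)
        score = 0.0
        for i, h in enumerate(heads):
            if h is not None and h[0] == doc:
                score += h[1]
                heads[i] = next(cursors[i], None)  # advance this list
        heapq.heappush(heap, (score, doc))
        if len(heap) > k:
            heapq.heappop(heap)  # evict the lowest-scoring candidate
    return sorted(heap, reverse=True)
```

The minimum score in the heap is the threshold that pruning strategies compare against term (or block) upper bounds in order to skip work.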

    Pseudo-Relevance Feedback for Multiple Representation Dense Retrieval

    Pseudo-relevance feedback mechanisms, from Rocchio to the relevance models, have shown the usefulness of expanding and reweighting the users' initial queries using information occurring in an initial set of retrieved documents, known as the pseudo-relevant set. Recently, dense retrieval -- through the use of neural contextual language models such as BERT for analysing the documents' and queries' contents and computing their relevance scores -- has shown promising performance on several information retrieval tasks, without relying on the traditional inverted index for identifying documents relevant to a query. Two different dense retrieval families have emerged: the use of a single embedded representation for each passage and query (e.g. using BERT's [CLS] token), or multiple representations (e.g. using an embedding for each token of the query and document). In this work, we conduct the first study into the potential for multiple representation dense retrieval to be enhanced using pseudo-relevance feedback. In particular, based on the pseudo-relevant set of documents identified using a first-pass dense retrieval, we extract representative feedback embeddings (using KMeans clustering) -- while ensuring that these embeddings discriminate among passages (based on IDF) -- which are then added to the query representation. These additional feedback embeddings are shown to enhance the effectiveness of both a reranking and an additional dense retrieval operation. Indeed, experiments on the MSMARCO passage ranking dataset show that MAP can be improved by up to 26% on the TREC 2019 query set and 10% on the TREC 2020 query set by the application of our proposed ColBERT-PRF method on a ColBERT dense retrieval approach.
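The feedback-embedding extraction described above can be sketched as follows: cluster the token embeddings of the pseudo-relevant passages with KMeans and append the resulting centroids to the query's embeddings. This toy version omits the IDF-based discrimination and score weighting the paper applies, and the helper names are illustrative.

```python
import random

def _dist2(a, b):
    # squared Euclidean distance between two embeddings (lists of floats)
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, k, iters=20, seed=0):
    # Toy Lloyd's algorithm over token embeddings
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _dist2(p, centroids[j]))
            clusters[nearest].append(p)
        # recompute each centroid; keep the old one if its cluster emptied
        centroids = [_mean(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids

def expand_query(query_embs, feedback_token_embs, k=3):
    # Append k representative feedback centroids to the query embeddings
    return query_embs + kmeans(feedback_token_embs, k)
```

The expanded embedding list can then score passages exactly as the original query embeddings would, which is what allows the same mechanism to serve both reranking and a further dense retrieval pass.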

    Employer and employment agency attitudes towards employing individuals with mental health needs

    Background: The positive benefits of paid employment for individuals with mental health needs are well known, yet many still remain unemployed (Perkins & Rinaldi, 2002. Unemployment rates among patients with long-term mental health problems: A decade of rising unemployment. Psychiatric Bulletin, 26(8), 295–298.). Aims: If individuals with mental health needs are to be given access to paid employment, it is important to understand the attitudes of employers and of the employment agencies that may provide them with short-term contracts. Methods: A mixed methods approach comprising interviews and a follow-up survey was used to investigate this phenomenon. Interviews were conducted with 10 employment agencies and 10 employers. The results of these interviews then informed a follow-up survey of 200 businesses in Gloucestershire. Results: The findings demonstrated that employment agencies would consider putting forward individuals with previous mental health needs to employers. However, employers had a high level of concern about employing these individuals, reporting issues of trust, the need for supervision, inability to use initiative, and inability to deal with the public for individuals with either existing or previous mental health needs. Conclusions: The findings of this research suggest a need for employers to have more accurate information regarding hiring individuals with mental health needs.

    Erosion protection benefits of stabilized SnF2 dentifrice versus an arginine–sodium monofluorophosphate dentifrice: results from in vitro and in situ clinical studies

    OBJECTIVES: The aim of these investigations was to assess the ability of two fluoride dentifrices to protect against the initiation and progression of dental erosion, using a predictive in vitro erosion cycling model and a human in situ erosion prevention clinical trial for verification of effectiveness. MATERIALS AND METHODS: A stabilized stannous fluoride (SnF2) dentifrice (0.454% SnF2 + 0.077% sodium fluoride [NaF]; total F = 1450 ppm F) [dentifrice A] and a sodium monofluorophosphate [SMFP]/arginine dentifrice (1.1% SMFP + 1.5% arginine; total F = 1450 ppm F) [dentifrice B] were tested in a 5-day in vitro erosion cycling model and a 10-day randomized, controlled, double-blind, two-treatment, four-period crossover in situ clinical trial. In each study, human enamel specimens were exposed to repetitive product treatments using a standardized dilution of the test products, followed by erosive acid challenges in a systematic fashion. RESULTS: Both studies demonstrated statistically significant differences between the two products, with dentifrice A providing significantly better enamel protection in each study. In vitro, dentifrice A provided a 75.8% benefit over dentifrice B (p < 0.05, ANOVA), while after 10 days in the in situ model, dentifrice A provided 93.9% greater protection versus dentifrice B (p < 0.0001, general linear mixed model). CONCLUSION: These results support the superiority of stabilized SnF2 dentifrices for protecting human teeth against the initiation and progression of dental erosion. CLINICAL RELEVANCE: Stabilized SnF2 dentifrices may provide more significant benefits to consumers than conventional fluoride dentifrices.

    AddressingHistory—Crowdsourcing a Nation's Past

    The AddressingHistory project was funded as part of the Developing Community Content strand of the JISC Digitisation and e-Content Programme, and was led by the EDINA National Data Centre at the University of Edinburgh. This paper charts the development and delivery of a Web 2.0-informed community engagement tool and application programming interface (API) developed at EDINA in partnership with the National Library of Scotland. The AddressingHistory Web tool enables members of the community, both within and beyond academia (particularly local history groups and genealogists), to enhance and combine data from digitized historical Scottish Post Office Directories with contemporaneous large-scale historical maps. The paper discusses the background of the post office directories and the corresponding georeferenced old maps for Scotland, and the technical platforms deployed, including sustainable software components, Web applications and services. It also examines issues relating to data parsing, user generated content (UGC) created by the community (including georeferencing), the use of social media amplification for community engagement, and future directions. To conclude, the paper argues that to be successful online, crowdsourcing tools such as the one developed for this project require a critical mass of content to fully engage the user community, and that such success will ultimately be measured by continual and extended use within the wider community.