Managing tail latency in large scale information retrieval systems
As both the availability of internet access and the prominence of smart devices continue to increase, data is being generated at a faster rate than ever before. This massive increase in data production brings many challenges, including efficiency concerns for the storage and retrieval of such large-scale data. At the same time, users have grown to expect the sub-second response times that are common in most modern search engines, creating a problem: how can such large amounts of data continue to be served efficiently enough to satisfy end users? This dissertation investigates several issues regarding tail latency in large-scale information retrieval systems. Tail latency corresponds to the high-percentile latency observed from a system; in the case of search, this latency typically corresponds to how long it takes for a query to be processed. In particular, keeping tail latency as low as possible translates to a good experience for all users, as tail latency is directly related to the worst-case latency and hence the worst possible user experience. The key idea in targeting tail latency is to move from questions such as "what is the median latency of our search engine?" to questions that more accurately capture user experience, such as "how many queries take more than 200 ms to return answers?" or "what is the worst-case latency that a user may be subject to, and how often might it occur?" While various strategies exist for efficiently processing queries over large textual corpora, prior research has focused almost entirely on improving the average processing time or cost of search systems. As a first contribution, we examine some state-of-the-art retrieval algorithms for two popular index organizations and discuss the trade-offs between them, paying special attention to the notion of tail latency. This research uncovers a number of observations that are subsequently leveraged for improved search efficiency and effectiveness. We then propose and solve a new problem, which involves processing a number of related queries together, known as multi-queries, to yield higher-quality search results. We experiment with a number of algorithmic approaches to efficiently process these multi-queries, and report on the cost, efficiency, and effectiveness trade-offs of each. Ultimately, we find that some solutions yield a low tail latency and are hence suitable for use in real-time search environments. Finally, we examine how predictive models can be used to improve the tail latency and end-to-end cost of a commonly used multi-stage retrieval architecture without impacting result effectiveness. By combining ideas from numerous areas of information retrieval, we propose a prediction framework which can be used for training and evaluating several efficiency/effectiveness trade-off parameters, resulting in improved trade-offs between cost, result quality, and tail latency.
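To make the percentile framing concrete, the following minimal Python sketch answers exactly these questions over a log of per-query latencies. The log-normal latency distribution and the 200 ms threshold are illustrative assumptions, not figures from the dissertation.

```python
import random

def latency_percentile(latencies_ms, pct):
    """Nearest-rank percentile (pct in 0-100) of a list of query latencies."""
    ordered = sorted(latencies_ms)
    rank = max(0, int(len(ordered) * pct / 100.0 + 0.5) - 1)
    return ordered[rank]

# Simulated per-query latencies (log-normal; illustrative only).
random.seed(0)
latencies = [random.lognormvariate(3.5, 0.6) for _ in range(10_000)]

print(f"median (p50):     {latency_percentile(latencies, 50):7.1f} ms")
print(f"tail (p99):       {latency_percentile(latencies, 99):7.1f} ms")
print(f"worst case:       {max(latencies):7.1f} ms")
print(f"queries > 200 ms: {sum(l > 200 for l in latencies)}")
```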
Efficient query processing for scalable web search
Search engines are exceptionally important tools for accessing information in today's world. In satisfying the information needs of millions of users, the effectiveness (the quality of the search results) and the efficiency (the speed at which the results are returned to the users) of a search engine are two goals that form a natural trade-off, as techniques that improve effectiveness can also make the engine less efficient. Meanwhile, search engines continue to evolve rapidly, with larger indexes, more complex retrieval strategies, and growing query volumes. Hence, there is a need for efficient query processing infrastructures that make appropriate sacrifices in effectiveness in order to make gains in efficiency. This survey comprehensively reviews the foundations of search engines, from index layouts to basic term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing strategies, while also covering the latest trends in the literature on efficient query processing, including coherent and systematic reviews of techniques such as dynamic pruning and impact-sorted posting lists, as well as their variants and optimisations. Our explanations of query processing strategies, for instance the WAND and BMW dynamic pruning algorithms, are presented with illustrative figures showing how the processing state changes as the algorithms progress. Moreover, acknowledging the recent trend of applying a cascading infrastructure within search systems, this survey describes techniques for efficiently integrating effective learned models, such as those obtained from learning-to-rank techniques. The survey also covers the selective application of query processing techniques, often achieved by predicting the response times of the search engine (known as query efficiency prediction), and making per-query trade-offs between efficiency and effectiveness to ensure that the required retrieval speed targets can be met. Finally, the survey concludes with a summary of open directions in efficient search infrastructures, namely the use of signatures, real-time search, and energy-efficient and modern hardware and software architectures.
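As a point of reference for the strategies the survey covers, below is a minimal exhaustive DAAT sketch with a top-k heap. Real systems add the dynamic pruning the survey describes (for instance, WAND skips documents whose per-term score upper bounds cannot beat the heap threshold), which is omitted here for brevity; the toy postings and scores are assumptions for illustration.

```python
import heapq

def daat_topk(posting_lists, k):
    """Minimal document-at-a-time (DAAT) scoring sketch.

    posting_lists: one dict per query term, mapping docid -> term score
    (a toy stand-in for real compressed postings with skipping).
    Returns the top-k (score, docid) pairs, best first.
    """
    # Visit the union of candidate documents in docid order.
    candidates = sorted(set().union(*posting_lists))
    heap = []  # min-heap holding the current top-k
    for doc in candidates:
        score = sum(pl.get(doc, 0.0) for pl in posting_lists)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True)

# Toy postings for a two-term query (illustrative scores only).
postings = [{1: 1.2, 3: 0.4, 7: 2.0}, {1: 0.3, 7: 1.1, 9: 0.9}]
print(daat_topk(postings, k=2))  # -> [(3.1, 7), (1.5, 1)]
```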
Selective Query Processing: a Risk-Sensitive Selection of System Configurations
In information retrieval systems, search parameters are optimized to ensure high effectiveness based on a set of past searches, and these optimized parameters are then used as the system configuration for all subsequent queries. A better approach, however, is to adapt the parameters to fit the query at hand. Selective query expansion is one such approach, in which the system decides automatically whether or not to expand the query, resulting in two possible system configurations. This approach was recently extended to many other parameters, leading to many possible system configurations from which the system automatically selects the best on a per-query basis. To determine the ideal configurations to use on a per-query basis in real-world systems, we developed a method in which a restricted number of possible configurations is pre-selected and then used in a meta-search engine that decides the best search configuration on a per-query basis. We define a risk-sensitive approach for configuration pre-selection that considers the risk-reward trade-off between the number of configurations kept and system effectiveness. For final configuration selection, the decision is based on query feature similarities. We find that a relatively small number of configurations (20) selected by our risk-sensitive model is sufficient to increase effectiveness by about 15% (P@10, nDCG@10) compared to a traditional grid search using a single configuration, and by about 20% compared to learning to rank documents. Our risk-sensitive approach works for both diversity- and ad hoc-oriented searches. Moreover, the similarity-based selection method outperforms more sophisticated approaches. We thus demonstrate the feasibility of developing per-query information retrieval systems, which will guide future research in this direction.
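The abstract does not spell out the query features or the similarity measure used, so the sketch below is only an illustration of the final selection step: given a bank of pre-selected configurations, pick the one whose representative feature vector is closest to the incoming query. Cosine similarity, the feature vectors, and the configuration dictionaries are all assumptions.

```python
import math

def select_configuration(query_features, config_bank):
    """Pick the configuration whose representative queries most resemble this one.

    config_bank: list of (feature_centroid, configuration) pairs, one per
    pre-selected configuration (e.g. the ~20 kept by risk-sensitive
    pre-selection). Feature vectors are equal-length lists of floats.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    return max(config_bank, key=lambda cc: cosine(query_features, cc[0]))[1]

# Hypothetical bank of two configurations and an incoming query.
bank = [
    ([0.9, 0.1, 0.3], {"expand_query": True,  "bm25_b": 0.4}),
    ([0.2, 0.8, 0.5], {"expand_query": False, "bm25_b": 0.75}),
]
print(select_configuration([0.85, 0.2, 0.25], bank))  # -> expansion config
```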
Evaluation with uncertainty
Experimental uncertainty arises as a consequence of (1) bias (systematic error) and (2) variance in measurements. Popular evaluation techniques only account for the variance due to sampling of experimental units, and assume the other sources of uncertainty can be ignored. For example, only the uncertainty due to sampling of topics (queries) and sampling of training/test datasets is considered in standard information retrieval (IR) and classifier system evaluation, respectively. However, incomplete relevance judgements, assessor disagreement, non-deterministic systems, and measurement bias can also cause uncertainty in these experiments. In this thesis, the impact of these other sources of uncertainty on the evaluation of IR and classification experiments is investigated. The uncertainty due to (1) incomplete relevance judgements in IR test collections, (2) non-determinism in IR systems and classifiers, and (3) high variance of classifiers is analysed using case studies from distributed information retrieval and information security. The thesis illustrates the importance of reducing, and accurately accounting for, uncertainty when evaluating complex IR and classifier systems. Novel techniques are introduced to (1) reduce uncertainty due to test collection bias in IR evaluation and high classifier variance (overfitting) in detecting drive-by download attacks, (2) account for multidimensional variance due to sampling of IR system instances from non-deterministic IR systems in addition to sampling of topics, and (3) account for repeated measurements due to non-deterministic classification algorithms.
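As one deliberately simplified illustration of accounting for variance from repeated measurements, the sketch below computes a percentile-bootstrap confidence interval over pooled per-topic scores from several runs of a non-deterministic system. Note that the thesis models the multidimensional topic-by-run structure explicitly, which this flat pooling does not; the scores are made-up numbers.

```python
import random
import statistics

def bootstrap_ci(scores, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a mean effectiveness score.

    scores: per-topic measurements (e.g. nDCG@10 values) pooled over
    repeated runs of a non-deterministic system.
    """
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores)))
        for _ in range(n_resamples)
    )
    lo = means[int(n_resamples * (alpha / 2))]
    hi = means[int(n_resamples * (1 - alpha / 2)) - 1]
    return lo, hi

# Hypothetical nDCG@10 scores: 3 runs x 5 topics of a stochastic system.
scores = [0.42, 0.55, 0.38, 0.61, 0.47,
          0.44, 0.52, 0.40, 0.58, 0.49,
          0.41, 0.57, 0.36, 0.63, 0.45]
print(bootstrap_ci(scores))  # -> (lower, upper) bound on the mean
```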
Temporal Information Models for Real-Time Microblog Search
Real-time search in Twitter and other social media services is often biased towards the most recent results due to the "in the moment" nature of topic trends and their ephemeral relevance to users and media in general. However, "in the moment", it is often difficult to look at all emerging topics and single out the important ones from the rest of the social media chatter. This thesis proposes to leverage external sources to estimate the duration and burstiness of live Twitter topics. It extends preliminary research in which it was shown that temporal re-ranking using external sources could indeed improve the accuracy of results. To further explore this topic, we pursued three significant novel approaches: (1) multi-source information analysis that explores the behavioural dynamics of users, such as Wikipedia live edits and page-view streams, to detect topic trends and estimate topic interest over time; (2) efficient methods for federated query expansion towards the improvement of query meaning; and (3) exploiting multiple sources towards the detection of temporal query intent. This work differs from past approaches in that it operates over real-time queries, leveraging live user-generated content, in contrast with previous methods that require an offline preprocessing step.
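The thesis's multi-source models are richer than any toy example, but a minimal sketch of the underlying burst-detection idea (flagging when a topic's external activity, such as Wikipedia page views, spikes above its recent baseline) could look like the following; the window size, threshold, and counts are all assumptions.

```python
import statistics

def detect_bursts(counts, window=24, z_threshold=3.0):
    """Flag time points where activity spikes above the recent baseline.

    counts: hourly page-view (or edit) counts for one topic. A point is a
    burst if it exceeds the trailing window's mean by z_threshold standard
    deviations, a deliberately simple stand-in for richer trend models.
    """
    bursts = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        mu = statistics.fmean(baseline)
        sigma = statistics.pstdev(baseline) or 1.0
        if (counts[t] - mu) / sigma > z_threshold:
            bursts.append(t)
    return bursts

# Hypothetical hourly Wikipedia page views: flat traffic, then a spike.
views = [100, 97, 103, 99, 101, 98, 102, 100] * 3 + [450, 500]
print(detect_bursts(views, window=8))  # -> [24, 25]
```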
SparkIR: a Scalable Distributed Information Retrieval Engine over Spark
Search engines have to deal with a huge amount of data (e.g., billions of documents in the case of the Web) and must find scalable and efficient ways to produce effective search results. In this thesis, we propose to use Spark, an in-memory distributed big data processing framework, and leverage its powerful capabilities for handling large amounts of data to build an efficient and scalable experimental search engine over textual documents. The proposed system, SparkIR, can serve as a research framework for conducting information retrieval (IR) experiments. SparkIR supports two indexing schemes, document-based partitioning and term-based partitioning, to support document-at-a-time (DAAT) and term-at-a-time (TAAT) query evaluation methods. Moreover, it offers static and dynamic pruning to improve retrieval efficiency: for static pruning it employs champion lists and tiering, while for dynamic pruning it uses MaxScore top-k retrieval. We evaluated the performance of SparkIR using the ClueWeb12-B13 collection, which contains about 50M English Web pages. Experiments over different subsets of the collection, compared against an Elasticsearch baseline, show that SparkIR exhibits reasonable efficiency and scalability overall for both indexing and retrieval. Implemented as an open-source library over Spark, SparkIR also lets its users benefit from other Spark libraries (e.g., MLlib and GraphX), which therefore eliminates the need of using …
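SparkIR's own API is not shown in the abstract, so the following PySpark sketch only illustrates the general pattern behind one of the two indexing schemes mentioned, term-based partitioning, where each term's postings end up co-located (which suits TAAT evaluation). The toy collection and the term-frequency postings are assumptions.

```python
# Minimal term-partitioned inverted-index sketch in PySpark; this is a
# generic illustration, not SparkIR's actual implementation.
from pyspark import SparkContext

sc = SparkContext(appName="toy-indexer")

# (docid, text) pairs standing in for a web collection such as ClueWeb12.
docs = sc.parallelize([
    (1, "spark makes distributed indexing simple"),
    (2, "distributed retrieval over spark"),
    (3, "dynamic pruning speeds up retrieval"),
])

inverted = (
    docs.flatMap(lambda d: [((term, d[0]), 1) for term in d[1].split()])
        .reduceByKey(lambda a, b: a + b)                 # term frequencies
        .map(lambda kv: (kv[0][0], (kv[0][1], kv[1])))   # term -> (doc, tf)
        .groupByKey()                                    # co-locate postings
        .mapValues(sorted)                               # postings by docid
)

for term, postings in inverted.collect():
    print(term, list(postings))
sc.stop()
```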
An MLP-based Algorithm for Efficient Contrastive Graph Recommendations
Graph-based recommender systems (GBRSs) have achieved promising performance by incorporating the user-item bipartite graph using Graph Neural Networks (GNNs). In GBRSs, information from each user's and item's multi-hop neighbours is effectively conveyed between nodes through neighbourhood aggregation and message passing. Although effective, existing neighbourhood aggregation and message-passing functions are usually computationally expensive. Motivated by the emerging contrastive learning technique, we design a simple neighbourhood construction method, in conjunction with a contrastive objective function, to simulate the neighbourhood information processing of GNNs. In addition, we propose a simple algorithm based on the Multilayer Perceptron (MLP) for learning user and item representations with extra non-linearity, while lowering the computational burden compared with multi-layer GNNs. Our extensive empirical experiments on three public datasets demonstrate that our proposed model, MLP-CGRec, can reduce GPU memory consumption and training time by up to 24.0% and 33.1%, respectively, without significantly degrading recommendation accuracy in comparison with competitive baselines.
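The abstract does not give MLP-CGRec's exact architecture or objective, so the sketch below is only a generic illustration of the two ingredients it names: an MLP encoder over ID embeddings and a contrastive (InfoNCE-style) loss that treats interacted user-item pairs as positives and in-batch items as negatives. The dimensions, depth, and temperature are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPEncoder(nn.Module):
    """Two-layer MLP mapping ID embeddings to representations."""
    def __init__(self, num_entities, dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_entities, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, ids):
        return self.mlp(self.embed(ids))

def infonce(user_vecs, pos_item_vecs, temperature=0.2):
    """Contrastive loss: matching pairs are positives, in-batch negatives."""
    u = F.normalize(user_vecs, dim=-1)
    i = F.normalize(pos_item_vecs, dim=-1)
    logits = u @ i.t() / temperature     # (batch, batch) similarities
    labels = torch.arange(u.size(0))     # diagonal = matching pairs
    return F.cross_entropy(logits, labels)

# Toy batch: 4 users paired with one interacted item each.
users, items = MLPEncoder(100), MLPEncoder(500)
loss = infonce(users(torch.tensor([0, 1, 2, 3])),
               items(torch.tensor([10, 42, 7, 99])))
loss.backward()  # gradients flow into both encoders
```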