6 research outputs found

    A New Probabilistic Model for Top-k Ranking Problem

    This paper is concerned with the top-k ranking problem, which reflects the fact that people pay more attention to the top-ranked objects in real ranking applications such as information retrieval. A popular approach to the top-k ranking problem is based on probabilistic models, such as the Luce model and the Mallows model. However, whether the sequential generative process described in these models is suitable for top-k ranking remains an open question. Following the riffled independence factorization proposed in recent literature, which is a natural structural assumption for top-k ranking, we propose a new generative process for top-k ranking data. Our approach decomposes distributions over top-k rankings into two layers: the first layer describes the relative ordering between the top k objects and the remaining n − k objects, and the second layer describes the full ordering of the top k objects. On this basis, we propose a new probabilistic model for the top-k ranking problem, called the hierarchical ordering model. Specifically, we use three different probabilistic models to describe different generative processes for the first layer, and the Luce model to describe the sequential generative process of the second layer, which yields three specific hierarchical ordering models. We also conduct extensive experiments on benchmark datasets to show that our proposed models significantly outperform previous models.
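
The two-layer generative process described in this abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. The abstract does not name the three first-layer models, so the first layer here is an assumed stand-in: the top set is drawn by weighted sampling without replacement and its order is discarded. The second layer uses the Luce model, as the abstract states. The names `luce_order`, `sample_top_k`, `membership_scores` and `order_scores` are hypothetical.

```python
import random


def luce_order(items, scores, rng):
    """Sample a full ordering of `items` from the Luce model.

    At each step one remaining item is chosen with probability
    proportional to its (positive) score, then removed.
    """
    remaining = list(items)
    ordering = []
    while remaining:
        weights = [scores[item] for item in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        ordering.append(pick)
        remaining.remove(pick)
    return ordering


def sample_top_k(membership_scores, order_scores, k, rng):
    """Sample a top-k ranking in two layers.

    Layer 1 (ASSUMED form, not taken from the paper): decide which k
    objects belong to the top set, ignoring their internal order.
    Layer 2 (Luce model, as in the abstract): order those k objects.
    """
    objects = list(membership_scores)
    top_set = set(luce_order(objects, membership_scores, rng)[:k])
    # sorted() only fixes a deterministic starting order for layer 2.
    return luce_order(sorted(top_set), order_scores, rng)


# Hypothetical scores for five objects; all values are illustrative.
membership = {"a": 5.0, "b": 4.0, "c": 1.0, "d": 0.5, "e": 0.5}
order = {"a": 1.0, "b": 3.0, "c": 1.0, "d": 1.0, "e": 1.0}
top3 = sample_top_k(membership, order, k=3, rng=random.Random(0))
```

Keeping separate score dictionaries for the two layers mirrors the idea that membership in the top set and ordering within it are modelled by different distributions.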

    Acceleration of ListNet for ranking using reconfigurable architecture

    Document ranking uses ranking models to order query results by relevance. ListNet is a well-known approach for constructing and training learning-to-rank models. Compared with traditional learning approaches, ListNet delivers better accuracy, but it is computationally too expensive for learning models from large data sets because of the large number of permutations and documents involved in computing the gradients. Currently, the long training time limits the practicality of ListNet in ranking applications such as breaking news search and stock prediction, and the situation worsens as data-set sizes increase. To tackle the challenge of long training time, this thesis optimises the ListNet algorithm and designs hardware accelerators for training it on Field Programmable Gate Arrays (FPGAs), making the algorithm more practical for real-world applications. The contributions of this thesis include: 1) A novel computation method for the ListNet algorithm for ranking. The proposed computation method exposes more fine-grained parallelism for FPGA implementation. 2) A weighted sampling method that takes the ranking positions into account, along with an effective quantisation method based on FPGA devices. The proposed design achieves a 4.42x speed improvement over a GPU implementation while still guaranteeing the accuracy. 3) A full reconfigurable architecture for ListNet training using multiple bitstream kernels. The proposed method achieves higher model accuracy than pure fixed-point training and better throughput than pure floating-point training. By applying these techniques, this thesis accelerates the ListNet algorithm for ranking using FPGAs and achieves significant speed improvements over CPU and GPU implementations.
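
For orientation, the listwise loss that ListNet optimises can be sketched in plain Python. This sketch assumes the commonly used top-one approximation (cross-entropy between softmax distributions over the documents of one query). It does not reproduce the thesis's optimised computation method, sampling scheme or quantisation, none of which are detailed in the abstract. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_top1_loss(true_scores, pred_scores):
    """Cross-entropy between top-one probability distributions.

    ASSUMPTION: top-one approximation, which avoids enumerating
    all permutations of the documents for a query.
    """
    target = softmax(true_scores)
    predicted = softmax(pred_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))


def listnet_top1_grad(true_scores, pred_scores):
    """Gradient of the loss with respect to each predicted score."""
    target = softmax(true_scores)
    predicted = softmax(pred_scores)
    return [p - t for t, p in zip(target, predicted)]


# Illustrative relevance labels and model scores for one query.
labels = [3.0, 2.0, 1.0]
good = listnet_top1_loss(labels, [3.0, 2.0, 1.0])
bad = listnet_top1_loss(labels, [1.0, 2.0, 3.0])
grad = listnet_top1_grad(labels, [1.0, 2.0, 3.0])
```

Under this approximation each gradient entry depends on a normalisation over all documents of the query, which gives a sense of why training cost grows with list length.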

    An Investigation of Preference Judging Consistency

    Preference judging has been proposed as an effective method to identify the most relevant documents for a given search query. In this thesis, we investigate the degree to which assessors using a preference judging system are able to consistently find the same top documents and how consistent they are in their own preferences. We also examine to what extent variability in assessor preferences affects the evaluation of information retrieval systems. We designed and conducted a user study in which 40 participants were recruited to preference judge 30 topics taken from the 2021 TREC Health Misinformation track. The study found that the number of judgments needed to find the top-10 preferred documents using preference judging is about twice the number of documents in that topic. It also suggests that relying on just one non-professional assessor to do preference judging is not sufficient for evaluating information retrieval systems. Additionally, the study showed that preference judging to find the top-10 documents does significantly change the rankings of runs compared with the rankings reported in the TREC 2021 Health Misinformation track, with most changes happening among the lower-ranked runs rather than the top-ranked runs. Overall, this thesis provides insights into assessor behaviour and assessor agreement when using preference judgments for evaluating information retrieval systems.