56,863 research outputs found

    Rank aggregation for non-stationary data streams

    Get PDF
    The problem of learning over non-stationary ranking streams arises naturally, particularly in recommender systems. The rankings represent the preferences of a population, and the non-stationarity means that the distribution of preferences changes over time. We propose an algorithm that learns the current distribution of ranking in an online manner. The bottleneck of this process is a rank aggregation problem. We propose a generalization of the Borda algorithm for non-stationary ranking streams. As a main result, we bound the minimum number of samples required to output the ground truth with high probability. Besides, we show how the optimal parameters are set. Then, we generalize the whole family of weighted voting rules (the family to which Borda belongs) to situations in which some rankings are more reliable than others. We show that, under mild assumptions, this generalization can solve the problem of rank aggregation over non-stationary data streams

    Fair Rank Aggregation

    Full text link
    Ranking algorithms find extensive usage in diverse areas such as web search, employment, college admission, voting, etc. The related rank aggregation problem deals with combining multiple rankings into a single aggregate ranking. However, algorithms for both these problems might be biased against some individuals or groups due to implicit prejudice or marginalization in the historical data. We study ranking and rank aggregation problems from a fairness or diversity perspective, where the candidates (to be ranked) may belong to different groups and each group should have a fair representation in the final ranking. We allow the designer to set the parameters that define fair representation. These parameters specify the allowed range of the number of candidates from a particular group in the top-kk positions of the ranking. Given any ranking, we provide a fast and exact algorithm for finding the closest fair ranking for the Kendall tau metric under block-fairness. We also provide an exact algorithm for finding the closest fair ranking for the Ulam metric under strict-fairness, when there are only O(1)O(1) number of groups. Our algorithms are simple, fast, and might be extendable to other relevant metrics. We also give a novel meta-algorithm for the general rank aggregation problem under the fairness framework. Surprisingly, this meta-algorithm works for any generalized mean objective (including center and median problems) and any fairness criteria. As a byproduct, we obtain 3-approximation algorithms for both center and median problems, under both Kendall tau and Ulam metrics. Furthermore, using sophisticated techniques we obtain a (3ε)(3-\varepsilon)-approximation algorithm, for a constant ε>0\varepsilon>0, for the Ulam metric under strong fairness.Comment: A preliminary version of this paper appeared in NeurIPS 202

    Preference relations based unsupervised rank aggregation for metasearch

    Get PDF
    Rank aggregation mechanisms have been used in solving problems from various domains such as bioinformatics, natural language processing, information retrieval, etc. Metasearch is one such application where a user gives a query to the metasearch engine, and the metasearch engine forwards the query to multiple individual search engines. Results or rankings returned by these individual search engines are combined using rank aggregation algorithms to produce the final result to be displayed to the user. We identify few aspects that should be kept in mind for designing any rank aggregation algorithms for metasearch. For example, generally equal importance is given to the input rankings while performing the aggregation. However, depending on the indexed set of web pages, features considered for ranking, ranking functions used etc. by the individual search engines, the individual rankings may be of different qualities. So, the aggregation algorithm should give more weight to the better rankings while giving less weight to others. Also, since the aggregation is performed when the user is waiting for response, the operations performed in the algorithm need to be light weight. Moreover, getting supervised data for rank aggregation problem is often difficult. In this paper, we present an unsupervised rank aggregation algorithm that is suitable for metasearch and addresses the aspects mentioned above. We also perform detailed experimental evaluation of the proposed algorithm on four different benchmark datasets having ground truth information. Apart from the unsupervised Kendall-Tau distance measure, several supervised evaluation measures are used for performance comparison. Experimental results demonstrate the efficacy of the proposed algorithm over baseline methods in terms of supervised evaluation metrics. Through these experiments we also show that Kendall-Tau distance metric may not be suitable for evaluating rank aggregation algorithms for metasearch

    Rank Aggregation for Non-stationary Data Streams

    Get PDF
    The problem of learning over non-stationary ranking streams arises naturally, particularly in recommender systems. The rankings represent the preferences of a population, and the non-stationarity means that the distribution of preferences changes over time. We propose an algorithm that learns the current distribution of ranking in an online manner. The bottleneck of this process is a rank aggregation problem. We propose a generalization of the Borda algorithm for non-stationary ranking streams. As a main result, we bound the minimum number of samples required to output the ground truth with high probability. Besides, we show how the optimal parameters are set. Then, we generalize the whole family of weighted voting rules (the family to which Borda belongs) to situations in which some rankings are more reliable than others. We show that, under mild assumptions, this generalization can solve the problem of rank aggregation over non-stationary data streams.This work is partially funded by the Industrial Chair “Data science & Artificial Intelligence for Digitalized Industry & Services” from Telecom Paris (France), the Basque Government through the BERC 2018–2021 and the Elkartek program (KK-2018/00096, KK-2020/00049), and by the Spanish Government excellence accreditation Severo Ochoa SEV-2013-0323 (MICIU) and the project TIN2017-82626-R (MINECO). J. Del Ser also acknowledges funding support from the Basque Government (Consolidated Research Gr. MATHMODE, IT1294-19)
    corecore