4,870 research outputs found
Efficient Diversification of Web Search Results
In this paper we analyze the efficiency of various search results
diversification methods. While efficacy of diversification approaches has been
deeply investigated in the past, response time and scalability issues have been
rarely addressed. A unified framework for studying performance and feasibility
of result diversification solutions is thus proposed. First we define a new
methodology for detecting when, and how, query results need to be diversified.
To this purpose, we rely on the concept of "query refinement" to estimate the
probability of a query to be ambiguous. Then, relying on this novel ambiguity
detection method, we deploy and compare on a standard test set, three different
diversification methods: IASelect, xQuAD, and OptSelect. While the first two
are recent state-of-the-art proposals, the latter is an original algorithm
introduced in this paper. We evaluate both the efficiency and the effectiveness
of our approach against its competitors by using the standard TREC Web
diversification track testbed. Results shown that OptSelect is able to run two
orders of magnitude faster than the two other state-of-the-art approaches and
to obtain comparable figures in diversification effectiveness.Comment: VLDB201
Sparse spatial selection for novelty-based search result diversification
Abstract. Novelty-based diversification approaches aim to produce a diverse ranking by directly comparing the retrieved documents. However, since such approaches are typically greedy, they require O(n 2) documentdocument comparisons in order to diversify a ranking of n documents. In this work, we propose to model novelty-based diversification as a similarity search in a sparse metric space. In particular, we exploit the triangle inequality property of metric spaces in order to drastically reduce the number of required document-document comparisons. Thorough experiments using three TREC test collections show that our approach is at least as effective as existing novelty-based diversification approaches, while improving their efficiency by an order of magnitude.
Intent-aware search result diversification
Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved documents with respect to the different aspects. However, such aspects themselves may represent information needs with rather distinct intents (e.g., informational or navigational). Hence, a diverse ranking could benefit from applying intent-aware retrieval models when estimating the relevance of documents to different aspects. In this paper, we propose to diversify the results retrieved for a given query, by learning the appropriateness of different retrieval models for each of the aspects underlying this query. Thorough experiments within the evaluation framework provided by the diversity task of the TREC 2009 and 2010 Web tracks show that the proposed approach can significantly improve state-of-the-art diversification approaches
Measuring Online Social Bubbles
Social media have quickly become a prevalent channel to access information,
spread ideas, and influence opinions. However, it has been suggested that
social and algorithmic filtering may cause exposure to less diverse points of
view, and even foster polarization and misinformation. Here we explore and
validate this hypothesis quantitatively for the first time, at the collective
and individual levels, by mining three massive datasets of web traffic, search
logs, and Twitter posts. Our analysis shows that collectively, people access
information from a significantly narrower spectrum of sources through social
media and email, compared to search. The significance of this finding for
individual exposure is revealed by investigating the relationship between the
diversity of information sources experienced by users at the collective and
individual level. There is a strong correlation between collective and
individual diversity, supporting the notion that when we use social media we
find ourselves inside "social bubbles". Our results could lead to a deeper
understanding of how technology biases our exposure to new information
On the Additivity and Weak Baselines for Search Result Diversification Research
A recent study on the topic of additivity addresses the task of search result diversification and concludes that while weaker baselines are almost always significantly improved by the evaluated diversification methods, for stronger baselines, just the opposite happens, i.e., no significant improvement can be observed. Due to the importance of the issue in shaping future research directions and evaluation strategies in search results diversification, in this work, we first aim to reproduce the findings reported in the previous study, and then investigate its possible limitations. Our extensive experiments first reveal that under the same experimental setting with that previous study, we can reach similar results. Next, we hypothesize that for stronger baselines, tuning the parameters of some methods (i.e., the trade-off parameter between the relevance and diversity of the results in this particular scenario) should be done in a more fine-grained manner. With trade-off parameters that are specifically determined for each baseline run, we show that the percentage of significant improvements even over the strong baselines can be doubled. As a further issue, we discuss the possible impact of using the same strong baseline retrieval function for the diversity computations of the methods. Our takeaway message is that in the case of a strong baseline, it is more crucial to tune the parameters of the diversification methods to be evaluated; but once this is done, additivity is achievable
A Query Performance Analysis for Result Diversification
Which queries stand to gain or loose from diversifying their results? Some queries are more difficult than others for diversification. Across a number of conceptually different diversification methods, performance on such queries tends to deteriorate after applying these diversification methods, even though their initial performance in terms of relevance or diversity tends to be good
- ā¦