Combining interaction and content for feedback-based ranking
The paper is concerned with the design and evaluation of combining user-interaction features and informative-content features for implicit and pseudo feedback-based document re-ranking. The features are observed while users visit the top-ranked documents returned in response to a query. Experiments were carried out on a TREC Web test collection, and the results are reported. We find that the effectiveness of combining user-interaction features for implicit feedback depends on whether document re-ranking is performed on a single-user or a user-group basis. Moreover, re-ranking documents on a user-group basis can improve pseudo-relevance feedback by providing more effective documents for query expansion.
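The abstract does not say how expansion terms are drawn from the feedback documents. As a minimal sketch, assuming plain frequency-based selection (a generic stand-in, not the paper's method), the feedback documents here can be either the top-ranked results (pseudo feedback) or the documents a user group actually visited; the function name and parameters are hypothetical.

```python
from collections import Counter


def expand_query(query_terms, feedback_docs, k=3, stopwords=frozenset()):
    """Append the k most frequent new terms from the feedback documents.

    feedback_docs is a list of token lists, e.g. the top-ranked documents
    (pseudo feedback) or the documents visited by a user group.
    """
    counts = Counter()
    for doc in feedback_docs:
        # Skip stopwords and terms already in the query.
        counts.update(t for t in doc if t not in stopwords and t not in query_terms)
    expansion = [term for term, _ in counts.most_common(k)]
    return list(query_terms) + expansion


print(expand_query(["jaguar"], [["jaguar", "car", "speed"], ["car", "engine"], ["car", "speed"]], k=2))
```

Swapping the user-group visited documents in for the top-ranked ones changes only the `feedback_docs` argument, which is where the abstract locates the improvement.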
A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine
We propose a new model to interpret the clickthrough logs of a web search engine. This model is based on explicit assumptions about user behavior. In particular, we draw conclusions about a document's relevance by observing the user's behavior after they examined the document, rather than from whether or not the user clicked the document URL. This results in a model based on intrinsic relevance, as opposed to perceived relevance. We use the model to predict document relevance and then use this prediction as a feature for a “Learning to Rank” machine learning algorithm. Comparing the ranking functions obtained by training the algorithm with and without the new feature, we observe surprisingly good results. This is particularly notable given that the baseline we use is the heavily optimized ranking function of a leading commercial search engine. A deeper analysis shows that the new feature is particularly helpful for non-navigational queries and for queries with a high abandonment rate or a large average number of queries per session. This is important because these types of queries are considered the most difficult to solve.
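The abstract does not give the model's equations. As a rough illustration of scoring a document from what happens after the click rather than from the click itself, the sketch below swaps in the common "last click" heuristic: a click after which the user does not return to the result list is treated as a sign of satisfaction. This is not the paper's model; the session format, the function name, and the smoothing prior are all assumptions.

```python
from collections import defaultdict


def intrinsic_relevance(sessions, prior=0.5, strength=2.0):
    """Estimate P(satisfied | clicked) for each (query, url) pair.

    sessions: iterable of (query, [clicked urls in click order]).
    Assumption: a click counts as 'satisfied' if it is the last click of
    the session, i.e. the user did not come back to the results afterwards.
    The estimate is smoothed toward `prior` with pseudo-count `strength`.
    """
    clicks = defaultdict(int)
    satisfied = defaultdict(int)
    for query, clicked in sessions:
        for i, url in enumerate(clicked):
            clicks[(query, url)] += 1
            if i == len(clicked) - 1:
                satisfied[(query, url)] += 1
    return {
        key: (satisfied[key] + prior * strength) / (n + strength)
        for key, n in clicks.items()
    }


est = intrinsic_relevance([("q", ["a", "b"]), ("q", ["b"]), ("q", ["a"])])
```

Each per-(query, URL) estimate could then be appended as one extra column of a learning-to-rank feature vector, which mirrors the with/without-feature comparison the abstract describes.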
Sensitive and Scalable Online Evaluation with Theoretical Guarantees
Multileaved comparison methods generalize interleaved comparison methods to
provide a scalable approach for comparing ranking systems based on regular user
interactions. Such methods enable the increasingly rapid research and
development of search engines. However, existing multileaved comparison methods
that provide reliable outcomes do so by degrading the user experience during
evaluation. Conversely, current multileaved comparison methods that maintain
the user experience cannot guarantee correctness. Our contribution is two-fold.
First, we propose a theoretical framework for systematically comparing
multileaved comparison methods using the notions of considerateness, which
concerns maintaining the user experience, and fidelity, which concerns reliable,
correct outcomes. Second, we introduce a novel multileaved comparison method,
Pairwise Preference Multileaving (PPM), that performs comparisons based on
document-pair preferences, and prove that it is considerate and has fidelity.
We show empirically that, compared to previous multileaved comparison methods,
PPM is more sensitive to user preferences and scalable with the number of
rankers being compared.
Comment: CIKM 2017, Proceedings of the 2017 ACM on Conference on Information
and Knowledge Management
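PPM compares rankers using document-pair preferences. As a much simpler illustration of that general idea (not PPM itself, and without its considerateness or fidelity guarantees), the sketch infers pairwise preferences from clicks on the displayed multileaved list with the classic "clicked document beats the skipped documents above it" heuristic, then credits each ranker by how many inferred preferences its own ranking agrees with. The function names and the click rule are assumptions.

```python
def infer_preferences(result_list, clicked):
    """Infer (preferred, less_preferred) document pairs from clicks.

    Heuristic: a clicked document is preferred over every unclicked
    document displayed above it.
    """
    prefs = []
    for i, doc in enumerate(result_list):
        if doc in clicked:
            prefs.extend((doc, above) for above in result_list[:i] if above not in clicked)
    return prefs


def credit_rankers(rankers, prefs):
    """Count how many inferred preferences each ranker's ordering agrees with."""
    scores = {}
    for name, ranking in rankers.items():
        pos = {d: r for r, d in enumerate(ranking)}
        scores[name] = sum(
            1 for better, worse in prefs
            if better in pos and worse in pos and pos[better] < pos[worse]
        )
    return scores


prefs = infer_preferences(["d1", "d2", "d3", "d4"], {"d3"})
print(credit_rankers({"A": ["d3", "d1", "d2", "d4"], "B": ["d1", "d2", "d3", "d4"]}, prefs))
```

Because every ranker is scored against the same set of pairs inferred from one displayed list, the comparison scales to many rankers at once, which is the property multileaving is meant to provide.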
SemRank: ranking refinement strategy by using the semantic intensity
The ubiquity of multimedia has created a need for systems that can store, manage, and structure multimedia data so that it can be retrieved intelligently. One of the current issues in media management and data mining research is the ranking of retrieved documents. Ranking is one of the most challenging problems for information retrieval systems. A user query may return millions of relevant results, but if the ranking function cannot order them by relevance, those results are of little use. However, current ranking techniques operate at the level of keyword matching, and results are usually ranked by term frequency. This paper is concerned with ranking documents based solely on the rich semantics they contain rather than on their raw content. Our proposed ranking refinement strategy, SemRank, ranks documents according to their semantic intensity. Our approach has been applied to the open LabelMe benchmark dataset and compared against a well-known ranking model, the Vector Space Model (VSM). The experimental results show that our approach achieves a significant improvement in retrieval performance over state-of-the-art ranking methods.
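SemRank is compared against the Vector Space Model, the keyword-matching style of ranking the abstract criticizes. For reference, here is a minimal sketch of such a baseline, assuming each document (for example, an image's annotation labels) is already a list of tokens and using raw term-frequency vectors with cosine similarity; practical VSM setups usually add IDF weighting. SemRank's own semantic-intensity scoring is not shown, since the abstract does not define it, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_tf(query_tokens, doc_tokens):
    """Cosine similarity between raw term-frequency vectors (a basic VSM)."""
    q, d = Counter(query_tokens), Counter(doc_tokens)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def rank_vsm(query_tokens, docs):
    """Return document ids sorted by descending cosine score."""
    return sorted(docs, key=lambda doc_id: cosine_tf(query_tokens, docs[doc_id]), reverse=True)


print(rank_vsm(["car"], {"x": ["car", "road"], "y": ["tree", "sky"], "z": ["car"]}))
```

A document that shares no surface terms with the query scores zero here regardless of its meaning, which is the limitation a semantics-based ranker aims to address.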
Multi-Task Learning for Email Search Ranking with Auxiliary Query Clustering
User information needs vary significantly across different tasks, and
therefore their queries will also differ considerably in their expressiveness
and semantics. Many studies have been proposed to model such query diversity by
obtaining query types and building query-dependent ranking models. These
studies typically require either a labeled query dataset or clicks from
multiple users aggregated over the same document. These techniques, however,
are not applicable when manual query labeling is not viable, and aggregated
clicks are unavailable due to the private nature of the document collection,
e.g., in email search scenarios. In this paper, we study how to obtain query
type in an unsupervised fashion and how to incorporate this information into
query-dependent ranking models. We first develop a hierarchical clustering
algorithm based on truncated SVD and varimax rotation to obtain coarse-to-fine
query types. Then, we study three query-dependent ranking models, including two
neural models that leverage query type information as additional features, and
one novel multi-task neural model that views query type as the label for the
auxiliary query cluster prediction task. This multi-task model is trained to
simultaneously rank documents and predict query types. Our experiments on tens
of millions of real-world email search queries demonstrate that the proposed
multi-task model can significantly outperform the baseline neural ranking
models, which either do not incorporate query type information or simply
feed query type as an additional feature.
Comment: CIKM 201
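The abstract names truncated SVD and varimax rotation but not the input features or how cluster labels are read off. A minimal NumPy sketch, assuming queries are the rows of a query-feature matrix `X` (e.g. term counts) and that each query is assigned to the rotated factor with the largest absolute loading. The varimax routine is the standard iterative SVD-based algorithm. Only one level is shown; re-running `query_types` inside each cluster with a larger `k` is one hypothetical way to obtain coarse-to-fine types. Function names are illustrative.

```python
import numpy as np


def varimax(loadings, gamma=1.0, max_iter=50, tol=1e-6):
    """Standard iterative varimax rotation of an (n_items x k) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    prev = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix from the varimax criterion's gradient.
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix
        if prev and s.sum() < prev * (1 + tol):
            break
        prev = s.sum()
    return loadings @ rotation


def query_types(X, k):
    """Assign each query (row of X) to one of k types via truncated SVD + varimax."""
    u, s, _ = np.linalg.svd(X, full_matrices=False)
    loadings = u[:, :k] * s[:k]  # rank-k representation of each query
    rotated = varimax(loadings)
    return np.argmax(np.abs(rotated), axis=1)


X = np.array([[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
print(query_types(X, 2))
```

Varimax is an orthogonal rotation, so it leaves each query's embedding norm unchanged while pushing every query toward a single dominant factor, which is what makes the argmax a sensible discrete type label.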
Preference-Based Term Weighting for Arabic Fiqh Document Ranking
In document retrieval, besides how well the search results match the query, there is also a subjective user assessment that is expected to be a deciding factor in document ranking. This preference aspect arises in fiqh document search: people tend to prefer a certain fiqh methodology without rejecting the others. It is therefore necessary to consider a preference factor in addition to the relevance factor in document ranking. This research proposes a preference-based term weighting method that ranks documents according to user preference. The proposed method is combined with term weighting based on the document index and the book index, so that it accounts for both relevance and preference. The proposed method is Inverse Preference Frequency with an α value (IPFα). In this method, we calculate preference values using IPF term weighting; the preference values of terms that match the query are then multiplied by α. Combining IPFα with the existing weighting methods yields TF.IDF.IBF.IPFα. The method is evaluated on a dataset of several Arabic fiqh documents using recall, precision, and F-measure. The proposed term weighting ranks documents in the correct order according to user preference, achieving a recall of 75%, a precision of 100%, and an F-measure of 85.7%.
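The abstract names the TF.IDF.IBF.IPFα product without giving formulas. A minimal sketch, assuming each inverse-frequency factor has the usual `log(total / containing)` form (documents for IDF, books for IBF, preference groups for IPF) and that α scales only the IPF factor of query-matching terms, as the abstract describes; every parameter name here is hypothetical. The F-measure helper checks the reported figures: with precision 1.0 and recall 0.75, 2PR/(P+R) = 1.5/1.75 ≈ 0.857.

```python
import math


def inv_freq(total, containing):
    """Assumed inverse-frequency form: log(total / containing)."""
    return math.log(total / containing)


def term_weight(tf, n_docs, df, n_books, bf, n_prefs, pf, alpha, matches_query):
    """Hypothetical TF.IDF.IBF.IPF-alpha weight for one term in one document.

    The IPF factor is multiplied by alpha when the term matches the query.
    """
    ipf = inv_freq(n_prefs, pf)
    if matches_query:
        ipf *= alpha
    return tf * inv_freq(n_docs, df) * inv_freq(n_books, bf) * ipf


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(1.0, 0.75), 3))
```

Under these assumptions, a term that occurs in every preference group gets an IPF of zero, the same edge case plain IDF has for terms found in every document.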
Pretrained Language Model based Web Search Ranking: From Relevance to Satisfaction
Search engines play a crucial role in satisfying users' diverse information
needs. Recently, text ranking models based on Pretrained Language Models (PLMs)
have achieved huge success in web search. However, many state-of-the-art text
ranking approaches focus only on core relevance while ignoring other dimensions
that contribute to user satisfaction, e.g., document quality, recency,
authority, etc. In this work, we focus on ranking user satisfaction rather than
relevance in web search, and propose a PLM-based framework, namely SAT-Ranker,
which comprehensively models different dimensions of user satisfaction in a
unified manner. In particular, we leverage the capacities of PLMs on both
textual and numerical inputs, and apply a multi-field input that modularizes
each dimension of user satisfaction as an input field. Overall, SAT-Ranker is
an effective, extensible, and data-centric framework that has huge potential
for industrial applications. In rigorous offline and online experiments,
SAT-Ranker obtains remarkable gains on various evaluation sets targeting
different dimensions of user satisfaction. It is now fully deployed online to
improve the usability of our search engine.
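SAT-Ranker is described as feeding both textual and numerical signals to a PLM through a multi-field input, with one field per satisfaction dimension. A minimal sketch of what such a serialization might look like, assuming numbers are rendered as short text and fields are joined with a separator token. The field names, their order, the `[SEP]` separator, and the number formatting are all hypothetical, and tokenization and truncation are omitted.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    body: str
    quality: float    # hypothetical document-quality score in [0, 1]
    age_days: int     # hypothetical recency signal
    authority: float  # hypothetical site-authority score in [0, 1]


def build_multifield_input(query: str, doc: Candidate, sep: str = " [SEP] ") -> str:
    """Serialize each satisfaction dimension as its own named field.

    Numeric signals are written as short text so a PLM tokenizer can read
    them; adding a new dimension means adding one more field.
    """
    fields = [
        f"query: {query}",
        f"title: {doc.title}",
        f"body: {doc.body}",
        f"quality: {doc.quality:.2f}",
        f"recency: {doc.age_days} days",
        f"authority: {doc.authority:.2f}",
    ]
    return sep.join(fields)


print(build_multifield_input("python tutorial", Candidate("Learn Python", "Basics of Python.", 0.9, 3, 0.75)))
```

Keeping each dimension in its own named field is one way to get the modularity the abstract claims: a new satisfaction signal becomes a new field rather than a change to the model architecture.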