719 research outputs found
Axiomatic analysis of smoothing methods in language models for pseudo-relevance feedback
Pseudo-Relevance Feedback (PRF) is an important general technique for improving retrieval effectiveness without requiring any user effort. Several state-of-the-art PRF models are based on the language modeling approach, where a query language model is learned from feedback documents. In all these models, feedback documents are represented with unigram language models smoothed with a collection language model. While collection language model-based smoothing has proven both effective and necessary in using language models for retrieval, we use axiomatic analysis to show that this smoothing scheme inherently causes the feedback model to favor frequent terms and thus violates the IDF constraint needed to ensure selection of discriminative feedback terms. To address this problem, we propose replacing collection language model-based smoothing in the feedback stage with additive smoothing, which is analytically shown to select more discriminative terms. Empirical evaluation further confirms that additive smoothing significantly outperforms collection-based smoothing methods in multiple language model-based PRF models.
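The contrast between the two smoothing schemes can be illustrated with a minimal sketch. It assumes Jelinek-Mercer interpolation as the collection-based scheme (the abstract does not name a specific one), and the function names, toy collection probabilities, and parameter values are hypothetical. It covers only the document-model smoothing step, not any full PRF model from the paper.

```python
from collections import Counter

def collection_smoothed(term, doc_counts, doc_len, coll_prob, lam=0.5):
    """Jelinek-Mercer smoothing (assumed here): mix the document MLE with p(w|C)."""
    return (1 - lam) * doc_counts.get(term, 0) / doc_len + lam * coll_prob[term]

def additive_smoothed(term, doc_counts, doc_len, vocab_size, gamma=1.0):
    """Additive smoothing: every vocabulary term gets the same pseudo-count gamma."""
    return (doc_counts.get(term, 0) + gamma) / (doc_len + gamma * vocab_size)

# Hypothetical collection model: "the" is very common, "axiomatic" is rare.
coll_prob = {"the": 0.6, "retrieval": 0.3, "axiomatic": 0.1}

# A toy feedback document in which both terms occur equally often.
doc_counts = Counter(["the", "the", "axiomatic", "axiomatic"])
doc_len = sum(doc_counts.values())

jm_common = collection_smoothed("the", doc_counts, doc_len, coll_prob)
jm_rare = collection_smoothed("axiomatic", doc_counts, doc_len, coll_prob)
add_common = additive_smoothed("the", doc_counts, doc_len, len(coll_prob))
add_rare = additive_smoothed("axiomatic", doc_counts, doc_len, len(coll_prob))

# With equal in-document counts, collection-based smoothing ranks the common
# term above the rare one, while additive smoothing treats them identically.
print(jm_common > jm_rare, add_common == add_rare)
```

In this toy setting the collection-smoothed model gives the common term extra probability mass purely because it is frequent in the collection, which is the bias the abstract describes; additive smoothing adds no such collection-dependent boost.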
Information retrieval models for recommender systems
Programa Oficial de Doutoramento en Computación. 5009V01
[Abstract]
Information retrieval addresses the information needs of users by delivering
relevant pieces of information but requires users to convey their
information needs explicitly. In contrast, recommender systems offer personalized
suggestions of items automatically. Ultimately, both fields help
users cope with information overload by providing them with relevant
items of information.
This thesis aims to explore the connections between information retrieval
and recommender systems. Our objective is to devise recommendation
models inspired by information retrieval techniques. We begin by
borrowing ideas from the information retrieval evaluation literature to analyze
evaluation metrics in recommender systems. Second, we study the
applicability of pseudo-relevance feedback models to different recommendation
tasks. We investigate the conventional top-N recommendation
task, but we also explore the recently formulated user-item group formation
problem and propose a novel task based on the liquidation of long-tail
items. Third, we exploit ad hoc retrieval models to compute neighborhoods
in a collaborative filtering scenario. Fourth, we explore the
opposite direction by adapting an effective recommendation framework
to pseudo-relevance feedback. Finally, we discuss the results and present
our conclusions.
In summary, this doctoral thesis adapts a series of information retrieval
models to recommender systems. Our investigation shows that many
retrieval models can be accommodated to deal with different recommendation
tasks. Moreover, we find that taking the opposite path is also
possible. Exhaustive experimentation confirms that the proposed models
are competitive. Finally, we also perform a theoretical analysis of some
models to explain their effectiveness.
[Resumen]
Information retrieval responds to users' information needs by providing
relevant information, but it requires users to state those needs
explicitly. Recommender systems, by contrast, offer personalized
suggestions of items automatically. Ultimately, both fields help users
cope with information overload by providing them with relevant
information.
The purpose of this thesis is to explore the connections between
information retrieval and recommender systems. Our objective is to design
recommendation models inspired by information retrieval techniques. We
begin by borrowing ideas from the information retrieval evaluation
literature to analyze evaluation metrics in recommender systems. Second,
we study the applicability of pseudo-relevance feedback models to
different recommendation tasks. We investigate the task of recommending
ranked lists of items, but we also explore the recently formulated
user-item group formation problem and propose a novel task based on the
liquidation of long-tail items. Third, we exploit ad hoc retrieval models
to compute neighborhoods in a collaborative filtering scenario. Fourth,
we explore the opposite direction by adapting an effective recommendation
method to pseudo-relevance feedback. Finally, we discuss the results and
present our conclusions.
In summary, this doctoral thesis adapts several information retrieval
models for use as recommender systems. Our research shows that many
information retrieval models can be applied to different recommendation
tasks. Moreover, we verify that taking the opposite path is also
possible. Exhaustive experimentation confirms that the proposed models
are competitive. Finally, we also carry out a theoretical analysis of
some models to explain their effectiveness.
[Resumo]
Information retrieval responds to users' information needs by providing
relevant information, but it requires users to state their information
needs explicitly. Recommender systems, by contrast, offer personalized
suggestions of items automatically. Ultimately, both fields help users
cope with information overload by providing them with relevant
information.
The purpose of this thesis is to explore the connections between
information retrieval and recommender systems. Our objective is to design
recommendation models inspired by information retrieval techniques. We
begin by borrowing ideas from the information retrieval evaluation
literature to analyze evaluation metrics in recommender systems. Second,
we study the applicability of pseudo-relevance feedback models to
different recommendation tasks. We investigate the task of recommending
ranked lists of items, but we also explore the recently formulated
problem of forming user-item groups and propose a new task based on the
liquidation of long-tail items. Third, we exploit ad hoc retrieval models
to compute neighborhoods in a collaborative filtering scenario. Fourth,
we explore the opposite direction by adapting an effective recommendation
method to pseudo-relevance feedback. Finally, we discuss the results and
present our conclusions.
In summary, this doctoral thesis adapts several information retrieval
models for use as recommender systems. Our research shows that many
information retrieval models can be applied to different recommendation
tasks. In addition, we verify that taking the opposite path is also
possible. Exhaustive experimentation confirms that the proposed models
are competitive. Finally, we also carry out a theoretical analysis of
some models to explain their effectiveness.
Mining document, concept, and term associations for effective biomedical retrieval - Introducing MeSH-enhanced retrieval models
Manually assigned subject terms, such as Medical Subject Headings (MeSH) in the health domain, describe the concepts or topics of a document. Existing information retrieval models do not take full advantage of such information. In this paper, we propose two MeSH-enhanced (ME) retrieval models that integrate the concept layer (i.e., MeSH) into the language modeling framework to improve retrieval performance. The new models quantify associations between documents and their assigned concepts to construct conceptual representations for the documents, and mine associations between concepts and terms to construct generative concept models. The two ME models reconstruct two essential estimation processes of the relevance model (Lavrenko and Croft 2001) by incorporating the document-concept and the concept-term associations. More specifically, in Model 1, language models of the pseudo-feedback documents are enriched by their assigned concepts. In Model 2, concepts that are related to users’ queries are first identified, and then used to reweight the pseudo-feedback documents according to the document-concept associations. Experiments carried out on two standard test collections show that the ME models outperformed the query likelihood model, the relevance model (RM3), and an earlier ME model. A detailed case analysis provides insight into how and why the new models improve or worsen retrieval performance. Implications and limitations of the study are discussed. This study provides new ways to formally incorporate semantic annotations, such as subject terms, into retrieval models. The findings of this study suggest that integrating the concept layer into retrieval models can further improve performance over the current state-of-the-art models.
Information Retrieval: Recent Advances and Beyond
In this paper, we provide a detailed overview of the models used for
information retrieval in the first and second stages of the typical processing
chain. We discuss the current state-of-the-art models, including term-based,
semantic retrieval, and neural methods. Additionally, we delve into the key
topics related to the learning process of these models. In this way, the survey
offers a comprehensive understanding of the field and is of interest to
researchers and practitioners entering or working in the information retrieval
domain.
Neural Models for Information Retrieval without Labeled Data
Recent developments of machine learning models, and in particular deep neural networks, have yielded significant improvements on several computer vision, natural language processing, and speech recognition tasks. Progress with information retrieval (IR) tasks has been slower, however, due to the lack of large-scale training data as well as neural network models specifically designed for effective information retrieval. In this dissertation, we address these two issues by introducing task-specific neural network architectures for a set of IR tasks and proposing novel unsupervised or weakly supervised solutions for training the models. The proposed learning solutions do not require labeled training data. Instead, in our weak supervision approach, neural models are trained on a large set of noisy and biased training data obtained from external resources, existing models, or heuristics.
We first introduce relevance-based embedding models that learn distributed representations for words and queries. We show that the learned representations can be effectively employed for a set of IR tasks, including query expansion, pseudo-relevance feedback, and query classification.
We further propose a standalone learning to rank model based on deep neural networks. Our model learns a sparse representation for queries and documents. This enables us to perform efficient retrieval by constructing an inverted index in the learned semantic space. Our model outperforms state-of-the-art retrieval models, while performing as efficiently as term matching retrieval models.
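The idea of retrieving through an inverted index built over sparse representations can be sketched as follows. This is only an illustration under stated assumptions: the sparse vectors are hand-written stand-ins for learned representations, the dimension ids and the `build_index` and `search` helpers are hypothetical, and dot-product scoring is assumed rather than taken from the dissertation.

```python
from collections import defaultdict

def build_index(doc_vectors):
    """Map each non-zero latent dimension to the documents that activate it."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for dim, weight in vec.items():
            if weight != 0.0:
                index[dim].append((doc_id, weight))
    return index

def search(index, query_vec, k=10):
    """Score by dot product, touching only postings of the query's active dimensions."""
    scores = defaultdict(float)
    for dim, q_weight in query_vec.items():
        for doc_id, d_weight in index.get(dim, []):
            scores[doc_id] += q_weight * d_weight
    # Sort by descending score, breaking ties by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]

# Hypothetical sparse representations: {latent dimension id: weight}.
docs = {
    "d1": {3: 0.5, 7: 1.0},
    "d2": {7: 0.25, 9: 2.0},
    "d3": {1: 1.0},
}
index = build_index(docs)
results = search(index, {7: 1.0, 9: 0.5})
print(results)
```

Because only the postings lists of the query's non-zero dimensions are visited, the cost depends on how sparse the representations are, which is the same property that makes term-matching retrieval efficient. Document `d3` shares no active dimension with the query and is never scored.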
We additionally propose a neural network framework for predicting the performance of a retrieval model for a given query. Inspired by existing query performance prediction models, our framework integrates several information sources, such as retrieval score distribution and term distribution in the top retrieved documents. This leads to state-of-the-art results for the performance prediction task on various standard collections.
Finally, we bridge the gap between retrieval and recommendation models, the two key components in most information systems. Search and recommendation often share the same goal: helping people get the information they need at the right time. Therefore, joint modeling and optimization of search engines and recommender systems could potentially benefit both systems. In more detail, we introduce a retrieval model that is trained using user-item interactions (i.e., recommendation data), with no need for query-document relevance information for training.
Our solutions and findings in this dissertation smooth the path towards learning efficient and effective models for various information retrieval and related tasks, especially when large-scale training data is not available.
Ranking for Web Data Search Using On-The-Fly Data Integration
Ranking - the algorithmic decision on how relevant an information artifact is for a given information need and the sorting of artifacts by their concluded relevancy - is an integral part of every search engine. In this book we investigate how structured Web data can be leveraged for ranking with the goal of improving the effectiveness of search. We propose new solutions for ranking using on-the-fly data integration and experimentally analyze and evaluate them against the latest baselines.