Lexical Query Modeling in Session Search
Lexical query modeling has been the leading paradigm for session search. In
this paper, we analyze TREC session query logs and compare the performance of
different lexical matching approaches for session search. Naive methods based
on term frequency weighting perform on par with specialized session models. In
addition, we investigate the viability of lexical query models in the setting
of session search. We give important insights into the potential and
limitations of lexical query modeling for session search and propose future
directions for the field of session search.
Comment: ICTIR2016, Proceedings of the 2nd ACM International Conference on the
Theory of Information Retrieval. 201
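The naive term-frequency weighting mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `session_query_model` and `score`, the whitespace tokenization, and the dot-product scoring are all assumptions made for illustration.

```python
from collections import Counter


def session_query_model(queries):
    """Weight each term by its relative frequency over all queries in a session.

    Hypothetical helper: tokenization is plain lowercased whitespace splitting.
    """
    counts = Counter(term for q in queries for term in q.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def score(document, model):
    """Score a document as the weighted sum of its matching term counts."""
    doc_counts = Counter(document.lower().split())
    return sum(weight * doc_counts[term] for term, weight in model.items())


# Terms repeated across the session's queries receive a higher weight.
model = session_query_model(["cheap flights", "cheap flights paris"])
ranked = sorted(
    ["paris flights paris", "hotel deals"],
    key=lambda doc: score(doc, model),
    reverse=True,
)
```

In this sketch a term that recurs across reformulations dominates the model, which is the intuition behind treating the whole session, rather than the last query, as the information need.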
A user-centred approach to information retrieval
A user model is a fundamental component in user-centred information retrieval systems. It enables personalization of a user's search experience. The development of such a model involves three phases: collecting information about each user, representing such information, and integrating the model into a retrieval application. Progress in this area is typically met with privacy and scalability challenges that hinder the ability to synthesize collective knowledge from each user's search behaviour. In this thesis, I propose a framework that addresses each of these three phases. The proposed framework is based on social role theory from the social science literature and at the centre of this theory is the concept of a social position. A social position is a label for a group of users with similar behavioural patterns. Examples of such positions are traveller, patient, movie fan, and computer scientist. In this thesis, a social position acts as a label for users who are expected to have similar interests. The proposed framework does not require real users' data; rather it uses the web as a resource to model users.
The proposed framework offers a data-driven and modular design for each of the three phases of building a user model. First, I present an approach to identify social positions from natural language sentences. I formulate this task as a binary classification task and develop a method to enumerate candidate social positions. The proposed classifier achieves an accuracy score of 85.8%, which indicates that social positions can be identified with good accuracy. Through an inter-annotator agreement study, I further show a reasonable level of agreement between users when identifying social positions.
Second, I introduce a novel topic modelling-based approach to represent each social position as a multinomial distribution over words. This approach estimates a topic from a document collection for each position. To construct such a collection for a particular position, I propose a seeding algorithm that extracts a set of terms relevant to the social position. Coherence-based evaluation shows that the proposed approach learns significantly more coherent representations when compared with a relevance modelling baseline.
Third, I present a diversification approach based on the proposed framework. Diversification algorithms aim to return a result list for a search query that would potentially satisfy users with diverse information needs. I propose to identify social positions that are relevant to a search query. These positions act as an implicit representation of the many possible interpretations of the search query. Then, relevant positions are provided to a diversification technique that proportionally diversifies results based on each social position's importance. I evaluate my approach using four test collections provided by the diversity task of the Text REtrieval Conference (TREC) web tracks for 2009, 2010, 2011, and 2012. Results demonstrate that my proposed diversification approach is effective and provides statistically significant improvements over various implicit diversification approaches.
Fourth, I introduce a session-based search system under the framework of learning to rank. Such a system aims to improve the retrieval performance for a search query using previous user interactions during the search session. I present a method to match a search session to its most relevant social positions based on the session's interaction data. I then suggest identifying related sessions from query logs that are likely to be issued by users with similar information needs. Novel learning features are then estimated from the session's social positions, related sessions, and interaction data. I evaluate the proposed system using four test collections from the TREC session track. This approach achieves state-of-the-art results compared with effective session-based search systems. I demonstrate that this strong performance is mainly attributed to features that are derived from social positions' data.
Personalized information retrieval based on time-sensitive user profile
Recently, search engines have become the main source of information for many users and have been widely used in different fields. However, Information Retrieval Systems (IRS) face new challenges due to the growth and diversity of available data. An IRS analyses the query submitted by the user and explores collections of unstructured or semi-structured data (e.g., text, images, video, Web pages) in order to deliver items that best match the user's intent and interests.
To achieve this goal, systems have moved from considering query-document matching alone to also considering the user context. The user profile has been regarded in the literature as the most important contextual element for improving search accuracy. It is integrated into the information retrieval process to improve the user experience while searching for specific information. As the time factor has gained importance in recent years, temporal dynamics have been introduced to study the evolution of the user profile, which consists mainly of capturing changes in the user's behavior, interests and preferences, and updating the profile accordingly. Prior work distinguishes short-term from long-term profiles. The first type is limited to interests related to the user's current activities, while the second represents the user's persistent interests extracted from prior activities, excluding the current ones. However, for users who are not very active, the short-term profile can eliminate relevant results that are more closely related to their personal interests, because their activities are few and spread out over time. For very active users, aggregating recent activities without ignoring older interests is valuable because this kind of profile usually changes over time. Unlike those approaches, we propose, in this thesis, a generic time-sensitive user profile that is implicitly constructed as a vector of weighted terms in order to find a trade-off by unifying both current and recurrent interests. User profile information can be extracted from multiple sources. Among the most promising ones, we propose to use, on the one hand, search history. Search history data can be extracted implicitly without any effort from the user and includes issued queries, their corresponding results, reformulated queries and click-through data that has relevance feedback potential.
On the other hand, the popularity of social media makes it an invaluable source of data, as users rely on it to express, share and mark as favorite the content that interests them. First, we modeled a user profile according not only to the content of the user's activities but also to their freshness, under the assumption that terms used recently in the user's activities reflect new interests, preferences and thoughts and should be weighted more heavily than older interests. Indeed, many prior studies have shown that user interest decreases over time. To evaluate the time-sensitive user profile, we used a data set collected from Twitter, a social networking and microblogging service. We then applied our re-ranking process to a Web search system in order to adapt the originally retrieved results to the user's online interests. Second, we studied temporal dynamics within session search, where recently submitted queries contain additional information that better explains the user's intent and indicates that the user has not found the information sought through previously submitted queries. We integrated current and recurrent interactions within a single session model, giving more importance to terms that appear in recently submitted queries and clicked results. We conducted experiments using the 2013 TREC Session track and the ClueWeb12 collection, which showed the effectiveness of our approach compared with state-of-the-art ones. Overall, across these contributions and experiments, we show that our time-sensitive user profile ensures better personalization performance and helps to analyze user behavior in both session search and social media contexts.
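The idea of a profile built as a vector of weighted terms, where recent activity counts for more than older activity, can be sketched as follows. The abstract does not state which freshness function the thesis uses, so the exponential decay, the `half_life_days` parameter, and the function name `time_sensitive_profile` are assumptions chosen only for illustration.

```python
import math
from collections import defaultdict


def time_sensitive_profile(activities, now, half_life_days=30.0):
    """Build a term-weight vector in which older activities contribute less.

    activities: iterable of (timestamp_in_days, text) pairs (hypothetical format).
    Assumption: exponential decay with a configurable half-life.
    """
    decay_rate = math.log(2) / half_life_days
    profile = defaultdict(float)
    for timestamp, text in activities:
        freshness = math.exp(-decay_rate * (now - timestamp))
        for term in text.lower().split():
            profile[term] += freshness
    return dict(profile)


# An activity one half-life old contributes half as much as one from today.
profile = time_sensitive_profile([(0, "jazz"), (30, "python")], now=30)
```

Because old terms are down-weighted rather than discarded, a single vector of this kind can hold both recent and persistent interests, which is the trade-off the abstract describes.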
Towards Collaborative Session-based Semantic Search
In recent years, the most popular web search engines have excelled in their ability to answer short queries that require clear, localized and personalized answers. When it comes to complex exploratory search tasks however, the main challenge for the searcher remains the same as back in the 1990s: Trying to formulate a single query that contains all the right keywords to produce at least some relevant results.
In this work we want to investigate new ways to facilitate exploratory search by making use of context information from the user's entire search process. Therefore we present the concept of session-based semantic search, with an optional extension to collaborative search scenarios. To improve the relevance of search results we expand queries with terms from the user's recent query history in the same search context (session-based search). We introduce a novel method for query classification based on statistical topic models which allows us to track the most important topics in a search session so that we can suggest relevant documents that could not be found through keyword matching.
To demonstrate the potential of these concepts, we have built the prototype of a session-based semantic search engine which we release as free and open source software. In a qualitative user study that we have conducted, this prototype has shown promising results and was well-received by the participants.
1. Introduction
2. Related Work
2.1. Topic Models
2.1.1. Common Traits
2.1.2. Topic Modeling Techniques
2.1.3. Topic Labeling
2.1.4. Topic Graph Visualization
2.2. Session-based Search
2.3. Query Classification
2.4. Collaborative Search
2.4.1. Aspects of Collaborative Search Systems
2.4.2. Collaborative Information Retrieval Systems
3. Core Concepts
3.1. Session-based Search
3.1.1. Session Data
3.1.2. Query Aggregation
3.2. Topic Centroid
3.2.1. Topic Identification
3.2.2. Topic Shift
3.2.3. Relevance Feedback
3.2.4. Topic Graph Visualization
3.3. Search Strategy
3.3.1. Prerequisites
3.3.2. Search Algorithms
3.3.3. Query Pipeline
3.4. Collaborative Search
3.4.1. Shared Topic Centroid
3.4.2. Group Management
3.4.3. Collaboration
3.5. Discussion
4. Prototype
4.1. Document Collection
4.1.1. Selection Criteria
4.1.2. Data Preparation
4.1.3. Search Index
4.2. Search Engine
4.2.1. Search Algorithms
4.2.2. Query Pipeline
4.2.3. Session Persistence
4.3. User Interface
4.4. Performance Review
4.5. Discussion
5. User Study
5.1. Methods
5.1.1. Procedure
5.1.2. Implementation
5.1.3. Tasks
5.1.4. Questionnaires
5.2. Results
5.2.1. Participants
5.2.2. Task Review
5.2.3. Literature Research Results
5.3. Discussion
6. Conclusion
Bibliography
Weblinks
A. Appendix
A.1. Prototype: Source Code
A.2. Survey
A.2.1. Tasks
A.2.2. Document Filter for Google Scholar
A.2.3. Questionnaires
A.2.4. Participant's Answers
A.2.5. Participant's Search Results
Pretrained Transformers for Text Ranking: BERT and Beyond
The goal of text ranking is to generate an ordered list of texts retrieved
from a corpus in response to a query. Although the most common formulation of
text ranking is search, instances of the task can also be found in many natural
language processing applications. This survey provides an overview of text
ranking with neural network architectures known as transformers, of which BERT
is the best-known example. The combination of transformers and self-supervised
pretraining has been responsible for a paradigm shift in natural language
processing (NLP), information retrieval (IR), and beyond. In this survey, we
provide a synthesis of existing work as a single point of entry for
practitioners who wish to gain a better understanding of how to apply
transformers to text ranking problems and researchers who wish to pursue work
in this area. We cover a wide range of modern techniques, grouped into two
high-level categories: transformer models that perform reranking in multi-stage
architectures and dense retrieval techniques that perform ranking directly.
There are two themes that pervade our survey: techniques for handling long
documents, beyond typical sentence-by-sentence processing in NLP, and
techniques for addressing the tradeoff between effectiveness (i.e., result
quality) and efficiency (e.g., query latency, model and index size). Although
transformer architectures and pretraining techniques are recent innovations,
many aspects of how they are applied to text ranking are relatively well
understood and represent mature techniques. However, there remain many open
research questions, and thus in addition to laying out the foundations of
pretrained transformers for text ranking, this survey also attempts to
prognosticate where the field is heading.
Geographic information extraction from texts
A large volume of unstructured texts, containing valuable geographic information, is available online. This information, provided implicitly or explicitly, is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although substantial progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data, to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss recent advances, new ideas, and concepts, and also to identify research gaps in geographic information extraction.