3,814 research outputs found
Recommended from our members
A user-centred approach to information retrieval
A user model is a fundamental component in user-centred information retrieval systems. It enables personalization of a user's search experience. The development of such a model involves three phases: collecting information about each user, representing such information, and integrating the model into a retrieval application. Progress in this area is typically met with privacy and scalability challenges that hinder the ability to synthesize collective knowledge from each user's search behaviour. In this thesis, I propose a framework that addresses each of these three phases. The proposed framework is based on social role theory from the social science literature and at the centre of this theory is the concept of a social position. A social position is a label for a group of users with similar behavioural patterns. Examples of such positions are traveller, patient, movie fan, and computer scientist. In this thesis, a social position acts as a label for users who are expected to have similar interests. The proposed framework does not require real users' data; rather it uses the web as a resource to model users.
The proposed framework offers a data-driven and modular design for each of the three phases of building a user model. First, I present an approach to identify social positions from natural language sentences. I formulate this task as a binary classification task and develop a method to enumerate candidate social positions. The proposed classifier achieves an accuracy score of 85.8%, which indicates that social positions can be identified with good accuracy. Through an inter-annotator agreement study, I further show a reasonable level of agreement between users when identifying social positions.
Second, I introduce a novel topic modelling-based approach to represent each social position as a multinomial distribution over words. This approach estimates a topic from a document collection for each position. To construct such a collection for a particular position, I propose a seeding algorithm that extracts a set of terms relevant to the social position. Coherence-based evaluation shows that the proposed approach learns significantly more coherent representations when compared with a relevance modelling baseline.
Third, I present a diversification approach based on the proposed framework. Diversification algorithms aim to return a result list for a search query that would potentially satisfy users with diverse information needs. I propose to identify social positions that are relevant to a search query. These positions act as an implicit representation of the many possible interpretations of the search query. Then, relevant positions are provided to a diversification technique that proportionally diversifies results based on each social position's importance. I evaluate my approach using four test collections provided by the diversity task of the Text REtrieval Conference (TREC) web tracks for 2009, 2010, 2011, and 2012. Results demonstrate that my proposed diversification approach is effective and provides statistically significant improvements over various implicit diversification approaches.
Fourth, I introduce a session-based search system under the framework of learning to rank. Such a system aims to improve the retrieval performance for a search query using previous user interactions during the search session. I present a method to match a search session to its most relevant social positions based on the session's interaction data. I then suggest identifying related sessions from query logs that are likely to be issued by users with similar information needs. Novel learning features are then estimated from the session's social positions, related sessions, and interaction data. I evaluate the proposed system using four test collections from the TREC session track. This approach achieves state-of-the-art results compared with effective session-based search systems. I demonstrate that such a strong performance is mainly attributed to features that are derived from social positions' data
Probabilistic Modeling in Dynamic Information Retrieval
Dynamic modeling is used to design systems that are adaptive to their changing environment and is currently poorly understood in information retrieval systems. Common elements in the information retrieval methodology, such as documents, relevance, users and tasks, are dynamic entities that may evolve over the course of several interactions, which is increasingly captured in search log datasets. Conventional frameworks and models in information retrieval treat these elements as static, or only consider local interactivity, without consideration for the optimisation of all potential interactions. Further to this, advances in information retrieval interface, contextual personalization and ad display demand models that can intelligently react to users over time. This thesis proposes a new area of information retrieval research called Dynamic Information Retrieval. The term dynamics is defined and what it means within the context of information retrieval. Three examples of current areas of research in information retrieval which can be described as dynamic are covered: multi-page search, online learning to rank and session search. A probabilistic model for dynamic information retrieval is introduced and analysed, and applied in practical algorithms throughout. This framework is based on the partially observable Markov decision process model, and solved using dynamic programming and the Bellman equation. Comparisons are made against well-established techniques that show improvements in ranking quality and in particular, document diversification. The limitations of this approach are explored and appropriate approximation techniques are investigated, resulting in the development of an efficient multi-armed bandit based ranking algorithm. Finally, the extraction of dynamic behaviour from search logs is also demonstrated as an application, showing that dynamic information retrieval modeling is an effective and versatile tool in state of the art information retrieval research
Query Generation as Result Aggregation for Knowledge Representation
Knowledge representations have greatly enhanced the fundamental human problem of information search, profoundly changing representations of queries and database information for various retrieval tasks. Despite new technologies, little thought has been given in the field of query recommendation – recommending keyword queries to end users – to a holistic approach that recommends constructed queries from relevant snippets of information; pre-existing queries are used instead. Can we instead determine relevant information a user should see and aggregate it into a query? We construct a general framework leveraging various retrieval architectures to aggregate relevant information into a natural language query for recommendation. We test this framework in text retrieval, aggregating text snippets and comparing output queries to user generated queries. We show that an algorithm can generate queries more closely resembling the original and give effective retrieval results. Our simple approach shows promise for also leveraging knowledge structures to generate effective query recommendations
Ranking for Relevance and Display Preferences in Complex Presentation Layouts
Learning to Rank has traditionally considered settings where given the
relevance information of objects, the desired order in which to rank the
objects is clear. However, with today's large variety of users and layouts this
is not always the case. In this paper, we consider so-called complex ranking
settings where it is not clear what should be displayed, that is, what the
relevant items are, and how they should be displayed, that is, where the most
relevant items should be placed. These ranking settings are complex as they
involve both traditional ranking and inferring the best display order. Existing
learning to rank methods cannot handle such complex ranking settings as they
assume that the display order is known beforehand. To address this gap we
introduce a novel Deep Reinforcement Learning method that is capable of
learning complex rankings, both the layout and the best ranking given the
layout, from weak reward signals. Our proposed method does so by selecting
documents and positions sequentially, hence it ranks both the documents and
positions, which is why we call it the Double-Rank Model (DRM). Our experiments
show that DRM outperforms all existing methods in complex ranking settings,
thus it leads to substantial ranking improvements in cases where the display
order is not known a priori
Interactive exploratory search for multi page search results
Modern information retrieval interfaces typically involve multiple pages of search results, and users who are recall minded or engaging in exploratory search using ad hoc queries are likely to access more than one page. Document rankings for such queries can be improved by allowing additional context to the query to be provided by the user herself using explicit ratings or implicit actions such as clickthroughs. Existing methods using this information usually involved detrimental UI changes that can lower user satisfaction. Instead, we propose a new feedback scheme that makes use of existing UIs and does not alter user's browsing behaviour; to maximise retrieval performance over multiple result pages, we propose a novel retrieval optimisation framework and show that the optimal ranking policy should choose a diverse, exploratory ranking to display on the first page. Then, a personalised re-ranking of the next pages can be generated based on the user's feedback from the first page. We show that document correlations used in result diversification have a significant impact on relevance feedback and its effectiveness over a search session. TREC evaluations demonstrate that our optimal rank strategy (including approximative Monte Carlo Sampling) can naturally optimise the trade-off between exploration and exploitation and maximise the overall user's satisfaction over time against a number of similar baselines. Copyright is held by the International World Wide Web Conference Committee (IW3C2)
- …