8,023 research outputs found

    Intent Models for Contextualising and Diversifying Query Suggestions

    Query suggestion and auto-completion mechanisms help users type less while interacting with a search engine. A basic approach that ranks suggestions according to their frequency in the query logs is suboptimal. Firstly, many candidate queries with the same prefix can be removed as redundant. Secondly, the suggestions can also be personalised based on the user's context. These two directions for improving suggestion quality can be in opposition: while the latter aims to promote suggestions that address search intents that a user is likely to have, the former aims to diversify the suggestions to cover as many intents as possible. We introduce a contextualisation framework that utilises a short-term context using the user's behaviour within the current search session, such as the previous query, the documents examined, and the candidate query suggestions that the user has discarded. This short-term context is used to contextualise and diversify the ranking of query suggestions, by modelling the user's information need as a mixture of intent-specific user models. The evaluation is performed offline on a set of approximately 1.0M test user sessions. Our results suggest that the proposed approach significantly improves query suggestions compared to the baseline approach. Comment: A short version of this paper was presented at CIKM 201
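    The mixture idea in this abstract can be illustrated with a small sketch. It is not the paper's model: the function name, the dictionary layout, and the single `evidence_likelihood` term standing in for session context are all assumptions made for illustration. Each suggestion is scored as the sum over intents of P(intent | context) times P(suggestion | intent).

```python
from typing import Dict, List


def rank_suggestions(
    intent_prior: Dict[str, float],
    suggestion_given_intent: Dict[str, Dict[str, float]],
    evidence_likelihood: Dict[str, float],
) -> List[str]:
    """Rank suggestions under a mixture of intent-specific models.

    Hypothetical inputs (not taken from the paper):
      intent_prior[i]               ~ P(i)
      suggestion_given_intent[i][s] ~ P(s | i)
      evidence_likelihood[i]        ~ P(session context | i), default 1.0
    """
    # Bayes update: P(i | context) is proportional to P(i) * P(context | i).
    unnormalised = {
        i: p * evidence_likelihood.get(i, 1.0) for i, p in intent_prior.items()
    }
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("context rules out every intent")
    posterior = {i: v / total for i, v in unnormalised.items()}

    # Mixture score: sum over intents of P(i | context) * P(s | i).
    scores: Dict[str, float] = {}
    for intent, p_intent in posterior.items():
        for suggestion, p_s in suggestion_given_intent.get(intent, {}).items():
            scores[suggestion] = scores.get(suggestion, 0.0) + p_intent * p_s

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))
```

    With equal priors over an "animal" and a "car" intent for the prefix "jaguar", context that favours the animal intent moves animal-related suggestions to the top, while a flat context leaves the ranking driven by the per-intent suggestion probabilities alone.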

    Efficient Diversification of Web Search Results

    In this paper we analyze the efficiency of various search results diversification methods. While the efficacy of diversification approaches has been deeply investigated in the past, response time and scalability issues have rarely been addressed. A unified framework for studying the performance and feasibility of result diversification solutions is thus proposed. First, we define a new methodology for detecting when, and how, query results need to be diversified. To this purpose, we rely on the concept of "query refinement" to estimate the probability that a query is ambiguous. Then, relying on this novel ambiguity detection method, we deploy and compare three different diversification methods on a standard test set: IASelect, xQuAD, and OptSelect. While the first two are recent state-of-the-art proposals, the latter is an original algorithm introduced in this paper. We evaluate both the efficiency and the effectiveness of our approach against its competitors by using the standard TREC Web diversification track testbed. Results show that OptSelect is able to run two orders of magnitude faster than the two other state-of-the-art approaches and to obtain comparable figures in diversification effectiveness. Comment: VLDB201
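    To make the kind of method being compared concrete, here is a minimal sketch of xQuAD-style greedy re-ranking, one of the two baselines named above. It is not the paper's OptSelect algorithm and not a tuned implementation. The function name, the dictionary layout, and the assumption that relevance and aspect-coverage probabilities are already available are all illustrative. At each step the document maximising (1 - lam) * P(d|q) + lam * sum_i P(q_i|q) * P(d|q_i) * prod_{d_j in S} (1 - P(d_j|q_i)) is added to the selected set S.

```python
from typing import Dict, List


def xquad_rerank(
    relevance: Dict[str, float],
    aspect_weight: Dict[str, float],
    coverage: Dict[str, Dict[str, float]],
    k: int,
    lam: float = 0.5,
) -> List[str]:
    """Greedy xQuAD-style selection of k documents.

    Hypothetical inputs:
      relevance[d]        ~ P(d | q)
      aspect_weight[a]    ~ P(q_a | q), importance of aspect a
      coverage[a][d]      ~ P(d | q_a), how well d covers aspect a
      lam                 ~ trade-off between relevance and diversity
    """
    selected: List[str] = []
    candidates = set(relevance)
    # not_covered[a] = product over selected d_j of (1 - P(d_j | q_a)).
    not_covered = {a: 1.0 for a in aspect_weight}

    def score(doc: str) -> float:
        diversity = sum(
            w * coverage.get(a, {}).get(doc, 0.0) * not_covered[a]
            for a, w in aspect_weight.items()
        )
        return (1.0 - lam) * relevance[doc] + lam * diversity

    while candidates and len(selected) < k:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
        # Aspects covered by the chosen document count for less next round.
        for a in aspect_weight:
            not_covered[a] *= 1.0 - coverage.get(a, {}).get(best, 0.0)
    return selected
```

    Each greedy step rescans all remaining candidates against all aspects, so selecting k documents from n candidates costs on the order of k * n * (number of aspects) operations in this sketch.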

    The impact of result diversification on search behaviour and performance

    Result diversification aims to provide searchers with a broader view of a given topic while attempting to maximise the chances of retrieving relevant material. Diversifying results also aims to reduce search bias by increasing the coverage over different aspects of the topic. As such, searchers should learn more about the given topic in general. Despite diversification algorithms being introduced over two decades ago, little research has explicitly examined their impact on search behaviour and performance in the context of Interactive Information Retrieval (IIR). In this paper, we explore the impact of diversification when searchers undertake complex search tasks that require learning about different aspects of a topic (aspectual retrieval). We hypothesise that by diversifying search results, searchers will be exposed to a greater number of aspects. In turn, this will maximise their coverage of the topic (and thus reduce possible search bias). As a consequence, diversification should lead to performance benefits, regardless of the task, but how does diversification affect search behaviours and search satisfaction? Based on Information Foraging Theory (IFT), we infer two hypotheses regarding search behaviours due to diversification, namely that (i) it will lead to searchers examining fewer documents per query, and (ii) it will also mean searchers will issue more queries overall. To this end, we performed a within-subjects user study using the TREC AQUAINT collection with 51 participants, examining the differences in search performance and behaviour when using (i) a non-diversified system (BM25) versus (ii) a diversified system (BM25+xQuAD) when the search task is either (a) ad-hoc or (b) aspectual. Our results show a number of notable findings in terms of search behaviour: participants on the diversified system issued more queries and examined fewer documents per query when performing the aspectual search task. 
    Furthermore, we showed that when using the diversified system, participants were more successful in marking relevant documents and obtained a greater awareness of the topics (i.e. identified relevant documents containing more novel aspects). These findings show that search behaviour is influenced by diversification and task complexity. They also motivate further research into complex search tasks such as aspectual retrieval, and into how diversity can play an important role in improving the search experience by providing greater coverage of a topic and mitigating potential bias in search results.

    Personalized Dialogue Generation with Diversified Traits

    Endowing a dialogue system with particular personality traits is essential to deliver more human-like conversations. However, due to the challenge of embodying personality via language expression and the lack of large-scale persona-labeled dialogue data, this research problem is still far from well-studied. In this paper, we investigate the problem of incorporating explicit personality traits in dialogue generation to deliver personalized dialogues. To this end, firstly, we construct PersonalDialog, a large-scale multi-turn dialogue dataset containing various traits from a large number of speakers. The dataset consists of 20.83M sessions and 56.25M utterances from 8.47M speakers. Each utterance is associated with a speaker who is marked with traits like Age, Gender, Location, Interest Tags, etc. Several anonymization schemes are designed to protect the privacy of each speaker. This large-scale dataset will facilitate not only the study of personalized dialogue generation, but also other research in sociolinguistics or social science. Secondly, to study how personality traits can be captured and addressed in dialogue generation, we propose persona-aware dialogue generation models within the sequence-to-sequence learning framework. Explicit personality traits (structured as key-value pairs) are embedded using a trait fusion module. During the decoding process, two techniques, namely persona-aware attention and persona-aware bias, are devised to capture and address trait-related information. Experiments demonstrate that our model is able to address proper traits in different contexts. Case studies also show interesting results for this challenging research problem. Comment: Please contact [zhengyinhe1 at 163 dot com] for the PersonalDialog datase