54,071 research outputs found

    Topic-based mixture language modelling

    Get PDF
    This paper describes an approach for constructing a mixture of language models based on simple statistical notions of semantics using probabilistic models developed for information retrieval. The approach encapsulates corpus-derived semantic information and is able to model varying styles of text. Using such information, the corpus texts are clustered in an unsupervised manner and a mixture of topic-specific language models is automatically created. The principal contribution of this work is to characterise the document space resulting from information retrieval techniques and to demonstrate the approach for mixture language modelling. A comparison is made between manual and automatic clustering in order to elucidate how the global content information is expressed in the space. We also compare (in terms of association with manual clustering and language modelling accuracy) alternative term-weighting schemes and the effect of singular value decomposition dimension reduction (latent semantic analysis). Test set perplexity results using the British National Corpus indicate that the approach can improve the potential of statistical language modelling. Using an adaptive procedure, the conventional model may be tuned to track text data with a slight increase in computational cost

    On the probabilistic logical modelling of quantum and geometrically-inspired IR

    Get PDF
    Information Retrieval approaches can mostly be classed into probabilistic, geometric or logic-based. Recently, a new unifying framework for IR has emerged that integrates a probabilistic description within a geometric framework, namely vectors in Hilbert spaces. The geometric model leads naturally to a predicate logic over linear subspaces, also known as quantum logic. In this paper we show the relation between this model and classic concepts such as the Generalised Vector Space Model, highlighting similarities and differences. We also show how some fundamental components of quantum-based IR can be modelled in a descriptive way using a well-established tool, i.e. Probabilistic Datalog

    Deep Learning based Recommender System: A Survey and New Perspectives

    Full text link
    With the ever-growing volume of online information, recommender systems have been an effective strategy to overcome such information overload. The utility of recommender systems cannot be overstated, given its widespread adoption in many web applications, along with its potential impact to ameliorate many problems related to over-choice. In recent years, deep learning has garnered considerable interest in many research fields such as computer vision and natural language processing, owing not only to stellar performance but also the attractive property of learning feature representations from scratch. The influence of deep learning is also pervasive, recently demonstrating its effectiveness when applied to information retrieval and recommender systems research. Evidently, the field of deep learning in recommender system is flourishing. This article aims to provide a comprehensive review of recent research efforts on deep learning based recommender systems. More concretely, we provide and devise a taxonomy of deep learning based recommendation models, along with providing a comprehensive summary of the state-of-the-art. Finally, we expand on current trends and provide new perspectives pertaining to this new exciting development of the field.Comment: The paper has been accepted by ACM Computing Surveys. https://doi.acm.org/10.1145/328502

    IRGAN: A Minimax Game for Unifying Generative and Discriminative Information Retrieval Models

    Get PDF
    This paper provides a unified account of two schools of thinking in information retrieval modelling: the generative retrieval focusing on predicting relevant documents given a query, and the discriminative retrieval focusing on predicting relevancy given a query-document pair. We propose a game theoretical minimax game to iteratively optimise both models. On one hand, the discriminative model, aiming to mine signals from labelled and unlabelled data, provides guidance to train the generative model towards fitting the underlying relevance distribution over documents given the query. On the other hand, the generative model, acting as an attacker to the current discriminative model, generates difficult examples for the discriminative model in an adversarial way by minimising its discrimination objective. With the competition between these two models, we show that the unified framework takes advantage of both schools of thinking: (i) the generative model learns to fit the relevance distribution over documents via the signals from the discriminative model, and (ii) the discriminative model is able to exploit the unlabelled data selected by the generative model to achieve a better estimation for document ranking. Our experimental results have demonstrated significant performance gains as much as 23.96% on Precision@5 and 15.50% on MAP over strong baselines in a variety of applications including web search, item recommendation, and question answering.Comment: 12 pages; appendix adde

    Assessing the impact of user interaction with thesaural knowledge structures: a quantitative analysis framework

    Get PDF
    Thesauri have been important information and knowledge organisation tools for more than three decades. The recent emergence and phenomenal growth of the World Wide Web has created new opportunities to introduce thesauri as information search and retrieval aids to end user communities. While the number of web-based and hypertextual thesauri continues to grow, few investigations have yet been carried out to evaluate how end-users, for whom all these efforts are ostensibly made, interact with and make use of thesauri for query building and expansion. The present paper reports a pilot study carried out to determine the extent to which a thesaurus-enhanced search interface to a web-based database aided end-users in their selection of search terms. The study also investigated the ways in which users interacted with the thesaurus structure, terms, and interface. Thesaurus-based searching and browsing behaviours adopted by users while interacting with the thesaurus-enhanced search interface were also examined

    Footprints of information foragers: Behaviour semantics of visual exploration

    Get PDF
    Social navigation exploits the knowledge and experience of peer users of information resources. A wide variety of visual–spatial approaches become increasingly popular as a means to optimize information access as well as to foster and sustain a virtual community among geographically distributed users. An information landscape is among the most appealing design options of representing and communicating the essence of distributed information resources to users. A fundamental and challenging issue is how an information landscape can be designed such that it will not only preserve the essence of the underlying information structure, but also accommodate the diversity of individual users. The majority of research in social navigation has been focusing on how to extract useful information from what is in common between users' profiles, their interests and preferences. In this article, we explore the role of modelling sequential behaviour patterns of users in augmenting social navigation in thematic landscapes. In particular, we compare and analyse the trails of individual users in thematic spaces along with their cognitive ability measures. We are interested in whether such trails can provide useful guidance for social navigation if they are embedded in a visual–spatial environment. Furthermore, we are interested in whether such information can help users to learn from each other, for example, from the ones who have been successful in retrieving documents. In this article, we first describe how users' trails in sessions of an experimental study of visual information retrieval can be characterized by Hidden Markov Models. Trails of users with the most successful retrieval performance are used to estimate parameters of such models. Optimal virtual trails generated from the models are visualized and animated as if they were actual trails of individual users in order to highlight behavioural patterns that may foster social navigation. The findings of the research will provide direct input to the design of social navigation systems as well as to enrich theories of social navigation in a wider context. These findings will lead to the further development and consolidation of a tightly coupled paradigm of spatial, semantic and social navigation
    • …
    corecore