37,044 research outputs found

    Temporal latent topic user profiles for search personalisation

    The performance of search personalisation largely depends on how effectively user profiles are built. Many approaches build user profiles from the topics discussed in relevant documents, where the topics are usually obtained from a human-generated online ontology such as the Open Directory Project. The limitation of these approaches is that many documents may not contain the topics covered in the ontology. Moreover, human-generated topics require expensive manual effort to determine the correct categories for each document. This paper addresses these problems by using Latent Dirichlet Allocation for unsupervised extraction of topics from documents. With the learned topics, we observe that search intent and user interests are dynamic, i.e., they change over time. To evaluate the effectiveness of temporal aspects in personalisation, we apply three typical time scales to build a long-term profile, a daily profile and a session profile. In the experiments, we use the profiles to re-rank search results returned by a commercial web search engine. Our experimental results demonstrate that our temporal profiles can significantly improve ranking quality. The results further show a promising effect of temporal features in correlation with click entropy and query position in a search session.
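    A minimal sketch of the idea, assuming each clicked document already carries an LDA topic distribution (the topic model itself is not shown) and that the "daily" profile means clicks from the last 24 hours. The names `Click`, `build_profiles` and `rerank`, the plain averaging of topic vectors, and the linear mix of original rank and cosine similarity are hypothetical illustrations, not the paper's method; the abstract does not say how the profile features are combined when re-ranking.

```python
import math
from dataclasses import dataclass

DAY_SECONDS = 86400  # assumption: "daily" = clicks within the last 24 hours


@dataclass
class Click:
    timestamp: float      # seconds since epoch
    session_id: str
    topics: list[float]   # topic distribution of the clicked document (assumed precomputed, e.g. by LDA)


def average(vectors, k):
    """Element-wise mean of topic vectors; zero vector if there are none."""
    if not vectors:
        return [0.0] * k
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(k)]


def build_profiles(history, now, session_id, k):
    """Hypothetical long-term, daily and session profiles over k topics."""
    return {
        "long_term": average([c.topics for c in history], k),
        "daily": average([c.topics for c in history if now - c.timestamp < DAY_SECONDS], k),
        "session": average([c.topics for c in history if c.session_id == session_id], k),
    }


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank(results, profiles, weight=0.5):
    """results: (doc_id, topics) pairs in the engine's original order.
    Score = weight * rank score + (1 - weight) * best profile similarity."""
    n = len(results)
    scored = []
    for rank, (doc_id, topics) in enumerate(results):
        base = 1.0 - rank / n
        personal = max(cosine(topics, p) for p in profiles.values())
        # -rank breaks ties in favour of the original order
        scored.append((weight * base + (1 - weight) * personal, -rank, doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, _, doc_id in scored]
```

    With `weight=1.0` the original ranking is returned unchanged; lowering it lets documents close to the user's recent topics move up.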

    Precursors and Laggards: An Analysis of Semantic Temporal Relationships on a Blog Network

    We explore the hypothesis that it is possible to obtain information about the dynamics of a blog network by analysing the temporal relationships between blogs at a semantic level, and that this type of analysis adds to the knowledge that can be extracted by studying the network only at the structural level of URL links. We present an algorithm to automatically detect fine-grained discussion topics, characterized by n-grams and time intervals. We then propose a probabilistic model to estimate the temporal relationships that blogs have with one another. We define the precursor score of blog A in relation to blog B as the probability that A enters a new topic before B, discounting the effect created by asymmetric posting rates. Network-level metrics of precursor and laggard behavior are derived from these dyadic precursor score estimates. This model is used to analyze a network of French political blogs, and the scores are compared with traditional link degree metrics. We obtain insights into the dynamics of topic participation on this network, as well as into the relationship between precursor/laggard and linking behaviors. We validate and analyze the results with the help of an expert on the French blogosphere. Finally, we propose possible applications for improving search engine ranking algorithms.
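    One simple way to estimate a dyadic precursor score is to count, over all topics that two blogs both entered, how often A entered first (ties count as half). This sketch is a deliberate simplification under stated assumptions: it does not implement the paper's probabilistic model or its correction for asymmetric posting rates, the input format (first posting time per blog per topic) is assumed, and the network-level `precursor_index` (mean score against all co-participating blogs) is a hypothetical aggregate.

```python
from collections import defaultdict
from itertools import combinations


def precursor_scores(first_posts):
    """first_posts: {topic: {blog: time the blog first posted on that topic}}.
    Returns {(a, b): empirical fraction of shared topics that a entered before b}.
    NOTE: no adjustment for asymmetric posting rates (unlike the paper's model)."""
    wins = defaultdict(float)
    shared = defaultdict(int)
    for times in first_posts.values():
        for a, b in combinations(sorted(times), 2):
            shared[(a, b)] += 1
            shared[(b, a)] += 1
            if times[a] < times[b]:
                wins[(a, b)] += 1.0
            elif times[b] < times[a]:
                wins[(b, a)] += 1.0
            else:  # simultaneous entry: split the credit
                wins[(a, b)] += 0.5
                wins[(b, a)] += 0.5
    return {pair: wins[pair] / n for pair, n in shared.items()}


def precursor_index(scores):
    """Hypothetical network-level metric: a blog's mean precursor score
    against every blog it shares at least one topic with."""
    per_blog = defaultdict(list)
    for (a, _b), s in scores.items():
        per_blog[a].append(s)
    return {blog: sum(v) / len(v) for blog, v in per_blog.items()}
```

    A blog whose index is close to 1 tends to enter topics before its peers (a precursor); one close to 0 tends to follow (a laggard).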

    Intelligent personalized approaches for semantic search and query expansion

    University of Technology Sydney. Faculty of Engineering and Information Technology. In today’s highly advanced technological world, the Internet has taken over all aspects of human life. Many services are advertised and provided to users through online channels, and users look for these services and obtain them through different search engines. To obtain results that best meet users' needs and requirements, researchers have extensively studied methods, such as various personalization methods, that improve the performance and efficiency of the retrieval process. A key part of the personalization process is the generation of user models. The most commonly used user models are still rather simplistic, representing the user as a vector of ratings or as a set of keywords. Recently, semantic techniques have gained significant importance in personalized querying and personalized web search engines. This thesis focuses on two processes of personalized web search engines: first, the reformulation of queries, and second, the ranking of query results. The importance of personalized web search lies in its ability to identify users' interests based on their personal profiles. This work contributes to personalized web search services in three ways. First, it creates user profiles based on a user’s browsing behaviour and the semantic knowledge of a domain ontology, aiming to improve the quality of the search results. However, personalized web search results are not easy to obtain, and one of the problems encountered in this approach is how to obtain a precise representation of the user's interests and how to use it to find search results. The second contribution builds on the first: a personalized web search approach is introduced that integrates user context history into the information retrieval process. This integration aims to provide search results that meet the user’s needs. It also creates contextual profiles for the user based on several basic factors: the user's temporal behaviour during browsing, the semantic knowledge of a specific domain ontology, and an algorithm for re-ranking the search results. These first two solutions concern re-ranking the returned search results to match the user’s requirements. The third contribution is a comparison of three term-weighting methods in personalized query expansion; this model incorporates both latent semantics and term weighting. Real-world experiments conducted to evaluate the proposed personalized web search approach show promising results in the quality of the reformulation and re-ranking processes compared with Google search engine techniques.

    Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

    We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine designed specifically for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and it points the way to future work on data analytics platforms that can handle "big" as well as "fast" data.
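    The abstract does not describe the internals of the in-memory engine. As a hypothetical illustration of why freshness matters, the sketch below keeps query co-occurrence counts in memory and decays them exponentially, so that queries issued together after a breaking event quickly outrank older associations. The class name `DecayingCooccurrence`, the session-level co-occurrence signal and the 10-minute half-life are assumptions, not Twitter's design; memory eviction and spelling correction are omitted.

```python
import math
from collections import defaultdict


class DecayingCooccurrence:
    """Hypothetical in-memory related-query store with exponential time decay."""

    def __init__(self, half_life_s=600.0):
        self.decay_rate = math.log(2) / half_life_s
        # query -> {related_query: (score, time of last update)}
        self.scores = defaultdict(dict)

    def _decayed(self, score, last_update, now):
        return score * math.exp(-self.decay_rate * (now - last_update))

    def observe_session(self, queries, now):
        """Count every ordered pair of distinct queries seen in one session."""
        unique = set(queries)
        for q in unique:
            for r in unique:
                if q == r:
                    continue
                score, last = self.scores[q].get(r, (0.0, now))
                # decay the old score to `now`, then add this observation
                self.scores[q][r] = (self._decayed(score, last, now) + 1.0, now)

    def suggest(self, query, now, k=3):
        """Top-k related queries by decayed co-occurrence score."""
        related = self.scores.get(query, {})
        ranked = sorted(
            ((self._decayed(s, t, now), r) for r, (s, t) in related.items()),
            reverse=True,
        )
        return [r for _, r in ranked[:k]]
```

    Here three old co-occurrences observed 50 minutes (five half-lives) earlier are worth about 3/32 of a fresh one, which is the kind of recency bias a breaking-news setting calls for.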

    Utilising semantic technologies for intelligent indexing and retrieval of digital images

    The proliferation of digital media has led to a huge interest in classifying and indexing media objects for generic search and usage. In particular, we are witnessing colossal growth in digital image repositories that are difficult to navigate using free-text search mechanisms, which often return inaccurate matches because they essentially rely on statistical analysis of query keyword recurrence in the image annotation or surrounding text. In this paper we present a semantically enabled image annotation and retrieval engine designed to satisfy the requirements of the commercial image collections market in terms of both the accuracy and the efficiency of the retrieval process. Our search engine relies on methodically structured ontologies for image annotation, allowing more intelligent reasoning about the image content and hence a more accurate set of results and a richer set of alternatives matching the original query. We also show how our carefully analysed and designed domain ontology contributes to the implicit expansion of user queries, and how lexical databases can be exploited for explicit semantic-based query expansion.
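    A toy sketch of ontology-driven query expansion, assuming a small hand-written dictionary in place of the paper's domain ontology and lexical databases: each original query term keeps weight 1.0, synonyms are added with a lower weight, and narrower concepts with a lower one still. The `ONTOLOGY` contents, the function name `expand_query` and the weights 0.8 and 0.5 are hypothetical.

```python
# Hypothetical miniature ontology: concept -> synonyms and narrower concepts.
ONTOLOGY = {
    "dog": {"synonyms": ["canine", "hound"], "narrower": ["puppy", "terrier"]},
    "beach": {"synonyms": ["seashore", "coast"], "narrower": []},
}


def expand_query(query, ontology=ONTOLOGY, syn_weight=0.8, narrow_weight=0.5):
    """Return {term: weight}; a term reached by several routes keeps its highest weight."""
    weights = {}

    def add(term, w):
        weights[term] = max(weights.get(term, 0.0), w)

    for term in query.lower().split():
        add(term, 1.0)  # original query terms keep full weight
        entry = ontology.get(term)
        if entry is None:
            continue
        for s in entry["synonyms"]:
            add(s, syn_weight)
        for n in entry["narrower"]:
            add(n, narrow_weight)
    return weights
```

    The weighted terms could then be matched against ontology-based image annotations, so that an image annotated only with "terrier" still surfaces, at lower weight, for the query "dog".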