7 research outputs found

    Incorporating seasonality into search suggestions derived from intranet query logs

    While much research has been performed on query logs collected for major Web search engines, query log analysis to enhance search on smaller and more focused collections has attracted less attention. Our hypothesis is that an intranet search engine can be enhanced by exploiting its query logs to adapt the search system to real users’ search behaviour. In this work we describe how a constantly adapting domain model can be used to identify and capture changes in intranet users’ search requirements over time. We employ an algorithm that dynamically builds a domain model from query modifications taken from an intranet query log and applies a decay measure, as used in Machine Learning and Optimisation methods, to promote more recent terms. This model is used to suggest query refinements and additions to users and to elevate seasonally relevant terms. We report a user evaluation using models constructed from a substantial university intranet query log. Statistical evidence demonstrates the system’s ability to suggest seasonally relevant terms over three different academic trimesters. We conclude that the log files of an intranet search engine are a rich resource for building adaptive domain models, and in our experiments these models significantly outperform sensible baselines.
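The abstract does not specify the decay function, so the following is only a minimal Python sketch assuming exponential decay: each observed refinement (query, refined query) contributes `decay ** age_in_days` to its weight, so recent refinements outrank older ones. The function names, the `(day, query, refined_query)` log format and the `decay=0.9` default are hypothetical choices for illustration, not the authors' implementation.

```python
from collections import defaultdict


def build_decayed_model(log, now, decay=0.9):
    """Build a refinement model from (day, query, refined_query) tuples.

    Hypothetical assumption: each observation is weighted by decay ** age,
    where age is measured in days before `now`, so recent refinements dominate.
    """
    model = defaultdict(lambda: defaultdict(float))
    for day, query, refinement in log:
        age = now - day
        model[query][refinement] += decay ** age
    return model


def suggest(model, query, k=3):
    """Return up to k refinements for `query`, highest decayed weight first."""
    candidates = model.get(query, {})
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(candidates, key=lambda r: (-candidates[r], r))[:k]


log = [
    (0, "exam", "exam timetable"),
    (0, "exam", "exam timetable"),
    (90, "exam", "exam results"),
]
model = build_decayed_model(log, now=100)
print(suggest(model, "exam"))  # ['exam results', 'exam timetable']
```

With `now=100`, the two "exam timetable" refinements from day 0 together contribute only about 5e-5, while the single "exam results" refinement from day 90 contributes 0.9**10 ≈ 0.35, so the more recent, seasonally relevant term ranks first.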

    Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

    We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and it points the way to future work on data analytics platforms that can handle "big" as well as "fast" data.
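The abstract does not describe the internals of the in-memory engine. As a hypothetical illustration of why keeping recent data in memory helps with breaking news, the sketch below counts pairs of queries issued in the same session and evicts events older than a fixed time window, so new associations appear as soon as they are observed and stale ones expire. The class name, the session-pair input and the window length are assumptions; this is not Twitter's production design.

```python
from collections import Counter, deque


class WindowedRelatedQueries:
    """Hypothetical in-memory counter of query pairs seen in the same session,
    keeping only events from the last `window` seconds.

    Assumes events arrive with non-decreasing timestamps.
    """

    def __init__(self, window: float):
        self.window = window
        self.events = deque()   # (timestamp, (q1, q2)) in arrival order
        self.counts = Counter()  # (q1, q2) -> number of events inside the window

    def add_session_pair(self, timestamp: float, q1: str, q2: str) -> None:
        # Record the association in both directions.
        for pair in ((q1, q2), (q2, q1)):
            self.events.append((timestamp, pair))
            self.counts[pair] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # Drop events at or before now - window; keep those in (now - window, now].
        while self.events and self.events[0][0] <= now - self.window:
            _, pair = self.events.popleft()
            self.counts[pair] -= 1
            if self.counts[pair] == 0:
                del self.counts[pair]

    def related(self, query: str, k: int = 3) -> list:
        # Linear scan for clarity; a real system would index pairs by query.
        scored = [(c, q2) for (q1, q2), c in self.counts.items() if q1 == query]
        scored.sort(key=lambda item: (-item[0], item[1]))
        return [q for _, q in scored[:k]]


r = WindowedRelatedQueries(window=600)
r.add_session_pair(0, "earthquake", "tsunami warning")
r.add_session_pair(10, "earthquake", "tsunami warning")
r.add_session_pair(20, "earthquake", "magnitude")
print(r.related("earthquake"))  # ['tsunami warning', 'magnitude']
r.add_session_pair(700, "earthquake", "aftershock")
print(r.related("earthquake"))  # ['aftershock']
```

The deque keeps expiry amortised O(1) per event, which is the kind of low-latency, incremental update that a batch Hadoop job rerun periodically cannot offer.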

    Query Log Mining to Enhance User Experience in Search Engines

    The Web is the biggest repository of documents humans have ever built, and it grows larger every day. Users rely on Web search engines (WSEs) to find information on the Web. By submitting a textual query expressing their information need, WSE users obtain a list of documents that are highly relevant to the query. WSEs also store huge amounts of user activity in "query logs". Query log mining is the set of techniques that aim to extract valuable knowledge from query logs, and this knowledge is one of the most widely used means of enhancing users’ search experience. In line with this view, in this thesis we first prove that the knowledge extracted from query logs suffers from aging effects, and we propose a solution to this phenomenon. Secondly, we propose new query recommendation algorithms that overcome the aging problem. We also study new query recommendation techniques that efficiently produce recommendations for rare queries. Finally, we study the problem of diversifying Web search engine results: we define a methodology, based on knowledge derived from query logs, for detecting when and how query results need to be diversified, and we develop an efficient algorithm for diversifying search results.

    A Survey of Query Recommendation Research (查询推荐研究综述)

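    Query recommendation is an important technique for improving users' search efficiency; its core task is to help users construct effective queries that accurately describe their information needs. As one of the core technologies of today's search engines, query recommendation has attracted wide attention from both academia and industry and has long been an important research topic in information retrieval. Taking the literature on query recommendation published in Chinese and international conferences and journals as its subject, and using an inductive summarisation approach, this paper first reviews in detail the mainstream query recommendation methods (methods based on simple co-occurrence information, graph-model-based methods, and methods that fuse multiple sources of information), then summarises and assesses evaluation methods and metrics, and finally analyses possible future research directions.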
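As a rough illustration of the first family of methods the survey lists, those based on simple co-occurrence information, the sketch below ranks candidate suggestions for a query by the fraction of sessions containing that query that also contain the candidate, i.e. an estimate of P(candidate | query). The session format and function name are hypothetical, and this normalisation is only one common choice among several.

```python
from collections import Counter


def cooccurrence_suggestions(sessions, query, k=3):
    """Rank candidates for `query` by estimated P(candidate | query).

    P(candidate | query) = (#sessions containing both) / (#sessions containing query).
    Each session is a list of query strings; repeats within a session count once.
    """
    with_query = 0
    together = Counter()
    for session in sessions:
        unique = set(session)
        if query not in unique:
            continue
        with_query += 1
        for other in unique - {query}:
            together[other] += 1
    if with_query == 0:
        return []
    scored = {q: c / with_query for q, c in together.items()}
    # Highest probability first; ties broken alphabetically for determinism.
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))[:k]


sessions = [
    ["python", "python tutorial"],
    ["python", "python download", "python tutorial"],
    ["java", "python"],
    ["java", "jdk"],
]
print([q for q, _ in cooccurrence_suggestions(sessions, "python")])
# ['python tutorial', 'java', 'python download']
```

Plain co-occurrence favours generally popular queries (here "java" ties with a more specific refinement), which is one motivation for the graph-based and fused methods the survey also covers.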

    The Effects of Time on Query Flow Graph-based Models for Query Suggestion

    A recent query-log mining approach for query recommendation is based on Query Flow Graphs, a Markov-chain representation of the query reformulation process followed by users of Web search engines trying to satisfy their information needs. In this paper we aim to extend this model with methods for dealing with evolving data. Users' interests change over time, and the knowledge extracted from query logs may suffer an aging effect as new topics of interest appear. Starting from this experimentally validated observation, we introduce a novel algorithm for updating an existing query flow graph. The proposed solution keeps the recommendation model up to date without reconstructing it from scratch every time, by efficiently and incrementally merging past and present data.
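A Query Flow Graph has queries as nodes and directed edges for observed reformulations. The sketch below is a simplified, hypothetical version: edges carry raw transition counts from consecutive queries in a session, transition probabilities are those counts normalised per source query, and an update folds a graph built from new log data into the existing one instead of rebuilding it from scratch. The class and method names and the `past_weight` knob for down-weighting old counts are assumptions added for illustration; the paper's actual merging algorithm and the richer edge weights of the full model are not reproduced here.

```python
from collections import defaultdict


class QueryFlowGraph:
    """Simplified query flow graph: counts[q][q_next] = observed transitions."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(float))

    @classmethod
    def from_sessions(cls, sessions):
        graph = cls()
        for session in sessions:
            for q, q_next in zip(session, session[1:]):
                if q != q_next:  # an identical resubmission is not a reformulation
                    graph.counts[q][q_next] += 1
        return graph

    def merge(self, newer, past_weight=1.0):
        """Fold a graph built from newer log data into this one, in place.

        past_weight < 1 down-weights existing counts (hypothetical aging knob).
        """
        if past_weight != 1.0:
            for edges in self.counts.values():
                for q_next in edges:
                    edges[q_next] *= past_weight
        for q, edges in newer.counts.items():
            for q_next, c in edges.items():
                self.counts[q][q_next] += c
        return self

    def transition_prob(self, q, q_next):
        # Markov-chain transition probability estimated from edge counts.
        edges = self.counts.get(q)
        if not edges:
            return 0.0
        return edges.get(q_next, 0.0) / sum(edges.values())

    def recommend(self, q, k=3):
        edges = self.counts.get(q, {})
        return sorted(edges, key=lambda n: (-edges[n], n))[:k]


old = QueryFlowGraph.from_sessions([["ipod", "ipod nano"], ["ipod", "ipod nano"], ["ipod", "itunes"]])
new = QueryFlowGraph.from_sessions([["ipod", "iphone"]] * 3)
old.merge(new, past_weight=0.5)
print(old.recommend("ipod"))  # ['iphone', 'ipod nano', 'itunes']
print(round(old.transition_prob("ipod", "iphone"), 3))  # 0.667
```

Because merging only touches the edges present in the new data (plus an optional rescale of old counts), its cost depends on the size of the update rather than on reprocessing the entire log.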

    Contextual Factors Affecting Information Sharing Patterns in Technology Mediated Communication

    In this thesis, we investigate which contextual factors affect users’ information sharing, and how. We build our work on six individual research projects which cover a variety of systems (search engines, social network sites, teleconferencing systems, monitoring technology, and general-purpose conversational agents) in a variety of communication scenarios with diverse relationships and dispositions of users. Alongside detailed findings for particular systems and communication scenarios from each individual project, we provide a consolidated analysis of these results across systems and scenarios, which allows us to identify patterns specific to different system types and aspects shared between systems. In particular, we show that depending on the system’s position between a user and an intended information-receiving agent (whether communication happens through, around, or directly with the system), the system should follow different patterns of operational adaptation to communication context. Specifically, when communication happens through the system, the system needs to gather communication context unavailable to the user and integrate it into information communication; when communication happens around the system, the system should adapt its operations to provide information in the most contextually suitable format; finally, when a user communicates with the system, the role of the system is to “match” this context in communication with the user. We then argue that despite the differences between system types in patterns of required context-based adaptation, there are contextual factors affecting users’ information sharing intent that should be acknowledged across systems. Grounded in our cumulative findings and analysis of related literature, we identify four such high-level contextual factors.
We then synthesise these four factors into an early design framework, which we call SART after the included factors of space, addressee, reason, and time. Each factor in SART is presented as a continuum defined through a descriptive dichotomy: perceived breadth of communication space (public to private); perceived specificity of an information addressee (defined to undefined); intended reason for information sharing (instrumental to objective); and perceived time of information relevance and life-span (immediate to indefinite).