105 research outputs found

Examining repetition in user search behavior

Author: Dumais S.
Sanderson M.
Publication venue: 'Springer Science and Business Media LLC'

This paper describes analyses of the repeated use of search engines. It is shown that users commonly re-issue queries, either to examine search results deeply or simply to query again, often days or weeks later. Hourly and weekly periodicities in behavior are observed for both queries and clicks. Navigational queries were found to be repeated differently from others.

White Rose Research Online

Context Modeling for Ranking and Tagging Bursty Features in Text Streams

Author: HE Jing
JIANG Jing
LI Xiaoming
Shan Dongdong
YAN Hongfei
ZHAO Xin
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2010

Bursty features in text streams are very useful in many text mining applications. Most existing studies detect bursty features based purely on term frequency changes without taking into account the semantic contexts of terms, and as a result the detected bursty features may not always be interesting or easy to interpret. In this paper we propose to model the contexts of bursty features using a language modeling approach. We then propose a novel topic diversity-based metric using the context models to find newsworthy bursty features. We also propose to use the context models to automatically assign meaningful tags to bursty features. Using a large corpus of a stream of news articles, we quantitatively show that the proposed context language models for bursty features can effectively help rank bursty features by their newsworthiness and assign meaningful tags to annotate them. © 2010 ACM.

Institutional Knowledge at Singapore Management University
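
The abstract above notes that most existing studies detect bursty features purely from term frequency changes. As a rough illustration of that frequency-only baseline (not the context language model the paper proposes), the following minimal sketch flags time windows in which a term's count is unusually high. The function name `bursty_windows`, the z-score rule, the threshold of 2.0, and the sample counts are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def bursty_windows(freqs, threshold=2.0):
    """Return indices of windows whose count is a high outlier.

    Assumption: a window is "bursty" when its z-score against the whole
    series exceeds `threshold`. This is an illustrative rule, not the
    method of any paper listed here.
    """
    mu = mean(freqs)
    sigma = pstdev(freqs)
    if sigma == 0:
        # A perfectly flat series has no bursts.
        return []
    return [i for i, f in enumerate(freqs) if (f - mu) / sigma > threshold]


# Hypothetical per-day counts of one term in a news stream.
term_freqs = [3, 4, 2, 3, 5, 4, 40, 3, 4, 2]
print(bursty_windows(term_freqs))
```

A rule like this sees only counts, which is the limitation the abstract points out: it cannot tell whether a flagged term is newsworthy or how it should be interpreted.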

Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture

Author: Dalton Jeff
Li Zhenghua
Lin Jimmy
Mishne Gilad
Sharma Aneesh
Publication date: 27/10/2012

We present the architecture behind Twitter's real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time "twist": after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of "big data". We tell the story of how our system was built twice: our first implementation was built on a typical Hadoop-based analytics stack, but was later replaced because it did not meet the latency requirements necessary to generate meaningful real-time results. The second implementation, which is the system deployed in production, is a custom in-memory processing engine specifically designed for the task. This experience taught us that the current typical usage of Hadoop as a "big data" platform, while great for experimentation, is not well suited to low-latency processing, and points the way to future work on data analytics platforms that can handle "big" as well as "fast" data.

arXiv.org e-Print Archive

CiteSeerX

Exploiting Query’s Temporal Patterns for Query Autocompletion

Author: Danyang Jiang
Fei Cai
Honghui Chen
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

Query autocompletion (QAC) is a common interactive feature of web search engines. It helps users formulate queries and avoid spelling mistakes by presenting a list of query completions as soon as they start typing in the search box. Existing QAC models mostly rank query completions by their past popularity in the query logs. The popularity of some queries is relatively stable or periodic, while that of others may rise suddenly. Current time-sensitive QAC models focus on either periodicity or recency and cannot respond swiftly to such a sudden rise, resulting in suboptimal QAC performance. In this paper, we propose a hybrid QAC model that considers two temporal patterns of a query’s popularity: periodicity and burst trend. Specifically, we first employ the Discrete Fourier Transform (DFT) to identify the periodicity of a query’s popularity, from which we forecast its future popularity. We then detect the burst trend of the query’s popularity and incorporate it into the hybrid model alongside its cyclic behavior. Extensive experiments on a large, real-world query log dataset show that modeling the temporal patterns of query popularity in the form of its periodicity and its burst trend can significantly improve the effectiveness of ranking query completions.

Crossref

Directory of Open Access Journals
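
The query autocompletion abstract above mentions using the Discrete Fourier Transform to identify the periodicity of a query's popularity. The sketch below illustrates that general idea only: it computes a plain DFT of a count series and reports the period of the strongest non-constant frequency. It is not the authors' implementation; the function name `dominant_period` and the synthetic weekly series are assumptions, and the forecasting and burst-trend steps are not covered.

```python
import cmath
import math


def dominant_period(counts):
    """Return the period (in samples) of the strongest DFT frequency.

    Returns None when the series has no variation at all.
    """
    n = len(counts)
    avg = sum(counts) / n
    # Subtract the mean so the constant (zero-frequency) term is ignored.
    centered = [c - avg for c in counts]
    best_k, best_power = 0, 0.0
    # For a real-valued series only frequencies 1..n//2 are distinct.
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k if best_k else None


# Hypothetical daily query counts with a weekly cycle over four weeks.
daily_counts = [10 + 5 * math.cos(2 * math.pi * t / 7) for t in range(28)]
print(dominant_period(daily_counts))
```

The direct summation is quadratic in the series length, which is fine for a short illustration; an FFT routine would be the usual choice for long query logs.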

Demographic information flows

Author: Alejandro Jaimes
Ingmar Weber
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2010

In advertising and content relevancy prediction it is important to understand whether, over time, information that reaches one demographic group spreads to others. In this paper we analyze the query log of a large U.S. web search engine to determine whether the same queries are performed by different demographic groups at different times, particularly when there are query bursts. We obtain aggregate demographic features from user-provided registration information (gender, birth year, ZIP code), U.S. census data, and election results. Given certain queries, we examine trends (from high to low and vice versa) and changes in the statistical spread of the demographic features of users who issue the queries over time periods that include query bursts. Our analysis shows that for certain types of queries (movies and news) distinct demographic groups perform searches at different times, suggesting that information related to such queries flows between them. Queries of movie titles, for instance, tend to be issued first by young and then by older users, with a sudden jump in age occurring upon the movie's release. To the best of our knowledge, this is the first time this problem has been studied using search query logs.

CiteSeerX

Analyzing feature trajectories for event detection

Author: CHANG Kuiyu
HE Qi
LIM Ee Peng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2007

Crossref

Institutional Knowledge at Singapore Management University