9,294 research outputs found
Adapting to the Shifting Intent of Search Queries
Search engines today present results that are often oblivious to abrupt
shifts in intent. For example, the query `independence day' usually refers to a
US holiday, but the intent of this query abruptly changed during the release of
a major film by that name. While no studies exactly quantify the magnitude of
intent-shifting traffic, studies suggest that news events, seasonal topics, pop
culture, etc account for 50% of all search queries. This paper shows that the
signals a search engine receives can be used to both determine that a shift in
intent has happened, as well as find a result that is now more relevant. We
present a meta-algorithm that marries a classifier with a bandit algorithm to
achieve regret that depends logarithmically on the number of query impressions,
under certain assumptions. We provide strong evidence that this regret is close
to the best achievable. Finally, via a series of experiments, we demonstrate
that our algorithm outperforms prior approaches, particularly as the amount of
intent-shifting traffic increases.Comment: This is the full version of the paper in NIPS'0
Incorporating Clicks, Attention and Satisfaction into a Search Engine Result Page Evaluation Model
Modern search engine result pages often provide immediate value to users and
organize information in such a way that it is easy to navigate. The core
ranking function contributes to this and so do result snippets, smart
organization of result blocks and extensive use of one-box answers or side
panels. While they are useful to the user and help search engines to stand out,
such features present two big challenges for evaluation. First, the presence of
such elements on a search engine result page (SERP) may lead to the absence of
clicks, which is, however, not related to dissatisfaction, so-called "good
abandonments." Second, the non-linear layout and visual difference of SERP
items may lead to non-trivial patterns of user attention, which is not captured
by existing evaluation metrics.
In this paper we propose a model of user behavior on a SERP that jointly
captures click behavior, user attention and satisfaction, the CAS model, and
demonstrate that it gives more accurate predictions of user actions and
self-reported satisfaction than existing models based on clicks alone. We use
the CAS model to build a novel evaluation metric that can be applied to
non-linear SERP layouts and that can account for the utility that users obtain
directly on a SERP. We demonstrate that this metric shows better agreement with
user-reported satisfaction than conventional evaluation metrics.Comment: CIKM2016, Proceedings of the 25th ACM International Conference on
Information and Knowledge Management. 201
A User-Centered Concept Mining System for Query and Document Understanding at Tencent
Concepts embody the knowledge of the world and facilitate the cognitive
processes of human beings. Mining concepts from web documents and constructing
the corresponding taxonomy are core research problems in text understanding and
support many downstream tasks such as query analysis, knowledge base
construction, recommendation, and search. However, we argue that most prior
studies extract formal and overly general concepts from Wikipedia or static web
pages, which are not representing the user perspective. In this paper, we
describe our experience of implementing and deploying ConcepT in Tencent QQ
Browser. It discovers user-centered concepts at the right granularity
conforming to user interests, by mining a large amount of user queries and
interactive search click logs. The extracted concepts have the proper
granularity, are consistent with user language styles and are dynamically
updated. We further present our techniques to tag documents with user-centered
concepts and to construct a topic-concept-instance taxonomy, which has helped
to improve search as well as news feeds recommendation in Tencent QQ Browser.
We performed extensive offline evaluation to demonstrate that our approach
could extract concepts of higher quality compared to several other existing
methods. Our system has been deployed in Tencent QQ Browser. Results from
online A/B testing involving a large number of real users suggest that the
Impression Efficiency of feeds users increased by 6.01% after incorporating the
user-centered concepts into the recommendation framework of Tencent QQ Browser.Comment: Accepted by KDD 201
Towards an automated query modification assistant
Users who need several queries before finding what they need can benefit from
an automatic search assistant that provides feedback on their query
modification strategies. We present a method to learn from a search log which
types of query modifications have and have not been effective in the past. The
method analyses query modifications along two dimensions: a traditional
term-based dimension and a semantic dimension, for which queries are enriches
with linked data entities. Applying the method to the search logs of two search
engines, we identify six opportunities for a query modification assistant to
improve search: modification strategies that are commonly used, but that often
do not lead to satisfactory results.Comment: 1st International Workshop on Usage Analysis and the Web of Data
(USEWOD2011) in the 20th International World Wide Web Conference (WWW2011),
Hyderabad, India, March 28th, 201
Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization
Protecting vast quantities of data poses a daunting challenge for the growing
number of organizations that collect, stockpile, and monetize it. The ability
to distinguish data that is actually needed from data collected "just in case"
would help these organizations to limit the latter's exposure to attack. A
natural approach might be to monitor data use and retain only the working-set
of in-use data in accessible storage; unused data can be evicted to a highly
protected store. However, many of today's big data applications rely on machine
learning (ML) workloads that are periodically retrained by accessing, and thus
exposing to attack, the entire data store. Training set minimization methods,
such as count featurization, are often used to limit the data needed to train
ML workloads to improve performance or scalability. We present Pyramid, a
limited-exposure data management system that builds upon count featurization to
enhance data protection. As such, Pyramid uniquely introduces both the idea and
proof-of-concept for leveraging training set minimization methods to instill
rigor and selectivity into big data management. We integrated Pyramid into
Spark Velox, a framework for ML-based targeting and personalization. We
evaluate it on three applications and show that Pyramid approaches
state-of-the-art models while training on less than 1% of the raw data
A Utility-Theoretic Approach to Privacy in Online Services
Online offerings such as web search, news portals, and e-commerce applications face the challenge of providing high-quality service to a large, heterogeneous user base. Recent efforts have highlighted the potential to improve performance by introducing methods to personalize services based on special knowledge about users and their context. For example, a user's demographics, location, and past search and browsing may be useful in enhancing the results offered in response to web search queries. However, reasonable concerns about privacy by both users, providers, and government agencies acting on behalf of citizens, may limit access by services to such information. We introduce and explore an economics of privacy in personalization, where people can opt to share personal information, in a standing or on-demand manner, in return for expected enhancements in the quality of an online service. We focus on the example of web search and formulate realistic objective functions for search efficacy and privacy. We demonstrate how we can find a provably near-optimal optimization of the utility-privacy tradeoff in an efficient manner. We evaluate our methodology on data drawn from a log of the search activity of volunteer participants. We separately assess usersā preferences about privacy and utility via a large-scale survey, aimed at eliciting preferences about peoplesā willingness to trade the sharing of personal data in returns for gains in search efficiency. We show that a significant level of personalization can be achieved using a relatively small amount of information about users
Learning Dynamic Classes of Events using Stacked Multilayer Perceptron Networks
People often use a web search engine to find information about events of
interest, for example, sport competitions, political elections, festivals and
entertainment news. In this paper, we study a problem of detecting
event-related queries, which is the first step before selecting a suitable
time-aware retrieval model. In general, event-related information needs can be
observed in query streams through various temporal patterns of user search
behavior, e.g., spiky peaks for popular events, and periodicities for
repetitive events. However, it is also common that users search for non-popular
events, which may not exhibit temporal variations in query streams, e.g., past
events recently occurred, historical events triggered by anniversaries or
similar events, and future events anticipated to happen. To address the
challenge of detecting dynamic classes of events, we propose a novel deep
learning model to classify a given query into a predetermined set of multiple
event types. Our proposed model, a Stacked Multilayer Perceptron (S-MLP)
network, consists of multilayer perceptron used as a basic learning unit. We
assemble stacked units to further learn complex relationships between neutrons
in successive layers. To evaluate our proposed model, we conduct experiments
using real-world queries and a set of manually created ground truth.
Preliminary results have shown that our proposed deep learning model
outperforms the state-of-the-art classification models significantly.Comment: Neu-IR '16 SIGIR Workshop on Neural Information Retrieval, 6 pages, 4
figure
- ā¦