Why We Read Wikipedia
Wikipedia is one of the most popular sites on the Web, with millions of users
relying on it to satisfy a broad range of information needs every day. Although
it is crucial to understand what exactly these needs are in order to be able to
meet them, little is currently known about why users visit Wikipedia. The goal
of this paper is to fill this gap by combining a survey of Wikipedia readers
with a log-based analysis of user activity. Based on an initial series of user
surveys, we build a taxonomy of Wikipedia use cases along several dimensions,
capturing users' motivations to visit Wikipedia, the depth of knowledge they
are seeking, and their knowledge of the topic of interest prior to visiting
Wikipedia. Then, we quantify the prevalence of these use cases via a
large-scale user survey conducted on live Wikipedia with almost 30,000
responses. Our analyses highlight the variety of factors driving users to
Wikipedia, such as current events, media coverage of a topic, personal
curiosity, work or school assignments, or boredom. Finally, we match survey
responses to the respondents' digital traces in Wikipedia's server logs,
enabling the discovery of behavioral patterns associated with specific use
cases. For instance, we observe long and fast-paced page sequences across
topics for users who are bored or exploring randomly, whereas those using
Wikipedia for work or school spend more time on individual articles focused on
topics such as science. Our findings advance our understanding of reader
motivations and behavior on Wikipedia and can have implications for developers
aiming to improve Wikipedia's user experience, editors striving to cater to
their readers' needs, third-party services (such as search engines) providing
access to Wikipedia content, and researchers aiming to build tools such as
recommendation engines.
Comment: Published in WWW'17; v2 fixes caption of Table
Extracting Hierarchies of Search Tasks & Subtasks via a Bayesian Nonparametric Approach
A significant number of search queries originate from real-world
information needs or tasks. To improve the search experience of end
users, it is important to have accurate representations of tasks. As a result,
a significant amount of research has been devoted to extracting proper
representations of tasks, both to enable search systems to help users
complete their tasks and to support better query
suggestions, better recommendations, satisfaction prediction, and
improved task-based personalization. Most existing task extraction
methodologies focus on representing tasks as flat structures. However, tasks
often have multiple subtasks associated with them, and a more
natural representation of tasks would be a hierarchy, where
each task can be composed of multiple (sub)tasks. To this end, we propose an
efficient Bayesian nonparametric model for extracting hierarchies of such tasks
& subtasks. We evaluate our method on real-world query log data through
both quantitative and crowdsourced experiments and highlight the importance
of considering task/subtask hierarchies.
Comment: 10 pages. Accepted at SIGIR 2017 as a full pape