132 research outputs found
Differentiable Unbiased Online Learning to Rank
Online Learning to Rank (OLTR) methods optimize rankers based on user
interactions. State-of-the-art OLTR methods are built specifically for linear
models. Their approaches do not extend well to non-linear models such as neural
networks. We introduce an entirely novel approach to OLTR that constructs a
weighted differentiable pairwise loss after each interaction: Pairwise
Differentiable Gradient Descent (PDGD). PDGD breaks away from the traditional
approach that relies on interleaving or multileaving and extensive sampling of
models to estimate gradients. Instead, its gradient is based on inferring
preferences between document pairs from user clicks and can optimize any
differentiable model. We prove that the gradient of PDGD is unbiased w.r.t.
user document pair preferences. Our experiments on the largest publicly
available Learning to Rank (LTR) datasets show considerable and significant
improvements under all levels of interaction noise. PDGD outperforms existing
OLTR methods in terms of both learning speed and final convergence.
Furthermore, unlike previous OLTR methods, PDGD also allows for non-linear
models to be optimized effectively. Our results show that using a neural
network leads to even better performance at convergence than a linear model. In
summary, PDGD is an efficient and unbiased OLTR approach that provides a better
user experience than previously possible.
Comment: Conference on Information and Knowledge Management 201
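The idea of turning clicks into document-pair preferences and scoring them with a differentiable pairwise loss can be illustrated with a minimal sketch. It is not the authors' method: the abstract gives neither the exact pair-selection rule nor the weighting, so the sketch assumes that every clicked document is preferred over every unclicked document in the displayed ranking, and it uses a standard pairwise logistic loss with optional per-pair weights. The function names `click_pairs` and `pairwise_loss` are hypothetical.

```python
import math


def click_pairs(ranking, clicked):
    """Infer (preferred, other) pairs from one interaction.

    Assumption for illustration only: each clicked document is preferred
    over each unclicked document in the displayed ranking.
    """
    clicked = set(clicked)
    return [
        (doc, other)
        for doc in ranking
        if doc in clicked
        for other in ranking
        if other not in clicked
    ]


def pairwise_loss(scores, pairs, weights=None):
    """Weighted pairwise logistic loss over inferred preferences.

    Each pair contributes w * log(1 + exp(-(s_preferred - s_other))),
    which is differentiable in the scores, so any differentiable scoring
    model could be trained against it.
    """
    if weights is None:
        weights = [1.0] * len(pairs)
    return sum(
        w * math.log1p(math.exp(-(scores[pref] - scores[other])))
        for w, (pref, other) in zip(weights, pairs)
    )


# Usage: one displayed ranking, one click on the second document.
pairs = click_pairs(["a", "b", "c"], clicked=["b"])
loss = pairwise_loss({"a": 0.0, "b": 0.0, "c": 0.0}, pairs)
```

Raising the score of the clicked document relative to the unclicked ones lowers this loss, which is the behavior a gradient step on it would encourage.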
Solutions to Detect and Analyze Online Radicalization : A Survey
Online Radicalization (also called Cyber-Terrorism or Extremism or
Cyber-Racism or Cyber-Hate) is widespread and has become a major and growing
concern to society, governments and law enforcement agencies around the
world. Research shows that various platforms on the Internet, such as YouTube
(a popular video sharing website), Twitter (an online micro-blogging service),
Facebook (a popular social networking website), online discussion forums and
the blogosphere, are being misused for malicious intent. These platforms offer a
low barrier to publishing content, allow anonymity, and provide exposure to
millions of users along with the potential for very quick and widespread
diffusion of a message. They are being used to form hate groups and racist
communities, spread extremist agendas, incite anger or violence, promote
radicalization, recruit members and create virtual organizations and
communities. Automatic detection of online radicalization
is a technically challenging problem because of the vast amount of data,
unstructured and noisy user-generated content, dynamically changing content and
adversarial behavior. Several solutions have been proposed in the literature
aiming to combat and counter cyber-hate and cyber-extremism. In this survey, we
review solutions to detect and analyze online radicalization. We review 40
papers published at 12 venues from June 2003 to November 2011. We present a
novel classification scheme to classify these papers. We analyze these
techniques, perform trend analysis, discuss limitations of existing techniques
and identify research gaps.
Characterizing User Search Intent and Behavior for Click Analysis in Sponsored Search
Interpreting user actions to better understand their needs provides an important tool for improving information access services. In the context of organic Web search, considerable effort has been made to model user behavior and infer query intent, with the goal of improving the overall user experience. Much less work has been done in the area of sponsored search, i.e., with respect to the advertisement links (ads) displayed on search result pages by many commercial search engines. This thesis develops and evaluates new models and methods required to interpret user browsing and click behavior and understand query intent in this very different context.
The initial part of the thesis is concerned with extending the query categories for commercial search and with inferring query intent, focusing on two major tasks: i) enriching queries with contextual information obtained from the search result pages returned for these queries, and ii) developing relatively simple methods for the reliable labeling of training data via crowdsourcing. A central idea of this thesis is to study the impact of contextual factors (including query intent, ad placement, and page structure) on user behavior. This information is then incorporated into probabilistic models that evaluate the quality of advertisement links within the context in which they are displayed over their history of appearance. To account for these factors, a number of query and location biases are proposed and formulated into a group of browsing and click models.
To explore user intent and behavior and to evaluate the performance of the proposed models and methods, logs of query and click information provided for research purposes are used. Overall, query intent is found to have a substantial impact on predictions of user click behavior in sponsored search. Predictions are further improved by considering ads in the context of the other ads displayed on a result page. The parameters of the browsing and click models are learned using an expectation maximization technique applied to click signals recorded in the logs. The initial motivation of the user to browse the ad list and their browsing persistence are found to be related to query intent and browsing/click behavior. Accommodating these biases, along with the location bias, in user models provides effective contextual signals and improves the performance of the existing models.
Watching inside the Screen: Digital Activity Monitoring for Task Recognition and Proactive Information Retrieval
We investigate to what extent it is possible to infer a user’s work tasks by digital activity monitoring and use the task models for proactive information retrieval. Ten participants volunteered for the study, in which their computer screen was monitored and related logs were recorded for 14 days. Corresponding diary entries were collected to provide ground truth to the task detection method. We report two experiments using this data. The unsupervised task detection experiment was conducted to detect tasks using unsupervised topic modeling. The results show an average task detection accuracy of more than 70% by using rich screen monitoring data. The single-trial task detection and retrieval experiment utilized unseen user inputs in order to detect related work tasks and retrieve task-relevant information on-line. We report an average task detection accuracy of 95%, and the corresponding model-based document retrieval with Normalized Discounted Cumulative Gain of 98%. We discuss and provide insights regarding the types of digital tasks occurring in the data, the accuracy of task detection on different task types, and the role of using different data inputs such as application names, extracted keywords, and bag-of-words representations in the task detection process. We also discuss the implications of our results for ubiquitous user modeling and privacy.
Peer reviewed
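For readers unfamiliar with the retrieval metric named above, the following is a minimal sketch of Normalized Discounted Cumulative Gain. It assumes the common linear-gain form with a log2 rank discount; the abstract does not say which variant or cutoff the study used, and the function names `dcg` and `ndcg` are illustrative.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of relevance grades.

    Assumes linear gain: the grade at rank r (1-based) is divided by
    log2(r + 1). Other variants use 2**grade - 1 as the gain.
    """
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG of the given order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Usage: relevance grades of retrieved documents, in ranked order.
score = ndcg([3, 1, 2])
```

A value of 1 means the retrieved documents are already in the best possible order for their relevance grades; lower values indicate relevant documents ranked too low.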