Regret Bounds and Regimes of Optimality for User-User and Item-Item Collaborative Filtering
We consider an online model for recommendation systems, with each user being
recommended an item at each time-step and providing 'like' or 'dislike'
feedback. Each user may be recommended a given item at most once. A latent
variable model specifies the user preferences: both users and items are
clustered into types. All users of a given type have identical preferences for
the items, and similarly, items of a given type are either all liked or all
disliked by a given user. We assume that the matrix encoding the preferences of
each user type for each item type is randomly generated; in this way, the model
captures structure in both the item and user spaces, the amount of structure
depending on the number of each of the types. The measure of performance of the
recommendation system is the expected number of disliked recommendations per
user, defined as expected regret. We propose two algorithms inspired by
user-user and item-item collaborative filtering (CF), modified to explicitly
make exploratory recommendations, and prove performance guarantees in terms of
their expected regret. For two regimes of model parameters, with structure only
in item space or only in user space, we prove information-theoretic lower
bounds on regret that match our upper bounds up to logarithmic factors. Our
analysis elucidates system operating regimes in which existing CF algorithms
are nearly optimal. Comment: 51 pages.
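The latent model above is straightforward to simulate. A minimal sketch (all parameter values hypothetical) that draws the random type-preference matrix and measures per-user regret, i.e. the number of disliked recommendations, for a naive policy that never learns:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: kU user types, kI item types, N users, M items.
kU, kI, N, M = 3, 4, 30, 40

# Random type-preference matrix: entry (user type, item type) is +1 (like)
# or -1 (dislike), drawn uniformly at random as in the model.
xi = rng.choice([-1, 1], size=(kU, kI))

user_type = rng.integers(kU, size=N)
item_type = rng.integers(kI, size=M)

# Full preference matrix: users of the same type share a row pattern.
prefs = xi[user_type][:, item_type]  # shape (N, M)

def regret_of_random_policy(T):
    """Dislikes per user when each user is shown T distinct items chosen
    uniformly at random (no learning); each item is recommended at most once."""
    total_dislikes = 0
    for u in range(N):
        items = rng.choice(M, size=T, replace=False)
        total_dislikes += int(np.sum(prefs[u, items] == -1))
    return total_dislikes / N
```

Any learning policy is evaluated against this same per-user dislike count; the paper's algorithms drive it far below the random baseline.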
Exploration vs. Exploitation in the Information Filtering Problem
We consider information filtering, in which we face a stream of items too
voluminous to process by hand (e.g., scientific articles, blog posts, emails),
and must rely on a computer system to automatically filter out irrelevant
items. Such systems face the exploration vs. exploitation tradeoff, in which it
may be beneficial to present an item despite a low probability of relevance,
just to learn about future items with similar content. We present a Bayesian
sequential decision-making model of this problem, show how it may be solved to
optimality using a decomposition to a collection of two-armed bandit problems,
and show structural results for the optimal policy. We show that the resulting
method is especially useful when facing the cold start problem, i.e., when
filtering items for new users without a long history of past interactions. We
then present an application of this information filtering method to a
historical dataset from the arXiv.org repository of scientific articles. Comment: 36 pages, 5 figures.
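A toy rendering of the per-topic decomposition: each topic becomes an independent two-armed (show vs. skip) Beta-Bernoulli bandit. Thompson sampling is used below as a simple stand-in for the paper's optimal index policy, and the 0.5 relevance threshold is an assumed attention cost:

```python
import random

class TopicFilter:
    """One independent show-vs-skip bandit per topic (Beta-Bernoulli).
    Thompson sampling stands in for the optimal index policy."""
    def __init__(self):
        self.alpha = {}  # (relevant count + 1) per topic, Beta prior
        self.beta = {}   # (irrelevant count + 1) per topic

    def should_show(self, topic):
        a = self.alpha.get(topic, 1)
        b = self.beta.get(topic, 1)
        # Sample a plausible relevance rate; show the item if it beats the
        # (hypothetical) cost of the user's attention.
        return random.betavariate(a, b) > 0.5

    def update(self, topic, relevant):
        # Feedback only arrives for items that were actually shown.
        if relevant:
            self.alpha[topic] = self.alpha.get(topic, 1) + 1
        else:
            self.beta[topic] = self.beta.get(topic, 1) + 1
```

Because a brand-new topic starts at the uniform Beta(1, 1) prior, the sampler shows it roughly half the time, which is exactly the exploratory behavior needed in the cold-start regime.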
Optimal Algorithms for Latent Bandits with Cluster Structure
We consider the problem of latent bandits with cluster structure where there
are multiple users, each with an associated multi-armed bandit problem. These
users are grouped into \emph{latent} clusters such that the mean reward vectors
of users within the same cluster are identical. At each round, a user, selected
uniformly at random, pulls an arm and observes a corresponding noisy reward.
The goal of the users is to maximize their cumulative rewards. This problem is
central to practical recommendation systems and has received wide attention of
late \cite{gentile2014online, maillard2014latent}. Now, if each user acts
independently, then they would have to explore each arm independently and a
regret of Ω̃(√(MNT)) is unavoidable, where M and N are the number of arms and
users, respectively. Instead, we propose LATTICE (Latent bAndiTs via maTrIx
ComplEtion), which allows exploitation of the latent cluster structure to
provide the minimax optimal regret of Õ(√((M+N)T)) when the number of clusters
is Õ(1). This is the first algorithm to guarantee such a strong regret bound.
LATTICE is based on a careful exploitation of arm information within a cluster
while simultaneously clustering users. Furthermore, it is computationally
efficient and requires only O(log T) calls to an offline matrix completion
oracle across all T rounds. Comment: 48 pages. Accepted to AISTATS 2023. Added
experiments.
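The core idea, clustering users from noisy reward estimates and then pooling data within clusters, can be sketched as follows. The greedy thresholded clustering below is a simplified stand-in for LATTICE's matrix-completion-based step, and `tau` is an assumed separation threshold:

```python
import numpy as np

def cluster_users(est_means, tau):
    """Greedy thresholded clustering: users whose estimated mean-reward
    vectors are within tau (infinity norm) of a representative are merged.
    A simplified stand-in for the clustering step in LATTICE."""
    n = est_means.shape[0]
    labels = [-1] * n
    next_label = 0
    for u in range(n):
        if labels[u] != -1:
            continue
        labels[u] = next_label
        for v in range(u + 1, n):
            if labels[v] == -1 and np.max(np.abs(est_means[u] - est_means[v])) <= tau:
                labels[v] = next_label
        next_label += 1
    return labels

def pooled_estimates(est_means, counts, labels):
    """Within each cluster, pool observations (weighted by per-user, per-arm
    sample counts) so every user benefits from its cluster-mates' exploration."""
    labels = np.asarray(labels)
    pooled = est_means.copy()
    for c in np.unique(labels):
        idx = labels == c
        w = counts[idx]                  # counts: array, shape (users, arms)
        tot = w.sum(axis=0)
        pooled[idx] = (est_means[idx] * w).sum(axis=0) / np.maximum(tot, 1)
    return pooled
```

Pooling is what turns N separate M-armed problems into one shared problem per cluster, which is the source of the improvement from √(MNT) to roughly √((M+N)T).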
The art of clustering bandits.
Multi-armed bandit problems are receiving a great deal of attention because they adequately formalize the exploration-exploitation trade-offs arising in several industrially relevant applications, such as online advertisement and, more generally, recommendation systems. In many cases, however, these applications have a strong social component, whose integration in the bandit algorithms could lead to a dramatic performance increase. For instance, we may want to serve content to a group of users by taking advantage of an underlying network of social relationships among them. The purpose of this thesis is to introduce novel and principled algorithmic approaches to the solution of such networked bandit problems. Starting from a global (Laplacian-based) strategy which allocates a bandit algorithm to each network node (user), and allows it to "share" signals (contexts and payoffs) with the neighboring nodes, our goal is to derive and experimentally test more scalable approaches based on different ways of clustering the graph nodes. More importantly, we shall investigate the case when the graph structure is not given ahead of time, and has to be inferred based on past user behavior. A general difficulty arising in such practical scenarios is that data sequences are typically nonstationary, implying that traditional statistical inference methods should be used cautiously, possibly replacing them with more robust nonstochastic (e.g., game-theoretic) inference methods.
In this thesis, we first introduce centralized clustering bandits. Then, we propose the corresponding solution in the decentralized scenario. After that, we explain generic collaborative clustering bandits. Finally, we extend the state-of-the-art clustering bandits that we developed and showcase them in the quantification problem.
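The graph-based strategy sketched above can be illustrated in a few lines, in the spirit of edge-deletion clustering bandits: start from a connected user graph and delete an edge once two users' estimates are confidently different, so that the surviving connected components share signals. The confidence width below is a generic choice, not the thesis's exact one:

```python
import math
import numpy as np

def prune_graph(edges, est, counts, alpha=1.0):
    """Edge-deletion step: drop an edge between users whose estimated
    reward vectors differ by more than the sum of their confidence widths.
    Surviving connected components act as clusters that share payoffs."""
    kept = []
    for u, v in edges:
        cw = alpha * (math.sqrt(math.log(counts[u] + 1) / (counts[u] + 1))
                      + math.sqrt(math.log(counts[v] + 1) / (counts[v] + 1)))
        if np.linalg.norm(est[u] - est[v]) <= cw:
            kept.append((u, v))
    return kept
```

Early on, wide confidence intervals keep the graph intact (everyone shares data); as counts grow, genuinely different users are separated and sharing stops.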
Bandits on graphs and structures
We investigate the structural properties of certain sequential decision-making problems with limited feedback (bandits) in order to bring the known algorithmic solutions closer to practical use. In the first part, we put special emphasis on structures that can be represented as graphs on actions; in the second part, we study large action spaces that can be of exponential size in the number of base actions, or even infinite. We show how to take advantage of structure over the actions and (provably) learn faster.
Online learning and decision-making from implicit feedback
This thesis focuses on designing learning and control algorithms for emerging resource allocation platforms like recommender systems, 5G wireless networks, and online marketplaces. These systems have an environment which is only partially known. Thus, the controllers need to make resource allocation decisions based on implicit feedback obtained from the environment based on past actions. The goal is to sequentially select actions using incremental feedback so as to optimize performance while simultaneously learning about the environment. We study three problems that exemplify this setting. The first is an inference problem which requires identification of sponsored content in recommender systems. Specifically, we ask if it is possible to detect the existence of sponsored content disguised as genuine recommendations using implicit feedback from a subset of users of the recommender system. The second problem is the design of scheduling algorithms for switch networks when the user-server link statistics are unknown (e.g., in wireless networks and online marketplaces). The scheduling algorithm must trade off between scheduling the optimal links and obtaining sufficient feedback about all the links for accurate estimates. We observe the close connection of this problem to the stochastic multi-armed bandit problem and analyze bandit-style explore-exploit algorithms for learning the statistical parameters while simultaneously assigning servers to users. The third is the joint problem of base station activation and rate allocation in an energy efficient wireless network when the channel statistics are unknown. The controller observes instantaneous channel rates of activated BSs, and thereby sequentially obtains implicit feedback about the channel. Here again, there is a tradeoff between learning the channel versus optimizing the operation cost based on estimated parameters. For each of these systems, we propose algorithms with provable asymptotic guarantees.
These learning algorithms highlight the use of implicit feedback in online decision-making and control.
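The explore-exploit scheduling idea in the second problem can be sketched with a UCB1-style rule over user-server links; this is a generic illustration, not the thesis's exact algorithm:

```python
import math

def pick_link(means, counts, t):
    """Schedule the link with the best optimistic rate estimate (UCB1);
    an untried link is always scheduled first to obtain initial feedback."""
    scores = [float('inf') if n == 0 else m + math.sqrt(2 * math.log(t) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)

def update(means, counts, link, rate):
    """Fold the observed service rate into the running estimate."""
    counts[link] += 1
    means[link] += (rate - means[link]) / counts[link]
```

The exploration bonus shrinks as a link accumulates observations, so scheduling concentrates on the statistically best links while still sampling the others often enough to keep the estimates honest.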
Learning with Exposure Constraints in Recommendation Systems
Recommendation systems are dynamic economic systems that balance the needs of
multiple stakeholders. A recent line of work studies incentives from the
content providers' point of view. Content providers, e.g., vloggers and
bloggers, contribute fresh content and rely on user engagement to create
revenue and finance their operations. In this work, we propose a contextual
multi-armed bandit setting to model the dependency of content providers on
exposure. In our model, the system receives a user context in every round and
has to select one of the arms. Every arm is a content provider who must receive
a minimum number of pulls every fixed time period (e.g., a month) to remain
viable in later rounds; otherwise, the arm departs and is no longer available.
The system aims to maximize the users' (content consumers) welfare. To that
end, it should learn which arms are vital and ensure they remain viable by
subsidizing arm pulls if needed. We develop algorithms with sub-linear regret,
as well as a lower bound that demonstrates that our algorithms are optimal up
to logarithmic factors. Comment: Published in The Web Conference 2023 (WWW '23).
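A minimal sketch of the quota-aware selection rule described above, with hypothetical parameters and a UCB1-style score standing in for the paper's actual algorithm: if some arm would otherwise miss its minimum-exposure quota for the current period, subsidize it; otherwise exploit.

```python
import math

def pick_arm(means, counts, t, period_pulls, quota, rounds_left_in_period):
    """Quota-aware arm selection (illustrative, not the paper's algorithm).
    period_pulls: pulls each arm has received in the current period;
    quota: minimum pulls per period an arm needs to remain viable."""
    # Pulls each arm still owes this period.
    deficits = [max(0, quota - p) for p in period_pulls]
    if sum(deficits) >= rounds_left_in_period:
        # No slack left: subsidize the arm with the largest deficit now,
        # or it departs and is lost to all future rounds.
        return max(range(len(deficits)), key=deficits.__getitem__)
    # Slack remains: exploit with an optimistic (UCB1-style) score.
    scores = [float('inf') if n == 0 else m + math.sqrt(2 * math.log(max(t, 2)) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

The interesting tension the paper studies is exactly this: subsidized pulls sacrifice short-term welfare, but keeping a vital arm alive preserves welfare in all later rounds.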
Sequential Optimization in Changing Environments: Theory and Application to Online Content Recommendation Services
Recent technological developments allow the online collection of valuable information that can be efficiently used to optimize decisions "on the fly" and at a low cost. These advances have greatly influenced the decision-making process in various areas of operations management, including pricing, inventory, and retail management. In this thesis we study methodological as well as practical aspects arising in online sequential optimization in the presence of such real-time information streams. On the methodological front, we study aspects of sequential optimization in the presence of temporal changes, such as designing decision-making policies that adapt to temporal changes in the underlying environment (which drives performance) when only partial information about this changing environment is available, and quantifying the added complexity in sequential decision-making problems when temporal changes are introduced. On the applied front, we study practical aspects associated with a class of online services that focus on creating customized recommendations (e.g., Amazon, Netflix). In particular, we focus on online content recommendations, a new class of online services that allows publishers to direct readers from articles they are currently reading to other web-based content they may be interested in, by means of links attached to said article.
In the first part of the thesis we consider a non-stationary variant of a sequential stochastic optimization problem, where the underlying cost functions may change along the horizon. We propose a measure, termed the "variation budget," that controls the extent of said change, and study how restrictions on this budget impact achievable performance. As a yardstick to quantify performance in non-stationary settings we propose a regret measure relative to a dynamic oracle benchmark. We identify sharp conditions under which it is possible to achieve long-run-average optimality and more refined performance measures such as rate optimality that fully characterize the complexity of such problems. In doing so, we also establish a strong connection between two rather disparate strands of literature: adversarial online convex optimization; and the more traditional stochastic approximation paradigm (couched in a non-stationary setting). This connection is the key to deriving well-performing policies in the latter, by leveraging the structure of optimal policies in the former. Finally, tight bounds on the minimax regret allow us to quantify the "price of non-stationarity," which mathematically captures the added complexity embedded in a temporally changing environment versus a stationary one.
In the second part of the thesis we consider another core stochastic optimization problem couched in a multi-armed bandit (MAB) setting. We develop a MAB formulation that allows for a broad range of temporal uncertainties in the rewards, characterize the (regret) complexity of this class of MAB problems by establishing a direct link between the extent of allowable reward "variation" and the minimal achievable worst-case regret, and provide an optimal policy that achieves that performance. Similarly to the first part of the thesis, our analysis draws concrete connections between two strands of literature: the adversarial and the stochastic MAB frameworks.
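The forgetting idea behind policies for bounded reward variation can be sketched as follows: split the horizon into batches and discard all estimates at each batch boundary, so stale data cannot mislead the learner. The epsilon-greedy inner routine and all parameter values below are illustrative stand-ins, not the thesis's exact policy:

```python
import random

def restarting_bandit(reward_fns, T, batch_size, eps=0.1, seed=0):
    """Play T rounds against time-varying rewards, restarting all estimates
    every batch_size rounds. reward_fns: one function t -> reward per arm.
    Returns the total reward collected."""
    rng = random.Random(seed)
    k = len(reward_fns)
    total = 0.0
    for t in range(T):
        if t % batch_size == 0:
            means, counts = [0.0] * k, [0] * k   # forget the past
        if rng.random() < eps or min(counts) == 0:
            arm = counts.index(min(counts))      # explore the least-tried arm
        else:
            arm = max(range(k), key=means.__getitem__)  # exploit
        r = reward_fns[arm](t)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
        total += r
    return total
```

The batch length mediates the trade-off the thesis characterizes: longer batches learn a fixed environment better, shorter batches track a changing one faster, and the variation budget dictates the optimal balance.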
The third part of the thesis studies applied optimization aspects arising in online content recommendations, which allow web-based publishers to direct readers from articles they are currently reading to other web-based content. We study the content recommendation problem and its unique dynamic features from both theoretical and practical perspectives. Using a large data set of browsing history at major media sites, we develop a representation of content along two key dimensions: clickability, the likelihood to click to an article when it is recommended; and engageability, the likelihood to click from an article when it hosts a recommendation. Based on this representation, we propose a class of user path-focused heuristics, whose purpose is to simultaneously ensure a high instantaneous probability of clicking recommended articles, while also optimizing engagement along the future path. We rigorously quantify the performance of these heuristics and validate their impact through a live experiment. The third part of the thesis is based on a collaboration with a leading provider of content recommendations to online publishers.