110 research outputs found
Scalable Methods for Adaptively Seeding a Social Network
In recent years, social networking platforms have developed into
extraordinary channels for spreading and consuming information. Along with the
rise of such infrastructure, there is continuous progress on techniques for
spreading information effectively through influential users. In many
applications, one is restricted to select influencers from a set of users who
engaged with the topic being promoted, and due to the structure of social
networks, these users often rank low in terms of their influence potential. An
alternative approach one can consider is an adaptive method which selects users
in a manner which targets their influential neighbors. The advantage of such an
approach is that it leverages the friendship paradox in social networks: while
users are often not influential, they often know someone who is.
Despite the various complexities in such optimization problems, we show that
scalable adaptive seeding is achievable. In particular, we develop algorithms
for linear influence models with provable approximation guarantees that can be
gracefully parallelized. To show the effectiveness of our methods, we collected
data from various verticals that social network users follow. For each vertical, we
collected data on the users who responded to a certain post as well as their
neighbors, and applied our methods on this data. Our experiments show that
adaptive seeding is scalable, and importantly, that it obtains dramatic
improvements over standard approaches to information dissemination.
Comment: Full version of the paper appearing in WWW 201
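The two-stage idea behind adaptive seeding can be prototyped in a few lines. Below is a minimal, hypothetical sketch (not the paper's algorithm): influence is assumed to be a precomputed per-user score, half the budget invites engaged users whose neighborhoods look promising, and the remainder seeds the most influential neighbors that surface — exploiting the friendship paradox noted above.

```python
def adaptive_seed(engaged_users, neighbors, influence, budget):
    """Two-stage adaptive seeding sketch (all names are illustrative).

    Stage 1: spend part of the budget inviting engaged users, chosen so
    that their neighborhoods contain high-influence nodes.
    Stage 2: spend the rest on the influential neighbors who appear.
    """
    # Score each engaged user by the best influence reachable via a neighbor.
    scored = sorted(
        engaged_users,
        key=lambda u: max((influence[v] for v in neighbors[u]), default=0),
        reverse=True)
    stage1 = scored[:budget // 2]          # users we invite first

    # Among all neighbors of the invited users, seed the most influential.
    candidates = {v for u in stage1 for v in neighbors[u]}
    stage2 = sorted(candidates, key=lambda v: influence[v],
                    reverse=True)[:budget - len(stage1)]
    return stage1, stage2
```

In a real deployment the influence scores would come from the linear influence model being optimized; here they are a stand-in.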
Answer Set Programming for Non-Stationary Markov Decision Processes
Non-stationary domains, where unforeseen changes happen, present a challenge
for agents to find an optimal policy for a sequential decision making problem.
This work investigates a solution to this problem that combines Markov Decision
Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming
(ASP) in a method we call ASP(RL). In this method, Answer Set Programming is
used to find the possible trajectories of an MDP, from where Reinforcement
Learning is applied to learn the optimal policy of the problem. Results show
that ASP(RL) is capable of efficiently finding the optimal solution of an MDP
representing non-stationary domains.
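The division of labor described above can be caricatured in a few lines: a declarative validity predicate stands in for the trajectories an ASP solver would compute, and tabular Q-learning runs only over the pruned state-action space. All names here are illustrative, not the paper's API.

```python
import random

def asp_rl_sketch(states, actions, step, valid, episodes=300,
                  alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Loose sketch of the ASP(RL) idea: `valid(s, a)` prunes the space
    (playing the role of ASP), then Q-learning learns over what remains."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in states for a in actions if valid(s, a)}

    def allowed(s):
        return [a for a in actions if (s, a) in Q]

    for _ in range(episodes):
        s = states[0]                      # assume a fixed start state
        for _ in range(50):                # episode length cap
            acts = allowed(s)
            if not acts:
                break
            if rng.random() < eps:
                a = rng.choice(acts)       # explore
            else:
                a = max(acts, key=lambda x: Q[(s, x)])  # exploit
            s2, r, done = step(s, a)       # environment transition
            nxt = max((Q[(s2, b)] for b in allowed(s2)), default=0.0)
            Q[(s, a)] += alpha * (r + gamma * (0.0 if done else nxt) - Q[(s, a)])
            if done:
                break
            s = s2
    return Q
```

Re-running the ASP stage after an unforeseen domain change would simply yield a new `valid` predicate and a fresh pruned table.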
Concurrent bandits and cognitive radio networks
We consider the problem of multiple users targeting the arms of a single
multi-armed stochastic bandit. The motivation for this problem comes from
cognitive radio networks, where selfish users need to coexist without any side
communication between them, implicit cooperation or common control. Even the
number of users may be unknown and can vary as users join or leave the network.
We propose an algorithm that combines an ε-greedy learning rule with a
collision avoidance mechanism. We analyze its regret with respect to the
system-wide optimum and show that sub-linear regret can be obtained in this
setting. Experiments show dramatic improvement compared to other algorithms for
this setting.
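One way to picture the combination is the toy simulation below: each user runs its own ε-greedy rule with no side communication, and a user involved in a collision gets zero reward and re-targets a uniformly random arm in the next round. This is an illustrative avoidance rule; the paper's actual mechanism differs in detail.

```python
import random

def concurrent_eps_greedy(n_users, arm_means, rounds, eps=0.1, seed=0):
    """Multiple users on one Bernoulli bandit with naive collision avoidance."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [[0] * n_arms for _ in range(n_users)]
    values = [[0.0] * n_arms for _ in range(n_users)]
    forced = [None] * n_users        # arm forced by collision avoidance
    total_reward = 0.0

    for _ in range(rounds):
        picks = []
        for u in range(n_users):
            if forced[u] is not None:
                a, forced[u] = forced[u], None
            elif rng.random() < eps:
                a = rng.randrange(n_arms)                           # explore
            else:
                a = max(range(n_arms), key=lambda i: values[u][i])  # exploit
            picks.append(a)
        for u, a in enumerate(picks):
            if picks.count(a) > 1:                 # collision: no reward
                forced[u] = rng.randrange(n_arms)  # re-target next round
                reward = 0.0
            else:
                reward = float(rng.random() < arm_means[a])  # Bernoulli draw
            counts[u][a] += 1
            values[u][a] += (reward - values[u][a]) / counts[u][a]
            total_reward += reward
    return total_reward
```

Because collided users scatter at random, they tend to settle on distinct arms over time without any explicit coordination.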
Rollout Sampling Approximate Policy Iteration
Several researchers have recently investigated the connection between
reinforcement learning and classification. We are motivated by proposals of
approximate policy iteration schemes without value functions which focus on
policy representation using classifiers and address policy learning as a
supervised learning problem. This paper proposes variants of an improved policy
iteration scheme which addresses the core sampling problem in evaluating a
policy through simulation as a multi-armed bandit machine. The resulting
algorithm offers performance comparable to that of the previous algorithm,
achieved, however, with significantly less computational effort. An order-of-magnitude
improvement is demonstrated experimentally in two standard reinforcement
learning domains: inverted pendulum and mountain-car.
Comment: 18 pages, 2 figures, to appear in Machine Learning 72(3). Presented
at EWRL08, to be presented at ECML 200
Bayesian reinforcement learning with exploration
We consider a general reinforcement learning problem and
show that carefully combining the Bayesian optimal policy and an exploring
policy leads to minimax sample-complexity bounds in a very general
class of (history-based) environments. We also prove lower bounds
and show that the new algorithm displays adaptive behaviour when the
environment is easier than worst-case.
Sequential decision making with vector outcomes
We study a multi-round optimization setting in which in each round a player may select one of several actions, and each action produces an outcome vector, not observable to the player until the round ends. The final payoff for the player is computed by applying some known function f to the sum of all outcome vectors (e.g., the minimum of all coordinates of the sum). We show that standard notions of performance measure (such as comparison to the best single action) used in related expert and bandit settings (in which the payoff in each round is scalar) are not useful in our vector setting. Instead, we propose a different performance measure, and design algorithms that have vanishing regret with respect to our new measure.
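A two-action toy instance makes it concrete why comparison to the best single action is uninformative when f is the coordinate-wise minimum: either action played alone leaves one coordinate at zero, while mixing grows both.

```python
# Two actions with deterministic outcome vectors; payoff = min coordinate of the sum.
outcomes = {"A": (1, 0), "B": (0, 1)}

def payoff(plays):
    total = [0, 0]
    for a in plays:
        for i, x in enumerate(outcomes[a]):
            total[i] += x
    return min(total)

T = 10
print(payoff(["A"] * T))             # best single action: min coordinate stays 0
print(payoff(["A", "B"] * (T // 2))) # alternating: min coordinate grows to T/2
```

Any benchmark defined against the best single action is therefore trivially met here, which motivates the different performance measure the abstract proposes.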
Bayesian Best-Arm Identification for Selecting Influenza Mitigation Strategies
Pandemic influenza has the epidemic potential to kill millions of people.
While various preventive measures exist (e.g., vaccination and school
closures), deciding on strategies that lead to their most effective and
efficient use remains challenging. To this end, individual-based
epidemiological models are essential to assist decision makers in determining
the best strategy to curb epidemic spread. However, individual-based models are
computationally intensive and it is therefore pivotal to identify the optimal
strategy using a minimal amount of model evaluations. Additionally, as
epidemiological modeling experiments need to be planned, a computational budget
needs to be specified a priori. Consequently, we present a new sampling
technique to optimize the evaluation of preventive strategies using fixed
budget best-arm identification algorithms. We use epidemiological modeling
theory to derive knowledge about the reward distribution which we exploit using
Bayesian best-arm identification algorithms (i.e., Top-two Thompson sampling
and BayesGap). We evaluate these algorithms in a realistic experimental setting
and demonstrate that it is possible to identify the optimal strategy using only
a limited number of model evaluations, i.e., 2-to-3 times faster compared to
the uniform sampling method, the predominant technique used for epidemiological
decision making in the literature. Finally, we contribute and evaluate a
statistic for Top-two Thompson sampling to inform the decision makers about the
confidence of an arm recommendation.
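For concreteness, here is a generic sketch of Top-two Thompson sampling for Bernoulli rewards with uniform Beta(1, 1) priors. The `pull` interface (one expensive model evaluation returning success/failure) and the bounded-resampling fallback are illustrative choices, not the paper's exact procedure.

```python
import random

def top_two_thompson(pull, n_arms, budget, beta=0.5, seed=0):
    """Fixed-budget best-arm identification via Top-two Thompson sampling.

    `pull(a)` is an assumed simulator interface returning 0 or 1
    (e.g., whether strategy a curbed the simulated epidemic).
    """
    rng = random.Random(seed)
    succ = [1] * n_arms                  # Beta alpha parameters
    fail = [1] * n_arms                  # Beta beta parameters
    for _ in range(budget):
        draw = [rng.betavariate(succ[i], fail[i]) for i in range(n_arms)]
        leader = max(range(n_arms), key=lambda i: draw[i])
        a = leader
        if rng.random() >= beta:         # with prob. 1 - beta, play a challenger
            for _ in range(100):         # bounded resampling
                redraw = [rng.betavariate(succ[i], fail[i]) for i in range(n_arms)]
                a = max(range(n_arms), key=lambda i: redraw[i])
                if a != leader:
                    break
            if a == leader:              # fallback: runner-up of the first draw
                a = max((i for i in range(n_arms) if i != leader),
                        key=lambda i: draw[i])
        r = pull(a)
        succ[a] += r
        fail[a] += 1 - r
    # recommend the arm with the highest posterior mean
    return max(range(n_arms), key=lambda i: succ[i] / (succ[i] + fail[i]))
```

After the budget is exhausted, the Beta posteriors themselves are what a confidence statistic for the recommendation would be computed from.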