Automatic Ensemble Learning for Online Influence Maximization
We consider the problem of selecting a seed set to maximize the expected number of influenced nodes in a social network, referred to as the influence maximization (IM) problem. We assume that the topology of the social network is known, while the influence probabilities on its edges are unknown. To learn the influence probabilities while simultaneously maximizing the influence spread, we must trade off exploiting the current estimates of the influence probabilities, which secures a certain influence spread, against exploring more nodes to estimate the probabilities better. This exploration-exploitation trade-off is the core issue in the multi-armed bandit (MAB) problem, and if the influence spread is regarded as the reward, the IM problem reduces to a combinatorial multi-armed bandit problem. In each round, the learner selects a limited number of seed nodes in the social network, and the influence then spreads over the network according to the true influence probabilities. The learner observes the activation status of an edge if and only if its start node is influenced, which is referred to as edge-level semi-bandit feedback. Two classical bandit algorithms, Thompson Sampling and Epsilon-Greedy, are used to solve this combinatorial problem. To make these two algorithms robust, we use an automatic ensemble learning strategy that combines an exploration strategy with an exploitation strategy. The ensemble algorithm is self-adaptive in that the probability of selecting each base algorithm is adjusted according to that algorithm's historical performance. Experimental evaluation illustrates the effectiveness of this automatically adjusted hybridization of exploration and exploitation.
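To make the ensemble idea concrete, here is a minimal sketch on a toy K-armed Bernoulli bandit, with Thompson Sampling and Epsilon-Greedy as the two base strategies. The abstract does not specify the ensemble's weight-update rule, so the softmax-over-average-reward adjustment below is an illustrative assumption, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the IM semi-bandit: a K-armed Bernoulli bandit.
# The selection probability of each base algorithm adapts to its
# historical average reward (assumed update rule, for illustration only).
K, T = 10, 5000
true_p = rng.uniform(0.1, 0.9, K)

a = np.ones(K); b = np.ones(K)           # shared Beta(a, b) posterior per arm
eps = 0.1                                # epsilon-greedy exploration rate
sums = np.zeros(2); cnts = np.ones(2)    # per-strategy reward history

def thompson():
    return int(np.argmax(rng.beta(a, b)))      # sample posterior, then exploit

def eps_greedy():
    if rng.random() < eps:
        return int(rng.integers(K))            # explore uniformly
    return int(np.argmax(a / (a + b)))         # exploit posterior mean

for t in range(T):
    avg = sums / cnts
    probs = np.exp(5 * avg); probs /= probs.sum()   # self-adaptive mixing
    algo = rng.choice(2, p=probs)                   # 0 = TS, 1 = eps-greedy
    arm = thompson() if algo == 0 else eps_greedy()
    r = float(rng.random() < true_p[arm])
    a[arm] += r; b[arm] += 1.0 - r                  # posterior update
    sums[algo] += r; cnts[algo] += 1                # ensemble bookkeeping

print("true best arm:", int(true_p.argmax()),
      "estimated best arm:", int(np.argmax(a / (a + b))))
```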
Contextual Combinatorial Conservative Bandits
The multi-armed bandit (MAB) problem asks a learner to make sequential decisions while balancing exploration against exploitation, and it has been successfully applied to a wide range of practical scenarios. Various algorithms have been designed to achieve a high reward in the long term. However, their short-term performance might be rather low, which is harmful in risk-sensitive applications. Building on previous work on conservative bandits, we propose a framework of contextual combinatorial conservative bandits. An
algorithm is presented and a regret bound is proven in terms of $d$, the dimension of the feature vectors, and $T$, the total number of time steps. We further provide an algorithm, together with a regret analysis, for the case where the conservative reward is unknown. Experiments are conducted, and the results validate the effectiveness of our algorithm.
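For intuition, the following sketch shows the conservative constraint on a plain K-armed bandit with a known baseline arm: an optimistic (UCB) proposal is played only when a pessimistic account of cumulative reward stays above a $(1-\alpha)$ fraction of the baseline's cumulative reward. The paper's setting is contextual and combinatorial, so this is a simplified stand-in rather than the proposed algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simplified conservative-bandit safety check on a K-armed bandit.
K, T, alpha = 5, 3000, 0.1
true_p = rng.uniform(0.2, 0.8, K)
baseline_arm, baseline_mean = 0, float(true_p[0])   # baseline assumed known

counts = np.zeros(K); sums = np.zeros(K)
pessimistic_total = 0.0   # running lower bound on reward collected so far

def bounds(t):
    m = sums / np.maximum(counts, 1)
    w = np.sqrt(2 * np.log(t + 1) / np.maximum(counts, 1))
    return m - w, m + w   # (LCB, UCB); untried arms get wide intervals

for t in range(1, T + 1):
    lcb, ucb = bounds(t)
    proposed = int(np.argmax(ucb))        # optimistic proposal
    # Conservative check: play `proposed` only if even a pessimistic
    # account of total reward stays above (1 - alpha) of baseline's total.
    if pessimistic_total + max(lcb[proposed], 0.0) >= (1 - alpha) * t * baseline_mean:
        arm = proposed
        pessimistic_total += max(lcb[arm], 0.0)
    else:
        arm = baseline_arm                # safe fallback to the baseline
        pessimistic_total += baseline_mean
    r = float(rng.random() < true_p[arm])
    counts[arm] += 1; sums[arm] += r
```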
Multi-Round Influence Maximization
In this paper, we study the Multi-Round Influence Maximization (MRIM)
problem, where influence propagates in multiple rounds independently from
possibly different seed sets, and the goal is to select seeds for each round to
maximize the expected number of nodes that are activated in at least one round.
The MRIM problem models viral marketing scenarios in which advertisers conduct multiple rounds of viral marketing to promote one product. We consider two
different settings: 1) the non-adaptive MRIM, where the advertiser needs to
determine the seed sets for all rounds at the very beginning, and 2) the
adaptive MRIM, where the advertiser can select seed sets adaptively based on
the propagation results in the previous rounds. For the non-adaptive setting,
we design two algorithms that exhibit an interesting tradeoff between
efficiency and effectiveness: a cross-round greedy algorithm that selects seeds at a global level and achieves a $1/2 - \varepsilon$ approximation ratio, and a within-round greedy algorithm that selects seeds round by round and achieves a $1 - e^{-(1-1/e)} - \varepsilon$ approximation ratio but saves running time by a factor related to the number of rounds. For the adaptive setting, we design an adaptive algorithm that guarantees a $1 - e^{-(1-1/e)}$ approximation to the adaptive optimal solution. In
all cases, we further design scalable algorithms based on the reverse influence
sampling approach and achieve near-linear running time. We conduct experiments
on several real-world networks and demonstrate that our algorithms are
effective for the MRIM task.
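As a rough illustration of the within-round approach, the sketch below greedily selects each round's seeds by Monte Carlo estimates of independent cascade (IC) spread and counts nodes activated in at least one round. It simplifies in two ways: each round's seeds are chosen for that round's own spread (the paper's within-round greedy accounts for coverage carried over from earlier rounds), and the paper's scalable versions use reverse influence sampling rather than naive simulation.

```python
import random

random.seed(0)

# Toy instance: a random directed graph as adjacency lists (hypothetical).
N, K_SEEDS, ROUNDS, SIMS, P = 50, 3, 3, 100, 0.08
adj = {u: [v for v in range(N) if v != u and random.random() < 0.08]
       for u in range(N)}

def cascade(seeds):
    """One IC simulation; returns the set of activated nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in active and random.random() < P:
                    active.add(v); nxt.append(v)
        frontier = nxt
    return active

def greedy_round(k):
    """Greedy seed set for one round (lazy evaluation omitted for clarity)."""
    seeds = set()
    for _ in range(k):
        best, best_gain = None, -1.0
        for v in range(N):
            if v in seeds:
                continue
            # Average spread of seeds + {v}; argmax equals marginal-gain greedy.
            gain = sum(len(cascade(seeds | {v})) for _ in range(SIMS)) / SIMS
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.add(best)
    return seeds

# Run ROUNDS independent rounds; a node counts if activated at least once.
covered = set()
for r in range(ROUNDS):
    covered |= cascade(greedy_round(K_SEEDS))
print(f"nodes activated in at least one round: {len(covered)} / {N}")
```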
Online Learning with Cumulative Oversampling: Application to Budgeted Influence Maximization
We propose a cumulative oversampling (CO) method for online learning. Our key idea is to sample parameter estimates from the updated belief space once in each round (similar to Thompson Sampling) and to use the cumulative samples up to the current round to construct optimistic parameter estimates that asymptotically concentrate around the true parameters, yielding tighter upper confidence bounds than those constructed with standard UCB methods.
We apply CO to a novel budgeted variant of the Influence Maximization (IM)
semi-bandits with linear generalization of edge weights, whose offline problem
is NP-hard. Combining CO with the oracle we design for the offline problem, our
online learning algorithm simultaneously tackles budget allocation, parameter
learning, and reward maximization. We show that for IM semi-bandits, our
CO-based algorithm achieves a scaled regret comparable to that of the UCB-based
algorithms in theory, and performs on par with Thompson Sampling in numerical
experiments.
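The sketch below conveys the flavor of sampling-based optimism on toy Bernoulli arms: optimistic estimates are taken as the maximum over a growing number of fresh posterior samples, which tightens as the posterior concentrates. This is a loose stand-in, not the paper's method; CO draws one sample per round and builds its optimistic estimates from the cumulative samples in a more careful way.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative sampling-based optimism on Bernoulli arms (assumption:
# this max-of-fresh-samples rule stands in for CO's construction; the
# paper's setting is IM semi-bandits with linear edge-weight generalization).
K, T = 10, 3000
true_p = rng.uniform(0.1, 0.9, K)
a = np.ones(K); b = np.ones(K)              # Beta posterior per arm

for t in range(1, T + 1):
    m = int(np.ceil(np.log(t + 1))) + 1     # sample count grows with t
    samples = rng.beta(a, b, size=(m, K))   # fresh draws from current belief
    optimistic = samples.max(axis=0)        # max draw acts like a tight UCB
    arm = int(np.argmax(optimistic))        # act on the optimistic estimate
    r = float(rng.random() < true_p[arm])
    a[arm] += r; b[arm] += 1.0 - r          # posterior update

print("true best arm:", int(true_p.argmax()),
      "estimated best arm:", int(np.argmax(a / (a + b))))
```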