Bandit algorithms for real-time data capture on large social medias
We study the problem of real-time data capture on social media. Due to the
different limitations imposed by those media, but also to the very large amount
of information, it is impossible to collect all the data produced by social
networks such as Twitter. Therefore, to be able to gather enough relevant
information related to a predefined need, it is necessary to focus on a subset
of the information sources. In this work, we focus on user-centered data
capture and consider each account of a social network as a source that can be
listened to at each iteration of a data capture process, in order to collect
the corresponding produced contents. This process, whose aim is to maximize the
quality of the information gathered, is constrained by the number of users that
can be monitored simultaneously. The problem of selecting a subset of accounts
to listen to over time is a sequential decision problem under constraints,
which we formalize as a bandit problem with multiple selections. Therefore, we
propose several bandit models to identify the most relevant users in real time.
First, we study the case of the stochastic bandit, in which each user
corresponds to a stationary distribution. Then, we introduce two contextual
bandit models, one stationary and the other non-stationary, in which the
utility of each user can be estimated by assuming some underlying structure in
the reward space. The first approach introduces the notion of profile, which
corresponds to the average behavior of a user. The second approach takes into
account the activity of a user in order to predict their future behavior.
Finally, we are interested in models that are able to tackle complex temporal
dependencies between users, with the use of a latent space within which the
information transits from one iteration to the other. Each of the proposed
approaches is validated on both artificial and real datasets.
Comment: in Frenc
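
To make the stochastic setting with multiple selections concrete, here is a minimal sketch, not the models proposed in this work. It assumes a generic UCB1-style index applied to the k best accounts per round, with simulated Bernoulli relevance scores; the names `ucb_multiple_plays`, `bernoulli`, and `probs` are hypothetical and chosen only for illustration.

```python
import math
import random


def ucb_multiple_plays(reward_fns, k, n_rounds, seed=0):
    """Listen to k accounts per round using a UCB1-style index (illustrative)."""
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [0] * n_arms    # how many times each account was listened to
    means = [0.0] * n_arms   # running mean reward observed for each account
    history = []

    for t in range(1, n_rounds + 1):
        def index(i):
            # Accounts never listened to get priority.
            if counts[i] == 0:
                return float("inf")
            # Empirical mean plus an exploration bonus that shrinks with counts.
            return means[i] + math.sqrt(2.0 * math.log(t) / counts[i])

        # Constraint: only k accounts can be monitored simultaneously.
        chosen = sorted(range(n_arms), key=index, reverse=True)[:k]
        for i in chosen:
            r = reward_fns[i](rng)
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]
        history.append(chosen)

    return counts, means, history


def bernoulli(p):
    """Simulated account whose content is relevant with probability p (assumption)."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


# Hypothetical relevance probabilities for six simulated accounts.
probs = [0.9, 0.8, 0.2, 0.1, 0.05, 0.05]
accounts = [bernoulli(p) for p in probs]
counts, means, history = ucb_multiple_plays(accounts, k=2, n_rounds=2000)
```

In this toy run the listening budget concentrates on the two simulated accounts with the highest relevance probability, while the others are still sampled occasionally. The contextual and latent-space models described above are not covered by this sketch.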