518 research outputs found
Multi-Armed Bandits for Adaptive Constraint Propagation
Adaptive constraint propagation has recently received great attention. It allows a constraint solver to exploit various levels of propagation during search, and in many cases it performs better than static, predefined propagation. The crucial point is to make adaptive constraint propagation automatic, so that no expert knowledge or parameter specification is required. In this work, we propose a simple learning technique, based on multi-armed bandits, that automatically selects among several levels of propagation during search. Our technique can combine any number of propagation levels, whereas existing techniques are only defined for pairs. An experimental evaluation demonstrates that the proposed technique results in a more efficient and stable solver.
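The selection mechanism the abstract describes can be sketched with a standard bandit index policy. Below is a minimal UCB1 sketch in which each arm is a propagation level; the level names and the simulated reward signal are illustrative assumptions, not details from the paper (a real solver would reward, e.g., pruning achieved per unit of propagation time).

```python
import math
import random

def ucb1_select(counts, values, t):
    # Play each arm once, then pick the arm maximizing the UCB1 index.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]))

# Hypothetical propagation levels a solver might switch between.
levels = ["forward-checking", "arc-consistency", "singleton-AC"]
counts = [0] * len(levels)
values = [0.0] * len(levels)   # running mean reward per level

random.seed(0)
for t in range(1, 201):
    arm = ucb1_select(counts, values, t)
    # The reward would come from the solver; simulated here for illustration.
    reward = random.random() * (0.3 + 0.2 * arm)
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean
```

Because UCB1 needs no tuned parameters, a policy of this shape matches the abstract's goal of requiring no expert knowledge or parameter specification.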
Influence Maximization with Bandits
We consider the problem of influence maximization: maximizing the number of
people who become aware of a product by finding the 'best' set of 'seed' users
to expose the product to. Most prior work on this
topic assumes that we know the probability of each user influencing each other
user, or we have data that lets us estimate these influences. However, this
information is typically not initially available or is difficult to obtain. To
avoid this assumption, we adopt a combinatorial multi-armed bandit paradigm
that estimates the influence probabilities as we sequentially try different
seed sets. We establish bounds on the performance of this procedure under the
existing edge-level feedback as well as a novel and more realistic node-level
feedback. Beyond our theoretical results, we describe a practical
implementation and experimentally demonstrate its efficiency and effectiveness
on four real datasets.
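The sequential estimation loop the abstract describes can be sketched as a combinatorial bandit over edge probabilities under edge-level feedback. The toy graph, the true influence probabilities, and the epsilon-greedy seed rule below are all illustrative assumptions, not the paper's algorithm or data.

```python
import random

random.seed(1)

# Toy directed social graph (adjacency lists) with unknown influence
# probabilities -- both are illustrative assumptions.
graph = {0: [1, 2], 1: [2, 3], 2: [3], 3: [0]}
true_p = {(u, v): 0.1 + 0.4 * random.random() for u in graph for v in graph[u]}

trials = {e: 0 for e in true_p}      # times each edge activation was attempted
successes = {e: 0 for e in true_p}   # times the edge fired

def estimate(e):
    # Optimistic initial estimate drives exploration of unobserved edges.
    return successes[e] / trials[e] if trials[e] else 1.0

def cascade(seed):
    """One independent-cascade simulation with edge-level feedback."""
    active, frontier = {seed}, [seed]
    while frontier:
        u = frontier.pop()
        for v in graph[u]:
            if v in active:
                continue
            trials[(u, v)] += 1
            if random.random() < true_p[(u, v)]:   # observed edge activation
                successes[(u, v)] += 1
                active.add(v)
                frontier.append(v)
    return len(active)

for _ in range(500):
    # Epsilon-greedy choice of a single seed by estimated outgoing influence.
    if random.random() < 0.1:
        seed = random.choice(list(graph))
    else:
        seed = max(graph, key=lambda u: sum(estimate((u, v)) for v in graph[u]))
    cascade(seed)
```

Under node-level feedback, by contrast, only the set of eventually-active nodes is observed, so the per-edge credit assignment inside `cascade` would no longer be available and the estimates must be inferred, which is what makes that setting harder.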
Efficient Beam Alignment in Millimeter Wave Systems Using Contextual Bandits
In this paper, we investigate the problem of beam alignment in millimeter
wave (mmWave) systems, and design an optimal algorithm to reduce the overhead.
Specifically, due to directional communications, the transmitter and receiver
beams need to be aligned, which incurs high delay overhead since without a
priori knowledge of the transmitter/receiver location, the search space spans
the entire angular domain. This is further exacerbated under dynamic conditions
(e.g., moving vehicles) where the access to the base station (access point) is
highly dynamic with intermittent on-off periods, requiring more frequent beam
alignment and signal training. To mitigate this issue, we consider an online
stochastic optimization formulation where the goal is to maximize the
directivity gain (i.e., received energy) of the beam alignment policy within a
time period. We exploit the inherent correlation and unimodality properties of
the model, and demonstrate that contextual information improves the
performance. To this end, we propose an equivalent structured Multi-Armed
Bandit model to optimally exploit the exploration-exploitation tradeoff. In
contrast to the classical MAB models, the contextual information makes the
lower bound on regret (i.e., performance loss compared with an oracle policy)
independent of the number of beams. This is a crucial property since the number
of all combinations of beam patterns can be large in transceiver antenna
arrays, especially in massive MIMO systems. We further provide an
asymptotically optimal beam alignment algorithm, and investigate its
performance via simulations. (To appear in IEEE INFOCOM 2018.)
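The way unimodality shrinks the search space can be sketched with a leader-centered bandit: because the mean directivity gain is unimodal in the beam index, it suffices to explore only the current empirical leader and its adjacent beams rather than all K directions. The gain profile, noise model, and constants below are illustrative assumptions, not the paper's algorithm.

```python
import math
import random

random.seed(2)
K = 16          # number of beam directions
best = 5        # unknown optimal beam (assumed here for the simulation)

def gain(i):
    # Unimodal mean directivity gain, peaked at `best` (an assumption).
    return 1.0 - 0.1 * abs(i - best)

counts = [0] * K
means = [0.0] * K

def neighbors(i):
    return [j for j in (i - 1, i, i + 1) if 0 <= j < K]

for t in range(1, 1001):
    leader = max(range(K), key=lambda i: means[i])
    # Unimodality: only the leader and its adjacent beams need exploring.
    cand = neighbors(leader)
    unplayed = [i for i in cand if counts[i] == 0]
    if unplayed:
        arm = unplayed[0]
    else:
        arm = max(cand,
                  key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
    r = gain(arm) + random.gauss(0, 0.05)   # noisy received energy
    counts[arm] += 1
    means[arm] += (r - means[arm]) / counts[arm]
```

Since each round compares at most three candidate beams, the exploration cost per step does not grow with K, which mirrors the abstract's point that regret can be made independent of the number of beams.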